Build SRE agents that understand your infrastructure
Just describe a workflow and provision safe access to dev tools. Then Unpage handles all the wiring, giving you production-ready agents in minutes.




Example Agents
Define task-specific agents in natural language.
Reduce alert noise
Unpage agents use context from your infrastructure graph to triage alerts, eliminate false positives, and surface real risks.
# cpu-alert-agent.yaml

# Used by the router to determine which agent to use for an alert
description: >
  Use this agent to analyze alerts that meet the following criteria:
  - The alert is related to CPU usage exceeding thresholds
  - The alert comes from AWS CloudWatch or Datadog
  - The affected resource is a compute instance (EC2, container, etc.)

# Instructions for the agent
prompt: >
  You are an agent specialized in analyzing high CPU usage alerts.
  When investigating a CPU alert, follow these steps:

  1. Check the current CPU metrics to verify the alert is still active
  2. Look at CPU metrics for the past hour to see if this is a spike or sustained usage
  3. Check logs from around the time the alert started for any errors or unusual activity
  4. Look for any recent deployments or changes that might explain the high usage
  5. Check if similar resources are experiencing the same issue

  Based on your findings, update the incident with:
  - Current status of the issue
  - Likely cause based on available evidence
  - Recommended next steps
  - Whether this appears to be a critical issue requiring immediate human attention

  Be concise but thorough. Include specific metrics, timestamps, and log entries
  that support your analysis.

  NEVER make up information or assume values you haven't verified.

# Tools the agent can use
tools:
  - core_current_datetime
  - core_convert_to_timezone
  - metrics_get_metrics_for_node
  - metrics_list_available_metrics_for_node
  - graph_get_resource_details
  - graph_get_neighboring_resources
  - graph_get_resource_topology
  - papertrail_search_logs
  - pagerduty_post_status_update
  - pagerduty_get_incident_details
  - aws_describe_ec2_instance

Investigate incidents
Create and run agents that investigate common issues that SRE teams regularly respond to, like SSL connection failures or high disk usage alerts.
description: >
  Investigate SSL/TLS connection failures

# Instructions for the agent
prompt: >
  - Extract the domain/hostname from the PagerDuty alert about connection failures.
  - Use shell command `shell_check_cert_expiration_date` to check the certificate expiration dates
  - Parse the certificate dates to determine if the cert is expired or expiring soon
  - If certificate is expired or expiring within 24 hours:
    - Post high-priority status update to PagerDuty explaining the root cause
    - Include the exact expiration date and affected resources

tools:
  - shell_check_cert_expiration_date
  - pagerduty_post_status_update
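
The "expired or expiring within 24 hours" check in the prompt above can be illustrated with a short Python sketch. This is not how Unpage or `shell_check_cert_expiration_date` is implemented; it assumes the expiration date arrives as an OpenSSL-style `notAfter` string (for example `Jun  1 12:00:00 2025 GMT`), and the function name `classify_cert` and its return labels are hypothetical.

```python
import ssl
from datetime import datetime, timedelta, timezone


def classify_cert(not_after: str, now: datetime,
                  window: timedelta = timedelta(hours=24)) -> str:
    """Classify a certificate by its notAfter date (hypothetical helper).

    not_after is assumed to use the OpenSSL text format,
    e.g. "Jun  1 12:00:00 2025 GMT". now must be timezone-aware.
    """
    # ssl.cert_time_to_seconds parses the certificate time format
    # and returns seconds since the epoch (UTC).
    expires = datetime.fromtimestamp(
        ssl.cert_time_to_seconds(not_after), tz=timezone.utc
    )
    if expires <= now:
        return "expired"
    if expires - now <= window:
        return "expiring_soon"
    return "ok"


if __name__ == "__main__":
    current = datetime.now(timezone.utc)
    print(classify_cert("Jun  1 12:00:00 2025 GMT", current))
```

Passing `now` explicitly keeps the check deterministic, which makes it easy to test the 24-hour boundary without depending on the wall clock.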
Unpage agents use context from your infrastructure graph to triage alerts, eliminate false positives, and surface real risks.
# cpu-alert-agent.yaml# Used by the router to determine which agent to use for an alertdescription: >Use this agent to analyze alerts that meet the following criteria:-The alert is related to CPU usage exceeding thresholds-The alert comes from AWS CloudWatch or Datadog-The affected resource is a compute instance (EC2, container, etc.)# Instructions for the agentprompt: >You are an agent specialized in analyzing high CPU usage alerts.When investigating a CPU alert, follow these steps:1. Check the current CPU metrics to verify the alert is still active2. Look at CPU metrics for the past hour to see if this is a spike or sustained usage3. Check logs from around the time the alert started for any errors or unusual activity4. Look for any recent deployments or changes that might explain the high usage5. Check if similar resources are experiencing the same issueBased on your findings, update the incident with:- Current status of the issue- Likely cause based on available evidence- Recommended next steps- Whether this appears to be a critical issue requiring immediate human attentionBe concise but thorough. Include specific metrics, timestamps, and log entriesthat support your analysis.NEVER make up information or assume values you haven't verified.# Tools the agent can usetools:- core_current_datetime- core_convert_to_timezone- metrics_get_metrics_for_node- metrics_list_available_metrics_for_node- graph_get_resource_details- graph_get_neighboring_resources- graph_get_resource_topology- papertrail_search_logs- pagerduty_post_status_update- pagerduty_get_incident_details- aws_describe_ec2_instanceinvestigate incidents
Create and run agents that investigate common issues that SRE teams regularly respond to, like SSL connection failures or high disk usage alerts.
description: >Investigate SSL/TLS connection failures# Instructions for the agentprompt: >- Extract the domain/hostname from the PagerDuty alert about connection failures.- Use shell command \`shell_check_cert_expiration_date\` to check the certificate expiration dates- Parse the certificate dates to determine if the cert is expired or expiring soon- If certificate is expired or expiring within 24 hours:-Post high-priority status update to PagerDuty explaining the root cause-Include the exact expiration date and affected resourcestools:- shell_check_cert_expiration_date- pagerduty_post_status_updateReduce alert noise
Unpage agents use context from your infrastructure graph to triage alerts, eliminate false positives, and surface real risks.
# cpu-alert-agent.yaml# Used by the router to determine which agent to use for an alertdescription: >Use this agent to analyze alerts that meet the following criteria:-The alert is related to CPU usage exceeding thresholds-The alert comes from AWS CloudWatch or Datadog-The affected resource is a compute instance (EC2, container, etc.)# Instructions for the agentprompt: >You are an agent specialized in analyzing high CPU usage alerts.When investigating a CPU alert, follow these steps:1. Check the current CPU metrics to verify the alert is still active2. Look at CPU metrics for the past hour to see if this is a spike or sustained usage3. Check logs from around the time the alert started for any errors or unusual activity4. Look for any recent deployments or changes that might explain the high usage5. Check if similar resources are experiencing the same issueBased on your findings, update the incident with:- Current status of the issue- Likely cause based on available evidence- Recommended next steps- Whether this appears to be a critical issue requiring immediate human attentionBe concise but thorough. Include specific metrics, timestamps, and log entriesthat support your analysis.NEVER make up information or assume values you haven't verified.# Tools the agent can usetools:- core_current_datetime- core_convert_to_timezone- metrics_get_metrics_for_node- metrics_list_available_metrics_for_node- graph_get_resource_details- graph_get_neighboring_resources- graph_get_resource_topology- papertrail_search_logs- pagerduty_post_status_update- pagerduty_get_incident_details- aws_describe_ec2_instanceinvestigate incidents
Create and run agents that investigate common issues that SRE teams regularly respond to, like SSL connection failures or high disk usage alerts.
description: >Investigate SSL/TLS connection failures# Instructions for the agentprompt: >- Extract the domain/hostname from the PagerDuty alert about connection failures.- Use shell command \`shell_check_cert_expiration_date\` to check the certificate expiration dates- Parse the certificate dates to determine if the cert is expired or expiring soon- If certificate is expired or expiring within 24 hours:-Post high-priority status update to PagerDuty explaining the root cause-Include the exact expiration date and affected resourcestools:- shell_check_cert_expiration_date- pagerduty_post_status_updateReduce alert noise
Unpage agents use context from your infrastructure graph to triage alerts, eliminate false positives, and surface real risks.
# cpu-alert-agent.yaml# Used by the router to determine which agent to use for an alertdescription: >Use this agent to analyze alerts that meet the following criteria:-The alert is related to CPU usage exceeding thresholds-The alert comes from AWS CloudWatch or Datadog-The affected resource is a compute instance (EC2, container, etc.)# Instructions for the agentprompt: >You are an agent specialized in analyzing high CPU usage alerts.When investigating a CPU alert, follow these steps:1. Check the current CPU metrics to verify the alert is still active2. Look at CPU metrics for the past hour to see if this is a spike or sustained usage3. Check logs from around the time the alert started for any errors or unusual activity4. Look for any recent deployments or changes that might explain the high usage5. Check if similar resources are experiencing the same issueBased on your findings, update the incident with:- Current status of the issue- Likely cause based on available evidence- Recommended next steps- Whether this appears to be a critical issue requiring immediate human attentionBe concise but thorough. Include specific metrics, timestamps, and log entriesthat support your analysis.NEVER make up information or assume values you haven't verified.# Tools the agent can usetools:- core_current_datetime- core_convert_to_timezone- metrics_get_metrics_for_node- metrics_list_available_metrics_for_node- graph_get_resource_details- graph_get_neighboring_resources- graph_get_resource_topology- papertrail_search_logs- pagerduty_post_status_update- pagerduty_get_incident_details- aws_describe_ec2_instanceinvestigate incidents
Create and run agents that investigate common issues that SRE teams regularly respond to, like SSL connection failures or high disk usage alerts.
description: >Investigate SSL/TLS connection failures# Instructions for the agentprompt: >- Extract the domain/hostname from the PagerDuty alert about connection failures.- Use shell command \`shell_check_cert_expiration_date\` to check the certificate expiration dates- Parse the certificate dates to determine if the cert is expired or expiring soon- If certificate is expired or expiring within 24 hours:-Post high-priority status update to PagerDuty explaining the root cause-Include the exact expiration date and affected resourcestools:- shell_check_cert_expiration_date- pagerduty_post_status_update
Reduce alert noise
Unpage agents use context from your infrastructure graph to triage alerts, eliminate false positives, and surface real risks.
# cpu-alert-agent.yaml# Used by the router to determine which agent to use for an alertdescription: >Use this agent to analyze alerts that meet the following criteria:-The alert is related to CPU usage exceeding thresholds-The alert comes from AWS CloudWatch or Datadog-The affected resource is a compute instance (EC2, container, etc.)# Instructions for the agentprompt: >You are an agent specialized in analyzing high CPU usage alerts.When investigating a CPU alert, follow these steps:1. Check the current CPU metrics to verify the alert is still active2. Look at CPU metrics for the past hour to see if this is a spike or sustained usage3. Check logs from around the time the alert started for any errors or unusual activity4. Look for any recent deployments or changes that might explain the high usage5. Check if similar resources are experiencing the same issueBased on your findings, update the incident with:- Current status of the issue- Likely cause based on available evidence- Recommended next steps- Whether this appears to be a critical issue requiring immediate human attentionBe concise but thorough. Include specific metrics, timestamps, and log entriesthat support your analysis.NEVER make up information or assume values you haven't verified.# Tools the agent can usetools:- core_current_datetime- core_convert_to_timezone- metrics_get_metrics_for_node- metrics_list_available_metrics_for_node- graph_get_resource_details- graph_get_neighboring_resources- graph_get_resource_topology- papertrail_search_logs- pagerduty_post_status_update- pagerduty_get_incident_details- aws_describe_ec2_instanceinvestigate incidents
Create and run agents that investigate common issues that SRE teams regularly respond to, like SSL connection failures or high disk usage alerts.
description: >Investigate SSL/TLS connection failures# Instructions for the agentprompt: >- Extract the domain/hostname from the PagerDuty alert about connection failures.- Use shell command \`shell_check_cert_expiration_date\` to check the certificate expiration dates- Parse the certificate dates to determine if the cert is expired or expiring soon- If certificate is expired or expiring within 24 hours:-Post high-priority status update to PagerDuty explaining the root cause-Include the exact expiration date and affected resourcestools:- shell_check_cert_expiration_date- pagerduty_post_status_updateReduce alert noise
Unpage agents use context from your infrastructure graph to triage alerts, eliminate false positives, and surface real risks.
# cpu-alert-agent.yaml# Used by the router to determine which agent to use for an alertdescription: >Use this agent to analyze alerts that meet the following criteria:-The alert is related to CPU usage exceeding thresholds-The alert comes from AWS CloudWatch or Datadog-The affected resource is a compute instance (EC2, container, etc.)# Instructions for the agentprompt: >You are an agent specialized in analyzing high CPU usage alerts.When investigating a CPU alert, follow these steps:1. Check the current CPU metrics to verify the alert is still active2. Look at CPU metrics for the past hour to see if this is a spike or sustained usage3. Check logs from around the time the alert started for any errors or unusual activity4. Look for any recent deployments or changes that might explain the high usage5. Check if similar resources are experiencing the same issueBased on your findings, update the incident with:- Current status of the issue- Likely cause based on available evidence- Recommended next steps- Whether this appears to be a critical issue requiring immediate human attentionBe concise but thorough. Include specific metrics, timestamps, and log entriesthat support your analysis.NEVER make up information or assume values you haven't verified.# Tools the agent can usetools:- core_current_datetime- core_convert_to_timezone- metrics_get_metrics_for_node- metrics_list_available_metrics_for_node- graph_get_resource_details- graph_get_neighboring_resources- graph_get_resource_topology- papertrail_search_logs- pagerduty_post_status_update- pagerduty_get_incident_details- aws_describe_ec2_instanceinvestigate incidents
Create and run agents that investigate common issues that SRE teams regularly respond to, like SSL connection failures or high disk usage alerts.
description: >Investigate SSL/TLS connection failures# Instructions for the agentprompt: >- Extract the domain/hostname from the PagerDuty alert about connection failures.- Use shell command \`shell_check_cert_expiration_date\` to check the certificate expiration dates- Parse the certificate dates to determine if the cert is expired or expiring soon- If certificate is expired or expiring within 24 hours:-Post high-priority status update to PagerDuty explaining the root cause-Include the exact expiration date and affected resourcestools:- shell_check_cert_expiration_date- pagerduty_post_status_updateReduce alert noise
Unpage agents use context from your infrastructure graph to triage alerts, eliminate false positives, and surface real risks.
# cpu-alert-agent.yaml# Used by the router to determine which agent to use for an alertdescription: >Use this agent to analyze alerts that meet the following criteria:-The alert is related to CPU usage exceeding thresholds-The alert comes from AWS CloudWatch or Datadog-The affected resource is a compute instance (EC2, container, etc.)# Instructions for the agentprompt: >You are an agent specialized in analyzing high CPU usage alerts.When investigating a CPU alert, follow these steps:1. Check the current CPU metrics to verify the alert is still active2. Look at CPU metrics for the past hour to see if this is a spike or sustained usage3. Check logs from around the time the alert started for any errors or unusual activity4. Look for any recent deployments or changes that might explain the high usage5. Check if similar resources are experiencing the same issueBased on your findings, update the incident with:- Current status of the issue- Likely cause based on available evidence- Recommended next steps- Whether this appears to be a critical issue requiring immediate human attentionBe concise but thorough. Include specific metrics, timestamps, and log entriesthat support your analysis.NEVER make up information or assume values you haven't verified.# Tools the agent can usetools:- core_current_datetime- core_convert_to_timezone- metrics_get_metrics_for_node- metrics_list_available_metrics_for_node- graph_get_resource_details- graph_get_neighboring_resources- graph_get_resource_topology- papertrail_search_logs- pagerduty_post_status_update- pagerduty_get_incident_details- aws_describe_ec2_instanceinvestigate incidents
Create and run agents that investigate common issues that SRE teams regularly respond to, like SSL connection failures or high disk usage alerts.
description: >Investigate SSL/TLS connection failures# Instructions for the agentprompt: >- Extract the domain/hostname from the PagerDuty alert about connection failures.- Use shell command \`shell_check_cert_expiration_date\` to check the certificate expiration dates- Parse the certificate dates to determine if the cert is expired or expiring soon- If certificate is expired or expiring within 24 hours:-Post high-priority status update to PagerDuty explaining the root cause-Include the exact expiration date and affected resourcestools:- shell_check_cert_expiration_date- pagerduty_post_status_updateReduce alert noise
Unpage agents use context from your infrastructure graph to triage alerts, eliminate false positives, and surface real risks.
# cpu-alert-agent.yaml# Used by the router to determine which agent to use for an alertdescription: >Use this agent to analyze alerts that meet the following criteria:-The alert is related to CPU usage exceeding thresholds-The alert comes from AWS CloudWatch or Datadog-The affected resource is a compute instance (EC2, container, etc.)# Instructions for the agentprompt: >You are an agent specialized in analyzing high CPU usage alerts.When investigating a CPU alert, follow these steps:1. Check the current CPU metrics to verify the alert is still active2. Look at CPU metrics for the past hour to see if this is a spike or sustained usage3. Check logs from around the time the alert started for any errors or unusual activity4. Look for any recent deployments or changes that might explain the high usage5. Check if similar resources are experiencing the same issueBased on your findings, update the incident with:- Current status of the issue- Likely cause based on available evidence- Recommended next steps- Whether this appears to be a critical issue requiring immediate human attentionBe concise but thorough. Include specific metrics, timestamps, and log entriesthat support your analysis.NEVER make up information or assume values you haven't verified.# Tools the agent can usetools:- core_current_datetime- core_convert_to_timezone- metrics_get_metrics_for_node- metrics_list_available_metrics_for_node- graph_get_resource_details- graph_get_neighboring_resources- graph_get_resource_topology- papertrail_search_logs- pagerduty_post_status_update- pagerduty_get_incident_details- aws_describe_ec2_instanceinvestigate incidents
Create and run agents that investigate common issues that SRE teams regularly respond to, like SSL connection failures or high disk usage alerts.
description: >Investigate SSL/TLS connection failures# Instructions for the agentprompt: >- Extract the domain/hostname from the PagerDuty alert about connection failures.- Use shell command \`shell_check_cert_expiration_date\` to check the certificate expiration dates- Parse the certificate dates to determine if the cert is expired or expiring soon- If certificate is expired or expiring within 24 hours:-Post high-priority status update to PagerDuty explaining the root cause-Include the exact expiration date and affected resourcestools:- shell_check_cert_expiration_date- pagerduty_post_status_update
Reduce alert noise
Unpage agents use context from your infrastructure graph to triage alerts, eliminate false positives, and surface real risks.
# cpu-alert-agent.yaml# Used by the router to determine which agent to use for an alertdescription: >Use this agent to analyze alerts that meet the following criteria:-The alert is related to CPU usage exceeding thresholds-The alert comes from AWS CloudWatch or Datadog-The affected resource is a compute instance (EC2, container, etc.)# Instructions for the agentprompt: >You are an agent specialized in analyzing high CPU usage alerts.When investigating a CPU alert, follow these steps:1. Check the current CPU metrics to verify the alert is still active2. Look at CPU metrics for the past hour to see if this is a spike or sustained usage3. Check logs from around the time the alert started for any errors or unusual activity4. Look for any recent deployments or changes that might explain the high usage5. Check if similar resources are experiencing the same issueBased on your findings, update the incident with:- Current status of the issue- Likely cause based on available evidence- Recommended next steps- Whether this appears to be a critical issue requiring immediate human attentionBe concise but thorough. Include specific metrics, timestamps, and log entriesthat support your analysis.NEVER make up information or assume values you haven't verified.# Tools the agent can usetools:- core_current_datetime- core_convert_to_timezone- metrics_get_metrics_for_node- metrics_list_available_metrics_for_node- graph_get_resource_details- graph_get_neighboring_resources- graph_get_resource_topology- papertrail_search_logs- pagerduty_post_status_update- pagerduty_get_incident_details- aws_describe_ec2_instanceinvestigate incidents
Create and run agents that investigate common issues that SRE teams regularly respond to, like SSL connection failures or high disk usage alerts.
description: >Investigate SSL/TLS connection failures# Instructions for the agentprompt: >- Extract the domain/hostname from the PagerDuty alert about connection failures.- Use shell command \`shell_check_cert_expiration_date\` to check the certificate expiration dates- Parse the certificate dates to determine if the cert is expired or expiring soon- If certificate is expired or expiring within 24 hours:-Post high-priority status update to PagerDuty explaining the root cause-Include the exact expiration date and affected resourcestools:- shell_check_cert_expiration_date- pagerduty_post_status_updateReduce alert noise
Unpage agents use context from your infrastructure graph to triage alerts, eliminate false positives, and surface real risks.
# cpu-alert-agent.yaml# Used by the router to determine which agent to use for an alertdescription: >Use this agent to analyze alerts that meet the following criteria:-The alert is related to CPU usage exceeding thresholds-The alert comes from AWS CloudWatch or Datadog-The affected resource is a compute instance (EC2, container, etc.)# Instructions for the agentprompt: >You are an agent specialized in analyzing high CPU usage alerts.When investigating a CPU alert, follow these steps:1. Check the current CPU metrics to verify the alert is still active2. Look at CPU metrics for the past hour to see if this is a spike or sustained usage3. Check logs from around the time the alert started for any errors or unusual activity4. Look for any recent deployments or changes that might explain the high usage5. Check if similar resources are experiencing the same issueBased on your findings, update the incident with:- Current status of the issue- Likely cause based on available evidence- Recommended next steps- Whether this appears to be a critical issue requiring immediate human attentionBe concise but thorough. Include specific metrics, timestamps, and log entriesthat support your analysis.NEVER make up information or assume values you haven't verified.# Tools the agent can usetools:- core_current_datetime- core_convert_to_timezone- metrics_get_metrics_for_node- metrics_list_available_metrics_for_node- graph_get_resource_details- graph_get_neighboring_resources- graph_get_resource_topology- papertrail_search_logs- pagerduty_post_status_update- pagerduty_get_incident_details- aws_describe_ec2_instanceinvestigate incidents
Create and run agents that investigate common issues that SRE teams regularly respond to, like SSL connection failures or high disk usage alerts.
description: >Investigate SSL/TLS connection failures# Instructions for the agentprompt: >- Extract the domain/hostname from the PagerDuty alert about connection failures.- Use shell command \`shell_check_cert_expiration_date\` to check the certificate expiration dates- Parse the certificate dates to determine if the cert is expired or expiring soon- If certificate is expired or expiring within 24 hours:-Post high-priority status update to PagerDuty explaining the root cause-Include the exact expiration date and affected resourcestools:- shell_check_cert_expiration_date- pagerduty_post_status_updateReduce alert noise
Unpage agents use context from your infrastructure graph to triage alerts, eliminate false positives, and surface real risks.
# cpu-alert-agent.yaml# Used by the router to determine which agent to use for an alertdescription: >Use this agent to analyze alerts that meet the following criteria:-The alert is related to CPU usage exceeding thresholds-The alert comes from AWS CloudWatch or Datadog-The affected resource is a compute instance (EC2, container, etc.)# Instructions for the agentprompt: >You are an agent specialized in analyzing high CPU usage alerts.When investigating a CPU alert, follow these steps:1. Check the current CPU metrics to verify the alert is still active2. Look at CPU metrics for the past hour to see if this is a spike or sustained usage3. Check logs from around the time the alert started for any errors or unusual activity4. Look for any recent deployments or changes that might explain the high usage5. Check if similar resources are experiencing the same issueBased on your findings, update the incident with:- Current status of the issue- Likely cause based on available evidence- Recommended next steps- Whether this appears to be a critical issue requiring immediate human attentionBe concise but thorough. Include specific metrics, timestamps, and log entriesthat support your analysis.NEVER make up information or assume values you haven't verified.# Tools the agent can usetools:- core_current_datetime- core_convert_to_timezone- metrics_get_metrics_for_node- metrics_list_available_metrics_for_node- graph_get_resource_details- graph_get_neighboring_resources- graph_get_resource_topology- papertrail_search_logs- pagerduty_post_status_update- pagerduty_get_incident_details- aws_describe_ec2_instanceinvestigate incidents
Create and run agents that investigate common issues that SRE teams regularly respond to, like SSL connection failures or high disk usage alerts.
description: >Investigate SSL/TLS connection failures# Instructions for the agentprompt: >- Extract the domain/hostname from the PagerDuty alert about connection failures.- Use shell command \`shell_check_cert_expiration_date\` to check the certificate expiration dates- Parse the certificate dates to determine if the cert is expired or expiring soon- If certificate is expired or expiring within 24 hours:-Post high-priority status update to PagerDuty explaining the root cause-Include the exact expiration date and affected resourcestools:- shell_check_cert_expiration_date- pagerduty_post_status_updateReduce alert noise
Unpage agents use context from your infrastructure graph to triage alerts, eliminate false positives, and surface real risks.
# cpu-alert-agent.yaml# Used by the router to determine which agent to use for an alertdescription: >Use this agent to analyze alerts that meet the following criteria:-The alert is related to CPU usage exceeding thresholds-The alert comes from AWS CloudWatch or Datadog-The affected resource is a compute instance (EC2, container, etc.)# Instructions for the agentprompt: >You are an agent specialized in analyzing high CPU usage alerts.When investigating a CPU alert, follow these steps:1. Check the current CPU metrics to verify the alert is still active2. Look at CPU metrics for the past hour to see if this is a spike or sustained usage3. Check logs from around the time the alert started for any errors or unusual activity4. Look for any recent deployments or changes that might explain the high usage5. Check if similar resources are experiencing the same issueBased on your findings, update the incident with:- Current status of the issue- Likely cause based on available evidence- Recommended next steps- Whether this appears to be a critical issue requiring immediate human attentionBe concise but thorough. Include specific metrics, timestamps, and log entriesthat support your analysis.NEVER make up information or assume values you haven't verified.# Tools the agent can usetools:- core_current_datetime- core_convert_to_timezone- metrics_get_metrics_for_node- metrics_list_available_metrics_for_node- graph_get_resource_details- graph_get_neighboring_resources- graph_get_resource_topology- papertrail_search_logs- pagerduty_post_status_update- pagerduty_get_incident_details- aws_describe_ec2_instanceinvestigate incidents
Create and run agents that investigate common issues that SRE teams regularly respond to, like SSL connection failures or high disk usage alerts.
description: >Investigate SSL/TLS connection failures# Instructions for the agentprompt: >- Extract the domain/hostname from the PagerDuty alert about connection failures.- Use shell command \`shell_check_cert_expiration_date\` to check the certificate expiration dates- Parse the certificate dates to determine if the cert is expired or expiring soon- If certificate is expired or expiring within 24 hours:-Post high-priority status update to PagerDuty explaining the root cause-Include the exact expiration date and affected resourcestools:- shell_check_cert_expiration_date- pagerduty_post_status_update
Reduce alert noise
Unpage agents use context from your infrastructure graph to triage alerts, eliminate false positives, and surface real risks.
# cpu-alert-agent.yaml# Used by the router to determine which agent to use for an alertdescription: >Use this agent to analyze alerts that meet the following criteria:-The alert is related to CPU usage exceeding thresholds-The alert comes from AWS CloudWatch or Datadog-The affected resource is a compute instance (EC2, container, etc.)# Instructions for the agentprompt: >You are an agent specialized in analyzing high CPU usage alerts.When investigating a CPU alert, follow these steps:1. Check the current CPU metrics to verify the alert is still active2. Look at CPU metrics for the past hour to see if this is a spike or sustained usage3. Check logs from around the time the alert started for any errors or unusual activity4. Look for any recent deployments or changes that might explain the high usage5. Check if similar resources are experiencing the same issueBased on your findings, update the incident with:- Current status of the issue- Likely cause based on available evidence- Recommended next steps- Whether this appears to be a critical issue requiring immediate human attentionBe concise but thorough. Include specific metrics, timestamps, and log entriesthat support your analysis.NEVER make up information or assume values you haven't verified.# Tools the agent can usetools:- core_current_datetime- core_convert_to_timezone- metrics_get_metrics_for_node- metrics_list_available_metrics_for_node- graph_get_resource_details- graph_get_neighboring_resources- graph_get_resource_topology- papertrail_search_logs- pagerduty_post_status_update- pagerduty_get_incident_details- aws_describe_ec2_instanceinvestigate incidents
Install Unpage and run your first agent in < 5 minutes.
Get Started
Join the Community
Connect directly with Unpage engineers and other Unpage users to share what you've built, ask questions, request new features, and provide feedback for further improvement.