View detailed service activity and operational health with the service detail page - Amazon CloudWatch

View detailed service activity and operational health with the service detail page

Application Signals is in preview release for Amazon CloudWatch and is subject to change.

The service detail page displays an overview of the operations, dependencies, canaries, and client requests for a single service that has been enabled for Application Signals. To view this page, open the CloudWatch console, choose Services under the Application Signals section in the left navigation pane, and then choose the name of any service from the Services table or from the Top services or dependency tables.

The service detail page is organized into the following tabs:

  • Overview — This tab displays an overview of your service, including number of operations, dependencies, synthetics, and client pages. The tab shows key metrics for your entire service, top operations and dependencies.

  • Service operations — This tab displays a list of the operations that your service exposes, along with key metrics for each operation.

  • Dependencies — This tab displays a list of dependencies that your service calls, and a list of dependency metrics.

  • Synthetics canaries — This tab displays a list of synthetics canaries that call your service, and key metrics for canary execution.

  • Client pages — This tab displays a list of client pages that call your service, along with client page metrics.

View your service overview

The service overview page summarizes the components that make up your service, and highlights key performance metrics to help you identify issues that require troubleshooting.

For services hosted in Amazon EKS, choose any link in Service Details to view Cluster, Namespace, or Workload information in CloudWatch Container Insights. For services hosted in Amazon ECS or Amazon EC2, the Service details page shows the Environment value.

Under Services, the Overview tab displays a summary of the following:

  • Service operations. These service operations are listed by health as determined by service level indicators (SLI) that are defined as a part of a service level objective (SLO).

  • Service dependencies. Top dependencies are listed by fault rate.

  • Synthetics canaries associated with your service, including the number of failing canaries.

  • Enabled client pages listed by top pages with asynchronous JavaScript and XML (AJAX) errors.

The following illustration shows an overview of your services:

Service overview widgets

The Overview tab also displays a graph of four dependencies with the highest latency across all services. Use the p99, p90 and p50 latency metrics to quickly assess which dependencies are contributing to your total service latency, as follows:

Service operations latency graph

For example, the previous graph shows that 99% of the requests made to the customer-service dependency were completed within approximately 4,950 milliseconds. The other dependencies took less time.
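To make the percentile reading concrete, the following Python sketch computes p50, p90, and p99 from a list of latency samples using the nearest-rank method. The sample values and the function name are illustrative assumptions, and CloudWatch may compute percentiles with a different method.

```python
import math


def percentile(samples, p):
    """Return the nearest-rank p-th percentile (0 < p <= 100) of samples."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    # Nearest rank: the smallest value with at least p% of samples at or below it.
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


# Hypothetical latencies, in milliseconds, for calls to one dependency.
latencies_ms = [120, 135, 150, 180, 210, 260, 400, 900, 2300, 4950]

for p in (50, 90, 99):
    print(f"p{p}: {percentile(latencies_ms, p)} ms")
```

In this sample, a p99 far above the p50 suggests that a small share of slow requests, rather than typical requests, dominates the tail latency.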

Graphs displaying the top four service operations by latency show the volume of requests, availability, fault rate, and error rate for those operations, as shown in the following image:

Service operations volume, availability, fault rate, and error rate graphs

View your service operations

Choose the Service operations tab to display the Service operations table, and a set of metrics for the selected operation. The table contains a list of operations discovered by Application Signals. This list includes service level indicator (SLI) status, number of dependencies, and metrics for latency, volume, faults, errors, and availability, as shown in the following image:

Service operations table

The metrics in service operations are evaluated over a time interval that depends on how long ago the requests that you are investigating were made. Requests that were made less than 15 days prior are evaluated over 1-minute intervals. Requests that were made between 15 and 30 days prior, inclusive, are evaluated over 5-minute intervals. Requests that were made more than 30 days prior are evaluated over 1-hour intervals. For example, if you are investigating requests that caused a fault 15 days ago, the call volume metric is equal to the number of requests per 5-minute interval.
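The age-to-interval rule above can be sketched as a small Python function. The function name is hypothetical; the thresholds come directly from the preceding paragraph.

```python
def evaluation_interval_seconds(age_days: float) -> int:
    """Return the evaluation interval, in seconds, for requests made age_days ago."""
    if age_days < 0:
        raise ValueError("age_days must not be negative")
    if age_days < 15:
        return 60        # less than 15 days: 1-minute intervals
    if age_days <= 30:
        return 300       # 15 to 30 days, inclusive: 5-minute intervals
    return 3600          # more than 30 days: 1-hour intervals


for age in (1, 15, 30, 45):
    print(age, "days ago ->", evaluation_interval_seconds(age), "seconds")
```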

Filter the table to make it easier to find what you're looking for, by choosing one or more properties from the filter text box. As you choose each property, you are guided through filter criteria and will see the complete filter below the filter text box. Choose Clear filters at any time to remove the table filter.

SLI status is displayed for each operation in the table, including the number of healthy or unhealthy SLIs and the total number of service level objectives (SLOs) you have created. SLIs can monitor latency, availability, and other operational metrics to ensure service quality. Choose the SLI status for an operation to display a popup containing a link to any unhealthy SLIs, and a link to see all SLOs for the operation, as shown in the following table:

Service operation SLI status

If no SLOs have been created for an operation, choose the Create SLO button within the SLI Status column. To create additional SLOs for any operation, select the radio button next to the operation name, and then choose Create SLO from the Actions dropdown at the top right of the table. When you create SLOs, you can see at a glance which of your services and operations are performing well and which are unhealthy. For more information, see service level objectives (SLOs).

The Dependencies column shows the number of dependencies this operation calls. Choose this number to open the Dependencies tab filtered to the selected operation.

View service operations metrics, correlated traces, and application logs

Application Signals correlates service operation metrics with Amazon X-Ray traces, making it easier to troubleshoot operational health issues. Choose the option next to an operation in the Service operations table to see a set of graphs above the table with metrics for Volume and Availability, Latency, and Faults and Errors. Hover over a point in a graph to view more information.

Select a point to open a diagnostic pane that shows correlated traces, metrics, and application logs for the selected point in the graph.

The following image shows the tooltip that appears after hovering over a point in the graph, and the diagnostic pane which appears after clicking on a point. The tooltip contains information about the associated data point in the Faults and Errors graph. The pane contains Correlated traces, Top contributors, and Application logs associated with the selected point.

Correlated traces for faults and errors

Correlated traces

Choose a Trace ID from the Correlated traces table to open the X-Ray trace details page for the chosen trace. The trace details page contains a map of service nodes that are associated with the selected trace and a timeline of trace segments.

Top contributors

The Top contributors tab gives metrics for Call volume, Availability, Avg latency, Errors, and Faults, broken down by infrastructure components, as shown in the following example image for an application deployed on an EKS platform:

Service operation top contributors

In top contributors, you can group by pod, node or pod template hash. The following definitions apply:

  • Call volume - The number of requests per time interval.

  • Availability - The percentage of time over the time interval that no faults were detected.

  • Avg latency - The average duration of requests over a time interval.

  • Errors - The number of errors for the selected group, measured over a time interval.

  • Faults - The number of faults per time interval.
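As a rough illustration of how these per-group metrics relate to individual requests, the following Python sketch aggregates a handful of hypothetical request records by pod or by node. The record shape, the names, and the sample data are assumptions for illustration only. The sketch also assumes the X-Ray convention that 4xx responses count as errors and 5xx responses count as faults. Availability is omitted because it is defined over time rather than per request.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Request:
    pod: str
    node: str
    latency_ms: float
    status: int  # HTTP status code of the response


def top_contributors(requests, group_by):
    """Aggregate metrics per group for the requests in one time interval."""
    groups = defaultdict(list)
    for request in requests:
        groups[getattr(request, group_by)].append(request)

    rows = {}
    for key, items in groups.items():
        rows[key] = {
            "call_volume": len(items),
            "avg_latency_ms": sum(i.latency_ms for i in items) / len(items),
            # Assumption: 4xx responses are errors, 5xx responses are faults.
            "errors": sum(400 <= i.status < 500 for i in items),
            "faults": sum(i.status >= 500 for i in items),
        }
    return rows


# Hypothetical requests observed during a single time interval.
requests = [
    Request("pod-a", "node-1", 120.0, 200),
    Request("pod-a", "node-1", 480.0, 500),
    Request("pod-b", "node-2", 95.0, 404),
    Request("pod-b", "node-2", 105.0, 200),
]

by_pod = top_contributors(requests, "pod")
by_node = top_contributors(requests, "node")
print(by_pod)
```

Grouping the same records by a different attribute changes only the key, which is why switching between pod and node views can help narrow down where a fault originates.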

For more information about pods, nodes and pod template hashes, see the following paragraphs.

For applications deployed on Amazon EKS or Kubernetes, the Top contributors tab shows operational health metrics grouped by Node, Pod, and PodTemplateHash. The following definitions apply:

  • A pod is a group of one or more containers that share storage and resources. A pod is the smallest unit that can be deployed on a Kubernetes platform. Group by pods to check if errors are related to pod-specific limitations.

  • A node is a server that runs pods. Group by nodes to check if errors are related to node-specific limitations.

  • A pod template hash is used to find a particular version of a deployment. Group by pod template hash to check if errors are related to a particular deployment.

For more information about pods, see Using pod templates.

For applications deployed on Amazon EC2, the Top contributors tab shows operational health metrics grouped by instance ID and Auto Scaling group. The following definitions apply:

  • An instance ID is a unique identifier for the Amazon EC2 instance that your service runs on. Group by instance ID to check if errors are related to a specific Amazon EC2 instance.

  • An Auto Scaling group is a collection of Amazon EC2 instances that allows you to scale up or down the resources you need to serve your application requests. Group by Auto Scaling group to check if errors are limited in scope to the instances inside the group.

For applications deployed using custom instrumentation, the Top contributors tab shows operational health metrics grouped by host name. The following definition applies:

  • A host name identifies a device such as an endpoint or Amazon EC2 instance that is connected to a network. Group by host name to check if your errors are related to a specific physical or virtual device.

View top contributors in Logs Insights and Container Insights

You can refine the results by viewing your top contributors in Logs Insights and modifying the automatically generated query that produced their metrics. You can also view infrastructure performance metrics for specific groups, such as pods or nodes, in Container Insights. There, you can sort clusters, nodes, or workloads by resource consumption to quickly identify anomalies and proactively mitigate risks before the end user experience is impacted. The following image shows how to select these options:

Top contributors table

In Container Insights, you can view metrics for your Amazon EKS or Amazon ECS container that are specific to the grouping of your top contributors. For example, if you grouped by pod for an EKS container to generate top contributors, Container Insights shows metrics and statistics filtered for your pod.

In Logs Insights, you can modify the query that generated the metrics under Top contributors using the following steps:

  1. Select View in Log Insights. The Logs Insights page that opens contains a query that is automatically generated and contains the following information:

    • The log cluster group name.

    • The operation that you were investigating with CloudWatch.

    • The aggregate of the operational health metric that you selected on the graph.

    The log results are automatically filtered to show data from the last five minutes before you selected the data point on the service graph.

  2. To edit the query, replace the generated text with your changes. You can also use the Query generator to help you generate a new query, or update the existing query.
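For readers who prefer to script a similar lookup, the following Python sketch assembles the time window and a simple Logs Insights query for the five minutes before a selected data point. The log group name, the operation name, and the query text are hypothetical; the query that the console generates will differ. The dictionary keys are chosen to match the parameters of the boto3 CloudWatch Logs `start_query` call, but the sketch only builds the parameters and does not call any service.

```python
from datetime import datetime, timedelta, timezone


def build_query_params(log_group, operation, selected_point, window_minutes=5):
    """Build query parameters covering the window before the selected point."""
    end = selected_point
    start = end - timedelta(minutes=window_minutes)
    # Illustrative query only: match log lines that mention the operation.
    query = "\n".join([
        "fields @timestamp, @message",
        f'| filter @message like "{operation}"',
        "| sort @timestamp desc",
        "| limit 50",
    ])
    return {
        "logGroupName": log_group,
        "startTime": int(start.timestamp()),  # epoch seconds
        "endTime": int(end.timestamp()),
        "queryString": query,
    }


# Hypothetical inputs for illustration.
params = build_query_params(
    "/hypothetical/application/log-group",
    "GET /owners",
    datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc),
)
print(params["queryString"])
```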

Application logs

The Application logs tab shows an automatically generated query that returns application logs for your current log group and service, with a timestamp inserted. A log group is a group of log streams that you can define when you configure your application. The application logs query returns the logs, recurring text patterns, and graphical visualizations for your log groups. For more information about log groups, see Working with log groups and log streams.

To run the query, select Run query in Logs Insights to either run the autogenerated query or modify the query. To edit the query, replace the autogenerated text with your changes. You can also use the Query generator to help you generate a new query or update the existing query.

The following image shows the sample query that is automatically generated based on the selected point in the service operations graph:

Application logs table

In the preceding image, CloudWatch has automatically detected the log group that is associated with your selected point, and included it in a generated query.

View your service dependencies

Choose the Dependencies tab to display the Dependencies table and a set of metrics for the dependencies of all service operations or a single operation. The table contains a list of dependencies discovered by Application Signals, including metrics for latency, call volume, fault rate, error rate, and availability.

At the top of the page, choose an operation from the drop-down list to view its dependencies, or choose All to see dependencies for all operations.

Filter the table to make it easier to find what you're looking for, by choosing one or more properties from the filter text box. As you choose each property, you are guided through filter criteria and will see the complete filter below the filter text box. Choose Clear filters at any time to remove the table filter. Select Group by Dependency at the top right of the table to group dependencies by service and operation name. When grouping is turned on, expand or collapse a group of dependencies with the + icon next to the dependency name.

Dependencies table

The Dependency column displays the dependency service name, while the Remote Operation column displays the service operation name. When calling Amazon services, the Target column displays the Amazon resource, such as a DynamoDB table or an Amazon SNS topic.
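The effect of grouping can be sketched in Python as collecting table rows under a key made of the dependency service name and the remote operation name. The row fields and sample values are hypothetical and only mirror the column names described above.

```python
from collections import defaultdict


def group_by_dependency(rows):
    """Group dependency rows by (dependency service name, remote operation name)."""
    groups = defaultdict(list)
    for row in rows:
        groups[(row["dependency"], row["remote_operation"])].append(row)
    return dict(groups)


# Hypothetical rows from a Dependencies table.
rows = [
    {"dependency": "DynamoDB", "remote_operation": "GetItem", "target": "orders-table"},
    {"dependency": "DynamoDB", "remote_operation": "GetItem", "target": "customers-table"},
    {"dependency": "customer-service", "remote_operation": "GET /owners", "target": None},
]

groups = group_by_dependency(rows)
for (dependency, operation), members in groups.items():
    print(f"{dependency} / {operation}: {len(members)} row(s)")
```

Here the two DynamoDB rows collapse into one expandable group, while their Target values still distinguish the individual tables.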

To select a dependency, select the option next to a dependency in the Dependencies table. This shows a set of graphs that display detailed metrics for call volume, availability, faults, and errors. Hover over a point in a graph to see a popup containing more information. Select a point in a graph to open a diagnostic pane that shows correlated traces for the selected point in the graph. Choose a trace ID from the Correlated traces table to open the X-Ray Trace details page for the selected trace.

Dependency graphs and correlated traces

View your Synthetics canaries

Choose the Synthetics Canaries tab to display the Synthetics Canaries table, and a set of metrics for each canary in the table. The table includes metrics for success percentage, average duration, runs, and failure rate. Only canaries that are enabled for Amazon X-Ray tracing are displayed.
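As a rough guide to how these table metrics relate to individual canary runs, the following Python sketch derives them from a list of hypothetical run records. The record shape and the exact definitions, for example failure rate as the fraction of failed runs, are assumptions for illustration and may not match how CloudWatch Synthetics computes them.

```python
def canary_summary(runs):
    """Summarize canary runs given as (passed, duration_ms) tuples."""
    if not runs:
        raise ValueError("runs must not be empty")
    total = len(runs)
    passed = sum(1 for ok, _ in runs if ok)
    return {
        "runs": total,
        "success_percent": 100 * passed / total,
        "avg_duration_ms": sum(duration for _, duration in runs) / total,
        # Assumption: failure rate is the fraction of runs that failed.
        "failure_rate": (total - passed) / total,
    }


# Hypothetical runs: (passed, duration in milliseconds).
runs = [(True, 800), (True, 1000), (False, 3000), (True, 1200)]
summary = canary_summary(runs)
print(summary)
```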

Filter the table to make it easier to find what you're looking for by choosing one or more properties from the filter text box. As you choose each property, you are guided through filter criteria, and will see the complete filter below the filter text box. Choose Clear filters at any time to remove the table filter.

Synthetics canaries table

Select the option next to a canary in the table to see a set of graphs that display detailed metrics for success percentage and duration. Hover over a point in a graph to see a popup containing more information. Select a point in a graph to open a diagnostic drawer that shows correlated canary runs for the selected point in the graph. To open the CloudWatch Synthetics Canaries page, choose the Run time for a canary run.

Synthetics canary graphs and runs

View your client pages

Choose the Client pages tab to display the Client pages table and a set of metrics for the selected client page, including page loads, web vitals, and errors. The table contains a list of client pages that call your service.

To display your client pages in the table, configure your CloudWatch RUM web client for X-Ray tracing and turn on Application Signals metrics for your client pages. Choose Manage pages at the top right of the table to manage which pages are enabled for Application Signals metrics.

Filter the Client pages table to make it easier to find what you're looking for, by choosing one or more properties from the filter text box. As you choose each property, you are guided through filter criteria and will see the complete filter below the filter text box. Choose Clear filters at any time to remove the table filter. Select Group by Client to group client pages by client. When grouped, choose the + icon next to a client name to expand the row and see all pages for that client.

Client pages table

To select a client page, select the option next to a client page in the Client pages table. You will see a set of graphs that display detailed metrics. Hover over a point in a graph to see a popup containing more information. Select a point in a graph to open a diagnostic drawer that shows correlated performance navigation events for the selected point in the graph. Choose an event ID from the list of navigation events to open the CloudWatch RUM Page view for the chosen event.

CloudWatch RUM client page requests
Note

To see AJAX errors within your client pages, use the CloudWatch RUM web client version 1.15 or newer.

Currently, up to 100 operations, canaries, and client pages, and up to 250 dependencies, can be shown per service.