How Amazon CloudWatch Internet Monitor works
This section provides information about how Amazon CloudWatch Internet Monitor works. This includes descriptions of how Amazon collects the data that it uses to help detect connectivity issues across the internet, and how performance and availability scores are calculated.
Contents
- The Amazon advantage
Internet Monitor focuses monitoring on just the subset of the internet that's accessed by the users of your Amazon resources, instead of broadly monitoring your website from every Region in the world as other tools do. It's also a cost-effective solution, affordable for large and small companies.
Internet Monitor uses the same powerful probes and issue-detection algorithms that Amazon takes advantage of internally and alerts you to connectivity issues that affect your application by creating health events in Internet Monitor. Internet Monitor then gives you access to the resulting performance and availability map, by overlaying the traffic profile that it creates from your active viewers, based on your application resources.
Using this information, Internet Monitor shows you only relevant events (that is, events from places where you have active viewers) and only the impact that those events have on your overall viewer volume. So, how much of an impact an event has, percentage-wise, is based on your total traffic worldwide.
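As a rough illustration of impact measured against total worldwide traffic, the following Python sketch divides the traffic affected by an event by all of the traffic to the application. The function name, the city-network labels, and the traffic counts are hypothetical, and the actual calculation that Internet Monitor performs is not described here.

```python
def overall_impact_percent(affected_traffic, total_traffic):
    """Return the share of worldwide traffic affected by an event, in percent.

    affected_traffic: dict mapping a city-network label to the traffic volume
        affected by the event (hypothetical units, for example requests).
    total_traffic: traffic volume for the whole application, worldwide.
    """
    if total_traffic <= 0:
        raise ValueError("total_traffic must be positive")
    return 100.0 * sum(affected_traffic.values()) / total_traffic


# Hypothetical event that affects two city-networks.
affected = {"city-1/network-A": 300, "city-2/network-B": 200}
impact = overall_impact_percent(affected, total_traffic=10_000)
print(f"Overall impact: {impact:.1f}%")
```

In this sketch, an event that affects 500 of 10,000 requests has a 5% overall impact, no matter how severe the problem is within the affected city-networks.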
Every five minutes, Internet Monitor publishes internet measurements to CloudWatch Logs for the top 500 city-networks (client locations and ASNs, typically internet service providers or ISPs) that send traffic to each monitor. Optionally, you can choose to publish internet measurements for all monitored city-networks (up to the 500,000 city-networks service limit) to an Amazon S3 bucket. For more information, see Publishing internet measurements to Amazon S3 in Amazon CloudWatch Internet Monitor.
The benefits of Internet Monitor include the following:
Using Internet Monitor doesn't place additional load or cost on your application that's hosted on Amazon.
You don't need to include performance measurement code in your client-side resources, or in your application.
You can get visibility into performance and availability across the internet that your application is connected to, including "last mile" information.
Note that because Internet Monitor creates measurements based on your Amazon resources, Internet Monitor only creates events that are specific to your application traffic. Global internet issues in general are not reported. In addition, when the service location is an Amazon Web Services Region, the measurements and events emitted are designed to represent connectivity at a Regional level and don’t accurately represent connectivity between an end user location and an Availability Zone.
- How Amazon measures connectivity issues
Amazon CloudWatch Internet Monitor uses internet connectivity data between different Amazon Web Services Regions and Amazon CloudFront points of presence (POPs) to different locations through networks or Autonomous System Numbers (ASNs), typically internet service providers (ISPs). This connectivity data is used internally by operators in Amazon, on a daily basis, to proactively detect connectivity issues across the global internet.
For every Amazon Web Services Region, we know which portions of the internet communicate with the Region and do the following:
We actively monitor those portions of the internet, with a rolling 30-day window.
We use both network and higher-level protocol probes, including both inbound and outbound probing.
- How Amazon calculates availability and RTT
Amazon has active and passive probes that measure the latency (performance) at the 90th percentile and reachability (availability) from every Amazon Web Services Region and from the CloudFront service to the entire internet. Abnormal patterns in connectivity between a service and a customer location are monitored, and then reported as alerts to the customer.
Round-trip time (RTT) is how long it takes for a request from the user to return a response to the user. When round-trip time is aggregated across end user locations, the value is weighted by the amount of your traffic that is driven by each end user location.
As an example, with two end user locations, one serving 90% of traffic with a 5 ms RTT, and the other serving 10% of traffic with a 10 ms RTT, the result is an aggregated RTT of 5.5 ms (which comes from 5 ms * 0.9 + 10 ms * 0.1).
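The weighting in this example can be sketched in a few lines of Python. The function name and the input shape (a list of traffic weight and RTT pairs) are assumptions made for illustration; only the arithmetic comes from the example above.

```python
def aggregate_rtt_ms(locations):
    """Return the traffic-weighted average RTT in milliseconds.

    locations: list of (traffic_weight, rtt_ms) pairs, one per end user
        location. Weights can be percentages or raw traffic counts, because
        they are normalized by their sum.
    """
    total_weight = sum(weight for weight, _ in locations)
    if total_weight <= 0:
        raise ValueError("total traffic weight must be positive")
    weighted_sum = sum(weight * rtt_ms for weight, rtt_ms in locations)
    return weighted_sum / total_weight


# Two end user locations: 90% of traffic at 5 ms, 10% of traffic at 10 ms.
aggregated = aggregate_rtt_ms([(90, 5.0), (10, 10.0)])
print(f"Aggregated RTT: {aggregated} ms")
```

Because the result is weighted by traffic, a slow location that serves few users moves the aggregated value only slightly.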
Note that last-mile latency measurement differs by resource type. For Internet Monitor latency measurements, VPCs, Network Load Balancers, and WorkSpaces directories do not include last-mile latency.
- How Internet Monitor calculates performance and availability scores
Amazon has substantial historical data about internet performance and availability between Amazon services and different city-networks (locations and ASNs). By applying statistical analysis to the data, Internet Monitor can detect when the performance and availability for your application have dropped, compared to an estimated baseline that it has calculated. To make it easier to see those drops, that information is reported to you in the form of health scores: a performance score and an availability score.
Health scores are calculated at different granularities. At the finest granularity, we compute the health score for a geographic region, such as a city or a metro area, and an ASN (a city-network). We also roll up the individual health scores to overall health score numbers for an application in a monitor. If you view performance or availability scores without filtering for any specific geography or service provider, Internet Monitor provides overall health scores.
Overall health scores span your whole application for the specified time period. When the overall performance or availability score across your application's city-network pairs reaches or drops below the corresponding health event threshold, Internet Monitor triggers a health event. By default, the threshold is 95% for both overall performance and availability. If the local threshold option is enabled, as it is by default, Internet Monitor also creates health events based on local thresholds, using values that you configure. To learn more about configuring health event thresholds, see Change health event thresholds.
When you explore information in the monitor and log files to investigate issues and learn more, you can filter by specific cities (locations), networks (ASNs or internet service providers), or both. So, you can use filters to see health scores for different cities, ASNs, or city-network pairs, depending on the filters that you choose.
An availability score represents the estimated percentage of traffic that is not seeing an availability drop. Internet Monitor estimates the percentage of traffic experiencing a drop from the total traffic seen and from availability metric measurements. For example, an availability score of 99% for an end user and service location pair is equivalent to 1% of the traffic experiencing an availability drop for that pair.
A performance score represents the percentage of traffic that is not seeing a performance drop. For example, a performance score of 99% for an end user and service location pair is equivalent to 1% of the traffic experiencing a performance drop for that pair.
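The relationship between a score and the traffic that sees a drop can be sketched as follows. This is a simplification under stated assumptions: the function name and the traffic counts are hypothetical, and Internet Monitor estimates the affected traffic statistically rather than counting it directly.

```python
def health_score(total_traffic, traffic_with_drop):
    """Return the percentage of traffic that is NOT seeing a drop.

    total_traffic: traffic volume for one end user and service location pair.
    traffic_with_drop: the portion of that traffic estimated to be seeing a
        performance or availability drop.
    """
    if total_traffic <= 0:
        raise ValueError("total_traffic must be positive")
    if not 0 <= traffic_with_drop <= total_traffic:
        raise ValueError("traffic_with_drop must be between 0 and total_traffic")
    return 100.0 * (total_traffic - traffic_with_drop) / total_traffic


# 10 of 1,000 requests for a pair are estimated to be seeing a drop.
score = health_score(total_traffic=1_000, traffic_with_drop=10)
print(f"Score: {score:.0f}%")
```

The same shape applies to both score types; only the underlying measurement (latency or reachability) differs.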
- Geolocation accuracy in Internet Monitor
For location information, Internet Monitor uses IP-geolocation data supplied by MaxMind. The accuracy of the location information in Internet Monitor measurements depends on the accuracy of MaxMind's data.
- What Internet Monitor includes in calculations for TTFB and RTT (latency)
Time to first byte (TTFB) refers to the time between when a client makes a request and when it receives the first byte of information from the server. Amazon calculations for TTFB measure the time elapsed from Amazon EC2 or Amazon CloudFront to the Internet Monitor measurement node (including the last mile of the node). That is, Internet Monitor measures time from the user to the Amazon EC2 Region for TTFB for EC2, and from the user to CloudFront for TTFB for CloudFront.
For round-trip time (RTT), Internet Monitor includes the time from the city-network (that is, the client location and ASN, typically an internet service provider), as mapped by the public IP address, to the Amazon Web Services Region. This means that Internet Monitor does not have last mile visibility for users who access the internet from behind a gateway or VPN.
Note that last-mile latency measurement differs by resource type. For Internet Monitor latency measurements, VPCs, Network Load Balancers, and WorkSpaces directories do not include last-mile latency.
Internet Monitor includes average TTFB information in the Traffic optimization suggestions section of the Traffic insights tab on the CloudWatch dashboard, to help you evaluate options for different setups for your application that can improve performance.
- When Internet Monitor creates and resolves health events
Internet Monitor creates and closes health events for the application traffic that you monitor based on the current thresholds that are set. Internet Monitor has a default threshold configuration, and you can also set your own configuration for thresholds. Internet Monitor determines the overall impact that connectivity issues are having on your application, and the impact on local areas where your application has clients, and creates health events when the thresholds are crossed.
Internet Monitor calculates the impact of connectivity issues on a client location based on the historical data about internet performance and availability for network traffic that's available to the service through Amazon. It applies the information relevant to your application, based on the geographic locations for ASNs and services where clients use your application: the city-network pairs that are affected. The locations are determined from the resources that you add to your monitor. Then Internet Monitor uses statistical analysis to detect when performance and availability have dropped, affecting the client experience for your application.
The performance and availability scores that Internet Monitor calculates are represented as the percentage of traffic that is not seeing a drop. Impact is the opposite of this: it's a representation of how problematic an issue is for a customer's end users. So if the global availability score drops to 93%, for example, the corresponding impact is 7%.
When the performance or availability score for your application's city-network pairs globally reaches or drops below the corresponding health event threshold for performance or availability, this triggers Internet Monitor to generate a health event. By default, the threshold is 95% for both performance and availability. The values to meet, or drop below, the threshold are cumulative, so it could mean several smaller events combine to meet the threshold percentage, or that a single event meets or falls below the threshold level.
As long as performance or availability scores that triggered the event are at or below the corresponding health event threshold percentage for overall impact, the health event stays active. When the score or combined scores that triggered the event rise above the threshold, Internet Monitor resolves the health event.
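The create and resolve behavior described above can be sketched as a small state machine over a series of overall scores. The function name, the tuple format, and the sample scores are hypothetical; the sketch covers only the overall threshold (95% by default) and not local thresholds.

```python
def track_health_events(scores, threshold=95.0):
    """Return (action, index, impact_percent) transitions for a score series.

    A health event is created when a score reaches or drops below the
    threshold, stays active while scores remain at or below it, and is
    resolved when a score rises above it. Impact is 100 minus the score.
    """
    transitions = []
    active = False
    for index, score in enumerate(scores):
        impact = 100.0 - score
        if not active and score <= threshold:
            transitions.append(("created", index, impact))
            active = True
        elif active and score > threshold:
            transitions.append(("resolved", index, impact))
            active = False
    return transitions


# Hypothetical overall availability scores from consecutive measurements.
for action, index, impact in track_health_events([99.0, 95.0, 93.0, 96.0]):
    print(f"measurement {index}: event {action}, impact {impact:.0f}%")
```

With these sample scores, the event is created at the second measurement, where the score equals the threshold, and is resolved at the fourth, where the score rises back above it.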
Internet Monitor also creates health events based on local thresholds and the percentage of overall traffic that an issue has an impact on. You can configure options for local thresholds, or turn off local thresholds altogether.
To learn more about configuring health event thresholds, see Change health event thresholds.
- Health event report timing
Internet Monitor uses an aggregator to gather all signals about internet issues, to create health events in monitors within minutes.
When possible, Internet Monitor analyzes the origin of a health event, to determine whether it was caused by Amazon or an ASN. Health event analysis continues after an event is resolved. Internet Monitor can update events with new information for up to an hour.
- How Internet Monitor works with IPv4 and IPv6 traffic
Internet Monitor measures health toward a network over only IPv4, and shows you health events, and availability and performance metrics, if you serve traffic to that network over any IP family (IPv4 or IPv6). If you serve traffic from a dual-stack resource, such as a dual-stack CloudFront distribution, Internet Monitor raises a health event and shows a drop in a performance score or availability score only if IPv4 traffic has the same issues for the resource as IPv6 traffic does.
Note that the Internet Monitor metrics for overall bytes in and bytes out accurately reflect all internet traffic (IPv4 and IPv6).