使用计划查询和原始数据进行深入分析 - Amazon Timestream
Amazon Web Services 文档中描述的 Amazon Web Services 服务或功能可能因区域而异。要查看适用于中国区域的差异,请参阅 中国的 Amazon Web Services 服务入门 (PDF)

本文属于机器翻译版本。若本译文内容与英语原文存在差异,则一律以英文原文为准。

使用计划查询和原始数据进行深入分析

您可以使用整个车队的汇总统计数据来确定需要向下钻取的区域,然后使用原始数据向下钻取精细数据,以获得更深入的见解。

在此示例中,您将看到如何使用聚合仪表板来识别与其他部署相比 CPU 利用率似乎更高的任何部署(部署适用于给定区域、单元、孤岛和可用区内的给定微服务)。然后,您可以使用原始数据进行深入研究,以便更好地理解。由于这些向下钻取可能很少见,并且只能访问与部署相关的数据,因此您可以使用原始数据进行此分析,而无需使用计划查询。

每次部署向下钻取

下面的仪表板提供了对给定部署中更精细的服务器级统计数据的深入分析。为了帮助您深入了解队列的不同部分,此仪表板使用了区域、单元、孤岛、微服务和 availability_zone 等变量。然后,它会显示该部署的一些汇总统计信息。

Dashboard showing deployment statistics with filters for region, cell, silo, and other parameters.
CPU distribution graph showing consistent patterns for avg, p90, p95, and p99 values over 24 hours.

在下面的查询中,您可以看到在变量下拉列表中选择的值被用作查询WHERE子句中的谓词,这使您可以只关注部署的数据。然后,该面板绘制了该部署中实例的聚合 CPU 指标。您可以使用原始数据在交互式查询延迟的情况下进行深入分析,以获得更深入的见解。

SELECT bin(time, 5m) as minute, ROUND(AVG(cpu_user), 2) AS avg_value, ROUND(APPROX_PERCENTILE(cpu_user, 0.9), 2) AS p90_value, ROUND(APPROX_PERCENTILE(cpu_user, 0.95), 2) AS p95_value, ROUND(APPROX_PERCENTILE(cpu_user, 0.99), 2) AS p99_value FROM "raw_data"."devops" WHERE time BETWEEN from_milliseconds(1636527099476) AND from_milliseconds(1636613499476) AND region = 'eu-west-1' AND cell = 'eu-west-1-cell-10' AND silo = 'eu-west-1-cell-10-silo-1' AND microservice_name = 'demeter' AND availability_zone = 'eu-west-1-3' AND measure_name = 'metrics' GROUP BY bin(time, 5m) ORDER BY 1

实例级统计信息

此仪表板进一步计算另一个变量,该变量还列出了 CPU 利用率较高的服务器/实例,按使用率降序排序。用于计算此变量的查询如下所示。

WITH microservice_cell_avg AS ( SELECT AVG(cpu_user) AS microservice_avg_metric FROM "raw_data"."devops" WHERE $__timeFilter AND measure_name = 'metrics' AND region = '${region}' AND cell = '${cell}' AND silo = '${silo}' AND availability_zone = '${availability_zone}' AND microservice_name = '${microservice}' ), instance_avg AS ( SELECT instance_name, AVG(cpu_user) AS instance_avg_metric FROM "raw_data"."devops" WHERE $__timeFilter AND measure_name = 'metrics' AND region = '${region}' AND cell = '${cell}' AND silo = '${silo}' AND microservice_name = '${microservice}' AND availability_zone = '${availability_zone}' GROUP BY availability_zone, instance_name ) SELECT i.instance_name FROM instance_avg i CROSS JOIN microservice_cell_avg m WHERE i.instance_avg_metric > (1 + ${utilization_threshold}) * m.microservice_avg_metric ORDER BY i.instance_avg_metric DESC

在前面的查询中,变量是根据为其他变量选择的值动态重新计算的。为部署填充变量后,您可以从列表中选择单个实例,以进一步可视化该实例的指标。您可以从实例名称的下拉列表中选择不同的实例,如下面的快照所示。

List of Amazon Web Services (Amazon) resource identifiers for Demeter instances in eu-west-1 region.
Dashboard showing CPU utilization, memory usage, GC pause events, and disk I/O metrics for an Amazon instance.

前面的面板显示所选实例的统计信息,下面是用于获取这些统计信息的查询。

SELECT BIN(time, 30m) AS time_bin, AVG(cpu_user) AS avg_cpu, ROUND(APPROX_PERCENTILE(cpu_user, 0.99), 2) as p99_cpu FROM "raw_data"."devops" WHERE time BETWEEN from_milliseconds(1636527099477) AND from_milliseconds(1636613499477) AND measure_name = 'metrics' AND region = 'eu-west-1' AND cell = 'eu-west-1-cell-10' AND silo = 'eu-west-1-cell-10-silo-1' AND availability_zone = 'eu-west-1-3' AND microservice_name = 'demeter' AND instance_name = 'i-zaZswmJk-demeter-eu-west-1-cell-10-silo-1-00000272.amazonaws.com' GROUP BY BIN(time, 30m) ORDER BY time_bin desc
SELECT BIN(time, 30m) AS time_bin, AVG(memory_used) AS avg_memory, ROUND(APPROX_PERCENTILE(memory_used, 0.99), 2) as p99_memory FROM "raw_data"."devops" WHERE time BETWEEN from_milliseconds(1636527099477) AND from_milliseconds(1636613499477) AND measure_name = 'metrics' AND region = 'eu-west-1' AND cell = 'eu-west-1-cell-10' AND silo = 'eu-west-1-cell-10-silo-1' AND availability_zone = 'eu-west-1-3' AND microservice_name = 'demeter' AND instance_name = 'i-zaZswmJk-demeter-eu-west-1-cell-10-silo-1-00000272.amazonaws.com' GROUP BY BIN(time, 30m) ORDER BY time_bin desc
SELECT COUNT(gc_pause) FROM "raw_data"."devops" WHERE time BETWEEN from_milliseconds(1636527099477) AND from_milliseconds(1636613499478) AND measure_name = 'events' AND region = 'eu-west-1' AND cell = 'eu-west-1-cell-10' AND silo = 'eu-west-1-cell-10-silo-1' AND availability_zone = 'eu-west-1-3' AND microservice_name = 'demeter' AND instance_name = 'i-zaZswmJk-demeter-eu-west-1-cell-10-silo-1-00000272.amazonaws.com'
SELECT avg(gc_pause) as avg, round(approx_percentile(gc_pause, 0.99), 2) as p99 FROM "raw_data"."devops" WHERE time BETWEEN from_milliseconds(1636527099478) AND from_milliseconds(1636613499478) AND measure_name = 'events' AND region = 'eu-west-1' AND cell = 'eu-west-1-cell-10' AND silo = 'eu-west-1-cell-10-silo-1' AND availability_zone = 'eu-west-1-3' AND microservice_name = 'demeter' AND instance_name = 'i-zaZswmJk-demeter-eu-west-1-cell-10-silo-1-00000272.amazonaws.com'
SELECT BIN(time, 30m) AS time_bin, AVG(disk_io_reads) AS avg, ROUND(APPROX_PERCENTILE(disk_io_reads, 0.99), 2) as p99 FROM "raw_data"."devops" WHERE time BETWEEN from_milliseconds(1636527099478) AND from_milliseconds(1636613499478) AND measure_name = 'metrics' AND region = 'eu-west-1' AND cell = 'eu-west-1-cell-10' AND silo = 'eu-west-1-cell-10-silo-1' AND availability_zone = 'eu-west-1-3' AND microservice_name = 'demeter' AND instance_name = 'i-zaZswmJk-demeter-eu-west-1-cell-10-silo-1-00000272.amazonaws.com' GROUP BY BIN(time, 30m) ORDER BY time_bin desc