每台设备的最后一点 - Amazon Timestream
Amazon Web Services 文档中描述的 Amazon Web Services 服务或功能可能因区域而异。要查看适用于中国区域的差异,请参阅 中国的 Amazon Web Services 服务入门 (PDF)

本文属于机器翻译版本。若本译文内容与英语原文存在差异,则一律以英文原文为准。

每台设备的最后一点

您的应用程序可能会要求您读取设备发出的最后一次测量值。在给定设备之前,可能有更通用的用例来获取设备的最后一次测量值date/time or the first measurement for a device after a given date/time。当您拥有数百万台设备和多年的数据时,此搜索可能需要扫描大量数据。

在下面,您将看到一个示例,说明如何使用计划查询来优化对设备发出的最后一个点的搜索。如果您的应用程序需要,也可以使用相同的模式来优化第一个点查询。

根据源表计算

以下是一个查询示例,用于查找特定部署(例如,给定区域内给定微服务的服务器、单元、筒仓和 availability_zone)中发布的最后一个测量值。在示例应用程序中,此查询将返回数百台服务器的最后一次测量值。另请注意,此查询具有无限的时间谓词,它会查找任何早于给定时间戳的数据。

注意

有关maxmax_by函数的信息,请参见聚合函数

SELECT instance_name, MAX(time) AS time, MAX_BY(gc_pause, time) AS last_measure FROM "raw_data"."devops" WHERE time < from_milliseconds(1636685271872) AND measure_name = 'events' AND region = 'us-east-1' AND cell = 'us-east-1-cell-10' AND silo = 'us-east-1-cell-10-silo-3' AND availability_zone = 'us-east-1-1' AND microservice_name = 'hercules' GROUP BY region, cell, silo, availability_zone, microservice_name, instance_name, process_name, jdk_version ORDER BY instance_name, time DESC

要按每日粒度预先计算的派生表

您可以将前面的用例转换为计划计算。如果您的应用程序要求要求您可能需要跨多个区域、单元、孤岛、可用区和微服务获取整个队列的这些值,则可以使用一次计划计算来预先计算整个队列的值。这就是 Timestream LiveAnalytics 的无服务器定时查询的强大功能,它允许这些查询根据应用程序的扩展要求进行扩展。

以下是一个查询,用于预先计算给定日期所有服务器的最后一个点。请注意,查询只有时间谓词,没有维度的谓词。时间谓词将查询限制在根据指定的计划表达式触发计算之日起的最后一天。

SELECT region, cell, silo, availability_zone, microservice_name, instance_name, process_name, jdk_version, MAX(time) AS time, MAX_BY(gc_pause, time) AS last_measure FROM raw_data.devops WHERE time BETWEEN bin(@scheduled_runtime, 1d) - 1d AND bin(@scheduled_runtime, 1d) AND measure_name = 'events' GROUP BY region, cell, silo, availability_zone, microservice_name, instance_name, process_name, jdk_version

以下是使用前面的查询进行计划计算的配置,该查询在世界标准时间每天 01:00 执行该查询,以计算过去一天的聚合。调度表达式 cron (0 1 * *? *) 控制此行为,并在一天结束一小时后运行,以考虑迟到一天的任何数据。

{ "Name": "PT1DPerInstanceLastpoint", "QueryString": "SELECT region, cell, silo, availability_zone, microservice_name, instance_name, process_name, jdk_version, MAX(time) AS time, MAX_BY(gc_pause, time) AS last_measure FROM raw_data.devops WHERE time BETWEEN bin(@scheduled_runtime, 1d) - 1d AND bin(@scheduled_runtime, 1d) AND measure_name = 'events' GROUP BY region, cell, silo, availability_zone, microservice_name, instance_name, process_name, jdk_version", "ScheduleConfiguration": { "ScheduleExpression": "cron(0 1 * * ? *)" }, "NotificationConfiguration": { "SnsConfiguration": { "TopicArn": "******" } }, "TargetConfiguration": { "TimestreamConfiguration": { "DatabaseName": "derived", "TableName": "per_timeseries_lastpoint_pt1d", "TimeColumn": "time", "DimensionMappings": [ { "Name": "region", "DimensionValueType": "VARCHAR" }, { "Name": "cell", "DimensionValueType": "VARCHAR" }, { "Name": "silo", "DimensionValueType": "VARCHAR" }, { "Name": "availability_zone", "DimensionValueType": "VARCHAR" }, { "Name": "microservice_name", "DimensionValueType": "VARCHAR" }, { "Name": "instance_name", "DimensionValueType": "VARCHAR" }, { "Name": "process_name", "DimensionValueType": "VARCHAR" }, { "Name": "jdk_version", "DimensionValueType": "VARCHAR" } ], "MultiMeasureMappings": { "TargetMultiMeasureName": "last_measure", "MultiMeasureAttributeMappings": [ { "SourceColumn": "last_measure", "MeasureValueType": "DOUBLE" } ] } } }, "ErrorReportConfiguration": { "S3Configuration" : { "BucketName" : "******", "ObjectKeyPrefix": "errors", "EncryptionOption": "SSE_S3" } }, "ScheduledQueryExecutionRoleArn": "******" }

根据派生表计算

使用上述配置定义派生表,并且至少有一个定时查询实例已将数据实体化到派生表中后,您现在可以查询派生表以获取最新的测量结果。以下是对派生表的查询示例。

SELECT instance_name, MAX(time) AS time, MAX_BY(last_measure, time) AS last_measure FROM "derived"."per_timeseries_lastpoint_pt1d" WHERE time < from_milliseconds(1636746715649) AND measure_name = 'last_measure' AND region = 'us-east-1' AND cell = 'us-east-1-cell-10' AND silo = 'us-east-1-cell-10-silo-3' AND availability_zone = 'us-east-1-1' AND microservice_name = 'hercules' GROUP BY region, cell, silo, availability_zone, microservice_name, instance_name, process_name, jdk_version ORDER BY instance_name, time DESC

合并源表和派生表

与前面的示例类似,派生表中的任何数据都不会有最新的写入数据。因此,您可以再次使用与之前类似的模式来合并来自派生表的数据以获取较旧的数据,并将源数据用于剩余的小费。以下是使用类似的 UNION 方法进行此类查询的示例。由于应用程序要求是在某个时间段之前找到最新的测量值,而这个开始时间可能已经过去,因此编写此查询的方法是使用提供的时间,使用指定时间起最多一天的源数据,然后对较旧的数据使用派生表。从下面的查询示例中可以看出,源数据上的时间谓词是有界限的。这样可以确保对数据量明显更高的源表进行高效处理,然后在派生表上使用无限的时间谓词。

WITH last_point_derived AS ( SELECT instance_name, MAX(time) AS time, MAX_BY(last_measure, time) AS last_measure FROM "derived"."per_timeseries_lastpoint_pt1d" WHERE time < from_milliseconds(1636746715649) AND measure_name = 'last_measure' AND region = 'us-east-1' AND cell = 'us-east-1-cell-10' AND silo = 'us-east-1-cell-10-silo-3' AND availability_zone = 'us-east-1-1' AND microservice_name = 'hercules' GROUP BY region, cell, silo, availability_zone, microservice_name, instance_name, process_name, jdk_version ), last_point_source AS ( SELECT instance_name, MAX(time) AS time, MAX_BY(gc_pause, time) AS last_measure FROM "raw_data"."devops" WHERE time < from_milliseconds(1636746715649) AND time > from_milliseconds(1636746715649) - 26h AND measure_name = 'events' AND region = 'us-east-1' AND cell = 'us-east-1-cell-10' AND silo = 'us-east-1-cell-10-silo-3' AND availability_zone = 'us-east-1-1' AND microservice_name = 'hercules' GROUP BY region, cell, silo, availability_zone, microservice_name, instance_name, process_name, jdk_version ) SELECT instance_name, MAX(time) AS time, MAX_BY(last_measure, time) AS last_measure FROM ( SELECT * FROM last_point_derived UNION SELECT * FROM last_point_source ) GROUP BY instance_name ORDER BY instance_name, time DESC

前面只是如何构造派生表的一个例子。如果您有多年的数据,则可以使用更多级别的聚合。例如,您可以在每日聚合之上设置每月汇总,也可以在每日汇总之前设置每小时汇总。因此,你可以合并最近填写最近一小时,每小时填写最后一天,每天填写最后一个月,每月填写较早的时间。您设置的级别数量与刷新计划之间的差异将取决于您对这些查询出现问题的频率以及同时发出这些查询的用户数量的要求。