Create a table for Amazon WAF logs without partitioning - Amazon Athena
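Create a table for Amazon WAF logs without partitioning

This section describes how to create a table for Amazon WAF logs that uses neither partitioning nor partition projection.

Note

For performance and cost reasons, we do not recommend querying a non-partitioned schema. For more information, see Top 10 Performance Tuning Tips for Amazon Athena on the Amazon Big Data Blog.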

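To create the Amazon WAF table
  1. Copy and paste the following DDL statement into the Athena console. Modify the fields as necessary to match your log output. Change the LOCATION value to the Amazon S3 bucket that stores your logs.

    This query uses the OpenX JSON SerDe.

    Note

    The SerDe expects each JSON document to be on a single line of text, with no line termination characters separating the fields in a record. If the JSON text is in pretty-print format, you may receive an error message such as HIVE_CURSOR_ERROR: Row is not a valid JSON Object or HIVE_CURSOR_ERROR: JsonParseException: Unexpected end-of-input: expected close marker for OBJECT when you query the table after creating it. For more information, see JSON Data Files in the OpenX SerDe documentation on GitHub.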

    CREATE EXTERNAL TABLE `waf_logs`(
      `timestamp` bigint,
      `formatversion` int,
      `webaclid` string,
      `terminatingruleid` string,
      `terminatingruletype` string,
      `action` string,
      `terminatingrulematchdetails` array <
        struct <
          conditiontype: string,
          sensitivitylevel: string,
          location: string,
          matcheddata: array < string >
        >
      >,
      `httpsourcename` string,
      `httpsourceid` string,
      `rulegrouplist` array <
        struct <
          rulegroupid: string,
          terminatingrule: struct <
            ruleid: string,
            action: string,
            rulematchdetails: array <
              struct <
                conditiontype: string,
                sensitivitylevel: string,
                location: string,
                matcheddata: array < string >
              >
            >
          >,
          nonterminatingmatchingrules: array <
            struct <
              ruleid: string,
              action: string,
              overriddenaction: string,
              rulematchdetails: array <
                struct <
                  conditiontype: string,
                  sensitivitylevel: string,
                  location: string,
                  matcheddata: array < string >
                >
              >,
              challengeresponse: struct < responsecode: string, solvetimestamp: string >,
              captcharesponse: struct < responsecode: string, solvetimestamp: string >
            >
          >,
          excludedrules: string
        >
      >,
      `ratebasedrulelist` array <
        struct <
          ratebasedruleid: string,
          limitkey: string,
          maxrateallowed: int
        >
      >,
      `nonterminatingmatchingrules` array <
        struct <
          ruleid: string,
          action: string,
          rulematchdetails: array <
            struct <
              conditiontype: string,
              sensitivitylevel: string,
              location: string,
              matcheddata: array < string >
            >
          >,
          challengeresponse: struct < responsecode: string, solvetimestamp: string >,
          captcharesponse: struct < responsecode: string, solvetimestamp: string >
        >
      >,
      `requestheadersinserted` array < struct < name: string, value: string > >,
      `responsecodesent` string,
      `httprequest` struct <
        clientip: string,
        country: string,
        headers: array < struct < name: string, value: string > >,
        uri: string,
        args: string,
        httpversion: string,
        httpmethod: string,
        requestid: string
      >,
      `labels` array < struct < name: string > >,
      `captcharesponse` struct <
        responsecode: string,
        solvetimestamp: string,
        failureReason: string
      >,
      `challengeresponse` struct <
        responsecode: string,
        solvetimestamp: string,
        failureReason: string
      >,
      `ja3Fingerprint` string,
      `oversizefields` string,
      `requestbodysize` int,
      `requestbodysizeinspectedbywaf` int
    )
    ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
    STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat'
    OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
    LOCATION 's3://amzn-s3-demo-bucket/prefix/'
  2. Run the CREATE EXTERNAL TABLE statement in the Athena console query editor. This registers the waf_logs table and makes the data in it available for queries from Athena.
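
The note in step 1 says the OpenX JSON SerDe expects each JSON document on a single line. If a log file has been pretty-printed (for example, after manual editing or re-export), one possible fix is to rewrite it as one compact JSON object per line before placing it under the table's LOCATION. The following is a minimal Python sketch of that idea using only the standard library. The function name `to_json_lines` and the sample records are hypothetical illustrations, not part of Athena or Amazon WAF, and the sketch assumes the whole file fits in memory.

```python
import json


def to_json_lines(text: str) -> str:
    """Re-serialize every JSON document found in `text` as one compact line.

    Hypothetical helper: accepts one or more JSON documents that may be
    pretty-printed across several lines and separated only by whitespace.
    """
    decoder = json.JSONDecoder()
    lines = []
    pos = 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so skip it here.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        # Parse one document starting at pos; get the index where it ends.
        obj, pos = decoder.raw_decode(text, pos)
        # Compact separators guarantee no newlines inside the record.
        lines.append(json.dumps(obj, separators=(",", ":")))
    return "\n".join(lines) + ("\n" if lines else "")


if __name__ == "__main__":
    # Made-up sample input: two pretty-printed records.
    pretty = """{
      "action": "BLOCK",
      "httprequest": {"clientip": "192.0.2.1"}
    }
    {
      "action": "ALLOW"
    }"""
    print(to_json_lines(pretty), end="")
```

Each output line is then a complete JSON object, which is the layout the SerDe expects. Malformed input raises `json.JSONDecodeError` rather than being silently skipped, so a damaged file is caught before it is uploaded.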