Choose a SerDe for your data
The following table lists the data formats supported in Athena and their corresponding SerDe libraries.
Data format | Description | SerDe types supported in Athena |
---|---|---|
Amazon Ion | Amazon Ion is a richly-typed, self-describing data format that is a superset of JSON, developed and open-sourced by Amazon. | Use the Amazon Ion Hive SerDe. |
Apache Avro | A format for storing data in Hadoop that uses JSON-based schemas for record values. | Use the Avro SerDe. |
Apache Parquet | A format for columnar storage of data in Hadoop. | Use the Parquet SerDe and SNAPPY compression. |
Apache WebServer logs | A format for storing logs in Apache WebServer. | Use the Grok SerDe or Regex SerDe. |
CloudTrail logs | A format for storing logs in CloudTrail. | |
CSV (Comma-Separated Values) | For data in CSV, each line represents a data record, and each record consists of one or more fields, separated by commas. | |
Custom-Delimited | For data in this format, each line represents a data record, and records are separated by a custom single-character delimiter. | Use the Lazy Simple SerDe for CSV, TSV, and custom-delimited files and specify a custom single-character delimiter. |
JSON (JavaScript Object Notation) | For JSON data, each line represents a data record, and each record consists of attribute-value pairs and arrays, separated by commas. | |
Logstash logs | A format for storing logs in Logstash. | Use the Grok SerDe. |
ORC (Optimized Row Columnar) | A format for optimized columnar storage of Hive data. | Use the ORC SerDe and ZLIB compression. |
TSV (Tab-Separated Values) | For data in TSV, each line represents a data record, and each record consists of one or more fields, separated by tabs. | Use the Lazy Simple SerDe for CSV, TSV, and custom-delimited files and specify the tab character as the separator. |
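
The delimited rows in the table (CSV, TSV, and custom-delimited) all rely on the Lazy Simple SerDe with a single-character separator. The sketch below shows one way such a table definition could be generated. It assumes the standard Hive-style `ROW FORMAT DELIMITED FIELDS TERMINATED BY` clause, which selects the Lazy Simple SerDe. The helper `delimited_table_ddl`, the table name, the column names, and the S3 location are hypothetical and exist only for illustration.

```python
def delimited_table_ddl(table, columns, location, delimiter="\t"):
    """Render a CREATE EXTERNAL TABLE statement for delimited text data.

    Hypothetical helper, not part of any AWS SDK. `columns` is a list of
    (name, type) pairs; `location` is the S3 prefix that holds the files.
    """
    if len(delimiter) != 1:
        raise ValueError("delimiter must be a single character")
    if delimiter in ("'", "\\"):
        # Quotes and backslashes need extra escaping that this sketch skips.
        raise ValueError("unsupported delimiter for this sketch")

    # A tab must be written as the two-character escape \t inside the DDL.
    literal = "\\t" if delimiter == "\t" else delimiter
    cols = ",\n".join(f"  `{name}` {dtype}" for name, dtype in columns)
    return (
        f"CREATE EXTERNAL TABLE {table} (\n{cols}\n)\n"
        "ROW FORMAT DELIMITED\n"
        f"  FIELDS TERMINATED BY '{literal}'\n"
        f"LOCATION '{location}';"
    )


# Example: a TSV table with two illustrative columns.
print(delimited_table_ddl(
    "example_tsv",
    [("id", "int"), ("name", "string")],
    "s3://amzn-s3-demo-bucket/tsv/",
))
```

Passing `delimiter=","` or `delimiter="|"` yields the corresponding statement for CSV data without quoted values or for custom-delimited data.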