

# Create tables for ETL jobs
<a name="schema-classifier"></a>

You can use Athena to create tables that Amazon Glue can use for ETL jobs. An Amazon Glue job runs a script that extracts data from sources, transforms the data, and loads it into targets. For more information, see [Authoring Jobs in Amazon Glue](https://docs.amazonaws.cn/glue/latest/dg/author-job-glue.html) in the *Amazon Glue Developer Guide*.

## Creating Athena tables for Amazon Glue ETL jobs
<a name="schema-etl-tables"></a>

For Amazon Glue to use a table that you create in Athena in an ETL job, the table must have a `classification` table property that identifies the format of the data. Valid classification values are `avro`, `csv`, `json`, `orc`, `parquet`, and `xml`. The following example `CREATE TABLE` statement in Athena sets the property:

```
CREATE EXTERNAL TABLE sampleTable (
  column1 INT,
  column2 INT
)
STORED AS PARQUET
TBLPROPERTIES (
  'classification'='parquet'
)
```

If the `classification` table property was not added when the table was created, you can add it using the Amazon Glue console.

**To add the classification table property using the Amazon Glue console**

1. Sign in to the Amazon Web Services Management Console and open the Amazon Glue console at [https://console.amazonaws.cn/glue/](https://console.amazonaws.cn/glue/).

1. In the console navigation pane, choose **Tables**.

1. Choose the link for the table that you want to edit, and then choose **Actions**, **Edit table**.

1. Scroll down to the **Table properties** section.

1. Choose **Add**.

1. For **Key**, enter **classification**.

1. For **Value**, enter the data format (for example, **json**).

1. Choose **Save**.

   In the **Table details** section, the value that you entered appears in the **Classification** field for the table.

For more information, see [Working with tables](https://docs.amazonaws.cn/glue/latest/dg/console-tables.html) in the *Amazon Glue Developer Guide*.
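The same property can also be set programmatically. The following Python code is a minimal sketch, assuming that boto3 is installed and that credentials and a Region are already configured. The helper names `with_classification` and `add_classification` are hypothetical, and the list of accepted `TableInput` keys is an assumption that you should check against the current Amazon Glue API reference.

```python
# Keys assumed to be accepted inside TableInput by the Glue UpdateTable API.
# GetTable also returns read-only fields (for example, CreateTime and
# DatabaseName) that must not be sent back, so they are filtered out.
TABLE_INPUT_KEYS = {
    "Name", "Description", "Owner", "LastAccessTime", "LastAnalyzedTime",
    "Retention", "StorageDescriptor", "PartitionKeys", "ViewOriginalText",
    "ViewExpandedText", "TableType", "Parameters", "TargetTable",
}

# Classification values listed earlier on this page.
VALID_CLASSIFICATIONS = {"avro", "csv", "json", "orc", "parquet", "xml"}


def with_classification(table, classification):
    """Return a TableInput dict copied from `table` with the property set."""
    if classification not in VALID_CLASSIFICATIONS:
        raise ValueError(f"unsupported classification: {classification!r}")
    table_input = {k: v for k, v in table.items() if k in TABLE_INPUT_KEYS}
    # Merge into a new dict so existing table properties are preserved
    # and the caller's table definition is not modified.
    table_input["Parameters"] = {
        **table.get("Parameters", {}),
        "classification": classification,
    }
    return table_input


def add_classification(database, table_name, classification):
    """Read a table from the Data Catalog and write it back with the property."""
    import boto3  # imported here so with_classification works without boto3

    glue = boto3.client("glue")
    table = glue.get_table(DatabaseName=database, Name=table_name)["Table"]
    glue.update_table(
        DatabaseName=database,
        TableInput=with_classification(table, classification),
    )
```

For example, `add_classification("mydatabase", "sampletable", "json")` would set the property on a table named `sampletable`. Both names are placeholders.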

## Use ETL jobs to optimize query performance
<a name="schema-etl-performance"></a>

Amazon Glue jobs can help you transform data to a format that optimizes query performance in Athena. Data formats have a large impact on query performance and query costs in Athena.

Amazon Glue supports writing to the Parquet and ORC data formats. You can use this feature to transform your data for use in Athena. For more information about using Parquet and ORC, and other ways to improve performance in Athena, see [Top 10 performance tuning tips for Amazon Athena](https://amazonaws-china.com/blogs/big-data/top-10-performance-tuning-tips-for-amazon-athena/).

**Note**  
Athena might be unable to read `SMALLINT` and `TINYINT` data produced by an Amazon Glue ETL job. To reduce the likelihood of this problem, convert `SMALLINT` and `TINYINT` columns to `INT` when you create an ETL job that converts data to ORC.
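
The type conversion described in the note can be planned before you write the job script. The following Python code is a minimal sketch that takes a list of column names and types and widens `SMALLINT` and `TINYINT` to `INT`. The function name `widen_small_ints` and the sample columns are hypothetical, and the sketch does not call any Amazon Glue API. You would still need to apply the resulting types in your own job script.

```python
# Column types that the note recommends converting to INT.
NARROW_INT_TYPES = {"smallint", "tinyint"}


def widen_small_ints(columns):
    """Return (name, type) pairs with SMALLINT and TINYINT replaced by int."""
    return [
        (name, "int" if col_type.lower() in NARROW_INT_TYPES else col_type)
        for name, col_type in columns
    ]


# Hypothetical source schema; other column types pass through unchanged.
columns = [("id", "bigint"), ("age", "TINYINT"), ("year", "smallint")]
print(widen_small_ints(columns))
# prints [('id', 'bigint'), ('age', 'int'), ('year', 'int')]
```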

## Automate Amazon Glue jobs for ETL
<a name="schema-etl-automate"></a>

You can configure Amazon Glue ETL jobs to run automatically based on triggers. This feature is ideal when data from outside Amazon is pushed to an Amazon S3 bucket in a format that is suboptimal for querying in Athena. For more information, see [Triggering Amazon Glue jobs](https://docs.amazonaws.cn/glue/latest/dg/trigger-job.html) in the *Amazon Glue Developer Guide*.
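
As one illustration of a trigger, the following Python code is a minimal sketch that creates a scheduled trigger through the Amazon Glue `CreateTrigger` API. It assumes that boto3 is installed and that credentials and a Region are already configured. The helper names, the trigger name, the job name, and the schedule are all hypothetical.

```python
def scheduled_trigger_args(trigger_name, job_name, cron_expression):
    """Build keyword arguments for the Glue CreateTrigger API call."""
    return {
        "Name": trigger_name,
        "Type": "SCHEDULED",
        # Glue schedules use a six-field cron expression wrapped in cron(...).
        "Schedule": f"cron({cron_expression})",
        "Actions": [{"JobName": job_name}],
        # Activate the trigger as soon as it is created.
        "StartOnCreation": True,
    }


def create_scheduled_trigger(trigger_name, job_name, cron_expression):
    """Create the trigger in the account and Region of the default session."""
    import boto3  # imported here so the helper above works without boto3

    glue = boto3.client("glue")
    return glue.create_trigger(
        **scheduled_trigger_args(trigger_name, job_name, cron_expression)
    )
```

For example, `create_scheduled_trigger("nightly-convert", "convert-to-parquet", "0 2 * * ? *")` would run a job named `convert-to-parquet` every day at 02:00 UTC.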