Generating column statistics - Amazon Glue

Generating column statistics

Follow these steps to manage statistics generation in the Data Catalog using the Amazon Glue console or the Amazon CLI.

Console
To generate column statistics using the console
  1. Sign in to the Amazon Glue console at https://console.amazonaws.cn/glue/.

  2. Choose Data Catalog tables.

  3. Choose a table from the list.

  4. Choose Generate statistics under the Actions menu.

    You can also choose the Generate statistics button on the Column statistics tab in the lower section of the Tables page.

  5. On the Generate statistics page, specify the following options:

    • Table (all columns) – Choose this option to generate statistics for all columns in the table.

    • Selected columns – Choose this option to generate statistics for specific columns. You can select the columns from the drop-down list.

    • All rows – Choose all rows from the table to generate accurate statistics.

    • Sample rows – Choose only a specific percent of rows from the table to generate statistics. The default is all rows. Use the up and down arrows to increase or decrease the percent value.

      Note

      We recommend including all rows in the table to compute accurate statistics. Use sample rows to generate column statistics only when approximate values are acceptable.

  6. (Optional) Next, choose a security configuration to enable at-rest encryption for logs.

  7. Choose Generate statistics to run the process.

Amazon CLI

In the following example, replace the values for DatabaseName, TableName, and ColumnNameList with the actual database, table, and column names. Replace the account ID with a valid Amazon Web Services account ID, and the role name with the name of the IAM role that you're using to generate statistics.

aws glue start-column-statistics-task-run --cli-input-json file://input.json

Contents of input.json:

{
    "DatabaseName": "<test-db>",
    "TableName": "<test-table>",
    "ColumnNameList": [
        "<column1>",
        "<column2>"
    ],
    "Role": "arn:aws:iam::<123456789012>:role/<Stats-Role>",
    "SampleSize": 10.0
}
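The JSON input for the command above can also be produced programmatically, which avoids hand-editing mistakes such as a trailing comma in the column list. The following is a minimal sketch in Python, not an official tool: the function name build_stats_input and the sample values are hypothetical, and the range check on the sample size assumes it is a percentage between 0 and 100, as the console's Sample rows option suggests.

```python
import json


def build_stats_input(database, table, role_arn, columns=None, sample_size=None):
    """Build the JSON body for start-column-statistics-task-run as a dict.

    Hypothetical helper for illustration; only the key names come from
    the CLI example in this section.
    """
    # Assumption: SampleSize is a percentage of rows, so it must be in (0, 100].
    if sample_size is not None and not 0 < sample_size <= 100:
        raise ValueError("sample_size must be a percentage in (0, 100]")

    body = {
        "DatabaseName": database,
        "TableName": table,
        "Role": role_arn,
    }
    # Assumption: leaving out ColumnNameList requests statistics for all columns,
    # and leaving out SampleSize uses all rows (the console default).
    if columns:
        body["ColumnNameList"] = list(columns)
    if sample_size is not None:
        body["SampleSize"] = float(sample_size)
    return body


# Placeholder values; replace them with real names before use.
example = build_stats_input(
    database="test-db",
    table="test-table",
    role_arn="arn:aws:iam::123456789012:role/Stats-Role",
    columns=["column1", "column2"],
    sample_size=10,
)

# json.dumps always emits valid JSON, so the result is safe to save as input.json.
print(json.dumps(example, indent=4))
```

Redirecting the printed output to a file named input.json gives a file that can be passed to the command with file://input.json.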

You can also generate column statistics by calling the StartColumnStatisticsTaskRun operation.
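As an illustration of calling the operation from code, the sketch below uses the AWS SDK for Python (boto3), whose Glue client exposes the operation as start_column_statistics_task_run. The wrapper name start_stats_run and all argument values are hypothetical, and the sketch assumes boto3 is installed and credentials with permission to pass the IAM role are configured.

```python
def start_stats_run(glue_client, database, table, role_arn,
                    columns=None, sample_size=None):
    """Start a column statistics task run and return its run ID.

    glue_client is expected to be a boto3 Glue client (or any object
    offering the same start_column_statistics_task_run method).
    """
    params = {
        "DatabaseName": database,
        "TableName": table,
        "Role": role_arn,
    }
    # Optional parameters are only sent when the caller provides them.
    if columns:
        params["ColumnNameList"] = list(columns)
    if sample_size is not None:
        params["SampleSize"] = float(sample_size)

    response = glue_client.start_column_statistics_task_run(**params)
    return response["ColumnStatisticsTaskRunId"]


# Example usage (needs boto3 and valid credentials, so it is left commented out):
#   import boto3
#   glue = boto3.client("glue")
#   run_id = start_stats_run(
#       glue,
#       database="test-db",
#       table="test-table",
#       role_arn="arn:aws:iam::123456789012:role/Stats-Role",
#       columns=["column1", "column2"],
#       sample_size=10.0,
#   )
#   print(run_id)
```

Passing the client in as an argument keeps the wrapper easy to exercise with a stub object, without making a real service call.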