Generating column statistics for Iceberg tables
Follow these steps to configure a schedule for generating statistics in the Data Catalog using the Amazon Glue console or the Amazon CLI, or run the StartColumnStatisticsTaskRun operation.
To generate column statistics
- Sign in to the Amazon Glue console at https://console.amazonaws.cn/glue/.
- Choose Tables under Data Catalog.
- Choose an Iceberg table from the list.
- Choose Column statistics, Generate on demand, under the Actions menu.
  You can also choose the Generate statistics button under the Column statistics tab in the lower section of the Tables page.
- On the Generate statistics page, provide the statistics generation details. Follow steps 6-11 in the Generating column statistics on a schedule section to configure a schedule for statistics generation for Iceberg tables.
  You can also choose to generate column statistics on demand by following the instructions in the Generating column statistics on demand section.
Note
The sampling option is not available for Iceberg tables.
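As an alternative to the console steps, the StartColumnStatisticsTaskRun operation can be called from code. The following is a minimal sketch, assuming the boto3 Glue client and its `start_column_statistics_task_run` method. The helper names `build_task_run_request` and `start_iceberg_column_statistics` are hypothetical and introduced here only for illustration. The request leaves out `SampleSize` because sampling is not available for Iceberg tables.

```python
from typing import Any, Dict, Optional, Sequence


def build_task_run_request(
    database_name: str,
    table_name: str,
    role: str,
    column_names: Optional[Sequence[str]] = None,
) -> Dict[str, Any]:
    """Build keyword arguments for StartColumnStatisticsTaskRun (hypothetical helper)."""
    request: Dict[str, Any] = {
        "DatabaseName": database_name,
        "TableName": table_name,
        "Role": role,  # IAM role that the task run assumes
    }
    if column_names:
        # Restrict the run to specific columns; when omitted, the request
        # does not name any columns.
        request["ColumnNameList"] = list(column_names)
    # SampleSize is deliberately left out: sampling is not available
    # for Iceberg tables.
    return request


def start_iceberg_column_statistics(glue_client: Any, **kwargs: Any) -> str:
    """Start a column statistics task run and return its run ID (hypothetical helper)."""
    response = glue_client.start_column_statistics_task_run(
        **build_task_run_request(**kwargs)
    )
    return response["ColumnStatisticsTaskRunId"]
```

With boto3 installed and credentials configured, a client created with `boto3.client("glue")` could be passed as `glue_client`, together with your own database name, table name, and IAM role. Any values you see in examples, such as a role named `GlueStatsRole`, are placeholders.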
Amazon Glue calculates the number of distinct values for each column of the Iceberg table and writes the results to a new Puffin file committed to the specified snapshot ID in your Amazon S3 location.
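To make the metric concrete, the sketch below computes an exact number of distinct values per column for a few in-memory rows. It only illustrates what the statistic means. It does not show how Amazon Glue computes the value or how a Puffin file is laid out, and an engine working on large tables may use an approximate method instead. The function name `distinct_value_counts`, the sample rows, and the choice to ignore null values are all assumptions made for this example.

```python
from typing import Any, Dict, Iterable, Mapping


def distinct_value_counts(rows: Iterable[Mapping[str, Any]]) -> Dict[str, int]:
    """Return the exact number of distinct non-null values per column."""
    seen: Dict[str, set] = {}
    for row in rows:
        for column, value in row.items():
            values = seen.setdefault(column, set())
            if value is not None:  # assumption: nulls are not counted
                values.add(value)
    return {column: len(values) for column, values in seen.items()}


# Hypothetical sample rows standing in for table data.
rows = [
    {"order_id": 1, "region": "cn-north-1", "status": "shipped"},
    {"order_id": 2, "region": "cn-north-1", "status": "pending"},
    {"order_id": 3, "region": "cn-northwest-1", "status": "shipped"},
    {"order_id": 4, "region": "cn-north-1", "status": None},
]

print(distinct_value_counts(rows))
# prints {'order_id': 4, 'region': 2, 'status': 2}
```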