Services or capabilities described in Amazon Web Services documentation might vary by Region. To see the differences applicable to the China Regions, see Getting Started with Amazon Web Services in China (PDF).
Managing the schedule for column statistics generation
You can manage the schedules for column statistics generation in Amazon Glue by updating, stopping (pausing), resuming, and deleting them. You can perform these tasks with the Amazon Glue console, the Amazon CLI, or the Amazon Glue column statistics API operations.
Updating the column statistics generation schedule
After you create a schedule for the column statistics generation task, you can update it. To update the schedule for a table, use the Amazon Glue console, the Amazon CLI, or the UpdateColumnStatisticsTaskSettings operation. You can modify the parameters of an existing schedule, such as the schedule type (on-demand or scheduled) and other optional parameters.
- Amazon Web Services Management Console
To update the settings for a column statistics generation task
Sign in to the Amazon Glue console at https://console.amazonaws.cn/glue/.
Choose the table that you want to update from the tables list.
In the lower section of the table details page, choose Column statistics.
Under Actions, choose Edit to update the schedule.
Make the desired changes to the schedule, and choose Save.
- Amazon CLI
If you aren't using the console, you can update the schedule with the update-column-statistics-task-settings command. The following example shows how to update the column statistics task settings using the Amazon CLI.
aws glue update-column-statistics-task-settings \
--database-name 'database_name' \
--table-name 'table_name' \
--role arn:aws:iam::123456789012:role/stats_role \
--schedule 'cron(0 0-5 16 * * ?)' \
--column-name-list 'col-1' \
--sample-size '20.0' \
--catalog-id '123456789012' \
--security-configuration 'test-security'
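If you update the schedule often, the command above can be wrapped in a small shell function. The following is a minimal sketch, assuming Bash and an installed, configured Amazon CLI; the function name update_stats_schedule is hypothetical. It passes only the required --database-name and --table-name options plus --schedule; add other options from the example above (such as --role or --sample-size) if you need to change them too.

```shell
# Hypothetical helper: update only the schedule of an existing column
# statistics generation task. Assumes Bash and a configured Amazon CLI.
update_stats_schedule() {
    if [ "$#" -ne 3 ]; then
        echo "usage: update_stats_schedule DATABASE TABLE SCHEDULE" >&2
        return 2
    fi
    local database="$1" table="$2" schedule="$3"

    # Quote every argument so that a cron expression containing spaces,
    # '*' and '?' reaches the CLI as one value, without word splitting
    # or filename expansion.
    aws glue update-column-statistics-task-settings \
        --database-name "$database" \
        --table-name "$table" \
        --schedule "$schedule"
}
```

For example, update_stats_schedule 'database_name' 'table_name' 'cron(0 0-5 16 * * ?)' applies the schedule from the example above. Quote the cron expression at the call site as well; otherwise the shell splits it at the spaces and may expand * and ? as filename patterns.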
Stopping the schedule for column statistics generation
If you no longer need incremental statistics, you can stop the scheduled generation to save resources and costs.
Pausing the schedule doesn't affect previously generated statistics, and you can resume the schedule at any time.
- Amazon Web Services Management Console
To stop the schedule for a column statistics generation task
On the Amazon Glue console, choose Tables under Data Catalog.
Select a table with column statistics.
On the Table details page, choose Column statistics.
Under Actions, choose Scheduled generation, and then choose Pause.
Choose Pause to confirm.
- Amazon CLI
To stop a column statistics task run schedule using the Amazon CLI, you can use the following command:
aws glue stop-column-statistics-task-run-schedule \
--database-name 'database_name' \
--table-name 'table_name'
Replace database_name and table_name with the names of the database and table for which you want to stop the column statistics task run schedule.
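When several tables in the same database have schedules, the stop command above can be run in a loop. The following is a minimal sketch, assuming Bash and a configured Amazon CLI; stop_stats_schedules is a hypothetical helper name, and it only calls the stop-column-statistics-task-run-schedule command shown above. It keeps going after a failure and reports a non-zero status at the end.

```shell
# Hypothetical helper: pause the column statistics generation schedule for
# several tables in one database. Assumes Bash and a configured Amazon CLI.
# Returns 1 if any table could not be paused, 2 on a usage error.
stop_stats_schedules() {
    if [ "$#" -lt 2 ]; then
        echo "usage: stop_stats_schedules DATABASE TABLE [TABLE...]" >&2
        return 2
    fi
    local database="$1"
    shift

    local failed=0 table
    for table in "$@"; do
        # Keep going after a failure so one bad table doesn't block the rest.
        if aws glue stop-column-statistics-task-run-schedule \
            --database-name "$database" \
            --table-name "$table"; then
            echo "paused: $database.$table"
        else
            echo "failed: $database.$table" >&2
            failed=1
        fi
    done
    return "$failed"
}
```

For example, stop_stats_schedules 'database_name' orders customers pauses the schedules for both tables (orders and customers are placeholder table names) and prints one status line per table.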
Resuming the schedule for column statistics generation
If you've paused the statistics generation schedule, you can resume it at any time by using the Amazon Glue
console, the Amazon CLI, or the StartColumnStatisticsTaskRunSchedule operation.
- Amazon Web Services Management Console
To resume the schedule for column statistics generation
On the Amazon Glue console, choose Tables under Data Catalog.
Select a table with column statistics.
On the Table details page, choose Column statistics.
Under Actions, choose Scheduled generation, and then choose Resume.
Choose Resume to confirm.
- Amazon CLI
To resume a column statistics task run schedule using the Amazon CLI, run the following command:
aws glue start-column-statistics-task-run-schedule \
--database-name 'database_name' \
--table-name 'table_name'
Replace database_name and table_name with the names of the database and table for which you want to resume the column statistics task run schedule.
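If you paused schedules for several tables, you might keep a list of them and resume them together. The following is a minimal sketch, assuming Bash and a configured Amazon CLI; the helper name resume_stats_schedules and its input format (one "database table" pair per line on standard input) are hypothetical. It only calls the start-column-statistics-task-run-schedule command shown above.

```shell
# Hypothetical helper: resume column statistics generation schedules for
# "DATABASE TABLE" pairs read from standard input, one pair per line.
# Blank lines and lines starting with '#' are skipped.
# Assumes Bash and a configured Amazon CLI. Returns 1 if any pair failed.
resume_stats_schedules() {
    local database table extra failed=0
    # The '|| [ -n ... ]' also processes a last line that lacks a newline.
    while read -r database table extra || [ -n "$database" ]; do
        case "$database" in
            ''|'#'*) continue ;;
        esac
        if [ -z "$table" ]; then
            echo "skipping line without a table name: $database" >&2
            failed=1
            continue
        fi
        # Read the CLI's stdin from /dev/null so it can't consume the
        # remaining input lines.
        if ! aws glue start-column-statistics-task-run-schedule \
            --database-name "$database" \
            --table-name "$table" </dev/null; then
            echo "failed to resume: $database.$table" >&2
            failed=1
        fi
    done
    return "$failed"
}
```

For example, you could record the pairs in a file such as paused_tables.txt (a hypothetical name) when you pause the schedules, and later run resume_stats_schedules < paused_tables.txt.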
Deleting the column statistics generation schedule
Keeping statistics up to date is generally recommended for optimal query performance, but in some cases removing the automatic generation schedule can be beneficial:
Static data: If the data remains relatively static, the existing column statistics can stay accurate for an extended period, which reduces the need for frequent updates. Deleting the schedule prevents unnecessary resource consumption and the overhead of regenerating statistics on unchanging data.
Manual control: If you prefer to control statistics generation manually, deleting the automatic schedule lets administrators update column statistics selectively, at specific intervals or after significant data changes, in line with their maintenance strategies and resource allocation needs.
- Amazon Web Services Management Console
To delete the schedule for column statistics generation
On the Amazon Glue console, choose Tables under Data Catalog.
Select a table with column statistics.
On the Table details page, choose Column statistics.
Under Actions, choose Scheduled generation, and then choose Delete.
Choose Delete to confirm.
- Amazon CLI
You can delete the column statistics schedule using the DeleteColumnStatisticsTaskSettings API operation or the
Amazon CLI. The following example shows how to delete the schedule for
generating column statistics using the Amazon Command Line Interface (Amazon CLI).
aws glue delete-column-statistics-task-settings \
--database-name 'database_name' \
--table-name 'table_name'
Replace database_name and table_name with the names of the database and table whose column statistics generation schedule you want to delete.
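Because deleting removes the schedule instead of pausing it, a script might ask for confirmation before running the delete command above. The following is a minimal sketch, assuming Bash and a configured Amazon CLI; delete_stats_schedule is a hypothetical helper name, and it only calls the delete-column-statistics-task-settings command shown above.

```shell
# Hypothetical helper: delete the column statistics generation schedule for
# one table after an interactive confirmation. Assumes Bash and a configured
# Amazon CLI. Returns 1 if the user doesn't confirm, 2 on a usage error.
delete_stats_schedule() {
    if [ "$#" -ne 2 ]; then
        echo "usage: delete_stats_schedule DATABASE TABLE" >&2
        return 2
    fi
    local database="$1" table="$2" answer

    # Prompt on stderr so stdout carries only the CLI output.
    printf 'Delete the column statistics schedule for %s.%s? [y/N] ' \
        "$database" "$table" >&2
    # Treat end of input as "no".
    read -r answer || answer=""

    case "$answer" in
        y|Y|yes|YES)
            aws glue delete-column-statistics-task-settings \
                --database-name "$database" \
                --table-name "$table"
            ;;
        *)
            echo "cancelled" >&2
            return 1
            ;;
    esac
}
```

Run it as delete_stats_schedule 'database_name' 'table_name' and answer y at the prompt. Any other answer, or no input at all, leaves the schedule unchanged and returns a non-zero status.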