Column statistics API
The column statistics API describes the Amazon Glue APIs for running tasks that generate statistics on the columns of a table.
Data types
ColumnStatisticsTaskRun structure
An object that describes the details of a column statistics task run.
Fields
- CustomerId – UTF-8 string, not more than 12 bytes long. The Amazon account ID.
- ColumnStatisticsTaskRunId – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern. The identifier for the particular column statistics task run.
- DatabaseName – UTF-8 string. The database where the table resides.
- TableName – UTF-8 string. The name of the table for which column statistics are generated.
- ColumnNameList – An array of UTF-8 strings. The names of the columns to generate statistics for. If none are supplied, all columns in the table are used by default.
- CatalogID – Catalog id string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern. The ID of the Data Catalog where the table resides. If none is supplied, the Amazon account ID is used by default.
- Role – UTF-8 string. The IAM role that the service assumes to generate statistics.
- SampleSize – Number (double), not more than 100. The percentage of rows used to generate statistics. If none is supplied, the entire table is used to generate statistics.
- SecurityConfiguration – UTF-8 string, not more than 128 bytes long. The name of the security configuration that is used to encrypt CloudWatch logs for the column statistics task run.
- NumberOfWorkers – Number (integer), at least 1. The number of workers used to generate column statistics. The job is preconfigured to autoscale up to 25 instances.
- WorkerType – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern. The type of workers used to generate statistics. The default is g.1x.
- Status – UTF-8 string (valid values: STARTING | RUNNING | SUCCEEDED | FAILED | STOPPED). The status of the task run.
- CreationTime – Timestamp. The time that this task was created.
- LastUpdated – Timestamp. The last point in time when this task was modified.
- StartTime – Timestamp. The start time of the task.
- EndTime – Timestamp. The end time of the task.
- ErrorMessage – Description string, not more than 2048 bytes long, matching the URI address multi-line string pattern. The error message for the task run.
- DPUSeconds – Number (double). The calculated DPU usage in seconds for all autoscaled workers.
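When the Python SDK (boto3) returns this structure, it arrives as a dictionary keyed by the field names above. The sketch below assumes that shape and defines two hypothetical helpers: is_terminal, which treats SUCCEEDED, FAILED, and STOPPED as final states (an assumption drawn from the status names, not from this reference), and summarize_run, which applies the documented defaults when ColumnNameList or SampleSize is absent.

```python
from typing import Any, Mapping

# Assumption: these three statuses are final; STARTING and RUNNING are in progress.
TERMINAL_STATUSES = frozenset({"SUCCEEDED", "FAILED", "STOPPED"})


def is_terminal(run: Mapping[str, Any]) -> bool:
    """Return True if a ColumnStatisticsTaskRun dict has reached a final status."""
    return run.get("Status") in TERMINAL_STATUSES


def summarize_run(run: Mapping[str, Any]) -> str:
    """Build a one-line summary of a ColumnStatisticsTaskRun dict (hypothetical helper)."""
    table = f"{run.get('DatabaseName')}.{run.get('TableName')}"
    # Documented defaults: no ColumnNameList means all columns,
    # no SampleSize means the entire table.
    columns = ", ".join(run.get("ColumnNameList") or ["<all columns>"])
    sample = run.get("SampleSize")
    scope = "full table" if sample is None else f"{sample}% sample"
    summary = (
        f"{run.get('ColumnStatisticsTaskRunId')} {table} "
        f"[{columns}] {scope}: {run.get('Status')}"
    )
    if run.get("ErrorMessage"):
        summary += f" ({run['ErrorMessage']})"
    return summary
```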
ColumnStatisticsTaskRunningException structure
An exception thrown when you try to start another column statistics task run while a column statistics generation job is already running.
Fields
- Message – UTF-8 string. A message describing the problem.
ColumnStatisticsTaskNotRunningException structure
An exception thrown when you try to stop a task run when no task run is in progress.
Fields
- Message – UTF-8 string. A message describing the problem.
ColumnStatisticsTaskStoppingException structure
An exception thrown when you try to stop a task run.
Fields
- Message – UTF-8 string. A message describing the problem.
Operations
StartColumnStatisticsTaskRun action (Python: start_column_statistics_task_run)
GetColumnStatisticsTaskRun action (Python: get_column_statistics_task_run)
GetColumnStatisticsTaskRuns action (Python: get_column_statistics_task_runs)
ListColumnStatisticsTaskRuns action (Python: list_column_statistics_task_runs)
StopColumnStatisticsTaskRun action (Python: stop_column_statistics_task_run)
StartColumnStatisticsTaskRun action (Python: start_column_statistics_task_run)
Starts a column statistics task run for the specified table and columns.
Request
- DatabaseName – Required: UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern. The name of the database where the table resides.
- TableName – Required: UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern. The name of the table for which to generate statistics.
- ColumnNameList – An array of UTF-8 strings. The names of the columns to generate statistics for. If none are supplied, all columns in the table are used by default.
- Role – Required: UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern. The IAM role that the service assumes to generate statistics.
- SampleSize – Number (double), not more than 100. The percentage of rows used to generate statistics. If none is supplied, the entire table is used to generate statistics.
- CatalogID – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern. The ID of the Data Catalog where the table resides. If none is supplied, the Amazon account ID is used by default.
- SecurityConfiguration – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern. The name of the security configuration that is used to encrypt CloudWatch logs for the column statistics task run.
Response
- ColumnStatisticsTaskRunId – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern. The identifier for the column statistics task run.
Errors
AccessDeniedException
EntityNotFoundException
ColumnStatisticsTaskRunningException
OperationTimeoutException
ResourceNumberLimitExceededException
InvalidInputException
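A minimal sketch of calling this action from Python, assuming client is a boto3 Glue client (or any object exposing a start_column_statistics_task_run method that accepts the request fields above as keyword arguments). The wrapper start_column_stats is hypothetical. It omits optional fields that are not set so the documented defaults apply, and it returns None instead of raising when the service reports ColumnStatisticsTaskRunningException, reading the error code the way botocore's ClientError exposes it.

```python
from typing import Any, Dict, Optional, Sequence


def start_column_stats(
    client: Any,
    database_name: str,
    table_name: str,
    role: str,
    column_names: Optional[Sequence[str]] = None,
    sample_size: Optional[float] = None,
    catalog_id: Optional[str] = None,
    security_configuration: Optional[str] = None,
) -> Optional[str]:
    """Start a run and return its ID, or None if a run is already in progress."""
    # The reference caps SampleSize at 100; rejecting values <= 0 is an assumption.
    if sample_size is not None and not 0 < sample_size <= 100:
        raise ValueError("sample_size must be a percentage in (0, 100]")
    request: Dict[str, Any] = {
        "DatabaseName": database_name,
        "TableName": table_name,
        "Role": role,
    }
    # Leave unset optional fields out entirely so the documented defaults apply
    # (all columns, the entire table, the caller's account ID as catalog).
    optional = {
        "ColumnNameList": list(column_names) if column_names is not None else None,
        "SampleSize": sample_size,
        "CatalogID": catalog_id,
        "SecurityConfiguration": security_configuration,
    }
    request.update({key: value for key, value in optional.items() if value is not None})
    try:
        response = client.start_column_statistics_task_run(**request)
    except Exception as exc:
        # botocore's ClientError carries the service error code in
        # exc.response["Error"]["Code"]; anything else is re-raised unchanged.
        code = getattr(exc, "response", {}).get("Error", {}).get("Code")
        if code == "ColumnStatisticsTaskRunningException":
            return None
        raise
    return response["ColumnStatisticsTaskRunId"]
```

A call might look like start_column_stats(boto3.client("glue"), "sales_db", "orders", "arn:aws:iam::123456789012:role/GlueStatsRole", sample_size=20.0); the database, table, and role names there are placeholders. Returning None for a busy table is a design choice: callers can poll or retry later instead of treating a concurrent run as a failure.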
GetColumnStatisticsTaskRun action (Python: get_column_statistics_task_run)
Gets the metadata for a task run, given its task run ID.
Request
- ColumnStatisticsTaskRunId – Required: UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern. The identifier for the particular column statistics task run.
Response
- ColumnStatisticsTaskRun – A ColumnStatisticsTaskRun object. The details of the column statistics task run.
Errors
EntityNotFoundException
OperationTimeoutException
InvalidInputException
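Because StartColumnStatisticsTaskRun returns only an identifier, a caller typically polls this action until the run finishes. The following sketch assumes a boto3 Glue client and the hypothetical helper name wait_for_run; it treats SUCCEEDED, FAILED, and STOPPED as final (an assumption based on the status names) and takes injectable sleep and clock functions so the polling loop can be exercised without waiting. The 30-second interval and one-hour timeout are arbitrary defaults, not service limits.

```python
import time
from typing import Any, Callable, Dict

# Assumption: these statuses are final; STARTING and RUNNING mean keep polling.
TERMINAL_STATUSES = {"SUCCEEDED", "FAILED", "STOPPED"}


def wait_for_run(
    client: Any,
    run_id: str,
    poll_seconds: float = 30.0,
    timeout_seconds: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Dict[str, Any]:
    """Poll GetColumnStatisticsTaskRun until the run reaches a final status."""
    deadline = clock() + timeout_seconds
    while True:
        response = client.get_column_statistics_task_run(ColumnStatisticsTaskRunId=run_id)
        run = response["ColumnStatisticsTaskRun"]
        if run.get("Status") in TERMINAL_STATUSES:
            return run
        if clock() >= deadline:
            raise TimeoutError(
                f"run {run_id} still {run.get('Status')} after {timeout_seconds}s"
            )
        sleep(poll_seconds)
```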
GetColumnStatisticsTaskRuns action (Python: get_column_statistics_task_runs)
Retrieves information about all runs associated with the specified table.
Request
- DatabaseName – Required: UTF-8 string. The name of the database where the table resides.
- TableName – Required: UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern. The name of the table.
- MaxResults – Number (integer), not less than 1 or more than 1000. The maximum number of task runs to return in the response.
- NextToken – UTF-8 string. A continuation token, if this is a continuation call.
Response
- ColumnStatisticsTaskRuns – An array of ColumnStatisticsTaskRun objects. A list of column statistics task runs.
- NextToken – UTF-8 string. A continuation token, if not all task runs have yet been returned.
Errors
OperationTimeoutException
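Results from this action are paged: when NextToken is present in a response, it is passed back in the next request. A minimal sketch, assuming a boto3 Glue client and a hypothetical helper name get_all_table_runs, that follows the token until every run for one table has been collected:

```python
from typing import Any, Dict, List


def get_all_table_runs(
    client: Any, database_name: str, table_name: str, page_size: int = 100
) -> List[Dict[str, Any]]:
    """Collect every ColumnStatisticsTaskRun for one table, following NextToken."""
    if not 1 <= page_size <= 1000:  # documented MaxResults range
        raise ValueError("page_size must be between 1 and 1000")
    request: Dict[str, Any] = {
        "DatabaseName": database_name,
        "TableName": table_name,
        "MaxResults": page_size,
    }
    runs: List[Dict[str, Any]] = []
    while True:
        page = client.get_column_statistics_task_runs(**request)
        runs.extend(page.get("ColumnStatisticsTaskRuns", []))
        token = page.get("NextToken")
        if not token:  # no token: every run has been returned
            return runs
        request["NextToken"] = token
```

Collecting into a list keeps the example short; for tables with a long run history, a generator that yields each page's runs would avoid holding them all in memory.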
ListColumnStatisticsTaskRuns action (Python: list_column_statistics_task_runs)
Lists the IDs of all column statistics task runs in the calling account.
Request
- MaxResults – Number (integer), not less than 1 or more than 1000. The maximum number of task run IDs to return in the response.
- NextToken – UTF-8 string. A continuation token, if this is a continuation call.
Response
- ColumnStatisticsTaskRunIds – An array of UTF-8 strings, not more than 100 strings. A list of column statistics task run IDs.
- NextToken – UTF-8 string. A continuation token, if not all task run IDs have yet been returned.
Errors
OperationTimeoutException
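This action returns only run IDs (at most 100 per response), so details come from GetColumnStatisticsTaskRun. The sketch below assumes a boto3 Glue client; list_run_ids and statuses_by_id are hypothetical helper names. It pages through the IDs with NextToken and then makes one GetColumnStatisticsTaskRun call per ID.

```python
from typing import Any, Dict, List


def list_run_ids(client: Any, page_size: int = 100) -> List[str]:
    """Collect every column statistics task run ID in the account, following NextToken."""
    if not 1 <= page_size <= 1000:  # documented MaxResults range
        raise ValueError("page_size must be between 1 and 1000")
    request: Dict[str, Any] = {"MaxResults": page_size}
    run_ids: List[str] = []
    while True:
        page = client.list_column_statistics_task_runs(**request)
        run_ids.extend(page.get("ColumnStatisticsTaskRunIds", []))
        token = page.get("NextToken")
        if not token:
            return run_ids
        request["NextToken"] = token


def statuses_by_id(client: Any) -> Dict[str, str]:
    """Map each listed run ID to its Status (one GetColumnStatisticsTaskRun call per ID)."""
    statuses: Dict[str, str] = {}
    for run_id in list_run_ids(client):
        response = client.get_column_statistics_task_run(ColumnStatisticsTaskRunId=run_id)
        statuses[run_id] = response["ColumnStatisticsTaskRun"]["Status"]
    return statuses
```

This issues one extra request per ID; when only a single table matters, GetColumnStatisticsTaskRuns returns full run objects directly.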
StopColumnStatisticsTaskRun action (Python: stop_column_statistics_task_run)
Stops a task run for the specified table.
Request
- DatabaseName – Required: UTF-8 string. The name of the database where the table resides.
- TableName – Required: UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern. The name of the table.
Response
No Response parameters.
Errors
EntityNotFoundException
ColumnStatisticsTaskNotRunningException
ColumnStatisticsTaskStoppingException
OperationTimeoutException
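A minimal sketch of a stop request that reports the two stop-specific errors as outcomes rather than failures, assuming a boto3 Glue client and a hypothetical helper name stop_column_stats. Error codes are read from exc.response["Error"]["Code"], as botocore's ClientError provides them. Interpreting ColumnStatisticsTaskStoppingException as "a stop is already in progress" is an assumption based on the exception name. Other errors, such as EntityNotFoundException, are re-raised.

```python
from typing import Any


def stop_column_stats(client: Any, database_name: str, table_name: str) -> str:
    """Request a stop and return an outcome label instead of raising for stop-specific errors."""
    try:
        client.stop_column_statistics_task_run(
            DatabaseName=database_name, TableName=table_name
        )
    except Exception as exc:
        # botocore's ClientError carries the service error code in exc.response["Error"]["Code"].
        code = getattr(exc, "response", {}).get("Error", {}).get("Code")
        if code == "ColumnStatisticsTaskNotRunningException":
            return "not-running"
        if code == "ColumnStatisticsTaskStoppingException":
            # Assumption from the exception name: a stop is already under way.
            return "already-stopping"
        raise
    # The action has no response parameters, so success is only the absence of an error.
    return "stop-requested"
```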