Column statistics API - Amazon Glue

Column statistics API

The column statistics API comprises the Amazon Glue data types and operations for generating and retrieving statistics on the columns of a table.

Data types

ColumnStatisticsTaskRun structure

The object that shows the details of the column stats run.

Fields
  • CustomerId – UTF-8 string, not more than 12 bytes long.

    The Amazon account ID.

  • ColumnStatisticsTaskRunId – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    The identifier for the particular column statistics task run.

  • DatabaseName – UTF-8 string.

    The database where the table resides.

  • TableName – UTF-8 string.

    The name of the table for which column statistics are generated.

  • ColumnNameList – An array of UTF-8 strings.

    A list of the column names. If none is supplied, all column names for the table will be used by default.

  • CatalogID – Catalog id string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    The ID of the Data Catalog where the table resides. If none is supplied, the Amazon account ID is used by default.

  • Role – UTF-8 string.

    The IAM role that the service assumes to generate statistics.

  • SampleSize – Number (double), not more than 100.

    The percentage of rows used to generate statistics. If none is supplied, the entire table will be used to generate stats.

  • SecurityConfiguration – UTF-8 string, not more than 128 bytes long.

    Name of the security configuration that is used to encrypt CloudWatch logs for the column stats task run.

  • NumberOfWorkers – Number (integer), at least 1.

    The number of workers used to generate column statistics. The job is preconfigured to autoscale up to 25 instances.

  • WorkerType – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    The type of workers being used for generating stats. The default is g.1x.

  • Status – UTF-8 string (valid values: STARTING | RUNNING | SUCCEEDED | FAILED | STOPPED).

    The status of the task run.

  • CreationTime – Timestamp.

    The time that this task was created.

  • LastUpdated – Timestamp.

    The last point in time when this task was modified.

  • StartTime – Timestamp.

    The start time of the task.

  • EndTime – Timestamp.

    The end time of the task.

  • ErrorMessage – Description string, not more than 2048 bytes long, matching the URI address multi-line string pattern.

    The error message for the job.

  • DPUSeconds – Number (double).

    The calculated DPU usage in seconds for all autoscaled workers.
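
The Status, StartTime, EndTime, and DPUSeconds fields can be combined into a short summary of a run. The following is a minimal sketch, assuming the run is available as a plain dictionary keyed by the field names above. `summarize_run`, `TERMINAL_STATUSES`, and the sample values are hypothetical and introduced only for illustration. Treating SUCCEEDED, FAILED, and STOPPED as final states is an assumption drawn from the status names, not something this reference states.

```python
from datetime import datetime, timezone

# Assumption: a run in one of these statuses will not change any further.
TERMINAL_STATUSES = {"SUCCEEDED", "FAILED", "STOPPED"}


def summarize_run(run: dict) -> dict:
    """Summarize a ColumnStatisticsTaskRun given as a dict of its fields."""
    start = run.get("StartTime")
    end = run.get("EndTime")
    # Elapsed time is only known once both timestamps are present.
    elapsed = None
    if start is not None and end is not None:
        elapsed = (end - start).total_seconds()
    return {
        "id": run.get("ColumnStatisticsTaskRunId"),
        "table": f'{run.get("DatabaseName")}.{run.get("TableName")}',
        "status": run.get("Status"),
        "finished": run.get("Status") in TERMINAL_STATUSES,
        "elapsed_seconds": elapsed,
        "dpu_seconds": run.get("DPUSeconds"),
        "error": run.get("ErrorMessage"),
    }


# Made-up values, for illustration only.
example_run = {
    "ColumnStatisticsTaskRunId": "example-run-id",
    "DatabaseName": "sales_db",
    "TableName": "orders",
    "Status": "SUCCEEDED",
    "StartTime": datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc),
    "EndTime": datetime(2024, 1, 1, 12, 1, 30, tzinfo=timezone.utc),
    "DPUSeconds": 180.0,
}
summary = summarize_run(example_run)
```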

ColumnStatisticsTaskRunningException structure

An exception thrown when you try to start another job while a column statistics generation job is already running.

Fields
  • Message – UTF-8 string.

    A message describing the problem.

ColumnStatisticsTaskNotRunningException structure

An exception thrown when you try to stop a task run when there is no task running.

Fields
  • Message – UTF-8 string.

    A message describing the problem.

ColumnStatisticsTaskStoppingException structure

An exception thrown when you try to stop a task run.

Fields
  • Message – UTF-8 string.

    A message describing the problem.

Operations

StartColumnStatisticsTaskRun action (Python: start_column_statistics_task_run)

Starts a column statistics task run, for a specified table and columns.

Request
  • DatabaseName – Required: UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    The name of the database where the table resides.

  • TableName – Required: UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    The name of the table for which to generate statistics.

  • ColumnNameList – An array of UTF-8 strings.

    A list of the column names for which to generate statistics. If none is supplied, all column names for the table will be used by default.

  • Role – Required: UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    The IAM role that the service assumes to generate statistics.

  • SampleSize – Number (double), not more than 100.

    The percentage of rows used to generate statistics. If none is supplied, the entire table will be used to generate stats.

  • CatalogID – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    The ID of the Data Catalog where the table resides. If none is supplied, the Amazon account ID is used by default.

  • SecurityConfiguration – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    Name of the security configuration that is used to encrypt CloudWatch logs for the column stats task run.

Response
  • ColumnStatisticsTaskRunId – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    The identifier for the column statistics task run.

Errors
  • AccessDeniedException

  • EntityNotFoundException

  • ColumnStatisticsTaskRunningException

  • OperationTimeoutException

  • ResourceNumberLimitExceededException

  • InvalidInputException
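
A request for this operation might be assembled as in the sketch below. It assumes `glue` is a Glue client from the Python SDK (for example `boto3.client("glue")`) and that the client exposes the listed errors under `glue.exceptions`. `start_stats_run` is a hypothetical helper name. Optional parameters are sent only when supplied, so the service defaults described above still apply.

```python
def start_stats_run(glue, database, table, role, columns=None,
                    sample_size=None, catalog_id=None,
                    security_configuration=None):
    """Start a column statistics task run and return its run ID.

    Returns None if the service reports that a column statistics
    job is already running.
    """
    # SampleSize is a percentage of rows, so it cannot exceed 100.
    if sample_size is not None and sample_size > 100:
        raise ValueError("SampleSize must not be more than 100")

    # Required parameters.
    request = {"DatabaseName": database, "TableName": table, "Role": role}

    # Optional parameters are included only when a value was given.
    optional = {
        "ColumnNameList": columns,
        "SampleSize": sample_size,
        "CatalogID": catalog_id,
        "SecurityConfiguration": security_configuration,
    }
    request.update({k: v for k, v in optional.items() if v is not None})

    try:
        response = glue.start_column_statistics_task_run(**request)
    except glue.exceptions.ColumnStatisticsTaskRunningException:
        return None
    return response["ColumnStatisticsTaskRunId"]
```

Returning None for ColumnStatisticsTaskRunningException is one possible design choice. A caller that needs to distinguish the cases more strictly could let the exception propagate instead.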

GetColumnStatisticsTaskRun action (Python: get_column_statistics_task_run)

Gets the metadata for a task run, given its task run ID.

Request
  • ColumnStatisticsTaskRunId – Required: UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    The identifier for the particular column statistics task run.

Response
  • ColumnStatisticsTaskRun – A ColumnStatisticsTaskRun object.

    A ColumnStatisticsTaskRun object representing the details of the column stats run.

Errors
  • EntityNotFoundException

  • OperationTimeoutException

  • InvalidInputException
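
Because a run moves through several Status values, a caller typically polls this operation until the run finishes. The sketch below shows one way to do that, assuming `glue` is a Glue client from the Python SDK (for example `boto3.client("glue")`). `wait_for_run`, the polling interval, and the timeout are hypothetical choices, and treating SUCCEEDED, FAILED, and STOPPED as final is an assumption based on the status names.

```python
import time

# Assumption: a run in one of these statuses will not change any further.
FINAL_STATUSES = ("SUCCEEDED", "FAILED", "STOPPED")


def wait_for_run(glue, run_id, poll_seconds=30.0, timeout_seconds=3600.0,
                 sleep=time.sleep):
    """Poll a column statistics task run until it reaches a final status.

    Returns the last ColumnStatisticsTaskRun dict, or raises TimeoutError.
    The `sleep` parameter exists so that tests can avoid real waiting.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        response = glue.get_column_statistics_task_run(
            ColumnStatisticsTaskRunId=run_id
        )
        run = response["ColumnStatisticsTaskRun"]
        if run["Status"] in FINAL_STATUSES:
            return run
        if time.monotonic() >= deadline:
            raise TimeoutError(f"run {run_id} still {run['Status']}")
        sleep(poll_seconds)
```

A caller can then inspect `run["Status"]` and, for a FAILED run, `run.get("ErrorMessage")`.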

GetColumnStatisticsTaskRuns action (Python: get_column_statistics_task_runs)

Retrieves information about all runs associated with the specified table.

Request
  • DatabaseName – Required: UTF-8 string.

    The name of the database where the table resides.

  • TableName – Required: UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    The name of the table.

  • MaxResults – Number (integer), not less than 1 or more than 1000.

    The maximum size of the response.

  • NextToken – UTF-8 string.

    A continuation token, if this is a continuation call.

Response
  • ColumnStatisticsTaskRuns – An array of ColumnStatisticsTaskRun objects.

    A list of column statistics task runs.

  • NextToken – UTF-8 string.

    A continuation token, if not all task runs have yet been returned.

Errors
  • OperationTimeoutException
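
The NextToken fields in the request and response support paging through results. A minimal sketch of that loop follows, assuming `glue` is a Glue client from the Python SDK (for example `boto3.client("glue")`). `iter_table_runs` and its default page size are hypothetical.

```python
def iter_table_runs(glue, database, table, page_size=100):
    """Yield every ColumnStatisticsTaskRun dict for one table, page by page."""
    request = {
        "DatabaseName": database,
        "TableName": table,
        "MaxResults": page_size,
    }
    while True:
        page = glue.get_column_statistics_task_runs(**request)
        yield from page.get("ColumnStatisticsTaskRuns", [])
        token = page.get("NextToken")
        if not token:
            # No continuation token means all runs have been returned.
            return
        # Pass the token back to fetch the next page.
        request["NextToken"] = token
```

Writing this as a generator lets a caller stop early, for example after finding the first run with a given Status, without fetching the remaining pages.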

ListColumnStatisticsTaskRuns action (Python: list_column_statistics_task_runs)

Lists all task runs for a particular account.

Request
  • MaxResults – Number (integer), not less than 1 or more than 1000.

    The maximum size of the response.

  • NextToken – UTF-8 string.

    A continuation token, if this is a continuation call.

Response
  • ColumnStatisticsTaskRunIds – An array of UTF-8 strings, not more than 100 strings.

    A list of column statistics task run IDs.

  • NextToken – UTF-8 string.

    A continuation token, if not all task run IDs have yet been returned.

Errors
  • OperationTimeoutException
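
This operation returns only run IDs, so getting details requires a follow-up call to GetColumnStatisticsTaskRun for each ID. The sketch below collects all IDs and then counts runs by Status. It assumes `glue` is a Glue client from the Python SDK (for example `boto3.client("glue")`). `list_run_ids` and `count_runs_by_status` are hypothetical helper names.

```python
from collections import Counter


def list_run_ids(glue, page_size=100):
    """Return every column statistics task run ID for the account."""
    run_ids = []
    token = None
    while True:
        request = {"MaxResults": page_size}
        if token:
            request["NextToken"] = token
        page = glue.list_column_statistics_task_runs(**request)
        run_ids.extend(page.get("ColumnStatisticsTaskRunIds", []))
        token = page.get("NextToken")
        if not token:
            return run_ids


def count_runs_by_status(glue):
    """Count task runs per Status, using one Get call per run ID."""
    counts = Counter()
    for run_id in list_run_ids(glue):
        response = glue.get_column_statistics_task_run(
            ColumnStatisticsTaskRunId=run_id
        )
        counts[response["ColumnStatisticsTaskRun"]["Status"]] += 1
    return counts
```

One Get call per ID can be slow for accounts with many runs. When only a single table is of interest, GetColumnStatisticsTaskRuns returns full run objects directly.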

StopColumnStatisticsTaskRun action (Python: stop_column_statistics_task_run)

Stops a task run for the specified table.

Request
  • DatabaseName – Required: UTF-8 string.

    The name of the database where the table resides.

  • TableName – Required: UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    The name of the table.

Response
  • No Response parameters.

Errors
  • EntityNotFoundException

  • ColumnStatisticsTaskNotRunningException

  • ColumnStatisticsTaskStoppingException

  • OperationTimeoutException
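
Two of the errors above describe the state of the task run rather than a problem with the request, so a caller may prefer to handle them as ordinary outcomes. The sketch below assumes `glue` is a Glue client from the Python SDK (for example `boto3.client("glue")`) that exposes these errors under `glue.exceptions`. `stop_stats_run` and its return values are hypothetical. Reading ColumnStatisticsTaskStoppingException as "a stop is already in progress" is an assumption based on its name.

```python
def stop_stats_run(glue, database, table):
    """Request a stop of the column statistics task run for a table.

    Returns "stop_requested", "not_running", or "stopping".
    Other errors, such as EntityNotFoundException, propagate to the caller.
    """
    try:
        glue.stop_column_statistics_task_run(
            DatabaseName=database, TableName=table
        )
    except glue.exceptions.ColumnStatisticsTaskNotRunningException:
        # There was no task running for this table.
        return "not_running"
    except glue.exceptions.ColumnStatisticsTaskStoppingException:
        # Assumption: the service rejected the call because a stop
        # is already under way.
        return "stopping"
    return "stop_requested"
```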