Considerations and limitations - Amazon Glue

Considerations and limitations

The following considerations and limitations apply to generating column statistics.

Considerations
  • Generating statistics from a sample of rows reduces run time, but the resulting statistics can be less accurate than statistics computed from all rows.

  • Unless you use sampling, each column statistics run processes the entire dataset.

  • The Data Catalog doesn't store previous versions of the statistics.

  • Only one statistics generation task can run on a table at a time.

  • If a table is encrypted with a customer managed Amazon KMS key that is registered with the Data Catalog, Amazon Glue uses the same key to encrypt the statistics.
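The sampling and one-task-per-table considerations show up when you start a run programmatically. The following is a minimal sketch, assuming the `start_column_statistics_task_run` operation of the boto3 Glue client (parameters `DatabaseName`, `TableName`, `Role`, optional `ColumnNameList` and `SampleSize`, where `SampleSize` is a percentage of rows) and its `ColumnStatisticsTaskRunningException` error. The helper name `start_column_stats_run` and the `(0, 100]` validation are illustrative choices, not part of the Glue API.

```python
from typing import Any, Dict, Optional, Sequence


def start_column_stats_run(
    glue_client: Any,
    database: str,
    table: str,
    role_arn: str,
    columns: Optional[Sequence[str]] = None,
    sample_percent: Optional[float] = None,
) -> Optional[str]:
    """Start a column statistics task run and return its run ID.

    Returns None if a statistics task is already running on the table,
    because only one such task can run on a table at a time.
    """
    request: Dict[str, Any] = {
        "DatabaseName": database,
        "TableName": table,
        "Role": role_arn,
    }
    if columns:
        # Without ColumnNameList, the task uses the table's columns by default.
        request["ColumnNameList"] = list(columns)
    if sample_percent is not None:
        # SampleSize is a percentage of rows; omitting it processes the full
        # table. This sketch rejects values outside (0, 100].
        if not 0 < sample_percent <= 100:
            raise ValueError("sample_percent must be in (0, 100]")
        request["SampleSize"] = float(sample_percent)
    try:
        response = glue_client.start_column_statistics_task_run(**request)
    except glue_client.exceptions.ColumnStatisticsTaskRunningException:
        # Another statistics generation task is already running on this table.
        return None
    return response["ColumnStatisticsTaskRunId"]
```

With a real client you would pass `boto3.client("glue")` as `glue_client`. Returning `None` instead of raising lets a scheduler skip a table that is still busy and retry it later.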

The column statistics task supports generating statistics:
  • When the IAM role has full permissions on the table, granted through IAM or Lake Formation.

  • When the IAM role has permissions on the table using Lake Formation hybrid access mode.

The column statistics task doesn't support generating statistics for:
  • Tables with Lake Formation cell-based access control.

  • Transactional data lake formats: Linux Foundation Delta Lake, Apache Iceberg, and Apache Hudi.

  • Tables in federated databases: Hive metastore and Amazon Redshift datashares.

  • Nested columns, arrays, and struct data types.

  • Tables that are shared with you from another account.
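Some of these limitations can be spotted in a table definition before you start a run. The following is a rough pre-check sketch, assuming `table` is shaped like the `Table` element of a Glue `GetTable` response. The helper name `unsupported_reasons` and the format heuristics are hypothetical: the `table_type` and `spark.sql.sources.provider` parameter keys depend on how the table was created, and the cross-account check simply compares catalog IDs. Cell-based access control and federated databases can't be detected this way and are not checked.

```python
from typing import Any, Dict, List

# Hypothetical heuristics: table Parameters keys that may name an open table format.
_FORMAT_KEYS = ("table_type", "spark.sql.sources.provider")
_TRANSACTIONAL_FORMATS = {"iceberg", "delta", "hudi"}
# Column type prefixes for array and struct types in Glue type strings.
_NESTED_PREFIXES = ("array<", "struct<")


def unsupported_reasons(table: Dict[str, Any], account_id: str) -> List[str]:
    """List reasons a Glue table definition looks unsupported for column statistics.

    An empty list only means none of these heuristic checks matched; it does
    not guarantee that a statistics run will succeed.
    """
    reasons: List[str] = []

    # Transactional data lake formats (Delta Lake, Iceberg, Hudi).
    params = table.get("Parameters") or {}
    for key in _FORMAT_KEYS:
        fmt = str(params.get(key, "")).lower()
        if fmt in _TRANSACTIONAL_FORMATS:
            reasons.append(f"transactional table format: {fmt}")
            break

    # Assumption: a resource link's TargetTable (or the table itself) carries
    # the owning account's catalog ID; a different ID means a shared table.
    target = table.get("TargetTable") or {}
    owner = target.get("CatalogId") or table.get("CatalogId") or account_id
    if owner != account_id:
        reasons.append(f"shared from another account: {owner}")

    # Array and struct columns don't get statistics.
    columns = (table.get("StorageDescriptor") or {}).get("Columns") or []
    nested = [
        c["Name"]
        for c in columns
        if c.get("Type", "").lower().startswith(_NESTED_PREFIXES)
    ]
    if nested:
        reasons.append("unsupported column types (array/struct): " + ", ".join(nested))

    return reasons
```

A check like this is only advisory; the statistics task itself remains the authority on what it can process.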