SDK for PHP 3.x

Client: Aws\Glue\GlueClient
Service ID: glue
Version: 2017-03-31

This page describes the parameters and results for the operations of AWS Glue (2017-03-31) and shows how to use the Aws\Glue\GlueClient object to call them. This documentation is specific to the 2017-03-31 API version of the service.

Operation Summary

Each of the following operations can be created from a client using $client->getCommand('CommandName'), where "CommandName" is the name of one of the following operations. Note: a command is a value that encapsulates an operation and the parameters used to create an HTTP request.

You can also create and send a command immediately using the magic methods available on a client object: $client->commandName(/* parameters */). You can send the command asynchronously (returning a promise) by appending the word "Async" to the operation name: $client->commandNameAsync(/* parameters */).
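
The naming convention described above can be sketched as follows. This is a minimal illustration rather than SDK code: syncMethodName, asyncMethodName, and sendViaCommand are hypothetical helper names, and $client is assumed to be an already configured Aws\Glue\GlueClient.

```php
<?php
// Hypothetical helpers; none of these functions are part of the SDK.

// Magic-method name for an operation, e.g. "BatchGetJobs" becomes "batchGetJobs".
function syncMethodName(string $operation): string
{
    return lcfirst($operation);
}

// Promise-returning variant: the same name with "Async" appended.
function asyncMethodName(string $operation): string
{
    return lcfirst($operation) . 'Async';
}

// Explicit path: build the command first, then execute it.
// $client is assumed to be a configured Aws\Glue\GlueClient.
function sendViaCommand($client, string $operation, array $params = [])
{
    $command = $client->getCommand($operation, $params);
    return $client->execute($command);
}
```

With a real client, sendViaCommand($client, 'BatchDeleteConnection', $params) should behave like $client->batchDeleteConnection($params); the explicit form is mainly useful when a command needs to be inspected or modified before it is sent.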

BatchCreatePartition ( array $params = [] )
Creates one or more partitions in a batch operation.
BatchDeleteConnection ( array $params = [] )
Deletes a list of connection definitions from the Data Catalog.
BatchDeletePartition ( array $params = [] )
Deletes one or more partitions in a batch operation.
BatchDeleteTable ( array $params = [] )
Deletes multiple tables at once.
BatchDeleteTableVersion ( array $params = [] )
Deletes a specified batch of versions of a table.
BatchGetBlueprints ( array $params = [] )
Retrieves information about a list of blueprints.
BatchGetCrawlers ( array $params = [] )
Returns a list of resource metadata for a given list of crawler names.
BatchGetCustomEntityTypes ( array $params = [] )
Retrieves the details for the custom patterns specified by a list of names.
BatchGetDataQualityResult ( array $params = [] )
Retrieves a list of data quality results for the specified result IDs.
BatchGetDevEndpoints ( array $params = [] )
Returns a list of resource metadata for a given list of development endpoint names.
BatchGetJobs ( array $params = [] )
Returns a list of resource metadata for a given list of job names.
BatchGetPartition ( array $params = [] )
Retrieves partitions in a batch request.
BatchGetTableOptimizer ( array $params = [] )
Returns the configuration for the specified table optimizers.
BatchGetTriggers ( array $params = [] )
Returns a list of resource metadata for a given list of trigger names.
BatchGetWorkflows ( array $params = [] )
Returns a list of resource metadata for a given list of workflow names.
BatchStopJobRun ( array $params = [] )
Stops one or more job runs for a specified job definition.
BatchUpdatePartition ( array $params = [] )
Updates one or more partitions in a batch operation.
CancelDataQualityRuleRecommendationRun ( array $params = [] )
Cancels the specified recommendation run that was being used to generate rules.
CancelDataQualityRulesetEvaluationRun ( array $params = [] )
Cancels a run where a ruleset is being evaluated against a data source.
CancelMLTaskRun ( array $params = [] )
Cancels (stops) a task run.
CancelStatement ( array $params = [] )
Cancels the statement.
CheckSchemaVersionValidity ( array $params = [] )
Validates the supplied schema.
CreateBlueprint ( array $params = [] )
Registers a blueprint with Glue.
CreateClassifier ( array $params = [] )
Creates a classifier in the user's account.
CreateConnection ( array $params = [] )
Creates a connection definition in the Data Catalog.
CreateCrawler ( array $params = [] )
Creates a new crawler with specified targets, role, configuration, and optional schedule.
CreateCustomEntityType ( array $params = [] )
Creates a custom pattern that is used to detect sensitive data across the columns and rows of your structured data.
CreateDataQualityRuleset ( array $params = [] )
Creates a data quality ruleset with DQDL rules applied to a specified Glue table.
CreateDatabase ( array $params = [] )
Creates a new database in a Data Catalog.
CreateDevEndpoint ( array $params = [] )
Creates a new development endpoint.
CreateJob ( array $params = [] )
Creates a new job definition.
CreateMLTransform ( array $params = [] )
Creates a Glue machine learning transform.
CreatePartition ( array $params = [] )
Creates a new partition.
CreatePartitionIndex ( array $params = [] )
Creates a specified partition index in an existing table.
CreateRegistry ( array $params = [] )
Creates a new registry which may be used to hold a collection of schemas.
CreateSchema ( array $params = [] )
Creates a new schema set and registers the schema definition.
CreateScript ( array $params = [] )
Transforms a directed acyclic graph (DAG) into code.
CreateSecurityConfiguration ( array $params = [] )
Creates a new security configuration.
CreateSession ( array $params = [] )
Creates a new session.
CreateTable ( array $params = [] )
Creates a new table definition in the Data Catalog.
CreateTableOptimizer ( array $params = [] )
Creates a new table optimizer for a specific function.
CreateTrigger ( array $params = [] )
Creates a new trigger.
CreateUserDefinedFunction ( array $params = [] )
Creates a new function definition in the Data Catalog.
CreateWorkflow ( array $params = [] )
Creates a new workflow.
DeleteBlueprint ( array $params = [] )
Deletes an existing blueprint.
DeleteClassifier ( array $params = [] )
Removes a classifier from the Data Catalog.
DeleteColumnStatisticsForPartition ( array $params = [] )
Deletes the partition column statistics of a column.
DeleteColumnStatisticsForTable ( array $params = [] )
Deletes table statistics of columns.
DeleteConnection ( array $params = [] )
Deletes a connection from the Data Catalog.
DeleteCrawler ( array $params = [] )
Removes a specified crawler from the Glue Data Catalog, unless the crawler state is RUNNING.
DeleteCustomEntityType ( array $params = [] )
Deletes a custom pattern by specifying its name.
DeleteDataQualityRuleset ( array $params = [] )
Deletes a data quality ruleset.
DeleteDatabase ( array $params = [] )
Removes a specified database from a Data Catalog.
DeleteDevEndpoint ( array $params = [] )
Deletes a specified development endpoint.
DeleteJob ( array $params = [] )
Deletes a specified job definition.
DeleteMLTransform ( array $params = [] )
Deletes a Glue machine learning transform.
DeletePartition ( array $params = [] )
Deletes a specified partition.
DeletePartitionIndex ( array $params = [] )
Deletes a specified partition index from an existing table.
DeleteRegistry ( array $params = [] )
Deletes the entire registry, including its schemas and all of their versions.
DeleteResourcePolicy ( array $params = [] )
Deletes a specified policy.
DeleteSchema ( array $params = [] )
Deletes the entire schema set, including all of its versions.
DeleteSchemaVersions ( array $params = [] )
Removes versions from the specified schema.
DeleteSecurityConfiguration ( array $params = [] )
Deletes a specified security configuration.
DeleteSession ( array $params = [] )
Deletes the session.
DeleteTable ( array $params = [] )
Removes a table definition from the Data Catalog.
DeleteTableOptimizer ( array $params = [] )
Deletes an optimizer and all associated metadata for a table.
DeleteTableVersion ( array $params = [] )
Deletes a specified version of a table.
DeleteTrigger ( array $params = [] )
Deletes a specified trigger.
DeleteUserDefinedFunction ( array $params = [] )
Deletes an existing function definition from the Data Catalog.
DeleteWorkflow ( array $params = [] )
Deletes a workflow.
GetBlueprint ( array $params = [] )
Retrieves the details of a blueprint.
GetBlueprintRun ( array $params = [] )
Retrieves the details of a blueprint run.
GetBlueprintRuns ( array $params = [] )
Retrieves the details of blueprint runs for a specified blueprint.
GetCatalogImportStatus ( array $params = [] )
Retrieves the status of a migration operation.
GetClassifier ( array $params = [] )
Retrieves a classifier by name.
GetClassifiers ( array $params = [] )
Lists all classifier objects in the Data Catalog.
GetColumnStatisticsForPartition ( array $params = [] )
Retrieves partition statistics of columns.
GetColumnStatisticsForTable ( array $params = [] )
Retrieves table statistics of columns.
GetColumnStatisticsTaskRun ( array $params = [] )
Gets the associated metadata for a task run, given a task run ID.
GetColumnStatisticsTaskRuns ( array $params = [] )
Retrieves information about all runs associated with the specified table.
GetConnection ( array $params = [] )
Retrieves a connection definition from the Data Catalog.
GetConnections ( array $params = [] )
Retrieves a list of connection definitions from the Data Catalog.
GetCrawler ( array $params = [] )
Retrieves metadata for a specified crawler.
GetCrawlerMetrics ( array $params = [] )
Retrieves metrics about specified crawlers.
GetCrawlers ( array $params = [] )
Retrieves metadata for all crawlers defined in the customer account.
GetCustomEntityType ( array $params = [] )
Retrieves the details of a custom pattern by specifying its name.
GetDataCatalogEncryptionSettings ( array $params = [] )
Retrieves the security configuration for a specified catalog.
GetDataQualityResult ( array $params = [] )
Retrieves the result of a data quality rule evaluation.
GetDataQualityRuleRecommendationRun ( array $params = [] )
Gets the specified recommendation run that was used to generate rules.
GetDataQualityRuleset ( array $params = [] )
Returns an existing ruleset by identifier or name.
GetDataQualityRulesetEvaluationRun ( array $params = [] )
Retrieves a specific run where a ruleset is evaluated against a data source.
GetDatabase ( array $params = [] )
Retrieves the definition of a specified database.
GetDatabases ( array $params = [] )
Retrieves all databases defined in a given Data Catalog.
GetDataflowGraph ( array $params = [] )
Transforms a Python script into a directed acyclic graph (DAG).
GetDevEndpoint ( array $params = [] )
Retrieves information about a specified development endpoint.
GetDevEndpoints ( array $params = [] )
Retrieves all the development endpoints in this Amazon Web Services account.
GetJob ( array $params = [] )
Retrieves an existing job definition.
GetJobBookmark ( array $params = [] )
Returns information on a job bookmark entry.
GetJobRun ( array $params = [] )
Retrieves the metadata for a given job run.
GetJobRuns ( array $params = [] )
Retrieves metadata for all runs of a given job definition.
GetJobs ( array $params = [] )
Retrieves all current job definitions.
GetMLTaskRun ( array $params = [] )
Gets details for a specific task run on a machine learning transform.
GetMLTaskRuns ( array $params = [] )
Gets a list of runs for a machine learning transform.
GetMLTransform ( array $params = [] )
Gets a Glue machine learning transform artifact and all its corresponding metadata.
GetMLTransforms ( array $params = [] )
Gets a sortable, filterable list of existing Glue machine learning transforms.
GetMapping ( array $params = [] )
Creates mappings.
GetPartition ( array $params = [] )
Retrieves information about a specified partition.
GetPartitionIndexes ( array $params = [] )
Retrieves the partition indexes associated with a table.
GetPartitions ( array $params = [] )
Retrieves information about the partitions in a table.
GetPlan ( array $params = [] )
Gets code to perform a specified mapping.
GetRegistry ( array $params = [] )
Describes the specified registry in detail.
GetResourcePolicies ( array $params = [] )
Retrieves the resource policies set on individual resources by Resource Access Manager during cross-account permission grants.
GetResourcePolicy ( array $params = [] )
Retrieves a specified resource policy.
GetSchema ( array $params = [] )
Describes the specified schema in detail.
GetSchemaByDefinition ( array $params = [] )
Retrieves a schema by the SchemaDefinition.
GetSchemaVersion ( array $params = [] )
Gets the specified schema by its unique ID assigned when a version of the schema is created or registered.
GetSchemaVersionsDiff ( array $params = [] )
Fetches the schema version difference in the specified difference type between two stored schema versions in the Schema Registry.
GetSecurityConfiguration ( array $params = [] )
Retrieves a specified security configuration.
GetSecurityConfigurations ( array $params = [] )
Retrieves a list of all security configurations.
GetSession ( array $params = [] )
Retrieves the session.
GetStatement ( array $params = [] )
Retrieves the statement.
GetTable ( array $params = [] )
Retrieves the Table definition in a Data Catalog for a specified table.
GetTableOptimizer ( array $params = [] )
Returns the configuration of all optimizers associated with a specified table.
GetTableVersion ( array $params = [] )
Retrieves a specified version of a table.
GetTableVersions ( array $params = [] )
Retrieves a list of strings that identify available versions of a specified table.
GetTables ( array $params = [] )
Retrieves the definitions of some or all of the tables in a given Database.
GetTags ( array $params = [] )
Retrieves a list of tags associated with a resource.
GetTrigger ( array $params = [] )
Retrieves the definition of a trigger.
GetTriggers ( array $params = [] )
Gets all the triggers associated with a job.
GetUnfilteredPartitionMetadata ( array $params = [] )
Retrieves partition metadata from the Data Catalog that contains unfiltered metadata.
GetUnfilteredPartitionsMetadata ( array $params = [] )
Retrieves partition metadata from the Data Catalog that contains unfiltered metadata.
GetUnfilteredTableMetadata ( array $params = [] )
Allows a third-party analytical engine to retrieve unfiltered table metadata from the Data Catalog.
GetUserDefinedFunction ( array $params = [] )
Retrieves a specified function definition from the Data Catalog.
GetUserDefinedFunctions ( array $params = [] )
Retrieves multiple function definitions from the Data Catalog.
GetWorkflow ( array $params = [] )
Retrieves resource metadata for a workflow.
GetWorkflowRun ( array $params = [] )
Retrieves the metadata for a given workflow run.
GetWorkflowRunProperties ( array $params = [] )
Retrieves the workflow run properties which were set during the run.
GetWorkflowRuns ( array $params = [] )
Retrieves metadata for all runs of a given workflow.
ImportCatalogToGlue ( array $params = [] )
Imports an existing Amazon Athena Data Catalog to Glue.
ListBlueprints ( array $params = [] )
Lists all the blueprint names in an account.
ListColumnStatisticsTaskRuns ( array $params = [] )
Lists all task runs for a particular account.
ListCrawlers ( array $params = [] )
Retrieves the names of all crawler resources in this Amazon Web Services account, or the resources with the specified tag.
ListCrawls ( array $params = [] )
Returns all the crawls of a specified crawler.
ListCustomEntityTypes ( array $params = [] )
Lists all the custom patterns that have been created.
ListDataQualityResults ( array $params = [] )
Returns all data quality execution results for your account.
ListDataQualityRuleRecommendationRuns ( array $params = [] )
Lists the recommendation runs meeting the filter criteria.
ListDataQualityRulesetEvaluationRuns ( array $params = [] )
Lists all the runs meeting the filter criteria, where a ruleset is evaluated against a data source.
ListDataQualityRulesets ( array $params = [] )
Returns a paginated list of rulesets for the specified list of Glue tables.
ListDevEndpoints ( array $params = [] )
Retrieves the names of all DevEndpoint resources in this Amazon Web Services account, or the resources with the specified tag.
ListJobs ( array $params = [] )
Retrieves the names of all job resources in this Amazon Web Services account, or the resources with the specified tag.
ListMLTransforms ( array $params = [] )
Retrieves a sortable, filterable list of existing Glue machine learning transforms in this Amazon Web Services account, or the resources with the specified tag.
ListRegistries ( array $params = [] )
Returns a list of registries that you have created, with minimal registry information.
ListSchemaVersions ( array $params = [] )
Returns a list of schema versions that you have created, with minimal information.
ListSchemas ( array $params = [] )
Returns a list of schemas with minimal details.
ListSessions ( array $params = [] )
Retrieves a list of sessions.
ListStatements ( array $params = [] )
Lists statements for the session.
ListTableOptimizerRuns ( array $params = [] )
Lists the history of previous optimizer runs for a specific table.
ListTriggers ( array $params = [] )
Retrieves the names of all trigger resources in this Amazon Web Services account, or the resources with the specified tag.
ListWorkflows ( array $params = [] )
Lists names of workflows created in the account.
PutDataCatalogEncryptionSettings ( array $params = [] )
Sets the security configuration for a specified catalog.
PutResourcePolicy ( array $params = [] )
Sets the Data Catalog resource policy for access control.
PutSchemaVersionMetadata ( array $params = [] )
Puts the metadata key value pair for a specified schema version ID.
PutWorkflowRunProperties ( array $params = [] )
Puts the specified workflow run properties for the given workflow run.
QuerySchemaVersionMetadata ( array $params = [] )
Queries for the schema version metadata information.
RegisterSchemaVersion ( array $params = [] )
Adds a new version to the existing schema.
RemoveSchemaVersionMetadata ( array $params = [] )
Removes a key value pair from the schema version metadata for the specified schema version ID.
ResetJobBookmark ( array $params = [] )
Resets a bookmark entry.
ResumeWorkflowRun ( array $params = [] )
Restarts selected nodes of a previous partially completed workflow run and resumes the workflow run.
RunStatement ( array $params = [] )
Executes the statement.
SearchTables ( array $params = [] )
Searches a set of tables based on properties in the table metadata as well as on the parent database.
StartBlueprintRun ( array $params = [] )
Starts a new run of the specified blueprint.
StartColumnStatisticsTaskRun ( array $params = [] )
Starts a column statistics task run, for a specified table and columns.
StartCrawler ( array $params = [] )
Starts a crawl using the specified crawler, regardless of what is scheduled.
StartCrawlerSchedule ( array $params = [] )
Changes the schedule state of the specified crawler to SCHEDULED, unless the crawler is already running or the schedule state is already SCHEDULED.
StartDataQualityRuleRecommendationRun ( array $params = [] )
Starts a recommendation run that is used to generate rules when you don't know what rules to write.
StartDataQualityRulesetEvaluationRun ( array $params = [] )
Once you have a ruleset definition (either recommended or your own), you call this operation to evaluate the ruleset against a data source (Glue table).
StartExportLabelsTaskRun ( array $params = [] )
Begins an asynchronous task to export all labeled data for a particular transform.
StartImportLabelsTaskRun ( array $params = [] )
Enables you to provide additional labels (examples of truth) to be used to teach the machine learning transform and improve its quality.
StartJobRun ( array $params = [] )
Starts a job run using a job definition.
StartMLEvaluationTaskRun ( array $params = [] )
Starts a task to estimate the quality of the transform.
StartMLLabelingSetGenerationTaskRun ( array $params = [] )
Starts the active learning workflow for your machine learning transform to improve the transform's quality by generating label sets and adding labels.
StartTrigger ( array $params = [] )
Starts an existing trigger.
StartWorkflowRun ( array $params = [] )
Starts a new run of the specified workflow.
StopColumnStatisticsTaskRun ( array $params = [] )
Stops a task run for the specified table.
StopCrawler ( array $params = [] )
If the specified crawler is running, stops the crawl.
StopCrawlerSchedule ( array $params = [] )
Sets the schedule state of the specified crawler to NOT_SCHEDULED, but does not stop the crawler if it is already running.
StopSession ( array $params = [] )
Stops the session.
StopTrigger ( array $params = [] )
Stops a specified trigger.
StopWorkflowRun ( array $params = [] )
Stops the execution of the specified workflow run.
TagResource ( array $params = [] )
Adds tags to a resource.
UntagResource ( array $params = [] )
Removes tags from a resource.
UpdateBlueprint ( array $params = [] )
Updates a registered blueprint.
UpdateClassifier ( array $params = [] )
Modifies an existing classifier (a GrokClassifier, an XMLClassifier, a JsonClassifier, or a CsvClassifier, depending on which field is present).
UpdateColumnStatisticsForPartition ( array $params = [] )
Creates or updates partition statistics of columns.
UpdateColumnStatisticsForTable ( array $params = [] )
Creates or updates table statistics of columns.
UpdateConnection ( array $params = [] )
Updates a connection definition in the Data Catalog.
UpdateCrawler ( array $params = [] )
Updates a crawler.
UpdateCrawlerSchedule ( array $params = [] )
Updates the schedule of a crawler using a cron expression.
UpdateDataQualityRuleset ( array $params = [] )
Updates the specified data quality ruleset.
UpdateDatabase ( array $params = [] )
Updates an existing database definition in a Data Catalog.
UpdateDevEndpoint ( array $params = [] )
Updates a specified development endpoint.
UpdateJob ( array $params = [] )
Updates an existing job definition.
UpdateJobFromSourceControl ( array $params = [] )
Synchronizes a job from the source control repository.
UpdateMLTransform ( array $params = [] )
Updates an existing machine learning transform.
UpdatePartition ( array $params = [] )
Updates a partition.
UpdateRegistry ( array $params = [] )
Updates an existing registry which is used to hold a collection of schemas.
UpdateSchema ( array $params = [] )
Updates the description, compatibility setting, or version checkpoint for a schema set.
UpdateSourceControlFromJob ( array $params = [] )
Synchronizes a job to the source control repository.
UpdateTable ( array $params = [] )
Updates a metadata table in the Data Catalog.
UpdateTableOptimizer ( array $params = [] )
Updates the configuration for an existing table optimizer.
UpdateTrigger ( array $params = [] )
Updates a trigger definition.
UpdateUserDefinedFunction ( array $params = [] )
Updates an existing function definition in the Data Catalog.
UpdateWorkflow ( array $params = [] )
Updates an existing workflow.

Paginators

Paginators handle automatically iterating over paginated API results. Paginators are associated with specific API operations, and they accept the parameters that the corresponding API operation accepts. You can get a paginator from a client class using getPaginator($paginatorName, $operationParameters). This client supports the following paginators:

GetBlueprintRuns
GetClassifiers
GetColumnStatisticsTaskRuns
GetConnections
GetCrawlerMetrics
GetCrawlers
GetDatabases
GetDevEndpoints
GetJobRuns
GetJobs
GetMLTaskRuns
GetMLTransforms
GetPartitionIndexes
GetPartitions
GetResourcePolicies
GetSecurityConfigurations
GetTableVersions
GetTables
GetTriggers
GetUnfilteredPartitionsMetadata
GetUserDefinedFunctions
GetWorkflowRuns
ListBlueprints
ListColumnStatisticsTaskRuns
ListCrawlers
ListCustomEntityTypes
ListDataQualityResults
ListDataQualityRuleRecommendationRuns
ListDataQualityRulesetEvaluationRuns
ListDataQualityRulesets
ListDevEndpoints
ListJobs
ListMLTransforms
ListRegistries
ListSchemaVersions
ListSchemas
ListSessions
ListTableOptimizerRuns
ListTriggers
ListWorkflows
SearchTables
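
As a rough sketch of how paginated results might be consumed, the helper below flattens the pages yielded by a paginator into a single list. The function name collectTableNames is hypothetical, and the page shape (a 'TableList' key whose entries carry a 'Name') is assumed from the GetTables result, which is not shown in this section.

```php
<?php
// Hypothetical helper: flattens the pages yielded by a paginator into a list of
// table names. Each page is assumed to expose a 'TableList' key whose entries
// have a 'Name'. Works with plain arrays and with array-like result objects.
function collectTableNames(iterable $pages): array
{
    $names = [];
    foreach ($pages as $page) {
        foreach ($page['TableList'] ?? [] as $table) {
            $names[] = $table['Name'];
        }
    }
    return $names;
}

// Usage with a real client (not executed here; 'sales' is a placeholder name):
// $pages = $client->getPaginator('GetTables', ['DatabaseName' => 'sales']);
// $names = collectTableNames($pages);
```

Because the paginator requests the next page only as iteration advances, the loop avoids handling continuation tokens by hand.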

Operations

BatchCreatePartition

$result = $client->batchCreatePartition([/* ... */]);
$promise = $client->batchCreatePartitionAsync([/* ... */]);

Creates one or more partitions in a batch operation.

Parameter Syntax

$result = $client->batchCreatePartition([
    'CatalogId' => '<string>',
    'DatabaseName' => '<string>', // REQUIRED
    'PartitionInputList' => [ // REQUIRED
        [
            'LastAccessTime' => <integer || string || DateTime>,
            'LastAnalyzedTime' => <integer || string || DateTime>,
            'Parameters' => ['<string>', ...],
            'StorageDescriptor' => [
                'AdditionalLocations' => ['<string>', ...],
                'BucketColumns' => ['<string>', ...],
                'Columns' => [
                    [
                        'Comment' => '<string>',
                        'Name' => '<string>', // REQUIRED
                        'Parameters' => ['<string>', ...],
                        'Type' => '<string>',
                    ],
                    // ...
                ],
                'Compressed' => true || false,
                'InputFormat' => '<string>',
                'Location' => '<string>',
                'NumberOfBuckets' => <integer>,
                'OutputFormat' => '<string>',
                'Parameters' => ['<string>', ...],
                'SchemaReference' => [
                    'SchemaId' => [
                        'RegistryName' => '<string>',
                        'SchemaArn' => '<string>',
                        'SchemaName' => '<string>',
                    ],
                    'SchemaVersionId' => '<string>',
                    'SchemaVersionNumber' => <integer>,
                ],
                'SerdeInfo' => [
                    'Name' => '<string>',
                    'Parameters' => ['<string>', ...],
                    'SerializationLibrary' => '<string>',
                ],
                'SkewedInfo' => [
                    'SkewedColumnNames' => ['<string>', ...],
                    'SkewedColumnValueLocationMaps' => ['<string>', ...],
                    'SkewedColumnValues' => ['<string>', ...],
                ],
                'SortColumns' => [
                    [
                        'Column' => '<string>', // REQUIRED
                        'SortOrder' => <integer>, // REQUIRED
                    ],
                    // ...
                ],
                'StoredAsSubDirectories' => true || false,
            ],
            'Values' => ['<string>', ...],
        ],
        // ...
    ],
    'TableName' => '<string>', // REQUIRED
]);

Parameter Details

Members
CatalogId
Type: string

The ID of the catalog in which the partition is to be created. Currently, this should be the Amazon Web Services account ID.

DatabaseName
Required: Yes
Type: string

The name of the metadata database in which the partition is to be created.

PartitionInputList
Required: Yes
Type: Array of PartitionInput structures

A list of PartitionInput structures that define the partitions to be created.

TableName
Required: Yes
Type: string

The name of the metadata table in which the partition is to be created.
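
The parameter structure above can be assembled programmatically. The following is a minimal sketch under stated assumptions: buildCreatePartitionRequests is a hypothetical helper, only the 'Values' and 'StorageDescriptor.Location' members are filled in, and the default batch size of 100 is an assumption that should be checked against the current service quota.

```php
<?php
// Hypothetical helper: builds one BatchCreatePartition request per chunk.
// $partitions maps a storage location (string key) to that partition's values.
// $batchSize is an assumption; verify the real per-request limit before use.
function buildCreatePartitionRequests(
    string $database,
    string $table,
    array $partitions,
    int $batchSize = 100
): array {
    $inputs = [];
    foreach ($partitions as $location => $values) {
        $inputs[] = [
            // Partition values are sent as strings.
            'Values' => array_values(array_map('strval', $values)),
            'StorageDescriptor' => ['Location' => (string) $location],
        ];
    }

    $requests = [];
    foreach (array_chunk($inputs, $batchSize) as $chunk) {
        $requests[] = [
            'DatabaseName' => $database,
            'TableName' => $table,
            'PartitionInputList' => $chunk,
        ];
    }
    return $requests;
}
```

Each element of the returned list can be passed directly to $client->batchCreatePartition($request).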

Result Syntax

[
    'Errors' => [
        [
            'ErrorDetail' => [
                'ErrorCode' => '<string>',
                'ErrorMessage' => '<string>',
            ],
            'PartitionValues' => ['<string>', ...],
        ],
        // ...
    ],
]

Result Details

Members
Errors
Type: Array of PartitionError structures

The errors encountered when trying to create the requested partitions.
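
A batch call can succeed as a whole while individual partitions fail, so the 'Errors' list is worth inspecting. The sketch below shows one way to do that; describePartitionErrors is a hypothetical helper, and joining the partition values with "/" is only a display choice.

```php
<?php
// Hypothetical helper: turns the 'Errors' list of a BatchCreatePartition result
// into a map of "value1/value2" => "ErrorCode: ErrorMessage".
// $result may be a plain array or an array-like result object.
function describePartitionErrors($result): array
{
    $failures = [];
    foreach ($result['Errors'] ?? [] as $error) {
        $key = implode('/', $error['PartitionValues'] ?? []);
        $detail = $error['ErrorDetail'] ?? [];
        $failures[$key] = ($detail['ErrorCode'] ?? 'Unknown')
            . ': ' . ($detail['ErrorMessage'] ?? '');
    }
    return $failures;
}
```

An empty return value means the service reported no per-partition failures for that request.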

Errors

InvalidInputException:

The input provided was not valid.

AlreadyExistsException:

A resource to be created or added already exists.

ResourceNumberLimitExceededException:

A resource numerical limit was exceeded.

InternalServiceException:

An internal service error occurred.

EntityNotFoundException:

A specified entity does not exist.

OperationTimeoutException:

The operation timed out.

GlueEncryptionException:

An encryption operation failed.
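
The error names listed above surface as the AWS error code on the exception thrown by the client. The sketch below classifies those codes; isTransientGlueError is a hypothetical helper, and treating timeouts and internal errors as transient is an assumption of this example, not documented SDK behavior. The SDK also applies its own retry handling, which this sketch does not replace.

```php
<?php
// Hypothetical policy: which error codes from the list above may be worth retrying.
// Treating these two as transient is an assumption made for illustration.
function isTransientGlueError(string $errorCode): bool
{
    return in_array(
        $errorCode,
        ['InternalServiceException', 'OperationTimeoutException'],
        true
    );
}

// Usage with a real client (not executed here):
// try {
//     $client->batchCreatePartition($params);
// } catch (\Aws\Exception\AwsException $e) {
//     if (!isTransientGlueError((string) $e->getAwsErrorCode())) {
//         throw $e; // e.g. InvalidInputException: retrying will not help
//     }
//     // otherwise schedule another attempt
// }
```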

BatchDeleteConnection

$result = $client->batchDeleteConnection([/* ... */]);
$promise = $client->batchDeleteConnectionAsync([/* ... */]);

Deletes a list of connection definitions from the Data Catalog.

Parameter Syntax

$result = $client->batchDeleteConnection([
    'CatalogId' => '<string>',
    'ConnectionNameList' => ['<string>', ...], // REQUIRED
]);

Parameter Details

Members
CatalogId
Type: string

The ID of the Data Catalog in which the connections reside. If none is provided, the Amazon Web Services account ID is used by default.

ConnectionNameList
Required: Yes
Type: Array of strings

A list of names of the connections to delete.

Result Syntax

[
    'Errors' => [
        '<NameString>' => [
            'ErrorCode' => '<string>',
            'ErrorMessage' => '<string>',
        ],
        // ...
    ],
    'Succeeded' => ['<string>', ...],
]

Result Details

Members
Errors
Type: Associative array of custom string keys (NameString) to ErrorDetail structures

A map of the names of connections that were not successfully deleted to error details.

Succeeded
Type: Array of strings

A list of names of the connection definitions that were successfully deleted.
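
Since the result reports successes and failures separately, a caller may want both in one structure. A minimal sketch, with splitConnectionDeletion as a hypothetical helper name:

```php
<?php
// Hypothetical helper: splits a BatchDeleteConnection result into the names that
// were deleted and a name => error code map for those that were not.
// $result may be a plain array or an array-like result object.
function splitConnectionDeletion($result): array
{
    $failed = [];
    foreach ($result['Errors'] ?? [] as $name => $detail) {
        $failed[$name] = $detail['ErrorCode'] ?? 'Unknown';
    }
    return [
        'deleted' => $result['Succeeded'] ?? [],
        'failed' => $failed,
    ];
}
```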

Errors

InternalServiceException:

An internal service error occurred.

OperationTimeoutException:

The operation timed out.

BatchDeletePartition

$result = $client->batchDeletePartition([/* ... */]);
$promise = $client->batchDeletePartitionAsync([/* ... */]);

Deletes one or more partitions in a batch operation.

Parameter Syntax

$result = $client->batchDeletePartition([
    'CatalogId' => '<string>',
    'DatabaseName' => '<string>', // REQUIRED
    'PartitionsToDelete' => [ // REQUIRED
        [
            'Values' => ['<string>', ...], // REQUIRED
        ],
        // ...
    ],
    'TableName' => '<string>', // REQUIRED
]);

Parameter Details

Members
CatalogId
Type: string

The ID of the Data Catalog where the partition to be deleted resides. If none is provided, the Amazon Web Services account ID is used by default.

DatabaseName
Required: Yes
Type: string

The name of the catalog database in which the table in question resides.

PartitionsToDelete
Required: Yes
Type: Array of PartitionValueList structures

A list of PartitionValueList structures that identify the partitions to be deleted.

TableName
Required: Yes
Type: string

The name of the table that contains the partitions to be deleted.
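
Each entry in PartitionsToDelete wraps one partition's value list in a 'Values' key. The sketch below builds that structure; buildDeletePartitionParams is a hypothetical helper, and it does not split the input to respect any per-request limit.

```php
<?php
// Hypothetical helper: wraps each list of partition values in the
// ['Values' => [...]] structure that 'PartitionsToDelete' expects.
function buildDeletePartitionParams(string $database, string $table, array $valueLists): array
{
    $partitions = array_map(
        function (array $values): array {
            // Partition values are sent as strings.
            return ['Values' => array_values(array_map('strval', $values))];
        },
        array_values($valueLists)
    );

    return [
        'DatabaseName' => $database,
        'TableName' => $table,
        'PartitionsToDelete' => $partitions,
    ];
}
```

The returned array can be passed to $client->batchDeletePartition($params).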

Result Syntax

[
    'Errors' => [
        [
            'ErrorDetail' => [
                'ErrorCode' => '<string>',
                'ErrorMessage' => '<string>',
            ],
            'PartitionValues' => ['<string>', ...],
        ],
        // ...
    ],
]

Result Details

Members
Errors
Type: Array of PartitionError structures

The errors encountered when trying to delete the requested partitions.

Errors

InvalidInputException:

The input provided was not valid.

EntityNotFoundException:

A specified entity does not exist.

InternalServiceException:

An internal service error occurred.

OperationTimeoutException:

The operation timed out.

BatchDeleteTable

$result = $client->batchDeleteTable([/* ... */]);
$promise = $client->batchDeleteTableAsync([/* ... */]);

Deletes multiple tables at once.

After completing this operation, you no longer have access to the table versions and partitions that belong to the deleted table. Glue deletes these "orphaned" resources asynchronously in a timely manner, at the discretion of the service.

To ensure the immediate deletion of all related resources, before calling BatchDeleteTable, use DeleteTableVersion or BatchDeleteTableVersion, and DeletePartition or BatchDeletePartition, to delete any resources that belong to the table.
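
The ordering described above could be sketched as follows. deleteTableWithDependents is a hypothetical helper; $client is assumed to be a configured Aws\Glue\GlueClient, the caller is assumed to have already looked up the version IDs and partition values, and batch size limits and per-item 'Errors' in each result are not handled.

```php
<?php
// Hypothetical helper: deletes table versions and partitions first, then the table.
// $client is assumed to be a configured Aws\Glue\GlueClient.
function deleteTableWithDependents(
    $client,
    string $database,
    string $table,
    array $versionIds,
    array $partitionValueLists
) {
    if ($versionIds !== []) {
        $client->batchDeleteTableVersion([
            'DatabaseName' => $database,
            'TableName' => $table,
            'VersionIds' => array_values($versionIds),
        ]);
    }

    if ($partitionValueLists !== []) {
        $partitions = [];
        foreach ($partitionValueLists as $values) {
            $partitions[] = ['Values' => array_values($values)];
        }
        $client->batchDeletePartition([
            'DatabaseName' => $database,
            'TableName' => $table,
            'PartitionsToDelete' => $partitions,
        ]);
    }

    // Only now remove the table definition itself.
    return $client->batchDeleteTable([
        'DatabaseName' => $database,
        'TablesToDelete' => [$table],
    ]);
}
```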

Parameter Syntax

$result = $client->batchDeleteTable([
    'CatalogId' => '<string>',
    'DatabaseName' => '<string>', // REQUIRED
    'TablesToDelete' => ['<string>', ...], // REQUIRED
    'TransactionId' => '<string>',
]);

Parameter Details

Members
CatalogId
Type: string

The ID of the Data Catalog where the table resides. If none is provided, the Amazon Web Services account ID is used by default.

DatabaseName
Required: Yes
Type: string

The name of the catalog database in which the tables to delete reside. For Hive compatibility, this name is entirely lowercase.

TablesToDelete
Required: Yes
Type: Array of strings

A list of the tables to delete.

TransactionId
Type: string

The transaction ID at which to delete the table contents.

Result Syntax

[
    'Errors' => [
        [
            'ErrorDetail' => [
                'ErrorCode' => '<string>',
                'ErrorMessage' => '<string>',
            ],
            'TableName' => '<string>',
        ],
        // ...
    ],
]

Result Details

Members
Errors
Type: Array of TableError structures

A list of errors encountered in attempting to delete the specified tables.
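
Because failures are reported per table, it can help to reconcile the Errors list against the names that were requested. A minimal sketch follows; the helper name splitDeletedAndFailed and the sample data are hypothetical, and treating a table that is absent from Errors as deleted is an assumption of this sketch.

```php
<?php
// Hypothetical helper: split the requested table names into those that
// appear in Errors and those that do not.
function splitDeletedAndFailed(array $requested, array $result): array
{
    $failed = [];
    foreach ($result['Errors'] ?? [] as $error) {
        $failed[$error['TableName']] = $error['ErrorDetail']['ErrorCode'] ?? 'Unknown';
    }
    // Assumption: a requested table with no error entry was deleted.
    $deleted = array_values(array_diff($requested, array_keys($failed)));

    return ['deleted' => $deleted, 'failed' => $failed];
}

// Sample data shaped like the Result Syntax above (all values are made up).
$requested = ['orders', 'customers', 'tmp_scratch'];
$sample = [
    'Errors' => [
        [
            'ErrorDetail' => [
                'ErrorCode' => 'EntityNotFoundException',
                'ErrorMessage' => 'Table not found.',
            ],
            'TableName' => 'tmp_scratch',
        ],
    ],
];

$outcome = splitDeletedAndFailed($requested, $sample);
print_r($outcome);
```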

Errors

InvalidInputException:

The input provided was not valid.

EntityNotFoundException:

A specified entity does not exist.

InternalServiceException:

An internal service error occurred.

OperationTimeoutException:

The operation timed out.

GlueEncryptionException:

An encryption operation failed.

ResourceNotReadyException:

A resource was not ready for a transaction.

BatchDeleteTableVersion

$result = $client->batchDeleteTableVersion([/* ... */]);
$promise = $client->batchDeleteTableVersionAsync([/* ... */]);

Deletes a specified batch of versions of a table.

Parameter Syntax

$result = $client->batchDeleteTableVersion([
    'CatalogId' => '<string>',
    'DatabaseName' => '<string>', // REQUIRED
    'TableName' => '<string>', // REQUIRED
    'VersionIds' => ['<string>', ...], // REQUIRED
]);

Parameter Details

Members
CatalogId
Type: string

The ID of the Data Catalog where the tables reside. If none is provided, the Amazon Web Services account ID is used by default.

DatabaseName
Required: Yes
Type: string

The database in the catalog in which the table resides. For Hive compatibility, this name is entirely lowercase.

TableName
Required: Yes
Type: string

The name of the table. For Hive compatibility, this name is entirely lowercase.

VersionIds
Required: Yes
Type: Array of strings

A list of the IDs of versions to be deleted. A VersionId is a string representation of an integer. Each version is incremented by 1.
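
Since each VersionId is an integer encoded as a string, sorting the IDs as text gives the wrong order ("10" sorts before "2"). The sketch below shows one way to pick the oldest versions for deletion while keeping the most recent ones. The helper name versionIdsToPrune and the retention policy are hypothetical; only the string-encoded integer format comes from the description above.

```php
<?php
// Hypothetical helper: given all known VersionIds for a table, return the
// ones to delete so that only the $keep highest-numbered versions remain.
function versionIdsToPrune(array $versionIds, int $keep): array
{
    // VersionIds are numeric strings, so compare them as integers.
    $numeric = array_map('intval', $versionIds);
    sort($numeric, SORT_NUMERIC);

    // Everything except the last $keep entries is a candidate for deletion.
    $prune = array_slice($numeric, 0, max(0, count($numeric) - $keep));

    // Convert back to strings, the type the VersionIds parameter expects.
    return array_map('strval', $prune);
}

$prune = versionIdsToPrune(['10', '2', '9', '1'], 2);
print_r($prune);
```

The returned array can be passed as the VersionIds parameter of batchDeleteTableVersion.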

Result Syntax

[
    'Errors' => [
        [
            'ErrorDetail' => [
                'ErrorCode' => '<string>',
                'ErrorMessage' => '<string>',
            ],
            'TableName' => '<string>',
            'VersionId' => '<string>',
        ],
        // ...
    ],
]

Result Details

Members
Errors
Type: Array of TableVersionError structures

A list of errors encountered while trying to delete the specified table versions.

Errors

EntityNotFoundException:

A specified entity does not exist.

InvalidInputException:

The input provided was not valid.

InternalServiceException:

An internal service error occurred.

OperationTimeoutException:

The operation timed out.

BatchGetBlueprints

$result = $client->batchGetBlueprints([/* ... */]);
$promise = $client->batchGetBlueprintsAsync([/* ... */]);

Retrieves information about a list of blueprints.

Parameter Syntax

$result = $client->batchGetBlueprints([
    'IncludeBlueprint' => true || false,
    'IncludeParameterSpec' => true || false,
    'Names' => ['<string>', ...], // REQUIRED
]);

Parameter Details

Members
IncludeBlueprint
Type: boolean

Specifies whether or not to include the blueprint in the response.

IncludeParameterSpec
Type: boolean

Specifies whether or not to include the parameters, as a JSON string, for the blueprint in the response.

Names
Required: Yes
Type: Array of strings

A list of blueprint names.

Result Syntax

[
    'Blueprints' => [
        [
            'BlueprintLocation' => '<string>',
            'BlueprintServiceLocation' => '<string>',
            'CreatedOn' => <DateTime>,
            'Description' => '<string>',
            'ErrorMessage' => '<string>',
            'LastActiveDefinition' => [
                'BlueprintLocation' => '<string>',
                'BlueprintServiceLocation' => '<string>',
                'Description' => '<string>',
                'LastModifiedOn' => <DateTime>,
                'ParameterSpec' => '<string>',
            ],
            'LastModifiedOn' => <DateTime>,
            'Name' => '<string>',
            'ParameterSpec' => '<string>',
            'Status' => 'CREATING|ACTIVE|UPDATING|FAILED',
        ],
        // ...
    ],
    'MissingBlueprints' => ['<string>', ...],
]

Result Details

Members
Blueprints
Type: Array of Blueprint structures

Returns a list of blueprints as Blueprint objects.

MissingBlueprints
Type: Array of strings

Returns a list of BlueprintNames that were not found.
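
ParameterSpec is returned as a JSON string, so it has to be decoded before use. The following sketch assumes the result shape shown in Result Syntax; the helper name decodeParameterSpecs and the sample JSON content are hypothetical, and a plain array stands in for the Aws\Result object.

```php
<?php
// Hypothetical helper: decode each blueprint's ParameterSpec JSON string
// into a PHP array, keyed by blueprint name.
function decodeParameterSpecs(array $result): array
{
    $specs = [];
    foreach ($result['Blueprints'] ?? [] as $blueprint) {
        if (!isset($blueprint['ParameterSpec'])) {
            continue; // no spec present for this blueprint
        }
        // json_decode returns null for malformed JSON; that null is kept
        // so the caller can tell a bad spec from a missing one.
        $specs[$blueprint['Name']] = json_decode($blueprint['ParameterSpec'], true);
    }
    return $specs;
}

// Sample data shaped like the Result Syntax above (all values are made up).
$sample = [
    'Blueprints' => [
        [
            'Name' => 'nightly_etl',
            'Status' => 'ACTIVE',
            'ParameterSpec' => '{"InputPath":{"type":"String"}}',
        ],
        ['Name' => 'no_spec', 'Status' => 'ACTIVE'],
    ],
    'MissingBlueprints' => ['typo_name'],
];

$specs = decodeParameterSpecs($sample);
print_r($specs);
```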

Errors

InternalServiceException:

An internal service error occurred.

OperationTimeoutException:

The operation timed out.

InvalidInputException:

The input provided was not valid.

BatchGetCrawlers

$result = $client->batchGetCrawlers([/* ... */]);
$promise = $client->batchGetCrawlersAsync([/* ... */]);

Returns a list of resource metadata for a given list of crawler names. After calling the ListCrawlers operation, you can call this operation to access the data to which you have been granted permissions. This operation supports all IAM permissions, including permission conditions that use tags.

Parameter Syntax

$result = $client->batchGetCrawlers([
    'CrawlerNames' => ['<string>', ...], // REQUIRED
]);

Parameter Details

Members
CrawlerNames
Required: Yes
Type: Array of strings

A list of crawler names, which might be the names returned from the ListCrawlers operation.

Result Syntax

[
    'Crawlers' => [
        [
            'Classifiers' => ['<string>', ...],
            'Configuration' => '<string>',
            'CrawlElapsedTime' => <integer>,
            'CrawlerSecurityConfiguration' => '<string>',
            'CreationTime' => <DateTime>,
            'DatabaseName' => '<string>',
            'Description' => '<string>',
            'LakeFormationConfiguration' => [
                'AccountId' => '<string>',
                'UseLakeFormationCredentials' => true || false,
            ],
            'LastCrawl' => [
                'ErrorMessage' => '<string>',
                'LogGroup' => '<string>',
                'LogStream' => '<string>',
                'MessagePrefix' => '<string>',
                'StartTime' => <DateTime>,
                'Status' => 'SUCCEEDED|CANCELLED|FAILED',
            ],
            'LastUpdated' => <DateTime>,
            'LineageConfiguration' => [
                'CrawlerLineageSettings' => 'ENABLE|DISABLE',
            ],
            'Name' => '<string>',
            'RecrawlPolicy' => [
                'RecrawlBehavior' => 'CRAWL_EVERYTHING|CRAWL_NEW_FOLDERS_ONLY|CRAWL_EVENT_MODE',
            ],
            'Role' => '<string>',
            'Schedule' => [
                'ScheduleExpression' => '<string>',
                'State' => 'SCHEDULED|NOT_SCHEDULED|TRANSITIONING',
            ],
            'SchemaChangePolicy' => [
                'DeleteBehavior' => 'LOG|DELETE_FROM_DATABASE|DEPRECATE_IN_DATABASE',
                'UpdateBehavior' => 'LOG|UPDATE_IN_DATABASE',
            ],
            'State' => 'READY|RUNNING|STOPPING',
            'TablePrefix' => '<string>',
            'Targets' => [
                'CatalogTargets' => [
                    [
                        'ConnectionName' => '<string>',
                        'DatabaseName' => '<string>',
                        'DlqEventQueueArn' => '<string>',
                        'EventQueueArn' => '<string>',
                        'Tables' => ['<string>', ...],
                    ],
                    // ...
                ],
                'DeltaTargets' => [
                    [
                        'ConnectionName' => '<string>',
                        'CreateNativeDeltaTable' => true || false,
                        'DeltaTables' => ['<string>', ...],
                        'WriteManifest' => true || false,
                    ],
                    // ...
                ],
                'DynamoDBTargets' => [
                    [
                        'Path' => '<string>',
                        'scanAll' => true || false,
                        'scanRate' => <float>,
                    ],
                    // ...
                ],
                'HudiTargets' => [
                    [
                        'ConnectionName' => '<string>',
                        'Exclusions' => ['<string>', ...],
                        'MaximumTraversalDepth' => <integer>,
                        'Paths' => ['<string>', ...],
                    ],
                    // ...
                ],
                'IcebergTargets' => [
                    [
                        'ConnectionName' => '<string>',
                        'Exclusions' => ['<string>', ...],
                        'MaximumTraversalDepth' => <integer>,
                        'Paths' => ['<string>', ...],
                    ],
                    // ...
                ],
                'JdbcTargets' => [
                    [
                        'ConnectionName' => '<string>',
                        'EnableAdditionalMetadata' => ['<string>', ...],
                        'Exclusions' => ['<string>', ...],
                        'Path' => '<string>',
                    ],
                    // ...
                ],
                'MongoDBTargets' => [
                    [
                        'ConnectionName' => '<string>',
                        'Path' => '<string>',
                        'ScanAll' => true || false,
                    ],
                    // ...
                ],
                'S3Targets' => [
                    [
                        'ConnectionName' => '<string>',
                        'DlqEventQueueArn' => '<string>',
                        'EventQueueArn' => '<string>',
                        'Exclusions' => ['<string>', ...],
                        'Path' => '<string>',
                        'SampleSize' => <integer>,
                    ],
                    // ...
                ],
            ],
            'Version' => <integer>,
        ],
        // ...
    ],
    'CrawlersNotFound' => ['<string>', ...],
]

Result Details

Members
Crawlers
Type: Array of Crawler structures

A list of crawler definitions.

CrawlersNotFound
Type: Array of strings

A list of names of crawlers that were not found.
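
A typical use of this result is a quick health report. The sketch below, which assumes the result shape shown in Result Syntax, collects crawlers whose last crawl failed together with the names that were not found. The helper name crawlersNeedingAttention and the sample data are hypothetical.

```php
<?php
// Hypothetical helper: report crawlers whose last crawl failed, plus the
// requested names that the service did not find.
function crawlersNeedingAttention(array $result): array
{
    $report = [
        'failedLastCrawl' => [],
        'notFound' => $result['CrawlersNotFound'] ?? [],
    ];
    foreach ($result['Crawlers'] ?? [] as $crawler) {
        if (($crawler['LastCrawl']['Status'] ?? null) === 'FAILED') {
            $report['failedLastCrawl'][$crawler['Name']] =
                $crawler['LastCrawl']['ErrorMessage'] ?? '';
        }
    }
    return $report;
}

// Sample data shaped like the Result Syntax above (all values are made up).
$sample = [
    'Crawlers' => [
        [
            'Name' => 'raw_logs',
            'State' => 'READY',
            'LastCrawl' => ['Status' => 'FAILED', 'ErrorMessage' => 'Access denied to bucket.'],
        ],
        [
            'Name' => 'clean_logs',
            'State' => 'RUNNING',
            'LastCrawl' => ['Status' => 'SUCCEEDED'],
        ],
    ],
    'CrawlersNotFound' => ['old_crawler'],
];

$report = crawlersNeedingAttention($sample);
print_r($report);
```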

Errors

InvalidInputException:

The input provided was not valid.

OperationTimeoutException:

The operation timed out.

BatchGetCustomEntityTypes

$result = $client->batchGetCustomEntityTypes([/* ... */]);
$promise = $client->batchGetCustomEntityTypesAsync([/* ... */]);

Retrieves the details for the custom patterns specified by a list of names.

Parameter Syntax

$result = $client->batchGetCustomEntityTypes([
    'Names' => ['<string>', ...], // REQUIRED
]);

Parameter Details

Members
Names
Required: Yes
Type: Array of strings

A list of names of the custom patterns that you want to retrieve.

Result Syntax

[
    'CustomEntityTypes' => [
        [
            'ContextWords' => ['<string>', ...],
            'Name' => '<string>',
            'RegexString' => '<string>',
        ],
        // ...
    ],
    'CustomEntityTypesNotFound' => ['<string>', ...],
]

Result Details

Members
CustomEntityTypes
Type: Array of CustomEntityType structures

A list of CustomEntityType objects representing the custom patterns that have been created.

CustomEntityTypesNotFound
Type: Array of strings

A list of the names of custom patterns that were not found.

Errors

InvalidInputException:

The input provided was not valid.

InternalServiceException:

An internal service error occurred.

OperationTimeoutException:

The operation timed out.

BatchGetDataQualityResult

$result = $client->batchGetDataQualityResult([/* ... */]);
$promise = $client->batchGetDataQualityResultAsync([/* ... */]);

Retrieves a list of data quality results for the specified result IDs.

Parameter Syntax

$result = $client->batchGetDataQualityResult([
    'ResultIds' => ['<string>', ...], // REQUIRED
]);

Parameter Details

Members
ResultIds
Required: Yes
Type: Array of strings

A list of unique result IDs for the data quality results.

Result Syntax

[
    'Results' => [
        [
            'AnalyzerResults' => [
                [
                    'Description' => '<string>',
                    'EvaluatedMetrics' => [<float>, ...],
                    'EvaluationMessage' => '<string>',
                    'Name' => '<string>',
                ],
                // ...
            ],
            'CompletedOn' => <DateTime>,
            'DataSource' => [
                'GlueTable' => [
                    'AdditionalOptions' => ['<string>', ...],
                    'CatalogId' => '<string>',
                    'ConnectionName' => '<string>',
                    'DatabaseName' => '<string>',
                    'TableName' => '<string>',
                ],
            ],
            'EvaluationContext' => '<string>',
            'JobName' => '<string>',
            'JobRunId' => '<string>',
            'Observations' => [
                [
                    'Description' => '<string>',
                    'MetricBasedObservation' => [
                        'MetricName' => '<string>',
                        'MetricValues' => [
                            'ActualValue' => <float>,
                            'ExpectedValue' => <float>,
                            'LowerLimit' => <float>,
                            'UpperLimit' => <float>,
                        ],
                        'NewRules' => ['<string>', ...],
                    ],
                ],
                // ...
            ],
            'ResultId' => '<string>',
            'RuleResults' => [
                [
                    'Description' => '<string>',
                    'EvaluatedMetrics' => [<float>, ...],
                    'EvaluationMessage' => '<string>',
                    'Name' => '<string>',
                    'Result' => 'PASS|FAIL|ERROR',
                ],
                // ...
            ],
            'RulesetEvaluationRunId' => '<string>',
            'RulesetName' => '<string>',
            'Score' => <float>,
            'StartedOn' => <DateTime>,
        ],
        // ...
    ],
    'ResultsNotFound' => ['<string>', ...],
]

Result Details

Members
Results
Required: Yes
Type: Array of DataQualityResult structures

A list of DataQualityResult objects representing the data quality results.

ResultsNotFound
Type: Array of strings

A list of result IDs for which results were not found.
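
To summarize a batch of results, the RuleResults of each entry can be tallied by outcome. This is a minimal sketch assuming the result shape shown in Result Syntax; the helper name tallyRuleResults, the result ID, and the rule names are hypothetical.

```php
<?php
// Hypothetical helper: count PASS, FAIL, and ERROR rule outcomes for each
// data quality result, keyed by ResultId.
function tallyRuleResults(array $result): array
{
    $tally = [];
    foreach ($result['Results'] ?? [] as $dq) {
        $counts = ['PASS' => 0, 'FAIL' => 0, 'ERROR' => 0];
        foreach ($dq['RuleResults'] ?? [] as $rule) {
            $outcome = $rule['Result'] ?? '';
            if (isset($counts[$outcome])) {
                $counts[$outcome]++;
            }
        }
        $tally[$dq['ResultId']] = $counts;
    }
    return $tally;
}

// Sample data shaped like the Result Syntax above (all values are made up).
$sample = [
    'Results' => [
        [
            'ResultId' => 'dqresult-123',
            'Score' => 0.66,
            'RuleResults' => [
                ['Name' => 'Rule_1', 'Result' => 'PASS'],
                ['Name' => 'Rule_2', 'Result' => 'FAIL'],
                ['Name' => 'Rule_3', 'Result' => 'PASS'],
            ],
        ],
    ],
    'ResultsNotFound' => [],
];

$tally = tallyRuleResults($sample);
print_r($tally);
```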

Errors

InvalidInputException:

The input provided was not valid.

OperationTimeoutException:

The operation timed out.

InternalServiceException:

An internal service error occurred.

BatchGetDevEndpoints

$result = $client->batchGetDevEndpoints([/* ... */]);
$promise = $client->batchGetDevEndpointsAsync([/* ... */]);

Returns a list of resource metadata for a given list of development endpoint names. After calling the ListDevEndpoints operation, you can call this operation to access the data to which you have been granted permissions. This operation supports all IAM permissions, including permission conditions that use tags.

Parameter Syntax

$result = $client->batchGetDevEndpoints([
    'DevEndpointNames' => ['<string>', ...], // REQUIRED
]);

Parameter Details

Members
DevEndpointNames
Required: Yes
Type: Array of strings

The list of DevEndpoint names, which might be the names returned from the ListDevEndpoints operation.

Result Syntax

[
    'DevEndpoints' => [
        [
            'Arguments' => ['<string>', ...],
            'AvailabilityZone' => '<string>',
            'CreatedTimestamp' => <DateTime>,
            'EndpointName' => '<string>',
            'ExtraJarsS3Path' => '<string>',
            'ExtraPythonLibsS3Path' => '<string>',
            'FailureReason' => '<string>',
            'GlueVersion' => '<string>',
            'LastModifiedTimestamp' => <DateTime>,
            'LastUpdateStatus' => '<string>',
            'NumberOfNodes' => <integer>,
            'NumberOfWorkers' => <integer>,
            'PrivateAddress' => '<string>',
            'PublicAddress' => '<string>',
            'PublicKey' => '<string>',
            'PublicKeys' => ['<string>', ...],
            'RoleArn' => '<string>',
            'SecurityConfiguration' => '<string>',
            'SecurityGroupIds' => ['<string>', ...],
            'Status' => '<string>',
            'SubnetId' => '<string>',
            'VpcId' => '<string>',
            'WorkerType' => 'Standard|G.1X|G.2X|G.025X|G.4X|G.8X|Z.2X',
            'YarnEndpointAddress' => '<string>',
            'ZeppelinRemoteSparkInterpreterPort' => <integer>,
        ],
        // ...
    ],
    'DevEndpointsNotFound' => ['<string>', ...],
]

Result Details

Members
DevEndpoints
Type: Array of DevEndpoint structures

A list of DevEndpoint definitions.

DevEndpointsNotFound
Type: Array of strings

A list of the names of DevEndpoints that were not found.
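
When a long list of names comes back from a list operation, one option is to request the details in smaller batches and merge the responses. The sketch below illustrates that pattern. The helper name batchGetDevEndpointsInChunks, the $chunkSize default, and the stand-in fetcher are hypothetical; with a real client the fetcher would wrap $client->batchGetDevEndpoints().

```php
<?php
// Hypothetical helper: call a batch-get function on chunks of names and
// merge the found and not-found lists across all responses.
function batchGetDevEndpointsInChunks(callable $fetch, array $names, int $chunkSize = 25): array
{
    $merged = ['DevEndpoints' => [], 'DevEndpointsNotFound' => []];
    foreach (array_chunk($names, $chunkSize) as $chunk) {
        $response = $fetch(['DevEndpointNames' => $chunk]);
        foreach (array_keys($merged) as $key) {
            $merged[$key] = array_merge($merged[$key], $response[$key] ?? []);
        }
    }
    return $merged;
}

// With a real client the fetcher could be:
//   fn(array $params) => $client->batchGetDevEndpoints($params)
// Here a stand-in pretends that only names starting with "dev-" exist.
$fakeFetch = function (array $params): array {
    $found = [];
    $missing = [];
    foreach ($params['DevEndpointNames'] as $name) {
        if (strncmp($name, 'dev-', 4) === 0) {
            $found[] = ['EndpointName' => $name];
        } else {
            $missing[] = $name;
        }
    }
    return ['DevEndpoints' => $found, 'DevEndpointsNotFound' => $missing];
};

$merged = batchGetDevEndpointsInChunks($fakeFetch, ['dev-a', 'old-b', 'dev-c'], 2);
print_r($merged);
```

Passing the fetcher as a callable keeps the merging logic independent of the client, which also makes it easy to exercise without AWS credentials.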

Errors

AccessDeniedException:

Access to a resource was denied.

InternalServiceException:

An internal service error occurred.

OperationTimeoutException:

The operation timed out.

InvalidInputException:

The input provided was not valid.

BatchGetJobs

$result = $client->batchGetJobs([/* ... */]);
$promise = $client->batchGetJobsAsync([/* ... */]);

Returns a list of resource metadata for a given list of job names. After calling the ListJobs operation, you can call this operation to access the data to which you have been granted permissions. This operation supports all IAM permissions, including permission conditions that use tags.

Parameter Syntax

$result = $client->batchGetJobs([
    'JobNames' => ['<string>', ...], // REQUIRED
]);

Parameter Details

Members
JobNames
Required: Yes
Type: Array of strings

A list of job names, which might be the names returned from the ListJobs operation.

Result Syntax

[
    'Jobs' => [
        [
            'AllocatedCapacity' => <integer>,
            'CodeGenConfigurationNodes' => [
                '<NodeId>' => [
                    'Aggregate' => [
                        'Aggs' => [
                            [
                                'AggFunc' => 'avg|countDistinct|count|first|last|kurtosis|max|min|skewness|stddev_samp|stddev_pop|sum|sumDistinct|var_samp|var_pop',
                                'Column' => ['<string>', ...],
                            ],
                            // ...
                        ],
                        'Groups' => [
                            ['<string>', ...],
                            // ...
                        ],
                        'Inputs' => ['<string>', ...],
                        'Name' => '<string>',
                    ],
                    'AmazonRedshiftSource' => [
                        'Data' => [
                            'AccessType' => '<string>',
                            'Action' => '<string>',
                            'AdvancedOptions' => [
                                [
                                    'Key' => '<string>',
                                    'Value' => '<string>',
                                ],
                                // ...
                            ],
                            'CatalogDatabase' => [
                                'Description' => '<string>',
                                'Label' => '<string>',
                                'Value' => '<string>',
                            ],
                            'CatalogRedshiftSchema' => '<string>',
                            'CatalogRedshiftTable' => '<string>',
                            'CatalogTable' => [
                                'Description' => '<string>',
                                'Label' => '<string>',
                                'Value' => '<string>',
                            ],
                            'Connection' => [
                                'Description' => '<string>',
                                'Label' => '<string>',
                                'Value' => '<string>',
                            ],
                            'CrawlerConnection' => '<string>',
                            'IamRole' => [
                                'Description' => '<string>',
                                'Label' => '<string>',
                                'Value' => '<string>',
                            ],
                            'MergeAction' => '<string>',
                            'MergeClause' => '<string>',
                            'MergeWhenMatched' => '<string>',
                            'MergeWhenNotMatched' => '<string>',
                            'PostAction' => '<string>',
                            'PreAction' => '<string>',
                            'SampleQuery' => '<string>',
                            'Schema' => [
                                'Description' => '<string>',
                                'Label' => '<string>',
                                'Value' => '<string>',
                            ],
                            'SelectedColumns' => [
                                [
                                    'Description' => '<string>',
                                    'Label' => '<string>',
                                    'Value' => '<string>',
                                ],
                                // ...
                            ],
                            'SourceType' => '<string>',
                            'StagingTable' => '<string>',
                            'Table' => [
                                'Description' => '<string>',
                                'Label' => '<string>',
                                'Value' => '<string>',
                            ],
                            'TablePrefix' => '<string>',
                            'TableSchema' => [
                                [
                                    'Description' => '<string>',
                                    'Label' => '<string>',
                                    'Value' => '<string>',
                                ],
                                // ...
                            ],
                            'TempDir' => '<string>',
                            'Upsert' => true || false,
                        ],
                        'Name' => '<string>',
                    ],
                    'AmazonRedshiftTarget' => [
                        'Data' => [
                            'AccessType' => '<string>',
                            'Action' => '<string>',
                            'AdvancedOptions' => [
                                [
                                    'Key' => '<string>',
                                    'Value' => '<string>',
                                ],
                                // ...
                            ],
                            'CatalogDatabase' => [
                                'Description' => '<string>',
                                'Label' => '<string>',
                                'Value' => '<string>',
                            ],
                            'CatalogRedshiftSchema' => '<string>',
                            'CatalogRedshiftTable' => '<string>',
                            'CatalogTable' => [
                                'Description' => '<string>',
                                'Label' => '<string>',
                                'Value' => '<string>',
                            ],
                            'Connection' => [
                                'Description' => '<string>',
                                'Label' => '<string>',
                                'Value' => '<string>',
                            ],
                            'CrawlerConnection' => '<string>',
                            'IamRole' => [
                                'Description' => '<string>',
                                'Label' => '<string>',
                                'Value' => '<string>',
                            ],
                            'MergeAction' => '<string>',
                            'MergeClause' => '<string>',
                            'MergeWhenMatched' => '<string>',
                            'MergeWhenNotMatched' => '<string>',
                            'PostAction' => '<string>',
                            'PreAction' => '<string>',
                            'SampleQuery' => '<string>',
                            'Schema' => [
                                'Description' => '<string>',
                                'Label' => '<string>',
                                'Value' => '<string>',
                            ],
                            'SelectedColumns' => [
                                [
                                    'Description' => '<string>',
                                    'Label' => '<string>',
                                    'Value' => '<string>',
                                ],
                                // ...
                            ],
                            'SourceType' => '<string>',
                            'StagingTable' => '<string>',
                            'Table' => [
                                'Description' => '<string>',
                                'Label' => '<string>',
                                'Value' => '<string>',
                            ],
                            'TablePrefix' => '<string>',
                            'TableSchema' => [
                                [
                                    'Description' => '<string>',
                                    'Label' => '<string>',
                                    'Value' => '<string>',
                                ],
                                // ...
                            ],
                            'TempDir' => '<string>',
                            'Upsert' => true || false,
                        ],
                        'Inputs' => ['<string>', ...],
                        'Name' => '<string>',
                    ],
                    'ApplyMapping' => [
                        'Inputs' => ['<string>', ...],
                        'Mapping' => [
                            [
                                'Children' => [...], // RECURSIVE
                                'Dropped' => true || false,
                                'FromPath' => ['<string>', ...],
                                'FromType' => '<string>',
                                'ToKey' => '<string>',
                                'ToType' => '<string>',
                            ],
                            // ...
                        ],
                        'Name' => '<string>',
                    ],
                    'AthenaConnectorSource' => [
                        'ConnectionName' => '<string>',
                        'ConnectionTable' => '<string>',
                        'ConnectionType' => '<string>',
                        'ConnectorName' => '<string>',
                        'Name' => '<string>',
                        'OutputSchemas' => [
                            [
                                'Columns' => [
                                    [
                                        'Name' => '<string>',
                                        'Type' => '<string>',
                                    ],
                                    // ...
                                ],
                            ],
                            // ...
                        ],
                        'SchemaName' => '<string>',
                    ],
                    'CatalogDeltaSource' => [
                        'AdditionalDeltaOptions' => ['<string>', ...],
                        'Database' => '<string>',
                        'Name' => '<string>',
                        'OutputSchemas' => [
                            [
                                'Columns' => [
                                    [
                                        'Name' => '<string>',
                                        'Type' => '<string>',
                                    ],
                                    // ...
                                ],
                            ],
                            // ...
                        ],
                        'Table' => '<string>',
                    ],
                    'CatalogHudiSource' => [
                        'AdditionalHudiOptions' => ['<string>', ...],
                        'Database' => '<string>',
                        'Name' => '<string>',
                        'OutputSchemas' => [
                            [
                                'Columns' => [
                                    [
                                        'Name' => '<string>',
                                        'Type' => '<string>',
                                    ],
                                    // ...
                                ],
                            ],
                            // ...
                        ],
                        'Table' => '<string>',
                    ],
                    'CatalogKafkaSource' => [
                        'DataPreviewOptions' => [
                            'PollingTime' => <integer>,
                            'RecordPollingLimit' => <integer>,
                        ],
                        'Database' => '<string>',
                        'DetectSchema' => true || false,
                        'Name' => '<string>',
                        'StreamingOptions' => [
                            'AddRecordTimestamp' => '<string>',
                            'Assign' => '<string>',
                            'BootstrapServers' => '<string>',
                            'Classification' => '<string>',
                            'ConnectionName' => '<string>',
                            'Delimiter' => '<string>',
                            'EmitConsumerLagMetrics' => '<string>',
                            'EndingOffsets' => '<string>',
                            'IncludeHeaders' => true || false,
                            'MaxOffsetsPerTrigger' => <integer>,
                            'MinPartitions' => <integer>,
                            'NumRetries' => <integer>,
                            'PollTimeoutMs' => <integer>,
                            'RetryIntervalMs' => <integer>,
                            'SecurityProtocol' => '<string>',
                            'StartingOffsets' => '<string>',
                            'StartingTimestamp' => <DateTime>,
                            'SubscribePattern' => '<string>',
                            'TopicName' => '<string>',
                        ],
                        'Table' => '<string>',
                        'WindowSize' => <integer>,
                    ],
                    'CatalogKinesisSource' => [
                        'DataPreviewOptions' => [
                            'PollingTime' => <integer>,
                            'RecordPollingLimit' => <integer>,
                        ],
                        'Database' => '<string>',
                        'DetectSchema' => true || false,
                        'Name' => '<string>',
                        'StreamingOptions' => [
                            'AddIdleTimeBetweenReads' => true || false,
                            'AddRecordTimestamp' => '<string>',
                            'AvoidEmptyBatches' => true || false,
                            'Classification' => '<string>',
                            'Delimiter' => '<string>',
                            'DescribeShardInterval' => <integer>,
                            'EmitConsumerLagMetrics' => '<string>',
                            'EndpointUrl' => '<string>',
                            'IdleTimeBetweenReadsInMs' => <integer>,
                            'MaxFetchRecordsPerShard' => <integer>,
                            'MaxFetchTimeInMs' => <integer>,
                            'MaxRecordPerRead' => <integer>,
                            'MaxRetryIntervalMs' => <integer>,
                            'NumRetries' => <integer>,
                            'RetryIntervalMs' => <integer>,
                            'RoleArn' => '<string>',
                            'RoleSessionName' => '<string>',
                            'StartingPosition' => 'latest|trim_horizon|earliest|timestamp',
                            'StartingTimestamp' => <DateTime>,
                            'StreamArn' => '<string>',
                            'StreamName' => '<string>',
                        ],
                        'Table' => '<string>',
                        'WindowSize' => <integer>,
                    ],
                    'CatalogSource' => [
                        'Database' => '<string>',
                        'Name' => '<string>',
                        'Table' => '<string>',
                    ],
                    'CatalogTarget' => [
                        'Database' => '<string>',
                        'Inputs' => ['<string>', ...],
                        'Name' => '<string>',
                        'Table' => '<string>',
                    ],
                    'ConnectorDataSource' => [
                        'ConnectionType' => '<string>',
                        'Data' => ['<string>', ...],
                        'Name' => '<string>',
                        'OutputSchemas' => [
                            [
                                'Columns' => [
                                    [
                                        'Name' => '<string>',
                                        'Type' => '<string>',
                                    ],
                                    // ...
                                ],
                            ],
                            // ...
                        ],
                    ],
                    'ConnectorDataTarget' => [
                        'ConnectionType' => '<string>',
                        'Data' => ['<string>', ...],
                        'Inputs' => ['<string>', ...],
                        'Name' => '<string>',
                    ],
                    'CustomCode' => [
                        'ClassName' => '<string>',
                        'Code' => '<string>',
                        'Inputs' => ['<string>', ...],
                        'Name' => '<string>',
                        'OutputSchemas' => [
                            [
                                'Columns' => [
                                    [
                                        'Name' => '<string>',
                                        'Type' => '<string>',
                                    ],
                                    // ...
                                ],
                            ],
                            // ...
                        ],
                    ],
                    'DirectJDBCSource' => [
                        'ConnectionName' => '<string>',
                        'ConnectionType' => 'sqlserver|mysql|oracle|postgresql|redshift',
                        'Database' => '<string>',
                        'Name' => '<string>',
                        'RedshiftTmpDir' => '<string>',
                        'Table' => '<string>',
                    ],
                    'DirectKafkaSource' => [
                        'DataPreviewOptions' => [
                            'PollingTime' => <integer>,
                            'RecordPollingLimit' => <integer>,
                        ],
                        'DetectSchema' => true || false,
                        'Name' => '<string>',
                        'StreamingOptions' => [
                            'AddRecordTimestamp' => '<string>',
                            'Assign' => '<string>',
                            'BootstrapServers' => '<string>',
                            'Classification' => '<string>',
                            'ConnectionName' => '<string>',
                            'Delimiter' => '<string>',
                            'EmitConsumerLagMetrics' => '<string>',
                            'EndingOffsets' => '<string>',
                            'IncludeHeaders' => true || false,
                            'MaxOffsetsPerTrigger' => <integer>,
                            'MinPartitions' => <integer>,
                            'NumRetries' => <integer>,
                            'PollTimeoutMs' => <integer>,
                            'RetryIntervalMs' => <integer>,
                            'SecurityProtocol' => '<string>',
                            'StartingOffsets' => '<string>',
                            'StartingTimestamp' => <DateTime>,
                            'SubscribePattern' => '<string>',
                            'TopicName' => '<string>',
                        ],
                        'WindowSize' => <integer>,
                    ],
                    'DirectKinesisSource' => [
                        'DataPreviewOptions' => [
                            'PollingTime' => <integer>,
                            'RecordPollingLimit' => <integer>,
                        ],
                        'DetectSchema' => true || false,
                        'Name' => '<string>',
                        'StreamingOptions' => [
                            'AddIdleTimeBetweenReads' => true || false,
                            'AddRecordTimestamp' => '<string>',
                            'AvoidEmptyBatches' => true || false,
                            'Classification' => '<string>',
                            'Delimiter' => '<string>',
                            'DescribeShardInterval' => <integer>,
                            'EmitConsumerLagMetrics' => '<string>',
                            'EndpointUrl' => '<string>',
                            'IdleTimeBetweenReadsInMs' => <integer>,
                            'MaxFetchRecordsPerShard' => <integer>,
                            'MaxFetchTimeInMs' => <integer>,
                            'MaxRecordPerRead' => <integer>,
                            'MaxRetryIntervalMs' => <integer>,
                            'NumRetries' => <integer>,
                            'RetryIntervalMs' => <integer>,
                            'RoleArn' => '<string>',
                            'RoleSessionName' => '<string>',
                            'StartingPosition' => 'latest|trim_horizon|earliest|timestamp',
                            'StartingTimestamp' => <DateTime>,
                            'StreamArn' => '<string>',
                            'StreamName' => '<string>',
                        ],
                        'WindowSize' => <integer>,
                    ],
                    'DropDuplicates' => [
                        'Columns' => [
                            ['<string>', ...],
                            // ...
                        ],
                        'Inputs' => ['<string>', ...],
                        'Name' => '<string>',
                    ],
                    'DropFields' => [
                        'Inputs' => ['<string>', ...],
                        'Name' => '<string>',
                        'Paths' => [
                            ['<string>', ...],
                            // ...
                        ],
                    ],
                    'DropNullFields' => [
                        'Inputs' => ['<string>', ...],
                        'Name' => '<string>',
                        'NullCheckBoxList' => [
                            'IsEmpty' => true || false,
                            'IsNegOne' => true || false,
                            'IsNullString' => true || false,
                        ],
                        'NullTextList' => [
                            [
                                'Datatype' => [
                                    'Id' => '<string>',
                                    'Label' => '<string>',
                                ],
                                'Value' => '<string>',
                            ],
                            // ...
                        ],
                    ],
                    'DynamicTransform' => [
                        'FunctionName' => '<string>',
                        'Inputs' => ['<string>', ...],
                        'Name' => '<string>',
                        'OutputSchemas' => [
                            [
                                'Columns' => [
                                    [
                                        'Name' => '<string>',
                                        'Type' => '<string>',
                                    ],
                                    // ...
                                ],
                            ],
                            // ...
                        ],
                        'Parameters' => [
                            [
                                'IsOptional' => true || false,
                                'ListType' => 'str|int|float|complex|bool|list|null',
                                'Name' => '<string>',
                                'Type' => 'str|int|float|complex|bool|list|null',
                                'ValidationMessage' => '<string>',
                                'ValidationRule' => '<string>',
                                'Value' => ['<string>', ...],
                            ],
                            // ...
                        ],
                        'Path' => '<string>',
                        'TransformName' => '<string>',
                        'Version' => '<string>',
                    ],
                    'DynamoDBCatalogSource' => [
                        'Database' => '<string>',
                        'Name' => '<string>',
                        'Table' => '<string>',
                    ],
                    'EvaluateDataQuality' => [
                        'Inputs' => ['<string>', ...],
                        'Name' => '<string>',
                        'Output' => 'PrimaryInput|EvaluationResults',
                        'PublishingOptions' => [
                            'CloudWatchMetricsEnabled' => true || false,
                            'EvaluationContext' => '<string>',
                            'ResultsPublishingEnabled' => true || false,
                            'ResultsS3Prefix' => '<string>',
                        ],
                        'Ruleset' => '<string>',
                        'StopJobOnFailureOptions' => [
                            'StopJobOnFailureTiming' => 'Immediate|AfterDataLoad',
                        ],
                    ],
                    'EvaluateDataQualityMultiFrame' => [
                        'AdditionalDataSources' => ['<string>', ...],
                        'AdditionalOptions' => ['<string>', ...],
                        'Inputs' => ['<string>', ...],
                        'Name' => '<string>',
                        'PublishingOptions' => [
                            'CloudWatchMetricsEnabled' => true || false,
                            'EvaluationContext' => '<string>',
                            'ResultsPublishingEnabled' => true || false,
                            'ResultsS3Prefix' => '<string>',
                        ],
                        'Ruleset' => '<string>',
                        'StopJobOnFailureOptions' => [
                            'StopJobOnFailureTiming' => 'Immediate|AfterDataLoad',
                        ],
                    ],
                    'FillMissingValues' => [
                        'FilledPath' => '<string>',
                        'ImputedPath' => '<string>',
                        'Inputs' => ['<string>', ...],
                        'Name' => '<string>',
                    ],
                    'Filter' => [
                        'Filters' => [
                            [
                                'Negated' => true || false,
                                'Operation' => 'EQ|LT|GT|LTE|GTE|REGEX|ISNULL',
                                'Values' => [
                                    [
                                        'Type' => 'COLUMNEXTRACTED|CONSTANT',
                                        'Value' => ['<string>', ...],
                                    ],
                                    // ...
                                ],
                            ],
                            // ...
                        ],
                        'Inputs' => ['<string>', ...],
                        'LogicalOperator' => 'AND|OR',
                        'Name' => '<string>',
                    ],
                    'GovernedCatalogSource' => [
                        'AdditionalOptions' => [
                            'BoundedFiles' => <integer>,
                            'BoundedSize' => <integer>,
                        ],
                        'Database' => '<string>',
                        'Name' => '<string>',
                        'PartitionPredicate' => '<string>',
                        'Table' => '<string>',
                    ],
                    'GovernedCatalogTarget' => [
                        'Database' => '<string>',
                        'Inputs' => ['<string>', ...],
                        'Name' => '<string>',
                        'PartitionKeys' => [
                            ['<string>', ...],
                            // ...
                        ],
                        'SchemaChangePolicy' => [
                            'EnableUpdateCatalog' => true || false,
                            'UpdateBehavior' => 'UPDATE_IN_DATABASE|LOG',
                        ],
                        'Table' => '<string>',
                    ],
                    'JDBCConnectorSource' => [
                        'AdditionalOptions' => [
                            'DataTypeMapping' => ['<string>', ...],
                            'FilterPredicate' => '<string>',
                            'JobBookmarkKeys' => ['<string>', ...],
                            'JobBookmarkKeysSortOrder' => '<string>',
                            'LowerBound' => <integer>,
                            'NumPartitions' => <integer>,
                            'PartitionColumn' => '<string>',
                            'UpperBound' => <integer>,
                        ],
                        'ConnectionName' => '<string>',
                        'ConnectionTable' => '<string>',
                        'ConnectionType' => '<string>',
                        'ConnectorName' => '<string>',
                        'Name' => '<string>',
                        'OutputSchemas' => [
                            [
                                'Columns' => [
                                    [
                                        'Name' => '<string>',
                                        'Type' => '<string>',
                                    ],
                                    // ...
                                ],
                            ],
                            // ...
                        ],
                        'Query' => '<string>',
                    ],
                    'JDBCConnectorTarget' => [
                        'AdditionalOptions' => ['<string>', ...],
                        'ConnectionName' => '<string>',
                        'ConnectionTable' => '<string>',
                        'ConnectionType' => '<string>',
                        'ConnectorName' => '<string>',
                        'Inputs' => ['<string>', ...],
                        'Name' => '<string>',
                        'OutputSchemas' => [
                            [
                                'Columns' => [
                                    [
                                        'Name' => '<string>',
                                        'Type' => '<string>',
                                    ],
                                    // ...
                                ],
                            ],
                            // ...
                        ],
                    ],
                    'Join' => [
                        'Columns' => [
                            [
                                'From' => '<string>',
                                'Keys' => [
                                    ['<string>', ...],
                                    // ...
                                ],
                            ],
                            // ...
                        ],
                        'Inputs' => ['<string>', ...],
                        'JoinType' => 'equijoin|left|right|outer|leftsemi|leftanti',
                        'Name' => '<string>',
                    ],
                    'Merge' => [
                        'Inputs' => ['<string>', ...],
                        'Name' => '<string>',
                        'PrimaryKeys' => [
                            ['<string>', ...],
                            // ...
                        ],
                        'Source' => '<string>',
                    ],
                    'MicrosoftSQLServerCatalogSource' => [
                        'Database' => '<string>',
                        'Name' => '<string>',
                        'Table' => '<string>',
                    ],
                    'MicrosoftSQLServerCatalogTarget' => [
                        'Database' => '<string>',
                        'Inputs' => ['<string>', ...],
                        'Name' => '<string>',
                        'Table' => '<string>',
                    ],
                    'MySQLCatalogSource' => [
                        'Database' => '<string>',
                        'Name' => '<string>',
                        'Table' => '<string>',
                    ],
                    'MySQLCatalogTarget' => [
                        'Database' => '<string>',
                        'Inputs' => ['<string>', ...],
                        'Name' => '<string>',
                        'Table' => '<string>',
                    ],
                    'OracleSQLCatalogSource' => [
                        'Database' => '<string>',
                        'Name' => '<string>',
                        'Table' => '<string>',
                    ],
                    'OracleSQLCatalogTarget' => [
                        'Database' => '<string>',
                        'Inputs' => ['<string>', ...],
                        'Name' => '<string>',
                        'Table' => '<string>',
                    ],
                    'PIIDetection' => [
                        'EntityTypesToDetect' => ['<string>', ...],
                        'Inputs' => ['<string>', ...],
                        'MaskValue' => '<string>',
                        'Name' => '<string>',
                        'OutputColumnName' => '<string>',
                        'PiiType' => 'RowAudit|RowMasking|ColumnAudit|ColumnMasking',
                        'SampleFraction' => <float>,
                        'ThresholdFraction' => <float>,
                    ],
                    'PostgreSQLCatalogSource' => [
                        'Database' => '<string>',
                        'Name' => '<string>',
                        'Table' => '<string>',
                    ],
                    'PostgreSQLCatalogTarget' => [
                        'Database' => '<string>',
                        'Inputs' => ['<string>', ...],
                        'Name' => '<string>',
                        'Table' => '<string>',
                    ],
                    'Recipe' => [
                        'Inputs' => ['<string>', ...],
                        'Name' => '<string>',
                        'RecipeReference' => [
                            'RecipeArn' => '<string>',
                            'RecipeVersion' => '<string>',
                        ],
                    ],
                    'RedshiftSource' => [
                        'Database' => '<string>',
                        'Name' => '<string>',
                        'RedshiftTmpDir' => '<string>',
                        'Table' => '<string>',
                        'TmpDirIAMRole' => '<string>',
                    ],
                    'RedshiftTarget' => [
                        'Database' => '<string>',
                        'Inputs' => ['<string>', ...],
                        'Name' => '<string>',
                        'RedshiftTmpDir' => '<string>',
                        'Table' => '<string>',
                        'TmpDirIAMRole' => '<string>',
                        'UpsertRedshiftOptions' => [
                            'ConnectionName' => '<string>',
                            'TableLocation' => '<string>',
                            'UpsertKeys' => ['<string>', ...],
                        ],
                    ],
                    'RelationalCatalogSource' => [
                        'Database' => '<string>',
                        'Name' => '<string>',
                        'Table' => '<string>',
                    ],
                    'RenameField' => [
                        'Inputs' => ['<string>', ...],
                        'Name' => '<string>',
                        'SourcePath' => ['<string>', ...],
                        'TargetPath' => ['<string>', ...],
                    ],
                    'S3CatalogDeltaSource' => [
                        'AdditionalDeltaOptions' => ['<string>', ...],
                        'Database' => '<string>',
                        'Name' => '<string>',
                        'OutputSchemas' => [
                            [
                                'Columns' => [
                                    [
                                        'Name' => '<string>',
                                        'Type' => '<string>',
                                    ],
                                    // ...
                                ],
                            ],
                            // ...
                        ],
                        'Table' => '<string>',
                    ],
                    'S3CatalogHudiSource' => [
                        'AdditionalHudiOptions' => ['<string>', ...],
                        'Database' => '<string>',
                        'Name' => '<string>',
                        'OutputSchemas' => [
                            [
                                'Columns' => [
                                    [
                                        'Name' => '<string>',
                                        'Type' => '<string>',
                                    ],
                                    // ...
                                ],
                            ],
                            // ...
                        ],
                        'Table' => '<string>',
                    ],
                    'S3CatalogSource' => [
                        'AdditionalOptions' => [
                            'BoundedFiles' => <integer>,
                            'BoundedSize' => <integer>,
                        ],
                        'Database' => '<string>',
                        'Name' => '<string>',
                        'PartitionPredicate' => '<string>',
                        'Table' => '<string>',
                    ],
                    'S3CatalogTarget' => [
                        'Database' => '<string>',
                        'Inputs' => ['<string>', ...],
                        'Name' => '<string>',
                        'PartitionKeys' => [
                            ['<string>', ...],
                            // ...
                        ],
                        'SchemaChangePolicy' => [
                            'EnableUpdateCatalog' => true || false,
                            'UpdateBehavior' => 'UPDATE_IN_DATABASE|LOG',
                        ],
                        'Table' => '<string>',
                    ],
                    'S3CsvSource' => [
                        'AdditionalOptions' => [
                            'BoundedFiles' => <integer>,
                            'BoundedSize' => <integer>,
                            'EnableSamplePath' => true || false,
                            'SamplePath' => '<string>',
                        ],
                        'CompressionType' => 'gzip|bzip2',
                        'Escaper' => '<string>',
                        'Exclusions' => ['<string>', ...],
                        'GroupFiles' => '<string>',
                        'GroupSize' => '<string>',
                        'MaxBand' => <integer>,
                        'MaxFilesInBand' => <integer>,
                        'Multiline' => true || false,
                        'Name' => '<string>',
                        'OptimizePerformance' => true || false,
                        'OutputSchemas' => [
                            [
                                'Columns' => [
                                    [
                                        'Name' => '<string>',
                                        'Type' => '<string>',
                                    ],
                                    // ...
                                ],
                            ],
                            // ...
                        ],
                        'Paths' => ['<string>', ...],
                        'QuoteChar' => 'quote|quillemet|single_quote|disabled',
                        'Recurse' => true || false,
                        'Separator' => 'comma|ctrla|pipe|semicolon|tab',
                        'SkipFirst' => true || false,
                        'WithHeader' => true || false,
                        'WriteHeader' => true || false,
                    ],
                    'S3DeltaCatalogTarget' => [
                        'AdditionalOptions' => ['<string>', ...],
                        'Database' => '<string>',
                        'Inputs' => ['<string>', ...],
                        'Name' => '<string>',
                        'PartitionKeys' => [
                            ['<string>', ...],
                            // ...
                        ],
                        'SchemaChangePolicy' => [
                            'EnableUpdateCatalog' => true || false,
                            'UpdateBehavior' => 'UPDATE_IN_DATABASE|LOG',
                        ],
                        'Table' => '<string>',
                    ],
                    'S3DeltaDirectTarget' => [
                        'AdditionalOptions' => ['<string>', ...],
                        'Compression' => 'uncompressed|snappy',
                        'Format' => 'json|csv|avro|orc|parquet|hudi|delta',
                        'Inputs' => ['<string>', ...],
                        'Name' => '<string>',
                        'PartitionKeys' => [
                            ['<string>', ...],
                            // ...
                        ],
                        'Path' => '<string>',
                        'SchemaChangePolicy' => [
                            'Database' => '<string>',
                            'EnableUpdateCatalog' => true || false,
                            'Table' => '<string>',
                            'UpdateBehavior' => 'UPDATE_IN_DATABASE|LOG',
                        ],
                    ],
                    'S3DeltaSource' => [
                        'AdditionalDeltaOptions' => ['<string>', ...],
                        'AdditionalOptions' => [
                            'BoundedFiles' => <integer>,
                            'BoundedSize' => <integer>,
                            'EnableSamplePath' => true || false,
                            'SamplePath' => '<string>',
                        ],
                        'Name' => '<string>',
                        'OutputSchemas' => [
                            [
                                'Columns' => [
                                    [
                                        'Name' => '<string>',
                                        'Type' => '<string>',
                                    ],
                                    // ...
                                ],
                            ],
                            // ...
                        ],
                        'Paths' => ['<string>', ...],
                    ],
                    'S3DirectTarget' => [
                        'Compression' => '<string>',
                        'Format' => 'json|csv|avro|orc|parquet|hudi|delta',
                        'Inputs' => ['<string>', ...],
                        'Name' => '<string>',
                        'PartitionKeys' => [
                            ['<string>', ...],
                            // ...
                        ],
                        'Path' => '<string>',
                        'SchemaChangePolicy' => [
                            'Database' => '<string>',
                            'EnableUpdateCatalog' => true || false,
                            'Table' => '<string>',
                            'UpdateBehavior' => 'UPDATE_IN_DATABASE|LOG',
                        ],
                    ],
                    'S3GlueParquetTarget' => [
                        'Compression' => 'snappy|lzo|gzip|uncompressed|none',
                        'Inputs' => ['<string>', ...],
                        'Name' => '<string>',
                        'PartitionKeys' => [
                            ['<string>', ...],
                            // ...
                        ],
                        'Path' => '<string>',
                        'SchemaChangePolicy' => [
                            'Database' => '<string>',
                            'EnableUpdateCatalog' => true || false,
                            'Table' => '<string>',
                            'UpdateBehavior' => 'UPDATE_IN_DATABASE|LOG',
                        ],
                    ],
                    'S3HudiCatalogTarget' => [
                        'AdditionalOptions' => ['<string>', ...],
                        'Database' => '<string>',
                        'Inputs' => ['<string>', ...],
                        'Name' => '<string>',
                        'PartitionKeys' => [
                            ['<string>', ...],
                            // ...
                        ],
                        'SchemaChangePolicy' => [
                            'EnableUpdateCatalog' => true || false,
                            'UpdateBehavior' => 'UPDATE_IN_DATABASE|LOG',
                        ],
                        'Table' => '<string>',
                    ],
                    'S3HudiDirectTarget' => [
                        'AdditionalOptions' => ['<string>', ...],
                        'Compression' => 'gzip|lzo|uncompressed|snappy',
                        'Format' => 'json|csv|avro|orc|parquet|hudi|delta',
                        'Inputs' => ['<string>', ...],
                        'Name' => '<string>',
                        'PartitionKeys' => [
                            ['<string>', ...],
                            // ...
                        ],
                        'Path' => '<string>',
                        'SchemaChangePolicy' => [
                            'Database' => '<string>',
                            'EnableUpdateCatalog' => true || false,
                            'Table' => '<string>',
                            'UpdateBehavior' => 'UPDATE_IN_DATABASE|LOG',
                        ],
                    ],
                    'S3HudiSource' => [
                        'AdditionalHudiOptions' => ['<string>', ...],
                        'AdditionalOptions' => [
                            'BoundedFiles' => <integer>,
                            'BoundedSize' => <integer>,
                            'EnableSamplePath' => true || false,
                            'SamplePath' => '<string>',
                        ],
                        'Name' => '<string>',
                        'OutputSchemas' => [
                            [
                                'Columns' => [
                                    [
                                        'Name' => '<string>',
                                        'Type' => '<string>',
                                    ],
                                    // ...
                                ],
                            ],
                            // ...
                        ],
                        'Paths' => ['<string>', ...],
                    ],
                    'S3JsonSource' => [
                        'AdditionalOptions' => [
                            'BoundedFiles' => <integer>,
                            'BoundedSize' => <integer>,
                            'EnableSamplePath' => true || false,
                            'SamplePath' => '<string>',
                        ],
                        'CompressionType' => 'gzip|bzip2',
                        'Exclusions' => ['<string>', ...],
                        'GroupFiles' => '<string>',
                        'GroupSize' => '<string>',
                        'JsonPath' => '<string>',
                        'MaxBand' => <integer>,
                        'MaxFilesInBand' => <integer>,
                        'Multiline' => true || false,
                        'Name' => '<string>',
                        'OutputSchemas' => [
                            [
                                'Columns' => [
                                    [
                                        'Name' => '<string>',
                                        'Type' => '<string>',
                                    ],
                                    // ...
                                ],
                            ],
                            // ...
                        ],
                        'Paths' => ['<string>', ...],
                        'Recurse' => true || false,
                    ],
                    'S3ParquetSource' => [
                        'AdditionalOptions' => [
                            'BoundedFiles' => <integer>,
                            'BoundedSize' => <integer>,
                            'EnableSamplePath' => true || false,
                            'SamplePath' => '<string>',
                        ],
                        'CompressionType' => 'snappy|lzo|gzip|uncompressed|none',
                        'Exclusions' => ['<string>', ...],
                        'GroupFiles' => '<string>',
                        'GroupSize' => '<string>',
                        'MaxBand' => <integer>,
                        'MaxFilesInBand' => <integer>,
                        'Name' => '<string>',
                        'OutputSchemas' => [
                            [
                                'Columns' => [
                                    [
                                        'Name' => '<string>',
                                        'Type' => '<string>',
                                    ],
                                    // ...
                                ],
                            ],
                            // ...
                        ],
                        'Paths' => ['<string>', ...],
                        'Recurse' => true || false,
                    ],
                    'SelectFields' => [
                        'Inputs' => ['<string>', ...],
                        'Name' => '<string>',
                        'Paths' => [
                            ['<string>', ...],
                            // ...
                        ],
                    ],
                    'SelectFromCollection' => [
                        'Index' => <integer>,
                        'Inputs' => ['<string>', ...],
                        'Name' => '<string>',
                    ],
                    'SnowflakeSource' => [
                        'Data' => [
                            'Action' => '<string>',
                            'AdditionalOptions' => ['<string>', ...],
                            'AutoPushdown' => true || false,
                            'Connection' => [
                                'Description' => '<string>',
                                'Label' => '<string>',
                                'Value' => '<string>',
                            ],
                            'Database' => '<string>',
                            'IamRole' => [
                                'Description' => '<string>',
                                'Label' => '<string>',
                                'Value' => '<string>',
                            ],
                            'MergeAction' => '<string>',
                            'MergeClause' => '<string>',
                            'MergeWhenMatched' => '<string>',
                            'MergeWhenNotMatched' => '<string>',
                            'PostAction' => '<string>',
                            'PreAction' => '<string>',
                            'SampleQuery' => '<string>',
                            'Schema' => '<string>',
                            'SelectedColumns' => [
                                [
                                    'Description' => '<string>',
                                    'Label' => '<string>',
                                    'Value' => '<string>',
                                ],
                                // ...
                            ],
                            'SourceType' => '<string>',
                            'StagingTable' => '<string>',
                            'Table' => '<string>',
                            'TableSchema' => [
                                [
                                    'Description' => '<string>',
                                    'Label' => '<string>',
                                    'Value' => '<string>',
                                ],
                                // ...
                            ],
                            'TempDir' => '<string>',
                            'Upsert' => true || false,
                        ],
                        'Name' => '<string>',
                        'OutputSchemas' => [
                            [
                                'Columns' => [
                                    [
                                        'Name' => '<string>',
                                        'Type' => '<string>',
                                    ],
                                    // ...
                                ],
                            ],
                            // ...
                        ],
                    ],
                    'SnowflakeTarget' => [
                        'Data' => [
                            'Action' => '<string>',
                            'AdditionalOptions' => ['<string>', ...],
                            'AutoPushdown' => true || false,
                            'Connection' => [
                                'Description' => '<string>',
                                'Label' => '<string>',
                                'Value' => '<string>',
                            ],
                            'Database' => '<string>',
                            'IamRole' => [
                                'Description' => '<string>',
                                'Label' => '<string>',
                                'Value' => '<string>',
                            ],
                            'MergeAction' => '<string>',
                            'MergeClause' => '<string>',
                            'MergeWhenMatched' => '<string>',
                            'MergeWhenNotMatched' => '<string>',
                            'PostAction' => '<string>',
                            'PreAction' => '<string>',
                            'SampleQuery' => '<string>',
                            'Schema' => '<string>',
                            'SelectedColumns' => [
                                [
                                    'Description' => '<string>',
                                    'Label' => '<string>',
                                    'Value' => '<string>',
                                ],
                                // ...
                            ],
                            'SourceType' => '<string>',
                            'StagingTable' => '<string>',
                            'Table' => '<string>',
                            'TableSchema' => [
                                [
                                    'Description' => '<string>',
                                    'Label' => '<string>',
                                    'Value' => '<string>',
                                ],
                                // ...
                            ],
                            'TempDir' => '<string>',
                            'Upsert' => true || false,
                        ],
                        'Inputs' => ['<string>', ...],
                        'Name' => '<string>',
                    ],
                    'SparkConnectorSource' => [
                        'AdditionalOptions' => ['<string>', ...],
                        'ConnectionName' => '<string>',
                        'ConnectionType' => '<string>',
                        'ConnectorName' => '<string>',
                        'Name' => '<string>',
                        'OutputSchemas' => [
                            [
                                'Columns' => [
                                    [
                                        'Name' => '<string>',
                                        'Type' => '<string>',
                                    ],
                                    // ...
                                ],
                            ],
                            // ...
                        ],
                    ],
                    'SparkConnectorTarget' => [
                        'AdditionalOptions' => ['<string>', ...],
                        'ConnectionName' => '<string>',
                        'ConnectionType' => '<string>',
                        'ConnectorName' => '<string>',
                        'Inputs' => ['<string>', ...],
                        'Name' => '<string>',
                        'OutputSchemas' => [
                            [
                                'Columns' => [
                                    [
                                        'Name' => '<string>',
                                        'Type' => '<string>',
                                    ],
                                    // ...
                                ],
                            ],
                            // ...
                        ],
                    ],
                    'SparkSQL' => [
                        'Inputs' => ['<string>', ...],
                        'Name' => '<string>',
                        'OutputSchemas' => [
                            [
                                'Columns' => [
                                    [
                                        'Name' => '<string>',
                                        'Type' => '<string>',
                                    ],
                                    // ...
                                ],
                            ],
                            // ...
                        ],
                        'SqlAliases' => [
                            [
                                'Alias' => '<string>',
                                'From' => '<string>',
                            ],
                            // ...
                        ],
                        'SqlQuery' => '<string>',
                    ],
                    'Spigot' => [
                        'Inputs' => ['<string>', ...],
                        'Name' => '<string>',
                        'Path' => '<string>',
                        'Prob' => <float>,
                        'Topk' => <integer>,
                    ],
                    'SplitFields' => [
                        'Inputs' => ['<string>', ...],
                        'Name' => '<string>',
                        'Paths' => [
                            ['<string>', ...],
                            // ...
                        ],
                    ],
                    'Union' => [
                        'Inputs' => ['<string>', ...],
                        'Name' => '<string>',
                        'UnionType' => 'ALL|DISTINCT',
                    ],
                ],
                // ...
            ],
            'Command' => [
                'Name' => '<string>',
                'PythonVersion' => '<string>',
                'Runtime' => '<string>',
                'ScriptLocation' => '<string>',
            ],
            'Connections' => [
                'Connections' => ['<string>', ...],
            ],
            'CreatedOn' => <DateTime>,
            'DefaultArguments' => ['<string>', ...],
            'Description' => '<string>',
            'ExecutionClass' => 'FLEX|STANDARD',
            'ExecutionProperty' => [
                'MaxConcurrentRuns' => <integer>,
            ],
            'GlueVersion' => '<string>',
            'JobMode' => 'SCRIPT|VISUAL|NOTEBOOK',
            'LastModifiedOn' => <DateTime>,
            'LogUri' => '<string>',
            'MaintenanceWindow' => '<string>',
            'MaxCapacity' => <float>,
            'MaxRetries' => <integer>,
            'Name' => '<string>',
            'NonOverridableArguments' => ['<string>', ...],
            'NotificationProperty' => [
                'NotifyDelayAfter' => <integer>,
            ],
            'NumberOfWorkers' => <integer>,
            'Role' => '<string>',
            'SecurityConfiguration' => '<string>',
            'SourceControlDetails' => [
                'AuthStrategy' => 'PERSONAL_ACCESS_TOKEN|AWS_SECRETS_MANAGER',
                'AuthToken' => '<string>',
                'Branch' => '<string>',
                'Folder' => '<string>',
                'LastCommitId' => '<string>',
                'Owner' => '<string>',
                'Provider' => 'GITHUB|GITLAB|BITBUCKET|AWS_CODE_COMMIT',
                'Repository' => '<string>',
            ],
            'Timeout' => <integer>,
            'WorkerType' => 'Standard|G.1X|G.2X|G.025X|G.4X|G.8X|Z.2X',
        ],
        // ...
    ],
    'JobsNotFound' => ['<string>', ...],
]

Result Details

Members
Jobs
Type: Array of Job structures

A list of job definitions.

JobsNotFound
Type: Array of strings

A list of names of jobs not found.
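As a rough illustration of consuming this result, the sketch below indexes the Jobs list by name and reads JobsNotFound. The helper indexJobsByName and the sample values are hypothetical. A real result would come from $client->batchGetJobs(), whose result object supports the same array-style access.

```php
<?php
/**
 * Hypothetical helper: index the 'Jobs' list by job name.
 *
 * @param array|\ArrayAccess $result A BatchGetJobs result or an array shaped like one.
 */
function indexJobsByName($result): array
{
    $jobs = [];
    foreach ($result['Jobs'] ?? [] as $job) {
        $jobs[$job['Name']] = $job;
    }
    return $jobs;
}

// Sample data shaped like the Result Syntax above (all values are made up).
$result = [
    'Jobs' => [
        ['Name' => 'nightly-etl', 'WorkerType' => 'G.1X', 'NumberOfWorkers' => 10],
    ],
    'JobsNotFound' => ['missing-job'],
];

$jobsByName = indexJobsByName($result);
$missing = $result['JobsNotFound'] ?? [];
```

Checking JobsNotFound explicitly matters because, per the result shape, names that could not be resolved are returned in that list.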

Errors

InternalServiceException:

An internal service error occurred.

OperationTimeoutException:

The operation timed out.

InvalidInputException:

The input provided was not valid.

BatchGetPartition

$result = $client->batchGetPartition([/* ... */]);
$promise = $client->batchGetPartitionAsync([/* ... */]);

Retrieves partitions in a batch request.

Parameter Syntax

$result = $client->batchGetPartition([
    'CatalogId' => '<string>',
    'DatabaseName' => '<string>', // REQUIRED
    'PartitionsToGet' => [ // REQUIRED
        [
            'Values' => ['<string>', ...], // REQUIRED
        ],
        // ...
    ],
    'TableName' => '<string>', // REQUIRED
]);

Parameter Details

Members
CatalogId
Type: string

The ID of the Data Catalog where the partitions in question reside. If none is supplied, the Amazon Web Services account ID is used by default.

DatabaseName
Required: Yes
Type: string

The name of the catalog database where the partitions reside.

PartitionsToGet
Required: Yes
Type: Array of PartitionValueList structures

A list of partition values identifying the partitions to retrieve.

TableName
Required: Yes
Type: string

The name of the table that contains the partitions.
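The PartitionsToGet parameter is a list of structures, each with a single Values list. The following sketch shows one way to build it from plain value lists. The helper buildPartitionsToGet, the database and table names, and the partition values are all hypothetical. It assumes each value list follows the order of the table's partition keys.

```php
<?php
/**
 * Hypothetical helper: wrap each list of partition values in the
 * ['Values' => [...]] structure expected by PartitionsToGet.
 */
function buildPartitionsToGet(array $valueLists): array
{
    return array_map(
        // Partition values are sent as strings, so cast each one.
        fn(array $values) => ['Values' => array_map('strval', array_values($values))],
        array_values($valueLists)
    );
}

$params = [
    'DatabaseName' => 'sales_db',   // assumed name
    'TableName' => 'orders',        // assumed name
    'PartitionsToGet' => buildPartitionsToGet([
        [2024, 1],
        [2024, 2],
    ]),
];

// With a configured client: $result = $client->batchGetPartition($params);
```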

Result Syntax

[
    'Partitions' => [
        [
            'CatalogId' => '<string>',
            'CreationTime' => <DateTime>,
            'DatabaseName' => '<string>',
            'LastAccessTime' => <DateTime>,
            'LastAnalyzedTime' => <DateTime>,
            'Parameters' => ['<string>', ...],
            'StorageDescriptor' => [
                'AdditionalLocations' => ['<string>', ...],
                'BucketColumns' => ['<string>', ...],
                'Columns' => [
                    [
                        'Comment' => '<string>',
                        'Name' => '<string>',
                        'Parameters' => ['<string>', ...],
                        'Type' => '<string>',
                    ],
                    // ...
                ],
                'Compressed' => true || false,
                'InputFormat' => '<string>',
                'Location' => '<string>',
                'NumberOfBuckets' => <integer>,
                'OutputFormat' => '<string>',
                'Parameters' => ['<string>', ...],
                'SchemaReference' => [
                    'SchemaId' => [
                        'RegistryName' => '<string>',
                        'SchemaArn' => '<string>',
                        'SchemaName' => '<string>',
                    ],
                    'SchemaVersionId' => '<string>',
                    'SchemaVersionNumber' => <integer>,
                ],
                'SerdeInfo' => [
                    'Name' => '<string>',
                    'Parameters' => ['<string>', ...],
                    'SerializationLibrary' => '<string>',
                ],
                'SkewedInfo' => [
                    'SkewedColumnNames' => ['<string>', ...],
                    'SkewedColumnValueLocationMaps' => ['<string>', ...],
                    'SkewedColumnValues' => ['<string>', ...],
                ],
                'SortColumns' => [
                    [
                        'Column' => '<string>',
                        'SortOrder' => <integer>,
                    ],
                    // ...
                ],
                'StoredAsSubDirectories' => true || false,
            ],
            'TableName' => '<string>',
            'Values' => ['<string>', ...],
        ],
        // ...
    ],
    'UnprocessedKeys' => [
        [
            'Values' => ['<string>', ...],
        ],
        // ...
    ],
]

Result Details

Members
Partitions
Type: Array of Partition structures

A list of the requested partitions.

UnprocessedKeys
Type: Array of PartitionValueList structures

A list of the partition values in the request for which partitions were not returned.
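A caller typically needs both halves of this result: the partitions that came back and the keys that did not. The sketch below is one possible way to separate them. The helper summarizePartitionResult and the sample data (including the S3 location) are hypothetical.

```php
<?php
/**
 * Hypothetical helper: map returned partitions to their storage location
 * and collect the value lists that were not returned.
 *
 * @param array|\ArrayAccess $result A BatchGetPartition result or an array shaped like one.
 */
function summarizePartitionResult($result): array
{
    $locations = [];
    foreach ($result['Partitions'] ?? [] as $partition) {
        // Join the partition values into a readable key such as "2024/01".
        $key = implode('/', $partition['Values']);
        $locations[$key] = $partition['StorageDescriptor']['Location'] ?? null;
    }

    $unprocessed = [];
    foreach ($result['UnprocessedKeys'] ?? [] as $entry) {
        $unprocessed[] = $entry['Values'];
    }

    return ['locations' => $locations, 'unprocessed' => $unprocessed];
}

// Sample data shaped like the Result Syntax above (all values are made up).
$result = [
    'Partitions' => [
        [
            'Values' => ['2024', '01'],
            'StorageDescriptor' => ['Location' => 's3://example-bucket/orders/2024/01/'],
        ],
    ],
    'UnprocessedKeys' => [
        ['Values' => ['2024', '02']],
    ],
];

$summary = summarizePartitionResult($result);
```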

Errors

InvalidInputException:

The input provided was not valid.

EntityNotFoundException:

A specified entity does not exist.

OperationTimeoutException:

The operation timed out.

InternalServiceException:

An internal service error occurred.

GlueEncryptionException:

An encryption operation failed.

InvalidStateException:

An error that indicates your data is in an invalid state.

FederationSourceException:

A federation source failed.

FederationSourceRetryableException:

A federation source failed, but the operation may be retried.
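Because FederationSourceRetryableException indicates that the operation may be retried, a caller might wrap the request in a small retry loop. The following is a minimal sketch of that idea, not the SDK's own retry mechanism. The function callWithRetry, its parameters, and the simulated failure are hypothetical, and no backoff delay is included.

```php
<?php
/**
 * Hypothetical helper: call $call, retrying only when the error code
 * reported by $errorCodeOf is in $retryableCodes.
 */
function callWithRetry(callable $call, callable $errorCodeOf, array $retryableCodes, int $maxAttempts = 3)
{
    for ($attempt = 1; ; $attempt++) {
        try {
            return $call();
        } catch (Exception $e) {
            $code = $errorCodeOf($e);
            if ($attempt >= $maxAttempts || !in_array($code, $retryableCodes, true)) {
                throw $e; // Not retryable, or out of attempts.
            }
            // A real caller would likely pause here before the next attempt.
        }
    }
}

// Simulated request that fails twice with a retryable code, then succeeds.
$calls = 0;
$flaky = function () use (&$calls) {
    $calls++;
    if ($calls < 3) {
        throw new RuntimeException('FederationSourceRetryableException');
    }
    return 'ok';
};

// In this simulation the exception message carries the error code.
$codeOf = fn(Exception $e) => $e->getMessage();

$value = callWithRetry($flaky, $codeOf, ['FederationSourceRetryableException']);
```

With the SDK, $call would wrap $client->batchGetPartition($params), and $errorCodeOf could return $e->getAwsErrorCode() when $e is an Aws\Exception\AwsException.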

BatchGetTableOptimizer

$result = $client->batchGetTableOptimizer([/* ... */]);
$promise = $client->batchGetTableOptimizerAsync([/* ... */]);

Returns the configuration for the specified table optimizers.

Parameter Syntax

$result = $client->batchGetTableOptimizer([
    'Entries' => [ // REQUIRED
        [
            'catalogId' => '<string>',
            'databaseName' => '<string>',
            'tableName' => '<string>',
            'type' => 'compaction',
        ],
        // ...
    ],
]);

Parameter Details

Members
Entries
Required: Yes
Type: Array of BatchGetTableOptimizerEntry structures

A list of BatchGetTableOptimizerEntry objects specifying the table optimizers to retrieve.
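Note that the entry keys for this operation are lower camel case (catalogId, databaseName, tableName, type), unlike most other Glue parameters. The sketch below builds the Entries list for several tables in one database. The helper buildOptimizerEntries, the account ID, and the names are hypothetical.

```php
<?php
/**
 * Hypothetical helper: build one BatchGetTableOptimizerEntry per table name.
 */
function buildOptimizerEntries(string $catalogId, string $databaseName, array $tableNames): array
{
    $entries = [];
    foreach ($tableNames as $tableName) {
        $entries[] = [
            'catalogId' => $catalogId,
            'databaseName' => $databaseName,
            'tableName' => $tableName,
            'type' => 'compaction', // The only value shown in the Parameter Syntax above.
        ];
    }
    return $entries;
}

$params = [
    'Entries' => buildOptimizerEntries('123456789012', 'sales_db', ['orders', 'customers']),
];

// With a configured client: $result = $client->batchGetTableOptimizer($params);
```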

Result Syntax

[
    'Failures' => [
        [
            'catalogId' => '<string>',
            'databaseName' => '<string>',
            'error' => [
                'ErrorCode' => '<string>',
                'ErrorMessage' => '<string>',
            ],
            'tableName' => '<string>',
            'type' => 'compaction',
        ],
        // ...
    ],
    'TableOptimizers' => [
        [
            'catalogId' => '<string>',
            'databaseName' => '<string>',
            'tableName' => '<string>',
            'tableOptimizer' => [
                'configuration' => [
                    'enabled' => true || false,
                    'roleArn' => '<string>',
                ],
                'lastRun' => [
                    'endTimestamp' => <DateTime>,
                    'error' => '<string>',
                    'eventType' => 'starting|completed|failed|in_progress',
                    'metrics' => [
                        'JobDurationInHour' => '<string>',
                        'NumberOfBytesCompacted' => '<string>',
                        'NumberOfDpus' => '<string>',
                        'NumberOfFilesCompacted' => '<string>',
                    ],
                    'startTimestamp' => <DateTime>,
                ],
                'type' => 'compaction',
            ],
        ],
        // ...
    ],
]

Result Details

Members
Failures
Type: Array of BatchGetTableOptimizerError structures

A list of errors from the operation.

TableOptimizers
Type: Array of BatchTableOptimizer structures

A list of BatchTableOptimizer objects.
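Since a single response can contain both successes and per-table failures, it can help to split them before acting on either. The following sketch shows one way to do that. The helper splitOptimizerResult and the sample values (including the error code and message) are hypothetical.

```php
<?php
/**
 * Hypothetical helper: return [enabled flags, error messages],
 * each keyed by "database.table".
 *
 * @param array|\ArrayAccess $result A BatchGetTableOptimizer result or an array shaped like one.
 */
function splitOptimizerResult($result): array
{
    $enabled = [];
    foreach ($result['TableOptimizers'] ?? [] as $item) {
        $name = $item['databaseName'] . '.' . $item['tableName'];
        $enabled[$name] = $item['tableOptimizer']['configuration']['enabled'] ?? false;
    }

    $errors = [];
    foreach ($result['Failures'] ?? [] as $failure) {
        $name = $failure['databaseName'] . '.' . $failure['tableName'];
        $errors[$name] = $failure['error']['ErrorMessage'] ?? 'unknown error';
    }

    return [$enabled, $errors];
}

// Sample data shaped like the Result Syntax above (all values are made up).
$result = [
    'Failures' => [
        [
            'catalogId' => '123456789012',
            'databaseName' => 'sales_db',
            'tableName' => 'customers',
            'type' => 'compaction',
            'error' => ['ErrorCode' => 'EntityNotFoundException', 'ErrorMessage' => 'Optimizer not found'],
        ],
    ],
    'TableOptimizers' => [
        [
            'catalogId' => '123456789012',
            'databaseName' => 'sales_db',
            'tableName' => 'orders',
            'tableOptimizer' => [
                'type' => 'compaction',
                'configuration' => ['enabled' => true, 'roleArn' => 'arn:aws:iam::123456789012:role/example'],
            ],
        ],
    ],
];

[$enabled, $errors] = splitOptimizerResult($result);
```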

Errors

InternalServiceException:

An internal service error occurred.

BatchGetTriggers

$result = $client->batchGetTriggers([/* ... */]);
$promise = $client->batchGetTriggersAsync([/* ... */]);

Returns a list of resource metadata for a given list of trigger names. After calling the ListTriggers operation, you can call this operation to access the data to which you have been granted permissions. This operation supports all IAM permissions, including permission conditions that use tags.

Parameter Syntax

$result = $client->batchGetTriggers([
    'TriggerNames' => ['<string>', ...], // REQUIRED
]);

Parameter Details

Members
TriggerNames
Required: Yes
Type: Array of strings

A list of trigger names, which may be the names returned from the ListTriggers operation.

Result Syntax

[
    'Triggers' => [
        [
            'Actions' => [
                [
                    'Arguments' => ['<string>', ...],
                    'CrawlerName' => '<string>',
                    'JobName' => '<string>',
                    'NotificationProperty' => [
                        'NotifyDelayAfter' => <integer>,
                    ],
                    'SecurityConfiguration' => '<string>',
                    'Timeout' => <integer>,
                ],
                // ...
            ],
            'Description' => '<string>',
            'EventBatchingCondition' => [
                'BatchSize' => <integer>,
                'BatchWindow' => <integer>,
            ],
            'Id' => '<string>',
            'Name' => '<string>',
            'Predicate' => [
                'Conditions' => [
                    [
                        'CrawlState' => 'RUNNING|CANCELLING|CANCELLED|SUCCEEDED|FAILED|ERROR',
                        'CrawlerName' => '<string>',
                        'JobName' => '<string>',
                        'LogicalOperator' => 'EQUALS',
                        'State' => 'STARTING|RUNNING|STOPPING|STOPPED|SUCCEEDED|FAILED|TIMEOUT|ERROR|WAITING|EXPIRED',
                    ],
                    // ...
                ],
                'Logical' => 'AND|ANY',
            ],
            'Schedule' => '<string>',
            'State' => 'CREATING|CREATED|ACTIVATING|ACTIVATED|DEACTIVATING|DEACTIVATED|DELETING|UPDATING',
            'Type' => 'SCHEDULED|CONDITIONAL|ON_DEMAND|EVENT',
            'WorkflowName' => '<string>',
        ],
        // ...
    ],
    'TriggersNotFound' => ['<string>', ...],
]

Result Details

Members
Triggers
Type: Array of Trigger structures

A list of trigger definitions.

TriggersNotFound
Type: Array of strings

A list of names of triggers not found.
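One common use of this result is to find which triggers are currently active. The sketch below groups the names of triggers in the ACTIVATED state by their Type. The helper activeTriggerNamesByType and the sample triggers are hypothetical.

```php
<?php
/**
 * Hypothetical helper: group the names of ACTIVATED triggers by trigger type.
 *
 * @param array|\ArrayAccess $result A BatchGetTriggers result or an array shaped like one.
 */
function activeTriggerNamesByType($result): array
{
    $byType = [];
    foreach ($result['Triggers'] ?? [] as $trigger) {
        if (($trigger['State'] ?? null) !== 'ACTIVATED') {
            continue; // Skip triggers in any other state.
        }
        $byType[$trigger['Type']][] = $trigger['Name'];
    }
    return $byType;
}

// Sample data shaped like the Result Syntax above (all values are made up).
$result = [
    'Triggers' => [
        ['Name' => 'nightly', 'Type' => 'SCHEDULED', 'State' => 'ACTIVATED'],
        ['Name' => 'paused', 'Type' => 'CONDITIONAL', 'State' => 'DEACTIVATED'],
        ['Name' => 'on-upload', 'Type' => 'EVENT', 'State' => 'ACTIVATED'],
    ],
    'TriggersNotFound' => ['old-trigger'],
];

$active = activeTriggerNamesByType($result);
$notFound = $result['TriggersNotFound'] ?? [];
```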

Errors

InternalServiceException:

An internal service error occurred.

OperationTimeoutException:

The operation timed out.

InvalidInputException:

The input provided was not valid.

BatchGetWorkflows

$result = $client->batchGetWorkflows([/* ... */]);
$promise = $client->batchGetWorkflowsAsync([/* ... */]);

Returns a list of resource metadata for a given list of workflow names. After calling the ListWorkflows operation, you can call this operation to access the data to which you have been granted permissions. This operation supports all IAM permissions, including permission conditions that use tags.

Parameter Syntax

$result = $client->batchGetWorkflows([
    'IncludeGraph' => true || false,
    'Names' => ['<string>', ...], // REQUIRED
]);

Parameter Details

Members
IncludeGraph
Type: boolean

Specifies whether to include a graph when returning the workflow resource metadata.

Names
Required: Yes
Type: Array of strings

A list of workflow names, which may be the names returned from the ListWorkflows operation.

Result Syntax

[
    'MissingWorkflows' => ['<string>', ...],
    'Workflows' => [
        [
            'BlueprintDetails' => [
                'BlueprintName' => '<string>',
                'RunId' => '<string>',
            ],
            'CreatedOn' => <DateTime>,
            'DefaultRunProperties' => ['<string>', ...],
            'Description' => '<string>',
            'Graph' => [
                'Edges' => [
                    [
                        'DestinationId' => '<string>',
                        'SourceId' => '<string>',
                    ],
                    // ...
                ],
                'Nodes' => [
                    [
                        'CrawlerDetails' => [
                            'Crawls' => [
                                [
                                    'CompletedOn' => <DateTime>,
                                    'ErrorMessage' => '<string>',
                                    'LogGroup' => '<string>',
                                    'LogStream' => '<string>',
                                    'StartedOn' => <DateTime>,
                                    'State' => 'RUNNING|CANCELLING|CANCELLED|SUCCEEDED|FAILED|ERROR',
                                ],
                                // ...
                            ],
                        ],
                        'JobDetails' => [
                            'JobRuns' => [
                                [
                                    'AllocatedCapacity' => <integer>,
                                    'Arguments' => ['<string>', ...],
                                    'Attempt' => <integer>,
                                    'CompletedOn' => <DateTime>,
                                    'DPUSeconds' => <float>,
                                    'ErrorMessage' => '<string>',
                                    'ExecutionClass' => 'FLEX|STANDARD',
                                    'ExecutionTime' => <integer>,
                                    'GlueVersion' => '<string>',
                                    'Id' => '<string>',
                                    'JobMode' => 'SCRIPT|VISUAL|NOTEBOOK',
                                    'JobName' => '<string>',
                                    'JobRunState' => 'STARTING|RUNNING|STOPPING|STOPPED|SUCCEEDED|FAILED|TIMEOUT|ERROR|WAITING|EXPIRED',
                                    'LastModifiedOn' => <DateTime>,
                                    'LogGroupName' => '<string>',
                                    'MaintenanceWindow' => '<string>',
                                    'MaxCapacity' => <float>,
                                    'NotificationProperty' => [
                                        'NotifyDelayAfter' => <integer>,
                                    ],
                                    'NumberOfWorkers' => <integer>,
                                    'PredecessorRuns' => [
                                        [
                                            'JobName' => '<string>',
                                            'RunId' => '<string>',
                                        ],
                                        // ...
                                    ],
                                    'PreviousRunId' => '<string>',
                                    'SecurityConfiguration' => '<string>',
                                    'StartedOn' => <DateTime>,
                                    'Timeout' => <integer>,
                                    'TriggerName' => '<string>',
                                    'WorkerType' => 'Standard|G.1X|G.2X|G.025X|G.4X|G.8X|Z.2X',
                                ],
                                // ...
                            ],
                        ],
                        'Name' => '<string>',
                        'TriggerDetails' => [
                            'Trigger' => [
                                'Actions' => [
                                    [
                                        'Arguments' => ['<string>', ...],
                                        'CrawlerName' => '<string>',
                                        'JobName' => '<string>',
                                        'NotificationProperty' => [
                                            'NotifyDelayAfter' => <integer>,
                                        ],
                                        'SecurityConfiguration' => '<string>',
                                        'Timeout' => <integer>,
                                    ],
                                    // ...
                                ],
                                'Description' => '<string>',
                                'EventBatchingCondition' => [
                                    'BatchSize' => <integer>,
                                    'BatchWindow' => <integer>,
                                ],
                                'Id' => '<string>',
                                'Name' => '<string>',
                                'Predicate' => [
                                    'Conditions' => [
                                        [
                                            'CrawlState' => 'RUNNING|CANCELLING|CANCELLED|SUCCEEDED|FAILED|ERROR',
                                            'CrawlerName' => '<string>',
                                            'JobName' => '<string>',
                                            'LogicalOperator' => 'EQUALS',
                                            'State' => 'STARTING|RUNNING|STOPPING|STOPPED|SUCCEEDED|FAILED|TIMEOUT|ERROR|WAITING|EXPIRED',
                                        ],
                                        // ...
                                    ],
                                    'Logical' => 'AND|ANY',
                                ],
                                'Schedule' => '<string>',
                                'State' => 'CREATING|CREATED|ACTIVATING|ACTIVATED|DEACTIVATING|DEACTIVATED|DELETING|UPDATING',
                                'Type' => 'SCHEDULED|CONDITIONAL|ON_DEMAND|EVENT',
                                'WorkflowName' => '<string>',
                            ],
                        ],
                        'Type' => 'CRAWLER|JOB|TRIGGER',
                        'UniqueId' => '<string>',
                    ],
                    // ...
                ],
            ],
            'LastModifiedOn' => <DateTime>,
            'LastRun' => [
                'CompletedOn' => <DateTime>,
                'ErrorMessage' => '<string>',
                'Graph' => [
                    'Edges' => [
                        [
                            'DestinationId' => '<string>',
                            'SourceId' => '<string>',
                        ],
                        // ...
                    ],
                    'Nodes' => [
                        [
                            'CrawlerDetails' => [
                                'Crawls' => [
                                    [
                                        'CompletedOn' => <DateTime>,
                                        'ErrorMessage' => '<string>',
                                        'LogGroup' => '<string>',
                                        'LogStream' => '<string>',
                                        'StartedOn' => <DateTime>,
                                        'State' => 'RUNNING|CANCELLING|CANCELLED|SUCCEEDED|FAILED|ERROR',
                                    ],
                                    // ...
                                ],
                            ],
                            'JobDetails' => [
                                'JobRuns' => [
                                    [
                                        'AllocatedCapacity' => <integer>,
                                        'Arguments' => ['<string>', ...],
                                        'Attempt' => <integer>,
                                        'CompletedOn' => <DateTime>,
                                        'DPUSeconds' => <float>,
                                        'ErrorMessage' => '<string>',
                                        'ExecutionClass' => 'FLEX|STANDARD',
                                        'ExecutionTime' => <integer>,
                                        'GlueVersion' => '<string>',
                                        'Id' => '<string>',
                                        'JobMode' => 'SCRIPT|VISUAL|NOTEBOOK',
                                        'JobName' => '<string>',
                                        'JobRunState' => 'STARTING|RUNNING|STOPPING|STOPPED|SUCCEEDED|FAILED|TIMEOUT|ERROR|WAITING|EXPIRED',
                                        'LastModifiedOn' => <DateTime>,
                                        'LogGroupName' => '<string>',
                                        'MaintenanceWindow' => '<string>',
                                        'MaxCapacity' => <float>,
                                        'NotificationProperty' => [
                                            'NotifyDelayAfter' => <integer>,
                                        ],
                                        'NumberOfWorkers' => <integer>,
                                        'PredecessorRuns' => [
                                            [
                                                'JobName' => '<string>',
                                                'RunId' => '<string>',
                                            ],
                                            // ...
                                        ],
                                        'PreviousRunId' => '<string>',
                                        'SecurityConfiguration' => '<string>',
                                        'StartedOn' => <DateTime>,
                                        'Timeout' => <integer>,
                                        'TriggerName' => '<string>',
                                        'WorkerType' => 'Standard|G.1X|G.2X|G.025X|G.4X|G.8X|Z.2X',
                                    ],
                                    // ...
                                ],
                            ],
                            'Name' => '<string>',
                            'TriggerDetails' => [
                                'Trigger' => [
                                    'Actions' => [
                                        [
                                            'Arguments' => ['<string>', ...],
                                            'CrawlerName' => '<string>',
                                            'JobName' => '<string>',
                                            'NotificationProperty' => [
                                                'NotifyDelayAfter' => <integer>,
                                            ],
                                            'SecurityConfiguration' => '<string>',
                                            'Timeout' => <integer>,
                                        ],
                                        // ...
                                    ],
                                    'Description' => '<string>',
                                    'EventBatchingCondition' => [
                                        'BatchSize' => <integer>,
                                        'BatchWindow' => <integer>,
                                    ],
                                    'Id' => '<string>',
                                    'Name' => '<string>',
                                    'Predicate' => [
                                        'Conditions' => [
                                            [
                                                'CrawlState' => 'RUNNING|CANCELLING|CANCELLED|SUCCEEDED|FAILED|ERROR',
                                                'CrawlerName' => '<string>',
                                                'JobName' => '<string>',
                                                'LogicalOperator' => 'EQUALS',
                                                'State' => 'STARTING|RUNNING|STOPPING|STOPPED|SUCCEEDED|FAILED|TIMEOUT|ERROR|WAITING|EXPIRED',
                                            ],
                                            // ...
                                        ],
                                        'Logical' => 'AND|ANY',
                                    ],
                                    'Schedule' => '<string>',
                                    'State' => 'CREATING|CREATED|ACTIVATING|ACTIVATED|DEACTIVATING|DEACTIVATED|DELETING|UPDATING',
                                    'Type' => 'SCHEDULED|CONDITIONAL|ON_DEMAND|EVENT',
                                    'WorkflowName' => '<string>',
                                ],
                            ],
                            'Type' => 'CRAWLER|JOB|TRIGGER',
                            'UniqueId' => '<string>',
                        ],
                        // ...
                    ],
                ],
                'Name' => '<string>',
                'PreviousRunId' => '<string>',
                'StartedOn' => <DateTime>,
                'StartingEventBatchCondition' => [
                    'BatchSize' => <integer>,
                    'BatchWindow' => <integer>,
                ],
                'Statistics' => [
                    'ErroredActions' => <integer>,
                    'FailedActions' => <integer>,
                    'RunningActions' => <integer>,
                    'StoppedActions' => <integer>,
                    'SucceededActions' => <integer>,
                    'TimeoutActions' => <integer>,
                    'TotalActions' => <integer>,
                    'WaitingActions' => <integer>,
                ],
                'Status' => 'RUNNING|COMPLETED|STOPPING|STOPPED|ERROR',
                'WorkflowRunId' => '<string>',
                'WorkflowRunProperties' => ['<string>', ...],
            ],
            'MaxConcurrentRuns' => <integer>,
            'Name' => '<string>',
        ],
        // ...
    ],
]

Result Details

Members
MissingWorkflows
Type: Array of strings

A list of names of workflows not found.

Workflows
Type: Array of Workflow structures

A list of workflow resource metadata.

Errors

InternalServiceException:

An internal service error occurred.

OperationTimeoutException:

The operation timed out.

InvalidInputException:

The input provided was not valid.
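
As a rough sketch of how the two result members might be read together, the helper below returns the workflow names that were found alongside the names reported missing. The function name splitWorkflowLookup and the sample data are invented for illustration. It works on a plain array shaped like the Result Syntax above; with a real Aws\Result object you would likely pass $result->toArray().

```php
<?php
// Hypothetical helper, not part of the SDK.
function splitWorkflowLookup(array $result): array
{
    return [
        // Names of the workflows whose metadata was returned.
        'found' => array_column($result['Workflows'] ?? [], 'Name'),
        // Names that the service reported as not found.
        'missing' => array_values($result['MissingWorkflows'] ?? []),
    ];
}

// Invented sample data standing in for a real response.
$lookup = splitWorkflowLookup([
    'MissingWorkflows' => ['retired-flow'],
    'Workflows' => [
        ['Name' => 'nightly-etl', 'MaxConcurrentRuns' => 1],
        ['Name' => 'hourly-sync', 'MaxConcurrentRuns' => 2],
    ],
]);
```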

BatchStopJobRun

$result = $client->batchStopJobRun([/* ... */]);
$promise = $client->batchStopJobRunAsync([/* ... */]);

Stops one or more job runs for a specified job definition.

Parameter Syntax

$result = $client->batchStopJobRun([
    'JobName' => '<string>', // REQUIRED
    'JobRunIds' => ['<string>', ...], // REQUIRED
]);

Parameter Details

Members
JobName
Required: Yes
Type: string

The name of the job definition for which to stop job runs.

JobRunIds
Required: Yes
Type: Array of strings

A list of the JobRunIds that should be stopped for that job definition.

Result Syntax

[
    'Errors' => [
        [
            'ErrorDetail' => [
                'ErrorCode' => '<string>',
                'ErrorMessage' => '<string>',
            ],
            'JobName' => '<string>',
            'JobRunId' => '<string>',
        ],
        // ...
    ],
    'SuccessfulSubmissions' => [
        [
            'JobName' => '<string>',
            'JobRunId' => '<string>',
        ],
        // ...
    ],
]

Result Details

Members
Errors
Type: Array of BatchStopJobRunError structures

A list of the errors that were encountered in trying to stop JobRuns, including the JobRunId for which each error was encountered and details about the error.

SuccessfulSubmissions
Type: Array of BatchStopJobRunSuccessfulSubmission structures

A list of the JobRuns that were successfully submitted for stopping.

Errors

InvalidInputException:

The input provided was not valid.

InternalServiceException:

An internal service error occurred.

OperationTimeoutException:

The operation timed out.
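
Because a single call can partly succeed, it may help to split the result into runs that were submitted for stopping and runs that failed. The following is a minimal sketch under that assumption. The helper name summarizeBatchStop and the sample job names and run IDs are hypothetical. The input is a plain array shaped like the Result Syntax above (for an Aws\Result object, $result->toArray() should give the same shape).

```php
<?php
// Hypothetical helper, not part of the SDK.
function summarizeBatchStop(array $result): array
{
    // Run IDs that were accepted for stopping.
    $stopped = array_column($result['SuccessfulSubmissions'] ?? [], 'JobRunId');

    // Map each failed run ID to "ErrorCode: ErrorMessage".
    $failed = [];
    foreach ($result['Errors'] ?? [] as $error) {
        $failed[$error['JobRunId']] = sprintf(
            '%s: %s',
            $error['ErrorDetail']['ErrorCode'] ?? 'Unknown',
            $error['ErrorDetail']['ErrorMessage'] ?? ''
        );
    }

    return ['stopped' => $stopped, 'failed' => $failed];
}

// Invented sample data standing in for a real response.
$summary = summarizeBatchStop([
    'Errors' => [[
        'ErrorDetail' => [
            'ErrorCode' => 'EntityNotFoundException',
            'ErrorMessage' => 'Run not found',
        ],
        'JobName' => 'nightly-etl',
        'JobRunId' => 'jr_2',
    ]],
    'SuccessfulSubmissions' => [
        ['JobName' => 'nightly-etl', 'JobRunId' => 'jr_1'],
    ],
]);
```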

BatchUpdatePartition

$result = $client->batchUpdatePartition([/* ... */]);
$promise = $client->batchUpdatePartitionAsync([/* ... */]);

Updates one or more partitions in a batch operation.

Parameter Syntax

$result = $client->batchUpdatePartition([
    'CatalogId' => '<string>',
    'DatabaseName' => '<string>', // REQUIRED
    'Entries' => [ // REQUIRED
        [
            'PartitionInput' => [ // REQUIRED
                'LastAccessTime' => <integer || string || DateTime>,
                'LastAnalyzedTime' => <integer || string || DateTime>,
                'Parameters' => ['<string>', ...],
                'StorageDescriptor' => [
                    'AdditionalLocations' => ['<string>', ...],
                    'BucketColumns' => ['<string>', ...],
                    'Columns' => [
                        [
                            'Comment' => '<string>',
                            'Name' => '<string>', // REQUIRED
                            'Parameters' => ['<string>', ...],
                            'Type' => '<string>',
                        ],
                        // ...
                    ],
                    'Compressed' => true || false,
                    'InputFormat' => '<string>',
                    'Location' => '<string>',
                    'NumberOfBuckets' => <integer>,
                    'OutputFormat' => '<string>',
                    'Parameters' => ['<string>', ...],
                    'SchemaReference' => [
                        'SchemaId' => [
                            'RegistryName' => '<string>',
                            'SchemaArn' => '<string>',
                            'SchemaName' => '<string>',
                        ],
                        'SchemaVersionId' => '<string>',
                        'SchemaVersionNumber' => <integer>,
                    ],
                    'SerdeInfo' => [
                        'Name' => '<string>',
                        'Parameters' => ['<string>', ...],
                        'SerializationLibrary' => '<string>',
                    ],
                    'SkewedInfo' => [
                        'SkewedColumnNames' => ['<string>', ...],
                        'SkewedColumnValueLocationMaps' => ['<string>', ...],
                        'SkewedColumnValues' => ['<string>', ...],
                    ],
                    'SortColumns' => [
                        [
                            'Column' => '<string>', // REQUIRED
                            'SortOrder' => <integer>, // REQUIRED
                        ],
                        // ...
                    ],
                    'StoredAsSubDirectories' => true || false,
                ],
                'Values' => ['<string>', ...],
            ],
            'PartitionValueList' => ['<string>', ...], // REQUIRED
        ],
        // ...
    ],
    'TableName' => '<string>', // REQUIRED
]);

Parameter Details

Members
CatalogId
Type: string

The ID of the catalog in which the partition is to be updated. Currently, this should be the Amazon Web Services account ID.

DatabaseName
Required: Yes
Type: string

The name of the metadata database in which the partition is to be updated.

Entries
Required: Yes
Type: Array of BatchUpdatePartitionRequestEntry structures

A list of up to 100 BatchUpdatePartitionRequestEntry objects to update.

TableName
Required: Yes
Type: string

The name of the metadata table in which the partition is to be updated.

Result Syntax

[
    'Errors' => [
        [
            'ErrorDetail' => [
                'ErrorCode' => '<string>',
                'ErrorMessage' => '<string>',
            ],
            'PartitionValueList' => ['<string>', ...],
        ],
        // ...
    ],
]

Result Details

Members
Errors
Type: Array of BatchUpdatePartitionFailureEntry structures

The errors encountered when trying to update the requested partitions. A list of BatchUpdatePartitionFailureEntry objects.

Errors

InvalidInputException:

The input provided was not valid.

EntityNotFoundException:

A specified entity does not exist.

OperationTimeoutException:

The operation timed out.

InternalServiceException:

An internal service error occurred.

GlueEncryptionException:

An encryption operation failed.
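
Since Entries accepts at most 100 items, a larger update has to be split across several calls. The sketch below shows one way to do that with array_chunk. The helper name buildPartitionUpdateRequests, the database and table names, and the partition values are all made up for illustration.

```php
<?php
// Hypothetical helper: builds one request array per chunk of at most
// 100 entries, the documented limit for the Entries parameter.
function buildPartitionUpdateRequests(string $database, string $table, array $entries): array
{
    $requests = [];
    foreach (array_chunk($entries, 100) as $chunk) {
        $requests[] = [
            'DatabaseName' => $database,
            'TableName' => $table,
            'Entries' => $chunk,
        ];
    }
    return $requests;
}

// 250 invented entries, each keeping its partition key values unchanged.
$entries = [];
for ($day = 1; $day <= 250; $day++) {
    $value = sprintf('day-%03d', $day);
    $entries[] = [
        'PartitionValueList' => [$value],            // identifies the partition to update
        'PartitionInput' => ['Values' => [$value]],  // new partition definition
    ];
}

$requests = buildPartitionUpdateRequests('sales_db', 'orders', $entries);
```

Each element of $requests could then be passed to $client->batchUpdatePartition($request), checking the Errors member of every result, since individual entries can fail even when the call itself succeeds.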

CancelDataQualityRuleRecommendationRun

$result = $client->cancelDataQualityRuleRecommendationRun([/* ... */]);
$promise = $client->cancelDataQualityRuleRecommendationRunAsync([/* ... */]);

Cancels the specified recommendation run that was being used to generate rules.

Parameter Syntax

$result = $client->cancelDataQualityRuleRecommendationRun([
    'RunId' => '<string>', // REQUIRED
]);

Parameter Details

Members
RunId
Required: Yes
Type: string

The unique run identifier associated with this run.

Result Syntax

[]

Result Details

The results for this operation are always empty.

Errors

EntityNotFoundException:

A specified entity does not exist.

InvalidInputException:

The input provided was not valid.

OperationTimeoutException:

The operation timed out.

InternalServiceException:

An internal service error occurred.

CancelDataQualityRulesetEvaluationRun

$result = $client->cancelDataQualityRulesetEvaluationRun([/* ... */]);
$promise = $client->cancelDataQualityRulesetEvaluationRunAsync([/* ... */]);

Cancels a run where a ruleset is being evaluated against a data source.

Parameter Syntax

$result = $client->cancelDataQualityRulesetEvaluationRun([
    'RunId' => '<string>', // REQUIRED
]);

Parameter Details

Members
RunId
Required: Yes
Type: string

The unique run identifier associated with this run.

Result Syntax

[]

Result Details

The results for this operation are always empty.

Errors

EntityNotFoundException:

A specified entity does not exist.

InvalidInputException:

The input provided was not valid.

OperationTimeoutException:

The operation timed out.

InternalServiceException:

An internal service error occurred.

CancelMLTaskRun

$result = $client->cancelMLTaskRun([/* ... */]);
$promise = $client->cancelMLTaskRunAsync([/* ... */]);

Cancels (stops) a task run. Machine learning task runs are asynchronous tasks that Glue runs on your behalf as part of various machine learning workflows. You can cancel a machine learning task run at any time by calling CancelMLTaskRun with the TransformId of the task run's parent transform and the task run's TaskRunId.

Parameter Syntax

$result = $client->cancelMLTaskRun([
    'TaskRunId' => '<string>', // REQUIRED
    'TransformId' => '<string>', // REQUIRED
]);

Parameter Details

Members
TaskRunId
Required: Yes
Type: string

A unique identifier for the task run.

TransformId
Required: Yes
Type: string

The unique identifier of the machine learning transform.

Result Syntax

[
    'Status' => 'STARTING|RUNNING|STOPPING|STOPPED|SUCCEEDED|FAILED|TIMEOUT',
    'TaskRunId' => '<string>',
    'TransformId' => '<string>',
]

Result Details

Members
Status
Type: string

The status for this run.

TaskRunId
Type: string

The unique identifier for the task run.

TransformId
Type: string

The unique identifier of the machine learning transform.

Errors

EntityNotFoundException:

A specified entity does not exist.

InvalidInputException:

The input provided was not valid.

OperationTimeoutException:

The operation timed out.

InternalServiceException:

An internal service error occurred.
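
The returned Status may still be a transitional value such as STOPPING right after the call. The sketch below shows one way to tell whether a task run has finished. Treating STOPPED, SUCCEEDED, FAILED, and TIMEOUT as final states is an assumption made here, not something this page states, and isTaskRunFinished is a hypothetical helper name.

```php
<?php
// Hypothetical helper. Assumption: these four statuses are final,
// while STARTING, RUNNING, and STOPPING mean the run is still in progress.
function isTaskRunFinished(array $result): bool
{
    $finalStatuses = ['STOPPED', 'SUCCEEDED', 'FAILED', 'TIMEOUT'];
    return in_array($result['Status'] ?? '', $finalStatuses, true);
}

// Invented sample results standing in for real responses.
$stillStopping = isTaskRunFinished(['Status' => 'STOPPING', 'TaskRunId' => 'tr-1', 'TransformId' => 'tfm-1']);
$alreadyStopped = isTaskRunFinished(['Status' => 'STOPPED', 'TaskRunId' => 'tr-1', 'TransformId' => 'tfm-1']);
```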

CancelStatement

$result = $client->cancelStatement([/* ... */]);
$promise = $client->cancelStatementAsync([/* ... */]);

Cancels the statement.

Parameter Syntax

$result = $client->cancelStatement([
    'Id' => <integer>, // REQUIRED
    'RequestOrigin' => '<string>',
    'SessionId' => '<string>', // REQUIRED
]);

Parameter Details

Members
Id
Required: Yes
Type: int

The ID of the statement to be cancelled.

RequestOrigin
Type: string

The origin of the request to cancel the statement.

SessionId
Required: Yes
Type: string

The session ID of the statement to be cancelled.

Result Syntax

[]

Result Details

The results for this operation are always empty.

Errors

AccessDeniedException:

Access to a resource was denied.

EntityNotFoundException:

A specified entity does not exist.

InternalServiceException:

An internal service error occurred.

OperationTimeoutException:

The operation timed out.

InvalidInputException:

The input provided was not valid.

IllegalSessionStateException:

The session is in an invalid state to perform a requested operation.

CheckSchemaVersionValidity

$result = $client->checkSchemaVersionValidity([/* ... */]);
$promise = $client->checkSchemaVersionValidityAsync([/* ... */]);

Validates the supplied schema. This call has no side effects; it simply validates the supplied schema using DataFormat as the format. Since it does not take a schema set name, no compatibility checks are performed.

Parameter Syntax

$result = $client->checkSchemaVersionValidity([
    'DataFormat' => 'AVRO|JSON|PROTOBUF', // REQUIRED
    'SchemaDefinition' => '<string>', // REQUIRED
]);

Parameter Details

Members
DataFormat
Required: Yes
Type: string

The data format of the schema definition. Currently AVRO, JSON and PROTOBUF are supported.

SchemaDefinition
Required: Yes
Type: string

The definition of the schema that has to be validated.

Result Syntax

[
    'Error' => '<string>',
    'Valid' => true || false,
]

Result Details

Members
Error
Type: string

A validation failure error message.

Valid
Type: boolean

Returns true if the schema is valid, and false otherwise.

Errors

InvalidInputException:

The input provided was not valid.

AccessDeniedException:

Access to a resource was denied.

InternalServiceException:

An internal service error occurred.
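
SchemaDefinition is a single string, so a schema held as a PHP array has to be serialized first. The following is a minimal sketch, assuming an Avro schema written as a PHP array. The helper names buildSchemaCheckRequest and describeSchemaCheck are hypothetical, and the sample schema and result are invented.

```php
<?php
// Hypothetical helper: serializes an Avro schema (as a PHP array) into
// the string form expected by the SchemaDefinition parameter.
function buildSchemaCheckRequest(array $avroSchema): array
{
    return [
        'DataFormat' => 'AVRO',
        'SchemaDefinition' => json_encode($avroSchema, JSON_THROW_ON_ERROR),
    ];
}

// Hypothetical helper: turns a result array into a short message.
function describeSchemaCheck(array $result): string
{
    return !empty($result['Valid'])
        ? 'schema is valid'
        : 'schema is invalid: ' . ($result['Error'] ?? 'no message returned');
}

// Invented sample schema.
$schema = [
    'type' => 'record',
    'name' => 'Order',
    'fields' => [
        ['name' => 'id', 'type' => 'string'],
        ['name' => 'total', 'type' => 'double'],
    ],
];
$request = buildSchemaCheckRequest($schema);

// Invented sample result standing in for a real response.
$message = describeSchemaCheck(['Valid' => false, 'Error' => 'unexpected token']);
```

The request could then be sent with $client->checkSchemaVersionValidity($request).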

CreateBlueprint

$result = $client->createBlueprint([/* ... */]);
$promise = $client->createBlueprintAsync([/* ... */]);

Registers a blueprint with Glue.

Parameter Syntax

$result = $client->createBlueprint([
    'BlueprintLocation' => '<string>', // REQUIRED
    'Description' => '<string>',
    'Name' => '<string>', // REQUIRED
    'Tags' => ['<string>', ...],
]);

Parameter Details

Members
BlueprintLocation
Required: Yes
Type: string

Specifies a path in Amazon S3 where the blueprint is published.

Description
Type: string

A description of the blueprint.

Name
Required: Yes
Type: string

The name of the blueprint.

Tags
Type: Associative array of custom strings keys (TagKey) to strings

The tags to be applied to this blueprint.

Result Syntax

[
    'Name' => '<string>',
]

Result Details

Members
Name
Type: string

Returns the name of the blueprint that was registered.

Errors

AlreadyExistsException:

A resource to be created or added already exists.

InvalidInputException:

The input provided was not valid.

OperationTimeoutException:

The operation timed out.

InternalServiceException:

An internal service error occurred.

ResourceNumberLimitExceededException:

A resource numerical limit was exceeded.
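
The Tags placeholder in the Parameter Syntax looks like a list, but the Parameter Details describe an associative array of string keys to strings. The sketch below builds the parameters and rejects a plain list of tags early. The helper name buildCreateBlueprintParams, the S3 path, and the tag values are assumptions for illustration only.

```php
<?php
// Hypothetical helper: builds CreateBlueprint parameters and checks that
// Tags maps string keys to string values.
function buildCreateBlueprintParams(string $name, string $s3Location, array $tags = []): array
{
    foreach ($tags as $key => $value) {
        if (!is_string($key) || !is_string($value)) {
            throw new InvalidArgumentException('Tags must map string keys to string values.');
        }
    }

    $params = ['Name' => $name, 'BlueprintLocation' => $s3Location];
    if ($tags !== []) {
        $params['Tags'] = $tags;
    }
    return $params;
}

// Invented sample values.
$params = buildCreateBlueprintParams(
    'orders-blueprint',
    's3://example-bucket/blueprints/orders.zip',
    ['team' => 'data-platform', 'env' => 'dev']
);
```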

CreateClassifier

$result = $client->createClassifier([/* ... */]);
$promise = $client->createClassifierAsync([/* ... */]);

Creates a classifier in the user's account. This can be a GrokClassifier, an XMLClassifier, a JsonClassifier, or a CsvClassifier, depending on which field of the request is present.

Parameter Syntax

$result = $client->createClassifier([
    'CsvClassifier' => [
        'AllowSingleColumn' => true || false,
        'ContainsHeader' => 'UNKNOWN|PRESENT|ABSENT',
        'CustomDatatypeConfigured' => true || false,
        'CustomDatatypes' => ['<string>', ...],
        'Delimiter' => '<string>',
        'DisableValueTrimming' => true || false,
        'Header' => ['<string>', ...],
        'Name' => '<string>', // REQUIRED
        'QuoteSymbol' => '<string>',
        'Serde' => 'OpenCSVSerDe|LazySimpleSerDe|None',
    ],
    'GrokClassifier' => [
        'Classification' => '<string>', // REQUIRED
        'CustomPatterns' => '<string>',
        'GrokPattern' => '<string>', // REQUIRED
        'Name' => '<string>', // REQUIRED
    ],
    'JsonClassifier' => [
        'JsonPath' => '<string>', // REQUIRED
        'Name' => '<string>', // REQUIRED
    ],
    'XMLClassifier' => [
        'Classification' => '<string>', // REQUIRED
        'Name' => '<string>', // REQUIRED
        'RowTag' => '<string>',
    ],
]);

Parameter Details

Members
CsvClassifier
Type: CreateCsvClassifierRequest structure

A CsvClassifier object specifying the classifier to create.

GrokClassifier
Type: CreateGrokClassifierRequest structure

A GrokClassifier object specifying the classifier to create.

JsonClassifier
Type: CreateJsonClassifierRequest structure

A JsonClassifier object specifying the classifier to create.

XMLClassifier
Type: CreateXMLClassifierRequest structure

An XMLClassifier object specifying the classifier to create.

Result Syntax

[]

Result Details

The results for this operation are always empty.

Errors

AlreadyExistsException:

A resource to be created or added already exists.

InvalidInputException:

The input provided was not valid.

OperationTimeoutException:

The operation timed out.
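
The classifier type is determined by which field of the request is present, so it can be useful to check the parameters before sending them. The sketch below assumes you intend to send exactly one of the four fields per call. That restriction is a choice made for this example rather than a rule stated on this page, and classifierKindOf is a hypothetical helper.

```php
<?php
// The four request fields listed in the Parameter Syntax above.
const CLASSIFIER_KEYS = ['CsvClassifier', 'GrokClassifier', 'JsonClassifier', 'XMLClassifier'];

// Hypothetical helper: returns the single classifier field present in
// $params, or throws if none or several are present.
function classifierKindOf(array $params): string
{
    $present = array_values(array_intersect(CLASSIFIER_KEYS, array_keys($params)));
    if (count($present) !== 1) {
        throw new InvalidArgumentException(
            'Expected exactly one classifier field, got ' . count($present)
        );
    }
    return $present[0];
}

// Invented sample parameters for a JSON classifier.
$params = [
    'JsonClassifier' => [
        'Name' => 'orders-json',
        'JsonPath' => '$.records[*]',
    ],
];
$kind = classifierKindOf($params);
```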

CreateConnection

$result = $client->createConnection([/* ... */]);
$promise = $client->createConnectionAsync([/* ... */]);

Creates a connection definition in the Data Catalog.

Connections used for creating federated resources require the IAM glue:PassConnection permission.

Parameter Syntax

$result = $client->createConnection([
    'CatalogId' => '<string>',
    'ConnectionInput' => [ // REQUIRED
        'ConnectionProperties' => ['<string>', ...], // REQUIRED
        'ConnectionType' => 'JDBC|SFTP|MONGODB|KAFKA|NETWORK|MARKETPLACE|CUSTOM', // REQUIRED
        'Description' => '<string>',
        'MatchCriteria' => ['<string>', ...],
        'Name' => '<string>', // REQUIRED
        'PhysicalConnectionRequirements' => [
            'AvailabilityZone' => '<string>',
            'SecurityGroupIdList' => ['<string>', ...],
            'SubnetId' => '<string>',
        ],
    ],
    'Tags' => ['<string>', ...],
]);

Parameter Details

Members
CatalogId
Type: string

The ID of the Data Catalog in which to create the connection. If none is provided, the Amazon Web Services account ID is used by default.

ConnectionInput
Required: Yes
Type: ConnectionInput structure

A ConnectionInput object defining the connection to create.

Tags
Type: Associative array of custom strings keys (TagKey) to strings

The tags you assign to the connection.

Result Syntax

[]

Result Details

The results for this operation are always empty.

Errors

AlreadyExistsException:

A resource to be created or added already exists.

InvalidInputException:

The input provided was not valid.

OperationTimeoutException:

The operation timed out.

ResourceNumberLimitExceededException:

A resource numerical limit was exceeded.

GlueEncryptionException:

An encryption operation failed.
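
As an illustration of how the nested ConnectionInput might be assembled, the sketch below builds parameters for a JDBC connection. The property keys JDBC_CONNECTION_URL, USERNAME, and PASSWORD are assumptions not listed on this page, so check them against the ConnectionInput structure before relying on them. The helper name, host, user, and environment variable are likewise invented.

```php
<?php
// Hypothetical helper: builds a ConnectionInput array for a JDBC connection.
// Assumption: ConnectionProperties is keyed by property name, and the three
// keys used below are accepted for JDBC connections.
function buildJdbcConnectionInput(string $name, string $jdbcUrl, string $user, string $password): array
{
    return [
        'Name' => $name,
        'ConnectionType' => 'JDBC',
        'ConnectionProperties' => [
            'JDBC_CONNECTION_URL' => $jdbcUrl,
            'USERNAME' => $user,
            'PASSWORD' => $password,
        ],
    ];
}

// Invented sample values; the password is read from a made-up environment
// variable rather than hard-coded.
$params = [
    'ConnectionInput' => buildJdbcConnectionInput(
        'orders-db',
        'jdbc:postgresql://db.example.com:5432/orders',
        'glue_reader',
        getenv('ORDERS_DB_PASSWORD') ?: ''
    ),
    'Tags' => ['team' => 'data-platform'],
];
```

The array could then be passed to $client->createConnection($params).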

CreateCrawler

$result = $client->createCrawler([/* ... */]);
$promise = $client->createCrawlerAsync([/* ... */]);

Creates a new crawler with specified targets, role, configuration, and optional schedule. At least one crawl target must be specified, in the S3Targets field, the JdbcTargets field, or the DynamoDBTargets field.

Parameter Syntax

$result = $client->createCrawler([
    'Classifiers' => ['<string>', ...],
    'Configuration' => '<string>',
    'CrawlerSecurityConfiguration' => '<string>',
    'DatabaseName' => '<string>',
    'Description' => '<string>',
    'LakeFormationConfiguration' => [
        'AccountId' => '<string>',
        'UseLakeFormationCredentials' => true || false,
    ],
    'LineageConfiguration' => [
        'CrawlerLineageSettings' => 'ENABLE|DISABLE',
    ],
    'Name' => '<string>', // REQUIRED
    'RecrawlPolicy' => [
        'RecrawlBehavior' => 'CRAWL_EVERYTHING|CRAWL_NEW_FOLDERS_ONLY|CRAWL_EVENT_MODE',
    ],
    'Role' => '<string>', // REQUIRED
    'Schedule' => '<string>',
    'SchemaChangePolicy' => [
        'DeleteBehavior' => 'LOG|DELETE_FROM_DATABASE|DEPRECATE_IN_DATABASE',
        'UpdateBehavior' => 'LOG|UPDATE_IN_DATABASE',
    ],
    'TablePrefix' => '<string>',
    'Tags' => ['<string>', ...],
    'Targets' => [ // REQUIRED
        'CatalogTargets' => [
            [
                'ConnectionName' => '<string>',
                'DatabaseName' => '<string>', // REQUIRED
                'DlqEventQueueArn' => '<string>',
                'EventQueueArn' => '<string>',
                'Tables' => ['<string>', ...], // REQUIRED
            ],
            // ...
        ],
        'DeltaTargets' => [
            [
                'ConnectionName' => '<string>',
                'CreateNativeDeltaTable' => true || false,
                'DeltaTables' => ['<string>', ...],
                'WriteManifest' => true || false,
            ],
            // ...
        ],
        'DynamoDBTargets' => [
            [
                'Path' => '<string>',
                'scanAll' => true || false,
                'scanRate' => <float>,
            ],
            // ...
        ],
        'HudiTargets' => [
            [
                'ConnectionName' => '<string>',
                'Exclusions' => ['<string>', ...],
                'MaximumTraversalDepth' => <integer>,
                'Paths' => ['<string>', ...],
            ],
            // ...
        ],
        'IcebergTargets' => [
            [
                'ConnectionName' => '<string>',
                'Exclusions' => ['<string>', ...],
                'MaximumTraversalDepth' => <integer>,
                'Paths' => ['<string>', ...],
            ],
            // ...
        ],
        'JdbcTargets' => [
            [
                'ConnectionName' => '<string>',
                'EnableAdditionalMetadata' => ['<string>', ...],
                'Exclusions' => ['<string>', ...],
                'Path' => '<string>',
            ],
            // ...
        ],
        'MongoDBTargets' => [
            [
                'ConnectionName' => '<string>',
                'Path' => '<string>',
                'ScanAll' => true || false,
            ],
            // ...
        ],
        'S3Targets' => [
            [
                'ConnectionName' => '<string>',
                'DlqEventQueueArn' => '<string>',
                'EventQueueArn' => '<string>',
                'Exclusions' => ['<string>', ...],
                'Path' => '<string>',
                'SampleSize' => <integer>,
            ],
            // ...
        ],
    ],
]);

Parameter Details

Members
Classifiers
Type: Array of strings

A list of custom classifiers that the user has registered. By default, all built-in classifiers are included in a crawl, but these custom classifiers always override the default classifiers for a given classification.

Configuration
Type: string

Crawler configuration information. This versioned JSON string allows users to specify aspects of a crawler's behavior. For more information, see Setting crawler configuration options.

CrawlerSecurityConfiguration
Type: string

The name of the SecurityConfiguration structure to be used by this crawler.

DatabaseName
Type: string

The Glue database where results are written, such as: arn:aws:daylight:us-east-1::database/sometable/*.

Description
Type: string

A description of the new crawler.

LakeFormationConfiguration
Type: LakeFormationConfiguration structure

Specifies Lake Formation configuration settings for the crawler.

LineageConfiguration
Type: LineageConfiguration structure

Specifies data lineage configuration settings for the crawler.

Name
Required: Yes
Type: string

Name of the new crawler.

RecrawlPolicy
Type: RecrawlPolicy structure

A policy that specifies whether to crawl the entire dataset again, or to crawl only folders that were added since the last crawler run.

Role
Required: Yes
Type: string

The IAM role or Amazon Resource Name (ARN) of an IAM role used by the new crawler to access customer resources.

Schedule
Type: string

A cron expression used to specify the schedule (see Time-Based Schedules for Jobs and Crawlers). For example, to run something every day at 12:15 UTC, you would specify: cron(15 12 * * ? *).

SchemaChangePolicy
Type: SchemaChangePolicy structure

The policy for the crawler's update and deletion behavior.

TablePrefix
Type: string

The table prefix used for catalog tables that are created.

Tags
Type: Associative array of custom strings keys (TagKey) to strings

The tags to use with this crawler request. You may use tags to limit access to the crawler. For more information about tags in Glue, see Amazon Web Services Tags in Glue in the developer guide.

Targets
Required: Yes
Type: CrawlerTargets structure

A collection of targets to crawl.

Result Syntax

[]

Result Details

The results for this operation are always empty.

Errors

InvalidInputException:

The input provided was not valid.

AlreadyExistsException:

A resource to be created or added already exists.

OperationTimeoutException:

The operation timed out.

ResourceNumberLimitExceededException:

A resource numerical limit was exceeded.
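
To make the Schedule and Targets parameters more concrete, the sketch below builds parameters for a crawler with one S3 target that runs daily. The dailyCron helper is hypothetical and only reproduces the cron(15 12 * * ? *) form shown under Schedule. The crawler name, role name, database, and bucket path are invented.

```php
<?php
// Hypothetical helper: formats a daily schedule in the cron(...) form
// shown in the Schedule parameter description (minute first, then hour, UTC).
function dailyCron(int $hour, int $minute): string
{
    if ($hour < 0 || $hour > 23 || $minute < 0 || $minute > 59) {
        throw new InvalidArgumentException('Hour must be 0-23 and minute 0-59.');
    }
    return sprintf('cron(%d %d * * ? *)', $minute, $hour);
}

// Invented sample parameters: Name, Role, and Targets are the required members.
$params = [
    'Name' => 'orders-crawler',
    'Role' => 'GlueCrawlerRole',
    'DatabaseName' => 'sales_db',
    'Schedule' => dailyCron(12, 15),
    'Targets' => [
        'S3Targets' => [
            ['Path' => 's3://example-bucket/orders/'],
        ],
    ],
];
```

The array could then be passed to $client->createCrawler($params).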

CreateCustomEntityType

$result = $client->createCustomEntityType([/* ... */]);
$promise = $client->createCustomEntityTypeAsync([/* ... */]);

Creates a custom pattern that is used to detect sensitive data across the columns and rows of your structured data.

Each custom pattern you create specifies a regular expression and an optional list of context words. If no context words are passed, only a regular expression is checked.

Parameter Syntax

$result = $client->createCustomEntityType([
    'ContextWords' => ['<string>', ...],
    'Name' => '<string>', // REQUIRED
    'RegexString' => '<string>', // REQUIRED
    'Tags' => ['<string>', ...],
]);

Parameter Details

Members
ContextWords
Type: Array of strings

A list of context words. If none of these context words are found within the vicinity of the regular expression, the data will not be detected as sensitive data.

If no context words are passed, only a regular expression is checked.

Name
Required: Yes
Type: string

A name for the custom pattern that allows it to be retrieved or deleted later. This name must be unique per Amazon Web Services account.

RegexString
Required: Yes
Type: string

A regular expression string that is used for detecting sensitive data in a custom pattern.

Tags
Type: Associative array of custom strings keys (TagKey) to strings

A list of tags applied to the custom entity type.
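As an illustration of the members above, the following sketch assembles a parameter array for createCustomEntityType. The helper name buildCustomEntityTypeParams, the pattern name, the regular expression, and the tag values are all hypothetical; the helper is not part of the SDK. It leaves ContextWords out when none are given (so only the regular expression is checked) and passes Tags as a map of tag key to tag value.

```php
<?php

/**
 * Hypothetical helper (not part of the SDK): build the parameter array
 * for createCustomEntityType().
 */
function buildCustomEntityTypeParams(
    string $name,
    string $regexString,
    array $contextWords = [],
    array $tags = []
): array {
    if ($name === '' || $regexString === '') {
        throw new InvalidArgumentException('Name and RegexString are required.');
    }

    $params = [
        'Name' => $name,
        'RegexString' => $regexString,
    ];

    // Omit ContextWords when empty so that only the regular expression is checked.
    if ($contextWords !== []) {
        $params['ContextWords'] = array_values($contextWords);
    }

    // Tags is an associative array: tag key => tag value.
    if ($tags !== []) {
        $params['Tags'] = $tags;
    }

    return $params;
}

// Illustrative values only.
$params = buildCustomEntityTypeParams(
    'EMPLOYEE_ID',
    'EMP-[0-9]{6}',
    ['employee', 'staff id'],
    ['team' => 'data-governance']
);

// With a configured Aws\Glue\GlueClient in $client:
// $result = $client->createCustomEntityType($params);
// echo $result['Name'];
```

Building the array separately from the client call keeps the request shape easy to inspect before it is sent.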

Result Syntax

[
    'Name' => '<string>',
]

Result Details

Members
Name
Type: string

The name of the custom pattern you created.

Errors

AccessDeniedException:

Access to a resource was denied.

AlreadyExistsException:

A resource to be created or added already exists.

IdempotentParameterMismatchException:

The same unique identifier was associated with two different records.

InternalServiceException:

An internal service error occurred.

InvalidInputException:

The input provided was not valid.

OperationTimeoutException:

The operation timed out.

ResourceNumberLimitExceededException:

A resource numerical limit was exceeded.

CreateDataQualityRuleset

$result = $client->createDataQualityRuleset([/* ... */]);
$promise = $client->createDataQualityRulesetAsync([/* ... */]);

Creates a data quality ruleset with DQDL rules applied to a specified Glue table.

You create the ruleset using the Data Quality Definition Language (DQDL). For more information, see the Glue developer guide.

Parameter Syntax

$result = $client->createDataQualityRuleset([
    'ClientToken' => '<string>',
    'Description' => '<string>',
    'Name' => '<string>', // REQUIRED
    'Ruleset' => '<string>', // REQUIRED
    'Tags' => ['<string>', ...],
    'TargetTable' => [
        'CatalogId' => '<string>',
        'DatabaseName' => '<string>', // REQUIRED
        'TableName' => '<string>', // REQUIRED
    ],
]);

Parameter Details

Members
ClientToken
Type: string

Used for idempotency. We recommend setting this to a random ID (such as a UUID) to avoid creating or starting multiple instances of the same resource.

Description
Type: string

A description of the data quality ruleset.

Name
Required: Yes
Type: string

A unique name for the data quality ruleset.

Ruleset
Required: Yes
Type: string

A Data Quality Definition Language (DQDL) ruleset. For more information, see the Glue developer guide.

Tags
Type: Associative array of custom strings keys (TagKey) to strings

A list of tags applied to the data quality ruleset.

TargetTable
Type: DataQualityTargetTable structure

A target table associated with the data quality ruleset.
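The sketch below shows one way the members above might fit together. The helper buildDqdlRuleset, the ruleset name, the database and table names, and the two rules are hypothetical; the "Rules = [ ... ]" layout follows the DQDL form described in the Glue developer guide, which remains the authority on rule syntax. The ClientToken is a random hexadecimal string standing in for a UUID.

```php
<?php

/**
 * Hypothetical helper (not part of the SDK): join DQDL rule expressions
 * into a single "Rules = [ ... ]" ruleset string.
 */
function buildDqdlRuleset(array $rules): string
{
    if ($rules === []) {
        throw new InvalidArgumentException('At least one rule is required.');
    }

    return "Rules = [\n    " . implode(",\n    ", $rules) . "\n]";
}

// Illustrative values only.
$params = [
    // Random token for idempotency; a UUID would serve equally well.
    'ClientToken' => bin2hex(random_bytes(16)),
    'Name' => 'orders-basic-checks',
    'Description' => 'Completeness and uniqueness checks for the orders table.',
    'Ruleset' => buildDqdlRuleset([
        'IsComplete "order_id"',
        'IsUnique "order_id"',
    ]),
    'TargetTable' => [
        'DatabaseName' => 'sales',
        'TableName' => 'orders',
    ],
];

// With a configured Aws\Glue\GlueClient in $client:
// $result = $client->createDataQualityRuleset($params);
```

If a request is retried, reusing the same ClientToken is what lets the service recognize the retry as the same request, so the token should be generated once and kept for the retries.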

Result Syntax

[
    'Name' => '<string>',
]

Result Details

Members
Name
Type: string

A unique name for the data quality ruleset.

Errors

InvalidInputException:

The input provided was not valid.

AlreadyExistsException:

A resource to be created or added already exists.

OperationTimeoutException:

The operation timed out.

InternalServiceException:

An internal service error occurred.

ResourceNumberLimitExceededException:

A resource numerical limit was exceeded.

CreateDatabase

$result = $client->createDatabase([/* ... */]);
$promise = $client->createDatabaseAsync([/* ... */]);

Creates a new database in a Data Catalog.

Parameter Syntax

$result = $client->createDatabase([
    'CatalogId' => '<string>',
    'DatabaseInput' => [ // REQUIRED
        'CreateTableDefaultPermissions' => [
            [
                'Permissions' => ['<string>', ...],
                'Principal' => [
                    'DataLakePrincipalIdentifier' => '<string>',
                ],
            ],
            // ...
        ],
        'Description' => '<string>',
        'FederatedDatabase' => [
            'ConnectionName' => '<string>',
            'Identifier' => '<string>',
        ],
        'LocationUri' => '<string>',
        'Name' => '<string>', // REQUIRED
        'Parameters' => ['<string>', ...],
        'TargetDatabase' => [
            'CatalogId' => '<string>',
            'DatabaseName' => '<string>',
            'Region' => '<string>',
        ],
    ],
    'Tags' => ['<string>', ...],
]);

Parameter Details

Members
CatalogId
Type: string

The ID of the Data Catalog in which to create the database. If none is provided, the Amazon Web Services account ID is used by default.

DatabaseInput
Required: Yes
Type: DatabaseInput structure

The metadata for the database.

Tags
Type: Associative array of custom strings keys (TagKey) to strings

The tags you assign to the database.
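A minimal sketch of assembling these parameters follows. The helper buildCreateDatabaseParams is hypothetical and handles only a few DatabaseInput members (Name, Description, LocationUri, Parameters); the database name, bucket, and tag values are placeholders. CatalogId is omitted unless supplied, in which case the account ID is used by default as described above.

```php
<?php

/**
 * Hypothetical helper (not part of the SDK): assemble createDatabase()
 * parameters from a name and a few optional DatabaseInput members.
 */
function buildCreateDatabaseParams(
    string $name,
    array $optional = [],
    array $tags = [],
    ?string $catalogId = null
): array {
    if ($name === '') {
        throw new InvalidArgumentException('DatabaseInput.Name is required.');
    }

    // Keep only the optional members this sketch knows about.
    $allowed = ['Description', 'LocationUri', 'Parameters'];
    $databaseInput = ['Name' => $name]
        + array_intersect_key($optional, array_flip($allowed));

    $params = ['DatabaseInput' => $databaseInput];

    if ($catalogId !== null) {
        $params['CatalogId'] = $catalogId;
    }
    if ($tags !== []) {
        $params['Tags'] = $tags; // tag key => tag value
    }

    return $params;
}

// Illustrative values only.
$params = buildCreateDatabaseParams(
    'sales',
    [
        'Description' => 'Curated sales data.',
        'LocationUri' => 's3://example-bucket/sales/',
        'Parameters' => ['owner' => 'analytics'],
    ],
    ['env' => 'dev']
);

// With a configured Aws\Glue\GlueClient in $client:
// $client->createDatabase($params); // the result is always empty
```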

Result Syntax

[]

Result Details

The results for this operation are always empty.

Errors

InvalidInputException:

The input provided was not valid.

AlreadyExistsException:

A resource to be created or added already exists.

ResourceNumberLimitExceededException:

A resource numerical limit was exceeded.

InternalServiceException:

An internal service error occurred.

OperationTimeoutException:

The operation timed out.

GlueEncryptionException:

An encryption operation failed.

ConcurrentModificationException:

Two processes are trying to modify a resource simultaneously.

FederatedResourceAlreadyExistsException:

A federated resource already exists.
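When handling the errors listed above, it can help to separate those that may succeed on a later attempt from those that need a changed request. The sketch below is one possible classification, not one defined by the service: treating OperationTimeoutException, InternalServiceException, and ConcurrentModificationException as transient is an assumption, and the function name isLikelyTransient is hypothetical.

```php
<?php

/**
 * Hypothetical helper: guess whether a Glue error code is worth retrying.
 * The list of "transient" codes is an assumption made for this sketch.
 */
function isLikelyTransient(string $awsErrorCode): bool
{
    $transient = [
        'OperationTimeoutException',
        'InternalServiceException',
        'ConcurrentModificationException',
    ];

    return in_array($awsErrorCode, $transient, true);
}

// With a configured Aws\Glue\GlueClient in $client:
// try {
//     $client->createDatabase($params);
// } catch (Aws\Exception\AwsException $e) {
//     if (isLikelyTransient((string) $e->getAwsErrorCode())) {
//         // back off and try again later
//     } else {
//         throw $e; // e.g. AlreadyExistsException or InvalidInputException
//     }
// }
```

The SDK applies its own retry behavior to some failures, so a check like this is only relevant for errors that reach the calling code.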

CreateDevEndpoint

$result = $client->createDevEndpoint([/* ... */]);
$promise = $client->createDevEndpointAsync([/* ... */]);

Creates a new development endpoint.

Parameter Syntax

$result = $client->createDevEndpoint([
    'Arguments' => ['<string>', ...],
    'EndpointName' => '<string>', // REQUIRED
    'ExtraJarsS3Path' => '<string>',
    'ExtraPythonLibsS3Path' => '<string>',
    'GlueVersion' => '<string>',
    'NumberOfNodes' => <integer>,
    'NumberOfWorkers' => <integer>,
    'PublicKey' => '<string>',
    'PublicKeys' => ['<string>', ...],
    'RoleArn' => '<string>', // REQUIRED
    'SecurityConfiguration' => '<string>',
    'SecurityGroupIds' => ['<string>', ...],
    'SubnetId' => '<string>',
    'Tags' => ['<string>', ...],
    'WorkerType' => 'Standard|G.1X|G.2X|G.025X|G.4X|G.8X|Z.2X',
]);

Parameter Details

Members
Arguments
Type: Associative array of custom strings keys (GenericString) to strings

A map of arguments used to configure the DevEndpoint.

EndpointName
Required: Yes
Type: string

The name to be assigned to the new DevEndpoint.

ExtraJarsS3Path
Type: string

The path to one or more Java .jar files in an S3 bucket that should be loaded in your DevEndpoint.

ExtraPythonLibsS3Path
Type: string

The paths to one or more Python libraries in an Amazon S3 bucket that should be loaded in your DevEndpoint. Multiple values must be complete paths separated by a comma.

You can only use pure Python libraries with a DevEndpoint. Libraries that rely on C extensions, such as the pandas Python data analysis library, are not yet supported.

GlueVersion
Type: string

Glue version determines the versions of Apache Spark and Python that Glue supports. The Python version indicates the version supported for running your ETL scripts on development endpoints.

For more information about the available Glue versions and corresponding Spark and Python versions, see Glue version in the developer guide.

Development endpoints that are created without specifying a Glue version default to Glue 0.9.

You can specify a version of Python support for development endpoints by using the Arguments parameter in the CreateDevEndpoint or UpdateDevEndpoint APIs. If no arguments are provided, the version defaults to Python 2.

NumberOfNodes
Type: int

The number of Glue Data Processing Units (DPUs) to allocate to this DevEndpoint.

NumberOfWorkers
Type: int

The number of workers of a defined workerType that are allocated to the development endpoint.

The maximum number of workers you can define is 299 for G.1X and 149 for G.2X.

PublicKey
Type: string

The public key to be used by this DevEndpoint for authentication. This attribute is provided for backward compatibility; the recommended attribute is PublicKeys.

PublicKeys
Type: Array of strings

A list of public keys to be used by the development endpoints for authentication. The use of this attribute is preferred over a single public key because the public keys allow you to have a different private key per client.

If you previously created an endpoint with a public key, you must remove that key to be able to set a list of public keys. Call the UpdateDevEndpoint API with the public key content in the deletePublicKeys attribute, and the list of new keys in the addPublicKeys attribute.

RoleArn
Required: Yes
Type: string

The IAM role for the DevEndpoint.

SecurityConfiguration
Type: string

The name of the SecurityConfiguration structure to be used with this DevEndpoint.

SecurityGroupIds
Type: Array of strings

Security group IDs for the security groups to be used by the new DevEndpoint.

SubnetId
Type: string

The subnet ID for the new DevEndpoint to use.

Tags
Type: Associative array of custom strings keys (TagKey) to strings

The tags to use with this DevEndpoint. You may use tags to limit access to the DevEndpoint. For more information about tags in Glue, see Amazon Web Services Tags in Glue in the developer guide.

WorkerType
Type: string

The type of predefined worker that is allocated to the development endpoint. Accepts a value of Standard, G.1X, or G.2X.

  • For the Standard worker type, each worker provides 4 vCPU, 16 GB of memory, and a 50 GB disk, with 2 executors per worker.

  • For the G.1X worker type, each worker maps to 1 DPU (4 vCPU, 16 GB of memory, 64 GB disk), and provides 1 executor per worker. We recommend this worker type for memory-intensive jobs.

  • For the G.2X worker type, each worker maps to 2 DPU (8 vCPU, 32 GB of memory, 128 GB disk), and provides 1 executor per worker. We recommend this worker type for memory-intensive jobs.

Known issue: when a development endpoint is created with the G.2X WorkerType configuration, the Spark drivers for the development endpoint will run on 4 vCPU, 16 GB of memory, and a 64 GB disk.
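The constraints described above (a maximum of 299 workers for G.1X and 149 for G.2X, and comma-separated complete paths in ExtraPythonLibsS3Path) can be checked before the request is sent. The following is a sketch under stated assumptions: the helper buildDevEndpointParams is hypothetical, the role ARN and bucket are placeholders, and treating a "complete path" as one beginning with s3:// is an assumption of this example.

```php
<?php

/**
 * Hypothetical helper (not part of the SDK): build createDevEndpoint()
 * parameters and apply the documented worker limits locally.
 */
function buildDevEndpointParams(
    string $endpointName,
    string $roleArn,
    string $workerType,
    int $numberOfWorkers,
    array $pythonLibPaths = []
): array {
    // Documented maximums: 299 for G.1X and 149 for G.2X.
    $maxWorkers = ['G.1X' => 299, 'G.2X' => 149];

    if ($numberOfWorkers < 1) {
        throw new InvalidArgumentException('NumberOfWorkers must be positive.');
    }
    if (isset($maxWorkers[$workerType]) && $numberOfWorkers > $maxWorkers[$workerType]) {
        throw new InvalidArgumentException(
            "At most {$maxWorkers[$workerType]} workers are allowed for {$workerType}."
        );
    }

    $params = [
        'EndpointName' => $endpointName,
        'RoleArn' => $roleArn,
        'WorkerType' => $workerType,
        'NumberOfWorkers' => $numberOfWorkers,
    ];

    if ($pythonLibPaths !== []) {
        foreach ($pythonLibPaths as $path) {
            // Assumption: a complete path starts with the s3:// scheme.
            if (strpos($path, 's3://') !== 0) {
                throw new InvalidArgumentException("Not a complete S3 path: {$path}");
            }
        }
        // Multiple values are separated by a comma.
        $params['ExtraPythonLibsS3Path'] = implode(',', $pythonLibPaths);
    }

    return $params;
}

// Illustrative values only.
$params = buildDevEndpointParams(
    'example-endpoint',
    'arn:aws:iam::123456789012:role/ExampleGlueRole',
    'G.1X',
    10,
    ['s3://example-bucket/libs/a.zip', 's3://example-bucket/libs/b.zip']
);

// With a configured Aws\Glue\GlueClient in $client:
// $result = $client->createDevEndpoint($params);
```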

Result Syntax

[
    'Arguments' => ['<string>', ...],
    'AvailabilityZone' => '<string>',
    'CreatedTimestamp' => <DateTime>,
    'EndpointName' => '<string>',
    'ExtraJarsS3Path' => '<string>',
    'ExtraPythonLibsS3Path' => '<string>',
    'FailureReason' => '<string>',
    'GlueVersion' => '<string>',
    'NumberOfNodes' => <integer>,
    'NumberOfWorkers' => <integer>,
    'RoleArn' => '<string>',
    'SecurityConfiguration' => '<string>',
    'SecurityGroupIds' => ['<string>', ...],
    'Status' => '<string>',
    'SubnetId' => '<string>',
    'VpcId' => '<string>',
    'WorkerType' => 'Standard|G.1X|G.2X|G.025X|G.4X|G.8X|Z.2X',
    'YarnEndpointAddress' => '<string>',
    'ZeppelinRemoteSparkInterpreterPort' => <integer>,
]

Result Details

Members
Arguments
Type: Associative array of custom strings keys (GenericString) to strings

The map of arguments used to configure this DevEndpoint.

Valid arguments are:

  • "--enable-glue-datacatalog": ""

You can specify a version of Python support for development endpoints by using the Arguments parameter in the CreateDevEndpoint or UpdateDevEndpoint APIs. If no arguments are provided, the version defaults to Python 2.

AvailabilityZone
Type: string

The Amazon Web Services Availability Zone where this DevEndpoint is located.

CreatedTimestamp
Type: timestamp (string|DateTime or anything parsable by strtotime)

The point in time at which this DevEndpoint was created.

EndpointName
Type: string

The name assigned to the new DevEndpoint.

ExtraJarsS3Path
Type: string

Path to one or more Java .jar files in an S3 bucket that will be loaded in your DevEndpoint.

ExtraPythonLibsS3Path
Type: string

The paths to one or more Python libraries in an S3 bucket that will be loaded in your DevEndpoint.

FailureReason
Type: string

The reason for a current failure in this DevEndpoint.

GlueVersion
Type: string

Glue version determines the versions of Apache Spark and Python that Glue supports. The Python version indicates the version supported for running your ETL scripts on development endpoints.

For more information about the available Glue versions and corresponding Spark and Python versions, see Glue version in the developer guide.

NumberOfNodes
Type: int

The number of Glue Data Processing Units (DPUs) allocated to this DevEndpoint.

NumberOfWorkers
Type: int

The number of workers of a defined workerType that are allocated to the development endpoint.

RoleArn
Type: string

The Amazon Resource Name (ARN) of the role assigned to the new DevEndpoint.

SecurityConfiguration
Type: string

The name of the SecurityConfiguration structure being used with this DevEndpoint.

SecurityGroupIds
Type: Array of strings

The security groups assigned to the new DevEndpoint.

Status
Type: string

The current status of the new DevEndpoint.

SubnetId
Type: string

The subnet ID assigned to the new DevEndpoint.

VpcId
Type: string

The ID of the virtual private cloud (VPC) used by this DevEndpoint.

WorkerType
Type: string

The type of predefined worker that is allocated to the development endpoint. May be a value of Standard, G.1X, or G.2X.

YarnEndpointAddress
Type: string

The address of the YARN endpoint used by this DevEndpoint.

ZeppelinRemoteSparkInterpreterPort
Type: int

The Apache Zeppelin port for the remote Apache Spark interpreter.

Errors

AccessDeniedException:

Access to a resource was denied.

AlreadyExistsException:

A resource to be created or added already exists.

IdempotentParameterMismatchException:

The same unique identifier was associated with two different records.

InternalServiceException:

An internal service error occurred.

OperationTimeoutException:

The operation timed out.

InvalidInputException:

The input provided was not valid.

ValidationException:

A value could not be validated.

ResourceNumberLimitExceededException:

A resource numerical limit was exceeded.

CreateJob

$result = $client->createJob([/* ... */]);
$promise = $client->createJobAsync([/* ... */]);

Creates a new job definition.
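The CodeGenConfigurationNodes parameter is a map keyed by node ID, where each entry holds one node type such as CatalogSource, ApplyMapping, or CatalogTarget. The sketch below builds a small three-node map using only members that appear in the parameter syntax, and checks that each Inputs entry names a node present in the map. The node IDs, database, table, and column names are placeholders, the helper findDanglingInputs is hypothetical, and reading Inputs as references to node IDs is an assumption of this example. The other members of a createJob request are not shown.

```php
<?php

/**
 * Hypothetical helper: list "nodeId -> input" pairs whose input
 * is not a key of the node map.
 */
function findDanglingInputs(array $nodes): array
{
    $dangling = [];
    foreach ($nodes as $nodeId => $node) {
        // Each entry is assumed to hold exactly one node type configuration.
        $config = reset($node);
        foreach ($config['Inputs'] ?? [] as $input) {
            if (!array_key_exists($input, $nodes)) {
                $dangling[] = "{$nodeId} -> {$input}";
            }
        }
    }

    return $dangling;
}

// Illustrative values only.
$nodes = [
    'node-1' => [
        'CatalogSource' => [
            'Name' => 'Read orders',
            'Database' => 'sales',
            'Table' => 'orders_raw',
        ],
    ],
    'node-2' => [
        'ApplyMapping' => [
            'Name' => 'Rename columns',
            'Inputs' => ['node-1'],
            'Mapping' => [
                [
                    'FromPath' => ['order_id'],
                    'FromType' => 'string',
                    'ToKey' => 'id',
                    'ToType' => 'string',
                ],
            ],
        ],
    ],
    'node-3' => [
        'CatalogTarget' => [
            'Name' => 'Write orders',
            'Inputs' => ['node-2'],
            'Database' => 'sales',
            'Table' => 'orders_clean',
        ],
    ],
];

$problems = findDanglingInputs($nodes);

// $nodes could then be passed as 'CodeGenConfigurationNodes' in a createJob() call.
```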

Parameter Syntax

$result = $client->createJob([
    'AllocatedCapacity' => <integer>,
    'CodeGenConfigurationNodes' => [
        '<NodeId>' => [
            'Aggregate' => [
                'Aggs' => [ // REQUIRED
                    [
                        'AggFunc' => 'avg|countDistinct|count|first|last|kurtosis|max|min|skewness|stddev_samp|stddev_pop|sum|sumDistinct|var_samp|var_pop', // REQUIRED
                        'Column' => ['<string>', ...], // REQUIRED
                    ],
                    // ...
                ],
                'Groups' => [ // REQUIRED
                    ['<string>', ...],
                    // ...
                ],
                'Inputs' => ['<string>', ...], // REQUIRED
                'Name' => '<string>', // REQUIRED
            ],
            'AmazonRedshiftSource' => [
                'Data' => [
                    'AccessType' => '<string>',
                    'Action' => '<string>',
                    'AdvancedOptions' => [
                        [
                            'Key' => '<string>',
                            'Value' => '<string>',
                        ],
                        // ...
                    ],
                    'CatalogDatabase' => [
                        'Description' => '<string>',
                        'Label' => '<string>',
                        'Value' => '<string>',
                    ],
                    'CatalogRedshiftSchema' => '<string>',
                    'CatalogRedshiftTable' => '<string>',
                    'CatalogTable' => [
                        'Description' => '<string>',
                        'Label' => '<string>',
                        'Value' => '<string>',
                    ],
                    'Connection' => [
                        'Description' => '<string>',
                        'Label' => '<string>',
                        'Value' => '<string>',
                    ],
                    'CrawlerConnection' => '<string>',
                    'IamRole' => [
                        'Description' => '<string>',
                        'Label' => '<string>',
                        'Value' => '<string>',
                    ],
                    'MergeAction' => '<string>',
                    'MergeClause' => '<string>',
                    'MergeWhenMatched' => '<string>',
                    'MergeWhenNotMatched' => '<string>',
                    'PostAction' => '<string>',
                    'PreAction' => '<string>',
                    'SampleQuery' => '<string>',
                    'Schema' => [
                        'Description' => '<string>',
                        'Label' => '<string>',
                        'Value' => '<string>',
                    ],
                    'SelectedColumns' => [
                        [
                            'Description' => '<string>',
                            'Label' => '<string>',
                            'Value' => '<string>',
                        ],
                        // ...
                    ],
                    'SourceType' => '<string>',
                    'StagingTable' => '<string>',
                    'Table' => [
                        'Description' => '<string>',
                        'Label' => '<string>',
                        'Value' => '<string>',
                    ],
                    'TablePrefix' => '<string>',
                    'TableSchema' => [
                        [
                            'Description' => '<string>',
                            'Label' => '<string>',
                            'Value' => '<string>',
                        ],
                        // ...
                    ],
                    'TempDir' => '<string>',
                    'Upsert' => true || false,
                ],
                'Name' => '<string>',
            ],
            'AmazonRedshiftTarget' => [
                'Data' => [
                    'AccessType' => '<string>',
                    'Action' => '<string>',
                    'AdvancedOptions' => [
                        [
                            'Key' => '<string>',
                            'Value' => '<string>',
                        ],
                        // ...
                    ],
                    'CatalogDatabase' => [
                        'Description' => '<string>',
                        'Label' => '<string>',
                        'Value' => '<string>',
                    ],
                    'CatalogRedshiftSchema' => '<string>',
                    'CatalogRedshiftTable' => '<string>',
                    'CatalogTable' => [
                        'Description' => '<string>',
                        'Label' => '<string>',
                        'Value' => '<string>',
                    ],
                    'Connection' => [
                        'Description' => '<string>',
                        'Label' => '<string>',
                        'Value' => '<string>',
                    ],
                    'CrawlerConnection' => '<string>',
                    'IamRole' => [
                        'Description' => '<string>',
                        'Label' => '<string>',
                        'Value' => '<string>',
                    ],
                    'MergeAction' => '<string>',
                    'MergeClause' => '<string>',
                    'MergeWhenMatched' => '<string>',
                    'MergeWhenNotMatched' => '<string>',
                    'PostAction' => '<string>',
                    'PreAction' => '<string>',
                    'SampleQuery' => '<string>',
                    'Schema' => [
                        'Description' => '<string>',
                        'Label' => '<string>',
                        'Value' => '<string>',
                    ],
                    'SelectedColumns' => [
                        [
                            'Description' => '<string>',
                            'Label' => '<string>',
                            'Value' => '<string>',
                        ],
                        // ...
                    ],
                    'SourceType' => '<string>',
                    'StagingTable' => '<string>',
                    'Table' => [
                        'Description' => '<string>',
                        'Label' => '<string>',
                        'Value' => '<string>',
                    ],
                    'TablePrefix' => '<string>',
                    'TableSchema' => [
                        [
                            'Description' => '<string>',
                            'Label' => '<string>',
                            'Value' => '<string>',
                        ],
                        // ...
                    ],
                    'TempDir' => '<string>',
                    'Upsert' => true || false,
                ],
                'Inputs' => ['<string>', ...],
                'Name' => '<string>',
            ],
            'ApplyMapping' => [
                'Inputs' => ['<string>', ...], // REQUIRED
                'Mapping' => [ // REQUIRED
                    [
                        'Children' => [...], // RECURSIVE
                        'Dropped' => true || false,
                        'FromPath' => ['<string>', ...],
                        'FromType' => '<string>',
                        'ToKey' => '<string>',
                        'ToType' => '<string>',
                    ],
                    // ...
                ],
                'Name' => '<string>', // REQUIRED
            ],
            'AthenaConnectorSource' => [
                'ConnectionName' => '<string>', // REQUIRED
                'ConnectionTable' => '<string>',
                'ConnectionType' => '<string>', // REQUIRED
                'ConnectorName' => '<string>', // REQUIRED
                'Name' => '<string>', // REQUIRED
                'OutputSchemas' => [
                    [
                        'Columns' => [
                            [
                                'Name' => '<string>', // REQUIRED
                                'Type' => '<string>',
                            ],
                            // ...
                        ],
                    ],
                    // ...
                ],
                'SchemaName' => '<string>', // REQUIRED
            ],
            'CatalogDeltaSource' => [
                'AdditionalDeltaOptions' => ['<string>', ...],
                'Database' => '<string>', // REQUIRED
                'Name' => '<string>', // REQUIRED
                'OutputSchemas' => [
                    [
                        'Columns' => [
                            [
                                'Name' => '<string>', // REQUIRED
                                'Type' => '<string>',
                            ],
                            // ...
                        ],
                    ],
                    // ...
                ],
                'Table' => '<string>', // REQUIRED
            ],
            'CatalogHudiSource' => [
                'AdditionalHudiOptions' => ['<string>', ...],
                'Database' => '<string>', // REQUIRED
                'Name' => '<string>', // REQUIRED
                'OutputSchemas' => [
                    [
                        'Columns' => [
                            [
                                'Name' => '<string>', // REQUIRED
                                'Type' => '<string>',
                            ],
                            // ...
                        ],
                    ],
                    // ...
                ],
                'Table' => '<string>', // REQUIRED
            ],
            'CatalogKafkaSource' => [
                'DataPreviewOptions' => [
                    'PollingTime' => <integer>,
                    'RecordPollingLimit' => <integer>,
                ],
                'Database' => '<string>', // REQUIRED
                'DetectSchema' => true || false,
                'Name' => '<string>', // REQUIRED
                'StreamingOptions' => [
                    'AddRecordTimestamp' => '<string>',
                    'Assign' => '<string>',
                    'BootstrapServers' => '<string>',
                    'Classification' => '<string>',
                    'ConnectionName' => '<string>',
                    'Delimiter' => '<string>',
                    'EmitConsumerLagMetrics' => '<string>',
                    'EndingOffsets' => '<string>',
                    'IncludeHeaders' => true || false,
                    'MaxOffsetsPerTrigger' => <integer>,
                    'MinPartitions' => <integer>,
                    'NumRetries' => <integer>,
                    'PollTimeoutMs' => <integer>,
                    'RetryIntervalMs' => <integer>,
                    'SecurityProtocol' => '<string>',
                    'StartingOffsets' => '<string>',
                    'StartingTimestamp' => <integer || string || DateTime>,
                    'SubscribePattern' => '<string>',
                    'TopicName' => '<string>',
                ],
                'Table' => '<string>', // REQUIRED
                'WindowSize' => <integer>,
            ],
            'CatalogKinesisSource' => [
                'DataPreviewOptions' => [
                    'PollingTime' => <integer>,
                    'RecordPollingLimit' => <integer>,
                ],
                'Database' => '<string>', // REQUIRED
                'DetectSchema' => true || false,
                'Name' => '<string>', // REQUIRED
                'StreamingOptions' => [
                    'AddIdleTimeBetweenReads' => true || false,
                    'AddRecordTimestamp' => '<string>',
                    'AvoidEmptyBatches' => true || false,
                    'Classification' => '<string>',
                    'Delimiter' => '<string>',
                    'DescribeShardInterval' => <integer>,
                    'EmitConsumerLagMetrics' => '<string>',
                    'EndpointUrl' => '<string>',
                    'IdleTimeBetweenReadsInMs' => <integer>,
                    'MaxFetchRecordsPerShard' => <integer>,
                    'MaxFetchTimeInMs' => <integer>,
                    'MaxRecordPerRead' => <integer>,
                    'MaxRetryIntervalMs' => <integer>,
                    'NumRetries' => <integer>,
                    'RetryIntervalMs' => <integer>,
                    'RoleArn' => '<string>',
                    'RoleSessionName' => '<string>',
                    'StartingPosition' => 'latest|trim_horizon|earliest|timestamp',
                    'StartingTimestamp' => <integer || string || DateTime>,
                    'StreamArn' => '<string>',
                    'StreamName' => '<string>',
                ],
                'Table' => '<string>', // REQUIRED
                'WindowSize' => <integer>,
            ],
            'CatalogSource' => [
                'Database' => '<string>', // REQUIRED
                'Name' => '<string>', // REQUIRED
                'Table' => '<string>', // REQUIRED
            ],
            'CatalogTarget' => [
                'Database' => '<string>', // REQUIRED
                'Inputs' => ['<string>', ...], // REQUIRED
                'Name' => '<string>', // REQUIRED
                'Table' => '<string>', // REQUIRED
            ],
            'ConnectorDataSource' => [
                'ConnectionType' => '<string>', // REQUIRED
                'Data' => ['<string>', ...], // REQUIRED
                'Name' => '<string>', // REQUIRED
                'OutputSchemas' => [
                    [
                        'Columns' => [
                            [
                                'Name' => '<string>', // REQUIRED
                                'Type' => '<string>',
                            ],
                            // ...
                        ],
                    ],
                    // ...
                ],
            ],
            'ConnectorDataTarget' => [
                'ConnectionType' => '<string>', // REQUIRED
                'Data' => ['<string>', ...], // REQUIRED
                'Inputs' => ['<string>', ...],
                'Name' => '<string>', // REQUIRED
            ],
            'CustomCode' => [
                'ClassName' => '<string>', // REQUIRED
                'Code' => '<string>', // REQUIRED
                'Inputs' => ['<string>', ...], // REQUIRED
                'Name' => '<string>', // REQUIRED
                'OutputSchemas' => [
                    [
                        'Columns' => [
                            [
                                'Name' => '<string>', // REQUIRED
                                'Type' => '<string>',
                            ],
                            // ...
                        ],
                    ],
                    // ...
                ],
            ],
            'DirectJDBCSource' => [
                'ConnectionName' => '<string>', // REQUIRED
                'ConnectionType' => 'sqlserver|mysql|oracle|postgresql|redshift', // REQUIRED
                'Database' => '<string>', // REQUIRED
                'Name' => '<string>', // REQUIRED
                'RedshiftTmpDir' => '<string>',
                'Table' => '<string>', // REQUIRED
            ],
            'DirectKafkaSource' => [
                'DataPreviewOptions' => [
                    'PollingTime' => <integer>,
                    'RecordPollingLimit' => <integer>,
                ],
                'DetectSchema' => true || false,
                'Name' => '<string>', // REQUIRED
                'StreamingOptions' => [
                    'AddRecordTimestamp' => '<string>',
                    'Assign' => '<string>',
                    'BootstrapServers' => '<string>',
                    'Classification' => '<string>',
                    'ConnectionName' => '<string>',
                    'Delimiter' => '<string>',
                    'EmitConsumerLagMetrics' => '<string>',
                    'EndingOffsets' => '<string>',
                    'IncludeHeaders' => true || false,
                    'MaxOffsetsPerTrigger' => <integer>,
                    'MinPartitions' => <integer>,
                    'NumRetries' => <integer>,
                    'PollTimeoutMs' => <integer>,
                    'RetryIntervalMs' => <integer>,
                    'SecurityProtocol' => '<string>',
                    'StartingOffsets' => '<string>',
                    'StartingTimestamp' => <integer || string || DateTime>,
                    'SubscribePattern' => '<string>',
                    'TopicName' => '<string>',
                ],
                'WindowSize' => <integer>,
            ],
            'DirectKinesisSource' => [
                'DataPreviewOptions' => [
                    'PollingTime' => <integer>,
                    'RecordPollingLimit' => <integer>,
                ],
                'DetectSchema' => true || false,
                'Name' => '<string>', // REQUIRED
                'StreamingOptions' => [
                    'AddIdleTimeBetweenReads' => true || false,
                    'AddRecordTimestamp' => '<string>',
                    'AvoidEmptyBatches' => true || false,
                    'Classification' => '<string>',
                    'Delimiter' => '<string>',
                    'DescribeShardInterval' => <integer>,
                    'EmitConsumerLagMetrics' => '<string>',
                    'EndpointUrl' => '<string>',
                    'IdleTimeBetweenReadsInMs' => <integer>,
                    'MaxFetchRecordsPerShard' => <integer>,
                    'MaxFetchTimeInMs' => <integer>,
                    'MaxRecordPerRead' => <integer>,
                    'MaxRetryIntervalMs' => <integer>,
                    'NumRetries' => <integer>,
                    'RetryIntervalMs' => <integer>,
                    'RoleArn' => '<string>',
                    'RoleSessionName' => '<string>',
                    'StartingPosition' => 'latest|trim_horizon|earliest|timestamp',
                    'StartingTimestamp' => <integer || string || DateTime>,
                    'StreamArn' => '<string>',
                    'StreamName' => '<string>',
                ],
                'WindowSize' => <integer>,
            ],
            'DropDuplicates' => [
                'Columns' => [
                    ['<string>', ...],
                    // ...
                ],
                'Inputs' => ['<string>', ...], // REQUIRED
                'Name' => '<string>', // REQUIRED
            ],
            'DropFields' => [
                'Inputs' => ['<string>', ...], // REQUIRED
                'Name' => '<string>', // REQUIRED
                'Paths' => [ // REQUIRED
                    ['<string>', ...],
                    // ...
                ],
            ],
            'DropNullFields' => [
                'Inputs' => ['<string>', ...], // REQUIRED
                'Name' => '<string>', // REQUIRED
                'NullCheckBoxList' => [
                    'IsEmpty' => true || false,
                    'IsNegOne' => true || false,
                    'IsNullString' => true || false,
                ],
                'NullTextList' => [
                    [
                        'Datatype' => [ // REQUIRED
                            'Id' => '<string>', // REQUIRED
                            'Label' => '<string>', // REQUIRED
                        ],
                        'Value' => '<string>', // REQUIRED
                    ],
                    // ...
                ],
            ],
            'DynamicTransform' => [
                'FunctionName' => '<string>', // REQUIRED
                'Inputs' => ['<string>', ...], // REQUIRED
                'Name' => '<string>', // REQUIRED
                'OutputSchemas' => [
                    [
                        'Columns' => [
                            [
                                'Name' => '<string>', // REQUIRED
                                'Type' => '<string>',
                            ],
                            // ...
                        ],
                    ],
                    // ...
                ],
                'Parameters' => [
                    [
                        'IsOptional' => true || false,
                        'ListType' => 'str|int|float|complex|bool|list|null',
                        'Name' => '<string>', // REQUIRED
                        'Type' => 'str|int|float|complex|bool|list|null', // REQUIRED
                        'ValidationMessage' => '<string>',
                        'ValidationRule' => '<string>',
                        'Value' => ['<string>', ...],
                    ],
                    // ...
                ],
                'Path' => '<string>', // REQUIRED
                'TransformName' => '<string>', // REQUIRED
                'Version' => '<string>',
            ],
            'DynamoDBCatalogSource' => [
                'Database' => '<string>', // REQUIRED
                'Name' => '<string>', // REQUIRED
                'Table' => '<string>', // REQUIRED
            ],
            'EvaluateDataQuality' => [
                'Inputs' => ['<string>', ...], // REQUIRED
                'Name' => '<string>', // REQUIRED
                'Output' => 'PrimaryInput|EvaluationResults',
                'PublishingOptions' => [
                    'CloudWatchMetricsEnabled' => true || false,
                    'EvaluationContext' => '<string>',
                    'ResultsPublishingEnabled' => true || false,
                    'ResultsS3Prefix' => '<string>',
                ],
                'Ruleset' => '<string>', // REQUIRED
                'StopJobOnFailureOptions' => [
                    'StopJobOnFailureTiming' => 'Immediate|AfterDataLoad',
                ],
            ],
            'EvaluateDataQualityMultiFrame' => [
                'AdditionalDataSources' => ['<string>', ...],
                'AdditionalOptions' => ['<string>', ...],
                'Inputs' => ['<string>', ...], // REQUIRED
                'Name' => '<string>', // REQUIRED
                'PublishingOptions' => [
                    'CloudWatchMetricsEnabled' => true || false,
                    'EvaluationContext' => '<string>',
                    'ResultsPublishingEnabled' => true || false,
                    'ResultsS3Prefix' => '<string>',
                ],
                'Ruleset' => '<string>', // REQUIRED
                'StopJobOnFailureOptions' => [
                    'StopJobOnFailureTiming' => 'Immediate|AfterDataLoad',
                ],
            ],
            'FillMissingValues' => [
                'FilledPath' => '<string>',
                'ImputedPath' => '<string>', // REQUIRED
                'Inputs' => ['<string>', ...], // REQUIRED
                'Name' => '<string>', // REQUIRED
            ],
            'Filter' => [
                'Filters' => [ // REQUIRED
                    [
                        'Negated' => true || false,
                        'Operation' => 'EQ|LT|GT|LTE|GTE|REGEX|ISNULL', // REQUIRED
                        'Values' => [ // REQUIRED
                            [
                                'Type' => 'COLUMNEXTRACTED|CONSTANT', // REQUIRED
                                'Value' => ['<string>', ...], // REQUIRED
                            ],
                            // ...
                        ],
                    ],
                    // ...
                ],
                'Inputs' => ['<string>', ...], // REQUIRED
                'LogicalOperator' => 'AND|OR', // REQUIRED
                'Name' => '<string>', // REQUIRED
            ],
            'GovernedCatalogSource' => [
                'AdditionalOptions' => [
                    'BoundedFiles' => <integer>,
                    'BoundedSize' => <integer>,
                ],
                'Database' => '<string>', // REQUIRED
                'Name' => '<string>', // REQUIRED
                'PartitionPredicate' => '<string>',
                'Table' => '<string>', // REQUIRED
            ],
            'GovernedCatalogTarget' => [
                'Database' => '<string>', // REQUIRED
                'Inputs' => ['<string>', ...], // REQUIRED
                'Name' => '<string>', // REQUIRED
                'PartitionKeys' => [
                    ['<string>', ...],
                    // ...
                ],
                'SchemaChangePolicy' => [
                    'EnableUpdateCatalog' => true || false,
                    'UpdateBehavior' => 'UPDATE_IN_DATABASE|LOG',
                ],
                'Table' => '<string>', // REQUIRED
            ],
            'JDBCConnectorSource' => [
                'AdditionalOptions' => [
                    'DataTypeMapping' => ['<string>', ...],
                    'FilterPredicate' => '<string>',
                    'JobBookmarkKeys' => ['<string>', ...],
                    'JobBookmarkKeysSortOrder' => '<string>',
                    'LowerBound' => <integer>,
                    'NumPartitions' => <integer>,
                    'PartitionColumn' => '<string>',
                    'UpperBound' => <integer>,
                ],
                'ConnectionName' => '<string>', // REQUIRED
                'ConnectionTable' => '<string>',
                'ConnectionType' => '<string>', // REQUIRED
                'ConnectorName' => '<string>', // REQUIRED
                'Name' => '<string>', // REQUIRED
                'OutputSchemas' => [
                    [
                        'Columns' => [
                            [
                                'Name' => '<string>', // REQUIRED
                                'Type' => '<string>',
                            ],
                            // ...
                        ],
                    ],
                    // ...
                ],
                'Query' => '<string>',
            ],
            'JDBCConnectorTarget' => [
                'AdditionalOptions' => ['<string>', ...],
                'ConnectionName' => '<string>', // REQUIRED
                'ConnectionTable' => '<string>', // REQUIRED
                'ConnectionType' => '<string>', // REQUIRED
                'ConnectorName' => '<string>', // REQUIRED
                'Inputs' => ['<string>', ...], // REQUIRED
                'Name' => '<string>', // REQUIRED
                'OutputSchemas' => [
                    [
                        'Columns' => [
                            [
                                'Name' => '<string>', // REQUIRED
                                'Type' => '<string>',
                            ],
                            // ...
                        ],
                    ],
                    // ...
                ],
            ],
            'Join' => [
                'Columns' => [ // REQUIRED
                    [
                        'From' => '<string>', // REQUIRED
                        'Keys' => [ // REQUIRED
                            ['<string>', ...],
                            // ...
                        ],
                    ],
                    // ...
                ],
                'Inputs' => ['<string>', ...], // REQUIRED
                'JoinType' => 'equijoin|left|right|outer|leftsemi|leftanti', // REQUIRED
                'Name' => '<string>', // REQUIRED
            ],
            'Merge' => [
                'Inputs' => ['<string>', ...], // REQUIRED
                'Name' => '<string>', // REQUIRED
                'PrimaryKeys' => [ // REQUIRED
                    ['<string>', ...],
                    // ...
                ],
                'Source' => '<string>', // REQUIRED
            ],
            'MicrosoftSQLServerCatalogSource' => [
                'Database' => '<string>', // REQUIRED
                'Name' => '<string>', // REQUIRED
                'Table' => '<string>', // REQUIRED
            ],
            'MicrosoftSQLServerCatalogTarget' => [
                'Database' => '<string>', // REQUIRED
                'Inputs' => ['<string>', ...], // REQUIRED
                'Name' => '<string>', // REQUIRED
                'Table' => '<string>', // REQUIRED
            ],
            'MySQLCatalogSource' => [
                'Database' => '<string>', // REQUIRED
                'Name' => '<string>', // REQUIRED
                'Table' => '<string>', // REQUIRED
            ],
            'MySQLCatalogTarget' => [
                'Database' => '<string>', // REQUIRED
                'Inputs' => ['<string>', ...], // REQUIRED
                'Name' => '<string>', // REQUIRED
                'Table' => '<string>', // REQUIRED
            ],
            'OracleSQLCatalogSource' => [
                'Database' => '<string>', // REQUIRED
                'Name' => '<string>', // REQUIRED
                'Table' => '<string>', // REQUIRED
            ],
            'OracleSQLCatalogTarget' => [
                'Database' => '<string>', // REQUIRED
                'Inputs' => ['<string>', ...], // REQUIRED
                'Name' => '<string>', // REQUIRED
                'Table' => '<string>', // REQUIRED
            ],
            'PIIDetection' => [
                'EntityTypesToDetect' => ['<string>', ...], // REQUIRED
                'Inputs' => ['<string>', ...], // REQUIRED
                'MaskValue' => '<string>',
                'Name' => '<string>', // REQUIRED
                'OutputColumnName' => '<string>',
                'PiiType' => 'RowAudit|RowMasking|ColumnAudit|ColumnMasking', // REQUIRED
                'SampleFraction' => <float>,
                'ThresholdFraction' => <float>,
            ],
            'PostgreSQLCatalogSource' => [
                'Database' => '<string>', // REQUIRED
                'Name' => '<string>', // REQUIRED
                'Table' => '<string>', // REQUIRED
            ],
            'PostgreSQLCatalogTarget' => [
                'Database' => '<string>', // REQUIRED
                'Inputs' => ['<string>', ...], // REQUIRED
                'Name' => '<string>', // REQUIRED
                'Table' => '<string>', // REQUIRED
            ],
            'Recipe' => [
                'Inputs' => ['<string>', ...], // REQUIRED
                'Name' => '<string>', // REQUIRED
                'RecipeReference' => [ // REQUIRED
                    'RecipeArn' => '<string>', // REQUIRED
                    'RecipeVersion' => '<string>', // REQUIRED
                ],
            ],
            'RedshiftSource' => [
                'Database' => '<string>', // REQUIRED
                'Name' => '<string>', // REQUIRED
                'RedshiftTmpDir' => '<string>',
                'Table' => '<string>', // REQUIRED
                'TmpDirIAMRole' => '<string>',
            ],
            'RedshiftTarget' => [
                'Database' => '<string>', // REQUIRED
                'Inputs' => ['<string>', ...], // REQUIRED
                'Name' => '<string>', // REQUIRED
                'RedshiftTmpDir' => '<string>',
                'Table' => '<string>', // REQUIRED
                'TmpDirIAMRole' => '<string>',
                'UpsertRedshiftOptions' => [
                    'ConnectionName' => '<string>',
                    'TableLocation' => '<string>',
                    'UpsertKeys' => ['<string>', ...],
                ],
            ],
            'RelationalCatalogSource' => [
                'Database' => '<string>', // REQUIRED
                'Name' => '<string>', // REQUIRED
                'Table' => '<string>', // REQUIRED
            ],
            'RenameField' => [
                'Inputs' => ['<string>', ...], // REQUIRED
                'Name' => '<string>', // REQUIRED
                'SourcePath' => ['<string>', ...], // REQUIRED
                'TargetPath' => ['<string>', ...], // REQUIRED
            ],
            'S3CatalogDeltaSource' => [
                'AdditionalDeltaOptions' => ['<string>', ...],
                'Database' => '<string>', // REQUIRED
                'Name' => '<string>', // REQUIRED
                'OutputSchemas' => [
                    [
                        'Columns' => [
                            [
                                'Name' => '<string>', // REQUIRED
                                'Type' => '<string>',
                            ],
                            // ...
                        ],
                    ],
                    // ...
                ],
                'Table' => '<string>', // REQUIRED
            ],
            'S3CatalogHudiSource' => [
                'AdditionalHudiOptions' => ['<string>', ...],
                'Database' => '<string>', // REQUIRED
                'Name' => '<string>', // REQUIRED
                'OutputSchemas' => [
                    [
                        'Columns' => [
                            [
                                'Name' => '<string>', // REQUIRED
                                'Type' => '<string>',
                            ],
                            // ...
                        ],
                    ],
                    // ...
                ],
                'Table' => '<string>', // REQUIRED
            ],
            'S3CatalogSource' => [
                'AdditionalOptions' => [
                    'BoundedFiles' => <integer>,
                    'BoundedSize' => <integer>,
                ],
                'Database' => '<string>', // REQUIRED
                'Name' => '<string>', // REQUIRED
                'PartitionPredicate' => '<string>',
                'Table' => '<string>', // REQUIRED
            ],
            'S3CatalogTarget' => [
                'Database' => '<string>', // REQUIRED
                'Inputs' => ['<string>', ...], // REQUIRED
                'Name' => '<string>', // REQUIRED
                'PartitionKeys' => [
                    ['<string>', ...],
                    // ...
                ],
                'SchemaChangePolicy' => [
                    'EnableUpdateCatalog' => true || false,
                    'UpdateBehavior' => 'UPDATE_IN_DATABASE|LOG',
                ],
                'Table' => '<string>', // REQUIRED
            ],
            'S3CsvSource' => [
                'AdditionalOptions' => [
                    'BoundedFiles' => <integer>,
                    'BoundedSize' => <integer>,
                    'EnableSamplePath' => true || false,
                    'SamplePath' => '<string>',
                ],
                'CompressionType' => 'gzip|bzip2',
                'Escaper' => '<string>',
                'Exclusions' => ['<string>', ...],
                'GroupFiles' => '<string>',
                'GroupSize' => '<string>',
                'MaxBand' => <integer>,
                'MaxFilesInBand' => <integer>,
                'Multiline' => true || false,
                'Name' => '<string>', // REQUIRED
                'OptimizePerformance' => true || false,
                'OutputSchemas' => [
                    [
                        'Columns' => [
                            [
                                'Name' => '<string>', // REQUIRED
                                'Type' => '<string>',
                            ],
                            // ...
                        ],
                    ],
                    // ...
                ],
                'Paths' => ['<string>', ...], // REQUIRED
                'QuoteChar' => 'quote|quillemet|single_quote|disabled', // REQUIRED
                'Recurse' => true || false,
                'Separator' => 'comma|ctrla|pipe|semicolon|tab', // REQUIRED
                'SkipFirst' => true || false,
                'WithHeader' => true || false,
                'WriteHeader' => true || false,
            ],
            'S3DeltaCatalogTarget' => [
                'AdditionalOptions' => ['<string>', ...],
                'Database' => '<string>', // REQUIRED
                'Inputs' => ['<string>', ...], // REQUIRED
                'Name' => '<string>', // REQUIRED
                'PartitionKeys' => [
                    ['<string>', ...],
                    // ...
                ],
                'SchemaChangePolicy' => [
                    'EnableUpdateCatalog' => true || false,
                    'UpdateBehavior' => 'UPDATE_IN_DATABASE|LOG',
                ],
                'Table' => '<string>', // REQUIRED
            ],
            'S3DeltaDirectTarget' => [
                'AdditionalOptions' => ['<string>', ...],
                'Compression' => 'uncompressed|snappy', // REQUIRED
                'Format' => 'json|csv|avro|orc|parquet|hudi|delta', // REQUIRED
                'Inputs' => ['<string>', ...], // REQUIRED
                'Name' => '<string>', // REQUIRED
                'PartitionKeys' => [
                    ['<string>', ...],
                    // ...
                ],
                'Path' => '<string>', // REQUIRED
                'SchemaChangePolicy' => [
                    'Database' => '<string>',
                    'EnableUpdateCatalog' => true || false,
                    'Table' => '<string>',
                    'UpdateBehavior' => 'UPDATE_IN_DATABASE|LOG',
                ],
            ],
            'S3DeltaSource' => [
                'AdditionalDeltaOptions' => ['<string>', ...],
                'AdditionalOptions' => [
                    'BoundedFiles' => <integer>,
                    'BoundedSize' => <integer>,
                    'EnableSamplePath' => true || false,
                    'SamplePath' => '<string>',
                ],
                'Name' => '<string>', // REQUIRED
                'OutputSchemas' => [
                    [
                        'Columns' => [
                            [
                                'Name' => '<string>', // REQUIRED
                                'Type' => '<string>',
                            ],
                            // ...
                        ],
                    ],
                    // ...
                ],
                'Paths' => ['<string>', ...], // REQUIRED
            ],
            'S3DirectTarget' => [
                'Compression' => '<string>',
                'Format' => 'json|csv|avro|orc|parquet|hudi|delta', // REQUIRED
                'Inputs' => ['<string>', ...], // REQUIRED
                'Name' => '<string>', // REQUIRED
                'PartitionKeys' => [
                    ['<string>', ...],
                    // ...
                ],
                'Path' => '<string>', // REQUIRED
                'SchemaChangePolicy' => [
                    'Database' => '<string>',
                    'EnableUpdateCatalog' => true || false,
                    'Table' => '<string>',
                    'UpdateBehavior' => 'UPDATE_IN_DATABASE|LOG',
                ],
            ],
            'S3GlueParquetTarget' => [
                'Compression' => 'snappy|lzo|gzip|uncompressed|none',
                'Inputs' => ['<string>', ...], // REQUIRED
                'Name' => '<string>', // REQUIRED
                'PartitionKeys' => [
                    ['<string>', ...],
                    // ...
                ],
                'Path' => '<string>', // REQUIRED
                'SchemaChangePolicy' => [
                    'Database' => '<string>',
                    'EnableUpdateCatalog' => true || false,
                    'Table' => '<string>',
                    'UpdateBehavior' => 'UPDATE_IN_DATABASE|LOG',
                ],
            ],
            'S3HudiCatalogTarget' => [
                'AdditionalOptions' => ['<string>', ...], // REQUIRED
                'Database' => '<string>', // REQUIRED
                'Inputs' => ['<string>', ...], // REQUIRED
                'Name' => '<string>', // REQUIRED
                'PartitionKeys' => [
                    ['<string>', ...],
                    // ...
                ],
                'SchemaChangePolicy' => [
                    'EnableUpdateCatalog' => true || false,
                    'UpdateBehavior' => 'UPDATE_IN_DATABASE|LOG',
                ],
                'Table' => '<string>', // REQUIRED
            ],
            'S3HudiDirectTarget' => [
                'AdditionalOptions' => ['<string>', ...], // REQUIRED
                'Compression' => 'gzip|lzo|uncompressed|snappy', // REQUIRED
                'Format' => 'json|csv|avro|orc|parquet|hudi|delta', // REQUIRED
                'Inputs' => ['<string>', ...], // REQUIRED
                'Name' => '<string>', // REQUIRED
                'PartitionKeys' => [
                    ['<string>', ...],
                    // ...
                ],
                'Path' => '<string>', // REQUIRED
                'SchemaChangePolicy' => [
                    'Database' => '<string>',
                    'EnableUpdateCatalog' => true || false,
                    'Table' => '<string>',
                    'UpdateBehavior' => 'UPDATE_IN_DATABASE|LOG',
                ],
            ],
            'S3HudiSource' => [
                'AdditionalHudiOptions' => ['<string>', ...],
                'AdditionalOptions' => [
                    'BoundedFiles' => <integer>,
                    'BoundedSize' => <integer>,
                    'EnableSamplePath' => true || false,
                    'SamplePath' => '<string>',
                ],
                'Name' => '<string>', // REQUIRED
                'OutputSchemas' => [
                    [
                        'Columns' => [
                            [
                                'Name' => '<string>', // REQUIRED
                                'Type' => '<string>',
                            ],
                            // ...
                        ],
                    ],
                    // ...
                ],
                'Paths' => ['<string>', ...], // REQUIRED
            ],
            'S3JsonSource' => [
                'AdditionalOptions' => [
                    'BoundedFiles' => <integer>,
                    'BoundedSize' => <integer>,
                    'EnableSamplePath' => true || false,
                    'SamplePath' => '<string>',
                ],
                'CompressionType' => 'gzip|bzip2',
                'Exclusions' => ['<string>', ...],
                'GroupFiles' => '<string>',
                'GroupSize' => '<string>',
                'JsonPath' => '<string>',
                'MaxBand' => <integer>,
                'MaxFilesInBand' => <integer>,
                'Multiline' => true || false,
                'Name' => '<string>', // REQUIRED
                'OutputSchemas' => [
                    [
                        'Columns' => [
                            [
                                'Name' => '<string>', // REQUIRED
                                'Type' => '<string>',
                            ],
                            // ...
                        ],
                    ],
                    // ...
                ],
                'Paths' => ['<string>', ...], // REQUIRED
                'Recurse' => true || false,
            ],
            'S3ParquetSource' => [
                'AdditionalOptions' => [
                    'BoundedFiles' => <integer>,
                    'BoundedSize' => <integer>,
                    'EnableSamplePath' => true || false,
                    'SamplePath' => '<string>',
                ],
                'CompressionType' => 'snappy|lzo|gzip|uncompressed|none',
                'Exclusions' => ['<string>', ...],
                'GroupFiles' => '<string>',
                'GroupSize' => '<string>',
                'MaxBand' => <integer>,
                'MaxFilesInBand' => <integer>,
                'Name' => '<string>', // REQUIRED
                'OutputSchemas' => [
                    [
                        'Columns' => [
                            [
                                'Name' => '<string>', // REQUIRED
                                'Type' => '<string>',
                            ],
                            // ...
                        ],
                    ],
                    // ...
                ],
                'Paths' => ['<string>', ...], // REQUIRED
                'Recurse' => true || false,
            ],
            'SelectFields' => [
                'Inputs' => ['<string>', ...], // REQUIRED
                'Name' => '<string>', // REQUIRED
                'Paths' => [ // REQUIRED
                    ['<string>', ...],
                    // ...
                ],
            ],
            'SelectFromCollection' => [
                'Index' => <integer>, // REQUIRED
                'Inputs' => ['<string>', ...], // REQUIRED
                'Name' => '<string>', // REQUIRED
            ],
            'SnowflakeSource' => [
                'Data' => [ // REQUIRED
                    'Action' => '<string>',
                    'AdditionalOptions' => ['<string>', ...],
                    'AutoPushdown' => true || false,
                    'Connection' => [
                        'Description' => '<string>',
                        'Label' => '<string>',
                        'Value' => '<string>',
                    ],
                    'Database' => '<string>',
                    'IamRole' => [
                        'Description' => '<string>',
                        'Label' => '<string>',
                        'Value' => '<string>',
                    ],
                    'MergeAction' => '<string>',
                    'MergeClause' => '<string>',
                    'MergeWhenMatched' => '<string>',
                    'MergeWhenNotMatched' => '<string>',
                    'PostAction' => '<string>',
                    'PreAction' => '<string>',
                    'SampleQuery' => '<string>',
                    'Schema' => '<string>',
                    'SelectedColumns' => [
                        [
                            'Description' => '<string>',
                            'Label' => '<string>',
                            'Value' => '<string>',
                        ],
                        // ...
                    ],
                    'SourceType' => '<string>',
                    'StagingTable' => '<string>',
                    'Table' => '<string>',
                    'TableSchema' => [
                        [
                            'Description' => '<string>',
                            'Label' => '<string>',
                            'Value' => '<string>',
                        ],
                        // ...
                    ],
                    'TempDir' => '<string>',
                    'Upsert' => true || false,
                ],
                'Name' => '<string>', // REQUIRED
                'OutputSchemas' => [
                    [
                        'Columns' => [
                            [
                                'Name' => '<string>', // REQUIRED
                                'Type' => '<string>',
                            ],
                            // ...
                        ],
                    ],
                    // ...
                ],
            ],
            'SnowflakeTarget' => [
                'Data' => [ // REQUIRED
                    'Action' => '<string>',
                    'AdditionalOptions' => ['<string>', ...],
                    'AutoPushdown' => true || false,
                    'Connection' => [
                        'Description' => '<string>',
                        'Label' => '<string>',
                        'Value' => '<string>',
                    ],
                    'Database' => '<string>',
                    'IamRole' => [
                        'Description' => '<string>',
                        'Label' => '<string>',
                        'Value' => '<string>',
                    ],
                    'MergeAction' => '<string>',
                    'MergeClause' => '<string>',
                    'MergeWhenMatched' => '<string>',
                    'MergeWhenNotMatched' => '<string>',
                    'PostAction' => '<string>',
                    'PreAction' => '<string>',
                    'SampleQuery' => '<string>',
                    'Schema' => '<string>',
                    'SelectedColumns' => [
                        [
                            'Description' => '<string>',
                            'Label' => '<string>',
                            'Value' => '<string>',
                        ],
                        // ...
                    ],
                    'SourceType' => '<string>',
                    'StagingTable' => '<string>',
                    'Table' => '<string>',
                    'TableSchema' => [
                        [
                            'Description' => '<string>',
                            'Label' => '<string>',
                            'Value' => '<string>',
                        ],
                        // ...
                    ],
                    'TempDir' => '<string>',
                    'Upsert' => true || false,
                ],
                'Inputs' => ['<string>', ...],
                'Name' => '<string>', // REQUIRED
            ],
            'SparkConnectorSource' => [
                'AdditionalOptions' => ['<string>', ...],
                'ConnectionName' => '<string>', // REQUIRED
                'ConnectionType' => '<string>', // REQUIRED
                'ConnectorName' => '<string>', // REQUIRED
                'Name' => '<string>', // REQUIRED
                'OutputSchemas' => [
                    [
                        'Columns' => [
                            [
                                'Name' => '<string>', // REQUIRED
                                'Type' => '<string>',
                            ],
                            // ...
                        ],
                    ],
                    // ...
                ],
            ],
            'SparkConnectorTarget' => [
                'AdditionalOptions' => ['<string>', ...],
                'ConnectionName' => '<string>', // REQUIRED
                'ConnectionType' => '<string>', // REQUIRED
                'ConnectorName' => '<string>', // REQUIRED
                'Inputs' => ['<string>', ...], // REQUIRED
                'Name' => '<string>', // REQUIRED
                'OutputSchemas' => [
                    [
                        'Columns' => [
                            [
                                'Name' => '<string>', // REQUIRED
                                'Type' => '<string>',
                            ],
                            // ...
                        ],
                    ],
                    // ...
                ],
            ],
            'SparkSQL' => [
                'Inputs' => ['<string>', ...], // REQUIRED
                'Name' => '<string>', // REQUIRED
                'OutputSchemas' => [
                    [
                        'Columns' => [
                            [
                                'Name' => '<string>', // REQUIRED
                                'Type' => '<string>',
                            ],
                            // ...
                        ],
                    ],
                    // ...
                ],
                'SqlAliases' => [ // REQUIRED
                    [
                        'Alias' => '<string>', // REQUIRED
                        'From' => '<string>', // REQUIRED
                    ],
                    // ...
                ],
                'SqlQuery' => '<string>', // REQUIRED
            ],
            'Spigot' => [
                'Inputs' => ['<string>', ...], // REQUIRED
                'Name' => '<string>', // REQUIRED
                'Path' => '<string>', // REQUIRED
                'Prob' => <float>,
                'Topk' => <integer>,
            ],
            'SplitFields' => [
                'Inputs' => ['<string>', ...], // REQUIRED
                'Name' => '<string>', // REQUIRED
                'Paths' => [ // REQUIRED
                    ['<string>', ...],
                    // ...
                ],
            ],
            'Union' => [
                'Inputs' => ['<string>', ...], // REQUIRED
                'Name' => '<string>', // REQUIRED
                'UnionType' => 'ALL|DISTINCT', // REQUIRED
            ],
        ],
        // ...
    ],
    'Command' => [ // REQUIRED
        'Name' => '<string>',
        'PythonVersion' => '<string>',
        'Runtime' => '<string>',
        'ScriptLocation' => '<string>',
    ],
    'Connections' => [
        'Connections' => ['<string>', ...],
    ],
    'DefaultArguments' => ['<string>', ...],
    'Description' => '<string>',
    'ExecutionClass' => 'FLEX|STANDARD',
    'ExecutionProperty' => [
        'MaxConcurrentRuns' => <integer>,
    ],
    'GlueVersion' => '<string>',
    'JobMode' => 'SCRIPT|VISUAL|NOTEBOOK',
    'LogUri' => '<string>',
    'MaintenanceWindow' => '<string>',
    'MaxCapacity' => <float>,
    'MaxRetries' => <integer>,
    'Name' => '<string>', // REQUIRED
    'NonOverridableArguments' => ['<string>', ...],
    'NotificationProperty' => [
        'NotifyDelayAfter' => <integer>,
    ],
    'NumberOfWorkers' => <integer>,
    'Role' => '<string>', // REQUIRED
    'SecurityConfiguration' => '<string>',
    'SourceControlDetails' => [
        'AuthStrategy' => 'PERSONAL_ACCESS_TOKEN|AWS_SECRETS_MANAGER',
        'AuthToken' => '<string>',
        'Branch' => '<string>',
        'Folder' => '<string>',
        'LastCommitId' => '<string>',
        'Owner' => '<string>',
        'Provider' => 'GITHUB|GITLAB|BITBUCKET|AWS_CODE_COMMIT',
        'Repository' => '<string>',
    ],
    'Tags' => ['<string>', ...],
    'Timeout' => <integer>,
    'WorkerType' => 'Standard|G.1X|G.2X|G.025X|G.4X|G.8X|Z.2X',
]);

Parameter Details

Members
AllocatedCapacity
Type: int

This parameter is deprecated. Use MaxCapacity instead.

The number of Glue data processing units (DPUs) to allocate to this Job. You can allocate a minimum of 2 DPUs; the default is 10. A DPU is a relative measure of processing power that consists of 4 vCPUs of compute capacity and 16 GB of memory. For more information, see the Glue pricing page.

CodeGenConfigurationNodes
Type: Associative array of custom strings keys (NodeId) to CodeGenConfigurationNode structures

The representation of a directed acyclic graph on which both the Glue Studio visual component and Glue Studio code generation are based.

Command
Required: Yes
Type: JobCommand structure

The JobCommand that runs this job.

Connections
Type: ConnectionsList structure

The connections used for this job.

DefaultArguments
Type: Associative array of custom strings keys (GenericString) to strings

The default arguments for every run of this job, specified as name-value pairs.

You can specify arguments here that your own job-execution script consumes, as well as arguments that Glue itself consumes.

Job arguments may be logged. Do not pass plaintext secrets as arguments. Retrieve secrets from a Glue Connection, Secrets Manager or other secret management mechanism if you intend to keep them within the Job.

For information about how to specify and consume your own Job arguments, see the Calling Glue APIs in Python topic in the developer guide.

For information about the arguments you can provide to this field when configuring Spark jobs, see the Special Parameters Used by Glue topic in the developer guide.

For information about the arguments you can provide to this field when configuring Ray jobs, see Using job parameters in Ray jobs in the developer guide.

Description
Type: string

Description of the job being defined.

ExecutionClass
Type: string

Indicates whether the job is run with a standard or flexible execution class. The standard execution-class is ideal for time-sensitive workloads that require fast job startup and dedicated resources.

The flexible execution class is appropriate for time-insensitive jobs whose start and completion times may vary.

Only jobs with Glue version 3.0 and above and command type glueetl will be allowed to set ExecutionClass to FLEX. The flexible execution class is available for Spark jobs.
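
The FLEX restriction described above can be checked on the client side before calling CreateJob. The following is a minimal sketch, not part of the SDK: canUseFlex is a hypothetical helper, the job name, role, and script location are placeholders, and the sketch assumes that GlueVersion strings compare correctly as version numbers.

```php
<?php
// Hypothetical helper (not an SDK function): reports whether a job definition
// may set ExecutionClass to FLEX under the rule stated above
// (Glue version 3.0 or above and command type glueetl).
function canUseFlex(array $jobParams): bool
{
    // Jobs created without a Glue version default to 0.9 (see GlueVersion).
    $glueVersion = $jobParams['GlueVersion'] ?? '0.9';
    $commandName = $jobParams['Command']['Name'] ?? '';

    return version_compare($glueVersion, '3.0', '>=') && $commandName === 'glueetl';
}

$params = [
    'Name' => 'example-job',     // placeholder
    'Role' => 'ExampleGlueRole', // placeholder
    'Command' => [
        'Name' => 'glueetl',
        'ScriptLocation' => 's3://example-bucket/scripts/job.py', // placeholder
    ],
    'GlueVersion' => '4.0',
];

if (canUseFlex($params)) {
    $params['ExecutionClass'] = 'FLEX';
}
// $result = $client->createJob($params);
```

The service remains the authority on which combinations are accepted; a local check like this only catches obvious mismatches before a request is sent.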

ExecutionProperty
Type: ExecutionProperty structure

An ExecutionProperty specifying the maximum number of concurrent runs allowed for this job.

GlueVersion
Type: string

In Spark jobs, GlueVersion determines the versions of Apache Spark and Python that Glue makes available in a job. The Python version indicates the version supported for jobs of type Spark.

Ray jobs should set GlueVersion to 4.0 or greater. However, the versions of Ray, Python and additional libraries available in your Ray job are determined by the Runtime parameter of the Job command.

For more information about the available Glue versions and corresponding Spark and Python versions, see Glue version in the developer guide.

Jobs that are created without specifying a Glue version default to Glue 0.9.

JobMode
Type: string

A mode that describes how a job was created. Valid values are:

  • SCRIPT - The job was created using the Glue Studio script editor.

  • VISUAL - The job was created using the Glue Studio visual editor.

  • NOTEBOOK - The job was created using an interactive sessions notebook.

When the JobMode field is missing or null, SCRIPT is assigned as the default value.

LogUri
Type: string

This field is reserved for future use.

MaintenanceWindow
Type: string

This field specifies a day of the week and hour for a maintenance window for streaming jobs. Glue periodically performs maintenance activities. During these maintenance windows, Glue will need to restart your streaming jobs.

Glue will restart the job within 3 hours of the specified maintenance window. For instance, if you set up the maintenance window for Monday at 10:00 AM GMT, your jobs will be restarted between 10:00 AM GMT and 1:00 PM GMT.

MaxCapacity
Type: double

For Glue version 1.0 or earlier jobs, using the standard worker type, the number of Glue data processing units (DPUs) that can be allocated when this job runs. A DPU is a relative measure of processing power that consists of 4 vCPUs of compute capacity and 16 GB of memory. For more information, see the Glue pricing page.

For Glue version 2.0+ jobs, you cannot specify a Maximum capacity. Instead, you should specify a Worker type and the Number of workers.

Do not set MaxCapacity if using WorkerType and NumberOfWorkers.

The value that can be allocated for MaxCapacity depends on whether you are running a Python shell job, an Apache Spark ETL job, or an Apache Spark streaming ETL job:

  • When you specify a Python shell job (JobCommand.Name="pythonshell"), you can allocate either 0.0625 or 1 DPU. The default is 0.0625 DPU.

  • When you specify an Apache Spark ETL job (JobCommand.Name="glueetl") or Apache Spark streaming ETL job (JobCommand.Name="gluestreaming"), you can allocate from 2 to 100 DPUs. The default is 10 DPUs. This job type cannot have a fractional DPU allocation.
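
The allocation rules in the list above can be expressed as a small pre-flight check. This is a sketch under stated assumptions: isValidMaxCapacity is a hypothetical helper, not an SDK function, and it covers only the command names listed above.

```php
<?php
// Hypothetical helper: checks a MaxCapacity value against the rules listed
// above for each JobCommand.Name.
function isValidMaxCapacity(string $commandName, float $maxCapacity): bool
{
    switch ($commandName) {
        case 'pythonshell':
            // Either 0.0625 or 1 DPU.
            return in_array($maxCapacity, [0.0625, 1.0], true);
        case 'glueetl':
        case 'gluestreaming':
            // From 2 to 100 DPUs, with no fractional allocation.
            return $maxCapacity >= 2.0
                && $maxCapacity <= 100.0
                && floor($maxCapacity) === $maxCapacity;
        default:
            // No rule is listed above for other command names.
            throw new InvalidArgumentException(
                "No MaxCapacity rule listed for command: $commandName"
            );
    }
}

$ok = isValidMaxCapacity('glueetl', 10);        // within 2 to 100, whole number
$tooSmall = isValidMaxCapacity('glueetl', 1);   // below the minimum of 2
```

For Glue version 2.0+ jobs this check does not apply, because WorkerType and NumberOfWorkers are specified instead of MaxCapacity.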

MaxRetries
Type: int

The maximum number of times to retry this job if it fails.

Name
Required: Yes
Type: string

The name you assign to this job definition. It must be unique in your account.

NonOverridableArguments
Type: Associative array of custom strings keys (GenericString) to strings

Arguments for this job that are not overridden when providing job arguments in a job run, specified as name-value pairs.

NotificationProperty
Type: NotificationProperty structure

Specifies configuration properties of a job notification.

NumberOfWorkers
Type: int

The number of workers of a defined WorkerType that are allocated when a job runs.

Role
Required: Yes
Type: string

The name or Amazon Resource Name (ARN) of the IAM role associated with this job.

SecurityConfiguration
Type: string

The name of the SecurityConfiguration structure to be used with this job.

SourceControlDetails
Type: SourceControlDetails structure

The details for a source control configuration for a job, allowing synchronization of job artifacts to or from a remote repository.

Tags
Type: Associative array of custom strings keys (TagKey) to strings

The tags to use with this job. You may use tags to limit access to the job. For more information about tags in Glue, see Amazon Web Services Tags in Glue in the developer guide.

Timeout
Type: int

The job timeout in minutes. This is the maximum time that a job run can consume resources before it is terminated and enters TIMEOUT status. The default is 2,880 minutes (48 hours).

WorkerType
Type: string

The type of predefined worker that is allocated when a job runs. Accepts a value of G.1X, G.2X, G.4X, G.8X or G.025X for Spark jobs. Accepts the value Z.2X for Ray jobs.

  • For the G.1X worker type, each worker maps to 1 DPU (4 vCPUs, 16 GB of memory) with 84GB disk (approximately 34GB free), and provides 1 executor per worker. We recommend this worker type for workloads such as data transforms, joins, and queries. It offers a scalable and cost-effective way to run most jobs.

  • For the G.2X worker type, each worker maps to 2 DPU (8 vCPUs, 32 GB of memory) with 128GB disk (approximately 77GB free), and provides 1 executor per worker. We recommend this worker type for workloads such as data transforms, joins, and queries. It offers a scalable and cost-effective way to run most jobs.

  • For the G.4X worker type, each worker maps to 4 DPU (16 vCPUs, 64 GB of memory) with 256GB disk (approximately 235GB free), and provides 1 executor per worker. We recommend this worker type for jobs whose workloads contain your most demanding transforms, aggregations, joins, and queries. This worker type is available only for Glue version 3.0 or later Spark ETL jobs in the following Amazon Web Services Regions: US East (Ohio), US East (N. Virginia), US West (Oregon), Asia Pacific (Singapore), Asia Pacific (Sydney), Asia Pacific (Tokyo), Canada (Central), Europe (Frankfurt), Europe (Ireland), and Europe (Stockholm).

  • For the G.8X worker type, each worker maps to 8 DPU (32 vCPUs, 128 GB of memory) with 512GB disk (approximately 487GB free), and provides 1 executor per worker. We recommend this worker type for jobs whose workloads contain your most demanding transforms, aggregations, joins, and queries. This worker type is available only for Glue version 3.0 or later Spark ETL jobs, in the same Amazon Web Services Regions as supported for the G.4X worker type.

  • For the G.025X worker type, each worker maps to 0.25 DPU (2 vCPUs, 4 GB of memory) with 84GB disk (approximately 34GB free), and provides 1 executor per worker. We recommend this worker type for low volume streaming jobs. This worker type is only available for Glue version 3.0 streaming jobs.

  • For the Z.2X worker type, each worker maps to 2 M-DPU (8 vCPUs, 64 GB of memory) with 128 GB disk (approximately 120GB free), and provides up to 8 Ray workers based on the autoscaler.
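
The per-worker DPU figures in the list above can be tabulated to estimate the total DPUs implied by a WorkerType and NumberOfWorkers pair. This is only an illustrative sketch: DPU_PER_WORKER and totalDpus are hypothetical names, and Z.2X is left out because it is measured in M-DPU rather than DPU.

```php
<?php
// DPUs per worker, copied from the worker type descriptions above.
const DPU_PER_WORKER = [
    'G.025X' => 0.25,
    'G.1X'   => 1.0,
    'G.2X'   => 2.0,
    'G.4X'   => 4.0,
    'G.8X'   => 8.0,
];

// Hypothetical helper: multiplies the per-worker DPU figure by the worker count.
function totalDpus(string $workerType, int $numberOfWorkers): float
{
    if (!array_key_exists($workerType, DPU_PER_WORKER)) {
        throw new InvalidArgumentException("No DPU figure listed for: $workerType");
    }

    return DPU_PER_WORKER[$workerType] * $numberOfWorkers;
}

$dpus = totalDpus('G.2X', 10); // 10 workers at 2 DPU each
```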

Result Syntax

[
    'Name' => '<string>',
]

Result Details

Members
Name
Type: string

The unique name that was provided for this job definition.

Errors

InvalidInputException:

The input provided was not valid.

IdempotentParameterMismatchException:

The same unique identifier was associated with two different records.

AlreadyExistsException:

A resource to be created or added already exists.

InternalServiceException:

An internal service error occurred.

OperationTimeoutException:

The operation timed out.

ResourceNumberLimitExceededException:

A resource numerical limit was exceeded.

ConcurrentModificationException:

Two processes are trying to modify a resource simultaneously.

CreateMLTransform

$result = $client->createMLTransform([/* ... */]);
$promise = $client->createMLTransformAsync([/* ... */]);

Creates a Glue machine learning transform. This operation creates the transform and all the necessary parameters to train it.

Call this operation as the first step in the process of using a machine learning transform (such as the FindMatches transform) for deduplicating data. You can provide an optional Description, in addition to the parameters that you want to use for your algorithm.

You must also specify certain parameters for the tasks that Glue runs on your behalf as part of learning from your data and creating a high-quality machine learning transform. These parameters include Role, and optionally, AllocatedCapacity, Timeout, and MaxRetries. For more information, see Jobs.

Parameter Syntax

$result = $client->createMLTransform([
    'Description' => '<string>',
    'GlueVersion' => '<string>',
    'InputRecordTables' => [ // REQUIRED
        [
            'AdditionalOptions' => ['<string>', ...],
            'CatalogId' => '<string>',
            'ConnectionName' => '<string>',
            'DatabaseName' => '<string>', // REQUIRED
            'TableName' => '<string>', // REQUIRED
        ],
        // ...
    ],
    'MaxCapacity' => <float>,
    'MaxRetries' => <integer>,
    'Name' => '<string>', // REQUIRED
    'NumberOfWorkers' => <integer>,
    'Parameters' => [ // REQUIRED
        'FindMatchesParameters' => [
            'AccuracyCostTradeoff' => <float>,
            'EnforceProvidedLabels' => true || false,
            'PrecisionRecallTradeoff' => <float>,
            'PrimaryKeyColumnName' => '<string>',
        ],
        'TransformType' => 'FIND_MATCHES', // REQUIRED
    ],
    'Role' => '<string>', // REQUIRED
    'Tags' => ['<string>', ...],
    'Timeout' => <integer>,
    'TransformEncryption' => [
        'MlUserDataEncryption' => [
            'KmsKeyId' => '<string>',
            'MlUserDataEncryptionMode' => 'DISABLED|SSE-KMS', // REQUIRED
        ],
        'TaskRunSecurityConfigurationName' => '<string>',
    ],
    'WorkerType' => 'Standard|G.1X|G.2X|G.025X|G.4X|G.8X|Z.2X',
]);

Parameter Details

Members
Description
Type: string

A description of the machine learning transform that is being defined. The default is an empty string.

GlueVersion
Type: string

This value determines which version of Glue this machine learning transform is compatible with. Glue 1.0 is recommended for most customers. If the value is not set, the Glue compatibility defaults to Glue 0.9. For more information, see Glue Versions in the developer guide.

InputRecordTables
Required: Yes
Type: Array of GlueTable structures

A list of Glue table definitions used by the transform.

MaxCapacity
Type: double

The number of Glue data processing units (DPUs) that are allocated to task runs for this transform. You can allocate from 2 to 100 DPUs; the default is 10. A DPU is a relative measure of processing power that consists of 4 vCPUs of compute capacity and 16 GB of memory. For more information, see the Glue pricing page.

MaxCapacity is a mutually exclusive option with NumberOfWorkers and WorkerType.

  • If either NumberOfWorkers or WorkerType is set, then MaxCapacity cannot be set.

  • If MaxCapacity is set, then neither NumberOfWorkers nor WorkerType can be set.

  • If WorkerType is set, then NumberOfWorkers is required (and vice versa).

  • MaxCapacity and NumberOfWorkers must both be at least 1.

When the WorkerType field is set to a value other than Standard, the MaxCapacity field is set automatically and becomes read-only.
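
The mutual-exclusion rules above can be checked locally before calling CreateMLTransform. The following is a minimal sketch, assuming a hypothetical helper named capacityRuleViolations that is not part of the SDK; it inspects only the three capacity-related keys.

```php
<?php
// Hypothetical helper: returns a list of the capacity rules (as stated above)
// that a CreateMLTransform parameter array violates. Empty means no violation.
function capacityRuleViolations(array $params): array
{
    $hasMax     = isset($params['MaxCapacity']);
    $hasWorkers = isset($params['NumberOfWorkers']);
    $hasType    = isset($params['WorkerType']);
    $violations = [];

    if ($hasMax && ($hasWorkers || $hasType)) {
        $violations[] = 'MaxCapacity cannot be combined with NumberOfWorkers or WorkerType';
    }
    if ($hasWorkers !== $hasType) {
        $violations[] = 'WorkerType and NumberOfWorkers must be set together';
    }
    if ($hasMax && $params['MaxCapacity'] < 1) {
        $violations[] = 'MaxCapacity must be at least 1';
    }
    if ($hasWorkers && $params['NumberOfWorkers'] < 1) {
        $violations[] = 'NumberOfWorkers must be at least 1';
    }

    return $violations;
}

$problems = capacityRuleViolations([
    'MaxCapacity' => 10.0,
    'WorkerType'  => 'G.1X',
]);
// $problems holds two entries here: MaxCapacity is combined with WorkerType,
// and WorkerType is set without NumberOfWorkers.
```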

MaxRetries
Type: int

The maximum number of times to retry a task for this transform after a task run fails.

Name
Required: Yes
Type: string

The unique name that you give the transform when you create it.

NumberOfWorkers
Type: int

The number of workers of a defined WorkerType that are allocated when this task runs.

If WorkerType is set, then NumberOfWorkers is required (and vice versa).

Parameters
Required: Yes
Type: TransformParameters structure

The algorithmic parameters that are specific to the transform type used. Conditionally dependent on the transform type.

Role
Required: Yes
Type: string

The name or Amazon Resource Name (ARN) of the IAM role with the required permissions. The required permissions include both Glue service role permissions to Glue resources, and Amazon S3 permissions required by the transform.

  • This role needs Glue service role permissions to allow access to resources in Glue. See Attach a Policy to IAM Users That Access Glue.

  • This role needs permission to your Amazon Simple Storage Service (Amazon S3) sources, targets, temporary directory, scripts, and any libraries used by the task run for this transform.

Tags
Type: Associative array of custom strings keys (TagKey) to strings

The tags to use with this machine learning transform. You may use tags to limit access to the machine learning transform. For more information about tags in Glue, see Amazon Web Services Tags in Glue in the developer guide.

Timeout
Type: int

The timeout of the task run for this transform in minutes. This is the maximum time that a task run for this transform can consume resources before it is terminated and enters TIMEOUT status. The default is 2,880 minutes (48 hours).

TransformEncryption
Type: TransformEncryption structure

The encryption-at-rest settings of the transform that apply to accessing user data. Machine learning transforms can access user data encrypted in Amazon S3 using KMS.

WorkerType
Type: string

The type of predefined worker that is allocated when this task runs. Accepts a value of Standard, G.1X, or G.2X.

  • For the Standard worker type, each worker provides 4 vCPU, 16 GB of memory and a 50GB disk, and 2 executors per worker.

  • For the G.1X worker type, each worker provides 4 vCPU, 16 GB of memory and a 64GB disk, and 1 executor per worker.

  • For the G.2X worker type, each worker provides 8 vCPU, 32 GB of memory and a 128GB disk, and 1 executor per worker.

MaxCapacity is a mutually exclusive option with NumberOfWorkers and WorkerType.

  • If either NumberOfWorkers or WorkerType is set, then MaxCapacity cannot be set.

  • If MaxCapacity is set, then neither NumberOfWorkers nor WorkerType can be set.

  • If WorkerType is set, then NumberOfWorkers is required (and vice versa).

  • MaxCapacity and NumberOfWorkers must both be at least 1.

Result Syntax

[
    'TransformId' => '<string>',
]

Result Details

Members
TransformId
Type: string

A unique identifier that is generated for the transform.

Errors

AlreadyExistsException:

A resource to be created or added already exists.

InvalidInputException:

The input provided was not valid.

OperationTimeoutException:

The operation timed out.

InternalServiceException:

An internal service error occurred.

AccessDeniedException:

Access to a resource was denied.

ResourceNumberLimitExceededException:

A resource numerical limit was exceeded.

IdempotentParameterMismatchException:

The same unique identifier was associated with two different records.

CreatePartition

$result = $client->createPartition([/* ... */]);
$promise = $client->createPartitionAsync([/* ... */]);

Creates a new partition.

Parameter Syntax

$result = $client->createPartition([
    'CatalogId' => '<string>',
    'DatabaseName' => '<string>', // REQUIRED
    'PartitionInput' => [ // REQUIRED
        'LastAccessTime' => <integer || string || DateTime>,
        'LastAnalyzedTime' => <integer || string || DateTime>,
        'Parameters' => ['<string>', ...],
        'StorageDescriptor' => [
            'AdditionalLocations' => ['<string>', ...],
            'BucketColumns' => ['<string>', ...],
            'Columns' => [
                [
                    'Comment' => '<string>',
                    'Name' => '<string>', // REQUIRED
                    'Parameters' => ['<string>', ...],
                    'Type' => '<string>',
                ],
                // ...
            ],
            'Compressed' => true || false,
            'InputFormat' => '<string>',
            'Location' => '<string>',
            'NumberOfBuckets' => <integer>,
            'OutputFormat' => '<string>',
            'Parameters' => ['<string>', ...],
            'SchemaReference' => [
                'SchemaId' => [
                    'RegistryName' => '<string>',
                    'SchemaArn' => '<string>',
                    'SchemaName' => '<string>',
                ],
                'SchemaVersionId' => '<string>',
                'SchemaVersionNumber' => <integer>,
            ],
            'SerdeInfo' => [
                'Name' => '<string>',
                'Parameters' => ['<string>', ...],
                'SerializationLibrary' => '<string>',
            ],
            'SkewedInfo' => [
                'SkewedColumnNames' => ['<string>', ...],
                'SkewedColumnValueLocationMaps' => ['<string>', ...],
                'SkewedColumnValues' => ['<string>', ...],
            ],
            'SortColumns' => [
                [
                    'Column' => '<string>', // REQUIRED
                    'SortOrder' => <integer>, // REQUIRED
                ],
                // ...
            ],
            'StoredAsSubDirectories' => true || false,
        ],
        'Values' => ['<string>', ...],
    ],
    'TableName' => '<string>', // REQUIRED
]);

Parameter Details

Members
CatalogId
Type: string

The Amazon Web Services account ID of the catalog in which the partition is to be created.

DatabaseName
Required: Yes
Type: string

The name of the metadata database in which the partition is to be created.

PartitionInput
Required: Yes
Type: PartitionInput structure

A PartitionInput structure defining the partition to be created.

TableName
Required: Yes
Type: string

The name of the metadata table in which the partition is to be created.
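
As an illustration of how these members fit together, the sketch below assembles a CreatePartition request with only the required members plus a storage location. buildCreatePartitionParams is a hypothetical helper, and the database, table, and Amazon S3 path are placeholders.

```php
<?php
// Hypothetical helper: assembles CreatePartition parameters for one partition.
// CatalogId is omitted because it is not marked as required.
function buildCreatePartitionParams(
    string $database,
    string $table,
    array $values,
    string $location
): array {
    return [
        'DatabaseName' => $database,
        'TableName' => $table,
        'PartitionInput' => [
            // Values is a list of strings, so scalar values are cast.
            'Values' => array_map('strval', array_values($values)),
            'StorageDescriptor' => ['Location' => $location],
        ],
    ];
}

$params = buildCreatePartitionParams(
    'sales_db',                                      // placeholder
    'orders',                                        // placeholder
    [2024, 1],
    's3://example-bucket/orders/year=2024/month=1/'  // placeholder
);
// $client->createPartition($params);
```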

Result Syntax

[]

Result Details

The results for this operation are always empty.

Errors

InvalidInputException:

The input provided was not valid.

AlreadyExistsException:

A resource to be created or added already exists.

ResourceNumberLimitExceededException:

A resource numerical limit was exceeded.

InternalServiceException:

An internal service error occurred.

EntityNotFoundException:

A specified entity does not exist.

OperationTimeoutException:

The operation timed out.

GlueEncryptionException:

An encryption operation failed.

CreatePartitionIndex

$result = $client->createPartitionIndex([/* ... */]);
$promise = $client->createPartitionIndexAsync([/* ... */]);

Creates a specified partition index in an existing table.

Parameter Syntax

$result = $client->createPartitionIndex([
    'CatalogId' => '<string>',
    'DatabaseName' => '<string>', // REQUIRED
    'PartitionIndex' => [ // REQUIRED
        'IndexName' => '<string>', // REQUIRED
        'Keys' => ['<string>', ...], // REQUIRED
    ],
    'TableName' => '<string>', // REQUIRED
]);

Parameter Details

Members
CatalogId
Type: string

The catalog ID where the table resides.

DatabaseName
Required: Yes
Type: string

Specifies the name of a database in which you want to create a partition index.

PartitionIndex
Required: Yes
Type: PartitionIndex structure

Specifies a PartitionIndex structure to create a partition index in an existing table.

TableName
Required: Yes
Type: string

Specifies the name of a table in which you want to create a partition index.
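
A request for this operation might be assembled and checked as follows. This is a sketch only: missingRequiredKeys is a hypothetical helper that checks top-level keys, and the database, table, index, and key names are placeholders.

```php
<?php
// Hypothetical helper: returns the required top-level keys missing from $params.
function missingRequiredKeys(array $params, array $required): array
{
    return array_values(array_diff($required, array_keys($params)));
}

$params = [
    'DatabaseName' => 'sales_db',          // placeholder
    'TableName' => 'orders',               // placeholder
    'PartitionIndex' => [
        'IndexName' => 'year-month-index', // placeholder
        'Keys' => ['year', 'month'],       // placeholder key names
    ],
];

// The three members marked REQUIRED in the parameter syntax above.
$missing = missingRequiredKeys($params, ['DatabaseName', 'PartitionIndex', 'TableName']);

// if ($missing === []) {
//     $client->createPartitionIndex($params);
// }
```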

Result Syntax

[]

Result Details

The results for this operation are always empty.

Errors

AlreadyExistsException:

A resource to be created or added already exists.

InvalidInputException:

The input provided was not valid.

EntityNotFoundException:

A specified entity does not exist.

ResourceNumberLimitExceededException:

A resource numerical limit was exceeded.

InternalServiceException:

An internal service error occurred.

OperationTimeoutException:

The operation timed out.

GlueEncryptionException:

An encryption operation failed.

CreateRegistry

$result = $client->createRegistry([/* ... */]);
$promise = $client->createRegistryAsync([/* ... */]);

Creates a new registry which may be used to hold a collection of schemas.

Parameter Syntax

$result = $client->createRegistry([
    'Description' => '<string>',
    'RegistryName' => '<string>', // REQUIRED
    'Tags' => ['<string>', ...],
]);

Parameter Details

Members
Description
Type: string

A description of the registry. If a description is not provided, no default value is set.

RegistryName
Required: Yes
Type: string

The name of the registry to be created. It has a maximum length of 255 and may contain only letters, numbers, hyphens, underscores, dollar signs, or hash marks. No whitespace.
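
The naming rule above can be approximated with a regular expression. This is a sketch under assumptions: isValidRegistryName is a hypothetical helper, "letters" is taken to mean ASCII letters, and a minimum length of 1 is assumed because the text states only the maximum.

```php
<?php
// Hypothetical helper: checks a name against the rule stated above
// (letters, numbers, hyphen, underscore, dollar sign, hash mark; at most 255
// characters; no whitespace). ASCII letters and a minimum length of 1 are
// assumptions of this sketch.
function isValidRegistryName(string $name): bool
{
    return preg_match('/\A[A-Za-z0-9_$#-]{1,255}\z/', $name) === 1;
}

$valid = isValidRegistryName('my-registry_1');
$invalid = isValidRegistryName('has space');
```

The SchemaName parameter of CreateSchema states the same rule, so the same check could be applied there.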

Tags
Type: Associative array of custom strings keys (TagKey) to strings

Amazon Web Services tags that contain a key value pair and may be searched by console, command line, or API.

Result Syntax

[
    'Description' => '<string>',
    'RegistryArn' => '<string>',
    'RegistryName' => '<string>',
    'Tags' => ['<string>', ...],
]

Result Details

Members
Description
Type: string

A description of the registry.

RegistryArn
Type: string

The Amazon Resource Name (ARN) of the newly created registry.

RegistryName
Type: string

The name of the registry.

Tags
Type: Associative array of custom strings keys (TagKey) to strings

The tags for the registry.

Errors

InvalidInputException:

The input provided was not valid.

AccessDeniedException:

Access to a resource was denied.

AlreadyExistsException:

A resource to be created or added already exists.

ResourceNumberLimitExceededException:

A resource numerical limit was exceeded.

ConcurrentModificationException:

Two processes are trying to modify a resource simultaneously.

InternalServiceException:

An internal service error occurred.

CreateSchema

$result = $client->createSchema([/* ... */]);
$promise = $client->createSchemaAsync([/* ... */]);

Creates a new schema set and registers the schema definition. Returns an error if the schema set already exists without actually registering the version.

When the schema set is created, a version checkpoint will be set to the first version. Compatibility mode "DISABLED" restricts any additional schema versions from being added after the first schema version. For all other compatibility modes, validation of compatibility settings will be applied only from the second version onwards when the RegisterSchemaVersion API is used.

When this API is called without a RegistryId, this will create an entry for a "default-registry" in the registry database tables, if it is not already present.

Parameter Syntax

$result = $client->createSchema([
    'Compatibility' => 'NONE|DISABLED|BACKWARD|BACKWARD_ALL|FORWARD|FORWARD_ALL|FULL|FULL_ALL',
    'DataFormat' => 'AVRO|JSON|PROTOBUF', // REQUIRED
    'Description' => '<string>',
    'RegistryId' => [
        'RegistryArn' => '<string>',
        'RegistryName' => '<string>',
    ],
    'SchemaDefinition' => '<string>',
    'SchemaName' => '<string>', // REQUIRED
    'Tags' => ['<string>', ...],
]);

Parameter Details

Members
Compatibility
Type: string

The compatibility mode of the schema. The possible values are:

  • NONE: No compatibility mode applies. You can use this choice in development scenarios or if you do not know the compatibility mode that you want to apply to schemas. Any new version added will be accepted without undergoing a compatibility check.

  • DISABLED: This compatibility choice prevents versioning for a particular schema. You can use this choice to prevent future versioning of a schema.

  • BACKWARD: This compatibility choice is recommended because it allows data receivers to read both the current and one previous schema version. This means that, for instance, a new schema version cannot drop data fields or change the type of these fields, because readers using the previous version could then no longer read them.

  • BACKWARD_ALL: This compatibility choice allows data receivers to read both the current and all previous schema versions. You can use this choice when you need to delete fields or add optional fields, and check compatibility against all previous schema versions.

  • FORWARD: This compatibility choice allows data receivers to read both the current and one next schema version, but not necessarily later versions. You can use this choice when you need to add fields or delete optional fields, but only check compatibility against the last schema version.

  • FORWARD_ALL: This compatibility choice allows data receivers to read data written by producers of any new registered schema. You can use this choice when you need to add fields or delete optional fields, and check compatibility against all previous schema versions.

  • FULL: This compatibility choice allows data receivers to read data written by producers using the previous or next version of the schema, but not necessarily earlier or later versions. You can use this choice when you need to add or remove optional fields, but only check compatibility against the last schema version.

  • FULL_ALL: This compatibility choice allows data receivers to read data written by producers using all previous schema versions. You can use this choice when you need to add or remove optional fields, and check compatibility against all previous schema versions.

DataFormat
Required: Yes
Type: string

The data format of the schema definition. Currently AVRO, JSON and PROTOBUF are supported.

Description
Type: string

An optional description of the schema. If a description is not provided, no default value is set.

RegistryId
Type: RegistryId structure

This is a wrapper shape to contain the registry identity fields. If this is not provided, the default registry will be used. The ARN format for the default registry is: arn:aws:glue:us-east-2:<customer id>:registry/default-registry:random-5-letter-id.

SchemaDefinition
Type: string

The schema definition using the DataFormat setting for SchemaName.
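
Because SchemaDefinition is a string, a structured definition is serialized before it is passed. The sketch below shows one way to do this for an AVRO schema; the record name, field names, and schema name are made up for illustration.

```php
<?php
// Illustrative Avro record schema; the record and field names are placeholders.
$avroSchema = [
    'type' => 'record',
    'name' => 'Order',
    'fields' => [
        ['name' => 'order_id', 'type' => 'string'],
        ['name' => 'amount', 'type' => 'double'],
    ],
];

$params = [
    'SchemaName' => 'orders-schema', // placeholder
    'DataFormat' => 'AVRO',
    'Compatibility' => 'BACKWARD',
    // SchemaDefinition takes a string, so the array is JSON-encoded.
    'SchemaDefinition' => json_encode($avroSchema),
    // RegistryId is omitted, so the default registry is used.
];
// $result = $client->createSchema($params);
```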

SchemaName
Required: Yes
Type: string

The name of the schema to create. The name has a maximum length of 255 characters and may contain only letters, numbers, hyphens, underscores, dollar signs, or hash marks. Whitespace is not allowed.
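
As a rough illustration of the SchemaName rule, the following sketch checks a candidate name on the client side before a createSchema call. The helper name isValidSchemaName is hypothetical, and the pattern assumes that "letters" means ASCII letters and that an empty name is rejected. The service remains the authority on what it accepts.

```php
<?php
// Hypothetical helper (not part of the SDK): checks a candidate SchemaName
// against the documented rule. Assumes ASCII letters and a minimum length of 1.
function isValidSchemaName(string $name): bool
{
    // Allowed: letters, numbers, hyphen, underscore, dollar sign, hash mark.
    // The D modifier stops "$" from matching before a trailing newline.
    return preg_match('/^[A-Za-z0-9_$#-]{1,255}$/D', $name) === 1;
}

$ok = isValidSchemaName('orders-v1_$#');   // allowed characters only
$bad = isValidSchemaName('orders v1');     // contains whitespace
```

A check like this only catches obvious mistakes early; an InvalidInputException from the service can still occur for reasons the sketch does not cover.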

Tags
Type: Associative array of custom strings keys (TagKey) to strings

Amazon Web Services tags that contain a key value pair and may be searched by console, command line, or API. If specified, follows the Amazon Web Services tags-on-create pattern.

Result Syntax

[
    'Compatibility' => 'NONE|DISABLED|BACKWARD|BACKWARD_ALL|FORWARD|FORWARD_ALL|FULL|FULL_ALL',
    'DataFormat' => 'AVRO|JSON|PROTOBUF',
    'Description' => '<string>',
    'LatestSchemaVersion' => <integer>,
    'NextSchemaVersion' => <integer>,
    'RegistryArn' => '<string>',
    'RegistryName' => '<string>',
    'SchemaArn' => '<string>',
    'SchemaCheckpoint' => <integer>,
    'SchemaName' => '<string>',
    'SchemaStatus' => 'AVAILABLE|PENDING|DELETING',
    'SchemaVersionId' => '<string>',
    'SchemaVersionStatus' => 'AVAILABLE|PENDING|FAILURE|DELETING',
    'Tags' => ['<string>', ...],
]

Result Details

Members
Compatibility
Type: string

The schema compatibility mode.

DataFormat
Type: string

The data format of the schema definition. Currently AVRO, JSON and PROTOBUF are supported.

Description
Type: string

A description of the schema if specified when created.

LatestSchemaVersion
Type: long (int|float)

The latest version of the schema associated with the returned schema definition.

NextSchemaVersion
Type: long (int|float)

The next version of the schema associated with the returned schema definition.

RegistryArn
Type: string

The Amazon Resource Name (ARN) of the registry.

RegistryName
Type: string

The name of the registry.

SchemaArn
Type: string

The Amazon Resource Name (ARN) of the schema.

SchemaCheckpoint
Type: long (int|float)

The version number of the checkpoint (the last time the compatibility mode was changed).

SchemaName
Type: string

The name of the schema.

SchemaStatus
Type: string

The status of the schema.

SchemaVersionId
Type: string

The unique identifier of the first schema version.

SchemaVersionStatus
Type: string

The status of the first schema version created.

Tags
Type: Associative array of custom strings keys (TagKey) to strings

The tags for the schema.

Errors

InvalidInputException:

The input provided was not valid.

AccessDeniedException:

Access to a resource was denied.

EntityNotFoundException:

A specified entity does not exist.

AlreadyExistsException:

A resource to be created or added already exists.

ResourceNumberLimitExceededException:

A resource numerical limit was exceeded.

ConcurrentModificationException:

Two processes are trying to modify a resource simultaneously.

InternalServiceException:

An internal service error occurred.

CreateScript

$result = $client->createScript([/* ... */]);
$promise = $client->createScriptAsync([/* ... */]);

Transforms a directed acyclic graph (DAG) into code.

Parameter Syntax

$result = $client->createScript([
    'DagEdges' => [
        [
            'Source' => '<string>', // REQUIRED
            'Target' => '<string>', // REQUIRED
            'TargetParameter' => '<string>',
        ],
        // ...
    ],
    'DagNodes' => [
        [
            'Args' => [ // REQUIRED
                [
                    'Name' => '<string>', // REQUIRED
                    'Param' => true || false,
                    'Value' => '<string>', // REQUIRED
                ],
                // ...
            ],
            'Id' => '<string>', // REQUIRED
            'LineNumber' => <integer>,
            'NodeType' => '<string>', // REQUIRED
        ],
        // ...
    ],
    'Language' => 'PYTHON|SCALA',
]);

Parameter Details

Members
DagEdges
Type: Array of CodeGenEdge structures

A list of the edges in the DAG.

DagNodes
Type: Array of CodeGenNode structures

A list of the nodes in the DAG.

Language
Type: string

The programming language of the resulting code from the DAG.
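
The DagNodes and DagEdges parameters can be assembled programmatically. The sketch below builds the parameter array for a simple linear chain of nodes, connecting each node to the next. The helper name buildLinearDagParams is hypothetical, and the NodeType values, argument names, and argument values are placeholders chosen for illustration rather than values confirmed by this page.

```php
<?php
// Hypothetical helper (not part of the SDK): builds CreateScript parameters
// for a linear chain of nodes, adding one edge between consecutive nodes.
function buildLinearDagParams(array $nodes, string $language = 'PYTHON'): array
{
    $dagNodes = [];
    $dagEdges = [];
    $previousId = null;

    foreach ($nodes as $node) {
        // Id, NodeType, and Args are the required members of each node.
        $dagNodes[] = [
            'Id' => $node['Id'],
            'NodeType' => $node['NodeType'],
            'Args' => $node['Args'],
        ];
        if ($previousId !== null) {
            // Source and Target are the required members of each edge.
            $dagEdges[] = ['Source' => $previousId, 'Target' => $node['Id']];
        }
        $previousId = $node['Id'];
    }

    return ['DagNodes' => $dagNodes, 'DagEdges' => $dagEdges, 'Language' => $language];
}

// Placeholder node types and arguments, for illustration only.
$params = buildLinearDagParams([
    ['Id' => 'source_1', 'NodeType' => 'DataSource', 'Args' => [
        ['Name' => 'database', 'Value' => '"example_db"'],
    ]],
    ['Id' => 'sink_1', 'NodeType' => 'DataSink', 'Args' => [
        ['Name' => 'path', 'Value' => '"s3://example-bucket/output/"'],
    ]],
]);

// With a configured Aws\Glue\GlueClient: $result = $client->createScript($params);
```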

Result Syntax

[
    'PythonScript' => '<string>',
    'ScalaCode' => '<string>',
]

Result Details

Members
PythonScript
Type: string

The Python script generated from the DAG.

ScalaCode
Type: string

The Scala code generated from the DAG.

Errors

InvalidInputException:

The input provided was not valid.

InternalServiceException:

An internal service error occurred.

OperationTimeoutException:

The operation timed out.

CreateSecurityConfiguration

$result = $client->createSecurityConfiguration([/* ... */]);
$promise = $client->createSecurityConfigurationAsync([/* ... */]);

Creates a new security configuration. A security configuration is a set of security properties that can be used by Glue. You can use a security configuration to encrypt data at rest. For information about using security configurations in Glue, see Encrypting Data Written by Crawlers, Jobs, and Development Endpoints.

Parameter Syntax

$result = $client->createSecurityConfiguration([
    'EncryptionConfiguration' => [ // REQUIRED
        'CloudWatchEncryption' => [
            'CloudWatchEncryptionMode' => 'DISABLED|SSE-KMS',
            'KmsKeyArn' => '<string>',
        ],
        'JobBookmarksEncryption' => [
            'JobBookmarksEncryptionMode' => 'DISABLED|CSE-KMS',
            'KmsKeyArn' => '<string>',
        ],
        'S3Encryption' => [
            [
                'KmsKeyArn' => '<string>',
                'S3EncryptionMode' => 'DISABLED|SSE-KMS|SSE-S3',
            ],
            // ...
        ],
    ],
    'Name' => '<string>', // REQUIRED
]);

Parameter Details

Members
EncryptionConfiguration
Required: Yes
Type: EncryptionConfiguration structure

The encryption configuration for the new security configuration.

Name
Required: Yes
Type: string

The name for the new security configuration.
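
For illustration, the following sketch assembles the parameters for a security configuration that uses one KMS key for CloudWatch logs, job bookmarks, and S3 data. The helper name buildSecurityConfigurationParams, the configuration name, and the key ARN are placeholders. Using a single key for all three settings is an assumption of this example, not a requirement stated here.

```php
<?php
// Hypothetical helper (not part of the SDK): builds CreateSecurityConfiguration
// parameters that apply the same KMS key to all three encryption settings.
function buildSecurityConfigurationParams(string $name, string $kmsKeyArn): array
{
    return [
        'Name' => $name,
        'EncryptionConfiguration' => [
            'CloudWatchEncryption' => [
                'CloudWatchEncryptionMode' => 'SSE-KMS',
                'KmsKeyArn' => $kmsKeyArn,
            ],
            'JobBookmarksEncryption' => [
                'JobBookmarksEncryptionMode' => 'CSE-KMS',
                'KmsKeyArn' => $kmsKeyArn,
            ],
            // S3Encryption is a list of structures, so the single entry is wrapped in an array.
            'S3Encryption' => [
                ['S3EncryptionMode' => 'SSE-KMS', 'KmsKeyArn' => $kmsKeyArn],
            ],
        ],
    ];
}

// Placeholder name and ARN, for illustration only.
$params = buildSecurityConfigurationParams(
    'example-security-config',
    'arn:aws:kms:us-east-1:111122223333:key/example-key-id'
);

// With a configured Aws\Glue\GlueClient: $result = $client->createSecurityConfiguration($params);
```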

Result Syntax

[
    'CreatedTimestamp' => <DateTime>,
    'Name' => '<string>',
]

Result Details

Members
CreatedTimestamp
Type: timestamp (string|DateTime or anything parsable by strtotime)

The time at which the new security configuration was created.

Name
Type: string

The name assigned to the new security configuration.

Errors

AlreadyExistsException:

A resource to be created or added already exists.

InvalidInputException:

The input provided was not valid.

InternalServiceException:

An internal service error occurred.

OperationTimeoutException:

The operation timed out.

ResourceNumberLimitExceededException:

A resource numerical limit was exceeded.

CreateSession

$result = $client->createSession([/* ... */]);
$promise = $client->createSessionAsync([/* ... */]);

Creates a new session.

Parameter Syntax

$result = $client->createSession([
    'Command' => [ // REQUIRED
        'Name' => '<string>',
        'PythonVersion' => '<string>',
    ],
    'Connections' => [
        'Connections' => ['<string>', ...],
    ],
    'DefaultArguments' => ['<string>', ...],
    'Description' => '<string>',
    'GlueVersion' => '<string>',
    'Id' => '<string>', // REQUIRED
    'IdleTimeout' => <integer>,
    'MaxCapacity' => <float>,
    'NumberOfWorkers' => <integer>,
    'RequestOrigin' => '<string>',
    'Role' => '<string>', // REQUIRED
    'SecurityConfiguration' => '<string>',
    'Tags' => ['<string>', ...],
    'Timeout' => <integer>,
    'WorkerType' => 'Standard|G.1X|G.2X|G.025X|G.4X|G.8X|Z.2X',
]);

Parameter Details

Members
Command
Required: Yes
Type: SessionCommand structure

The SessionCommand that runs the job.

Connections
Type: ConnectionsList structure

The connections to use for the session, supplied as a ConnectionsList structure.

DefaultArguments
Type: Associative array of custom strings keys (OrchestrationNameString) to strings

A map of key-value pairs. The maximum is 75 pairs.

Description
Type: string

The description of the session.

GlueVersion
Type: string

The Glue version determines the versions of Apache Spark and Python that Glue supports. The GlueVersion must be greater than 2.0.

Id
Required: Yes
Type: string

The ID of the session request.

IdleTimeout
Type: int

The number of minutes the session can be idle before it times out. The default for Spark ETL jobs is the value of Timeout. Consult the documentation for other job types.

MaxCapacity
Type: double

The number of Glue data processing units (DPUs) that can be allocated when the job runs. A DPU is a relative measure of processing power that consists of 4 vCPUs of compute capacity and 16 GB memory.

NumberOfWorkers
Type: int

The number of workers of a defined WorkerType to use for the session.

RequestOrigin
Type: string

The origin of the request.

Role
Required: Yes
Type: string

The IAM role ARN.

SecurityConfiguration
Type: string

The name of the SecurityConfiguration structure to be used with the session.

Tags
Type: Associative array of custom strings keys (TagKey) to strings

The map of key value pairs (tags) belonging to the session.

Timeout
Type: int

The number of minutes before the session times out. The default for Spark ETL jobs is 48 hours (2880 minutes), the maximum session lifetime for this job type. Consult the documentation for other job types.

WorkerType
Type: string

The type of predefined worker that is allocated when a job runs. Accepts a value of G.1X, G.2X, G.4X, or G.8X for Spark jobs. Accepts the value Z.2X for Ray notebooks.

  • For the G.1X worker type, each worker maps to 1 DPU (4 vCPUs, 16 GB of memory) with 84 GB disk (approximately 34 GB free), and provides 1 executor per worker. We recommend this worker type for workloads such as data transforms, joins, and queries, because it offers a scalable and cost-effective way to run most jobs.

  • For the G.2X worker type, each worker maps to 2 DPU (8 vCPUs, 32 GB of memory) with 128 GB disk (approximately 77 GB free), and provides 1 executor per worker. We recommend this worker type for workloads such as data transforms, joins, and queries, because it offers a scalable and cost-effective way to run most jobs.

  • For the G.4X worker type, each worker maps to 4 DPU (16 vCPUs, 64 GB of memory) with 256GB disk (approximately 235GB free), and provides 1 executor per worker. We recommend this worker type for jobs whose workloads contain your most demanding transforms, aggregations, joins, and queries. This worker type is available only for Glue version 3.0 or later Spark ETL jobs in the following Amazon Web Services Regions: US East (Ohio), US East (N. Virginia), US West (Oregon), Asia Pacific (Singapore), Asia Pacific (Sydney), Asia Pacific (Tokyo), Canada (Central), Europe (Frankfurt), Europe (Ireland), and Europe (Stockholm).

  • For the G.8X worker type, each worker maps to 8 DPU (32 vCPUs, 128 GB of memory) with 512GB disk (approximately 487GB free), and provides 1 executor per worker. We recommend this worker type for jobs whose workloads contain your most demanding transforms, aggregations, joins, and queries. This worker type is available only for Glue version 3.0 or later Spark ETL jobs, in the same Amazon Web Services Regions as supported for the G.4X worker type.

  • For the Z.2X worker type, each worker maps to 2 M-DPU (8 vCPUs, 64 GB of memory) with 128 GB disk (approximately 120 GB free), and provides up to 8 Ray workers based on the autoscaler.
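
The per-worker figures in the list above can be turned into a quick capacity estimate for a given WorkerType and NumberOfWorkers. This is a minimal sketch: the constant GLUE_WORKER_SPECS and the function estimateSessionCapacity are hypothetical names, the table covers only the G worker types (Z.2X is measured in M-DPU and is left out), and multiplying per-worker figures by the worker count is a simplifying assumption rather than a statement about how the service bills or allocates resources.

```php
<?php
// Per-worker figures copied from the WorkerType descriptions (G worker types only).
const GLUE_WORKER_SPECS = [
    'G.1X' => ['dpu' => 1, 'vcpus' => 4,  'memoryGb' => 16],
    'G.2X' => ['dpu' => 2, 'vcpus' => 8,  'memoryGb' => 32],
    'G.4X' => ['dpu' => 4, 'vcpus' => 16, 'memoryGb' => 64],
    'G.8X' => ['dpu' => 8, 'vcpus' => 32, 'memoryGb' => 128],
];

// Hypothetical helper (not part of the SDK): totals DPUs, vCPUs, and memory
// by multiplying the per-worker figures by the number of workers.
function estimateSessionCapacity(string $workerType, int $numberOfWorkers): array
{
    if (!array_key_exists($workerType, GLUE_WORKER_SPECS)) {
        throw new InvalidArgumentException("No DPU mapping listed for worker type {$workerType}");
    }
    if ($numberOfWorkers < 1) {
        throw new InvalidArgumentException('numberOfWorkers must be at least 1');
    }

    $spec = GLUE_WORKER_SPECS[$workerType];

    return [
        'dpu' => $spec['dpu'] * $numberOfWorkers,
        'vcpus' => $spec['vcpus'] * $numberOfWorkers,
        'memoryGb' => $spec['memoryGb'] * $numberOfWorkers,
    ];
}

$capacity = estimateSessionCapacity('G.2X', 5);
```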

Result Syntax

[
    'Session' => [
        'Command' => [
            'Name' => '<string>',
            'PythonVersion' => '<string>',
        ],
        'CompletedOn' => <DateTime>,
        'Connections' => [
            'Connections' => ['<string>', ...],
        ],
        'CreatedOn' => <DateTime>,
        'DPUSeconds' => <float>,
        'DefaultArguments' => ['<string>', ...],
        'Description' => '<string>',
        'ErrorMessage' => '<string>',
        'ExecutionTime' => <float>,
        'GlueVersion' => '<string>',
        'Id' => '<string>',
        'IdleTimeout' => <integer>,
        'MaxCapacity' => <float>,
        'NumberOfWorkers' => <integer>,
        'Progress' => <float>,
        'Role' => '<string>',
        'SecurityConfiguration' => '<string>',
        'Status' => 'PROVISIONING|READY|FAILED|TIMEOUT|STOPPING|STOPPED',
        'WorkerType' => 'Standard|G.1X|G.2X|G.025X|G.4X|G.8X|Z.2X',
    ],
]

Result Details

Members
Session
Type: Session structure

Returns the session object in the response.

Errors

AccessDeniedException:

Access to a resource was denied.

IdempotentParameterMismatchException:

The same unique identifier was associated with two different records.

InternalServiceException:

An internal service error occurred.

OperationTimeoutException:

The operation timed out.

InvalidInputException:

The input provided was not valid.

ValidationException:

A value could not be validated.

AlreadyExistsException:

A resource to be created or added already exists.

ResourceNumberLimitExceededException:

A resource numerical limit was exceeded.

CreateTable

$result = $client->createTable([/* ... */]);
$promise = $client->createTableAsync([/* ... */]);

Creates a new table definition in the Data Catalog.

Parameter Syntax

$result = $client->createTable([
    'CatalogId' => '<string>',
    'DatabaseName' => '<string>', // REQUIRED
    'OpenTableFormatInput' => [
        'IcebergInput' => [
            'MetadataOperation' => 'CREATE', // REQUIRED
            'Version' => '<string>',
        ],
    ],
    'PartitionIndexes' => [
        [
            'IndexName' => '<string>', // REQUIRED
            'Keys' => ['<string>', ...], // REQUIRED
        ],
        // ...
    ],
    'TableInput' => [ // REQUIRED
        'Description' => '<string>',
        'LastAccessTime' => <integer || string || DateTime>,
        'LastAnalyzedTime' => <integer || string || DateTime>,
        'Name' => '<string>', // REQUIRED
        'Owner' => '<string>',
        'Parameters' => ['<string>', ...],
        'PartitionKeys' => [
            [
                'Comment' => '<string>',
                'Name' => '<string>', // REQUIRED
                'Parameters' => ['<string>', ...],
                'Type' => '<string>',
            ],
            // ...
        ],
        'Retention' => <integer>,
        'StorageDescriptor' => [
            'AdditionalLocations' => ['<string>', ...],
            'BucketColumns' => ['<string>', ...],
            'Columns' => [
                [
                    'Comment' => '<string>',
                    'Name' => '<string>', // REQUIRED
                    'Parameters' => ['<string>', ...],
                    'Type' => '<string>',
                ],
                // ...
            ],
            'Compressed' => true || false,
            'InputFormat' => '<string>',
            'Location' => '<string>',
            'NumberOfBuckets' => <integer>,
            'OutputFormat' => '<string>',
            'Parameters' => ['<string>', ...],
            'SchemaReference' => [
                'SchemaId' => [
                    'RegistryName' => '<string>',
                    'SchemaArn' => '<string>',
                    'SchemaName' => '<string>',
                ],
                'SchemaVersionId' => '<string>',
                'SchemaVersionNumber' => <integer>,
            ],
            'SerdeInfo' => [
                'Name' => '<string>',
                'Parameters' => ['<string>', ...],
                'SerializationLibrary' => '<string>',
            ],
            'SkewedInfo' => [
                'SkewedColumnNames' => ['<string>', ...],
                'SkewedColumnValueLocationMaps' => ['<string>', ...],
                'SkewedColumnValues' => ['<string>', ...],
            ],
            'SortColumns' => [
                [
                    'Column' => '<string>', // REQUIRED
                    'SortOrder' => <integer>, // REQUIRED
                ],
                // ...
            ],
            'StoredAsSubDirectories' => true || false,
        ],
        'TableType' => '<string>',
        'TargetTable' => [
            'CatalogId' => '<string>',
            'DatabaseName' => '<string>',
            'Name' => '<string>',
            'Region' => '<string>',
        ],
        'ViewExpandedText' => '<string>',
        'ViewOriginalText' => '<string>',
    ],
    'TransactionId' => '<string>',
]);

Parameter Details

Members
CatalogId
Type: string

The ID of the Data Catalog in which to create the Table. If none is supplied, the Amazon Web Services account ID is used by default.

DatabaseName
Required: Yes
Type: string

The catalog database in which to create the new table. For Hive compatibility, this name is entirely lowercase.

OpenTableFormatInput
Type: OpenTableFormatInput structure

Specifies an OpenTableFormatInput structure when creating an open format table.

PartitionIndexes
Type: Array of PartitionIndex structures

A list of partition indexes, PartitionIndex structures, to create in the table.

TableInput
Required: Yes
Type: TableInput structure

The TableInput object that defines the metadata table to create in the catalog.

TransactionId
Type: string

The ID of the transaction.
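
As a sketch of how the required members fit together, the following helper builds a minimal CreateTable request from column-name-to-type maps. The helper name buildCreateTableParams, the database and table names, and the column types are placeholders. Lowercasing the database name on the client side is this example's own way of honoring the Hive-compatibility note, not documented SDK behavior.

```php
<?php
// Hypothetical helper (not part of the SDK): builds minimal CreateTable parameters.
// $columns and $partitionKeys map a column name to its type string.
function buildCreateTableParams(
    string $database,
    string $table,
    array $columns,
    array $partitionKeys = []
): array {
    $toColumn = static function (string $name, string $type): array {
        return ['Name' => $name, 'Type' => $type];
    };

    return [
        // Assumption: normalize to lowercase locally, since the catalog name is lowercase.
        'DatabaseName' => strtolower($database),
        'TableInput' => [
            'Name' => $table,
            'StorageDescriptor' => [
                'Columns' => array_map($toColumn, array_keys($columns), array_values($columns)),
            ],
            'PartitionKeys' => array_map(
                $toColumn,
                array_keys($partitionKeys),
                array_values($partitionKeys)
            ),
        ],
    ];
}

// Placeholder names and types, for illustration only.
$params = buildCreateTableParams(
    'Example_DB',
    'orders',
    ['id' => 'bigint', 'customer' => 'string'],
    ['order_date' => 'string']
);

// With a configured Aws\Glue\GlueClient: $client->createTable($params);
```

The result of createTable is always empty, so success is indicated by the absence of an exception.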

Result Syntax

[]

Result Details

The results for this operation are always empty.

Errors

AlreadyExistsException:

A resource to be created or added already exists.

InvalidInputException:

The input provided was not valid.

EntityNotFoundException:

A specified entity does not exist.

ResourceNumberLimitExceededException:

A resource numerical limit was exceeded.

InternalServiceException:

An internal service error occurred.

OperationTimeoutException:

The operation timed out.

GlueEncryptionException:

An encryption operation failed.

ConcurrentModificationException:

Two processes are trying to modify a resource simultaneously.

ResourceNotReadyException:

A resource was not ready for a transaction.

CreateTableOptimizer

$result = $client->createTableOptimizer([/* ... */]);
$promise = $client->createTableOptimizerAsync([/* ... */]);

Creates a new table optimizer for a specific function. Currently, compaction is the only supported optimizer type.

Parameter Syntax

$result = $client->createTableOptimizer([
    'CatalogId' => '<string>', // REQUIRED
    'DatabaseName' => '<string>', // REQUIRED
    'TableName' => '<string>', // REQUIRED
    'TableOptimizerConfiguration' => [ // REQUIRED
        'enabled' => true || false,
        'roleArn' => '<string>',
    ],
    'Type' => 'compaction', // REQUIRED
]);

Parameter Details

Members
CatalogId
Required: Yes
Type: string

The Catalog ID of the table.

DatabaseName
Required: Yes
Type: string

The name of the database in the catalog in which the table resides.

TableName
Required: Yes
Type: string

The name of the table.

TableOptimizerConfiguration
Required: Yes
Type: TableOptimizerConfiguration structure

A TableOptimizerConfiguration object representing the configuration of a table optimizer.

Type
Required: Yes
Type: string

The type of table optimizer. Currently, the only valid value is compaction.

Result Syntax

[]

Result Details

The results for this operation are always empty.

Errors

EntityNotFoundException:

A specified entity does not exist.

InvalidInputException:

The input provided was not valid.

AccessDeniedException:

Access to a resource was denied.

AlreadyExistsException:

A resource to be created or added already exists.

InternalServiceException:

An internal service error occurred.

CreateTrigger

$result = $client->createTrigger([/* ... */]);
$promise = $client->createTriggerAsync([/* ... */]);

Creates a new trigger.

Parameter Syntax

$result = $client->createTrigger([
    'Actions' => [ // REQUIRED
        [
            'Arguments' => ['<string>', ...],
            'CrawlerName' => '<string>',
            'JobName' => '<string>',
            'NotificationProperty' => [
                'NotifyDelayAfter' => <integer>,
            ],
            'SecurityConfiguration' => '<string>',
            'Timeout' => <integer>,
        ],
        // ...
    ],
    'Description' => '<string>',
    'EventBatchingCondition' => [
        'BatchSize' => <integer>, // REQUIRED
        'BatchWindow' => <integer>,
    ],
    'Name' => '<string>', // REQUIRED
    'Predicate' => [
        'Conditions' => [
            [
                'CrawlState' => 'RUNNING|CANCELLING|CANCELLED|SUCCEEDED|FAILED|ERROR',
                'CrawlerName' => '<string>',
                'JobName' => '<string>',
                'LogicalOperator' => 'EQUALS',
                'State' => 'STARTING|RUNNING|STOPPING|STOPPED|SUCCEEDED|FAILED|TIMEOUT|ERROR|WAITING|EXPIRED',
            ],
            // ...
        ],
        'Logical' => 'AND|ANY',
    ],
    'Schedule' => '<string>',
    'StartOnCreation' => true || false,
    'Tags' => ['<string>', ...],
    'Type' => 'SCHEDULED|CONDITIONAL|ON_DEMAND|EVENT', // REQUIRED
    'WorkflowName' => '<string>',
]);

Parameter Details

Members
Actions
Required: Yes
Type: Array of Action structures

The actions initiated by this trigger when it fires.

Description
Type: string

A description of the new trigger.

EventBatchingCondition
Type: EventBatchingCondition structure

The batch condition that must be met (the specified number of events received or the batch time window expired) before the EventBridge event trigger fires.

Name
Required: Yes
Type: string

The name of the trigger.

Predicate
Type: Predicate structure

A predicate to specify when the new trigger should fire.

This field is required when the trigger type is CONDITIONAL.

Schedule
Type: string

A cron expression used to specify the schedule (see Time-Based Schedules for Jobs and Crawlers). For example, to run something every day at 12:15 UTC, you would specify: cron(15 12 * * ? *).

This field is required when the trigger type is SCHEDULED.

StartOnCreation
Type: boolean

Set to true to start SCHEDULED and CONDITIONAL triggers when created. True is not supported for ON_DEMAND triggers.

Tags
Type: Associative array of custom strings keys (TagKey) to strings

The tags to use with this trigger. You may use tags to limit access to the trigger. For more information about tags in Glue, see Amazon Web Services Tags in Glue in the developer guide.

Type
Required: Yes
Type: string

The type of the new trigger.

WorkflowName
Type: string

The name of the workflow associated with the trigger.
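
Several CreateTrigger members are required only for particular trigger types. The following sketch collects those rules into a local pre-flight check: Name, Type, and Actions are always required, Schedule is required for SCHEDULED triggers, Predicate is required for CONDITIONAL triggers, and StartOnCreation cannot be true for ON_DEMAND triggers. The function name findTriggerParamProblems, the trigger name, and the job name are hypothetical, and the check covers only the rules stated in this section.

```php
<?php
// Hypothetical helper (not part of the SDK): returns a list of problems found
// in CreateTrigger parameters, based only on the rules described in this section.
function findTriggerParamProblems(array $params): array
{
    $problems = [];

    foreach (['Name', 'Type', 'Actions'] as $required) {
        if (empty($params[$required])) {
            $problems[] = "{$required} is required";
        }
    }

    $type = $params['Type'] ?? '';
    if ($type === 'SCHEDULED' && empty($params['Schedule'])) {
        $problems[] = 'Schedule is required for SCHEDULED triggers';
    }
    if ($type === 'CONDITIONAL' && empty($params['Predicate'])) {
        $problems[] = 'Predicate is required for CONDITIONAL triggers';
    }
    if ($type === 'ON_DEMAND' && ($params['StartOnCreation'] ?? false) === true) {
        $problems[] = 'StartOnCreation cannot be true for ON_DEMAND triggers';
    }

    return $problems;
}

// A daily trigger at 12:15 UTC; the trigger and job names are placeholders.
$scheduled = [
    'Name' => 'example-daily-trigger',
    'Type' => 'SCHEDULED',
    'Schedule' => 'cron(15 12 * * ? *)',
    'Actions' => [['JobName' => 'example-job']],
    'StartOnCreation' => true,
];
$problems = findTriggerParamProblems($scheduled);

// If $problems is empty, a configured Aws\Glue\GlueClient could send:
// $result = $client->createTrigger($scheduled);
```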

Result Syntax

[
    'Name' => '<string>',
]

Result Details

Members
Name
Type: string

The name of the trigger.

Errors

AlreadyExistsException:

A resource to be created or added already exists.

EntityNotFoundException:

A specified entity does not exist.

InvalidInputException:

The input provided was not valid.

IdempotentParameterMismatchException:

The same unique identifier was associated with two different records.

InternalServiceException:

An internal service error occurred.

OperationTimeoutException:

The operation timed out.

ResourceNumberLimitExceededException:

A resource numerical limit was exceeded.

ConcurrentModificationException:

Two processes are trying to modify a resource simultaneously.

CreateUserDefinedFunction

$result = $client->createUserDefinedFunction([/* ... */]);
$promise = $client->createUserDefinedFunctionAsync([/* ... */]);

Creates a new function definition in the Data Catalog.

Parameter Syntax

$result = $client->createUserDefinedFunction([
    'CatalogId' => '<string>',
    'DatabaseName' => '<string>', // REQUIRED
    'FunctionInput' => [ // REQUIRED
        'ClassName' => '<string>',
        'FunctionName' => '<string>',
        'OwnerName' => '<string>',
        'OwnerType' => 'USER|ROLE|GROUP',
        'ResourceUris' => [
            [
                'ResourceType' => 'JAR|FILE|ARCHIVE',
                'Uri' => '<string>',
            ],
            // ...
        ],
    ],
]);

Parameter Details

Members
CatalogId
Type: string

The ID of the Data Catalog in which to create the function. If none is provided, the Amazon Web Services account ID is used by default.

DatabaseName
Required: Yes
Type: string

The name of the catalog database in which to create the function.

FunctionInput
Required: Yes
Type: UserDefinedFunctionInput structure

A FunctionInput object that defines the function to create in the Data Catalog.

Result Syntax

[]

Result Details

The results for this operation are always empty.

Errors

AlreadyExistsException:

A resource to be created or added already exists.

InvalidInputException:

The input provided was not valid.

InternalServiceException:

An internal service error occurred.

EntityNotFoundException:

A specified entity does not exist.

OperationTimeoutException:

The operation timed out.

ResourceNumberLimitExceededException:

A resource numerical limit was exceeded.

GlueEncryptionException:

An encryption operation failed.

CreateWorkflow

$result = $client->createWorkflow([/* ... */]);
$promise = $client->createWorkflowAsync([/* ... */]);

Creates a new workflow.

Parameter Syntax

$result = $client->createWorkflow([
    'DefaultRunProperties' => ['<string>', ...],
    'Description' => '<string>',
    'MaxConcurrentRuns' => <integer>,
    'Name' => '<string>', // REQUIRED
    'Tags' => ['<string>', ...],
]);

Parameter Details

Members
DefaultRunProperties
Type: Associative array of custom strings keys (IdString) to strings

A collection of properties to be used as part of each execution of the workflow.

Description
Type: string

A description of the workflow.

MaxConcurrentRuns
Type: int

You can use this parameter to prevent unwanted multiple updates to data, to control costs, or in some cases, to prevent exceeding the maximum number of concurrent runs of any of the component jobs. If you leave this parameter blank, there is no limit to the number of concurrent workflow runs.

Name
Required: Yes
Type: string

The name to be assigned to the workflow. It should be unique within your account.

Tags
Type: Associative array of custom strings keys (TagKey) to strings

The tags to be used with this workflow.
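
Because leaving MaxConcurrentRuns unset means there is no limit on concurrent workflow runs, a request builder can simply omit the key when no limit is wanted. The sketch below does that. The helper name buildCreateWorkflowParams and the workflow name are hypothetical.

```php
<?php
// Hypothetical helper (not part of the SDK): builds CreateWorkflow parameters,
// omitting optional members that were not supplied.
function buildCreateWorkflowParams(
    string $name,
    ?int $maxConcurrentRuns = null,
    array $defaultRunProperties = []
): array {
    $params = ['Name' => $name];

    // Omitting MaxConcurrentRuns leaves the number of concurrent runs unlimited.
    if ($maxConcurrentRuns !== null) {
        $params['MaxConcurrentRuns'] = $maxConcurrentRuns;
    }
    if ($defaultRunProperties !== []) {
        $params['DefaultRunProperties'] = $defaultRunProperties;
    }

    return $params;
}

// Placeholder workflow name; limits the workflow to one run at a time.
$params = buildCreateWorkflowParams('example-workflow', 1);

// With a configured Aws\Glue\GlueClient: $result = $client->createWorkflow($params);
```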

Result Syntax

[
    'Name' => '<string>',
]

Result Details

Members
Name
Type: string

The name of the workflow which was provided as part of the request.

Errors

AlreadyExistsException:

A resource to be created or added already exists.

InvalidInputException:

The input provided was not valid.

InternalServiceException:

An internal service error occurred.

OperationTimeoutException:

The operation timed out.

ResourceNumberLimitExceededException:

A resource numerical limit was exceeded.

ConcurrentModificationException:

Two processes are trying to modify a resource simultaneously.

DeleteBlueprint

$result = $client->deleteBlueprint([/* ... */]);
$promise = $client->deleteBlueprintAsync([/* ... */]);

Deletes an existing blueprint.

Parameter Syntax

$result = $client->deleteBlueprint([
    'Name' => '<string>', // REQUIRED
]);

Parameter Details

Members
Name
Required: Yes
Type: string

The name of the blueprint to delete.

Result Syntax

[
    'Name' => '<string>',
]

Result Details

Members
Name
Type: string

Returns the name of the blueprint that was deleted.

Errors

InvalidInputException:

The input provided was not valid.

OperationTimeoutException:

The operation timed out.

InternalServiceException:

An internal service error occurred.

DeleteClassifier

$result = $client->deleteClassifier([/* ... */]);
$promise = $client->deleteClassifierAsync([/* ... */]);

Removes a classifier from the Data Catalog.

Parameter Syntax

$result = $client->deleteClassifier([
    'Name' => '<string>', // REQUIRED
]);

Parameter Details

Members
Name
Required: Yes
Type: string

Name of the classifier to remove.

Result Syntax

[]

Result Details

The results for this operation are always empty.

Errors

EntityNotFoundException:

A specified entity does not exist.

OperationTimeoutException:

The operation timed out.

DeleteColumnStatisticsForPartition

$result = $client->deleteColumnStatisticsForPartition([/* ... */]);
$promise = $client->deleteColumnStatisticsForPartitionAsync([/* ... */]);

Delete the partition column statistics of a column.

The Identity and Access Management (IAM) permission required for this operation is DeletePartition.

Parameter Syntax

$result = $client->deleteColumnStatisticsForPartition([
    'CatalogId' => '<string>',
    'ColumnName' => '<string>', // REQUIRED
    'DatabaseName' => '<string>', // REQUIRED
    'PartitionValues' => ['<string>', ...], // REQUIRED
    'TableName' => '<string>', // REQUIRED
]);

Parameter Details

Members
CatalogId
Type: string

The ID of the Data Catalog where the partitions in question reside. If none is supplied, the Amazon Web Services account ID is used by default.

ColumnName
Required: Yes
Type: string

Name of the column.

DatabaseName
Required: Yes
Type: string

The name of the catalog database where the partitions reside.

PartitionValues
Required: Yes
Type: Array of strings

A list of partition values identifying the partition.

TableName
Required: Yes
Type: string

The name of the partitions' table.

Result Syntax

[]

Result Details

The results for this operation are always empty.

Errors

EntityNotFoundException:

A specified entity does not exist.

InvalidInputException:

The input provided was not valid.

InternalServiceException:

An internal service error occurred.

OperationTimeoutException:

The operation timed out.

GlueEncryptionException:

An encryption operation failed.

DeleteColumnStatisticsForTable

$result = $client->deleteColumnStatisticsForTable([/* ... */]);
$promise = $client->deleteColumnStatisticsForTableAsync([/* ... */]);

Deletes table statistics of columns.

The Identity and Access Management (IAM) permission required for this operation is DeleteTable.

Parameter Syntax

$result = $client->deleteColumnStatisticsForTable([
    'CatalogId' => '<string>',
    'ColumnName' => '<string>', // REQUIRED
    'DatabaseName' => '<string>', // REQUIRED
    'TableName' => '<string>', // REQUIRED
]);

Parameter Details

Members
CatalogId
Type: string

The ID of the Data Catalog where the partitions in question reside. If none is supplied, the Amazon Web Services account ID is used by default.

ColumnName
Required: Yes
Type: string

The name of the column.

DatabaseName
Required: Yes
Type: string

The name of the catalog database where the partitions reside.

TableName
Required: Yes
Type: string

The name of the partitions' table.

Result Syntax

[]

Result Details

The results for this operation are always empty.

Errors

EntityNotFoundException:

A specified entity does not exist.

InvalidInputException:

The input provided was not valid.

InternalServiceException:

An internal service error occurred.

OperationTimeoutException:

The operation timed out.

GlueEncryptionException:

An encryption operation failed.

DeleteConnection

$result = $client->deleteConnection([/* ... */]);
$promise = $client->deleteConnectionAsync([/* ... */]);

Deletes a connection from the Data Catalog.

Parameter Syntax

$result = $client->deleteConnection([
    'CatalogId' => '<string>',
    'ConnectionName' => '<string>', // REQUIRED
]);

Parameter Details

Members
CatalogId
Type: string

The ID of the Data Catalog in which the connection resides. If none is provided, the Amazon Web Services account ID is used by default.

ConnectionName
Required: Yes
Type: string

The name of the connection to delete.

Result Syntax

[]

Result Details

The results for this operation are always empty.

Errors

EntityNotFoundException:

A specified entity does not exist.

OperationTimeoutException:

The operation timed out.

DeleteCrawler

$result = $client->deleteCrawler([/* ... */]);
$promise = $client->deleteCrawlerAsync([/* ... */]);

Removes a specified crawler from the Glue Data Catalog, unless the crawler state is RUNNING.

Parameter Syntax

$result = $client->deleteCrawler([
    'Name' => '<string>', // REQUIRED
]);

Parameter Details

Members
Name
Required: Yes
Type: string

The name of the crawler to remove.

Result Syntax

[]

Result Details

The results for this operation are always empty.

Errors

EntityNotFoundException:

A specified entity does not exist.

CrawlerRunningException:

The operation cannot be performed because the crawler is already running.

SchedulerTransitioningException:

The specified scheduler is transitioning.

OperationTimeoutException:

The operation timed out.
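
Because DeleteCrawler refuses to remove a crawler whose state is RUNNING, a caller may want to tell that case apart from other failures. The following is a minimal sketch, not a definitive implementation: the helper name deleteCrawlerIfIdle and its return strings are hypothetical, and it assumes the SDK reports the error names listed above through AwsException::getAwsErrorCode().

```php
<?php
use Aws\Exception\AwsException;

/**
 * Tries to delete a crawler and reports the outcome (hypothetical helper).
 *
 * @param \Aws\Glue\GlueClient $glue Glue client, or any object exposing deleteCrawler()
 * @return string 'deleted', 'running', or 'missing'
 */
function deleteCrawlerIfIdle($glue, string $name): string
{
    try {
        $glue->deleteCrawler(['Name' => $name]);
        return 'deleted';
    } catch (AwsException $e) {
        // Map the two expected error codes; anything else is rethrown.
        switch ($e->getAwsErrorCode()) {
            case 'CrawlerRunningException':
                return 'running';
            case 'EntityNotFoundException':
                return 'missing';
            default:
                throw $e;
        }
    }
}
```

A caller that receives 'running' could stop the crawler and retry later; whether that is appropriate depends on the workload.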

DeleteCustomEntityType

$result = $client->deleteCustomEntityType([/* ... */]);
$promise = $client->deleteCustomEntityTypeAsync([/* ... */]);

Deletes a custom pattern by specifying its name.

Parameter Syntax

$result = $client->deleteCustomEntityType([
    'Name' => '<string>', // REQUIRED
]);

Parameter Details

Members
Name
Required: Yes
Type: string

The name of the custom pattern that you want to delete.

Result Syntax

[
    'Name' => '<string>',
]

Result Details

Members
Name
Type: string

The name of the custom pattern you deleted.

Errors

EntityNotFoundException:

A specified entity does not exist.

AccessDeniedException:

Access to a resource was denied.

InternalServiceException:

An internal service error occurred.

InvalidInputException:

The input provided was not valid.

OperationTimeoutException:

The operation timed out.

DeleteDataQualityRuleset

$result = $client->deleteDataQualityRuleset([/* ... */]);
$promise = $client->deleteDataQualityRulesetAsync([/* ... */]);

Deletes a data quality ruleset.

Parameter Syntax

$result = $client->deleteDataQualityRuleset([
    'Name' => '<string>', // REQUIRED
]);

Parameter Details

Members
Name
Required: Yes
Type: string

The name of the data quality ruleset to delete.

Result Syntax

[]

Result Details

The results for this operation are always empty.

Errors

EntityNotFoundException:

A specified entity does not exist.

InvalidInputException:

The input provided was not valid.

OperationTimeoutException:

The operation timed out.

InternalServiceException:

An internal service error occurred.

DeleteDatabase

$result = $client->deleteDatabase([/* ... */]);
$promise = $client->deleteDatabaseAsync([/* ... */]);

Removes a specified database from a Data Catalog.

After completing this operation, you no longer have access to the tables (and all table versions and partitions that might belong to the tables) and the user-defined functions in the deleted database. Glue deletes these "orphaned" resources asynchronously in a timely manner, at the discretion of the service.

To ensure the immediate deletion of all related resources, before calling DeleteDatabase, use DeleteTableVersion or BatchDeleteTableVersion, DeletePartition or BatchDeletePartition, DeleteUserDefinedFunction, and DeleteTable or BatchDeleteTable, to delete any resources that belong to the database.
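
The cleanup order described above can be sketched as follows. This is a partial illustration only: it removes tables with GetTables and BatchDeleteTable before calling DeleteDatabase, and leaves out table versions, partitions, and user-defined functions. The helper name dropDatabaseWithTables is hypothetical, and the batch size of 100 is an assumed per-call limit for BatchDeleteTable.

```php
<?php
/**
 * Deletes every table in a database, then the database itself (hypothetical helper).
 *
 * @param \Aws\Glue\GlueClient $glue Glue client, or any object exposing the same methods
 * @return int number of tables deleted
 */
function dropDatabaseWithTables($glue, string $database): int
{
    // Collect all table names first so that deleting does not disturb paging.
    $names = [];
    $token = null;
    do {
        $args = ['DatabaseName' => $database];
        if ($token !== null) {
            $args['NextToken'] = $token;
        }
        $page = $glue->getTables($args);
        foreach ($page['TableList'] ?? [] as $table) {
            $names[] = $table['Name'];
        }
        $token = $page['NextToken'] ?? null;
    } while ($token !== null && $token !== '');

    // Assumption: BatchDeleteTable accepts at most 100 names per call.
    foreach (array_chunk($names, 100) as $batch) {
        $result = $glue->batchDeleteTable([
            'DatabaseName' => $database,
            'TablesToDelete' => $batch,
        ]);
        if (!empty($result['Errors'])) {
            throw new RuntimeException('Some tables could not be deleted; database left in place.');
        }
    }

    $glue->deleteDatabase(['Name' => $database]);
    return count($names);
}
```

The function stops before DeleteDatabase when any table fails to delete, so that a failure does not leave orphaned resources to the asynchronous cleanup.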

Parameter Syntax

$result = $client->deleteDatabase([
    'CatalogId' => '<string>',
    'Name' => '<string>', // REQUIRED
]);

Parameter Details

Members
CatalogId
Type: string

The ID of the Data Catalog in which the database resides. If none is provided, the Amazon Web Services account ID is used by default.

Name
Required: Yes
Type: string

The name of the database to delete. For Hive compatibility, this must be all lowercase.

Result Syntax

[]

Result Details

The results for this operation are always empty.

Errors

EntityNotFoundException:

A specified entity does not exist.

InvalidInputException:

The input provided was not valid.

InternalServiceException:

An internal service error occurred.

OperationTimeoutException:

The operation timed out.

ConcurrentModificationException:

Two processes are trying to modify a resource simultaneously.

DeleteDevEndpoint

$result = $client->deleteDevEndpoint([/* ... */]);
$promise = $client->deleteDevEndpointAsync([/* ... */]);

Deletes a specified development endpoint.

Parameter Syntax

$result = $client->deleteDevEndpoint([
    'EndpointName' => '<string>', // REQUIRED
]);

Parameter Details

Members
EndpointName
Required: Yes
Type: string

The name of the DevEndpoint.

Result Syntax

[]

Result Details

The results for this operation are always empty.

Errors

EntityNotFoundException:

A specified entity does not exist.

InternalServiceException:

An internal service error occurred.

OperationTimeoutException:

The operation timed out.

InvalidInputException:

The input provided was not valid.

DeleteJob

$result = $client->deleteJob([/* ... */]);
$promise = $client->deleteJobAsync([/* ... */]);

Deletes a specified job definition. If the job definition is not found, no exception is thrown.

Parameter Syntax

$result = $client->deleteJob([
    'JobName' => '<string>', // REQUIRED
]);

Parameter Details

Members
JobName
Required: Yes
Type: string

The name of the job definition to delete.

Result Syntax

[
    'JobName' => '<string>',
]

Result Details

Members
JobName
Type: string

The name of the job definition that was deleted.

Errors

InvalidInputException:

The input provided was not valid.

InternalServiceException:

An internal service error occurred.

OperationTimeoutException:

The operation timed out.

DeleteMLTransform

$result = $client->deleteMLTransform([/* ... */]);
$promise = $client->deleteMLTransformAsync([/* ... */]);

Deletes a Glue machine learning transform. Machine learning transforms are a special type of transform that use machine learning to learn the details of the transformation to be performed from examples provided by humans. These transformations are then saved by Glue. If you no longer need a transform, you can delete it by calling DeleteMLTransform. However, any Glue jobs that still reference the deleted transform will no longer succeed.

Parameter Syntax

$result = $client->deleteMLTransform([
    'TransformId' => '<string>', // REQUIRED
]);

Parameter Details

Members
TransformId
Required: Yes
Type: string

The unique identifier of the transform to delete.

Result Syntax

[
    'TransformId' => '<string>',
]

Result Details

Members
TransformId
Type: string

The unique identifier of the transform that was deleted.

Errors

EntityNotFoundException:

A specified entity does not exist.

InvalidInputException:

The input provided was not valid.

OperationTimeoutException:

The operation timed out.

InternalServiceException:

An internal service error occurred.

DeletePartition

$result = $client->deletePartition([/* ... */]);
$promise = $client->deletePartitionAsync([/* ... */]);

Deletes a specified partition.

Parameter Syntax

$result = $client->deletePartition([
    'CatalogId' => '<string>',
    'DatabaseName' => '<string>', // REQUIRED
    'PartitionValues' => ['<string>', ...], // REQUIRED
    'TableName' => '<string>', // REQUIRED
]);

Parameter Details

Members
CatalogId
Type: string

The ID of the Data Catalog where the partition to be deleted resides. If none is provided, the Amazon Web Services account ID is used by default.

DatabaseName
Required: Yes
Type: string

The name of the catalog database in which the table in question resides.

PartitionValues
Required: Yes
Type: Array of strings

The values that define the partition.

TableName
Required: Yes
Type: string

The name of the table that contains the partition to be deleted.
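
PartitionValues is a positional list, so a caller holding values keyed by column name has to put them in order first. The sketch below assumes the values must follow the order of the table's partition keys, which this page does not state. Both helper names are hypothetical.

```php
<?php
/**
 * Orders partition values to match the table's partition keys (hypothetical helper).
 *
 * @param string[] $partitionKeys partition column names, in table order
 * @param array    $valuesByKey   map of column name => value
 * @return string[] values as strings, in key order
 */
function partitionValuesInKeyOrder(array $partitionKeys, array $valuesByKey): array
{
    $values = [];
    foreach ($partitionKeys as $key) {
        if (!array_key_exists($key, $valuesByKey)) {
            throw new InvalidArgumentException("Missing value for partition key '$key'");
        }
        // The API expects an array of strings.
        $values[] = (string) $valuesByKey[$key];
    }
    return $values;
}

/**
 * Deletes one partition identified by named values (hypothetical helper).
 *
 * @param \Aws\Glue\GlueClient $glue Glue client, or any object exposing deletePartition()
 */
function deletePartitionByKeys($glue, string $database, string $table, array $partitionKeys, array $valuesByKey): void
{
    $glue->deletePartition([
        'DatabaseName' => $database,
        'TableName' => $table,
        'PartitionValues' => partitionValuesInKeyOrder($partitionKeys, $valuesByKey),
    ]);
}
```

For a table partitioned by year and then month, passing ['month' => 7, 'year' => 2024] would send ['2024', '7'].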

Result Syntax

[]

Result Details

The results for this operation are always empty.

Errors

EntityNotFoundException:

A specified entity does not exist.

InvalidInputException:

The input provided was not valid.

InternalServiceException:

An internal service error occurred.

OperationTimeoutException:

The operation timed out.

DeletePartitionIndex

$result = $client->deletePartitionIndex([/* ... */]);
$promise = $client->deletePartitionIndexAsync([/* ... */]);

Deletes a specified partition index from an existing table.

Parameter Syntax

$result = $client->deletePartitionIndex([
    'CatalogId' => '<string>',
    'DatabaseName' => '<string>', // REQUIRED
    'IndexName' => '<string>', // REQUIRED
    'TableName' => '<string>', // REQUIRED
]);

Parameter Details

Members
CatalogId
Type: string

The catalog ID where the table resides.

DatabaseName
Required: Yes
Type: string

Specifies the name of a database from which you want to delete a partition index.

IndexName
Required: Yes
Type: string

The name of the partition index to be deleted.

TableName
Required: Yes
Type: string

Specifies the name of a table from which you want to delete a partition index.

Result Syntax

[]

Result Details

The results for this operation are always empty.

Errors

InternalServiceException:

An internal service error occurred.

OperationTimeoutException:

The operation timed out.

InvalidInputException:

The input provided was not valid.

EntityNotFoundException:

A specified entity does not exist.

ConflictException:

The CreatePartitions API was called on a table that has indexes enabled.

GlueEncryptionException:

An encryption operation failed.

DeleteRegistry

$result = $client->deleteRegistry([/* ... */]);
$promise = $client->deleteRegistryAsync([/* ... */]);

Deletes the entire registry, including its schemas and all of their versions. To get the status of the delete operation, call the GetRegistry API after the asynchronous call. Deleting a registry deactivates all online operations for the registry, such as the UpdateRegistry, CreateSchema, UpdateSchema, and RegisterSchemaVersion APIs.

Parameter Syntax

$result = $client->deleteRegistry([
    'RegistryId' => [ // REQUIRED
        'RegistryArn' => '<string>',
        'RegistryName' => '<string>',
    ],
]);

Parameter Details

Members
RegistryId
Required: Yes
Type: RegistryId structure

This is a wrapper structure that may contain the registry name and Amazon Resource Name (ARN).

Result Syntax

[
    'RegistryArn' => '<string>',
    'RegistryName' => '<string>',
    'Status' => 'AVAILABLE|DELETING',
]

Result Details

Members
RegistryArn
Type: string

The Amazon Resource Name (ARN) of the registry being deleted.

RegistryName
Type: string

The name of the registry being deleted.

Status
Type: string

The status of the registry. A successful operation returns the DELETING status.
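
Since the deletion completes asynchronously, one way to follow it is to poll GetRegistry after DeleteRegistry returns. The sketch below is an illustration under assumptions: the helper name deleteRegistryAndWait, the 'GONE' return value, and the polling defaults are hypothetical, and it assumes GetRegistry raises EntityNotFoundException once the registry no longer exists.

```php
<?php
use Aws\Exception\AwsException;

/**
 * Deletes a registry and polls until it disappears or polling gives up (hypothetical helper).
 *
 * @param \Aws\Glue\GlueClient $glue Glue client, or any object exposing the same methods
 * @return string 'GONE', or the last status observed
 */
function deleteRegistryAndWait($glue, string $registryName, int $maxPolls = 10, int $delaySeconds = 2): string
{
    $id = ['RegistryName' => $registryName];
    $status = $glue->deleteRegistry(['RegistryId' => $id])['Status'];

    for ($i = 0; $i < $maxPolls && $status === 'DELETING'; $i++) {
        sleep($delaySeconds);
        try {
            $status = $glue->getRegistry(['RegistryId' => $id])['Status'];
        } catch (AwsException $e) {
            // Assumption: a fully deleted registry is reported as not found.
            if ($e->getAwsErrorCode() === 'EntityNotFoundException') {
                return 'GONE';
            }
            throw $e;
        }
    }
    return $status;
}
```

A return value of 'DELETING' means the polling budget ran out before the registry disappeared.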

Errors

InvalidInputException:

The input provided was not valid.

EntityNotFoundException:

A specified entity does not exist.

AccessDeniedException:

Access to a resource was denied.

ConcurrentModificationException:

Two processes are trying to modify a resource simultaneously.

DeleteResourcePolicy

$result = $client->deleteResourcePolicy([/* ... */]);
$promise = $client->deleteResourcePolicyAsync([/* ... */]);

Deletes a specified policy.

Parameter Syntax

$result = $client->deleteResourcePolicy([
    'PolicyHashCondition' => '<string>',
    'ResourceArn' => '<string>',
]);

Parameter Details

Members
PolicyHashCondition
Type: string

The hash value returned when this policy was set.

ResourceArn
Type: string

The ARN of the Glue resource for the resource policy to be deleted.

Result Syntax

[]

Result Details

The results for this operation are always empty.

Errors

EntityNotFoundException:

A specified entity does not exist.

InternalServiceException:

An internal service error occurred.

OperationTimeoutException:

The operation timed out.

InvalidInputException:

The input provided was not valid.

ConditionCheckFailureException:

A specified condition was not satisfied.
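
PolicyHashCondition allows a conditional delete: the policy is removed only if it still has the hash the caller last saw. The following sketch assumes the hash is read with GetResourcePolicy, whose result is taken to include a PolicyHash member. The helper name deletePolicyIfUnchanged is hypothetical.

```php
<?php
use Aws\Exception\AwsException;

/**
 * Deletes a resource policy only if it has not changed since it was read (hypothetical helper).
 *
 * @param \Aws\Glue\GlueClient $glue Glue client, or any object exposing the same methods
 * @return bool true if deleted, false if the hash condition failed
 */
function deletePolicyIfUnchanged($glue, ?string $resourceArn = null): bool
{
    $args = $resourceArn === null ? [] : ['ResourceArn' => $resourceArn];

    // Assumption: GetResourcePolicy returns the current hash as 'PolicyHash'.
    $current = $glue->getResourcePolicy($args);

    try {
        $glue->deleteResourcePolicy($args + ['PolicyHashCondition' => $current['PolicyHash']]);
        return true;
    } catch (AwsException $e) {
        if ($e->getAwsErrorCode() === 'ConditionCheckFailureException') {
            return false; // the policy changed between the read and the delete
        }
        throw $e;
    }
}
```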

DeleteSchema

$result = $client->deleteSchema([/* ... */]);
$promise = $client->deleteSchemaAsync([/* ... */]);

Deletes the entire schema set, including the schema and all of its versions. To get the status of the delete operation, call the GetSchema API after the asynchronous call. Deleting a schema deactivates all online operations for the schema, such as the GetSchemaByDefinition and RegisterSchemaVersion APIs.

Parameter Syntax

$result = $client->deleteSchema([
    'SchemaId' => [ // REQUIRED
        'RegistryName' => '<string>',
        'SchemaArn' => '<string>',
        'SchemaName' => '<string>',
    ],
]);

Parameter Details

Members
SchemaId
Required: Yes
Type: SchemaId structure

This is a wrapper structure that may contain the schema name and Amazon Resource Name (ARN).

Result Syntax

[
    'SchemaArn' => '<string>',
    'SchemaName' => '<string>',
    'Status' => 'AVAILABLE|PENDING|DELETING',
]

Result Details

Members
SchemaArn
Type: string

The Amazon Resource Name (ARN) of the schema being deleted.

SchemaName
Type: string

The name of the schema being deleted.

Status
Type: string

The status of the schema.

Errors

InvalidInputException:

The input provided was not valid.

EntityNotFoundException:

A specified entity does not exist.

AccessDeniedException:

Access to a resource was denied.

ConcurrentModificationException:

Two processes are trying to modify a resource simultaneously.

DeleteSchemaVersions

$result = $client->deleteSchemaVersions([/* ... */]);
$promise = $client->deleteSchemaVersionsAsync([/* ... */]);

Removes versions from the specified schema. A version number or range may be supplied. If the compatibility mode, such as BACKWARDS_FULL, forbids deleting a version that is still necessary, an error is returned. Calling the GetSchemaVersions API after this call lists the status of the deleted versions.

When the range of version numbers contains a checkpointed version, the API returns a 409 conflict and does not proceed with the deletion. You must first remove the checkpoint using the DeleteSchemaCheckpoint API before using this API.

You cannot use the DeleteSchemaVersions API to delete the first schema version in the schema set. The first schema version can only be deleted by the DeleteSchema API. This operation will also delete the attached SchemaVersionMetadata under the schema versions. Hard deletes will be enforced on the database.

Parameter Syntax

$result = $client->deleteSchemaVersions([
    'SchemaId' => [ // REQUIRED
        'RegistryName' => '<string>',
        'SchemaArn' => '<string>',
        'SchemaName' => '<string>',
    ],
    'Versions' => '<string>', // REQUIRED
]);

Parameter Details

Members
SchemaId
Required: Yes
Type: SchemaId structure

This is a wrapper structure that may contain the schema name and Amazon Resource Name (ARN).

Versions
Required: Yes
Type: string

The version or range of versions to delete, in one of these formats:

  • a single version number, for example 5

  • a range, for example 5-8, which deletes versions 5, 6, 7, and 8

Result Syntax

[
    'SchemaVersionErrors' => [
        [
            'ErrorDetails' => [
                'ErrorCode' => '<string>',
                'ErrorMessage' => '<string>',
            ],
            'VersionNumber' => <integer>,
        ],
        // ...
    ],
]

Result Details

Members
SchemaVersionErrors
Type: Array of SchemaVersionErrorItem structures

A list of SchemaVersionErrorItem objects, each containing an error and schema version.
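
The Versions string and the per-version errors in the result can be handled together. The sketch below builds the "5" or "5-8" form from integers and returns any reported failures keyed by version number. Both helper names are hypothetical, and the fallback text 'unknown error' is illustrative only.

```php
<?php
/**
 * Formats a single version or an inclusive range for the Versions parameter (hypothetical helper).
 */
function schemaVersionRange(int $first, ?int $last = null): string
{
    if ($last !== null && $last < $first) {
        throw new InvalidArgumentException('Range end must not be lower than range start.');
    }
    return ($last === null || $last === $first) ? (string) $first : $first . '-' . $last;
}

/**
 * Deletes schema versions and returns the failures (hypothetical helper).
 *
 * @param \Aws\Glue\GlueClient $glue     Glue client, or any object exposing deleteSchemaVersions()
 * @param array                $schemaId SchemaId structure, e.g. ['SchemaArn' => '...']
 * @return array map of version number => error message
 */
function deleteSchemaVersionRange($glue, array $schemaId, int $first, ?int $last = null): array
{
    $result = $glue->deleteSchemaVersions([
        'SchemaId' => $schemaId,
        'Versions' => schemaVersionRange($first, $last),
    ]);

    $failed = [];
    foreach ($result['SchemaVersionErrors'] ?? [] as $item) {
        $failed[$item['VersionNumber']] = $item['ErrorDetails']['ErrorMessage'] ?? 'unknown error';
    }
    return $failed;
}
```

An empty return array means the service reported no per-version errors for the requested range.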

Errors

InvalidInputException:

The input provided was not valid.

EntityNotFoundException:

A specified entity does not exist.

AccessDeniedException:

Access to a resource was denied.

ConcurrentModificationException:

Two processes are trying to modify a resource simultaneously.

DeleteSecurityConfiguration

$result = $client->deleteSecurityConfiguration([/* ... */]);
$promise = $client->deleteSecurityConfigurationAsync([/* ... */]);

Deletes a specified security configuration.

Parameter Syntax

$result = $client->deleteSecurityConfiguration([
    'Name' => '<string>', // REQUIRED
]);

Parameter Details

Members
Name
Required: Yes
Type: string

The name of the security configuration to delete.

Result Syntax

[]

Result Details

The results for this operation are always empty.

Errors

EntityNotFoundException:

A specified entity does not exist.

InvalidInputException:

The input provided was not valid.

InternalServiceException:

An internal service error occurred.

OperationTimeoutException:

The operation timed out.

DeleteSession

$result = $client->deleteSession([/* ... */]);
$promise = $client->deleteSessionAsync([/* ... */]);

Deletes the session.

Parameter Syntax

$result = $client->deleteSession([
    'Id' => '<string>', // REQUIRED
    'RequestOrigin' => '<string>',
]);

Parameter Details

Members
Id
Required: Yes
Type: string

The ID of the session to be deleted.

RequestOrigin
Type: string

The name of the origin of the delete session request.

Result Syntax

[
    'Id' => '<string>',
]

Result Details

Members
Id
Type: string

The ID of the deleted session.

Errors

AccessDeniedException:

Access to a resource was denied.

InternalServiceException:

An internal service error occurred.

OperationTimeoutException:

The operation timed out.

InvalidInputException:

The input provided was not valid.

IllegalSessionStateException:

The session is in an invalid state to perform a requested operation.

ConcurrentModificationException:

Two processes are trying to modify a resource simultaneously.

DeleteTable

$result = $client->deleteTable([/* ... */]);
$promise = $client->deleteTableAsync([/* ... */]);

Removes a table definition from the Data Catalog.

After completing this operation, you no longer have access to the table versions and partitions that belong to the deleted table. Glue deletes these "orphaned" resources asynchronously in a timely manner, at the discretion of the service.

To ensure the immediate deletion of all related resources, before calling DeleteTable, use DeleteTableVersion or BatchDeleteTableVersion, and DeletePartition or BatchDeletePartition, to delete any resources that belong to the table.
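
As an illustration of that order, the sketch below removes a table's partitions with GetPartitions and BatchDeletePartition before calling DeleteTable. It does not cover table versions. The helper name dropTableWithPartitions is hypothetical, and the batch size of 25 is an assumed per-call limit for BatchDeletePartition.

```php
<?php
/**
 * Deletes all partitions of a table, then the table (hypothetical helper).
 *
 * @param \Aws\Glue\GlueClient $glue Glue client, or any object exposing the same methods
 * @return int number of partitions deleted
 */
function dropTableWithPartitions($glue, string $database, string $table): int
{
    $target = ['DatabaseName' => $database, 'TableName' => $table];

    // Collect partition value lists across all pages before deleting anything.
    $partitions = [];
    $token = null;
    do {
        $page = $glue->getPartitions($token === null ? $target : $target + ['NextToken' => $token]);
        foreach ($page['Partitions'] ?? [] as $partition) {
            $partitions[] = ['Values' => $partition['Values']];
        }
        $token = $page['NextToken'] ?? null;
    } while ($token !== null && $token !== '');

    // Assumption: BatchDeletePartition accepts at most 25 partitions per call.
    foreach (array_chunk($partitions, 25) as $batch) {
        $result = $glue->batchDeletePartition($target + ['PartitionsToDelete' => $batch]);
        if (!empty($result['Errors'])) {
            throw new RuntimeException('Some partitions could not be deleted; table left in place.');
        }
    }

    $glue->deleteTable(['DatabaseName' => $database, 'Name' => $table]);
    return count($partitions);
}
```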

Parameter Syntax

$result = $client->deleteTable([
    'CatalogId' => '<string>',
    'DatabaseName' => '<string>', // REQUIRED
    'Name' => '<string>', // REQUIRED
    'TransactionId' => '<string>',
]);

Parameter Details

Members
CatalogId
Type: string

The ID of the Data Catalog where the table resides. If none is provided, the Amazon Web Services account ID is used by default.

DatabaseName
Required: Yes
Type: string

The name of the catalog database in which the table resides. For Hive compatibility, this name is entirely lowercase.

Name
Required: Yes
Type: string

The name of the table to be deleted. For Hive compatibility, this name is entirely lowercase.

TransactionId
Type: string

The transaction ID at which to delete the table contents.

Result Syntax

[]

Result Details

The results for this operation are always empty.

Errors

EntityNotFoundException:

A specified entity does not exist.

InvalidInputException:

The input provided was not valid.

InternalServiceException:

An internal service error occurred.

OperationTimeoutException:

The operation timed out.

ConcurrentModificationException:

Two processes are trying to modify a resource simultaneously.

ResourceNotReadyException:

A resource was not ready for a transaction.

DeleteTableOptimizer

$result = $client->deleteTableOptimizer([/* ... */]);
$promise = $client->deleteTableOptimizerAsync([/* ... */]);

Deletes an optimizer and all associated metadata for a table. The optimization will no longer be performed on the table.

Parameter Syntax

$result = $client->deleteTableOptimizer([
    'CatalogId' => '<string>', // REQUIRED
    'DatabaseName' => '<string>', // REQUIRED
    'TableName' => '<string>', // REQUIRED
    'Type' => 'compaction', // REQUIRED
]);

Parameter Details

Members
CatalogId
Required: Yes
Type: string

The Catalog ID of the table.

DatabaseName
Required: Yes
Type: string

The name of the database in the catalog in which the table resides.

TableName
Required: Yes
Type: string

The name of the table.

Type
Required: Yes
Type: string

The type of table optimizer.

Result Syntax

[]

Result Details

The results for this operation are always empty.

Errors

EntityNotFoundException:

A specified entity does not exist.

InvalidInputException:

The input provided was not valid.

AccessDeniedException:

Access to a resource was denied.

InternalServiceException:

An internal service error occurred.

DeleteTableVersion

$result = $client->deleteTableVersion([/* ... */]);
$promise = $client->deleteTableVersionAsync([/* ... */]);

Deletes a specified version of a table.

Parameter Syntax

$result = $client->deleteTableVersion([
    'CatalogId' => '<string>',
    'DatabaseName' => '<string>', // REQUIRED
    'TableName' => '<string>', // REQUIRED
    'VersionId' => '<string>', // REQUIRED
]);

Parameter Details

Members
CatalogId
Type: string

The ID of the Data Catalog where the tables reside. If none is provided, the Amazon Web Services account ID is used by default.

DatabaseName
Required: Yes
Type: string

The database in the catalog in which the table resides. For Hive compatibility, this name is entirely lowercase.

TableName
Required: Yes
Type: string

The name of the table. For Hive compatibility, this name is entirely lowercase.

VersionId
Required: Yes
Type: string

The ID of the table version to be deleted. A VersionId is a string representation of an integer. Each version is incremented by 1.

Result Syntax

[]

Result Details

The results for this operation are always empty.

Errors

EntityNotFoundException:

A specified entity does not exist.

InvalidInputException:

The input provided was not valid.

InternalServiceException:

An internal service error occurred.

OperationTimeoutException:

The operation timed out.

DeleteTrigger

$result = $client->deleteTrigger([/* ... */]);
$promise = $client->deleteTriggerAsync([/* ... */]);

Deletes a specified trigger. If the trigger is not found, no exception is thrown.

Parameter Syntax

$result = $client->deleteTrigger([
    'Name' => '<string>', // REQUIRED
]);

Parameter Details

Members
Name
Required: Yes
Type: string

The name of the trigger to delete.

Result Syntax

[
    'Name' => '<string>',
]

Result Details

Members
Name
Type: string

The name of the trigger that was deleted.

Errors

InvalidInputException:

The input provided was not valid.

InternalServiceException:

An internal service error occurred.

OperationTimeoutException:

The operation timed out.

ConcurrentModificationException:

Two processes are trying to modify a resource simultaneously.

DeleteUserDefinedFunction

$result = $client->deleteUserDefinedFunction([/* ... */]);
$promise = $client->deleteUserDefinedFunctionAsync([/* ... */]);

Deletes an existing function definition from the Data Catalog.

Parameter Syntax

$result = $client->deleteUserDefinedFunction([
    'CatalogId' => '<string>',
    'DatabaseName' => '<string>', // REQUIRED
    'FunctionName' => '<string>', // REQUIRED
]);

Parameter Details

Members
CatalogId
Type: string

The ID of the Data Catalog where the function to be deleted is located. If none is supplied, the Amazon Web Services account ID is used by default.

DatabaseName
Required: Yes
Type: string

The name of the catalog database where the function is located.

FunctionName
Required: Yes
Type: string

The name of the function definition to be deleted.

Result Syntax

[]

Result Details

The results for this operation are always empty.

Errors

EntityNotFoundException:

A specified entity does not exist.

InvalidInputException:

The input provided was not valid.

InternalServiceException:

An internal service error occurred.

OperationTimeoutException:

The operation timed out.

DeleteWorkflow

$result = $client->deleteWorkflow([/* ... */]);
$promise = $client->deleteWorkflowAsync([/* ... */]);

Deletes a workflow.

Parameter Syntax

$result = $client->deleteWorkflow([
    'Name' => '<string>', // REQUIRED
]);

Parameter Details

Members
Name
Required: Yes
Type: string

Name of the workflow to be deleted.

Result Syntax

[
    'Name' => '<string>',
]

Result Details

Members
Name
Type: string

Name of the workflow specified in input.

Errors

InvalidInputException:

The input provided was not valid.

InternalServiceException:

An internal service error occurred.

OperationTimeoutException:

The operation timed out.

ConcurrentModificationException:

Two processes are trying to modify a resource simultaneously.

GetBlueprint

$result = $client->getBlueprint([/* ... */]);
$promise = $client->getBlueprintAsync([/* ... */]);

Retrieves the details of a blueprint.

Parameter Syntax

$result = $client->getBlueprint([
    'IncludeBlueprint' => true || false,
    'IncludeParameterSpec' => true || false,
    'Name' => '<string>', // REQUIRED
]);

Parameter Details

Members
IncludeBlueprint
Type: boolean

Specifies whether or not to include the blueprint in the response.

IncludeParameterSpec
Type: boolean

Specifies whether or not to include the parameter specification.

Name
Required: Yes
Type: string

The name of the blueprint.

Result Syntax

[
    'Blueprint' => [
        'BlueprintLocation' => '<string>',
        'BlueprintServiceLocation' => '<string>',
        'CreatedOn' => <DateTime>,
        'Description' => '<string>',
        'ErrorMessage' => '<string>',
        'LastActiveDefinition' => [
            'BlueprintLocation' => '<string>',
            'BlueprintServiceLocation' => '<string>',
            'Description' => '<string>',
            'LastModifiedOn' => <DateTime>,
            'ParameterSpec' => '<string>',
        ],
        'LastModifiedOn' => <DateTime>,
        'Name' => '<string>',
        'ParameterSpec' => '<string>',
        'Status' => 'CREATING|ACTIVE|UPDATING|FAILED',
    ],
]

Result Details

Members
Blueprint
Type: Blueprint structure

Returns a Blueprint object.

Errors

EntityNotFoundException:

A specified entity does not exist.

InvalidInputException:

The input provided was not valid.

OperationTimeoutException:

The operation timed out.

InternalServiceException:

An internal service error occurred.

GetBlueprintRun

$result = $client->getBlueprintRun([/* ... */]);
$promise = $client->getBlueprintRunAsync([/* ... */]);

Retrieves the details of a blueprint run.

Parameter Syntax

$result = $client->getBlueprintRun([
    'BlueprintName' => '<string>', // REQUIRED
    'RunId' => '<string>', // REQUIRED
]);

Parameter Details

Members
BlueprintName
Required: Yes
Type: string

The name of the blueprint.

RunId
Required: Yes
Type: string

The run ID for the blueprint run you want to retrieve.

Result Syntax

[
    'BlueprintRun' => [
        'BlueprintName' => '<string>',
        'CompletedOn' => <DateTime>,
        'ErrorMessage' => '<string>',
        'Parameters' => '<string>',
        'RoleArn' => '<string>',
        'RollbackErrorMessage' => '<string>',
        'RunId' => '<string>',
        'StartedOn' => <DateTime>,
        'State' => 'RUNNING|SUCCEEDED|FAILED|ROLLING_BACK',
        'WorkflowName' => '<string>',
    ],
]

Result Details

Members
BlueprintRun
Type: BlueprintRun structure

Returns a BlueprintRun object.

Errors

EntityNotFoundException:

A specified entity does not exist.

InternalServiceException:

An internal service error occurred.

OperationTimeoutException:

The operation timed out.

GetBlueprintRuns

$result = $client->getBlueprintRuns([/* ... */]);
$promise = $client->getBlueprintRunsAsync([/* ... */]);

Retrieves the details of blueprint runs for a specified blueprint.

Parameter Syntax

$result = $client->getBlueprintRuns([
    'BlueprintName' => '<string>', // REQUIRED
    'MaxResults' => <integer>,
    'NextToken' => '<string>',
]);

Parameter Details

Members
BlueprintName
Required: Yes
Type: string

The name of the blueprint.

MaxResults
Type: int

The maximum size of a list to return.

NextToken
Type: string

A continuation token, if this is a continuation request.

Result Syntax

[
    'BlueprintRuns' => [
        [
            'BlueprintName' => '<string>',
            'CompletedOn' => <DateTime>,
            'ErrorMessage' => '<string>',
            'Parameters' => '<string>',
            'RoleArn' => '<string>',
            'RollbackErrorMessage' => '<string>',
            'RunId' => '<string>',
            'StartedOn' => <DateTime>,
            'State' => 'RUNNING|SUCCEEDED|FAILED|ROLLING_BACK',
            'WorkflowName' => '<string>',
        ],
        // ...
    ],
    'NextToken' => '<string>',
]

Result Details

Members
BlueprintRuns
Type: Array of BlueprintRun structures

Returns a list of BlueprintRun objects.

NextToken
Type: string

A continuation token, if not all blueprint runs have been returned.
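
The NextToken handshake can be wrapped in a generator so that callers see one continuous stream of runs. This is a minimal sketch: the function name blueprintRuns and the default page size are hypothetical, and an empty token is treated the same as a missing one.

```php
<?php
/**
 * Yields every run of a blueprint, following NextToken across pages (hypothetical helper).
 *
 * @param \Aws\Glue\GlueClient $glue Glue client, or any object exposing getBlueprintRuns()
 */
function blueprintRuns($glue, string $blueprintName, int $pageSize = 50): Generator
{
    $token = null;
    do {
        $args = ['BlueprintName' => $blueprintName, 'MaxResults' => $pageSize];
        if ($token !== null) {
            $args['NextToken'] = $token; // continue where the previous page ended
        }
        $page = $glue->getBlueprintRuns($args);
        foreach ($page['BlueprintRuns'] ?? [] as $run) {
            yield $run;
        }
        $token = $page['NextToken'] ?? null;
    } while ($token !== null && $token !== '');
}
```

Pages are fetched lazily, so a caller that stops iterating early does not request the remaining pages.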

Errors

EntityNotFoundException:

A specified entity does not exist.

InternalServiceException:

An internal service error occurred.

OperationTimeoutException:

The operation timed out.

InvalidInputException:

The input provided was not valid.

GetCatalogImportStatus

$result = $client->getCatalogImportStatus([/* ... */]);
$promise = $client->getCatalogImportStatusAsync([/* ... */]);

Retrieves the status of a migration operation.

Parameter Syntax

$result = $client->getCatalogImportStatus([
    'CatalogId' => '<string>',
]);

Parameter Details

Members
CatalogId
Type: string

The ID of the catalog to migrate. Currently, this should be the Amazon Web Services account ID.

Result Syntax

[
    'ImportStatus' => [
        'ImportCompleted' => true || false,
        'ImportTime' => <DateTime>,
        'ImportedBy' => '<string>',
    ],
]

Result Details

Members
ImportStatus
Type: CatalogImportStatus structure

The status of the specified catalog migration.

Errors

InternalServiceException:

An internal service error occurred.

OperationTimeoutException:

The operation timed out.

GetClassifier

$result = $client->getClassifier([/* ... */]);
$promise = $client->getClassifierAsync([/* ... */]);

Retrieves a classifier by name.

Parameter Syntax

$result = $client->getClassifier([
    'Name' => '<string>', // REQUIRED
]);

Parameter Details

Members
Name
Required: Yes
Type: string

Name of the classifier to retrieve.

Result Syntax

[
    'Classifier' => [
        'CsvClassifier' => [
            'AllowSingleColumn' => true || false,
            'ContainsHeader' => 'UNKNOWN|PRESENT|ABSENT',
            'CreationTime' => <DateTime>,
            'CustomDatatypeConfigured' => true || false,
            'CustomDatatypes' => ['<string>', ...],
            'Delimiter' => '<string>',
            'DisableValueTrimming' => true || false,
            'Header' => ['<string>', ...],
            'LastUpdated' => <DateTime>,
            'Name' => '<string>',
            'QuoteSymbol' => '<string>',
            'Serde' => 'OpenCSVSerDe|LazySimpleSerDe|None',
            'Version' => <integer>,
        ],
        'GrokClassifier' => [
            'Classification' => '<string>',
            'CreationTime' => <DateTime>,
            'CustomPatterns' => '<string>',
            'GrokPattern' => '<string>',
            'LastUpdated' => <DateTime>,
            'Name' => '<string>',
            'Version' => <integer>,
        ],
        'JsonClassifier' => [
            'CreationTime' => <DateTime>,
            'JsonPath' => '<string>',
            'LastUpdated' => <DateTime>,
            'Name' => '<string>',
            'Version' => <integer>,
        ],
        'XMLClassifier' => [
            'Classification' => '<string>',
            'CreationTime' => <DateTime>,
            'LastUpdated' => <DateTime>,
            'Name' => '<string>',
            'RowTag' => '<string>',
            'Version' => <integer>,
        ],
    ],
]

Result Details

Members
Classifier
Type: Classifier structure

The requested classifier.
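
The Classifier structure shown above has four possible members, so a caller typically needs to find out which one is present. The sketch below assumes that only one of the four is populated for a given classifier, which this page does not state. Both function names are hypothetical.

```php
<?php
/**
 * Returns the name of the member that is set in a Classifier structure (hypothetical helper).
 *
 * @param array|\ArrayAccess $classifier the 'Classifier' value from a GetClassifier result
 */
function classifierKind($classifier): ?string
{
    foreach (['GrokClassifier', 'XMLClassifier', 'JsonClassifier', 'CsvClassifier'] as $kind) {
        if (isset($classifier[$kind])) {
            return $kind;
        }
    }
    return null; // none of the known members is present
}

/**
 * Fetches a classifier and summarizes its type and version (hypothetical helper).
 *
 * @param \Aws\Glue\GlueClient $glue Glue client, or any object exposing getClassifier()
 */
function describeClassifier($glue, string $name): string
{
    $classifier = $glue->getClassifier(['Name' => $name])['Classifier'];
    $kind = classifierKind($classifier);
    if ($kind === null) {
        return "$name: unknown classifier type";
    }
    return sprintf('%s: %s (version %d)', $name, $kind, $classifier[$kind]['Version'] ?? 0);
}
```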

Errors

EntityNotFoundException:

A specified entity does not exist.

OperationTimeoutException:

The operation timed out.

GetClassifiers

$result = $client->getClassifiers([/* ... */]);
$promise = $client->getClassifiersAsync([/* ... */]);

Lists all classifier objects in the Data Catalog.

Parameter Syntax

$result = $client->getClassifiers([
    'MaxResults' => <integer>,
    'NextToken' => '<string>',
]);

Parameter Details

Members
MaxResults
Type: int

The size of the list to return (optional).

NextToken
Type: string

An optional continuation token.

Result Syntax

[
    'Classifiers' => [
        [
            'CsvClassifier' => [
                'AllowSingleColumn' => true || false,
                'ContainsHeader' => 'UNKNOWN|PRESENT|ABSENT',
                'CreationTime' => <DateTime>,
                'CustomDatatypeConfigured' => true || false,
                'CustomDatatypes' => ['<string>', ...],
                'Delimiter' => '<string>',
                'DisableValueTrimming' => true || false,
                'Header' => ['<string>', ...],
                'LastUpdated' => <DateTime>,
                'Name' => '<string>',
                'QuoteSymbol' => '<string>',
                'Serde' => 'OpenCSVSerDe|LazySimpleSerDe|None',
                'Version' => <integer>,
            ],
            'GrokClassifier' => [
                'Classification' => '<string>',
                'CreationTime' => <DateTime>,
                'CustomPatterns' => '<string>',
                'GrokPattern' => '<string>',
                'LastUpdated' => <DateTime>,
                'Name' => '<string>',
                'Version' => <integer>,
            ],
            'JsonClassifier' => [
                'CreationTime' => <DateTime>,
                'JsonPath' => '<string>',
                'LastUpdated' => <DateTime>,
                'Name' => '<string>',
                'Version' => <integer>,
            ],
            'XMLClassifier' => [
                'Classification' => '<string>',
                'CreationTime' => <DateTime>,
                'LastUpdated' => <DateTime>,
                'Name' => '<string>',
                'RowTag' => '<string>',
                'Version' => <integer>,
            ],
        ],
        // ...
    ],
    'NextToken' => '<string>',
]

Result Details

Members
Classifiers
Type: Array of Classifier structures

The requested list of classifier objects.

NextToken
Type: string

A continuation token.

Errors

OperationTimeoutException:

The operation timed out.
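
The MaxResults and NextToken members described above can be combined into a simple paging loop. The following is a minimal sketch, not part of the SDK: listAllClassifiers is a hypothetical helper name, the default page size of 100 is an arbitrary choice, and $client is assumed to be an Aws\Glue\GlueClient (or any object that exposes getClassifiers()).

```php
<?php
// Hypothetical helper: collects every classifier by following NextToken
// until the service stops returning one.
function listAllClassifiers(object $client, int $pageSize = 100): array
{
    $classifiers = [];
    $params = ['MaxResults' => $pageSize];

    do {
        $result = $client->getClassifiers($params);

        foreach ($result['Classifiers'] ?? [] as $classifier) {
            $classifiers[] = $classifier;
        }

        // A missing or empty NextToken is treated as "last page reached".
        $params['NextToken'] = $result['NextToken'] ?? null;
    } while (!empty($params['NextToken']));

    return $classifiers;
}
```

Each element of the returned array has the shape shown in the result syntax, so a caller can check which of CsvClassifier, GrokClassifier, JsonClassifier, or XMLClassifier is present.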

GetColumnStatisticsForPartition

$result = $client->getColumnStatisticsForPartition([/* ... */]);
$promise = $client->getColumnStatisticsForPartitionAsync([/* ... */]);

Retrieves partition statistics of columns.

The Identity and Access Management (IAM) permission required for this operation is GetPartition.

Parameter Syntax

$result = $client->getColumnStatisticsForPartition([
    'CatalogId' => '<string>',
    'ColumnNames' => ['<string>', ...], // REQUIRED
    'DatabaseName' => '<string>', // REQUIRED
    'PartitionValues' => ['<string>', ...], // REQUIRED
    'TableName' => '<string>', // REQUIRED
]);

Parameter Details

Members
CatalogId
Type: string

The ID of the Data Catalog where the partitions in question reside. If none is supplied, the Amazon Web Services account ID is used by default.

ColumnNames
Required: Yes
Type: Array of strings

A list of the column names.

DatabaseName
Required: Yes
Type: string

The name of the catalog database where the partitions reside.

PartitionValues
Required: Yes
Type: Array of strings

A list of partition values identifying the partition.

TableName
Required: Yes
Type: string

The name of the partitions' table.

Result Syntax

[
    'ColumnStatisticsList' => [
        [
            'AnalyzedTime' => <DateTime>,
            'ColumnName' => '<string>',
            'ColumnType' => '<string>',
            'StatisticsData' => [
                'BinaryColumnStatisticsData' => [
                    'AverageLength' => <float>,
                    'MaximumLength' => <integer>,
                    'NumberOfNulls' => <integer>,
                ],
                'BooleanColumnStatisticsData' => [
                    'NumberOfFalses' => <integer>,
                    'NumberOfNulls' => <integer>,
                    'NumberOfTrues' => <integer>,
                ],
                'DateColumnStatisticsData' => [
                    'MaximumValue' => <DateTime>,
                    'MinimumValue' => <DateTime>,
                    'NumberOfDistinctValues' => <integer>,
                    'NumberOfNulls' => <integer>,
                ],
                'DecimalColumnStatisticsData' => [
                    'MaximumValue' => [
                        'Scale' => <integer>,
                        'UnscaledValue' => <string || resource || Psr\Http\Message\StreamInterface>,
                    ],
                    'MinimumValue' => [
                        'Scale' => <integer>,
                        'UnscaledValue' => <string || resource || Psr\Http\Message\StreamInterface>,
                    ],
                    'NumberOfDistinctValues' => <integer>,
                    'NumberOfNulls' => <integer>,
                ],
                'DoubleColumnStatisticsData' => [
                    'MaximumValue' => <float>,
                    'MinimumValue' => <float>,
                    'NumberOfDistinctValues' => <integer>,
                    'NumberOfNulls' => <integer>,
                ],
                'LongColumnStatisticsData' => [
                    'MaximumValue' => <integer>,
                    'MinimumValue' => <integer>,
                    'NumberOfDistinctValues' => <integer>,
                    'NumberOfNulls' => <integer>,
                ],
                'StringColumnStatisticsData' => [
                    'AverageLength' => <float>,
                    'MaximumLength' => <integer>,
                    'NumberOfDistinctValues' => <integer>,
                    'NumberOfNulls' => <integer>,
                ],
                'Type' => 'BOOLEAN|DATE|DECIMAL|DOUBLE|LONG|STRING|BINARY',
            ],
        ],
        // ...
    ],
    'Errors' => [
        [
            'ColumnName' => '<string>',
            'Error' => [
                'ErrorCode' => '<string>',
                'ErrorMessage' => '<string>',
            ],
        ],
        // ...
    ],
]

Result Details

Members
ColumnStatisticsList
Type: Array of ColumnStatistics structures

List of ColumnStatistics that were retrieved.

Errors
Type: Array of ColumnError structures

A list of errors that occurred while retrieving column statistics data.

Errors

EntityNotFoundException:

A specified entity does not exist.

InvalidInputException:

The input provided was not valid.

InternalServiceException:

An internal service error occurred.

OperationTimeoutException:

The operation timed out.

GlueEncryptionException:

An encryption operation failed.
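
Because the result carries both a ColumnStatisticsList and a per-column Errors list, a caller usually needs to look at both. The sketch below shows one way to index them by column name. summarizePartitionStatistics is a hypothetical helper, and the database, table, partition, and column names in the usage comment are placeholders.

```php
<?php
// Hypothetical helper: splits a GetColumnStatisticsForPartition result into
// statistics types and error text, both keyed by column name.
function summarizePartitionStatistics($result): array
{
    $summary = ['types' => [], 'errors' => []];

    foreach ($result['ColumnStatisticsList'] ?? [] as $stats) {
        $summary['types'][$stats['ColumnName']] = $stats['StatisticsData']['Type'];
    }

    foreach ($result['Errors'] ?? [] as $columnError) {
        $detail = $columnError['Error'];
        $summary['errors'][$columnError['ColumnName']] =
            $detail['ErrorCode'] . ': ' . $detail['ErrorMessage'];
    }

    return $summary;
}

// Usage against a real client (placeholder names, not executed here):
// $summary = summarizePartitionStatistics($client->getColumnStatisticsForPartition([
//     'DatabaseName' => 'sales_db',
//     'TableName' => 'orders',
//     'PartitionValues' => ['2024', '01'],
//     'ColumnNames' => ['order_id', 'amount'],
// ]));
```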

GetColumnStatisticsForTable

$result = $client->getColumnStatisticsForTable([/* ... */]);
$promise = $client->getColumnStatisticsForTableAsync([/* ... */]);

Retrieves table statistics of columns.

The Identity and Access Management (IAM) permission required for this operation is GetTable.

Parameter Syntax

$result = $client->getColumnStatisticsForTable([
    'CatalogId' => '<string>',
    'ColumnNames' => ['<string>', ...], // REQUIRED
    'DatabaseName' => '<string>', // REQUIRED
    'TableName' => '<string>', // REQUIRED
]);

Parameter Details

Members
CatalogId
Type: string

The ID of the Data Catalog where the table in question resides. If none is supplied, the Amazon Web Services account ID is used by default.

ColumnNames
Required: Yes
Type: Array of strings

A list of the column names.

DatabaseName
Required: Yes
Type: string

The name of the catalog database where the table resides.

TableName
Required: Yes
Type: string

The name of the table.

Result Syntax

[
    'ColumnStatisticsList' => [
        [
            'AnalyzedTime' => <DateTime>,
            'ColumnName' => '<string>',
            'ColumnType' => '<string>',
            'StatisticsData' => [
                'BinaryColumnStatisticsData' => [
                    'AverageLength' => <float>,
                    'MaximumLength' => <integer>,
                    'NumberOfNulls' => <integer>,
                ],
                'BooleanColumnStatisticsData' => [
                    'NumberOfFalses' => <integer>,
                    'NumberOfNulls' => <integer>,
                    'NumberOfTrues' => <integer>,
                ],
                'DateColumnStatisticsData' => [
                    'MaximumValue' => <DateTime>,
                    'MinimumValue' => <DateTime>,
                    'NumberOfDistinctValues' => <integer>,
                    'NumberOfNulls' => <integer>,
                ],
                'DecimalColumnStatisticsData' => [
                    'MaximumValue' => [
                        'Scale' => <integer>,
                        'UnscaledValue' => <string || resource || Psr\Http\Message\StreamInterface>,
                    ],
                    'MinimumValue' => [
                        'Scale' => <integer>,
                        'UnscaledValue' => <string || resource || Psr\Http\Message\StreamInterface>,
                    ],
                    'NumberOfDistinctValues' => <integer>,
                    'NumberOfNulls' => <integer>,
                ],
                'DoubleColumnStatisticsData' => [
                    'MaximumValue' => <float>,
                    'MinimumValue' => <float>,
                    'NumberOfDistinctValues' => <integer>,
                    'NumberOfNulls' => <integer>,
                ],
                'LongColumnStatisticsData' => [
                    'MaximumValue' => <integer>,
                    'MinimumValue' => <integer>,
                    'NumberOfDistinctValues' => <integer>,
                    'NumberOfNulls' => <integer>,
                ],
                'StringColumnStatisticsData' => [
                    'AverageLength' => <float>,
                    'MaximumLength' => <integer>,
                    'NumberOfDistinctValues' => <integer>,
                    'NumberOfNulls' => <integer>,
                ],
                'Type' => 'BOOLEAN|DATE|DECIMAL|DOUBLE|LONG|STRING|BINARY',
            ],
        ],
        // ...
    ],
    'Errors' => [
        [
            'ColumnName' => '<string>',
            'Error' => [
                'ErrorCode' => '<string>',
                'ErrorMessage' => '<string>',
            ],
        ],
        // ...
    ],
]

Result Details

Members
ColumnStatisticsList
Type: Array of ColumnStatistics structures

List of ColumnStatistics.

Errors
Type: Array of ColumnError structures

List of ColumnStatistics that failed to be retrieved.

Errors

EntityNotFoundException:

A specified entity does not exist.

InvalidInputException:

The input provided was not valid.

InternalServiceException:

An internal service error occurred.

OperationTimeoutException:

The operation timed out.

GlueEncryptionException:

An encryption operation failed.
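
The StatisticsData structure holds one member per column type plus a Type field that names the type. The sketch below picks the member that corresponds to Type. selectStatisticsForType is a hypothetical helper, the mapping is inferred from the member names in the result syntax, and it assumes that only the member matching Type is of interest.

```php
<?php
// Hypothetical helper: returns the member of StatisticsData that matches
// its Type field, for example 'LONG' selects 'LongColumnStatisticsData'.
function selectStatisticsForType(array $statisticsData): array
{
    // Mapping inferred from the member names in the result syntax.
    $memberByType = [
        'BOOLEAN' => 'BooleanColumnStatisticsData',
        'DATE'    => 'DateColumnStatisticsData',
        'DECIMAL' => 'DecimalColumnStatisticsData',
        'DOUBLE'  => 'DoubleColumnStatisticsData',
        'LONG'    => 'LongColumnStatisticsData',
        'STRING'  => 'StringColumnStatisticsData',
        'BINARY'  => 'BinaryColumnStatisticsData',
    ];

    $type = $statisticsData['Type'] ?? null;
    if (!is_string($type) || !isset($memberByType[$type])) {
        throw new InvalidArgumentException('Unknown or missing statistics type.');
    }

    // An absent member is reported as an empty array rather than an error.
    return $statisticsData[$memberByType[$type]] ?? [];
}
```

A caller would typically apply this to each entry of ColumnStatisticsList, passing the entry's StatisticsData member.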

GetColumnStatisticsTaskRun

$result = $client->getColumnStatisticsTaskRun([/* ... */]);
$promise = $client->getColumnStatisticsTaskRunAsync([/* ... */]);

Gets the metadata for a column statistics task run, given a task run ID.

Parameter Syntax

$result = $client->getColumnStatisticsTaskRun([
    'ColumnStatisticsTaskRunId' => '<string>', // REQUIRED
]);

Parameter Details

Members
ColumnStatisticsTaskRunId
Required: Yes
Type: string

The identifier for the particular column statistics task run.

Result Syntax

[
    'ColumnStatisticsTaskRun' => [
        'CatalogID' => '<string>',
        'ColumnNameList' => ['<string>', ...],
        'ColumnStatisticsTaskRunId' => '<string>',
        'CreationTime' => <DateTime>,
        'CustomerId' => '<string>',
        'DPUSeconds' => <float>,
        'DatabaseName' => '<string>',
        'EndTime' => <DateTime>,
        'ErrorMessage' => '<string>',
        'LastUpdated' => <DateTime>,
        'NumberOfWorkers' => <integer>,
        'Role' => '<string>',
        'SampleSize' => <float>,
        'SecurityConfiguration' => '<string>',
        'StartTime' => <DateTime>,
        'Status' => 'STARTING|RUNNING|SUCCEEDED|FAILED|STOPPED',
        'TableName' => '<string>',
        'WorkerType' => '<string>',
    ],
]

Result Details

Members
ColumnStatisticsTaskRun
Type: ColumnStatisticsTaskRun structure

A ColumnStatisticsTaskRun object representing the details of the column statistics task run.

Errors

EntityNotFoundException:

A specified entity does not exist.

OperationTimeoutException:

The operation timed out.

InvalidInputException:

The input provided was not valid.
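
Since Status moves through several values, a caller that needs the final outcome has to poll. The following is a rough sketch under stated assumptions: waitForColumnStatisticsTaskRun is a hypothetical helper, the delay and attempt limits are arbitrary, and SUCCEEDED, FAILED, and STOPPED are assumed to be the terminal states based on their names.

```php
<?php
// Hypothetical helper: polls GetColumnStatisticsTaskRun until the run
// reaches a status assumed to be terminal, or the attempt limit is hit.
function waitForColumnStatisticsTaskRun(
    object $client,
    string $runId,
    int $delaySeconds = 10,
    int $maxAttempts = 60
): array {
    // Assumption: these three statuses mean the run will not change again.
    $terminal = ['SUCCEEDED', 'FAILED', 'STOPPED'];

    for ($attempt = 1; $attempt <= $maxAttempts; $attempt++) {
        $result = $client->getColumnStatisticsTaskRun([
            'ColumnStatisticsTaskRunId' => $runId,
        ]);
        $run = $result['ColumnStatisticsTaskRun'];

        if (in_array($run['Status'], $terminal, true)) {
            return $run;
        }

        // Do not sleep after the final attempt.
        if ($attempt < $maxAttempts) {
            sleep($delaySeconds);
        }
    }

    throw new RuntimeException("Task run {$runId} did not finish in time.");
}
```

When the returned Status is FAILED, the ErrorMessage member of the run is the natural place to look for the cause.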

GetColumnStatisticsTaskRuns

$result = $client->getColumnStatisticsTaskRuns([/* ... */]);
$promise = $client->getColumnStatisticsTaskRunsAsync([/* ... */]);

Retrieves information about all runs associated with the specified table.

Parameter Syntax

$result = $client->getColumnStatisticsTaskRuns([
    'DatabaseName' => '<string>', // REQUIRED
    'MaxResults' => <integer>,
    'NextToken' => '<string>',
    'TableName' => '<string>', // REQUIRED
]);

Parameter Details

Members
DatabaseName
Required: Yes
Type: string

The name of the database where the table resides.

MaxResults
Type: int

The maximum size of the response.

NextToken
Type: string

A continuation token, if this is a continuation call.

TableName
Required: Yes
Type: string

The name of the table.

Result Syntax

[
    'ColumnStatisticsTaskRuns' => [
        [
            'CatalogID' => '<string>',
            'ColumnNameList' => ['<string>', ...],
            'ColumnStatisticsTaskRunId' => '<string>',
            'CreationTime' => <DateTime>,
            'CustomerId' => '<string>',
            'DPUSeconds' => <float>,
            'DatabaseName' => '<string>',
            'EndTime' => <DateTime>,
            'ErrorMessage' => '<string>',
            'LastUpdated' => <DateTime>,
            'NumberOfWorkers' => <integer>,
            'Role' => '<string>',
            'SampleSize' => <float>,
            'SecurityConfiguration' => '<string>',
            'StartTime' => <DateTime>,
            'Status' => 'STARTING|RUNNING|SUCCEEDED|FAILED|STOPPED',
            'TableName' => '<string>',
            'WorkerType' => '<string>',
        ],
        // ...
    ],
    'NextToken' => '<string>',
]

Result Details

Members
ColumnStatisticsTaskRuns
Type: Array of ColumnStatisticsTaskRun structures

A list of column statistics task runs.

NextToken
Type: string

A continuation token, if not all task runs have yet been returned.

Errors

OperationTimeoutException:

The operation timed out.
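
The continuation token lends itself to a generator, so that callers can stream task runs without holding every page in memory. This is a sketch only: iterateColumnStatisticsTaskRuns and sumDpuSeconds are hypothetical helper names, and $client is assumed to be an Aws\Glue\GlueClient or a compatible object.

```php
<?php
// Hypothetical helper: yields every task run for a table, requesting the
// next page only when the current one has been consumed.
function iterateColumnStatisticsTaskRuns(
    object $client,
    string $databaseName,
    string $tableName
): Generator {
    $params = ['DatabaseName' => $databaseName, 'TableName' => $tableName];

    do {
        $result = $client->getColumnStatisticsTaskRuns($params);

        foreach ($result['ColumnStatisticsTaskRuns'] ?? [] as $run) {
            yield $run;
        }

        $nextToken = $result['NextToken'] ?? null;
        $params['NextToken'] = $nextToken;
    } while (!empty($nextToken));
}

// Hypothetical helper: adds up DPUSeconds across runs, treating a missing
// value as zero.
function sumDpuSeconds(iterable $runs): float
{
    $total = 0.0;
    foreach ($runs as $run) {
        $total += (float) ($run['DPUSeconds'] ?? 0.0);
    }
    return $total;
}
```

For example, sumDpuSeconds(iterateColumnStatisticsTaskRuns($client, $databaseName, $tableName)) gives the combined DPUSeconds for all runs of one table.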

GetConnection

$result = $client->getConnection([/* ... */]);
$promise = $client->getConnectionAsync([/* ... */]);

Retrieves a connection definition from the Data Catalog.

Parameter Syntax

$result = $client->getConnection([
    'CatalogId' => '<string>',
    'HidePassword' => true || false,
    'Name' => '<string>', // REQUIRED
]);

Parameter Details

Members
CatalogId
Type: string

The ID of the Data Catalog in which the connection resides. If none is provided, the Amazon Web Services account ID is used by default.

HidePassword
Type: boolean

Allows you to retrieve the connection metadata without returning the password. For instance, the Glue console uses this flag to retrieve the connection, and does not display the password. Set this parameter when the caller might not have permission to use the KMS key to decrypt the password, but it does have permission to access the rest of the connection properties.

Name
Required: Yes
Type: string

The name of the connection definition to retrieve.

Result Syntax

[
    'Connection' => [
        'ConnectionProperties' => ['<string>', ...],
        'ConnectionType' => 'JDBC|SFTP|MONGODB|KAFKA|NETWORK|MARKETPLACE|CUSTOM',
        'CreationTime' => <DateTime>,
        'Description' => '<string>',
        'LastUpdatedBy' => '<string>',
        'LastUpdatedTime' => <DateTime>,
        'MatchCriteria' => ['<string>', ...],
        'Name' => '<string>',
        'PhysicalConnectionRequirements' => [
            'AvailabilityZone' => '<string>',
            'SecurityGroupIdList' => ['<string>', ...],
            'SubnetId' => '<string>',
        ],
    ],
]

Result Details

Members
Connection
Type: Connection structure

The requested connection definition.

Errors

EntityNotFoundException:

A specified entity does not exist.

OperationTimeoutException:

The operation timed out.

InvalidInputException:

The input provided was not valid.

GlueEncryptionException:

An encryption operation failed.
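
The HidePassword parameter is useful when the caller may lack permission to use the KMS key. The sketch below always sets it and passes CatalogId only when one is given. describeConnectionWithoutPassword is a hypothetical helper name, and $client is assumed to be an Aws\Glue\GlueClient or a compatible object.

```php
<?php
// Hypothetical helper: fetches a connection definition with HidePassword
// set, so the password is not returned.
function describeConnectionWithoutPassword(
    object $client,
    string $name,
    ?string $catalogId = null
): array {
    $params = ['Name' => $name, 'HidePassword' => true];

    // Omitting CatalogId lets the service default to the account ID.
    if ($catalogId !== null) {
        $params['CatalogId'] = $catalogId;
    }

    $result = $client->getConnection($params);

    return $result['Connection'];
}
```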

GetConnections

$result = $client->getConnections([/* ... */]);
$promise = $client->getConnectionsAsync([/* ... */]);

Retrieves a list of connection definitions from the Data Catalog.

Parameter Syntax

$result = $client->getConnections([
    'CatalogId' => '<string>',
    'Filter' => [
        'ConnectionType' => 'JDBC|SFTP|MONGODB|KAFKA|NETWORK|MARKETPLACE|CUSTOM',
        'MatchCriteria' => ['<string>', ...],
    ],
    'HidePassword' => true || false,
    'MaxResults' => <integer>,
    'NextToken' => '<string>',
]);

Parameter Details

Members
CatalogId
Type: string

The ID of the Data Catalog in which the connections reside. If none is provided, the Amazon Web Services account ID is used by default.

Filter
Type: GetConnectionsFilter structure

A filter that controls which connections are returned.

HidePassword
Type: boolean

Allows you to retrieve the connection metadata without returning the password. For instance, the Glue console uses this flag to retrieve the connection, and does not display the password. Set this parameter when the caller might not have permission to use the KMS key to decrypt the password, but it does have permission to access the rest of the connection properties.

MaxResults
Type: int

The maximum number of connections to return in one response.

NextToken
Type: string

A continuation token, if this is a continuation call.

Result Syntax

[
    'ConnectionList' => [
        [
            'ConnectionProperties' => ['<string>', ...],
            'ConnectionType' => 'JDBC|SFTP|MONGODB|KAFKA|NETWORK|MARKETPLACE|CUSTOM',
            'CreationTime' => <DateTime>,
            'Description' => '<string>',
            'LastUpdatedBy' => '<string>',
            'LastUpdatedTime' => <DateTime>,
            'MatchCriteria' => ['<string>', ...],
            'Name' => '<string>',
            'PhysicalConnectionRequirements' => [
                'AvailabilityZone' => '<string>',
                'SecurityGroupIdList' => ['<string>', ...],
                'SubnetId' => '<string>',
            ],
        ],
        // ...
    ],
    'NextToken' => '<string>',
]

Result Details

Members
ConnectionList
Type: Array of Connection structures

A list of requested connection definitions.

NextToken
Type: string

A continuation token, if the list of connections returned does not include the last of the filtered connections.

Errors

EntityNotFoundException:

A specified entity does not exist.

OperationTimeoutException:

The operation timed out.

InvalidInputException:

The input provided was not valid.

GlueEncryptionException:

An encryption operation failed.
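
The Filter member narrows the list on the service side. As a minimal sketch, the helper below requests only connections of one ConnectionType and collects their names across pages. listConnectionNamesByType is a hypothetical name, and setting HidePassword here is a choice of this example, not a requirement of the operation.

```php
<?php
// Hypothetical helper: returns the names of all connections of one type,
// following NextToken until the filtered list is exhausted.
function listConnectionNamesByType(object $client, string $connectionType): array
{
    $names = [];
    $params = [
        'Filter' => ['ConnectionType' => $connectionType],
        'HidePassword' => true, // passwords are not needed to list names
    ];

    do {
        $result = $client->getConnections($params);

        foreach ($result['ConnectionList'] ?? [] as $connection) {
            $names[] = $connection['Name'];
        }

        $params['NextToken'] = $result['NextToken'] ?? null;
    } while (!empty($params['NextToken']));

    return $names;
}
```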

GetCrawler

$result = $client->getCrawler([/* ... */]);
$promise = $client->getCrawlerAsync([/* ... */]);

Retrieves metadata for a specified crawler.

Parameter Syntax

$result = $client->getCrawler([
    'Name' => '<string>', // REQUIRED
]);

Parameter Details

Members
Name
Required: Yes
Type: string

The name of the crawler to retrieve metadata for.

Result Syntax

[
    'Crawler' => [
        'Classifiers' => ['<string>', ...],
        'Configuration' => '<string>',
        'CrawlElapsedTime' => <integer>,
        'CrawlerSecurityConfiguration' => '<string>',
        'CreationTime' => <DateTime>,
        'DatabaseName' => '<string>',
        'Description' => '<string>',
        'LakeFormationConfiguration' => [
            'AccountId' => '<string>',
            'UseLakeFormationCredentials' => true || false,
        ],
        'LastCrawl' => [
            'ErrorMessage' => '<string>',
            'LogGroup' => '<string>',
            'LogStream' => '<string>',
            'MessagePrefix' => '<string>',
            'StartTime' => <DateTime>,
            'Status' => 'SUCCEEDED|CANCELLED|FAILED',
        ],
        'LastUpdated' => <DateTime>,
        'LineageConfiguration' => [
            'CrawlerLineageSettings' => 'ENABLE|DISABLE',
        ],
        'Name' => '<string>',
        'RecrawlPolicy' => [
            'RecrawlBehavior' => 'CRAWL_EVERYTHING|CRAWL_NEW_FOLDERS_ONLY|CRAWL_EVENT_MODE',
        ],
        'Role' => '<string>',
        'Schedule' => [
            'ScheduleExpression' => '<string>',
            'State' => 'SCHEDULED|NOT_SCHEDULED|TRANSITIONING',
        ],
        'SchemaChangePolicy' => [
            'DeleteBehavior' => 'LOG|DELETE_FROM_DATABASE|DEPRECATE_IN_DATABASE',
            'UpdateBehavior' => 'LOG|UPDATE_IN_DATABASE',
        ],
        'State' => 'READY|RUNNING|STOPPING',
        'TablePrefix' => '<string>',
        'Targets' => [
            'CatalogTargets' => [
                [
                    'ConnectionName' => '<string>',
                    'DatabaseName' => '<string>',
                    'DlqEventQueueArn' => '<string>',
                    'EventQueueArn' => '<string>',
                    'Tables' => ['<string>', ...],
                ],
                // ...
            ],
            'DeltaTargets' => [
                [
                    'ConnectionName' => '<string>',
                    'CreateNativeDeltaTable' => true || false,
                    'DeltaTables' => ['<string>', ...],
                    'WriteManifest' => true || false,
                ],
                // ...
            ],
            'DynamoDBTargets' => [
                [
                    'Path' => '<string>',
                    'scanAll' => true || false,
                    'scanRate' => <float>,
                ],
                // ...
            ],
            'HudiTargets' => [
                [
                    'ConnectionName' => '<string>',
                    'Exclusions' => ['<string>', ...],
                    'MaximumTraversalDepth' => <integer>,
                    'Paths' => ['<string>', ...],
                ],
                // ...
            ],
            'IcebergTargets' => [
                [
                    'ConnectionName' => '<string>',
                    'Exclusions' => ['<string>', ...],
                    'MaximumTraversalDepth' => <integer>,
                    'Paths' => ['<string>', ...],
                ],
                // ...
            ],
            'JdbcTargets' => [
                [
                    'ConnectionName' => '<string>',
                    'EnableAdditionalMetadata' => ['<string>', ...],
                    'Exclusions' => ['<string>', ...],
                    'Path' => '<string>',
                ],
                // ...
            ],
            'MongoDBTargets' => [
                [
                    'ConnectionName' => '<string>',
                    'Path' => '<string>',
                    'ScanAll' => true || false,
                ],
                // ...
            ],
            'S3Targets' => [
                [
                    'ConnectionName' => '<string>',
                    'DlqEventQueueArn' => '<string>',
                    'EventQueueArn' => '<string>',
                    'Exclusions' => ['<string>', ...],
                    'Path' => '<string>',
                    'SampleSize' => <integer>,
                ],
                // ...
            ],
        ],
        'Version' => <integer>,
    ],
]

Result Details

Members
Crawler
Type: Crawler structure

The metadata for the specified crawler.

Errors

EntityNotFoundException:

A specified entity does not exist.

OperationTimeoutException:

The operation timed out.
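
The Targets member groups a crawler's targets by kind (S3Targets, JdbcTargets, and so on). As a small illustration, the sketch below counts the entries under each kind. countCrawlerTargets is a hypothetical helper that takes the Crawler array from the result.

```php
<?php
// Hypothetical helper: counts how many targets of each kind a crawler has,
// keyed by the member name used in the result syntax.
function countCrawlerTargets(array $crawler): array
{
    $counts = [];

    foreach ($crawler['Targets'] ?? [] as $targetKind => $targets) {
        $counts[$targetKind] = count($targets);
    }

    return $counts;
}

// Usage against a real client (placeholder crawler name, not executed here):
// $result = $client->getCrawler(['Name' => 'my-crawler']);
// $counts = countCrawlerTargets($result['Crawler']);
```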

GetCrawlerMetrics

$result = $client->getCrawlerMetrics([/* ... */]);
$promise = $client->getCrawlerMetricsAsync([/* ... */]);

Retrieves metrics about specified crawlers.

Parameter Syntax

$result = $client->getCrawlerMetrics([
    'CrawlerNameList' => ['<string>', ...],
    'MaxResults' => <integer>,
    'NextToken' => '<string>',
]);

Parameter Details

Members
CrawlerNameList
Type: Array of strings

A list of the names of crawlers about which to retrieve metrics.

MaxResults
Type: int

The maximum size of a list to return.

NextToken
Type: string

A continuation token, if this is a continuation call.

Result Syntax

[
    'CrawlerMetricsList' => [
        [
            'CrawlerName' => '<string>',
            'LastRuntimeSeconds' => <float>,
            'MedianRuntimeSeconds' => <float>,
            'StillEstimating' => true || false,
            'TablesCreated' => <integer>,
            'TablesDeleted' => <integer>,
            'TablesUpdated' => <integer>,
            'TimeLeftSeconds' => <float>,
        ],
        // ...
    ],
    'NextToken' => '<string>',
]

Result Details

Members
CrawlerMetricsList
Type: Array of CrawlerMetrics structures

A list of metrics for the specified crawler.

NextToken
Type: string

A continuation token, if the returned list does not contain the last metric available.

Errors

OperationTimeoutException:

The operation timed out.
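
The per-crawler counters in CrawlerMetricsList can be aggregated on the client side. The following sketch totals the table counters across all returned crawlers. totalCrawlerTableChanges is a hypothetical helper, and a missing counter is treated as zero.

```php
<?php
// Hypothetical helper: sums TablesCreated, TablesUpdated, and TablesDeleted
// over a list of CrawlerMetrics structures.
function totalCrawlerTableChanges(iterable $crawlerMetricsList): array
{
    $totals = ['TablesCreated' => 0, 'TablesUpdated' => 0, 'TablesDeleted' => 0];

    foreach ($crawlerMetricsList as $metrics) {
        foreach (array_keys($totals) as $field) {
            $totals[$field] += $metrics[$field] ?? 0;
        }
    }

    return $totals;
}

// Usage against a real client (not executed here):
// $result = $client->getCrawlerMetrics(['CrawlerNameList' => ['crawler-a', 'crawler-b']]);
// $totals = totalCrawlerTableChanges($result['CrawlerMetricsList']);
```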

GetCrawlers

$result = $client->getCrawlers([/* ... */]);
$promise = $client->getCrawlersAsync([/* ... */]);

Retrieves metadata for all crawlers defined in the customer account.

Parameter Syntax

$result = $client->getCrawlers([
    'MaxResults' => <integer>,
    'NextToken' => '<string>',
]);

Parameter Details

Members
MaxResults
Type: int

The number of crawlers to return on each call.

NextToken
Type: string

A continuation token, if this is a continuation request.

Result Syntax

[
    'Crawlers' => [
        [
            'Classifiers' => ['<string>', ...],
            'Configuration' => '<string>',
            'CrawlElapsedTime' => <integer>,
            'CrawlerSecurityConfiguration' => '<string>',
            'CreationTime' => <DateTime>,
            'DatabaseName' => '<string>',
            'Description' => '<string>',
            'LakeFormationConfiguration' => [
                'AccountId' => '<string>',
                'UseLakeFormationCredentials' => true || false,
            ],
            'LastCrawl' => [
                'ErrorMessage' => '<string>',
                'LogGroup' => '<string>',
                'LogStream' => '<string>',
                'MessagePrefix' => '<string>',
                'StartTime' => <DateTime>,
                'Status' => 'SUCCEEDED|CANCELLED|FAILED',
            ],
            'LastUpdated' => <DateTime>,
            'LineageConfiguration' => [
                'CrawlerLineageSettings' => 'ENABLE|DISABLE',
            ],
            'Name' => '<string>',
            'RecrawlPolicy' => [
                'RecrawlBehavior' => 'CRAWL_EVERYTHING|CRAWL_NEW_FOLDERS_ONLY|CRAWL_EVENT_MODE',
            ],
            'Role' => '<string>',
            'Schedule' => [
                'ScheduleExpression' => '<string>',
                'State' => 'SCHEDULED|NOT_SCHEDULED|TRANSITIONING',
            ],
            'SchemaChangePolicy' => [
                'DeleteBehavior' => 'LOG|DELETE_FROM_DATABASE|DEPRECATE_IN_DATABASE',
                'UpdateBehavior' => 'LOG|UPDATE_IN_DATABASE',
            ],
            'State' => 'READY|RUNNING|STOPPING',
            'TablePrefix' => '<string>',
            'Targets' => [
                'CatalogTargets' => [
                    [
                        'ConnectionName' => '<string>',
                        'DatabaseName' => '<string>',
                        'DlqEventQueueArn' => '<string>',
                        'EventQueueArn' => '<string>',
                        'Tables' => ['<string>', ...],
                    ],
                    // ...
                ],
                'DeltaTargets' => [
                    [
                        'ConnectionName' => '<string>',
                        'CreateNativeDeltaTable' => true || false,
                        'DeltaTables' => ['<string>', ...],
                        'WriteManifest' => true || false,
                    ],
                    // ...
                ],
                'DynamoDBTargets' => [
                    [
                        'Path' => '<string>',
                        'scanAll' => true || false,
                        'scanRate' => <float>,
                    ],
                    // ...
                ],
                'HudiTargets' => [
                    [
                        'ConnectionName' => '<string>',
                        'Exclusions' => ['<string>', ...],
                        'MaximumTraversalDepth' => <integer>,
                        'Paths' => ['<string>', ...],
                    ],
                    // ...
                ],
                'IcebergTargets' => [
                    [
                        'ConnectionName' => '<string>',
                        'Exclusions' => ['<string>', ...],
                        'MaximumTraversalDepth' => <integer>,
                        'Paths' => ['<string>', ...],
                    ],
                    // ...
                ],
                'JdbcTargets' => [
                    [
                        'ConnectionName' => '<string>',
                        'EnableAdditionalMetadata' => ['<string>', ...],
                        'Exclusions' => ['<string>', ...],
                        'Path' => '<string>',
                    ],
                    // ...
                ],
                'MongoDBTargets' => [
                    [
                        'ConnectionName' => '<string>',
                        'Path' => '<string>',
                        'ScanAll' => true || false,
                    ],
                    // ...
                ],
                'S3Targets' => [
                    [
                        'ConnectionName' => '<string>',
                        'DlqEventQueueArn' => '<string>',
                        'EventQueueArn' => '<string>',
                        'Exclusions' => ['<string>', ...],
                        'Path' => '<string>',
                        'SampleSize' => <integer>,
                    ],
                    // ...
                ],
            ],
            'Version' => <integer>,
        ],
        // ...
    ],
    'NextToken' => '<string>',
]

Result Details

Members
Crawlers
Type: Array of Crawler structures

A list of crawler metadata.

NextToken
Type: string

A continuation token, if the returned list has not reached the end of those defined in this customer account.

Errors

OperationTimeoutException:

The operation timed out.
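
Each entry in Crawlers carries a State of READY, RUNNING, or STOPPING, which makes it easy to see at a glance what the account's crawlers are doing. The sketch below groups crawler names by that field. groupCrawlerNamesByState is a hypothetical helper that works on one page of results or on a list collected across pages.

```php
<?php
// Hypothetical helper: groups crawler names by their State value.
function groupCrawlerNamesByState(iterable $crawlers): array
{
    $groups = [];

    foreach ($crawlers as $crawler) {
        $groups[$crawler['State']][] = $crawler['Name'];
    }

    return $groups;
}

// Usage against a real client (first page only, not executed here):
// $result = $client->getCrawlers(['MaxResults' => 50]);
// $byState = groupCrawlerNamesByState($result['Crawlers']);
```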

GetCustomEntityType

$result = $client->getCustomEntityType([/* ... */]);
$promise = $client->getCustomEntityTypeAsync([/* ... */]);

Retrieves the details of a custom pattern by specifying its name.

Parameter Syntax

$result = $client->getCustomEntityType([
    'Name' => '<string>', // REQUIRED
]);

Parameter Details

Members
Name
Required: Yes
Type: string

The name of the custom pattern that you want to retrieve.

Result Syntax

[
    'ContextWords' => ['<string>', ...],
    'Name' => '<string>',
    'RegexString' => '<string>',
]

Result Details

Members
ContextWords
Type: Array of strings

A list of context words, if specified when you created the custom pattern. If none of these context words are found within the vicinity of the regular expression, the data will not be detected as sensitive data.

Name
Type: string

The name of the custom pattern that you retrieved.

RegexString
Type: string

A regular expression string that is used for detecting sensitive data in a custom pattern.

Errors

EntityNotFoundException:

A specified entity does not exist.

AccessDeniedException:

Access to a resource was denied.

InternalServiceException:

An internal service error occurred.

InvalidInputException:

The input provided was not valid.

OperationTimeoutException:

The operation timed out.

GetDataCatalogEncryptionSettings

$result = $client->getDataCatalogEncryptionSettings([/* ... */]);
$promise = $client->getDataCatalogEncryptionSettingsAsync([/* ... */]);

Retrieves the security configuration for a specified catalog.

Parameter Syntax

$result = $client->getDataCatalogEncryptionSettings([
    'CatalogId' => '<string>',
]);

Parameter Details

Members
CatalogId
Type: string

The ID of the Data Catalog to retrieve the security configuration for. If none is provided, the Amazon Web Services account ID is used by default.

Result Syntax

[
    'DataCatalogEncryptionSettings' => [
        'ConnectionPasswordEncryption' => [
            'AwsKmsKeyId' => '<string>',
            'ReturnConnectionPasswordEncrypted' => true || false,
        ],
        'EncryptionAtRest' => [
            'CatalogEncryptionMode' => 'DISABLED|SSE-KMS|SSE-KMS-WITH-SERVICE-ROLE',
            'CatalogEncryptionServiceRole' => '<string>',
            'SseAwsKmsKeyId' => '<string>',
        ],
    ],
]

Result Details

Members
DataCatalogEncryptionSettings
Type: DataCatalogEncryptionSettings structure

The requested security configuration.
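
As a rough illustration of reading the nested result, the sketch below flattens the settings into a small summary. The helper name summarizeCatalogEncryption is hypothetical, and treating a missing CatalogEncryptionMode as DISABLED is an assumption made for this example.

```php
<?php
// Hypothetical helper: flattens the GetDataCatalogEncryptionSettings result.
function summarizeCatalogEncryption(array $result): array
{
    $settings = $result['DataCatalogEncryptionSettings'] ?? [];
    // Assumption: a missing mode is treated the same as DISABLED.
    $mode = $settings['EncryptionAtRest']['CatalogEncryptionMode'] ?? 'DISABLED';
    $passwords = $settings['ConnectionPasswordEncryption'] ?? [];

    return [
        'mode' => $mode,
        'encryptedAtRest' => $mode !== 'DISABLED',
        'passwordsEncrypted' => $passwords['ReturnConnectionPasswordEncrypted'] ?? false,
    ];
}
```

With a real client, the input could come from `$client->getDataCatalogEncryptionSettings()->toArray()`.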

Errors

InternalServiceException:

An internal service error occurred.

InvalidInputException:

The input provided was not valid.

OperationTimeoutException:

The operation timed out.

GetDataQualityResult

$result = $client->getDataQualityResult([/* ... */]);
$promise = $client->getDataQualityResultAsync([/* ... */]);

Retrieves the result of a data quality rule evaluation.

Parameter Syntax

$result = $client->getDataQualityResult([
    'ResultId' => '<string>', // REQUIRED
]);

Parameter Details

Members
ResultId
Required: Yes
Type: string

A unique result ID for the data quality result.

Result Syntax

[
    'AnalyzerResults' => [
        [
            'Description' => '<string>',
            'EvaluatedMetrics' => [<float>, ...],
            'EvaluationMessage' => '<string>',
            'Name' => '<string>',
        ],
        // ...
    ],
    'CompletedOn' => <DateTime>,
    'DataSource' => [
        'GlueTable' => [
            'AdditionalOptions' => ['<string>', ...],
            'CatalogId' => '<string>',
            'ConnectionName' => '<string>',
            'DatabaseName' => '<string>',
            'TableName' => '<string>',
        ],
    ],
    'EvaluationContext' => '<string>',
    'JobName' => '<string>',
    'JobRunId' => '<string>',
    'Observations' => [
        [
            'Description' => '<string>',
            'MetricBasedObservation' => [
                'MetricName' => '<string>',
                'MetricValues' => [
                    'ActualValue' => <float>,
                    'ExpectedValue' => <float>,
                    'LowerLimit' => <float>,
                    'UpperLimit' => <float>,
                ],
                'NewRules' => ['<string>', ...],
            ],
        ],
        // ...
    ],
    'ResultId' => '<string>',
    'RuleResults' => [
        [
            'Description' => '<string>',
            'EvaluatedMetrics' => [<float>, ...],
            'EvaluationMessage' => '<string>',
            'Name' => '<string>',
            'Result' => 'PASS|FAIL|ERROR',
        ],
        // ...
    ],
    'RulesetEvaluationRunId' => '<string>',
    'RulesetName' => '<string>',
    'Score' => <float>,
    'StartedOn' => <DateTime>,
]

Result Details

Members
AnalyzerResults
Type: Array of DataQualityAnalyzerResult structures

A list of DataQualityAnalyzerResult objects representing the results for each analyzer.

CompletedOn
Type: timestamp (string|DateTime or anything parsable by strtotime)

The date and time when the run for this data quality result was completed.

DataSource
Type: DataSource structure

The table associated with the data quality result, if any.

EvaluationContext
Type: string

In a Glue Studio job, each node on the canvas is typically assigned a name, and this includes data quality nodes. When a job contains multiple data quality nodes, EvaluationContext differentiates between them.

JobName
Type: string

The job name associated with the data quality result, if any.

JobRunId
Type: string

The job run ID associated with the data quality result, if any.

Observations
Type: Array of DataQualityObservation structures

A list of DataQualityObservation objects representing the observations generated after evaluating the rules and analyzers.

ResultId
Type: string

A unique result ID for the data quality result.

RuleResults
Type: Array of DataQualityRuleResult structures

A list of DataQualityRuleResult objects representing the results for each rule.

RulesetEvaluationRunId
Type: string

The unique run ID associated with the ruleset evaluation.

RulesetName
Type: string

The name of the ruleset associated with the data quality result.

Score
Type: double

An aggregate data quality score. Represents the ratio of rules that passed to the total number of rules.

StartedOn
Type: timestamp (string|DateTime or anything parsable by strtotime)

The date and time when the run for this data quality result started.
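
The Score member is described above as the ratio of passed rules to total rules. A minimal sketch of that arithmetic over RuleResults follows. The helper name passRatio is hypothetical, and counting rules with an ERROR result in the denominator is an assumption about how the service computes the score.

```php
<?php
// Hypothetical helper: recomputes a pass ratio from the RuleResults member.
function passRatio(array $result): ?float
{
    $rules = $result['RuleResults'] ?? [];
    if ($rules === []) {
        return null; // nothing to score
    }
    $passed = count(array_filter(
        $rules,
        fn(array $rule): bool => ($rule['Result'] ?? '') === 'PASS'
    ));
    // Assumption: FAIL and ERROR both count toward the total.
    return $passed / count($rules);
}
```

Comparing this value with the returned Score can serve as a quick sanity check when inspecting a result.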

Errors

InvalidInputException:

The input provided was not valid.

OperationTimeoutException:

The operation timed out.

InternalServiceException:

An internal service error occurred.

EntityNotFoundException:

A specified entity does not exist.

GetDataQualityRuleRecommendationRun

$result = $client->getDataQualityRuleRecommendationRun([/* ... */]);
$promise = $client->getDataQualityRuleRecommendationRunAsync([/* ... */]);

Gets the specified recommendation run that was used to generate rules.

Parameter Syntax

$result = $client->getDataQualityRuleRecommendationRun([
    'RunId' => '<string>', // REQUIRED
]);

Parameter Details

Members
RunId
Required: Yes
Type: string

The unique run identifier associated with this run.

Result Syntax

[
    'CompletedOn' => <DateTime>,
    'CreatedRulesetName' => '<string>',
    'DataSource' => [
        'GlueTable' => [
            'AdditionalOptions' => ['<string>', ...],
            'CatalogId' => '<string>',
            'ConnectionName' => '<string>',
            'DatabaseName' => '<string>',
            'TableName' => '<string>',
        ],
    ],
    'ErrorString' => '<string>',
    'ExecutionTime' => <integer>,
    'LastModifiedOn' => <DateTime>,
    'NumberOfWorkers' => <integer>,
    'RecommendedRuleset' => '<string>',
    'Role' => '<string>',
    'RunId' => '<string>',
    'StartedOn' => <DateTime>,
    'Status' => 'STARTING|RUNNING|STOPPING|STOPPED|SUCCEEDED|FAILED|TIMEOUT',
    'Timeout' => <integer>,
]

Result Details

Members
CompletedOn
Type: timestamp (string|DateTime or anything parsable by strtotime)

The date and time when this run was completed.

CreatedRulesetName
Type: string

The name of the ruleset that was created by the run.

DataSource
Type: DataSource structure

The data source (a Glue table) associated with this run.

ErrorString
Type: string

The error strings that are associated with the run.

ExecutionTime
Type: int

The amount of time (in seconds) that the run consumed resources.

LastModifiedOn
Type: timestamp (string|DateTime or anything parsable by strtotime)

A timestamp. The last point in time when this data quality rule recommendation run was modified.

NumberOfWorkers
Type: int

The number of G.1X workers to be used in the run. The default is 5.

RecommendedRuleset
Type: string

When a start rule recommendation run completes, it creates a recommended ruleset (a set of rules). This member has those rules in Data Quality Definition Language (DQDL) format.

Role
Type: string

An IAM role supplied to encrypt the results of the run.

RunId
Type: string

The unique run identifier associated with this run.

StartedOn
Type: timestamp (string|DateTime or anything parsable by strtotime)

The date and time when this run started.

Status
Type: string

The status for this run.

Timeout
Type: int

The timeout for a run in minutes. This is the maximum time that a run can consume resources before it is terminated and enters TIMEOUT status. The default is 2,880 minutes (48 hours).
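
Note that Timeout is expressed in minutes while ExecutionTime is expressed in seconds. The sketch below shows the unit conversion when estimating how much of the time budget remains. The helper name remainingRunSeconds is hypothetical, and it assumes that ExecutionTime is the quantity measured against Timeout.

```php
<?php
// Hypothetical helper: remaining time budget of a run, in seconds.
function remainingRunSeconds(array $run): int
{
    $timeoutMinutes = $run['Timeout'] ?? 2880; // documented default: 2,880 minutes
    $usedSeconds = $run['ExecutionTime'] ?? 0;

    // Convert minutes to seconds before subtracting; never go below zero.
    return max(0, $timeoutMinutes * 60 - $usedSeconds);
}
```

For the default timeout, 2,880 minutes × 60 = 172,800 seconds, which matches the 48 hours stated above.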

Errors

EntityNotFoundException:

A specified entity does not exist.

InvalidInputException:

The input provided was not valid.

OperationTimeoutException:

The operation timed out.

InternalServiceException:

An internal service error occurred.

GetDataQualityRuleset

$result = $client->getDataQualityRuleset([/* ... */]);
$promise = $client->getDataQualityRulesetAsync([/* ... */]);

Returns an existing ruleset by identifier or name.

Parameter Syntax

$result = $client->getDataQualityRuleset([
    'Name' => '<string>', // REQUIRED
]);

Parameter Details

Members
Name
Required: Yes
Type: string

The name of the ruleset.

Result Syntax

[
    'CreatedOn' => <DateTime>,
    'Description' => '<string>',
    'LastModifiedOn' => <DateTime>,
    'Name' => '<string>',
    'RecommendationRunId' => '<string>',
    'Ruleset' => '<string>',
    'TargetTable' => [
        'CatalogId' => '<string>',
        'DatabaseName' => '<string>',
        'TableName' => '<string>',
    ],
]

Result Details

Members
CreatedOn
Type: timestamp (string|DateTime or anything parsable by strtotime)

A timestamp. The time and date that this data quality ruleset was created.

Description
Type: string

A description of the ruleset.

LastModifiedOn
Type: timestamp (string|DateTime or anything parsable by strtotime)

A timestamp. The last point in time when this data quality ruleset was modified.

Name
Type: string

The name of the ruleset.

RecommendationRunId
Type: string

When a ruleset was created from a recommendation run, this run ID is generated to link the two together.

Ruleset
Type: string

A Data Quality Definition Language (DQDL) ruleset. For more information, see the Glue developer guide.

TargetTable
Type: DataQualityTargetTable structure

The name and database name of the target table.

Errors

EntityNotFoundException:

A specified entity does not exist.

InvalidInputException:

The input provided was not valid.

OperationTimeoutException:

The operation timed out.

InternalServiceException:

An internal service error occurred.

GetDataQualityRulesetEvaluationRun

$result = $client->getDataQualityRulesetEvaluationRun([/* ... */]);
$promise = $client->getDataQualityRulesetEvaluationRunAsync([/* ... */]);

Retrieves a specific run where a ruleset is evaluated against a data source.

Parameter Syntax

$result = $client->getDataQualityRulesetEvaluationRun([
    'RunId' => '<string>', // REQUIRED
]);

Parameter Details

Members
RunId
Required: Yes
Type: string

The unique run identifier associated with this run.

Result Syntax

[
    'AdditionalDataSources' => [
        '<NameString>' => [
            'GlueTable' => [
                'AdditionalOptions' => ['<string>', ...],
                'CatalogId' => '<string>',
                'ConnectionName' => '<string>',
                'DatabaseName' => '<string>',
                'TableName' => '<string>',
            ],
        ],
        // ...
    ],
    'AdditionalRunOptions' => [
        'CloudWatchMetricsEnabled' => true || false,
        'ResultsS3Prefix' => '<string>',
    ],
    'CompletedOn' => <DateTime>,
    'DataSource' => [
        'GlueTable' => [
            'AdditionalOptions' => ['<string>', ...],
            'CatalogId' => '<string>',
            'ConnectionName' => '<string>',
            'DatabaseName' => '<string>',
            'TableName' => '<string>',
        ],
    ],
    'ErrorString' => '<string>',
    'ExecutionTime' => <integer>,
    'LastModifiedOn' => <DateTime>,
    'NumberOfWorkers' => <integer>,
    'ResultIds' => ['<string>', ...],
    'Role' => '<string>',
    'RulesetNames' => ['<string>', ...],
    'RunId' => '<string>',
    'StartedOn' => <DateTime>,
    'Status' => 'STARTING|RUNNING|STOPPING|STOPPED|SUCCEEDED|FAILED|TIMEOUT',
    'Timeout' => <integer>,
]

Result Details

Members
AdditionalDataSources
Type: Associative array of custom strings keys (NameString) to DataSource structures

A map of reference strings to additional data sources you can specify for an evaluation run.

AdditionalRunOptions
Type: DataQualityEvaluationRunAdditionalRunOptions structure

Additional run options you can specify for an evaluation run.

CompletedOn
Type: timestamp (string|DateTime or anything parsable by strtotime)

The date and time when this run was completed.

DataSource
Type: DataSource structure

The data source (a Glue table) associated with this evaluation run.

ErrorString
Type: string

The error strings that are associated with the run.

ExecutionTime
Type: int

The amount of time (in seconds) that the run consumed resources.

LastModifiedOn
Type: timestamp (string|DateTime or anything parsable by strtotime)

A timestamp. The last point in time when this data quality ruleset evaluation run was modified.

NumberOfWorkers
Type: int

The number of G.1X workers to be used in the run. The default is 5.

ResultIds
Type: Array of strings

A list of result IDs for the data quality results for the run.

Role
Type: string

An IAM role supplied to encrypt the results of the run.

RulesetNames
Type: Array of strings

A list of ruleset names for the run.

RunId
Type: string

The unique run identifier associated with this run.

StartedOn
Type: timestamp (string|DateTime or anything parsable by strtotime)

The date and time when this run started.

Status
Type: string

The status for this run.

Timeout
Type: int

The timeout for a run in minutes. This is the maximum time that a run can consume resources before it is terminated and enters TIMEOUT status. The default is 2,880 minutes (48 hours).
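
Because Status moves through several values, callers often poll until the run stops changing. The following is a minimal polling sketch. The helper name waitForEvaluationRun, the choice of STOPPED, SUCCEEDED, FAILED and TIMEOUT as terminal statuses, and the polling defaults are assumptions made for illustration. The fetch function is passed in as a callable so the sketch does not depend on a live client.

```php
<?php
// Hypothetical helper: polls a run until it reaches an assumed terminal status.
function waitForEvaluationRun(
    callable $getRun,
    string $runId,
    int $maxPolls = 30,
    int $delaySeconds = 10
): array {
    // Assumption: these statuses mean the run will not change any further.
    $terminal = ['STOPPED', 'SUCCEEDED', 'FAILED', 'TIMEOUT'];

    for ($attempt = 0; $attempt < $maxPolls; $attempt++) {
        $run = $getRun(['RunId' => $runId]);
        $status = $run['Status'] ?? null;
        if (in_array($status, $terminal, true)) {
            return ['Status' => $status, 'ResultIds' => $run['ResultIds'] ?? []];
        }
        if ($delaySeconds > 0) {
            sleep($delaySeconds);
        }
    }
    throw new RuntimeException("Run {$runId} did not finish after {$maxPolls} polls.");
}
```

With a real client, the callable could be `fn(array $p) => $client->getDataQualityRulesetEvaluationRun($p)`. The returned ResultIds can then be passed to getDataQualityResult one at a time.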

Errors

EntityNotFoundException:

A specified entity does not exist.

InvalidInputException:

The input provided was not valid.

OperationTimeoutException:

The operation timed out.

InternalServiceException:

An internal service error occurred.

GetDatabase

$result = $client->getDatabase([/* ... */]);
$promise = $client->getDatabaseAsync([/* ... */]);

Retrieves the definition of a specified database.

Parameter Syntax

$result = $client->getDatabase([
    'CatalogId' => '<string>',
    'Name' => '<string>', // REQUIRED
]);

Parameter Details

Members
CatalogId
Type: string

The ID of the Data Catalog in which the database resides. If none is provided, the Amazon Web Services account ID is used by default.

Name
Required: Yes
Type: string

The name of the database to retrieve. For Hive compatibility, this should be all lowercase.
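
One way to follow the lowercase recommendation and the optional CatalogId is to build the parameter array in a small helper. This is a sketch only: the helper name buildGetDatabaseParams is hypothetical, and lowercasing the name automatically is a design choice of this example, not something the SDK does.

```php
<?php
// Hypothetical helper: builds the parameter array for getDatabase.
function buildGetDatabaseParams(string $name, ?string $catalogId = null): array
{
    // Lowercase the name for Hive compatibility (ASCII letters only).
    $params = ['Name' => strtolower($name)];

    // Omit CatalogId so that the account ID is used by default.
    if ($catalogId !== null && $catalogId !== '') {
        $params['CatalogId'] = $catalogId;
    }
    return $params;
}
```

The result can be passed straight to the client, for example `$client->getDatabase(buildGetDatabaseParams('Sales_DB'))`.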

Result Syntax

[
    'Database' => [
        'CatalogId' => '<string>',
        'CreateTableDefaultPermissions' => [
            [
                'Permissions' => ['<string>', ...],
                'Principal' => [
                    'DataLakePrincipalIdentifier' => '<string>',
                ],
            ],
            // ...
        ],
        'CreateTime' => <DateTime>,
        'Description' => '<string>',
        'FederatedDatabase' => [
            'ConnectionName' => '<string>',
            'Identifier' => '<string>',
        ],
        'LocationUri' => '<string>',
        'Name' => '<string>',
        'Parameters' => ['<string>', ...],
        'TargetDatabase' => [
            'CatalogId' => '<string>',
            'DatabaseName' => '<string>',
            'Region' => '<string>',
        ],
    ],
]

Result Details

Members
Database
Type: Database structure

The definition of the specified database in the Data Catalog.

Errors

InvalidInputException:

The input provided was not valid.

EntityNotFoundException:

A specified entity does not exist.

InternalServiceException:

An internal service error occurred.

OperationTimeoutException:

The operation timed out.

GlueEncryptionException:

An encryption operation failed.

FederationSourceException:

A federation source failed.

GetDatabases

$result = $client->getDatabases([/* ... */]);
$promise = $client->getDatabasesAsync([/* ... */]);

Retrieves all databases defined in a given Data Catalog.

Parameter Syntax

$result = $client->getDatabases([
    'CatalogId' => '<string>',
    'MaxResults' => <integer>,
    'NextToken' => '<string>',
    'ResourceShareType' => 'FOREIGN|ALL|FEDERATED',
]);

Parameter Details

Members
CatalogId
Type: string

The ID of the Data Catalog from which to retrieve Databases. If none is provided, the Amazon Web Services account ID is used by default.

MaxResults
Type: int

The maximum number of databases to return in one response.

NextToken
Type: string

A continuation token, if this is a continuation call.

ResourceShareType
Type: string

Allows you to specify that you want to list the databases shared with your account. The allowable values are FEDERATED, FOREIGN or ALL.

  • If set to FEDERATED, lists the federated databases (referencing an external entity) shared with your account.

  • If set to FOREIGN, lists the databases shared with your account.

  • If set to ALL, lists the databases shared with your account, as well as the databases in your local account.

Result Syntax

[
    'DatabaseList' => [
        [
            'CatalogId' => '<string>',
            'CreateTableDefaultPermissions' => [
                [
                    'Permissions' => ['<string>', ...],
                    'Principal' => [
                        'DataLakePrincipalIdentifier' => '<string>',
                    ],
                ],
                // ...
            ],
            'CreateTime' => <DateTime>,
            'Description' => '<string>',
            'FederatedDatabase' => [
                'ConnectionName' => '<string>',
                'Identifier' => '<string>',
            ],
            'LocationUri' => '<string>',
            'Name' => '<string>',
            'Parameters' => ['<string>', ...],
            'TargetDatabase' => [
                'CatalogId' => '<string>',
                'DatabaseName' => '<string>',
                'Region' => '<string>',
            ],
        ],
        // ...
    ],
    'NextToken' => '<string>',
]

Result Details

Members
DatabaseList
Required: Yes
Type: Array of Database structures

A list of Database objects from the specified catalog.

NextToken
Type: string

A continuation token for paginating the returned list of databases, returned if the current segment of the list is not the last.
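
The NextToken handshake can be sketched as a generator that keeps requesting pages until no token is returned. The function name iterateDatabases is hypothetical. The page fetcher is passed in as a callable so that the loop can be exercised without a live client.

```php
<?php
// Hypothetical helper: yields every database across all result pages.
function iterateDatabases(callable $getDatabases, array $params = []): Generator
{
    do {
        $page = $getDatabases($params);
        foreach ($page['DatabaseList'] ?? [] as $database) {
            yield $database;
        }
        // A missing NextToken means this was the last segment of the list.
        $params['NextToken'] = $page['NextToken'] ?? null;
    } while ($params['NextToken'] !== null);
}
```

With a real client, this might be used as `foreach (iterateDatabases(fn(array $p) => $client->getDatabases($p), ['MaxResults' => 50]) as $db) { echo $db['Name'], PHP_EOL; }`.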

Errors

InvalidInputException:

The input provided was not valid.

InternalServiceException:

An internal service error occurred.

OperationTimeoutException:

The operation timed out.

GlueEncryptionException:

An encryption operation failed.

GetDataflowGraph

$result = $client->getDataflowGraph([/* ... */]);
$promise = $client->getDataflowGraphAsync([/* ... */]);

Transforms a Python script into a directed acyclic graph (DAG).

Parameter Syntax

$result = $client->getDataflowGraph([
    'PythonScript' => '<string>',
]);

Parameter Details

Members
PythonScript
Type: string

The Python script to transform.

Result Syntax

[
    'DagEdges' => [
        [
            'Source' => '<string>',
            'Target' => '<string>',
            'TargetParameter' => '<string>',
        ],
        // ...
    ],
    'DagNodes' => [
        [
            'Args' => [
                [
                    'Name' => '<string>',
                    'Param' => true || false,
                    'Value' => '<string>',
                ],
                // ...
            ],
            'Id' => '<string>',
            'LineNumber' => <integer>,
            'NodeType' => '<string>',
        ],
        // ...
    ],
]

Result Details

Members
DagEdges
Type: Array of CodeGenEdge structures

A list of the edges in the resulting DAG.

DagNodes
Type: Array of CodeGenNode structures

A list of the nodes in the resulting DAG.
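
To traverse the returned graph, the two lists can be combined into an adjacency list. The sketch below assumes that each edge's Source and Target hold node Id values. The function name buildAdjacencyList is hypothetical.

```php
<?php
// Hypothetical helper: maps each node Id to the Ids of its downstream nodes.
function buildAdjacencyList(array $graph): array
{
    $adjacency = [];
    // Register every node first so that leaf nodes appear with an empty list.
    foreach ($graph['DagNodes'] ?? [] as $node) {
        $adjacency[$node['Id']] = [];
    }
    // Assumption: Source and Target refer to node Ids.
    foreach ($graph['DagEdges'] ?? [] as $edge) {
        $adjacency[$edge['Source']][] = $edge['Target'];
    }
    return $adjacency;
}
```

The input has the same shape as the result of getDataflowGraph, so `$result->toArray()` could be passed in directly.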

Errors

InvalidInputException:

The input provided was not valid.

InternalServiceException:

An internal service error occurred.

OperationTimeoutException:

The operation timed out.

GetDevEndpoint

$result = $client->getDevEndpoint([/* ... */]);
$promise = $client->getDevEndpointAsync([/* ... */]);

Retrieves information about a specified development endpoint.

When you create a development endpoint in a virtual private cloud (VPC), Glue returns only a private IP address, and the public IP address field is not populated. When you create a non-VPC development endpoint, Glue returns only a public IP address.

Parameter Syntax

$result = $client->getDevEndpoint([
    'EndpointName' => '<string>', // REQUIRED
]);

Parameter Details

Members
EndpointName
Required: Yes
Type: string

Name of the DevEndpoint to retrieve information for.

Result Syntax

[
    'DevEndpoint' => [
        'Arguments' => ['<string>', ...],
        'AvailabilityZone' => '<string>',
        'CreatedTimestamp' => <DateTime>,
        'EndpointName' => '<string>',
        'ExtraJarsS3Path' => '<string>',
        'ExtraPythonLibsS3Path' => '<string>',
        'FailureReason' => '<string>',
        'GlueVersion' => '<string>',
        'LastModifiedTimestamp' => <DateTime>,
        'LastUpdateStatus' => '<string>',
        'NumberOfNodes' => <integer>,
        'NumberOfWorkers' => <integer>,
        'PrivateAddress' => '<string>',
        'PublicAddress' => '<string>',
        'PublicKey' => '<string>',
        'PublicKeys' => ['<string>', ...],
        'RoleArn' => '<string>',
        'SecurityConfiguration' => '<string>',
        'SecurityGroupIds' => ['<string>', ...],
        'Status' => '<string>',
        'SubnetId' => '<string>',
        'VpcId' => '<string>',
        'WorkerType' => 'Standard|G.1X|G.2X|G.025X|G.4X|G.8X|Z.2X',
        'YarnEndpointAddress' => '<string>',
        'ZeppelinRemoteSparkInterpreterPort' => <integer>,
    ],
]

Result Details

Members
DevEndpoint
Type: DevEndpoint structure

A DevEndpoint definition.
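
Since a VPC endpoint exposes only PrivateAddress and a non-VPC endpoint only PublicAddress, callers need to check both fields. A minimal sketch follows, using a hypothetical helper name preferredAddress and assuming an unpopulated field is either missing or an empty string.

```php
<?php
// Hypothetical helper: picks whichever address the endpoint actually exposes.
function preferredAddress(array $devEndpoint): ?string
{
    // Assumption: an unpopulated address is absent or an empty string.
    $private = $devEndpoint['PrivateAddress'] ?? '';
    $public = $devEndpoint['PublicAddress'] ?? '';

    if ($private !== '') {
        return $private; // VPC endpoint
    }
    return $public !== '' ? $public : null; // non-VPC endpoint, or none yet
}
```

With a real client, the input would be the DevEndpoint member, for example `preferredAddress($result['DevEndpoint'])`.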

Errors

EntityNotFoundException:

A specified entity does not exist.

InternalServiceException:

An internal service error occurred.

OperationTimeoutException:

The operation timed out.

InvalidInputException:

The input provided was not valid.

GetDevEndpoints

$result = $client->getDevEndpoints([/* ... */]);
$promise = $client->getDevEndpointsAsync([/* ... */]);

Retrieves all the development endpoints in this Amazon Web Services account.

When you create a development endpoint in a virtual private cloud (VPC), Glue returns only a private IP address and the public IP address field is not populated. When you create a non-VPC development endpoint, Glue returns only a public IP address.

Parameter Syntax

$result = $client->getDevEndpoints([
    'MaxResults' => <integer>,
    'NextToken' => '<string>',
]);

Parameter Details

Members
MaxResults
Type: int

The maximum size of information to return.

NextToken
Type: string

A continuation token, if this is a continuation call.

Result Syntax

[
    'DevEndpoints' => [
        [
            'Arguments' => ['<string>', ...],
            'AvailabilityZone' => '<string>',
            'CreatedTimestamp' => <DateTime>,
            'EndpointName' => '<string>',
            'ExtraJarsS3Path' => '<string>',
            'ExtraPythonLibsS3Path' => '<string>',
            'FailureReason' => '<string>',
            'GlueVersion' => '<string>',
            'LastModifiedTimestamp' => <DateTime>,
            'LastUpdateStatus' => '<string>',
            'NumberOfNodes' => <integer>,
            'NumberOfWorkers' => <integer>,
            'PrivateAddress' => '<string>',
            'PublicAddress' => '<string>',
            'PublicKey' => '<string>',
            'PublicKeys' => ['<string>', ...],
            'RoleArn' => '<string>',
            'SecurityConfiguration' => '<string>',
            'SecurityGroupIds' => ['<string>', ...],
            'Status' => '<string>',
            'SubnetId' => '<string>',
            'VpcId' => '<string>',
            'WorkerType' => 'Standard|G.1X|G.2X|G.025X|G.4X|G.8X|Z.2X',
            'YarnEndpointAddress' => '<string>',
            'ZeppelinRemoteSparkInterpreterPort' => <integer>,
        ],
        // ...
    ],
    'NextToken' => '<string>',
]

Result Details

Members
DevEndpoints
Type: Array of DevEndpoint structures

A list of DevEndpoint definitions.

NextToken
Type: string

A continuation token, if not all DevEndpoint definitions have yet been returned.

Errors

EntityNotFoundException:

A specified entity does not exist.

InternalServiceException:

An internal service error occurred.

OperationTimeoutException:

The operation timed out.

InvalidInputException:

The input provided was not valid.

GetJob

$result = $client->getJob([/* ... */]);
$promise = $client->getJobAsync([/* ... */]);

Retrieves an existing job definition.

Parameter Syntax

$result = $client->getJob([
    'JobName' => '<string>', // REQUIRED
]);

Parameter Details

Members
JobName
Required: Yes
Type: string

The name of the job definition to retrieve.

Result Syntax

[
    'Job' => [
        'AllocatedCapacity' => <integer>,
        'CodeGenConfigurationNodes' => [
            '<NodeId>' => [
                'Aggregate' => [
                    'Aggs' => [
                        [
                            'AggFunc' => 'avg|countDistinct|count|first|last|kurtosis|max|min|skewness|stddev_samp|stddev_pop|sum|sumDistinct|var_samp|var_pop',
                            'Column' => ['<string>', ...],
                        ],
                        // ...
                    ],
                    'Groups' => [
                        ['<string>', ...],
                        // ...
                    ],
                    'Inputs' => ['<string>', ...],
                    'Name' => '<string>',
                ],
                'AmazonRedshiftSource' => [
                    'Data' => [
                        'AccessType' => '<string>',
                        'Action' => '<string>',
                        'AdvancedOptions' => [
                            [
                                'Key' => '<string>',
                                'Value' => '<string>',
                            ],
                            // ...
                        ],
                        'CatalogDatabase' => [
                            'Description' => '<string>',
                            'Label' => '<string>',
                            'Value' => '<string>',
                        ],
                        'CatalogRedshiftSchema' => '<string>',
                        'CatalogRedshiftTable' => '<string>',
                        'CatalogTable' => [
                            'Description' => '<string>',
                            'Label' => '<string>',
                            'Value' => '<string>',
                        ],
                        'Connection' => [
                            'Description' => '<string>',
                            'Label' => '<string>',
                            'Value' => '<string>',
                        ],
                        'CrawlerConnection' => '<string>',
                        'IamRole' => [
                            'Description' => '<string>',
                            'Label' => '<string>',
                            'Value' => '<string>',
                        ],
                        'MergeAction' => '<string>',
                        'MergeClause' => '<string>',
                        'MergeWhenMatched' => '<string>',
                        'MergeWhenNotMatched' => '<string>',
                        'PostAction' => '<string>',
                        'PreAction' => '<string>',
                        'SampleQuery' => '<string>',
                        'Schema' => [
                            'Description' => '<string>',
                            'Label' => '<string>',
                            'Value' => '<string>',
                        ],
                        'SelectedColumns' => [
                            [
                                'Description' => '<string>',
                                'Label' => '<string>',
                                'Value' => '<string>',
                            ],
                            // ...
                        ],
                        'SourceType' => '<string>',
                        'StagingTable' => '<string>',
                        'Table' => [
                            'Description' => '<string>',
                            'Label' => '<string>',
                            'Value' => '<string>',
                        ],
                        'TablePrefix' => '<string>',
                        'TableSchema' => [
                            [
                                'Description' => '<string>',
                                'Label' => '<string>',
                                'Value' => '<string>',
                            ],
                            // ...
                        ],
                        'TempDir' => '<string>',
                        'Upsert' => true || false,
                    ],
                    'Name' => '<string>',
                ],
                'AmazonRedshiftTarget' => [
                    'Data' => [
                        'AccessType' => '<string>',
                        'Action' => '<string>',
                        'AdvancedOptions' => [
                            [
                                'Key' => '<string>',
                                'Value' => '<string>',
                            ],
                            // ...
                        ],
                        'CatalogDatabase' => [
                            'Description' => '<string>',
                            'Label' => '<string>',
                            'Value' => '<string>',
                        ],
                        'CatalogRedshiftSchema' => '<string>',
                        'CatalogRedshiftTable' => '<string>',
                        'CatalogTable' => [
                            'Description' => '<string>',
                            'Label' => '<string>',
                            'Value' => '<string>',
                        ],
                        'Connection' => [
                            'Description' => '<string>',
                            'Label' => '<string>',
                            'Value' => '<string>',
                        ],
                        'CrawlerConnection' => '<string>',
                        'IamRole' => [
                            'Description' => '<string>',
                            'Label' => '<string>',
                            'Value' => '<string>',
                        ],
                        'MergeAction' => '<string>',
                        'MergeClause' => '<string>',
                        'MergeWhenMatched' => '<string>',
                        'MergeWhenNotMatched' => '<string>',
                        'PostAction' => '<string>',
                        'PreAction' => '<string>',
                        'SampleQuery' => '<string>',
                        'Schema' => [
                            'Description' => '<string>',
                            'Label' => '<string>',
                            'Value' => '<string>',
                        ],
                        'SelectedColumns' => [
                            [
                                'Description' => '<string>',
                                'Label' => '<string>',
                                'Value' => '<string>',
                            ],
                            // ...
                        ],
                        'SourceType' => '<string>',
                        'StagingTable' => '<string>',
                        'Table' => [
                            'Description' => '<string>',
                            'Label' => '<string>',
                            'Value' => '<string>',
                        ],
                        'TablePrefix' => '<string>',
                        'TableSchema' => [
                            [
                                'Description' => '<string>',
                                'Label' => '<string>',
                                'Value' => '<string>',
                            ],
                            // ...
                        ],
                        'TempDir' => '<string>',
                        'Upsert' => true || false,
                    ],
                    'Inputs' => ['<string>', ...],
                    'Name' => '<string>',
                ],
                'ApplyMapping' => [
                    'Inputs' => ['<string>', ...],
                    'Mapping' => [
                        [
                            'Children' => [...], // RECURSIVE
                            'Dropped' => true || false,
                            'FromPath' => ['<string>', ...],
                            'FromType' => '<string>',
                            'ToKey' => '<string>',
                            'ToType' => '<string>',
                        ],
                        // ...
                    ],
                    'Name' => '<string>',
                ],
                'AthenaConnectorSource' => [
                    'ConnectionName' => '<string>',
                    'ConnectionTable' => '<string>',
                    'ConnectionType' => '<string>',
                    'ConnectorName' => '<string>',
                    'Name' => '<string>',
                    'OutputSchemas' => [
                        [
                            'Columns' => [
                                [
                                    'Name' => '<string>',
                                    'Type' => '<string>',
                                ],
                                // ...
                            ],
                        ],
                        // ...
                    ],
                    'SchemaName' => '<string>',
                ],
                'CatalogDeltaSource' => [
                    'AdditionalDeltaOptions' => ['<string>', ...],
                    'Database' => '<string>',
                    'Name' => '<string>',
                    'OutputSchemas' => [
                        [
                            'Columns' => [
                                [
                                    'Name' => '<string>',
                                    'Type' => '<string>',
                                ],
                                // ...
                            ],
                        ],
                        // ...
                    ],
                    'Table' => '<string>',
                ],
                'CatalogHudiSource' => [
                    'AdditionalHudiOptions' => ['<string>', ...],
                    'Database' => '<string>',
                    'Name' => '<string>',
                    'OutputSchemas' => [
                        [
                            'Columns' => [
                                [
                                    'Name' => '<string>',
                                    'Type' => '<string>',
                                ],
                                // ...
                            ],
                        ],
                        // ...
                    ],
                    'Table' => '<string>',
                ],
                'CatalogKafkaSource' => [
                    'DataPreviewOptions' => [
                        'PollingTime' => <integer>,
                        'RecordPollingLimit' => <integer>,
                    ],
                    'Database' => '<string>',
                    'DetectSchema' => true || false,
                    'Name' => '<string>',
                    'StreamingOptions' => [
                        'AddRecordTimestamp' => '<string>',
                        'Assign' => '<string>',
                        'BootstrapServers' => '<string>',
                        'Classification' => '<string>',
                        'ConnectionName' => '<string>',
                        'Delimiter' => '<string>',
                        'EmitConsumerLagMetrics' => '<string>',
                        'EndingOffsets' => '<string>',
                        'IncludeHeaders' => true || false,
                        'MaxOffsetsPerTrigger' => <integer>,
                        'MinPartitions' => <integer>,
                        'NumRetries' => <integer>,
                        'PollTimeoutMs' => <integer>,
                        'RetryIntervalMs' => <integer>,
                        'SecurityProtocol' => '<string>',
                        'StartingOffsets' => '<string>',
                        'StartingTimestamp' => <DateTime>,
                        'SubscribePattern' => '<string>',
                        'TopicName' => '<string>',
                    ],
                    'Table' => '<string>',
                    'WindowSize' => <integer>,
                ],
                'CatalogKinesisSource' => [
                    'DataPreviewOptions' => [
                        'PollingTime' => <integer>,
                        'RecordPollingLimit' => <integer>,
                    ],
                    'Database' => '<string>',
                    'DetectSchema' => true || false,
                    'Name' => '<string>',
                    'StreamingOptions' => [
                        'AddIdleTimeBetweenReads' => true || false,
                        'AddRecordTimestamp' => '<string>',
                        'AvoidEmptyBatches' => true || false,
                        'Classification' => '<string>',
                        'Delimiter' => '<string>',
                        'DescribeShardInterval' => <integer>,
                        'EmitConsumerLagMetrics' => '<string>',
                        'EndpointUrl' => '<string>',
                        'IdleTimeBetweenReadsInMs' => <integer>,
                        'MaxFetchRecordsPerShard' => <integer>,
                        'MaxFetchTimeInMs' => <integer>,
                        'MaxRecordPerRead' => <integer>,
                        'MaxRetryIntervalMs' => <integer>,
                        'NumRetries' => <integer>,
                        'RetryIntervalMs' => <integer>,
                        'RoleArn' => '<string>',
                        'RoleSessionName' => '<string>',
                        'StartingPosition' => 'latest|trim_horizon|earliest|timestamp',
                        'StartingTimestamp' => <DateTime>,
                        'StreamArn' => '<string>',
                        'StreamName' => '<string>',
                    ],
                    'Table' => '<string>',
                    'WindowSize' => <integer>,
                ],
                'CatalogSource' => [
                    'Database' => '<string>',
                    'Name' => '<string>',
                    'Table' => '<string>',
                ],
                'CatalogTarget' => [
                    'Database' => '<string>',
                    'Inputs' => ['<string>', ...],
                    'Name' => '<string>',
                    'Table' => '<string>',
                ],
                'ConnectorDataSource' => [
                    'ConnectionType' => '<string>',
                    'Data' => ['<string>', ...],
                    'Name' => '<string>',
                    'OutputSchemas' => [
                        [
                            'Columns' => [
                                [
                                    'Name' => '<string>',
                                    'Type' => '<string>',
                                ],
                                // ...
                            ],
                        ],
                        // ...
                    ],
                ],
                'ConnectorDataTarget' => [
                    'ConnectionType' => '<string>',
                    'Data' => ['<string>', ...],
                    'Inputs' => ['<string>', ...],
                    'Name' => '<string>',
                ],
                'CustomCode' => [
                    'ClassName' => '<string>',
                    'Code' => '<string>',
                    'Inputs' => ['<string>', ...],
                    'Name' => '<string>',
                    'OutputSchemas' => [
                        [
                            'Columns' => [
                                [
                                    'Name' => '<string>',
                                    'Type' => '<string>',
                                ],
                                // ...
                            ],
                        ],
                        // ...
                    ],
                ],
                'DirectJDBCSource' => [
                    'ConnectionName' => '<string>',
                    'ConnectionType' => 'sqlserver|mysql|oracle|postgresql|redshift',
                    'Database' => '<string>',
                    'Name' => '<string>',
                    'RedshiftTmpDir' => '<string>',
                    'Table' => '<string>',
                ],
                'DirectKafkaSource' => [
                    'DataPreviewOptions' => [
                        'PollingTime' => <integer>,
                        'RecordPollingLimit' => <integer>,
                    ],
                    'DetectSchema' => true || false,
                    'Name' => '<string>',
                    'StreamingOptions' => [
                        'AddRecordTimestamp' => '<string>',
                        'Assign' => '<string>',
                        'BootstrapServers' => '<string>',
                        'Classification' => '<string>',
                        'ConnectionName' => '<string>',
                        'Delimiter' => '<string>',
                        'EmitConsumerLagMetrics' => '<string>',
                        'EndingOffsets' => '<string>',
                        'IncludeHeaders' => true || false,
                        'MaxOffsetsPerTrigger' => <integer>,
                        'MinPartitions' => <integer>,
                        'NumRetries' => <integer>,
                        'PollTimeoutMs' => <integer>,
                        'RetryIntervalMs' => <integer>,
                        'SecurityProtocol' => '<string>',
                        'StartingOffsets' => '<string>',
                        'StartingTimestamp' => <DateTime>,
                        'SubscribePattern' => '<string>',
                        'TopicName' => '<string>',
                    ],
                    'WindowSize' => <integer>,
                ],
                'DirectKinesisSource' => [
                    'DataPreviewOptions' => [
                        'PollingTime' => <integer>,
                        'RecordPollingLimit' => <integer>,
                    ],
                    'DetectSchema' => true || false,
                    'Name' => '<string>',
                    'StreamingOptions' => [
                        'AddIdleTimeBetweenReads' => true || false,
                        'AddRecordTimestamp' => '<string>',
                        'AvoidEmptyBatches' => true || false,
                        'Classification' => '<string>',
                        'Delimiter' => '<string>',
                        'DescribeShardInterval' => <integer>,
                        'EmitConsumerLagMetrics' => '<string>',
                        'EndpointUrl' => '<string>',
                        'IdleTimeBetweenReadsInMs' => <integer>,
                        'MaxFetchRecordsPerShard' => <integer>,
                        'MaxFetchTimeInMs' => <integer>,
                        'MaxRecordPerRead' => <integer>,
                        'MaxRetryIntervalMs' => <integer>,
                        'NumRetries' => <integer>,
                        'RetryIntervalMs' => <integer>,
                        'RoleArn' => '<string>',
                        'RoleSessionName' => '<string>',
                        'StartingPosition' => 'latest|trim_horizon|earliest|timestamp',
                        'StartingTimestamp' => <DateTime>,
                        'StreamArn' => '<string>',
                        'StreamName' => '<string>',
                    ],
                    'WindowSize' => <integer>,
                ],
                'DropDuplicates' => [
                    'Columns' => [
                        ['<string>', ...],
                        // ...
                    ],
                    'Inputs' => ['<string>', ...],
                    'Name' => '<string>',
                ],
                'DropFields' => [
                    'Inputs' => ['<string>', ...],
                    'Name' => '<string>',
                    'Paths' => [
                        ['<string>', ...],
                        // ...
                    ],
                ],
                'DropNullFields' => [
                    'Inputs' => ['<string>', ...],
                    'Name' => '<string>',
                    'NullCheckBoxList' => [
                        'IsEmpty' => true || false,
                        'IsNegOne' => true || false,
                        'IsNullString' => true || false,
                    ],
                    'NullTextList' => [
                        [
                            'Datatype' => [
                                'Id' => '<string>',
                                'Label' => '<string>',
                            ],
                            'Value' => '<string>',
                        ],
                        // ...
                    ],
                ],
                'DynamicTransform' => [
                    'FunctionName' => '<string>',
                    'Inputs' => ['<string>', ...],
                    'Name' => '<string>',
                    'OutputSchemas' => [
                        [
                            'Columns' => [
                                [
                                    'Name' => '<string>',
                                    'Type' => '<string>',
                                ],
                                // ...
                            ],
                        ],
                        // ...
                    ],
                    'Parameters' => [
                        [
                            'IsOptional' => true || false,
                            'ListType' => 'str|int|float|complex|bool|list|null',
                            'Name' => '<string>',
                            'Type' => 'str|int|float|complex|bool|list|null',
                            'ValidationMessage' => '<string>',
                            'ValidationRule' => '<string>',
                            'Value' => ['<string>', ...],
                        ],
                        // ...
                    ],
                    'Path' => '<string>',
                    'TransformName' => '<string>',
                    'Version' => '<string>',
                ],
                'DynamoDBCatalogSource' => [
                    'Database' => '<string>',
                    'Name' => '<string>',
                    'Table' => '<string>',
                ],
                'EvaluateDataQuality' => [
                    'Inputs' => ['<string>', ...],
                    'Name' => '<string>',
                    'Output' => 'PrimaryInput|EvaluationResults',
                    'PublishingOptions' => [
                        'CloudWatchMetricsEnabled' => true || false,
                        'EvaluationContext' => '<string>',
                        'ResultsPublishingEnabled' => true || false,
                        'ResultsS3Prefix' => '<string>',
                    ],
                    'Ruleset' => '<string>',
                    'StopJobOnFailureOptions' => [
                        'StopJobOnFailureTiming' => 'Immediate|AfterDataLoad',
                    ],
                ],
                'EvaluateDataQualityMultiFrame' => [
                    'AdditionalDataSources' => ['<string>', ...],
                    'AdditionalOptions' => ['<string>', ...],
                    'Inputs' => ['<string>', ...],
                    'Name' => '<string>',
                    'PublishingOptions' => [
                        'CloudWatchMetricsEnabled' => true || false,
                        'EvaluationContext' => '<string>',
                        'ResultsPublishingEnabled' => true || false,
                        'ResultsS3Prefix' => '<string>',
                    ],
                    'Ruleset' => '<string>',
                    'StopJobOnFailureOptions' => [
                        'StopJobOnFailureTiming' => 'Immediate|AfterDataLoad',
                    ],
                ],
                'FillMissingValues' => [
                    'FilledPath' => '<string>',
                    'ImputedPath' => '<string>',
                    'Inputs' => ['<string>', ...],
                    'Name' => '<string>',
                ],
                'Filter' => [
                    'Filters' => [
                        [
                            'Negated' => true || false,
                            'Operation' => 'EQ|LT|GT|LTE|GTE|REGEX|ISNULL',
                            'Values' => [
                                [
                                    'Type' => 'COLUMNEXTRACTED|CONSTANT',
                                    'Value' => ['<string>', ...],
                                ],
                                // ...
                            ],
                        ],
                        // ...
                    ],
                    'Inputs' => ['<string>', ...],
                    'LogicalOperator' => 'AND|OR',
                    'Name' => '<string>',
                ],
                'GovernedCatalogSource' => [
                    'AdditionalOptions' => [
                        'BoundedFiles' => <integer>,
                        'BoundedSize' => <integer>,
                    ],
                    'Database' => '<string>',
                    'Name' => '<string>',
                    'PartitionPredicate' => '<string>',
                    'Table' => '<string>',
                ],
                'GovernedCatalogTarget' => [
                    'Database' => '<string>',
                    'Inputs' => ['<string>', ...],
                    'Name' => '<string>',
                    'PartitionKeys' => [
                        ['<string>', ...],
                        // ...
                    ],
                    'SchemaChangePolicy' => [
                        'EnableUpdateCatalog' => true || false,
                        'UpdateBehavior' => 'UPDATE_IN_DATABASE|LOG',
                    ],
                    'Table' => '<string>',
                ],
                'JDBCConnectorSource' => [
                    'AdditionalOptions' => [
                        'DataTypeMapping' => ['<string>', ...],
                        'FilterPredicate' => '<string>',
                        'JobBookmarkKeys' => ['<string>', ...],
                        'JobBookmarkKeysSortOrder' => '<string>',
                        'LowerBound' => <integer>,
                        'NumPartitions' => <integer>,
                        'PartitionColumn' => '<string>',
                        'UpperBound' => <integer>,
                    ],
                    'ConnectionName' => '<string>',
                    'ConnectionTable' => '<string>',
                    'ConnectionType' => '<string>',
                    'ConnectorName' => '<string>',
                    'Name' => '<string>',
                    'OutputSchemas' => [
                        [
                            'Columns' => [
                                [
                                    'Name' => '<string>',
                                    'Type' => '<string>',
                                ],
                                // ...
                            ],
                        ],
                        // ...
                    ],
                    'Query' => '<string>',
                ],
                'JDBCConnectorTarget' => [
                    'AdditionalOptions' => ['<string>', ...],
                    'ConnectionName' => '<string>',
                    'ConnectionTable' => '<string>',
                    'ConnectionType' => '<string>',
                    'ConnectorName' => '<string>',
                    'Inputs' => ['<string>', ...],
                    'Name' => '<string>',
                    'OutputSchemas' => [
                        [
                            'Columns' => [
                                [
                                    'Name' => '<string>',
                                    'Type' => '<string>',
                                ],
                                // ...
                            ],
                        ],
                        // ...
                    ],
                ],
                'Join' => [
                    'Columns' => [
                        [
                            'From' => '<string>',
                            'Keys' => [
                                ['<string>', ...],
                                // ...
                            ],
                        ],
                        // ...
                    ],
                    'Inputs' => ['<string>', ...],
                    'JoinType' => 'equijoin|left|right|outer|leftsemi|leftanti',
                    'Name' => '<string>',
                ],
                'Merge' => [
                    'Inputs' => ['<string>', ...],
                    'Name' => '<string>',
                    'PrimaryKeys' => [
                        ['<string>', ...],
                        // ...
                    ],
                    'Source' => '<string>',
                ],
                'MicrosoftSQLServerCatalogSource' => [
                    'Database' => '<string>',
                    'Name' => '<string>',
                    'Table' => '<string>',
                ],
                'MicrosoftSQLServerCatalogTarget' => [
                    'Database' => '<string>',
                    'Inputs' => ['<string>', ...],
                    'Name' => '<string>',
                    'Table' => '<string>',
                ],
                'MySQLCatalogSource' => [
                    'Database' => '<string>',
                    'Name' => '<string>',
                    'Table' => '<string>',
                ],
                'MySQLCatalogTarget' => [
                    'Database' => '<string>',
                    'Inputs' => ['<string>', ...],
                    'Name' => '<string>',
                    'Table' => '<string>',
                ],
                'OracleSQLCatalogSource' => [
                    'Database' => '<string>',
                    'Name' => '<string>',
                    'Table' => '<string>',
                ],
                'OracleSQLCatalogTarget' => [
                    'Database' => '<string>',
                    'Inputs' => ['<string>', ...],
                    'Name' => '<string>',
                    'Table' => '<string>',
                ],
                'PIIDetection' => [
                    'EntityTypesToDetect' => ['<string>', ...],
                    'Inputs' => ['<string>', ...],
                    'MaskValue' => '<string>',
                    'Name' => '<string>',
                    'OutputColumnName' => '<string>',
                    'PiiType' => 'RowAudit|RowMasking|ColumnAudit|ColumnMasking',
                    'SampleFraction' => <float>,
                    'ThresholdFraction' => <float>,
                ],
                'PostgreSQLCatalogSource' => [
                    'Database' => '<string>',
                    'Name' => '<string>',
                    'Table' => '<string>',
                ],
                'PostgreSQLCatalogTarget' => [
                    'Database' => '<string>',
                    'Inputs' => ['<string>', ...],
                    'Name' => '<string>',
                    'Table' => '<string>',
                ],
                'Recipe' => [
                    'Inputs' => ['<string>', ...],
                    'Name' => '<string>',
                    'RecipeReference' => [
                        'RecipeArn' => '<string>',
                        'RecipeVersion' => '<string>',
                    ],
                ],
                'RedshiftSource' => [
                    'Database' => '<string>',
                    'Name' => '<string>',
                    'RedshiftTmpDir' => '<string>',
                    'Table' => '<string>',
                    'TmpDirIAMRole' => '<string>',
                ],
                'RedshiftTarget' => [
                    'Database' => '<string>',
                    'Inputs' => ['<string>', ...],
                    'Name' => '<string>',
                    'RedshiftTmpDir' => '<string>',
                    'Table' => '<string>',
                    'TmpDirIAMRole' => '<string>',
                    'UpsertRedshiftOptions' => [
                        'ConnectionName' => '<string>',
                        'TableLocation' => '<string>',
                        'UpsertKeys' => ['<string>', ...],
                    ],
                ],
                'RelationalCatalogSource' => [
                    'Database' => '<string>',
                    'Name' => '<string>',
                    'Table' => '<string>',
                ],
                'RenameField' => [
                    'Inputs' => ['<string>', ...],
                    'Name' => '<string>',
                    'SourcePath' => ['<string>', ...],
                    'TargetPath' => ['<string>', ...],
                ],
                'S3CatalogDeltaSource' => [
                    'AdditionalDeltaOptions' => ['<string>', ...],
                    'Database' => '<string>',
                    'Name' => '<string>',
                    'OutputSchemas' => [
                        [
                            'Columns' => [
                                [
                                    'Name' => '<string>',
                                    'Type' => '<string>',
                                ],
                                // ...
                            ],
                        ],
                        // ...
                    ],
                    'Table' => '<string>',
                ],
                'S3CatalogHudiSource' => [
                    'AdditionalHudiOptions' => ['<string>', ...],
                    'Database' => '<string>',
                    'Name' => '<string>',
                    'OutputSchemas' => [
                        [
                            'Columns' => [
                                [
                                    'Name' => '<string>',
                                    'Type' => '<string>',
                                ],
                                // ...
                            ],
                        ],
                        // ...
                    ],
                    'Table' => '<string>',
                ],
                'S3CatalogSource' => [
                    'AdditionalOptions' => [
                        'BoundedFiles' => <integer>,
                        'BoundedSize' => <integer>,
                    ],
                    'Database' => '<string>',
                    'Name' => '<string>',
                    'PartitionPredicate' => '<string>',
                    'Table' => '<string>',
                ],
                'S3CatalogTarget' => [
                    'Database' => '<string>',
                    'Inputs' => ['<string>', ...],
                    'Name' => '<string>',
                    'PartitionKeys' => [
                        ['<string>', ...],
                        // ...
                    ],
                    'SchemaChangePolicy' => [
                        'EnableUpdateCatalog' => true || false,
                        'UpdateBehavior' => 'UPDATE_IN_DATABASE|LOG',
                    ],
                    'Table' => '<string>',
                ],
                'S3CsvSource' => [
                    'AdditionalOptions' => [
                        'BoundedFiles' => <integer>,
                        'BoundedSize' => <integer>,
                        'EnableSamplePath' => true || false,
                        'SamplePath' => '<string>',
                    ],
                    'CompressionType' => 'gzip|bzip2',
                    'Escaper' => '<string>',
                    'Exclusions' => ['<string>', ...],
                    'GroupFiles' => '<string>',
                    'GroupSize' => '<string>',
                    'MaxBand' => <integer>,
                    'MaxFilesInBand' => <integer>,
                    'Multiline' => true || false,
                    'Name' => '<string>',
                    'OptimizePerformance' => true || false,
                    'OutputSchemas' => [
                        [
                            'Columns' => [
                                [
                                    'Name' => '<string>',
                                    'Type' => '<string>',
                                ],
                                // ...
                            ],
                        ],
                        // ...
                    ],
                    'Paths' => ['<string>', ...],
                    'QuoteChar' => 'quote|quillemet|single_quote|disabled', // "quillemet" is the spelling the API expects
                    'Recurse' => true || false,
                    'Separator' => 'comma|ctrla|pipe|semicolon|tab',
                    'SkipFirst' => true || false,
                    'WithHeader' => true || false,
                    'WriteHeader' => true || false,
                ],
                'S3DeltaCatalogTarget' => [
                    'AdditionalOptions' => ['<string>', ...],
                    'Database' => '<string>',
                    'Inputs' => ['<string>', ...],
                    'Name' => '<string>',
                    'PartitionKeys' => [
                        ['<string>', ...],
                        // ...
                    ],
                    'SchemaChangePolicy' => [
                        'EnableUpdateCatalog' => true || false,
                        'UpdateBehavior' => 'UPDATE_IN_DATABASE|LOG',
                    ],
                    'Table' => '<string>',
                ],
                'S3DeltaDirectTarget' => [
                    'AdditionalOptions' => ['<string>', ...],
                    'Compression' => 'uncompressed|snappy',
                    'Format' => 'json|csv|avro|orc|parquet|hudi|delta',
                    'Inputs' => ['<string>', ...],
                    'Name' => '<string>',
                    'PartitionKeys' => [
                        ['<string>', ...],
                        // ...
                    ],
                    'Path' => '<string>',
                    'SchemaChangePolicy' => [
                        'Database' => '<string>',
                        'EnableUpdateCatalog' => true || false,
                        'Table' => '<string>',
                        'UpdateBehavior' => 'UPDATE_IN_DATABASE|LOG',
                    ],
                ],
                'S3DeltaSource' => [
                    'AdditionalDeltaOptions' => ['<string>', ...],
                    'AdditionalOptions' => [
                        'BoundedFiles' => <integer>,
                        'BoundedSize' => <integer>,
                        'EnableSamplePath' => true || false,
                        'SamplePath' => '<string>',
                    ],
                    'Name' => '<string>',
                    'OutputSchemas' => [
                        [
                            'Columns' => [
                                [
                                    'Name' => '<string>',
                                    'Type' => '<string>',
                                ],
                                // ...
                            ],
                        ],
                        // ...
                    ],
                    'Paths' => ['<string>', ...],
                ],
                'S3DirectTarget' => [
                    'Compression' => '<string>',
                    'Format' => 'json|csv|avro|orc|parquet|hudi|delta',
                    'Inputs' => ['<string>', ...],
                    'Name' => '<string>',
                    'PartitionKeys' => [
                        ['<string>', ...],
                        // ...
                    ],
                    'Path' => '<string>',
                    'SchemaChangePolicy' => [
                        'Database' => '<string>',
                        'EnableUpdateCatalog' => true || false,
                        'Table' => '<string>',
                        'UpdateBehavior' => 'UPDATE_IN_DATABASE|LOG',
                    ],
                ],
                'S3GlueParquetTarget' => [
                    'Compression' => 'snappy|lzo|gzip|uncompressed|none',
                    'Inputs' => ['<string>', ...],
                    'Name' => '<string>',
                    'PartitionKeys' => [
                        ['<string>', ...],
                        // ...
                    ],
                    'Path' => '<string>',
                    'SchemaChangePolicy' => [
                        'Database' => '<string>',
                        'EnableUpdateCatalog' => true || false,
                        'Table' => '<string>',
                        'UpdateBehavior' => 'UPDATE_IN_DATABASE|LOG',
                    ],
                ],
                'S3HudiCatalogTarget' => [
                    'AdditionalOptions' => ['<string>', ...],
                    'Database' => '<string>',
                    'Inputs' => ['<string>', ...],
                    'Name' => '<string>',
                    'PartitionKeys' => [
                        ['<string>', ...],
                        // ...
                    ],
                    'SchemaChangePolicy' => [
                        'EnableUpdateCatalog' => true || false,
                        'UpdateBehavior' => 'UPDATE_IN_DATABASE|LOG',
                    ],
                    'Table' => '<string>',
                ],
                'S3HudiDirectTarget' => [
                    'AdditionalOptions' => ['<string>', ...],
                    'Compression' => 'gzip|lzo|uncompressed|snappy',
                    'Format' => 'json|csv|avro|orc|parquet|hudi|delta',
                    'Inputs' => ['<string>', ...],
                    'Name' => '<string>',
                    'PartitionKeys' => [
                        ['<string>', ...],
                        // ...
                    ],
                    'Path' => '<string>',
                    'SchemaChangePolicy' => [
                        'Database' => '<string>',
                        'EnableUpdateCatalog' => true || false,
                        'Table' => '<string>',
                        'UpdateBehavior' => 'UPDATE_IN_DATABASE|LOG',
                    ],
                ],
                'S3HudiSource' => [
                    'AdditionalHudiOptions' => ['<string>', ...],
                    'AdditionalOptions' => [
                        'BoundedFiles' => <integer>,
                        'BoundedSize' => <integer>,
                        'EnableSamplePath' => true || false,
                        'SamplePath' => '<string>',
                    ],
                    'Name' => '<string>',
                    'OutputSchemas' => [
                        [
                            'Columns' => [
                                [
                                    'Name' => '<string>',
                                    'Type' => '<string>',
                                ],
                                // ...
                            ],
                        ],
                        // ...
                    ],
                    'Paths' => ['<string>', ...],
                ],
                'S3JsonSource' => [
                    'AdditionalOptions' => [
                        'BoundedFiles' => <integer>,
                        'BoundedSize' => <integer>,
                        'EnableSamplePath' => true || false,
                        'SamplePath' => '<string>',
                    ],
                    'CompressionType' => 'gzip|bzip2',
                    'Exclusions' => ['<string>', ...],
                    'GroupFiles' => '<string>',
                    'GroupSize' => '<string>',
                    'JsonPath' => '<string>',
                    'MaxBand' => <integer>,
                    'MaxFilesInBand' => <integer>,
                    'Multiline' => true || false,
                    'Name' => '<string>',
                    'OutputSchemas' => [
                        [
                            'Columns' => [
                                [
                                    'Name' => '<string>',
                                    'Type' => '<string>',
                                ],
                                // ...
                            ],
                        ],
                        // ...
                    ],
                    'Paths' => ['<string>', ...],
                    'Recurse' => true || false,
                ],
                'S3ParquetSource' => [
                    'AdditionalOptions' => [
                        'BoundedFiles' => <integer>,
                        'BoundedSize' => <integer>,
                        'EnableSamplePath' => true || false,
                        'SamplePath' => '<string>',
                    ],
                    'CompressionType' => 'snappy|lzo|gzip|uncompressed|none',
                    'Exclusions' => ['<string>', ...],
                    'GroupFiles' => '<string>',
                    'GroupSize' => '<string>',
                    'MaxBand' => <integer>,
                    'MaxFilesInBand' => <integer>,
                    'Name' => '<string>',
                    'OutputSchemas' => [
                        [
                            'Columns' => [
                                [
                                    'Name' => '<string>',
                                    'Type' => '<string>',
                                ],
                                // ...
                            ],
                        ],
                        // ...
                    ],
                    'Paths' => ['<string>', ...],
                    'Recurse' => true || false,
                ],
                'SelectFields' => [
                    'Inputs' => ['<string>', ...],
                    'Name' => '<string>',
                    'Paths' => [
                        ['<string>', ...],
                        // ...
                    ],
                ],
                'SelectFromCollection' => [
                    'Index' => <integer>,
                    'Inputs' => ['<string>', ...],
                    'Name' => '<string>',
                ],
                'SnowflakeSource' => [
                    'Data' => [
                        'Action' => '<string>',
                        'AdditionalOptions' => ['<string>', ...],
                        'AutoPushdown' => true || false,
                        'Connection' => [
                            'Description' => '<string>',
                            'Label' => '<string>',
                            'Value' => '<string>',
                        ],
                        'Database' => '<string>',
                        'IamRole' => [
                            'Description' => '<string>',
                            'Label' => '<string>',
                            'Value' => '<string>',
                        ],
                        'MergeAction' => '<string>',
                        'MergeClause' => '<string>',
                        'MergeWhenMatched' => '<string>',
                        'MergeWhenNotMatched' => '<string>',
                        'PostAction' => '<string>',
                        'PreAction' => '<string>',
                        'SampleQuery' => '<string>',
                        'Schema' => '<string>',
                        'SelectedColumns' => [
                            [
                                'Description' => '<string>',
                                'Label' => '<string>',
                                'Value' => '<string>',
                            ],
                            // ...
                        ],
                        'SourceType' => '<string>',
                        'StagingTable' => '<string>',
                        'Table' => '<string>',
                        'TableSchema' => [
                            [
                                'Description' => '<string>',
                                'Label' => '<string>',
                                'Value' => '<string>',
                            ],
                            // ...
                        ],
                        'TempDir' => '<string>',
                        'Upsert' => true || false,
                    ],
                    'Name' => '<string>',
                    'OutputSchemas' => [
                        [
                            'Columns' => [
                                [
                                    'Name' => '<string>',
                                    'Type' => '<string>',
                                ],
                                // ...
                            ],
                        ],
                        // ...
                    ],
                ],
                'SnowflakeTarget' => [
                    'Data' => [
                        'Action' => '<string>',
                        'AdditionalOptions' => ['<string>', ...],
                        'AutoPushdown' => true || false,
                        'Connection' => [
                            'Description' => '<string>',
                            'Label' => '<string>',
                            'Value' => '<string>',
                        ],
                        'Database' => '<string>',
                        'IamRole' => [
                            'Description' => '<string>',
                            'Label' => '<string>',
                            'Value' => '<string>',
                        ],
                        'MergeAction' => '<string>',
                        'MergeClause' => '<string>',
                        'MergeWhenMatched' => '<string>',
                        'MergeWhenNotMatched' => '<string>',
                        'PostAction' => '<string>',
                        'PreAction' => '<string>',
                        'SampleQuery' => '<string>',
                        'Schema' => '<string>',
                        'SelectedColumns' => [
                            [
                                'Description' => '<string>',
                                'Label' => '<string>',
                                'Value' => '<string>',
                            ],
                            // ...
                        ],
                        'SourceType' => '<string>',
                        'StagingTable' => '<string>',
                        'Table' => '<string>',
                        'TableSchema' => [
                            [
                                'Description' => '<string>',
                                'Label' => '<string>',
                                'Value' => '<string>',
                            ],
                            // ...
                        ],
                        'TempDir' => '<string>',
                        'Upsert' => true || false,
                    ],
                    'Inputs' => ['<string>', ...],
                    'Name' => '<string>',
                ],
                'SparkConnectorSource' => [
                    'AdditionalOptions' => ['<string>', ...],
                    'ConnectionName' => '<string>',
                    'ConnectionType' => '<string>',
                    'ConnectorName' => '<string>',
                    'Name' => '<string>',
                    'OutputSchemas' => [
                        [
                            'Columns' => [
                                [
                                    'Name' => '<string>',
                                    'Type' => '<string>',
                                ],
                                // ...
                            ],
                        ],
                        // ...
                    ],
                ],
                'SparkConnectorTarget' => [
                    'AdditionalOptions' => ['<string>', ...],
                    'ConnectionName' => '<string>',
                    'ConnectionType' => '<string>',
                    'ConnectorName' => '<string>',
                    'Inputs' => ['<string>', ...],
                    'Name' => '<string>',
                    'OutputSchemas' => [
                        [
                            'Columns' => [
                                [
                                    'Name' => '<string>',
                                    'Type' => '<string>',
                                ],
                                // ...
                            ],
                        ],
                        // ...
                    ],
                ],
                'SparkSQL' => [
                    'Inputs' => ['<string>', ...],
                    'Name' => '<string>',
                    'OutputSchemas' => [
                        [
                            'Columns' => [
                                [
                                    'Name' => '<string>',
                                    'Type' => '<string>',
                                ],
                                // ...
                            ],
                        ],
                        // ...
                    ],
                    'SqlAliases' => [
                        [
                            'Alias' => '<string>',
                            'From' => '<string>',
                        ],
                        // ...
                    ],
                    'SqlQuery' => '<string>',
                ],
                'Spigot' => [
                    'Inputs' => ['<string>', ...],
                    'Name' => '<string>',
                    'Path' => '<string>',
                    'Prob' => <float>,
                    'Topk' => <integer>,
                ],
                'SplitFields' => [
                    'Inputs' => ['<string>', ...],
                    'Name' => '<string>',
                    'Paths' => [
                        ['<string>', ...],
                        // ...
                    ],
                ],
                'Union' => [
                    'Inputs' => ['<string>', ...],
                    'Name' => '<string>',
                    'UnionType' => 'ALL|DISTINCT',
                ],
            ],
            // ...
        ],
        'Command' => [
            'Name' => '<string>',
            'PythonVersion' => '<string>',
            'Runtime' => '<string>',
            'ScriptLocation' => '<string>',
        ],
        'Connections' => [
            'Connections' => ['<string>', ...],
        ],
        'CreatedOn' => <DateTime>,
        'DefaultArguments' => ['<string>', ...],
        'Description' => '<string>',
        'ExecutionClass' => 'FLEX|STANDARD',
        'ExecutionProperty' => [
            'MaxConcurrentRuns' => <integer>,
        ],
        'GlueVersion' => '<string>',
        'JobMode' => 'SCRIPT|VISUAL|NOTEBOOK',
        'LastModifiedOn' => <DateTime>,
        'LogUri' => '<string>',
        'MaintenanceWindow' => '<string>',
        'MaxCapacity' => <float>,
        'MaxRetries' => <integer>,
        'Name' => '<string>',
        'NonOverridableArguments' => ['<string>', ...],
        'NotificationProperty' => [
            'NotifyDelayAfter' => <integer>,
        ],
        'NumberOfWorkers' => <integer>,
        'Role' => '<string>',
        'SecurityConfiguration' => '<string>',
        'SourceControlDetails' => [
            'AuthStrategy' => 'PERSONAL_ACCESS_TOKEN|AWS_SECRETS_MANAGER',
            'AuthToken' => '<string>',
            'Branch' => '<string>',
            'Folder' => '<string>',
            'LastCommitId' => '<string>',
            'Owner' => '<string>',
            'Provider' => 'GITHUB|GITLAB|BITBUCKET|AWS_CODE_COMMIT',
            'Repository' => '<string>',
        ],
        'Timeout' => <integer>,
        'WorkerType' => 'Standard|G.1X|G.2X|G.025X|G.4X|G.8X|Z.2X',
    ],
]

Result Details

Members
Job
Type: Job structure

The requested job definition.

Errors

InvalidInputException:

The input provided was not valid.

EntityNotFoundException:

A specified entity does not exist.

InternalServiceException:

An internal service error occurred.

OperationTimeoutException:

The operation timed out.
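
As a rough sketch of how the result above might be consumed, the helper below picks a few members out of the `Job` structure. `summarizeJob` is a hypothetical name, not part of the SDK, and `'my-job'` in the usage comment is a placeholder. Members that are missing from the response are read with `??`, so they come back as `null`.

```php
<?php
// Hypothetical helper, not part of the SDK: pick a few members out of a
// GetJob result. Accepts a plain array or the SDK's array-accessible result.
function summarizeJob($result): array
{
    $job = $result['Job'] ?? [];

    return [
        'name'       => $job['Name'] ?? null,
        'script'     => $job['Command']['ScriptLocation'] ?? null,
        'workerType' => $job['WorkerType'] ?? null,
        'workers'    => $job['NumberOfWorkers'] ?? null,
        // Timeout is an integer; its unit is not stated in this section.
        'timeout'    => $job['Timeout'] ?? null,
    ];
}

// Usage with a real client (assumes $client is an Aws\Glue\GlueClient):
// $summary = summarizeJob($client->getJob(['JobName' => 'my-job']));
```

Taking the result as a parameter, rather than calling the client inside the helper, keeps it usable with canned arrays in unit tests.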

GetJobBookmark

$result = $client->getJobBookmark([/* ... */]);
$promise = $client->getJobBookmarkAsync([/* ... */]);

Returns information on a job bookmark entry.

For more information about enabling and using job bookmarks, see the job bookmark documentation in the AWS Glue Developer Guide.

Parameter Syntax

$result = $client->getJobBookmark([
    'JobName' => '<string>', // REQUIRED
    'RunId' => '<string>',
]);

Parameter Details

Members
JobName
Required: Yes
Type: string

The name of the job in question.

RunId
Type: string

The unique run identifier associated with this job run.

Result Syntax

[
    'JobBookmarkEntry' => [
        'Attempt' => <integer>,
        'JobBookmark' => '<string>',
        'JobName' => '<string>',
        'PreviousRunId' => '<string>',
        'Run' => <integer>,
        'RunId' => '<string>',
        'Version' => <integer>,
    ],
]

Result Details

Members
JobBookmarkEntry
Type: JobBookmarkEntry structure

A structure that defines a point at which a job can resume processing.

Errors

EntityNotFoundException:

A specified entity does not exist.

InvalidInputException:

The input provided was not valid.

InternalServiceException:

An internal service error occurred.

OperationTimeoutException:

The operation timed out.

ValidationException:

A value could not be validated.
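
One way to call this operation and treat a missing bookmark as "no entry" rather than a failure is sketched below. `findBookmark` is a hypothetical helper, not an SDK function. It takes the request as a callable so it can be exercised without a live client, and it assumes the thrown exception exposes `getAwsErrorCode()`, as the SDK's `AwsException` does.

```php
<?php
// Hypothetical helper: fetch a bookmark entry, returning null when the
// service reports EntityNotFoundException. $call is any callable that
// performs the request, e.g. fn(array $p) => $client->getJobBookmark($p).
function findBookmark(callable $call, string $jobName, ?string $runId = null): ?array
{
    $params = ['JobName' => $jobName];
    if ($runId !== null) {
        $params['RunId'] = $runId; // optional member
    }

    try {
        $result = $call($params);
    } catch (\Exception $e) {
        // Only a "not found" error code is swallowed; everything else is rethrown.
        if (method_exists($e, 'getAwsErrorCode')
            && $e->getAwsErrorCode() === 'EntityNotFoundException') {
            return null;
        }
        throw $e;
    }

    return $result['JobBookmarkEntry'] ?? null;
}

// Usage with a real client (assumes $client is an Aws\Glue\GlueClient):
// $entry = findBookmark(fn(array $p) => $client->getJobBookmark($p), 'my-job');
```

The other listed errors (for example InvalidInputException) are deliberately left to propagate, since they usually indicate a problem with the request itself.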

GetJobRun

$result = $client->getJobRun([/* ... */]);
$promise = $client->getJobRunAsync([/* ... */]);

Retrieves the metadata for a given job run.

Parameter Syntax

$result = $client->getJobRun([
    'JobName' => '<string>', // REQUIRED
    'PredecessorsIncluded' => true || false,
    'RunId' => '<string>', // REQUIRED
]);

Parameter Details

Members
JobName
Required: Yes
Type: string

Name of the job definition being run.

PredecessorsIncluded
Type: boolean

True if a list of predecessor runs should be returned.

RunId
Required: Yes
Type: string

The ID of the job run.

Result Syntax

[
    'JobRun' => [
        'AllocatedCapacity' => <integer>,
        'Arguments' => ['<string>', ...],
        'Attempt' => <integer>,
        'CompletedOn' => <DateTime>,
        'DPUSeconds' => <float>,
        'ErrorMessage' => '<string>',
        'ExecutionClass' => 'FLEX|STANDARD',
        'ExecutionTime' => <integer>,
        'GlueVersion' => '<string>',
        'Id' => '<string>',
        'JobMode' => 'SCRIPT|VISUAL|NOTEBOOK',
        'JobName' => '<string>',
        'JobRunState' => 'STARTING|RUNNING|STOPPING|STOPPED|SUCCEEDED|FAILED|TIMEOUT|ERROR|WAITING|EXPIRED',
        'LastModifiedOn' => <DateTime>,
        'LogGroupName' => '<string>',
        'MaintenanceWindow' => '<string>',
        'MaxCapacity' => <float>,
        'NotificationProperty' => [
            'NotifyDelayAfter' => <integer>,
        ],
        'NumberOfWorkers' => <integer>,
        'PredecessorRuns' => [
            [
                'JobName' => '<string>',
                'RunId' => '<string>',
            ],
            // ...
        ],
        'PreviousRunId' => '<string>',
        'SecurityConfiguration' => '<string>',
        'StartedOn' => <DateTime>,
        'Timeout' => <integer>,
        'TriggerName' => '<string>',
        'WorkerType' => 'Standard|G.1X|G.2X|G.025X|G.4X|G.8X|Z.2X',
    ],
]

Result Details

Members
JobRun
Type: JobRun structure

The requested job-run metadata.

Errors

InvalidInputException:

The input provided was not valid.

EntityNotFoundException:

A specified entity does not exist.

InternalServiceException:

An internal service error occurred.

OperationTimeoutException:

The operation timed out.
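
A common use of this operation is polling a run until it finishes. The sketch below is one possible approach, not an SDK feature: `waitForJobRun` and `isInProgress` are hypothetical names, and treating STARTING, RUNNING, STOPPING and WAITING as "still in progress" is an assumption based on the state names in the result syntax above. The poll count and delay defaults are arbitrary.

```php
<?php
// Assumption: these four JobRunState values mean the run has not finished;
// every other value is treated as final.
function isInProgress(string $state): bool
{
    return in_array($state, ['STARTING', 'RUNNING', 'STOPPING', 'WAITING'], true);
}

// Hypothetical helper: poll GetJobRun until the run leaves the in-progress
// states or $maxPolls requests have been made. Returns the last JobRun seen.
// $getJobRun is any callable that performs the request,
// e.g. fn(array $p) => $client->getJobRun($p).
function waitForJobRun(
    callable $getJobRun,
    string $jobName,
    string $runId,
    int $maxPolls = 60,
    int $sleepSeconds = 10
): array {
    $run = [];
    for ($i = 0; $i < $maxPolls; $i++) {
        $result = $getJobRun(['JobName' => $jobName, 'RunId' => $runId]);
        $run = $result['JobRun'];
        if (!isInProgress($run['JobRunState'])) {
            break;
        }
        // Pause between polls, but not after the final attempt.
        if ($sleepSeconds > 0 && $i < $maxPolls - 1) {
            sleep($sleepSeconds);
        }
    }
    return $run;
}
```

If the returned run is still in an in-progress state, the poll limit was reached first; callers may want to check for that case explicitly.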

GetJobRuns

$result = $client->getJobRuns([/* ... */]);
$promise = $client->getJobRunsAsync([/* ... */]);

Retrieves metadata for all runs of a given job definition.

Parameter Syntax

$result = $client->getJobRuns([
    'JobName' => '<string>', // REQUIRED
    'MaxResults' => <integer>,
    'NextToken' => '<string>',
]);

Parameter Details

Members
JobName
Required: Yes
Type: string

The name of the job definition for which to retrieve all job runs.

MaxResults
Type: int

The maximum number of job runs to return in a single response.

NextToken
Type: string

A continuation token, if this is a continuation call.

Result Syntax

[
    'JobRuns' => [
        [
            'AllocatedCapacity' => <integer>,
            'Arguments' => ['<string>', ...],
            'Attempt' => <integer>,
            'CompletedOn' => <DateTime>,
            'DPUSeconds' => <float>,
            'ErrorMessage' => '<string>',
            'ExecutionClass' => 'FLEX|STANDARD',
            'ExecutionTime' => <integer>,
            'GlueVersion' => '<string>',
            'Id' => '<string>',
            'JobMode' => 'SCRIPT|VISUAL|NOTEBOOK',
            'JobName' => '<string>',
            'JobRunState' => 'STARTING|RUNNING|STOPPING|STOPPED|SUCCEEDED|FAILED|TIMEOUT|ERROR|WAITING|EXPIRED',
            'LastModifiedOn' => <DateTime>,
            'LogGroupName' => '<string>',
            'MaintenanceWindow' => '<string>',
            'MaxCapacity' => <float>,
            'NotificationProperty' => [
                'NotifyDelayAfter' => <integer>,
            ],
            'NumberOfWorkers' => <integer>,
            'PredecessorRuns' => [
                [
                    'JobName' => '<string>',
                    'RunId' => '<string>',
                ],
                // ...
            ],
            'PreviousRunId' => '<string>',
            'SecurityConfiguration' => '<string>',
            'StartedOn' => <DateTime>,
            'Timeout' => <integer>,
            'TriggerName' => '<string>',
            'WorkerType' => 'Standard|G.1X|G.2X|G.025X|G.4X|G.8X|Z.2X',
        ],
        // ...
    ],
    'NextToken' => '<string>',
]

Result Details

Members
JobRuns
Type: Array of JobRun structures

A list of job-run metadata objects.

NextToken
Type: string

A continuation token, if not all requested job runs have been returned.

Errors

InvalidInputException:

The input provided was not valid.

EntityNotFoundException:

A specified entity does not exist.

InternalServiceException:

An internal service error occurred.

OperationTimeoutException:

The operation timed out.
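
Because results are paged, retrieving every run means passing each response's `NextToken` back into the next request. A minimal sketch of that loop follows. `allJobRuns` is a hypothetical helper, and the page size of 50 is an arbitrary choice rather than a documented default.

```php
<?php
// Hypothetical helper: yield every JobRun of a job by following NextToken.
// $getJobRuns is any callable that performs the request,
// e.g. fn(array $p) => $client->getJobRuns($p).
function allJobRuns(callable $getJobRuns, string $jobName, int $pageSize = 50): \Generator
{
    $params = ['JobName' => $jobName, 'MaxResults' => $pageSize];

    while (true) {
        $page = $getJobRuns($params);

        foreach ($page['JobRuns'] ?? [] as $run) {
            yield $run;
        }

        // A missing or empty token means the last page has been read.
        if (empty($page['NextToken'])) {
            return;
        }
        $params['NextToken'] = $page['NextToken'];
    }
}

// Usage with a real client (assumes $client is an Aws\Glue\GlueClient):
// foreach (allJobRuns(fn(array $p) => $client->getJobRuns($p), 'my-job') as $run) {
//     echo $run['Id'], ' ', $run['JobRunState'], PHP_EOL;
// }
```

A generator keeps only one page in memory at a time, which matters for jobs with a long run history.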

GetJobs

$result = $client->getJobs([/* ... */]);
$promise = $client->getJobsAsync([/* ... */]);

Retrieves all current job definitions.

Parameter Syntax

$result = $client->getJobs([
    'MaxResults' => <integer>,
    'NextToken' => '<string>',
]);

Parameter Details

Members
MaxResults
Type: int

The maximum number of job definitions to return in a single response.

NextToken
Type: string

A continuation token, if this is a continuation call.
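
The two parameters above work together: `MaxResults` sets the page size and `NextToken` requests the following page. The sketch below collects job names across pages. `listJobNames` is a hypothetical helper, and it assumes the response carries a `NextToken` member alongside `Jobs`, in the same way as GetJobRuns. The page size and page cap are arbitrary.

```php
<?php
// Hypothetical helper: collect job names page by page. $getJobs is any
// callable that performs the request, e.g. fn(array $p) => $client->getJobs($p).
function listJobNames(callable $getJobs, int $pageSize = 25, int $maxPages = 100): array
{
    $names = [];
    $params = ['MaxResults' => $pageSize];

    // $maxPages is a safety cap so a misbehaving token cannot loop forever.
    for ($page = 0; $page < $maxPages; $page++) {
        $result = $getJobs($params);

        foreach ($result['Jobs'] ?? [] as $job) {
            $names[] = $job['Name'];
        }

        if (empty($result['NextToken'])) {
            break; // no further pages
        }
        $params['NextToken'] = $result['NextToken'];
    }

    return $names;
}
```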

Result Syntax

[
    'Jobs' => [
        [
            'AllocatedCapacity' => <integer>,
            'CodeGenConfigurationNodes' => [
                '<NodeId>' => [
                    'Aggregate' => [
                        'Aggs' => [
                            [
                                'AggFunc' => 'avg|countDistinct|count|first|last|kurtosis|max|min|skewness|stddev_samp|stddev_pop|sum|sumDistinct|var_samp|var_pop',
                                'Column' => ['<string>', ...],
                            ],
                            // ...
                        ],
                        'Groups' => [
                            ['<string>', ...],
                            // ...
                        ],
                        'Inputs' => ['<string>', ...],
                        'Name' => '<string>',
                    ],
                    'AmazonRedshiftSource' => [
                        'Data' => [
                            'AccessType' => '<string>',
                            'Action' => '<string>',
                            'AdvancedOptions' => [
                                [
                                    'Key' => '<string>',
                                    'Value' => '<string>',
                                ],
                                // ...
                            ],
                            'CatalogDatabase' => [
                                'Description' => '<string>',
                                'Label' => '<string>',
                                'Value' => '<string>',
                            ],
                            'CatalogRedshiftSchema' => '<string>',
                            'CatalogRedshiftTable' => '<string>',
                            'CatalogTable' => [
                                'Description' => '<string>',
                                'Label' => '<string>',
                                'Value' => '<string>',
                            ],
                            'Connection' => [
                                'Description' => '<string>',
                                'Label' => '<string>',
                                'Value' => '<string>',
                            ],
                            'CrawlerConnection' => '<string>',
                            'IamRole' => [
                                'Description' => '<string>',
                                'Label' => '<string>',
                                'Value' => '<string>',
                            ],
                            'MergeAction' => '<string>',
                            'MergeClause' => '<string>',
                            'MergeWhenMatched' => '<string>',
                            'MergeWhenNotMatched' => '<string>',
                            'PostAction' => '<string>',
                            'PreAction' => '<string>',
                            'SampleQuery' => '<string>',
                            'Schema' => [
                                'Description' => '<string>',
                                'Label' => '<string>',
                                'Value' => '<string>',
                            ],
                            'SelectedColumns' => [
                                [
                                    'Description' => '<string>',
                                    'Label' => '<string>',
                                    'Value' => '<string>',
                                ],
                                // ...
                            ],
                            'SourceType' => '<string>',
                            'StagingTable' => '<string>',
                            'Table' => [
                                'Description' => '<string>',
                                'Label' => '<string>',
                                'Value' => '<string>',
                            ],
                            'TablePrefix' => '<string>',
                            'TableSchema' => [
                                [
                                    'Description' => '<string>',
                                    'Label' => '<string>',
                                    'Value' => '<string>',
                                ],
                                // ...
                            ],
                            'TempDir' => '<string>',
                            'Upsert' => true || false,
                        ],
                        'Name' => '<string>',
                    ],
                    'AmazonRedshiftTarget' => [
                        'Data' => [
                            'AccessType' => '<string>',
                            'Action' => '<string>',
                            'AdvancedOptions' => [
                                [
                                    'Key' => '<string>',
                                    'Value' => '<string>',
                                ],
                                // ...
                            ],
                            'CatalogDatabase' => [
                                'Description' => '<string>',
                                'Label' => '<string>',
                                'Value' => '<string>',
                            ],
                            'CatalogRedshiftSchema' => '<string>',
                            'CatalogRedshiftTable' => '<string>',
                            'CatalogTable' => [
                                'Description' => '<string>',
                                'Label' => '<string>',
                                'Value' => '<string>',
                            ],
                            'Connection' => [
                                'Description' => '<string>',
                                'Label' => '<string>',
                                'Value' => '<string>',
                            ],
                            'CrawlerConnection' => '<string>',
                            'IamRole' => [
                                'Description' => '<string>',
                                'Label' => '<string>',
                                'Value' => '<string>',
                            ],
                            'MergeAction' => '<string>',
                            'MergeClause' => '<string>',
                            'MergeWhenMatched' => '<string>',
                            'MergeWhenNotMatched' => '<string>',
                            'PostAction' => '<string>',
                            'PreAction' => '<string>',
                            'SampleQuery' => '<string>',
                            'Schema' => [
                                'Description' => '<string>',
                                'Label' => '<string>',
                                'Value' => '<string>',
                            ],
                            'SelectedColumns' => [
                                [
                                    'Description' => '<string>',
                                    'Label' => '<string>',
                                    'Value' => '<string>',
                                ],
                                // ...
                            ],
                            'SourceType' => '<string>',
                            'StagingTable' => '<string>',
                            'Table' => [
                                'Description' => '<string>',
                                'Label' => '<string>',
                                'Value' => '<string>',
                            ],
                            'TablePrefix' => '<string>',
                            'TableSchema' => [
                                [
                                    'Description' => '<string>',
                                    'Label' => '<string>',
                                    'Value' => '<string>',
                                ],
                                // ...
                            ],
                            'TempDir' => '<string>',
                            'Upsert' => true || false,
                        ],
                        'Inputs' => ['<string>', ...],
                        'Name' => '<string>',
                    ],
                    'ApplyMapping' => [
                        'Inputs' => ['<string>', ...],
                        'Mapping' => [
                            [
                                'Children' => [...], // RECURSIVE
                                'Dropped' => true || false,
                                'FromPath' => ['<string>', ...],
                                'FromType' => '<string>',
                                'ToKey' => '<string>',
                                'ToType' => '<string>',
                            ],
                            // ...
                        ],
                        'Name' => '<string>',
                    ],
                    'AthenaConnectorSource' => [
                        'ConnectionName' => '<string>',
                        'ConnectionTable' => '<string>',
                        'ConnectionType' => '<string>',
                        'ConnectorName' => '<string>',
                        'Name' => '<string>',
                        'OutputSchemas' => [
                            [
                                'Columns' => [
                                    [
                                        'Name' => '<string>',
                                        'Type' => '<string>',
                                    ],
                                    // ...
                                ],
                            ],
                            // ...
                        ],
                        'SchemaName' => '<string>',
                    ],
                    'CatalogDeltaSource' => [
                        'AdditionalDeltaOptions' => ['<string>', ...],
                        'Database' => '<string>',
                        'Name' => '<string>',
                        'OutputSchemas' => [
                            [
                                'Columns' => [
                                    [
                                        'Name' => '<string>',
                                        'Type' => '<string>',
                                    ],
                                    // ...
                                ],
                            ],
                            // ...
                        ],
                        'Table' => '<string>',
                    ],
                    'CatalogHudiSource' => [
                        'AdditionalHudiOptions' => ['<string>', ...],
                        'Database' => '<string>',
                        'Name' => '<string>',
                        'OutputSchemas' => [
                            [
                                'Columns' => [
                                    [
                                        'Name' => '<string>',
                                        'Type' => '<string>',
                                    ],
                                    // ...
                                ],
                            ],
                            // ...
                        ],
                        'Table' => '<string>',
                    ],
                    'CatalogKafkaSource' => [
                        'DataPreviewOptions' => [
                            'PollingTime' => <integer>,
                            'RecordPollingLimit' => <integer>,
                        ],
                        'Database' => '<string>',
                        'DetectSchema' => true || false,
                        'Name' => '<string>',
                        'StreamingOptions' => [
                            'AddRecordTimestamp' => '<string>',
                            'Assign' => '<string>',
                            'BootstrapServers' => '<string>',
                            'Classification' => '<string>',
                            'ConnectionName' => '<string>',
                            'Delimiter' => '<string>',
                            'EmitConsumerLagMetrics' => '<string>',
                            'EndingOffsets' => '<string>',
                            'IncludeHeaders' => true || false,
                            'MaxOffsetsPerTrigger' => <integer>,
                            'MinPartitions' => <integer>,
                            'NumRetries' => <integer>,
                            'PollTimeoutMs' => <integer>,
                            'RetryIntervalMs' => <integer>,
                            'SecurityProtocol' => '<string>',
                            'StartingOffsets' => '<string>',
                            'StartingTimestamp' => <DateTime>,
                            'SubscribePattern' => '<string>',
                            'TopicName' => '<string>',
                        ],
                        'Table' => '<string>',
                        'WindowSize' => <integer>,
                    ],
                    'CatalogKinesisSource' => [
                        'DataPreviewOptions' => [
                            'PollingTime' => <integer>,
                            'RecordPollingLimit' => <integer>,
                        ],
                        'Database' => '<string>',
                        'DetectSchema' => true || false,
                        'Name' => '<string>',
                        'StreamingOptions' => [
                            'AddIdleTimeBetweenReads' => true || false,
                            'AddRecordTimestamp' => '<string>',
                            'AvoidEmptyBatches' => true || false,
                            'Classification' => '<string>',
                            'Delimiter' => '<string>',
                            'DescribeShardInterval' => <integer>,
                            'EmitConsumerLagMetrics' => '<string>',
                            'EndpointUrl' => '<string>',
                            'IdleTimeBetweenReadsInMs' => <integer>,
                            'MaxFetchRecordsPerShard' => <integer>,
                            'MaxFetchTimeInMs' => <integer>,
                            'MaxRecordPerRead' => <integer>,
                            'MaxRetryIntervalMs' => <integer>,
                            'NumRetries' => <integer>,
                            'RetryIntervalMs' => <integer>,
                            'RoleArn' => '<string>',
                            'RoleSessionName' => '<string>',
                            'StartingPosition' => 'latest|trim_horizon|earliest|timestamp',
                            'StartingTimestamp' => <DateTime>,
                            'StreamArn' => '<string>',
                            'StreamName' => '<string>',
                        ],
                        'Table' => '<string>',
                        'WindowSize' => <integer>,
                    ],
                    'CatalogSource' => [
                        'Database' => '<string>',
                        'Name' => '<string>',
                        'Table' => '<string>',
                    ],
                    'CatalogTarget' => [
                        'Database' => '<string>',
                        'Inputs' => ['<string>', ...],
                        'Name' => '<string>',
                        'Table' => '<string>',
                    ],
                    'ConnectorDataSource' => [
                        'ConnectionType' => '<string>',
                        'Data' => ['<string>', ...],
                        'Name' => '<string>',
                        'OutputSchemas' => [
                            [
                                'Columns' => [
                                    [
                                        'Name' => '<string>',
                                        'Type' => '<string>',
                                    ],
                                    // ...
                                ],
                            ],
                            // ...
                        ],
                    ],
                    'ConnectorDataTarget' => [
                        'ConnectionType' => '<string>',
                        'Data' => ['<string>', ...],
                        'Inputs' => ['<string>', ...],
                        'Name' => '<string>',
                    ],
                    'CustomCode' => [
                        'ClassName' => '<string>',
                        'Code' => '<string>',
                        'Inputs' => ['<string>', ...],
                        'Name' => '<string>',
                        'OutputSchemas' => [
                            [
                                'Columns' => [
                                    [
                                        'Name' => '<string>',
                                        'Type' => '<string>',
                                    ],
                                    // ...
                                ],
                            ],
                            // ...
                        ],
                    ],
                    'DirectJDBCSource' => [
                        'ConnectionName' => '<string>',
                        'ConnectionType' => 'sqlserver|mysql|oracle|postgresql|redshift',
                        'Database' => '<string>',
                        'Name' => '<string>',
                        'RedshiftTmpDir' => '<string>',
                        'Table' => '<string>',
                    ],
                    'DirectKafkaSource' => [
                        'DataPreviewOptions' => [
                            'PollingTime' => <integer>,
                            'RecordPollingLimit' => <integer>,
                        ],
                        'DetectSchema' => true || false,
                        'Name' => '<string>',
                        'StreamingOptions' => [
                            'AddRecordTimestamp' => '<string>',
                            'Assign' => '<string>',
                            'BootstrapServers' => '<string>',
                            'Classification' => '<string>',
                            'ConnectionName' => '<string>',
                            'Delimiter' => '<string>',
                            'EmitConsumerLagMetrics' => '<string>',
                            'EndingOffsets' => '<string>',
                            'IncludeHeaders' => true || false,
                            'MaxOffsetsPerTrigger' => <integer>,
                            'MinPartitions' => <integer>,
                            'NumRetries' => <integer>,
                            'PollTimeoutMs' => <integer>,
                            'RetryIntervalMs' => <integer>,
                            'SecurityProtocol' => '<string>',
                            'StartingOffsets' => '<string>',
                            'StartingTimestamp' => <DateTime>,
                            'SubscribePattern' => '<string>',
                            'TopicName' => '<string>',
                        ],
                        'WindowSize' => <integer>,
                    ],
                    'DirectKinesisSource' => [
                        'DataPreviewOptions' => [
                            'PollingTime' => <integer>,
                            'RecordPollingLimit' => <integer>,
                        ],
                        'DetectSchema' => true || false,
                        'Name' => '<string>',
                        'StreamingOptions' => [
                            'AddIdleTimeBetweenReads' => true || false,
                            'AddRecordTimestamp' => '<string>',
                            'AvoidEmptyBatches' => true || false,
                            'Classification' => '<string>',
                            'Delimiter' => '<string>',
                            'DescribeShardInterval' => <integer>,
                            'EmitConsumerLagMetrics' => '<string>',
                            'EndpointUrl' => '<string>',
                            'IdleTimeBetweenReadsInMs' => <integer>,
                            'MaxFetchRecordsPerShard' => <integer>,
                            'MaxFetchTimeInMs' => <integer>,
                            'MaxRecordPerRead' => <integer>,
                            'MaxRetryIntervalMs' => <integer>,
                            'NumRetries' => <integer>,
                            'RetryIntervalMs' => <integer>,
                            'RoleArn' => '<string>',
                            'RoleSessionName' => '<string>',
                            'StartingPosition' => 'latest|trim_horizon|earliest|timestamp',
                            'StartingTimestamp' => <DateTime>,
                            'StreamArn' => '<string>',
                            'StreamName' => '<string>',
                        ],
                        'WindowSize' => <integer>,
                    ],
                    'DropDuplicates' => [
                        'Columns' => [
                            ['<string>', ...],
                            // ...
                        ],
                        'Inputs' => ['<string>', ...],
                        'Name' => '<string>',
                    ],
                    'DropFields' => [
                        'Inputs' => ['<string>', ...],
                        'Name' => '<string>',
                        'Paths' => [
                            ['<string>', ...],
                            // ...
                        ],
                    ],
                    'DropNullFields' => [
                        'Inputs' => ['<string>', ...],
                        'Name' => '<string>',
                        'NullCheckBoxList' => [
                            'IsEmpty' => true || false,
                            'IsNegOne' => true || false,
                            'IsNullString' => true || false,
                        ],
                        'NullTextList' => [
                            [
                                'Datatype' => [
                                    'Id' => '<string>',
                                    'Label' => '<string>',
                                ],
                                'Value' => '<string>',
                            ],
                            // ...
                        ],
                    ],
                    'DynamicTransform' => [
                        'FunctionName' => '<string>',
                        'Inputs' => ['<string>', ...],
                        'Name' => '<string>',
                        'OutputSchemas' => [
                            [
                                'Columns' => [
                                    [
                                        'Name' => '<string>',
                                        'Type' => '<string>',
                                    ],
                                    // ...
                                ],
                            ],
                            // ...
                        ],
                        'Parameters' => [
                            [
                                'IsOptional' => true || false,
                                'ListType' => 'str|int|float|complex|bool|list|null',
                                'Name' => '<string>',
                                'Type' => 'str|int|float|complex|bool|list|null',
                                'ValidationMessage' => '<string>',
                                'ValidationRule' => '<string>',
                                'Value' => ['<string>', ...],
                            ],
                            // ...
                        ],
                        'Path' => '<string>',
                        'TransformName' => '<string>',
                        'Version' => '<string>',
                    ],
                    'DynamoDBCatalogSource' => [
                        'Database' => '<string>',
                        'Name' => '<string>',
                        'Table' => '<string>',
                    ],
                    'EvaluateDataQuality' => [
                        'Inputs' => ['<string>', ...],
                        'Name' => '<string>',
                        'Output' => 'PrimaryInput|EvaluationResults',
                        'PublishingOptions' => [
                            'CloudWatchMetricsEnabled' => true || false,
                            'EvaluationContext' => '<string>',
                            'ResultsPublishingEnabled' => true || false,
                            'ResultsS3Prefix' => '<string>',
                        ],
                        'Ruleset' => '<string>',
                        'StopJobOnFailureOptions' => [
                            'StopJobOnFailureTiming' => 'Immediate|AfterDataLoad',
                        ],
                    ],
                    'EvaluateDataQualityMultiFrame' => [
                        'AdditionalDataSources' => ['<string>', ...],
                        'AdditionalOptions' => ['<string>', ...],
                        'Inputs' => ['<string>', ...],
                        'Name' => '<string>',
                        'PublishingOptions' => [
                            'CloudWatchMetricsEnabled' => true || false,
                            'EvaluationContext' => '<string>',
                            'ResultsPublishingEnabled' => true || false,
                            'ResultsS3Prefix' => '<string>',
                        ],
                        'Ruleset' => '<string>',
                        'StopJobOnFailureOptions' => [
                            'StopJobOnFailureTiming' => 'Immediate|AfterDataLoad',
                        ],
                    ],
                    'FillMissingValues' => [
                        'FilledPath' => '<string>',
                        'ImputedPath' => '<string>',
                        'Inputs' => ['<string>', ...],
                        'Name' => '<string>',
                    ],
                    'Filter' => [
                        'Filters' => [
                            [
                                'Negated' => true || false,
                                'Operation' => 'EQ|LT|GT|LTE|GTE|REGEX|ISNULL',
                                'Values' => [
                                    [
                                        'Type' => 'COLUMNEXTRACTED|CONSTANT',
                                        'Value' => ['<string>', ...],
                                    ],
                                    // ...
                                ],
                            ],
                            // ...
                        ],
                        'Inputs' => ['<string>', ...],
                        'LogicalOperator' => 'AND|OR',
                        'Name' => '<string>',
                    ],
                    'GovernedCatalogSource' => [
                        'AdditionalOptions' => [
                            'BoundedFiles' => <integer>,
                            'BoundedSize' => <integer>,
                        ],
                        'Database' => '<string>',
                        'Name' => '<string>',
                        'PartitionPredicate' => '<string>',
                        'Table' => '<string>',
                    ],
                    'GovernedCatalogTarget' => [
                        'Database' => '<string>',
                        'Inputs' => ['<string>', ...],
                        'Name' => '<string>',
                        'PartitionKeys' => [
                            ['<string>', ...],
                            // ...
                        ],
                        'SchemaChangePolicy' => [
                            'EnableUpdateCatalog' => true || false,
                            'UpdateBehavior' => 'UPDATE_IN_DATABASE|LOG',
                        ],
                        'Table' => '<string>',
                    ],
                    'JDBCConnectorSource' => [
                        'AdditionalOptions' => [
                            'DataTypeMapping' => ['<string>', ...],
                            'FilterPredicate' => '<string>',
                            'JobBookmarkKeys' => ['<string>', ...],
                            'JobBookmarkKeysSortOrder' => '<string>',
                            'LowerBound' => <integer>,
                            'NumPartitions' => <integer>,
                            'PartitionColumn' => '<string>',
                            'UpperBound' => <integer>,
                        ],
                        'ConnectionName' => '<string>',
                        'ConnectionTable' => '<string>',
                        'ConnectionType' => '<string>',
                        'ConnectorName' => '<string>',
                        'Name' => '<string>',
                        'OutputSchemas' => [
                            [
                                'Columns' => [
                                    [
                                        'Name' => '<string>',
                                        'Type' => '<string>',
                                    ],
                                    // ...
                                ],
                            ],
                            // ...
                        ],
                        'Query' => '<string>',
                    ],
                    'JDBCConnectorTarget' => [
                        'AdditionalOptions' => ['<string>', ...],
                        'ConnectionName' => '<string>',
                        'ConnectionTable' => '<string>',
                        'ConnectionType' => '<string>',
                        'ConnectorName' => '<string>',
                        'Inputs' => ['<string>', ...],
                        'Name' => '<string>',
                        'OutputSchemas' => [
                            [
                                'Columns' => [
                                    [
                                        'Name' => '<string>',
                                        'Type' => '<string>',
                                    ],
                                    // ...
                                ],
                            ],
                            // ...
                        ],
                    ],
                    'Join' => [
                        'Columns' => [
                            [
                                'From' => '<string>',
                                'Keys' => [
                                    ['<string>', ...],
                                    // ...
                                ],
                            ],
                            // ...
                        ],
                        'Inputs' => ['<string>', ...],
                        'JoinType' => 'equijoin|left|right|outer|leftsemi|leftanti',
                        'Name' => '<string>',
                    ],
                    'Merge' => [
                        'Inputs' => ['<string>', ...],
                        'Name' => '<string>',
                        'PrimaryKeys' => [
                            ['<string>', ...],
                            // ...
                        ],
                        'Source' => '<string>',
                    ],
                    'MicrosoftSQLServerCatalogSource' => [
                        'Database' => '<string>',
                        'Name' => '<string>',
                        'Table' => '<string>',
                    ],
                    'MicrosoftSQLServerCatalogTarget' => [
                        'Database' => '<string>',
                        'Inputs' => ['<string>', ...],
                        'Name' => '<string>',
                        'Table' => '<string>',
                    ],
                    'MySQLCatalogSource' => [
                        'Database' => '<string>',
                        'Name' => '<string>',
                        'Table' => '<string>',
                    ],
                    'MySQLCatalogTarget' => [
                        'Database' => '<string>',
                        'Inputs' => ['<string>', ...],
                        'Name' => '<string>',
                        'Table' => '<string>',
                    ],
                    'OracleSQLCatalogSource' => [
                        'Database' => '<string>',
                        'Name' => '<string>',
                        'Table' => '<string>',
                    ],
                    'OracleSQLCatalogTarget' => [
                        'Database' => '<string>',
                        'Inputs' => ['<string>', ...],
                        'Name' => '<string>',
                        'Table' => '<string>',
                    ],
                    'PIIDetection' => [
                        'EntityTypesToDetect' => ['<string>', ...],
                        'Inputs' => ['<string>', ...],
                        'MaskValue' => '<string>',
                        'Name' => '<string>',
                        'OutputColumnName' => '<string>',
                        'PiiType' => 'RowAudit|RowMasking|ColumnAudit|ColumnMasking',
                        'SampleFraction' => <float>,
                        'ThresholdFraction' => <float>,
                    ],
                    'PostgreSQLCatalogSource' => [
                        'Database' => '<string>',
                        'Name' => '<string>',
                        'Table' => '<string>',
                    ],
                    'PostgreSQLCatalogTarget' => [
                        'Database' => '<string>',
                        'Inputs' => ['<string>', ...],
                        'Name' => '<string>',
                        'Table' => '<string>',
                    ],
                    'Recipe' => [
                        'Inputs' => ['<string>', ...],
                        'Name' => '<string>',
                        'RecipeReference' => [
                            'RecipeArn' => '<string>',
                            'RecipeVersion' => '<string>',
                        ],
                    ],
                    'RedshiftSource' => [
                        'Database' => '<string>',
                        'Name' => '<string>',
                        'RedshiftTmpDir' => '<string>',
                        'Table' => '<string>',
                        'TmpDirIAMRole' => '<string>',
                    ],
                    'RedshiftTarget' => [
                        'Database' => '<string>',
                        'Inputs' => ['<string>', ...],
                        'Name' => '<string>',
                        'RedshiftTmpDir' => '<string>',
                        'Table' => '<string>',
                        'TmpDirIAMRole' => '<string>',
                        'UpsertRedshiftOptions' => [
                            'ConnectionName' => '<string>',
                            'TableLocation' => '<string>',
                            'UpsertKeys' => ['<string>', ...],
                        ],
                    ],
                    'RelationalCatalogSource' => [
                        'Database' => '<string>',
                        'Name' => '<string>',
                        'Table' => '<string>',
                    ],
                    'RenameField' => [
                        'Inputs' => ['<string>', ...],
                        'Name' => '<string>',
                        'SourcePath' => ['<string>', ...],
                        'TargetPath' => ['<string>', ...],
                    ],
                    'S3CatalogDeltaSource' => [
                        'AdditionalDeltaOptions' => ['<string>', ...],
                        'Database' => '<string>',
                        'Name' => '<string>',
                        'OutputSchemas' => [
                            [
                                'Columns' => [
                                    [
                                        'Name' => '<string>',
                                        'Type' => '<string>',
                                    ],
                                    // ...
                                ],
                            ],
                            // ...
                        ],
                        'Table' => '<string>',
                    ],
                    'S3CatalogHudiSource' => [
                        'AdditionalHudiOptions' => ['<string>', ...],
                        'Database' => '<string>',
                        'Name' => '<string>',
                        'OutputSchemas' => [
                            [
                                'Columns' => [
                                    [
                                        'Name' => '<string>',
                                        'Type' => '<string>',
                                    ],
                                    // ...
                                ],
                            ],
                            // ...
                        ],
                        'Table' => '<string>',
                    ],
                    'S3CatalogSource' => [
                        'AdditionalOptions' => [
                            'BoundedFiles' => <integer>,
                            'BoundedSize' => <integer>,
                        ],
                        'Database' => '<string>',
                        'Name' => '<string>',
                        'PartitionPredicate' => '<string>',
                        'Table' => '<string>',
                    ],
                    'S3CatalogTarget' => [
                        'Database' => '<string>',
                        'Inputs' => ['<string>', ...],
                        'Name' => '<string>',
                        'PartitionKeys' => [
                            ['<string>', ...],
                            // ...
                        ],
                        'SchemaChangePolicy' => [
                            'EnableUpdateCatalog' => true || false,
                            'UpdateBehavior' => 'UPDATE_IN_DATABASE|LOG',
                        ],
                        'Table' => '<string>',
                    ],
                    'S3CsvSource' => [
                        'AdditionalOptions' => [
                            'BoundedFiles' => <integer>,
                            'BoundedSize' => <integer>,
                            'EnableSamplePath' => true || false,
                            'SamplePath' => '<string>',
                        ],
                        'CompressionType' => 'gzip|bzip2',
                        'Escaper' => '<string>',
                        'Exclusions' => ['<string>', ...],
                        'GroupFiles' => '<string>',
                        'GroupSize' => '<string>',
                        'MaxBand' => <integer>,
                        'MaxFilesInBand' => <integer>,
                        'Multiline' => true || false,
                        'Name' => '<string>',
                        'OptimizePerformance' => true || false,
                        'OutputSchemas' => [
                            [
                                'Columns' => [
                                    [
                                        'Name' => '<string>',
                                        'Type' => '<string>',
                                    ],
                                    // ...
                                ],
                            ],
                            // ...
                        ],
                        'Paths' => ['<string>', ...],
                        'QuoteChar' => 'quote|quillemet|single_quote|disabled',
                        'Recurse' => true || false,
                        'Separator' => 'comma|ctrla|pipe|semicolon|tab',
                        'SkipFirst' => true || false,
                        'WithHeader' => true || false,
                        'WriteHeader' => true || false,
                    ],
                    'S3DeltaCatalogTarget' => [
                        'AdditionalOptions' => ['<string>', ...],
                        'Database' => '<string>',
                        'Inputs' => ['<string>', ...],
                        'Name' => '<string>',
                        'PartitionKeys' => [
                            ['<string>', ...],
                            // ...
                        ],
                        'SchemaChangePolicy' => [
                            'EnableUpdateCatalog' => true || false,
                            'UpdateBehavior' => 'UPDATE_IN_DATABASE|LOG',
                        ],
                        'Table' => '<string>',
                    ],
                    'S3DeltaDirectTarget' => [
                        'AdditionalOptions' => ['<string>', ...],
                        'Compression' => 'uncompressed|snappy',
                        'Format' => 'json|csv|avro|orc|parquet|hudi|delta',
                        'Inputs' => ['<string>', ...],
                        'Name' => '<string>',
                        'PartitionKeys' => [
                            ['<string>', ...],
                            // ...
                        ],
                        'Path' => '<string>',
                        'SchemaChangePolicy' => [
                            'Database' => '<string>',
                            'EnableUpdateCatalog' => true || false,
                            'Table' => '<string>',
                            'UpdateBehavior' => 'UPDATE_IN_DATABASE|LOG',
                        ],
                    ],
                    'S3DeltaSource' => [
                        'AdditionalDeltaOptions' => ['<string>', ...],
                        'AdditionalOptions' => [
                            'BoundedFiles' => <integer>,
                            'BoundedSize' => <integer>,
                            'EnableSamplePath' => true || false,
                            'SamplePath' => '<string>',
                        ],
                        'Name' => '<string>',
                        'OutputSchemas' => [
                            [
                                'Columns' => [
                                    [
                                        'Name' => '<string>',
                                        'Type' => '<string>',
                                    ],
                                    // ...
                                ],
                            ],
                            // ...
                        ],
                        'Paths' => ['<string>', ...],
                    ],
                    'S3DirectTarget' => [
                        'Compression' => '<string>',
                        'Format' => 'json|csv|avro|orc|parquet|hudi|delta',
                        'Inputs' => ['<string>', ...],
                        'Name' => '<string>',
                        'PartitionKeys' => [
                            ['<string>', ...],
                            // ...
                        ],
                        'Path' => '<string>',
                        'SchemaChangePolicy' => [
                            'Database' => '<string>',
                            'EnableUpdateCatalog' => true || false,
                            'Table' => '<string>',
                            'UpdateBehavior' => 'UPDATE_IN_DATABASE|LOG',
                        ],
                    ],
                    'S3GlueParquetTarget' => [
                        'Compression' => 'snappy|lzo|gzip|uncompressed|none',
                        'Inputs' => ['<string>', ...],
                        'Name' => '<string>',
                        'PartitionKeys' => [
                            ['<string>', ...],
                            // ...
                        ],
                        'Path' => '<string>',
                        'SchemaChangePolicy' => [
                            'Database' => '<string>',
                            'EnableUpdateCatalog' => true || false,
                            'Table' => '<string>',
                            'UpdateBehavior' => 'UPDATE_IN_DATABASE|LOG',
                        ],
                    ],
                    'S3HudiCatalogTarget' => [
                        'AdditionalOptions' => ['<string>', ...],
                        'Database' => '<string>',
                        'Inputs' => ['<string>', ...],
                        'Name' => '<string>',
                        'PartitionKeys' => [
                            ['<string>', ...],
                            // ...
                        ],
                        'SchemaChangePolicy' => [
                            'EnableUpdateCatalog' => true || false,
                            'UpdateBehavior' => 'UPDATE_IN_DATABASE|LOG',
                        ],
                        'Table' => '<string>',
                    ],
                    'S3HudiDirectTarget' => [
                        'AdditionalOptions' => ['<string>', ...],
                        'Compression' => 'gzip|lzo|uncompressed|snappy',
                        'Format' => 'json|csv|avro|orc|parquet|hudi|delta',
                        'Inputs' => ['<string>', ...],
                        'Name' => '<string>',
                        'PartitionKeys' => [
                            ['<string>', ...],
                            // ...
                        ],
                        'Path' => '<string>',
                        'SchemaChangePolicy' => [
                            'Database' => '<string>',
                            'EnableUpdateCatalog' => true || false,
                            'Table' => '<string>',
                            'UpdateBehavior' => 'UPDATE_IN_DATABASE|LOG',
                        ],
                    ],
                    'S3HudiSource' => [
                        'AdditionalHudiOptions' => ['<string>', ...],
                        'AdditionalOptions' => [
                            'BoundedFiles' => <integer>,
                            'BoundedSize' => <integer>,
                            'EnableSamplePath' => true || false,
                            'SamplePath' => '<string>',
                        ],
                        'Name' => '<string>',
                        'OutputSchemas' => [
                            [
                                'Columns' => [
                                    [
                                        'Name' => '<string>',
                                        'Type' => '<string>',
                                    ],
                                    // ...
                                ],
                            ],
                            // ...
                        ],
                        'Paths' => ['<string>', ...],
                    ],
                    'S3JsonSource' => [
                        'AdditionalOptions' => [
                            'BoundedFiles' => <integer>,
                            'BoundedSize' => <integer>,
                            'EnableSamplePath' => true || false,
                            'SamplePath' => '<string>',
                        ],
                        'CompressionType' => 'gzip|bzip2',
                        'Exclusions' => ['<string>', ...],
                        'GroupFiles' => '<string>',
                        'GroupSize' => '<string>',
                        'JsonPath' => '<string>',
                        'MaxBand' => <integer>,
                        'MaxFilesInBand' => <integer>,
                        'Multiline' => true || false,
                        'Name' => '<string>',
                        'OutputSchemas' => [
                            [
                                'Columns' => [
                                    [
                                        'Name' => '<string>',
                                        'Type' => '<string>',
                                    ],
                                    // ...
                                ],
                            ],
                            // ...
                        ],
                        'Paths' => ['<string>', ...],
                        'Recurse' => true || false,
                    ],
                    'S3ParquetSource' => [
                        'AdditionalOptions' => [
                            'BoundedFiles' => <integer>,
                            'BoundedSize' => <integer>,
                            'EnableSamplePath' => true || false,
                            'SamplePath' => '<string>',
                        ],
                        'CompressionType' => 'snappy|lzo|gzip|uncompressed|none',
                        'Exclusions' => ['<string>', ...],
                        'GroupFiles' => '<string>',
                        'GroupSize' => '<string>',
                        'MaxBand' => <integer>,
                        'MaxFilesInBand' => <integer>,
                        'Name' => '<string>',
                        'OutputSchemas' => [
                            [
                                'Columns' => [
                                    [
                                        'Name' => '<string>',
                                        'Type' => '<string>',
                                    ],
                                    // ...
                                ],
                            ],
                            // ...
                        ],
                        'Paths' => ['<string>', ...],
                        'Recurse' => true || false,
                    ],
                    'SelectFields' => [
                        'Inputs' => ['<string>', ...],
                        'Name' => '<string>',
                        'Paths' => [
                            ['<string>', ...],
                            // ...
                        ],
                    ],
                    'SelectFromCollection' => [
                        'Index' => <integer>,
                        'Inputs' => ['<string>', ...],
                        'Name' => '<string>',
                    ],
                    'SnowflakeSource' => [
                        'Data' => [
                            'Action' => '<string>',
                            'AdditionalOptions' => ['<string>', ...],
                            'AutoPushdown' => true || false,
                            'Connection' => [
                                'Description' => '<string>',
                                'Label' => '<string>',
                                'Value' => '<string>',
                            ],
                            'Database' => '<string>',
                            'IamRole' => [
                                'Description' => '<string>',
                                'Label' => '<string>',
                                'Value' => '<string>',
                            ],
                            'MergeAction' => '<string>',
                            'MergeClause' => '<string>',
                            'MergeWhenMatched' => '<string>',
                            'MergeWhenNotMatched' => '<string>',
                            'PostAction' => '<string>',
                            'PreAction' => '<string>',
                            'SampleQuery' => '<string>',
                            'Schema' => '<string>',
                            'SelectedColumns' => [
                                [
                                    'Description' => '<string>',
                                    'Label' => '<string>',
                                    'Value' => '<string>',
                                ],
                                // ...
                            ],
                            'SourceType' => '<string>',
                            'StagingTable' => '<string>',
                            'Table' => '<string>',
                            'TableSchema' => [
                                [
                                    'Description' => '<string>',
                                    'Label' => '<string>',
                                    'Value' => '<string>',
                                ],
                                // ...
                            ],
                            'TempDir' => '<string>',
                            'Upsert' => true || false,
                        ],
                        'Name' => '<string>',
                        'OutputSchemas' => [
                            [
                                'Columns' => [
                                    [
                                        'Name' => '<string>',
                                        'Type' => '<string>',
                                    ],
                                    // ...
                                ],
                            ],
                            // ...
                        ],
                    ],
                    'SnowflakeTarget' => [
                        'Data' => [
                            'Action' => '<string>',
                            'AdditionalOptions' => ['<string>', ...],
                            'AutoPushdown' => true || false,
                            'Connection' => [
                                'Description' => '<string>',
                                'Label' => '<string>',
                                'Value' => '<string>',
                            ],
                            'Database' => '<string>',
                            'IamRole' => [
                                'Description' => '<string>',
                                'Label' => '<string>',
                                'Value' => '<string>',
                            ],
                            'MergeAction' => '<string>',
                            'MergeClause' => '<string>',
                            'MergeWhenMatched' => '<string>',
                            'MergeWhenNotMatched' => '<string>',
                            'PostAction' => '<string>',
                            'PreAction' => '<string>',
                            'SampleQuery' => '<string>',
                            'Schema' => '<string>',
                            'SelectedColumns' => [
                                [
                                    'Description' => '<string>',
                                    'Label' => '<string>',
                                    'Value' => '<string>',
                                ],
                                // ...
                            ],
                            'SourceType' => '<string>',
                            'StagingTable' => '<string>',
                            'Table' => '<string>',
                            'TableSchema' => [
                                [
                                    'Description' => '<string>',
                                    'Label' => '<string>',
                                    'Value' => '<string>',
                                ],
                                // ...
                            ],
                            'TempDir' => '<string>',
                            'Upsert' => true || false,
                        ],
                        'Inputs' => ['<string>', ...],
                        'Name' => '<string>',
                    ],
                    'SparkConnectorSource' => [
                        'AdditionalOptions' => ['<string>', ...],
                        'ConnectionName' => '<string>',
                        'ConnectionType' => '<string>',
                        'ConnectorName' => '<string>',
                        'Name' => '<string>',
                        'OutputSchemas' => [
                            [
                                'Columns' => [
                                    [
                                        'Name' => '<string>',
                                        'Type' => '<string>',
                                    ],
                                    // ...
                                ],
                            ],
                            // ...
                        ],
                    ],
                    'SparkConnectorTarget' => [
                        'AdditionalOptions' => ['<string>', ...],
                        'ConnectionName' => '<string>',
                        'ConnectionType' => '<string>',
                        'ConnectorName' => '<string>',
                        'Inputs' => ['<string>', ...],
                        'Name' => '<string>',
                        'OutputSchemas' => [
                            [
                                'Columns' => [
                                    [
                                        'Name' => '<string>',
                                        'Type' => '<string>',
                                    ],
                                    // ...
                                ],
                            ],
                            // ...
                        ],
                    ],
                    'SparkSQL' => [
                        'Inputs' => ['<string>', ...],
                        'Name' => '<string>',
                        'OutputSchemas' => [
                            [
                                'Columns' => [
                                    [
                                        'Name' => '<string>',
                                        'Type' => '<string>',
                                    ],
                                    // ...
                                ],
                            ],
                            // ...
                        ],
                        'SqlAliases' => [
                            [
                                'Alias' => '<string>',
                                'From' => '<string>',
                            ],
                            // ...
                        ],
                        'SqlQuery' => '<string>',
                    ],
                    'Spigot' => [
                        'Inputs' => ['<string>', ...],
                        'Name' => '<string>',
                        'Path' => '<string>',
                        'Prob' => <float>,
                        'Topk' => <integer>,
                    ],
                    'SplitFields' => [
                        'Inputs' => ['<string>', ...],
                        'Name' => '<string>',
                        'Paths' => [
                            ['<string>', ...],
                            // ...
                        ],
                    ],
                    'Union' => [
                        'Inputs' => ['<string>', ...],
                        'Name' => '<string>',
                        'UnionType' => 'ALL|DISTINCT',
                    ],
                ],
                // ...
            ],
            'Command' => [
                'Name' => '<string>',
                'PythonVersion' => '<string>',
                'Runtime' => '<string>',
                'ScriptLocation' => '<string>',
            ],
            'Connections' => [
                'Connections' => ['<string>', ...],
            ],
            'CreatedOn' => <DateTime>,
            'DefaultArguments' => ['<string>', ...],
            'Description' => '<string>',
            'ExecutionClass' => 'FLEX|STANDARD',
            'ExecutionProperty' => [
                'MaxConcurrentRuns' => <integer>,
            ],
            'GlueVersion' => '<string>',
            'JobMode' => 'SCRIPT|VISUAL|NOTEBOOK',
            'LastModifiedOn' => <DateTime>,
            'LogUri' => '<string>',
            'MaintenanceWindow' => '<string>',
            'MaxCapacity' => <float>,
            'MaxRetries' => <integer>,
            'Name' => '<string>',
            'NonOverridableArguments' => ['<string>', ...],
            'NotificationProperty' => [
                'NotifyDelayAfter' => <integer>,
            ],
            'NumberOfWorkers' => <integer>,
            'Role' => '<string>',
            'SecurityConfiguration' => '<string>',
            'SourceControlDetails' => [
                'AuthStrategy' => 'PERSONAL_ACCESS_TOKEN|AWS_SECRETS_MANAGER',
                'AuthToken' => '<string>',
                'Branch' => '<string>',
                'Folder' => '<string>',
                'LastCommitId' => '<string>',
                'Owner' => '<string>',
                'Provider' => 'GITHUB|GITLAB|BITBUCKET|AWS_CODE_COMMIT',
                'Repository' => '<string>',
            ],
            'Timeout' => <integer>,
            'WorkerType' => 'Standard|G.1X|G.2X|G.025X|G.4X|G.8X|Z.2X',
        ],
        // ...
    ],
    'NextToken' => '<string>',
]

Result Details

Members
Jobs
Type: Array of Job structures

A list of job definitions.

NextToken
Type: string

A continuation token, if not all job definitions have yet been returned.

Errors

InvalidInputException:

The input provided was not valid.

EntityNotFoundException:

A specified entity does not exist.

InternalServiceException:

An internal service error occurred.

OperationTimeoutException:

The operation timed out.

GetMLTaskRun

$result = $client->getMLTaskRun([/* ... */]);
$promise = $client->getMLTaskRunAsync([/* ... */]);

Gets details for a specific task run on a machine learning transform. Machine learning task runs are asynchronous tasks that Glue runs on your behalf as part of various machine learning workflows. You can check the stats of any task run by calling GetMLTaskRun with the TaskRunId and its parent transform's TransformId.

Parameter Syntax

$result = $client->getMLTaskRun([
    'TaskRunId' => '<string>', // REQUIRED
    'TransformId' => '<string>', // REQUIRED
]);

Parameter Details

Members
TaskRunId
Required: Yes
Type: string

The unique identifier of the task run.

TransformId
Required: Yes
Type: string

The unique identifier of the machine learning transform.

Result Syntax

[
    'CompletedOn' => <DateTime>,
    'ErrorString' => '<string>',
    'ExecutionTime' => <integer>,
    'LastModifiedOn' => <DateTime>,
    'LogGroupName' => '<string>',
    'Properties' => [
        'ExportLabelsTaskRunProperties' => [
            'OutputS3Path' => '<string>',
        ],
        'FindMatchesTaskRunProperties' => [
            'JobId' => '<string>',
            'JobName' => '<string>',
            'JobRunId' => '<string>',
        ],
        'ImportLabelsTaskRunProperties' => [
            'InputS3Path' => '<string>',
            'Replace' => true || false,
        ],
        'LabelingSetGenerationTaskRunProperties' => [
            'OutputS3Path' => '<string>',
        ],
        'TaskType' => 'EVALUATION|LABELING_SET_GENERATION|IMPORT_LABELS|EXPORT_LABELS|FIND_MATCHES',
    ],
    'StartedOn' => <DateTime>,
    'Status' => 'STARTING|RUNNING|STOPPING|STOPPED|SUCCEEDED|FAILED|TIMEOUT',
    'TaskRunId' => '<string>',
    'TransformId' => '<string>',
]

Result Details

Members
CompletedOn
Type: timestamp (string|DateTime or anything parsable by strtotime)

The date and time when this task run was completed.

ErrorString
Type: string

The error strings that are associated with the task run.

ExecutionTime
Type: int

The amount of time (in seconds) that the task run consumed resources.

LastModifiedOn
Type: timestamp (string|DateTime or anything parsable by strtotime)

The date and time when this task run was last modified.

LogGroupName
Type: string

The names of the log groups that are associated with the task run.

Properties
Type: TaskRunProperties structure

The list of properties that are associated with the task run.

StartedOn
Type: timestamp (string|DateTime or anything parsable by strtotime)

The date and time when this task run started.

Status
Type: string

The status for this task run.

TaskRunId
Type: string

The unique run identifier associated with this run.

TransformId
Type: string

The unique identifier of the machine learning transform.
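Because task runs are asynchronous, callers usually inspect Status to decide whether to keep polling. The following is a minimal sketch that works on a GetMLTaskRun result or on a plain array with the same keys. The helper names (isTaskRunFinished, describeTaskRun) are hypothetical, and treating STOPPED, SUCCEEDED, FAILED, and TIMEOUT as final states is an assumption of this example rather than documented behavior.

```php
<?php
// Status values come from the Result Syntax above. Which of them count as
// "finished" is an assumption made for this sketch.
const TERMINAL_TASK_RUN_STATUSES = ['STOPPED', 'SUCCEEDED', 'FAILED', 'TIMEOUT'];

/**
 * Returns true when the task run no longer needs to be polled.
 *
 * @param array|ArrayAccess $taskRun A GetMLTaskRun result or an equivalent array.
 */
function isTaskRunFinished($taskRun): bool
{
    $status = $taskRun['Status'] ?? null;
    return in_array($status, TERMINAL_TASK_RUN_STATUSES, true);
}

/**
 * Builds a one-line, human-readable summary of a task run.
 *
 * @param array|ArrayAccess $taskRun A GetMLTaskRun result or an equivalent array.
 */
function describeTaskRun($taskRun): string
{
    $summary = sprintf(
        '%s [%s] %s',
        $taskRun['TaskRunId'] ?? '(unknown run)',
        $taskRun['Properties']['TaskType'] ?? '(unknown type)',
        $taskRun['Status'] ?? '(unknown status)'
    );
    // ExecutionTime is reported in seconds.
    if (isset($taskRun['ExecutionTime'])) {
        $summary .= sprintf(' after %d s', $taskRun['ExecutionTime']);
    }
    if (!empty($taskRun['ErrorString'])) {
        $summary .= ': ' . $taskRun['ErrorString'];
    }
    return $summary;
}
```

With a configured client, the value returned by $client->getMLTaskRun([...]) can be passed to either helper directly.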

Errors

EntityNotFoundException:

A specified entity does not exist.

InvalidInputException:

The input provided was not valid.

OperationTimeoutException:

The operation timed out.

InternalServiceException:

An internal service error occurred.
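The four errors above fall into two rough groups: problems with the request itself and problems that may clear on a later attempt. The sketch below illustrates one way to act on that split. The function names are hypothetical, and the choice to treat OperationTimeoutException and InternalServiceException as transient is an assumption of this example. The error code is supplied through a callback so the sketch stays independent of the SDK; with the SDK, the code would typically come from AwsException::getAwsErrorCode(). The SDK also applies its own retry handling, so a wrapper like this is only one possible layer on top.

```php
<?php
/**
 * Classifies the error codes listed above.
 * The transient/permanent split is an assumption of this sketch.
 */
function isLikelyTransientGlueError(string $errorCode): bool
{
    $transient = ['OperationTimeoutException', 'InternalServiceException'];
    return in_array($errorCode, $transient, true);
}

/**
 * Runs $operation, retrying only when the failure looks transient.
 *
 * @param callable $operation   Performs the call and returns its result.
 * @param callable $errorCodeOf Maps a caught exception to an error code string.
 * @param int      $maxAttempts Total number of attempts, including the first.
 */
function callWithRetry(callable $operation, callable $errorCodeOf, int $maxAttempts = 3)
{
    for ($attempt = 1; ; $attempt++) {
        try {
            return $operation();
        } catch (Exception $e) {
            $code = (string) $errorCodeOf($e);
            // Give up on the last attempt or on errors that a retry cannot fix.
            if ($attempt >= $maxAttempts || !isLikelyTransientGlueError($code)) {
                throw $e;
            }
        }
    }
}
```

A production version would normally wait between attempts; the delay is omitted here to keep the sketch short.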

GetMLTaskRuns

$result = $client->getMLTaskRuns([/* ... */]);
$promise = $client->getMLTaskRunsAsync([/* ... */]);

Gets a list of runs for a machine learning transform. Machine learning task runs are asynchronous tasks that Glue runs on your behalf as part of various machine learning workflows. You can get a sortable, filterable list of machine learning task runs by calling GetMLTaskRuns with their parent transform's TransformId and other optional parameters as documented in this section.

This operation returns a list of historic runs and must be paginated.

Parameter Syntax

$result = $client->getMLTaskRuns([
    'Filter' => [
        'StartedAfter' => <integer || string || DateTime>,
        'StartedBefore' => <integer || string || DateTime>,
        'Status' => 'STARTING|RUNNING|STOPPING|STOPPED|SUCCEEDED|FAILED|TIMEOUT',
        'TaskRunType' => 'EVALUATION|LABELING_SET_GENERATION|IMPORT_LABELS|EXPORT_LABELS|FIND_MATCHES',
    ],
    'MaxResults' => <integer>,
    'NextToken' => '<string>',
    'Sort' => [
        'Column' => 'TASK_RUN_TYPE|STATUS|STARTED', // REQUIRED
        'SortDirection' => 'DESCENDING|ASCENDING', // REQUIRED
    ],
    'TransformId' => '<string>', // REQUIRED
]);

Parameter Details

Members
Filter
Type: TaskRunFilterCriteria structure

The filter criteria, in the TaskRunFilterCriteria structure, for the task run.

MaxResults
Type: int

The maximum number of results to return.

NextToken
Type: string

A token for pagination of the results. The default is empty.

Sort
Type: TaskRunSortCriteria structure

The sorting criteria, in the TaskRunSortCriteria structure, for the task run.

TransformId
Required: Yes
Type: string

The unique identifier of the machine learning transform.
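As an illustration of how these parameters fit together, the sketch below assembles a request array that sorts by start time and adds a Filter only when a criterion is supplied. The function name buildTaskRunsRequest, the default page size of 50, and the choice of sort order are assumptions made for this example.

```php
<?php
/**
 * Builds a GetMLTaskRuns parameter array.
 *
 * @param string      $transformId  Required transform identifier.
 * @param string|null $status       Optional Status filter, for example 'FAILED'.
 * @param mixed       $startedAfter Optional lower bound (integer, string, or DateTime).
 * @param int         $maxResults   Page size; 50 is an arbitrary choice for this sketch.
 */
function buildTaskRunsRequest(
    string $transformId,
    ?string $status = null,
    $startedAfter = null,
    int $maxResults = 50
): array {
    $params = [
        'TransformId' => $transformId,
        'MaxResults' => $maxResults,
        // Column and SortDirection are both required whenever Sort is sent.
        'Sort' => ['Column' => 'STARTED', 'SortDirection' => 'DESCENDING'],
    ];

    // Send only the filter keys that were actually requested.
    $filter = array_filter(
        ['Status' => $status, 'StartedAfter' => $startedAfter],
        function ($value) {
            return $value !== null;
        }
    );
    if ($filter !== []) {
        $params['Filter'] = $filter;
    }

    return $params;
}
```

The returned array can be passed to $client->getMLTaskRuns() unchanged.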

Result Syntax

[
    'NextToken' => '<string>',
    'TaskRuns' => [
        [
            'CompletedOn' => <DateTime>,
            'ErrorString' => '<string>',
            'ExecutionTime' => <integer>,
            'LastModifiedOn' => <DateTime>,
            'LogGroupName' => '<string>',
            'Properties' => [
                'ExportLabelsTaskRunProperties' => [
                    'OutputS3Path' => '<string>',
                ],
                'FindMatchesTaskRunProperties' => [
                    'JobId' => '<string>',
                    'JobName' => '<string>',
                    'JobRunId' => '<string>',
                ],
                'ImportLabelsTaskRunProperties' => [
                    'InputS3Path' => '<string>',
                    'Replace' => true || false,
                ],
                'LabelingSetGenerationTaskRunProperties' => [
                    'OutputS3Path' => '<string>',
                ],
                'TaskType' => 'EVALUATION|LABELING_SET_GENERATION|IMPORT_LABELS|EXPORT_LABELS|FIND_MATCHES',
            ],
            'StartedOn' => <DateTime>,
            'Status' => 'STARTING|RUNNING|STOPPING|STOPPED|SUCCEEDED|FAILED|TIMEOUT',
            'TaskRunId' => '<string>',
            'TransformId' => '<string>',
        ],
        // ...
    ],
]

Result Details

Members
NextToken
Type: string

A pagination token, if more results are available.

TaskRuns
Type: Array of TaskRun structures

A list of task runs that are associated with the transform.
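Since the operation is paginated, a caller that wants every run has to keep passing the returned NextToken back until none is returned. The sketch below shows that loop. The function name fetchAllTaskRuns is hypothetical, and the page fetcher is passed in as a callable so the loop can be exercised without a live client.

```php
<?php
/**
 * Collects every task run by following NextToken.
 *
 * @param callable $fetchPage Takes a parameter array and returns one page
 *                            (an array or ArrayAccess with 'TaskRuns' and,
 *                            when more results exist, 'NextToken').
 * @param array    $params    Request parameters; must include 'TransformId'.
 */
function fetchAllTaskRuns(callable $fetchPage, array $params): array
{
    $taskRuns = [];
    unset($params['NextToken']); // always start from the first page

    do {
        $page = $fetchPage($params);
        foreach ($page['TaskRuns'] ?? [] as $taskRun) {
            $taskRuns[] = $taskRun;
        }
        // An absent or empty token means the last page has been reached.
        $nextToken = $page['NextToken'] ?? null;
        if (!empty($nextToken)) {
            $params['NextToken'] = $nextToken;
        }
    } while (!empty($nextToken));

    return $taskRuns;
}
```

With a real client, the fetcher could be a closure such as function (array $p) use ($client) { return $client->getMLTaskRuns($p); }.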

Errors

EntityNotFoundException:

A specified entity does not exist.

InvalidInputException:

The input provided was not valid.

OperationTimeoutException:

The operation timed out.

InternalServiceException:

An internal service error occurred.

GetMLTransform

$result = $client->getMLTransform([/* ... */]);
$promise = $client->getMLTransformAsync([/* ... */]);

Gets a Glue machine learning transform artifact and all its corresponding metadata. Machine learning transforms are a special type of transform that use machine learning to learn the details of the transformation to be performed by learning from examples provided by humans. These transformations are then saved by Glue. You can retrieve their metadata by calling GetMLTransform.

Parameter Syntax

$result = $client->getMLTransform([
    'TransformId' => '<string>', // REQUIRED
]);

Parameter Details

Members
TransformId
Required: Yes
Type: string

The unique identifier of the transform, generated at the time that the transform was created.

Result Syntax

[
    'CreatedOn' => <DateTime>,
    'Description' => '<string>',
    'EvaluationMetrics' => [
        'FindMatchesMetrics' => [
            'AreaUnderPRCurve' => <float>,
            'ColumnImportances' => [
                [
                    'ColumnName' => '<string>',
                    'Importance' => <float>,
                ],
                // ...
            ],
            'ConfusionMatrix' => [
                'NumFalseNegatives' => <integer>,
                'NumFalsePositives' => <integer>,
                'NumTrueNegatives' => <integer>,
                'NumTruePositives' => <integer>,
            ],
            'F1' => <float>,
            'Precision' => <float>,
            'Recall' => <float>,
        ],
        'TransformType' => 'FIND_MATCHES',
    ],
    'GlueVersion' => '<string>',
    'InputRecordTables' => [
        [
            'AdditionalOptions' => ['<string>', ...],
            'CatalogId' => '<string>',
            'ConnectionName' => '<string>',
            'DatabaseName' => '<string>',
            'TableName' => '<string>',
        ],
        // ...
    ],
    'LabelCount' => <integer>,
    'LastModifiedOn' => <DateTime>,
    'MaxCapacity' => <float>,
    'MaxRetries' => <integer>,
    'Name' => '<string>',
    'NumberOfWorkers' => <integer>,
    'Parameters' => [
        'FindMatchesParameters' => [
            'AccuracyCostTradeoff' => <float>,
            'EnforceProvidedLabels' => true || false,
            'PrecisionRecallTradeoff' => <float>,
            'PrimaryKeyColumnName' => '<string>',
        ],
        'TransformType' => 'FIND_MATCHES',
    ],
    'Role' => '<string>',
    'Schema' => [
        [
            'DataType' => '<string>',
            'Name' => '<string>',
        ],
        // ...
    ],
    'Status' => 'NOT_READY|READY|DELETING',
    'Timeout' => <integer>,
    'TransformEncryption' => [
        'MlUserDataEncryption' => [
            'KmsKeyId' => '<string>',
            'MlUserDataEncryptionMode' => 'DISABLED|SSE-KMS',
        ],
        'TaskRunSecurityConfigurationName' => '<string>',
    ],
    'TransformId' => '<string>',
    'WorkerType' => 'Standard|G.1X|G.2X|G.025X|G.4X|G.8X|Z.2X',
]

Result Details

Members
CreatedOn
Type: timestamp (string|DateTime or anything parsable by strtotime)

The date and time when the transform was created.

Description
Type: string

A description of the transform.

EvaluationMetrics
Type: EvaluationMetrics structure

The latest evaluation metrics.

GlueVersion
Type: string

This value determines which version of Glue this machine learning transform is compatible with. Glue 1.0 is recommended for most customers. If the value is not set, the Glue compatibility defaults to Glue 0.9. For more information, see Glue Versions in the developer guide.

InputRecordTables
Type: Array of GlueTable structures

A list of Glue table definitions used by the transform.

LabelCount
Type: int

The number of labels available for this transform.

LastModifiedOn
Type: timestamp (string|DateTime or anything parsable by strtotime)

The date and time when the transform was last modified.

MaxCapacity
Type: double

The number of Glue data processing units (DPUs) that are allocated to task runs for this transform. You can allocate from 2 to 100 DPUs; the default is 10. A DPU is a relative measure of processing power that consists of 4 vCPUs of compute capacity and 16 GB of memory. For more information, see the Glue pricing page.

When the WorkerType field is set to a value other than Standard, the MaxCapacity field is set automatically and becomes read-only.
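The DPU definition above translates directly into compute figures. As a small worked sketch, the hypothetical helper below converts a MaxCapacity value into total vCPUs and memory using the 4 vCPU and 16 GB per DPU ratio stated in the description.

```php
<?php
// One DPU is 4 vCPUs and 16 GB of memory, per the MaxCapacity description.
const VCPUS_PER_DPU = 4;
const MEMORY_GB_PER_DPU = 16;

/**
 * Converts a MaxCapacity value (in DPUs) into total vCPUs and memory.
 */
function describeDpuCapacity(float $maxCapacity): array
{
    return [
        'vCPUs' => $maxCapacity * VCPUS_PER_DPU,
        'MemoryGB' => $maxCapacity * MEMORY_GB_PER_DPU,
    ];
}
```

For the default of 10 DPUs this yields 40 vCPUs and 160 GB of memory.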

MaxRetries
Type: int

The maximum number of times to retry a task for this transform after a task run fails.

Name
Type: string

The unique name given to the transform when it was created.

NumberOfWorkers
Type: int

The number of workers of a defined WorkerType that are allocated when this task runs.

Parameters
Type: TransformParameters structure

The configuration parameters that are specific to the algorithm used.

Role
Type: string

The name or Amazon Resource Name (ARN) of the IAM role with the required permissions.

Schema
Type: Array of SchemaColumn structures

The Map<Column, Type> object that represents the schema that this transform accepts. Has an upper bound of 100 columns.

Status
Type: string

The last known status of the transform (to indicate whether it can be used or not). One of "NOT_READY", "READY", or "DELETING".

Timeout
Type: int

The timeout for a task run for this transform in minutes. This is the maximum time that a task run for this transform can consume resources before it is terminated and enters TIMEOUT status. The default is 2,880 minutes (48 hours).

TransformEncryption
Type: TransformEncryption structure

The encryption-at-rest settings of the transform that apply to accessing user data. Machine learning transforms can access user data encrypted in Amazon S3 using KMS.

TransformId
Type: string

The unique identifier of the transform, generated at the time that the transform was created.

WorkerType
Type: string

The type of predefined worker that is allocated when this task runs. Accepts a value of Standard, G.1X, or G.2X.

  • For the Standard worker type, each worker provides 4 vCPU, 16 GB of memory and a 50GB disk, and 2 executors per worker.

  • For the G.1X worker type, each worker provides 4 vCPU, 16 GB of memory and a 64GB disk, and 1 executor per worker.

  • For the G.2X worker type, each worker provides 8 vCPU, 32 GB of memory and a 128GB disk, and 1 executor per worker.
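As a rough illustration of the worker sizing above, the following sketch totals the resources implied by a WorkerType and NumberOfWorkers pair. The helper name `estimateTransformCapacity` is hypothetical and not part of the SDK; the per-worker figures are copied from the list above.

```php
<?php
// Hypothetical helper (not part of the AWS SDK): multiplies the per-worker
// figures listed above by the number of workers.
function estimateTransformCapacity(string $workerType, int $numberOfWorkers): array
{
    $perWorker = [
        'Standard' => ['vCPU' => 4, 'MemoryGB' => 16, 'DiskGB' => 50,  'Executors' => 2],
        'G.1X'     => ['vCPU' => 4, 'MemoryGB' => 16, 'DiskGB' => 64,  'Executors' => 1],
        'G.2X'     => ['vCPU' => 8, 'MemoryGB' => 32, 'DiskGB' => 128, 'Executors' => 1],
    ];
    if (!isset($perWorker[$workerType])) {
        throw new InvalidArgumentException("No sizing data for worker type: $workerType");
    }
    if ($numberOfWorkers < 1) {
        throw new InvalidArgumentException('numberOfWorkers must be at least 1');
    }

    $totals = [];
    foreach ($perWorker[$workerType] as $resource => $amount) {
        $totals[$resource] = $amount * $numberOfWorkers;
    }
    return $totals;
}

// Five G.2X workers: 5 x 8 vCPUs and 5 x 32 GB of memory.
$totals = estimateTransformCapacity('G.2X', 5);
echo "{$totals['vCPU']} vCPUs, {$totals['MemoryGB']} GB memory\n";
```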

Errors

EntityNotFoundException:

A specified entity does not exist.

InvalidInputException:

The input provided was not valid.

OperationTimeoutException:

The operation timed out.

InternalServiceException:

An internal service error occurred.

GetMLTransforms

$result = $client->getMLTransforms([/* ... */]);
$promise = $client->getMLTransformsAsync([/* ... */]);

Gets a sortable, filterable list of existing Glue machine learning transforms. Machine learning transforms are a special type of transform that uses machine learning to learn the details of the transformation from examples provided by humans. Glue saves these transforms, and you can retrieve their metadata by calling GetMLTransforms.

Parameter Syntax

$result = $client->getMLTransforms([
    'Filter' => [
        'CreatedAfter' => <integer || string || DateTime>,
        'CreatedBefore' => <integer || string || DateTime>,
        'GlueVersion' => '<string>',
        'LastModifiedAfter' => <integer || string || DateTime>,
        'LastModifiedBefore' => <integer || string || DateTime>,
        'Name' => '<string>',
        'Schema' => [
            [
                'DataType' => '<string>',
                'Name' => '<string>',
            ],
            // ...
        ],
        'Status' => 'NOT_READY|READY|DELETING',
        'TransformType' => 'FIND_MATCHES',
    ],
    'MaxResults' => <integer>,
    'NextToken' => '<string>',
    'Sort' => [
        'Column' => 'NAME|TRANSFORM_TYPE|STATUS|CREATED|LAST_MODIFIED', // REQUIRED
        'SortDirection' => 'DESCENDING|ASCENDING', // REQUIRED
    ],
]);

Parameter Details

Members
Filter
Type: TransformFilterCriteria structure

The filter transformation criteria.

MaxResults
Type: int

The maximum number of results to return.

NextToken
Type: string

A pagination token to offset the results.

Sort
Type: TransformSortCriteria structure

The sorting criteria.
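To make the Filter and Sort members concrete, here is a minimal sketch that builds a request for READY transforms created after a given date, newest first. The helper name `buildReadyTransformsRequest` and the default page size are assumptions for illustration; only the parameter names and enum values come from the syntax above.

```php
<?php
// Hypothetical helper (not part of the AWS SDK): assembles GetMLTransforms
// parameters. Both Sort members are marked REQUIRED in the syntax above.
function buildReadyTransformsRequest(DateTime $createdAfter, int $maxResults = 50): array
{
    return [
        'Filter' => [
            'Status' => 'READY',
            'TransformType' => 'FIND_MATCHES',
            'CreatedAfter' => $createdAfter,
        ],
        'Sort' => [
            'Column' => 'CREATED',
            'SortDirection' => 'DESCENDING',
        ],
        'MaxResults' => $maxResults,
    ];
}

$params = buildReadyTransformsRequest(new DateTime('2024-01-01T00:00:00Z'));
// With a configured client, the request would be sent as:
// $result = $client->getMLTransforms($params);
```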

Result Syntax

[
    'NextToken' => '<string>',
    'Transforms' => [
        [
            'CreatedOn' => <DateTime>,
            'Description' => '<string>',
            'EvaluationMetrics' => [
                'FindMatchesMetrics' => [
                    'AreaUnderPRCurve' => <float>,
                    'ColumnImportances' => [
                        [
                            'ColumnName' => '<string>',
                            'Importance' => <float>,
                        ],
                        // ...
                    ],
                    'ConfusionMatrix' => [
                        'NumFalseNegatives' => <integer>,
                        'NumFalsePositives' => <integer>,
                        'NumTrueNegatives' => <integer>,
                        'NumTruePositives' => <integer>,
                    ],
                    'F1' => <float>,
                    'Precision' => <float>,
                    'Recall' => <float>,
                ],
                'TransformType' => 'FIND_MATCHES',
            ],
            'GlueVersion' => '<string>',
            'InputRecordTables' => [
                [
                    'AdditionalOptions' => ['<string>', ...],
                    'CatalogId' => '<string>',
                    'ConnectionName' => '<string>',
                    'DatabaseName' => '<string>',
                    'TableName' => '<string>',
                ],
                // ...
            ],
            'LabelCount' => <integer>,
            'LastModifiedOn' => <DateTime>,
            'MaxCapacity' => <float>,
            'MaxRetries' => <integer>,
            'Name' => '<string>',
            'NumberOfWorkers' => <integer>,
            'Parameters' => [
                'FindMatchesParameters' => [
                    'AccuracyCostTradeoff' => <float>,
                    'EnforceProvidedLabels' => true || false,
                    'PrecisionRecallTradeoff' => <float>,
                    'PrimaryKeyColumnName' => '<string>',
                ],
                'TransformType' => 'FIND_MATCHES',
            ],
            'Role' => '<string>',
            'Schema' => [
                [
                    'DataType' => '<string>',
                    'Name' => '<string>',
                ],
                // ...
            ],
            'Status' => 'NOT_READY|READY|DELETING',
            'Timeout' => <integer>,
            'TransformEncryption' => [
                'MlUserDataEncryption' => [
                    'KmsKeyId' => '<string>',
                    'MlUserDataEncryptionMode' => 'DISABLED|SSE-KMS',
                ],
                'TaskRunSecurityConfigurationName' => '<string>',
            ],
            'TransformId' => '<string>',
            'WorkerType' => 'Standard|G.1X|G.2X|G.025X|G.4X|G.8X|Z.2X',
        ],
        // ...
    ],
]

Result Details

Members
NextToken
Type: string

A pagination token, if more results are available.

Transforms
Required: Yes
Type: Array of MLTransform structures

A list of machine learning transforms.

Errors

EntityNotFoundException:

A specified entity does not exist.

InvalidInputException:

The input provided was not valid.

OperationTimeoutException:

The operation timed out.

InternalServiceException:

An internal service error occurred.

GetMapping

$result = $client->getMapping([/* ... */]);
$promise = $client->getMappingAsync([/* ... */]);

Creates mappings.

Parameter Syntax

$result = $client->getMapping([
    'Location' => [
        'DynamoDB' => [
            [
                'Name' => '<string>', // REQUIRED
                'Param' => true || false,
                'Value' => '<string>', // REQUIRED
            ],
            // ...
        ],
        'Jdbc' => [
            [
                'Name' => '<string>', // REQUIRED
                'Param' => true || false,
                'Value' => '<string>', // REQUIRED
            ],
            // ...
        ],
        'S3' => [
            [
                'Name' => '<string>', // REQUIRED
                'Param' => true || false,
                'Value' => '<string>', // REQUIRED
            ],
            // ...
        ],
    ],
    'Sinks' => [
        [
            'DatabaseName' => '<string>', // REQUIRED
            'TableName' => '<string>', // REQUIRED
        ],
        // ...
    ],
    'Source' => [ // REQUIRED
        'DatabaseName' => '<string>', // REQUIRED
        'TableName' => '<string>', // REQUIRED
    ],
]);

Parameter Details

Members
Location
Type: Location structure

Parameters for the mapping.

Sinks
Type: Array of CatalogEntry structures

A list of target tables.

Source
Required: Yes
Type: CatalogEntry structure

Specifies the source table.

Result Syntax

[
    'Mapping' => [
        [
            'SourcePath' => '<string>',
            'SourceTable' => '<string>',
            'SourceType' => '<string>',
            'TargetPath' => '<string>',
            'TargetTable' => '<string>',
            'TargetType' => '<string>',
        ],
        // ...
    ],
]

Result Details

Members
Mapping
Required: Yes
Type: Array of MappingEntry structures

A list of mappings to the specified targets.

Errors

InvalidInputException:

The input provided was not valid.

InternalServiceException:

An internal service error occurred.

OperationTimeoutException:

The operation timed out.

EntityNotFoundException:

A specified entity does not exist.

GetPartition

$result = $client->getPartition([/* ... */]);
$promise = $client->getPartitionAsync([/* ... */]);

Retrieves information about a specified partition.

Parameter Syntax

$result = $client->getPartition([
    'CatalogId' => '<string>',
    'DatabaseName' => '<string>', // REQUIRED
    'PartitionValues' => ['<string>', ...], // REQUIRED
    'TableName' => '<string>', // REQUIRED
]);

Parameter Details

Members
CatalogId
Type: string

The ID of the Data Catalog where the partition in question resides. If none is provided, the Amazon Web Services account ID is used by default.

DatabaseName
Required: Yes
Type: string

The name of the catalog database where the partition resides.

PartitionValues
Required: Yes
Type: Array of strings

The values that define the partition.

TableName
Required: Yes
Type: string

The name of the partition's table.

Result Syntax

[
    'Partition' => [
        'CatalogId' => '<string>',
        'CreationTime' => <DateTime>,
        'DatabaseName' => '<string>',
        'LastAccessTime' => <DateTime>,
        'LastAnalyzedTime' => <DateTime>,
        'Parameters' => ['<string>', ...],
        'StorageDescriptor' => [
            'AdditionalLocations' => ['<string>', ...],
            'BucketColumns' => ['<string>', ...],
            'Columns' => [
                [
                    'Comment' => '<string>',
                    'Name' => '<string>',
                    'Parameters' => ['<string>', ...],
                    'Type' => '<string>',
                ],
                // ...
            ],
            'Compressed' => true || false,
            'InputFormat' => '<string>',
            'Location' => '<string>',
            'NumberOfBuckets' => <integer>,
            'OutputFormat' => '<string>',
            'Parameters' => ['<string>', ...],
            'SchemaReference' => [
                'SchemaId' => [
                    'RegistryName' => '<string>',
                    'SchemaArn' => '<string>',
                    'SchemaName' => '<string>',
                ],
                'SchemaVersionId' => '<string>',
                'SchemaVersionNumber' => <integer>,
            ],
            'SerdeInfo' => [
                'Name' => '<string>',
                'Parameters' => ['<string>', ...],
                'SerializationLibrary' => '<string>',
            ],
            'SkewedInfo' => [
                'SkewedColumnNames' => ['<string>', ...],
                'SkewedColumnValueLocationMaps' => ['<string>', ...],
                'SkewedColumnValues' => ['<string>', ...],
            ],
            'SortColumns' => [
                [
                    'Column' => '<string>',
                    'SortOrder' => <integer>,
                ],
                // ...
            ],
            'StoredAsSubDirectories' => true || false,
        ],
        'TableName' => '<string>',
        'Values' => ['<string>', ...],
    ],
]

Result Details

Members
Partition
Type: Partition structure

The requested information, in the form of a Partition object.

Errors

EntityNotFoundException:

A specified entity does not exist.

InvalidInputException:

The input provided was not valid.

InternalServiceException:

An internal service error occurred.

OperationTimeoutException:

The operation timed out.

GlueEncryptionException:

An encryption operation failed.

FederationSourceException:

A federation source failed.

FederationSourceRetryableException:

A federation source failed, but the operation may be retried.
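Since only some of these errors are described as retryable, a caller may want to classify them before retrying. The sketch below works on the error code string (for example, the value returned by `getAwsErrorCode()` on a caught `Aws\Exception\AwsException`). The helper name is hypothetical, and treating OperationTimeoutException and InternalServiceException as retryable is an assumption, not something this page states.

```php
<?php
// Hypothetical helper (not part of the AWS SDK): decides whether an error
// code is worth retrying. Only FederationSourceRetryableException is described
// above as retryable; the other two entries are assumptions.
function isRetryableGlueError(string $errorCode): bool
{
    $retryable = [
        'FederationSourceRetryableException',
        'OperationTimeoutException',   // assumption: transient
        'InternalServiceException',    // assumption: transient
    ];
    return in_array($errorCode, $retryable, true);
}

// Example classification of two of the codes listed above.
foreach (['FederationSourceRetryableException', 'EntityNotFoundException'] as $code) {
    echo $code . ': ' . (isRetryableGlueError($code) ? 'retry' : 'do not retry') . "\n";
}
```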

GetPartitionIndexes

$result = $client->getPartitionIndexes([/* ... */]);
$promise = $client->getPartitionIndexesAsync([/* ... */]);

Retrieves the partition indexes associated with a table.

Parameter Syntax

$result = $client->getPartitionIndexes([
    'CatalogId' => '<string>',
    'DatabaseName' => '<string>', // REQUIRED
    'NextToken' => '<string>',
    'TableName' => '<string>', // REQUIRED
]);

Parameter Details

Members
CatalogId
Type: string

The catalog ID where the table resides.

DatabaseName
Required: Yes
Type: string

Specifies the name of a database from which you want to retrieve partition indexes.

NextToken
Type: string

A continuation token, included if this is a continuation call.

TableName
Required: Yes
Type: string

Specifies the name of a table for which you want to retrieve the partition indexes.

Result Syntax

[
    'NextToken' => '<string>',
    'PartitionIndexDescriptorList' => [
        [
            'BackfillErrors' => [
                [
                    'Code' => 'ENCRYPTED_PARTITION_ERROR|INTERNAL_ERROR|INVALID_PARTITION_TYPE_DATA_ERROR|MISSING_PARTITION_VALUE_ERROR|UNSUPPORTED_PARTITION_CHARACTER_ERROR',
                    'Partitions' => [
                        [
                            'Values' => ['<string>', ...],
                        ],
                        // ...
                    ],
                ],
                // ...
            ],
            'IndexName' => '<string>',
            'IndexStatus' => 'CREATING|ACTIVE|DELETING|FAILED',
            'Keys' => [
                [
                    'Name' => '<string>',
                    'Type' => '<string>',
                ],
                // ...
            ],
        ],
        // ...
    ],
]

Result Details

Members
NextToken
Type: string

A continuation token, present if the current list segment is not the last.

PartitionIndexDescriptorList
Type: Array of PartitionIndexDescriptor structures

A list of index descriptors.
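A common follow-up is to check which indexes are usable. The sketch below filters a PartitionIndexDescriptorList down to the names of indexes whose IndexStatus is ACTIVE. The helper name and the sample data are hypothetical; the field names and status values come from the result syntax above.

```php
<?php
// Hypothetical helper (not part of the AWS SDK): returns the names of the
// indexes whose IndexStatus is ACTIVE.
function activeIndexNames(iterable $descriptors): array
{
    $names = [];
    foreach ($descriptors as $descriptor) {
        if (($descriptor['IndexStatus'] ?? null) === 'ACTIVE') {
            $names[] = $descriptor['IndexName'];
        }
    }
    return $names;
}

// Sample data shaped like $result['PartitionIndexDescriptorList'].
$descriptors = [
    ['IndexName' => 'by_year', 'IndexStatus' => 'ACTIVE', 'Keys' => [['Name' => 'year', 'Type' => 'string']]],
    ['IndexName' => 'by_region', 'IndexStatus' => 'CREATING', 'Keys' => [['Name' => 'region', 'Type' => 'string']]],
];
echo implode(', ', activeIndexNames($descriptors)) . "\n";
```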

Errors

InternalServiceException:

An internal service error occurred.

OperationTimeoutException:

The operation timed out.

InvalidInputException:

The input provided was not valid.

EntityNotFoundException:

A specified entity does not exist.

ConflictException:

The CreatePartitions API was called on a table that has indexes enabled.

GetPartitions

$result = $client->getPartitions([/* ... */]);
$promise = $client->getPartitionsAsync([/* ... */]);

Retrieves information about the partitions in a table.

Parameter Syntax

$result = $client->getPartitions([
    'CatalogId' => '<string>',
    'DatabaseName' => '<string>', // REQUIRED
    'ExcludeColumnSchema' => true || false,
    'Expression' => '<string>',
    'MaxResults' => <integer>,
    'NextToken' => '<string>',
    'QueryAsOfTime' => <integer || string || DateTime>,
    'Segment' => [
        'SegmentNumber' => <integer>, // REQUIRED
        'TotalSegments' => <integer>, // REQUIRED
    ],
    'TableName' => '<string>', // REQUIRED
    'TransactionId' => '<string>',
]);

Parameter Details

Members
CatalogId
Type: string

The ID of the Data Catalog where the partitions in question reside. If none is provided, the Amazon Web Services account ID is used by default.

DatabaseName
Required: Yes
Type: string

The name of the catalog database where the partitions reside.

ExcludeColumnSchema
Type: boolean

When true, the partition column schema is not returned. This is useful when you are interested only in other partition attributes, such as partition values or location, and it avoids a large response caused by duplicate data.

Expression
Type: string

An expression that filters the partitions to be returned.

The expression uses SQL syntax similar to the SQL WHERE filter clause. The SQL statement parser JSQLParser parses the expression.

Operators: The following are the operators that you can use in the Expression API call:

=

Checks whether the values of the two operands are equal; if yes, then the condition becomes true.

Example: Assume 'variable a' holds 10 and 'variable b' holds 20.

(a = b) is not true.

< >

Checks whether the values of two operands are equal; if the values are not equal, then the condition becomes true.

Example: (a < > b) is true.

>

Checks whether the value of the left operand is greater than the value of the right operand; if yes, then the condition becomes true.

Example: (a > b) is not true.

<

Checks whether the value of the left operand is less than the value of the right operand; if yes, then the condition becomes true.

Example: (a < b) is true.

>=

Checks whether the value of the left operand is greater than or equal to the value of the right operand; if yes, then the condition becomes true.

Example: (a >= b) is not true.

<=

Checks whether the value of the left operand is less than or equal to the value of the right operand; if yes, then the condition becomes true.

Example: (a <= b) is true.

AND, OR, IN, BETWEEN, LIKE, NOT, IS NULL

Logical operators.

Supported Partition Key Types: The following are the supported partition key types.

  • string

  • date

  • timestamp

  • int

  • bigint

  • long

  • tinyint

  • smallint

  • decimal

If a type is encountered that is not valid, an exception is thrown.

The following list shows the valid operators on each type. When you define a crawler, the partitionKey type is created as a STRING, to be compatible with the catalog partitions.

Sample API Call:
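A minimal sketch of a call that uses Expression from PHP. The helper name `buildPartitionQuery`, the database and table names, and the partition keys `year` and `month` are all hypothetical; the expression only combines operators listed above.

```php
<?php
// Hypothetical helper (not part of the AWS SDK): builds GetPartitions
// parameters with a filter expression and without the column schema.
function buildPartitionQuery(string $database, string $table, string $expression): array
{
    return [
        'DatabaseName' => $database,
        'TableName' => $table,
        'Expression' => $expression,
        'ExcludeColumnSchema' => true,
    ];
}

// Assumes string-typed partition keys named "year" and "month".
$params = buildPartitionQuery(
    'sales_db',
    'orders',
    "year = '2024' AND month BETWEEN '01' AND '06'"
);
// With a configured client, the request would be sent as:
// $result = $client->getPartitions($params);
```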

MaxResults
Type: int

The maximum number of partitions to return in a single response.

NextToken
Type: string

A continuation token, if this is not the first call to retrieve these partitions.

QueryAsOfTime
Type: timestamp (string|DateTime or anything parsable by strtotime)

The point in time as of which to read the partition contents. If not set, the most recent transaction commit time is used. Cannot be specified along with TransactionId.

Segment
Type: Segment structure

The segment of the table's partitions to scan in this request.

TableName
Required: Yes
Type: string

The name of the partitions' table.

TransactionId
Type: string

The transaction ID at which to read the partition contents.
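The Segment member lets a table's partitions be scanned in several independent requests. The sketch below builds one request per segment. The helper name is hypothetical, and it assumes SegmentNumber is a zero-based index as described in the Glue API reference; verify that against the service documentation before relying on it.

```php
<?php
// Hypothetical helper (not part of the AWS SDK): produces one GetPartitions
// request per segment. Assumes SegmentNumber counts from 0.
function buildSegmentRequests(array $baseParams, int $totalSegments): array
{
    if ($totalSegments < 1) {
        throw new InvalidArgumentException('totalSegments must be at least 1');
    }

    $requests = [];
    for ($segment = 0; $segment < $totalSegments; $segment++) {
        $requests[] = array_merge($baseParams, [
            'Segment' => [
                'SegmentNumber' => $segment,
                'TotalSegments' => $totalSegments,
            ],
        ]);
    }
    return $requests;
}

$requests = buildSegmentRequests(['DatabaseName' => 'sales_db', 'TableName' => 'orders'], 4);
// Each request could then be sent with $client->getPartitionsAsync($request)
// so that the segments are read concurrently.
```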

Result Syntax

[
    'NextToken' => '<string>',
    'Partitions' => [
        [
            'CatalogId' => '<string>',
            'CreationTime' => <DateTime>,
            'DatabaseName' => '<string>',
            'LastAccessTime' => <DateTime>,
            'LastAnalyzedTime' => <DateTime>,
            'Parameters' => ['<string>', ...],
            'StorageDescriptor' => [
                'AdditionalLocations' => ['<string>', ...],
                'BucketColumns' => ['<string>', ...],
                'Columns' => [
                    [
                        'Comment' => '<string>',
                        'Name' => '<string>',
                        'Parameters' => ['<string>', ...],
                        'Type' => '<string>',
                    ],
                    // ...
                ],
                'Compressed' => true || false,
                'InputFormat' => '<string>',
                'Location' => '<string>',
                'NumberOfBuckets' => <integer>,
                'OutputFormat' => '<string>',
                'Parameters' => ['<string>', ...],
                'SchemaReference' => [
                    'SchemaId' => [
                        'RegistryName' => '<string>',
                        'SchemaArn' => '<string>',
                        'SchemaName' => '<string>',
                    ],
                    'SchemaVersionId' => '<string>',
                    'SchemaVersionNumber' => <integer>,
                ],
                'SerdeInfo' => [
                    'Name' => '<string>',
                    'Parameters' => ['<string>', ...],
                    'SerializationLibrary' => '<string>',
                ],
                'SkewedInfo' => [
                    'SkewedColumnNames' => ['<string>', ...],
                    'SkewedColumnValueLocationMaps' => ['<string>', ...],
                    'SkewedColumnValues' => ['<string>', ...],
                ],
                'SortColumns' => [
                    [
                        'Column' => '<string>',
                        'SortOrder' => <integer>,
                    ],
                    // ...
                ],
                'StoredAsSubDirectories' => true || false,
            ],
            'TableName' => '<string>',
            'Values' => ['<string>', ...],
        ],
        // ...
    ],
]

Result Details

Members
NextToken
Type: string

A continuation token, if the returned list of partitions does not include the last one.

Partitions
Type: Array of Partition structures

A list of requested partitions.
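Because NextToken is returned only while more partitions remain, reading a whole table means looping until it is absent. The sketch below shows that loop with a stub in place of the client so it runs without credentials. `collectAllPages` is a hypothetical helper, and the stub data is invented for illustration.

```php
<?php
// Hypothetical helper (not part of the AWS SDK): calls $fetchPage repeatedly,
// passing NextToken along, and gathers every item found under $listKey.
function collectAllPages(callable $fetchPage, array $params, string $listKey): array
{
    $items = [];
    do {
        $page = $fetchPage($params);
        foreach ($page[$listKey] ?? [] as $item) {
            $items[] = $item;
        }
        $nextToken = $page['NextToken'] ?? null;
        $params['NextToken'] = $nextToken;
    } while (!empty($nextToken));

    return $items;
}

// Stub standing in for the client; keyed by the incoming NextToken.
$pages = [
    ''   => ['Partitions' => [['Values' => ['2024', '01']]], 'NextToken' => 't1'],
    't1' => ['Partitions' => [['Values' => ['2024', '02']]]],
];
$stub = function (array $params) use ($pages) {
    return $pages[$params['NextToken'] ?? ''];
};

$all = collectAllPages($stub, ['DatabaseName' => 'sales_db', 'TableName' => 'orders'], 'Partitions');
echo count($all) . " partitions\n";
```

With a real client, the callable could be a closure such as `function (array $p) use ($client) { return $client->getPartitions($p); }`.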

Errors

EntityNotFoundException:

A specified entity does not exist

InvalidInputException:

The input provided was not valid.

OperationTimeoutException:

The operation timed out.

InternalServiceException:

An internal service error occurred.

GlueEncryptionException:

An encryption operation failed.

InvalidStateException:

An error that indicates your data is in an invalid state.

ResourceNotReadyException:

A resource was not ready for a transaction.

FederationSourceException:

A federation source failed.

FederationSourceRetryableException:

A federation source failed, but the operation may be retried.

GetPlan

$result = $client->getPlan([/* ... */]);
$promise = $client->getPlanAsync([/* ... */]);

Gets code to perform a specified mapping.

Parameter Syntax

$result = $client->getPlan([
    'AdditionalPlanOptionsMap' => ['<string>', ...],
    'Language' => 'PYTHON|SCALA',
    'Location' => [
        'DynamoDB' => [
            [
                'Name' => '<string>', // REQUIRED
                'Param' => true || false,
                'Value' => '<string>', // REQUIRED
            ],
            // ...
        ],
        'Jdbc' => [
            [
                'Name' => '<string>', // REQUIRED
                'Param' => true || false,
                'Value' => '<string>', // REQUIRED
            ],
            // ...
        ],
        'S3' => [
            [
                'Name' => '<string>', // REQUIRED
                'Param' => true || false,
                'Value' => '<string>', // REQUIRED
            ],
            // ...
        ],
    ],
    'Mapping' => [ // REQUIRED
        [
            'SourcePath' => '<string>',
            'SourceTable' => '<string>',
            'SourceType' => '<string>',
            'TargetPath' => '<string>',
            'TargetTable' => '<string>',
            'TargetType' => '<string>',
        ],
        // ...
    ],
    'Sinks' => [
        [
            'DatabaseName' => '<string>', // REQUIRED
            'TableName' => '<string>', // REQUIRED
        ],
        // ...
    ],
    'Source' => [ // REQUIRED
        'DatabaseName' => '<string>', // REQUIRED
        'TableName' => '<string>', // REQUIRED
    ],
]);

Parameter Details

Members
AdditionalPlanOptionsMap
Type: Associative array of custom strings keys (GenericString) to strings

A map to hold additional optional key-value parameters.

Currently, these key-value pairs are supported:

  • inferSchema: Specifies whether to set inferSchema to true or false for the default script generated by a Glue job. For example, to set inferSchema to true, pass the following key-value pair:

    --additional-plan-options-map '{"inferSchema":"true"}'

Language
Type: string

The programming language of the code to perform the mapping.

Location
Type: Location structure

The parameters for the mapping.

Mapping
Required: Yes
Type: Array of MappingEntry structures

The list of mappings from a source table to target tables.

Sinks
Type: Array of CatalogEntry structures

The target tables.

Source
Required: Yes
Type: CatalogEntry structure

The source table.
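The parameters above can be assembled in PHP roughly as follows, including the inferSchema option as an associative array. The helper name `buildPlanRequest` and the sample table and column names are hypothetical; the Mapping entries would typically come from the Mapping list returned by GetMapping.

```php
<?php
// Hypothetical helper (not part of the AWS SDK): assembles GetPlan parameters.
// Source and Mapping are marked REQUIRED in the syntax above.
function buildPlanRequest(array $source, array $mapping, string $language = 'PYTHON', bool $inferSchema = false): array
{
    if (!in_array($language, ['PYTHON', 'SCALA'], true)) {
        throw new InvalidArgumentException("Unsupported language: $language");
    }

    $params = [
        'Source' => $source,
        'Mapping' => $mapping,
        'Language' => $language,
    ];
    if ($inferSchema) {
        // Map values are strings, matching the key-value example above.
        $params['AdditionalPlanOptionsMap'] = ['inferSchema' => 'true'];
    }
    return $params;
}

$params = buildPlanRequest(
    ['DatabaseName' => 'sales_db', 'TableName' => 'orders'],
    [[
        'SourceTable' => 'orders', 'SourcePath' => 'order_id', 'SourceType' => 'string',
        'TargetTable' => 'orders_clean', 'TargetPath' => 'id', 'TargetType' => 'string',
    ]],
    'PYTHON',
    true
);
// With a configured client: $result = $client->getPlan($params);
// For this request the generated code would be read from $result['PythonScript'].
```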

Result Syntax

[
    'PythonScript' => '<string>',
    'ScalaCode' => '<string>',
]

Result Details

Members
PythonScript
Type: string

A Python script to perform the mapping.

ScalaCode
Type: string

The Scala code to perform the mapping.

Errors

InvalidInputException:

The input provided was not valid.

InternalServiceException:

An internal service error occurred.

OperationTimeoutException:

The operation timed out.

GetRegistry

$result = $client->getRegistry([/* ... */]);
$promise = $client->getRegistryAsync([/* ... */]);

Describes the specified registry in detail.

Parameter Syntax

$result = $client->getRegistry([
    'RegistryId' => [ // REQUIRED
        'RegistryArn' => '<string>',
        'RegistryName' => '<string>',
    ],
]);

Parameter Details

Members
RegistryId
Required: Yes
Type: RegistryId structure

This is a wrapper structure that may contain the registry name and Amazon Resource Name (ARN).

Result Syntax

[
    'CreatedTime' => '<string>',
    'Description' => '<string>',
    'RegistryArn' => '<string>',
    'RegistryName' => '<string>',
    'Status' => 'AVAILABLE|DELETING',
    'UpdatedTime' => '<string>',
]

Result Details

Members
CreatedTime
Type: string

The date and time the registry was created.

Description
Type: string

A description of the registry.

RegistryArn
Type: string

The Amazon Resource Name (ARN) of the registry.

RegistryName
Type: string

The name of the registry.

Status
Type: string

The status of the registry.

UpdatedTime
Type: string

The date and time the registry was updated.

Errors

InvalidInputException:

The input provided was not valid.

AccessDeniedException:

Access to a resource was denied.

EntityNotFoundException:

A specified entity does not exist.

InternalServiceException:

An internal service error occurred.

GetResourcePolicies

$result = $client->getResourcePolicies([/* ... */]);
$promise = $client->getResourcePoliciesAsync([/* ... */]);

Retrieves the resource policies set on individual resources by Resource Access Manager during cross-account permission grants. Also retrieves the Data Catalog resource policy.

If you enabled metadata encryption in Data Catalog settings, and you do not have permission on the KMS key, the operation can't return the Data Catalog resource policy.

Parameter Syntax

$result = $client->getResourcePolicies([
    'MaxResults' => <integer>,
    'NextToken' => '<string>',
]);

Parameter Details

Members
MaxResults
Type: int

The maximum size of a list to return.

NextToken
Type: string

A continuation token, if this is a continuation request.

Result Syntax

[
    'GetResourcePoliciesResponseList' => [
        [
            'CreateTime' => <DateTime>,
            'PolicyHash' => '<string>',
            'PolicyInJson' => '<string>',
            'UpdateTime' => <DateTime>,
        ],
        // ...
    ],
    'NextToken' => '<string>',
]

Result Details

Members
GetResourcePoliciesResponseList
Type: Array of GluePolicy structures

A list of the individual resource policies and the account-level resource policy.

NextToken
Type: string

A continuation token, if the returned list does not contain the last resource policy available.

Errors

InternalServiceException:

An internal service error occurred.

OperationTimeoutException:

The operation timed out.

InvalidInputException:

The input provided was not valid.

GlueEncryptionException:

An encryption operation failed.

GetResourcePolicy

$result = $client->getResourcePolicy([/* ... */]);
$promise = $client->getResourcePolicyAsync([/* ... */]);

Retrieves a specified resource policy.

Parameter Syntax

$result = $client->getResourcePolicy([
    'ResourceArn' => '<string>',
]);

Parameter Details

Members
ResourceArn
Type: string

The ARN of the Glue resource for which to retrieve the resource policy. If not supplied, the Data Catalog resource policy is returned. Use GetResourcePolicies to view all existing resource policies. For more information, see Specifying Glue Resource ARNs.

Result Syntax

[
    'CreateTime' => <DateTime>,
    'PolicyHash' => '<string>',
    'PolicyInJson' => '<string>',
    'UpdateTime' => <DateTime>,
]

Result Details

Members
CreateTime
Type: timestamp (string|DateTime or anything parsable by strtotime)

The date and time at which the policy was created.

PolicyHash
Type: string

Contains the hash value associated with this policy.

PolicyInJson
Type: string

Contains the requested policy document, in JSON format.

UpdateTime
Type: timestamp (string|DateTime or anything parsable by strtotime)

The date and time at which the policy was last updated.
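Because PolicyInJson is returned as a string, it has to be decoded before it can be inspected. The sketch below decodes it into an associative array. The helper name and the sample policy document are hypothetical, and `JSON_THROW_ON_ERROR` assumes PHP 7.3 or later.

```php
<?php
// Hypothetical helper (not part of the AWS SDK): decodes a PolicyInJson
// string into an associative array, failing loudly on malformed input.
function decodePolicyDocument(string $policyInJson): array
{
    $decoded = json_decode($policyInJson, true, 512, JSON_THROW_ON_ERROR);
    if (!is_array($decoded)) {
        throw new UnexpectedValueException('Policy document is not a JSON object');
    }
    return $decoded;
}

// Invented sample standing in for $result['PolicyInJson'].
$policyInJson = '{"Version":"2012-10-17","Statement":[{"Effect":"Allow","Action":"glue:GetTable","Resource":"*"}]}';
$policy = decodePolicyDocument($policyInJson);
echo count($policy['Statement']) . " statement(s)\n";
```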

Errors

EntityNotFoundException:

A specified entity does not exist.

InternalServiceException:

An internal service error occurred.

OperationTimeoutException:

The operation timed out.

InvalidInputException:

The input provided was not valid.

GetSchema

$result = $client->getSchema([/* ... */]);
$promise = $client->getSchemaAsync([/* ... */]);

Describes the specified schema in detail.

Parameter Syntax

$result = $client->getSchema([
    'SchemaId' => [ // REQUIRED
        'RegistryName' => '<string>',
        'SchemaArn' => '<string>',
        'SchemaName' => '<string>',
    ],
]);

Parameter Details

Members
SchemaId
Required: Yes
Type: SchemaId structure

This is a wrapper structure to contain schema identity fields. The structure contains:

  • SchemaId$SchemaArn: The Amazon Resource Name (ARN) of the schema. Either SchemaArn, or both SchemaName and RegistryName, must be provided.

  • SchemaId$SchemaName: The name of the schema. Either SchemaArn, or both SchemaName and RegistryName, must be provided.
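The either-or rule for SchemaId can be checked before the request is sent. The sketch below is one way to do that; the helper name `buildSchemaId` is hypothetical, and the ARN and names in the usage lines are placeholders.

```php
<?php
// Hypothetical helper (not part of the AWS SDK): builds a SchemaId structure
// from either an ARN, or a schema name together with a registry name.
function buildSchemaId(?string $schemaArn = null, ?string $schemaName = null, ?string $registryName = null): array
{
    if ($schemaArn !== null) {
        return ['SchemaArn' => $schemaArn];
    }
    if ($schemaName !== null && $registryName !== null) {
        return ['SchemaName' => $schemaName, 'RegistryName' => $registryName];
    }
    throw new InvalidArgumentException(
        'Provide either a schema ARN, or both a schema name and a registry name'
    );
}

// Placeholder identifiers for illustration only.
$byArn = buildSchemaId('arn:aws:glue:us-east-1:123456789012:schema/example-registry/example-schema');
$byName = buildSchemaId(null, 'example-schema', 'example-registry');
// With a configured client: $result = $client->getSchema(['SchemaId' => $byName]);
```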

Result Syntax

[
    'Compatibility' => 'NONE|DISABLED|BACKWARD|BACKWARD_ALL|FORWARD|FORWARD_ALL|FULL|FULL_ALL',
    'CreatedTime' => '<string>',
    'DataFormat' => 'AVRO|JSON|PROTOBUF',
    'Description' => '<string>',
    'LatestSchemaVersion' => <integer>,
    'NextSchemaVersion' => <integer>,
    'RegistryArn' => '<string>',
    'RegistryName' => '<string>',
    'SchemaArn' => '<string>',
    'SchemaCheckpoint' => <integer>,
    'SchemaName' => '<string>',
    'SchemaStatus' => 'AVAILABLE|PENDING|DELETING',
    'UpdatedTime' => '<string>',
]

Result Details

Members
Compatibility
Type: string

The compatibility mode of the schema.

CreatedTime
Type: string

The date and time the schema was created.

DataFormat
Type: string

The data format of the schema definition. Currently AVRO, JSON and PROTOBUF are supported.

Description
Type: string

A description of the schema, if one was specified when it was created.

LatestSchemaVersion
Type: long (int|float)

The latest version of the schema associated with the returned schema definition.

NextSchemaVersion
Type: long (int|float)

The next version of the schema associated with the returned schema definition.

RegistryArn
Type: string

The Amazon Resource Name (ARN) of the registry.

RegistryName
Type: string

The name of the registry.

SchemaArn
Type: string

The Amazon Resource Name (ARN) of the schema.

SchemaCheckpoint
Type: long (int|float)

The version number of the checkpoint (the last time the compatibility mode was changed).

SchemaName
Type: string

The name of the schema.

SchemaStatus
Type: string

The status of the schema.

UpdatedTime
Type: string

The date and time the schema was updated.

Errors

InvalidInputException:

The input provided was not valid.

AccessDeniedException:

Access to a resource was denied.

EntityNotFoundException:

A specified entity does not exist.

InternalServiceException:

An internal service error occurred.

GetSchemaByDefinition

$result = $client->getSchemaByDefinition([/* ... */]);
$promise = $client->getSchemaByDefinitionAsync([/* ... */]);

Retrieves a schema by the SchemaDefinition. The schema definition is sent to the Schema Registry, canonicalized, and hashed. If the hash is matched within the scope of the SchemaName or ARN (or the default registry, if none is supplied), that schema's metadata is returned. Otherwise, a 404 or NotFound error is returned. Schema versions in Deleted status are not included in the results.

Parameter Syntax

$result = $client->getSchemaByDefinition([
    'SchemaDefinition' => '<string>', // REQUIRED
    'SchemaId' => [ // REQUIRED
        'RegistryName' => '<string>',
        'SchemaArn' => '<string>',
        'SchemaName' => '<string>',
    ],
]);

Parameter Details

Members
SchemaDefinition
Required: Yes
Type: string

The definition of the schema for which schema details are required.

SchemaId
Required: Yes
Type: SchemaId structure

This is a wrapper structure to contain schema identity fields. The structure contains:

  • SchemaId$SchemaArn: The Amazon Resource Name (ARN) of the schema. One of SchemaArn or SchemaName has to be provided.

  • SchemaId$SchemaName: The name of the schema. One of SchemaArn or SchemaName has to be provided.
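The SchemaId rule above (one of SchemaArn or SchemaName must be provided) can be checked before the request is sent. The following is a minimal sketch, not part of the SDK: buildSchemaId and schemaByDefinitionParams are hypothetical helper names, and preferring the ARN when both values are given is an assumption made for illustration.

```php
<?php
// Hypothetical helper (not an SDK function): builds the SchemaId wrapper.
// Assumption: when both an ARN and a name are given, the ARN is used.
function buildSchemaId(?string $schemaArn = null, ?string $schemaName = null, ?string $registryName = null): array
{
    if ($schemaArn === null && $schemaName === null) {
        throw new InvalidArgumentException('Provide either SchemaArn or SchemaName.');
    }
    if ($schemaArn !== null) {
        return ['SchemaArn' => $schemaArn];
    }
    $schemaId = ['SchemaName' => $schemaName];
    if ($registryName !== null) {
        $schemaId['RegistryName'] = $registryName;
    }
    return $schemaId;
}

// Hypothetical helper: assembles the parameter array for getSchemaByDefinition.
function schemaByDefinitionParams(string $schemaDefinition, array $schemaId): array
{
    return [
        'SchemaDefinition' => $schemaDefinition,
        'SchemaId' => $schemaId,
    ];
}

// Usage, assuming $client is a configured Aws\Glue\GlueClient and
// $definition holds the schema definition text:
// $result = $client->getSchemaByDefinition(
//     schemaByDefinitionParams($definition, buildSchemaId(null, 'orders', 'my-registry'))
// );
// echo $result['SchemaVersionId'];
```

The schema and registry names in the usage comment are placeholders.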

Result Syntax

[
    'CreatedTime' => '<string>',
    'DataFormat' => 'AVRO|JSON|PROTOBUF',
    'SchemaArn' => '<string>',
    'SchemaVersionId' => '<string>',
    'Status' => 'AVAILABLE|PENDING|FAILURE|DELETING',
]

Result Details

Members
CreatedTime
Type: string

The date and time the schema was created.

DataFormat
Type: string

The data format of the schema definition. Currently AVRO, JSON and PROTOBUF are supported.

SchemaArn
Type: string

The Amazon Resource Name (ARN) of the schema.

SchemaVersionId
Type: string

The schema ID of the schema version.

Status
Type: string

The status of the schema version.

Errors

InvalidInputException:

The input provided was not valid.

AccessDeniedException:

Access to a resource was denied.

EntityNotFoundException:

A specified entity does not exist.

InternalServiceException:

An internal service error occurred.

GetSchemaVersion

$result = $client->getSchemaVersion([/* ... */]);
$promise = $client->getSchemaVersionAsync([/* ... */]);

Retrieves the specified schema version by the unique ID assigned when a version of the schema is created or registered. Schema versions in Deleted status are not included in the results.

Parameter Syntax

$result = $client->getSchemaVersion([
    'SchemaId' => [
        'RegistryName' => '<string>',
        'SchemaArn' => '<string>',
        'SchemaName' => '<string>',
    ],
    'SchemaVersionId' => '<string>',
    'SchemaVersionNumber' => [
        'LatestVersion' => true || false,
        'VersionNumber' => <integer>,
    ],
]);

Parameter Details

Members
SchemaId
Type: SchemaId structure

This is a wrapper structure to contain schema identity fields. The structure contains:

  • SchemaId$SchemaArn: The Amazon Resource Name (ARN) of the schema. Either SchemaArn, or both SchemaName and RegistryName, must be provided.

  • SchemaId$SchemaName: The name of the schema. Either SchemaArn, or both SchemaName and RegistryName, must be provided.

SchemaVersionId
Type: string

The SchemaVersionId of the schema version. This field is required for fetching by schema ID. Either this or the SchemaId wrapper has to be provided.

SchemaVersionNumber
Type: SchemaVersionNumber structure

The version number of the schema.
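The parameters above describe two ways to identify a schema version: directly by SchemaVersionId, or by the SchemaId wrapper together with a SchemaVersionNumber. The sketch below builds each request shape. Both function names are hypothetical, and sending only one of LatestVersion or VersionNumber is an assumption of this example rather than a documented requirement.

```php
<?php
// Hypothetical helper: look up a schema version by its unique version ID.
function schemaVersionParamsById(string $schemaVersionId): array
{
    return ['SchemaVersionId' => $schemaVersionId];
}

// Hypothetical helper: look up a schema version by schema identity plus
// version number. Passing null for $versionNumber requests the latest
// version (assumption: only one of the two fields is sent).
function schemaVersionParamsByNumber(array $schemaId, ?int $versionNumber = null): array
{
    return [
        'SchemaId' => $schemaId,
        'SchemaVersionNumber' => $versionNumber === null
            ? ['LatestVersion' => true]
            : ['VersionNumber' => $versionNumber],
    ];
}

// Usage, assuming $client is a configured Aws\Glue\GlueClient:
// $result = $client->getSchemaVersion(
//     schemaVersionParamsByNumber(['SchemaName' => 'orders', 'RegistryName' => 'my-registry'], 3)
// );
// echo $result['SchemaDefinition'];
```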

Result Syntax

[
    'CreatedTime' => '<string>',
    'DataFormat' => 'AVRO|JSON|PROTOBUF',
    'SchemaArn' => '<string>',
    'SchemaDefinition' => '<string>',
    'SchemaVersionId' => '<string>',
    'Status' => 'AVAILABLE|PENDING|FAILURE|DELETING',
    'VersionNumber' => <integer>,
]

Result Details

Members
CreatedTime
Type: string

The date and time the schema version was created.

DataFormat
Type: string

The data format of the schema definition. Currently AVRO, JSON and PROTOBUF are supported.

SchemaArn
Type: string

The Amazon Resource Name (ARN) of the schema.

SchemaDefinition
Type: string

The schema definition for the schema ID.

SchemaVersionId
Type: string

The SchemaVersionId of the schema version.

Status
Type: string

The status of the schema version.

VersionNumber
Type: long (int|float)

The version number of the schema.

Errors

InvalidInputException:

The input provided was not valid.

AccessDeniedException:

Access to a resource was denied.

EntityNotFoundException:

A specified entity does not exist.

InternalServiceException:

An internal service error occurred.

GetSchemaVersionsDiff

$result = $client->getSchemaVersionsDiff([/* ... */]);
$promise = $client->getSchemaVersionsDiffAsync([/* ... */]);

Fetches the schema version difference in the specified difference type between two stored schema versions in the Schema Registry.

This API lets you compare two versions of the same schema, that is, two schema definitions registered under one schema.

Parameter Syntax

$result = $client->getSchemaVersionsDiff([
    'FirstSchemaVersionNumber' => [ // REQUIRED
        'LatestVersion' => true || false,
        'VersionNumber' => <integer>,
    ],
    'SchemaDiffType' => 'SYNTAX_DIFF', // REQUIRED
    'SchemaId' => [ // REQUIRED
        'RegistryName' => '<string>',
        'SchemaArn' => '<string>',
        'SchemaName' => '<string>',
    ],
    'SecondSchemaVersionNumber' => [ // REQUIRED
        'LatestVersion' => true || false,
        'VersionNumber' => <integer>,
    ],
]);

Parameter Details

Members
FirstSchemaVersionNumber
Required: Yes
Type: SchemaVersionNumber structure

The first of the two schema versions to be compared.

SchemaDiffType
Required: Yes
Type: string

The type of difference to compute. SYNTAX_DIFF is the currently supported diff type.

SchemaId
Required: Yes
Type: SchemaId structure

This is a wrapper structure to contain schema identity fields. The structure contains:

  • SchemaId$SchemaArn: The Amazon Resource Name (ARN) of the schema. One of SchemaArn or SchemaName has to be provided.

  • SchemaId$SchemaName: The name of the schema. One of SchemaArn or SchemaName has to be provided.

SecondSchemaVersionNumber
Required: Yes
Type: SchemaVersionNumber structure

The second of the two schema versions to be compared.

Result Syntax

[
    'Diff' => '<string>',
]

Result Details

Members
Diff
Type: string

The difference between schemas as a string in JsonPatch format.
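Because Diff arrives as a string, it has to be decoded before it can be inspected. The sketch below assumes the string is a JSON Patch document in the usual form, a JSON array of objects with "op" and "path" keys. The function name summarizeDiff is hypothetical.

```php
<?php
// Hypothetical helper: turns a JSON Patch string into short "op path" lines.
// Assumption: the Diff string is a JSON array of objects with "op" and
// "path" keys, as in standard JSON Patch documents.
function summarizeDiff(string $diff): array
{
    $operations = json_decode($diff, true, 512, JSON_THROW_ON_ERROR);
    if (!is_array($operations)) {
        throw new UnexpectedValueException('Expected a JSON array of patch operations.');
    }

    $summary = [];
    foreach ($operations as $operation) {
        $summary[] = sprintf('%s %s', $operation['op'] ?? '?', $operation['path'] ?? '?');
    }
    return $summary;
}

// Usage, assuming $result came from $client->getSchemaVersionsDiff([...]):
// foreach (summarizeDiff($result['Diff']) as $line) {
//     echo $line, PHP_EOL;
// }
```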

Errors

InvalidInputException:

The input provided was not valid.

EntityNotFoundException:

A specified entity does not exist.

AccessDeniedException:

Access to a resource was denied.

InternalServiceException:

An internal service error occurred.

GetSecurityConfiguration

$result = $client->getSecurityConfiguration([/* ... */]);
$promise = $client->getSecurityConfigurationAsync([/* ... */]);

Retrieves a specified security configuration.

Parameter Syntax

$result = $client->getSecurityConfiguration([
    'Name' => '<string>', // REQUIRED
]);

Parameter Details

Members
Name
Required: Yes
Type: string

The name of the security configuration to retrieve.

Result Syntax

[
    'SecurityConfiguration' => [
        'CreatedTimeStamp' => <DateTime>,
        'EncryptionConfiguration' => [
            'CloudWatchEncryption' => [
                'CloudWatchEncryptionMode' => 'DISABLED|SSE-KMS',
                'KmsKeyArn' => '<string>',
            ],
            'JobBookmarksEncryption' => [
                'JobBookmarksEncryptionMode' => 'DISABLED|CSE-KMS',
                'KmsKeyArn' => '<string>',
            ],
            'S3Encryption' => [
                [
                    'KmsKeyArn' => '<string>',
                    'S3EncryptionMode' => 'DISABLED|SSE-KMS|SSE-S3',
                ],
                // ...
            ],
        ],
        'Name' => '<string>',
    ],
]

Result Details

Members
SecurityConfiguration
Type: SecurityConfiguration structure

The requested security configuration.

Errors

EntityNotFoundException:

A specified entity does not exist.

InvalidInputException:

The input provided was not valid.

InternalServiceException:

An internal service error occurred.

OperationTimeoutException:

The operation timed out.

GetSecurityConfigurations

$result = $client->getSecurityConfigurations([/* ... */]);
$promise = $client->getSecurityConfigurationsAsync([/* ... */]);

Retrieves a list of all security configurations.

Parameter Syntax

$result = $client->getSecurityConfigurations([
    'MaxResults' => <integer>,
    'NextToken' => '<string>',
]);

Parameter Details

Members
MaxResults
Type: int

The maximum number of results to return.

NextToken
Type: string

A continuation token, if this is a continuation call.

Result Syntax

[
    'NextToken' => '<string>',
    'SecurityConfigurations' => [
        [
            'CreatedTimeStamp' => <DateTime>,
            'EncryptionConfiguration' => [
                'CloudWatchEncryption' => [
                    'CloudWatchEncryptionMode' => 'DISABLED|SSE-KMS',
                    'KmsKeyArn' => '<string>',
                ],
                'JobBookmarksEncryption' => [
                    'JobBookmarksEncryptionMode' => 'DISABLED|CSE-KMS',
                    'KmsKeyArn' => '<string>',
                ],
                'S3Encryption' => [
                    [
                        'KmsKeyArn' => '<string>',
                        'S3EncryptionMode' => 'DISABLED|SSE-KMS|SSE-S3',
                    ],
                    // ...
                ],
            ],
            'Name' => '<string>',
        ],
        // ...
    ],
]

Result Details

Members
NextToken
Type: string

A continuation token, if there are more security configurations to return.

SecurityConfigurations
Type: Array of SecurityConfiguration structures

A list of security configurations.
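The NextToken request and result members support fetching the list page by page. The following sketch shows one way to loop over all pages. The function name paginateSecurityConfigurations is hypothetical, and the page fetcher is passed in as a callable so the loop can be exercised without a live client.

```php
<?php
// Hypothetical helper: yields every security configuration across pages.
// $fetchPage receives the request parameters and returns one page of results
// (an array or any array-accessible result object).
function paginateSecurityConfigurations(callable $fetchPage, ?int $maxResults = null): Generator
{
    $params = [];
    if ($maxResults !== null) {
        $params['MaxResults'] = $maxResults;
    }

    do {
        $page = $fetchPage($params);
        foreach ($page['SecurityConfigurations'] ?? [] as $configuration) {
            yield $configuration;
        }
        // Carry the continuation token into the next request, if any.
        $nextToken = $page['NextToken'] ?? null;
        $params['NextToken'] = $nextToken;
    } while ($nextToken !== null && $nextToken !== '');
}

// Usage, assuming $client is a configured Aws\Glue\GlueClient:
// $fetch = function (array $params) use ($client) {
//     return $client->getSecurityConfigurations($params);
// };
// foreach (paginateSecurityConfigurations($fetch, 50) as $configuration) {
//     echo $configuration['Name'], PHP_EOL;
// }
```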

Errors

EntityNotFoundException:

A specified entity does not exist.

InvalidInputException:

The input provided was not valid.

InternalServiceException:

An internal service error occurred.

OperationTimeoutException:

The operation timed out.

GetSession

$result = $client->getSession([/* ... */]);
$promise = $client->getSessionAsync([/* ... */]);

Retrieves the session.

Parameter Syntax

$result = $client->getSession([
    'Id' => '<string>', // REQUIRED
    'RequestOrigin' => '<string>',
]);

Parameter Details

Members
Id
Required: Yes
Type: string

The ID of the session.

RequestOrigin
Type: string

The origin of the request.

Result Syntax

[
    'Session' => [
        'Command' => [
            'Name' => '<string>',
            'PythonVersion' => '<string>',
        ],
        'CompletedOn' => <DateTime>,
        'Connections' => [
            'Connections' => ['<string>', ...],
        ],
        'CreatedOn' => <DateTime>,
        'DPUSeconds' => <float>,
        'DefaultArguments' => ['<string>', ...],
        'Description' => '<string>',
        'ErrorMessage' => '<string>',
        'ExecutionTime' => <float>,
        'GlueVersion' => '<string>',
        'Id' => '<string>',
        'IdleTimeout' => <integer>,
        'MaxCapacity' => <float>,
        'NumberOfWorkers' => <integer>,
        'Progress' => <float>,
        'Role' => '<string>',
        'SecurityConfiguration' => '<string>',
        'Status' => 'PROVISIONING|READY|FAILED|TIMEOUT|STOPPING|STOPPED',
        'WorkerType' => 'Standard|G.1X|G.2X|G.025X|G.4X|G.8X|Z.2X',
    ],
]

Result Details

Members
Session
Type: Session structure

The session object is returned in the response.

Errors

AccessDeniedException:

Access to a resource was denied.

EntityNotFoundException:

A specified entity does not exist.

InternalServiceException:

An internal service error occurred.

OperationTimeoutException:

The operation timed out.

InvalidInputException:

The input provided was not valid.

GetStatement

$result = $client->getStatement([/* ... */]);
$promise = $client->getStatementAsync([/* ... */]);

Retrieves the statement.

Parameter Syntax

$result = $client->getStatement([
    'Id' => <integer>, // REQUIRED
    'RequestOrigin' => '<string>',
    'SessionId' => '<string>', // REQUIRED
]);

Parameter Details

Members
Id
Required: Yes
Type: int

The ID of the statement.

RequestOrigin
Type: string

The origin of the request.

SessionId
Required: Yes
Type: string

The Session ID of the statement.

Result Syntax

[
    'Statement' => [
        'Code' => '<string>',
        'CompletedOn' => <integer>,
        'Id' => <integer>,
        'Output' => [
            'Data' => [
                'TextPlain' => '<string>',
            ],
            'ErrorName' => '<string>',
            'ErrorValue' => '<string>',
            'ExecutionCount' => <integer>,
            'Status' => 'WAITING|RUNNING|AVAILABLE|CANCELLING|CANCELLED|ERROR',
            'Traceback' => ['<string>', ...],
        ],
        'Progress' => <float>,
        'StartedOn' => <integer>,
        'State' => 'WAITING|RUNNING|AVAILABLE|CANCELLING|CANCELLED|ERROR',
    ],
]

Result Details

Members
Statement
Type: Statement structure

Returns the statement.
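A caller will often poll the statement until its State stops changing. The sketch below does that with a caller-supplied fetcher. The function name waitForStatement is hypothetical, and treating AVAILABLE, CANCELLED, and ERROR as final states is an assumption that this reference does not confirm.

```php
<?php
// Hypothetical helper: polls until the statement reaches a final state.
// Assumption: AVAILABLE, CANCELLED and ERROR are final; WAITING, RUNNING
// and CANCELLING are still in progress.
function waitForStatement(callable $fetchStatement, int $maxAttempts = 30, int $delaySeconds = 2): array
{
    $finalStates = ['AVAILABLE', 'CANCELLED', 'ERROR'];

    for ($attempt = 1; $attempt <= $maxAttempts; $attempt++) {
        $result = $fetchStatement();
        $statement = $result['Statement'];
        if (in_array($statement['State'], $finalStates, true)) {
            return $statement;
        }
        // Pause between attempts, but not after the last one.
        if ($attempt < $maxAttempts && $delaySeconds > 0) {
            sleep($delaySeconds);
        }
    }

    throw new RuntimeException("Statement did not finish after {$maxAttempts} attempts.");
}

// Usage, assuming $client is a configured Aws\Glue\GlueClient:
// $statement = waitForStatement(function () use ($client, $sessionId, $statementId) {
//     return $client->getStatement(['SessionId' => $sessionId, 'Id' => $statementId]);
// });
// echo $statement['Output']['Data']['TextPlain'] ?? '';
```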

Errors

AccessDeniedException:

Access to a resource was denied.

EntityNotFoundException:

A specified entity does not exist.

InternalServiceException:

An internal service error occurred.

OperationTimeoutException:

The operation timed out.

InvalidInputException:

The input provided was not valid.

IllegalSessionStateException:

The session is in an invalid state to perform a requested operation.

GetTable

$result = $client->getTable([/* ... */]);
$promise = $client->getTableAsync([/* ... */]);

Retrieves the Table definition in a Data Catalog for a specified table.

Parameter Syntax

$result = $client->getTable([
    'CatalogId' => '<string>',
    'DatabaseName' => '<string>', // REQUIRED
    'Name' => '<string>', // REQUIRED
    'QueryAsOfTime' => <integer || string || DateTime>,
    'TransactionId' => '<string>',
]);

Parameter Details

Members
CatalogId
Type: string

The ID of the Data Catalog where the table resides. If none is provided, the Amazon Web Services account ID is used by default.

DatabaseName
Required: Yes
Type: string

The name of the database in the catalog in which the table resides. For Hive compatibility, this name is entirely lowercase.

Name
Required: Yes
Type: string

The name of the table for which to retrieve the definition. For Hive compatibility, this name is entirely lowercase.

QueryAsOfTime
Type: timestamp (string|DateTime or anything parsable by strtotime)

The point in time as of which to read the table contents. If not set, the most recent transaction commit time is used. Cannot be specified together with TransactionId.

TransactionId
Type: string

The transaction ID at which to read the table contents.
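QueryAsOfTime and TransactionId cannot be combined, so it can help to reject that combination before calling the service. The following is a minimal sketch. The function name getTableParams is hypothetical, and the database and table names in the usage comment are placeholders.

```php
<?php
// Hypothetical helper: assembles getTable parameters and enforces the
// documented rule that QueryAsOfTime and TransactionId are mutually exclusive.
function getTableParams(
    string $databaseName,
    string $name,
    ?DateTimeInterface $queryAsOfTime = null,
    ?string $transactionId = null,
    ?string $catalogId = null
): array {
    if ($queryAsOfTime !== null && $transactionId !== null) {
        throw new InvalidArgumentException('QueryAsOfTime cannot be combined with TransactionId.');
    }

    $params = ['DatabaseName' => $databaseName, 'Name' => $name];
    if ($catalogId !== null) {
        $params['CatalogId'] = $catalogId;
    }
    if ($queryAsOfTime !== null) {
        $params['QueryAsOfTime'] = $queryAsOfTime;
    }
    if ($transactionId !== null) {
        $params['TransactionId'] = $transactionId;
    }
    return $params;
}

// Usage, assuming $client is a configured Aws\Glue\GlueClient:
// $result = $client->getTable(
//     getTableParams('sales', 'orders', new DateTimeImmutable('2024-01-01T00:00:00Z'))
// );
// echo $result['Table']['Name'];
```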

Result Syntax

[
    'Table' => [
        'CatalogId' => '<string>',
        'CreateTime' => <DateTime>,
        'CreatedBy' => '<string>',
        'DatabaseName' => '<string>',
        'Description' => '<string>',
        'FederatedTable' => [
            'ConnectionName' => '<string>',
            'DatabaseIdentifier' => '<string>',
            'Identifier' => '<string>',
        ],
        'IsMultiDialectView' => true || false,
        'IsRegisteredWithLakeFormation' => true || false,
        'LastAccessTime' => <DateTime>,
        'LastAnalyzedTime' => <DateTime>,
        'Name' => '<string>',
        'Owner' => '<string>',
        'Parameters' => ['<string>', ...],
        'PartitionKeys' => [
            [
                'Comment' => '<string>',
                'Name' => '<string>',
                'Parameters' => ['<string>', ...],
                'Type' => '<string>',
            ],
            // ...
        ],
        'Retention' => <integer>,
        'StorageDescriptor' => [
            'AdditionalLocations' => ['<string>', ...],
            'BucketColumns' => ['<string>', ...],
            'Columns' => [
                [
                    'Comment' => '<string>',
                    'Name' => '<string>',
                    'Parameters' => ['<string>', ...],
                    'Type' => '<string>',
                ],
                // ...
            ],
            'Compressed' => true || false,
            'InputFormat' => '<string>',
            'Location' => '<string>',
            'NumberOfBuckets' => <integer>,
            'OutputFormat' => '<string>',
            'Parameters' => ['<string>', ...],
            'SchemaReference' => [
                'SchemaId' => [
                    'RegistryName' => '<string>',
                    'SchemaArn' => '<string>',
                    'SchemaName' => '<string>',
                ],
                'SchemaVersionId' => '<string>',
                'SchemaVersionNumber' => <integer>,
            ],
            'SerdeInfo' => [
                'Name' => '<string>',
                'Parameters' => ['<string>', ...],
                'SerializationLibrary' => '<string>',
            ],
            'SkewedInfo' => [
                'SkewedColumnNames' => ['<string>', ...],
                'SkewedColumnValueLocationMaps' => ['<string>', ...],
                'SkewedColumnValues' => ['<string>', ...],
            ],
            'SortColumns' => [
                [
                    'Column' => '<string>',
                    'SortOrder' => <integer>,
                ],
                // ...
            ],
            'StoredAsSubDirectories' => true || false,
        ],
        'TableType' => '<string>',
        'TargetTable' => [
            'CatalogId' => '<string>',
            'DatabaseName' => '<string>',
            'Name' => '<string>',
            'Region' => '<string>',
        ],
        'UpdateTime' => <DateTime>,
        'VersionId' => '<string>',
        'ViewDefinition' => [
            'Definer' => '<string>',
            'IsProtected' => true || false,
            'Representations' => [
                [
                    'Dialect' => 'REDSHIFT|ATHENA|SPARK',
                    'DialectVersion' => '<string>',
                    'IsStale' => true || false,
                    'ViewExpandedText' => '<string>',
                    'ViewOriginalText' => '<string>',
                ],
                // ...
            ],
            'SubObjects' => ['<string>', ...],
        ],
        'ViewExpandedText' => '<string>',
        'ViewOriginalText' => '<string>',
    ],
]

Result Details

Members
Table
Type: Table structure

The Table object that defines the specified table.
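The Table structure keeps regular columns under StorageDescriptor and partition columns under PartitionKeys, so a full column listing has to read both. The sketch below merges them into one map keyed by column name. The function name describeTableColumns and the shape of the returned map are hypothetical.

```php
<?php
// Hypothetical helper: maps each column name to its type and whether it is
// a partition key, based on the Table structure in the result syntax.
function describeTableColumns(array $table): array
{
    $columns = [];
    foreach ($table['StorageDescriptor']['Columns'] ?? [] as $column) {
        $columns[$column['Name']] = [
            'type' => $column['Type'] ?? null,
            'partitionKey' => false,
        ];
    }
    foreach ($table['PartitionKeys'] ?? [] as $key) {
        $columns[$key['Name']] = [
            'type' => $key['Type'] ?? null,
            'partitionKey' => true,
        ];
    }
    return $columns;
}

// Usage, assuming $result came from $client->getTable([...]):
// foreach (describeTableColumns($result['Table']) as $name => $info) {
//     echo $name, ': ', $info['type'], $info['partitionKey'] ? ' (partition)' : '', PHP_EOL;
// }
```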

Errors

EntityNotFoundException:

A specified entity does not exist.

InvalidInputException:

The input provided was not valid.

InternalServiceException:

An internal service error occurred.

OperationTimeoutException:

The operation timed out.

GlueEncryptionException:

An encryption operation failed.

ResourceNotReadyException:

A resource was not ready for a transaction.

FederationSourceException:

A federation source failed.

FederationSourceRetryableException:

A federation source failed, but the operation may be retried.

GetTableOptimizer

$result = $client->getTableOptimizer([/* ... */]);
$promise = $client->getTableOptimizerAsync([/* ... */]);

Returns the configuration of all optimizers associated with a specified table.

Parameter Syntax

$result = $client->getTableOptimizer([
    'CatalogId' => '<string>', // REQUIRED
    'DatabaseName' => '<string>', // REQUIRED
    'TableName' => '<string>', // REQUIRED
    'Type' => 'compaction', // REQUIRED
]);

Parameter Details

Members
CatalogId
Required: Yes
Type: string

The Catalog ID of the table.

DatabaseName
Required: Yes
Type: string

The name of the database in the catalog in which the table resides.

TableName
Required: Yes
Type: string

The name of the table.

Type
Required: Yes
Type: string

The type of table optimizer.

Result Syntax

[
    'CatalogId' => '<string>',
    'DatabaseName' => '<string>',
    'TableName' => '<string>',
    'TableOptimizer' => [
        'configuration' => [
            'enabled' => true || false,
            'roleArn' => '<string>',
        ],
        'lastRun' => [
            'endTimestamp' => <DateTime>,
            'error' => '<string>',
            'eventType' => 'starting|completed|failed|in_progress',
            'metrics' => [
                'JobDurationInHour' => '<string>',
                'NumberOfBytesCompacted' => '<string>',
                'NumberOfDpus' => '<string>',
                'NumberOfFilesCompacted' => '<string>',
            ],
            'startTimestamp' => <DateTime>,
        ],
        'type' => 'compaction',
    ],
]

Result Details

Members
CatalogId
Type: string

The Catalog ID of the table.

DatabaseName
Type: string

The name of the database in the catalog in which the table resides.

TableName
Type: string

The name of the table.

TableOptimizer
Type: TableOptimizer structure

The optimizer associated with the specified table.

Errors

EntityNotFoundException:

A specified entity does not exist.

InvalidInputException:

The input provided was not valid.

AccessDeniedException:

Access to a resource was denied.

InternalServiceException:

An internal service error occurred.

GetTableVersion

$result = $client->getTableVersion([/* ... */]);
$promise = $client->getTableVersionAsync([/* ... */]);

Retrieves a specified version of a table.

Parameter Syntax

$result = $client->getTableVersion([
    'CatalogId' => '<string>',
    'DatabaseName' => '<string>', // REQUIRED
    'TableName' => '<string>', // REQUIRED
    'VersionId' => '<string>',
]);

Parameter Details

Members
CatalogId
Type: string

The ID of the Data Catalog where the tables reside. If none is provided, the Amazon Web Services account ID is used by default.

DatabaseName
Required: Yes
Type: string

The database in the catalog in which the table resides. For Hive compatibility, this name is entirely lowercase.

TableName
Required: Yes
Type: string

The name of the table. For Hive compatibility, this name is entirely lowercase.

VersionId
Type: string

The ID value of the table version to be retrieved. A VersionId is a string representation of an integer. Each new version increments the value by 1.
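Because version IDs are integers carried as strings, the ID of the preceding version can be derived with simple arithmetic. The sketch below is illustrative only. The function name previousTableVersionId is hypothetical, and returning null when the result would be negative is an assumption, since this reference does not state the lowest valid version ID.

```php
<?php
// Hypothetical helper: derives the preceding version ID from a VersionId.
// Assumption: IDs are non-negative decimal integers stored as strings.
function previousTableVersionId(string $versionId): ?string
{
    if (preg_match('/^[0-9]+$/', $versionId) !== 1) {
        throw new InvalidArgumentException("Not an integer version ID: {$versionId}");
    }
    $previous = (int) $versionId - 1;
    return $previous >= 0 ? (string) $previous : null;
}

// Usage, assuming $client is a configured Aws\Glue\GlueClient and
// $currentVersionId was read from an earlier result:
// $previousId = previousTableVersionId($currentVersionId);
// if ($previousId !== null) {
//     $result = $client->getTableVersion([
//         'DatabaseName' => 'sales',
//         'TableName' => 'orders',
//         'VersionId' => $previousId,
//     ]);
// }
```

The previous version may have been deleted, in which case the call can still fail with EntityNotFoundException.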

Result Syntax

[
    'TableVersion' => [
        'Table' => [
            'CatalogId' => '<string>',
            'CreateTime' => <DateTime>,
            'CreatedBy' => '<string>',
            'DatabaseName' => '<string>',
            'Description' => '<string>',
            'FederatedTable' => [
                'ConnectionName' => '<string>',
                'DatabaseIdentifier' => '<string>',
                'Identifier' => '<string>',
            ],
            'IsMultiDialectView' => true || false,
            'IsRegisteredWithLakeFormation' => true || false,
            'LastAccessTime' => <DateTime>,
            'LastAnalyzedTime' => <DateTime>,
            'Name' => '<string>',
            'Owner' => '<string>',
            'Parameters' => ['<string>', ...],
            'PartitionKeys' => [
                [
                    'Comment' => '<string>',
                    'Name' => '<string>',
                    'Parameters' => ['<string>', ...],
                    'Type' => '<string>',
                ],
                // ...
            ],
            'Retention' => <integer>,
            'StorageDescriptor' => [
                'AdditionalLocations' => ['<string>', ...],
                'BucketColumns' => ['<string>', ...],
                'Columns' => [
                    [
                        'Comment' => '<string>',
                        'Name' => '<string>',
                        'Parameters' => ['<string>', ...],
                        'Type' => '<string>',
                    ],
                    // ...
                ],
                'Compressed' => true || false,
                'InputFormat' => '<string>',
                'Location' => '<string>',
                'NumberOfBuckets' => <integer>,
                'OutputFormat' => '<string>',
                'Parameters' => ['<string>', ...],
                'SchemaReference' => [
                    'SchemaId' => [
                        'RegistryName' => '<string>',
                        'SchemaArn' => '<string>',
                        'SchemaName' => '<string>',
                    ],
                    'SchemaVersionId' => '<string>',
                    'SchemaVersionNumber' => <integer>,
                ],
                'SerdeInfo' => [
                    'Name' => '<string>',
                    'Parameters' => ['<string>', ...],
                    'SerializationLibrary' => '<string>',
                ],
                'SkewedInfo' => [
                    'SkewedColumnNames' => ['<string>', ...],
                    'SkewedColumnValueLocationMaps' => ['<string>', ...],
                    'SkewedColumnValues' => ['<string>', ...],
                ],
                'SortColumns' => [
                    [
                        'Column' => '<string>',
                        'SortOrder' => <integer>,
                    ],
                    // ...
                ],
                'StoredAsSubDirectories' => true || false,
            ],
            'TableType' => '<string>',
            'TargetTable' => [
                'CatalogId' => '<string>',
                'DatabaseName' => '<string>',
                'Name' => '<string>',
                'Region' => '<string>',
            ],
            'UpdateTime' => <DateTime>,
            'VersionId' => '<string>',
            'ViewDefinition' => [
                'Definer' => '<string>',
                'IsProtected' => true || false,
                'Representations' => [
                    [
                        'Dialect' => 'REDSHIFT|ATHENA|SPARK',
                        'DialectVersion' => '<string>',
                        'IsStale' => true || false,
                        'ViewExpandedText' => '<string>',
                        'ViewOriginalText' => '<string>',
                    ],
                    // ...
                ],
                'SubObjects' => ['<string>', ...],
            ],
            'ViewExpandedText' => '<string>',
            'ViewOriginalText' => '<string>',
        ],
        'VersionId' => '<string>',
    ],
]

Result Details

Members
TableVersion
Type: TableVersion structure

The requested table version.

Errors

EntityNotFoundException:

A specified entity does not exist.

InvalidInputException:

The input provided was not valid.

InternalServiceException:

An internal service error occurred.

OperationTimeoutException:

The operation timed out.

GlueEncryptionException:

An encryption operation failed.

GetTableVersions

$result = $client->getTableVersions([/* ... */]);
$promise = $client->getTableVersionsAsync([/* ... */]);

Retrieves a list of strings that identify available versions of a specified table.

Parameter Syntax

$result = $client->getTableVersions([
    'CatalogId' => '<string>',
    'DatabaseName' => '<string>', // REQUIRED
    'MaxResults' => <integer>,
    'NextToken' => '<string>',
    'TableName' => '<string>', // REQUIRED
]);

Parameter Details

Members
CatalogId
Type: string

The ID of the Data Catalog where the tables reside. If none is provided, the Amazon Web Services account ID is used by default.

DatabaseName
Required: Yes
Type: string

The database in the catalog in which the table resides. For Hive compatibility, this name is entirely lowercase.

MaxResults
Type: int

The maximum number of table versions to return in one response.

NextToken
Type: string

A continuation token, if this is not the first call.

TableName
Required: Yes
Type: string

The name of the table. For Hive compatibility, this name is entirely lowercase.

Result Syntax

[
    'NextToken' => '<string>',
    'TableVersions' => [
        [
            'Table' => [
                'CatalogId' => '<string>',
                'CreateTime' => <DateTime>,
                'CreatedBy' => '<string>',
                'DatabaseName' => '<string>',
                'Description' => '<string>',
                'FederatedTable' => [
                    'ConnectionName' => '<string>',
                    'DatabaseIdentifier' => '<string>',
                    'Identifier' => '<string>',
                ],
                'IsMultiDialectView' => true || false,
                'IsRegisteredWithLakeFormation' => true || false,
                'LastAccessTime' => <DateTime>,
                'LastAnalyzedTime' => <DateTime>,
                'Name' => '<string>',
                'Owner' => '<string>',
                'Parameters' => ['<string>', ...],
                'PartitionKeys' => [
                    [
                        'Comment' => '<string>',
                        'Name' => '<string>',
                        'Parameters' => ['<string>', ...],
                        'Type' => '<string>',
                    ],
                    // ...
                ],
                'Retention' => <integer>,
                'StorageDescriptor' => [
                    'AdditionalLocations' => ['<string>', ...],
                    'BucketColumns' => ['<string>', ...],
                    'Columns' => [
                        [
                            'Comment' => '<string>',
                            'Name' => '<string>',
                            'Parameters' => ['<string>', ...],
                            'Type' => '<string>',
                        ],
                        // ...
                    ],
                    'Compressed' => true || false,
                    'InputFormat' => '<string>',
                    'Location' => '<string>',
                    'NumberOfBuckets' => <integer>,
                    'OutputFormat' => '<string>',
                    'Parameters' => ['<string>', ...],
                    'SchemaReference' => [
                        'SchemaId' => [
                            'RegistryName' => '<string>',
                            'SchemaArn' => '<string>',
                            'SchemaName' => '<string>',
                        ],
                        'SchemaVersionId' => '<string>',
                        'SchemaVersionNumber' => <integer>,
                    ],
                    'SerdeInfo' => [
                        'Name' => '<string>',
                        'Parameters' => ['<string>', ...],
                        'SerializationLibrary' => '<string>',
                    ],
                    'SkewedInfo' => [
                        'SkewedColumnNames' => ['<string>', ...],
                        'SkewedColumnValueLocationMaps' => ['<string>', ...],
                        'SkewedColumnValues' => ['<string>', ...],
                    ],
                    'SortColumns' => [
                        [
                            'Column' => '<string>',
                            'SortOrder' => <integer>,
                        ],
                        // ...
                    ],
                    'StoredAsSubDirectories' => true || false,
                ],
                'TableType' => '<string>',
                'TargetTable' => [
                    'CatalogId' => '<string>',
                    'DatabaseName' => '<string>',
                    'Name' => '<string>',
                    'Region' => '<string>',
                ],
                'UpdateTime' => <DateTime>,
                'VersionId' => '<string>',
                'ViewDefinition' => [
                    'Definer' => '<string>',
                    'IsProtected' => true || false,
                    'Representations' => [
                        [
                            'Dialect' => 'REDSHIFT|ATHENA|SPARK',
                            'DialectVersion' => '<string>',
                            'IsStale' => true || false,
                            'ViewExpandedText' => '<string>',
                            'ViewOriginalText' => '<string>',
                        ],
                        // ...
                    ],
                    'SubObjects' => ['<string>', ...],
                ],
                'ViewExpandedText' => '<string>',
                'ViewOriginalText' => '<string>',
            ],
            'VersionId' => '<string>',
        ],
        // ...
    ],
]

Result Details

Members
NextToken
Type: string

A continuation token, if the list of available versions does not include the last one.

TableVersions
Type: Array of TableVersion structures

A list of TableVersion structures identifying the available versions of the specified table.

Errors

EntityNotFoundException:

A specified entity does not exist.

InvalidInputException:

The input provided was not valid.

InternalServiceException:

An internal service error occurred.

OperationTimeoutException:

The operation timed out.

GlueEncryptionException:

An encryption operation failed.

GetTables

$result = $client->getTables([/* ... */]);
$promise = $client->getTablesAsync([/* ... */]);

Retrieves the definitions of some or all of the tables in a given database.

Parameter Syntax

$result = $client->getTables([
    'CatalogId' => '<string>',
    'DatabaseName' => '<string>', // REQUIRED
    'Expression' => '<string>',
    'MaxResults' => <integer>,
    'NextToken' => '<string>',
    'QueryAsOfTime' => <integer || string || DateTime>,
    'TransactionId' => '<string>',
]);

Parameter Details

Members
CatalogId
Type: string

The ID of the Data Catalog where the tables reside. If none is provided, the Amazon Web Services account ID is used by default.

DatabaseName
Required: Yes
Type: string

The database in the catalog whose tables to list. For Hive compatibility, this name is entirely lowercase.

Expression
Type: string

A regular expression pattern. If present, only those tables whose names match the pattern are returned.

MaxResults
Type: int

The maximum number of tables to return in a single response.

NextToken
Type: string

A continuation token, included if this is a continuation call.

QueryAsOfTime
Type: timestamp (string|DateTime or anything parsable by strtotime)

The time as of when to read the table contents. If not set, the most recent transaction commit time will be used. Cannot be specified along with TransactionId.

TransactionId
Type: string

The transaction ID at which to read the table contents.
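
The restriction that QueryAsOfTime cannot be combined with TransactionId can be checked before a request is sent. The following is a minimal sketch rather than part of the SDK: buildGetTablesParams is a hypothetical helper name, and it only assembles the parameter array described above.

```php
<?php
// Hypothetical helper (not part of the SDK): builds a GetTables parameter
// array and rejects QueryAsOfTime combined with TransactionId.
function buildGetTablesParams(
    string $databaseName,
    ?DateTimeInterface $queryAsOfTime = null,
    ?string $transactionId = null
): array {
    if ($queryAsOfTime !== null && $transactionId !== null) {
        throw new InvalidArgumentException(
            'QueryAsOfTime cannot be specified along with TransactionId.'
        );
    }

    // DatabaseName is the only required member.
    $params = ['DatabaseName' => $databaseName];
    if ($queryAsOfTime !== null) {
        $params['QueryAsOfTime'] = $queryAsOfTime;
    }
    if ($transactionId !== null) {
        $params['TransactionId'] = $transactionId;
    }
    return $params;
}
```

The returned array can be passed directly to $client->getTables(...).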

Result Syntax

[
    'NextToken' => '<string>',
    'TableList' => [
        [
            'CatalogId' => '<string>',
            'CreateTime' => <DateTime>,
            'CreatedBy' => '<string>',
            'DatabaseName' => '<string>',
            'Description' => '<string>',
            'FederatedTable' => [
                'ConnectionName' => '<string>',
                'DatabaseIdentifier' => '<string>',
                'Identifier' => '<string>',
            ],
            'IsMultiDialectView' => true || false,
            'IsRegisteredWithLakeFormation' => true || false,
            'LastAccessTime' => <DateTime>,
            'LastAnalyzedTime' => <DateTime>,
            'Name' => '<string>',
            'Owner' => '<string>',
            'Parameters' => ['<string>', ...],
            'PartitionKeys' => [
                [
                    'Comment' => '<string>',
                    'Name' => '<string>',
                    'Parameters' => ['<string>', ...],
                    'Type' => '<string>',
                ],
                // ...
            ],
            'Retention' => <integer>,
            'StorageDescriptor' => [
                'AdditionalLocations' => ['<string>', ...],
                'BucketColumns' => ['<string>', ...],
                'Columns' => [
                    [
                        'Comment' => '<string>',
                        'Name' => '<string>',
                        'Parameters' => ['<string>', ...],
                        'Type' => '<string>',
                    ],
                    // ...
                ],
                'Compressed' => true || false,
                'InputFormat' => '<string>',
                'Location' => '<string>',
                'NumberOfBuckets' => <integer>,
                'OutputFormat' => '<string>',
                'Parameters' => ['<string>', ...],
                'SchemaReference' => [
                    'SchemaId' => [
                        'RegistryName' => '<string>',
                        'SchemaArn' => '<string>',
                        'SchemaName' => '<string>',
                    ],
                    'SchemaVersionId' => '<string>',
                    'SchemaVersionNumber' => <integer>,
                ],
                'SerdeInfo' => [
                    'Name' => '<string>',
                    'Parameters' => ['<string>', ...],
                    'SerializationLibrary' => '<string>',
                ],
                'SkewedInfo' => [
                    'SkewedColumnNames' => ['<string>', ...],
                    'SkewedColumnValueLocationMaps' => ['<string>', ...],
                    'SkewedColumnValues' => ['<string>', ...],
                ],
                'SortColumns' => [
                    [
                        'Column' => '<string>',
                        'SortOrder' => <integer>,
                    ],
                    // ...
                ],
                'StoredAsSubDirectories' => true || false,
            ],
            'TableType' => '<string>',
            'TargetTable' => [
                'CatalogId' => '<string>',
                'DatabaseName' => '<string>',
                'Name' => '<string>',
                'Region' => '<string>',
            ],
            'UpdateTime' => <DateTime>,
            'VersionId' => '<string>',
            'ViewDefinition' => [
                'Definer' => '<string>',
                'IsProtected' => true || false,
                'Representations' => [
                    [
                        'Dialect' => 'REDSHIFT|ATHENA|SPARK',
                        'DialectVersion' => '<string>',
                        'IsStale' => true || false,
                        'ViewExpandedText' => '<string>',
                        'ViewOriginalText' => '<string>',
                    ],
                    // ...
                ],
                'SubObjects' => ['<string>', ...],
            ],
            'ViewExpandedText' => '<string>',
            'ViewOriginalText' => '<string>',
        ],
        // ...
    ],
]

Result Details

Members
NextToken
Type: string

A continuation token, present if the current list segment is not the last.

TableList
Type: Array of Table structures

A list of the requested Table objects.
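
Because NextToken is present whenever the current list segment is not the last, a caller that wants every table has to loop. The sketch below shows one way to do that. listAllTableNames is a hypothetical helper, and it takes a callable instead of a client so that it can be exercised without network access.

```php
<?php
// Hypothetical helper: follows NextToken until the last segment is reached
// and collects the Name of every table returned.
// $getTables is any callable that behaves like $client->getTables($params).
function listAllTableNames(callable $getTables, string $databaseName, ?string $expression = null): array
{
    $params = ['DatabaseName' => $databaseName];
    if ($expression !== null) {
        $params['Expression'] = $expression;
    }

    $names = [];
    do {
        $result = $getTables($params);
        foreach ($result['TableList'] ?? [] as $table) {
            $names[] = $table['Name'];
        }
        // Carry the continuation token into the next call, if there is one.
        $params['NextToken'] = $result['NextToken'] ?? null;
    } while (!empty($params['NextToken']));

    return $names;
}
```

With a real client, the callable could be fn (array $p) => $client->getTables($p). The SDK's getPaginator method may cover the same ground where a paginator is defined for the operation.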

Errors

EntityNotFoundException:

A specified entity does not exist.

InvalidInputException:

The input provided was not valid.

OperationTimeoutException:

The operation timed out.

InternalServiceException:

An internal service error occurred.

GlueEncryptionException:

An encryption operation failed.

FederationSourceException:

A federation source failed.

FederationSourceRetryableException:

A federation source failed, but the operation may be retried.
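
Since FederationSourceRetryableException signals that the operation may be retried, a caller might wrap the request in a small retry loop. This is a sketch under stated assumptions: callWithFederationRetry and its $errorCodeOf parameter are hypothetical names, backoff is omitted, and the SDK's own retry configuration is not considered here.

```php
<?php
// Hypothetical wrapper: retries $operation only when the failure carries the
// FederationSourceRetryableException error code, up to $maxAttempts attempts.
// $errorCodeOf maps a Throwable to an error-code string (or null).
function callWithFederationRetry(callable $operation, callable $errorCodeOf, int $maxAttempts = 3)
{
    for ($attempt = 1; ; $attempt++) {
        try {
            return $operation();
        } catch (Throwable $e) {
            $retryable = $errorCodeOf($e) === 'FederationSourceRetryableException';
            if (!$retryable || $attempt >= $maxAttempts) {
                throw $e; // Not retryable, or attempts exhausted.
            }
            // A real caller would likely wait here before the next attempt.
        }
    }
}
```

With the SDK loaded, $errorCodeOf could be fn (Throwable $e) => $e instanceof \Aws\Exception\AwsException ? $e->getAwsErrorCode() : null.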

GetTags

$result = $client->getTags([/* ... */]);
$promise = $client->getTagsAsync([/* ... */]);

Retrieves a list of tags associated with a resource.

Parameter Syntax

$result = $client->getTags([
    'ResourceArn' => '<string>', // REQUIRED
]);

Parameter Details

Members
ResourceArn
Required: Yes
Type: string

The Amazon Resource Name (ARN) of the resource for which to retrieve tags.

Result Syntax

[
    'Tags' => ['<string>', ...],
]

Result Details

Members
Tags
Type: Associative array of custom strings keys (TagKey) to strings

The requested tags.
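
Although the result syntax shows Tags with list-like placeholders, the member is an associative array keyed by tag key. The sketch below illustrates reading it that way. formatTags is a hypothetical helper, not an SDK function.

```php
<?php
// Hypothetical helper: renders a Tags map (tag key => tag value) as
// "key=value" strings, sorted by tag key.
function formatTags(array $tags): array
{
    ksort($tags);
    $lines = [];
    foreach ($tags as $key => $value) {
        $lines[] = $key . '=' . $value;
    }
    return $lines;
}
```

For example, formatTags($result['Tags'] ?? []) could be applied to the result of a getTags call.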

Errors

InvalidInputException:

The input provided was not valid.

InternalServiceException:

An internal service error occurred.

OperationTimeoutException:

The operation timed out.

EntityNotFoundException:

A specified entity does not exist.

GetTrigger

$result = $client->getTrigger([/* ... */]);
$promise = $client->getTriggerAsync([/* ... */]);

Retrieves the definition of a trigger.

Parameter Syntax

$result = $client->getTrigger([
    'Name' => '<string>', // REQUIRED
]);

Parameter Details

Members
Name
Required: Yes
Type: string

The name of the trigger to retrieve.

Result Syntax

[
    'Trigger' => [
        'Actions' => [
            [
                'Arguments' => ['<string>', ...],
                'CrawlerName' => '<string>',
                'JobName' => '<string>',
                'NotificationProperty' => [
                    'NotifyDelayAfter' => <integer>,
                ],
                'SecurityConfiguration' => '<string>',
                'Timeout' => <integer>,
            ],
            // ...
        ],
        'Description' => '<string>',
        'EventBatchingCondition' => [
            'BatchSize' => <integer>,
            'BatchWindow' => <integer>,
        ],
        'Id' => '<string>',
        'Name' => '<string>',
        'Predicate' => [
            'Conditions' => [
                [
                    'CrawlState' => 'RUNNING|CANCELLING|CANCELLED|SUCCEEDED|FAILED|ERROR',
                    'CrawlerName' => '<string>',
                    'JobName' => '<string>',
                    'LogicalOperator' => 'EQUALS',
                    'State' => 'STARTING|RUNNING|STOPPING|STOPPED|SUCCEEDED|FAILED|TIMEOUT|ERROR|WAITING|EXPIRED',
                ],
                // ...
            ],
            'Logical' => 'AND|ANY',
        ],
        'Schedule' => '<string>',
        'State' => 'CREATING|CREATED|ACTIVATING|ACTIVATED|DEACTIVATING|DEACTIVATED|DELETING|UPDATING',
        'Type' => 'SCHEDULED|CONDITIONAL|ON_DEMAND|EVENT',
        'WorkflowName' => '<string>',
    ],
]

Result Details

Members
Trigger
Type: Trigger structure

The requested trigger definition.

Errors

EntityNotFoundException:

A specified entity does not exist.

InvalidInputException:

The input provided was not valid.

InternalServiceException:

An internal service error occurred.

OperationTimeoutException:

The operation timed out.
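
To illustrate how the Trigger structure in the result might be consumed, the following sketch collects the jobs and crawlers named in a trigger's Actions. triggerTargets is a hypothetical helper. It assumes each action carries either a JobName or a CrawlerName, as the result syntax suggests.

```php
<?php
// Hypothetical helper: lists the job and crawler names that appear in the
// Actions of a Trigger structure (the 'Trigger' member of the result).
function triggerTargets(array $trigger): array
{
    $targets = ['jobs' => [], 'crawlers' => []];
    foreach ($trigger['Actions'] ?? [] as $action) {
        if (isset($action['JobName'])) {
            $targets['jobs'][] = $action['JobName'];
        }
        if (isset($action['CrawlerName'])) {
            $targets['crawlers'][] = $action['CrawlerName'];
        }
    }
    return $targets;
}
```

A caller could pass $result['Trigger'] from getTrigger to this function.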

GetTriggers

$result = $client->getTriggers([/* ... */]);
$promise = $client->getTriggersAsync([/* ... */]);

Gets all the triggers associated with a job.

Parameter Syntax

$result = $client->getTriggers([
    'DependentJobName' => '<string>',
    'MaxResults' => <integer>,
    'NextToken' => '<string>',
]);

Parameter Details

Members
DependentJobName
Type: string

The name of the job to retrieve triggers for. The trigger that can start this job is returned, and if there is no such trigger, all triggers are returned.

MaxResults
Type: int

The maximum size of the response.

NextToken
Type: string

A continuation token, if this is a continuation call.

Result Syntax

[
    'NextToken' => '<string>',
    'Triggers' => [
        [
            'Actions' => [
                [
                    'Arguments' => ['<string>', ...],
                    'CrawlerName' => '<string>',
                    'JobName' => '<string>',
                    'NotificationProperty' => [
                        'NotifyDelayAfter' => <integer>,
                    ],
                    'SecurityConfiguration' => '<string>',
                    'Timeout' => <integer>,
                ],
                // ...
            ],
            'Description' => '<string>',
            'EventBatchingCondition' => [
                'BatchSize' => <integer>,
                'BatchWindow' => <integer>,
            ],
            'Id' => '<string>',
            'Name' => '<string>',
            'Predicate' => [
                'Conditions' => [
                    [
                        'CrawlState' => 'RUNNING|CANCELLING|CANCELLED|SUCCEEDED|FAILED|ERROR',
                        'CrawlerName' => '<string>',
                        'JobName' => '<string>',
                        'LogicalOperator' => 'EQUALS',
                        'State' => 'STARTING|RUNNING|STOPPING|STOPPED|SUCCEEDED|FAILED|TIMEOUT|ERROR|WAITING|EXPIRED',
                    ],
                    // ...
                ],
                'Logical' => 'AND|ANY',
            ],
            'Schedule' => '<string>',
            'State' => 'CREATING|CREATED|ACTIVATING|ACTIVATED|DEACTIVATING|DEACTIVATED|DELETING|UPDATING',
            'Type' => 'SCHEDULED|CONDITIONAL|ON_DEMAND|EVENT',
            'WorkflowName' => '<string>',
        ],
        // ...
    ],
]

Result Details

Members
NextToken
Type: string

A continuation token, if not all the requested triggers have yet been returned.

Triggers
Type: Array of Trigger structures

A list of triggers for the specified job.
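
When many triggers exist, it can be convenient to consume them lazily instead of collecting every page first. The sketch below uses a generator for that. iterateTriggers is a hypothetical helper, and the callable stands in for $client->getTriggers(...).

```php
<?php
// Hypothetical helper: yields Trigger structures one at a time, requesting
// the next page only when the current one has been consumed.
// $getTriggers is any callable that behaves like $client->getTriggers($params).
function iterateTriggers(callable $getTriggers, ?string $dependentJobName = null): Generator
{
    $params = [];
    if ($dependentJobName !== null) {
        $params['DependentJobName'] = $dependentJobName;
    }

    do {
        $result = $getTriggers($params);
        foreach ($result['Triggers'] ?? [] as $trigger) {
            yield $trigger;
        }
        // A NextToken means not all requested triggers have been returned yet.
        $params['NextToken'] = $result['NextToken'] ?? null;
    } while (!empty($params['NextToken']));
}
```

A loop such as foreach (iterateTriggers(fn (array $p) => $client->getTriggers($p), 'my-job') as $trigger) { ... } would stop issuing requests as soon as the caller breaks out of it. The job name 'my-job' is a placeholder.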

Errors

EntityNotFoundException:

A specified entity does not exist.

InvalidInputException:

The input provided was not valid.

InternalServiceException:

An internal service error occurred.

OperationTimeoutException:

The operation timed out.

GetUnfilteredPartitionMetadata

$result = $client->getUnfilteredPartitionMetadata([/* ... */]);
$promise = $client->getUnfilteredPartitionMetadataAsync([/* ... */]);

Retrieves unfiltered metadata for a single partition from the Data Catalog.

For IAM authorization, the public IAM action associated with this API is glue:GetPartition.

Parameter Syntax

$result = $client->getUnfilteredPartitionMetadata([
    'AuditContext' => [
        'AdditionalAuditContext' => '<string>',
        'AllColumnsRequested' => true || false,
        'RequestedColumns' => ['<string>', ...],
    ],
    'CatalogId' => '<string>', // REQUIRED
    'DatabaseName' => '<string>', // REQUIRED
    'PartitionValues' => ['<string>', ...], // REQUIRED
    'QuerySessionContext' => [
        'AdditionalContext' => ['<string>', ...],
        'ClusterId' => '<string>',
        'QueryAuthorizationId' => '<string>',
        'QueryId' => '<string>',
        'QueryStartTime' => <integer || string || DateTime>,
    ],
    'Region' => '<string>',
    'SupportedPermissionTypes' => ['<string>', ...], // REQUIRED
    'TableName' => '<string>', // REQUIRED
]);

Parameter Details

Members
AuditContext
Type: AuditContext structure

A structure containing Lake Formation audit context information.

CatalogId
Required: Yes
Type: string

The catalog ID where the partition resides.

DatabaseName
Required: Yes
Type: string

(Required) Specifies the name of a database that contains the partition.

PartitionValues
Required: Yes
Type: Array of strings

(Required) A list of partition key values.

QuerySessionContext
Type: QuerySessionContext structure

A structure used as a protocol between query engines and Lake Formation or Glue. Contains both a Lake Formation generated authorization identifier and information from the request's authorization context.

Region
Type: string

Specified only if the base tables belong to a different Amazon Web Services Region.

SupportedPermissionTypes
Required: Yes
Type: Array of strings

(Required) A list of supported permission types.

TableName
Required: Yes
Type: string

(Required) Specifies the name of a table that contains the partition.

Result Syntax

[
    'AuthorizedColumns' => ['<string>', ...],
    'IsRegisteredWithLakeFormation' => true || false,
    'Partition' => [
        'CatalogId' => '<string>',
        'CreationTime' => <DateTime>,
        'DatabaseName' => '<string>',
        'LastAccessTime' => <DateTime>,
        'LastAnalyzedTime' => <DateTime>,
        'Parameters' => ['<string>', ...],
        'StorageDescriptor' => [
            'AdditionalLocations' => ['<string>', ...],
            'BucketColumns' => ['<string>', ...],
            'Columns' => [
                [
                    'Comment' => '<string>',
                    'Name' => '<string>',
                    'Parameters' => ['<string>', ...],
                    'Type' => '<string>',
                ],
                // ...
            ],
            'Compressed' => true || false,
            'InputFormat' => '<string>',
            'Location' => '<string>',
            'NumberOfBuckets' => <integer>,
            'OutputFormat' => '<string>',
            'Parameters' => ['<string>', ...],
            'SchemaReference' => [
                'SchemaId' => [
                    'RegistryName' => '<string>',
                    'SchemaArn' => '<string>',
                    'SchemaName' => '<string>',
                ],
                'SchemaVersionId' => '<string>',
                'SchemaVersionNumber' => <integer>,
            ],
            'SerdeInfo' => [
                'Name' => '<string>',
                'Parameters' => ['<string>', ...],
                'SerializationLibrary' => '<string>',
            ],
            'SkewedInfo' => [
                'SkewedColumnNames' => ['<string>', ...],
                'SkewedColumnValueLocationMaps' => ['<string>', ...],
                'SkewedColumnValues' => ['<string>', ...],
            ],
            'SortColumns' => [
                [
                    'Column' => '<string>',
                    'SortOrder' => <integer>,
                ],
                // ...
            ],
            'StoredAsSubDirectories' => true || false,
        ],
        'TableName' => '<string>',
        'Values' => ['<string>', ...],
    ],
]

Result Details

Members
AuthorizedColumns
Type: Array of strings

A list of column names that the user has been granted access to.

IsRegisteredWithLakeFormation
Type: boolean

A Boolean value that indicates whether the partition location is registered with Lake Formation.

Partition
Type: Partition structure

A Partition object containing the partition metadata.
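
One plausible use of AuthorizedColumns is to narrow the partition's column list to the columns the user may read. The sketch below assumes that interpretation. authorizedPartitionColumns is a hypothetical helper, and it expects the result as a plain array (for example via $result->toArray()).

```php
<?php
// Hypothetical helper: keeps only the StorageDescriptor columns whose Name
// appears in AuthorizedColumns. $result is the operation result as an array.
function authorizedPartitionColumns(array $result): array
{
    // Flip the name list so membership checks are simple key lookups.
    $authorized = array_flip($result['AuthorizedColumns'] ?? []);
    $columns = $result['Partition']['StorageDescriptor']['Columns'] ?? [];

    return array_values(array_filter(
        $columns,
        fn (array $column) => isset($authorized[$column['Name']])
    ));
}
```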

Errors

EntityNotFoundException:

A specified entity does not exist.

InvalidInputException:

The input provided was not valid.

InternalServiceException:

An internal service error occurred.

OperationTimeoutException:

The operation timed out.

GlueEncryptionException:

An encryption operation failed.

PermissionTypeMismatchException:

A supported permission type provided in the request does not match the user's level of permissions on the table.

FederationSourceException:

A federation source failed.

FederationSourceRetryableException:

A federation source failed, but the operation may be retried.

GetUnfilteredPartitionsMetadata

$result = $client->getUnfilteredPartitionsMetadata([/* ... */]);
$promise = $client->getUnfilteredPartitionsMetadataAsync([/* ... */]);

Retrieves unfiltered metadata for the partitions of a table from the Data Catalog.

For IAM authorization, the public IAM action associated with this API is glue:GetPartitions.

Parameter Syntax

$result = $client->getUnfilteredPartitionsMetadata([
    'AuditContext' => [
        'AdditionalAuditContext' => '<string>',
        'AllColumnsRequested' => true || false,
        'RequestedColumns' => ['<string>', ...],
    ],
    'CatalogId' => '<string>', // REQUIRED
    'DatabaseName' => '<string>', // REQUIRED
    'Expression' => '<string>',
    'MaxResults' => <integer>,
    'NextToken' => '<string>',
    'QuerySessionContext' => [
        'AdditionalContext' => ['<string>', ...],
        'ClusterId' => '<string>',
        'QueryAuthorizationId' => '<string>',
        'QueryId' => '<string>',
        'QueryStartTime' => <integer || string || DateTime>,
    ],
    'Region' => '<string>',
    'Segment' => [
        'SegmentNumber' => <integer>, // REQUIRED
        'TotalSegments' => <integer>, // REQUIRED
    ],
    'SupportedPermissionTypes' => ['<string>', ...], // REQUIRED
    'TableName' => '<string>', // REQUIRED
]);

Parameter Details

Members
AuditContext
Type: AuditContext structure

A structure containing Lake Formation audit context information.

CatalogId
Required: Yes
Type: string

The ID of the Data Catalog where the partitions in question reside. If none is provided, the Amazon Web Services account ID is used by default.

DatabaseName
Required: Yes
Type: string

The name of the catalog database where the partitions reside.

Expression
Type: string

An expression that filters the partitions to be returned.

The expression uses SQL syntax similar to the SQL WHERE filter clause. The SQL statement parser JSQLParser parses the expression.

Operators: The following are the operators that you can use in the Expression API call:

=

Checks whether the values of the two operands are equal; if yes, then the condition becomes true.

Example: Assume 'variable a' holds 10 and 'variable b' holds 20.

(a = b) is not true.

<>

Checks whether the values of the two operands are not equal; if they differ, then the condition becomes true.

Example: (a <> b) is true.

>

Checks whether the value of the left operand is greater than the value of the right operand; if yes, then the condition becomes true.

Example: (a > b) is not true.

<

Checks whether the value of the left operand is less than the value of the right operand; if yes, then the condition becomes true.

Example: (a < b) is true.

>=

Checks whether the value of the left operand is greater than or equal to the value of the right operand; if yes, then the condition becomes true.

Example: (a >= b) is not true.

<=

Checks whether the value of the left operand is less than or equal to the value of the right operand; if yes, then the condition becomes true.

Example: (a <= b) is true.

AND, OR, IN, BETWEEN, LIKE, NOT, IS NULL

Logical operators.

Supported Partition Key Types: The following are the supported partition key types.

  • string

  • date

  • timestamp

  • int

  • bigint

  • long

  • tinyint

  • smallint

  • decimal

If a type that is not valid is encountered, an exception is thrown.
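
As an illustration of the operators above, the following sketch joins simple comparisons into one Expression string. partitionFilter is a hypothetical helper, the partition keys year and month are invented for the example, and each condition is assumed to be a simple comparison that needs no grouping.

```php
<?php
// Hypothetical helper: joins non-empty conditions with AND to form a value
// for the Expression parameter. It does no quoting or validation.
function partitionFilter(array $conditions): string
{
    $conditions = array_filter(array_map('trim', $conditions), 'strlen');
    return implode(' AND ', $conditions);
}

// 'year' and 'month' are hypothetical partition keys.
$expression = partitionFilter(["year = '2024'", "month >= '06'"]);
```

The resulting string could then be supplied as the Expression member of the request.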

MaxResults
Type: int

The maximum number of partitions to return in a single response.

NextToken
Type: string

A continuation token, if this is not the first call to retrieve these partitions.

QuerySessionContext
Type: QuerySessionContext structure

A structure used as a protocol between query engines and Lake Formation or Glue. Contains both a Lake Formation generated authorization identifier and information from the request's authorization context.

Region
Type: string

Specified only if the base tables belong to a different Amazon Web Services Region.

Segment
Type: Segment structure

The segment of the table's partitions to scan in this request.
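
A caller that wants to scan a table's partitions in parallel might prepare one request per segment. The sketch below assumes that SegmentNumber is a zero-based index below TotalSegments, which should be verified against the Segment structure reference. segmentedRequests is a hypothetical helper.

```php
<?php
// Hypothetical helper: returns one parameter array per segment.
// Assumption: SegmentNumber is zero-based and runs from 0 to TotalSegments - 1.
function segmentedRequests(array $baseParams, int $totalSegments): array
{
    if ($totalSegments < 1) {
        throw new InvalidArgumentException('TotalSegments must be at least 1.');
    }

    $requests = [];
    for ($segment = 0; $segment < $totalSegments; $segment++) {
        $request = $baseParams;
        $request['Segment'] = [
            'SegmentNumber' => $segment,
            'TotalSegments' => $totalSegments,
        ];
        $requests[] = $request;
    }
    return $requests;
}
```

Each returned array could be passed to getUnfilteredPartitionsMetadataAsync so that the segments are fetched concurrently.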

SupportedPermissionTypes
Required: Yes
Type: Array of strings

A list of supported permission types.

TableName
Required: Yes
Type: string

The name of the table that contains the partition.

Result Syntax

[
    'NextToken' => '<string>',
    'UnfilteredPartitions' => [
        [
            'AuthorizedColumns' => ['<string>', ...],
            'IsRegisteredWithLakeFormation' => true || false,
            'Partition' => [
                'CatalogId' => '<string>',
                'CreationTime' => <DateTime>,
                'DatabaseName' => '<string>',
                'LastAccessTime' => <DateTime>,
                'LastAnalyzedTime' => <DateTime>,
                'Parameters' => ['<string>', ...],
                'StorageDescriptor' => [
                    'AdditionalLocations' => ['<string>', ...],
                    'BucketColumns' => ['<string>', ...],
                    'Columns' => [
                        [
                            'Comment' => '<string>',
                            'Name' => '<string>',
                            'Parameters' => ['<string>', ...],
                            'Type' => '<string>',
                        ],
                        // ...
                    ],
                    'Compressed' => true || false,
                    'InputFormat' => '<string>',
                    'Location' => '<string>',
                    'NumberOfBuckets' => <integer>,
                    'OutputFormat' => '<string>',
                    'Parameters' => ['<string>', ...],
                    'SchemaReference' => [
                        'SchemaId' => [
                            'RegistryName' => '<string>',
                            'SchemaArn' => '<string>',
                            'SchemaName' => '<string>',
                        ],
                        'SchemaVersionId' => '<string>',
                        'SchemaVersionNumber' => <integer>,
                    ],
                    'SerdeInfo' => [
                        'Name' => '<string>',
                        'Parameters' => ['<string>', ...],
                        'SerializationLibrary' => '<string>',
                    ],
                    'SkewedInfo' => [
                        'SkewedColumnNames' => ['<string>', ...],
                        'SkewedColumnValueLocationMaps' => ['<string>', ...],
                        'SkewedColumnValues' => ['<string>', ...],
                    ],
                    'SortColumns' => [
                        [
                            'Column' => '<string>',
                            'SortOrder' => <integer>,
                        ],
                        // ...
                    ],
                    'StoredAsSubDirectories' => true || false,
                ],
                'TableName' => '<string>',
                'Values' => ['<string>', ...],
            ],
        ],
        // ...
    ],
]

Result Details

Members
NextToken
Type: string

A continuation token, if the returned list of partitions does not include the last one.

UnfilteredPartitions
Type: Array of UnfilteredPartition structures

A list of requested partitions.

Errors

EntityNotFoundException:

A specified entity does not exist.

InvalidInputException:

The input provided was not valid.

InternalServiceException:

An internal service error occurred.

OperationTimeoutException:

The operation timed out.

GlueEncryptionException:

An encryption operation failed.

PermissionTypeMismatchException:

A supported permission type provided in the request does not match the user's level of permissions on the table.

FederationSourceException:

A federation source failed.

FederationSourceRetryableException:

A federation source failed, but the operation may be retried.

GetUnfilteredTableMetadata

$result = $client->getUnfilteredTableMetadata([/* ... */]);
$promise = $client->getUnfilteredTableMetadataAsync([/* ... */]);

Allows a third-party analytical engine to retrieve unfiltered table metadata from the Data Catalog.

For IAM authorization, the public IAM action associated with this API is glue:GetTable.

Parameter Syntax

$result = $client->getUnfilteredTableMetadata([
    'AuditContext' => [
        'AdditionalAuditContext' => '<string>',
        'AllColumnsRequested' => true || false,
        'RequestedColumns' => ['<string>', ...],
    ],
    'CatalogId' => '<string>', // REQUIRED
    'DatabaseName' => '<string>', // REQUIRED
    'Name' => '<string>', // REQUIRED
    'ParentResourceArn' => '<string>',
    'Permissions' => ['<string>', ...],
    'QuerySessionContext' => [
        'AdditionalContext' => ['<string>', ...],
        'ClusterId' => '<string>',
        'QueryAuthorizationId' => '<string>',
        'QueryId' => '<string>',
        'QueryStartTime' => <integer || string || DateTime>,
    ],
    'Region' => '<string>',
    'RootResourceArn' => '<string>',
    'SupportedDialect' => [
        'Dialect' => 'REDSHIFT|ATHENA|SPARK',
        'DialectVersion' => '<string>',
    ],
    'SupportedPermissionTypes' => ['<string>', ...], // REQUIRED
]);

Parameter Details

Members
AuditContext
Type: AuditContext structure

A structure containing Lake Formation audit context information.

CatalogId
Required: Yes
Type: string

The catalog ID where the table resides.

DatabaseName
Required: Yes
Type: string

(Required) Specifies the name of a database that contains the table.

Name
Required: Yes
Type: string

(Required) Specifies the name of a table for which you are requesting metadata.

ParentResourceArn
Type: string

The resource ARN of the view.

Permissions
Type: Array of strings

The Lake Formation data permissions of the caller on the table. Used to authorize the call when no view context is found.

QuerySessionContext
Type: QuerySessionContext structure

A structure used as a protocol between query engines and Lake Formation or Glue. Contains both a Lake Formation generated authorization identifier and information from the request's authorization context.

Region
Type: string

Specified only if the base tables belong to a different Amazon Web Services Region.

RootResourceArn
Type: string

The resource ARN of the root view in a chain of nested views.

SupportedDialect
Type: SupportedDialect structure

A structure specifying the dialect and dialect version used by the query engine.

SupportedPermissionTypes
Required: Yes
Type: Array of strings

Indicates the level of filtering a third-party analytical engine is capable of enforcing when calling the GetUnfilteredTableMetadata API operation. Accepted values are:

  • COLUMN_PERMISSION - Column permissions ensure that users can access only specific columns in the table. If particular columns contain sensitive data, data lake administrators can define column filters that exclude access to those columns.

  • CELL_FILTER_PERMISSION - Cell-level filtering combines column filtering (include or exclude columns) and row filter expressions to restrict access to individual elements in the table.

  • NESTED_PERMISSION - Nested permissions combine cell-level filtering and nested column filtering to restrict access to columns and/or nested columns in specific rows based on row filter expressions.

  • NESTED_CELL_PERMISSION - Nested cell permissions combine nested permissions with nested cell-level filtering. This allows different subsets of nested columns to be restricted based on an array of row filter expressions.

Note: These permission types follow a hierarchical order in which each subsequent permission type includes all permissions of the previous type.

Important: If you provide a supported permission type that doesn't match the user's level of permissions on the table, then Lake Formation raises an exception. For example, if the third-party engine calling the GetUnfilteredTableMetadata operation can enforce only column-level filtering, and the user has nested cell filtering applied on the table, Lake Formation throws an exception and does not return unfiltered table metadata or data access credentials.
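
The hierarchy of permission types can be modeled as an ordered list, which makes the mismatch case easy to reason about. The sketch below is illustrative only: PERMISSION_TYPE_ORDER and engineCovers are hypothetical names, and the check mirrors the ordering described above rather than any logic inside Lake Formation.

```php
<?php
// Permission types in hierarchical order; each entry includes the
// permissions of the entries before it.
const PERMISSION_TYPE_ORDER = [
    'COLUMN_PERMISSION',
    'CELL_FILTER_PERMISSION',
    'NESTED_PERMISSION',
    'NESTED_CELL_PERMISSION',
];

// Hypothetical check: can an engine that enforces up to $engineLevel handle
// a table whose filtering requires $requiredLevel?
function engineCovers(string $engineLevel, string $requiredLevel): bool
{
    $engineRank = array_search($engineLevel, PERMISSION_TYPE_ORDER, true);
    $requiredRank = array_search($requiredLevel, PERMISSION_TYPE_ORDER, true);
    if ($engineRank === false || $requiredRank === false) {
        throw new InvalidArgumentException('Unknown permission type.');
    }
    return $engineRank >= $requiredRank;
}
```

In the example from the text, an engine limited to COLUMN_PERMISSION does not cover a table that requires NESTED_CELL_PERMISSION, which is the case where an exception is raised.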

Result Syntax

[
    'AuthorizedColumns' => ['<string>', ...],
    'CellFilters' => [
        [
            'ColumnName' => '<string>',
            'RowFilterExpression' => '<string>',
        ],
        // ...
    ],
    'IsMultiDialectView' => true || false,
    'IsProtected' => true || false,
    'IsRegisteredWithLakeFormation' => true || false,
    'Permissions' => ['<string>', ...],
    'QueryAuthorizationId' => '<string>',
    'ResourceArn' => '<string>',
    'RowFilter' => '<string>',
    'Table' => [
        'CatalogId' => '<string>',
        'CreateTime' => <DateTime>,
        'CreatedBy' => '<string>',
        'DatabaseName' => '<string>',
        'Description' => '<string>',
        'FederatedTable' => [
            'ConnectionName' => '<string>',
            'DatabaseIdentifier' => '<string>',
            'Identifier' => '<string>',
        ],
        'IsMultiDialectView' => true || false,
        'IsRegisteredWithLakeFormation' => true || false,
        'LastAccessTime' => <DateTime>,
        'LastAnalyzedTime' => <DateTime>,
        'Name' => '<string>',
        'Owner' => '<string>',
        'Parameters' => ['<string>', ...],
        'PartitionKeys' => [
            [
                'Comment' => '<string>',
                'Name' => '<string>',
                'Parameters' => ['<string>', ...],
                'Type' => '<string>',
            ],
            // ...
        ],
        'Retention' => <integer>,
        'StorageDescriptor' => [
            'AdditionalLocations' => ['<string>', ...],
            'BucketColumns' => ['<string>', ...],
            'Columns' => [
                [
                    'Comment' => '<string>',
                    'Name' => '<string>',
                    'Parameters' => ['<string>', ...],
                    'Type' => '<string>',
                ],
                // ...
            ],
            'Compressed' => true || false,
            'InputFormat' => '<string>',
            'Location' => '<string>',
            'NumberOfBuckets' => <integer>,
            'OutputFormat' => '<string>',
            'Parameters' => ['<string>', ...],
            'SchemaReference' => [
                'SchemaId' => [
                    'RegistryName' => '<string>',
                    'SchemaArn' => '<string>',
                    'SchemaName' => '<string>',
                ],
                'SchemaVersionId' => '<string>',
                'SchemaVersionNumber' => <integer>,
            ],
            'SerdeInfo' => [
                'Name' => '<string>',
                'Parameters' => ['<string>', ...],
                'SerializationLibrary' => '<string>',
            ],
            'SkewedInfo' => [
                'SkewedColumnNames' => ['<string>', ...],
                'SkewedColumnValueLocationMaps' => ['<string>', ...],
                'SkewedColumnValues' => ['<string>', ...],
            ],
            'SortColumns' => [
                [
                    'Column' => '<string>',
                    'SortOrder' => <integer>,
                ],
                // ...
            ],
            'StoredAsSubDirectories' => true || false,
        ],
        'TableType' => '<string>',
        'TargetTable' => [
            'CatalogId' => '<string>',
            'DatabaseName' => '<string>',
            'Name' => '<string>',
            'Region' => '<string>',
        ],
        'UpdateTime' => <DateTime>,
        'VersionId' => '<string>',
        'ViewDefinition' => [
            'Definer' => '<string>',
            'IsProtected' => true || false,
            'Representations' => [
                [
                    'Dialect' => 'REDSHIFT|ATHENA|SPARK',
                    'DialectVersion' => '<string>',
                    'IsStale' => true || false,
                    'ViewExpandedText' => '<string>',
                    'ViewOriginalText' => '<string>',
                ],
                // ...
            ],
            'SubObjects' => ['<string>', ...],
        ],
        'ViewExpandedText' => '<string>',
        'ViewOriginalText' => '<string>',
    ],
]

Result Details

Members
AuthorizedColumns
Type: Array of strings

A list of column names that the user has been granted access to.

CellFilters
Type: Array of ColumnRowFilter structures

A list of column row filters.

IsMultiDialectView
Type: boolean

Specifies whether the view supports the SQL dialects of one or more different query engines and can therefore be read by those engines.

IsProtected
Type: boolean

A flag that instructs the engine not to push user-provided operations into the logical plan of the view during query planning. However, setting this flag does not guarantee that the engine will comply. Refer to the engine's documentation to understand the guarantees provided, if any.

IsRegisteredWithLakeFormation
Type: boolean

A Boolean value that indicates whether the partition location is registered with Lake Formation.

Permissions
Type: Array of strings

The Lake Formation data permissions of the caller on the table. Used to authorize the call when no view context is found.

QueryAuthorizationId
Type: string

A cryptographically generated query identifier issued by Glue or Lake Formation.

ResourceArn
Type: string

The resource ARN of the parent resource extracted from the request.

RowFilter
Type: string

The filter that applies to the table. For example, when the filter is applied in SQL, it goes in the WHERE clause and can be combined, using an AND operator, with any other predicates applied by the user querying the table.

Table
Type: Table structure

A Table object containing the table metadata.
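
The RowFilter description above can be illustrated with a small sketch. The helper name combineRowFilter and the sample predicates are hypothetical and are not part of the SDK; the sketch only shows one plausible way to AND the returned filter with a caller's own predicate.

```php
<?php
// Hypothetical helper (not part of the SDK): AND the RowFilter returned by
// the service together with a predicate supplied by the querying user.
function combineRowFilter(?string $rowFilter, ?string $userPredicate): string
{
    $parts = [];
    foreach ([$rowFilter, $userPredicate] as $clause) {
        $clause = trim((string) $clause);
        if ($clause !== '') {
            // Parenthesize each clause so an OR inside it cannot escape the AND.
            $parts[] = '(' . $clause . ')';
        }
    }
    // No clauses at all means no WHERE clause.
    return $parts === [] ? '' : 'WHERE ' . implode(' AND ', $parts);
}

// Sample values are made up for illustration.
echo combineRowFilter("region = 'EU'", 'year >= 2020 OR vip = 1'), PHP_EOL;
// prints: WHERE (region = 'EU') AND (year >= 2020 OR vip = 1)
```

The parentheses matter: without them, a user predicate containing OR could widen the result beyond what the row filter allows.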

Errors

EntityNotFoundException:

A specified entity does not exist.

InvalidInputException:

The input provided was not valid.

InternalServiceException:

An internal service error occurred.

OperationTimeoutException:

The operation timed out.

GlueEncryptionException:

An encryption operation failed.

PermissionTypeMismatchException:

A permission type mismatch occurred.

FederationSourceException:

A federation source failed.

FederationSourceRetryableException:

A federation source failed, but the operation may be retried.

GetUserDefinedFunction

$result = $client->getUserDefinedFunction([/* ... */]);
$promise = $client->getUserDefinedFunctionAsync([/* ... */]);

Retrieves a specified function definition from the Data Catalog.

Parameter Syntax

$result = $client->getUserDefinedFunction([
    'CatalogId' => '<string>',
    'DatabaseName' => '<string>', // REQUIRED
    'FunctionName' => '<string>', // REQUIRED
]);

Parameter Details

Members
CatalogId
Type: string

The ID of the Data Catalog where the function to be retrieved is located. If none is provided, the Amazon Web Services account ID is used by default.

DatabaseName
Required: Yes
Type: string

The name of the catalog database where the function is located.

FunctionName
Required: Yes
Type: string

The name of the function.

Result Syntax

[
    'UserDefinedFunction' => [
        'CatalogId' => '<string>',
        'ClassName' => '<string>',
        'CreateTime' => <DateTime>,
        'DatabaseName' => '<string>',
        'FunctionName' => '<string>',
        'OwnerName' => '<string>',
        'OwnerType' => 'USER|ROLE|GROUP',
        'ResourceUris' => [
            [
                'ResourceType' => 'JAR|FILE|ARCHIVE',
                'Uri' => '<string>',
            ],
            // ...
        ],
    ],
]

Result Details

Members
UserDefinedFunction
Type: UserDefinedFunction structure

The requested function definition.

Errors

EntityNotFoundException:

A specified entity does not exist.

InvalidInputException:

The input provided was not valid.

InternalServiceException:

An internal service error occurred.

OperationTimeoutException:

The operation timed out.

GlueEncryptionException:

An encryption operation failed.
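
As a minimal sketch of handling the errors listed above, the following wrapper treats EntityNotFoundException as "no such function" and rethrows everything else. The function name findUserDefinedFunction is hypothetical. The sketch assumes the SDK's Aws\Exception\AwsException and its getAwsErrorCode() method, and that the error code string matches the exception name shown above.

```php
<?php
// Hypothetical helper, not part of the SDK. $client is expected to be an
// Aws\Glue\GlueClient (or any object exposing getUserDefinedFunction()).
function findUserDefinedFunction(object $client, string $database, string $name): ?array
{
    try {
        $result = $client->getUserDefinedFunction([
            'DatabaseName' => $database, // REQUIRED
            'FunctionName' => $name,     // REQUIRED
        ]);
    } catch (\Aws\Exception\AwsException $e) {
        // Assumption: the error code equals the exception name listed above.
        if ($e->getAwsErrorCode() === 'EntityNotFoundException') {
            return null; // treat "not found" as an ordinary outcome
        }
        throw $e; // timeouts, invalid input, etc. stay exceptional
    }

    return $result['UserDefinedFunction'] ?? null;
}
```

CatalogId is omitted here, so the caller's account ID is used by default, as described under Parameter Details.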

GetUserDefinedFunctions

$result = $client->getUserDefinedFunctions([/* ... */]);
$promise = $client->getUserDefinedFunctionsAsync([/* ... */]);

Retrieves multiple function definitions from the Data Catalog.

Parameter Syntax

$result = $client->getUserDefinedFunctions([
    'CatalogId' => '<string>',
    'DatabaseName' => '<string>',
    'MaxResults' => <integer>,
    'NextToken' => '<string>',
    'Pattern' => '<string>', // REQUIRED
]);

Parameter Details

Members
CatalogId
Type: string

The ID of the Data Catalog where the functions to be retrieved are located. If none is provided, the Amazon Web Services account ID is used by default.

DatabaseName
Type: string

The name of the catalog database where the functions are located. If none is provided, functions from all the databases across the catalog will be returned.

MaxResults
Type: int

The maximum number of functions to return in one response.

NextToken
Type: string

A continuation token, if this is a continuation call.

Pattern
Required: Yes
Type: string

A function-name pattern string that filters the function definitions returned.

Result Syntax

[
    'NextToken' => '<string>',
    'UserDefinedFunctions' => [
        [
            'CatalogId' => '<string>',
            'ClassName' => '<string>',
            'CreateTime' => <DateTime>,
            'DatabaseName' => '<string>',
            'FunctionName' => '<string>',
            'OwnerName' => '<string>',
            'OwnerType' => 'USER|ROLE|GROUP',
            'ResourceUris' => [
                [
                    'ResourceType' => 'JAR|FILE|ARCHIVE',
                    'Uri' => '<string>',
                ],
                // ...
            ],
        ],
        // ...
    ],
]

Result Details

Members
NextToken
Type: string

A continuation token, if the list of functions returned does not include the last requested function.

UserDefinedFunctions
Type: Array of UserDefinedFunction structures

A list of requested function definitions.
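
The NextToken request and result members above imply a paging loop. The following is a minimal sketch of that loop, not the SDK's own pagination support. The helper name collectUserDefinedFunctions and the $fetchPage callable are hypothetical; the callable stands in for a call to $client->getUserDefinedFunctions() so the loop can be exercised without AWS access.

```php
<?php
// Hypothetical helper, not part of the SDK. $fetchPage takes the request
// parameters and returns one page shaped like the Result Syntax above.
function collectUserDefinedFunctions(callable $fetchPage, array $params): array
{
    $functions = [];
    do {
        $page = $fetchPage($params);
        foreach ($page['UserDefinedFunctions'] ?? [] as $function) {
            $functions[] = $function;
        }
        // A missing or empty NextToken means this was the last page.
        $token = $page['NextToken'] ?? '';
        $params['NextToken'] = $token;
    } while ($token !== '');

    return $functions;
}

// Possible usage with a real client (the Pattern value is a placeholder):
// $all = collectUserDefinedFunctions(
//     function (array $p) use ($client) { return $client->getUserDefinedFunctions($p); },
//     ['Pattern' => '<string>', 'MaxResults' => 50]
// );
```

Collecting everything into one array is convenient for small catalogs; for large ones, processing each page as it arrives keeps memory use bounded.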

Errors

EntityNotFoundException:

A specified entity does not exist.

InvalidInputException:

The input provided was not valid.

OperationTimeoutException:

The operation timed out.

InternalServiceException:

An internal service error occurred.

GlueEncryptionException:

An encryption operation failed.

GetWorkflow

$result = $client->getWorkflow([/* ... */]);
$promise = $client->getWorkflowAsync([/* ... */]);

Retrieves resource metadata for a workflow.

Parameter Syntax

$result = $client->getWorkflow([
    'IncludeGraph' => true || false,
    'Name' => '<string>', // REQUIRED
]);

Parameter Details

Members
IncludeGraph
Type: boolean

Specifies whether to include a graph when returning the workflow resource metadata.

Name
Required: Yes
Type: string

The name of the workflow to retrieve.

Result Syntax

[
    'Workflow' => [
        'BlueprintDetails' => [
            'BlueprintName' => '<string>',
            'RunId' => '<string>',
        ],
        'CreatedOn' => <DateTime>,
        'DefaultRunProperties' => ['<string>', ...],
        'Description' => '<string>',
        'Graph' => [
            'Edges' => [
                [
                    'DestinationId' => '<string>',
                    'SourceId' => '<string>',
                ],
                // ...
            ],
            'Nodes' => [
                [
                    'CrawlerDetails' => [
                        'Crawls' => [
                            [
                                'CompletedOn' => <DateTime>,
                                'ErrorMessage' => '<string>',
                                'LogGroup' => '<string>',
                                'LogStream' => '<string>',
                                'StartedOn' => <DateTime>,
                                'State' => 'RUNNING|CANCELLING|CANCELLED|SUCCEEDED|FAILED|ERROR',
                            ],
                            // ...
                        ],
                    ],
                    'JobDetails' => [
                        'JobRuns' => [
                            [
                                'AllocatedCapacity' => <integer>,
                                'Arguments' => ['<string>', ...],
                                'Attempt' => <integer>,
                                'CompletedOn' => <DateTime>,
                                'DPUSeconds' => <float>,
                                'ErrorMessage' => '<string>',
                                'ExecutionClass' => 'FLEX|STANDARD',
                                'ExecutionTime' => <integer>,
                                'GlueVersion' => '<string>',
                                'Id' => '<string>',
                                'JobMode' => 'SCRIPT|VISUAL|NOTEBOOK',
                                'JobName' => '<string>',
                                'JobRunState' => 'STARTING|RUNNING|STOPPING|STOPPED|SUCCEEDED|FAILED|TIMEOUT|ERROR|WAITING|EXPIRED',
                                'LastModifiedOn' => <DateTime>,
                                'LogGroupName' => '<string>',
                                'MaintenanceWindow' => '<string>',
                                'MaxCapacity' => <float>,
                                'NotificationProperty' => [
                                    'NotifyDelayAfter' => <integer>,
                                ],
                                'NumberOfWorkers' => <integer>,
                                'PredecessorRuns' => [
                                    [
                                        'JobName' => '<string>',
                                        'RunId' => '<string>',
                                    ],
                                    // ...
                                ],
                                'PreviousRunId' => '<string>',
                                'SecurityConfiguration' => '<string>',
                                'StartedOn' => <DateTime>,
                                'Timeout' => <integer>,
                                'TriggerName' => '<string>',
                                'WorkerType' => 'Standard|G.1X|G.2X|G.025X|G.4X|G.8X|Z.2X',
                            ],
                            // ...
                        ],
                    ],
                    'Name' => '<string>',
                    'TriggerDetails' => [
                        'Trigger' => [
                            'Actions' => [
                                [
                                    'Arguments' => ['<string>', ...],
                                    'CrawlerName' => '<string>',
                                    'JobName' => '<string>',
                                    'NotificationProperty' => [
                                        'NotifyDelayAfter' => <integer>,
                                    ],
                                    'SecurityConfiguration' => '<string>',
                                    'Timeout' => <integer>,
                                ],
                                // ...
                            ],
                            'Description' => '<string>',
                            'EventBatchingCondition' => [
                                'BatchSize' => <integer>,
                                'BatchWindow' => <integer>,
                            ],
                            'Id' => '<string>',
                            'Name' => '<string>',
                            'Predicate' => [
                                'Conditions' => [
                                    [
                                        'CrawlState' => 'RUNNING|CANCELLING|CANCELLED|SUCCEEDED|FAILED|ERROR',
                                        'CrawlerName' => '<string>',
                                        'JobName' => '<string>',
                                        'LogicalOperator' => 'EQUALS',
                                        'State' => 'STARTING|RUNNING|STOPPING|STOPPED|SUCCEEDED|FAILED|TIMEOUT|ERROR|WAITING|EXPIRED',
                                    ],
                                    // ...
                                ],
                                'Logical' => 'AND|ANY',
                            ],
                            'Schedule' => '<string>',
                            'State' => 'CREATING|CREATED|ACTIVATING|ACTIVATED|DEACTIVATING|DEACTIVATED|DELETING|UPDATING',
                            'Type' => 'SCHEDULED|CONDITIONAL|ON_DEMAND|EVENT',
                            'WorkflowName' => '<string>',
                        ],
                    ],
                    'Type' => 'CRAWLER|JOB|TRIGGER',
                    'UniqueId' => '<string>',
                ],
                // ...
            ],
        ],
        'LastModifiedOn' => <DateTime>,
        'LastRun' => [
            'CompletedOn' => <DateTime>,
            'ErrorMessage' => '<string>',
            'Graph' => [
                'Edges' => [
                    [
                        'DestinationId' => '<string>',
                        'SourceId' => '<string>',
                    ],
                    // ...
                ],
                'Nodes' => [
                    [
                        'CrawlerDetails' => [
                            'Crawls' => [
                                [
                                    'CompletedOn' => <DateTime>,
                                    'ErrorMessage' => '<string>',
                                    'LogGroup' => '<string>',
                                    'LogStream' => '<string>',
                                    'StartedOn' => <DateTime>,
                                    'State' => 'RUNNING|CANCELLING|CANCELLED|SUCCEEDED|FAILED|ERROR',
                                ],
                                // ...
                            ],
                        ],
                        'JobDetails' => [
                            'JobRuns' => [
                                [
                                    'AllocatedCapacity' => <integer>,
                                    'Arguments' => ['<string>', ...],
                                    'Attempt' => <integer>,
                                    'CompletedOn' => <DateTime>,
                                    'DPUSeconds' => <float>,
                                    'ErrorMessage' => '<string>',
                                    'ExecutionClass' => 'FLEX|STANDARD',
                                    'ExecutionTime' => <integer>,
                                    'GlueVersion' => '<string>',
                                    'Id' => '<string>',
                                    'JobMode' => 'SCRIPT|VISUAL|NOTEBOOK',
                                    'JobName' => '<string>',
                                    'JobRunState' => 'STARTING|RUNNING|STOPPING|STOPPED|SUCCEEDED|FAILED|TIMEOUT|ERROR|WAITING|EXPIRED',
                                    'LastModifiedOn' => <DateTime>,
                                    'LogGroupName' => '<string>',
                                    'MaintenanceWindow' => '<string>',
                                    'MaxCapacity' => <float>,
                                    'NotificationProperty' => [
                                        'NotifyDelayAfter' => <integer>,
                                    ],
                                    'NumberOfWorkers' => <integer>,
                                    'PredecessorRuns' => [
                                        [
                                            'JobName' => '<string>',
                                            'RunId' => '<string>',
                                        ],
                                        // ...
                                    ],
                                    'PreviousRunId' => '<string>',
                                    'SecurityConfiguration' => '<string>',
                                    'StartedOn' => <DateTime>,
                                    'Timeout' => <integer>,
                                    'TriggerName' => '<string>',
                                    'WorkerType' => 'Standard|G.1X|G.2X|G.025X|G.4X|G.8X|Z.2X',
                                ],
                                // ...
                            ],
                        ],
                        'Name' => '<string>',
                        'TriggerDetails' => [
                            'Trigger' => [
                                'Actions' => [
                                    [
                                        'Arguments' => ['<string>', ...],
                                        'CrawlerName' => '<string>',
                                        'JobName' => '<string>',
                                        'NotificationProperty' => [
                                            'NotifyDelayAfter' => <integer>,
                                        ],
                                        'SecurityConfiguration' => '<string>',
                                        'Timeout' => <integer>,
                                    ],
                                    // ...
                                ],
                                'Description' => '<string>',
                                'EventBatchingCondition' => [
                                    'BatchSize' => <integer>,
                                    'BatchWindow' => <integer>,
                                ],
                                'Id' => '<string>',
                                'Name' => '<string>',
                                'Predicate' => [
                                    'Conditions' => [
                                        [
                                            'CrawlState' => 'RUNNING|CANCELLING|CANCELLED|SUCCEEDED|FAILED|ERROR',
                                            'CrawlerName' => '<string>',
                                            'JobName' => '<string>',
                                            'LogicalOperator' => 'EQUALS',
                                            'State' => 'STARTING|RUNNING|STOPPING|STOPPED|SUCCEEDED|FAILED|TIMEOUT|ERROR|WAITING|EXPIRED',
                                        ],
                                        // ...
                                    ],
                                    'Logical' => 'AND|ANY',
                                ],
                                'Schedule' => '<string>',
                                'State' => 'CREATING|CREATED|ACTIVATING|ACTIVATED|DEACTIVATING|DEACTIVATED|DELETING|UPDATING',
                                'Type' => 'SCHEDULED|CONDITIONAL|ON_DEMAND|EVENT',
                                'WorkflowName' => '<string>',
                            ],
                        ],
                        'Type' => 'CRAWLER|JOB|TRIGGER',
                        'UniqueId' => '<string>',
                    ],
                    // ...
                ],
            ],
            'Name' => '<string>',
            'PreviousRunId' => '<string>',
            'StartedOn' => <DateTime>,
            'StartingEventBatchCondition' => [
                'BatchSize' => <integer>,
                'BatchWindow' => <integer>,
            ],
            'Statistics' => [
                'ErroredActions' => <integer>,
                'FailedActions' => <integer>,
                'RunningActions' => <integer>,
                'StoppedActions' => <integer>,
                'SucceededActions' => <integer>,
                'TimeoutActions' => <integer>,
                'TotalActions' => <integer>,
                'WaitingActions' => <integer>,
            ],
            'Status' => 'RUNNING|COMPLETED|STOPPING|STOPPED|ERROR',
            'WorkflowRunId' => '<string>',
            'WorkflowRunProperties' => ['<string>', ...],
        ],
        'MaxConcurrentRuns' => <integer>,
        'Name' => '<string>',
    ],
]

Result Details

Members
Workflow
Type: Workflow structure

The resource metadata for the workflow.
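
When IncludeGraph is true, the Graph member in the Result Syntax above holds Nodes and Edges. The sketch below turns that structure into an adjacency list. The helper name buildAdjacency is hypothetical, and the sketch assumes that each edge's SourceId and DestinationId refer to node UniqueId values.

```php
<?php
// Hypothetical helper, not part of the SDK: map each node's UniqueId to the
// UniqueIds it points at. Pass $result['Workflow']['Graph'] as $graph.
// Assumption: Edges[].SourceId / DestinationId hold node UniqueId values.
function buildAdjacency(array $graph): array
{
    $adjacency = [];
    // Register every node first so nodes without outgoing edges still appear.
    foreach ($graph['Nodes'] ?? [] as $node) {
        $adjacency[$node['UniqueId']] = [];
    }
    foreach ($graph['Edges'] ?? [] as $edge) {
        $adjacency[$edge['SourceId']][] = $edge['DestinationId'];
    }
    return $adjacency;
}

// Possible usage with a real client:
// $result = $client->getWorkflow(['Name' => '<string>', 'IncludeGraph' => true]);
// $adjacency = buildAdjacency($result['Workflow']['Graph'] ?? []);
```

Nodes with an empty list are the leaves of the workflow; the same helper works on the Graph member of LastRun.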

Errors

InvalidInputException:

The input provided was not valid.

EntityNotFoundException:

A specified entity does not exist.

InternalServiceException:

An internal service error occurred.

OperationTimeoutException:

The operation timed out.

GetWorkflowRun

$result = $client->getWorkflowRun([/* ... */]);
$promise = $client->getWorkflowRunAsync([/* ... */]);

Retrieves the metadata for a given workflow run.

Parameter Syntax

$result = $client->getWorkflowRun([
    'IncludeGraph' => true || false,
    'Name' => '<string>', // REQUIRED
    'RunId' => '<string>', // REQUIRED
]);

Parameter Details

Members
IncludeGraph
Type: boolean

Specifies whether to include the workflow graph in the response.

Name
Required: Yes
Type: string

Name of the workflow being run.

RunId
Required: Yes
Type: string

The ID of the workflow run.

Result Syntax

[
    'Run' => [
        'CompletedOn' => <DateTime>,
        'ErrorMessage' => '<string>',
        'Graph' => [
            'Edges' => [
                [
                    'DestinationId' => '<string>',
                    'SourceId' => '<string>',
                ],
                // ...
            ],
            'Nodes' => [
                [
                    'CrawlerDetails' => [
                        'Crawls' => [
                            [
                                'CompletedOn' => <DateTime>,
                                'ErrorMessage' => '<string>',
                                'LogGroup' => '<string>',
                                'LogStream' => '<string>',
                                'StartedOn' => <DateTime>,
                                'State' => 'RUNNING|CANCELLING|CANCELLED|SUCCEEDED|FAILED|ERROR',
                            ],
                            // ...
                        ],
                    ],
                    'JobDetails' => [
                        'JobRuns' => [
                            [
                                'AllocatedCapacity' => <integer>,
                                'Arguments' => ['<string>', ...],
                                'Attempt' => <integer>,
                                'CompletedOn' => <DateTime>,
                                'DPUSeconds' => <float>,
                                'ErrorMessage' => '<string>',
                                'ExecutionClass' => 'FLEX|STANDARD',
                                'ExecutionTime' => <integer>,
                                'GlueVersion' => '<string>',
                                'Id' => '<string>',
                                'JobMode' => 'SCRIPT|VISUAL|NOTEBOOK',
                                'JobName' => '<string>',
                                'JobRunState' => 'STARTING|RUNNING|STOPPING|STOPPED|SUCCEEDED|FAILED|TIMEOUT|ERROR|WAITING|EXPIRED',
                                'LastModifiedOn' => <DateTime>,
                                'LogGroupName' => '<string>',
                                'MaintenanceWindow' => '<string>',
                                'MaxCapacity' => <float>,
                                'NotificationProperty' => [
                                    'NotifyDelayAfter' => <integer>,
                                ],
                                'NumberOfWorkers' => <integer>,
                                'PredecessorRuns' => [
                                    [
                                        'JobName' => '<string>',
                                        'RunId' => '<string>',
                                    ],
                                    // ...
                                ],
                                'PreviousRunId' => '<string>',
                                'SecurityConfiguration' => '<string>',
                                'StartedOn' => <DateTime>,
                                'Timeout' => <integer>,
                                'TriggerName' => '<string>',
                                'WorkerType' => 'Standard|G.1X|G.2X|G.025X|G.4X|G.8X|Z.2X',
                            ],
                            // ...
                        ],
                    ],
                    'Name' => '<string>',
                    'TriggerDetails' => [
                        'Trigger' => [
                            'Actions' => [
                                [
                                    'Arguments' => ['<string>', ...],
                                    'CrawlerName' => '<string>',
                                    'JobName' => '<string>',
                                    'NotificationProperty' => [
                                        'NotifyDelayAfter' => <integer>,
                                    ],
                                    'SecurityConfiguration' => '<string>',
                                    'Timeout' => <integer>,
                                ],
                                // ...
                            ],
                            'Description' => '<string>',
                            'EventBatchingCondition' => [
                                'BatchSize' => <integer>,
                                'BatchWindow' => <integer>,
                            ],
                            'Id' => '<string>',
                            'Name' => '<string>',
                            'Predicate' => [
                                'Conditions' => [
                                    [
                                        'CrawlState' => 'RUNNING|CANCELLING|CANCELLED|SUCCEEDED|FAILED|ERROR',
                                        'CrawlerName' => '<string>',
                                        'JobName' => '<string>',
                                        'LogicalOperator' => 'EQUALS',
                                        'State' => 'STARTING|RUNNING|STOPPING|STOPPED|SUCCEEDED|FAILED|TIMEOUT|ERROR|WAITING|EXPIRED',
                                    ],
                                    // ...
                                ],
                                'Logical' => 'AND|ANY',
                            ],
                            'Schedule' => '<string>',
                            'State' => 'CREATING|CREATED|ACTIVATING|ACTIVATED|DEACTIVATING|DEACTIVATED|DELETING|UPDATING',
                            'Type' => 'SCHEDULED|CONDITIONAL|ON_DEMAND|EVENT',
                            'WorkflowName' => '<string>',
                        ],
                    ],
                    'Type' => 'CRAWLER|JOB|TRIGGER',
                    'UniqueId' => '<string>',
                ],
                // ...
            ],
        ],
        'Name' => '<string>',
        'PreviousRunId' => '<string>',
        'StartedOn' => <DateTime>,
        'StartingEventBatchCondition' => [
            'BatchSize' => <integer>,
            'BatchWindow' => <integer>,
        ],
        'Statistics' => [
            'ErroredActions' => <integer>,
            'FailedActions' => <integer>,
            'RunningActions' => <integer>,
            'StoppedActions' => <integer>,
            'SucceededActions' => <integer>,
            'TimeoutActions' => <integer>,
            'TotalActions' => <integer>,
            'WaitingActions' => <integer>,
        ],
        'Status' => 'RUNNING|COMPLETED|STOPPING|STOPPED|ERROR',
        'WorkflowRunId' => '<string>',
        'WorkflowRunProperties' => ['<string>', ...],
    ],
]

Result Details

Members
Run
Type: WorkflowRun structure

The requested workflow run metadata.
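
The Status and Statistics members of the Run structure above are often all a caller needs. The following sketch condenses them into a few fields. The helper name summarizeWorkflowRun and the output keys are hypothetical, and treating every status other than RUNNING or STOPPING as final is an assumption.

```php
<?php
// Hypothetical helper, not part of the SDK: condense a WorkflowRun array
// (the 'Run' member above) into a few fields for logging or alerting.
function summarizeWorkflowRun(array $run): array
{
    $stats = $run['Statistics'] ?? [];
    // Read one counter, defaulting to 0 when the service omits it.
    $count = static function (string $key) use ($stats): int {
        return (int) ($stats[$key] ?? 0);
    };

    $unsuccessful = $count('FailedActions') + $count('ErroredActions')
        + $count('TimeoutActions') + $count('StoppedActions');

    $status = $run['Status'] ?? null;

    return [
        'runId' => $run['WorkflowRunId'] ?? null,
        'status' => $status,
        // Assumption: any status other than RUNNING or STOPPING is final.
        'finished' => !in_array($status, ['RUNNING', 'STOPPING', null], true),
        'succeeded' => $count('SucceededActions'),
        'unsuccessful' => $unsuccessful,
        'total' => $count('TotalActions'),
    ];
}

// Possible usage with a real client:
// $result = $client->getWorkflowRun(['Name' => '<string>', 'RunId' => '<string>']);
// $summary = summarizeWorkflowRun($result['Run']);
```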

Errors

InvalidInputException:

The input provided was not valid.

EntityNotFoundException:

A specified entity does not exist.

InternalServiceException:

An internal service error occurred.

OperationTimeoutException:

The operation timed out.

GetWorkflowRunProperties

$result = $client->getWorkflowRunProperties([/* ... */]);
$promise = $client->getWorkflowRunPropertiesAsync([/* ... */]);

Retrieves the workflow run properties which were set during the run.

Parameter Syntax

$result = $client->getWorkflowRunProperties([
    'Name' => '<string>', // REQUIRED
    'RunId' => '<string>', // REQUIRED
]);

Parameter Details

Members
Name
Required: Yes
Type: string

Name of the workflow which was run.

RunId
Required: Yes
Type: string

The ID of the workflow run whose run properties should be returned.

Result Syntax

[
    'RunProperties' => ['<string>', ...],
]

Result Details

Members
RunProperties
Type: Associative array of custom string keys (IdString) to strings

The workflow run properties which were set during the specified run.
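
Because RunProperties is an associative array of string keys to string values, individual properties are read by key. A minimal sketch follows; the helper name runProperty and the sample keys and values are hypothetical.

```php
<?php
// Hypothetical helper, not part of the SDK: read one run property, falling
// back to a default. Pass $result['RunProperties'] ?? [] as $runProperties.
function runProperty(array $runProperties, string $key, ?string $default = null): ?string
{
    return array_key_exists($key, $runProperties)
        ? (string) $runProperties[$key]
        : $default;
}

// Sample data is made up; keys and values are both strings.
$runProperties = ['batch_date' => '2024-01-31', 'source' => 's3'];
echo runProperty($runProperties, 'batch_date'), PHP_EOL;     // prints: 2024-01-31
echo runProperty($runProperties, 'missing', 'n/a'), PHP_EOL; // prints: n/a
```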

Errors

InvalidInputException:

The input provided was not valid.

EntityNotFoundException:

A specified entity does not exist.

InternalServiceException:

An internal service error occurred.

OperationTimeoutException:

The operation timed out.

GetWorkflowRuns

$result = $client->getWorkflowRuns([/* ... */]);
$promise = $client->getWorkflowRunsAsync([/* ... */]);

Retrieves metadata for all runs of a given workflow.

Parameter Syntax

$result = $client->getWorkflowRuns([
    'IncludeGraph' => true || false,
    'MaxResults' => <integer>,
    'Name' => '<string>', // REQUIRED
    'NextToken' => '<string>',
]);

Parameter Details

Members
IncludeGraph
Type: boolean

Specifies whether to include the workflow graph in the response.

MaxResults
Type: int

The maximum number of workflow runs to be included in the response.

Name
Required: Yes
Type: string

Name of the workflow whose run metadata should be returned.

NextToken
Type: string

A continuation token, if this is a continuation call.

Result Syntax

[
    'NextToken' => '<string>',
    'Runs' => [
        [
            'CompletedOn' => <DateTime>,
            'ErrorMessage' => '<string>',
            'Graph' => [
                'Edges' => [
                    [
                        'DestinationId' => '<string>',
                        'SourceId' => '<string>',
                    ],
                    // ...
                ],
                'Nodes' => [
                    [
                        'CrawlerDetails' => [
                            'Crawls' => [
                                [
                                    'CompletedOn' => <DateTime>,
                                    'ErrorMessage' => '<string>',
                                    'LogGroup' => '<string>',
                                    'LogStream' => '<string>',
                                    'StartedOn' => <DateTime>,
                                    'State' => 'RUNNING|CANCELLING|CANCELLED|SUCCEEDED|FAILED|ERROR',
                                ],
                                // ...
                            ],
                        ],
                        'JobDetails' => [
                            'JobRuns' => [
                                [
                                    'AllocatedCapacity' => <integer>,
                                    'Arguments' => ['<string>', ...],
                                    'Attempt' => <integer>,
                                    'CompletedOn' => <DateTime>,
                                    'DPUSeconds' => <float>,
                                    'ErrorMessage' => '<string>',
                                    'ExecutionClass' => 'FLEX|STANDARD',
                                    'ExecutionTime' => <integer>,
                                    'GlueVersion' => '<string>',
                                    'Id' => '<string>',
                                    'JobMode' => 'SCRIPT|VISUAL|NOTEBOOK',
                                    'JobName' => '<string>',
                                    'JobRunState' => 'STARTING|RUNNING|STOPPING|STOPPED|SUCCEEDED|FAILED|TIMEOUT|ERROR|WAITING|EXPIRED',
                                    'LastModifiedOn' => <DateTime>,
                                    'LogGroupName' => '<string>',
                                    'MaintenanceWindow' => '<string>',
                                    'MaxCapacity' => <float>,
                                    'NotificationProperty' => [
                                        'NotifyDelayAfter' => <integer>,
                                    ],
                                    'NumberOfWorkers' => <integer>,
                                    'PredecessorRuns' => [
                                        [
                                            'JobName' => '<string>',
                                            'RunId' => '<string>',
                                        ],
                                        // ...
                                    ],
                                    'PreviousRunId' => '<string>',
                                    'SecurityConfiguration' => '<string>',
                                    'StartedOn' => <DateTime>,
                                    'Timeout' => <integer>,
                                    'TriggerName' => '<string>',
                                    'WorkerType' => 'Standard|G.1X|G.2X|G.025X|G.4X|G.8X|Z.2X',
                                ],
                                // ...
                            ],
                        ],
                        'Name' => '<string>',
                        'TriggerDetails' => [
                            'Trigger' => [
                                'Actions' => [
                                    [
                                        'Arguments' => ['<string>', ...],
                                        'CrawlerName' => '<string>',
                                        'JobName' => '<string>',
                                        'NotificationProperty' => [
                                            'NotifyDelayAfter' => <integer>,
                                        ],
                                        'SecurityConfiguration' => '<string>',
                                        'Timeout' => <integer>,
                                    ],
                                    // ...
                                ],
                                'Description' => '<string>',
                                'EventBatchingCondition' => [
                                    'BatchSize' => <integer>,
                                    'BatchWindow' => <integer>,
                                ],
                                'Id' => '<string>',
                                'Name' => '<string>',
                                'Predicate' => [
                                    'Conditions' => [
                                        [
                                            'CrawlState' => 'RUNNING|CANCELLING|CANCELLED|SUCCEEDED|FAILED|ERROR',
                                            'CrawlerName' => '<string>',
                                            'JobName' => '<string>',
                                            'LogicalOperator' => 'EQUALS',
                                            'State' => 'STARTING|RUNNING|STOPPING|STOPPED|SUCCEEDED|FAILED|TIMEOUT|ERROR|WAITING|EXPIRED',
                                        ],
                                        // ...
                                    ],
                                    'Logical' => 'AND|ANY',
                                ],
                                'Schedule' => '<string>',
                                'State' => 'CREATING|CREATED|ACTIVATING|ACTIVATED|DEACTIVATING|DEACTIVATED|DELETING|UPDATING',
                                'Type' => 'SCHEDULED|CONDITIONAL|ON_DEMAND|EVENT',
                                'WorkflowName' => '<string>',
                            ],
                        ],
                        'Type' => 'CRAWLER|JOB|TRIGGER',
                        'UniqueId' => '<string>',
                    ],
                    // ...
                ],
            ],
            'Name' => '<string>',
            'PreviousRunId' => '<string>',
            'StartedOn' => <DateTime>,
            'StartingEventBatchCondition' => [
                'BatchSize' => <integer>,
                'BatchWindow' => <integer>,
            ],
            'Statistics' => [
                'ErroredActions' => <integer>,
                'FailedActions' => <integer>,
                'RunningActions' => <integer>,
                'StoppedActions' => <integer>,
                'SucceededActions' => <integer>,
                'TimeoutActions' => <integer>,
                'TotalActions' => <integer>,
                'WaitingActions' => <integer>,
            ],
            'Status' => 'RUNNING|COMPLETED|STOPPING|STOPPED|ERROR',
            'WorkflowRunId' => '<string>',
            'WorkflowRunProperties' => ['<string>', ...],
        ],
        // ...
    ],
]

Result Details

Members
NextToken
Type: string

A continuation token, if not all requested workflow runs have been returned.

Runs
Type: Array of WorkflowRun structures

A list of workflow run metadata objects.

Errors

InvalidInputException:

The input provided was not valid.

EntityNotFoundException:

A specified entity does not exist.

InternalServiceException:

An internal service error occurred.

OperationTimeoutException:

The operation timed out.
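
The NextToken handling described above can be sketched as a small loop. This is an illustrative sketch, not part of the SDK: collectWorkflowRuns and countRunsByStatus are hypothetical names, and the helper assumes it is given a callable that accepts the GetWorkflowRuns parameters and returns an array-like result with Runs and NextToken. The page size of 25 is an arbitrary choice.

```php
<?php
// Hypothetical helper, not part of the SDK: follows NextToken until the
// service stops returning one and gathers every entry of 'Runs'.
function collectWorkflowRuns(callable $getWorkflowRuns, string $workflowName, int $pageSize = 25): array
{
    $runs = [];
    $token = null;
    do {
        $params = [
            'Name' => $workflowName,  // REQUIRED
            'MaxResults' => $pageSize,
            'IncludeGraph' => false,  // leave the graph out to keep pages small
        ];
        if ($token !== null) {
            $params['NextToken'] = $token;
        }
        $result = $getWorkflowRuns($params);
        foreach ($result['Runs'] ?? [] as $run) {
            $runs[] = $run;
        }
        $token = $result['NextToken'] ?? null;
    } while ($token !== null && $token !== '');

    return $runs;
}

// Tallies runs by their 'Status' value (RUNNING, COMPLETED, and so on).
function countRunsByStatus(array $runs): array
{
    $counts = [];
    foreach ($runs as $run) {
        $status = $run['Status'] ?? 'UNKNOWN';
        $counts[$status] = ($counts[$status] ?? 0) + 1;
    }
    return $counts;
}
```

With a configured client, the callable can wrap the magic method, for example function (array $p) use ($client) { return $client->getWorkflowRuns($p); }. Taking a callable rather than the client itself is a design choice that lets the loop be exercised without network access.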

ImportCatalogToGlue

$result = $client->importCatalogToGlue([/* ... */]);
$promise = $client->importCatalogToGlueAsync([/* ... */]);

Imports an existing Amazon Athena Data Catalog to Glue.

Parameter Syntax

$result = $client->importCatalogToGlue([
    'CatalogId' => '<string>',
]);

Parameter Details

Members
CatalogId
Type: string

The ID of the catalog to import. Currently, this should be the Amazon Web Services account ID.

Result Syntax

[]

Result Details

The results for this operation are always empty.

Errors

InternalServiceException:

An internal service error occurred.

OperationTimeoutException:

The operation timed out.

ListBlueprints

$result = $client->listBlueprints([/* ... */]);
$promise = $client->listBlueprintsAsync([/* ... */]);

Lists all the blueprint names in an account.

Parameter Syntax

$result = $client->listBlueprints([
    'MaxResults' => <integer>,
    'NextToken' => '<string>',
    'Tags' => ['<string>', ...],
]);

Parameter Details

Members
MaxResults
Type: int

The maximum size of a list to return.

NextToken
Type: string

A continuation token, if this is a continuation request.

Tags
Type: Associative array of custom strings keys (TagKey) to strings

Filters the list by an Amazon Web Services resource tag.

Result Syntax

[
    'Blueprints' => ['<string>', ...],
    'NextToken' => '<string>',
]

Result Details

Members
Blueprints
Type: Array of strings

List of names of blueprints in the account.

NextToken
Type: string

A continuation token, if not all blueprint names have been returned.

Errors

InvalidInputException:

The input provided was not valid.

InternalServiceException:

An internal service error occurred.

OperationTimeoutException:

The operation timed out.

ListColumnStatisticsTaskRuns

$result = $client->listColumnStatisticsTaskRuns([/* ... */]);
$promise = $client->listColumnStatisticsTaskRunsAsync([/* ... */]);

Lists all task runs for a particular account.

Parameter Syntax

$result = $client->listColumnStatisticsTaskRuns([
    'MaxResults' => <integer>,
    'NextToken' => '<string>',
]);

Parameter Details

Members
MaxResults
Type: int

The maximum size of the response.

NextToken
Type: string

A continuation token, if this is a continuation call.

Result Syntax

[
    'ColumnStatisticsTaskRunIds' => ['<string>', ...],
    'NextToken' => '<string>',
]

Result Details

Members
ColumnStatisticsTaskRunIds
Type: Array of strings

A list of column statistics task run IDs.

NextToken
Type: string

A continuation token, if not all task run IDs have yet been returned.

Errors

OperationTimeoutException:

The operation timed out.

ListCrawlers

$result = $client->listCrawlers([/* ... */]);
$promise = $client->listCrawlersAsync([/* ... */]);

Retrieves the names of all crawler resources in this Amazon Web Services account, or the resources with the specified tag. This operation allows you to see which resources are available in your account, and their names.

This operation takes the optional Tags field, which you can use as a filter on the response so that tagged resources can be retrieved as a group. If you choose to use tags filtering, only resources with the tag are retrieved.

Parameter Syntax

$result = $client->listCrawlers([
    'MaxResults' => <integer>,
    'NextToken' => '<string>',
    'Tags' => ['<string>', ...],
]);

Parameter Details

Members
MaxResults
Type: int

The maximum size of a list to return.

NextToken
Type: string

A continuation token, if this is a continuation request.

Tags
Type: Associative array of custom strings keys (TagKey) to strings

Specifies that only resources with these tags are returned.

Result Syntax

[
    'CrawlerNames' => ['<string>', ...],
    'NextToken' => '<string>',
]

Result Details

Members
CrawlerNames
Type: Array of strings

The names of all crawlers in the account, or the crawlers with the specified tags.

NextToken
Type: string

A continuation token, if the returned list does not contain the last crawler available.

Errors

OperationTimeoutException:

The operation timed out.
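
Tag filtering and continuation tokens combine naturally in a generator. The sketch below is illustrative only: iterateNames is a hypothetical name, not an SDK function, and it assumes a callable that takes the request parameters and returns an array-like result. Because ListCrawlers, ListJobs, and ListDevEndpoints share the same request shape, the result key is passed in as an argument.

```php
<?php
// Hypothetical generator, not part of the SDK: yields each name from a list
// operation whose result stores names under $resultKey alongside NextToken.
function iterateNames(callable $listPage, string $resultKey, array $params = []): Generator
{
    unset($params['NextToken']); // always start from the first page
    $token = null;
    do {
        // Only send NextToken once the service has handed one back.
        $request = $token === null ? $params : ['NextToken' => $token] + $params;
        $result = $listPage($request);
        foreach ($result[$resultKey] ?? [] as $name) {
            yield $name;
        }
        $token = $result['NextToken'] ?? null;
    } while ($token !== null && $token !== '');
}
```

A call such as iterateNames($callable, 'CrawlerNames', ['Tags' => ['team' => 'analytics']]) would walk every crawler carrying that tag; the tag key and value here are placeholders.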

ListCrawls

$result = $client->listCrawls([/* ... */]);
$promise = $client->listCrawlsAsync([/* ... */]);

Returns all the crawls of a specified crawler. Only crawls that have occurred since the launch date of the crawler history feature are returned, and only up to 12 months of crawls are retained. Older crawls are not returned.

You may use this API to:

  • Retrieve all the crawls of a specified crawler.

  • Retrieve all the crawls of a specified crawler within a limited count.

  • Retrieve all the crawls of a specified crawler in a specific time range.

  • Retrieve all the crawls of a specified crawler with a particular state, crawl ID, or DPU hour value.

Parameter Syntax

$result = $client->listCrawls([
    'CrawlerName' => '<string>', // REQUIRED
    'Filters' => [
        [
            'FieldName' => 'CRAWL_ID|STATE|START_TIME|END_TIME|DPU_HOUR',
            'FieldValue' => '<string>',
            'FilterOperator' => 'GT|GE|LT|LE|EQ|NE',
        ],
        // ...
    ],
    'MaxResults' => <integer>,
    'NextToken' => '<string>',
]);

Parameter Details

Members
CrawlerName
Required: Yes
Type: string

The name of the crawler whose runs you want to retrieve.

Filters
Type: Array of CrawlsFilter structures

Filters the crawls by the criteria you specify in a list of CrawlsFilter objects.

MaxResults
Type: int

The maximum number of results to return. The default is 20, and the maximum is 100.

NextToken
Type: string

A continuation token, if this is a continuation call.

Result Syntax

[
    'Crawls' => [
        [
            'CrawlId' => '<string>',
            'DPUHour' => <float>,
            'EndTime' => <DateTime>,
            'ErrorMessage' => '<string>',
            'LogGroup' => '<string>',
            'LogStream' => '<string>',
            'MessagePrefix' => '<string>',
            'StartTime' => <DateTime>,
            'State' => 'RUNNING|COMPLETED|FAILED|STOPPED',
            'Summary' => '<string>',
        ],
        // ...
    ],
    'NextToken' => '<string>',
]

Result Details

Members
Crawls
Type: Array of CrawlerHistory structures

A list of CrawlerHistory objects representing the crawl runs that meet your criteria.

NextToken
Type: string

A continuation token for paginating the returned list of crawls, returned if the current segment of the list is not the last.

Errors

EntityNotFoundException:

A specified entity does not exist.

OperationTimeoutException:

The operation timed out.

InvalidInputException:

The input provided was not valid.
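
A request for crawls in a particular state can be assembled as follows. This is a minimal sketch under stated assumptions: crawlsFilter and listCrawlsParams are hypothetical helpers, not SDK functions; the enumerations are copied from Parameter Syntax; and the lower bound of 1 for MaxResults is an assumption (only the default of 20 and the maximum of 100 are documented above).

```php
<?php
// Hypothetical helper, not part of the SDK: builds one CrawlsFilter entry and
// rejects values outside the enumerations shown in Parameter Syntax.
function crawlsFilter(string $fieldName, string $operator, string $value): array
{
    $fields = ['CRAWL_ID', 'STATE', 'START_TIME', 'END_TIME', 'DPU_HOUR'];
    $operators = ['GT', 'GE', 'LT', 'LE', 'EQ', 'NE'];
    if (!in_array($fieldName, $fields, true)) {
        throw new InvalidArgumentException("Unknown FieldName: $fieldName");
    }
    if (!in_array($operator, $operators, true)) {
        throw new InvalidArgumentException("Unknown FilterOperator: $operator");
    }
    return [
        'FieldName' => $fieldName,
        'FieldValue' => $value,
        'FilterOperator' => $operator,
    ];
}

// Hypothetical helper: assembles the ListCrawls parameter array.
function listCrawlsParams(string $crawlerName, array $filters = [], int $maxResults = 20): array
{
    // Upper bound of 100 is documented; the lower bound of 1 is assumed.
    if ($maxResults < 1 || $maxResults > 100) {
        throw new InvalidArgumentException('MaxResults must be between 1 and 100');
    }
    $params = ['CrawlerName' => $crawlerName, 'MaxResults' => $maxResults];
    if ($filters !== []) {
        $params['Filters'] = array_values($filters);
    }
    return $params;
}
```

For example, listCrawlsParams('my-crawler', [crawlsFilter('STATE', 'EQ', 'FAILED')], 50) produces an array that could be passed to $client->listCrawls(); the crawler name is a placeholder.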

ListCustomEntityTypes

$result = $client->listCustomEntityTypes([/* ... */]);
$promise = $client->listCustomEntityTypesAsync([/* ... */]);

Lists all the custom patterns that have been created.

Parameter Syntax

$result = $client->listCustomEntityTypes([
    'MaxResults' => <integer>,
    'NextToken' => '<string>',
    'Tags' => ['<string>', ...],
]);

Parameter Details

Members
MaxResults
Type: int

The maximum number of results to return.

NextToken
Type: string

A pagination token to offset the results.

Tags
Type: Associative array of custom strings keys (TagKey) to strings

A list of key-value pair tags.

Result Syntax

[
    'CustomEntityTypes' => [
        [
            'ContextWords' => ['<string>', ...],
            'Name' => '<string>',
            'RegexString' => '<string>',
        ],
        // ...
    ],
    'NextToken' => '<string>',
]

Result Details

Members
CustomEntityTypes
Type: Array of CustomEntityType structures

A list of CustomEntityType objects representing custom patterns.

NextToken
Type: string

A pagination token, if more results are available.

Errors

InvalidInputException:

The input provided was not valid.

OperationTimeoutException:

The operation timed out.

InternalServiceException:

An internal service error occurred.

ListDataQualityResults

$result = $client->listDataQualityResults([/* ... */]);
$promise = $client->listDataQualityResultsAsync([/* ... */]);

Returns all data quality execution results for your account.

Parameter Syntax

$result = $client->listDataQualityResults([
    'Filter' => [
        'DataSource' => [
            'GlueTable' => [ // REQUIRED
                'AdditionalOptions' => ['<string>', ...],
                'CatalogId' => '<string>',
                'ConnectionName' => '<string>',
                'DatabaseName' => '<string>', // REQUIRED
                'TableName' => '<string>', // REQUIRED
            ],
        ],
        'JobName' => '<string>',
        'JobRunId' => '<string>',
        'StartedAfter' => <integer || string || DateTime>,
        'StartedBefore' => <integer || string || DateTime>,
    ],
    'MaxResults' => <integer>,
    'NextToken' => '<string>',
]);

Parameter Details

Members
Filter

The filter criteria.

MaxResults
Type: int

The maximum number of results to return.

NextToken
Type: string

A pagination token to offset the results.

Result Syntax

[
    'NextToken' => '<string>',
    'Results' => [
        [
            'DataSource' => [
                'GlueTable' => [
                    'AdditionalOptions' => ['<string>', ...],
                    'CatalogId' => '<string>',
                    'ConnectionName' => '<string>',
                    'DatabaseName' => '<string>',
                    'TableName' => '<string>',
                ],
            ],
            'JobName' => '<string>',
            'JobRunId' => '<string>',
            'ResultId' => '<string>',
            'StartedOn' => <DateTime>,
        ],
        // ...
    ],
]

Result Details

Members
NextToken
Type: string

A pagination token, if more results are available.

Results
Required: Yes
Type: Array of DataQualityResultDescription structures

A list of DataQualityResultDescription objects.

Errors

InvalidInputException:

The input provided was not valid.

OperationTimeoutException:

The operation timed out.

InternalServiceException:

An internal service error occurred.
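
The Filter member can narrow results to a time window and, optionally, to one Glue table. The sketch below is illustrative: dataQualityResultFilter is a hypothetical helper, not part of the SDK, and it passes timestamps as Unix integers, which is one of the forms listed for StartedAfter and StartedBefore in Parameter Syntax.

```php
<?php
// Hypothetical helper, not part of the SDK: builds the 'Filter' member for
// results started within the last $hours hours before $now (Unix seconds).
function dataQualityResultFilter(int $now, int $hours, ?string $database = null, ?string $table = null): array
{
    if ($hours < 1) {
        throw new InvalidArgumentException('hours must be at least 1');
    }
    // GlueTable marks both DatabaseName and TableName as REQUIRED,
    // so either supply both or neither.
    if (($database === null) !== ($table === null)) {
        throw new InvalidArgumentException('Provide both database and table, or neither');
    }

    $filter = [
        'StartedAfter' => $now - $hours * 3600,
        'StartedBefore' => $now,
    ];
    if ($database !== null) {
        $filter['DataSource'] = [
            'GlueTable' => [
                'DatabaseName' => $database,
                'TableName' => $table,
            ],
        ];
    }
    return $filter;
}
```

The returned array would be supplied as the 'Filter' key, for example ['Filter' => dataQualityResultFilter(time(), 24, 'sales', 'orders'), 'MaxResults' => 50]; the database and table names are placeholders.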

ListDataQualityRuleRecommendationRuns

$result = $client->listDataQualityRuleRecommendationRuns([/* ... */]);
$promise = $client->listDataQualityRuleRecommendationRunsAsync([/* ... */]);

Lists the recommendation runs meeting the filter criteria.

Parameter Syntax

$result = $client->listDataQualityRuleRecommendationRuns([
    'Filter' => [
        'DataSource' => [ // REQUIRED
            'GlueTable' => [ // REQUIRED
                'AdditionalOptions' => ['<string>', ...],
                'CatalogId' => '<string>',
                'ConnectionName' => '<string>',
                'DatabaseName' => '<string>', // REQUIRED
                'TableName' => '<string>', // REQUIRED
            ],
        ],
        'StartedAfter' => <integer || string || DateTime>,
        'StartedBefore' => <integer || string || DateTime>,
    ],
    'MaxResults' => <integer>,
    'NextToken' => '<string>',
]);

Parameter Details

Members
Filter

The filter criteria.

MaxResults
Type: int

The maximum number of results to return.

NextToken
Type: string

A pagination token to offset the results.

Result Syntax

[
    'NextToken' => '<string>',
    'Runs' => [
        [
            'DataSource' => [
                'GlueTable' => [
                    'AdditionalOptions' => ['<string>', ...],
                    'CatalogId' => '<string>',
                    'ConnectionName' => '<string>',
                    'DatabaseName' => '<string>',
                    'TableName' => '<string>',
                ],
            ],
            'RunId' => '<string>',
            'StartedOn' => <DateTime>,
            'Status' => 'STARTING|RUNNING|STOPPING|STOPPED|SUCCEEDED|FAILED|TIMEOUT',
        ],
        // ...
    ],
]

Result Details

Members
NextToken
Type: string

A pagination token, if more results are available.

Runs
Type: Array of DataQualityRuleRecommendationRunDescription structures

A list of DataQualityRuleRecommendationRunDescription objects.

Errors

InvalidInputException:

The input provided was not valid.

OperationTimeoutException:

The operation timed out.

InternalServiceException:

An internal service error occurred.

ListDataQualityRulesetEvaluationRuns

$result = $client->listDataQualityRulesetEvaluationRuns([/* ... */]);
$promise = $client->listDataQualityRulesetEvaluationRunsAsync([/* ... */]);

Lists all the runs meeting the filter criteria, where a ruleset is evaluated against a data source.

Parameter Syntax

$result = $client->listDataQualityRulesetEvaluationRuns([
    'Filter' => [
        'DataSource' => [ // REQUIRED
            'GlueTable' => [ // REQUIRED
                'AdditionalOptions' => ['<string>', ...],
                'CatalogId' => '<string>',
                'ConnectionName' => '<string>',
                'DatabaseName' => '<string>', // REQUIRED
                'TableName' => '<string>', // REQUIRED
            ],
        ],
        'StartedAfter' => <integer || string || DateTime>,
        'StartedBefore' => <integer || string || DateTime>,
    ],
    'MaxResults' => <integer>,
    'NextToken' => '<string>',
]);

Parameter Details

Members
Filter

The filter criteria.

MaxResults
Type: int

The maximum number of results to return.

NextToken
Type: string

A pagination token to offset the results.

Result Syntax

[
    'NextToken' => '<string>',
    'Runs' => [
        [
            'DataSource' => [
                'GlueTable' => [
                    'AdditionalOptions' => ['<string>', ...],
                    'CatalogId' => '<string>',
                    'ConnectionName' => '<string>',
                    'DatabaseName' => '<string>',
                    'TableName' => '<string>',
                ],
            ],
            'RunId' => '<string>',
            'StartedOn' => <DateTime>,
            'Status' => 'STARTING|RUNNING|STOPPING|STOPPED|SUCCEEDED|FAILED|TIMEOUT',
        ],
        // ...
    ],
]

Result Details

Members
NextToken
Type: string

A pagination token, if more results are available.

Runs
Type: Array of DataQualityRulesetEvaluationRunDescription structures

A list of DataQualityRulesetEvaluationRunDescription objects representing data quality ruleset runs.

Errors

InvalidInputException:

The input provided was not valid.

OperationTimeoutException:

The operation timed out.

InternalServiceException:

An internal service error occurred.

ListDataQualityRulesets

$result = $client->listDataQualityRulesets([/* ... */]);
$promise = $client->listDataQualityRulesetsAsync([/* ... */]);

Returns a paginated list of rulesets for the specified list of Glue tables.

Parameter Syntax

$result = $client->listDataQualityRulesets([
    'Filter' => [
        'CreatedAfter' => <integer || string || DateTime>,
        'CreatedBefore' => <integer || string || DateTime>,
        'Description' => '<string>',
        'LastModifiedAfter' => <integer || string || DateTime>,
        'LastModifiedBefore' => <integer || string || DateTime>,
        'Name' => '<string>',
        'TargetTable' => [
            'CatalogId' => '<string>',
            'DatabaseName' => '<string>', // REQUIRED
            'TableName' => '<string>', // REQUIRED
        ],
    ],
    'MaxResults' => <integer>,
    'NextToken' => '<string>',
    'Tags' => ['<string>', ...],
]);

Parameter Details

Members
Filter

The filter criteria.

MaxResults
Type: int

The maximum number of results to return.

NextToken
Type: string

A pagination token to offset the results.

Tags
Type: Associative array of custom strings keys (TagKey) to strings

A list of key-value pair tags.

Result Syntax

[
    'NextToken' => '<string>',
    'Rulesets' => [
        [
            'CreatedOn' => <DateTime>,
            'Description' => '<string>',
            'LastModifiedOn' => <DateTime>,
            'Name' => '<string>',
            'RecommendationRunId' => '<string>',
            'RuleCount' => <integer>,
            'TargetTable' => [
                'CatalogId' => '<string>',
                'DatabaseName' => '<string>',
                'TableName' => '<string>',
            ],
        ],
        // ...
    ],
]

Result Details

Members
NextToken
Type: string

A pagination token, if more results are available.

Rulesets
Type: Array of DataQualityRulesetListDetails structures

A paginated list of rulesets for the specified list of Glue tables.

Errors

EntityNotFoundException:

A specified entity does not exist.

InvalidInputException:

The input provided was not valid.

OperationTimeoutException:

The operation timed out.

InternalServiceException:

An internal service error occurred.
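
Once the Rulesets entries have been retrieved, they can be grouped by the table they target. This is an illustrative sketch: groupRulesetNamesByTable is a hypothetical helper, not an SDK function, and the "(no target table)" bucket is an assumption made to cover entries returned without a TargetTable member.

```php
<?php
// Hypothetical helper, not part of the SDK: groups ruleset names by
// "DatabaseName.TableName" taken from each entry's TargetTable.
function groupRulesetNamesByTable(iterable $rulesets): array
{
    $byTable = [];
    foreach ($rulesets as $ruleset) {
        $target = $ruleset['TargetTable'] ?? null;
        $key = $target === null
            ? '(no target table)' // assumed fallback label
            : $target['DatabaseName'] . '.' . $target['TableName'];
        $byTable[$key][] = $ruleset['Name'];
    }
    ksort($byTable); // stable, predictable ordering for display
    return $byTable;
}
```

Passing $result['Rulesets'] from a listDataQualityRulesets call would give a map from table to ruleset names for that page of results.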

ListDevEndpoints

$result = $client->listDevEndpoints([/* ... */]);
$promise = $client->listDevEndpointsAsync([/* ... */]);

Retrieves the names of all DevEndpoint resources in this Amazon Web Services account, or the resources with the specified tag. This operation allows you to see which resources are available in your account, and their names.

This operation takes the optional Tags field, which you can use as a filter on the response so that tagged resources can be retrieved as a group. If you choose to use tags filtering, only resources with the tag are retrieved.

Parameter Syntax

$result = $client->listDevEndpoints([
    'MaxResults' => <integer>,
    'NextToken' => '<string>',
    'Tags' => ['<string>', ...],
]);

Parameter Details

Members
MaxResults
Type: int

The maximum size of a list to return.

NextToken
Type: string

A continuation token, if this is a continuation request.

Tags
Type: Associative array of custom strings keys (TagKey) to strings

Specifies that only resources with these tags are returned.

Result Syntax

[
    'DevEndpointNames' => ['<string>', ...],
    'NextToken' => '<string>',
]

Result Details

Members
DevEndpointNames
Type: Array of strings

The names of all the DevEndpoints in the account, or the DevEndpoints with the specified tags.

NextToken
Type: string

A continuation token, if the returned list does not contain the last DevEndpoint available.

Errors

InvalidInputException:

The input provided was not valid.

EntityNotFoundException:

A specified entity does not exist.

InternalServiceException:

An internal service error occurred.

OperationTimeoutException:

The operation timed out.

ListJobs

$result = $client->listJobs([/* ... */]);
$promise = $client->listJobsAsync([/* ... */]);

Retrieves the names of all job resources in this Amazon Web Services account, or the resources with the specified tag. This operation allows you to see which resources are available in your account, and their names.

This operation takes the optional Tags field, which you can use as a filter on the response so that tagged resources can be retrieved as a group. If you choose to use tags filtering, only resources with the tag are retrieved.

Parameter Syntax

$result = $client->listJobs([
    'MaxResults' => <integer>,
    'NextToken' => '<string>',
    'Tags' => ['<string>', ...],
]);

Parameter Details

Members
MaxResults
Type: int

The maximum size of a list to return.

NextToken
Type: string

A continuation token, if this is a continuation request.

Tags
Type: Associative array of custom strings keys (TagKey) to strings

Specifies that only resources with these tags are returned.

Result Syntax

[
    'JobNames' => ['<string>', ...],
    'NextToken' => '<string>',
]

Result Details

Members
JobNames
Type: Array of strings

The names of all jobs in the account, or the jobs with the specified tags.

NextToken
Type: string

A continuation token, if the returned list does not contain the last job available.

Errors

InvalidInputException:

The input provided was not valid.

EntityNotFoundException:

A specified entity does not exist.

InternalServiceException:

An internal service error occurred.

OperationTimeoutException:

The operation timed out.
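
ListJobs returns names only, so a common follow-up is to fetch details in batches. The sketch below splits names into request-sized groups. It rests on assumptions: chunkJobNames is a hypothetical helper, the 'JobNames' key is assumed to be the parameter accepted by BatchGetJobs (check that operation's Parameter Syntax), and the batch size of 25 is arbitrary rather than a documented limit.

```php
<?php
// Hypothetical helper, not part of the SDK: de-duplicates job names and splits
// them into parameter arrays, one per batch request.
function chunkJobNames(array $jobNames, int $batchSize = 25): array
{
    if ($batchSize < 1) {
        throw new InvalidArgumentException('batchSize must be at least 1');
    }
    $unique = array_values(array_unique($jobNames));
    $requests = [];
    foreach (array_chunk($unique, $batchSize) as $chunk) {
        // 'JobNames' is an assumed parameter name; verify before use.
        $requests[] = ['JobNames' => $chunk];
    }
    return $requests;
}
```

Each element of the returned array is intended to be passed to one batch call, so the number of requests grows with the number of distinct names rather than with the number of pages listed.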

ListMLTransforms

$result = $client->listMLTransforms([/* ... */]);
$promise = $client->listMLTransformsAsync([/* ... */]);

Retrieves a sortable, filterable list of existing Glue machine learning transforms in this Amazon Web Services account, or the resources with the specified tag. This operation takes the optional Tags field, which you can use as a filter of the responses so that tagged resources can be retrieved as a group. If you choose to use tag filtering, only resources with the tags are retrieved.

Parameter Syntax

$result = $client->listMLTransforms([
    'Filter' => [
        'CreatedAfter' => <integer || string || DateTime>,
        'CreatedBefore' => <integer || string || DateTime>,
        'GlueVersion' => '<string>',
        'LastModifiedAfter' => <integer || string || DateTime>,
        'LastModifiedBefore' => <integer || string || DateTime>,
        'Name' => '<string>',
        'Schema' => [
            [
                'DataType' => '<string>',
                'Name' => '<string>',
            ],
            // ...
        ],
        'Status' => 'NOT_READY|READY|DELETING',
        'TransformType' => 'FIND_MATCHES',
    ],
    'MaxResults' => <integer>,
    'NextToken' => '<string>',
    'Sort' => [
        'Column' => 'NAME|TRANSFORM_TYPE|STATUS|CREATED|LAST_MODIFIED', // REQUIRED
        'SortDirection' => 'DESCENDING|ASCENDING', // REQUIRED
    ],
    'Tags' => ['<string>', ...],
]);

Parameter Details

Members
Filter
Type: TransformFilterCriteria structure

A TransformFilterCriteria used to filter the machine learning transforms.

MaxResults
Type: int

The maximum size of a list to return.

NextToken
Type: string

A continuation token, if this is a continuation request.

Sort
Type: TransformSortCriteria structure

A TransformSortCriteria used to sort the machine learning transforms.

Tags
Type: Associative array of custom strings keys (TagKey) to strings

Specifies that only resources with these tags are returned.

Result Syntax

[
    'NextToken' => '<string>',
    'TransformIds' => ['<string>', ...],
]

Result Details

Members
NextToken
Type: string

A continuation token, if the returned list does not contain the last transform available.

TransformIds
Required: Yes
Type: Array of strings

The identifiers of all the machine learning transforms in the account, or the machine learning transforms with the specified tags.

Errors

EntityNotFoundException:

A specified entity does not exist.

InvalidInputException:

The input provided was not valid.

OperationTimeoutException:

The operation timed out.

InternalServiceException:

An internal service error occurred.
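
Because Sort requires both Column and SortDirection, it can help to build the request in one place and check enumerated values up front. This is a minimal sketch: mlTransformsParams is a hypothetical helper, not part of the SDK, and the allowed values are copied from Parameter Syntax.

```php
<?php
// Hypothetical helper, not part of the SDK: assembles ListMLTransforms
// parameters and validates enumerated values before any request is made.
function mlTransformsParams(array $filter = [], ?string $sortColumn = null, string $sortDirection = 'DESCENDING'): array
{
    $columns = ['NAME', 'TRANSFORM_TYPE', 'STATUS', 'CREATED', 'LAST_MODIFIED'];
    $directions = ['DESCENDING', 'ASCENDING'];
    $statuses = ['NOT_READY', 'READY', 'DELETING'];

    if (isset($filter['Status']) && !in_array($filter['Status'], $statuses, true)) {
        throw new InvalidArgumentException('Unknown Status: ' . $filter['Status']);
    }

    $params = [];
    if ($filter !== []) {
        $params['Filter'] = $filter;
    }
    if ($sortColumn !== null) {
        if (!in_array($sortColumn, $columns, true)) {
            throw new InvalidArgumentException("Unknown sort column: $sortColumn");
        }
        if (!in_array($sortDirection, $directions, true)) {
            throw new InvalidArgumentException("Unknown sort direction: $sortDirection");
        }
        // Column and SortDirection are both REQUIRED once Sort is present.
        $params['Sort'] = ['Column' => $sortColumn, 'SortDirection' => $sortDirection];
    }
    return $params;
}
```

For instance, mlTransformsParams(['Status' => 'READY'], 'LAST_MODIFIED') yields parameters for the most recently modified ready transforms first; MaxResults, NextToken, and Tags can be merged into the returned array as needed.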

ListRegistries

$result = $client->listRegistries([/* ... */]);
$promise = $client->listRegistriesAsync([/* ... */]);

Returns a list of registries that you have created, with minimal registry information. Registries in the Deleting status will not be included in the results. Empty results will be returned if there are no registries available.

Parameter Syntax

$result = $client->listRegistries([
    'MaxResults' => <integer>,
    'NextToken' => '<string>',
]);

Parameter Details

Members
MaxResults
Type: int

The maximum number of results per page. If no value is supplied, the default is 25 per page.

NextToken
Type: string

A continuation token, if this is a continuation call.

Result Syntax

[
    'NextToken' => '<string>',
    'Registries' => [
        [
            'CreatedTime' => '<string>',
            'Description' => '<string>',
            'RegistryArn' => '<string>',
            'RegistryName' => '<string>',
            'Status' => 'AVAILABLE|DELETING',
            'UpdatedTime' => '<string>',
        ],
        // ...
    ],
]

Result Details

Members
NextToken
Type: string

A continuation token for paginating the returned list of registries, returned if the current segment of the list is not the last.

Registries
Type: Array of RegistryListItem structures

An array of RegistryListItem objects containing minimal details of each registry.

Errors

InvalidInputException:

The input provided was not valid.

AccessDeniedException:

Access to a resource was denied.

InternalServiceException:

An internal service error occurred.

ListSchemaVersions

$result = $client->listSchemaVersions([/* ... */]);
$promise = $client->listSchemaVersionsAsync([/* ... */]);

Returns a list of schema versions that you have created, with minimal information. Schema versions in Deleted status will not be included in the results. Empty results will be returned if there are no schema versions available.

Parameter Syntax

$result = $client->listSchemaVersions([
    'MaxResults' => <integer>,
    'NextToken' => '<string>',
    'SchemaId' => [ // REQUIRED
        'RegistryName' => '<string>',
        'SchemaArn' => '<string>',
        'SchemaName' => '<string>',
    ],
]);

Parameter Details

Members
MaxResults
Type: int

The maximum number of results per page. If no value is supplied, the default is 25 per page.

NextToken
Type: string

A continuation token, if this is a continuation call.

SchemaId
Required: Yes
Type: SchemaId structure

This is a wrapper structure to contain schema identity fields. The structure contains:

  • SchemaId$SchemaArn: The Amazon Resource Name (ARN) of the schema. You must provide either SchemaArn, or both SchemaName and RegistryName.

  • SchemaId$SchemaName: The name of the schema. You must provide either SchemaArn, or both SchemaName and RegistryName.

Result Syntax

[
    'NextToken' => '<string>',
    'Schemas' => [
        [
            'CreatedTime' => '<string>',
            'SchemaArn' => '<string>',
            'SchemaVersionId' => '<string>',
            'Status' => 'AVAILABLE|PENDING|FAILURE|DELETING',
            'VersionNumber' => <integer>,
        ],
        // ...
    ],
]

Result Details

Members
NextToken
Type: string

A continuation token for paginating the returned list of schema versions, returned if the current segment of the list is not the last.

Schemas
Type: Array of SchemaVersionListItem structures

An array of SchemaVersionListItem objects containing details of each schema version.
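The NextToken handling described above can be sketched as a loop. This is a minimal sketch, not an SDK-provided helper: collectSchemaVersions is a hypothetical name, and the callable is assumed to wrap $client->listSchemaVersions() so that the loop can be exercised without a live client.

```php
<?php
/**
 * Collects every schema version for one schema by following NextToken.
 *
 * $listSchemaVersions is any callable that accepts the request array and
 * returns an array-like result (hypothetical wrapper around the client call).
 */
function collectSchemaVersions(callable $listSchemaVersions, array $schemaId, int $pageSize = 25): array
{
    $params = ['SchemaId' => $schemaId, 'MaxResults' => $pageSize];
    $versions = [];

    do {
        $result = $listSchemaVersions($params);

        // 'Schemas' may be absent when there are no schema versions.
        foreach ($result['Schemas'] ?? [] as $item) {
            $versions[] = $item;
        }

        // A missing or empty NextToken means this was the last page.
        $token = $result['NextToken'] ?? null;
        $params['NextToken'] = $token;
    } while (!empty($token));

    return $versions;
}

// Possible usage (schema and registry names are placeholders):
// $all = collectSchemaVersions(
//     fn (array $p) => $client->listSchemaVersions($p),
//     ['SchemaName' => 'my-schema', 'RegistryName' => 'my-registry']
// );
```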

Errors

InvalidInputException:

The input provided was not valid.

AccessDeniedException:

Access to a resource was denied.

EntityNotFoundException:

A specified entity does not exist.

InternalServiceException:

An internal service error occurred.

ListSchemas

$result = $client->listSchemas([/* ... */]);
$promise = $client->listSchemasAsync([/* ... */]);

Returns a list of schemas with minimal details. Schemas in Deleting status will not be included in the results. Empty results will be returned if there are no schemas available.

When the RegistryId is not provided, all the schemas across registries will be part of the API response.

Parameter Syntax

$result = $client->listSchemas([
    'MaxResults' => <integer>,
    'NextToken' => '<string>',
    'RegistryId' => [
        'RegistryArn' => '<string>',
        'RegistryName' => '<string>',
    ],
]);

Parameter Details

Members
MaxResults
Type: int

The maximum number of results to return per page. If no value is supplied, it defaults to 25 per page.

NextToken
Type: string

A continuation token, if this is a continuation call.

RegistryId
Type: RegistryId structure

A wrapper structure that may contain the registry name and Amazon Resource Name (ARN).

Result Syntax

[
    'NextToken' => '<string>',
    'Schemas' => [
        [
            'CreatedTime' => '<string>',
            'Description' => '<string>',
            'RegistryName' => '<string>',
            'SchemaArn' => '<string>',
            'SchemaName' => '<string>',
            'SchemaStatus' => 'AVAILABLE|PENDING|DELETING',
            'UpdatedTime' => '<string>',
        ],
        // ...
    ],
]

Result Details

Members
NextToken
Type: string

A continuation token for paginating the returned list of schemas, returned if the current segment of the list is not the last.

Schemas
Type: Array of SchemaListItem structures

An array of SchemaListItem objects containing details of each schema.
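Because RegistryId is optional, a caller can scope the listing to one registry or omit it to list schemas across all registries. The generator below is a sketch under that reading; iterateSchemas is a hypothetical helper and the callable is assumed to wrap $client->listSchemas().

```php
<?php
/**
 * Lazily yields SchemaListItem arrays, page by page.
 * Pass a registry name to scope the listing, or null for all registries.
 */
function iterateSchemas(callable $listSchemas, ?string $registryName = null): Generator
{
    $params = [];
    if ($registryName !== null) {
        $params['RegistryId'] = ['RegistryName' => $registryName];
    }

    do {
        $result = $listSchemas($params);

        foreach ($result['Schemas'] ?? [] as $schema) {
            yield $schema;
        }

        // Carry the continuation token into the next request, if any.
        $token = $result['NextToken'] ?? null;
        $params['NextToken'] = $token;
    } while (!empty($token));
}

// Possible usage:
// foreach (iterateSchemas(fn (array $p) => $client->listSchemas($p)) as $schema) {
//     echo $schema['SchemaName'], PHP_EOL;
// }
```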

Errors

InvalidInputException:

The input provided was not valid.

AccessDeniedException:

Access to a resource was denied.

EntityNotFoundException:

A specified entity does not exist.

InternalServiceException:

An internal service error occurred.

ListSessions

$result = $client->listSessions([/* ... */]);
$promise = $client->listSessionsAsync([/* ... */]);

Retrieve a list of sessions.

Parameter Syntax

$result = $client->listSessions([
    'MaxResults' => <integer>,
    'NextToken' => '<string>',
    'RequestOrigin' => '<string>',
    'Tags' => ['<string>', ...],
]);

Parameter Details

Members
MaxResults
Type: int

The maximum number of results.

NextToken
Type: string

The token for the next set of results, or null if there are no more results.

RequestOrigin
Type: string

The origin of the request.

Tags
Type: Associative array of custom strings keys (TagKey) to strings

Tags belonging to the session.

Result Syntax

[
    'Ids' => ['<string>', ...],
    'NextToken' => '<string>',
    'Sessions' => [
        [
            'Command' => [
                'Name' => '<string>',
                'PythonVersion' => '<string>',
            ],
            'CompletedOn' => <DateTime>,
            'Connections' => [
                'Connections' => ['<string>', ...],
            ],
            'CreatedOn' => <DateTime>,
            'DPUSeconds' => <float>,
            'DefaultArguments' => ['<string>', ...],
            'Description' => '<string>',
            'ErrorMessage' => '<string>',
            'ExecutionTime' => <float>,
            'GlueVersion' => '<string>',
            'Id' => '<string>',
            'IdleTimeout' => <integer>,
            'MaxCapacity' => <float>,
            'NumberOfWorkers' => <integer>,
            'Progress' => <float>,
            'Role' => '<string>',
            'SecurityConfiguration' => '<string>',
            'Status' => 'PROVISIONING|READY|FAILED|TIMEOUT|STOPPING|STOPPED',
            'WorkerType' => 'Standard|G.1X|G.2X|G.025X|G.4X|G.8X|Z.2X',
        ],
        // ...
    ],
]

Result Details

Members
Ids
Type: Array of strings

The IDs of the returned sessions.

NextToken
Type: string

The token for the next set of results, or null if there are no more results.

Sessions
Type: Array of Session structures

The list of session objects.
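As an illustration of reading the Sessions member, the sketch below picks out the IDs of sessions in a given Status. sessionIdsWithStatus is a hypothetical helper, not part of the SDK, and the tag used in the usage comment is a placeholder.

```php
<?php
/**
 * Returns the Id of every session whose Status equals $status.
 * $sessions is the 'Sessions' member of a listSessions result.
 */
function sessionIdsWithStatus(iterable $sessions, string $status): array
{
    $ids = [];
    foreach ($sessions as $session) {
        if (($session['Status'] ?? null) === $status) {
            $ids[] = $session['Id'];
        }
    }
    return $ids;
}

// Possible usage:
// $result = $client->listSessions(['Tags' => ['team' => 'analytics']]);
// $ready  = sessionIdsWithStatus($result['Sessions'] ?? [], 'READY');
```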

Errors

AccessDeniedException:

Access to a resource was denied.

InvalidInputException:

The input provided was not valid.

InternalServiceException:

An internal service error occurred.

OperationTimeoutException:

The operation timed out.

ListStatements

$result = $client->listStatements([/* ... */]);
$promise = $client->listStatementsAsync([/* ... */]);

Lists statements for the session.

Parameter Syntax

$result = $client->listStatements([
    'NextToken' => '<string>',
    'RequestOrigin' => '<string>',
    'SessionId' => '<string>', // REQUIRED
]);

Parameter Details

Members
NextToken
Type: string

A continuation token, if this is a continuation call.

RequestOrigin
Type: string

The origin of the request to list statements.

SessionId
Required: Yes
Type: string

The Session ID of the statements.

Result Syntax

[
    'NextToken' => '<string>',
    'Statements' => [
        [
            'Code' => '<string>',
            'CompletedOn' => <integer>,
            'Id' => <integer>,
            'Output' => [
                'Data' => [
                    'TextPlain' => '<string>',
                ],
                'ErrorName' => '<string>',
                'ErrorValue' => '<string>',
                'ExecutionCount' => <integer>,
                'Status' => 'WAITING|RUNNING|AVAILABLE|CANCELLING|CANCELLED|ERROR',
                'Traceback' => ['<string>', ...],
            ],
            'Progress' => <float>,
            'StartedOn' => <integer>,
            'State' => 'WAITING|RUNNING|AVAILABLE|CANCELLING|CANCELLED|ERROR',
        ],
        // ...
    ],
]

Result Details

Members
NextToken
Type: string

A continuation token, if not all statements have yet been returned.

Statements
Type: Array of Statement structures

Returns the list of statements.
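A caller often wants to know which statements are finished and which failed. The sketch below groups statement IDs by their State value; summarizeStatements is a hypothetical helper and the 'UNKNOWN' bucket is an assumption for entries that lack a State.

```php
<?php
/**
 * Groups statement Ids by State, for example ['AVAILABLE' => [0, 1], 'ERROR' => [2]].
 * $statements is the 'Statements' member of a listStatements result.
 */
function summarizeStatements(iterable $statements): array
{
    $summary = [];
    foreach ($statements as $statement) {
        $state = $statement['State'] ?? 'UNKNOWN'; // fallback bucket (assumption)
        $summary[$state][] = $statement['Id'];
    }
    return $summary;
}

// Possible usage:
// $result  = $client->listStatements(['SessionId' => $sessionId]);
// $byState = summarizeStatements($result['Statements'] ?? []);
```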

Errors

AccessDeniedException:

Access to a resource was denied.

EntityNotFoundException:

A specified entity does not exist.

InternalServiceException:

An internal service error occurred.

OperationTimeoutException:

The operation timed out.

InvalidInputException:

The input provided was not valid.

IllegalSessionStateException:

The session is in an invalid state to perform a requested operation.

ListTableOptimizerRuns

$result = $client->listTableOptimizerRuns([/* ... */]);
$promise = $client->listTableOptimizerRunsAsync([/* ... */]);

Lists the history of previous optimizer runs for a specific table.

Parameter Syntax

$result = $client->listTableOptimizerRuns([
    'CatalogId' => '<string>', // REQUIRED
    'DatabaseName' => '<string>', // REQUIRED
    'MaxResults' => <integer>,
    'NextToken' => '<string>',
    'TableName' => '<string>', // REQUIRED
    'Type' => 'compaction', // REQUIRED
]);

Parameter Details

Members
CatalogId
Required: Yes
Type: string

The Catalog ID of the table.

DatabaseName
Required: Yes
Type: string

The name of the database in the catalog in which the table resides.

MaxResults
Type: int

The maximum number of optimizer runs to return on each call.

NextToken
Type: string

A continuation token, if this is a continuation call.

TableName
Required: Yes
Type: string

The name of the table.

Type
Required: Yes
Type: string

The type of table optimizer. Currently, the only valid value is compaction.

Result Syntax

[
    'CatalogId' => '<string>',
    'DatabaseName' => '<string>',
    'NextToken' => '<string>',
    'TableName' => '<string>',
    'TableOptimizerRuns' => [
        [
            'endTimestamp' => <DateTime>,
            'error' => '<string>',
            'eventType' => 'starting|completed|failed|in_progress',
            'metrics' => [
                'JobDurationInHour' => '<string>',
                'NumberOfBytesCompacted' => '<string>',
                'NumberOfDpus' => '<string>',
                'NumberOfFilesCompacted' => '<string>',
            ],
            'startTimestamp' => <DateTime>,
        ],
        // ...
    ],
]

Result Details

Members
CatalogId
Type: string

The Catalog ID of the table.

DatabaseName
Type: string

The name of the database in the catalog in which the table resides.

NextToken
Type: string

A continuation token for paginating the returned list of optimizer runs, returned if the current segment of the list is not the last.

TableName
Type: string

The name of the table.

TableOptimizerRuns
Type: Array of TableOptimizerRun structures

A list of the optimizer runs associated with a table.
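The metrics members are returned as strings, so they need converting before any arithmetic. The sketch below totals NumberOfBytesCompacted over completed runs; totalBytesCompacted is a hypothetical helper, and it assumes the metric strings hold plain decimal integers.

```php
<?php
/**
 * Sums NumberOfBytesCompacted over runs whose eventType is 'completed'.
 * $runs is the 'TableOptimizerRuns' member of a listTableOptimizerRuns result.
 */
function totalBytesCompacted(iterable $runs): int
{
    $total = 0;
    foreach ($runs as $run) {
        if (($run['eventType'] ?? null) !== 'completed') {
            continue; // skip starting, failed, and in_progress runs
        }
        // Assumption: the metric is a decimal integer encoded as a string.
        $total += (int) ($run['metrics']['NumberOfBytesCompacted'] ?? 0);
    }
    return $total;
}
```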

Errors

EntityNotFoundException:

A specified entity does not exist.

AccessDeniedException:

Access to a resource was denied.

InvalidInputException:

The input provided was not valid.

InternalServiceException:

An internal service error occurred.

ListTriggers

$result = $client->listTriggers([/* ... */]);
$promise = $client->listTriggersAsync([/* ... */]);

Retrieves the names of all trigger resources in this Amazon Web Services account, or the resources with the specified tag. This operation allows you to see which resources are available in your account, and their names.

This operation takes the optional Tags field, which you can use as a filter on the response so that tagged resources can be retrieved as a group. If you choose to use tags filtering, only resources with the tag are retrieved.

Parameter Syntax

$result = $client->listTriggers([
    'DependentJobName' => '<string>',
    'MaxResults' => <integer>,
    'NextToken' => '<string>',
    'Tags' => ['<string>', ...],
]);

Parameter Details

Members
DependentJobName
Type: string

The name of the job for which to retrieve triggers. The trigger that can start this job is returned. If there is no such trigger, all triggers are returned.

MaxResults
Type: int

The maximum size of a list to return.

NextToken
Type: string

A continuation token, if this is a continuation request.

Tags
Type: Associative array of custom strings keys (TagKey) to strings

Specifies to return only these tagged resources.

Result Syntax

[
    'NextToken' => '<string>',
    'TriggerNames' => ['<string>', ...],
]

Result Details

Members
NextToken
Type: string

A continuation token, if the returned list does not contain the last trigger available.

TriggerNames
Type: Array of strings

The names of all triggers in the account, or the triggers with the specified tags.
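Tags is an associative array of tag keys to tag values, and every listTriggers parameter is optional. The sketch below builds a request that includes only the members the caller supplies; buildListTriggersParams is a hypothetical helper and the tag in the usage comment is a placeholder.

```php
<?php
/**
 * Builds a listTriggers request array, omitting members that were not supplied.
 */
function buildListTriggersParams(array $tags = [], ?string $dependentJobName = null, ?int $maxResults = null): array
{
    $params = [];
    if ($tags !== []) {
        // Tag values must be strings; keys (the tag names) are preserved.
        $params['Tags'] = array_map('strval', $tags);
    }
    if ($dependentJobName !== null) {
        $params['DependentJobName'] = $dependentJobName;
    }
    if ($maxResults !== null) {
        $params['MaxResults'] = $maxResults;
    }
    return $params;
}

// Possible usage:
// $result = $client->listTriggers(buildListTriggersParams(['env' => 'prod']));
// $names  = $result['TriggerNames'] ?? [];
```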

Errors

EntityNotFoundException:

A specified entity does not exist.

InvalidInputException:

The input provided was not valid.

InternalServiceException:

An internal service error occurred.

OperationTimeoutException:

The operation timed out.

ListWorkflows

$result = $client->listWorkflows([/* ... */]);
$promise = $client->listWorkflowsAsync([/* ... */]);

Lists names of workflows created in the account.

Parameter Syntax

$result = $client->listWorkflows([
    'MaxResults' => <integer>,
    'NextToken' => '<string>',
]);

Parameter Details

Members
MaxResults
Type: int

The maximum size of a list to return.

NextToken
Type: string

A continuation token, if this is a continuation request.

Result Syntax

[
    'NextToken' => '<string>',
    'Workflows' => ['<string>', ...],
]

Result Details

Members
NextToken
Type: string

A continuation token, if not all workflow names have been returned.

Workflows
Type: Array of strings

List of names of workflows in the account.

Errors

InvalidInputException:

The input provided was not valid.

InternalServiceException:

An internal service error occurred.

OperationTimeoutException:

The operation timed out.

PutDataCatalogEncryptionSettings

$result = $client->putDataCatalogEncryptionSettings([/* ... */]);
$promise = $client->putDataCatalogEncryptionSettingsAsync([/* ... */]);

Sets the security configuration for a specified catalog. After the configuration has been set, the specified encryption is applied to every catalog write thereafter.

Parameter Syntax

$result = $client->putDataCatalogEncryptionSettings([
    'CatalogId' => '<string>',
    'DataCatalogEncryptionSettings' => [ // REQUIRED
        'ConnectionPasswordEncryption' => [
            'AwsKmsKeyId' => '<string>',
            'ReturnConnectionPasswordEncrypted' => true || false, // REQUIRED
        ],
        'EncryptionAtRest' => [
            'CatalogEncryptionMode' => 'DISABLED|SSE-KMS|SSE-KMS-WITH-SERVICE-ROLE', // REQUIRED
            'CatalogEncryptionServiceRole' => '<string>',
            'SseAwsKmsKeyId' => '<string>',
        ],
    ],
]);

Parameter Details

Members
CatalogId
Type: string

The ID of the Data Catalog to set the security configuration for. If none is provided, the Amazon Web Services account ID is used by default.

DataCatalogEncryptionSettings
Required: Yes
Type: DataCatalogEncryptionSettings structure

The security configuration to set.
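The nested DataCatalogEncryptionSettings structure can be assembled from a few inputs. The sketch below is one possible shape: buildCatalogEncryptionSettings is a hypothetical helper, and it assumes that passing no catalog key means encryption at rest should be DISABLED. It does not cover the SSE-KMS-WITH-SERVICE-ROLE mode.

```php
<?php
/**
 * Builds the 'DataCatalogEncryptionSettings' member.
 *
 * $catalogKmsKeyId         KMS key for encryption at rest, or null to disable it (assumption).
 * $returnPasswordEncrypted value for the required ReturnConnectionPasswordEncrypted flag.
 * $passwordKmsKeyId        optional KMS key for connection password encryption.
 */
function buildCatalogEncryptionSettings(?string $catalogKmsKeyId, bool $returnPasswordEncrypted, ?string $passwordKmsKeyId = null): array
{
    $atRest = ['CatalogEncryptionMode' => 'DISABLED'];
    if ($catalogKmsKeyId !== null) {
        $atRest = [
            'CatalogEncryptionMode' => 'SSE-KMS',
            'SseAwsKmsKeyId' => $catalogKmsKeyId,
        ];
    }

    $password = ['ReturnConnectionPasswordEncrypted' => $returnPasswordEncrypted];
    if ($passwordKmsKeyId !== null) {
        $password['AwsKmsKeyId'] = $passwordKmsKeyId;
    }

    return [
        'EncryptionAtRest' => $atRest,
        'ConnectionPasswordEncryption' => $password,
    ];
}

// Possible usage (key alias is a placeholder):
// $client->putDataCatalogEncryptionSettings([
//     'DataCatalogEncryptionSettings' => buildCatalogEncryptionSettings('alias/my-key', false),
// ]);
```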

Result Syntax

[]

Result Details

The results for this operation are always empty.

Errors

InternalServiceException:

An internal service error occurred.

InvalidInputException:

The input provided was not valid.

OperationTimeoutException:

The operation timed out.

PutResourcePolicy

$result = $client->putResourcePolicy([/* ... */]);
$promise = $client->putResourcePolicyAsync([/* ... */]);

Sets the Data Catalog resource policy for access control.

Parameter Syntax

$result = $client->putResourcePolicy([
    'EnableHybrid' => 'TRUE|FALSE',
    'PolicyExistsCondition' => 'MUST_EXIST|NOT_EXIST|NONE',
    'PolicyHashCondition' => '<string>',
    'PolicyInJson' => '<string>', // REQUIRED
    'ResourceArn' => '<string>',
]);

Parameter Details

Members
EnableHybrid
Type: string

If 'TRUE', indicates that you are using both methods to grant cross-account access to Data Catalog resources:

  • By directly updating the resource policy with PutResourcePolicy

  • By using the Grant permissions command on the Amazon Web Services Management Console.

Must be set to 'TRUE' if you have already used the Management Console to grant cross-account access, otherwise the call fails. Default is 'FALSE'.

PolicyExistsCondition
Type: string

A value of MUST_EXIST is used to update a policy. A value of NOT_EXIST is used to create a new policy. If a value of NONE or a null value is used, the call does not depend on the existence of a policy.

PolicyHashCondition
Type: string

The hash value returned when the previous policy was set using PutResourcePolicy. Its purpose is to prevent concurrent modifications of a policy. Do not use this parameter if no previous policy has been set.

PolicyInJson
Required: Yes
Type: string

Contains the policy document to set, in JSON format.

ResourceArn
Type: string

Do not use. For internal use only.

Result Syntax

[
    'PolicyHash' => '<string>',
]

Result Details

Members
PolicyHash
Type: string

A hash of the policy that has just been set. This must be included in a subsequent call that overwrites or updates this policy.
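PolicyExistsCondition and PolicyHashCondition together allow a guarded create or update. The sketch below chooses between the two cases based on whether a previous PolicyHash is known; buildPutResourcePolicyParams is a hypothetical helper, and treating "no known hash" as "create" is an assumption about the caller's workflow.

```php
<?php
/**
 * Builds a putResourcePolicy request.
 *
 * With no previous hash, the call only succeeds if no policy exists yet.
 * With a previous hash, the call only updates the policy that hash refers to.
 */
function buildPutResourcePolicyParams(array $policy, ?string $previousHash): array
{
    $params = ['PolicyInJson' => json_encode($policy, JSON_UNESCAPED_SLASHES)];

    if ($previousHash === null) {
        // Do not send PolicyHashCondition when no previous policy has been set.
        $params['PolicyExistsCondition'] = 'NOT_EXIST';
    } else {
        $params['PolicyExistsCondition'] = 'MUST_EXIST';
        $params['PolicyHashCondition'] = $previousHash;
    }

    return $params;
}

// Possible usage:
// $result = $client->putResourcePolicy(buildPutResourcePolicyParams($policy, $lastHash));
// $lastHash = $result['PolicyHash']; // keep for the next update
```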

Errors

EntityNotFoundException:

A specified entity does not exist.

InternalServiceException:

An internal service error occurred.

OperationTimeoutException:

The operation timed out.

InvalidInputException:

The input provided was not valid.

ConditionCheckFailureException:

A specified condition was not satisfied.

PutSchemaVersionMetadata

$result = $client->putSchemaVersionMetadata([/* ... */]);
$promise = $client->putSchemaVersionMetadataAsync([/* ... */]);

Puts the metadata key value pair for a specified schema version ID. A maximum of 10 key value pairs will be allowed per schema version. They can be added over one or more calls.

Parameter Syntax

$result = $client->putSchemaVersionMetadata([
    'MetadataKeyValue' => [ // REQUIRED
        'MetadataKey' => '<string>',
        'MetadataValue' => '<string>',
    ],
    'SchemaId' => [
        'RegistryName' => '<string>',
        'SchemaArn' => '<string>',
        'SchemaName' => '<string>',
    ],
    'SchemaVersionId' => '<string>',
    'SchemaVersionNumber' => [
        'LatestVersion' => true || false,
        'VersionNumber' => <integer>,
    ],
]);

Parameter Details

Members
MetadataKeyValue
Required: Yes
Type: MetadataKeyValuePair structure

The metadata key's corresponding value.

SchemaId
Type: SchemaId structure

The unique ID for the schema.

SchemaVersionId
Type: string

The unique version ID of the schema version.

SchemaVersionNumber
Type: SchemaVersionNumber structure

The version number of the schema.
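Each call stores a single key-value pair, so attaching several pairs takes one call per pair. The sketch below does that for a schema version ID; putMetadataPairs is a hypothetical helper, the callable is assumed to wrap $client->putSchemaVersionMetadata(), and the guard only checks the size of this batch, not pairs already stored on the schema version.

```php
<?php
/**
 * Sends one putSchemaVersionMetadata call per key-value pair.
 * Returns the number of calls made.
 */
function putMetadataPairs(callable $putSchemaVersionMetadata, string $schemaVersionId, array $pairs): int
{
    // At most 10 pairs are allowed per schema version; existing pairs are not counted here.
    if (count($pairs) > 10) {
        throw new InvalidArgumentException('A schema version allows at most 10 metadata pairs.');
    }

    $sent = 0;
    foreach ($pairs as $key => $value) {
        $putSchemaVersionMetadata([
            'SchemaVersionId' => $schemaVersionId,
            'MetadataKeyValue' => [
                'MetadataKey' => (string) $key,
                'MetadataValue' => (string) $value,
            ],
        ]);
        $sent++;
    }
    return $sent;
}

// Possible usage (keys and values are placeholders):
// putMetadataPairs(fn (array $p) => $client->putSchemaVersionMetadata($p), $versionId, ['owner' => 'data-team']);
```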

Result Syntax

[
    'LatestVersion' => true || false,
    'MetadataKey' => '<string>',
    'MetadataValue' => '<string>',
    'RegistryName' => '<string>',
    'SchemaArn' => '<string>',
    'SchemaName' => '<string>',
    'SchemaVersionId' => '<string>',
    'VersionNumber' => <integer>,
]

Result Details

Members
LatestVersion
Type: boolean

The latest version of the schema.

MetadataKey
Type: string

The metadata key.

MetadataValue
Type: string

The value of the metadata key.

RegistryName
Type: string

The name for the registry.

SchemaArn
Type: string

The Amazon Resource Name (ARN) for the schema.

SchemaName
Type: string

The name for the schema.

SchemaVersionId
Type: string

The unique version ID of the schema version.

VersionNumber
Type: long (int|float)

The version number of the schema.

Errors

InvalidInputException:

The input provided was not valid.

AccessDeniedException:

Access to a resource was denied.

AlreadyExistsException:

A resource to be created or added already exists.

EntityNotFoundException:

A specified entity does not exist.

ResourceNumberLimitExceededException:

A resource numerical limit was exceeded.

PutWorkflowRunProperties

$result = $client->putWorkflowRunProperties([/* ... */]);
$promise = $client->putWorkflowRunPropertiesAsync([/* ... */]);

Puts the specified workflow run properties for the given workflow run. If a property already exists for the specified run, its value is overwritten; otherwise, the property is added to the existing properties.

Parameter Syntax

$result = $client->putWorkflowRunProperties([
    'Name' => '<string>', // REQUIRED
    'RunId' => '<string>', // REQUIRED
    'RunProperties' => ['<string>', ...], // REQUIRED
]);

Parameter Details

Members
Name
Required: Yes
Type: string

Name of the workflow which was run.

RunId
Required: Yes
Type: string

The ID of the workflow run for which the run properties should be updated.

RunProperties
Required: Yes
Type: Associative array of custom strings keys (IdString) to strings

The properties to put for the specified run.

Result Syntax

[]

Result Details

The results for this operation are always empty.

Errors

AlreadyExistsException:

A resource to be created or added already exists.

EntityNotFoundException:

A specified entity does not exist.

InvalidInputException:

The input provided was not valid.

InternalServiceException:

An internal service error occurred.

OperationTimeoutException:

The operation timed out.

ResourceNumberLimitExceededException:

A resource numerical limit was exceeded.

ConcurrentModificationException:

Two processes are trying to modify a resource simultaneously.

QuerySchemaVersionMetadata

$result = $client->querySchemaVersionMetadata([/* ... */]);
$promise = $client->querySchemaVersionMetadataAsync([/* ... */]);

Queries for the schema version metadata information.

Parameter Syntax

$result = $client->querySchemaVersionMetadata([
    'MaxResults' => <integer>,
    'MetadataList' => [
        [
            'MetadataKey' => '<string>',
            'MetadataValue' => '<string>',
        ],
        // ...
    ],
    'NextToken' => '<string>',
    'SchemaId' => [
        'RegistryName' => '<string>',
        'SchemaArn' => '<string>',
        'SchemaName' => '<string>',
    ],
    'SchemaVersionId' => '<string>',
    'SchemaVersionNumber' => [
        'LatestVersion' => true || false,
        'VersionNumber' => <integer>,
    ],
]);

Parameter Details

Members
MaxResults
Type: int

The maximum number of results to return per page. If no value is supplied, it defaults to 25 per page.

MetadataList
Type: Array of MetadataKeyValuePair structures

Key-value pairs to search for in the metadata. If none are provided, all the metadata information is fetched.

NextToken
Type: string

A continuation token, if this is a continuation call.

SchemaId
Type: SchemaId structure

A wrapper structure that may contain the schema name and Amazon Resource Name (ARN).

SchemaVersionId
Type: string

The unique version ID of the schema version.

SchemaVersionNumber
Type: SchemaVersionNumber structure

The version number of the schema.

Result Syntax

[
    'MetadataInfoMap' => [
        '<MetadataKeyString>' => [
            'CreatedTime' => '<string>',
            'MetadataValue' => '<string>',
            'OtherMetadataValueList' => [
                [
                    'CreatedTime' => '<string>',
                    'MetadataValue' => '<string>',
                ],
                // ...
            ],
        ],
        // ...
    ],
    'NextToken' => '<string>',
    'SchemaVersionId' => '<string>',
]

Result Details

Members
MetadataInfoMap
Type: Associative array of custom strings keys (MetadataKeyString) to MetadataInfo structures

A map of a metadata key and associated values.

NextToken
Type: string

A continuation token for paginating the returned metadata, returned if the current segment of the list is not the last.

SchemaVersionId
Type: string

The unique version ID of the schema version.
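Each MetadataInfoMap entry carries one MetadataValue plus an optional OtherMetadataValueList. The sketch below flattens that into a simple key-to-values map; flattenMetadataInfoMap is a hypothetical helper.

```php
<?php
/**
 * Converts MetadataInfoMap into ['key' => ['value1', 'value2', ...]].
 * The entry's own MetadataValue comes first, followed by any other values.
 */
function flattenMetadataInfoMap(iterable $metadataInfoMap): array
{
    $flat = [];
    foreach ($metadataInfoMap as $key => $info) {
        $values = [$info['MetadataValue']];
        foreach ($info['OtherMetadataValueList'] ?? [] as $other) {
            $values[] = $other['MetadataValue'];
        }
        $flat[$key] = $values;
    }
    return $flat;
}

// Possible usage:
// $result = $client->querySchemaVersionMetadata(['SchemaVersionId' => $versionId]);
// $flat   = flattenMetadataInfoMap($result['MetadataInfoMap'] ?? []);
```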

Errors

InvalidInputException:

The input provided was not valid.

AccessDeniedException:

Access to a resource was denied.

EntityNotFoundException:

A specified entity does not exist.

RegisterSchemaVersion

$result = $client->registerSchemaVersion([/* ... */]);
$promise = $client->registerSchemaVersionAsync([/* ... */]);

Adds a new version to the existing schema. Returns an error if the new version of the schema does not meet the compatibility requirements of the schema set. This API will not create a new schema set and will return a 404 error if the schema set is not already present in the Schema Registry.

If this is the first schema definition to be registered in the Schema Registry, this API will store the schema version and return immediately. Otherwise, this call has the potential to run longer than other operations due to compatibility modes. You can call the GetSchemaVersion API with the SchemaVersionId to check compatibility modes.

If the same schema definition is already stored in Schema Registry as a version, the schema ID of the existing schema is returned to the caller.

Parameter Syntax

$result = $client->registerSchemaVersion([
    'SchemaDefinition' => '<string>', // REQUIRED
    'SchemaId' => [ // REQUIRED
        'RegistryName' => '<string>',
        'SchemaArn' => '<string>',
        'SchemaName' => '<string>',
    ],
]);

Parameter Details

Members
SchemaDefinition
Required: Yes
Type: string

The schema definition using the DataFormat setting for the SchemaName.

SchemaId
Required: Yes
Type: SchemaId structure

This is a wrapper structure to contain schema identity fields. The structure contains:

  • SchemaId$SchemaArn: The Amazon Resource Name (ARN) of the schema. Either SchemaArn, or both SchemaName and RegistryName, must be provided.

  • SchemaId$SchemaName: The name of the schema. Either SchemaArn, or both SchemaName and RegistryName, must be provided.

Result Syntax

[
    'SchemaVersionId' => '<string>',
    'Status' => 'AVAILABLE|PENDING|FAILURE|DELETING',
    'VersionNumber' => <integer>,
]

Result Details

Members
SchemaVersionId
Type: string

The unique ID that represents the version of this schema.

Status
Type: string

The status of the schema version.

VersionNumber
Type: long (int|float)

The version of this schema (for sync flow only, in case this is the first version).
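When registration returns a PENDING status, the description above suggests checking the version with GetSchemaVersion. The sketch below polls until the status leaves PENDING or the attempts run out. waitForSchemaVersion is a hypothetical helper, the callable is assumed to wrap $client->getSchemaVersion(), and the attempt count and delay are arbitrary defaults.

```php
<?php
/**
 * Polls a schema version until its Status is no longer PENDING.
 * Returns the last Status seen ('PENDING' if attempts were exhausted).
 */
function waitForSchemaVersion(callable $getSchemaVersion, string $schemaVersionId, int $maxAttempts = 10, float $delaySeconds = 1.0): string
{
    for ($attempt = 1; $attempt <= $maxAttempts; $attempt++) {
        $result = $getSchemaVersion(['SchemaVersionId' => $schemaVersionId]);
        $status = $result['Status'] ?? 'PENDING';

        if ($status !== 'PENDING') {
            return $status; // AVAILABLE, FAILURE, or DELETING
        }
        if ($attempt < $maxAttempts) {
            usleep((int) ($delaySeconds * 1000000));
        }
    }
    return 'PENDING';
}

// Possible usage:
// $registered = $client->registerSchemaVersion([...]);
// $status = waitForSchemaVersion(fn (array $p) => $client->getSchemaVersion($p), $registered['SchemaVersionId']);
```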

Errors

InvalidInputException:

The input provided was not valid.

AccessDeniedException:

Access to a resource was denied.

EntityNotFoundException:

A specified entity does not exist.

ResourceNumberLimitExceededException:

A resource numerical limit was exceeded.

ConcurrentModificationException:

Two processes are trying to modify a resource simultaneously.

InternalServiceException:

An internal service error occurred.

RemoveSchemaVersionMetadata

$result = $client->removeSchemaVersionMetadata([/* ... */]);
$promise = $client->removeSchemaVersionMetadataAsync([/* ... */]);

Removes a key value pair from the schema version metadata for the specified schema version ID.

Parameter Syntax

$result = $client->removeSchemaVersionMetadata([
    'MetadataKeyValue' => [ // REQUIRED
        'MetadataKey' => '<string>',
        'MetadataValue' => '<string>',
    ],
    'SchemaId' => [
        'RegistryName' => '<string>',
        'SchemaArn' => '<string>',
        'SchemaName' => '<string>',
    ],
    'SchemaVersionId' => '<string>',
    'SchemaVersionNumber' => [
        'LatestVersion' => true || false,
        'VersionNumber' => <integer>,
    ],
]);

Parameter Details

Members
MetadataKeyValue
Required: Yes
Type: MetadataKeyValuePair structure

The value of the metadata key.

SchemaId
Type: SchemaId structure

A wrapper structure that may contain the schema name and Amazon Resource Name (ARN).

SchemaVersionId
Type: string

The unique version ID of the schema version.

SchemaVersionNumber
Type: SchemaVersionNumber structure

The version number of the schema.

Result Syntax

[
    'LatestVersion' => true || false,
    'MetadataKey' => '<string>',
    'MetadataValue' => '<string>',
    'RegistryName' => '<string>',
    'SchemaArn' => '<string>',
    'SchemaName' => '<string>',
    'SchemaVersionId' => '<string>',
    'VersionNumber' => <integer>,
]

Result Details

Members
LatestVersion
Type: boolean

The latest version of the schema.

MetadataKey
Type: string

The metadata key.

MetadataValue
Type: string

The value of the metadata key.

RegistryName
Type: string

The name of the registry.

SchemaArn
Type: string

The Amazon Resource Name (ARN) of the schema.

SchemaName
Type: string

The name of the schema.

SchemaVersionId
Type: string

The version ID for the schema version.

VersionNumber
Type: long (int|float)

The version number of the schema.

Errors

InvalidInputException:

The input provided was not valid.

AccessDeniedException:

Access to a resource was denied.

EntityNotFoundException:

A specified entity does not exist.

ResetJobBookmark

$result = $client->resetJobBookmark([/* ... */]);
$promise = $client->resetJobBookmarkAsync([/* ... */]);

Resets a bookmark entry.

For more information about enabling and using job bookmarks, see the AWS Glue Developer Guide.

Parameter Syntax

$result = $client->resetJobBookmark([
    'JobName' => '<string>', // REQUIRED
    'RunId' => '<string>',
]);

Parameter Details

Members
JobName
Required: Yes
Type: string

The name of the job in question.

RunId
Type: string

The unique run identifier associated with this job run.

Result Syntax

[
    'JobBookmarkEntry' => [
        'Attempt' => <integer>,
        'JobBookmark' => '<string>',
        'JobName' => '<string>',
        'PreviousRunId' => '<string>',
        'Run' => <integer>,
        'RunId' => '<string>',
        'Version' => <integer>,
    ],
]

Result Details

Members
JobBookmarkEntry
Type: JobBookmarkEntry structure

The reset bookmark entry.

Errors

EntityNotFoundException:

A specified entity does not exist.

InvalidInputException:

The input provided was not valid.

InternalServiceException:

An internal service error occurred.

OperationTimeoutException:

The operation timed out.

ResumeWorkflowRun

$result = $client->resumeWorkflowRun([/* ... */]);
$promise = $client->resumeWorkflowRunAsync([/* ... */]);

Restarts selected nodes of a previous partially completed workflow run and resumes the workflow run. The selected nodes and all nodes that are downstream from the selected nodes are run.

Parameter Syntax

$result = $client->resumeWorkflowRun([
    'Name' => '<string>', // REQUIRED
    'NodeIds' => ['<string>', ...], // REQUIRED
    'RunId' => '<string>', // REQUIRED
]);

Parameter Details

Members
Name
Required: Yes
Type: string

The name of the workflow to resume.

NodeIds
Required: Yes
Type: Array of strings

A list of the node IDs for the nodes you want to restart. The nodes that are to be restarted must have a run attempt in the original run.

RunId
Required: Yes
Type: string

The ID of the workflow run to resume.

Result Syntax

[
    'NodeIds' => ['<string>', ...],
    'RunId' => '<string>',
]

Result Details

Members
NodeIds
Type: Array of strings

A list of the node IDs for the nodes that were actually restarted.

RunId
Type: string

The new ID assigned to the resumed workflow run. Each resume of a workflow run will have a new run ID.

Errors

InvalidInputException:

The input provided was not valid.

EntityNotFoundException:

A specified entity does not exist.

InternalServiceException:

An internal service error occurred.

OperationTimeoutException:

The operation timed out.

ConcurrentRunsExceededException:

Too many jobs are being run concurrently.

IllegalWorkflowStateException:

The workflow is in an invalid state to perform a requested operation.

RunStatement

$result = $client->runStatement([/* ... */]);
$promise = $client->runStatementAsync([/* ... */]);

Executes the statement.

Parameter Syntax

$result = $client->runStatement([
    'Code' => '<string>', // REQUIRED
    'RequestOrigin' => '<string>',
    'SessionId' => '<string>', // REQUIRED
]);

Parameter Details

Members
Code
Required: Yes
Type: string

The statement code to be run.

RequestOrigin
Type: string

The origin of the request.

SessionId
Required: Yes
Type: string

The Session Id of the statement to be run.

Result Syntax

[
    'Id' => <integer>,
]

Result Details

Members
Id
Type: int

Returns the Id of the statement that was run.
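RunStatement returns only the statement Id, so the outcome has to be fetched separately. The sketch below submits code and then polls until the statement reaches a terminal State. runStatementAndWait is a hypothetical helper, the two callables are assumed to wrap $client->runStatement() and $client->getStatement(), and treating AVAILABLE, CANCELLED, and ERROR as terminal is an assumption based on the State values listed for ListStatements.

```php
<?php
/**
 * Runs a statement in a session and polls for its final state.
 * Returns the last Statement array seen (it may still be non-terminal
 * if $maxPolls is exhausted).
 */
function runStatementAndWait(callable $runStatement, callable $getStatement, string $sessionId, string $code, int $maxPolls = 30, float $delaySeconds = 1.0): array
{
    $run = $runStatement(['SessionId' => $sessionId, 'Code' => $code]);
    $id = $run['Id'];

    $terminal = ['AVAILABLE', 'CANCELLED', 'ERROR']; // assumed terminal states
    $statement = [];

    for ($poll = 0; $poll < $maxPolls; $poll++) {
        $result = $getStatement(['SessionId' => $sessionId, 'Id' => $id]);
        $statement = $result['Statement'] ?? [];

        if (in_array($statement['State'] ?? null, $terminal, true)) {
            break;
        }
        usleep((int) ($delaySeconds * 1000000));
    }
    return $statement;
}

// Possible usage:
// $statement = runStatementAndWait(
//     fn (array $p) => $client->runStatement($p),
//     fn (array $p) => $client->getStatement($p),
//     $sessionId,
//     'print(1 + 1)'
// );
```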

Errors

EntityNotFoundException:

A specified entity does not exist.

AccessDeniedException:

Access to a resource was denied.

InternalServiceException:

An internal service error occurred.

OperationTimeoutException:

The operation timed out.

InvalidInputException:

The input provided was not valid.

ValidationException:

A value could not be validated.

ResourceNumberLimitExceededException:

A resource numerical limit was exceeded.

IllegalSessionStateException:

The session is in an invalid state to perform a requested operation.

SearchTables

$result = $client->searchTables([/* ... */]);
$promise = $client->searchTablesAsync([/* ... */]);

Searches a set of tables based on properties in the table metadata as well as on the parent database. You can search against text or filter conditions.

You can only get tables that you have access to based on the security policies defined in Lake Formation. You need at least read-only access to a table for it to be returned. If you do not have access to all the columns in the table, those columns are not searched against when the list of tables is returned. If you have access to the columns but not the data in the columns, those columns and their associated metadata are included in the search.

Parameter Syntax

$result = $client->searchTables([
    'CatalogId' => '<string>',
    'Filters' => [
        [
            'Comparator' => 'EQUALS|GREATER_THAN|LESS_THAN|GREATER_THAN_EQUALS|LESS_THAN_EQUALS',
            'Key' => '<string>',
            'Value' => '<string>',
        ],
        // ...
    ],
    'MaxResults' => <integer>,
    'NextToken' => '<string>',
    'ResourceShareType' => 'FOREIGN|ALL|FEDERATED',
    'SearchText' => '<string>',
    'SortCriteria' => [
        [
            'FieldName' => '<string>',
            'Sort' => 'ASC|DESC',
        ],
        // ...
    ],
]);

Parameter Details

Members
CatalogId
Type: string

A unique identifier, consisting of account_id.

Filters
Type: Array of PropertyPredicate structures

A list of key-value pairs, and a comparator used to filter the search results. Returns all entities matching the predicate.

The Comparator member of the PropertyPredicate struct is used only for time fields, and can be omitted for other field types. Also, when comparing string values, such as when Key=Name, a fuzzy match algorithm is used. The Key field (for example, the value of the Name field) is split into tokens on certain punctuation characters (for example, -, :, and #). Then each token is exact-match compared with the Value member of PropertyPredicate. For example, if Key=Name and Value=link, tables named customer-link and xx-link-yy are returned, but xxlinkyy is not returned.
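The token matching described above can be sketched in plain PHP. This is only an illustration of the documented behavior, not the service's implementation: the delimiter set (-, :, #) is an assumption taken from the examples in the text, and the function name nameMatchesValue is hypothetical.

```php
<?php
// Illustrative only: mimics the documented token match for string keys.
// Assumption: the delimiters are limited to "-", ":" and "#".
function nameMatchesValue(string $name, string $value, string $delimiters = '-:#'): bool
{
    // Split the field value into tokens on any run of delimiter characters.
    $pattern = '/[' . preg_quote($delimiters, '/') . ']+/';
    $tokens = preg_split($pattern, $name, -1, PREG_SPLIT_NO_EMPTY);

    // A table matches when any token equals the predicate value exactly.
    return in_array($value, $tokens, true);
}

var_dump(nameMatchesValue('customer-link', 'link')); // bool(true)
var_dump(nameMatchesValue('xx-link-yy', 'link'));    // bool(true)
var_dump(nameMatchesValue('xxlinkyy', 'link'));      // bool(false)
```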

MaxResults
Type: int

The maximum number of tables to return in a single response.

NextToken
Type: string

A continuation token, included if this is a continuation call.

ResourceShareType
Type: string

Allows you to specify that you want to search the tables shared with your account. The allowable values are FOREIGN or ALL.

  • If set to FOREIGN, the operation searches the tables shared with your account.

  • If set to ALL, the operation searches the tables shared with your account, as well as the tables in your local account.

SearchText
Type: string

A string used for a text search.

Specifying a value in quotes filters based on an exact match to the value.

SortCriteria
Type: Array of SortCriterion structures

A list of criteria for sorting the results by a field name, in an ascending or descending order.
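As a rough sketch of how the request parameters above fit together, the following helper assembles a SearchTables request array. The helper name buildSearchTablesParams and its arguments are hypothetical; only the array keys come from the parameter syntax above.

```php
<?php
// Hypothetical helper: assembles a SearchTables request array.
function buildSearchTablesParams(
    string $text,
    bool $exact = false,
    ?string $nameToken = null,
    bool $includeLocal = true
): array {
    $params = [
        // Wrapping the text in double quotes requests an exact match.
        'SearchText' => $exact ? '"' . $text . '"' : $text,
        // FOREIGN searches shared tables only; ALL also searches local tables.
        'ResourceShareType' => $includeLocal ? 'ALL' : 'FOREIGN',
    ];

    if ($nameToken !== null) {
        // Comparator is omitted because Name is not a time field.
        $params['Filters'] = [
            ['Key' => 'Name', 'Value' => $nameToken],
        ];
    }

    return $params;
}

// Possible usage with an existing client (not executed here):
// $result = $client->searchTables(buildSearchTablesParams('sales', true, 'link'));
```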

Result Syntax

[
    'NextToken' => '<string>',
    'TableList' => [
        [
            'CatalogId' => '<string>',
            'CreateTime' => <DateTime>,
            'CreatedBy' => '<string>',
            'DatabaseName' => '<string>',
            'Description' => '<string>',
            'FederatedTable' => [
                'ConnectionName' => '<string>',
                'DatabaseIdentifier' => '<string>',
                'Identifier' => '<string>',
            ],
            'IsMultiDialectView' => true || false,
            'IsRegisteredWithLakeFormation' => true || false,
            'LastAccessTime' => <DateTime>,
            'LastAnalyzedTime' => <DateTime>,
            'Name' => '<string>',
            'Owner' => '<string>',
            'Parameters' => ['<string>', ...],
            'PartitionKeys' => [
                [
                    'Comment' => '<string>',
                    'Name' => '<string>',
                    'Parameters' => ['<string>', ...],
                    'Type' => '<string>',
                ],
                // ...
            ],
            'Retention' => <integer>,
            'StorageDescriptor' => [
                'AdditionalLocations' => ['<string>', ...],
                'BucketColumns' => ['<string>', ...],
                'Columns' => [
                    [
                        'Comment' => '<string>',
                        'Name' => '<string>',
                        'Parameters' => ['<string>', ...],
                        'Type' => '<string>',
                    ],
                    // ...
                ],
                'Compressed' => true || false,
                'InputFormat' => '<string>',
                'Location' => '<string>',
                'NumberOfBuckets' => <integer>,
                'OutputFormat' => '<string>',
                'Parameters' => ['<string>', ...],
                'SchemaReference' => [
                    'SchemaId' => [
                        'RegistryName' => '<string>',
                        'SchemaArn' => '<string>',
                        'SchemaName' => '<string>',
                    ],
                    'SchemaVersionId' => '<string>',
                    'SchemaVersionNumber' => <integer>,
                ],
                'SerdeInfo' => [
                    'Name' => '<string>',
                    'Parameters' => ['<string>', ...],
                    'SerializationLibrary' => '<string>',
                ],
                'SkewedInfo' => [
                    'SkewedColumnNames' => ['<string>', ...],
                    'SkewedColumnValueLocationMaps' => ['<string>', ...],
                    'SkewedColumnValues' => ['<string>', ...],
                ],
                'SortColumns' => [
                    [
                        'Column' => '<string>',
                        'SortOrder' => <integer>,
                    ],
                    // ...
                ],
                'StoredAsSubDirectories' => true || false,
            ],
            'TableType' => '<string>',
            'TargetTable' => [
                'CatalogId' => '<string>',
                'DatabaseName' => '<string>',
                'Name' => '<string>',
                'Region' => '<string>',
            ],
            'UpdateTime' => <DateTime>,
            'VersionId' => '<string>',
            'ViewDefinition' => [
                'Definer' => '<string>',
                'IsProtected' => true || false,
                'Representations' => [
                    [
                        'Dialect' => 'REDSHIFT|ATHENA|SPARK',
                        'DialectVersion' => '<string>',
                        'IsStale' => true || false,
                        'ViewExpandedText' => '<string>',
                        'ViewOriginalText' => '<string>',
                    ],
                    // ...
                ],
                'SubObjects' => ['<string>', ...],
            ],
            'ViewExpandedText' => '<string>',
            'ViewOriginalText' => '<string>',
        ],
        // ...
    ],
]

Result Details

Members
NextToken
Type: string

A continuation token, present if the current list segment is not the last.

TableList
Type: Array of Table structures

A list of the requested Table objects. The SearchTables response returns only the tables that you have access to.
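Because NextToken is present whenever more results remain, a caller can collect every page with a simple loop. The sketch below assumes the caller passes a callable that forwards to $client->searchTables(); the function name searchAllTables is hypothetical. The result object returned by the SDK supports array access, so the same code also works with plain arrays.

```php
<?php
// Hypothetical helper: follows NextToken until the last page is reached.
// $searchTables is any callable that accepts the request array, for example
//   function (array $p) use ($client) { return $client->searchTables($p); }
function searchAllTables(callable $searchTables, array $params): array
{
    $tables = [];
    unset($params['NextToken']); // always start from the first page

    do {
        $result = $searchTables($params);

        foreach ($result['TableList'] ?? [] as $table) {
            $tables[] = $table;
        }

        // A missing or empty token means this was the last page.
        $token = $result['NextToken'] ?? null;
        if (!empty($token)) {
            $params['NextToken'] = $token;
        }
    } while (!empty($token));

    return $tables;
}
```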

Errors

InternalServiceException:

An internal service error occurred.

InvalidInputException:

The input provided was not valid.

OperationTimeoutException:

The operation timed out.

StartBlueprintRun

$result = $client->startBlueprintRun([/* ... */]);
$promise = $client->startBlueprintRunAsync([/* ... */]);

Starts a new run of the specified blueprint.

Parameter Syntax

$result = $client->startBlueprintRun([
    'BlueprintName' => '<string>', // REQUIRED
    'Parameters' => '<string>',
    'RoleArn' => '<string>', // REQUIRED
]);

Parameter Details

Members
BlueprintName
Required: Yes
Type: string

The name of the blueprint.

Parameters
Type: string

Specifies the parameters as a BlueprintParameters object.

RoleArn
Required: Yes
Type: string

Specifies the IAM role used to create the workflow.
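The Parameters member is a single string rather than an array. The sketch below assumes the string is a JSON serialization of the blueprint parameters; that encoding, the helper name, and the example parameter name WorkflowName are assumptions and are not stated in this section.

```php
<?php
// Hypothetical helper: builds a StartBlueprintRun request array.
function buildStartBlueprintRunParams(
    string $blueprintName,
    string $roleArn,
    array $blueprintParameters = []
): array {
    $params = [
        'BlueprintName' => $blueprintName,
        'RoleArn' => $roleArn,
    ];

    if ($blueprintParameters !== []) {
        // Assumption: the service expects the parameters serialized as JSON.
        $params['Parameters'] = json_encode($blueprintParameters, JSON_THROW_ON_ERROR);
    }

    return $params;
}

// Possible usage with an existing client (not executed here):
// $result = $client->startBlueprintRun(buildStartBlueprintRunParams(
//     'my-blueprint',
//     'arn:aws:iam::123456789012:role/MyGlueRole',
//     ['WorkflowName' => 'my-workflow']
// ));
```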

Result Syntax

[
    'RunId' => '<string>',
]

Result Details

Members
RunId
Type: string

The run ID for this blueprint run.

Errors

InvalidInputException:

The input provided was not valid.

OperationTimeoutException:

The operation timed out.

InternalServiceException:

An internal service error occurred.

ResourceNumberLimitExceededException:

A resource numerical limit was exceeded.

EntityNotFoundException:

A specified entity does not exist.

IllegalBlueprintStateException:

The blueprint is in an invalid state to perform a requested operation.

StartColumnStatisticsTaskRun

$result = $client->startColumnStatisticsTaskRun([/* ... */]);
$promise = $client->startColumnStatisticsTaskRunAsync([/* ... */]);

Starts a column statistics task run, for a specified table and columns.

Parameter Syntax

$result = $client->startColumnStatisticsTaskRun([
    'CatalogID' => '<string>',
    'ColumnNameList' => ['<string>', ...],
    'DatabaseName' => '<string>', // REQUIRED
    'Role' => '<string>', // REQUIRED
    'SampleSize' => <float>,
    'SecurityConfiguration' => '<string>',
    'TableName' => '<string>', // REQUIRED
]);

Parameter Details

Members
CatalogID
Type: string

The ID of the Data Catalog where the table resides. If none is supplied, the Amazon Web Services account ID is used by default.

ColumnNameList
Type: Array of strings

A list of the column names for which to generate statistics. If none is supplied, all column names for the table will be used by default.

DatabaseName
Required: Yes
Type: string

The name of the database where the table resides.

Role
Required: Yes
Type: string

The IAM role that the service assumes to generate statistics.

SampleSize
Type: double

The percentage of rows used to generate statistics. If none is supplied, the entire table will be used to generate stats.

SecurityConfiguration
Type: string

Name of the security configuration that is used to encrypt CloudWatch logs for the column stats task run.

TableName
Required: Yes
Type: string

The name of the table for which to generate statistics.
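A minimal sketch of assembling this request follows. Note that the catalog key is spelled CatalogID in this operation. The helper name is hypothetical, and the 0 to 100 range check on SampleSize is an assumption based on the member being described as a percentage.

```php
<?php
// Hypothetical helper: builds a StartColumnStatisticsTaskRun request array.
function buildColumnStatisticsParams(
    string $database,
    string $table,
    string $role,
    array $columns = [],
    ?float $sampleSize = null
): array {
    // Assumption: a percentage must be greater than 0 and at most 100.
    if ($sampleSize !== null && ($sampleSize <= 0.0 || $sampleSize > 100.0)) {
        throw new InvalidArgumentException('SampleSize must be in the range (0, 100].');
    }

    $params = [
        'DatabaseName' => $database,
        'TableName' => $table,
        'Role' => $role,
    ];

    // Omitting ColumnNameList means all columns of the table are used.
    if ($columns !== []) {
        $params['ColumnNameList'] = array_values($columns);
    }

    // Omitting SampleSize means the entire table is used.
    if ($sampleSize !== null) {
        $params['SampleSize'] = $sampleSize;
    }

    return $params;
}
```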

Result Syntax

[
    'ColumnStatisticsTaskRunId' => '<string>',
]

Result Details

Members
ColumnStatisticsTaskRunId
Type: string

The identifier for the column statistics task run.

Errors

AccessDeniedException:

Access to a resource was denied.

EntityNotFoundException:

A specified entity does not exist.

ColumnStatisticsTaskRunningException:

An exception thrown when you try to start another job while running a column stats generation job.

OperationTimeoutException:

The operation timed out.

ResourceNumberLimitExceededException:

A resource numerical limit was exceeded.

InvalidInputException:

The input provided was not valid.

StartCrawler

$result = $client->startCrawler([/* ... */]);
$promise = $client->startCrawlerAsync([/* ... */]);

Starts a crawl using the specified crawler, regardless of what is scheduled. If the crawler is already running, returns a CrawlerRunningException.

Parameter Syntax

$result = $client->startCrawler([
    'Name' => '<string>', // REQUIRED
]);

Parameter Details

Members
Name
Required: Yes
Type: string

Name of the crawler to start.

Result Syntax

[]

Result Details

The results for this operation are always empty.

Errors

EntityNotFoundException:

A specified entity does not exist.

CrawlerRunningException:

The operation cannot be performed because the crawler is already running.

OperationTimeoutException:

The operation timed out.
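Since StartCrawler fails with CrawlerRunningException when a crawl is already in progress, callers often want to treat that case as a non-error. The sketch below assumes the AWS SDK for PHP is installed and that the caller passes a callable forwarding to $client->startCrawler(); the function name startCrawlerIfIdle is hypothetical.

```php
<?php
use Aws\Exception\AwsException;

// Hypothetical helper: returns true if the crawl was started,
// false if the crawler was already running.
// $startCrawler is any callable that accepts the request array, for example
//   function (array $p) use ($client) { return $client->startCrawler($p); }
function startCrawlerIfIdle(callable $startCrawler, string $crawlerName): bool
{
    try {
        $startCrawler(['Name' => $crawlerName]);
        return true;
    } catch (AwsException $e) {
        // Only swallow the "already running" case; rethrow everything else.
        if ($e->getAwsErrorCode() === 'CrawlerRunningException') {
            return false;
        }
        throw $e;
    }
}
```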

StartCrawlerSchedule

$result = $client->startCrawlerSchedule([/* ... */]);
$promise = $client->startCrawlerScheduleAsync([/* ... */]);

Changes the schedule state of the specified crawler to SCHEDULED, unless the crawler is already running or the schedule state is already SCHEDULED.

Parameter Syntax

$result = $client->startCrawlerSchedule([
    'CrawlerName' => '<string>', // REQUIRED
]);

Parameter Details

Members
CrawlerName
Required: Yes
Type: string

Name of the crawler to schedule.

Result Syntax

[]

Result Details

The results for this operation are always empty.

Errors

EntityNotFoundException:

A specified entity does not exist.

SchedulerRunningException:

The specified scheduler is already running.

SchedulerTransitioningException:

The specified scheduler is transitioning.

NoScheduleException:

There is no applicable schedule.

OperationTimeoutException:

The operation timed out.

StartDataQualityRuleRecommendationRun

$result = $client->startDataQualityRuleRecommendationRun([/* ... */]);
$promise = $client->startDataQualityRuleRecommendationRunAsync([/* ... */]);

Starts a recommendation run that is used to generate rules when you don't know what rules to write. Glue Data Quality analyzes the data and comes up with recommendations for a potential ruleset. You can then triage the ruleset and modify the generated ruleset to your liking.

Recommendation runs are automatically deleted after 90 days.

Parameter Syntax

$result = $client->startDataQualityRuleRecommendationRun([
    'ClientToken' => '<string>',
    'CreatedRulesetName' => '<string>',
    'DataSource' => [ // REQUIRED
        'GlueTable' => [ // REQUIRED
            'AdditionalOptions' => ['<string>', ...],
            'CatalogId' => '<string>',
            'ConnectionName' => '<string>',
            'DatabaseName' => '<string>', // REQUIRED
            'TableName' => '<string>', // REQUIRED
        ],
    ],
    'NumberOfWorkers' => <integer>,
    'Role' => '<string>', // REQUIRED
    'Timeout' => <integer>,
]);

Parameter Details

Members
ClientToken
Type: string

Used for idempotency and is recommended to be set to a random ID (such as a UUID) to avoid creating or starting multiple instances of the same resource.

CreatedRulesetName
Type: string

A name for the ruleset.

DataSource
Required: Yes
Type: DataSource structure

The data source (Glue table) associated with this run.

NumberOfWorkers
Type: int

The number of G.1X workers to be used in the run. The default is 5.

Role
Required: Yes
Type: string

An IAM role supplied to encrypt the results of the run.

Timeout
Type: int

The timeout for a run in minutes. This is the maximum time that a run can consume resources before it is terminated and enters TIMEOUT status. The default is 2,880 minutes (48 hours).
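The nested DataSource structure and the idempotency token can be put together as in the sketch below. The helper name is hypothetical, and a 32-character random hex string stands in for a UUID. To keep a retry idempotent, the same token would need to be reused across attempts, which is why the helper accepts it as an optional argument.

```php
<?php
// Hypothetical helper: builds a StartDataQualityRuleRecommendationRun request.
function buildRecommendationRunParams(
    string $database,
    string $table,
    string $role,
    ?string $rulesetName = null,
    ?string $clientToken = null
): array {
    $params = [
        // Random ID used for idempotency; pass the same value when retrying.
        'ClientToken' => $clientToken ?? bin2hex(random_bytes(16)),
        'DataSource' => [
            'GlueTable' => [
                'DatabaseName' => $database,
                'TableName' => $table,
            ],
        ],
        'Role' => $role,
    ];

    if ($rulesetName !== null) {
        $params['CreatedRulesetName'] = $rulesetName;
    }

    return $params;
}
```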

Result Syntax

[
    'RunId' => '<string>',
]

Result Details

Members
RunId
Type: string

The unique run identifier associated with this run.

Errors

InvalidInputException:

The input provided was not valid.

OperationTimeoutException:

The operation timed out.

InternalServiceException:

An internal service error occurred.

ConflictException:

The CreatePartitions API was called on a table that has indexes enabled.

StartDataQualityRulesetEvaluationRun

$result = $client->startDataQualityRulesetEvaluationRun([/* ... */]);
$promise = $client->startDataQualityRulesetEvaluationRunAsync([/* ... */]);

Once you have a ruleset definition (either recommended or your own), you call this operation to evaluate the ruleset against a data source (Glue table). The evaluation computes results which you can retrieve with the GetDataQualityResult API.

Parameter Syntax

$result = $client->startDataQualityRulesetEvaluationRun([
    'AdditionalDataSources' => [
        '<NameString>' => [
            'GlueTable' => [ // REQUIRED
                'AdditionalOptions' => ['<string>', ...],
                'CatalogId' => '<string>',
                'ConnectionName' => '<string>',
                'DatabaseName' => '<string>', // REQUIRED
                'TableName' => '<string>', // REQUIRED
            ],
        ],
        // ...
    ],
    'AdditionalRunOptions' => [
        'CloudWatchMetricsEnabled' => true || false,
        'ResultsS3Prefix' => '<string>',
    ],
    'ClientToken' => '<string>',
    'DataSource' => [ // REQUIRED
        'GlueTable' => [ // REQUIRED
            'AdditionalOptions' => ['<string>', ...],
            'CatalogId' => '<string>',
            'ConnectionName' => '<string>',
            'DatabaseName' => '<string>', // REQUIRED
            'TableName' => '<string>', // REQUIRED
        ],
    ],
    'NumberOfWorkers' => <integer>,
    'Role' => '<string>', // REQUIRED
    'RulesetNames' => ['<string>', ...], // REQUIRED
    'Timeout' => <integer>,
]);

Parameter Details

Members
AdditionalDataSources
Type: Associative array of custom strings keys (NameString) to DataSource structures

A map of reference strings to additional data sources you can specify for an evaluation run.

AdditionalRunOptions

Additional run options you can specify for an evaluation run.

ClientToken
Type: string

Used for idempotency and is recommended to be set to a random ID (such as a UUID) to avoid creating or starting multiple instances of the same resource.

DataSource
Required: Yes
Type: DataSource structure

The data source (Glue table) associated with this run.

NumberOfWorkers
Type: int

The number of G.1X workers to be used in the run. The default is 5.

Role
Required: Yes
Type: string

An IAM role supplied to encrypt the results of the run.

RulesetNames
Required: Yes
Type: Array of strings

A list of ruleset names.

Timeout
Type: int

The timeout for a run in minutes. This is the maximum time that a run can consume resources before it is terminated and enters TIMEOUT status. The default is 2,880 minutes (48 hours).
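AdditionalDataSources is a map whose keys are reference strings chosen by the caller and whose values have the same shape as DataSource. The following sketch shows one way to build the request; both helper names and the alias used in the usage comment are hypothetical.

```php
<?php
// Hypothetical helper: wraps a database/table pair in a DataSource structure.
function glueTableSource(string $database, string $table): array
{
    return ['GlueTable' => ['DatabaseName' => $database, 'TableName' => $table]];
}

// Hypothetical helper: builds a StartDataQualityRulesetEvaluationRun request.
// $additional maps a caller-chosen reference string to a DataSource structure.
function buildEvaluationRunParams(
    array $primarySource,
    array $rulesetNames,
    string $role,
    array $additional = []
): array {
    if ($rulesetNames === []) {
        throw new InvalidArgumentException('RulesetNames is required and must not be empty.');
    }

    $params = [
        'DataSource' => $primarySource,
        'RulesetNames' => array_values($rulesetNames),
        'Role' => $role,
    ];

    if ($additional !== []) {
        $params['AdditionalDataSources'] = $additional;
    }

    return $params;
}

// Possible usage with an existing client (not executed here):
// $result = $client->startDataQualityRulesetEvaluationRun(buildEvaluationRunParams(
//     glueTableSource('sales_db', 'orders'),
//     ['orders-ruleset'],
//     'arn:aws:iam::123456789012:role/MyGlueRole',
//     ['reference' => glueTableSource('sales_db', 'customers')]
// ));
```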

Result Syntax

[
    'RunId' => '<string>',
]

Result Details

Members
RunId
Type: string

The unique run identifier associated with this run.

Errors

InvalidInputException:

The input provided was not valid.

EntityNotFoundException:

A specified entity does not exist.

OperationTimeoutException:

The operation timed out.

InternalServiceException:

An internal service error occurred.

ConflictException:

The CreatePartitions API was called on a table that has indexes enabled.

StartExportLabelsTaskRun

$result = $client->startExportLabelsTaskRun([/* ... */]);
$promise = $client->startExportLabelsTaskRunAsync([/* ... */]);

Begins an asynchronous task to export all labeled data for a particular transform. This task is the only label-related API call that is not part of the typical active learning workflow. You typically use StartExportLabelsTaskRun when you want to work with all of your existing labels at the same time, such as when you want to remove or change labels that were previously submitted as truth. This API operation accepts the TransformId whose labels you want to export and an Amazon Simple Storage Service (Amazon S3) path to export the labels to. The operation returns a TaskRunId. You can check on the status of your task run by calling the GetMLTaskRun API.

Parameter Syntax

$result = $client->startExportLabelsTaskRun([
    'OutputS3Path' => '<string>', // REQUIRED
    'TransformId' => '<string>', // REQUIRED
]);

Parameter Details

Members
OutputS3Path
Required: Yes
Type: string

The Amazon S3 path where you export the labels.

TransformId
Required: Yes
Type: string

The unique identifier of the machine learning transform.

Result Syntax

[
    'TaskRunId' => '<string>',
]

Result Details

Members
TaskRunId
Type: string

The unique identifier for the task run.

Errors

EntityNotFoundException:

A specified entity does not exist.

InvalidInputException:

The input provided was not valid.

OperationTimeoutException:

The operation timed out.

InternalServiceException:

An internal service error occurred.

StartImportLabelsTaskRun

$result = $client->startImportLabelsTaskRun([/* ... */]);
$promise = $client->startImportLabelsTaskRunAsync([/* ... */]);

Enables you to provide additional labels (examples of truth) to be used to teach the machine learning transform and improve its quality. This API operation is generally used as part of the active learning workflow that starts with the StartMLLabelingSetGenerationTaskRun call and that ultimately results in improving the quality of your machine learning transform.

After the StartMLLabelingSetGenerationTaskRun finishes, Glue machine learning will have generated a series of questions for humans to answer. (Answering these questions is often called "labeling" in machine learning workflows.) In the case of the FindMatches transform, these questions are of the form, “What is the correct way to group these rows together into groups composed entirely of matching records?” After the labeling process is finished, users upload their answers/labels with a call to StartImportLabelsTaskRun. After StartImportLabelsTaskRun finishes, all future runs of the machine learning transform use the new and improved labels and perform a higher-quality transformation.

By default, StartMLLabelingSetGenerationTaskRun continually learns from and combines all labels that you upload unless you set ReplaceAllLabels to true. If you set ReplaceAllLabels to true, StartImportLabelsTaskRun deletes and forgets all previously uploaded labels and learns only from the exact set that you upload. Replacing labels can be helpful if you realize that you previously uploaded incorrect labels, and you believe that they are having a negative effect on your transform quality.

You can check on the status of your task run by calling the GetMLTaskRun operation.

Parameter Syntax

$result = $client->startImportLabelsTaskRun([
    'InputS3Path' => '<string>', // REQUIRED
    'ReplaceAllLabels' => true || false,
    'TransformId' => '<string>', // REQUIRED
]);

Parameter Details

Members
InputS3Path
Required: Yes
Type: string

The Amazon Simple Storage Service (Amazon S3) path from where you import the labels.

ReplaceAllLabels
Type: boolean

Indicates whether to overwrite your existing labels.

TransformId
Required: Yes
Type: string

The unique identifier of the machine learning transform.

Result Syntax

[
    'TaskRunId' => '<string>',
]

Result Details

Members
TaskRunId
Type: string

The unique identifier for the task run.
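Checking the task run status with GetMLTaskRun, as mentioned above, usually means polling until a terminal state is reached. The sketch below is one possible approach: the helper name is hypothetical, and the request keys (TransformId, TaskRunId), the Status result member, and the terminal status names are assumptions about GetMLTaskRun that are not documented in this section.

```php
<?php
// Hypothetical helper: polls until the task run reaches a terminal status.
// $getMLTaskRun is any callable that accepts the request array, for example
//   function (array $p) use ($client) { return $client->getMLTaskRun($p); }
function waitForMLTaskRun(
    callable $getMLTaskRun,
    string $transformId,
    string $taskRunId,
    int $maxAttempts = 60,
    int $delaySeconds = 10
): string {
    // Assumption: these are the terminal values of the Status member.
    $terminal = ['SUCCEEDED', 'FAILED', 'STOPPED', 'TIMEOUT'];

    for ($attempt = 1; $attempt <= $maxAttempts; $attempt++) {
        $result = $getMLTaskRun([
            'TransformId' => $transformId,
            'TaskRunId' => $taskRunId,
        ]);

        $status = $result['Status'] ?? 'UNKNOWN';
        if (in_array($status, $terminal, true)) {
            return $status;
        }

        // Wait before the next poll, except after the final attempt.
        if ($attempt < $maxAttempts && $delaySeconds > 0) {
            sleep($delaySeconds);
        }
    }

    throw new RuntimeException('Task run did not finish within the allowed attempts.');
}
```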

Errors

EntityNotFoundException:

A specified entity does not exist.

InvalidInputException:

The input provided was not valid.

OperationTimeoutException:

The operation timed out.

ResourceNumberLimitExceededException:

A resource numerical limit was exceeded.

InternalServiceException:

An internal service error occurred.

StartJobRun

$result = $client->startJobRun([/* ... */]);
$promise = $client->startJobRunAsync([/* ... */]);

Starts a job run using a job definition.

Parameter Syntax

$result = $client->startJobRun([
    'AllocatedCapacity' => <integer>,
    'Arguments' => ['<string>', ...],
    'ExecutionClass' => 'FLEX|STANDARD',
    'JobName' => '<string>', // REQUIRED
    'JobRunId' => '<string>',
    'MaxCapacity' => <float>,
    'NotificationProperty' => [
        'NotifyDelayAfter' => <integer>,
    ],
    'NumberOfWorkers' => <integer>,
    'SecurityConfiguration' => '<string>',
    'Timeout' => <integer>,
    'WorkerType' => 'Standard|G.1X|G.2X|G.025X|G.4X|G.8X|Z.2X',
]);

Parameter Details

Members
AllocatedCapacity
Type: int

This field is deprecated. Use MaxCapacity instead.

The number of Glue data processing units (DPUs) to allocate to this JobRun. You can allocate a minimum of 2 DPUs; the default is 10. A DPU is a relative measure of processing power that consists of 4 vCPUs of compute capacity and 16 GB of memory. For more information, see the Glue pricing page.

Arguments
Type: Associative array of custom strings keys (GenericString) to strings

The job arguments associated with this run. For this job run, they replace the default arguments set in the job definition itself.

You can specify arguments here that your own job-execution script consumes, as well as arguments that Glue itself consumes.

Job arguments may be logged. Do not pass plaintext secrets as arguments. Retrieve secrets from a Glue Connection, Secrets Manager or other secret management mechanism if you intend to keep them within the Job.

For information about how to specify and consume your own Job arguments, see the Calling Glue APIs in Python topic in the developer guide.

For information about the arguments you can provide to this field when configuring Spark jobs, see the Special Parameters Used by Glue topic in the developer guide.

For information about the arguments you can provide to this field when configuring Ray jobs, see Using job parameters in Ray jobs in the developer guide.

ExecutionClass
Type: string

Indicates whether the job is run with a standard or flexible execution class. The standard execution-class is ideal for time-sensitive workloads that require fast job startup and dedicated resources.

The flexible execution class is appropriate for time-insensitive jobs whose start and completion times may vary.

Only jobs with Glue version 3.0 or later and command type glueetl can set ExecutionClass to FLEX. The flexible execution class is available for Spark jobs.

JobName
Required: Yes
Type: string

The name of the job definition to use.

JobRunId
Type: string

The ID of a previous JobRun to retry.

MaxCapacity
Type: double

For Glue version 1.0 or earlier jobs, using the standard worker type, the number of Glue data processing units (DPUs) that can be allocated when this job runs. A DPU is a relative measure of processing power that consists of 4 vCPUs of compute capacity and 16 GB of memory. For more information, see the Glue pricing page.

For Glue version 2.0+ jobs, you cannot specify a Maximum capacity. Instead, you should specify a Worker type and the Number of workers.

Do not set MaxCapacity if using WorkerType and NumberOfWorkers.

The value that can be allocated for MaxCapacity depends on whether you are running a Python shell job, an Apache Spark ETL job, or an Apache Spark streaming ETL job:

  • When you specify a Python shell job (JobCommand.Name="pythonshell"), you can allocate either 0.0625 or 1 DPU. The default is 0.0625 DPU.

  • When you specify an Apache Spark ETL job (JobCommand.Name="glueetl") or Apache Spark streaming ETL job (JobCommand.Name="gluestreaming"), you can allocate from 2 to 100 DPUs. The default is 10 DPUs. This job type cannot have a fractional DPU allocation.
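The MaxCapacity rules listed above can be expressed as a small client-side check before calling StartJobRun. This is a sketch of the documented limits only; the function name isValidMaxCapacity is hypothetical, and command names not covered by the list are rejected rather than guessed at.

```php
<?php
// Hypothetical helper: checks MaxCapacity against the limits listed above.
function isValidMaxCapacity(string $commandName, float $maxCapacity): bool
{
    switch ($commandName) {
        case 'pythonshell':
            // Python shell jobs accept exactly 0.0625 or 1 DPU.
            return $maxCapacity === 0.0625 || $maxCapacity === 1.0;

        case 'glueetl':
        case 'gluestreaming':
            // Spark ETL and streaming ETL jobs accept whole DPUs from 2 to 100.
            return $maxCapacity >= 2.0
                && $maxCapacity <= 100.0
                && floor($maxCapacity) === $maxCapacity;

        default:
            throw new InvalidArgumentException("Unsupported command name: $commandName");
    }
}

var_dump(isValidMaxCapacity('pythonshell', 0.0625)); // bool(true)
var_dump(isValidMaxCapacity('glueetl', 10));         // bool(true)
var_dump(isValidMaxCapacity('glueetl', 2.5));        // bool(false)
```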

NotificationProperty
Type: NotificationProperty structure

Specifies configuration properties of a job run notification.

NumberOfWorkers
Type: int

The number of workers of a defined workerType that are allocated when a job runs.

SecurityConfiguration
Type: string

The name of the SecurityConfiguration structure to be used with this job run.

Timeout
Type: int

The JobRun timeout in minutes. This is the maximum time that a job run can consume resources before it is terminated and enters TIMEOUT status. This value overrides the timeout value set in the parent job.

Streaming jobs do not have a timeout. The default for non-streaming jobs is 2,880 minutes (48 hours).

WorkerType
Type: string

The type of predefined worker that is allocated when a job runs. Accepts a value of G.1X, G.2X, G.4X, G.8X or G.025X for Spark jobs. Accepts the value Z.2X for Ray jobs.

  • For the G.1X worker type, each worker maps to 1 DPU (4 vCPUs, 16 GB of memory) with 84GB disk (approximately 34GB free), and provides 1 executor per worker. We recommend this worker type for workloads such as data transforms, joins, and queries, because it offers a scalable and cost-effective way to run most jobs.

  • For the G.2X worker type, each worker maps to 2 DPU (8 vCPUs, 32 GB of memory) with 128GB disk (approximately 77GB free), and provides 1 executor per worker. We recommend this worker type for workloads such as data transforms, joins, and queries, because it offers a scalable and cost-effective way to run most jobs.

  • For the G.4X worker type, each worker maps to 4 DPU (16 vCPUs, 64 GB of memory) with 256GB disk (approximately 235GB free), and provides 1 executor per worker. We recommend this worker type for jobs whose workloads contain your most demanding transforms, aggregations, joins, and queries. This worker type is available only for Glue version 3.0 or later Spark ETL jobs in the following Amazon Web Services Regions: US East (Ohio), US East (N. Virginia), US West (Oregon), Asia Pacific (Singapore), Asia Pacific (Sydney), Asia Pacific (Tokyo), Canada (Central), Europe (Frankfurt), Europe (Ireland), and Europe (Stockholm).

  • For the G.8X worker type, each worker maps to 8 DPU (32 vCPUs, 128 GB of memory) with 512GB disk (approximately 487GB free), and provides 1 executor per worker. We recommend this worker type for jobs whose workloads contain your most demanding transforms, aggregations, joins, and queries. This worker type is available only for Glue version 3.0 or later Spark ETL jobs, in the same Amazon Web Services Regions as supported for the G.4X worker type.

  • For the G.025X worker type, each worker maps to 0.25 DPU (2 vCPUs, 4 GB of memory) with 84GB disk (approximately 34GB free), and provides 1 executor per worker. We recommend this worker type for low volume streaming jobs. This worker type is only available for Glue version 3.0 streaming jobs.

  • For the Z.2X worker type, each worker maps to 2 M-DPU (8 vCPUs, 64 GB of memory) with 128 GB disk (approximately 120GB free), and provides up to 8 Ray workers based on the autoscaler.
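The per-worker DPU figures above make it straightforward to estimate the total capacity of a Spark job run. The sketch below tabulates them and builds a StartJobRun request that sets WorkerType and NumberOfWorkers without MaxCapacity, as the MaxCapacity description requires. The constant, both function names, and the example argument in the usage comment are hypothetical; Z.2X is left out because it is measured in M-DPU rather than DPU.

```php
<?php
// DPUs per worker for Spark worker types, as listed above.
const DPU_PER_WORKER = [
    'G.025X' => 0.25,
    'G.1X' => 1.0,
    'G.2X' => 2.0,
    'G.4X' => 4.0,
    'G.8X' => 8.0,
];

// Hypothetical helper: total DPUs allocated to a run.
function totalDpus(string $workerType, int $numberOfWorkers): float
{
    if (!isset(DPU_PER_WORKER[$workerType])) {
        throw new InvalidArgumentException("Unknown Spark worker type: $workerType");
    }
    return DPU_PER_WORKER[$workerType] * $numberOfWorkers;
}

// Hypothetical helper: builds a StartJobRun request array.
// MaxCapacity is deliberately never set alongside WorkerType/NumberOfWorkers.
function buildStartJobRunParams(
    string $jobName,
    string $workerType,
    int $numberOfWorkers,
    array $arguments = []
): array {
    $params = [
        'JobName' => $jobName,
        'WorkerType' => $workerType,
        'NumberOfWorkers' => $numberOfWorkers,
    ];

    // Arguments replace the job's default arguments for this run only.
    if ($arguments !== []) {
        $params['Arguments'] = $arguments;
    }

    return $params;
}

var_dump(totalDpus('G.2X', 10)); // float(20)

// Possible usage with an existing client (not executed here):
// $result = $client->startJobRun(
//     buildStartJobRunParams('my-job', 'G.1X', 5, ['--input_path' => 's3://my-bucket/in/'])
// );
```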

Result Syntax

[
    'JobRunId' => '<string>',
]

Result Details

Members
JobRunId
Type: string

The ID assigned to this job run.

Errors

InvalidInputException:

The input provided was not valid.

EntityNotFoundException:

A specified entity does not exist.

InternalServiceException:

An internal service error occurred.

OperationTimeoutException:

The operation timed out.

ResourceNumberLimitExceededException:

A resource numerical limit was exceeded.

ConcurrentRunsExceededException:

Too many jobs are being run concurrently.

StartMLEvaluationTaskRun

$result = $client->startMLEvaluationTaskRun([/* ... */]);
$promise = $client->startMLEvaluationTaskRunAsync([/* ... */]);

Starts a task to estimate the quality of the transform.

When you provide label sets as examples of truth, Glue machine learning uses some of those examples to learn from them. The rest of the labels are used as a test to estimate quality.

Returns a unique identifier for the run. You can call GetMLTaskRun to get more information about the stats of the EvaluationTaskRun.

Parameter Syntax

$result = $client->startMLEvaluationTaskRun([
    'TransformId' => '<string>', // REQUIRED
]);

Parameter Details

Members
TransformId
Required: Yes
Type: string

The unique identifier of the machine learning transform.

Result Syntax

[
    'TaskRunId' => '<string>',
]

Result Details

Members
TaskRunId
Type: string

The unique identifier associated with this run.

Errors

EntityNotFoundException:

A specified entity does not exist.

InvalidInputException:

The input provided was not valid.

OperationTimeoutException:

The operation timed out.

InternalServiceException:

An internal service error occurred.

ConcurrentRunsExceededException:

Too many jobs are being run concurrently.

MLTransformNotReadyException:

The machine learning transform is not ready to run.

StartMLLabelingSetGenerationTaskRun

$result = $client->startMLLabelingSetGenerationTaskRun([/* ... */]);
$promise = $client->startMLLabelingSetGenerationTaskRunAsync([/* ... */]);

Starts the active learning workflow for your machine learning transform to improve the transform's quality by generating label sets and adding labels.

When the StartMLLabelingSetGenerationTaskRun finishes, Glue will have generated a "labeling set" or a set of questions for humans to answer.

In the case of the FindMatches transform, these questions are of the form, “What is the correct way to group these rows together into groups composed entirely of matching records?”

After the labeling process is finished, you can upload your labels with a call to StartImportLabelsTaskRun. After StartImportLabelsTaskRun finishes, all future runs of the machine learning transform will use the new and improved labels and perform a higher-quality transformation.

Parameter Syntax

$result = $client->startMLLabelingSetGenerationTaskRun([
    'OutputS3Path' => '<string>', // REQUIRED
    'TransformId' => '<string>', // REQUIRED
]);

Parameter Details

Members
OutputS3Path
Required: Yes
Type: string

The Amazon Simple Storage Service (Amazon S3) path where you generate the labeling set.

TransformId
Required: Yes
Type: string

The unique identifier of the machine learning transform.

Result Syntax

[
    'TaskRunId' => '<string>',
]

Result Details

Members
TaskRunId
Type: string

The unique run identifier that is associated with this task run.

Errors

EntityNotFoundException:

A specified entity does not exist.

InvalidInputException:

The input provided was not valid.

OperationTimeoutException:

The operation timed out.

InternalServiceException:

An internal service error occurred.

ConcurrentRunsExceededException:

Too many jobs are being run concurrently.

StartTrigger

$result = $client->startTrigger([/* ... */]);
$promise = $client->startTriggerAsync([/* ... */]);

Starts an existing trigger. See Triggering Jobs for information about how different types of trigger are started.

Parameter Syntax

$result = $client->startTrigger([
    'Name' => '<string>', // REQUIRED
]);

Parameter Details

Members
Name
Required: Yes
Type: string

The name of the trigger to start.

Result Syntax

[
    'Name' => '<string>',
]

Result Details

Members
Name
Type: string

The name of the trigger that was started.

Errors

InvalidInputException:

The input provided was not valid.

InternalServiceException:

An internal service error occurred.

EntityNotFoundException:

A specified entity does not exist.

OperationTimeoutException:

The operation timed out.

ResourceNumberLimitExceededException:

A resource numerical limit was exceeded.

ConcurrentRunsExceededException:

Too many jobs are being run concurrently.

StartWorkflowRun

$result = $client->startWorkflowRun([/* ... */]);
$promise = $client->startWorkflowRunAsync([/* ... */]);

Starts a new run of the specified workflow.

Parameter Syntax

$result = $client->startWorkflowRun([
    'Name' => '<string>', // REQUIRED
    'RunProperties' => ['<string>', ...],
]);

Parameter Details

Members
Name
Required: Yes
Type: string

The name of the workflow to start.

RunProperties
Type: Associative array of custom string keys (IdString) to strings

The workflow run properties for the new workflow run.

Result Syntax

[
    'RunId' => '<string>',
]

Result Details

Members
RunId
Type: string

The ID of the new workflow run.

Errors

InvalidInputException:

The input provided was not valid.

EntityNotFoundException:

A specified entity does not exist.

InternalServiceException:

An internal service error occurred.

OperationTimeoutException:

The operation timed out.

ResourceNumberLimitExceededException:

A resource numerical limit was exceeded.

ConcurrentRunsExceededException:

Too many jobs are being run concurrently.
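
Example

RunProperties is an associative array of string keys to string values, which the placeholder in the parameter syntax does not make obvious. The sketch below shows one way to build the request; the helper name buildWorkflowRunRequest, the workflow name, and the property names are hypothetical, and casting values to strings is a convenience of this sketch, not SDK behavior.

```php
<?php
// Hypothetical helper (not part of the SDK): builds the parameter array
// for startWorkflowRun.
function buildWorkflowRunRequest(string $workflowName, array $runProperties = []): array
{
    $params = ['Name' => $workflowName]; // REQUIRED

    if ($runProperties !== []) {
        // RunProperties maps string keys to string values, so cast each value.
        // array_map keeps the keys when it is given a single array.
        $params['RunProperties'] = array_map('strval', $runProperties);
    }

    return $params;
}

$params = buildWorkflowRunRequest('example-workflow', [
    'input_date' => '2024-01-31',
    'retries'    => 3, // becomes the string "3"
]);

// With a configured Aws\Glue\GlueClient in $client:
// $result = $client->startWorkflowRun($params);
// $runId  = $result['RunId'];
```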

StopColumnStatisticsTaskRun

$result = $client->stopColumnStatisticsTaskRun([/* ... */]);
$promise = $client->stopColumnStatisticsTaskRunAsync([/* ... */]);

Stops a task run for the specified table.

Parameter Syntax

$result = $client->stopColumnStatisticsTaskRun([
    'DatabaseName' => '<string>', // REQUIRED
    'TableName' => '<string>', // REQUIRED
]);

Parameter Details

Members
DatabaseName
Required: Yes
Type: string

The name of the database where the table resides.

TableName
Required: Yes
Type: string

The name of the table.

Result Syntax

[]

Result Details

The results for this operation are always empty.

Errors

EntityNotFoundException:

A specified entity does not exist.

ColumnStatisticsTaskNotRunningException:

An exception thrown when you try to stop a task run when there is no task running.

ColumnStatisticsTaskStoppingException:

An exception thrown when you try to stop a task run that is already stopping.

OperationTimeoutException:

The operation timed out.

StopCrawler

$result = $client->stopCrawler([/* ... */]);
$promise = $client->stopCrawlerAsync([/* ... */]);

If the specified crawler is running, stops the crawl.

Parameter Syntax

$result = $client->stopCrawler([
    'Name' => '<string>', // REQUIRED
]);

Parameter Details

Members
Name
Required: Yes
Type: string

Name of the crawler to stop.

Result Syntax

[]

Result Details

The results for this operation are always empty.

Errors

EntityNotFoundException:

A specified entity does not exist.

CrawlerNotRunningException:

The specified crawler is not running.

CrawlerStoppingException:

The specified crawler is stopping.

OperationTimeoutException:

The operation timed out.
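
Example

Two of the errors listed for StopCrawler describe a crawler that is already idle or already stopping, so a caller may choose to ignore them. The sketch below shows one way to make that decision. Treating these codes as benign is a design choice of this sketch, and it assumes that the error code reported by the SDK's AwsException::getAwsErrorCode() matches the exception names listed above; the helper name and crawler name are hypothetical.

```php
<?php
// Error codes from the StopCrawler error list that this sketch treats as
// harmless: the crawler is not crawling, or is already on its way down.
const BENIGN_STOP_CODES = ['CrawlerNotRunningException', 'CrawlerStoppingException'];

// Hypothetical helper (not part of the SDK).
function isBenignStopError(?string $awsErrorCode): bool
{
    return in_array($awsErrorCode, BENIGN_STOP_CODES, true);
}

// With a configured Aws\Glue\GlueClient in $client:
// try {
//     $client->stopCrawler(['Name' => 'example-crawler']);
// } catch (Aws\Exception\AwsException $e) {
//     if (!isBenignStopError($e->getAwsErrorCode())) {
//         throw $e; // for example EntityNotFoundException
//     }
// }
```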

StopCrawlerSchedule

$result = $client->stopCrawlerSchedule([/* ... */]);
$promise = $client->stopCrawlerScheduleAsync([/* ... */]);

Sets the schedule state of the specified crawler to NOT_SCHEDULED, but does not stop the crawler if it is already running.

Parameter Syntax

$result = $client->stopCrawlerSchedule([
    'CrawlerName' => '<string>', // REQUIRED
]);

Parameter Details

Members
CrawlerName
Required: Yes
Type: string

Name of the crawler whose schedule state to set.

Result Syntax

[]

Result Details

The results for this operation are always empty.

Errors

EntityNotFoundException:

A specified entity does not exist.

SchedulerNotRunningException:

The specified scheduler is not running.

SchedulerTransitioningException:

The specified scheduler is transitioning.

OperationTimeoutException:

The operation timed out.

StopSession

$result = $client->stopSession([/* ... */]);
$promise = $client->stopSessionAsync([/* ... */]);

Stops the session.

Parameter Syntax

$result = $client->stopSession([
    'Id' => '<string>', // REQUIRED
    'RequestOrigin' => '<string>',
]);

Parameter Details

Members
Id
Required: Yes
Type: string

The ID of the session to be stopped.

RequestOrigin
Type: string

The origin of the request.

Result Syntax

[
    'Id' => '<string>',
]

Result Details

Members
Id
Type: string

The ID of the stopped session.

Errors

AccessDeniedException:

Access to a resource was denied.

InternalServiceException:

An internal service error occurred.

OperationTimeoutException:

The operation timed out.

InvalidInputException:

The input provided was not valid.

IllegalSessionStateException:

The session is in an invalid state to perform a requested operation.

ConcurrentModificationException:

Two processes are trying to modify a resource simultaneously.

StopTrigger

$result = $client->stopTrigger([/* ... */]);
$promise = $client->stopTriggerAsync([/* ... */]);

Stops a specified trigger.

Parameter Syntax

$result = $client->stopTrigger([
    'Name' => '<string>', // REQUIRED
]);

Parameter Details

Members
Name
Required: Yes
Type: string

The name of the trigger to stop.

Result Syntax

[
    'Name' => '<string>',
]

Result Details

Members
Name
Type: string

The name of the trigger that was stopped.

Errors

InvalidInputException:

The input provided was not valid.

InternalServiceException:

An internal service error occurred.

EntityNotFoundException:

A specified entity does not exist.

OperationTimeoutException:

The operation timed out.

ConcurrentModificationException:

Two processes are trying to modify a resource simultaneously.

StopWorkflowRun

$result = $client->stopWorkflowRun([/* ... */]);
$promise = $client->stopWorkflowRunAsync([/* ... */]);

Stops the execution of the specified workflow run.

Parameter Syntax

$result = $client->stopWorkflowRun([
    'Name' => '<string>', // REQUIRED
    'RunId' => '<string>', // REQUIRED
]);

Parameter Details

Members
Name
Required: Yes
Type: string

The name of the workflow to stop.

RunId
Required: Yes
Type: string

The ID of the workflow run to stop.

Result Syntax

[]

Result Details

The results for this operation are always empty.

Errors

InvalidInputException:

The input provided was not valid.

EntityNotFoundException:

A specified entity does not exist.

InternalServiceException:

An internal service error occurred.

OperationTimeoutException:

The operation timed out.

IllegalWorkflowStateException:

The workflow is in an invalid state to perform a requested operation.

TagResource

$result = $client->tagResource([/* ... */]);
$promise = $client->tagResourceAsync([/* ... */]);

Adds tags to a resource. A tag is a label you can assign to an Amazon Web Services resource. In Glue, you can tag only certain resources. For information about what resources you can tag, see Amazon Web Services Tags in Glue.

Parameter Syntax

$result = $client->tagResource([
    'ResourceArn' => '<string>', // REQUIRED
    'TagsToAdd' => ['<string>', ...], // REQUIRED
]);

Parameter Details

Members
ResourceArn
Required: Yes
Type: string

The ARN of the Glue resource to which to add the tags. For more information about Glue resource ARNs, see the Glue ARN string pattern.

TagsToAdd
Required: Yes
Type: Associative array of custom string keys (TagKey) to strings

Tags to add to this resource.

Result Syntax

[]

Result Details

The results for this operation are always empty.

Errors

InvalidInputException:

The input provided was not valid.

InternalServiceException:

An internal service error occurred.

OperationTimeoutException:

The operation timed out.

EntityNotFoundException:

A specified entity does not exist.

UntagResource

$result = $client->untagResource([/* ... */]);
$promise = $client->untagResourceAsync([/* ... */]);

Removes tags from a resource.

Parameter Syntax

$result = $client->untagResource([
    'ResourceArn' => '<string>', // REQUIRED
    'TagsToRemove' => ['<string>', ...], // REQUIRED
]);

Parameter Details

Members
ResourceArn
Required: Yes
Type: string

The Amazon Resource Name (ARN) of the resource from which to remove the tags.

TagsToRemove
Required: Yes
Type: Array of strings

Tags to remove from this resource.

Result Syntax

[]

Result Details

The results for this operation are always empty.

Errors

InvalidInputException:

The input provided was not valid.

InternalServiceException:

An internal service error occurred.

OperationTimeoutException:

The operation timed out.

EntityNotFoundException:

A specified entity does not exist.
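
Example

TagResource and UntagResource take differently shaped arguments: TagsToAdd is an associative array of tag keys to tag values, while TagsToRemove is a plain list of tag keys. The sketch below builds both requests from the same tag set. The ARN, tag names, and tag values are made up for illustration.

```php
<?php
// Hypothetical ARN and tags, for illustration only.
$arn  = 'arn:aws:glue:us-east-1:123456789012:job/example-job';
$tags = ['team' => 'analytics', 'env' => 'test'];

// TagResource: a map of tag keys to tag values.
$tagParams = [
    'ResourceArn' => $arn,  // REQUIRED
    'TagsToAdd'   => $tags, // REQUIRED
];

// UntagResource: only the tag keys, as a list.
$untagParams = [
    'ResourceArn'  => $arn,              // REQUIRED
    'TagsToRemove' => array_keys($tags), // REQUIRED
];

// With a configured Aws\Glue\GlueClient in $client:
// $client->tagResource($tagParams);
// $client->untagResource($untagParams);
```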

UpdateBlueprint

$result = $client->updateBlueprint([/* ... */]);
$promise = $client->updateBlueprintAsync([/* ... */]);

Updates a registered blueprint.

Parameter Syntax

$result = $client->updateBlueprint([
    'BlueprintLocation' => '<string>', // REQUIRED
    'Description' => '<string>',
    'Name' => '<string>', // REQUIRED
]);

Parameter Details

Members
BlueprintLocation
Required: Yes
Type: string

Specifies a path in Amazon S3 where the blueprint is published.

Description
Type: string

A description of the blueprint.

Name
Required: Yes
Type: string

The name of the blueprint.

Result Syntax

[
    'Name' => '<string>',
]

Result Details

Members
Name
Type: string

Returns the name of the blueprint that was updated.

Errors

EntityNotFoundException:

A specified entity does not exist.

ConcurrentModificationException:

Two processes are trying to modify a resource simultaneously.

InvalidInputException:

The input provided was not valid.

OperationTimeoutException:

The operation timed out.

InternalServiceException:

An internal service error occurred.

IllegalBlueprintStateException:

The blueprint is in an invalid state to perform a requested operation.

UpdateClassifier

$result = $client->updateClassifier([/* ... */]);
$promise = $client->updateClassifierAsync([/* ... */]);

Modifies an existing classifier (a GrokClassifier, an XMLClassifier, a JsonClassifier, or a CsvClassifier, depending on which field is present).

Parameter Syntax

$result = $client->updateClassifier([
    'CsvClassifier' => [
        'AllowSingleColumn' => true || false,
        'ContainsHeader' => 'UNKNOWN|PRESENT|ABSENT',
        'CustomDatatypeConfigured' => true || false,
        'CustomDatatypes' => ['<string>', ...],
        'Delimiter' => '<string>',
        'DisableValueTrimming' => true || false,
        'Header' => ['<string>', ...],
        'Name' => '<string>', // REQUIRED
        'QuoteSymbol' => '<string>',
        'Serde' => 'OpenCSVSerDe|LazySimpleSerDe|None',
    ],
    'GrokClassifier' => [
        'Classification' => '<string>',
        'CustomPatterns' => '<string>',
        'GrokPattern' => '<string>',
        'Name' => '<string>', // REQUIRED
    ],
    'JsonClassifier' => [
        'JsonPath' => '<string>',
        'Name' => '<string>', // REQUIRED
    ],
    'XMLClassifier' => [
        'Classification' => '<string>',
        'Name' => '<string>', // REQUIRED
        'RowTag' => '<string>',
    ],
]);

Parameter Details

Members
CsvClassifier
Type: UpdateCsvClassifierRequest structure

A CsvClassifier object with updated fields.

GrokClassifier
Type: UpdateGrokClassifierRequest structure

A GrokClassifier object with updated fields.

JsonClassifier
Type: UpdateJsonClassifierRequest structure

A JsonClassifier object with updated fields.

XMLClassifier
Type: UpdateXMLClassifierRequest structure

An XMLClassifier object with updated fields.

Result Syntax

[]

Result Details

The results for this operation are always empty.

Errors

InvalidInputException:

The input provided was not valid.

VersionMismatchException:

There was a version conflict.

EntityNotFoundException:

A specified entity does not exist.

OperationTimeoutException:

The operation timed out.
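
Example

Because the kind of classifier being modified depends on which field is present, a request normally carries exactly one of the four top-level keys. The sketch below wraps a set of fields under a single key and checks the required Name member. The helper name buildClassifierUpdate and the classifier name are hypothetical, and the checks are this sketch's own, not validation performed by the SDK.

```php
<?php
// Hypothetical helper (not part of the SDK): wraps the fields of one
// classifier under the key that names its type.
function buildClassifierUpdate(string $type, array $fields): array
{
    $allowed = ['CsvClassifier', 'GrokClassifier', 'JsonClassifier', 'XMLClassifier'];
    if (!in_array($type, $allowed, true)) {
        throw new InvalidArgumentException("Unknown classifier type: $type");
    }
    if (!isset($fields['Name'])) {
        throw new InvalidArgumentException('Name is required for every classifier type');
    }

    return [$type => $fields];
}

$params = buildClassifierUpdate('CsvClassifier', [
    'Name'           => 'example-csv-classifier', // REQUIRED
    'Delimiter'      => ';',
    'ContainsHeader' => 'PRESENT',
]);

// With a configured Aws\Glue\GlueClient in $client:
// $client->updateClassifier($params);
```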

UpdateColumnStatisticsForPartition

$result = $client->updateColumnStatisticsForPartition([/* ... */]);
$promise = $client->updateColumnStatisticsForPartitionAsync([/* ... */]);

Creates or updates partition statistics of columns.

The Identity and Access Management (IAM) permission required for this operation is UpdatePartition.

Parameter Syntax

$result = $client->updateColumnStatisticsForPartition([
    'CatalogId' => '<string>',
    'ColumnStatisticsList' => [ // REQUIRED
        [
            'AnalyzedTime' => <integer || string || DateTime>, // REQUIRED
            'ColumnName' => '<string>', // REQUIRED
            'ColumnType' => '<string>', // REQUIRED
            'StatisticsData' => [ // REQUIRED
                'BinaryColumnStatisticsData' => [
                    'AverageLength' => <float>, // REQUIRED
                    'MaximumLength' => <integer>, // REQUIRED
                    'NumberOfNulls' => <integer>, // REQUIRED
                ],
                'BooleanColumnStatisticsData' => [
                    'NumberOfFalses' => <integer>, // REQUIRED
                    'NumberOfNulls' => <integer>, // REQUIRED
                    'NumberOfTrues' => <integer>, // REQUIRED
                ],
                'DateColumnStatisticsData' => [
                    'MaximumValue' => <integer || string || DateTime>,
                    'MinimumValue' => <integer || string || DateTime>,
                    'NumberOfDistinctValues' => <integer>, // REQUIRED
                    'NumberOfNulls' => <integer>, // REQUIRED
                ],
                'DecimalColumnStatisticsData' => [
                    'MaximumValue' => [
                        'Scale' => <integer>, // REQUIRED
                        'UnscaledValue' => <string || resource || Psr\Http\Message\StreamInterface>, // REQUIRED
                    ],
                    'MinimumValue' => [
                        'Scale' => <integer>, // REQUIRED
                        'UnscaledValue' => <string || resource || Psr\Http\Message\StreamInterface>, // REQUIRED
                    ],
                    'NumberOfDistinctValues' => <integer>, // REQUIRED
                    'NumberOfNulls' => <integer>, // REQUIRED
                ],
                'DoubleColumnStatisticsData' => [
                    'MaximumValue' => <float>,
                    'MinimumValue' => <float>,
                    'NumberOfDistinctValues' => <integer>, // REQUIRED
                    'NumberOfNulls' => <integer>, // REQUIRED
                ],
                'LongColumnStatisticsData' => [
                    'MaximumValue' => <integer>,
                    'MinimumValue' => <integer>,
                    'NumberOfDistinctValues' => <integer>, // REQUIRED
                    'NumberOfNulls' => <integer>, // REQUIRED
                ],
                'StringColumnStatisticsData' => [
                    'AverageLength' => <float>, // REQUIRED
                    'MaximumLength' => <integer>, // REQUIRED
                    'NumberOfDistinctValues' => <integer>, // REQUIRED
                    'NumberOfNulls' => <integer>, // REQUIRED
                ],
                'Type' => 'BOOLEAN|DATE|DECIMAL|DOUBLE|LONG|STRING|BINARY', // REQUIRED
            ],
        ],
        // ...
    ],
    'DatabaseName' => '<string>', // REQUIRED
    'PartitionValues' => ['<string>', ...], // REQUIRED
    'TableName' => '<string>', // REQUIRED
]);

Parameter Details

Members
CatalogId
Type: string

The ID of the Data Catalog where the partitions in question reside. If none is supplied, the Amazon Web Services account ID is used by default.

ColumnStatisticsList
Required: Yes
Type: Array of ColumnStatistics structures

A list of the column statistics.

DatabaseName
Required: Yes
Type: string

The name of the catalog database where the partitions reside.

PartitionValues
Required: Yes
Type: Array of strings

A list of partition values identifying the partition.

TableName
Required: Yes
Type: string

The name of the partitions' table.

Result Syntax

[
    'Errors' => [
        [
            'ColumnStatistics' => [
                'AnalyzedTime' => <DateTime>,
                'ColumnName' => '<string>',
                'ColumnType' => '<string>',
                'StatisticsData' => [
                    'BinaryColumnStatisticsData' => [
                        'AverageLength' => <float>,
                        'MaximumLength' => <integer>,
                        'NumberOfNulls' => <integer>,
                    ],
                    'BooleanColumnStatisticsData' => [
                        'NumberOfFalses' => <integer>,
                        'NumberOfNulls' => <integer>,
                        'NumberOfTrues' => <integer>,
                    ],
                    'DateColumnStatisticsData' => [
                        'MaximumValue' => <DateTime>,
                        'MinimumValue' => <DateTime>,
                        'NumberOfDistinctValues' => <integer>,
                        'NumberOfNulls' => <integer>,
                    ],
                    'DecimalColumnStatisticsData' => [
                        'MaximumValue' => [
                            'Scale' => <integer>,
                            'UnscaledValue' => <string || resource || Psr\Http\Message\StreamInterface>,
                        ],
                        'MinimumValue' => [
                            'Scale' => <integer>,
                            'UnscaledValue' => <string || resource || Psr\Http\Message\StreamInterface>,
                        ],
                        'NumberOfDistinctValues' => <integer>,
                        'NumberOfNulls' => <integer>,
                    ],
                    'DoubleColumnStatisticsData' => [
                        'MaximumValue' => <float>,
                        'MinimumValue' => <float>,
                        'NumberOfDistinctValues' => <integer>,
                        'NumberOfNulls' => <integer>,
                    ],
                    'LongColumnStatisticsData' => [
                        'MaximumValue' => <integer>,
                        'MinimumValue' => <integer>,
                        'NumberOfDistinctValues' => <integer>,
                        'NumberOfNulls' => <integer>,
                    ],
                    'StringColumnStatisticsData' => [
                        'AverageLength' => <float>,
                        'MaximumLength' => <integer>,
                        'NumberOfDistinctValues' => <integer>,
                        'NumberOfNulls' => <integer>,
                    ],
                    'Type' => 'BOOLEAN|DATE|DECIMAL|DOUBLE|LONG|STRING|BINARY',
                ],
            ],
            'Error' => [
                'ErrorCode' => '<string>',
                'ErrorMessage' => '<string>',
            ],
        ],
        // ...
    ],
]

Result Details

Members
Errors
Type: Array of ColumnStatisticsError structures

Errors that occurred while updating column statistics data.

Errors

EntityNotFoundException:

A specified entity does not exist.

InvalidInputException:

The input provided was not valid.

InternalServiceException:

An internal service error occurred.

OperationTimeoutException:

The operation timed out.

GlueEncryptionException:

An encryption operation failed.
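
Example

Each entry in ColumnStatisticsList pairs a Type with the matching ...ColumnStatisticsData structure. The sketch below derives one LONG entry from a list of sample values. The helper name longColumnStatistics, the column name, and the 'bigint' column type are assumptions made for illustration; AnalyzedTime is passed as a Unix timestamp, one of the forms the parameter syntax accepts.

```php
<?php
// Hypothetical helper (not part of the SDK): builds one ColumnStatistics
// entry of Type LONG from raw column values (null marks a missing value).
function longColumnStatistics(string $column, array $values, int $analyzedTime): array
{
    $nonNull = array_values(array_filter($values, fn($v) => $v !== null));

    $data = [
        'NumberOfDistinctValues' => count(array_unique($nonNull)),    // REQUIRED
        'NumberOfNulls'          => count($values) - count($nonNull), // REQUIRED
    ];
    if ($nonNull !== []) {
        // MaximumValue and MinimumValue are optional, so omit them when empty.
        $data['MaximumValue'] = max($nonNull);
        $data['MinimumValue'] = min($nonNull);
    }

    return [
        'AnalyzedTime'   => $analyzedTime, // REQUIRED, Unix timestamp
        'ColumnName'     => $column,       // REQUIRED
        'ColumnType'     => 'bigint',      // REQUIRED; assumed catalog type
        'StatisticsData' => [              // REQUIRED
            'Type'                     => 'LONG',
            'LongColumnStatisticsData' => $data,
        ],
    ];
}

$entry = longColumnStatistics('order_count', [3, 7, null, 7], 1700000000);

// With a configured Aws\Glue\GlueClient in $client (names are made up):
// $result = $client->updateColumnStatisticsForPartition([
//     'DatabaseName'         => 'example_db',
//     'TableName'            => 'orders',
//     'PartitionValues'      => ['2024', '01'],
//     'ColumnStatisticsList' => [$entry],
// ]);
```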

UpdateColumnStatisticsForTable

$result = $client->updateColumnStatisticsForTable([/* ... */]);
$promise = $client->updateColumnStatisticsForTableAsync([/* ... */]);

Creates or updates table statistics of columns.

The Identity and Access Management (IAM) permission required for this operation is UpdateTable.

Parameter Syntax

$result = $client->updateColumnStatisticsForTable([
    'CatalogId' => '<string>',
    'ColumnStatisticsList' => [ // REQUIRED
        [
            'AnalyzedTime' => <integer || string || DateTime>, // REQUIRED
            'ColumnName' => '<string>', // REQUIRED
            'ColumnType' => '<string>', // REQUIRED
            'StatisticsData' => [ // REQUIRED
                'BinaryColumnStatisticsData' => [
                    'AverageLength' => <float>, // REQUIRED
                    'MaximumLength' => <integer>, // REQUIRED
                    'NumberOfNulls' => <integer>, // REQUIRED
                ],
                'BooleanColumnStatisticsData' => [
                    'NumberOfFalses' => <integer>, // REQUIRED
                    'NumberOfNulls' => <integer>, // REQUIRED
                    'NumberOfTrues' => <integer>, // REQUIRED
                ],
                'DateColumnStatisticsData' => [
                    'MaximumValue' => <integer || string || DateTime>,
                    'MinimumValue' => <integer || string || DateTime>,
                    'NumberOfDistinctValues' => <integer>, // REQUIRED
                    'NumberOfNulls' => <integer>, // REQUIRED
                ],
                'DecimalColumnStatisticsData' => [
                    'MaximumValue' => [
                        'Scale' => <integer>, // REQUIRED
                        'UnscaledValue' => <string || resource || Psr\Http\Message\StreamInterface>, // REQUIRED
                    ],
                    'MinimumValue' => [
                        'Scale' => <integer>, // REQUIRED
                        'UnscaledValue' => <string || resource || Psr\Http\Message\StreamInterface>, // REQUIRED
                    ],
                    'NumberOfDistinctValues' => <integer>, // REQUIRED
                    'NumberOfNulls' => <integer>, // REQUIRED
                ],
                'DoubleColumnStatisticsData' => [
                    'MaximumValue' => <float>,
                    'MinimumValue' => <float>,
                    'NumberOfDistinctValues' => <integer>, // REQUIRED
                    'NumberOfNulls' => <integer>, // REQUIRED
                ],
                'LongColumnStatisticsData' => [
                    'MaximumValue' => <integer>,
                    'MinimumValue' => <integer>,
                    'NumberOfDistinctValues' => <integer>, // REQUIRED
                    'NumberOfNulls' => <integer>, // REQUIRED
                ],
                'StringColumnStatisticsData' => [
                    'AverageLength' => <float>, // REQUIRED
                    'MaximumLength' => <integer>, // REQUIRED
                    'NumberOfDistinctValues' => <integer>, // REQUIRED
                    'NumberOfNulls' => <integer>, // REQUIRED
                ],
                'Type' => 'BOOLEAN|DATE|DECIMAL|DOUBLE|LONG|STRING|BINARY', // REQUIRED
            ],
        ],
        // ...
    ],
    'DatabaseName' => '<string>', // REQUIRED
    'TableName' => '<string>', // REQUIRED
]);

Parameter Details

Members
CatalogId
Type: string

The ID of the Data Catalog where the table resides. If none is supplied, the Amazon Web Services account ID is used by default.

ColumnStatisticsList
Required: Yes
Type: Array of ColumnStatistics structures

A list of the column statistics.

DatabaseName
Required: Yes
Type: string

The name of the catalog database where the table resides.

TableName
Required: Yes
Type: string

The name of the table.

Result Syntax

[
    'Errors' => [
        [
            'ColumnStatistics' => [
                'AnalyzedTime' => <DateTime>,
                'ColumnName' => '<string>',
                'ColumnType' => '<string>',
                'StatisticsData' => [
                    'BinaryColumnStatisticsData' => [
                        'AverageLength' => <float>,
                        'MaximumLength' => <integer>,
                        'NumberOfNulls' => <integer>,
                    ],
                    'BooleanColumnStatisticsData' => [
                        'NumberOfFalses' => <integer>,
                        'NumberOfNulls' => <integer>,
                        'NumberOfTrues' => <integer>,
                    ],
                    'DateColumnStatisticsData' => [
                        'MaximumValue' => <DateTime>,
                        'MinimumValue' => <DateTime>,
                        'NumberOfDistinctValues' => <integer>,
                        'NumberOfNulls' => <integer>,
                    ],
                    'DecimalColumnStatisticsData' => [
                        'MaximumValue' => [
                            'Scale' => <integer>,
                            'UnscaledValue' => <string || resource || Psr\Http\Message\StreamInterface>,
                        ],
                        'MinimumValue' => [
                            'Scale' => <integer>,
                            'UnscaledValue' => <string || resource || Psr\Http\Message\StreamInterface>,
                        ],
                        'NumberOfDistinctValues' => <integer>,
                        'NumberOfNulls' => <integer>,
                    ],
                    'DoubleColumnStatisticsData' => [
                        'MaximumValue' => <float>,
                        'MinimumValue' => <float>,
                        'NumberOfDistinctValues' => <integer>,
                        'NumberOfNulls' => <integer>,
                    ],
                    'LongColumnStatisticsData' => [
                        'MaximumValue' => <integer>,
                        'MinimumValue' => <integer>,
                        'NumberOfDistinctValues' => <integer>,
                        'NumberOfNulls' => <integer>,
                    ],
                    'StringColumnStatisticsData' => [
                        'AverageLength' => <float>,
                        'MaximumLength' => <integer>,
                        'NumberOfDistinctValues' => <integer>,
                        'NumberOfNulls' => <integer>,
                    ],
                    'Type' => 'BOOLEAN|DATE|DECIMAL|DOUBLE|LONG|STRING|BINARY',
                ],
            ],
            'Error' => [
                'ErrorCode' => '<string>',
                'ErrorMessage' => '<string>',
            ],
        ],
        // ...
    ],
]

Result Details

Members
Errors
Type: Array of ColumnStatisticsError structures

A list of ColumnStatisticsError structures, one for each column whose statistics could not be updated.

Errors

EntityNotFoundException:

A specified entity does not exist.

InvalidInputException:

The input provided was not valid.

InternalServiceException:

An internal service error occurred.

OperationTimeoutException:

The operation timed out.

GlueEncryptionException:

An encryption operation failed.
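
Example

The result carries an Errors list, so it is worth inspecting it even when the call itself does not throw. The sketch below reduces that list to a map of column name to error text. The helper name summarizeStatisticsErrors is hypothetical, and the sample data is made up to match the shape shown in the result syntax.

```php
<?php
// Hypothetical helper (not part of the SDK): maps each failed column name
// to "ErrorCode: ErrorMessage", given the 'Errors' member of the result.
function summarizeStatisticsErrors(array $errors): array
{
    $summary = [];
    foreach ($errors as $failure) {
        $column  = $failure['ColumnStatistics']['ColumnName'] ?? '(unknown column)';
        $code    = $failure['Error']['ErrorCode'] ?? 'UnknownError';
        $message = $failure['Error']['ErrorMessage'] ?? '';
        $summary[$column] = "$code: $message";
    }

    return $summary;
}

// Made-up sample shaped like the 'Errors' member of the result.
$sampleErrors = [
    [
        'ColumnStatistics' => ['ColumnName' => 'price'],
        'Error' => [
            'ErrorCode'    => 'InvalidInputException',
            'ErrorMessage' => 'Type mismatch',
        ],
    ],
];
$summary = summarizeStatisticsErrors($sampleErrors);

// With a real result from updateColumnStatisticsForTable in $result:
// $summary = summarizeStatisticsErrors($result['Errors'] ?? []);
```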

UpdateConnection

$result = $client->updateConnection([/* ... */]);
$promise = $client->updateConnectionAsync([/* ... */]);

Updates a connection definition in the Data Catalog.

Parameter Syntax

$result = $client->updateConnection([
    'CatalogId' => '<string>',
    'ConnectionInput' => [ // REQUIRED
        'ConnectionProperties' => ['<string>', ...], // REQUIRED
        'ConnectionType' => 'JDBC|SFTP|MONGODB|KAFKA|NETWORK|MARKETPLACE|CUSTOM', // REQUIRED
        'Description' => '<string>',
        'MatchCriteria' => ['<string>', ...],
        'Name' => '<string>', // REQUIRED
        'PhysicalConnectionRequirements' => [
            'AvailabilityZone' => '<string>',
            'SecurityGroupIdList' => ['<string>', ...],
            'SubnetId' => '<string>',
        ],
    ],
    'Name' => '<string>', // REQUIRED
]);

Parameter Details

Members
CatalogId
Type: string

The ID of the Data Catalog in which the connection resides. If none is provided, the Amazon Web Services account ID is used by default.

ConnectionInput
Required: Yes
Type: ConnectionInput structure

A ConnectionInput object that redefines the connection in question.

Name
Required: Yes
Type: string

The name of the connection definition to update.

Result Syntax

[]

Result Details

The results for this operation are always empty.

Errors

InvalidInputException:

The input provided was not valid.

EntityNotFoundException:

A specified entity does not exist.

OperationTimeoutException:

The operation timed out.

GlueEncryptionException:

An encryption operation failed.
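
Example

Because ConnectionInput redefines the connection, the sketch below sends a complete definition rather than only the changed fields. All names, the JDBC URL, and the environment variable are hypothetical. The property keys JDBC_CONNECTION_URL, USERNAME, and PASSWORD are assumed here to be the keys Glue uses for JDBC connections; check them against the Glue connection documentation before relying on them.

```php
<?php
// Made-up connection name, for illustration only.
$connectionName = 'example-connection';

$params = [
    'Name'            => $connectionName, // REQUIRED: the connection to update
    'ConnectionInput' => [                // REQUIRED: the full new definition
        'Name'                 => $connectionName, // REQUIRED
        'ConnectionType'       => 'JDBC',          // REQUIRED
        'ConnectionProperties' => [                // REQUIRED: string-to-string map
            // Assumed property keys; see the lead-in above.
            'JDBC_CONNECTION_URL' => 'jdbc:postgresql://db.example.com:5432/sales',
            'USERNAME'            => 'glue_user',
            // Read the secret from the environment instead of hard-coding it.
            'PASSWORD'            => getenv('EXAMPLE_DB_PASSWORD') ?: '',
        ],
        'Description' => 'Sales database (updated)',
    ],
];

// With a configured Aws\Glue\GlueClient in $client:
// $client->updateConnection($params);
```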

UpdateCrawler

$result = $client->updateCrawler([/* ... */]);
$promise = $client->updateCrawlerAsync([/* ... */]);

Updates a crawler. If a crawler is running, you must stop it using StopCrawler before updating it.

Parameter Syntax

$result = $client->updateCrawler([
    'Classifiers' => ['<string>', ...],
    'Configuration' => '<string>',
    'CrawlerSecurityConfiguration' => '<string>',
    'DatabaseName' => '<string>',
    'Description' => '<string>',
    'LakeFormationConfiguration' => [
        'AccountId' => '<string>',
        'UseLakeFormationCredentials' => true || false,
    ],
    'LineageConfiguration' => [
        'CrawlerLineageSettings' => 'ENABLE|DISABLE',
    ],
    'Name' => '<string>', // REQUIRED
    'RecrawlPolicy' => [
        'RecrawlBehavior' => 'CRAWL_EVERYTHING|CRAWL_NEW_FOLDERS_ONLY|CRAWL_EVENT_MODE',
    ],
    'Role' => '<string>',
    'Schedule' => '<string>',
    'SchemaChangePolicy' => [
        'DeleteBehavior' => 'LOG|DELETE_FROM_DATABASE|DEPRECATE_IN_DATABASE',
        'UpdateBehavior' => 'LOG|UPDATE_IN_DATABASE',
    ],
    'TablePrefix' => '<string>',
    'Targets' => [
        'CatalogTargets' => [
            [
                'ConnectionName' => '<string>',
                'DatabaseName' => '<string>', // REQUIRED
                'DlqEventQueueArn' => '<string>',
                'EventQueueArn' => '<string>',
                'Tables' => ['<string>', ...], // REQUIRED
            ],
            // ...
        ],
        'DeltaTargets' => [
            [
                'ConnectionName' => '<string>',
                'CreateNativeDeltaTable' => true || false,
                'DeltaTables' => ['<string>', ...],
                'WriteManifest' => true || false,
            ],
            // ...
        ],
        'DynamoDBTargets' => [
            [
                'Path' => '<string>',
                'scanAll' => true || false,
                'scanRate' => <float>,
            ],
            // ...
        ],
        'HudiTargets' => [
            [
                'ConnectionName' => '<string>',
                'Exclusions' => ['<string>', ...],
                'MaximumTraversalDepth' => <integer>,
                'Paths' => ['<string>', ...],
            ],
            // ...
        ],
        'IcebergTargets' => [
            [
                'ConnectionName' => '<string>',
                'Exclusions' => ['<string>', ...],
                'MaximumTraversalDepth' => <integer>,
                'Paths' => ['<string>', ...],
            ],
            // ...
        ],
        'JdbcTargets' => [
            [
                'ConnectionName' => '<string>',
                'EnableAdditionalMetadata' => ['<string>', ...],
                'Exclusions' => ['<string>', ...],
                'Path' => '<string>',
            ],
            // ...
        ],
        'MongoDBTargets' => [
            [
                'ConnectionName' => '<string>',
                'Path' => '<string>',
                'ScanAll' => true || false,
            ],
            // ...
        ],
        'S3Targets' => [
            [
                'ConnectionName' => '<string>',
                'DlqEventQueueArn' => '<string>',
                'EventQueueArn' => '<string>',
                'Exclusions' => ['<string>', ...],
                'Path' => '<string>',
                'SampleSize' => <integer>,
            ],
            // ...
        ],
    ],
]);

Parameter Details

Members
Classifiers
Type: Array of strings

A list of custom classifiers that the user has registered. By default, all built-in classifiers are included in a crawl, but these custom classifiers always override the default classifiers for a given classification.

Configuration
Type: string

Crawler configuration information. This versioned JSON string allows users to specify aspects of a crawler's behavior. For more information, see Setting crawler configuration options.

CrawlerSecurityConfiguration
Type: string

The name of the SecurityConfiguration structure to be used by this crawler.

DatabaseName
Type: string

The Glue database where results are stored, such as: arn:aws:daylight:us-east-1::database/sometable/*.

Description
Type: string

A description of the new crawler.

LakeFormationConfiguration
Type: LakeFormationConfiguration structure

Specifies Lake Formation configuration settings for the crawler.

LineageConfiguration
Type: LineageConfiguration structure

Specifies data lineage configuration settings for the crawler.

Name
Required: Yes
Type: string

Name of the new crawler.

RecrawlPolicy
Type: RecrawlPolicy structure

A policy that specifies whether to crawl the entire dataset again, or to crawl only folders that were added since the last crawler run.

Role
Type: string

The IAM role or Amazon Resource Name (ARN) of an IAM role that is used by the new crawler to access customer resources.

Schedule
Type: string

A cron expression used to specify the schedule (see Time-Based Schedules for Jobs and Crawlers). For example, to run something every day at 12:15 UTC, you would specify: cron(15 12 * * ? *).

SchemaChangePolicy
Type: SchemaChangePolicy structure

The policy for the crawler's update and deletion behavior.

TablePrefix
Type: string

The table prefix used for catalog tables that are created.

Targets
Type: CrawlerTargets structure

A list of targets to crawl.
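
As a rough illustration of how the Name and Targets members fit together, the sketch below builds an updateCrawler parameter array for one or more Amazon S3 paths. The helper name buildS3CrawlerUpdate, the crawler name, the bucket path, and the exclusion pattern are all hypothetical placeholders, not part of the SDK.

```php
<?php
// Hypothetical helper (not part of the SDK): builds the parameter array
// for updateCrawler() with S3 targets only.
function buildS3CrawlerUpdate(string $crawlerName, array $s3Paths, array $exclusions = []): array
{
    $targets = [];
    foreach ($s3Paths as $path) {
        $target = ['Path' => $path];
        if ($exclusions !== []) {
            // Exclusions is a plain list of pattern strings.
            $target['Exclusions'] = array_values($exclusions);
        }
        $targets[] = $target;
    }

    return [
        'Name' => $crawlerName, // REQUIRED
        'Targets' => ['S3Targets' => $targets],
    ];
}

// Placeholder values, chosen only for illustration.
$params = buildS3CrawlerUpdate('example-crawler', ['s3://example-bucket/raw/'], ['**.tmp']);

// With a configured Aws\Glue\GlueClient in $client:
// $client->updateCrawler($params);
```

Keeping the array construction in a plain function makes it possible to inspect or unit test the request before any call reaches the service.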

Result Syntax

[]

Result Details

The results for this operation are always empty.

Errors

InvalidInputException:

The input provided was not valid.

VersionMismatchException:

There was a version conflict.

EntityNotFoundException:

A specified entity does not exist.

CrawlerRunningException:

The operation cannot be performed because the crawler is already running.

OperationTimeoutException:

The operation timed out.

UpdateCrawlerSchedule

$result = $client->updateCrawlerSchedule([/* ... */]);
$promise = $client->updateCrawlerScheduleAsync([/* ... */]);

Updates the schedule of a crawler using a cron expression.

Parameter Syntax

$result = $client->updateCrawlerSchedule([
    'CrawlerName' => '<string>', // REQUIRED
    'Schedule' => '<string>',
]);

Parameter Details

Members
CrawlerName
Required: Yes
Type: string

The name of the crawler whose schedule to update.

Schedule
Type: string

The updated cron expression used to specify the schedule (see Time-Based Schedules for Jobs and Crawlers). For example, to run something every day at 12:15 UTC, you would specify: cron(15 12 * * ? *).
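
The cron format in the example above places minutes before hours. A minimal sketch of a helper that produces a daily schedule string in that format follows; the function name dailyCron and the crawler name are assumptions made for illustration.

```php
<?php
// Hypothetical helper (not part of the SDK): formats a daily schedule
// at the given UTC hour and minute.
function dailyCron(int $hour, int $minute): string
{
    if ($hour < 0 || $hour > 23 || $minute < 0 || $minute > 59) {
        throw new InvalidArgumentException('hour must be 0-23 and minute 0-59');
    }
    // Minutes come first, then hours, matching cron(15 12 * * ? *).
    return sprintf('cron(%d %d * * ? *)', $minute, $hour);
}

$params = [
    'CrawlerName' => 'example-crawler', // REQUIRED; placeholder name
    'Schedule' => dailyCron(12, 15),    // 'cron(15 12 * * ? *)'
];

// With a configured Aws\Glue\GlueClient in $client:
// $client->updateCrawlerSchedule($params);
```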

Result Syntax

[]

Result Details

The results for this operation are always empty.

Errors

EntityNotFoundException:

A specified entity does not exist.

InvalidInputException:

The input provided was not valid.

VersionMismatchException:

There was a version conflict.

SchedulerTransitioningException:

The specified scheduler is transitioning.

OperationTimeoutException:

The operation timed out.

UpdateDataQualityRuleset

$result = $client->updateDataQualityRuleset([/* ... */]);
$promise = $client->updateDataQualityRulesetAsync([/* ... */]);

Updates the specified data quality ruleset.

Parameter Syntax

$result = $client->updateDataQualityRuleset([
    'Description' => '<string>',
    'Name' => '<string>', // REQUIRED
    'Ruleset' => '<string>',
]);

Parameter Details

Members
Description
Type: string

A description of the ruleset.

Name
Required: Yes
Type: string

The name of the data quality ruleset.

Ruleset
Type: string

A Data Quality Definition Language (DQDL) ruleset. For more information, see the Glue developer guide.
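
The Ruleset member is passed as a single string. The sketch below assembles one from individual rules, assuming the DQDL form Rules = [ rule, rule ]; the helper name buildDqdlRuleset, the ruleset name, and the column name order_id are hypothetical, and the rules shown are only examples of the syntax.

```php
<?php
// Hypothetical helper (not part of the SDK): joins individual DQDL rules
// into one ruleset string of the form: Rules = [ rule, rule ]
function buildDqdlRuleset(array $rules): string
{
    if ($rules === []) {
        throw new InvalidArgumentException('At least one rule is required');
    }
    return 'Rules = [ ' . implode(', ', $rules) . ' ]';
}

$params = [
    'Name' => 'example-ruleset', // REQUIRED; placeholder name
    'Description' => 'Basic completeness checks',
    'Ruleset' => buildDqdlRuleset([
        'IsComplete "order_id"', // placeholder column name
        'ColumnCount > 5',
    ]),
];

// With a configured Aws\Glue\GlueClient in $client:
// $result = $client->updateDataQualityRuleset($params);
```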

Result Syntax

[
    'Description' => '<string>',
    'Name' => '<string>',
    'Ruleset' => '<string>',
]

Result Details

Members
Description
Type: string

A description of the ruleset.

Name
Type: string

The name of the data quality ruleset.

Ruleset
Type: string

A Data Quality Definition Language (DQDL) ruleset. For more information, see the Glue developer guide.

Errors

EntityNotFoundException:

A specified entity does not exist.

AlreadyExistsException:

A resource to be created or added already exists.

IdempotentParameterMismatchException:

The same unique identifier was associated with two different records.

InvalidInputException:

The input provided was not valid.

OperationTimeoutException:

The operation timed out.

InternalServiceException:

An internal service error occurred.

ResourceNumberLimitExceededException:

A resource numerical limit was exceeded.

UpdateDatabase

$result = $client->updateDatabase([/* ... */]);
$promise = $client->updateDatabaseAsync([/* ... */]);

Updates an existing database definition in a Data Catalog.

Parameter Syntax

$result = $client->updateDatabase([
    'CatalogId' => '<string>',
    'DatabaseInput' => [ // REQUIRED
        'CreateTableDefaultPermissions' => [
            [
                'Permissions' => ['<string>', ...],
                'Principal' => [
                    'DataLakePrincipalIdentifier' => '<string>',
                ],
            ],
            // ...
        ],
        'Description' => '<string>',
        'FederatedDatabase' => [
            'ConnectionName' => '<string>',
            'Identifier' => '<string>',
        ],
        'LocationUri' => '<string>',
        'Name' => '<string>', // REQUIRED
        'Parameters' => ['<string>', ...],
        'TargetDatabase' => [
            'CatalogId' => '<string>',
            'DatabaseName' => '<string>',
            'Region' => '<string>',
        ],
    ],
    'Name' => '<string>', // REQUIRED
]);

Parameter Details

Members
CatalogId
Type: string

The ID of the Data Catalog in which the metadata database resides. If none is provided, the Amazon Web Services account ID is used by default.

DatabaseInput
Required: Yes
Type: DatabaseInput structure

A DatabaseInput object specifying the new definition of the metadata database in the catalog.

Name
Required: Yes
Type: string

The name of the database to update in the catalog. For Hive compatibility, this is folded to lowercase.
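
Name appears twice in this request: once at the top level to identify the database to update, and once inside DatabaseInput as part of the new definition. The sketch below uses the same value for both, which is an assumption made for simplicity. The helper name buildDatabaseUpdate and all values are placeholders.

```php
<?php
// Hypothetical helper (not part of the SDK): builds the parameter array
// for updateDatabase(), reusing one name for both Name members.
function buildDatabaseUpdate(string $name, string $description, array $parameters = []): array
{
    $input = [
        'Name' => $name, // REQUIRED inside DatabaseInput
        'Description' => $description,
    ];
    if ($parameters !== []) {
        // Parameters is a map of string keys to string values;
        // array_map with a single array keeps the keys.
        $input['Parameters'] = array_map('strval', $parameters);
    }

    return [
        'Name' => $name, // REQUIRED; the database to update
        'DatabaseInput' => $input,
    ];
}

$params = buildDatabaseUpdate('example_db', 'Curated sales data', ['owner' => 'analytics', 'retention_days' => 30]);

// With a configured Aws\Glue\GlueClient in $client:
// $client->updateDatabase($params);
```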

Result Syntax

[]

Result Details

The results for this operation are always empty.

Errors

EntityNotFoundException:

A specified entity does not exist.

InvalidInputException:

The input provided was not valid.

InternalServiceException:

An internal service error occurred.

OperationTimeoutException:

The operation timed out.

GlueEncryptionException:

An encryption operation failed.

ConcurrentModificationException:

Two processes are trying to modify a resource simultaneously.
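
Callers often want to separate errors that might succeed on a later attempt from those that will not. The sketch below classifies error codes under the assumption that OperationTimeoutException and ConcurrentModificationException are worth retrying; that choice is a judgment call and is not stated by this reference. The function name isRetryableGlueError is hypothetical.

```php
<?php
// Hypothetical helper (not part of the SDK). Treating these two codes as
// retryable is an assumption, not documented service behavior.
function isRetryableGlueError(string $awsErrorCode): bool
{
    $retryable = ['OperationTimeoutException', 'ConcurrentModificationException'];
    return in_array($awsErrorCode, $retryable, true);
}

// In a catch block for Aws\Exception\AwsException $e, the code string is
// available from $e->getAwsErrorCode():
//
// try {
//     $client->updateDatabase($params);
// } catch (Aws\Exception\AwsException $e) {
//     if (isRetryableGlueError((string) $e->getAwsErrorCode())) {
//         // schedule another attempt
//     } else {
//         throw $e;
//     }
// }
```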

UpdateDevEndpoint

$result = $client->updateDevEndpoint([/* ... */]);
$promise = $client->updateDevEndpointAsync([/* ... */]);

Updates a specified development endpoint.

Parameter Syntax

$result = $client->updateDevEndpoint([
    'AddArguments' => ['<string>', ...],
    'AddPublicKeys' => ['<string>', ...],
    'CustomLibraries' => [
        'ExtraJarsS3Path' => '<string>',
        'ExtraPythonLibsS3Path' => '<string>',
    ],
    'DeleteArguments' => ['<string>', ...],
    'DeletePublicKeys' => ['<string>', ...],
    'EndpointName' => '<string>', // REQUIRED
    'PublicKey' => '<string>',
    'UpdateEtlLibraries' => true || false,
]);

Parameter Details

Members
AddArguments
Type: Associative array of custom strings keys (GenericString) to strings

The map of arguments to add to the map of arguments used to configure the DevEndpoint.

Valid arguments are:

  • "--enable-glue-datacatalog": ""

You can specify a version of Python support for development endpoints by using the Arguments parameter in the CreateDevEndpoint or UpdateDevEndpoint APIs. If no arguments are provided, the version defaults to Python 2.

AddPublicKeys
Type: Array of strings

The list of public keys for the DevEndpoint to use.

CustomLibraries
Type: DevEndpointCustomLibraries structure

Custom Python or Java libraries to be loaded in the DevEndpoint.

DeleteArguments
Type: Array of strings

The list of argument keys to be deleted from the map of arguments used to configure the DevEndpoint.

DeletePublicKeys
Type: Array of strings

The list of public keys to be deleted from the DevEndpoint.

EndpointName
Required: Yes
Type: string

The name of the DevEndpoint to be updated.

PublicKey
Type: string

The public key for the DevEndpoint to use.

UpdateEtlLibraries
Type: boolean

True if the list of custom libraries to be loaded in the development endpoint needs to be updated, or False otherwise.
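
AddArguments is a map of keys to values, while DeleteArguments is a plain list of keys. A minimal sketch showing both shapes in one request follows; the helper name buildDevEndpointArgsUpdate, the endpoint name, and the --old-flag key are hypothetical.

```php
<?php
// Hypothetical helper (not part of the SDK): builds the parameter array
// for updateDevEndpoint() when only the argument map changes.
function buildDevEndpointArgsUpdate(string $endpointName, array $add, array $deleteKeys): array
{
    $params = ['EndpointName' => $endpointName]; // REQUIRED
    if ($add !== []) {
        // Map of argument name => value.
        $params['AddArguments'] = $add;
    }
    if ($deleteKeys !== []) {
        // List of argument names only.
        $params['DeleteArguments'] = array_values($deleteKeys);
    }
    return $params;
}

$params = buildDevEndpointArgsUpdate(
    'example-endpoint',                   // placeholder endpoint name
    ['--enable-glue-datacatalog' => ''],  // the valid argument listed above
    ['--old-flag']                        // placeholder key to remove
);

// With a configured Aws\Glue\GlueClient in $client:
// $client->updateDevEndpoint($params);
```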

Result Syntax

[]

Result Details

The results for this operation are always empty.

Errors

EntityNotFoundException:

A specified entity does not exist.

InternalServiceException:

An internal service error occurred.

OperationTimeoutException:

The operation timed out.

InvalidInputException:

The input provided was not valid.

ValidationException:

A value could not be validated.

UpdateJob

$result = $client->updateJob([/* ... */]);
$promise = $client->updateJobAsync([/* ... */]);

Updates an existing job definition. The previous job definition is completely overwritten by this information.
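
Because the previous definition is overwritten, changing a single setting usually means sending the whole definition back. One possible approach, sketched below, is to start from the job returned by getJob, drop keys that describe the job rather than configure it, and merge in the changes. The helper name buildJobUpdate is hypothetical, the list of dropped keys is an assumption, and other keys may also need filtering depending on the job. The $existing array stands in for the 'Job' member of a getJob result.

```php
<?php
// Hypothetical helper (not part of the SDK): derives a JobUpdate array
// from an existing job definition plus a set of changes.
function buildJobUpdate(array $existingJob, array $changes): array
{
    // Assumed to be descriptive keys that should not be sent back in JobUpdate.
    $readOnly = ['Name', 'CreatedOn', 'LastModifiedOn'];
    $base = array_diff_key($existingJob, array_flip($readOnly));

    // Later string keys win, so $changes overrides existing settings.
    return array_merge($base, $changes);
}

// Stand-in for $client->getJob(['JobName' => 'example-job'])['Job'].
$existing = [
    'Name' => 'example-job',
    'Role' => 'ExampleRole',
    'Command' => ['Name' => 'glueetl', 'ScriptLocation' => 's3://example-bucket/job.py'],
    'Timeout' => 60,
    'CreatedOn' => '2024-01-01T00:00:00Z',
];

$params = [
    'JobName' => $existing['Name'], // REQUIRED
    'JobUpdate' => buildJobUpdate($existing, ['Timeout' => 120]), // REQUIRED
];

// With a configured Aws\Glue\GlueClient in $client:
// $client->updateJob($params);
```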

Parameter Syntax

$result = $client->updateJob([
    'JobName' => '<string>', // REQUIRED
    'JobUpdate' => [ // REQUIRED
        'AllocatedCapacity' => <integer>,
        'CodeGenConfigurationNodes' => [
            '<NodeId>' => [
                'Aggregate' => [
                    'Aggs' => [ // REQUIRED
                        [
                            'AggFunc' => 'avg|countDistinct|count|first|last|kurtosis|max|min|skewness|stddev_samp|stddev_pop|sum|sumDistinct|var_samp|var_pop', // REQUIRED
                            'Column' => ['<string>', ...], // REQUIRED
                        ],
                        // ...
                    ],
                    'Groups' => [ // REQUIRED
                        ['<string>', ...],
                        // ...
                    ],
                    'Inputs' => ['<string>', ...], // REQUIRED
                    'Name' => '<string>', // REQUIRED
                ],
                'AmazonRedshiftSource' => [
                    'Data' => [
                        'AccessType' => '<string>',
                        'Action' => '<string>',
                        'AdvancedOptions' => [
                            [
                                'Key' => '<string>',
                                'Value' => '<string>',
                            ],
                            // ...
                        ],
                        'CatalogDatabase' => [
                            'Description' => '<string>',
                            'Label' => '<string>',
                            'Value' => '<string>',
                        ],
                        'CatalogRedshiftSchema' => '<string>',
                        'CatalogRedshiftTable' => '<string>',
                        'CatalogTable' => [
                            'Description' => '<string>',
                            'Label' => '<string>',
                            'Value' => '<string>',
                        ],
                        'Connection' => [
                            'Description' => '<string>',
                            'Label' => '<string>',
                            'Value' => '<string>',
                        ],
                        'CrawlerConnection' => '<string>',
                        'IamRole' => [
                            'Description' => '<string>',
                            'Label' => '<string>',
                            'Value' => '<string>',
                        ],
                        'MergeAction' => '<string>',
                        'MergeClause' => '<string>',
                        'MergeWhenMatched' => '<string>',
                        'MergeWhenNotMatched' => '<string>',
                        'PostAction' => '<string>',
                        'PreAction' => '<string>',
                        'SampleQuery' => '<string>',
                        'Schema' => [
                            'Description' => '<string>',
                            'Label' => '<string>',
                            'Value' => '<string>',
                        ],
                        'SelectedColumns' => [
                            [
                                'Description' => '<string>',
                                'Label' => '<string>',
                                'Value' => '<string>',
                            ],
                            // ...
                        ],
                        'SourceType' => '<string>',
                        'StagingTable' => '<string>',
                        'Table' => [
                            'Description' => '<string>',
                            'Label' => '<string>',
                            'Value' => '<string>',
                        ],
                        'TablePrefix' => '<string>',
                        'TableSchema' => [
                            [
                                'Description' => '<string>',
                                'Label' => '<string>',
                                'Value' => '<string>',
                            ],
                            // ...
                        ],
                        'TempDir' => '<string>',
                        'Upsert' => true || false,
                    ],
                    'Name' => '<string>',
                ],
                'AmazonRedshiftTarget' => [
                    'Data' => [
                        'AccessType' => '<string>',
                        'Action' => '<string>',
                        'AdvancedOptions' => [
                            [
                                'Key' => '<string>',
                                'Value' => '<string>',
                            ],
                            // ...
                        ],
                        'CatalogDatabase' => [
                            'Description' => '<string>',
                            'Label' => '<string>',
                            'Value' => '<string>',
                        ],
                        'CatalogRedshiftSchema' => '<string>',
                        'CatalogRedshiftTable' => '<string>',
                        'CatalogTable' => [
                            'Description' => '<string>',
                            'Label' => '<string>',
                            'Value' => '<string>',
                        ],
                        'Connection' => [
                            'Description' => '<string>',
                            'Label' => '<string>',
                            'Value' => '<string>',
                        ],
                        'CrawlerConnection' => '<string>',
                        'IamRole' => [
                            'Description' => '<string>',
                            'Label' => '<string>',
                            'Value' => '<string>',
                        ],
                        'MergeAction' => '<string>',
                        'MergeClause' => '<string>',
                        'MergeWhenMatched' => '<string>',
                        'MergeWhenNotMatched' => '<string>',
                        'PostAction' => '<string>',
                        'PreAction' => '<string>',
                        'SampleQuery' => '<string>',
                        'Schema' => [
                            'Description' => '<string>',
                            'Label' => '<string>',
                            'Value' => '<string>',
                        ],
                        'SelectedColumns' => [
                            [
                                'Description' => '<string>',
                                'Label' => '<string>',
                                'Value' => '<string>',
                            ],
                            // ...
                        ],
                        'SourceType' => '<string>',
                        'StagingTable' => '<string>',
                        'Table' => [
                            'Description' => '<string>',
                            'Label' => '<string>',
                            'Value' => '<string>',
                        ],
                        'TablePrefix' => '<string>',
                        'TableSchema' => [
                            [
                                'Description' => '<string>',
                                'Label' => '<string>',
                                'Value' => '<string>',
                            ],
                            // ...
                        ],
                        'TempDir' => '<string>',
                        'Upsert' => true || false,
                    ],
                    'Inputs' => ['<string>', ...],
                    'Name' => '<string>',
                ],
                'ApplyMapping' => [
                    'Inputs' => ['<string>', ...], // REQUIRED
                    'Mapping' => [ // REQUIRED
                        [
                            'Children' => [...], // RECURSIVE
                            'Dropped' => true || false,
                            'FromPath' => ['<string>', ...],
                            'FromType' => '<string>',
                            'ToKey' => '<string>',
                            'ToType' => '<string>',
                        ],
                        // ...
                    ],
                    'Name' => '<string>', // REQUIRED
                ],
                'AthenaConnectorSource' => [
                    'ConnectionName' => '<string>', // REQUIRED
                    'ConnectionTable' => '<string>',
                    'ConnectionType' => '<string>', // REQUIRED
                    'ConnectorName' => '<string>', // REQUIRED
                    'Name' => '<string>', // REQUIRED
                    'OutputSchemas' => [
                        [
                            'Columns' => [
                                [
                                    'Name' => '<string>', // REQUIRED
                                    'Type' => '<string>',
                                ],
                                // ...
                            ],
                        ],
                        // ...
                    ],
                    'SchemaName' => '<string>', // REQUIRED
                ],
                'CatalogDeltaSource' => [
                    'AdditionalDeltaOptions' => ['<string>', ...],
                    'Database' => '<string>', // REQUIRED
                    'Name' => '<string>', // REQUIRED
                    'OutputSchemas' => [
                        [
                            'Columns' => [
                                [
                                    'Name' => '<string>', // REQUIRED
                                    'Type' => '<string>',
                                ],
                                // ...
                            ],
                        ],
                        // ...
                    ],
                    'Table' => '<string>', // REQUIRED
                ],
                'CatalogHudiSource' => [
                    'AdditionalHudiOptions' => ['<string>', ...],
                    'Database' => '<string>', // REQUIRED
                    'Name' => '<string>', // REQUIRED
                    'OutputSchemas' => [
                        [
                            'Columns' => [
                                [
                                    'Name' => '<string>', // REQUIRED
                                    'Type' => '<string>',
                                ],
                                // ...
                            ],
                        ],
                        // ...
                    ],
                    'Table' => '<string>', // REQUIRED
                ],
                'CatalogKafkaSource' => [
                    'DataPreviewOptions' => [
                        'PollingTime' => <integer>,
                        'RecordPollingLimit' => <integer>,
                    ],
                    'Database' => '<string>', // REQUIRED
                    'DetectSchema' => true || false,
                    'Name' => '<string>', // REQUIRED
                    'StreamingOptions' => [
                        'AddRecordTimestamp' => '<string>',
                        'Assign' => '<string>',
                        'BootstrapServers' => '<string>',
                        'Classification' => '<string>',
                        'ConnectionName' => '<string>',
                        'Delimiter' => '<string>',
                        'EmitConsumerLagMetrics' => '<string>',
                        'EndingOffsets' => '<string>',
                        'IncludeHeaders' => true || false,
                        'MaxOffsetsPerTrigger' => <integer>,
                        'MinPartitions' => <integer>,
                        'NumRetries' => <integer>,
                        'PollTimeoutMs' => <integer>,
                        'RetryIntervalMs' => <integer>,
                        'SecurityProtocol' => '<string>',
                        'StartingOffsets' => '<string>',
                        'StartingTimestamp' => <integer || string || DateTime>,
                        'SubscribePattern' => '<string>',
                        'TopicName' => '<string>',
                    ],
                    'Table' => '<string>', // REQUIRED
                    'WindowSize' => <integer>,
                ],
                'CatalogKinesisSource' => [
                    'DataPreviewOptions' => [
                        'PollingTime' => <integer>,
                        'RecordPollingLimit' => <integer>,
                    ],
                    'Database' => '<string>', // REQUIRED
                    'DetectSchema' => true || false,
                    'Name' => '<string>', // REQUIRED
                    'StreamingOptions' => [
                        'AddIdleTimeBetweenReads' => true || false,
                        'AddRecordTimestamp' => '<string>',
                        'AvoidEmptyBatches' => true || false,
                        'Classification' => '<string>',
                        'Delimiter' => '<string>',
                        'DescribeShardInterval' => <integer>,
                        'EmitConsumerLagMetrics' => '<string>',
                        'EndpointUrl' => '<string>',
                        'IdleTimeBetweenReadsInMs' => <integer>,
                        'MaxFetchRecordsPerShard' => <integer>,
                        'MaxFetchTimeInMs' => <integer>,
                        'MaxRecordPerRead' => <integer>,
                        'MaxRetryIntervalMs' => <integer>,
                        'NumRetries' => <integer>,
                        'RetryIntervalMs' => <integer>,
                        'RoleArn' => '<string>',
                        'RoleSessionName' => '<string>',
                        'StartingPosition' => 'latest|trim_horizon|earliest|timestamp',
                        'StartingTimestamp' => <integer || string || DateTime>,
                        'StreamArn' => '<string>',
                        'StreamName' => '<string>',
                    ],
                    'Table' => '<string>', // REQUIRED
                    'WindowSize' => <integer>,
                ],
                'CatalogSource' => [
                    'Database' => '<string>', // REQUIRED
                    'Name' => '<string>', // REQUIRED
                    'Table' => '<string>', // REQUIRED
                ],
                'CatalogTarget' => [
                    'Database' => '<string>', // REQUIRED
                    'Inputs' => ['<string>', ...], // REQUIRED
                    'Name' => '<string>', // REQUIRED
                    'Table' => '<string>', // REQUIRED
                ],
                'ConnectorDataSource' => [
                    'ConnectionType' => '<string>', // REQUIRED
                    'Data' => ['<string>', ...], // REQUIRED
                    'Name' => '<string>', // REQUIRED
                    'OutputSchemas' => [
                        [
                            'Columns' => [
                                [
                                    'Name' => '<string>', // REQUIRED
                                    'Type' => '<string>',
                                ],
                                // ...
                            ],
                        ],
                        // ...
                    ],
                ],
                'ConnectorDataTarget' => [
                    'ConnectionType' => '<string>', // REQUIRED
                    'Data' => ['<string>', ...], // REQUIRED
                    'Inputs' => ['<string>', ...],
                    'Name' => '<string>', // REQUIRED
                ],
                'CustomCode' => [
                    'ClassName' => '<string>', // REQUIRED
                    'Code' => '<string>', // REQUIRED
                    'Inputs' => ['<string>', ...], // REQUIRED
                    'Name' => '<string>', // REQUIRED
                    'OutputSchemas' => [
                        [
                            'Columns' => [
                                [
                                    'Name' => '<string>', // REQUIRED
                                    'Type' => '<string>',
                                ],
                                // ...
                            ],
                        ],
                        // ...
                    ],
                ],
                'DirectJDBCSource' => [
                    'ConnectionName' => '<string>', // REQUIRED
                    'ConnectionType' => 'sqlserver|mysql|oracle|postgresql|redshift', // REQUIRED
                    'Database' => '<string>', // REQUIRED
                    'Name' => '<string>', // REQUIRED
                    'RedshiftTmpDir' => '<string>',
                    'Table' => '<string>', // REQUIRED
                ],
                'DirectKafkaSource' => [
                    'DataPreviewOptions' => [
                        'PollingTime' => <integer>,
                        'RecordPollingLimit' => <integer>,
                    ],
                    'DetectSchema' => true || false,
                    'Name' => '<string>', // REQUIRED
                    'StreamingOptions' => [
                        'AddRecordTimestamp' => '<string>',
                        'Assign' => '<string>',
                        'BootstrapServers' => '<string>',
                        'Classification' => '<string>',
                        'ConnectionName' => '<string>',
                        'Delimiter' => '<string>',
                        'EmitConsumerLagMetrics' => '<string>',
                        'EndingOffsets' => '<string>',
                        'IncludeHeaders' => true || false,
                        'MaxOffsetsPerTrigger' => <integer>,
                        'MinPartitions' => <integer>,
                        'NumRetries' => <integer>,
                        'PollTimeoutMs' => <integer>,
                        'RetryIntervalMs' => <integer>,
                        'SecurityProtocol' => '<string>',
                        'StartingOffsets' => '<string>',
                        'StartingTimestamp' => <integer || string || DateTime>,
                        'SubscribePattern' => '<string>',
                        'TopicName' => '<string>',
                    ],
                    'WindowSize' => <integer>,
                ],
                'DirectKinesisSource' => [
                    'DataPreviewOptions' => [
                        'PollingTime' => <integer>,
                        'RecordPollingLimit' => <integer>,
                    ],
                    'DetectSchema' => true || false,
                    'Name' => '<string>', // REQUIRED
                    'StreamingOptions' => [
                        'AddIdleTimeBetweenReads' => true || false,
                        'AddRecordTimestamp' => '<string>',
                        'AvoidEmptyBatches' => true || false,
                        'Classification' => '<string>',
                        'Delimiter' => '<string>',
                        'DescribeShardInterval' => <integer>,
                        'EmitConsumerLagMetrics' => '<string>',
                        'EndpointUrl' => '<string>',
                        'IdleTimeBetweenReadsInMs' => <integer>,
                        'MaxFetchRecordsPerShard' => <integer>,
                        'MaxFetchTimeInMs' => <integer>,
                        'MaxRecordPerRead' => <integer>,
                        'MaxRetryIntervalMs' => <integer>,
                        'NumRetries' => <integer>,
                        'RetryIntervalMs' => <integer>,
                        'RoleArn' => '<string>',
                        'RoleSessionName' => '<string>',
                        'StartingPosition' => 'latest|trim_horizon|earliest|timestamp',
                        'StartingTimestamp' => <integer || string || DateTime>,
                        'StreamArn' => '<string>',
                        'StreamName' => '<string>',
                    ],
                    'WindowSize' => <integer>,
                ],
                'DropDuplicates' => [
                    'Columns' => [
                        ['<string>', ...],
                        // ...
                    ],
                    'Inputs' => ['<string>', ...], // REQUIRED
                    'Name' => '<string>', // REQUIRED
                ],
                'DropFields' => [
                    'Inputs' => ['<string>', ...], // REQUIRED
                    'Name' => '<string>', // REQUIRED
                    'Paths' => [ // REQUIRED
                        ['<string>', ...],
                        // ...
                    ],
                ],
                'DropNullFields' => [
                    'Inputs' => ['<string>', ...], // REQUIRED
                    'Name' => '<string>', // REQUIRED
                    'NullCheckBoxList' => [
                        'IsEmpty' => true || false,
                        'IsNegOne' => true || false,
                        'IsNullString' => true || false,
                    ],
                    'NullTextList' => [
                        [
                            'Datatype' => [ // REQUIRED
                                'Id' => '<string>', // REQUIRED
                                'Label' => '<string>', // REQUIRED
                            ],
                            'Value' => '<string>', // REQUIRED
                        ],
                        // ...
                    ],
                ],
                'DynamicTransform' => [
                    'FunctionName' => '<string>', // REQUIRED
                    'Inputs' => ['<string>', ...], // REQUIRED
                    'Name' => '<string>', // REQUIRED
                    'OutputSchemas' => [
                        [
                            'Columns' => [
                                [
                                    'Name' => '<string>', // REQUIRED
                                    'Type' => '<string>',
                                ],
                                // ...
                            ],
                        ],
                        // ...
                    ],
                    'Parameters' => [
                        [
                            'IsOptional' => true || false,
                            'ListType' => 'str|int|float|complex|bool|list|null',
                            'Name' => '<string>', // REQUIRED
                            'Type' => 'str|int|float|complex|bool|list|null', // REQUIRED
                            'ValidationMessage' => '<string>',
                            'ValidationRule' => '<string>',
                            'Value' => ['<string>', ...],
                        ],
                        // ...
                    ],
                    'Path' => '<string>', // REQUIRED
                    'TransformName' => '<string>', // REQUIRED
                    'Version' => '<string>',
                ],
                'DynamoDBCatalogSource' => [
                    'Database' => '<string>', // REQUIRED
                    'Name' => '<string>', // REQUIRED
                    'Table' => '<string>', // REQUIRED
                ],
                'EvaluateDataQuality' => [
                    'Inputs' => ['<string>', ...], // REQUIRED
                    'Name' => '<string>', // REQUIRED
                    'Output' => 'PrimaryInput|EvaluationResults',
                    'PublishingOptions' => [
                        'CloudWatchMetricsEnabled' => true || false,
                        'EvaluationContext' => '<string>',
                        'ResultsPublishingEnabled' => true || false,
                        'ResultsS3Prefix' => '<string>',
                    ],
                    'Ruleset' => '<string>', // REQUIRED
                    'StopJobOnFailureOptions' => [
                        'StopJobOnFailureTiming' => 'Immediate|AfterDataLoad',
                    ],
                ],
                'EvaluateDataQualityMultiFrame' => [
                    'AdditionalDataSources' => ['<string>', ...],
                    'AdditionalOptions' => ['<string>', ...],
                    'Inputs' => ['<string>', ...], // REQUIRED
                    'Name' => '<string>', // REQUIRED
                    'PublishingOptions' => [
                        'CloudWatchMetricsEnabled' => true || false,
                        'EvaluationContext' => '<string>',
                        'ResultsPublishingEnabled' => true || false,
                        'ResultsS3Prefix' => '<string>',
                    ],
                    'Ruleset' => '<string>', // REQUIRED
                    'StopJobOnFailureOptions' => [
                        'StopJobOnFailureTiming' => 'Immediate|AfterDataLoad',
                    ],
                ],
                'FillMissingValues' => [
                    'FilledPath' => '<string>',
                    'ImputedPath' => '<string>', // REQUIRED
                    'Inputs' => ['<string>', ...], // REQUIRED
                    'Name' => '<string>', // REQUIRED
                ],
                'Filter' => [
                    'Filters' => [ // REQUIRED
                        [
                            'Negated' => true || false,
                            'Operation' => 'EQ|LT|GT|LTE|GTE|REGEX|ISNULL', // REQUIRED
                            'Values' => [ // REQUIRED
                                [
                                    'Type' => 'COLUMNEXTRACTED|CONSTANT', // REQUIRED
                                    'Value' => ['<string>', ...], // REQUIRED
                                ],
                                // ...
                            ],
                        ],
                        // ...
                    ],
                    'Inputs' => ['<string>', ...], // REQUIRED
                    'LogicalOperator' => 'AND|OR', // REQUIRED
                    'Name' => '<string>', // REQUIRED
                ],
                'GovernedCatalogSource' => [
                    'AdditionalOptions' => [
                        'BoundedFiles' => <integer>,
                        'BoundedSize' => <integer>,
                    ],
                    'Database' => '<string>', // REQUIRED
                    'Name' => '<string>', // REQUIRED
                    'PartitionPredicate' => '<string>',
                    'Table' => '<string>', // REQUIRED
                ],
                'GovernedCatalogTarget' => [
                    'Database' => '<string>', // REQUIRED
                    'Inputs' => ['<string>', ...], // REQUIRED
                    'Name' => '<string>', // REQUIRED
                    'PartitionKeys' => [
                        ['<string>', ...],
                        // ...
                    ],
                    'SchemaChangePolicy' => [
                        'EnableUpdateCatalog' => true || false,
                        'UpdateBehavior' => 'UPDATE_IN_DATABASE|LOG',
                    ],
                    'Table' => '<string>', // REQUIRED
                ],
                'JDBCConnectorSource' => [
                    'AdditionalOptions' => [
                        'DataTypeMapping' => ['<string>', ...],
                        'FilterPredicate' => '<string>',
                        'JobBookmarkKeys' => ['<string>', ...],
                        'JobBookmarkKeysSortOrder' => '<string>',
                        'LowerBound' => <integer>,
                        'NumPartitions' => <integer>,
                        'PartitionColumn' => '<string>',
                        'UpperBound' => <integer>,
                    ],
                    'ConnectionName' => '<string>', // REQUIRED
                    'ConnectionTable' => '<string>',
                    'ConnectionType' => '<string>', // REQUIRED
                    'ConnectorName' => '<string>', // REQUIRED
                    'Name' => '<string>', // REQUIRED
                    'OutputSchemas' => [
                        [
                            'Columns' => [
                                [
                                    'Name' => '<string>', // REQUIRED
                                    'Type' => '<string>',
                                ],
                                // ...
                            ],
                        ],
                        // ...
                    ],
                    'Query' => '<string>',
                ],
                'JDBCConnectorTarget' => [
                    'AdditionalOptions' => ['<string>', ...],
                    'ConnectionName' => '<string>', // REQUIRED
                    'ConnectionTable' => '<string>', // REQUIRED
                    'ConnectionType' => '<string>', // REQUIRED
                    'ConnectorName' => '<string>', // REQUIRED
                    'Inputs' => ['<string>', ...], // REQUIRED
                    'Name' => '<string>', // REQUIRED
                    'OutputSchemas' => [
                        [
                            'Columns' => [
                                [
                                    'Name' => '<string>', // REQUIRED
                                    'Type' => '<string>',
                                ],
                                // ...
                            ],
                        ],
                        // ...
                    ],
                ],
                'Join' => [
                    'Columns' => [ // REQUIRED
                        [
                            'From' => '<string>', // REQUIRED
                            'Keys' => [ // REQUIRED
                                ['<string>', ...],
                                // ...
                            ],
                        ],
                        // ...
                    ],
                    'Inputs' => ['<string>', ...], // REQUIRED
                    'JoinType' => 'equijoin|left|right|outer|leftsemi|leftanti', // REQUIRED
                    'Name' => '<string>', // REQUIRED
                ],
                'Merge' => [
                    'Inputs' => ['<string>', ...], // REQUIRED
                    'Name' => '<string>', // REQUIRED
                    'PrimaryKeys' => [ // REQUIRED
                        ['<string>', ...],
                        // ...
                    ],
                    'Source' => '<string>', // REQUIRED
                ],
                'MicrosoftSQLServerCatalogSource' => [
                    'Database' => '<string>', // REQUIRED
                    'Name' => '<string>', // REQUIRED
                    'Table' => '<string>', // REQUIRED
                ],
                'MicrosoftSQLServerCatalogTarget' => [
                    'Database' => '<string>', // REQUIRED
                    'Inputs' => ['<string>', ...], // REQUIRED
                    'Name' => '<string>', // REQUIRED
                    'Table' => '<string>', // REQUIRED
                ],
                'MySQLCatalogSource' => [
                    'Database' => '<string>', // REQUIRED
                    'Name' => '<string>', // REQUIRED
                    'Table' => '<string>', // REQUIRED
                ],
                'MySQLCatalogTarget' => [
                    'Database' => '<string>', // REQUIRED
                    'Inputs' => ['<string>', ...], // REQUIRED
                    'Name' => '<string>', // REQUIRED
                    'Table' => '<string>', // REQUIRED
                ],
                'OracleSQLCatalogSource' => [
                    'Database' => '<string>', // REQUIRED
                    'Name' => '<string>', // REQUIRED
                    'Table' => '<string>', // REQUIRED
                ],
                'OracleSQLCatalogTarget' => [
                    'Database' => '<string>', // REQUIRED
                    'Inputs' => ['<string>', ...], // REQUIRED
                    'Name' => '<string>', // REQUIRED
                    'Table' => '<string>', // REQUIRED
                ],
                'PIIDetection' => [
                    'EntityTypesToDetect' => ['<string>', ...], // REQUIRED
                    'Inputs' => ['<string>', ...], // REQUIRED
                    'MaskValue' => '<string>',
                    'Name' => '<string>', // REQUIRED
                    'OutputColumnName' => '<string>',
                    'PiiType' => 'RowAudit|RowMasking|ColumnAudit|ColumnMasking', // REQUIRED
                    'SampleFraction' => <float>,
                    'ThresholdFraction' => <float>,
                ],
                'PostgreSQLCatalogSource' => [
                    'Database' => '<string>', // REQUIRED
                    'Name' => '<string>', // REQUIRED
                    'Table' => '<string>', // REQUIRED
                ],
                'PostgreSQLCatalogTarget' => [
                    'Database' => '<string>', // REQUIRED
                    'Inputs' => ['<string>', ...], // REQUIRED
                    'Name' => '<string>', // REQUIRED
                    'Table' => '<string>', // REQUIRED
                ],
                'Recipe' => [
                    'Inputs' => ['<string>', ...], // REQUIRED
                    'Name' => '<string>', // REQUIRED
                    'RecipeReference' => [ // REQUIRED
                        'RecipeArn' => '<string>', // REQUIRED
                        'RecipeVersion' => '<string>', // REQUIRED
                    ],
                ],
                'RedshiftSource' => [
                    'Database' => '<string>', // REQUIRED
                    'Name' => '<string>', // REQUIRED
                    'RedshiftTmpDir' => '<string>',
                    'Table' => '<string>', // REQUIRED
                    'TmpDirIAMRole' => '<string>',
                ],
                'RedshiftTarget' => [
                    'Database' => '<string>', // REQUIRED
                    'Inputs' => ['<string>', ...], // REQUIRED
                    'Name' => '<string>', // REQUIRED
                    'RedshiftTmpDir' => '<string>',
                    'Table' => '<string>', // REQUIRED
                    'TmpDirIAMRole' => '<string>',
                    'UpsertRedshiftOptions' => [
                        'ConnectionName' => '<string>',
                        'TableLocation' => '<string>',
                        'UpsertKeys' => ['<string>', ...],
                    ],
                ],
                'RelationalCatalogSource' => [
                    'Database' => '<string>', // REQUIRED
                    'Name' => '<string>', // REQUIRED
                    'Table' => '<string>', // REQUIRED
                ],
                'RenameField' => [
                    'Inputs' => ['<string>', ...], // REQUIRED
                    'Name' => '<string>', // REQUIRED
                    'SourcePath' => ['<string>', ...], // REQUIRED
                    'TargetPath' => ['<string>', ...], // REQUIRED
                ],
                'S3CatalogDeltaSource' => [
                    'AdditionalDeltaOptions' => ['<string>', ...],
                    'Database' => '<string>', // REQUIRED
                    'Name' => '<string>', // REQUIRED
                    'OutputSchemas' => [
                        [
                            'Columns' => [
                                [
                                    'Name' => '<string>', // REQUIRED
                                    'Type' => '<string>',
                                ],
                                // ...
                            ],
                        ],
                        // ...
                    ],
                    'Table' => '<string>', // REQUIRED
                ],
                'S3CatalogHudiSource' => [
                    'AdditionalHudiOptions' => ['<string>', ...],
                    'Database' => '<string>', // REQUIRED
                    'Name' => '<string>', // REQUIRED
                    'OutputSchemas' => [
                        [
                            'Columns' => [
                                [
                                    'Name' => '<string>', // REQUIRED
                                    'Type' => '<string>',
                                ],
                                // ...
                            ],
                        ],
                        // ...
                    ],
                    'Table' => '<string>', // REQUIRED
                ],
                'S3CatalogSource' => [
                    'AdditionalOptions' => [
                        'BoundedFiles' => <integer>,
                        'BoundedSize' => <integer>,
                    ],
                    'Database' => '<string>', // REQUIRED
                    'Name' => '<string>', // REQUIRED
                    'PartitionPredicate' => '<string>',
                    'Table' => '<string>', // REQUIRED
                ],
                'S3CatalogTarget' => [
                    'Database' => '<string>', // REQUIRED
                    'Inputs' => ['<string>', ...], // REQUIRED
                    'Name' => '<string>', // REQUIRED
                    'PartitionKeys' => [
                        ['<string>', ...],
                        // ...
                    ],
                    'SchemaChangePolicy' => [
                        'EnableUpdateCatalog' => true || false,
                        'UpdateBehavior' => 'UPDATE_IN_DATABASE|LOG',
                    ],
                    'Table' => '<string>', // REQUIRED
                ],
                'S3CsvSource' => [
                    'AdditionalOptions' => [
                        'BoundedFiles' => <integer>,
                        'BoundedSize' => <integer>,
                        'EnableSamplePath' => true || false,
                        'SamplePath' => '<string>',
                    ],
                    'CompressionType' => 'gzip|bzip2',
                    'Escaper' => '<string>',
                    'Exclusions' => ['<string>', ...],
                    'GroupFiles' => '<string>',
                    'GroupSize' => '<string>',
                    'MaxBand' => <integer>,
                    'MaxFilesInBand' => <integer>,
                    'Multiline' => true || false,
                    'Name' => '<string>', // REQUIRED
                    'OptimizePerformance' => true || false,
                    'OutputSchemas' => [
                        [
                            'Columns' => [
                                [
                                    'Name' => '<string>', // REQUIRED
                                    'Type' => '<string>',
                                ],
                                // ...
                            ],
                        ],
                        // ...
                    ],
                    'Paths' => ['<string>', ...], // REQUIRED
                    'QuoteChar' => 'quote|quillemet|single_quote|disabled', // REQUIRED
                    'Recurse' => true || false,
                    'Separator' => 'comma|ctrla|pipe|semicolon|tab', // REQUIRED
                    'SkipFirst' => true || false,
                    'WithHeader' => true || false,
                    'WriteHeader' => true || false,
                ],
                'S3DeltaCatalogTarget' => [
                    'AdditionalOptions' => ['<string>', ...],
                    'Database' => '<string>', // REQUIRED
                    'Inputs' => ['<string>', ...], // REQUIRED
                    'Name' => '<string>', // REQUIRED
                    'PartitionKeys' => [
                        ['<string>', ...],
                        // ...
                    ],
                    'SchemaChangePolicy' => [
                        'EnableUpdateCatalog' => true || false,
                        'UpdateBehavior' => 'UPDATE_IN_DATABASE|LOG',
                    ],
                    'Table' => '<string>', // REQUIRED
                ],
                'S3DeltaDirectTarget' => [
                    'AdditionalOptions' => ['<string>', ...],
                    'Compression' => 'uncompressed|snappy', // REQUIRED
                    'Format' => 'json|csv|avro|orc|parquet|hudi|delta', // REQUIRED
                    'Inputs' => ['<string>', ...], // REQUIRED
                    'Name' => '<string>', // REQUIRED
                    'PartitionKeys' => [
                        ['<string>', ...],
                        // ...
                    ],
                    'Path' => '<string>', // REQUIRED
                    'SchemaChangePolicy' => [
                        'Database' => '<string>',
                        'EnableUpdateCatalog' => true || false,
                        'Table' => '<string>',
                        'UpdateBehavior' => 'UPDATE_IN_DATABASE|LOG',
                    ],
                ],
                'S3DeltaSource' => [
                    'AdditionalDeltaOptions' => ['<string>', ...],
                    'AdditionalOptions' => [
                        'BoundedFiles' => <integer>,
                        'BoundedSize' => <integer>,
                        'EnableSamplePath' => true || false,
                        'SamplePath' => '<string>',
                    ],
                    'Name' => '<string>', // REQUIRED
                    'OutputSchemas' => [
                        [
                            'Columns' => [
                                [
                                    'Name' => '<string>', // REQUIRED
                                    'Type' => '<string>',
                                ],
                                // ...
                            ],
                        ],
                        // ...
                    ],
                    'Paths' => ['<string>', ...], // REQUIRED
                ],
                'S3DirectTarget' => [
                    'Compression' => '<string>',
                    'Format' => 'json|csv|avro|orc|parquet|hudi|delta', // REQUIRED
                    'Inputs' => ['<string>', ...], // REQUIRED
                    'Name' => '<string>', // REQUIRED
                    'PartitionKeys' => [
                        ['<string>', ...],
                        // ...
                    ],
                    'Path' => '<string>', // REQUIRED
                    'SchemaChangePolicy' => [
                        'Database' => '<string>',
                        'EnableUpdateCatalog' => true || false,
                        'Table' => '<string>',
                        'UpdateBehavior' => 'UPDATE_IN_DATABASE|LOG',
                    ],
                ],
                'S3GlueParquetTarget' => [
                    'Compression' => 'snappy|lzo|gzip|uncompressed|none',
                    'Inputs' => ['<string>', ...], // REQUIRED
                    'Name' => '<string>', // REQUIRED
                    'PartitionKeys' => [
                        ['<string>', ...],
                        // ...
                    ],
                    'Path' => '<string>', // REQUIRED
                    'SchemaChangePolicy' => [
                        'Database' => '<string>',
                        'EnableUpdateCatalog' => true || false,
                        'Table' => '<string>',
                        'UpdateBehavior' => 'UPDATE_IN_DATABASE|LOG',
                    ],
                ],
                'S3HudiCatalogTarget' => [
                    'AdditionalOptions' => ['<string>', ...], // REQUIRED
                    'Database' => '<string>', // REQUIRED
                    'Inputs' => ['<string>', ...], // REQUIRED
                    'Name' => '<string>', // REQUIRED
                    'PartitionKeys' => [
                        ['<string>', ...],
                        // ...
                    ],
                    'SchemaChangePolicy' => [
                        'EnableUpdateCatalog' => true || false,
                        'UpdateBehavior' => 'UPDATE_IN_DATABASE|LOG',
                    ],
                    'Table' => '<string>', // REQUIRED
                ],
                'S3HudiDirectTarget' => [
                    'AdditionalOptions' => ['<string>', ...], // REQUIRED
                    'Compression' => 'gzip|lzo|uncompressed|snappy', // REQUIRED
                    'Format' => 'json|csv|avro|orc|parquet|hudi|delta', // REQUIRED
                    'Inputs' => ['<string>', ...], // REQUIRED
                    'Name' => '<string>', // REQUIRED
                    'PartitionKeys' => [
                        ['<string>', ...],
                        // ...
                    ],
                    'Path' => '<string>', // REQUIRED
                    'SchemaChangePolicy' => [
                        'Database' => '<string>',
                        'EnableUpdateCatalog' => true || false,
                        'Table' => '<string>',
                        'UpdateBehavior' => 'UPDATE_IN_DATABASE|LOG',
                    ],
                ],
                'S3HudiSource' => [
                    'AdditionalHudiOptions' => ['<string>', ...],
                    'AdditionalOptions' => [
                        'BoundedFiles' => <integer>,
                        'BoundedSize' => <integer>,
                        'EnableSamplePath' => true || false,
                        'SamplePath' => '<string>',
                    ],
                    'Name' => '<string>', // REQUIRED
                    'OutputSchemas' => [
                        [
                            'Columns' => [
                                [
                                    'Name' => '<string>', // REQUIRED
                                    'Type' => '<string>',
                                ],
                                // ...
                            ],
                        ],
                        // ...
                    ],
                    'Paths' => ['<string>', ...], // REQUIRED
                ],
                'S3JsonSource' => [
                    'AdditionalOptions' => [
                        'BoundedFiles' => <integer>,
                        'BoundedSize' => <integer>,
                        'EnableSamplePath' => true || false,
                        'SamplePath' => '<string>',
                    ],
                    'CompressionType' => 'gzip|bzip2',
                    'Exclusions' => ['<string>', ...],
                    'GroupFiles' => '<string>',
                    'GroupSize' => '<string>',
                    'JsonPath' => '<string>',
                    'MaxBand' => <integer>,
                    'MaxFilesInBand' => <integer>,
                    'Multiline' => true || false,
                    'Name' => '<string>', // REQUIRED
                    'OutputSchemas' => [
                        [
                            'Columns' => [
                                [
                                    'Name' => '<string>', // REQUIRED
                                    'Type' => '<string>',
                                ],
                                // ...
                            ],
                        ],
                        // ...
                    ],
                    'Paths' => ['<string>', ...], // REQUIRED
                    'Recurse' => true || false,
                ],
                'S3ParquetSource' => [
                    'AdditionalOptions' => [
                        'BoundedFiles' => <integer>,
                        'BoundedSize' => <integer>,
                        'EnableSamplePath' => true || false,
                        'SamplePath' => '<string>',
                    ],
                    'CompressionType' => 'snappy|lzo|gzip|uncompressed|none',
                    'Exclusions' => ['<string>', ...],
                    'GroupFiles' => '<string>',
                    'GroupSize' => '<string>',
                    'MaxBand' => <integer>,
                    'MaxFilesInBand' => <integer>,
                    'Name' => '<string>', // REQUIRED
                    'OutputSchemas' => [
                        [
                            'Columns' => [
                                [
                                    'Name' => '<string>', // REQUIRED
                                    'Type' => '<string>',
                                ],
                                // ...
                            ],
                        ],
                        // ...
                    ],
                    'Paths' => ['<string>', ...], // REQUIRED
                    'Recurse' => true || false,
                ],
                'SelectFields' => [
                    'Inputs' => ['<string>', ...], // REQUIRED
                    'Name' => '<string>', // REQUIRED
                    'Paths' => [ // REQUIRED
                        ['<string>', ...],
                        // ...
                    ],
                ],
                'SelectFromCollection' => [
                    'Index' => <integer>, // REQUIRED
                    'Inputs' => ['<string>', ...], // REQUIRED
                    'Name' => '<string>', // REQUIRED
                ],
                'SnowflakeSource' => [
                    'Data' => [ // REQUIRED
                        'Action' => '<string>',
                        'AdditionalOptions' => ['<string>', ...],
                        'AutoPushdown' => true || false,
                        'Connection' => [
                            'Description' => '<string>',
                            'Label' => '<string>',
                            'Value' => '<string>',
                        ],
                        'Database' => '<string>',
                        'IamRole' => [
                            'Description' => '<string>',
                            'Label' => '<string>',
                            'Value' => '<string>',
                        ],
                        'MergeAction' => '<string>',
                        'MergeClause' => '<string>',
                        'MergeWhenMatched' => '<string>',
                        'MergeWhenNotMatched' => '<string>',
                        'PostAction' => '<string>',
                        'PreAction' => '<string>',
                        'SampleQuery' => '<string>',
                        'Schema' => '<string>',
                        'SelectedColumns' => [
                            [
                                'Description' => '<string>',
                                'Label' => '<string>',
                                'Value' => '<string>',
                            ],
                            // ...
                        ],
                        'SourceType' => '<string>',
                        'StagingTable' => '<string>',
                        'Table' => '<string>',
                        'TableSchema' => [
                            [
                                'Description' => '<string>',
                                'Label' => '<string>',
                                'Value' => '<string>',
                            ],
                            // ...
                        ],
                        'TempDir' => '<string>',
                        'Upsert' => true || false,
                    ],
                    'Name' => '<string>', // REQUIRED
                    'OutputSchemas' => [
                        [
                            'Columns' => [
                                [
                                    'Name' => '<string>', // REQUIRED
                                    'Type' => '<string>',
                                ],
                                // ...
                            ],
                        ],
                        // ...
                    ],
                ],
                'SnowflakeTarget' => [
                    'Data' => [ // REQUIRED
                        'Action' => '<string>',
                        'AdditionalOptions' => ['<string>', ...],
                        'AutoPushdown' => true || false,
                        'Connection' => [
                            'Description' => '<string>',
                            'Label' => '<string>',
                            'Value' => '<string>',
                        ],
                        'Database' => '<string>',
                        'IamRole' => [
                            'Description' => '<string>',
                            'Label' => '<string>',
                            'Value' => '<string>',
                        ],
                        'MergeAction' => '<string>',
                        'MergeClause' => '<string>',
                        'MergeWhenMatched' => '<string>',
                        'MergeWhenNotMatched' => '<string>',
                        'PostAction' => '<string>',
                        'PreAction' => '<string>',
                        'SampleQuery' => '<string>',
                        'Schema' => '<string>',
                        'SelectedColumns' => [
                            [
                                'Description' => '<string>',
                                'Label' => '<string>',
                                'Value' => '<string>',
                            ],
                            // ...
                        ],
                        'SourceType' => '<string>',
                        'StagingTable' => '<string>',
                        'Table' => '<string>',
                        'TableSchema' => [
                            [
                                'Description' => '<string>',
                                'Label' => '<string>',
                                'Value' => '<string>',
                            ],
                            // ...
                        ],
                        'TempDir' => '<string>',
                        'Upsert' => true || false,
                    ],
                    'Inputs' => ['<string>', ...],
                    'Name' => '<string>', // REQUIRED
                ],
                'SparkConnectorSource' => [
                    'AdditionalOptions' => ['<string>', ...],
                    'ConnectionName' => '<string>', // REQUIRED
                    'ConnectionType' => '<string>', // REQUIRED
                    'ConnectorName' => '<string>', // REQUIRED
                    'Name' => '<string>', // REQUIRED
                    'OutputSchemas' => [
                        [
                            'Columns' => [
                                [
                                    'Name' => '<string>', // REQUIRED
                                    'Type' => '<string>',
                                ],
                                // ...
                            ],
                        ],
                        // ...
                    ],
                ],
                'SparkConnectorTarget' => [
                    'AdditionalOptions' => ['<string>', ...],
                    'ConnectionName' => '<string>', // REQUIRED
                    'ConnectionType' => '<string>', // REQUIRED
                    'ConnectorName' => '<string>', // REQUIRED
                    'Inputs' => ['<string>', ...], // REQUIRED
                    'Name' => '<string>', // REQUIRED
                    'OutputSchemas' => [
                        [
                            'Columns' => [
                                [
                                    'Name' => '<string>', // REQUIRED
                                    'Type' => '<string>',
                                ],
                                // ...
                            ],
                        ],
                        // ...
                    ],
                ],
                'SparkSQL' => [
                    'Inputs' => ['<string>', ...], // REQUIRED
                    'Name' => '<string>', // REQUIRED
                    'OutputSchemas' => [
                        [
                            'Columns' => [
                                [
                                    'Name' => '<string>', // REQUIRED
                                    'Type' => '<string>',
                                ],
                                // ...
                            ],
                        ],
                        // ...
                    ],
                    'SqlAliases' => [ // REQUIRED
                        [
                            'Alias' => '<string>', // REQUIRED
                            'From' => '<string>', // REQUIRED
                        ],
                        // ...
                    ],
                    'SqlQuery' => '<string>', // REQUIRED
                ],
                'Spigot' => [
                    'Inputs' => ['<string>', ...], // REQUIRED
                    'Name' => '<string>', // REQUIRED
                    'Path' => '<string>', // REQUIRED
                    'Prob' => <float>,
                    'Topk' => <integer>,
                ],
                'SplitFields' => [
                    'Inputs' => ['<string>', ...], // REQUIRED
                    'Name' => '<string>', // REQUIRED
                    'Paths' => [ // REQUIRED
                        ['<string>', ...],
                        // ...
                    ],
                ],
                'Union' => [
                    'Inputs' => ['<string>', ...], // REQUIRED
                    'Name' => '<string>', // REQUIRED
                    'UnionType' => 'ALL|DISTINCT', // REQUIRED
                ],
            ],
            // ...
        ],
        'Command' => [
            'Name' => '<string>',
            'PythonVersion' => '<string>',
            'Runtime' => '<string>',
            'ScriptLocation' => '<string>',
        ],
        'Connections' => [
            'Connections' => ['<string>', ...],
        ],
        'DefaultArguments' => ['<string>', ...],
        'Description' => '<string>',
        'ExecutionClass' => 'FLEX|STANDARD',
        'ExecutionProperty' => [
            'MaxConcurrentRuns' => <integer>,
        ],
        'GlueVersion' => '<string>',
        'JobMode' => 'SCRIPT|VISUAL|NOTEBOOK',
        'LogUri' => '<string>',
        'MaintenanceWindow' => '<string>',
        'MaxCapacity' => <float>,
        'MaxRetries' => <integer>,
        'NonOverridableArguments' => ['<string>', ...],
        'NotificationProperty' => [
            'NotifyDelayAfter' => <integer>,
        ],
        'NumberOfWorkers' => <integer>,
        'Role' => '<string>',
        'SecurityConfiguration' => '<string>',
        'SourceControlDetails' => [
            'AuthStrategy' => 'PERSONAL_ACCESS_TOKEN|AWS_SECRETS_MANAGER',
            'AuthToken' => '<string>',
            'Branch' => '<string>',
            'Folder' => '<string>',
            'LastCommitId' => '<string>',
            'Owner' => '<string>',
            'Provider' => 'GITHUB|GITLAB|BITBUCKET|AWS_CODE_COMMIT',
            'Repository' => '<string>',
        ],
        'Timeout' => <integer>,
        'WorkerType' => 'Standard|G.1X|G.2X|G.025X|G.4X|G.8X|Z.2X',
    ],
]);

Parameter Details

Members
JobName
Required: Yes
Type: string

The name of the job definition to update.

JobUpdate
Required: Yes
Type: JobUpdate structure

Specifies the values with which to update the job definition. Unspecified configuration is removed or reset to default values.
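Because configuration left out of JobUpdate is removed or reset, one way to change a single setting is to start from the job's current definition and overlay the change. The following is a minimal sketch under assumptions, not the SDK's own method: buildJobUpdate is a hypothetical helper, $currentJob stands in for the Job array returned by an earlier getJob call, the allowed keys are the JobUpdate members shown in the Parameter Syntax for this operation, and all values are placeholders. Dropping MaxCapacity when WorkerType is set is also an assumption, based on the two being alternative ways to size a job.

```php
<?php
// Hypothetical helper: builds a JobUpdate array from an existing job definition.
function buildJobUpdate(array $currentJob, array $changes): array
{
    // Keys accepted by JobUpdate, taken from the Parameter Syntax of UpdateJob.
    $allowed = array_flip([
        'CodeGenConfigurationNodes', 'Command', 'Connections', 'DefaultArguments',
        'Description', 'ExecutionClass', 'ExecutionProperty', 'GlueVersion',
        'JobMode', 'LogUri', 'MaintenanceWindow', 'MaxCapacity', 'MaxRetries',
        'NonOverridableArguments', 'NotificationProperty', 'NumberOfWorkers',
        'Role', 'SecurityConfiguration', 'SourceControlDetails', 'Timeout',
        'WorkerType',
    ]);

    // Start from the current values so that nothing is reset by omission,
    // then overlay the caller's changes (top-level keys only).
    $update = array_merge(
        array_intersect_key($currentJob, $allowed),
        array_intersect_key($changes, $allowed)
    );

    // Assumption: MaxCapacity is not sent together with WorkerType.
    if (isset($update['WorkerType'])) {
        unset($update['MaxCapacity']);
    }

    return $update;
}

// Placeholder job definition, shaped like the Job array from getJob.
$currentJob = [
    'Name' => 'nightly-etl',
    'CreatedOn' => '2024-01-01T00:00:00Z',
    'Role' => 'arn:aws:iam::123456789012:role/ExampleGlueRole',
    'Command' => ['Name' => 'glueetl', 'ScriptLocation' => 's3://example-bucket/scripts/nightly.py'],
    'MaxCapacity' => 10.0,
    'WorkerType' => 'G.1X',
    'NumberOfWorkers' => 10,
    'Timeout' => 60,
];

$jobUpdate = buildJobUpdate($currentJob, ['Timeout' => 120]);

// With a configured client, the call would then be:
// $client->updateJob(['JobName' => $currentJob['Name'], 'JobUpdate' => $jobUpdate]);
```

The allowlist keeps read-only members of the returned job, such as its name and creation time, out of the update request.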

Result Syntax

[
    'JobName' => '<string>',
]

Result Details

Members
JobName
Type: string

Returns the name of the updated job definition.

Errors

InvalidInputException:

The input provided was not valid.

EntityNotFoundException:

A specified entity does not exist.

InternalServiceException:

An internal service error occurred.

OperationTimeoutException:

The operation timed out.

ConcurrentModificationException:

Two processes are trying to modify a resource simultaneously.
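A ConcurrentModificationException usually means another writer changed the resource at the same moment, so retrying after a short pause is a common response. The sketch below is one possible approach, not part of the SDK: retryOnConcurrentModification is a hypothetical helper, the attempt count and delay are arbitrary, and it relies only on the getAwsErrorCode() method that the SDK's Aws\Exception\AwsException provides.

```php
<?php
// Hypothetical helper: runs $operation and retries it when the thrown
// exception reports the AWS error code ConcurrentModificationException.
function retryOnConcurrentModification(callable $operation, int $maxAttempts = 3, int $baseDelayMs = 200)
{
    for ($attempt = 1; ; $attempt++) {
        try {
            return $operation();
        } catch (\Exception $e) {
            // Aws\Exception\AwsException exposes getAwsErrorCode(); other
            // exceptions are rethrown untouched.
            $code = method_exists($e, 'getAwsErrorCode') ? $e->getAwsErrorCode() : null;
            if ($code !== 'ConcurrentModificationException' || $attempt >= $maxAttempts) {
                throw $e;
            }
            // Linear backoff: wait longer after each failed attempt.
            usleep($baseDelayMs * 1000 * $attempt);
        }
    }
}

// With a configured client and a prepared $params array, usage might look like:
// $result = retryOnConcurrentModification(function () use ($client, $params) {
//     return $client->updateJob($params);
// });
```

Wrapping the call in a closure keeps the helper independent of any particular operation, so the same function can guard the other update operations that list this error.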

UpdateJobFromSourceControl

$result = $client->updateJobFromSourceControl([/* ... */]);
$promise = $client->updateJobFromSourceControlAsync([/* ... */]);

Synchronizes a job from the source control repository. This operation takes the job artifacts that are located in the remote repository and updates the Glue internal stores with these artifacts.

This API supports optional parameters that supply the repository information.

Parameter Syntax

$result = $client->updateJobFromSourceControl([
    'AuthStrategy' => 'PERSONAL_ACCESS_TOKEN|AWS_SECRETS_MANAGER',
    'AuthToken' => '<string>',
    'BranchName' => '<string>',
    'CommitId' => '<string>',
    'Folder' => '<string>',
    'JobName' => '<string>',
    'Provider' => 'GITHUB|GITLAB|BITBUCKET|AWS_CODE_COMMIT',
    'RepositoryName' => '<string>',
    'RepositoryOwner' => '<string>',
]);

Parameter Details

Members
AuthStrategy
Type: string

The type of authentication, which can be an authentication token stored in Amazon Web Services Secrets Manager, or a personal access token.

AuthToken
Type: string

The value of the authorization token.

BranchName
Type: string

An optional branch in the remote repository.

CommitId
Type: string

A commit ID for a commit in the remote repository.

Folder
Type: string

An optional folder in the remote repository.

JobName
Type: string

The name of the Glue job to be synchronized to or from the remote repository.

Provider
Type: string

The provider for the remote repository. Possible values: GITHUB, AWS_CODE_COMMIT, GITLAB, BITBUCKET.

RepositoryName
Type: string

The name of the remote repository that contains the job artifacts. For BitBucket providers, RepositoryName should include WorkspaceName. Use the format <WorkspaceName>/<RepositoryName>.

RepositoryOwner
Type: string

The owner of the remote repository that contains the job artifacts.
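The parameters above can be assembled as in the following sketch. It is illustrative only: bitbucketRepositoryName is a hypothetical helper that applies the <WorkspaceName>/<RepositoryName> format described for Bitbucket, the job, workspace, repository, owner, branch, and folder names are placeholders, and reading the token from a GIT_TOKEN environment variable is an assumption chosen to keep secrets out of source code.

```php
<?php
// Hypothetical helper: formats RepositoryName for Bitbucket providers
// as <WorkspaceName>/<RepositoryName>.
function bitbucketRepositoryName(string $workspace, string $repository): string
{
    return trim($workspace, '/') . '/' . trim($repository, '/');
}

// Placeholder values throughout; GIT_TOKEN is an assumed environment variable.
$params = [
    'JobName' => 'nightly-etl',
    'Provider' => 'BITBUCKET',
    'RepositoryName' => bitbucketRepositoryName('example-workspace', 'glue-jobs'),
    'RepositoryOwner' => 'example-owner',
    'BranchName' => 'main',
    'Folder' => 'jobs/nightly-etl',
    'AuthStrategy' => 'PERSONAL_ACCESS_TOKEN',
    'AuthToken' => (string) getenv('GIT_TOKEN'),
];

// With a configured client, the call would then be:
// $result = $client->updateJobFromSourceControl($params);
```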

Result Syntax

[
    'JobName' => '<string>',
]

Result Details

Members
JobName
Type: string

The name of the Glue job.

Errors

AccessDeniedException:

Access to a resource was denied.

AlreadyExistsException:

A resource to be created or added already exists.

InvalidInputException:

The input provided was not valid.

ValidationException:

A value could not be validated.

EntityNotFoundException:

A specified entity does not exist.

InternalServiceException:

An internal service error occurred.

OperationTimeoutException:

The operation timed out.

UpdateMLTransform

$result = $client->updateMLTransform([/* ... */]);
$promise = $client->updateMLTransformAsync([/* ... */]);

Updates an existing machine learning transform. Call this operation to tune the algorithm parameters to achieve better results.

After calling this operation, you can call the StartMLEvaluationTaskRun operation to assess how well your new parameters achieved your goals (such as improving the quality of your machine learning transform, or making it more cost-effective).

Parameter Syntax

$result = $client->updateMLTransform([
    'Description' => '<string>',
    'GlueVersion' => '<string>',
    'MaxCapacity' => <float>,
    'MaxRetries' => <integer>,
    'Name' => '<string>',
    'NumberOfWorkers' => <integer>,
    'Parameters' => [
        'FindMatchesParameters' => [
            'AccuracyCostTradeoff' => <float>,
            'EnforceProvidedLabels' => true || false,
            'PrecisionRecallTradeoff' => <float>,
            'PrimaryKeyColumnName' => '<string>',
        ],
        'TransformType' => 'FIND_MATCHES', // REQUIRED
    ],
    'Role' => '<string>',
    'Timeout' => <integer>,
    'TransformId' => '<string>', // REQUIRED
    'WorkerType' => 'Standard|G.1X|G.2X|G.025X|G.4X|G.8X|Z.2X',
]);

Parameter Details

Members
Description
Type: string

A description of the transform. The default is an empty string.

GlueVersion
Type: string

This value determines which version of Glue this machine learning transform is compatible with. Glue 1.0 is recommended for most customers. If the value is not set, the Glue compatibility defaults to Glue 0.9. For more information, see Glue Versions in the developer guide.

MaxCapacity
Type: double

The number of Glue data processing units (DPUs) that are allocated to task runs for this transform. You can allocate from 2 to 100 DPUs; the default is 10. A DPU is a relative measure of processing power that consists of 4 vCPUs of compute capacity and 16 GB of memory. For more information, see the Glue pricing page.

When the WorkerType field is set to a value other than Standard, the MaxCapacity field is set automatically and becomes read-only.

MaxRetries
Type: int

The maximum number of times to retry a task for this transform after a task run fails.

Name
Type: string

The unique name that you gave the transform when you created it.

NumberOfWorkers
Type: int

The number of workers of a defined workerType that are allocated when this task runs.

Parameters
Type: TransformParameters structure

The configuration parameters that are specific to the transform type (algorithm) used. Conditionally dependent on the transform type.

Role
Type: string

The name or Amazon Resource Name (ARN) of the IAM role with the required permissions.

Timeout
Type: int

The timeout for a task run for this transform in minutes. This is the maximum time that a task run for this transform can consume resources before it is terminated and enters TIMEOUT status. The default is 2,880 minutes (48 hours).

TransformId
Required: Yes
Type: string

A unique identifier that was generated when the transform was created.

WorkerType
Type: string

The type of predefined worker that is allocated when this task runs. Accepts a value of Standard, G.1X, or G.2X.

  • For the Standard worker type, each worker provides 4 vCPUs, 16 GB of memory and a 50 GB disk, and 2 executors per worker.

  • For the G.1X worker type, each worker provides 4 vCPUs, 16 GB of memory and a 64 GB disk, and 1 executor per worker.

  • For the G.2X worker type, each worker provides 8 vCPUs, 32 GB of memory and a 128 GB disk, and 1 executor per worker.
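The capacity rules described above (MaxCapacity between 2 and 100 DPUs, and MaxCapacity becoming read-only once a non-Standard WorkerType is set) can be applied before the request is sent. The following is a sketch under assumptions: buildUpdateMLTransformParams is a hypothetical helper, the transform ID and tradeoff values are placeholders, and the helper takes the conservative route of omitting MaxCapacity whenever any WorkerType is supplied.

```php
<?php
// Hypothetical helper: assembles updateMLTransform parameters and applies
// the capacity rules from the parameter descriptions.
function buildUpdateMLTransformParams(string $transformId, array $findMatches, array $capacity = []): array
{
    $params = [
        'TransformId' => $transformId,
        'Parameters' => [
            'TransformType' => 'FIND_MATCHES',
            'FindMatchesParameters' => $findMatches,
        ],
    ];

    $workerType = $capacity['WorkerType'] ?? null;
    if ($workerType !== null) {
        $params['WorkerType'] = $workerType;
        if (isset($capacity['NumberOfWorkers'])) {
            $params['NumberOfWorkers'] = (int) $capacity['NumberOfWorkers'];
        }
        // Conservative choice: never send MaxCapacity alongside a WorkerType.
        return $params;
    }

    if (isset($capacity['MaxCapacity'])) {
        $maxCapacity = (float) $capacity['MaxCapacity'];
        // The documented range is 2 to 100 DPUs.
        if ($maxCapacity < 2 || $maxCapacity > 100) {
            throw new \InvalidArgumentException('MaxCapacity must be between 2 and 100 DPUs.');
        }
        $params['MaxCapacity'] = $maxCapacity;
    }

    return $params;
}

// Placeholder transform ID and tuning values.
$params = buildUpdateMLTransformParams(
    'tfm-0123456789abcdef',
    ['PrecisionRecallTradeoff' => 0.9, 'AccuracyCostTradeoff' => 0.5],
    ['WorkerType' => 'G.1X', 'NumberOfWorkers' => 5, 'MaxCapacity' => 10]
);

// With a configured client, the call would then be:
// $result = $client->updateMLTransform($params);
```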

Result Syntax

[
    'TransformId' => '<string>',
]

Result Details

Members
TransformId
Type: string

The unique identifier for the transform that was updated.

Errors

EntityNotFoundException:

A specified entity does not exist.

InvalidInputException:

The input provided was not valid.

OperationTimeoutException:

The operation timed out.

InternalServiceException:

An internal service error occurred.

AccessDeniedException:

Access to a resource was denied.

UpdatePartition

$result = $client->updatePartition([/* ... */]);
$promise = $client->updatePartitionAsync([/* ... */]);

Updates a partition.

Parameter Syntax

$result = $client->updatePartition([
    'CatalogId' => '<string>',
    'DatabaseName' => '<string>', // REQUIRED
    'PartitionInput' => [ // REQUIRED
        'LastAccessTime' => <integer || string || DateTime>,
        'LastAnalyzedTime' => <integer || string || DateTime>,
        'Parameters' => ['<string>', ...],
        'StorageDescriptor' => [
            'AdditionalLocations' => ['<string>', ...],
            'BucketColumns' => ['<string>', ...],
            'Columns' => [
                [
                    'Comment' => '<string>',
                    'Name' => '<string>', // REQUIRED
                    'Parameters' => ['<string>', ...],
                    'Type' => '<string>',
                ],
                // ...
            ],
            'Compressed' => true || false,
            'InputFormat' => '<string>',
            'Location' => '<string>',
            'NumberOfBuckets' => <integer>,
            'OutputFormat' => '<string>',
            'Parameters' => ['<string>', ...],
            'SchemaReference' => [
                'SchemaId' => [
                    'RegistryName' => '<string>',
                    'SchemaArn' => '<string>',
                    'SchemaName' => '<string>',
                ],
                'SchemaVersionId' => '<string>',
                'SchemaVersionNumber' => <integer>,
            ],
            'SerdeInfo' => [
                'Name' => '<string>',
                'Parameters' => ['<string>', ...],
                'SerializationLibrary' => '<string>',
            ],
            'SkewedInfo' => [
                'SkewedColumnNames' => ['<string>', ...],
                'SkewedColumnValueLocationMaps' => ['<string>', ...],
                'SkewedColumnValues' => ['<string>', ...],
            ],
            'SortColumns' => [
                [
                    'Column' => '<string>', // REQUIRED
                    'SortOrder' => <integer>, // REQUIRED
                ],
                // ...
            ],
            'StoredAsSubDirectories' => true || false,
        ],
        'Values' => ['<string>', ...],
    ],
    'PartitionValueList' => ['<string>', ...], // REQUIRED
    'TableName' => '<string>', // REQUIRED
]);

Parameter Details

Members
CatalogId
Type: string

The ID of the Data Catalog where the partition to be updated resides. If none is provided, the Amazon Web Services account ID is used by default.

DatabaseName
Required: Yes
Type: string

The name of the catalog database in which the table in question resides.

PartitionInput
Required: Yes
Type: PartitionInput structure

The new partition object with which to update the partition.

The Values property can't be changed. If you want to change the partition key values for a partition, delete and recreate the partition.

PartitionValueList
Required: Yes
Type: Array of strings

List of partition key values that define the partition to update.

TableName
Required: Yes
Type: string

The name of the table in which the partition to be updated is located.
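Since the Values property cannot be changed, a request typically carries the same key values in both PartitionValueList and PartitionInput. The sketch below shows one way to keep the two in step. It is an illustration under assumptions: buildUpdatePartitionParams is a hypothetical helper, and the database, table, partition values, and S3 location are placeholders.

```php
<?php
// Hypothetical helper: builds updatePartition parameters so that the
// partition is identified and re-described with the same key values.
function buildUpdatePartitionParams(
    string $database,
    string $table,
    array $partitionValues,
    array $storageDescriptor,
    array $parameters = []
): array {
    // Partition key values are sent as a list of strings.
    $values = array_map('strval', array_values($partitionValues));

    $partitionInput = [
        'Values' => $values, // must match the existing partition's values
        'StorageDescriptor' => $storageDescriptor,
    ];
    if ($parameters !== []) {
        $partitionInput['Parameters'] = $parameters;
    }

    return [
        'DatabaseName' => $database,
        'TableName' => $table,
        'PartitionValueList' => $values,
        'PartitionInput' => $partitionInput,
    ];
}

// Placeholder names and location.
$params = buildUpdatePartitionParams(
    'sales_db',
    'orders',
    [2024, '06', '01'],
    ['Location' => 's3://example-bucket/orders/year=2024/month=06/day=01/']
);

// With a configured client, the call would then be:
// $client->updatePartition($params);
```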

Result Syntax

[]

Result Details

The results for this operation are always empty.

Errors

EntityNotFoundException:

A specified entity does not exist.

InvalidInputException:

The input provided was not valid.

InternalServiceException:

An internal service error occurred.

OperationTimeoutException:

The operation timed out.

GlueEncryptionException:

An encryption operation failed.

UpdateRegistry

$result = $client->updateRegistry([/* ... */]);
$promise = $client->updateRegistryAsync([/* ... */]);

Updates an existing registry, which is used to hold a collection of schemas. The updated properties relate to the registry and do not modify any of the schemas within the registry.

Parameter Syntax

$result = $client->updateRegistry([
    'Description' => '<string>', // REQUIRED
    'RegistryId' => [ // REQUIRED
        'RegistryArn' => '<string>',
        'RegistryName' => '<string>',
    ],
]);

Parameter Details

Members
Description
Required: Yes
Type: string

A description of the registry. If a description is not provided, this field is not updated.

RegistryId
Required: Yes
Type: RegistryId structure

This is a wrapper structure that may contain the registry name and Amazon Resource Name (ARN).

Result Syntax

[
    'RegistryArn' => '<string>',
    'RegistryName' => '<string>',
]

Result Details

Members
RegistryArn
Type: string

The Amazon Resource Name (ARN) of the updated registry.

RegistryName
Type: string

The name of the updated registry.

Errors

InvalidInputException:

The input provided was not valid.

AccessDeniedException:

Access to a resource was denied.

EntityNotFoundException:

A specified entity does not exist.

ConcurrentModificationException:

Two processes are trying to modify a resource simultaneously.

InternalServiceException:

An internal service error occurred.

UpdateSchema

$result = $client->updateSchema([/* ... */]);
$promise = $client->updateSchemaAsync([/* ... */]);

Updates the description, compatibility setting, or version checkpoint for a schema set.

When you update the compatibility setting, the call does not validate the entire set of schema versions against the new setting. If a value for Compatibility is provided, the VersionNumber (a checkpoint) is also required. The API validates the checkpoint version number for consistency.

If a value for VersionNumber (a checkpoint) is provided, Compatibility is optional; this can be used to set or reset a checkpoint for the schema.

This update will happen only if the schema is in the AVAILABLE state.

Parameter Syntax

$result = $client->updateSchema([
    'Compatibility' => 'NONE|DISABLED|BACKWARD|BACKWARD_ALL|FORWARD|FORWARD_ALL|FULL|FULL_ALL',
    'Description' => '<string>',
    'SchemaId' => [ // REQUIRED
        'RegistryName' => '<string>',
        'SchemaArn' => '<string>',
        'SchemaName' => '<string>',
    ],
    'SchemaVersionNumber' => [
        'LatestVersion' => true || false,
        'VersionNumber' => <integer>,
    ],
]);

Parameter Details

Members
Compatibility
Type: string

The new compatibility setting for the schema.

Description
Type: string

The new description for the schema.

SchemaId
Required: Yes
Type: SchemaId structure

This is a wrapper structure to contain schema identity fields. The structure contains:

  • SchemaId$SchemaArn: The Amazon Resource Name (ARN) of the schema. One of SchemaArn or SchemaName has to be provided.

  • SchemaId$SchemaName: The name of the schema. One of SchemaArn or SchemaName has to be provided.

SchemaVersionNumber
Type: SchemaVersionNumber structure

Version number required for checkpointing. One of VersionNumber or Compatibility has to be provided.
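Two of the rules described for this operation can be checked on the client side before the request is sent: the schema must be identified by SchemaArn or SchemaName, and a Compatibility value must be accompanied by a VersionNumber checkpoint. The following is a sketch, not the service's own validation: buildUpdateSchemaParams is a hypothetical helper, and the registry and schema names are placeholders.

```php
<?php
// Hypothetical helper: builds updateSchema parameters and enforces two
// rules stated in the operation description.
function buildUpdateSchemaParams(
    array $schemaId,
    ?string $compatibility = null,
    ?int $versionNumber = null,
    ?string $description = null
): array {
    // One of SchemaArn or SchemaName has to be provided.
    if (!isset($schemaId['SchemaArn']) && !isset($schemaId['SchemaName'])) {
        throw new \InvalidArgumentException('SchemaId needs SchemaArn or SchemaName.');
    }
    // A new compatibility setting requires a checkpoint version number.
    if ($compatibility !== null && $versionNumber === null) {
        throw new \InvalidArgumentException('Compatibility requires a VersionNumber checkpoint.');
    }

    $params = ['SchemaId' => $schemaId];
    if ($compatibility !== null) {
        $params['Compatibility'] = $compatibility;
    }
    if ($versionNumber !== null) {
        $params['SchemaVersionNumber'] = ['VersionNumber' => $versionNumber];
    }
    if ($description !== null) {
        $params['Description'] = $description;
    }

    return $params;
}

// Placeholder registry and schema names.
$params = buildUpdateSchemaParams(
    ['RegistryName' => 'example-registry', 'SchemaName' => 'orders-value'],
    'BACKWARD',
    3
);

// With a configured client, the call would then be:
// $result = $client->updateSchema($params);
```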

Result Syntax

[
    'RegistryName' => '<string>',
    'SchemaArn' => '<string>',
    'SchemaName' => '<string>',
]

Result Details

Members
RegistryName
Type: string

The name of the registry that contains the schema.

SchemaArn
Type: string

The Amazon Resource Name (ARN) of the schema.

SchemaName
Type: string

The name of the schema.

Errors

InvalidInputException:

The input provided was not valid.

AccessDeniedException:

Access to a resource was denied.

EntityNotFoundException:

A specified entity does not exist.

ConcurrentModificationException:

Two processes are trying to modify a resource simultaneously.

InternalServiceException:

An internal service error occurred.

UpdateSourceControlFromJob

$result = $client->updateSourceControlFromJob([/* ... */]);
$promise = $client->updateSourceControlFromJobAsync([/* ... */]);

Synchronizes a job to the source control repository. This operation takes the job artifacts from the Glue internal stores and makes a commit to the remote repository that is configured on the job.

This API supports optional parameters that supply the repository information.

Parameter Syntax

$result = $client->updateSourceControlFromJob([
    'AuthStrategy' => 'PERSONAL_ACCESS_TOKEN|AWS_SECRETS_MANAGER',
    'AuthToken' => '<string>',
    'BranchName' => '<string>',
    'CommitId' => '<string>',
    'Folder' => '<string>',
    'JobName' => '<string>',
    'Provider' => 'GITHUB|GITLAB|BITBUCKET|AWS_CODE_COMMIT',
    'RepositoryName' => '<string>',
    'RepositoryOwner' => '<string>',
]);

Parameter Details

Members
AuthStrategy
Type: string

The type of authentication, which can be an authentication token stored in Amazon Web Services Secrets Manager, or a personal access token.

AuthToken
Type: string

The value of the authorization token.

BranchName
Type: string

An optional branch in the remote repository.

CommitId
Type: string

A commit ID for a commit in the remote repository.

Folder
Type: string

An optional folder in the remote repository.

JobName
Type: string

The name of the Glue job to be synchronized to or from the remote repository.

Provider
Type: string

The provider for the remote repository. Possible values: GITHUB, AWS_CODE_COMMIT, GITLAB, BITBUCKET.

RepositoryName
Type: string

The name of the remote repository that contains the job artifacts. For BitBucket providers, RepositoryName should include WorkspaceName. Use the format <WorkspaceName>/<RepositoryName>.

RepositoryOwner
Type: string

The owner of the remote repository that contains the job artifacts.

Result Syntax

[
    'JobName' => '<string>',
]

Result Details

Members
JobName
Type: string

The name of the Glue job.

Errors

AccessDeniedException:

Access to a resource was denied.

AlreadyExistsException:

A resource to be created or added already exists.

InvalidInputException:

The input provided was not valid.

ValidationException:

A value could not be validated.

EntityNotFoundException:

A specified entity does not exist.

InternalServiceException:

An internal service error occurred.

OperationTimeoutException:

The operation timed out.

UpdateTable

$result = $client->updateTable([/* ... */]);
$promise = $client->updateTableAsync([/* ... */]);

Updates a metadata table in the Data Catalog.

Parameter Syntax

$result = $client->updateTable([
    'CatalogId' => '<string>',
    'DatabaseName' => '<string>', // REQUIRED
    'SkipArchive' => true || false,
    'TableInput' => [ // REQUIRED
        'Description' => '<string>',
        'LastAccessTime' => <integer || string || DateTime>,
        'LastAnalyzedTime' => <integer || string || DateTime>,
        'Name' => '<string>', // REQUIRED
        'Owner' => '<string>',
        'Parameters' => ['<string>', ...],
        'PartitionKeys' => [
            [
                'Comment' => '<string>',
                'Name' => '<string>', // REQUIRED
                'Parameters' => ['<string>', ...],
                'Type' => '<string>',
            ],
            // ...
        ],
        'Retention' => <integer>,
        'StorageDescriptor' => [
            'AdditionalLocations' => ['<string>', ...],
            'BucketColumns' => ['<string>', ...],
            'Columns' => [
                [
                    'Comment' => '<string>',
                    'Name' => '<string>', // REQUIRED
                    'Parameters' => ['<string>', ...],
                    'Type' => '<string>',
                ],
                // ...
            ],
            'Compressed' => true || false,
            'InputFormat' => '<string>',
            'Location' => '<string>',
            'NumberOfBuckets' => <integer>,
            'OutputFormat' => '<string>',
            'Parameters' => ['<string>', ...],
            'SchemaReference' => [
                'SchemaId' => [
                    'RegistryName' => '<string>',
                    'SchemaArn' => '<string>',
                    'SchemaName' => '<string>',
                ],
                'SchemaVersionId' => '<string>',
                'SchemaVersionNumber' => <integer>,
            ],
            'SerdeInfo' => [
                'Name' => '<string>',
                'Parameters' => ['<string>', ...],
                'SerializationLibrary' => '<string>',
            ],
            'SkewedInfo' => [
                'SkewedColumnNames' => ['<string>', ...],
                'SkewedColumnValueLocationMaps' => ['<string>', ...],
                'SkewedColumnValues' => ['<string>', ...],
            ],
            'SortColumns' => [
                [
                    'Column' => '<string>', // REQUIRED
                    'SortOrder' => <integer>, // REQUIRED
                ],
                // ...
            ],
            'StoredAsSubDirectories' => true || false,
        ],
        'TableType' => '<string>',
        'TargetTable' => [
            'CatalogId' => '<string>',
            'DatabaseName' => '<string>',
            'Name' => '<string>',
            'Region' => '<string>',
        ],
        'ViewExpandedText' => '<string>',
        'ViewOriginalText' => '<string>',
    ],
    'TransactionId' => '<string>',
    'VersionId' => '<string>',
]);

Parameter Details

Members
CatalogId
Type: string

The ID of the Data Catalog where the table resides. If none is provided, the Amazon Web Services account ID is used by default.

DatabaseName
Required: Yes
Type: string

The name of the catalog database in which the table resides. For Hive compatibility, this name is entirely lowercase.

SkipArchive
Type: boolean

By default, UpdateTable always creates an archived version of the table before updating it. However, if SkipArchive is set to true, UpdateTable does not create the archived version.

TableInput
Required: Yes
Type: TableInput structure

An updated TableInput object to define the metadata table in the catalog.

TransactionId
Type: string

The transaction ID at which to update the table contents.

VersionId
Type: string

The version ID at which to update the table contents.
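A common pattern for changing one property of a table is to read the current definition, reduce it to the members that TableInput accepts, and send it back with the change applied. The sketch below assumes, without confirmation from this page, that the supplied TableInput replaces the stored definition. buildTableInput is a hypothetical helper, $table stands in for the Table array returned by an earlier getTable call, the allowed keys are the TableInput members shown in the Parameter Syntax above, and all values are placeholders.

```php
<?php
// Hypothetical helper: reduces a table definition to the keys that
// TableInput accepts and overlays the requested changes.
function buildTableInput(array $table, array $changes = []): array
{
    // TableInput members, taken from the Parameter Syntax of UpdateTable.
    $allowed = array_flip([
        'Description', 'LastAccessTime', 'LastAnalyzedTime', 'Name', 'Owner',
        'Parameters', 'PartitionKeys', 'Retention', 'StorageDescriptor',
        'TableType', 'TargetTable', 'ViewExpandedText', 'ViewOriginalText',
    ]);

    return array_merge(
        array_intersect_key($table, $allowed),
        array_intersect_key($changes, $allowed)
    );
}

// Placeholder table, shaped like the Table array from getTable.
$table = [
    'Name' => 'orders',
    'DatabaseName' => 'sales_db',
    'CreateTime' => '2024-01-01T00:00:00Z',
    'TableType' => 'EXTERNAL_TABLE',
    'StorageDescriptor' => ['Location' => 's3://example-bucket/orders/'],
];

$params = [
    'DatabaseName' => $table['DatabaseName'],
    'TableInput' => buildTableInput($table, ['Description' => 'Order facts']),
    'SkipArchive' => true, // do not create an archived version for this update
];

// With a configured client, the call would then be:
// $client->updateTable($params);
```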

Result Syntax

[]

Result Details

The results for this operation are always empty.

Errors

EntityNotFoundException:

A specified entity does not exist.

InvalidInputException:

The input provided was not valid.

InternalServiceException:

An internal service error occurred.

OperationTimeoutException:

The operation timed out.

ConcurrentModificationException:

Two processes are trying to modify a resource simultaneously.

ResourceNumberLimitExceededException:

A resource numerical limit was exceeded.

GlueEncryptionException:

An encryption operation failed.

ResourceNotReadyException:

A resource was not ready for a transaction.

UpdateTableOptimizer

$result = $client->updateTableOptimizer([/* ... */]);
$promise = $client->updateTableOptimizerAsync([/* ... */]);

Updates the configuration for an existing table optimizer.

Parameter Syntax

$result = $client->updateTableOptimizer([
    'CatalogId' => '<string>', // REQUIRED
    'DatabaseName' => '<string>', // REQUIRED
    'TableName' => '<string>', // REQUIRED
    'TableOptimizerConfiguration' => [ // REQUIRED
        'enabled' => true || false,
        'roleArn' => '<string>',
    ],
    'Type' => 'compaction', // REQUIRED
]);

Parameter Details

Members
CatalogId
Required: Yes
Type: string

The Catalog ID of the table.

DatabaseName
Required: Yes
Type: string

The name of the database in the catalog in which the table resides.

TableName
Required: Yes
Type: string

The name of the table.

TableOptimizerConfiguration
Required: Yes
Type: TableOptimizerConfiguration structure

A TableOptimizerConfiguration object representing the configuration of a table optimizer.

Type
Required: Yes
Type: string

The type of table optimizer. Currently, the only valid value is compaction.
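The members of TableOptimizerConfiguration use lower camel case keys (enabled, roleArn), unlike the other parameters, which is easy to get wrong when building the request by hand. A minimal sketch follows; buildTableOptimizerUpdate is a hypothetical helper, and the account ID, names, and role ARN are placeholders.

```php
<?php
// Hypothetical helper: builds updateTableOptimizer parameters.
function buildTableOptimizerUpdate(
    string $catalogId,
    string $database,
    string $table,
    bool $enabled,
    ?string $roleArn = null
): array {
    // Note the lower camel case keys inside TableOptimizerConfiguration.
    $configuration = ['enabled' => $enabled];
    if ($roleArn !== null) {
        $configuration['roleArn'] = $roleArn;
    }

    return [
        'CatalogId' => $catalogId,
        'DatabaseName' => $database,
        'TableName' => $table,
        'TableOptimizerConfiguration' => $configuration,
        'Type' => 'compaction', // currently the only valid value
    ];
}

// Placeholder identifiers.
$params = buildTableOptimizerUpdate(
    '123456789012',
    'sales_db',
    'orders',
    false,
    'arn:aws:iam::123456789012:role/ExampleOptimizerRole'
);

// With a configured client, the call would then be:
// $client->updateTableOptimizer($params);
```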

Result Syntax

[]

Result Details

The results for this operation are always empty.

Errors

EntityNotFoundException:

A specified entity does not exist.

InvalidInputException:

The input provided was not valid.

AccessDeniedException:

Access to a resource was denied.

InternalServiceException:

An internal service error occurred.

UpdateTrigger

$result = $client->updateTrigger([/* ... */]);
$promise = $client->updateTriggerAsync([/* ... */]);

Updates a trigger definition.

Parameter Syntax

$result = $client->updateTrigger([
    'Name' => '<string>', // REQUIRED
    'TriggerUpdate' => [ // REQUIRED
        'Actions' => [
            [
                'Arguments' => ['<string>', ...],
                'CrawlerName' => '<string>',
                'JobName' => '<string>',
                'NotificationProperty' => [
                    'NotifyDelayAfter' => <integer>,
                ],
                'SecurityConfiguration' => '<string>',
                'Timeout' => <integer>,
            ],
            // ...
        ],
        'Description' => '<string>',
        'EventBatchingCondition' => [
            'BatchSize' => <integer>, // REQUIRED
            'BatchWindow' => <integer>,
        ],
        'Name' => '<string>',
        'Predicate' => [
            'Conditions' => [
                [
                    'CrawlState' => 'RUNNING|CANCELLING|CANCELLED|SUCCEEDED|FAILED|ERROR',
                    'CrawlerName' => '<string>',
                    'JobName' => '<string>',
                    'LogicalOperator' => 'EQUALS',
                    'State' => 'STARTING|RUNNING|STOPPING|STOPPED|SUCCEEDED|FAILED|TIMEOUT|ERROR|WAITING|EXPIRED',
                ],
                // ...
            ],
            'Logical' => 'AND|ANY',
        ],
        'Schedule' => '<string>',
    ],
]);

Parameter Details

Members
Name
Required: Yes
Type: string

The name of the trigger to update.

TriggerUpdate
Required: Yes
Type: TriggerUpdate structure

The new values with which to update the trigger.
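As an illustration of the TriggerUpdate structure, the sketch below builds a predicate that fires one job after a set of upstream jobs has succeeded. It is an example under assumptions rather than a recommended configuration: buildConditionalTriggerUpdate is a hypothetical helper, the trigger and job names are placeholders, and the enum values are those listed in the Parameter Syntax above.

```php
<?php
// Hypothetical helper: builds a TriggerUpdate that starts $targetJob once
// the upstream jobs have reached the SUCCEEDED state.
function buildConditionalTriggerUpdate(array $upstreamJobs, string $targetJob, string $logical = 'AND'): array
{
    // Predicate.Logical accepts AND or ANY.
    if (!in_array($logical, ['AND', 'ANY'], true)) {
        throw new \InvalidArgumentException('Logical must be AND or ANY.');
    }

    $conditions = [];
    foreach ($upstreamJobs as $jobName) {
        $conditions[] = [
            'JobName' => $jobName,
            'LogicalOperator' => 'EQUALS',
            'State' => 'SUCCEEDED',
        ];
    }

    return [
        'Predicate' => ['Logical' => $logical, 'Conditions' => $conditions],
        'Actions' => [['JobName' => $targetJob]],
    ];
}

// Placeholder job and trigger names.
$params = [
    'Name' => 'after-ingest',
    'TriggerUpdate' => buildConditionalTriggerUpdate(['ingest-orders', 'ingest-customers'], 'build-report'),
];

// With a configured client, the call would then be:
// $result = $client->updateTrigger($params);
```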

Result Syntax

[
    'Trigger' => [
        'Actions' => [
            [
                'Arguments' => ['<string>', ...],
                'CrawlerName' => '<string>',
                'JobName' => '<string>',
                'NotificationProperty' => [
                    'NotifyDelayAfter' => <integer>,
                ],
                'SecurityConfiguration' => '<string>',
                'Timeout' => <integer>,
            ],
            // ...
        ],
        'Description' => '<string>',
        'EventBatchingCondition' => [
            'BatchSize' => <integer>,
            'BatchWindow' => <integer>,
        ],
        'Id' => '<string>',
        'Name' => '<string>',
        'Predicate' => [
            'Conditions' => [
                [
                    'CrawlState' => 'RUNNING|CANCELLING|CANCELLED|SUCCEEDED|FAILED|ERROR',
                    'CrawlerName' => '<string>',
                    'JobName' => '<string>',
                    'LogicalOperator' => 'EQUALS',
                    'State' => 'STARTING|RUNNING|STOPPING|STOPPED|SUCCEEDED|FAILED|TIMEOUT|ERROR|WAITING|EXPIRED',
                ],
                // ...
            ],
            'Logical' => 'AND|ANY',
        ],
        'Schedule' => '<string>',
        'State' => 'CREATING|CREATED|ACTIVATING|ACTIVATED|DEACTIVATING|DEACTIVATED|DELETING|UPDATING',
        'Type' => 'SCHEDULED|CONDITIONAL|ON_DEMAND|EVENT',
        'WorkflowName' => '<string>',
    ],
]

Result Details

Members
Trigger
Type: Trigger structure

The resulting trigger definition.

Errors

InvalidInputException:

The input provided was not valid.

InternalServiceException:

An internal service error occurred.

EntityNotFoundException:

A specified entity does not exist.

OperationTimeoutException:

The operation timed out.

ConcurrentModificationException:

Two processes are trying to modify a resource simultaneously.

UpdateUserDefinedFunction

$result = $client->updateUserDefinedFunction([/* ... */]);
$promise = $client->updateUserDefinedFunctionAsync([/* ... */]);

Updates an existing function definition in the Data Catalog.

Parameter Syntax

$result = $client->updateUserDefinedFunction([
    'CatalogId' => '<string>',
    'DatabaseName' => '<string>', // REQUIRED
    'FunctionInput' => [ // REQUIRED
        'ClassName' => '<string>',
        'FunctionName' => '<string>',
        'OwnerName' => '<string>',
        'OwnerType' => 'USER|ROLE|GROUP',
        'ResourceUris' => [
            [
                'ResourceType' => 'JAR|FILE|ARCHIVE',
                'Uri' => '<string>',
            ],
            // ...
        ],
    ],
    'FunctionName' => '<string>', // REQUIRED
]);

Parameter Details

Members
CatalogId
Type: string

The ID of the Data Catalog where the function to be updated is located. If none is provided, the Amazon Web Services account ID is used by default.

DatabaseName
Required: Yes
Type: string

The name of the catalog database where the function to be updated is located.

FunctionInput
Required: Yes
Type: UserDefinedFunctionInput structure

A FunctionInput object that redefines the function in the Data Catalog.

FunctionName
Required: Yes
Type: string

The name of the function.
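
As a minimal sketch of how the parameters above fit together, the following builds the request array for updateUserDefinedFunction. The helper name buildUpdateUdfParams, the database, function, class, and S3 URI values are all hypothetical and chosen only for illustration; the client call itself is left as a comment because it needs a configured GlueClient.

```php
<?php
// Hypothetical helper (not part of the SDK): assembles the documented
// parameter array for updateUserDefinedFunction.
function buildUpdateUdfParams(
    string $databaseName,
    string $functionName,
    string $className,
    array $jarUris,
    ?string $catalogId = null
): array {
    $resourceUris = [];
    foreach ($jarUris as $uri) {
        // ResourceType accepts JAR, FILE, or ARCHIVE; this sketch only emits JAR.
        $resourceUris[] = ['ResourceType' => 'JAR', 'Uri' => $uri];
    }

    $params = [
        'DatabaseName'  => $databaseName, // REQUIRED
        'FunctionName'  => $functionName, // REQUIRED
        'FunctionInput' => [              // REQUIRED
            'FunctionName' => $functionName,
            'ClassName'    => $className,
            'ResourceUris' => $resourceUris,
        ],
    ];

    // CatalogId is optional; when omitted, the account ID is used by default.
    if ($catalogId !== null) {
        $params['CatalogId'] = $catalogId;
    }

    return $params;
}

// Illustrative values only.
$params = buildUpdateUdfParams(
    'analytics_db',
    'normalize_phone',
    'com.example.udf.NormalizePhone',
    ['s3://example-bucket/udfs/normalize-phone.jar']
);

// With a configured Aws\Glue\GlueClient in $client, the call would be:
// $result = $client->updateUserDefinedFunction($params);
```

Keeping the array construction in a separate function makes it possible to inspect or unit-test the request before anything is sent to the service.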

Result Syntax

[]

Result Details

The results for this operation are always empty.

Errors

EntityNotFoundException:

A specified entity does not exist.

InvalidInputException:

The input provided was not valid.

InternalServiceException:

An internal service error occurred.

OperationTimeoutException:

The operation timed out.

GlueEncryptionException:

An encryption operation failed.

UpdateWorkflow

$result = $client->updateWorkflow([/* ... */]);
$promise = $client->updateWorkflowAsync([/* ... */]);

Updates an existing workflow.

Parameter Syntax

$result = $client->updateWorkflow([
    'DefaultRunProperties' => ['<string>', ...],
    'Description' => '<string>',
    'MaxConcurrentRuns' => <integer>,
    'Name' => '<string>', // REQUIRED
]);

Parameter Details

Members
DefaultRunProperties
Type: Associative array of custom strings keys (IdString) to strings

A collection of properties to be used as part of each execution of the workflow.

Description
Type: string

The description of the workflow.

MaxConcurrentRuns
Type: int

You can use this parameter to prevent unwanted multiple updates to data, to control costs, or in some cases, to prevent exceeding the maximum number of concurrent runs of any of the component jobs. If you leave this parameter blank, there is no limit to the number of concurrent workflow runs.

Name
Required: Yes
Type: string

Name of the workflow to be updated.
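
The sketch below shows one way to assemble an updateWorkflow request. DefaultRunProperties maps string keys to string values, so the hypothetical helper buildUpdateWorkflowParams casts every value to a string and leaves out optional members that were not supplied. The workflow name and property values are made up for the example.

```php
<?php
// Hypothetical helper (not part of the SDK): builds the updateWorkflow
// parameter array and omits optional members that were not provided.
function buildUpdateWorkflowParams(
    string $name,
    array $defaultRunProperties = [],
    ?int $maxConcurrentRuns = null,
    ?string $description = null
): array {
    $params = ['Name' => $name]; // REQUIRED

    if ($defaultRunProperties !== []) {
        // Values must be strings; array_map keeps the string keys intact
        // when it is given a single array.
        $params['DefaultRunProperties'] = array_map('strval', $defaultRunProperties);
    }
    if ($maxConcurrentRuns !== null) {
        $params['MaxConcurrentRuns'] = $maxConcurrentRuns;
    }
    if ($description !== null) {
        $params['Description'] = $description;
    }

    return $params;
}

// Illustrative values only.
$params = buildUpdateWorkflowParams(
    'nightly-ingest',
    ['env' => 'staging', 'retries' => 3],
    1
);

// With a configured Aws\Glue\GlueClient in $client, the call would be:
// $result = $client->updateWorkflow($params);
// echo $result['Name']; // the workflow name that was passed in
```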

Result Syntax

[
    'Name' => '<string>',
]

Result Details

Members
Name
Type: string

The name of the workflow which was specified in input.

Errors

InvalidInputException:

The input provided was not valid.

EntityNotFoundException:

A specified entity does not exist.

InternalServiceException:

An internal service error occurred.

OperationTimeoutException:

The operation timed out.

ConcurrentModificationException:

Two processes are trying to modify a resource simultaneously.

Shapes

AccessDeniedException

Description

Access to a resource was denied.

Members
Message
Type: string

A message describing the problem.

Action

Description

Defines an action to be initiated by a trigger.

Members
Arguments
Type: Associative array of custom strings keys (GenericString) to strings

The job arguments used when this trigger fires. For this job run, they replace the default arguments set in the job definition itself.

You can specify arguments here that your own job-execution script consumes, as well as arguments that Glue itself consumes.

For information about how to specify and consume your own Job arguments, see the Calling Glue APIs in Python topic in the developer guide.

For information about the key-value pairs that Glue consumes to set up your job, see the Special Parameters Used by Glue topic in the developer guide.

CrawlerName
Type: string

The name of the crawler to be used with this action.

JobName
Type: string

The name of a job to be run.

NotificationProperty
Type: NotificationProperty structure

Specifies configuration properties of a job run notification.

SecurityConfiguration
Type: string

The name of the SecurityConfiguration structure to be used with this action.

Timeout
Type: int

The JobRun timeout in minutes. This is the maximum time that a job run can consume resources before it is terminated and enters TIMEOUT status. The default is 2,880 minutes (48 hours). This overrides the timeout value set in the parent job.
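
As an illustration of the Action members described above, this sketch builds an action that runs a job with per-trigger arguments and a timeout. The helper buildJobAction, the job name, and the argument keys are hypothetical. Because Timeout is expressed in minutes, the helper accepts hours and converts.

```php
<?php
// Hypothetical helper (not part of the SDK): builds an Action array that
// starts a job, overriding its default arguments and timeout.
function buildJobAction(string $jobName, array $arguments = [], ?float $timeoutHours = null): array
{
    $action = ['JobName' => $jobName];

    if ($arguments !== []) {
        // Arguments is a string-to-string map; cast values defensively.
        $action['Arguments'] = array_map('strval', $arguments);
    }
    if ($timeoutHours !== null) {
        // Timeout is in minutes; when omitted, the documented default of
        // 2,880 minutes (48 hours) applies.
        $action['Timeout'] = (int) round($timeoutHours * 60);
    }

    return $action;
}

// Illustrative values only.
$action = buildJobAction(
    'nightly-etl',
    ['--input_path' => 's3://example-bucket/in/', '--batch_size' => 500],
    2
);

echo $action['Timeout'], PHP_EOL; // prints 120
```

An array built this way can be placed in the Actions list of a trigger definition.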

Aggregate

Description

Specifies a transform that groups rows by chosen fields and computes the aggregated value by specified function.

Members
Aggs
Required: Yes
Type: Array of AggregateOperation structures

Specifies the aggregate functions to be performed on specified fields.

Groups
Required: Yes
Type: Array of arrays of strings

Specifies the fields to group by.

Inputs
Required: Yes
Type: Array of strings

Specifies the fields and rows to use as inputs for the aggregate transform.

Name
Required: Yes
Type: string

The name of the transform node.

AggregateOperation

Description

Specifies the set of parameters needed to perform aggregation in the aggregate transform.

Members
AggFunc
Required: Yes
Type: string

Specifies the aggregation function to apply.

Possible aggregation functions include: avg, countDistinct, count, first, last, kurtosis, max, min, skewness, stddev_samp, stddev_pop, sum, sumDistinct, var_samp, var_pop.

Column
Required: Yes
Type: Array of strings

Specifies the column on the data set on which the aggregation function will be applied.
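
The following sketch assembles an Aggregate node from AggregateOperation entries. It rests on several assumptions: each Groups entry is itself an array of strings naming a field path, a single-element Column array names a top-level column, and the function list above is treated as complete for validation purposes even though the text only says "include". The helper name, node names, and column names are hypothetical.

```php
<?php
// Aggregation functions listed in the AggregateOperation description.
// Assumption: treated as the full set for validation in this sketch.
const GLUE_AGG_FUNCS = [
    'avg', 'countDistinct', 'count', 'first', 'last', 'kurtosis', 'max',
    'min', 'skewness', 'stddev_samp', 'stddev_pop', 'sum', 'sumDistinct',
    'var_samp', 'var_pop',
];

// Hypothetical helper (not part of the SDK): builds one AggregateOperation.
function buildAggregateOperation(string $column, string $aggFunc): array
{
    if (!in_array($aggFunc, GLUE_AGG_FUNCS, true)) {
        throw new InvalidArgumentException("Unsupported aggregation function: $aggFunc");
    }

    return ['Column' => [$column], 'AggFunc' => $aggFunc];
}

// Illustrative Aggregate node: total amount and distinct orders per region.
$aggregate = [
    'Name'   => 'TotalsByRegion',
    'Inputs' => ['orders-source'],
    'Groups' => [['region']],
    'Aggs'   => [
        buildAggregateOperation('amount', 'sum'),
        buildAggregateOperation('order_id', 'countDistinct'),
    ],
];
```

Validating AggFunc locally surfaces a misspelled function name before the node definition is submitted.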

AlreadyExistsException

Description

A resource to be created or added already exists.

Members
Message
Type: string

A message describing the problem.

AmazonRedshiftAdvancedOption

Description

Specifies an optional value when connecting to the Redshift cluster.

Members
Key
Type: string

The key for the additional connection option.

Value
Type: string

The value for the additional connection option.

AmazonRedshiftNodeData

Description

Specifies an Amazon Redshift node.

Members
AccessType
Type: string

The access type for the Redshift connection. Can be a direct connection or a catalog connection.

Action
Type: string

Specifies how writing to a Redshift cluster will occur.

AdvancedOptions
Type: Array of AmazonRedshiftAdvancedOption structures

Optional values when connecting to the Redshift cluster.

CatalogDatabase
Type: Option structure

The name of the Glue Data Catalog database when working with a data catalog.

CatalogRedshiftSchema
Type: string

The Redshift schema name when working with a data catalog.

CatalogRedshiftTable
Type: string

The database table to read from.

CatalogTable
Type: Option structure

The Glue Data Catalog table name when working with a data catalog.

Connection
Type: Option structure

The Glue connection to the Redshift cluster.

CrawlerConnection
Type: string

Specifies the name of the connection that is associated with the catalog table used.

IamRole
Type: Option structure

Optional. The role name to use when connecting to S3. The IAM role defaults to the role on the job when left blank.

MergeAction
Type: string

The action used to determine how a MERGE in a Redshift sink will be handled.

MergeClause
Type: string

The SQL used in a custom merge to deal with matching records.

MergeWhenMatched
Type: string

The action used to determine how a MERGE in a Redshift sink will be handled when an existing record matches a new record.

MergeWhenNotMatched
Type: string

The action used to determine how a MERGE in a Redshift sink will be handled when an existing record doesn't match a new record.

PostAction
Type: string

The SQL used after a MERGE or APPEND with upsert is run.

PreAction
Type: string

The SQL used before a MERGE or APPEND with upsert is run.

SampleQuery
Type: string

The SQL used to fetch the data from a Redshift source when the SourceType is 'query'.

Schema
Type: Option structure

The Redshift schema name when working with a direct connection.

SelectedColumns
Type: Array of Option structures

The list of column names used to determine a matching record when doing a MERGE or APPEND with upsert.

SourceType
Type: string

The source type to specify whether a specific table is the source or a custom query.

StagingTable
Type: string

The name of the temporary staging table that is used when doing a MERGE or APPEND with upsert.

Table
Type: Option structure

The Redshift table name when working with a direct connection.

TablePrefix
Type: string

Specifies the prefix to a table.

TableSchema
Type: Array of Option structures

The array of schema output for a given node.

TempDir
Type: string

The Amazon S3 path where temporary data can be staged when copying out of the database.

Upsert
Type: boolean

The action used on Redshift sinks when doing an APPEND.

AmazonRedshiftSource

Description

Specifies an Amazon Redshift source.

Members
Data
Type: AmazonRedshiftNodeData structure

Specifies the data of the Amazon Redshift source node.

Name
Type: string

The name of the Amazon Redshift source.

AmazonRedshiftTarget

Description

Specifies an Amazon Redshift target.

Members
Data
Type: AmazonRedshiftNodeData structure

Specifies the data of the Amazon Redshift target node.

Inputs
Type: Array of strings

The nodes that are inputs to the data target.

Name
Type: string

The name of the Amazon Redshift target.

ApplyMapping

Description

Specifies a transform that maps data property keys in the data source to data property keys in the data target. You can rename keys, modify the data types for keys, and choose which keys to drop from the dataset.

Members
Inputs
Required: Yes
Type: Array of strings

The data inputs identified by their node names.

Mapping
Required: Yes
Type: Array of Mapping structures

Specifies the mapping of data property keys in the data source to data property keys in the data target.

Name
Required: Yes
Type: string

The name of the transform node.

AthenaConnectorSource

Description

Specifies a connector to an Amazon Athena data source.

Members
ConnectionName
Required: Yes
Type: string

The name of the connection that is associated with the connector.

ConnectionTable
Type: string

The name of the table in the data source.

ConnectionType
Required: Yes
Type: string

The type of connection, such as marketplace.athena or custom.athena, designating a connection to an Amazon Athena data store.

ConnectorName
Required: Yes
Type: string

The name of a connector that assists with accessing the data store in Glue Studio.

Name
Required: Yes
Type: string

The name of the data source.

OutputSchemas
Type: Array of GlueSchema structures

Specifies the data schema for the custom Athena source.

SchemaName
Required: Yes
Type: string

The name of the CloudWatch log group to read from. For example, /aws-glue/jobs/output.

AuditContext

Description

A structure containing the Lake Formation audit context.

Members
AdditionalAuditContext
Type: string

A string containing the additional audit context information.

AllColumnsRequested
Type: boolean

Whether all columns were requested for audit.

RequestedColumns
Type: Array of strings

The requested columns for audit.

BackfillError

Description

A list of errors that can occur when registering partition indexes for an existing table.

These errors give the details about why an index registration failed and provide a limited number of partitions in the response, so that you can fix the partitions at fault and try registering the index again. The most common set of errors that can occur are categorized as follows:

  • EncryptedPartitionError: The partitions are encrypted.

  • InvalidPartitionTypeDataError: The partition value doesn't match the data type for that partition column.

  • MissingPartitionValueError: A partition value is missing.

  • UnsupportedPartitionCharacterError: Characters inside the partition value are not supported. For example: U+0000, U+0001, U+0002.

  • InternalError: Any error which does not belong to other error codes.

Members
Code
Type: string

The error code for an error that occurred when registering partition indexes for an existing table.

Partitions
Type: Array of PartitionValueList structures

A list of a limited number of partitions in the response.
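
One way to act on a list of BackfillError entries is to group the reported partitions by error code so that each category can be fixed in turn. This is a sketch under assumptions: the helper name is hypothetical, the sample data is invented, and each PartitionValueList is assumed to carry its values under a 'Values' key, which this section does not document.

```php
<?php
// Hypothetical helper (not part of the SDK): groups partition values by
// BackfillError code so each category can be repaired separately.
function groupBackfillPartitionsByCode(array $backfillErrors): array
{
    $byCode = [];
    foreach ($backfillErrors as $error) {
        $code = $error['Code'] ?? 'Unknown';
        $byCode[$code] = $byCode[$code] ?? [];

        foreach ($error['Partitions'] ?? [] as $partition) {
            // Assumption: a PartitionValueList exposes its values as 'Values'.
            $byCode[$code][] = implode('/', $partition['Values'] ?? []);
        }
    }

    return $byCode;
}

// Invented sample shaped like a list of BackfillError structures.
$sample = [
    [
        'Code' => 'EncryptedPartitionError',
        'Partitions' => [['Values' => ['2023', '12']], ['Values' => ['2024', '01']]],
    ],
    [
        'Code' => 'InvalidPartitionTypeDataError',
        'Partitions' => [['Values' => ['2024', 'not-a-month']]],
    ],
];

$summary = groupBackfillPartitionsByCode($sample);

foreach ($summary as $code => $partitions) {
    echo $code, ': ', implode(', ', $partitions), PHP_EOL;
}
```

Because the response contains only a limited number of partitions, a summary like this may be a sample of the faulty partitions and not the full set.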

BasicCatalogTarget

Description

Specifies a target that uses a Glue Data Catalog table.

Members
Database
Required: Yes
Type: string

The database that contains the table you want to use as the target. This database must already exist in the Data Catalog.

Inputs
Required: Yes
Type: Array of strings

The nodes that are inputs to the data target.

Name
Required: Yes
Type: string

The name of your data target.

Table
Required: Yes
Type: string

The table that defines the schema of your output data. This table must already exist in the Data Catalog.

BatchGetTableOptimizerEntry

Description

Represents a table optimizer to retrieve in the BatchGetTableOptimizer operation.

Members
catalogId
Type: string

The Catalog ID of the table.

databaseName
Type: string

The name of the database in the catalog in which the table resides.

tableName
Type: string

The name of the table.

type
Type: string

The type of table optimizer.

BatchGetTableOptimizerError

Description

Contains details on one of the errors in the error list returned by the BatchGetTableOptimizer operation.

Members
catalogId
Type: string

The Catalog ID of the table.

databaseName
Type: string

The name of the database in the catalog in which the table resides.

error
Type: ErrorDetail structure

An ErrorDetail object containing code and message details about the error.

tableName
Type: string

The name of the table.

type
Type: string

The type of table optimizer.

BatchStopJobRunError

Description

Records an error that occurred when attempting to stop a specified job run.

Members
ErrorDetail
Type: ErrorDetail structure

Specifies details about the error that was encountered.

JobName
Type: string

The name of the job definition that is used in the job run in question.

JobRunId
Type: string

The JobRunId of the job run in question.

BatchStopJobRunSuccessfulSubmission

Description

Records a successful request to stop a specified JobRun.

Members
JobName
Type: string

The name of the job definition used in the job run that was stopped.

JobRunId
Type: string

The JobRunId of the job run that was stopped.

BatchTableOptimizer

Description

Contains details for one of the table optimizers returned by the BatchGetTableOptimizer operation.

Members
catalogId
Type: string

The Catalog ID of the table.

databaseName
Type: string

The name of the database in the catalog in which the table resides.

tableName
Type: string

The name of the table.

tableOptimizer
Type: TableOptimizer structure

A TableOptimizer object that contains details on the configuration and last run of a table optimizer.

BatchUpdatePartitionFailureEntry

Description

Contains information about a batch update partition error.

Members
ErrorDetail
Type: ErrorDetail structure

The details about the batch update partition error.

PartitionValueList
Type: Array of strings

A list of values defining the partitions.

BatchUpdatePartitionRequestEntry

Description

A structure that contains the values and structure used to update a partition.

Members
PartitionInput
Required: Yes
Type: PartitionInput structure

The structure used to update a partition.

PartitionValueList
Required: Yes
Type: Array of strings

A list of values defining the partitions.

BinaryColumnStatisticsData

Description

Defines column statistics supported for bit sequence data values.

Members
AverageLength
Required: Yes
Type: double

The average bit sequence length in the column.

MaximumLength
Required: Yes
Type: long (int|float)

The size of the longest bit sequence in the column.

NumberOfNulls
Required: Yes
Type: long (int|float)

The number of null values in the column.

Blueprint

Description

The details of a blueprint.

Members
BlueprintLocation
Type: string

Specifies the path in Amazon S3 where the blueprint is published.

BlueprintServiceLocation
Type: string

Specifies a path in Amazon S3 where the blueprint is copied when you call CreateBlueprint/UpdateBlueprint to register the blueprint in Glue.

CreatedOn
Type: timestamp (string|DateTime or anything parsable by strtotime)

The date and time the blueprint was registered.

Description
Type: string

The description of the blueprint.

ErrorMessage
Type: string

An error message.

LastActiveDefinition
Type: LastActiveDefinition structure

When there are multiple versions of a blueprint and the latest version has some errors, this attribute indicates the last successful blueprint definition that is available with the service.

LastModifiedOn
Type: timestamp (string|DateTime or anything parsable by strtotime)

The date and time the blueprint was last modified.

Name
Type: string

The name of the blueprint.

ParameterSpec
Type: string

A JSON string that indicates the list of parameter specifications for the blueprint.

Status
Type: string

The status of the blueprint registration.

  • Creating — The blueprint registration is in progress.

  • Active — The blueprint has been successfully registered.

  • Updating — An update to the blueprint registration is in progress.

  • Failed — The blueprint registration failed.

BlueprintDetails

Description

The details of a blueprint.

Members
BlueprintName
Type: string

The name of the blueprint.

RunId
Type: string

The run ID for this blueprint.

BlueprintRun

Description

The details of a blueprint run.

Members
BlueprintName
Type: string

The name of the blueprint.

CompletedOn
Type: timestamp (string|DateTime or anything parsable by strtotime)

The date and time that the blueprint run completed.

ErrorMessage
Type: string

Indicates any errors that are seen while running the blueprint.

Parameters
Type: string

The blueprint parameters as a string. You will have to provide a value for each key that is required from the parameter spec that is defined in the Blueprint$ParameterSpec.

RoleArn
Type: string

The role ARN. This role will be assumed by the Glue service and will be used to create the workflow and other entities of a workflow.

RollbackErrorMessage
Type: string

If there are any errors while creating the entities of a workflow, we try to roll back the created entities until that point and delete them. This attribute indicates the errors seen while trying to delete the entities that are created.

RunId
Type: string

The run ID for this blueprint run.

StartedOn
Type: timestamp (string|DateTime or anything parsable by strtotime)

The date and time that the blueprint run started.

State
Type: string

The state of the blueprint run. Possible values are:

  • Running — The blueprint run is in progress.

  • Succeeded — The blueprint run completed successfully.

  • Failed — The blueprint run failed and rollback is complete.

  • Rolling Back — The blueprint run failed and rollback is in progress.

WorkflowName
Type: string

The name of a workflow that is created as a result of a successful blueprint run. If a blueprint run has an error, there will not be a workflow created.

BooleanColumnStatisticsData

Description

Defines column statistics supported for Boolean data columns.

Members
NumberOfFalses
Required: Yes
Type: long (int|float)

The number of false values in the column.

NumberOfNulls
Required: Yes
Type: long (int|float)

The number of null values in the column.

NumberOfTrues
Required: Yes
Type: long (int|float)

The number of true values in the column.

CatalogDeltaSource

Description

Specifies a Delta Lake data source that is registered in the Glue Data Catalog.

Members
AdditionalDeltaOptions
Type: Associative array of custom strings keys (EnclosedInStringProperty) to strings

Specifies additional connection options.

Database
Required: Yes
Type: string

The name of the database to read from.

Name
Required: Yes
Type: string

The name of the Delta Lake data source.

OutputSchemas
Type: Array of GlueSchema structures

Specifies the data schema for the Delta Lake source.

Table
Required: Yes
Type: string

The name of the table in the database to read from.

CatalogEntry

Description

Specifies a table definition in the Glue Data Catalog.

Members
DatabaseName
Required: Yes
Type: string

The database in which the table metadata resides.

TableName
Required: Yes
Type: string

The name of the table in question.

CatalogHudiSource

Description

Specifies a Hudi data source that is registered in the Glue Data Catalog.

Members
AdditionalHudiOptions
Type: Associative array of custom strings keys (EnclosedInStringProperty) to strings

Specifies additional connection options.

Database
Required: Yes
Type: string

The name of the database to read from.

Name
Required: Yes
Type: string

The name of the Hudi data source.

OutputSchemas
Type: Array of GlueSchema structures

Specifies the data schema for the Hudi source.

Table
Required: Yes
Type: string

The name of the table in the database to read from.

CatalogImportStatus

Description

A structure containing migration status information.

Members
ImportCompleted
Type: boolean

True if the migration has completed, or False otherwise.

ImportTime
Type: timestamp (string|DateTime or anything parsable by strtotime)

The time that the migration was started.

ImportedBy
Type: string

The name of the person who initiated the migration.

CatalogKafkaSource

Description

Specifies an Apache Kafka data store in the Data Catalog.

Members
DataPreviewOptions
Type: StreamingDataPreviewOptions structure

Specifies options related to data preview for viewing a sample of your data.

Database
Required: Yes
Type: string

The name of the database to read from.

DetectSchema
Type: boolean

Whether to automatically determine the schema from the incoming data.

Name
Required: Yes
Type: string

The name of the data store.

StreamingOptions
Type: KafkaStreamingSourceOptions structure

Specifies the streaming options.

Table
Required: Yes
Type: string

The name of the table in the database to read from.

WindowSize
Type: int

The amount of time to spend processing each micro batch.

CatalogKinesisSource

Description

Specifies a Kinesis data source in the Glue Data Catalog.

Members
DataPreviewOptions
Type: StreamingDataPreviewOptions structure

Additional options for data preview.

Database
Required: Yes
Type: string

The name of the database to read from.

DetectSchema
Type: boolean

Whether to automatically determine the schema from the incoming data.

Name
Required: Yes
Type: string

The name of the data source.

StreamingOptions
Type: KinesisStreamingSourceOptions structure

Additional options for the Kinesis streaming data source.

Table
Required: Yes
Type: string

The name of the table in the database to read from.

WindowSize
Type: int

The amount of time to spend processing each micro batch.

CatalogSchemaChangePolicy

Description

A policy that specifies update behavior for the crawler.

Members
EnableUpdateCatalog
Type: boolean

Whether to use the specified update behavior when the crawler finds a changed schema.

UpdateBehavior
Type: string

The update behavior when the crawler finds a changed schema.

CatalogSource

Description

Specifies a data store in the Glue Data Catalog.

Members
Database
Required: Yes
Type: string

The name of the database to read from.

Name
Required: Yes
Type: string

The name of the data store.

Table
Required: Yes
Type: string

The name of the table in the database to read from.

CatalogTarget

Description

Specifies a Glue Data Catalog target.

Members
ConnectionName
Type: string

The name of the connection for an Amazon S3-backed Data Catalog table to be a target of the crawl when using a Catalog connection type paired with a NETWORK Connection type.

DatabaseName
Required: Yes
Type: string

The name of the database to be synchronized.

DlqEventQueueArn
Type: string

A valid Amazon dead-letter SQS ARN. For example, arn:aws:sqs:region:account:deadLetterQueue.

EventQueueArn
Type: string

A valid Amazon SQS ARN. For example, arn:aws:sqs:region:account:sqs.

Tables
Required: Yes
Type: Array of strings

A list of the tables to be synchronized.

Classifier

Description

Classifiers are triggered during a crawl task. A classifier checks whether a given file is in a format it can handle. If it is, the classifier creates a schema in the form of a StructType object that matches that data format.

You can use the standard classifiers that Glue provides, or you can write your own classifiers to best categorize your data sources and specify the appropriate schemas to use for them. A classifier can be a grok classifier, an XML classifier, a JSON classifier, or a custom CSV classifier, as specified in one of the fields in the Classifier object.

Members
CsvClassifier
Type: CsvClassifier structure

A classifier for comma-separated values (CSV).

GrokClassifier
Type: GrokClassifier structure

A classifier that uses grok.

JsonClassifier
Type: JsonClassifier structure

A classifier for JSON content.

XMLClassifier
Type: XMLClassifier structure

A classifier for XML content.

CloudWatchEncryption

Description

Specifies how Amazon CloudWatch data should be encrypted.

Members
CloudWatchEncryptionMode
Type: string

The encryption mode to use for CloudWatch data.

KmsKeyArn
Type: string

The Amazon Resource Name (ARN) of the KMS key to be used to encrypt the data.

CodeGenConfigurationNode

Description

CodeGenConfigurationNode enumerates all valid Node types. One and only one of its member variables can be populated.
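
Since exactly one member may be populated, a node definition can be checked locally before it is sent. The sketch below is a hypothetical helper, not an SDK feature: it treats any non-null member as populated and returns its name, or throws if the count is not one. The sample node values are invented.

```php
<?php
// Hypothetical helper (not part of the SDK): returns the name of the single
// populated member of a CodeGenConfigurationNode array, or throws.
function populatedNodeType(array $node): string
{
    $populated = array_keys(array_filter($node, static function ($value) {
        return $value !== null;
    }));

    if (count($populated) !== 1) {
        throw new InvalidArgumentException(
            sprintf('Expected exactly one populated member, found %d.', count($populated))
        );
    }

    return (string) $populated[0];
}

// Illustrative node that populates only the CatalogSource member.
$node = [
    'CatalogSource' => [
        'Name'     => 'orders-source',
        'Database' => 'sales',
        'Table'    => 'orders',
    ],
];

echo populatedNodeType($node), PHP_EOL; // prints CatalogSource
```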

Members
Aggregate
Type: Aggregate structure

Specifies a transform that groups rows by chosen fields and computes the aggregated value by specified function.

AmazonRedshiftSource
Type: AmazonRedshiftSource structure

Specifies an Amazon Redshift source.

AmazonRedshiftTarget
Type: AmazonRedshiftTarget structure

Specifies a target that writes to a data target in Amazon Redshift.

ApplyMapping
Type: ApplyMapping structure

Specifies a transform that maps data property keys in the data source to data property keys in the data target. You can rename keys, modify the data types for keys, and choose which keys to drop from the dataset.

AthenaConnectorSource
Type: AthenaConnectorSource structure

Specifies a connector to an Amazon Athena data source.

CatalogDeltaSource
Type: CatalogDeltaSource structure

Specifies a Delta Lake data source that is registered in the Glue Data Catalog.

CatalogHudiSource
Type: CatalogHudiSource structure

Specifies a Hudi data source that is registered in the Glue Data Catalog.

CatalogKafkaSource
Type: CatalogKafkaSource structure

Specifies an Apache Kafka data store in the Data Catalog.

CatalogKinesisSource
Type: CatalogKinesisSource structure

Specifies a Kinesis data source in the Glue Data Catalog.

CatalogSource
Type: CatalogSource structure

Specifies a data store in the Glue Data Catalog.

CatalogTarget
Type: BasicCatalogTarget structure

Specifies a target that uses a Glue Data Catalog table.

ConnectorDataSource
Type: ConnectorDataSource structure

Specifies a source generated with standard connection options.

ConnectorDataTarget
Type: ConnectorDataTarget structure

Specifies a target generated with standard connection options.

CustomCode
Type: CustomCode structure

Specifies a transform that uses custom code you provide to perform the data transformation. The output is a collection of DynamicFrames.

DirectJDBCSource
Type: DirectJDBCSource structure

Specifies the direct JDBC source connection.

DirectKafkaSource
Type: DirectKafkaSource structure

Specifies an Apache Kafka data store.

DirectKinesisSource
Type: DirectKinesisSource structure

Specifies a direct Amazon Kinesis data source.

DropDuplicates
Type: DropDuplicates structure

Specifies a transform that removes rows of repeating data from a data set.

DropFields
Type: DropFields structure

Specifies a transform that chooses the data property keys that you want to drop.

DropNullFields
Type: DropNullFields structure

Specifies a transform that removes columns from the dataset if all values in the column are 'null'. By default, Glue Studio will recognize null objects, but some values such as empty strings, strings that are "null", -1 integers or other placeholders such as zeros, are not automatically recognized as nulls.

DynamicTransform
Type: DynamicTransform structure

Specifies a custom visual transform created by a user.

DynamoDBCatalogSource
Type: DynamoDBCatalogSource structure

Specifies a DynamoDB Catalog data store in the Glue Data Catalog.

EvaluateDataQuality
Type: EvaluateDataQuality structure

Specifies your data quality evaluation criteria.

EvaluateDataQualityMultiFrame
Type: EvaluateDataQualityMultiFrame structure

Specifies your data quality evaluation criteria. Allows multiple input data and returns a collection of DynamicFrames.

FillMissingValues
Type: FillMissingValues structure

Specifies a transform that locates records in the dataset that have missing values and adds a new field with a value determined by imputation. The input data set is used to train the machine learning model that determines what the missing value should be.

Filter
Type: Filter structure

Specifies a transform that splits a dataset into two, based on a filter condition.

GovernedCatalogSource
Type: GovernedCatalogSource structure

Specifies a data source in a governed Data Catalog.

GovernedCatalogTarget
Type: GovernedCatalogTarget structure

Specifies a data target that writes to a governed catalog.

JDBCConnectorSource
Type: JDBCConnectorSource structure

Specifies a connector to a JDBC data source.

JDBCConnectorTarget
Type: JDBCConnectorTarget structure

Specifies a target that uses a JDBC connector.

Join
Type: Join structure

Specifies a transform that joins two datasets into one dataset using a comparison phrase on the specified data property keys. You can use inner, outer, left, right, left semi, and left anti joins.

Merge
Type: Merge structure

Specifies a transform that merges a DynamicFrame with a staging DynamicFrame based on the specified primary keys to identify records. Duplicate records (records with the same primary keys) are not de-duplicated.

MicrosoftSQLServerCatalogSource
Type: MicrosoftSQLServerCatalogSource structure

Specifies a Microsoft SQL Server data source in the Glue Data Catalog.

MicrosoftSQLServerCatalogTarget
Type: MicrosoftSQLServerCatalogTarget structure

Specifies a target that uses Microsoft SQL.

MySQLCatalogSource
Type: MySQLCatalogSource structure

Specifies a MySQL data source in the Glue Data Catalog.

MySQLCatalogTarget
Type: MySQLCatalogTarget structure

Specifies a target that uses MySQL.

OracleSQLCatalogSource
Type: OracleSQLCatalogSource structure

Specifies an Oracle data source in the Glue Data Catalog.

OracleSQLCatalogTarget
Type: OracleSQLCatalogTarget structure

Specifies a target that uses Oracle SQL.

PIIDetection
Type: PIIDetection structure

Specifies a transform that identifies, removes or masks PII data.

PostgreSQLCatalogSource
Type: PostgreSQLCatalogSource structure

Specifies a PostgreSQL data source in the Glue Data Catalog.

PostgreSQLCatalogTarget
Type: PostgreSQLCatalogTarget structure

Specifies a target that uses PostgreSQL.

Recipe
Type: Recipe structure

Specifies a Glue DataBrew recipe node.

RedshiftSource
Type: RedshiftSource structure

Specifies an Amazon Redshift data store.

RedshiftTarget
Type: RedshiftTarget structure

Specifies a target that uses Amazon Redshift.

RelationalCatalogSource
Type: RelationalCatalogSource structure

Specifies a relational catalog data store in the Glue Data Catalog.

RenameField
Type: RenameField structure

Specifies a transform that renames a single data property key.

S3CatalogDeltaSource
Type: S3CatalogDeltaSource structure

Specifies a Delta Lake data source that is registered in the Glue Data Catalog. The data source must be stored in Amazon S3.

S3CatalogHudiSource
Type: S3CatalogHudiSource structure

Specifies a Hudi data source that is registered in the Glue Data Catalog. The data source must be stored in Amazon S3.

S3CatalogSource
Type: S3CatalogSource structure

Specifies an Amazon S3 data store in the Glue Data Catalog.

S3CatalogTarget
Type: S3CatalogTarget structure

Specifies a data target that writes to Amazon S3 using the Glue Data Catalog.

S3CsvSource
Type: S3CsvSource structure

Specifies a comma-separated value (CSV) data store stored in Amazon S3.

S3DeltaCatalogTarget
Type: S3DeltaCatalogTarget structure

Specifies a target that writes to a Delta Lake data source in the Glue Data Catalog.

S3DeltaDirectTarget
Type: S3DeltaDirectTarget structure

Specifies a target that writes to a Delta Lake data source in Amazon S3.

S3DeltaSource
Type: S3DeltaSource structure

Specifies a Delta Lake data source stored in Amazon S3.

S3DirectTarget
Type: S3DirectTarget structure

Specifies a data target that writes to Amazon S3.

S3GlueParquetTarget
Type: S3GlueParquetTarget structure

Specifies a data target that writes to Amazon S3 in Apache Parquet columnar storage.

S3HudiCatalogTarget
Type: S3HudiCatalogTarget structure

Specifies a target that writes to a Hudi data source in the Glue Data Catalog.

S3HudiDirectTarget
Type: S3HudiDirectTarget structure

Specifies a target that writes to a Hudi data source in Amazon S3.

S3HudiSource
Type: S3HudiSource structure

Specifies a Hudi data source stored in Amazon S3.

S3JsonSource
Type: S3JsonSource structure

Specifies a JSON data store stored in Amazon S3.

S3ParquetSource
Type: S3ParquetSource structure

Specifies an Apache Parquet data store stored in Amazon S3.

SelectFields
Type: SelectFields structure

Specifies a transform that chooses the data property keys that you want to keep.

SelectFromCollection
Type: SelectFromCollection structure

Specifies a transform that chooses one DynamicFrame from a collection of DynamicFrames. The output is the selected DynamicFrame.

SnowflakeSource
Type: SnowflakeSource structure

Specifies a Snowflake data source.

SnowflakeTarget
Type: SnowflakeTarget structure

Specifies a target that writes to a Snowflake data source.

SparkConnectorSource
Type: SparkConnectorSource structure

Specifies a connector to an Apache Spark data source.

SparkConnectorTarget
Type: SparkConnectorTarget structure

Specifies a target that uses an Apache Spark connector.

SparkSQL
Type: SparkSQL structure

Specifies a transform where you enter a SQL query using Spark SQL syntax to transform the data. The output is a single DynamicFrame.

Spigot
Type: Spigot structure

Specifies a transform that writes samples of the data to an Amazon S3 bucket.

SplitFields
Type: SplitFields structure

Specifies a transform that splits data property keys into two DynamicFrames. The output is a collection of DynamicFrames: one with selected data property keys, and one with the remaining data property keys.

Union
Type: Union structure

Specifies a transform that combines the rows from two or more datasets into a single result.

CodeGenEdge

Description

Represents a directional edge in a directed acyclic graph (DAG).

Members
Source
Required: Yes
Type: string

The ID of the node at which the edge starts.

Target
Required: Yes
Type: string

The ID of the node at which the edge ends.

TargetParameter
Type: string

The target of the edge.

CodeGenNode

Description

Represents a node in a directed acyclic graph (DAG).

Members
Args
Required: Yes
Type: Array of CodeGenNodeArg structures

Properties of the node, in the form of name-value pairs.

Id
Required: Yes
Type: string

A node identifier that is unique within the node's graph.

LineNumber
Type: int

The line number of the node.

NodeType
Required: Yes
Type: string

The type of node that this is.

CodeGenNodeArg

Description

An argument or property of a node.

Members
Name
Required: Yes
Type: string

The name of the argument or property.

Param
Type: boolean

True if the value is used as a parameter.

Value
Required: Yes
Type: string

The value of the argument or property.
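
As a rough illustration of how CodeGenNode, CodeGenNodeArg, and CodeGenEdge fit together, the sketch below builds a two-node DAG as plain PHP arrays. The helper functions, node IDs, NodeType values, and argument names are hypothetical and chosen only for illustration; only the member names (Id, NodeType, Args, Name, Value, Param, Source, Target, TargetParameter) come from this reference.

```php
<?php
// Hypothetical helpers: they only assemble arrays shaped like the
// CodeGenNode, CodeGenNodeArg, and CodeGenEdge structures described above.

function codeGenNode(string $id, string $nodeType, array $args): array
{
    $nodeArgs = [];
    foreach ($args as $name => $value) {
        // Each CodeGenNodeArg is a name-value pair; Param is optional.
        $nodeArgs[] = ['Name' => (string) $name, 'Value' => (string) $value, 'Param' => false];
    }
    return ['Id' => $id, 'NodeType' => $nodeType, 'Args' => $nodeArgs];
}

function codeGenEdge(string $source, string $target, ?string $targetParameter = null): array
{
    $edge = ['Source' => $source, 'Target' => $target];
    if ($targetParameter !== null) {
        $edge['TargetParameter'] = $targetParameter;
    }
    return $edge;
}

// Returns the edges whose Source or Target is not the Id of any node.
function danglingEdges(array $nodes, array $edges): array
{
    $ids = array_column($nodes, 'Id');
    return array_values(array_filter(
        $edges,
        fn(array $e) => !in_array($e['Source'], $ids, true) || !in_array($e['Target'], $ids, true)
    ));
}

// Node types and argument names below are assumptions, not values from this reference.
$dagNodes = [
    codeGenNode('source0', 'DataSource', ['database' => '"sales_db"', 'table_name' => '"orders"']),
    codeGenNode('sink0', 'DataSink', ['connection_type' => '"s3"']),
];
$dagEdges = [codeGenEdge('source0', 'sink0', 'frame')];
```

Arrays built this way could then be passed as the DagNodes and DagEdges parameters of an operation that accepts a DAG, after checking that danglingEdges() returns an empty array.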

Column

Description

A column in a Table.

Members
Comment
Type: string

A free-form text comment.

Name
Required: Yes
Type: string

The name of the Column.

Parameters
Type: Associative array of custom strings keys (KeyString) to strings

These key-value pairs define properties associated with the column.

Type
Type: string

The data type of the Column.

ColumnError

Description

Encapsulates a column name that failed and the reason for failure.

Members
ColumnName
Type: string

The name of the column that failed.

Error
Type: ErrorDetail structure

An error message with the reason for the failure of an operation.

ColumnImportance

Description

A structure containing the column name and column importance score for a column.

Column importance helps you understand how columns contribute to your model, by identifying which columns in your records are more important than others.

Members
ColumnName
Type: string

The name of a column.

Importance
Type: double

The column importance score for the column, as a decimal.

ColumnRowFilter

Description

A filter that uses both column-level and row-level filtering.

Members
ColumnName
Type: string

A string containing the name of the column.

RowFilterExpression
Type: string

A string containing the row-level filter expression.

ColumnStatistics

Description

Represents the generated column-level statistics for a table or partition.

Members
AnalyzedTime
Required: Yes
Type: timestamp (string|DateTime or anything parsable by strtotime)

The timestamp of when column statistics were generated.

ColumnName
Required: Yes
Type: string

The name of the column to which the statistics belong.

ColumnType
Required: Yes
Type: string

The data type of the column.

StatisticsData
Required: Yes
Type: ColumnStatisticsData structure

A ColumnStatisticsData object that contains the statistics data values.

ColumnStatisticsData

Description

Contains the individual types of column statistics data. Set only one data object, and indicate which one is set with the Type attribute.

Members
BinaryColumnStatisticsData
Type: BinaryColumnStatisticsData structure

Binary column statistics data.

BooleanColumnStatisticsData
Type: BooleanColumnStatisticsData structure

Boolean column statistics data.

DateColumnStatisticsData
Type: DateColumnStatisticsData structure

Date column statistics data.

DecimalColumnStatisticsData
Type: DecimalColumnStatisticsData structure

Decimal column statistics data. UnscaledValues within are Base64-encoded binary objects storing big-endian, two's complement representations of the decimal's unscaled value.

DoubleColumnStatisticsData
Type: DoubleColumnStatisticsData structure

Double column statistics data.

LongColumnStatisticsData
Type: LongColumnStatisticsData structure

Long column statistics data.

StringColumnStatisticsData
Type: StringColumnStatisticsData structure

String column statistics data.

Type
Required: Yes
Type: string

The type of column statistics data.
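
The big-endian, two's complement encoding mentioned for DecimalColumnStatisticsData can be decoded along the following lines. This is a minimal sketch that assumes 64-bit PHP integers and values of at most 8 bytes; the function names are hypothetical. It also assumes the bytes arrive either raw or Base64-encoded (decode with base64_decode first in the latter case), and that the scale comes from the Scale member that accompanies the unscaled value.

```php
<?php
// Hypothetical helper: converts raw big-endian two's complement bytes
// to a PHP int. Assumes a 64-bit PHP build and at most 8 bytes of input.
function unscaledBytesToInt(string $bytes): int
{
    $len = strlen($bytes);
    if ($len === 0 || $len > 8) {
        throw new InvalidArgumentException('Expected between 1 and 8 bytes.');
    }
    $value = 0;
    for ($i = 0; $i < $len; $i++) {
        $value = ($value << 8) | ord($bytes[$i]);
    }
    // Sign-extend when the most significant bit is set and fewer than 8 bytes were given.
    if ($len < 8 && (ord($bytes[0]) & 0x80)) {
        $value -= (1 << ($len * 8));
    }
    return $value;
}

// Hypothetical helper: applies a non-negative scale without going through floats.
function formatDecimal(int $unscaled, int $scale): string
{
    if ($scale < 0) {
        throw new InvalidArgumentException('Negative scales are not covered by this sketch.');
    }
    if ($scale === 0) {
        return (string) $unscaled;
    }
    $negative = $unscaled < 0;
    $digits = str_pad(ltrim((string) $unscaled, '-'), $scale + 1, '0', STR_PAD_LEFT);
    $result = substr($digits, 0, -$scale) . '.' . substr($digits, -$scale);
    return ($negative ? '-' : '') . $result;
}

// Example: the two bytes 0xFF 0x85 encode -123; with a scale of 2 that is -1.23.
$decoded = formatDecimal(unscaledBytesToInt(base64_decode('/4U=')), 2);
```

For values wider than 8 bytes, an arbitrary-precision extension such as GMP or BCMath would be needed instead of native integers.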

ColumnStatisticsError

Description

Encapsulates a ColumnStatistics object that failed and the reason for failure.

Members
ColumnStatistics
Type: ColumnStatistics structure

The ColumnStatistics of the column.

Error
Type: ErrorDetail structure

An error message with the reason for the failure of an operation.

ColumnStatisticsTaskNotRunningException

Description

An exception thrown when you try to stop a task run when there is no task running.

Members
Message
Type: string

A message describing the problem.

ColumnStatisticsTaskRun

Description

The object that shows the details of the column stats run.

Members
CatalogID
Type: string

The ID of the Data Catalog where the table resides. If none is supplied, the Amazon Web Services account ID is used by default.

ColumnNameList
Type: Array of strings

A list of the column names. If none is supplied, all column names for the table will be used by default.

ColumnStatisticsTaskRunId
Type: string

The identifier for the particular column statistics task run.

CreationTime
Type: timestamp (string|DateTime or anything parsable by strtotime)

The time that this task was created.

CustomerId
Type: string

The Amazon Web Services account ID.

DPUSeconds
Type: double

The calculated DPU usage in seconds for all autoscaled workers.

DatabaseName
Type: string

The database where the table resides.

EndTime
Type: timestamp (string|DateTime or anything parsable by strtotime)

The end time of the task.

ErrorMessage
Type: string

The error message for the job.

LastUpdated
Type: timestamp (string|DateTime or anything parsable by strtotime)

The last point in time when this task was modified.

NumberOfWorkers
Type: int

The number of workers used to generate column statistics. The job is preconfigured to autoscale up to 25 instances.

Role
Type: string

The IAM role that the service assumes to generate statistics.

SampleSize
Type: double

The percentage of rows used to generate statistics. If none is supplied, the entire table will be used to generate stats.

SecurityConfiguration
Type: string

Name of the security configuration that is used to encrypt CloudWatch logs for the column stats task run.

StartTime
Type: timestamp (string|DateTime or anything parsable by strtotime)

The start time of the task.

Status
Type: string

The status of the task run.

TableName
Type: string

The name of the table for which column statistics are generated.

WorkerType
Type: string

The type of workers being used for generating stats. The default is g.1x.

ColumnStatisticsTaskRunningException

Description

An exception thrown when you try to start another job while running a column stats generation job.

Members
Message
Type: string

A message describing the problem.

ColumnStatisticsTaskStoppingException

Description

An exception thrown when you try to stop a task run.

Members
Message
Type: string

A message describing the problem.

ConcurrentModificationException

Description

Two processes are trying to modify a resource simultaneously.

Members
Message
Type: string

A message describing the problem.

ConcurrentRunsExceededException

Description

Too many jobs are being run concurrently.

Members
Message
Type: string

A message describing the problem.

Condition

Description

Defines a condition under which a trigger fires.

Members
CrawlState
Type: string

The state of the crawler to which this condition applies.

CrawlerName
Type: string

The name of the crawler to which this condition applies.

JobName
Type: string

The name of the job whose JobRuns this condition applies to, and on which this trigger waits.

LogicalOperator
Type: string

A logical operator.

State
Type: string

The condition state. Currently, the only job states that a trigger can listen for are SUCCEEDED, STOPPED, FAILED, and TIMEOUT. The only crawler states that a trigger can listen for are SUCCEEDED, FAILED, and CANCELLED.
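
The state restrictions above can be checked before a trigger is created. The sketch below builds Condition arrays and rejects states that a trigger cannot listen for. The helper names and the job and crawler names are hypothetical, and the LogicalOperator value EQUALS is an assumption not stated in this section.

```php
<?php
// Hypothetical helpers that build Condition arrays and enforce the
// job and crawler states listed in the State description above.

function jobCondition(string $jobName, string $state): array
{
    $allowed = ['SUCCEEDED', 'STOPPED', 'FAILED', 'TIMEOUT'];
    if (!in_array($state, $allowed, true)) {
        throw new InvalidArgumentException("A trigger cannot listen for job state $state.");
    }
    // 'EQUALS' is assumed here; check the LogicalOperator values your API version accepts.
    return ['LogicalOperator' => 'EQUALS', 'JobName' => $jobName, 'State' => $state];
}

function crawlerCondition(string $crawlerName, string $crawlState): array
{
    $allowed = ['SUCCEEDED', 'FAILED', 'CANCELLED'];
    if (!in_array($crawlState, $allowed, true)) {
        throw new InvalidArgumentException("A trigger cannot listen for crawler state $crawlState.");
    }
    return ['LogicalOperator' => 'EQUALS', 'CrawlerName' => $crawlerName, 'CrawlState' => $crawlState];
}

// Example names are placeholders.
$conditions = [
    jobCondition('nightly-etl', 'SUCCEEDED'),
    crawlerCondition('orders-crawler', 'SUCCEEDED'),
];
```

A list like $conditions would typically be supplied wherever an array of Condition structures is expected, for example in a trigger's predicate.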

ConditionCheckFailureException

Description

A specified condition was not satisfied.

Members
Message
Type: string

A message describing the problem.

ConflictException

Description

The CreatePartitions API was called on a table that has indexes enabled.

Members
Message
Type: string

A message describing the problem.

ConfusionMatrix

Description

The confusion matrix shows you what your transform is predicting accurately and what types of errors it is making.

For more information, see Confusion matrix in Wikipedia.

Members
NumFalseNegatives
Type: long (int|float)

The number of matches in the data that the transform didn't find, in the confusion matrix for your transform.

NumFalsePositives
Type: long (int|float)

The number of nonmatches in the data that the transform incorrectly classified as a match, in the confusion matrix for your transform.

NumTrueNegatives
Type: long (int|float)

The number of nonmatches in the data that the transform correctly rejected, in the confusion matrix for your transform.

NumTruePositives
Type: long (int|float)

The number of matches in the data that the transform correctly found, in the confusion matrix for your transform.
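
The four counts can be combined into the usual summary metrics. The following sketch uses the standard textbook definitions of precision, recall, F1, and accuracy; the function name and the sample counts are hypothetical, and the derived values are computed locally rather than returned by this structure.

```php
<?php
// Hypothetical helper: derives standard metrics from a ConfusionMatrix array.
// Returns null for a metric whose denominator would be zero.
function confusionMatrixMetrics(array $m): array
{
    $tp = $m['NumTruePositives'] ?? 0;
    $fp = $m['NumFalsePositives'] ?? 0;
    $tn = $m['NumTrueNegatives'] ?? 0;
    $fn = $m['NumFalseNegatives'] ?? 0;

    $precision = ($tp + $fp) > 0 ? $tp / ($tp + $fp) : null; // found matches that are real
    $recall    = ($tp + $fn) > 0 ? $tp / ($tp + $fn) : null; // real matches that were found
    $total     = $tp + $fp + $tn + $fn;
    $accuracy  = $total > 0 ? ($tp + $tn) / $total : null;
    $f1 = ($precision !== null && $recall !== null && ($precision + $recall) > 0)
        ? 2 * $precision * $recall / ($precision + $recall)
        : null;

    return ['precision' => $precision, 'recall' => $recall, 'f1' => $f1, 'accuracy' => $accuracy];
}

// Sample counts, invented for illustration.
$metrics = confusionMatrixMetrics([
    'NumTruePositives'  => 80,
    'NumFalsePositives' => 20,
    'NumTrueNegatives'  => 890,
    'NumFalseNegatives' => 10,
]);
```

With these sample counts, precision is 80 / 100 and recall is 80 / 90, so the transform misses fewer matches than it wrongly reports.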

Connection

Description

Defines a connection to a data source.

Members
ConnectionProperties
Type: Associative array of custom strings keys (ConnectionPropertyKey) to strings

These key-value pairs define parameters for the connection:

  • HOST - The host URI: either the fully qualified domain name (FQDN) or the IPv4 address of the database host.

  • PORT - The port number, between 1024 and 65535, of the port on which the database host is listening for database connections.

  • USER_NAME - The name under which to log in to the database. The value string for USER_NAME is "USERNAME".

  • PASSWORD - A password, if one is used, for the user name.

  • ENCRYPTED_PASSWORD - When you enable connection password protection by setting ConnectionPasswordEncryption in the Data Catalog encryption settings, this field stores the encrypted password.

  • JDBC_DRIVER_JAR_URI - The Amazon Simple Storage Service (Amazon S3) path of the JAR file that contains the JDBC driver to use.

  • JDBC_DRIVER_CLASS_NAME - The class name of the JDBC driver to use.

  • JDBC_ENGINE - The name of the JDBC engine to use.

  • JDBC_ENGINE_VERSION - The version of the JDBC engine to use.

  • CONFIG_FILES - (Reserved for future use.)

  • INSTANCE_ID - The instance ID to use.

  • JDBC_CONNECTION_URL - The URL for connecting to a JDBC data source.

  • JDBC_ENFORCE_SSL - A Boolean string (true, false) specifying whether Secure Sockets Layer (SSL) with hostname matching is enforced for the JDBC connection on the client. The default is false.

  • CUSTOM_JDBC_CERT - An Amazon S3 location specifying the customer's root certificate. Glue uses this root certificate to validate the customer's certificate when connecting to the customer database. Glue only handles X.509 certificates. The certificate provided must be DER-encoded and supplied in Base64-encoded PEM format.

  • SKIP_CUSTOM_JDBC_CERT_VALIDATION - By default, this is false. Glue validates the Signature algorithm and Subject Public Key Algorithm for the customer certificate. The only permitted algorithms for the Signature algorithm are SHA256withRSA, SHA384withRSA or SHA512withRSA. For the Subject Public Key Algorithm, the key length must be at least 2048. You can set the value of this property to true to skip Glue's validation of the customer certificate.

  • CUSTOM_JDBC_CERT_STRING - A custom JDBC certificate string which is used for domain match or distinguished name match to prevent a man-in-the-middle attack. In Oracle database, this is used as the SSL_SERVER_CERT_DN; in Microsoft SQL Server, this is used as the hostNameInCertificate.

  • CONNECTION_URL - The URL for connecting to a general (non-JDBC) data source.

  • SECRET_ID - The secret ID used for the secret manager of credentials.

  • CONNECTOR_URL - The connector URL for a MARKETPLACE or CUSTOM connection.

  • CONNECTOR_TYPE - The connector type for a MARKETPLACE or CUSTOM connection.

  • CONNECTOR_CLASS_NAME - The connector class name for a MARKETPLACE or CUSTOM connection.

  • KAFKA_BOOTSTRAP_SERVERS - A comma-separated list of host and port pairs that are the addresses of the Apache Kafka brokers in a Kafka cluster to which a Kafka client will connect and bootstrap itself.

  • KAFKA_SSL_ENABLED - Whether to enable or disable SSL on an Apache Kafka connection. Default value is "true".

  • KAFKA_CUSTOM_CERT - The Amazon S3 URL for the private CA cert file (.pem format). The default is an empty string.

  • KAFKA_SKIP_CUSTOM_CERT_VALIDATION - Whether to skip the validation of the CA cert file or not. Glue validates for three algorithms: SHA256withRSA, SHA384withRSA and SHA512withRSA. Default value is "false".

  • KAFKA_CLIENT_KEYSTORE - The Amazon S3 location of the client keystore file for Kafka client side authentication (Optional).

  • KAFKA_CLIENT_KEYSTORE_PASSWORD - The password to access the provided keystore (Optional).

  • KAFKA_CLIENT_KEY_PASSWORD - A keystore can consist of multiple keys, so this is the password to access the client key to be used with the Kafka server side key (Optional).

  • ENCRYPTED_KAFKA_CLIENT_KEYSTORE_PASSWORD - The encrypted version of the Kafka client keystore password (if the user has the Glue encrypt passwords setting selected).

  • ENCRYPTED_KAFKA_CLIENT_KEY_PASSWORD - The encrypted version of the Kafka client key password (if the user has the Glue encrypt passwords setting selected).

  • KAFKA_SASL_MECHANISM - "SCRAM-SHA-512", "GSSAPI", "AWS_MSK_IAM", or "PLAIN". These are the supported SASL Mechanisms.

  • KAFKA_SASL_PLAIN_USERNAME - A plaintext username used to authenticate with the "PLAIN" mechanism.

  • KAFKA_SASL_PLAIN_PASSWORD - A plaintext password used to authenticate with the "PLAIN" mechanism.

  • ENCRYPTED_KAFKA_SASL_PLAIN_PASSWORD - The encrypted version of the Kafka SASL PLAIN password (if the user has the Glue encrypt passwords setting selected).

  • KAFKA_SASL_SCRAM_USERNAME - A plaintext username used to authenticate with the "SCRAM-SHA-512" mechanism.

  • KAFKA_SASL_SCRAM_PASSWORD - A plaintext password used to authenticate with the "SCRAM-SHA-512" mechanism.

  • ENCRYPTED_KAFKA_SASL_SCRAM_PASSWORD - The encrypted version of the Kafka SASL SCRAM password (if the user has the Glue encrypt passwords setting selected).

  • KAFKA_SASL_SCRAM_SECRETS_ARN - The Amazon Resource Name of a secret in Amazon Web Services Secrets Manager.

  • KAFKA_SASL_GSSAPI_KEYTAB - The S3 location of a Kerberos keytab file. A keytab stores long-term keys for one or more principals. For more information, see MIT Kerberos Documentation: Keytab.

  • KAFKA_SASL_GSSAPI_KRB5_CONF - The S3 location of a Kerberos krb5.conf file. A krb5.conf stores Kerberos configuration information, such as the location of the KDC server. For more information, see MIT Kerberos Documentation: krb5.conf.

  • KAFKA_SASL_GSSAPI_SERVICE - The Kerberos service name, as set with sasl.kerberos.service.name in your Kafka Configuration.

  • KAFKA_SASL_GSSAPI_PRINCIPAL - The name of the Kerberos principal used by Glue. For more information, see Kafka Documentation: Configuring Kafka Brokers.

ConnectionType
Type: string

The type of the connection. Currently, SFTP is not supported.

CreationTime
Type: timestamp (string|DateTime or anything parsable by strtotime)

The time that this connection definition was created.

Description
Type: string

The description of the connection.

LastUpdatedBy
Type: string

The user, group, or role that last updated this connection definition.

LastUpdatedTime
Type: timestamp (string|DateTime or anything parsable by strtotime)

The last time that this connection definition was updated.

MatchCriteria
Type: Array of strings

A list of criteria that can be used in selecting this connection.

Name
Type: string

The name of the connection definition.

PhysicalConnectionRequirements
Type: PhysicalConnectionRequirements structure

A map of physical connection requirements, such as virtual private cloud (VPC) and SecurityGroup, that are needed to make this connection successfully.

ConnectionInput

Description

A structure that is used to specify a connection to create or update.

Members
ConnectionProperties
Required: Yes
Type: Associative array of custom strings keys (ConnectionPropertyKey) to strings

These key-value pairs define parameters for the connection.

ConnectionType
Required: Yes
Type: string

The type of the connection. Currently, these types are supported:

  • JDBC - Designates a connection to a database through Java Database Connectivity (JDBC).

    JDBC Connections use the following ConnectionParameters.

    • Required: All of (HOST, PORT, JDBC_ENGINE) or JDBC_CONNECTION_URL.

    • Required: All of (USERNAME, PASSWORD) or SECRET_ID.

    • Optional: JDBC_ENFORCE_SSL, CUSTOM_JDBC_CERT, CUSTOM_JDBC_CERT_STRING, SKIP_CUSTOM_JDBC_CERT_VALIDATION. These parameters are used to configure SSL with JDBC.

  • KAFKA - Designates a connection to an Apache Kafka streaming platform.

    KAFKA Connections use the following ConnectionParameters.

    • Required: KAFKA_BOOTSTRAP_SERVERS.

    • Optional: KAFKA_SSL_ENABLED, KAFKA_CUSTOM_CERT, KAFKA_SKIP_CUSTOM_CERT_VALIDATION. These parameters are used to configure SSL with KAFKA.

    • Optional: KAFKA_CLIENT_KEYSTORE, KAFKA_CLIENT_KEYSTORE_PASSWORD, KAFKA_CLIENT_KEY_PASSWORD, ENCRYPTED_KAFKA_CLIENT_KEYSTORE_PASSWORD, ENCRYPTED_KAFKA_CLIENT_KEY_PASSWORD. These parameters are used to configure TLS client configuration with SSL in KAFKA.

    • Optional: KAFKA_SASL_MECHANISM. Can be specified as SCRAM-SHA-512, GSSAPI, or AWS_MSK_IAM.

    • Optional: KAFKA_SASL_SCRAM_USERNAME, KAFKA_SASL_SCRAM_PASSWORD, ENCRYPTED_KAFKA_SASL_SCRAM_PASSWORD. These parameters are used to configure SASL/SCRAM-SHA-512 authentication with KAFKA.

    • Optional: KAFKA_SASL_GSSAPI_KEYTAB, KAFKA_SASL_GSSAPI_KRB5_CONF, KAFKA_SASL_GSSAPI_SERVICE, KAFKA_SASL_GSSAPI_PRINCIPAL. These parameters are used to configure SASL/GSSAPI authentication with KAFKA.

  • MONGODB - Designates a connection to a MongoDB document database.

    MONGODB Connections use the following ConnectionParameters.

    • Required: CONNECTION_URL.

    • Required: All of (USERNAME, PASSWORD) or SECRET_ID.

  • NETWORK - Designates a network connection to a data source within an Amazon Virtual Private Cloud environment (Amazon VPC).

    NETWORK Connections do not require ConnectionParameters. Instead, provide a PhysicalConnectionRequirements.

  • MARKETPLACE - Uses configuration settings contained in a connector purchased from Amazon Web Services Marketplace to read from and write to data stores that are not natively supported by Glue.

    MARKETPLACE Connections use the following ConnectionParameters.

    • Required: CONNECTOR_TYPE, CONNECTOR_URL, CONNECTOR_CLASS_NAME, CONNECTION_URL.

    • Required for JDBC CONNECTOR_TYPE connections: All of (USERNAME, PASSWORD) or SECRET_ID.

  • CUSTOM - Uses configuration settings contained in a custom connector to read from and write to data stores that are not natively supported by Glue.

SFTP is not supported.

For more information about how optional ConnectionProperties are used to configure features in Glue, consult Glue connection properties.

For more information about how optional ConnectionProperties are used to configure features in Glue Studio, consult Using connectors and connections.

Description
Type: string

The description of the connection.

MatchCriteria
Type: Array of strings

A list of criteria that can be used in selecting this connection.

Name
Required: Yes
Type: string

The name of the connection. The connection will not function as expected without a name.

PhysicalConnectionRequirements
Type: PhysicalConnectionRequirements structure

A map of physical connection requirements, such as virtual private cloud (VPC) and SecurityGroup, that are needed to successfully make this connection.
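
The "all of ... or ..." rules listed for JDBC connections under ConnectionType can be checked locally before a request is sent. The sketch below is one possible pre-flight check. The function name, connection name, URL, and secret ID are hypothetical, and the check covers only the two required groups described above.

```php
<?php
// Hypothetical pre-flight check for the JDBC rules described above:
//   - all of (HOST, PORT, JDBC_ENGINE) or JDBC_CONNECTION_URL
//   - all of (USERNAME, PASSWORD) or SECRET_ID
// Returns a list of human-readable problems; an empty list means both rules pass.
function missingJdbcProperties(array $props): array
{
    $has = fn(string ...$keys) => count(array_diff($keys, array_keys($props))) === 0;

    $problems = [];
    if (!$has('JDBC_CONNECTION_URL') && !$has('HOST', 'PORT', 'JDBC_ENGINE')) {
        $problems[] = 'Provide JDBC_CONNECTION_URL, or all of HOST, PORT, and JDBC_ENGINE.';
    }
    if (!$has('SECRET_ID') && !$has('USERNAME', 'PASSWORD')) {
        $problems[] = 'Provide SECRET_ID, or both USERNAME and PASSWORD.';
    }
    return $problems;
}

// Placeholder values; replace them with your own connection details.
$connectionInput = [
    'Name' => 'example-jdbc-connection',
    'ConnectionType' => 'JDBC',
    'ConnectionProperties' => [
        'JDBC_CONNECTION_URL' => 'jdbc:postgresql://db.example.com:5432/sales',
        'SECRET_ID' => 'example/secret/id',
    ],
];

$problems = missingJdbcProperties($connectionInput['ConnectionProperties']);
```

If $problems is empty, the array could be passed as the ConnectionInput parameter of a create or update connection call. Referencing a secret through SECRET_ID avoids placing a plaintext PASSWORD in the request.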

ConnectionPasswordEncryption

Description

The data structure used by the Data Catalog to encrypt the password as part of CreateConnection or UpdateConnection and store it in the ENCRYPTED_PASSWORD field in the connection properties. You can enable catalog encryption or only password encryption.

When a CreateConnection request arrives containing a password, the Data Catalog first encrypts the password using your KMS key. It then encrypts the whole connection object again if catalog encryption is also enabled.

This encryption requires that you set KMS key permissions to enable or restrict access on the password key according to your security requirements. For example, you might want only administrators to have decrypt permission on the password key.

Members
AwsKmsKeyId
Type: string

A KMS key that is used to encrypt the connection password.

If connection password protection is enabled, the caller of CreateConnection and UpdateConnection needs at least kms:Encrypt permission on the specified KMS key, to encrypt passwords before storing them in the Data Catalog.

You can set the decrypt permission to enable or restrict access on the password key according to your security requirements.

ReturnConnectionPasswordEncrypted
Required: Yes
Type: boolean

When the ReturnConnectionPasswordEncrypted flag is set to "true", passwords remain encrypted in the responses of GetConnection and GetConnections. This encryption takes effect independently from catalog encryption.

ConnectionsList

Description

Specifies the connections used by a job.

Members
Connections
Type: Array of strings

A list of connections used by the job.

ConnectorDataSource

Description

Specifies a source generated with standard connection options.

Members
ConnectionType
Required: Yes
Type: string

The connectionType, as provided to the underlying Glue library. This node type supports the following connection types:

  • opensearch

  • azuresql

  • azurecosmos

  • bigquery

  • saphana

  • teradata

  • vertica

Data
Required: Yes
Type: Associative array of custom strings keys (GenericString) to strings

A map specifying connection options for the node. You can find standard connection options for the corresponding connection type in the Connection parameters section of the Glue documentation.

Name
Required: Yes
Type: string

The name of this source node.

OutputSchemas
Type: Array of GlueSchema structures

Specifies the data schema for this source.

ConnectorDataTarget

Description

Specifies a target generated with standard connection options.

Members
ConnectionType
Required: Yes
Type: string

The connectionType, as provided to the underlying Glue library. This node type supports the following connection types:

  • opensearch

  • azuresql

  • azurecosmos

  • bigquery

  • saphana

  • teradata

  • vertica

Data
Required: Yes
Type: Associative array of custom strings keys (GenericString) to strings

A map specifying connection options for the node. You can find standard connection options for the corresponding connection type in the Connection parameters section of the Glue documentation.

Inputs
Type: Array of strings

The nodes that are inputs to the data target.

Name
Required: Yes
Type: string

The name of this target node.

Crawl

Description

The details of a crawl in the workflow.

Members
CompletedOn
Type: timestamp (string|DateTime or anything parsable by strtotime)

The date and time on which the crawl completed.

ErrorMessage
Type: string

The error message associated with the crawl.

LogGroup
Type: string

The log group associated with the crawl.

LogStream
Type: string

The log stream associated with the crawl.

StartedOn
Type: timestamp (string|DateTime or anything parsable by strtotime)

The date and time on which the crawl started.

State
Type: string

The state of the crawler.

Crawler

Description

Specifies a crawler program that examines a data source and uses classifiers to try to determine its schema. If successful, the crawler records metadata concerning the data source in the Glue Data Catalog.

Members
Classifiers
Type: Array of strings

A list of UTF-8 strings that specify the custom classifiers that are associated with the crawler.

Configuration
Type: string

Crawler configuration information. This versioned JSON string allows users to specify aspects of a crawler's behavior. For more information, see Setting crawler configuration options.

CrawlElapsedTime
Type: long (int|float)

If the crawler is running, contains the total time elapsed since the last crawl began.

CrawlerSecurityConfiguration
Type: string

The name of the SecurityConfiguration structure to be used by this crawler.

CreationTime
Type: timestamp (string|DateTime or anything parsable by strtotime)

The time that the crawler was created.

DatabaseName
Type: string

The name of the database in which the crawler's output is stored.

Description
Type: string

A description of the crawler.

LakeFormationConfiguration
Type: LakeFormationConfiguration structure

Specifies whether the crawler should use Lake Formation credentials instead of the IAM role credentials.

LastCrawl
Type: LastCrawlInfo structure

The status of the last crawl, and potentially error information if an error occurred.

LastUpdated
Type: timestamp (string|DateTime or anything parsable by strtotime)

The time that the crawler was last updated.

LineageConfiguration
Type: LineageConfiguration structure

A configuration that specifies whether data lineage is enabled for the crawler.

Name
Type: string

The name of the crawler.

RecrawlPolicy
Type: RecrawlPolicy structure

A policy that specifies whether to crawl the entire dataset again, or to crawl only folders that were added since the last crawler run.

Role
Type: string

The Amazon Resource Name (ARN) of an IAM role that's used to access customer resources, such as Amazon Simple Storage Service (Amazon S3) data.

Schedule
Type: Schedule structure

For scheduled crawlers, the schedule when the crawler runs.

SchemaChangePolicy
Type: SchemaChangePolicy structure

The policy that specifies update and delete behaviors for the crawler.

State
Type: string

Indicates whether the crawler is running, or whether a run is pending.

TablePrefix
Type: string

The prefix added to the names of tables that are created.

Targets
Type: CrawlerTargets structure

A collection of targets to crawl.

Version
Type: long (int|float)

The version of the crawler.

CrawlerHistory

Description

Contains the information for a run of a crawler.

Members
CrawlId
Type: string

A UUID identifier for each crawl.

DPUHour
Type: double

The number of data processing unit (DPU) hours used for the crawl.

EndTime
Type: timestamp (string|DateTime or anything parsable by strtotime)

The date and time on which the crawl ended.

ErrorMessage
Type: string

If an error occurred, the error message associated with the crawl.

LogGroup
Type: string

The log group associated with the crawl.

LogStream
Type: string

The log stream associated with the crawl.

MessagePrefix
Type: string

The prefix for a CloudWatch message about this crawl.

StartTime
Type: timestamp (string|DateTime or anything parsable by strtotime)

The date and time on which the crawl started.

State
Type: string

The state of the crawl.

Summary
Type: string

A run summary for the specific crawl in JSON. Contains the catalog tables and partitions that were added, updated, or deleted.

CrawlerMetrics

Description

Metrics for a specified crawler.

Members
CrawlerName
Type: string

The name of the crawler.

LastRuntimeSeconds
Type: double

The duration of the crawler's most recent run, in seconds.

MedianRuntimeSeconds
Type: double

The median duration of this crawler's runs, in seconds.

StillEstimating
Type: boolean

True if the crawler is still estimating how long it will take to complete this run.

TablesCreated
Type: int

The number of tables created by this crawler.

TablesDeleted
Type: int

The number of tables deleted by this crawler.

TablesUpdated
Type: int

The number of tables updated by this crawler.

TimeLeftSeconds
Type: double

The estimated time left to complete a running crawl.

CrawlerNodeDetails

Description

The details of a Crawler node present in the workflow.

Members
Crawls
Type: Array of Crawl structures

A list of crawls represented by the crawl node.

CrawlerNotRunningException

Description

The specified crawler is not running.

Members
Message
Type: string

A message describing the problem.

CrawlerRunningException

Description

The operation cannot be performed because the crawler is already running.

Members
Message
Type: string

A message describing the problem.

CrawlerStoppingException

Description

The specified crawler is stopping.

Members
Message
Type: string

A message describing the problem.

CrawlerTargets

Description

Specifies data stores to crawl.

Members
CatalogTargets
Type: Array of CatalogTarget structures

Specifies Glue Data Catalog targets.

DeltaTargets
Type: Array of DeltaTarget structures

Specifies Delta data store targets.

DynamoDBTargets
Type: Array of DynamoDBTarget structures

Specifies Amazon DynamoDB targets.

HudiTargets
Type: Array of HudiTarget structures

Specifies Apache Hudi data store targets.

IcebergTargets
Type: Array of IcebergTarget structures

Specifies Apache Iceberg data store targets.

JdbcTargets
Type: Array of JdbcTarget structures

Specifies JDBC targets.

MongoDBTargets
Type: Array of MongoDBTarget structures

Specifies Amazon DocumentDB or MongoDB targets.

S3Targets
Type: Array of S3Target structures

Specifies Amazon Simple Storage Service (Amazon S3) targets.

CrawlsFilter

Description

A list of fields, comparators, and values that you can use to filter the crawler runs for a specified crawler.

Members
FieldName
Type: string

A key used to filter the crawler runs for a specified crawler. Valid values for each of the field names are:

  • CRAWL_ID: A string representing the UUID identifier for a crawl.

  • STATE: A string representing the state of the crawl.

  • START_TIME and END_TIME: The epoch timestamp in milliseconds.

  • DPU_HOUR: The number of data processing unit (DPU) hours used for the crawl.

FieldValue
Type: string

The value provided for comparison on the crawl field.

FilterOperator
Type: string

A defined comparator that operates on the value. The available operators are:

  • GT: Greater than.

  • GE: Greater than or equal to.

  • LT: Less than.

  • LE: Less than or equal to.

  • EQ: Equal to.

  • NE: Not equal to.
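The field names and operators listed above can be combined into filter entries. The following is a minimal sketch, not part of the SDK: the helper name is hypothetical, and the STATE value COMPLETED is an assumed example of a crawl state.

```php
<?php
// Hypothetical helper: returns one CrawlsFilter-shaped array and rejects
// field names or operators that are not listed in this section.
function makeCrawlsFilter(string $fieldName, string $operator, string $value): array
{
    $fields = ['CRAWL_ID', 'STATE', 'START_TIME', 'END_TIME', 'DPU_HOUR'];
    $operators = ['GT', 'GE', 'LT', 'LE', 'EQ', 'NE'];

    if (!in_array($fieldName, $fields, true)) {
        throw new InvalidArgumentException("Unknown field name: $fieldName");
    }
    if (!in_array($operator, $operators, true)) {
        throw new InvalidArgumentException("Unknown filter operator: $operator");
    }

    // FieldValue is always a string, even for numeric comparisons.
    return [
        'FieldName' => $fieldName,
        'FilterOperator' => $operator,
        'FieldValue' => $value,
    ];
}

// START_TIME is compared as an epoch timestamp in milliseconds.
$startMs = (new DateTime('2024-01-01T00:00:00Z'))->getTimestamp() * 1000;

$filters = [
    makeCrawlsFilter('START_TIME', 'GE', (string) $startMs),
    makeCrawlsFilter('STATE', 'EQ', 'COMPLETED'), // assumed state value
];
```

A list like $filters would typically be passed as the Filters parameter of ListCrawls, together with the crawler name.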

CreateCsvClassifierRequest

Description

Specifies a custom CSV classifier for CreateClassifier to create.

Members
AllowSingleColumn
Type: boolean

Enables the processing of files that contain only one column.

ContainsHeader
Type: string

Indicates whether the CSV file contains a header.

CustomDatatypeConfigured
Type: boolean

Enables the configuration of custom datatypes.

CustomDatatypes
Type: Array of strings

Creates a list of supported custom datatypes.

Delimiter
Type: string

A custom symbol to denote what separates each column entry in the row.

DisableValueTrimming
Type: boolean

Specifies not to trim values before identifying the type of column values. The default value is true.

Header
Type: Array of strings

A list of strings representing column names.

Name
Required: Yes
Type: string

The name of the classifier.

QuoteSymbol
Type: string

A custom symbol to denote what combines content into a single column value. Must be different from the column delimiter.

Serde
Type: string

Sets the SerDe for processing CSV in the classifier, which will be applied in the Data Catalog. Valid values are OpenCSVSerDe, LazySimpleSerDe, and None. You can specify the None value when you want the crawler to do the detection.
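For orientation, the members above might be combined as in the sketch below. The classifier name and column names are placeholders, and the ContainsHeader value PRESENT is an assumption about the accepted values for that string member.

```php
<?php
// Sketch of a CreateCsvClassifierRequest-shaped array (placeholder values).
$csvClassifier = [
    'Name' => 'example-csv-classifier',   // placeholder name
    'Delimiter' => ',',
    'QuoteSymbol' => '"',
    'ContainsHeader' => 'PRESENT',        // assumed value
    'Header' => ['id', 'name', 'created_at'],
    'AllowSingleColumn' => false,
    'DisableValueTrimming' => false,
    'Serde' => 'OpenCSVSerDe',            // or LazySimpleSerDe, or None
];

// The quote symbol must differ from the column delimiter.
if ($csvClassifier['QuoteSymbol'] === $csvClassifier['Delimiter']) {
    throw new LogicException('QuoteSymbol must be different from Delimiter.');
}

// CreateClassifier takes the structure under a CsvClassifier key.
$params = ['CsvClassifier' => $csvClassifier];
```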

CreateGrokClassifierRequest

Description

Specifies a grok classifier for CreateClassifier to create.

Members
Classification
Required: Yes
Type: string

An identifier of the data format that the classifier matches, such as Twitter, JSON, Omniture logs, Amazon CloudWatch Logs, and so on.

CustomPatterns
Type: string

Optional custom grok patterns used by this classifier.

GrokPattern
Required: Yes
Type: string

The grok pattern used by this classifier.

Name
Required: Yes
Type: string

The name of the new classifier.

CreateJsonClassifierRequest

Description

Specifies a JSON classifier for CreateClassifier to create.

Members
JsonPath
Required: Yes
Type: string

A JsonPath string defining the JSON data for the classifier to classify. Glue supports a subset of JsonPath, as described in Writing JsonPath Custom Classifiers.

Name
Required: Yes
Type: string

The name of the classifier.

CreateXMLClassifierRequest

Description

Specifies an XML classifier for CreateClassifier to create.

Members
Classification
Required: Yes
Type: string

An identifier of the data format that the classifier matches.

Name
Required: Yes
Type: string

The name of the classifier.

RowTag
Type: string

The XML tag designating the element that contains each record in an XML document being parsed. This can't identify a self-closing element (closed by />). An empty row element that contains only attributes can be parsed as long as it ends with a closing tag (for example, <row item_a="A" item_b="B"></row> is okay, but <row item_a="A" item_b="B" /> is not).

CsvClassifier

Description

A classifier for custom CSV content.

Members
AllowSingleColumn
Type: boolean

Enables the processing of files that contain only one column.

ContainsHeader
Type: string

Indicates whether the CSV file contains a header.

CreationTime
Type: timestamp (string|DateTime or anything parsable by strtotime)

The time that this classifier was registered.

CustomDatatypeConfigured
Type: boolean

Enables the custom datatype to be configured.

CustomDatatypes
Type: Array of strings

A list of custom datatypes including "BINARY", "BOOLEAN", "DATE", "DECIMAL", "DOUBLE", "FLOAT", "INT", "LONG", "SHORT", "STRING", "TIMESTAMP".

Delimiter
Type: string

A custom symbol to denote what separates each column entry in the row.

DisableValueTrimming
Type: boolean

Specifies not to trim values before identifying the type of column values. The default value is true.

Header
Type: Array of strings

A list of strings representing column names.

LastUpdated
Type: timestamp (string|DateTime or anything parsable by strtotime)

The time that this classifier was last updated.

Name
Required: Yes
Type: string

The name of the classifier.

QuoteSymbol
Type: string

A custom symbol to denote what combines content into a single column value. It must be different from the column delimiter.

Serde
Type: string

Sets the SerDe for processing CSV in the classifier, which will be applied in the Data Catalog. Valid values are OpenCSVSerDe, LazySimpleSerDe, and None. You can specify the None value when you want the crawler to do the detection.

Version
Type: long (int|float)

The version of this classifier.

CustomCode

Description

Specifies a transform that uses custom code you provide to perform the data transformation. The output is a collection of DynamicFrames.

Members
ClassName
Required: Yes
Type: string

The name defined for the custom code node class.

Code
Required: Yes
Type: string

The custom code that is used to perform the data transformation.

Inputs
Required: Yes
Type: Array of strings

The data inputs identified by their node names.

Name
Required: Yes
Type: string

The name of the transform node.

OutputSchemas
Type: Array of GlueSchema structures

Specifies the data schema for the custom code transform.

CustomEntityType

Description

An object representing a custom pattern for detecting sensitive data across the columns and rows of your structured data.

Members
ContextWords
Type: Array of strings

A list of context words. If none of these context words are found within the vicinity of the regular expression, the data will not be detected as sensitive data.

If no context words are passed, only the regular expression is checked.

Name
Required: Yes
Type: string

A name for the custom pattern that allows it to be retrieved or deleted later. This name must be unique per Amazon Web Services account.

RegexString
Required: Yes
Type: string

A regular expression string that is used for detecting sensitive data in a custom pattern.

DQResultsPublishingOptions

Description

Options to configure how your data quality evaluation results are published.

Members
CloudWatchMetricsEnabled
Type: boolean

Enable metrics for your data quality results.

EvaluationContext
Type: string

The context of the evaluation.

ResultsPublishingEnabled
Type: boolean

Enable publishing for your data quality results.

ResultsS3Prefix
Type: string

The Amazon S3 prefix prepended to the results.

DQStopJobOnFailureOptions

Description

Options to configure how your job will stop if your data quality evaluation fails.

Members
StopJobOnFailureTiming
Type: string

When to stop the job if your data quality evaluation fails. Options are Immediate or AfterDataLoad.

DataCatalogEncryptionSettings

Description

Contains configuration information for maintaining Data Catalog security.

Members
ConnectionPasswordEncryption
Type: ConnectionPasswordEncryption structure

When connection password protection is enabled, the Data Catalog uses a customer-provided key to encrypt the password as part of CreateConnection or UpdateConnection and store it in the ENCRYPTED_PASSWORD field in the connection properties. You can enable catalog encryption or only password encryption.

EncryptionAtRest
Type: EncryptionAtRest structure

Specifies the encryption-at-rest configuration for the Data Catalog.

DataLakePrincipal

Description

The Lake Formation principal.

Members
DataLakePrincipalIdentifier
Type: string

An identifier for the Lake Formation principal.

DataQualityAnalyzerResult

Description

Describes the result of the evaluation of a data quality analyzer.

Members
Description
Type: string

A description of the data quality analyzer.

EvaluatedMetrics
Type: Associative array of custom strings keys (NameString) to doubles

A map of metrics associated with the evaluation of the analyzer.

EvaluationMessage
Type: string

An evaluation message.

Name
Type: string

The name of the data quality analyzer.

DataQualityEvaluationRunAdditionalRunOptions

Description

Additional run options you can specify for an evaluation run.

Members
CloudWatchMetricsEnabled
Type: boolean

Whether or not to enable CloudWatch metrics.

ResultsS3Prefix
Type: string

Prefix for Amazon S3 to store results.

DataQualityMetricValues

Description

Describes the data quality metric value according to the analysis of historical data.

Members
ActualValue
Type: double

The actual value of the data quality metric.

ExpectedValue
Type: double

The expected value of the data quality metric according to the analysis of historical data.

LowerLimit
Type: double

The lower limit of the data quality metric value according to the analysis of historical data.

UpperLimit
Type: double

The upper limit of the data quality metric value according to the analysis of historical data.

DataQualityObservation

Description

Describes the observation generated after evaluating the rules and analyzers.

Members
Description
Type: string

A description of the data quality observation.

MetricBasedObservation
Type: MetricBasedObservation structure

An object of type MetricBasedObservation representing the observation that is based on evaluated data quality metrics.

DataQualityResult

Description

Describes a data quality result.

Members
AnalyzerResults
Type: Array of DataQualityAnalyzerResult structures

A list of DataQualityAnalyzerResult objects representing the results for each analyzer.

CompletedOn
Type: timestamp (string|DateTime or anything parsable by strtotime)

The date and time when this data quality run completed.

DataSource
Type: DataSource structure

The table associated with the data quality result, if any.

EvaluationContext
Type: string

In a Glue Studio job, each node on the canvas is typically assigned a name, and data quality nodes have names as well. When a job contains multiple data quality nodes, the evaluationContext can differentiate them.

JobName
Type: string

The job name associated with the data quality result, if any.

JobRunId
Type: string

The job run ID associated with the data quality result, if any.

Observations
Type: Array of DataQualityObservation structures

A list of DataQualityObservation objects representing the observations generated after evaluating the rules and analyzers.

ResultId
Type: string

A unique result ID for the data quality result.

RuleResults
Type: Array of DataQualityRuleResult structures

A list of DataQualityRuleResult objects representing the results for each rule.

RulesetEvaluationRunId
Type: string

The unique run ID for the ruleset evaluation for this data quality result.

RulesetName
Type: string

The name of the ruleset associated with the data quality result.

Score
Type: double

An aggregate data quality score. Represents the ratio of rules that passed to the total number of rules.

StartedOn
Type: timestamp (string|DateTime or anything parsable by strtotime)

The date and time when this data quality run started.
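The Score member is described above as the ratio of rules that passed to the total number of rules. A minimal sketch of that ratio, computed from a RuleResults-shaped list, is shown below. The helper name is hypothetical, and the sketch assumes that a passing rule carries the Result value PASS; how the service treats any other status is not specified here.

```php
<?php
// Hypothetical helper: ratio of passing rules to all rules, or null when
// there are no rules to evaluate.
function dataQualityPassRatio(array $ruleResults): ?float
{
    $total = count($ruleResults);
    if ($total === 0) {
        return null;
    }

    $passed = 0;
    foreach ($ruleResults as $rule) {
        // Assumption: a passing rule is reported with Result === 'PASS'.
        if (($rule['Result'] ?? null) === 'PASS') {
            $passed++;
        }
    }
    return (float) ($passed / $total);
}

// Illustrative data only; rule names are placeholders.
$ruleResults = [
    ['Name' => 'Rule_1', 'Result' => 'PASS'],
    ['Name' => 'Rule_2', 'Result' => 'FAIL'],
    ['Name' => 'Rule_3', 'Result' => 'PASS'],
    ['Name' => 'Rule_4', 'Result' => 'PASS'],
];

$score = dataQualityPassRatio($ruleResults); // 3 of 4 rules passed
```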

DataQualityResultDescription

Description

Describes a data quality result.

Members
DataSource
Type: DataSource structure

The table name associated with the data quality result.

JobName
Type: string

The job name associated with the data quality result.

JobRunId
Type: string

The job run ID associated with the data quality result.

ResultId
Type: string

The unique result ID for this data quality result.

StartedOn
Type: timestamp (string|DateTime or anything parsable by strtotime)

The time that the run started for this data quality result.

DataQualityResultFilterCriteria

Description

Criteria used to return data quality results.

Members
DataSource
Type: DataSource structure

Filter results by the specified data source. For example, retrieving all results for a Glue table.

JobName
Type: string

Filter results by the specified job name.

JobRunId
Type: string

Filter results by the specified job run ID.

StartedAfter
Type: timestamp (string|DateTime or anything parsable by strtotime)

Filter results by runs that started after this time.

StartedBefore
Type: timestamp (string|DateTime or anything parsable by strtotime)

Filter results by runs that started before this time.
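The criteria above might be assembled as in the following sketch. The database and table names are placeholders, and the DatabaseName and TableName keys inside GlueTable are assumptions, since the GlueTable structure is documented elsewhere.

```php
<?php
// Sketch of a DataQualityResultFilterCriteria-shaped array (placeholder values).
$filter = [
    'DataSource' => [
        'GlueTable' => [
            'DatabaseName' => 'example_db',    // assumed member name
            'TableName' => 'example_table',    // assumed member name
        ],
    ],
    // Timestamp members accept DateTime objects or strtotime-parsable strings.
    'StartedAfter' => new DateTime('2024-01-01T00:00:00Z'),
    'StartedBefore' => new DateTime('2024-02-01T00:00:00Z'),
];

// Guard against an empty time window before sending the request.
if ($filter['StartedAfter'] >= $filter['StartedBefore']) {
    throw new LogicException('StartedAfter must be earlier than StartedBefore.');
}

// ListDataQualityResults takes the criteria under a Filter key.
$params = ['Filter' => $filter];
```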

DataQualityRuleRecommendationRunDescription

Description

Describes the result of a data quality rule recommendation run.

Members
DataSource
Type: DataSource structure

The data source (Glue table) associated with the recommendation run.

RunId
Type: string

The unique run identifier associated with this run.

StartedOn
Type: timestamp (string|DateTime or anything parsable by strtotime)

The date and time when this run started.

Status
Type: string

The status for this run.

DataQualityRuleRecommendationRunFilter

Description

A filter for listing data quality recommendation runs.

Members
DataSource
Required: Yes
Type: DataSource structure

Filter based on a specified data source (Glue table).

StartedAfter
Type: timestamp (string|DateTime or anything parsable by strtotime)

Filter for results that started after the provided time.

StartedBefore
Type: timestamp (string|DateTime or anything parsable by strtotime)

Filter for results that started before the provided time.

DataQualityRuleResult

Description

Describes the result of the evaluation of a data quality rule.

Members
Description
Type: string

A description of the data quality rule.

EvaluatedMetrics
Type: Associative array of custom strings keys (NameString) to doubles

A map of metrics associated with the evaluation of the rule.

EvaluationMessage
Type: string

An evaluation message.

Name
Type: string

The name of the data quality rule.

Result
Type: string

A pass or fail status for the rule.

DataQualityRulesetEvaluationRunDescription

Description

Describes the result of a data quality ruleset evaluation run.

Members
DataSource
Type: DataSource structure

The data source (a Glue table) associated with the run.

RunId
Type: string

The unique run identifier associated with this run.

StartedOn
Type: timestamp (string|DateTime or anything parsable by strtotime)

The date and time when the run started.

Status
Type: string

The status for this run.

DataQualityRulesetEvaluationRunFilter

Description

The filter criteria.

Members
DataSource
Required: Yes
Type: DataSource structure

Filter based on a data source (a Glue table) associated with the run.

StartedAfter
Type: timestamp (string|DateTime or anything parsable by strtotime)

Filter results by runs that started after this time.

StartedBefore
Type: timestamp (string|DateTime or anything parsable by strtotime)

Filter results by runs that started before this time.

DataQualityRulesetFilterCriteria

Description

The criteria used to filter data quality rulesets.

Members
CreatedAfter
Type: timestamp (string|DateTime or anything parsable by strtotime)

Filter on rulesets created after this date.

CreatedBefore
Type: timestamp (string|DateTime or anything parsable by strtotime)

Filter on rulesets created before this date.

Description
Type: string

The description of the ruleset filter criteria.

LastModifiedAfter
Type: timestamp (string|DateTime or anything parsable by strtotime)

Filter on rulesets last modified after this date.

LastModifiedBefore
Type: timestamp (string|DateTime or anything parsable by strtotime)

Filter on rulesets last modified before this date.

Name
Type: string

The name of the ruleset filter criteria.

TargetTable
Type: DataQualityTargetTable structure

The name and database name of the target table.

DataQualityRulesetListDetails

Description

Describes a data quality ruleset returned by GetDataQualityRuleset.

Members
CreatedOn
Type: timestamp (string|DateTime or anything parsable by strtotime)

The date and time the data quality ruleset was created.

Description
Type: string

A description of the data quality ruleset.

LastModifiedOn
Type: timestamp (string|DateTime or anything parsable by strtotime)

The date and time the data quality ruleset was last modified.

Name
Type: string

The name of the data quality ruleset.

RecommendationRunId
Type: string

When a ruleset was created from a recommendation run, this run ID is generated to link the two together.

RuleCount
Type: int

The number of rules in the ruleset.

TargetTable
Type: DataQualityTargetTable structure

An object representing a Glue table.

DataQualityTargetTable

Description

An object representing a Glue table.

Members
CatalogId
Type: string

The catalog ID where the Glue table exists.

DatabaseName
Required: Yes
Type: string

The name of the database where the Glue table exists.

TableName
Required: Yes
Type: string

The name of the Glue table.

DataSource

Description

A data source (a Glue table) for which you want data quality results.

Members
GlueTable
Required: Yes
Type: GlueTable structure

A Glue table.

Database

Description

The Database object represents a logical grouping of tables that might reside in a Hive metastore or an RDBMS.

Members
CatalogId
Type: string

The ID of the Data Catalog in which the database resides.

CreateTableDefaultPermissions
Type: Array of PrincipalPermissions structures

Creates a set of default permissions on the table for principals. Used by Lake Formation. Not used in the normal course of Glue operations.

CreateTime
Type: timestamp (string|DateTime or anything parsable by strtotime)

The time at which the metadata database was created in the catalog.

Description
Type: string

A description of the database.

FederatedDatabase
Type: FederatedDatabase structure

A FederatedDatabase structure that references an entity outside the Glue Data Catalog.

LocationUri
Type: string

The location of the database (for example, an HDFS path).

Name
Required: Yes
Type: string

The name of the database. For Hive compatibility, this is folded to lowercase when it is stored.

Parameters
Type: Associative array of custom strings keys (KeyString) to strings

These key-value pairs define parameters and properties of the database.

TargetDatabase
Type: DatabaseIdentifier structure

A DatabaseIdentifier structure that describes a target database for resource linking.

DatabaseIdentifier

Description

A structure that describes a target database for resource linking.

Members
CatalogId
Type: string

The ID of the Data Catalog in which the database resides.

DatabaseName
Type: string

The name of the catalog database.

Region
Type: string

Region of the target database.

DatabaseInput

Description

The structure used to create or update a database.

Members
CreateTableDefaultPermissions
Type: Array of PrincipalPermissions structures

Creates a set of default permissions on the table for principals. Used by Lake Formation. Not used in the normal course of Glue operations.

Description
Type: string

A description of the database.

FederatedDatabase
Type: FederatedDatabase structure

A FederatedDatabase structure that references an entity outside the Glue Data Catalog.

LocationUri
Type: string

The location of the database (for example, an HDFS path).

Name
Required: Yes
Type: string

The name of the database. For Hive compatibility, this is folded to lowercase when it is stored.

Parameters
Type: Associative array of custom strings keys (KeyString) to strings

These key-value pairs define parameters and properties of the database.

TargetDatabase
Type: DatabaseIdentifier structure

A DatabaseIdentifier structure that describes a target database for resource linking.
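A DatabaseInput value might be built as in the sketch below. The helper name and all values are hypothetical. Lowercasing the name locally is an illustrative choice only: it mirrors the lowercase folding described for Name so that the stored name is predictable.

```php
<?php
// Hypothetical helper: builds a DatabaseInput-shaped array.
function buildDatabaseInput(string $name, string $description = '', array $parameters = []): array
{
    // The service folds names to lowercase when storing them; doing the same
    // here keeps the local value consistent with what will be stored.
    $input = ['Name' => strtolower($name)];

    if ($description !== '') {
        $input['Description'] = $description;
    }
    if ($parameters) {
        // Parameters is a map of string keys to string values.
        $input['Parameters'] = array_map('strval', $parameters);
    }
    return $input;
}

$databaseInput = buildDatabaseInput('Sales_Analytics', 'Example database', ['retention_days' => 30]);

// CreateDatabase takes the structure under a DatabaseInput key.
$params = ['DatabaseInput' => $databaseInput];
```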

Datatype

Description

A structure representing the datatype of the value.

Members
Id
Required: Yes
Type: string

The datatype of the value.

Label
Required: Yes
Type: string

A label assigned to the datatype.

DateColumnStatisticsData

Description

Defines column statistics supported for timestamp data columns.

Members
MaximumValue
Type: timestamp (string|DateTime or anything parsable by strtotime)

The highest value in the column.

MinimumValue
Type: timestamp (string|DateTime or anything parsable by strtotime)

The lowest value in the column.

NumberOfDistinctValues
Required: Yes
Type: long (int|float)

The number of distinct values in a column.

NumberOfNulls
Required: Yes
Type: long (int|float)

The number of null values in the column.

DecimalColumnStatisticsData

Description

Defines column statistics supported for fixed-point number data columns.

Members
MaximumValue
Type: DecimalNumber structure

The highest value in the column.

MinimumValue
Type: DecimalNumber structure

The lowest value in the column.

NumberOfDistinctValues
Required: Yes
Type: long (int|float)

The number of distinct values in a column.

NumberOfNulls
Required: Yes
Type: long (int|float)

The number of null values in the column.

DecimalNumber

Description

Contains a numeric value in decimal format.

Members
Scale
Required: Yes
Type: int

The scale that determines where the decimal point falls in the unscaled value.

UnscaledValue
Required: Yes
Type: blob (string|resource|Psr\Http\Message\StreamInterface)

The unscaled numeric value.
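To illustrate how Scale positions the decimal point, the sketch below applies a scale to an unscaled value that has already been decoded into a string of decimal digits. Decoding the UnscaledValue blob itself is outside this sketch, and the helper name is hypothetical.

```php
<?php
// Hypothetical helper: places the decimal point in an already-decoded
// unscaled value. The result equals unscaled x 10^(-scale).
function applyDecimalScale(string $unscaledDigits, int $scale): string
{
    $negative = strncmp($unscaledDigits, '-', 1) === 0;
    $digits = $negative ? substr($unscaledDigits, 1) : $unscaledDigits;

    if ($scale <= 0) {
        // A zero or negative scale appends zeros instead of adding a point.
        $result = $digits . str_repeat('0', -$scale);
    } else {
        // Left-pad so at least one digit remains before the decimal point.
        $digits = str_pad($digits, $scale + 1, '0', STR_PAD_LEFT);
        $result = substr($digits, 0, -$scale) . '.' . substr($digits, -$scale);
    }
    return ($negative ? '-' : '') . $result;
}

$value = applyDecimalScale('12345', 2); // "123.45"
```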

DeltaTarget

Description

Specifies a Delta data store to crawl one or more Delta tables.

Members
ConnectionName
Type: string

The name of the connection to use to connect to the Delta table target.

CreateNativeDeltaTable
Type: boolean

Specifies whether the crawler will create native tables, to allow integration with query engines that support querying of the Delta transaction log directly.

DeltaTables
Type: Array of strings

A list of the Amazon S3 paths to the Delta tables.

WriteManifest
Type: boolean

Specifies whether to write the manifest files to the Delta table path.

DevEndpoint

Description

A development endpoint where a developer can remotely debug extract, transform, and load (ETL) scripts.

Members
Arguments
Type: Associative array of custom strings keys (GenericString) to strings

A map of arguments used to configure the DevEndpoint.

Valid arguments are:

  • "--enable-glue-datacatalog": ""

You can specify a version of Python support for development endpoints by using the Arguments parameter in the CreateDevEndpoint or UpdateDevEndpoint APIs. If no arguments are provided, the version defaults to Python 2.

AvailabilityZone
Type: string

The Amazon Web Services Availability Zone where this DevEndpoint is located.

CreatedTimestamp
Type: timestamp (string|DateTime or anything parsable by strtotime)

The point in time at which this DevEndpoint was created.

EndpointName
Type: string

The name of the DevEndpoint.

ExtraJarsS3Path
Type: string

The path to one or more Java .jar files in an S3 bucket that should be loaded in your DevEndpoint.

You can only use pure Java/Scala libraries with a DevEndpoint.

ExtraPythonLibsS3Path
Type: string

The paths to one or more Python libraries in an Amazon S3 bucket that should be loaded in your DevEndpoint. Multiple values must be complete paths separated by a comma.

You can only use pure Python libraries with a DevEndpoint. Libraries that rely on C extensions, such as the pandas Python data analysis library, are not currently supported.

FailureReason
Type: string

The reason for a current failure in this DevEndpoint.

GlueVersion
Type: string

Glue version determines the versions of Apache Spark and Python that Glue supports. The Python version indicates the version supported for running your ETL scripts on development endpoints.

For more information about the available Glue versions and corresponding Spark and Python versions, see Glue version in the developer guide.

Development endpoints that are created without specifying a Glue version default to Glue 0.9.

You can specify a version of Python support for development endpoints by using the Arguments parameter in the CreateDevEndpoint or UpdateDevEndpoint APIs. If no arguments are provided, the version defaults to Python 2.

LastModifiedTimestamp
Type: timestamp (string|DateTime or anything parsable by strtotime)

The point in time at which this DevEndpoint was last modified.

LastUpdateStatus
Type: string

The status of the last update.

NumberOfNodes
Type: int

The number of Glue Data Processing Units (DPUs) allocated to this DevEndpoint.

NumberOfWorkers
Type: int

The number of workers of a defined workerType that are allocated to the development endpoint.

The maximum number of workers you can define is 299 for G.1X, and 149 for G.2X.

PrivateAddress
Type: string

A private IP address to access the DevEndpoint within a VPC if the DevEndpoint is created within one. The PrivateAddress field is present only when you create the DevEndpoint within your VPC.

PublicAddress
Type: string

The public IP address used by this DevEndpoint. The PublicAddress field is present only when you create a non-virtual private cloud (VPC) DevEndpoint.

PublicKey
Type: string

The public key to be used by this DevEndpoint for authentication. This attribute is provided for backward compatibility; the recommended attribute is PublicKeys.

PublicKeys
Type: Array of strings

A list of public keys to be used by the DevEndpoints for authentication. Using this attribute is preferred over a single public key because the public keys allow you to have a different private key per client.

If you previously created an endpoint with a public key, you must remove that key to be able to set a list of public keys. Call the UpdateDevEndpoint API operation with the public key content in the deletePublicKeys attribute, and the list of new keys in the addPublicKeys attribute.
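The migration from a single public key to a list of keys might look like the following sketch of UpdateDevEndpoint parameters. The helper name, endpoint name, and key strings are placeholders, and the capitalized member names follow this SDK's parameter style.

```php
<?php
// Hypothetical helper: parameters that remove the old single key and add the
// new list of keys in one update.
function buildPublicKeyMigration(string $endpointName, string $oldKey, array $newKeys): array
{
    return [
        'EndpointName' => $endpointName,
        'DeletePublicKeys' => [$oldKey],
        'AddPublicKeys' => array_values($newKeys),
    ];
}

// Placeholder key material, for illustration only.
$params = buildPublicKeyMigration(
    'example-endpoint',
    'ssh-rsa AAAA...old',
    ['ssh-rsa AAAA...client1', 'ssh-rsa AAAA...client2']
);
```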

RoleArn
Type: string

The Amazon Resource Name (ARN) of the IAM role used in this DevEndpoint.

SecurityConfiguration
Type: string

The name of the SecurityConfiguration structure to be used with this DevEndpoint.

SecurityGroupIds
Type: Array of strings

A list of security group identifiers used in this DevEndpoint.

Status
Type: string

The current status of this DevEndpoint.

SubnetId
Type: string

The subnet ID for this DevEndpoint.

VpcId
Type: string

The ID of the virtual private cloud (VPC) used by this DevEndpoint.

WorkerType
Type: string

The type of predefined worker that is allocated to the development endpoint. Accepts a value of Standard, G.1X, or G.2X.

  • For the Standard worker type, each worker provides 4 vCPU, 16 GB of memory, a 50 GB disk, and 2 executors.

  • For the G.1X worker type, each worker maps to 1 DPU (4 vCPU, 16 GB of memory, 64 GB disk), and provides 1 executor per worker. We recommend this worker type for memory-intensive jobs.

  • For the G.2X worker type, each worker maps to 2 DPU (8 vCPU, 32 GB of memory, 128 GB disk), and provides 1 executor per worker. We recommend this worker type for memory-intensive jobs.

Known issue: when a development endpoint is created with the G.2X WorkerType configuration, the Spark drivers for the development endpoint will run on 4 vCPU, 16 GB of memory, and a 64 GB disk.
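The worker limits and DPU mappings stated for G.1X and G.2X can be checked before a request is sent. The sketch below is not part of the SDK: the constant and helper names are hypothetical, the lower bound of one worker is an added sanity check, and the Standard worker type is left out because no worker limit or DPU mapping is stated for it here.

```php
<?php
// Limits and DPU mappings as stated in this section for G.1X and G.2X.
const DEV_ENDPOINT_WORKER_LIMITS = [
    'G.1X' => ['maxWorkers' => 299, 'dpuPerWorker' => 1],
    'G.2X' => ['maxWorkers' => 149, 'dpuPerWorker' => 2],
];

// Hypothetical helper: validates the worker count and returns the total DPUs.
function devEndpointTotalDpu(string $workerType, int $numberOfWorkers): int
{
    if (!array_key_exists($workerType, DEV_ENDPOINT_WORKER_LIMITS)) {
        throw new InvalidArgumentException("No limits listed for worker type: $workerType");
    }

    $limits = DEV_ENDPOINT_WORKER_LIMITS[$workerType];
    if ($numberOfWorkers < 1 || $numberOfWorkers > $limits['maxWorkers']) {
        throw new OutOfRangeException(
            "NumberOfWorkers for $workerType must be between 1 and {$limits['maxWorkers']}"
        );
    }
    return $numberOfWorkers * $limits['dpuPerWorker'];
}

$dpus = devEndpointTotalDpu('G.2X', 10); // 10 workers at 2 DPU each
```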

YarnEndpointAddress
Type: string

The YARN endpoint address used by this DevEndpoint.

ZeppelinRemoteSparkInterpreterPort
Type: int

The Apache Zeppelin port for the remote Apache Spark interpreter.

DevEndpointCustomLibraries

Description

Custom libraries to be loaded into a development endpoint.

Members
ExtraJarsS3Path
Type: string

The path to one or more Java .jar files in an S3 bucket that should be loaded in your DevEndpoint.

You can only use pure Java/Scala libraries with a DevEndpoint.

ExtraPythonLibsS3Path
Type: string

The paths to one or more Python libraries in an Amazon Simple Storage Service (Amazon S3) bucket that should be loaded in your DevEndpoint. Multiple values must be complete paths separated by a comma.

You can only use pure Python libraries with a DevEndpoint. Libraries that rely on C extensions, such as the pandas Python data analysis library, are not currently supported.

DirectJDBCSource

Description

Specifies the direct JDBC source connection.

Members
ConnectionName
Required: Yes
Type: string

The connection name of the JDBC source.

ConnectionType
Required: Yes
Type: string

The connection type of the JDBC source.

Database
Required: Yes
Type: string

The database of the JDBC source connection.

Name
Required: Yes
Type: string

The name of the JDBC source connection.

RedshiftTmpDir
Type: string

The temp directory of the JDBC Redshift source.

Table
Required: Yes
Type: string

The table of the JDBC source connection.

DirectKafkaSource

Description

Specifies an Apache Kafka data store.

Members
DataPreviewOptions
Type: StreamingDataPreviewOptions structure

Specifies options related to data preview for viewing a sample of your data.

DetectSchema
Type: boolean

Whether to automatically determine the schema from the incoming data.

Name
Required: Yes
Type: string

The name of the data store.

StreamingOptions
Type: KafkaStreamingSourceOptions structure

Specifies the streaming options.

WindowSize
Type: int

The amount of time to spend processing each micro batch.

DirectKinesisSource

Description

Specifies a direct Amazon Kinesis data source.

Members
DataPreviewOptions
Type: StreamingDataPreviewOptions structure

Additional options for data preview.

DetectSchema
Type: boolean

Whether to automatically determine the schema from the incoming data.

Name
Required: Yes
Type: string

The name of the data source.

StreamingOptions
Type: KinesisStreamingSourceOptions structure

Additional options for the Kinesis streaming data source.

WindowSize
Type: int

The amount of time to spend processing each micro batch.

DirectSchemaChangePolicy

Description

A policy that specifies update behavior for the crawler.

Members
Database
Type: string

Specifies the database that the schema change policy applies to.

EnableUpdateCatalog
Type: boolean

Whether to use the specified update behavior when the crawler finds a changed schema.

Table
Type: string

Specifies the table in the database that the schema change policy applies to.

UpdateBehavior
Type: string

The update behavior when the crawler finds a changed schema.

DoubleColumnStatisticsData

Description

Defines column statistics supported for floating-point number data columns.

Members
MaximumValue
Type: double

The highest value in the column.

MinimumValue
Type: double

The lowest value in the column.

NumberOfDistinctValues
Required: Yes
Type: long (int|float)

The number of distinct values in a column.

NumberOfNulls
Required: Yes
Type: long (int|float)

The number of null values in the column.

DropDuplicates

Description

Specifies a transform that removes rows of repeating data from a data set.

Members
Columns
Type: Array of arrays of strings

The names of the columns to be merged or removed if repeating.

Inputs
Required: Yes
Type: Array of strings

The data inputs identified by their node names.

Name
Required: Yes
Type: string

The name of the transform node.

DropFields

Description

Specifies a transform that chooses the data property keys that you want to drop.

Members
Inputs
Required: Yes
Type: Array of strings

The data inputs identified by their node names.

Name
Required: Yes
Type: string

The name of the transform node.

Paths
Required: Yes
Type: Array of arrays of strings

A JSON path to a variable in the data structure.

DropNullFields

Description

Specifies a transform that removes columns from the dataset if all values in the column are 'null'. By default, Glue Studio will recognize null objects, but some values such as empty strings, strings that are "null", -1 integers or other placeholders such as zeros, are not automatically recognized as nulls.

Members
Inputs
Required: Yes
Type: Array of strings

The data inputs identified by their node names.

Name
Required: Yes
Type: string

The name of the transform node.

NullCheckBoxList
Type: NullCheckBoxList structure

A structure that represents whether certain values are recognized as null values for removal.

NullTextList
Type: Array of NullValueField structures

A structure that specifies a list of NullValueField structures that represent a custom null value such as zero or other value being used as a null placeholder unique to the dataset.

The DropNullFields transform removes custom null values only if both the value of the null placeholder and the datatype match the data.

DynamicTransform

Description

Specifies the set of parameters needed to perform the dynamic transform.

Members
FunctionName
Required: Yes
Type: string

Specifies the name of the function of the dynamic transform.

Inputs
Required: Yes
Type: Array of strings

Specifies the inputs for the dynamic transform that are required.

Name
Required: Yes
Type: string

Specifies the name of the dynamic transform.

OutputSchemas
Type: Array of GlueSchema structures

Specifies the data schema for the dynamic transform.

Parameters
Type: Array of TransformConfigParameter structures

Specifies the parameters of the dynamic transform.

Path
Required: Yes
Type: string

Specifies the path of the dynamic transform source and config files.

TransformName
Required: Yes
Type: string

Specifies the name of the dynamic transform as it appears in the Glue Studio visual editor.

Version
Type: string

This field is not used and will be deprecated in a future release.

DynamoDBCatalogSource

Description

Specifies a DynamoDB data source in the Glue Data Catalog.

Members
Database
Required: Yes
Type: string

The name of the database to read from.

Name
Required: Yes
Type: string

The name of the data source.

Table
Required: Yes
Type: string

The name of the table in the database to read from.

DynamoDBTarget

Description

Specifies an Amazon DynamoDB table to crawl.

Members
Path
Type: string

The name of the DynamoDB table to crawl.

scanAll
Type: boolean

Indicates whether to scan all the records, or to sample rows from the table. Scanning all the records can take a long time when the table is not a high throughput table.

A value of true means to scan all records, while a value of false means to sample the records. If no value is specified, the value defaults to true.

scanRate
Type: double

The percentage of the configured read capacity units to use by the Glue crawler. Read capacity units is a term defined by DynamoDB, and is a numeric value that acts as a rate limiter for the number of reads that can be performed on that table per second.

The valid values are null or a value between 0.1 and 1.5. A null value is used when the user does not provide a value, and defaults to 0.5 of the configured Read Capacity Unit (for provisioned tables), or 0.25 of the max configured Read Capacity Unit (for tables using on-demand mode).
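The scanRate rule above can be checked before a request is sent. The following is a minimal sketch, not part of the SDK: buildDynamoDbTarget is a hypothetical helper, and 'orders' is a placeholder table name. It only builds the array and omits optional members that are left null, so the service defaults described above apply.

```php
<?php
// Hypothetical helper (not an SDK function): builds a DynamoDBTarget array.
function buildDynamoDbTarget(string $tableName, ?bool $scanAll = null, ?float $scanRate = null): array
{
    // scanRate must be null (service default) or within 0.1 .. 1.5.
    if ($scanRate !== null && ($scanRate < 0.1 || $scanRate > 1.5)) {
        throw new InvalidArgumentException('scanRate must be null or between 0.1 and 1.5');
    }

    $target = ['Path' => $tableName];
    // Only include optional members that were explicitly provided.
    if ($scanAll !== null) {
        $target['scanAll'] = $scanAll;
    }
    if ($scanRate !== null) {
        $target['scanRate'] = $scanRate;
    }
    return $target;
}

// Sample rows instead of scanning everything, using half of the read capacity.
$target = buildDynamoDbTarget('orders', false, 0.5);
```

The resulting array would typically be one element of a crawler's DynamoDB target list.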

Edge

Description

An edge represents a directed connection between two Glue components that are part of the workflow the edge belongs to.

Members
DestinationId
Type: string

The unique ID of the node within the workflow where the edge ends.

SourceId
Type: string

The unique ID of the node within the workflow where the edge starts.

EncryptionAtRest

Description

Specifies the encryption-at-rest configuration for the Data Catalog.

Members
CatalogEncryptionMode
Required: Yes
Type: string

The encryption-at-rest mode for encrypting Data Catalog data.

CatalogEncryptionServiceRole
Type: string

The role that Glue assumes to encrypt and decrypt the Data Catalog objects on the caller's behalf.

SseAwsKmsKeyId
Type: string

The ID of the KMS key to use for encryption at rest.

EncryptionConfiguration

Description

Specifies an encryption configuration.

Members
CloudWatchEncryption
Type: CloudWatchEncryption structure

The encryption configuration for Amazon CloudWatch.

JobBookmarksEncryption
Type: JobBookmarksEncryption structure

The encryption configuration for job bookmarks.

S3Encryption
Type: Array of S3Encryption structures

The encryption configuration for Amazon Simple Storage Service (Amazon S3) data.

EntityNotFoundException

Description

A specified entity does not exist.

Members
FromFederationSource
Type: boolean

Indicates whether or not the exception relates to a federated source.

Message
Type: string

A message describing the problem.

ErrorDetail

Description

Contains details about an error.

Members
ErrorCode
Type: string

The code associated with this error.

ErrorMessage
Type: string

A message describing the error.

ErrorDetails

Description

An object containing error details.

Members
ErrorCode
Type: string

The error code for an error.

ErrorMessage
Type: string

The error message for an error.

EvaluateDataQuality

Description

Specifies your data quality evaluation criteria.

Members
Inputs
Required: Yes
Type: Array of strings

The inputs of your data quality evaluation.

Name
Required: Yes
Type: string

The name of the data quality evaluation.

Output
Type: string

The output of your data quality evaluation.

PublishingOptions
Type: DQResultsPublishingOptions structure

Options to configure how your results are published.

Ruleset
Required: Yes
Type: string

The ruleset for your data quality evaluation.

StopJobOnFailureOptions
Type: DQStopJobOnFailureOptions structure

Options to configure how your job will stop if your data quality evaluation fails.

EvaluateDataQualityMultiFrame

Description

Specifies your data quality evaluation criteria.

Members
AdditionalDataSources
Type: Associative array of custom strings keys (NodeName) to strings

The aliases of all data sources except primary.

AdditionalOptions
Type: Associative array of custom strings keys (AdditionalOptionKeys) to strings

Options to configure runtime behavior of the transform.

Inputs
Required: Yes
Type: Array of strings

The inputs of your data quality evaluation. The first input in this list is the primary data source.

Name
Required: Yes
Type: string

The name of the data quality evaluation.

PublishingOptions
Type: DQResultsPublishingOptions structure

Options to configure how your results are published.

Ruleset
Required: Yes
Type: string

The ruleset for your data quality evaluation.

StopJobOnFailureOptions
Type: DQStopJobOnFailureOptions structure

Options to configure how your job will stop if your data quality evaluation fails.

EvaluationMetrics

Description

Evaluation metrics provide an estimate of the quality of your machine learning transform.

Members
FindMatchesMetrics
Type: FindMatchesMetrics structure

The evaluation metrics for the find matches algorithm.

TransformType
Required: Yes
Type: string

The type of machine learning transform.

EventBatchingCondition

Description

Batch condition that must be met (specified number of events received or batch time window expired) before the EventBridge event trigger fires.

Members
BatchSize
Required: Yes
Type: int

Number of events that must be received from Amazon EventBridge before the EventBridge event trigger fires.

BatchWindow
Type: int

Window of time in seconds after which the EventBridge event trigger fires. The window starts when the first event is received.
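The "size reached or window expired" rule can be illustrated with a small local function. This is only a sketch of the semantics described above, not the service's implementation: batchConditionMet is a hypothetical name, and treating a missing BatchWindow as "no window" is a simplifying assumption.

```php
<?php
// Hypothetical illustration of the batching rule; not service code.
function batchConditionMet(array $condition, int $eventsReceived, int $secondsSinceFirstEvent): bool
{
    // The window only starts once a first event has been received.
    if ($eventsReceived < 1) {
        return false;
    }
    // Fire when the required number of events has arrived...
    if ($eventsReceived >= $condition['BatchSize']) {
        return true;
    }
    // ...or when the batch window (in seconds) has expired.
    return isset($condition['BatchWindow'])
        && $secondsSinceFirstEvent >= $condition['BatchWindow'];
}

// Example values: fire after 5 events, or 300 seconds after the first event.
$condition = ['BatchSize' => 5, 'BatchWindow' => 300];
```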

ExecutionProperty

Description

An execution property of a job.

Members
MaxConcurrentRuns
Type: int

The maximum number of concurrent runs allowed for the job. The default is 1. An error is returned when this threshold is reached. The maximum value you can specify is controlled by a service limit.

ExportLabelsTaskRunProperties

Description

Specifies configuration properties for an exporting labels task run.

Members
OutputS3Path
Type: string

The Amazon Simple Storage Service (Amazon S3) path where you will export the labels.

FederatedDatabase

Description

A database that points to an entity outside the Glue Data Catalog.

Members
ConnectionName
Type: string

The name of the connection to the external metastore.

Identifier
Type: string

A unique identifier for the federated database.

FederatedResourceAlreadyExistsException

Description

A federated resource already exists.

Members
AssociatedGlueResource
Type: string

The associated Glue resource already exists.

Message
Type: string

The message describing the problem.

FederatedTable

Description

A table that points to an entity outside the Glue Data Catalog.

Members
ConnectionName
Type: string

The name of the connection to the external metastore.

DatabaseIdentifier
Type: string

A unique identifier for the federated database.

Identifier
Type: string

A unique identifier for the federated table.

FederationSourceException

Description

A federation source failed.

Members
FederationSourceErrorCode
Type: string

The error code of the problem.

Message
Type: string

The message describing the problem.

FederationSourceRetryableException

Description

A federation source failed, but the operation may be retried.

Members
Message
Type: string

A message describing the problem.

FillMissingValues

Description

Specifies a transform that locates records in the dataset that have missing values and adds a new field with a value determined by imputation. The input data set is used to train the machine learning model that determines what the missing value should be.

Members
FilledPath
Type: string

A JSON path to a variable in the data structure for the dataset that is filled.

ImputedPath
Required: Yes
Type: string

A JSON path to a variable in the data structure for the dataset that is imputed.

Inputs
Required: Yes
Type: Array of strings

The data inputs identified by their node names.

Name
Required: Yes
Type: string

The name of the transform node.

Filter

Description

Specifies a transform that splits a dataset into two, based on a filter condition.

Members
Filters
Required: Yes
Type: Array of FilterExpression structures

Specifies a filter expression.

Inputs
Required: Yes
Type: Array of strings

The data inputs identified by their node names.

LogicalOperator
Required: Yes
Type: string

The operator used to filter rows by comparing the key value to a specified value.

Name
Required: Yes
Type: string

The name of the transform node.

FilterExpression

Description

Specifies a filter expression.

Members
Negated
Type: boolean

Whether the expression is to be negated.

Operation
Required: Yes
Type: string

The type of operation to perform in the expression.

Values
Required: Yes
Type: Array of FilterValue structures

A list of filter values.

FilterValue

Description

Represents a single entry in the list of values for a FilterExpression.

Members
Type
Required: Yes
Type: string

The type of filter value.

Value
Required: Yes
Type: Array of strings

The value to be associated.
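The Filter, FilterExpression, and FilterValue structures nest as shown in the sketch below. The node names ('KeepActiveOrders', 'OrdersSource'), the column name, and the enum strings 'AND', 'EQ', 'COLUMNEXTRACTED', and 'CONSTANT' are assumptions for illustration; check the accepted values for LogicalOperator, Operation, and Type before relying on them.

```php
<?php
// Sketch of a Filter node that keeps rows where status equals "active".
// All names and enum strings here are illustrative assumptions.
$filterNode = [
    'Name' => 'KeepActiveOrders',        // name of the transform node
    'Inputs' => ['OrdersSource'],        // upstream node names
    'LogicalOperator' => 'AND',          // how multiple Filters are combined
    'Filters' => [
        [
            'Operation' => 'EQ',
            'Negated' => false,
            'Values' => [
                // Each FilterValue carries a Type and an array of strings.
                ['Type' => 'COLUMNEXTRACTED', 'Value' => ['status']],
                ['Type' => 'CONSTANT', 'Value' => ['active']],
            ],
        ],
    ],
];
```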

FindMatchesMetrics

Description

The evaluation metrics for the find matches algorithm. The quality of your machine learning transform is measured by getting your transform to predict some matches and comparing the results to known matches from the same dataset. The quality metrics are based on a subset of your data, so they are not precise.

Members
AreaUnderPRCurve
Type: double

The area under the precision/recall curve (AUPRC) is a single number measuring the overall quality of the transform, that is independent of the choice made for precision vs. recall. Higher values indicate that you have a more attractive precision vs. recall tradeoff.

For more information, see Precision and recall in Wikipedia.

ColumnImportances
Type: Array of ColumnImportance structures

A list of ColumnImportance structures containing column importance metrics, sorted in order of descending importance.

ConfusionMatrix
Type: ConfusionMatrix structure

The confusion matrix shows you what your transform is predicting accurately and what types of errors it is making.

For more information, see Confusion matrix in Wikipedia.

F1
Type: double

The maximum F1 metric indicates the transform's accuracy between 0 and 1, where 1 is the best accuracy.

For more information, see F1 score in Wikipedia.

Precision
Type: double

The precision metric indicates how often your transform is correct when it predicts a match. Specifically, it measures how well the transform finds true positives from the total true positives possible.

For more information, see Precision and recall in Wikipedia.

Recall
Type: double

The recall metric indicates how often your transform predicts a match when there is an actual match. Specifically, it measures how well the transform finds true positives from the total records in the source data.

For more information, see Precision and recall in Wikipedia.
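As a worked illustration, precision, recall, and F1 can be derived from confusion-matrix counts using the standard textbook definitions. This is a sketch under that assumption; matchMetrics is a hypothetical helper, the counts are made-up example numbers, and Glue's exact computation is not documented here.

```php
<?php
// Hypothetical helper using the textbook definitions:
//   precision = TP / (TP + FP), recall = TP / (TP + FN),
//   F1 = 2 * precision * recall / (precision + recall)
function matchMetrics(int $truePositives, int $falsePositives, int $falseNegatives): array
{
    $predicted = $truePositives + $falsePositives;
    $actual = $truePositives + $falseNegatives;

    // Guard against division by zero when there are no predictions or no matches.
    $precision = $predicted > 0 ? $truePositives / $predicted : 0.0;
    $recall = $actual > 0 ? $truePositives / $actual : 0.0;
    $sum = $precision + $recall;
    $f1 = $sum > 0 ? 2 * $precision * $recall / $sum : 0.0;

    return ['Precision' => $precision, 'Recall' => $recall, 'F1' => $f1];
}

// Example counts: 60 true positives, 20 false positives, 40 false negatives.
$metrics = matchMetrics(60, 20, 40);
```

With these example counts, precision is 60/80 and recall is 60/100, so F1 falls between the two.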

FindMatchesParameters

Description

The parameters to configure the find matches transform.

Members
AccuracyCostTradeoff
Type: double

The value that is selected when tuning your transform for a balance between accuracy and cost. A value of 0.5 means that the system balances accuracy and cost concerns. A value of 1.0 means a bias purely for accuracy, which typically results in a higher cost, sometimes substantially higher. A value of 0.0 means a bias purely for cost, which results in a less accurate FindMatches transform, sometimes with unacceptable accuracy.

Accuracy measures how well the transform finds true positives and true negatives. Increasing accuracy requires more machine resources and cost. But it also results in increased recall.

Cost measures how many compute resources, and thus money, are consumed to run the transform.

EnforceProvidedLabels
Type: boolean

The value to switch on or off to force the output to match the provided labels from users. If the value is True, the find matches transform forces the output to match the provided labels. The results override the normal conflation results. If the value is False, the find matches transform does not ensure all the labels provided are respected, and the results rely on the trained model.

Note that setting this value to true may increase the conflation execution time.

PrecisionRecallTradeoff
Type: double

The value selected when tuning your transform for a balance between precision and recall. A value of 0.5 means no preference; a value of 1.0 means a bias purely for precision, and a value of 0.0 means a bias for recall. Because this is a tradeoff, choosing values close to 1.0 means very low recall, and choosing values close to 0.0 results in very low precision.

The precision metric indicates how often your model is correct when it predicts a match.

The recall metric indicates how often your model predicts a match when there is an actual match.

PrimaryKeyColumnName
Type: string

The name of a column that uniquely identifies rows in the source table. Used to help identify matching records.
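A FindMatchesParameters array might be assembled as in the sketch below. buildFindMatchesParameters is a hypothetical helper, 'customer_id' is a placeholder column name, and the 0.0 to 1.0 range check is inferred from the endpoint values described above rather than stated as a hard limit.

```php
<?php
// Hypothetical helper (not an SDK function): builds FindMatchesParameters.
function buildFindMatchesParameters(
    string $primaryKeyColumn,
    float $precisionRecallTradeoff = 0.5,
    float $accuracyCostTradeoff = 0.5,
    bool $enforceProvidedLabels = false
): array {
    $tradeoffs = [
        'PrecisionRecallTradeoff' => $precisionRecallTradeoff,
        'AccuracyCostTradeoff' => $accuracyCostTradeoff,
    ];
    // Assumed range: both tradeoffs are described with 0.0 and 1.0 as extremes.
    foreach ($tradeoffs as $name => $value) {
        if ($value < 0.0 || $value > 1.0) {
            throw new InvalidArgumentException("$name must be between 0.0 and 1.0");
        }
    }

    return $tradeoffs + [
        'PrimaryKeyColumnName' => $primaryKeyColumn,
        'EnforceProvidedLabels' => $enforceProvidedLabels,
    ];
}

// Lean toward precision while keeping the default accuracy/cost balance.
$params = buildFindMatchesParameters('customer_id', 0.9);
```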

FindMatchesTaskRunProperties

Description

Specifies configuration properties for a Find Matches task run.

Members
JobId
Type: string

The job ID for the Find Matches task run.

JobName
Type: string

The name assigned to the job for the Find Matches task run.

JobRunId
Type: string

The job run ID for the Find Matches task run.

GetConnectionsFilter

Description

Filters the connection definitions that are returned by the GetConnections API operation.

Members
ConnectionType
Type: string

The type of connections to return. Currently, SFTP is not supported.

MatchCriteria
Type: Array of strings

A criteria string that must match the criteria recorded in the connection definition for that connection definition to be returned.

GlueEncryptionException

Description

An encryption operation failed.

Members
Message
Type: string

The message describing the problem.

GluePolicy

Description

A structure for returning a resource policy.

Members
CreateTime
Type: timestamp (string|DateTime or anything parsable by strtotime)

The date and time at which the policy was created.

PolicyHash
Type: string

Contains the hash value associated with this policy.

PolicyInJson
Type: string

Contains the requested policy document, in JSON format.

UpdateTime
Type: timestamp (string|DateTime or anything parsable by strtotime)

The date and time at which the policy was last updated.

GlueSchema

Description

Specifies a user-defined schema when a schema cannot be determined by Glue.

Members
Columns
Type: Array of GlueStudioSchemaColumn structures

Specifies the column definitions that make up a Glue schema.

GlueStudioSchemaColumn

Description

Specifies a single column in a Glue schema definition.

Members
Name
Required: Yes
Type: string

The name of the column in the Glue Studio schema.

Type
Type: string

The Hive type for this column in the Glue Studio schema.

GlueTable

Description

The database and table in the Glue Data Catalog that is used for input or output data.

Members
AdditionalOptions
Type: Associative array of custom strings keys (NameString) to strings

Additional options for the table. Currently there are two keys supported:

  • pushDownPredicate: to filter on partitions without having to list and read all the files in your dataset.

  • catalogPartitionPredicate: to use server-side partition pruning using partition indexes in the Glue Data Catalog.

CatalogId
Type: string

A unique identifier for the Glue Data Catalog.

ConnectionName
Type: string

The name of the connection to the Glue Data Catalog.

DatabaseName
Required: Yes
Type: string

A database name in the Glue Data Catalog.

TableName
Required: Yes
Type: string

A table name in the Glue Data Catalog.
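A GlueTable array that uses one of the AdditionalOptions keys described above might look like the following sketch. The database name, table name, and predicate expression are placeholder values, and the partition columns year and month are assumed to exist on the table.

```php
<?php
// Sketch of a GlueTable structure; all values are illustrative placeholders.
$glueTable = [
    'DatabaseName' => 'sales_db',   // required
    'TableName' => 'orders',        // required
    'AdditionalOptions' => [
        // Filter on partitions without listing and reading every file.
        'pushDownPredicate' => "year = '2024' and month = '01'",
    ],
];
```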

GovernedCatalogSource

Description

Specifies the data store in the governed Glue Data Catalog.

Members
AdditionalOptions
Type: S3SourceAdditionalOptions structure

Specifies additional connection options.

Database
Required: Yes
Type: string

The database to read from.

Name
Required: Yes
Type: string

The name of the data store.

PartitionPredicate
Type: string

Partitions satisfying this predicate are deleted. Files within the retention period in these partitions are not deleted. Set to "" – empty by default.

Table
Required: Yes
Type: string

The database table to read from.

GovernedCatalogTarget

Description

Specifies a data target that writes to Amazon S3 using the Glue Data Catalog.

Members
Database
Required: Yes
Type: string

The name of the database to write to.

Inputs
Required: Yes
Type: Array of strings

The nodes that are inputs to the data target.

Name
Required: Yes
Type: string

The name of the data target.

PartitionKeys
Type: Array of strings

Specifies native partitioning using a sequence of keys.

SchemaChangePolicy
Type: CatalogSchemaChangePolicy structure

A policy that specifies update behavior for the governed catalog.

Table
Required: Yes
Type: string

The name of the table in the database to write to.

GrokClassifier

Description

A classifier that uses grok patterns.

Members
Classification
Required: Yes
Type: string

An identifier of the data format that the classifier matches, such as Twitter, JSON, Omniture logs, and so on.

CreationTime
Type: timestamp (string|DateTime or anything parsable by strtotime)

The time that this classifier was registered.

CustomPatterns
Type: string

Optional custom grok patterns defined by this classifier. For more information, see custom patterns in Writing Custom Classifiers.

GrokPattern
Required: Yes
Type: string

The grok pattern applied to a data store by this classifier. For more information, see built-in patterns in Writing Custom Classifiers.

LastUpdated
Type: timestamp (string|DateTime or anything parsable by strtotime)

The time that this classifier was last updated.

Name
Required: Yes
Type: string

The name of the classifier.

Version
Type: long (int|float)

The version of this classifier.

HudiTarget

Description

Specifies an Apache Hudi data source.

Members
ConnectionName
Type: string

The name of the connection to use to connect to the Hudi target. If your Hudi files are stored in buckets that require VPC authorization, you can set their connection properties here.

Exclusions
Type: Array of strings

A list of glob patterns used to exclude from the crawl. For more information, see Catalog Tables with a Crawler.

MaximumTraversalDepth
Type: int

The maximum depth of Amazon S3 paths that the crawler can traverse to discover the Hudi metadata folder in your Amazon S3 path. Used to limit the crawler run time.

Paths
Type: Array of strings

An array of Amazon S3 location strings for Hudi, each indicating the root folder in which the metadata files for a Hudi table reside. The Hudi folder may be located in a child folder of the root folder.

The crawler will scan all folders underneath a path for a Hudi folder.

IcebergInput

Description

A structure that defines an Apache Iceberg metadata table to create in the catalog.

Members
MetadataOperation
Required: Yes
Type: string

A required metadata operation. Can only be set to CREATE.

Version
Type: string

The table version for the Iceberg table. Defaults to 2.

IcebergTarget

Description

Specifies an Apache Iceberg data source where Iceberg tables are stored in Amazon S3.

Members
ConnectionName
Type: string

The name of the connection to use to connect to the Iceberg target.

Exclusions
Type: Array of strings

A list of glob patterns used to exclude from the crawl. For more information, see Catalog Tables with a Crawler.

MaximumTraversalDepth
Type: int

The maximum depth of Amazon S3 paths that the crawler can traverse to discover the Iceberg metadata folder in your Amazon S3 path. Used to limit the crawler run time.

Paths
Type: Array of strings

One or more Amazon S3 paths that contain Iceberg metadata folders, in the form s3://bucket/prefix.

IdempotentParameterMismatchException

Description

The same unique identifier was associated with two different records.

Members
Message
Type: string

A message describing the problem.

IllegalBlueprintStateException

Description

The blueprint is in an invalid state to perform a requested operation.

Members
Message
Type: string

A message describing the problem.

IllegalSessionStateException

Description

The session is in an invalid state to perform a requested operation.

Members
Message
Type: string

A message describing the problem.

IllegalWorkflowStateException

Description

The workflow is in an invalid state to perform a requested operation.

Members
Message
Type: string

A message describing the problem.

ImportLabelsTaskRunProperties

Description

Specifies configuration properties for an importing labels task run.

Members
InputS3Path
Type: string

The Amazon Simple Storage Service (Amazon S3) path from where you will import the labels.

Replace
Type: boolean

Indicates whether to overwrite your existing labels.

InternalServiceException

Description

An internal service error occurred.

Members
Message
Type: string

A message describing the problem.

InvalidInputException

Description

The input provided was not valid.

Members
FromFederationSource
Type: boolean

Indicates whether or not the exception relates to a federated source.

Message
Type: string

A message describing the problem.

InvalidStateException

Description

An error that indicates your data is in an invalid state.

Members
Message
Type: string

A message describing the problem.

JDBCConnectorOptions

Description

Additional connection options for the connector.

Members
DataTypeMapping
Type: Associative array of custom strings keys (JDBCDataType) to strings

Custom data type mapping that builds a mapping from a JDBC data type to a Glue data type. For example, the option "dataTypeMapping":{"FLOAT":"STRING"} maps data fields of JDBC type FLOAT into the Java String type by calling the ResultSet.getString() method of the driver, and uses it to build the Glue record. The ResultSet object is implemented by each driver, so the behavior is specific to the driver you use. Refer to the documentation for your JDBC driver to understand how the driver performs the conversions.

FilterPredicate
Type: string

Extra condition clause to filter data from source. For example:

BillingCity='Mountain View'

When using a query instead of a table name, you should validate that the query works with the specified filterPredicate.

JobBookmarkKeys
Type: Array of strings

The names of the job bookmark keys on which to sort.

JobBookmarkKeysSortOrder
Type: string

Specifies an ascending or descending sort order.

LowerBound
Type: long (int|float)

The minimum value of partitionColumn that is used to decide partition stride.

NumPartitions
Type: long (int|float)

The number of partitions. This value, along with lowerBound (inclusive) and upperBound (exclusive), form partition strides for generated WHERE clause expressions that are used to split the partitionColumn.

PartitionColumn
Type: string

The name of an integer column that is used for partitioning. This option works only when it's included with lowerBound, upperBound, and numPartitions. This option works the same way as in the Spark SQL JDBC reader.

UpperBound
Type: long (int|float)

The maximum value of partitionColumn that is used to decide partition stride.
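To make the interaction of LowerBound, UpperBound, and NumPartitions concrete, the sketch below generates WHERE clause expressions for evenly sized strides. It is loosely modeled on the behavior of the Spark SQL JDBC reader and is an illustration only: partitionWhereClauses is a hypothetical function, and the exact expressions Glue generates may differ.

```php
<?php
// Hypothetical illustration of stride-based partitioning; not Glue's actual code.
function partitionWhereClauses(string $column, int $lowerBound, int $upperBound, int $numPartitions): array
{
    if ($numPartitions < 1 || ($upperBound - $lowerBound) < $numPartitions) {
        throw new InvalidArgumentException('bounds must span at least numPartitions values');
    }
    if ($numPartitions === 1) {
        return ['1 = 1']; // a single partition reads everything
    }

    $stride = intdiv($upperBound - $lowerBound, $numPartitions);
    $clauses = [];
    for ($i = 0; $i < $numPartitions; $i++) {
        $start = $lowerBound + $i * $stride;
        $end = $start + $stride;
        if ($i === 0) {
            // First partition also picks up values below the lower bound and NULLs.
            $clauses[] = sprintf('%s < %d OR %s IS NULL', $column, $end, $column);
        } elseif ($i === $numPartitions - 1) {
            // Last partition is open-ended so values above the upper bound are kept.
            $clauses[] = sprintf('%s >= %d', $column, $start);
        } else {
            $clauses[] = sprintf('%s >= %d AND %s < %d', $column, $start, $column, $end);
        }
    }
    return $clauses;
}

// Four partitions over the example range 0 (inclusive) to 100 (exclusive).
$clauses = partitionWhereClauses('id', 0, 100, 4);
```

In this sketch the bounds only shape the strides; they do not filter rows, which is why the first and last clauses are open-ended.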

JDBCConnectorSource

Description

Specifies a connector to a JDBC data source.

Members
AdditionalOptions
Type: JDBCConnectorOptions structure

Additional connection options for the connector.

ConnectionName
Required: Yes
Type: string

The name of the connection that is associated with the connector.

ConnectionTable
Type: string

The name of the table in the data source.

ConnectionType
Required: Yes
Type: string

The type of connection, such as marketplace.jdbc or custom.jdbc, designating a connection to a JDBC data store.

ConnectorName
Required: Yes
Type: string

The name of a connector that assists with accessing the data store in Glue Studio.

Name
Required: Yes
Type: string

The name of the data source.

OutputSchemas
Type: Array of GlueSchema structures

Specifies the data schema for the custom JDBC source.

Query
Type: string

The table or SQL query to get the data from. You can specify either ConnectionTable or Query, but not both.

JDBCConnectorTarget

Description

Specifies a data target that writes to Amazon S3 in Apache Parquet columnar storage.

Members
AdditionalOptions
Type: Associative array of custom strings keys (EnclosedInStringProperty) to strings

Additional connection options for the connector.

ConnectionName
Required: Yes
Type: string

The name of the connection that is associated with the connector.

ConnectionTable
Required: Yes
Type: string

The name of the table in the data target.

ConnectionType
Required: Yes
Type: string

The type of connection, such as marketplace.jdbc or custom.jdbc, designating a connection to a JDBC data target.

ConnectorName
Required: Yes
Type: string

The name of a connector that will be used.

Inputs
Required: Yes
Type: Array of strings

The nodes that are inputs to the data target.

Name
Required: Yes
Type: string

The name of the data target.

OutputSchemas
Type: Array of GlueSchema structures

Specifies the data schema for the JDBC target.

JdbcTarget

Description

Specifies a JDBC data store to crawl.

Members
ConnectionName
Type: string

The name of the connection to use to connect to the JDBC target.

EnableAdditionalMetadata
Type: Array of strings

Specify a value of RAWTYPES or COMMENTS to enable additional metadata in table responses. RAWTYPES provides the native-level datatype. COMMENTS provides comments associated with a column or table in the database.

If you do not need additional metadata, keep the field empty.

Exclusions
Type: Array of strings

A list of glob patterns used to exclude from the crawl. For more information, see Catalog Tables with a Crawler.

Path
Type: string

The path of the JDBC target.

Job

Description

Specifies a job definition.

Members
AllocatedCapacity
Type: int

This field is deprecated. Use MaxCapacity instead.

The number of Glue data processing units (DPUs) allocated to runs of this job. You can allocate a minimum of 2 DPUs; the default is 10. A DPU is a relative measure of processing power that consists of 4 vCPUs of compute capacity and 16 GB of memory. For more information, see the Glue pricing page.

CodeGenConfigurationNodes
Type: Associative array of custom strings keys (NodeId) to CodeGenConfigurationNode structures

The representation of a directed acyclic graph on which both the Glue Studio visual component and Glue Studio code generation are based.

Command
Type: JobCommand structure

The JobCommand that runs this job.

Connections
Type: ConnectionsList structure

The connections used for this job.

CreatedOn
Type: timestamp (string|DateTime or anything parsable by strtotime)

The time and date that this job definition was created.

DefaultArguments
Type: Associative array of custom strings keys (GenericString) to strings

The default arguments for every run of this job, specified as name-value pairs.

You can specify arguments here that your own job-execution script consumes, as well as arguments that Glue itself consumes.

Job arguments may be logged. Do not pass plaintext secrets as arguments. Retrieve secrets from a Glue Connection, Secrets Manager or other secret management mechanism if you intend to keep them within the Job.

For information about how to specify and consume your own Job arguments, see the Calling Glue APIs in Python topic in the developer guide.

For information about the arguments you can provide to this field when configuring Spark jobs, see the Special Parameters Used by Glue topic in the developer guide.

For information about the arguments you can provide to this field when configuring Ray jobs, see Using job parameters in Ray jobs in the developer guide.

Description
Type: string

A description of the job.

ExecutionClass
Type: string

Indicates whether the job is run with a standard or flexible execution class. The standard execution class is ideal for time-sensitive workloads that require fast job startup and dedicated resources.

The flexible execution class is appropriate for time-insensitive jobs whose start and completion times may vary.

Only jobs with Glue version 3.0 and above and command type glueetl will be allowed to set ExecutionClass to FLEX. The flexible execution class is available for Spark jobs.

ExecutionProperty
Type: ExecutionProperty structure

An ExecutionProperty specifying the maximum number of concurrent runs allowed for this job.

GlueVersion
Type: string

In Spark jobs, GlueVersion determines the versions of Apache Spark and Python that Glue makes available in a job. The Python version indicates the version supported for jobs of type Spark.

Ray jobs should set GlueVersion to 4.0 or greater. However, the versions of Ray, Python and additional libraries available in your Ray job are determined by the Runtime parameter of the Job command.

For more information about the available Glue versions and corresponding Spark and Python versions, see Glue version in the developer guide.

Jobs that are created without specifying a Glue version default to Glue 0.9.

JobMode
Type: string

A mode that describes how a job was created. Valid values are:

  • SCRIPT - The job was created using the Glue Studio script editor.

  • VISUAL - The job was created using the Glue Studio visual editor.

  • NOTEBOOK - The job was created using an interactive sessions notebook.

When the JobMode field is missing or null, SCRIPT is assigned as the default value.

LastModifiedOn
Type: timestamp (string|DateTime or anything parsable by strtotime)

The last point in time when this job definition was modified.

LogUri
Type: string

This field is reserved for future use.

MaintenanceWindow
Type: string

This field specifies a day of the week and hour for a maintenance window for streaming jobs. Glue periodically performs maintenance activities. During these maintenance windows, Glue will need to restart your streaming jobs.

Glue will restart the job within 3 hours of the specified maintenance window. For instance, if you set up the maintenance window for Monday at 10:00AM GMT, your jobs will be restarted between 10:00AM GMT and 1:00PM GMT.

MaxCapacity
Type: double

For Glue version 1.0 or earlier jobs, using the standard worker type, the number of Glue data processing units (DPUs) that can be allocated when this job runs. A DPU is a relative measure of processing power that consists of 4 vCPUs of compute capacity and 16 GB of memory. For more information, see the Glue pricing page.

For Glue version 2.0 or later jobs, you cannot specify a Maximum capacity. Instead, you should specify a Worker type and the Number of workers.

Do not set MaxCapacity if using WorkerType and NumberOfWorkers.

The value that can be allocated for MaxCapacity depends on whether you are running a Python shell job, an Apache Spark ETL job, or an Apache Spark streaming ETL job:

  • When you specify a Python shell job (JobCommand.Name="pythonshell"), you can allocate either 0.0625 or 1 DPU. The default is 0.0625 DPU.

  • When you specify an Apache Spark ETL job (JobCommand.Name="glueetl") or Apache Spark streaming ETL job (JobCommand.Name="gluestreaming"), you can allocate from 2 to 100 DPUs. The default is 10 DPUs. This job type cannot have a fractional DPU allocation.
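The MaxCapacity rules above can be sketched as a small pre-flight check. This is a minimal illustration, not part of the SDK: the function name isValidMaxCapacity is hypothetical, and it only encodes the ranges quoted above.

```php
<?php
// Hypothetical helper (not an AWS SDK function): checks a MaxCapacity value
// against the rules listed above for each JobCommand.Name.
function isValidMaxCapacity(string $commandName, float $maxCapacity): bool
{
    switch ($commandName) {
        case 'pythonshell':
            // Python shell jobs accept exactly 0.0625 or 1 DPU.
            return $maxCapacity === 0.0625 || $maxCapacity === 1.0;
        case 'glueetl':
        case 'gluestreaming':
            // Spark ETL and streaming jobs accept whole DPUs from 2 to 100.
            return $maxCapacity >= 2 && $maxCapacity <= 100
                && floor($maxCapacity) === $maxCapacity;
        default:
            // Other command names are not covered by the rules above.
            return false;
    }
}
```

Running a check like this before calling the service may surface an invalid allocation earlier, but the service remains the authority on what it accepts.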

MaxRetries
Type: int

The maximum number of times to retry this job after a JobRun fails.

Name
Type: string

The name you assign to this job definition.

NonOverridableArguments
Type: Associative array of custom strings keys (GenericString) to strings

Arguments for this job that are not overridden when providing job arguments in a job run, specified as name-value pairs.

NotificationProperty
Type: NotificationProperty structure

Specifies configuration properties of a job notification.

NumberOfWorkers
Type: int

The number of workers of a defined workerType that are allocated when a job runs.

Role
Type: string

The name or Amazon Resource Name (ARN) of the IAM role associated with this job.

SecurityConfiguration
Type: string

The name of the SecurityConfiguration structure to be used with this job.

SourceControlDetails
Type: SourceControlDetails structure

The details for a source control configuration for a job, allowing synchronization of job artifacts to or from a remote repository.

Timeout
Type: int

The job timeout in minutes. This is the maximum time that a job run can consume resources before it is terminated and enters TIMEOUT status. The default is 2,880 minutes (48 hours).

WorkerType
Type: string

The type of predefined worker that is allocated when a job runs. Accepts a value of G.1X, G.2X, G.4X, G.8X or G.025X for Spark jobs. Accepts the value Z.2X for Ray jobs.

  • For the G.1X worker type, each worker maps to 1 DPU (4 vCPUs, 16 GB of memory) with 84GB disk (approximately 34GB free), and provides 1 executor per worker. We recommend this worker type for workloads such as data transforms, joins, and queries, because it offers a scalable and cost-effective way to run most jobs.

  • For the G.2X worker type, each worker maps to 2 DPU (8 vCPUs, 32 GB of memory) with 128GB disk (approximately 77GB free), and provides 1 executor per worker. We recommend this worker type for workloads such as data transforms, joins, and queries, because it offers a scalable and cost-effective way to run most jobs.

  • For the G.4X worker type, each worker maps to 4 DPU (16 vCPUs, 64 GB of memory) with 256GB disk (approximately 235GB free), and provides 1 executor per worker. We recommend this worker type for jobs whose workloads contain your most demanding transforms, aggregations, joins, and queries. This worker type is available only for Glue version 3.0 or later Spark ETL jobs in the following Amazon Web Services Regions: US East (Ohio), US East (N. Virginia), US West (Oregon), Asia Pacific (Singapore), Asia Pacific (Sydney), Asia Pacific (Tokyo), Canada (Central), Europe (Frankfurt), Europe (Ireland), and Europe (Stockholm).

  • For the G.8X worker type, each worker maps to 8 DPU (32 vCPUs, 128 GB of memory) with 512GB disk (approximately 487GB free), and provides 1 executor per worker. We recommend this worker type for jobs whose workloads contain your most demanding transforms, aggregations, joins, and queries. This worker type is available only for Glue version 3.0 or later Spark ETL jobs, in the same Amazon Web Services Regions as supported for the G.4X worker type.

  • For the G.025X worker type, each worker maps to 0.25 DPU (2 vCPUs, 4 GB of memory) with 84GB disk (approximately 34GB free), and provides 1 executor per worker. We recommend this worker type for low volume streaming jobs. This worker type is only available for Glue version 3.0 streaming jobs.

  • For the Z.2X worker type, each worker maps to 2 M-DPU (8 vCPUs, 64 GB of memory) with 128 GB disk (approximately 120 GB free), and provides up to 8 Ray workers based on the autoscaler.
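As a rough sketch, the Spark worker types listed above can be kept in a lookup table to estimate the total DPUs a WorkerType and NumberOfWorkers combination maps to. The constant and function names are hypothetical, and Z.2X is left out because it is measured in M-DPU rather than DPU.

```php
<?php
// DPU, vCPU and memory figures copied from the WorkerType list above.
// The table layout and names are illustrative, not an SDK API.
const SPARK_WORKER_TYPES = [
    'G.025X' => ['dpu' => 0.25, 'vcpus' => 2,  'memoryGb' => 4],
    'G.1X'   => ['dpu' => 1.0,  'vcpus' => 4,  'memoryGb' => 16],
    'G.2X'   => ['dpu' => 2.0,  'vcpus' => 8,  'memoryGb' => 32],
    'G.4X'   => ['dpu' => 4.0,  'vcpus' => 16, 'memoryGb' => 64],
    'G.8X'   => ['dpu' => 8.0,  'vcpus' => 32, 'memoryGb' => 128],
];

// Total DPUs for a given worker type and worker count.
function totalDpus(string $workerType, int $numberOfWorkers): float
{
    if (!array_key_exists($workerType, SPARK_WORKER_TYPES)) {
        throw new InvalidArgumentException("Unknown Spark worker type: $workerType");
    }
    return SPARK_WORKER_TYPES[$workerType]['dpu'] * $numberOfWorkers;
}
```

For example, totalDpus('G.2X', 10) evaluates to 20.0, since each G.2X worker maps to 2 DPU.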

JobBookmarkEntry

Description

Defines a point that a job can resume processing.

Members
Attempt
Type: int

The attempt ID number.

JobBookmark
Type: string

The bookmark itself.

JobName
Type: string

The name of the job in question.

PreviousRunId
Type: string

The unique run identifier associated with the previous job run.

Run
Type: int

The run ID number.

RunId
Type: string

The run ID number.

Version
Type: int

The version of the job.

JobBookmarksEncryption

Description

Specifies how job bookmark data should be encrypted.

Members
JobBookmarksEncryptionMode
Type: string

The encryption mode to use for job bookmarks data.

KmsKeyArn
Type: string

The Amazon Resource Name (ARN) of the KMS key to be used to encrypt the data.

JobCommand

Description

Specifies code that runs when a job is run.

Members
Name
Type: string

The name of the job command. For an Apache Spark ETL job, this must be glueetl. For a Python shell job, it must be pythonshell. For an Apache Spark streaming ETL job, this must be gluestreaming. For a Ray job, this must be glueray.

PythonVersion
Type: string

The Python version being used to run a Python shell job. Allowed values are 2 or 3.

Runtime
Type: string

In Ray jobs, Runtime is used to specify the versions of Ray, Python and additional libraries available in your environment. This field is not used in other job types. For supported runtime environment values, see Supported Ray runtime environments in the Glue Developer Guide.

ScriptLocation
Type: string

Specifies the Amazon Simple Storage Service (Amazon S3) path to a script that runs a job.
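A minimal sketch of assembling a JobCommand array follows. The command names come from the Name member above; the job-type labels ('spark', 'python-shell', and so on), the function name, and the S3 path used in the usage note are hypothetical.

```php
<?php
// Hypothetical helper: maps a human-readable job type to the JobCommand.Name
// value documented above and returns a JobCommand array.
function buildJobCommand(string $jobType, string $scriptLocation): array
{
    $commandNames = [
        'spark'           => 'glueetl',
        'python-shell'    => 'pythonshell',
        'spark-streaming' => 'gluestreaming',
        'ray'             => 'glueray',
    ];
    if (!array_key_exists($jobType, $commandNames)) {
        throw new InvalidArgumentException("Unsupported job type: $jobType");
    }
    return [
        'Name'           => $commandNames[$jobType],
        'ScriptLocation' => $scriptLocation,
    ];
}
```

A call such as buildJobCommand('ray', 's3://example-bucket/scripts/job.py') yields an array whose Name is glueray. Members such as PythonVersion or Runtime would be added by the caller where the job type needs them.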

JobNodeDetails

Description

The details of a Job node present in the workflow.

Members
JobRuns
Type: Array of JobRun structures

The information for the job runs represented by the job node.

JobRun

Description

Contains information about a job run.

Members
AllocatedCapacity
Type: int

This field is deprecated. Use MaxCapacity instead.

The number of Glue data processing units (DPUs) allocated to this JobRun. From 2 to 100 DPUs can be allocated; the default is 10. A DPU is a relative measure of processing power that consists of 4 vCPUs of compute capacity and 16 GB of memory. For more information, see the Glue pricing page.

Arguments
Type: Associative array of custom strings keys (GenericString) to strings

The job arguments associated with this run. For this job run, they replace the default arguments set in the job definition itself.

You can specify arguments here that your own job-execution script consumes, as well as arguments that Glue itself consumes.

Job arguments may be logged. Do not pass plaintext secrets as arguments. Retrieve secrets from a Glue Connection, Secrets Manager or other secret management mechanism if you intend to keep them within the Job.

For information about how to specify and consume your own Job arguments, see the Calling Glue APIs in Python topic in the developer guide.

For information about the arguments you can provide to this field when configuring Spark jobs, see the Special Parameters Used by Glue topic in the developer guide.

For information about the arguments you can provide to this field when configuring Ray jobs, see Using job parameters in Ray jobs in the developer guide.

Attempt
Type: int

The number of the attempt to run this job.

CompletedOn
Type: timestamp (string|DateTime or anything parsable by strtotime)

The date and time that this job run completed.

DPUSeconds
Type: double

This field can be set for either job runs with execution class FLEX or when Auto Scaling is enabled, and represents the total time each executor ran during the lifecycle of a job run in seconds, multiplied by a DPU factor (1 for G.1X, 2 for G.2X, or 0.25 for G.025X workers). This value may be different than the executionEngineRuntime * MaxCapacity as in the case of Auto Scaling jobs, as the number of executors running at a given time may be less than the MaxCapacity. Therefore, it is possible that the value of DPUSeconds is less than executionEngineRuntime * MaxCapacity.
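The arithmetic behind DPUSeconds can be illustrated with a short sketch. It assumes that the per-executor run times are summed before the DPU factor is applied, which is one reading of the description above; the function name is hypothetical.

```php
<?php
// Hypothetical illustration of the DPUSeconds arithmetic described above:
// the seconds each executor ran are summed (an assumption about how the
// total is formed), then multiplied by the worker type's DPU factor
// (1 for G.1X, 2 for G.2X, 0.25 for G.025X).
function dpuSeconds(array $executorRunSeconds, float $dpuFactor): float
{
    return array_sum($executorRunSeconds) * $dpuFactor;
}
```

With three G.2X executors that ran for 600, 600 and 300 seconds, dpuSeconds([600, 600, 300], 2.0) gives 3000.0. If all three had run for the full 600 seconds the figure would be 3600.0, which shows how Auto Scaling can leave DPUSeconds below executionEngineRuntime * MaxCapacity.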

ErrorMessage
Type: string

An error message associated with this job run.

ExecutionClass
Type: string

Indicates whether the job is run with a standard or flexible execution class. The standard execution-class is ideal for time-sensitive workloads that require fast job startup and dedicated resources.

The flexible execution class is appropriate for time-insensitive jobs whose start and completion times may vary.

Only jobs with Glue version 3.0 and above and command type glueetl will be allowed to set ExecutionClass to FLEX. The flexible execution class is available for Spark jobs.

ExecutionTime
Type: int

The amount of time (in seconds) that the job run consumed resources.

GlueVersion
Type: string

In Spark jobs, GlueVersion determines the versions of Apache Spark and Python that Glue makes available in a job. The Python version indicates the version supported for jobs of type Spark.

Ray jobs should set GlueVersion to 4.0 or greater. However, the versions of Ray, Python and additional libraries available in your Ray job are determined by the Runtime parameter of the Job command.

For more information about the available Glue versions and corresponding Spark and Python versions, see Glue version in the developer guide.

Jobs that are created without specifying a Glue version default to Glue 0.9.

Id
Type: string

The ID of this job run.

JobMode
Type: string

A mode that describes how a job was created. Valid values are:

  • SCRIPT - The job was created using the Glue Studio script editor.

  • VISUAL - The job was created using the Glue Studio visual editor.

  • NOTEBOOK - The job was created using an interactive sessions notebook.

When the JobMode field is missing or null, SCRIPT is assigned as the default value.

JobName
Type: string

The name of the job definition being used in this run.

JobRunState
Type: string

The current state of the job run. For more information about the statuses of jobs that have terminated abnormally, see Glue Job Run Statuses.

LastModifiedOn
Type: timestamp (string|DateTime or anything parsable by strtotime)

The last time that this job run was modified.

LogGroupName
Type: string

The name of the log group for secure logging that can be server-side encrypted in Amazon CloudWatch using KMS. This name can be /aws-glue/jobs/, in which case the default encryption is NONE. If you add a role name and SecurityConfiguration name (in other words, /aws-glue/jobs-yourRoleName-yourSecurityConfigurationName/), then that security configuration is used to encrypt the log group.
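The naming pattern described above can be sketched as follows. The function name and the sample role and configuration names are hypothetical; only the two path shapes come from the description.

```php
<?php
// Hypothetical helper: builds the log group name pattern described above.
// Without a role name and security configuration name it returns the
// default group; otherwise it returns the suffixed form.
function glueLogGroupName(?string $roleName = null, ?string $securityConfigurationName = null): string
{
    if ($roleName === null || $securityConfigurationName === null) {
        return '/aws-glue/jobs/';
    }
    return sprintf('/aws-glue/jobs-%s-%s/', $roleName, $securityConfigurationName);
}
```

For instance, glueLogGroupName('etl-role', 'kms-config') returns /aws-glue/jobs-etl-role-kms-config/.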

MaintenanceWindow
Type: string

This field specifies a day of the week and hour for a maintenance window for streaming jobs. Glue periodically performs maintenance activities. During these maintenance windows, Glue will need to restart your streaming jobs.

Glue will restart the job within 3 hours of the specified maintenance window. For instance, if you set up the maintenance window for Monday at 10:00AM GMT, your jobs will be restarted between 10:00AM GMT and 1:00PM GMT.

MaxCapacity
Type: double

For Glue version 1.0 or earlier jobs, using the standard worker type, the number of Glue data processing units (DPUs) that can be allocated when this job runs. A DPU is a relative measure of processing power that consists of 4 vCPUs of compute capacity and 16 GB of memory. For more information, see the Glue pricing page.

For Glue version 2.0+ jobs, you cannot specify a Maximum capacity. Instead, you should specify a Worker type and the Number of workers.

Do not set MaxCapacity if using WorkerType and NumberOfWorkers.

The value that can be allocated for MaxCapacity depends on whether you are running a Python shell job, an Apache Spark ETL job, or an Apache Spark streaming ETL job:

  • When you specify a Python shell job (JobCommand.Name="pythonshell"), you can allocate either 0.0625 or 1 DPU. The default is 0.0625 DPU.

  • When you specify an Apache Spark ETL job (JobCommand.Name="glueetl") or Apache Spark streaming ETL job (JobCommand.Name="gluestreaming"), you can allocate from 2 to 100 DPUs. The default is 10 DPUs. This job type cannot have a fractional DPU allocation.

NotificationProperty
Type: NotificationProperty structure

Specifies configuration properties of a job run notification.

NumberOfWorkers
Type: int

The number of workers of a defined workerType that are allocated when a job runs.

PredecessorRuns
Type: Array of Predecessor structures

A list of predecessors to this job run.

PreviousRunId
Type: string

The ID of the previous run of this job. For example, the JobRunId specified in the StartJobRun action.

SecurityConfiguration
Type: string

The name of the SecurityConfiguration structure to be used with this job run.

StartedOn
Type: timestamp (string|DateTime or anything parsable by strtotime)

The date and time at which this job run was started.

Timeout
Type: int

The JobRun timeout in minutes. This is the maximum time that a job run can consume resources before it is terminated and enters TIMEOUT status. This value overrides the timeout value set in the parent job.

The maximum value for timeout for batch jobs is 7 days or 10080 minutes. The default is 2880 minutes (48 hours) for batch jobs.

Any existing Glue jobs that have a greater timeout value are defaulted to 7 days. For instance, if you have specified a timeout of 20 days for a batch job, it will be stopped on the 7th day.

Streaming jobs must have timeout values less than 7 days or 10080 minutes. When the value is left blank, the job will be restarted after 7 days if you have not set up a maintenance window. If you have set up a maintenance window, it will be restarted during the maintenance window after 7 days.
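The batch-job timeout rules above can be sketched as a small calculation. The constant and function names are hypothetical; the 2880 and 10080 minute figures are the ones stated above, and the sketch covers batch jobs only.

```php
<?php
// Figures taken from the Timeout description above (batch jobs only).
const BATCH_TIMEOUT_DEFAULT_MINUTES = 2880;  // 48 hours
const BATCH_TIMEOUT_MAX_MINUTES = 10080;     // 7 days

// Hypothetical helper: returns the timeout a batch job run would effectively
// get, applying the default when none is given and the 7-day cap otherwise.
function effectiveBatchTimeoutMinutes(?int $requestedMinutes): int
{
    if ($requestedMinutes === null) {
        return BATCH_TIMEOUT_DEFAULT_MINUTES;
    }
    return min($requestedMinutes, BATCH_TIMEOUT_MAX_MINUTES);
}
```

A request of 20 days (20 * 24 * 60 = 28800 minutes) comes back as 10080, matching the example in which the job is stopped on the 7th day.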

TriggerName
Type: string

The name of the trigger that started this job run.

WorkerType
Type: string

The type of predefined worker that is allocated when a job runs. Accepts a value of G.1X, G.2X, G.4X, G.8X or G.025X for Spark jobs. Accepts the value Z.2X for Ray jobs.

  • For the G.1X worker type, each worker maps to 1 DPU (4 vCPUs, 16 GB of memory) with 84GB disk (approximately 34GB free), and provides 1 executor per worker. We recommend this worker type for workloads such as data transforms, joins, and queries, because it offers a scalable and cost-effective way to run most jobs.

  • For the G.2X worker type, each worker maps to 2 DPU (8 vCPUs, 32 GB of memory) with 128GB disk (approximately 77GB free), and provides 1 executor per worker. We recommend this worker type for workloads such as data transforms, joins, and queries, because it offers a scalable and cost-effective way to run most jobs.

  • For the G.4X worker type, each worker maps to 4 DPU (16 vCPUs, 64 GB of memory) with 256GB disk (approximately 235GB free), and provides 1 executor per worker. We recommend this worker type for jobs whose workloads contain your most demanding transforms, aggregations, joins, and queries. This worker type is available only for Glue version 3.0 or later Spark ETL jobs in the following Amazon Web Services Regions: US East (Ohio), US East (N. Virginia), US West (Oregon), Asia Pacific (Singapore), Asia Pacific (Sydney), Asia Pacific (Tokyo), Canada (Central), Europe (Frankfurt), Europe (Ireland), and Europe (Stockholm).

  • For the G.8X worker type, each worker maps to 8 DPU (32 vCPUs, 128 GB of memory) with 512GB disk (approximately 487GB free), and provides 1 executor per worker. We recommend this worker type for jobs whose workloads contain your most demanding transforms, aggregations, joins, and queries. This worker type is available only for Glue version 3.0 or later Spark ETL jobs, in the same Amazon Web Services Regions as supported for the G.4X worker type.

  • For the G.025X worker type, each worker maps to 0.25 DPU (2 vCPUs, 4 GB of memory) with 84GB disk (approximately 34GB free), and provides 1 executor per worker. We recommend this worker type for low volume streaming jobs. This worker type is only available for Glue version 3.0 streaming jobs.

  • For the Z.2X worker type, each worker maps to 2 M-DPU (8 vCPUs, 64 GB of memory) with 128 GB disk (approximately 120 GB free), and provides up to 8 Ray workers based on the autoscaler.

JobUpdate

Description

Specifies information used to update an existing job definition. The previous job definition is completely overwritten by this information.

Members
AllocatedCapacity
Type: int

This field is deprecated. Use MaxCapacity instead.

The number of Glue data processing units (DPUs) to allocate to this job. You can allocate a minimum of 2 DPUs; the default is 10. A DPU is a relative measure of processing power that consists of 4 vCPUs of compute capacity and 16 GB of memory. For more information, see the Glue pricing page.

CodeGenConfigurationNodes
Type: Associative array of custom strings keys (NodeId) to CodeGenConfigurationNode structures

The representation of a directed acyclic graph on which both the Glue Studio visual component and Glue Studio code generation is based.

Command
Type: JobCommand structure

The JobCommand that runs this job (required).

Connections
Type: ConnectionsList structure

The connections used for this job.

DefaultArguments
Type: Associative array of custom strings keys (GenericString) to strings

The default arguments for every run of this job, specified as name-value pairs.

You can specify arguments here that your own job-execution script consumes, as well as arguments that Glue itself consumes.

Job arguments may be logged. Do not pass plaintext secrets as arguments. Retrieve secrets from a Glue Connection, Secrets Manager or other secret management mechanism if you intend to keep them within the Job.

For information about how to specify and consume your own Job arguments, see the Calling Glue APIs in Python topic in the developer guide.

For information about the arguments you can provide to this field when configuring Spark jobs, see the Special Parameters Used by Glue topic in the developer guide.

For information about the arguments you can provide to this field when configuring Ray jobs, see Using job parameters in Ray jobs in the developer guide.

Description
Type: string

Description of the job being defined.

ExecutionClass
Type: string

Indicates whether the job is run with a standard or flexible execution class. The standard execution-class is ideal for time-sensitive workloads that require fast job startup and dedicated resources.

The flexible execution class is appropriate for time-insensitive jobs whose start and completion times may vary.

Only jobs with Glue version 3.0 and above and command type glueetl will be allowed to set ExecutionClass to FLEX. The flexible execution class is available for Spark jobs.

ExecutionProperty
Type: ExecutionProperty structure

An ExecutionProperty specifying the maximum number of concurrent runs allowed for this job.

GlueVersion
Type: string

In Spark jobs, GlueVersion determines the versions of Apache Spark and Python that Glue makes available in a job. The Python version indicates the version supported for jobs of type Spark.

Ray jobs should set GlueVersion to 4.0 or greater. However, the versions of Ray, Python and additional libraries available in your Ray job are determined by the Runtime parameter of the Job command.

For more information about the available Glue versions and corresponding Spark and Python versions, see Glue version in the developer guide.

Jobs that are created without specifying a Glue version default to Glue 0.9.

JobMode
Type: string

A mode that describes how a job was created. Valid values are:

  • SCRIPT - The job was created using the Glue Studio script editor.

  • VISUAL - The job was created using the Glue Studio visual editor.

  • NOTEBOOK - The job was created using an interactive sessions notebook.

When the JobMode field is missing or null, SCRIPT is assigned as the default value.

LogUri
Type: string

This field is reserved for future use.

MaintenanceWindow
Type: string

This field specifies a day of the week and hour for a maintenance window for streaming jobs. Glue periodically performs maintenance activities. During these maintenance windows, Glue will need to restart your streaming jobs.

Glue will restart the job within 3 hours of the specified maintenance window. For instance, if you set up the maintenance window for Monday at 10:00AM GMT, your jobs will be restarted between 10:00AM GMT and 1:00PM GMT.

MaxCapacity
Type: double

For Glue version 1.0 or earlier jobs, using the standard worker type, the number of Glue data processing units (DPUs) that can be allocated when this job runs. A DPU is a relative measure of processing power that consists of 4 vCPUs of compute capacity and 16 GB of memory. For more information, see the Glue pricing page.

For Glue version 2.0+ jobs, you cannot specify a Maximum capacity. Instead, you should specify a Worker type and the Number of workers.

Do not set MaxCapacity if using WorkerType and NumberOfWorkers.

The value that can be allocated for MaxCapacity depends on whether you are running a Python shell job, an Apache Spark ETL job, or an Apache Spark streaming ETL job:

  • When you specify a Python shell job (JobCommand.Name="pythonshell"), you can allocate either 0.0625 or 1 DPU. The default is 0.0625 DPU.

  • When you specify an Apache Spark ETL job (JobCommand.Name="glueetl") or Apache Spark streaming ETL job (JobCommand.Name="gluestreaming"), you can allocate from 2 to 100 DPUs. The default is 10 DPUs. This job type cannot have a fractional DPU allocation.

MaxRetries
Type: int

The maximum number of times to retry this job if it fails.

NonOverridableArguments
Type: Associative array of custom strings keys (GenericString) to strings

Arguments for this job that are not overridden when providing job arguments in a job run, specified as name-value pairs.

NotificationProperty
Type: NotificationProperty structure

Specifies the configuration properties of a job notification.

NumberOfWorkers
Type: int

The number of workers of a defined workerType that are allocated when a job runs.

Role
Type: string

The name or Amazon Resource Name (ARN) of the IAM role associated with this job (required).

SecurityConfiguration
Type: string

The name of the SecurityConfiguration structure to be used with this job.

SourceControlDetails
Type: SourceControlDetails structure

The details for a source control configuration for a job, allowing synchronization of job artifacts to or from a remote repository.

Timeout
Type: int

The job timeout in minutes. This is the maximum time that a job run can consume resources before it is terminated and enters TIMEOUT status. The default is 2,880 minutes (48 hours).

WorkerType
Type: string

The type of predefined worker that is allocated when a job runs. Accepts a value of G.1X, G.2X, G.4X, G.8X or G.025X for Spark jobs. Accepts the value Z.2X for Ray jobs.

  • For the G.1X worker type, each worker maps to 1 DPU (4 vCPUs, 16 GB of memory) with 84GB disk (approximately 34GB free), and provides 1 executor per worker. We recommend this worker type for workloads such as data transforms, joins, and queries, because it offers a scalable and cost-effective way to run most jobs.

  • For the G.2X worker type, each worker maps to 2 DPU (8 vCPUs, 32 GB of memory) with 128GB disk (approximately 77GB free), and provides 1 executor per worker. We recommend this worker type for workloads such as data transforms, joins, and queries, because it offers a scalable and cost-effective way to run most jobs.

  • For the G.4X worker type, each worker maps to 4 DPU (16 vCPUs, 64 GB of memory) with 256GB disk (approximately 235GB free), and provides 1 executor per worker. We recommend this worker type for jobs whose workloads contain your most demanding transforms, aggregations, joins, and queries. This worker type is available only for Glue version 3.0 or later Spark ETL jobs in the following Amazon Web Services Regions: US East (Ohio), US East (N. Virginia), US West (Oregon), Asia Pacific (Singapore), Asia Pacific (Sydney), Asia Pacific (Tokyo), Canada (Central), Europe (Frankfurt), Europe (Ireland), and Europe (Stockholm).

  • For the G.8X worker type, each worker maps to 8 DPU (32 vCPUs, 128 GB of memory) with 512GB disk (approximately 487GB free), and provides 1 executor per worker. We recommend this worker type for jobs whose workloads contain your most demanding transforms, aggregations, joins, and queries. This worker type is available only for Glue version 3.0 or later Spark ETL jobs, in the same Amazon Web Services Regions as supported for the G.4X worker type.

  • For the G.025X worker type, each worker maps to 0.25 DPU (2 vCPUs, 4 GB of memory) with 84GB disk (approximately 34GB free), and provides 1 executor per worker. We recommend this worker type for low volume streaming jobs. This worker type is only available for Glue version 3.0 streaming jobs.

  • For the Z.2X worker type, each worker maps to 2 M-DPU (8 vCPUs, 64 GB of memory) with 128 GB disk (approximately 120 GB free), and provides up to 8 Ray workers based on the autoscaler.
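Because a JobUpdate completely overwrites the previous job definition, one possible approach is to start from the current definition and apply only the intended changes. The sketch below assumes the current definition is already available as an array; the function and constant names are hypothetical, and the member list is copied from the JobUpdate members above with the deprecated AllocatedCapacity left out.

```php
<?php
// Member names accepted by JobUpdate, as listed above
// (the deprecated AllocatedCapacity is intentionally omitted).
const JOB_UPDATE_MEMBERS = [
    'CodeGenConfigurationNodes', 'Command', 'Connections', 'DefaultArguments',
    'Description', 'ExecutionClass', 'ExecutionProperty', 'GlueVersion',
    'JobMode', 'LogUri', 'MaintenanceWindow', 'MaxCapacity', 'MaxRetries',
    'NonOverridableArguments', 'NotificationProperty', 'NumberOfWorkers',
    'Role', 'SecurityConfiguration', 'SourceControlDetails', 'Timeout',
    'WorkerType',
];

// Hypothetical helper: merges changes into the current job definition and
// keeps only members that JobUpdate accepts, so unchanged settings are not
// lost when the definition is overwritten.
function buildJobUpdate(array $currentJob, array $changes): array
{
    $merged = array_merge($currentJob, $changes);
    $update = array_intersect_key($merged, array_flip(JOB_UPDATE_MEMBERS));

    // MaxCapacity should not be set together with WorkerType and NumberOfWorkers.
    if (isset($update['WorkerType'], $update['NumberOfWorkers'])) {
        unset($update['MaxCapacity']);
    }
    return $update;
}
```

The resulting array is intended to be passed as the JobUpdate value of an UpdateJob call, alongside JobName. Members that exist only on the Job structure, such as Name or LastModifiedOn, are dropped by the filter.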

Join

Description

Specifies a transform that joins two datasets into one dataset using a comparison phrase on the specified data property keys. You can use inner, outer, left, right, left semi, and left anti joins.

Members
Columns
Required: Yes
Type: Array of JoinColumn structures

A list of the two columns to be joined.

Inputs
Required: Yes
Type: Array of strings

The data inputs identified by their node names.

JoinType
Required: Yes
Type: string

Specifies the type of join to be performed on the datasets.

Name
Required: Yes
Type: string

The name of the transform node.

JoinColumn

Description

Specifies a column to be joined.

Members
From
Required: Yes
Type: string

The column to be joined.

Keys
Required: Yes
Type: Array of arrays of strings

The key of the column to be joined.
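The required members of Join and JoinColumn described above can be checked before a node is submitted. This is a minimal sketch with a hypothetical function name; it only verifies that the required members are present and that Columns lists two entries, and it does not inspect the values themselves.

```php
<?php
// Hypothetical validator for a Join transform node, based only on the
// "Required: Yes" markers and the two-column rule described above.
function validateJoinNode(array $node): array
{
    $errors = [];
    foreach (['Name', 'Inputs', 'JoinType', 'Columns'] as $member) {
        if (!isset($node[$member])) {
            $errors[] = sprintf('Missing required member: %s', $member);
        }
    }
    if (isset($node['Columns']) && is_array($node['Columns'])) {
        if (count($node['Columns']) !== 2) {
            $errors[] = 'Columns must list exactly two JoinColumn structures';
        }
        foreach ($node['Columns'] as $index => $column) {
            foreach (['From', 'Keys'] as $member) {
                if (!isset($column[$member])) {
                    $errors[] = sprintf('Columns[%d] is missing %s', $index, $member);
                }
            }
        }
    }
    return $errors;
}
```

An empty array means no problems were found by this check.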

JsonClassifier

Description

A classifier for JSON content.

Members
CreationTime
Type: timestamp (string|DateTime or anything parsable by strtotime)

The time that this classifier was registered.

JsonPath
Required: Yes
Type: string

A JsonPath string defining the JSON data for the classifier to classify. Glue supports a subset of JsonPath, as described in Writing JsonPath Custom Classifiers.

LastUpdated
Type: timestamp (string|DateTime or anything parsable by strtotime)

The time that this classifier was last updated.

Name
Required: Yes
Type: string

The name of the classifier.

Version
Type: long (int|float)

The version of this classifier.

KafkaStreamingSourceOptions

Description

Additional options for streaming.

Members
AddRecordTimestamp
Type: string

When this option is set to 'true', the data output will contain an additional column named "__src_timestamp" that indicates the time when the corresponding record was received by the topic. The default value is 'false'. This option is supported in Glue version 4.0 or later.

Assign
Type: string

The specific TopicPartitions to consume. You must specify at least one of "topicName", "assign" or "subscribePattern".

BootstrapServers
Type: string

A list of bootstrap server URLs, for example, b-1.vpc-test-2.o4q88o.c6.kafka.us-east-1.amazonaws.com:9094. This option must be specified in the API call or defined in the table metadata in the Data Catalog.

Classification
Type: string

An optional classification.

ConnectionName
Type: string

The name of the connection.

Delimiter
Type: string

Specifies the delimiter character.

EmitConsumerLagMetrics
Type: string

When this option is set to 'true', Glue emits a metric to CloudWatch for each batch, measuring the duration between the oldest record received by the topic and the time it arrives in Glue. The metric's name is "glue.driver.streaming.maxConsumerLagInMs". The default value is 'false'. This option is supported in Glue version 4.0 or later.

EndingOffsets
Type: string

The end point when a batch query is ended. Possible values are either "latest" or a JSON string that specifies an ending offset for each TopicPartition.

IncludeHeaders
Type: boolean

Whether to include the Kafka headers. When the option is set to "true", the data output will contain an additional column named "glue_streaming_kafka_headers" with type Array[Struct(key: String, value: String)]. The default value is "false". This option is available in Glue version 3.0 or later only.

MaxOffsetsPerTrigger
Type: long (int|float)

The rate limit on the maximum number of offsets that are processed per trigger interval. The specified total number of offsets is proportionally split across topicPartitions of different volumes. The default value is null, which means that the consumer reads all offsets until the known latest offset.

MinPartitions
Type: int

The desired minimum number of partitions to read from Kafka. The default value is null, which means that the number of Spark partitions is equal to the number of Kafka partitions.

NumRetries
Type: int

The number of times to retry before failing to fetch Kafka offsets. The default value is 3.

PollTimeoutMs
Type: long (int|float)

The timeout in milliseconds to poll data from Kafka in Spark job executors. The default value is 512.

RetryIntervalMs
Type: long (int|float)

The time in milliseconds to wait before retrying to fetch Kafka offsets. The default value is 10.

SecurityProtocol
Type: string

The protocol used to communicate with brokers. The possible values are "SSL" or "PLAINTEXT".

StartingOffsets
Type: string

The starting position in the Kafka topic to read data from. The possible values are "earliest" or "latest". The default value is "latest".

StartingTimestamp
Type: timestamp (string|DateTime or anything parsable by strtotime)

The timestamp of the record in the Kafka topic to start reading data from. The possible values are a timestamp string in UTC format of the pattern yyyy-mm-ddTHH:MM:SSZ (where Z represents a UTC timezone offset with a +/-. For example: "2023-04-04T08:00:00+08:00").

Set only one of StartingTimestamp or StartingOffsets, not both.
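When a string form of the timestamp is needed, PHP's date formatting can produce the pattern shown above. This is a small sketch with a hypothetical function name; the member also accepts a DateTime object directly, as its type indicates.

```php
<?php
// Hypothetical helper: formats a date as yyyy-mm-ddTHH:MM:SS followed by a
// +/-HH:MM offset, matching the pattern described above.
function formatStartingTimestamp(DateTimeInterface $when): string
{
    // 'P' emits the UTC offset with a colon, for example +08:00.
    return $when->format('Y-m-d\TH:i:sP');
}
```

Formatting new DateTimeImmutable('2023-04-04T08:00:00+08:00') with this helper returns the same string, 2023-04-04T08:00:00+08:00.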

SubscribePattern
Type: string

A Java regex string that identifies the topic list to subscribe to. You must specify at least one of "topicName", "assign" or "subscribePattern".

TopicName
Type: string

The topic name as specified in Apache Kafka. You must specify at least one of "topicName", "assign" or "subscribePattern".
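Two of the constraints stated in the members above, that at least one of TopicName, Assign or SubscribePattern is given and that StartingTimestamp and StartingOffsets are not combined, can be sketched as a pre-flight check. The function name is hypothetical and the check covers only these two rules.

```php
<?php
// Hypothetical validator for a KafkaStreamingSourceOptions array, covering
// only the two cross-member rules described above.
function validateKafkaSourceOptions(array $options): array
{
    $errors = [];

    // At least one way of selecting topics must be present.
    $hasTopicSelector = false;
    foreach (['TopicName', 'Assign', 'SubscribePattern'] as $member) {
        if (isset($options[$member]) && $options[$member] !== '') {
            $hasTopicSelector = true;
        }
    }
    if (!$hasTopicSelector) {
        $errors[] = 'Specify at least one of TopicName, Assign or SubscribePattern';
    }

    // The two starting-position members are mutually exclusive.
    if (isset($options['StartingTimestamp'], $options['StartingOffsets'])) {
        $errors[] = 'Set only one of StartingTimestamp or StartingOffsets';
    }
    return $errors;
}
```

An empty result means neither rule was violated; other members, such as BootstrapServers, are not examined here.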

KeySchemaElement

Description

A partition key pair consisting of a name and a type.

Members
Name
Required: Yes
Type: string

The name of a partition key.

Type
Required: Yes
Type: string

The type of a partition key.

KinesisStreamingSourceOptions

Description

Additional options for the Amazon Kinesis streaming data source.

Members
AddIdleTimeBetweenReads
Type: boolean

Adds a time delay between two consecutive getRecords operations. The default value is "False". This option is only configurable for Glue version 2.0 and above.

AddRecordTimestamp
Type: string

When this option is set to 'true', the data output will contain an additional column named "__src_timestamp" that indicates the time when the corresponding record was received by the stream. The default value is 'false'. This option is supported in Glue version 4.0 or later.

AvoidEmptyBatches
Type: boolean

Avoids creating an empty microbatch job by checking for unread data in the Kinesis data stream before the batch is started. The default value is "False".

Classification
Type: string

An optional classification.

Delimiter
Type: string

Specifies the delimiter character.

DescribeShardInterval
Type: long (int|float)

The minimum time interval between two ListShards API calls for your script to consider resharding. The default value is 1s.

EmitConsumerLagMetrics
Type: string

When this option is set to 'true', each batch emits to CloudWatch a metric for the duration between the time the oldest record was received by the stream and the time it arrives in Glue. The metric's name is "glue.driver.streaming.maxConsumerLagInMs". The default value is 'false'. This option is supported in Glue version 4.0 or later.

EndpointUrl
Type: string

The URL of the Kinesis endpoint.

IdleTimeBetweenReadsInMs
Type: long (int|float)

The minimum time delay between two consecutive getRecords operations, specified in ms. The default value is 1000. This option is only configurable for Glue version 2.0 and above.

MaxFetchRecordsPerShard
Type: long (int|float)

The maximum number of records to fetch per shard in the Kinesis data stream per microbatch. Note: The client can exceed this limit if the streaming job has already read extra records from Kinesis (in the same get-records call). If MaxFetchRecordsPerShard needs to be strict then it needs to be a multiple of MaxRecordPerRead. The default value is 100000.

MaxFetchTimeInMs
Type: long (int|float)

The maximum time spent for the job executor to read records for the current batch from the Kinesis data stream, specified in milliseconds (ms). Multiple GetRecords API calls may be made within this time. The default value is 1000.

MaxRecordPerRead
Type: long (int|float)

The maximum number of records to fetch from the Kinesis data stream in each getRecords operation. The default value is 10000.

MaxRetryIntervalMs
Type: long (int|float)

The maximum cool-off time period (specified in ms) between two retries of a Kinesis Data Streams API call. The default value is 10000.

NumRetries
Type: int

The maximum number of retries for Kinesis Data Streams API requests. The default value is 3.

RetryIntervalMs
Type: long (int|float)

The cool-off time period (specified in ms) before retrying the Kinesis Data Streams API call. The default value is 1000.

RoleArn
Type: string

The Amazon Resource Name (ARN) of the role to assume using AWS Security Token Service (AWS STS). This role must have permissions for describe or read record operations for the Kinesis data stream. You must use this parameter when accessing a data stream in a different account. Used in conjunction with "awsSTSSessionName".

RoleSessionName
Type: string

An identifier for the session assuming the role using AWS STS. You must use this parameter when accessing a data stream in a different account. Used in conjunction with "awsSTSRoleARN".

StartingPosition
Type: string

The starting position in the Kinesis data stream to read data from. The possible values are "latest", "trim_horizon", "earliest", or a timestamp string in UTC format in the pattern yyyy-mm-ddTHH:MM:SSZ, where Z represents a UTC timezone offset with a +/- sign (for example: "2023-04-04T08:00:00-04:00"). The default value is "latest".

Note: Using a value that is a timestamp string in UTC format for "startingPosition" is supported only for Glue version 4.0 or later.

StartingTimestamp
Type: timestamp (string|DateTime or anything parsable by strtotime)

The timestamp of the record in the Kinesis data stream to start reading data from. The possible values are a timestamp string in UTC format of the pattern yyyy-mm-ddTHH:MM:SSZ, where Z represents a UTC timezone offset with a +/- sign (for example: "2023-04-04T08:00:00+08:00").

StreamArn
Type: string

The Amazon Resource Name (ARN) of the Kinesis data stream.

StreamName
Type: string

The name of the Kinesis data stream.
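
As a minimal sketch, the members above can be assembled as a plain PHP array before being placed in a job's node definition. The ARNs and session name are placeholder values, and the helper isStrictFetchLimit is hypothetical (it is not part of the SDK); it only checks the multiple-of rule described under MaxFetchRecordsPerShard. The timestamp is formatted with PHP's DateTimeInterface::ATOM constant, which yields the offset-suffixed shape shown under StartingPosition.

```php
<?php
// Hypothetical helper: true when MaxFetchRecordsPerShard is a whole multiple
// of MaxRecordPerRead, the condition given above for a strict limit.
function isStrictFetchLimit(int $maxFetchRecordsPerShard, int $maxRecordPerRead): bool
{
    return $maxRecordPerRead > 0 && $maxFetchRecordsPerShard % $maxRecordPerRead === 0;
}

// Format a start time with an explicit UTC offset.
$start = new DateTimeImmutable('2023-04-04 08:00:00', new DateTimeZone('-04:00'));
$startingPosition = $start->format(DateTimeInterface::ATOM);

// Placeholder values throughout; only the keys come from the member list above.
$kinesisOptions = [
    'StreamArn'               => 'arn:aws:kinesis:us-east-1:111122223333:stream/example-stream',
    'RoleArn'                 => 'arn:aws:iam::111122223333:role/example-cross-account-role',
    'RoleSessionName'         => 'example-glue-session',
    'StartingPosition'        => $startingPosition,
    'MaxFetchRecordsPerShard' => 100000, // documented default
    'MaxRecordPerRead'        => 10000,  // documented default
    'AddRecordTimestamp'      => 'true', // string-typed member, not a boolean
];
```

Note that a timestamp value for StartingPosition is documented as supported only for Glue version 4.0 or later, so a job on an earlier version would use "latest", "trim_horizon", or "earliest" instead.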

LabelingSetGenerationTaskRunProperties

Description

Specifies configuration properties for a labeling set generation task run.

Members
OutputS3Path
Type: string

The Amazon Simple Storage Service (Amazon S3) path where you will generate the labeling set.

LakeFormationConfiguration

Description

Specifies Lake Formation configuration settings for the crawler.

Members
AccountId
Type: string

Required for cross-account crawls. For crawls in the same account as the target data, this can be left as null.

UseLakeFormationCredentials
Type: boolean

Specifies whether to use Lake Formation credentials for the crawler instead of the IAM role credentials.

LastActiveDefinition

Description

When there are multiple versions of a blueprint and the latest version has some errors, this attribute indicates the last successful blueprint definition that is available with the service.

Members
BlueprintLocation
Type: string

Specifies a path in Amazon S3 where the blueprint is published by the Glue developer.

BlueprintServiceLocation
Type: string

Specifies a path in Amazon S3 where the blueprint is copied when you create or update the blueprint.

Description
Type: string

The description of the blueprint.

LastModifiedOn
Type: timestamp (string|DateTime or anything parsable by strtotime)

The date and time the blueprint was last modified.

ParameterSpec
Type: string

A JSON string specifying the parameters for the blueprint.

LastCrawlInfo

Description

Status and error information about the most recent crawl.

Members
ErrorMessage
Type: string

If an error occurred, the error information about the last crawl.

LogGroup
Type: string

The log group for the last crawl.

LogStream
Type: string

The log stream for the last crawl.

MessagePrefix
Type: string

The prefix for a message about this crawl.

StartTime
Type: timestamp (string|DateTime or anything parsable by strtotime)

The time at which the crawl started.

Status
Type: string

Status of the last crawl.

LineageConfiguration

Description

Specifies data lineage configuration settings for the crawler.

Members
CrawlerLineageSettings
Type: string

Specifies whether data lineage is enabled for the crawler. Valid values are:

  • ENABLE: enables data lineage for the crawler

  • DISABLE: disables data lineage for the crawler

Location

Description

The location of resources.

Members
DynamoDB
Type: Array of CodeGenNodeArg structures

An Amazon DynamoDB table location.

Jdbc
Type: Array of CodeGenNodeArg structures

A JDBC location.

S3
Type: Array of CodeGenNodeArg structures

An Amazon Simple Storage Service (Amazon S3) location.

LongColumnStatisticsData

Description

Defines column statistics supported for integer data columns.

Members
MaximumValue
Type: long (int|float)

The highest value in the column.

MinimumValue
Type: long (int|float)

The lowest value in the column.

NumberOfDistinctValues
Required: Yes
Type: long (int|float)

The number of distinct values in a column.

NumberOfNulls
Required: Yes
Type: long (int|float)

The number of null values in the column.

MLTransform

Description

A structure for a machine learning transform.

Members
CreatedOn
Type: timestamp (string|DateTime or anything parsable by strtotime)

A timestamp. The time and date that this machine learning transform was created.

Description
Type: string

A user-defined, long-form description text for the machine learning transform. Descriptions are not guaranteed to be unique and can be changed at any time.

EvaluationMetrics
Type: EvaluationMetrics structure

An EvaluationMetrics object. Evaluation metrics provide an estimate of the quality of your machine learning transform.

GlueVersion
Type: string

This value determines which version of Glue this machine learning transform is compatible with. Glue 1.0 is recommended for most customers. If the value is not set, the Glue compatibility defaults to Glue 0.9. For more information, see Glue Versions in the developer guide.

InputRecordTables
Type: Array of GlueTable structures

A list of Glue table definitions used by the transform.

LabelCount
Type: int

A count identifier for the labeling files generated by Glue for this transform. As you create a better transform, you can iteratively download, label, and upload the labeling file.

LastModifiedOn
Type: timestamp (string|DateTime or anything parsable by strtotime)

A timestamp. The last point in time when this machine learning transform was modified.

MaxCapacity
Type: double

The number of Glue data processing units (DPUs) that are allocated to task runs for this transform. You can allocate from 2 to 100 DPUs; the default is 10. A DPU is a relative measure of processing power that consists of 4 vCPUs of compute capacity and 16 GB of memory. For more information, see the Glue pricing page.

MaxCapacity is a mutually exclusive option with NumberOfWorkers and WorkerType.

  • If either NumberOfWorkers or WorkerType is set, then MaxCapacity cannot be set.

  • If MaxCapacity is set, then neither NumberOfWorkers nor WorkerType can be set.

  • If WorkerType is set, then NumberOfWorkers is required (and vice versa).

  • MaxCapacity and NumberOfWorkers must both be at least 1.

When the WorkerType field is set to a value other than Standard, the MaxCapacity field is set automatically and becomes read-only.

MaxRetries
Type: int

The maximum number of times to retry after an MLTaskRun of the machine learning transform fails.

Name
Type: string

A user-defined name for the machine learning transform. Names are not guaranteed to be unique and can be changed at any time.

NumberOfWorkers
Type: int

The number of workers of a defined workerType that are allocated when a task of the transform runs.

If WorkerType is set, then NumberOfWorkers is required (and vice versa).

Parameters
Type: TransformParameters structure

A TransformParameters object. You can use parameters to tune (customize) the behavior of the machine learning transform by specifying what data it learns from and your preference on various tradeoffs (such as precision vs. recall, or accuracy vs. cost).

Role
Type: string

The name or Amazon Resource Name (ARN) of the IAM role with the required permissions. The required permissions include both Glue service role permissions to Glue resources, and Amazon S3 permissions required by the transform.

  • This role needs Glue service role permissions to allow access to resources in Glue. See Attach a Policy to IAM Users That Access Glue.

  • This role needs permission to your Amazon Simple Storage Service (Amazon S3) sources, targets, temporary directory, scripts, and any libraries used by the task run for this transform.

Schema
Type: Array of SchemaColumn structures

A map of key-value pairs representing the columns and data types that this transform can run against. Has an upper bound of 100 columns.

Status
Type: string

The current status of the machine learning transform.

Timeout
Type: int

The timeout in minutes of the machine learning transform.

TransformEncryption
Type: TransformEncryption structure

The encryption-at-rest settings of the transform that apply to accessing user data. Machine learning transforms can access user data encrypted in Amazon S3 using KMS.

TransformId
Type: string

The unique transform ID that is generated for the machine learning transform. The ID is guaranteed to be unique and does not change.

WorkerType
Type: string

The type of predefined worker that is allocated when a task of this transform runs. Accepts a value of Standard, G.1X, or G.2X.

  • For the Standard worker type, each worker provides 4 vCPU, 16 GB of memory and a 50GB disk, and 2 executors per worker.

  • For the G.1X worker type, each worker provides 4 vCPU, 16 GB of memory and a 64GB disk, and 1 executor per worker.

  • For the G.2X worker type, each worker provides 8 vCPU, 32 GB of memory and a 128GB disk, and 1 executor per worker.

MaxCapacity is a mutually exclusive option with NumberOfWorkers and WorkerType.

  • If either NumberOfWorkers or WorkerType is set, then MaxCapacity cannot be set.

  • If MaxCapacity is set, then neither NumberOfWorkers nor WorkerType can be set.

  • If WorkerType is set, then NumberOfWorkers is required (and vice versa).

  • MaxCapacity and NumberOfWorkers must both be at least 1.
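
The capacity rules listed above can be checked on the client before a request is sent. The following is a sketch only: capacityErrors is a hypothetical helper, not an SDK function, and the service remains the authority on what it accepts.

```php
<?php
// Hypothetical client-side check of the MaxCapacity / NumberOfWorkers / WorkerType rules.
// Returns a list of problems; an empty list means the listed rules are satisfied.
function capacityErrors(array $params): array
{
    $errors     = [];
    $hasMax     = isset($params['MaxCapacity']);
    $hasWorkers = isset($params['NumberOfWorkers']);
    $hasType    = isset($params['WorkerType']);

    // MaxCapacity is mutually exclusive with the other two members.
    if ($hasMax && ($hasWorkers || $hasType)) {
        $errors[] = 'MaxCapacity cannot be combined with NumberOfWorkers or WorkerType.';
    }
    // WorkerType requires NumberOfWorkers, and vice versa.
    if ($hasWorkers !== $hasType) {
        $errors[] = 'WorkerType and NumberOfWorkers must be set together.';
    }
    // Both numeric members must be at least 1 when present.
    if ($hasMax && $params['MaxCapacity'] < 1) {
        $errors[] = 'MaxCapacity must be at least 1.';
    }
    if ($hasWorkers && $params['NumberOfWorkers'] < 1) {
        $errors[] = 'NumberOfWorkers must be at least 1.';
    }
    return $errors;
}
```

For example, capacityErrors(['WorkerType' => 'G.1X', 'NumberOfWorkers' => 5]) returns an empty array, while passing MaxCapacity together with WorkerType reports a conflict.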

MLTransformNotReadyException

Description

The machine learning transform is not ready to run.

Members
Message
Type: string

A message describing the problem.

MLUserDataEncryption

Description

The encryption-at-rest settings of the transform that apply to accessing user data.

Members
KmsKeyId
Type: string

The ID for the customer-provided KMS key.

MlUserDataEncryptionMode
Required: Yes
Type: string

The encryption mode applied to user data. Valid values are:

  • DISABLED: encryption is disabled

  • SSEKMS: use of server-side encryption with Key Management Service (SSE-KMS) for user data stored in Amazon S3.

Mapping

Description

Specifies the mapping of data property keys.

Members
Children
Type: Array of Mapping structures

Only applicable to nested data structures. If you want to change the parent structure, but also one of its children, you can fill out this data structure. It is also Mapping, but its FromPath will be the parent's FromPath plus the FromPath from this structure.

For the children part, suppose you have the structure:

{ "FromPath": "OuterStructure", "ToKey": "OuterStructure", "ToType": "Struct", "Dropped": false, "Children": [{ "FromPath": "inner", "ToKey": "inner", "ToType": "Double", "Dropped": false }] }

You can specify a Mapping that looks like:

{ "FromPath": "OuterStructure", "ToKey": "OuterStructure", "ToType": "Struct", "Dropped": false, "Children": [{ "FromPath": "inner", "ToKey": "inner", "ToType": "Double", "Dropped": false }] }

Dropped
Type: boolean

If true, then the column is removed.

FromPath
Type: Array of strings

The table or column to be modified.

FromType
Type: string

The type of the data to be modified.

ToKey
Type: string

The name the column should have after the mapping is applied. Can be the same as FromPath.

ToType
Type: string

The data type that the data is to be modified to.
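
To illustrate how a child's path is derived from its parent, the sketch below builds a nested Mapping as a PHP array and lists the effective source path of each entry. The helper effectivePaths is hypothetical, and joining path segments with a dot is only a display choice made for this example. FromPath is written as an array of strings to match the member type given above.

```php
<?php
// Hypothetical helper: list the effective source path of every mapping, where a
// child's path is its parent's FromPath followed by its own FromPath.
function effectivePaths(array $mapping, array $parentPath = []): array
{
    $path  = array_merge($parentPath, $mapping['FromPath']);
    $paths = [implode('.', $path)];
    foreach ($mapping['Children'] ?? [] as $child) {
        $paths = array_merge($paths, effectivePaths($child, $path));
    }
    return $paths;
}

$mapping = [
    'FromPath' => ['OuterStructure'],
    'ToKey'    => 'OuterStructure',
    'ToType'   => 'Struct',
    'Dropped'  => false,
    'Children' => [[
        'FromPath' => ['inner'],
        'ToKey'    => 'inner',
        'ToType'   => 'Double',
        'Dropped'  => false,
    ]],
];

$paths = effectivePaths($mapping);
```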

MappingEntry

Description

Defines a mapping.

Members
SourcePath
Type: string

The source path.

SourceTable
Type: string

The name of the source table.

SourceType
Type: string

The source type.

TargetPath
Type: string

The target path.

TargetTable
Type: string

The target table.

TargetType
Type: string

The target type.

Merge

Description

Specifies a transform that merges a DynamicFrame with a staging DynamicFrame based on the specified primary keys to identify records. Duplicate records (records with the same primary keys) are not de-duplicated.

Members
Inputs
Required: Yes
Type: Array of strings

The data inputs identified by their node names.

Name
Required: Yes
Type: string

The name of the transform node.

PrimaryKeys
Required: Yes
Type: Array of strings

The list of primary key fields to match records from the source and staging dynamic frames.

Source
Required: Yes
Type: string

The source DynamicFrame that will be merged with a staging DynamicFrame.

MetadataInfo

Description

A structure containing metadata information for a schema version.

Members
CreatedTime
Type: string

The time at which the entry was created.

MetadataValue
Type: string

The metadata key’s corresponding value.

OtherMetadataValueList
Type: Array of OtherMetadataValueListItem structures

Other metadata belonging to the same metadata key.

MetadataKeyValuePair

Description

A structure containing a key value pair for metadata.

Members
MetadataKey
Type: string

A metadata key.

MetadataValue
Type: string

A metadata key’s corresponding value.

MetricBasedObservation

Description

Describes the metric based observation generated based on evaluated data quality metrics.

Members
MetricName
Type: string

The name of the data quality metric used for generating the observation.

MetricValues
Type: DataQualityMetricValues structure

An object of type DataQualityMetricValues representing the analysis of the data quality metric value.

NewRules
Type: Array of strings

A list of new data quality rules generated as part of the observation based on the data quality metric value.

MicrosoftSQLServerCatalogSource

Description

Specifies a Microsoft SQL Server data source in the Glue Data Catalog.

Members
Database
Required: Yes
Type: string

The name of the database to read from.

Name
Required: Yes
Type: string

The name of the data source.

Table
Required: Yes
Type: string

The name of the table in the database to read from.

MicrosoftSQLServerCatalogTarget

Description

Specifies a target that uses Microsoft SQL Server.

Members
Database
Required: Yes
Type: string

The name of the database to write to.

Inputs
Required: Yes
Type: Array of strings

The nodes that are inputs to the data target.

Name
Required: Yes
Type: string

The name of the data target.

Table
Required: Yes
Type: string

The name of the table in the database to write to.

MongoDBTarget

Description

Specifies an Amazon DocumentDB or MongoDB data store to crawl.

Members
ConnectionName
Type: string

The name of the connection to use to connect to the Amazon DocumentDB or MongoDB target.

Path
Type: string

The path of the Amazon DocumentDB or MongoDB target (database/collection).

ScanAll
Type: boolean

Indicates whether to scan all the records, or to sample rows from the table. Scanning all the records can take a long time when the table is not a high throughput table.

A value of true means to scan all records, while a value of false means to sample the records. If no value is specified, the value defaults to true.

MySQLCatalogSource

Description

Specifies a MySQL data source in the Glue Data Catalog.

Members
Database
Required: Yes
Type: string

The name of the database to read from.

Name
Required: Yes
Type: string

The name of the data source.

Table
Required: Yes
Type: string

The name of the table in the database to read from.

MySQLCatalogTarget

Description

Specifies a target that uses MySQL.

Members
Database
Required: Yes
Type: string

The name of the database to write to.

Inputs
Required: Yes
Type: Array of strings

The nodes that are inputs to the data target.

Name
Required: Yes
Type: string

The name of the data target.

Table
Required: Yes
Type: string

The name of the table in the database to write to.

NoScheduleException

Description

There is no applicable schedule.

Members
Message
Type: string

A message describing the problem.

Node

Description

A node represents a Glue component (trigger, crawler, or job) on a workflow graph.

Members
CrawlerDetails
Type: CrawlerNodeDetails structure

Details of the crawler when the node represents a crawler.

JobDetails
Type: JobNodeDetails structure

Details of the Job when the node represents a Job.

Name
Type: string

The name of the Glue component represented by the node.

TriggerDetails
Type: TriggerNodeDetails structure

Details of the Trigger when the node represents a Trigger.

Type
Type: string

The type of Glue component represented by the node.

UniqueId
Type: string

The unique ID assigned to the node within the workflow.

NotificationProperty

Description

Specifies configuration properties of a notification.

Members
NotifyDelayAfter
Type: int

After a job run starts, the number of minutes to wait before sending a job run delay notification.

NullCheckBoxList

Description

Represents whether certain values are recognized as null values for removal.

Members
IsEmpty
Type: boolean

Specifies that an empty string is considered as a null value.

IsNegOne
Type: boolean

Specifies that an integer value of -1 is considered as a null value.

IsNullString
Type: boolean

Specifies that a value spelling out the word 'null' is considered as a null value.

NullValueField

Description

Represents a custom null value, such as zeros or another value used as a null placeholder unique to the dataset.

Members
Datatype
Required: Yes
Type: Datatype structure

The datatype of the value.

Value
Required: Yes
Type: string

The value of the null placeholder.

OpenTableFormatInput

Description

A structure representing an open format table.

Members
IcebergInput
Type: IcebergInput structure

Specifies an IcebergInput structure that defines an Apache Iceberg metadata table.

OperationTimeoutException

Description

The operation timed out.

Members
Message
Type: string

A message describing the problem.

Option

Description

Specifies an option value.

Members
Description
Type: string

Specifies the description of the option.

Label
Type: string

Specifies the label of the option.

Value
Type: string

Specifies the value of the option.

OracleSQLCatalogSource

Description

Specifies an Oracle data source in the Glue Data Catalog.

Members
Database
Required: Yes
Type: string

The name of the database to read from.

Name
Required: Yes
Type: string

The name of the data source.

Table
Required: Yes
Type: string

The name of the table in the database to read from.

OracleSQLCatalogTarget

Description

Specifies a target that uses Oracle SQL.

Members
Database
Required: Yes
Type: string

The name of the database to write to.

Inputs
Required: Yes
Type: Array of strings

The nodes that are inputs to the data target.

Name
Required: Yes
Type: string

The name of the data target.

Table
Required: Yes
Type: string

The name of the table in the database to write to.

Order

Description

Specifies the sort order of a sorted column.

Members
Column
Required: Yes
Type: string

The name of the column.

SortOrder
Required: Yes
Type: int

Indicates that the column is sorted in ascending order (== 1), or in descending order (== 0).
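
Because SortOrder is an integer flag rather than a keyword, naming the two values can make calling code easier to read. This is a sketch: the constant names and the orderBy helper are hypothetical, and the column names are placeholders.

```php
<?php
// Hypothetical names for the two documented SortOrder values.
define('GLUE_SORT_DESCENDING', 0);
define('GLUE_SORT_ASCENDING', 1);

// Build one Order structure for the given column.
function orderBy(string $column, bool $ascending = true): array
{
    return [
        'Column'    => $column,
        'SortOrder' => $ascending ? GLUE_SORT_ASCENDING : GLUE_SORT_DESCENDING,
    ];
}

// Placeholder columns: ascending by event_date, then descending by score.
$sortColumns = [orderBy('event_date'), orderBy('score', false)];
```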

OtherMetadataValueListItem

Description

A structure containing other metadata for a schema version belonging to the same metadata key.

Members
CreatedTime
Type: string

The time at which the entry was created.

MetadataValue
Type: string

The metadata key’s corresponding value for the other metadata belonging to the same metadata key.

PIIDetection

Description

Specifies a transform that identifies, removes or masks PII data.

Members
EntityTypesToDetect
Required: Yes
Type: Array of strings

Indicates the types of entities the PIIDetection transform will identify as PII data.

PII type entities include: PERSON_NAME, DATE, USA_SNN, EMAIL, USA_ITIN, USA_PASSPORT_NUMBER, PHONE_NUMBER, BANK_ACCOUNT, IP_ADDRESS, MAC_ADDRESS, USA_CPT_CODE, USA_HCPCS_CODE, USA_NATIONAL_DRUG_CODE, USA_MEDICARE_BENEFICIARY_IDENTIFIER, USA_HEALTH_INSURANCE_CLAIM_NUMBER, CREDIT_CARD, USA_NATIONAL_PROVIDER_IDENTIFIER, USA_DEA_NUMBER, USA_DRIVING_LICENSE

Inputs
Required: Yes
Type: Array of strings

The node ID inputs to the transform.

MaskValue
Type: string

Indicates the value that will replace the detected entity.

Name
Required: Yes
Type: string

The name of the transform node.

OutputColumnName
Type: string

Indicates the output column name that will contain any entity type detected in that row.

PiiType
Required: Yes
Type: string

Indicates the type of PIIDetection transform.

SampleFraction
Type: double

Indicates the fraction of the data to sample when scanning for PII entities.

ThresholdFraction
Type: double

Indicates the fraction of the data that must be met in order for a column to be identified as PII data.

Partition

Description

Represents a slice of table data.

Members
CatalogId
Type: string

The ID of the Data Catalog in which the partition resides.

CreationTime
Type: timestamp (string|DateTime or anything parsable by strtotime)

The time at which the partition was created.

DatabaseName
Type: string

The name of the catalog database in which to create the partition.

LastAccessTime
Type: timestamp (string|DateTime or anything parsable by strtotime)

The last time at which the partition was accessed.

LastAnalyzedTime
Type: timestamp (string|DateTime or anything parsable by strtotime)

The last time at which column statistics were computed for this partition.

Parameters
Type: Associative array of custom strings keys (KeyString) to strings

These key-value pairs define partition parameters.

StorageDescriptor
Type: StorageDescriptor structure

Provides information about the physical location where the partition is stored.

TableName
Type: string

The name of the database table in which to create the partition.

Values
Type: Array of strings

The values of the partition.

PartitionError

Description

Contains information about a partition error.

Members
ErrorDetail
Type: ErrorDetail structure

The details about the partition error.

PartitionValues
Type: Array of strings

The values that define the partition.

PartitionIndex

Description

A structure for a partition index.

Members
IndexName
Required: Yes
Type: string

The name of the partition index.

Keys
Required: Yes
Type: Array of strings

The keys for the partition index.

PartitionIndexDescriptor

Description

A descriptor for a partition index in a table.

Members
BackfillErrors
Type: Array of BackfillError structures

A list of errors that can occur when registering partition indexes for an existing table.

IndexName
Required: Yes
Type: string

The name of the partition index.

IndexStatus
Required: Yes
Type: string

The status of the partition index.

The possible statuses are:

  • CREATING: The index is being created. When an index is in a CREATING state, the index or its table cannot be deleted.

  • ACTIVE: The index creation succeeds.

  • FAILED: The index creation fails.

  • DELETING: The index is deleted from the list of indexes.

Keys
Required: Yes
Type: Array of KeySchemaElement structures

A list of one or more keys, as KeySchemaElement structures, for the partition index.

PartitionInput

Description

The structure used to create and update a partition.

Members
LastAccessTime
Type: timestamp (string|DateTime or anything parsable by strtotime)

The last time at which the partition was accessed.

LastAnalyzedTime
Type: timestamp (string|DateTime or anything parsable by strtotime)

The last time at which column statistics were computed for this partition.

Parameters
Type: Associative array of custom strings keys (KeyString) to strings

These key-value pairs define partition parameters.

StorageDescriptor
Type: StorageDescriptor structure

Provides information about the physical location where the partition is stored.

Values
Type: Array of strings

The values of the partition. Although this parameter is not required by the SDK, you must specify this parameter for a valid input.

The values for the keys for the new partition must be passed as an array of String objects that must be ordered in the same order as the partition keys appearing in the Amazon S3 prefix. Otherwise Glue will add the values to the wrong keys.
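
Since the order of Values carries the meaning, it can help to derive the list from the partition keys instead of writing it by hand. The sketch below assumes you already know the table's partition key order; orderedPartitionValues is a hypothetical helper and the key names are placeholders.

```php
<?php
// Hypothetical helper: arrange partition values in the order of the table's
// partition keys so that each value lines up with the intended key.
function orderedPartitionValues(array $partitionKeys, array $valuesByKey): array
{
    $values = [];
    foreach ($partitionKeys as $key) {
        if (!array_key_exists($key, $valuesByKey)) {
            throw new InvalidArgumentException("Missing value for partition key '$key'.");
        }
        $values[] = (string) $valuesByKey[$key];
    }
    return $values;
}

// The associative input may be in any order; the key list fixes the output order.
$partitionInput = [
    'Values' => orderedPartitionValues(
        ['year', 'month', 'day'],
        ['day' => '04', 'year' => '2023', 'month' => '05']
    ),
];
```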

PartitionValueList

Description

Contains a list of values defining partitions.

Members
Values
Required: Yes
Type: Array of strings

The list of values.

PermissionTypeMismatchException

Description

The permission type used in the request does not match the permissions defined on the target table.

Members
Message
Type: string

There is a mismatch between the SupportedPermissionType used in the query request and the permissions defined on the target table.

PhysicalConnectionRequirements

Description

Specifies the physical requirements for a connection.

Members
AvailabilityZone
Type: string

The connection's Availability Zone. This field is redundant because the specified subnet implies the Availability Zone to be used. Currently the field must be populated, but it will be deprecated in the future.

SecurityGroupIdList
Type: Array of strings

The security group ID list used by the connection.

SubnetId
Type: string

The subnet ID used by the connection.

PostgreSQLCatalogSource

Description

Specifies a PostgreSQL data source in the Glue Data Catalog.

Members
Database
Required: Yes
Type: string

The name of the database to read from.

Name
Required: Yes
Type: string

The name of the data source.

Table
Required: Yes
Type: string

The name of the table in the database to read from.

PostgreSQLCatalogTarget

Description

Specifies a target that uses PostgreSQL.

Members
Database
Required: Yes
Type: string

The name of the database to write to.

Inputs
Required: Yes
Type: Array of strings

The nodes that are inputs to the data target.

Name
Required: Yes
Type: string

The name of the data target.

Table
Required: Yes
Type: string

The name of the table in the database to write to.

Predecessor

Description

A job run that was used in the predicate of a conditional trigger that triggered this job run.

Members
JobName
Type: string

The name of the job definition used by the predecessor job run.

RunId
Type: string

The job-run ID of the predecessor job run.

Predicate

Description

Defines the predicate of the trigger, which determines when it fires.

Members
Conditions
Type: Array of Condition structures

A list of the conditions that determine when the trigger will fire.

Logical
Type: string

An optional field if only one condition is listed. If multiple conditions are listed, then this field is required.
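
The rule for Logical can be expressed as a small pre-flight check. This is a sketch under assumptions: predicateHasRequiredLogical is a hypothetical helper, the Condition entries in the usage note are reduced to a single illustrative key, and the value 'AND' is an assumed example of a Logical value rather than something stated in this section.

```php
<?php
// Hypothetical check: Logical may be omitted for a single condition but is
// required as soon as more than one condition is listed.
function predicateHasRequiredLogical(array $predicate): bool
{
    $conditionCount = count($predicate['Conditions'] ?? []);
    return $conditionCount <= 1 || !empty($predicate['Logical']);
}
```

With two conditions and no Logical, such as ['Conditions' => [['JobName' => 'job-a'], ['JobName' => 'job-b']]], the check returns false; adding 'Logical' => 'AND' makes it return true.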

PrincipalPermissions

Description

Permissions granted to a principal.

Members
Permissions
Type: Array of strings

The permissions that are granted to the principal.

Principal
Type: DataLakePrincipal structure

The principal who is granted permissions.

PropertyPredicate

Description

Defines a property predicate.

Members
Comparator
Type: string

The comparator used to compare this property to others.

Key
Type: string

The key of the property.

Value
Type: string

The value of the property.

QuerySessionContext

Description

A structure used as a protocol between query engines and Lake Formation or Glue. Contains both a Lake Formation generated authorization identifier and information from the request's authorization context.

Members
AdditionalContext
Type: Associative array of custom strings keys (ContextKey) to strings

An opaque string-string map passed by the query engine.

ClusterId
Type: string

An identifier string for the consumer cluster.

QueryAuthorizationId
Type: string

A cryptographically generated query identifier generated by Glue or Lake Formation.

QueryId
Type: string

A unique identifier generated by the query engine for the query.

QueryStartTime
Type: timestamp (string|DateTime or anything parsable by strtotime)

A timestamp provided by the query engine for when the query started.

Recipe

Description

A Glue Studio node that uses a Glue DataBrew recipe in Glue jobs.

Members
Inputs
Required: Yes
Type: Array of strings

The nodes that are inputs to the recipe node, identified by id.

Name
Required: Yes
Type: string

The name of the Glue Studio node.

RecipeReference
Required: Yes
Type: RecipeReference structure

A reference to the DataBrew recipe used by the node.

RecipeReference

Description

A reference to a Glue DataBrew recipe.

Members
RecipeArn
Required: Yes
Type: string

The ARN of the DataBrew recipe.

RecipeVersion
Required: Yes
Type: string

The RecipeVersion of the DataBrew recipe.

RecrawlPolicy

Description

When crawling an Amazon S3 data source after the first crawl is complete, specifies whether to crawl the entire dataset again or to crawl only folders that were added since the last crawler run. For more information, see Incremental Crawls in Glue in the developer guide.

Members
RecrawlBehavior
Type: string

Specifies whether to crawl the entire dataset again or to crawl only folders that were added since the last crawler run.

A value of CRAWL_EVERYTHING specifies crawling the entire dataset again.

A value of CRAWL_NEW_FOLDERS_ONLY specifies crawling only folders that were added since the last crawler run.

A value of CRAWL_EVENT_MODE specifies crawling only the changes identified by Amazon S3 events.
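
A short sketch of building this structure while guarding against misspelled values follows. The constant RECRAWL_BEHAVIORS and the helper recrawlPolicy are hypothetical names; the three strings are the values listed above.

```php
<?php
// The three RecrawlBehavior values listed above.
define('RECRAWL_BEHAVIORS', ['CRAWL_EVERYTHING', 'CRAWL_NEW_FOLDERS_ONLY', 'CRAWL_EVENT_MODE']);

// Build a RecrawlPolicy structure, rejecting any value outside the list.
function recrawlPolicy(string $behavior): array
{
    if (!in_array($behavior, RECRAWL_BEHAVIORS, true)) {
        throw new InvalidArgumentException("Unknown RecrawlBehavior '$behavior'.");
    }
    return ['RecrawlBehavior' => $behavior];
}

$policy = recrawlPolicy('CRAWL_NEW_FOLDERS_ONLY');
```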

RedshiftSource

Description

Specifies an Amazon Redshift data store.

Members
Database
Required: Yes
Type: string

The database to read from.

Name
Required: Yes
Type: string

The name of the Amazon Redshift data store.

RedshiftTmpDir
Type: string

The Amazon S3 path where temporary data can be staged when copying out of the database.

Table
Required: Yes
Type: string

The database table to read from.

TmpDirIAMRole
Type: string

The IAM role with permissions.

RedshiftTarget

Description

Specifies a target that uses Amazon Redshift.

Members
Database
Required: Yes
Type: string

The name of the database to write to.

Inputs
Required: Yes
Type: Array of strings

The nodes that are inputs to the data target.

Name
Required: Yes
Type: string

The name of the data target.

RedshiftTmpDir
Type: string

The Amazon S3 path where temporary data can be staged when copying out of the database.

Table
Required: Yes
Type: string

The name of the table in the database to write to.

TmpDirIAMRole
Type: string

The IAM role with permissions.

UpsertRedshiftOptions
Type: UpsertRedshiftTargetOptions structure

The set of options to configure an upsert operation when writing to a Redshift target.

RegistryId

Description

A wrapper structure that may contain the registry name and Amazon Resource Name (ARN).

Members
RegistryArn
Type: string

ARN of the registry to be updated. One of RegistryArn or RegistryName has to be provided.

RegistryName
Type: string

Name of the registry. Used only for lookup. One of RegistryArn or RegistryName has to be provided.
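
The "one of RegistryArn or RegistryName" rule can be sketched as a small check before the array is sent. The helper buildRegistryId and the registry name are hypothetical placeholders, not SDK names.

```php
<?php
// Hypothetical helper (not an SDK function): builds a RegistryId array and
// enforces the documented rule that RegistryArn or RegistryName is provided.
function buildRegistryId(?string $registryArn = null, ?string $registryName = null): array
{
    if ($registryArn === null && $registryName === null) {
        throw new InvalidArgumentException('Provide RegistryArn or RegistryName.');
    }
    // Drop whichever member was not supplied.
    return array_filter(
        ['RegistryArn' => $registryArn, 'RegistryName' => $registryName],
        fn ($value) => $value !== null
    );
}

$registryId = buildRegistryId(null, 'example-registry'); // placeholder name
```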

RegistryListItem

Description

A structure containing the details for a registry.

Members
CreatedTime
Type: string

The date the registry was created.

Description
Type: string

A description of the registry.

RegistryArn
Type: string

The Amazon Resource Name (ARN) of the registry.

RegistryName
Type: string

The name of the registry.

Status
Type: string

The status of the registry.

UpdatedTime
Type: string

The date the registry was updated.

RelationalCatalogSource

Description

Specifies a relational database data source in the Glue Data Catalog.

Members
Database
Required: Yes
Type: string

The name of the database to read from.

Name
Required: Yes
Type: string

The name of the data source.

Table
Required: Yes
Type: string

The name of the table in the database to read from.

RenameField

Description

Specifies a transform that renames a single data property key.

Members
Inputs
Required: Yes
Type: Array of strings

The data inputs identified by their node names.

Name
Required: Yes
Type: string

The name of the transform node.

SourcePath
Required: Yes
Type: Array of strings

A JSON path to a variable in the data structure for the source data.

TargetPath
Required: Yes
Type: Array of strings

A JSON path to a variable in the data structure for the target data.
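
A minimal sketch of a RenameField array with all four required members follows. Every value is a placeholder: the node names and the column names are hypothetical, and the single-element paths assume a top-level column.

```php
<?php
// All values are placeholders chosen for illustration only.
$renameField = [
    'Name'       => 'RenameZipCode',     // hypothetical transform node name
    'Inputs'     => ['SourceNode'],      // hypothetical upstream node
    'SourcePath' => ['zip'],             // assumed: a top-level column named "zip"
    'TargetPath' => ['postal_code'],     // assumed new name for that column
];
```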

ResourceNotReadyException

Description

A resource was not ready for a transaction.

Members
Message
Type: string

A message describing the problem.

ResourceNumberLimitExceededException

Description

A resource numerical limit was exceeded.

Members
Message
Type: string

A message describing the problem.

ResourceUri

Description

The URIs for function resources.

Members
ResourceType
Type: string

The type of the resource.

Uri
Type: string

The URI for accessing the resource.

RunMetrics

Description

Metrics for the optimizer run.

Members
JobDurationInHour
Type: string

The duration of the job in hours.

NumberOfBytesCompacted
Type: string

The number of bytes removed by the compaction job run.

NumberOfDpus
Type: string

The number of DPU hours consumed by the job.

NumberOfFilesCompacted
Type: string

The number of files removed by the compaction job run.

S3CatalogDeltaSource

Description

Specifies a Delta Lake data source that is registered in the Glue Data Catalog. The data source must be stored in Amazon S3.

Members
AdditionalDeltaOptions
Type: Associative array of custom strings keys (EnclosedInStringProperty) to strings

Specifies additional connection options.

Database
Required: Yes
Type: string

The name of the database to read from.

Name
Required: Yes
Type: string

The name of the Delta Lake data source.

OutputSchemas
Type: Array of GlueSchema structures

Specifies the data schema for the Delta Lake source.

Table
Required: Yes
Type: string

The name of the table in the database to read from.

S3CatalogHudiSource

Description

Specifies a Hudi data source that is registered in the Glue Data Catalog. The Hudi data source must be stored in Amazon S3.

Members
AdditionalHudiOptions
Type: Associative array of custom strings keys (EnclosedInStringProperty) to strings

Specifies additional connection options.

Database
Required: Yes
Type: string

The name of the database to read from.

Name
Required: Yes
Type: string

The name of the Hudi data source.

OutputSchemas
Type: Array of GlueSchema structures

Specifies the data schema for the Hudi source.

Table
Required: Yes
Type: string

The name of the table in the database to read from.

S3CatalogSource

Description

Specifies an Amazon S3 data store in the Glue Data Catalog.

Members
AdditionalOptions
Type: S3SourceAdditionalOptions structure

Specifies additional connection options.

Database
Required: Yes
Type: string

The database to read from.

Name
Required: Yes
Type: string

The name of the data store.

PartitionPredicate
Type: string

Partitions satisfying this predicate are deleted. Files within the retention period in these partitions are not deleted. Set to "" – empty by default.

Table
Required: Yes
Type: string

The database table to read from.

S3CatalogTarget

Description

Specifies a data target that writes to Amazon S3 using the Glue Data Catalog.

Members
Database
Required: Yes
Type: string

The name of the database to write to.

Inputs
Required: Yes
Type: Array of strings

The nodes that are inputs to the data target.

Name
Required: Yes
Type: string

The name of the data target.

PartitionKeys
Type: Array of arrays of strings

Specifies native partitioning using a sequence of keys.

SchemaChangePolicy
Type: CatalogSchemaChangePolicy structure

A policy that specifies update behavior for the crawler.

Table
Required: Yes
Type: string

The name of the table in the database to write to.

S3CsvSource

Description

Specifies a comma-separated value (CSV) data store stored in Amazon S3.

Members
AdditionalOptions
Type: S3DirectSourceAdditionalOptions structure

Specifies additional connection options.

CompressionType
Type: string

Specifies how the data is compressed. This is generally not necessary if the data has a standard file extension. Possible values are "gzip" and "bzip".

Escaper
Type: string

Specifies a character to use for escaping. This option is used only when reading CSV files. The default value is none. If enabled, the character which immediately follows is used as-is, except for a small set of well-known escapes (\n, \r, \t, and \0).

Exclusions
Type: Array of strings

A string containing a JSON list of Unix-style glob patterns to exclude. For example, "[\"**.pdf\"]" excludes all PDF files.

GroupFiles
Type: string

Grouping files is turned on by default when the input contains more than 50,000 files. To turn on grouping with fewer than 50,000 files, set this parameter to "inPartition". To disable grouping when there are more than 50,000 files, set this parameter to "none".

GroupSize
Type: string

The target group size in bytes. The default is computed based on the input data size and the size of your cluster. When there are fewer than 50,000 input files, "groupFiles" must be set to "inPartition" for this to take effect.

MaxBand
Type: int

This option controls the duration in milliseconds after which the Amazon S3 listing is likely to be consistent. Files with modification timestamps falling within the last maxBand milliseconds are tracked specially when using JobBookmarks to account for Amazon S3 eventual consistency. Most users don't need to set this option. The default is 900000 milliseconds, or 15 minutes.

MaxFilesInBand
Type: int

This option specifies the maximum number of files to save from the last maxBand milliseconds. If this number is exceeded, extra files are skipped and only processed in the next job run.

Multiline
Type: boolean

A Boolean value that specifies whether a single record can span multiple lines. This can occur when a field contains a quoted new-line character. You must set this option to True if any record spans multiple lines. The default value is False, which allows for more aggressive file-splitting during parsing.

Name
Required: Yes
Type: string

The name of the data store.

OptimizePerformance
Type: boolean

A Boolean value that specifies whether to use the advanced SIMD CSV reader along with Apache Arrow based columnar memory formats. Only available in Glue version 3.0.

OutputSchemas
Type: Array of GlueSchema structures

Specifies the data schema for the S3 CSV source.

Paths
Required: Yes
Type: Array of strings

A list of the Amazon S3 paths to read from.

QuoteChar
Required: Yes
Type: string

Specifies the character to use for quoting. The default is a double quote: '"'. Set this to -1 to turn off quoting entirely.

Recurse
Type: boolean

If set to true, recursively reads files in all subdirectories under the specified paths.

Separator
Required: Yes
Type: string

Specifies the delimiter character. The default is a comma: ",", but any other character can be specified.

SkipFirst
Type: boolean

A Boolean value that specifies whether to skip the first data line. The default value is False.

WithHeader
Type: boolean

A Boolean value that specifies whether to treat the first line as a header. The default value is False.

WriteHeader
Type: boolean

A Boolean value that specifies whether to write the header to output. The default value is True.
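
The GroupFiles rules described for this structure (on by default above 50,000 input files, forced on with "inPartition", forced off with "none") can be sketched as a small predicate. The function isGroupingActive is hypothetical and only restates the description; it does not call the service.

```php
<?php
// Hypothetical helper (not an SDK function): reports whether file grouping
// would be active, following the GroupFiles description.
function isGroupingActive(int $fileCount, ?string $groupFiles = null): bool
{
    if ($groupFiles === 'inPartition') {
        return true;   // explicitly turned on
    }
    if ($groupFiles === 'none') {
        return false;  // explicitly turned off
    }
    return $fileCount > 50000; // default: on only above 50,000 input files
}
```

Under the same description, a GroupSize value would only take effect in the cases where this predicate returns true.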

S3DeltaCatalogTarget

Description

Specifies a target that writes to a Delta Lake data source in the Glue Data Catalog.

Members
AdditionalOptions
Type: Associative array of custom strings keys (EnclosedInStringProperty) to strings

Specifies additional connection options for the connector.

Database
Required: Yes
Type: string

The name of the database to write to.

Inputs
Required: Yes
Type: Array of strings

The nodes that are inputs to the data target.

Name
Required: Yes
Type: string

The name of the data target.

PartitionKeys
Type: Array of arrays of strings

Specifies native partitioning using a sequence of keys.

SchemaChangePolicy
Type: CatalogSchemaChangePolicy structure

A policy that specifies update behavior for the crawler.

Table
Required: Yes
Type: string

The name of the table in the database to write to.

S3DeltaDirectTarget

Description

Specifies a target that writes to a Delta Lake data source in Amazon S3.

Members
AdditionalOptions
Type: Associative array of custom strings keys (EnclosedInStringProperty) to strings

Specifies additional connection options for the connector.

Compression
Required: Yes
Type: string

Specifies how the data is compressed. This is generally not necessary if the data has a standard file extension. Possible values are "gzip" and "bzip".

Format
Required: Yes
Type: string

Specifies the data output format for the target.

Inputs
Required: Yes
Type: Array of strings

The nodes that are inputs to the data target.

Name
Required: Yes
Type: string

The name of the data target.

PartitionKeys
Type: Array of arrays of strings

Specifies native partitioning using a sequence of keys.

Path
Required: Yes
Type: string

The Amazon S3 path of your Delta Lake data source to write to.

SchemaChangePolicy
Type: DirectSchemaChangePolicy structure

A policy that specifies update behavior for the crawler.

S3DeltaSource

Description

Specifies a Delta Lake data source stored in Amazon S3.

Members
AdditionalDeltaOptions
Type: Associative array of custom strings keys (EnclosedInStringProperty) to strings

Specifies additional connection options.

AdditionalOptions
Type: S3DirectSourceAdditionalOptions structure

Specifies additional options for the connector.

Name
Required: Yes
Type: string

The name of the Delta Lake source.

OutputSchemas
Type: Array of GlueSchema structures

Specifies the data schema for the Delta Lake source.

Paths
Required: Yes
Type: Array of strings

A list of the Amazon S3 paths to read from.

S3DirectSourceAdditionalOptions

Description

Specifies additional connection options for the Amazon S3 data store.

Members
BoundedFiles
Type: long (int|float)

Sets the upper limit for the target number of files that will be processed.

BoundedSize
Type: long (int|float)

Sets the upper limit for the target size of the dataset in bytes that will be processed.

EnableSamplePath
Type: boolean

Sets option to enable a sample path.

SamplePath
Type: string

If enabled, specifies the sample path.

S3DirectTarget

Description

Specifies a data target that writes to Amazon S3.

Members
Compression
Type: string

Specifies how the data is compressed. This is generally not necessary if the data has a standard file extension. Possible values are "gzip" and "bzip".

Format
Required: Yes
Type: string

Specifies the data output format for the target.

Inputs
Required: Yes
Type: Array of strings

The nodes that are inputs to the data target.

Name
Required: Yes
Type: string

The name of the data target.

PartitionKeys
Type: Array of arrays of strings

Specifies native partitioning using a sequence of keys.

Path
Required: Yes
Type: string

A single Amazon S3 path to write to.

SchemaChangePolicy
Type: DirectSchemaChangePolicy structure

A policy that specifies update behavior for the crawler.
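
As a sketch of how the "Required: Yes" markers for S3DirectTarget might be checked before a node is submitted, the following plain-PHP helper reports missing members. The helper name, the node names, the bucket, and the format value are all illustrative assumptions.

```php
<?php
// Members marked "Required: Yes" for S3DirectTarget in this listing.
const S3_DIRECT_TARGET_REQUIRED = ['Format', 'Inputs', 'Name', 'Path'];

// Hypothetical helper (not an SDK function): returns the required members
// that are missing from $node.
function missingS3DirectTargetMembers(array $node): array
{
    return array_values(array_diff(S3_DIRECT_TARGET_REQUIRED, array_keys($node)));
}

// Placeholder values; bucket, node names, and format value are illustrative.
$target = [
    'Name'   => 'WriteToS3',
    'Inputs' => ['TransformNode'],
    'Path'   => 's3://example-bucket/output/',
    'Format' => 'parquet',
];

$missing = missingS3DirectTargetMembers($target); // empty when all are present
```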

S3Encryption

Description

Specifies how Amazon Simple Storage Service (Amazon S3) data should be encrypted.

Members
KmsKeyArn
Type: string

The Amazon Resource Name (ARN) of the KMS key to be used to encrypt the data.

S3EncryptionMode
Type: string

The encryption mode to use for Amazon S3 data.

S3GlueParquetTarget

Description

Specifies a data target that writes to Amazon S3 in Apache Parquet columnar storage.

Members
Compression
Type: string

Specifies how the data is compressed. This is generally not necessary if the data has a standard file extension. Possible values are "gzip" and "bzip".

Inputs
Required: Yes
Type: Array of strings

The nodes that are inputs to the data target.

Name
Required: Yes
Type: string

The name of the data target.

PartitionKeys
Type: Array of arrays of strings

Specifies native partitioning using a sequence of keys.

Path
Required: Yes
Type: string

A single Amazon S3 path to write to.

SchemaChangePolicy
Type: DirectSchemaChangePolicy structure

A policy that specifies update behavior for the crawler.

S3HudiCatalogTarget

Description

Specifies a target that writes to a Hudi data source in the Glue Data Catalog.

Members
AdditionalOptions
Required: Yes
Type: Associative array of custom strings keys (EnclosedInStringProperty) to strings

Specifies additional connection options for the connector.

Database
Required: Yes
Type: string

The name of the database to write to.

Inputs
Required: Yes
Type: Array of strings

The nodes that are inputs to the data target.

Name
Required: Yes
Type: string

The name of the data target.

PartitionKeys
Type: Array of arrays of strings

Specifies native partitioning using a sequence of keys.

SchemaChangePolicy
Type: CatalogSchemaChangePolicy structure

A policy that specifies update behavior for the crawler.

Table
Required: Yes
Type: string

The name of the table in the database to write to.

S3HudiDirectTarget

Description

Specifies a target that writes to a Hudi data source in Amazon S3.

Members
AdditionalOptions
Required: Yes
Type: Associative array of custom strings keys (EnclosedInStringProperty) to strings

Specifies additional connection options for the connector.

Compression
Required: Yes
Type: string

Specifies how the data is compressed. This is generally not necessary if the data has a standard file extension. Possible values are "gzip" and "bzip".

Format
Required: Yes
Type: string

Specifies the data output format for the target.

Inputs
Required: Yes
Type: Array of strings

The nodes that are inputs to the data target.

Name
Required: Yes
Type: string

The name of the data target.

PartitionKeys
Type: Array of arrays of strings

Specifies native partitioning using a sequence of keys.

Path
Required: Yes
Type: string

The Amazon S3 path of your Hudi data source to write to.

SchemaChangePolicy
Type: DirectSchemaChangePolicy structure

A policy that specifies update behavior for the crawler.

S3HudiSource

Description

Specifies a Hudi data source stored in Amazon S3.

Members
AdditionalHudiOptions
Type: Associative array of custom strings keys (EnclosedInStringProperty) to strings

Specifies additional connection options.

AdditionalOptions
Type: S3DirectSourceAdditionalOptions structure

Specifies additional options for the connector.

Name
Required: Yes
Type: string

The name of the Hudi source.

OutputSchemas
Type: Array of GlueSchema structures

Specifies the data schema for the Hudi source.

Paths
Required: Yes
Type: Array of strings

A list of the Amazon S3 paths to read from.

S3JsonSource

Description

Specifies a JSON data store stored in Amazon S3.

Members
AdditionalOptions
Type: S3DirectSourceAdditionalOptions structure

Specifies additional connection options.

CompressionType
Type: string

Specifies how the data is compressed. This is generally not necessary if the data has a standard file extension. Possible values are "gzip" and "bzip".

Exclusions
Type: Array of strings

A string containing a JSON list of Unix-style glob patterns to exclude. For example, "[\"**.pdf\"]" excludes all PDF files.

GroupFiles
Type: string

Grouping files is turned on by default when the input contains more than 50,000 files. To turn on grouping with fewer than 50,000 files, set this parameter to "inPartition". To disable grouping when there are more than 50,000 files, set this parameter to "none".

GroupSize
Type: string

The target group size in bytes. The default is computed based on the input data size and the size of your cluster. When there are fewer than 50,000 input files, "groupFiles" must be set to "inPartition" for this to take effect.

JsonPath
Type: string

A JsonPath string defining the JSON data.

MaxBand
Type: int

This option controls the duration in milliseconds after which the Amazon S3 listing is likely to be consistent. Files with modification timestamps falling within the last maxBand milliseconds are tracked specially when using JobBookmarks to account for Amazon S3 eventual consistency. Most users don't need to set this option. The default is 900000 milliseconds, or 15 minutes.

MaxFilesInBand
Type: int

This option specifies the maximum number of files to save from the last maxBand milliseconds. If this number is exceeded, extra files are skipped and only processed in the next job run.

Multiline
Type: boolean

A Boolean value that specifies whether a single record can span multiple lines. This can occur when a field contains a quoted new-line character. You must set this option to True if any record spans multiple lines. The default value is False, which allows for more aggressive file-splitting during parsing.

Name
Required: Yes
Type: string

The name of the data store.

OutputSchemas
Type: Array of GlueSchema structures

Specifies the data schema for the S3 JSON source.

Paths
Required: Yes
Type: Array of strings

A list of the Amazon S3 paths to read from.

Recurse
Type: boolean

If set to true, recursively reads files in all subdirectories under the specified paths.
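
MaxBand is expressed in milliseconds, so a value given in minutes has to be converted first. The helper below is a hypothetical convenience, not an SDK function; it reproduces the documented default of 900000 for 15 minutes.

```php
<?php
// Hypothetical helper (not an SDK function): converts minutes to the
// millisecond value that MaxBand is documented to use.
function maxBandFromMinutes(int $minutes): int
{
    return $minutes * 60 * 1000;
}

$defaultMaxBand = maxBandFromMinutes(15); // 900000, the documented default
```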

S3ParquetSource

Description

Specifies an Apache Parquet data store stored in Amazon S3.

Members
AdditionalOptions
Type: S3DirectSourceAdditionalOptions structure

Specifies additional connection options.

CompressionType
Type: string

Specifies how the data is compressed. This is generally not necessary if the data has a standard file extension. Possible values are "gzip" and "bzip".

Exclusions
Type: Array of strings

A string containing a JSON list of Unix-style glob patterns to exclude. For example, "[\"**.pdf\"]" excludes all PDF files.

GroupFiles
Type: string

Grouping files is turned on by default when the input contains more than 50,000 files. To turn on grouping with fewer than 50,000 files, set this parameter to "inPartition". To disable grouping when there are more than 50,000 files, set this parameter to "none".

GroupSize
Type: string

The target group size in bytes. The default is computed based on the input data size and the size of your cluster. When there are fewer than 50,000 input files, "groupFiles" must be set to "inPartition" for this to take effect.

MaxBand
Type: int

This option controls the duration in milliseconds after which the Amazon S3 listing is likely to be consistent. Files with modification timestamps falling within the last maxBand milliseconds are tracked specially when using JobBookmarks to account for Amazon S3 eventual consistency. Most users don't need to set this option. The default is 900000 milliseconds, or 15 minutes.

MaxFilesInBand
Type: int

This option specifies the maximum number of files to save from the last maxBand milliseconds. If this number is exceeded, extra files are skipped and only processed in the next job run.

Name
Required: Yes
Type: string

The name of the data store.

OutputSchemas
Type: Array of GlueSchema structures

Specifies the data schema for the S3 Parquet source.

Paths
Required: Yes
Type: Array of strings

A list of the Amazon S3 paths to read from.

Recurse
Type: boolean

If set to true, recursively reads files in all subdirectories under the specified paths.

S3SourceAdditionalOptions

Description

Specifies additional connection options for the Amazon S3 data store.

Members
BoundedFiles
Type: long (int|float)

Sets the upper limit for the target number of files that will be processed.

BoundedSize
Type: long (int|float)

Sets the upper limit for the target size of the dataset in bytes that will be processed.

S3Target

Description

Specifies a data store in Amazon Simple Storage Service (Amazon S3).

Members
ConnectionName
Type: string

The name of a connection which allows a job or crawler to access data in Amazon S3 within an Amazon Virtual Private Cloud environment (Amazon VPC).

DlqEventQueueArn
Type: string

A valid Amazon dead-letter SQS ARN. For example, arn:aws:sqs:region:account:deadLetterQueue.

EventQueueArn
Type: string

A valid Amazon SQS ARN. For example, arn:aws:sqs:region:account:sqs.

Exclusions
Type: Array of strings

A list of glob patterns used to exclude from the crawl. For more information, see Catalog Tables with a Crawler.

Path
Type: string

The path to the Amazon S3 target.

SampleSize
Type: int

Sets the number of files in each leaf folder to be crawled when crawling sample files in a dataset. If not set, all the files are crawled. A valid value is an integer between 1 and 249.
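
The SampleSize range (1 through 249) can be checked locally before an S3Target array is used. This is a sketch under the assumption that only Path and SampleSize are set; buildS3Target and the bucket name are hypothetical.

```php
<?php
// Hypothetical helper (not an SDK function): builds an S3Target array and
// checks SampleSize against the documented range of 1 through 249.
function buildS3Target(string $path, ?int $sampleSize = null): array
{
    $target = ['Path' => $path];
    if ($sampleSize !== null) {
        if ($sampleSize < 1 || $sampleSize > 249) {
            throw new InvalidArgumentException('SampleSize must be between 1 and 249.');
        }
        $target['SampleSize'] = $sampleSize;
    }
    return $target; // without SampleSize, all files are crawled
}

$s3Target = buildS3Target('s3://example-bucket/data/', 10); // placeholder bucket
```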

Schedule

Description

A scheduling object using a cron statement to schedule an event.

Members
ScheduleExpression
Type: string

A cron expression used to specify the schedule (see Time-Based Schedules for Jobs and Crawlers). For example, to run something every day at 12:15 UTC, you would specify: cron(15 12 * * ? *).

State
Type: string

The state of the schedule.
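
The daily example in the ScheduleExpression description can be generalized with a small formatter. The function dailyCronExpression is a hypothetical helper; it only covers the "every day at a fixed UTC time" pattern from the example, with the minute field first and the hour field second.

```php
<?php
// Hypothetical helper (not an SDK function): formats a daily schedule in the
// cron(...) form used by the documented example (minute, then hour).
function dailyCronExpression(int $hourUtc, int $minute): string
{
    if ($hourUtc < 0 || $hourUtc > 23 || $minute < 0 || $minute > 59) {
        throw new InvalidArgumentException('Hour must be 0-23 and minute 0-59.');
    }
    return sprintf('cron(%d %d * * ? *)', $minute, $hourUtc);
}

$expression = dailyCronExpression(12, 15); // 'cron(15 12 * * ? *)'
```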

SchedulerNotRunningException

Description

The specified scheduler is not running.

Members
Message
Type: string

A message describing the problem.

SchedulerRunningException

Description

The specified scheduler is already running.

Members
Message
Type: string

A message describing the problem.

SchedulerTransitioningException

Description

The specified scheduler is transitioning.

Members
Message
Type: string

A message describing the problem.

SchemaChangePolicy

Description

A policy that specifies update and deletion behaviors for the crawler.

Members
DeleteBehavior
Type: string

The deletion behavior when the crawler finds a deleted object.

UpdateBehavior
Type: string

The update behavior when the crawler finds a changed schema.

SchemaColumn

Description

A key-value pair representing a column and data type that this transform can run against. The Schema parameter of the MLTransform may contain up to 100 of these structures.

Members
DataType
Type: string

The type of data in the column.

Name
Type: string

The name of the column.

SchemaId

Description

The unique ID of the schema in the Glue schema registry.

Members
RegistryName
Type: string

The name of the schema registry that contains the schema.

SchemaArn
Type: string

The Amazon Resource Name (ARN) of the schema. One of SchemaArn or SchemaName has to be provided.

SchemaName
Type: string

The name of the schema. One of SchemaArn or SchemaName has to be provided.

SchemaListItem

Description

An object that contains minimal details for a schema.

Members
CreatedTime
Type: string

The date and time that a schema was created.

Description
Type: string

A description for the schema.

RegistryName
Type: string

The name of the registry where the schema resides.

SchemaArn
Type: string

The Amazon Resource Name (ARN) for the schema.

SchemaName
Type: string

The name of the schema.

SchemaStatus
Type: string

The status of the schema.

UpdatedTime
Type: string

The date and time that a schema was updated.

SchemaReference

Description

An object that references a schema stored in the Glue Schema Registry.

Members
SchemaId
Type: SchemaId structure

A structure that contains schema identity fields. Either this or the SchemaVersionId has to be provided.

SchemaVersionId
Type: string

The unique ID assigned to a version of the schema. Either this or the SchemaId has to be provided.

SchemaVersionNumber
Type: long (int|float)

The version number of the schema.
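
A SchemaReference needs either a SchemaId structure or a SchemaVersionId. The sketch below encodes that rule in plain PHP; the helper name and the registry and schema names are placeholders, and preferring SchemaVersionId when both are passed is a choice made for this example only.

```php
<?php
// Hypothetical helper (not an SDK function): builds a SchemaReference from
// either a SchemaId array or a SchemaVersionId string. If both are given,
// this sketch keeps SchemaVersionId.
function buildSchemaReference(?array $schemaId = null, ?string $schemaVersionId = null): array
{
    if ($schemaId === null && $schemaVersionId === null) {
        throw new InvalidArgumentException('Provide SchemaId or SchemaVersionId.');
    }
    return $schemaVersionId !== null
        ? ['SchemaVersionId' => $schemaVersionId]
        : ['SchemaId' => $schemaId];
}

// Placeholder names for the registry and schema.
$reference = buildSchemaReference([
    'RegistryName' => 'example-registry',
    'SchemaName'   => 'example-schema',
]);
$reference['SchemaVersionNumber'] = 1; // set separately in this sketch
```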

SchemaVersionErrorItem

Description

An object that contains the error details for an operation on a schema version.

Members
ErrorDetails
Type: ErrorDetails structure

The details of the error for the schema version.

VersionNumber
Type: long (int|float)

The version number of the schema.

SchemaVersionListItem

Description

An object containing the details about a schema version.

Members
CreatedTime
Type: string

The date and time the schema version was created.

SchemaArn
Type: string

The Amazon Resource Name (ARN) of the schema.

SchemaVersionId
Type: string

The unique identifier of the schema version.

Status
Type: string

The status of the schema version.

VersionNumber
Type: long (int|float)

The version number of the schema.

SchemaVersionNumber

Description

A structure containing the schema version information.

Members
LatestVersion
Type: boolean

The latest version available for the schema.

VersionNumber
Type: long (int|float)

The version number of the schema.

SecurityConfiguration

Description

Specifies a security configuration.

Members
CreatedTimeStamp
Type: timestamp (string|DateTime or anything parsable by strtotime)

The time at which this security configuration was created.

EncryptionConfiguration
Type: EncryptionConfiguration structure

The encryption configuration associated with this security configuration.

Name
Type: string

The name of the security configuration.

Segment

Description

Defines a non-overlapping region of a table's partitions, allowing multiple requests to be run in parallel.

Members
SegmentNumber
Required: Yes
Type: int

The zero-based index number of the segment. For example, if the total number of segments is 4, SegmentNumber values range from 0 through 3.

TotalSegments
Required: Yes
Type: int

The total number of segments.
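
To illustrate the zero-based numbering, the following sketch produces one Segment array per parallel request. The function buildSegments is hypothetical; how each array is then passed to an operation is left to the caller.

```php
<?php
// Hypothetical helper (not an SDK function): returns one Segment array for
// each of $totalSegments parallel requests.
function buildSegments(int $totalSegments): array
{
    if ($totalSegments < 1) {
        throw new InvalidArgumentException('TotalSegments must be at least 1.');
    }
    $segments = [];
    for ($i = 0; $i < $totalSegments; $i++) {
        // SegmentNumber is zero-based: 0 through TotalSegments - 1.
        $segments[] = ['SegmentNumber' => $i, 'TotalSegments' => $totalSegments];
    }
    return $segments;
}

$segments = buildSegments(4); // SegmentNumber values 0, 1, 2, 3
```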

SelectFields

Description

Specifies a transform that chooses the data property keys that you want to keep.

Members
Inputs
Required: Yes
Type: Array of strings

The data inputs identified by their node names.

Name
Required: Yes
Type: string

The name of the transform node.

Paths
Required: Yes
Type: Array of arrays of strings

A JSON path to a variable in the data structure.

SelectFromCollection

Description

Specifies a transform that chooses one DynamicFrame from a collection of DynamicFrames. The output is the selected DynamicFrame.

Members
Index
Required: Yes
Type: int

The index for the DynamicFrame to be selected.

Inputs
Required: Yes
Type: Array of strings

The data inputs identified by their node names.

Name
Required: Yes
Type: string

The name of the transform node.

SerDeInfo

Description

Information about a serialization/deserialization program (SerDe) that serves as an extractor and loader.

Members
Name
Type: string

Name of the SerDe.

Parameters
Type: Associative array of custom strings keys (KeyString) to strings

These key-value pairs define initialization parameters for the SerDe.

SerializationLibrary
Type: string

Usually the class that implements the SerDe. An example is org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe.

Session

Description

The period in which a remote Spark runtime environment is running.

Members
Command
Type: SessionCommand structure

The command object. See SessionCommand.

CompletedOn
Type: timestamp (string|DateTime or anything parsable by strtotime)

The date and time that this session is completed.

Connections
Type: ConnectionsList structure

The number of connections used for the session.

CreatedOn
Type: timestamp (string|DateTime or anything parsable by strtotime)

The time and date when the session was created.

DPUSeconds
Type: double

The DPUs consumed by the session (formula: ExecutionTime * MaxCapacity).

DefaultArguments
Type: Associative array of custom strings keys (OrchestrationNameString) to strings

A map of key-value pairs. The maximum is 75 pairs.

Description
Type: string

The description of the session.

ErrorMessage
Type: string

The error message displayed during the session.

ExecutionTime
Type: double

The total time the session ran for.

GlueVersion
Type: string

The Glue version determines the versions of Apache Spark and Python that Glue supports. The GlueVersion must be greater than 2.0.

Id
Type: string

The ID of the session.

IdleTimeout
Type: int

The number of minutes when idle before the session times out.

MaxCapacity
Type: double

The number of Glue data processing units (DPUs) that can be allocated when the job runs. A DPU is a relative measure of processing power that consists of 4 vCPUs of compute capacity and 16 GB memory.

NumberOfWorkers
Type: int

The number of workers of a defined WorkerType to use for the session.

Progress
Type: double

The code execution progress of the session.

Role
Type: string

The name or Amazon Resource Name (ARN) of the IAM role associated with the Session.

SecurityConfiguration
Type: string

The name of the SecurityConfiguration structure to be used with the session.

Status
Type: string

The session status.

WorkerType
Type: string

The type of predefined worker that is allocated when a session runs. Accepts a value of G.1X, G.2X, G.4X, or G.8X for Spark sessions. Accepts the value Z.2X for Ray sessions.
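
The DPUSeconds member is described by the formula ExecutionTime * MaxCapacity. The sketch below applies that formula to sample numbers; the helper name is hypothetical, and treating ExecutionTime as seconds is an assumption drawn from the member name DPUSeconds.

```php
<?php
// Hypothetical helper (not an SDK function): applies the documented formula
// DPUSeconds = ExecutionTime * MaxCapacity.
// Assumption: ExecutionTime is measured in seconds.
function dpuSeconds(float $executionTime, float $maxCapacity): float
{
    return $executionTime * $maxCapacity;
}

$consumed = dpuSeconds(120.0, 5.0); // 600.0
```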

SessionCommand

Description

The SessionCommand that runs the job.

Members
Name
Type: string

Specifies the name of the SessionCommand. Can be 'glueetl' or 'gluestreaming'.

PythonVersion
Type: string

Specifies the Python version. The Python version indicates the version supported for jobs of type Spark.

SkewedInfo

Description

Specifies skewed values in a table. Skewed values are those that occur with very high frequency.

Members
SkewedColumnNames
Type: Array of strings

A list of names of columns that contain skewed values.

SkewedColumnValueLocationMaps
Type: Associative array of custom strings keys (ColumnValuesString) to strings

A mapping of skewed values to the columns that contain them.

SkewedColumnValues
Type: Array of strings

A list of values that appear so frequently as to be considered skewed.

SnowflakeNodeData

Description

Specifies configuration for Snowflake nodes in Glue Studio.

Members
Action
Type: string

Specifies what action to take when writing to a table with preexisting data. Valid values: append, merge, truncate, drop.

AdditionalOptions
Type: Associative array of custom strings keys (EnclosedInStringProperty) to strings

Specifies additional options passed to the Snowflake connector. If options are specified elsewhere in this node, this will take precedence.

AutoPushdown
Type: boolean

Specifies whether automatic query pushdown is enabled. If pushdown is enabled, then when a query is run on Spark, if part of the query can be "pushed down" to the Snowflake server, it is pushed down. This improves performance of some queries.

Connection
Type: Option structure

Specifies a Glue Data Catalog Connection to a Snowflake endpoint.

Database
Type: string

Specifies a Snowflake database for your node to use.

IamRole
Type: Option structure

Not currently used.

MergeAction
Type: string

Specifies a merge action. Valid values: simple, custom. If simple, merge behavior is defined by MergeWhenMatched and MergeWhenNotMatched. If custom, defined by MergeClause.

MergeClause
Type: string

A SQL statement that specifies a custom merge behavior.

MergeWhenMatched
Type: string

Specifies how to resolve records that match preexisting data when merging. Valid values: update, delete.

MergeWhenNotMatched
Type: string

Specifies how to process records that do not match preexisting data when merging. Valid values: insert, none.

PostAction
Type: string

A SQL string run after the Snowflake connector performs its standard actions.

PreAction
Type: string

A SQL string run before the Snowflake connector performs its standard actions.

SampleQuery
Type: string

A SQL string used to retrieve data with the query sourcetype.

Schema
Type: string

Specifies a Snowflake database schema for your node to use.

SelectedColumns
Type: Array of Option structures

Specifies the columns combined to identify a record when detecting matches for merges and upserts. A list of structures with value, label and description keys. Each structure describes a column.

SourceType
Type: string

Specifies how retrieved data is specified. Valid values: "table", "query".

StagingTable
Type: string

The name of a staging table used when performing merge or upsert append actions. Data is written to this table, then moved to table by a generated postaction.

Table
Type: string

Specifies a Snowflake table for your node to use.

TableSchema
Type: Array of Option structures

Manually defines the target schema for the node. A list of structures with value, label and description keys. Each structure defines a column.

TempDir
Type: string

Not currently used.

Upsert
Type: boolean

Used when Action is append. Specifies the resolution behavior when a row already exists. If true, preexisting rows will be updated. If false, those rows will be inserted.
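As a rough illustration of how the merge-related members above fit together, the following sketch assembles a SnowflakeNodeData array for a simple merge. The helper name buildSnowflakeMergeData, the database, schema and table names, and the "_staging" suffix are all hypothetical; only string members described above are used.

```php
<?php
// Hypothetical helper: assembles a SnowflakeNodeData array for a "simple" merge.
// All names passed in are placeholders, and the staging-table suffix is an
// arbitrary choice made for this sketch.
function buildSnowflakeMergeData(string $database, string $schema, string $table): array
{
    return [
        'Database'            => $database,
        'Schema'              => $schema,
        'Table'               => $table,
        'Action'              => 'merge',
        'MergeAction'         => 'simple',  // behavior comes from the two members below
        'MergeWhenMatched'    => 'update',
        'MergeWhenNotMatched' => 'insert',
        'StagingTable'        => $table . '_staging',
    ];
}

$data = buildSnowflakeMergeData('analytics', 'public', 'orders');
```

With MergeAction set to custom, the two MergeWhen members would be replaced by a MergeClause string.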

SnowflakeSource

Description

Specifies a Snowflake data source.

Members
Data
Required: Yes
Type: SnowflakeNodeData structure

Configuration for the Snowflake data source.

Name
Required: Yes
Type: string

The name of the Snowflake data source.

OutputSchemas
Type: Array of GlueSchema structures

Specifies user-defined schemas for your output data.

SnowflakeTarget

Description

Specifies a Snowflake target.

Members
Data
Required: Yes
Type: SnowflakeNodeData structure

Specifies the data of the Snowflake target node.

Inputs
Type: Array of strings

The nodes that are inputs to the data target.

Name
Required: Yes
Type: string

The name of the Snowflake target.

SortCriterion

Description

Specifies a field to sort by and a sort order.

Members
FieldName
Type: string

The name of the field on which to sort.

Sort
Type: string

An ascending or descending sort.

SourceControlDetails

Description

The details for a source control configuration for a job, allowing synchronization of job artifacts to or from a remote repository.

Members
AuthStrategy
Type: string

The type of authentication, which can be an authentication token stored in Amazon Web Services Secrets Manager, or a personal access token.

AuthToken
Type: string

The value of an authorization token.

Branch
Type: string

An optional branch in the remote repository.

Folder
Type: string

An optional folder in the remote repository.

LastCommitId
Type: string

The last commit ID for a commit in the remote repository.

Owner
Type: string

The owner of the remote repository that contains the job artifacts.

Provider
Type: string

The provider for the remote repository.

Repository
Type: string

The name of the remote repository that contains the job artifacts.

SparkConnectorSource

Description

Specifies a connector to an Apache Spark data source.

Members
AdditionalOptions
Type: Associative array of custom strings keys (EnclosedInStringProperty) to strings

Additional connection options for the connector.

ConnectionName
Required: Yes
Type: string

The name of the connection that is associated with the connector.

ConnectionType
Required: Yes
Type: string

The type of connection, such as marketplace.spark or custom.spark, designating a connection to an Apache Spark data store.

ConnectorName
Required: Yes
Type: string

The name of a connector that assists with accessing the data store in Glue Studio.

Name
Required: Yes
Type: string

The name of the data source.

OutputSchemas
Type: Array of GlueSchema structures

Specifies the data schema for the custom Spark source.

SparkConnectorTarget

Description

Specifies a target that uses an Apache Spark connector.

Members
AdditionalOptions
Type: Associative array of custom strings keys (EnclosedInStringProperty) to strings

Additional connection options for the connector.

ConnectionName
Required: Yes
Type: string

The name of a connection for an Apache Spark connector.

ConnectionType
Required: Yes
Type: string

The type of connection, such as marketplace.spark or custom.spark, designating a connection to an Apache Spark data store.

ConnectorName
Required: Yes
Type: string

The name of an Apache Spark connector.

Inputs
Required: Yes
Type: Array of strings

The nodes that are inputs to the data target.

Name
Required: Yes
Type: string

The name of the data target.

OutputSchemas
Type: Array of GlueSchema structures

Specifies the data schema for the custom Spark target.

SparkSQL

Description

Specifies a transform where you enter a SQL query using Spark SQL syntax to transform the data. The output is a single DynamicFrame.

Members
Inputs
Required: Yes
Type: Array of strings

The data inputs identified by their node names. You can associate a table name with each input node to use in the SQL query. The name you choose must meet the Spark SQL naming restrictions.

Name
Required: Yes
Type: string

The name of the transform node.

OutputSchemas
Type: Array of GlueSchema structures

Specifies the data schema for the SparkSQL transform.

SqlAliases
Required: Yes
Type: Array of SqlAlias structures

A list of aliases. An alias allows you to specify what name to use in the SQL for a given input. For example, suppose you have a data source named "MyDataSource". If you specify From as MyDataSource, and Alias as SqlName, then in your SQL you can write:

select * from SqlName

and the query reads its data from MyDataSource.

SqlQuery
Required: Yes
Type: string

A SQL query that must use Spark SQL syntax and return a single data set.
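The members above could be combined along the following lines. This is only a sketch: the node name RenameSource is hypothetical, and the input and alias names reuse the MyDataSource and SqlName example from the SqlAliases description.

```php
<?php
// Sketch of a SparkSQL node array. 'RenameSource' is a made-up node name;
// 'MyDataSource' and 'SqlName' follow the alias example in the text.
$sparkSql = [
    'Name'       => 'RenameSource',
    'Inputs'     => ['MyDataSource'],
    'SqlAliases' => [
        ['From' => 'MyDataSource', 'Alias' => 'SqlName'],
    ],
    'SqlQuery'   => 'select * from SqlName',
];

// Collect the aliases so the query text can be sanity-checked against them.
$aliases = array_column($sparkSql['SqlAliases'], 'Alias');
```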

Spigot

Description

Specifies a transform that writes samples of the data to an Amazon S3 bucket.

Members
Inputs
Required: Yes
Type: Array of strings

The data inputs identified by their node names.

Name
Required: Yes
Type: string

The name of the transform node.

Path
Required: Yes
Type: string

A path in Amazon S3 where the transform writes a subset of records from the dataset to a JSON file.

Prob
Type: double

The probability (a decimal value with a maximum value of 1) of picking any given record. A value of 1 indicates that each row read from the dataset should be included in the sample output.

Topk
Type: int

Specifies a number of records to write starting from the beginning of the dataset.
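A minimal sketch of a Spigot node follows, assuming a hypothetical helper named buildSpigot and placeholder node and bucket names. The upper bound on Prob comes from the description above; the lower bound of 0 is an assumption.

```php
<?php
// Hypothetical helper: builds a Spigot node array and rejects a Prob above 1,
// the maximum stated in the text. The lower bound of 0 is assumed.
function buildSpigot(string $name, array $inputs, string $path, float $prob, int $topk): array
{
    if ($prob < 0.0 || $prob > 1.0) {
        throw new InvalidArgumentException('Prob must be between 0 and 1.');
    }
    return [
        'Name'   => $name,
        'Inputs' => $inputs,
        'Path'   => $path,
        'Prob'   => $prob,
        'Topk'   => $topk,
    ];
}

$spigot = buildSpigot('SampleOrders', ['node-1'], 's3://example-bucket/samples/', 0.1, 10);
```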

SplitFields

Description

Specifies a transform that splits data property keys into two DynamicFrames. The output is a collection of DynamicFrames: one with selected data property keys, and one with the remaining data property keys.

Members
Inputs
Required: Yes
Type: Array of strings

The data inputs identified by their node names.

Name
Required: Yes
Type: string

The name of the transform node.

Paths
Required: Yes
Type: Array of arrays of strings

A JSON path to a variable in the data structure.

SqlAlias

Description

Represents a single entry in the list of values for SqlAliases.

Members
Alias
Required: Yes
Type: string

A temporary name given to a table, or a column in a table.

From
Required: Yes
Type: string

A table, or a column in a table.

StartingEventBatchCondition

Description

The batch condition that started the workflow run. Either the number of events specified by the batch size arrived, in which case the BatchSize member is non-zero, or the batch window expired, in which case the BatchWindow member is non-zero.

Members
BatchSize
Type: int

Number of events in the batch.

BatchWindow
Type: int

Duration of the batch window in seconds.
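The rule in the description above (exactly one of the two members is non-zero) can be sketched as a small check. The helper name describeBatchStart and its return strings are hypothetical.

```php
<?php
// Hypothetical helper: reports which condition started the workflow run,
// based on which member of a StartingEventBatchCondition array is non-zero.
function describeBatchStart(array $condition): string
{
    if (($condition['BatchSize'] ?? 0) > 0) {
        return 'batch size reached';
    }
    if (($condition['BatchWindow'] ?? 0) > 0) {
        return 'batch window expired';
    }
    return 'unknown';
}
```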

Statement

Description

The statement or request for a particular action to occur in a session.

Members
Code
Type: string

The execution code of the statement.

CompletedOn
Type: long (int|float)

The Unix time and date that the job definition was completed.

Id
Type: int

The ID of the statement.

Output
Type: StatementOutput structure

The output in JSON.

Progress
Type: double

The code execution progress.

StartedOn
Type: long (int|float)

The Unix time and date that the job definition was started.

State
Type: string

The state while the request is actioned.

StatementOutput

Description

The code execution output in JSON format.

Members
Data
Type: StatementOutputData structure

The code execution output.

ErrorName
Type: string

The name of the error in the output.

ErrorValue
Type: string

The error value of the output.

ExecutionCount
Type: int

The execution count of the output.

Status
Type: string

The status of the code execution output.

Traceback
Type: Array of strings

The traceback of the output.

StatementOutputData

Description

The code execution output in JSON format.

Members
TextPlain
Type: string

The code execution output in text format.
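The nesting of Statement, StatementOutput and StatementOutputData can be easier to see in code. The sketch below, with a hypothetical helper named summarizeStatement, reduces a Statement array to one display string; it treats the presence of ErrorName as the error signal, which is an assumption rather than documented behavior.

```php
<?php
// Hypothetical helper: turns a Statement array into a single display string.
// An error is assumed whenever Output.ErrorName is present and non-empty.
function summarizeStatement(array $statement): string
{
    $output = $statement['Output'] ?? [];
    if (!empty($output['ErrorName'])) {
        return $output['ErrorName'] . ': ' . ($output['ErrorValue'] ?? '');
    }
    // Otherwise fall back to the plain-text output, if any.
    return $output['Data']['TextPlain'] ?? '';
}
```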

StorageDescriptor

Description

Describes the physical storage of table data.

Members
AdditionalLocations
Type: Array of strings

A list of locations that point to the path where a Delta table is located.

BucketColumns
Type: Array of strings

A list of reducer grouping columns, clustering columns, and bucketing columns in the table.

Columns
Type: Array of Column structures

A list of the Columns in the table.

Compressed
Type: boolean

True if the data in the table is compressed, or False if not.

InputFormat
Type: string

The input format: SequenceFileInputFormat (binary), or TextInputFormat, or a custom format.

Location
Type: string

The physical location of the table. By default, this takes the form of the warehouse location, followed by the database location in the warehouse, followed by the table name.

NumberOfBuckets
Type: int

The number of buckets. Must be specified if the table contains any dimension columns.

OutputFormat
Type: string

The output format: SequenceFileOutputFormat (binary), or IgnoreKeyTextOutputFormat, or a custom format.

Parameters
Type: Associative array of custom strings keys (KeyString) to strings

The user-supplied properties in key-value form.

SchemaReference
Type: SchemaReference structure

An object that references a schema stored in the Glue Schema Registry.

When creating a table, you can pass an empty list of columns for the schema, and instead use a schema reference.

SerdeInfo
Type: SerDeInfo structure

The serialization/deserialization (SerDe) information.

SkewedInfo
Type: SkewedInfo structure

The information about values that appear frequently in a column (skewed values).

SortColumns
Type: Array of Order structures

A list specifying the sort order of each bucket in the table.

StoredAsSubDirectories
Type: boolean

True if the table data is stored in subdirectories, or False if not.

StreamingDataPreviewOptions

Description

Specifies options related to data preview for viewing a sample of your data.

Members
PollingTime
Type: long (int|float)

The polling time in milliseconds.

RecordPollingLimit
Type: long (int|float)

The limit to the number of records polled.

StringColumnStatisticsData

Description

Defines column statistics supported for character sequence data values.

Members
AverageLength
Required: Yes
Type: double

The average string length in the column.

MaximumLength
Required: Yes
Type: long (int|float)

The size of the longest string in the column.

NumberOfDistinctValues
Required: Yes
Type: long (int|float)

The number of distinct values in a column.

NumberOfNulls
Required: Yes
Type: long (int|float)

The number of null values in the column.

SupportedDialect

Description

A structure specifying the dialect and dialect version used by the query engine.

Members
Dialect
Type: string

The dialect of the query engine.

DialectVersion
Type: string

The version of the dialect of the query engine. For example, 3.0.0.

Table

Description

Represents a collection of related data organized in columns and rows.

Members
CatalogId
Type: string

The ID of the Data Catalog in which the table resides.

CreateTime
Type: timestamp (string|DateTime or anything parsable by strtotime)

The time when the table definition was created in the Data Catalog.

CreatedBy
Type: string

The person or entity who created the table.

DatabaseName
Type: string

The name of the database where the table metadata resides. For Hive compatibility, this must be all lowercase.

Description
Type: string

A description of the table.

FederatedTable
Type: FederatedTable structure

A FederatedTable structure that references an entity outside the Glue Data Catalog.

IsMultiDialectView
Type: boolean

Specifies whether the view supports the SQL dialects of one or more different query engines and can therefore be read by those engines.

IsRegisteredWithLakeFormation
Type: boolean

Indicates whether the table has been registered with Lake Formation.

LastAccessTime
Type: timestamp (string|DateTime or anything parsable by strtotime)

The last time that the table was accessed. This is usually taken from HDFS, and might not be reliable.

LastAnalyzedTime
Type: timestamp (string|DateTime or anything parsable by strtotime)

The last time that column statistics were computed for this table.

Name
Required: Yes
Type: string

The table name. For Hive compatibility, this must be entirely lowercase.

Owner
Type: string

The owner of the table.

Parameters
Type: Associative array of custom strings keys (KeyString) to strings

These key-value pairs define properties associated with the table.

PartitionKeys
Type: Array of Column structures

A list of columns by which the table is partitioned. Only primitive types are supported as partition keys.

When you create a table used by Amazon Athena, and you do not specify any PartitionKeys, you must at least set the value of PartitionKeys to an empty list. For example:

"PartitionKeys": []

Retention
Type: int

The retention time for this table.

StorageDescriptor
Type: StorageDescriptor structure

A storage descriptor containing information about the physical storage of this table.

TableType
Type: string

The type of this table. Glue will create tables with the EXTERNAL_TABLE type. Other services, such as Athena, may create tables with additional table types.

Glue related table types:

EXTERNAL_TABLE

Hive compatible attribute - indicates a non-Hive managed table.

GOVERNED

Used by Lake Formation. The Glue Data Catalog understands GOVERNED.

TargetTable
Type: TableIdentifier structure

A TableIdentifier structure that describes a target table for resource linking.

UpdateTime
Type: timestamp (string|DateTime or anything parsable by strtotime)

The last time that the table was updated.

VersionId
Type: string

The ID of the table version.

ViewDefinition
Type: ViewDefinition structure

A structure that contains all the information that defines the view, including the dialect or dialects for the view, and the query.

ViewExpandedText
Type: string

Included for Apache Hive compatibility. Not used in the normal course of Glue operations.

ViewOriginalText
Type: string

Included for Apache Hive compatibility. Not used in the normal course of Glue operations. If the table is a VIRTUAL_VIEW, this contains certain Athena configuration encoded in base64.

TableError

Description

An error record for table operations.

Members
ErrorDetail
Type: ErrorDetail structure

The details about the error.

TableName
Type: string

The name of the table. For Hive compatibility, this must be entirely lowercase.

TableIdentifier

Description

A structure that describes a target table for resource linking.

Members
CatalogId
Type: string

The ID of the Data Catalog in which the table resides.

DatabaseName
Type: string

The name of the catalog database that contains the target table.

Name
Type: string

The name of the target table.

Region
Type: string

Region of the target table.

TableInput

Description

A structure used to define a table.

Members
Description
Type: string

A description of the table.

LastAccessTime
Type: timestamp (string|DateTime or anything parsable by strtotime)

The last time that the table was accessed.

LastAnalyzedTime
Type: timestamp (string|DateTime or anything parsable by strtotime)

The last time that column statistics were computed for this table.

Name
Required: Yes
Type: string

The table name. For Hive compatibility, this is folded to lowercase when it is stored.

Owner
Type: string

The table owner. Included for Apache Hive compatibility. Not used in the normal course of Glue operations.

Parameters
Type: Associative array of custom strings keys (KeyString) to strings

These key-value pairs define properties associated with the table.

PartitionKeys
Type: Array of Column structures

A list of columns by which the table is partitioned. Only primitive types are supported as partition keys.

When you create a table used by Amazon Athena, and you do not specify any PartitionKeys, you must at least set the value of PartitionKeys to an empty list. For example:

"PartitionKeys": []

Retention
Type: int

The retention time for this table.

StorageDescriptor
Type: StorageDescriptor structure

A storage descriptor containing information about the physical storage of this table.

TableType
Type: string

The type of this table. Glue will create tables with the EXTERNAL_TABLE type. Other services, such as Athena, may create tables with additional table types.

Glue related table types:

EXTERNAL_TABLE

Hive compatible attribute - indicates a non-Hive managed table.

GOVERNED

Used by Lake Formation. The Glue Data Catalog understands GOVERNED.

TargetTable
Type: TableIdentifier structure

A TableIdentifier structure that describes a target table for resource linking.

ViewExpandedText
Type: string

Included for Apache Hive compatibility. Not used in the normal course of Glue operations.

ViewOriginalText
Type: string

Included for Apache Hive compatibility. Not used in the normal course of Glue operations. If the table is a VIRTUAL_VIEW, this contains certain Athena configuration encoded in base64.
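A minimal TableInput might be assembled as follows. This is a sketch under stated assumptions: the helper name buildTableInput, the bucket path and the column are placeholders, and the Column entries are assumed to use Name and Type keys, since the Column structure is documented outside this section.

```php
<?php
// Hypothetical helper: builds a minimal TableInput array.
// Column entries are assumed to use 'Name' and 'Type' keys.
function buildTableInput(string $name, string $location, array $columns): array
{
    return [
        'Name'              => strtolower($name), // the name is folded to lowercase when stored
        'TableType'         => 'EXTERNAL_TABLE',
        'PartitionKeys'     => [],                // explicit empty list, as required for Athena
        'StorageDescriptor' => [
            'Columns'  => $columns,
            'Location' => $location,
        ],
    ];
}

$tableInput = buildTableInput('Orders', 's3://example-bucket/orders/', [
    ['Name' => 'id', 'Type' => 'bigint'],
]);
```

With the SDK installed, an array like this would typically be passed as the TableInput parameter of $client->createTable(), together with a DatabaseName.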

TableOptimizer

Description

Contains details about an optimizer associated with a table.

Members
configuration
Type: TableOptimizerConfiguration structure

A TableOptimizerConfiguration object that was specified when creating or updating a table optimizer.

lastRun
Type: TableOptimizerRun structure

A TableOptimizerRun object representing the last run of the table optimizer.

type
Type: string

The type of table optimizer. Currently, the only valid value is compaction.

TableOptimizerConfiguration

Description

Contains details on the configuration of a table optimizer. You pass this configuration when creating or updating a table optimizer.

Members
enabled
Type: boolean

Whether table optimization is enabled.

roleArn
Type: string

A role passed by the caller which gives the service permission to update the resources associated with the optimizer on the caller's behalf.

TableOptimizerRun

Description

Contains details for a table optimizer run.

Members
endTimestamp
Type: timestamp (string|DateTime or anything parsable by strtotime)

Represents the epoch timestamp at which the compaction job ended.

error
Type: string

An error that occurred during the optimizer run.

eventType
Type: string

An event type representing the status of the table optimizer run.

metrics
Type: RunMetrics structure

A RunMetrics object containing metrics for the optimizer run.

startTimestamp
Type: timestamp (string|DateTime or anything parsable by strtotime)

Represents the epoch timestamp at which the compaction job was started within Lake Formation.

TableVersion

Description

Specifies a version of a table.

Members
Table
Type: Table structure

The table in question.

VersionId
Type: string

The ID value that identifies this table version. A VersionId is a string representation of an integer. Each version is incremented by 1.
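Because VersionId is a string that holds an integer, comparing versions calls for a numeric cast. The following sketch, using a hypothetical helper named latestTableVersion, picks the highest version from a list of TableVersion arrays.

```php
<?php
// Hypothetical helper: returns the TableVersion array with the highest VersionId.
// VersionId is cast to int before comparing; a lexical comparison such as
// strcmp() would rank "9" above "10".
function latestTableVersion(array $versions): ?array
{
    $latest = null;
    foreach ($versions as $version) {
        if ($latest === null || (int) $version['VersionId'] > (int) $latest['VersionId']) {
            $latest = $version;
        }
    }
    return $latest;
}
```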

TableVersionError

Description

An error record for table-version operations.

Members
ErrorDetail
Type: ErrorDetail structure

The details about the error.

TableName
Type: string

The name of the table in question.

VersionId
Type: string

The ID value of the version in question. A VersionId is a string representation of an integer. Each version is incremented by 1.

TaskRun

Description

The sampling parameters that are associated with the machine learning transform.

Members
CompletedOn
Type: timestamp (string|DateTime or anything parsable by strtotime)

The last point in time that the requested task run was completed.

ErrorString
Type: string

The list of error strings associated with this task run.

ExecutionTime
Type: int

The amount of time (in seconds) that the task run consumed resources.

LastModifiedOn
Type: timestamp (string|DateTime or anything parsable by strtotime)

The last point in time that the requested task run was updated.

LogGroupName
Type: string

The name of the log group for secure logging, associated with this task run.

Properties
Type: TaskRunProperties structure

Specifies configuration properties associated with this task run.

StartedOn
Type: timestamp (string|DateTime or anything parsable by strtotime)

The date and time that this task run started.

Status
Type: string

The current status of the requested task run.

TaskRunId
Type: string

The unique identifier for this task run.

TransformId
Type: string

The unique identifier for the transform.

TaskRunFilterCriteria

Description

The criteria that are used to filter the task runs for the machine learning transform.

Members
StartedAfter
Type: timestamp (string|DateTime or anything parsable by strtotime)

Filter on task runs started after this date.

StartedBefore
Type: timestamp (string|DateTime or anything parsable by strtotime)

Filter on task runs started before this date.

Status
Type: string

The current status of the task run.

TaskRunType
Type: string

The type of task run.
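As an illustration, a filter covering a recent time window could be built like this. The helper name recentTaskRunFilter is hypothetical, and the timestamps are emitted as ISO 8601 strings, one of the accepted forms because strtotime can parse them.

```php
<?php
// Hypothetical helper: builds a TaskRunFilterCriteria array covering the
// $days days before $now. Timestamps are formatted as ISO 8601 strings.
function recentTaskRunFilter(DateTimeImmutable $now, int $days, ?string $status = null): array
{
    $filter = [
        'StartedAfter'  => $now->modify("-{$days} days")->format(DATE_ATOM),
        'StartedBefore' => $now->format(DATE_ATOM),
    ];
    if ($status !== null) {
        $filter['Status'] = $status; // only included when the caller asks for it
    }
    return $filter;
}

$filter = recentTaskRunFilter(new DateTimeImmutable('2024-01-08 00:00:00 UTC'), 7);
```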

TaskRunProperties

Description

The configuration properties for the task run.

Members
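ExportLabelsTaskRunProperties
Type: ExportLabelsTaskRunProperties structure

The configuration properties for an exporting labels task run.

FindMatchesTaskRunProperties
Type: FindMatchesTaskRunProperties structure

The configuration properties for a find matches task run.

ImportLabelsTaskRunProperties
Type: ImportLabelsTaskRunProperties structure

The configuration properties for an importing labels task run.

LabelingSetGenerationTaskRunProperties
Type: LabelingSetGenerationTaskRunProperties structure

The configuration properties for a labeling set generation task run.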

TaskType
Type: string

The type of task run.

TaskRunSortCriteria

Description

The sorting criteria that are used to sort the list of task runs for the machine learning transform.

Members
Column
Required: Yes
Type: string

The column to be used to sort the list of task runs for the machine learning transform.

SortDirection
Required: Yes
Type: string

The sort direction to be used to sort the list of task runs for the machine learning transform.

TransformConfigParameter

Description

Specifies the parameters in the config file of the dynamic transform.

Members
IsOptional
Type: boolean

Specifies whether the parameter is optional or not in the config file of the dynamic transform.

ListType
Type: string

Specifies the list type of the parameter in the config file of the dynamic transform.

Name
Required: Yes
Type: string

Specifies the name of the parameter in the config file of the dynamic transform.

Type
Required: Yes
Type: string

Specifies the parameter type in the config file of the dynamic transform.

ValidationMessage
Type: string

Specifies the validation message in the config file of the dynamic transform.

ValidationRule
Type: string

Specifies the validation rule in the config file of the dynamic transform.

Value
Type: Array of strings

Specifies the value of the parameter in the config file of the dynamic transform.

TransformEncryption

Description

The encryption-at-rest settings of the transform that apply to accessing user data. Machine learning transforms can access user data encrypted in Amazon S3 using KMS.

Additionally, imported labels and trained transforms can now be encrypted using a customer provided KMS key.

Members
MlUserDataEncryption
Type: MLUserDataEncryption structure

An MLUserDataEncryption object containing the encryption mode and customer-provided KMS key ID.

TaskRunSecurityConfigurationName
Type: string

The name of the security configuration.

TransformFilterCriteria

Description

The criteria used to filter the machine learning transforms.

Members
CreatedAfter
Type: timestamp (string|DateTime or anything parsable by strtotime)

The time and date after which the transforms were created.

CreatedBefore
Type: timestamp (string|DateTime or anything parsable by strtotime)

The time and date before which the transforms were created.

GlueVersion
Type: string

This value determines which version of Glue this machine learning transform is compatible with. Glue 1.0 is recommended for most customers. If the value is not set, the Glue compatibility defaults to Glue 0.9. For more information, see Glue Versions in the developer guide.

LastModifiedAfter
Type: timestamp (string|DateTime or anything parsable by strtotime)

Filter on transforms last modified after this date.

LastModifiedBefore
Type: timestamp (string|DateTime or anything parsable by strtotime)

Filter on transforms last modified before this date.

Name
Type: string

A unique transform name that is used to filter the machine learning transforms.

Schema
Type: Array of SchemaColumn structures

Filters on datasets with a specific schema. The Map<Column, Type> object is an array of key-value pairs representing the schema this transform accepts, where Column is the name of a column, and Type is the type of the data such as an integer or string. Has an upper bound of 100 columns.

Status
Type: string

Filters the list of machine learning transforms by the last known status of the transforms (to indicate whether a transform can be used or not). One of "NOT_READY", "READY", or "DELETING".

TransformType
Type: string

The type of machine learning transform that is used to filter the machine learning transforms.
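The constraints stated above for Status and Schema can be checked before a request is sent. The sketch below uses a hypothetical helper named buildTransformFilter; it does not inspect the contents of the SchemaColumn entries, only their count.

```php
<?php
// Hypothetical helper: builds a TransformFilterCriteria array, enforcing the
// three Status values and the 100-column Schema limit given in the text.
function buildTransformFilter(string $status, array $schema = []): array
{
    $allowed = ['NOT_READY', 'READY', 'DELETING'];
    if (!in_array($status, $allowed, true)) {
        throw new InvalidArgumentException('Unknown Status: ' . $status);
    }
    if (count($schema) > 100) {
        throw new InvalidArgumentException('Schema has an upper bound of 100 columns.');
    }
    $filter = ['Status' => $status];
    if ($schema !== []) {
        $filter['Schema'] = $schema;
    }
    return $filter;
}

$readyOnly = buildTransformFilter('READY');
```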

TransformParameters

Description

The algorithm-specific parameters that are associated with the machine learning transform.

Members
FindMatchesParameters
Type: FindMatchesParameters structure

The parameters for the find matches algorithm.

TransformType
Required: Yes
Type: string

The type of machine learning transform.

For information about the types of machine learning transforms, see Creating Machine Learning Transforms.

TransformSortCriteria

Description

The sorting criteria that are associated with the machine learning transform.

Members
Column
Required: Yes
Type: string

The column to be used in the sorting criteria that are associated with the machine learning transform.

SortDirection
Required: Yes
Type: string

The sort direction to be used in the sorting criteria that are associated with the machine learning transform.

Trigger

Description

Information about a specific trigger.

Members
Actions
Type: Array of Action structures

The actions initiated by this trigger.

Description
Type: string

A description of this trigger.

EventBatchingCondition
Type: EventBatchingCondition structure

Batch condition that must be met (specified number of events received or batch time window expired) before the EventBridge event trigger fires.

Id
Type: string

Reserved for future use.

Name
Type: string

The name of the trigger.

Predicate
Type: Predicate structure

The predicate of this trigger, which defines when it will fire.

Schedule
Type: string

A cron expression used to specify the schedule (see Time-Based Schedules for Jobs and Crawlers). For example, to run something every day at 12:15 UTC, you would specify: cron(15 12 * * ? *).

State
Type: string

The current state of the trigger.

Type
Type: string

The type of trigger that this is.

WorkflowName
Type: string

The name of the workflow associated with the trigger.

TriggerNodeDetails

Description

The details of a Trigger node present in the workflow.

Members
Trigger
Type: Trigger structure

The information of the trigger represented by the trigger node.

TriggerUpdate

Description

A structure used to provide information used to update a trigger. This object updates the previous trigger definition by overwriting it completely.

Members
Actions
Type: Array of Action structures

The actions initiated by this trigger.

Description
Type: string

A description of this trigger.

EventBatchingCondition
Type: EventBatchingCondition structure

Batch condition that must be met (specified number of events received or batch time window expired) before the EventBridge event trigger fires.

Name
Type: string

Reserved for future use.

Predicate
Type: Predicate structure

The predicate of this trigger, which defines when it will fire.

Schedule
Type: string

A cron expression used to specify the schedule (see Time-Based Schedules for Jobs and Crawlers). For example, to run something every day at 12:15 UTC, you would specify: cron(15 12 * * ? *).
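The cron form shown for Schedule can be generated rather than typed by hand. This sketch assumes a hypothetical helper named dailyCron that only covers the "every day at a fixed UTC time" case from the example.

```php
<?php
// Hypothetical helper: formats a daily schedule in the cron(...) form shown
// in the Schedule description, with hour and minute given in UTC.
function dailyCron(int $hour, int $minute): string
{
    if ($hour < 0 || $hour > 23 || $minute < 0 || $minute > 59) {
        throw new InvalidArgumentException('Hour must be 0-23 and minute 0-59.');
    }
    // Field order in the example: minute first, then hour.
    return sprintf('cron(%d %d * * ? *)', $minute, $hour);
}

$triggerUpdate = [
    'Description' => 'Runs daily at 12:15 UTC',
    'Schedule'    => dailyCron(12, 15),
];
```

Because a TriggerUpdate overwrites the previous definition completely, any member left out of the array (for example Actions) would not be carried over from the existing trigger.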

UnfilteredPartition

Description

A partition that contains unfiltered metadata.

Members
AuthorizedColumns
Type: Array of strings

The list of columns the user has permissions to access.

IsRegisteredWithLakeFormation
Type: boolean

A Boolean value indicating that the partition location is registered with Lake Formation.

Partition
Type: Partition structure

The partition object.

Union

Description

Specifies a transform that combines the rows from two or more datasets into a single result.

Members
Inputs
Required: Yes
Type: Array of strings

The node ID inputs to the transform.

Name
Required: Yes
Type: string

The name of the transform node.

UnionType
Required: Yes
Type: string

Indicates the type of Union transform.

Specify ALL to join all rows from data sources to the resulting DynamicFrame. The resulting union does not remove duplicate rows.

Specify DISTINCT to remove duplicate rows in the resulting DynamicFrame.
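A Union node could be put together as in the sketch below. The helper name buildUnion and the node names are hypothetical; the minimum of two inputs follows from the description of the transform as combining two or more datasets.

```php
<?php
// Hypothetical helper: builds a Union node array, accepting only the two
// UnionType values described in the text and at least two inputs.
function buildUnion(string $name, array $inputs, string $unionType): array
{
    if (count($inputs) < 2) {
        throw new InvalidArgumentException('A union needs at least two inputs.');
    }
    if (!in_array($unionType, ['ALL', 'DISTINCT'], true)) {
        throw new InvalidArgumentException('UnionType must be ALL or DISTINCT.');
    }
    return [
        'Name'      => $name,
        'Inputs'    => array_values($inputs),
        'UnionType' => $unionType,
    ];
}

$union = buildUnion('CombineOrders', ['node-1', 'node-2'], 'DISTINCT');
```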

UpdateCsvClassifierRequest

Description

Specifies a custom CSV classifier to be updated.

Members
AllowSingleColumn
Type: boolean

Enables the processing of files that contain only one column.

ContainsHeader
Type: string

Indicates whether the CSV file contains a header.

CustomDatatypeConfigured
Type: boolean

Specifies the configuration of custom datatypes.

CustomDatatypes
Type: Array of strings

Specifies a list of supported custom datatypes.

Delimiter
Type: string

A custom symbol to denote what separates each column entry in the row.

DisableValueTrimming
Type: boolean

Specifies not to trim values before identifying the type of column values. The default value is true.

Header
Type: Array of strings

A list of strings representing column names.

Name
Required: Yes
Type: string

The name of the classifier.

QuoteSymbol
Type: string

A custom symbol to denote what combines content into a single column value. It must be different from the column delimiter.

Serde
Type: string

Sets the SerDe for processing CSV in the classifier, which will be applied in the Data Catalog. Valid values are OpenCSVSerDe, LazySimpleSerDe, and None. You can specify the None value when you want the crawler to do the detection.
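Two of the constraints above (the quote symbol must differ from the delimiter, and Serde has three valid values) lend themselves to a quick check. The helper name buildCsvClassifierUpdate and the classifier name are hypothetical.

```php
<?php
// Hypothetical helper: builds an UpdateCsvClassifierRequest array and checks
// the QuoteSymbol and Serde constraints described in the text.
function buildCsvClassifierUpdate(string $name, string $delimiter, string $quoteSymbol, string $serde = 'None'): array
{
    if ($quoteSymbol === $delimiter) {
        throw new InvalidArgumentException('QuoteSymbol must differ from Delimiter.');
    }
    if (!in_array($serde, ['OpenCSVSerDe', 'LazySimpleSerDe', 'None'], true)) {
        throw new InvalidArgumentException('Unsupported Serde: ' . $serde);
    }
    return [
        'Name'        => $name,
        'Delimiter'   => $delimiter,
        'QuoteSymbol' => $quoteSymbol,
        'Serde'       => $serde, // 'None' leaves detection to the crawler
    ];
}

$update = buildCsvClassifierUpdate('orders-csv', ',', '"');
```

With the SDK installed, an array like this would typically be passed under the CsvClassifier key of $client->updateClassifier().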

UpdateGrokClassifierRequest

Description

Specifies a grok classifier to update when passed to UpdateClassifier.

Members
Classification
Type: string

An identifier of the data format that the classifier matches, such as Twitter, JSON, Omniture logs, Amazon CloudWatch Logs, and so on.

CustomPatterns
Type: string

Optional custom grok patterns used by this classifier.

GrokPattern
Type: string

The grok pattern used by this classifier.

Name
Required: Yes
Type: string

The name of the GrokClassifier.

UpdateJsonClassifierRequest

Description

Specifies a JSON classifier to be updated.

Members
JsonPath
Type: string

A JsonPath string defining the JSON data for the classifier to classify. Glue supports a subset of JsonPath, as described in Writing JsonPath Custom Classifiers.

Name
Required: Yes
Type: string

The name of the classifier.

UpdateXMLClassifierRequest

Description

Specifies an XML classifier to be updated.

Members
Classification
Type: string

An identifier of the data format that the classifier matches.

Name
Required: Yes
Type: string

The name of the classifier.

RowTag
Type: string

The XML tag designating the element that contains each record in an XML document being parsed. This cannot identify a self-closing element (closed by />). An empty row element that contains only attributes can be parsed as long as it ends with a closing tag (for example, <row item_a="A" item_b="B"></row> is okay, but <row item_a="A" item_b="B" /> is not).
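
The self-closing restriction on RowTag can be checked roughly on a sample document before a classifier is configured. This is a sketch only: hasSelfClosingRow() is a hypothetical helper, and the regular expression is a heuristic that does not handle every XML edge case (for example, a ">" character inside an attribute value).

```php
<?php
// Hypothetical helper: returns true if $xml contains a self-closing
// element for the given row tag, such as <row a="1" /> or <row/>.
function hasSelfClosingRow(string $xml, string $rowTag): bool
{
    // Match "<tag", optionally followed by whitespace and attributes,
    // ending in "/>". The \s requirement keeps <rowset/> from matching "row".
    $pattern = '/<' . preg_quote($rowTag, '/') . '(\s[^>]*)?\/>/';

    return preg_match($pattern, $xml) === 1;
}

$ok  = '<rows><row item_a="A" item_b="B"></row></rows>';
$bad = '<rows><row item_a="A" item_b="B" /></rows>';

var_dump(hasSelfClosingRow($ok, 'row'));  // the closing-tag form
var_dump(hasSelfClosingRow($bad, 'row')); // the self-closing form
```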

UpsertRedshiftTargetOptions

Description

The options to configure an upsert operation when writing to a Redshift target.

Members
ConnectionName
Type: string

The name of the connection to use to write to Redshift.

TableLocation
Type: string

The physical location of the Redshift table.

UpsertKeys
Type: Array of strings

The keys used to determine whether to perform an update or insert.

UserDefinedFunction

Description

Represents the equivalent of a Hive user-defined function (UDF) definition.

Members
CatalogId
Type: string

The ID of the Data Catalog in which the function resides.

ClassName
Type: string

The Java class that contains the function code.

CreateTime
Type: timestamp (string|DateTime or anything parsable by strtotime)

The time at which the function was created.

DatabaseName
Type: string

The name of the catalog database that contains the function.

FunctionName
Type: string

The name of the function.

OwnerName
Type: string

The owner of the function.

OwnerType
Type: string

The owner type.

ResourceUris
Type: Array of ResourceUri structures

The resource URIs for the function.

UserDefinedFunctionInput

Description

A structure used to create or update a user-defined function.

Members
ClassName
Type: string

The Java class that contains the function code.

FunctionName
Type: string

The name of the function.

OwnerName
Type: string

The owner of the function.

OwnerType
Type: string

The owner type.

ResourceUris
Type: Array of ResourceUri structures

The resource URIs for the function.

ValidationException

Description

A value could not be validated.

Members
Message
Type: string

A message describing the problem.

VersionMismatchException

Description

There was a version conflict.

Members
Message
Type: string

A message describing the problem.

ViewDefinition

Description

A structure containing details of a view's representations.

Members
Definer
Type: string

The definer of a view in SQL.

IsProtected
Type: boolean

You can set this flag as true to instruct the engine not to push user-provided operations into the logical plan of the view during query planning. However, setting this flag does not guarantee that the engine will comply. Refer to the engine's documentation to understand the guarantees provided, if any.

Representations
Type: Array of ViewRepresentation structures

A list of representations.

SubObjects
Type: Array of strings

A list of table Amazon Resource Names (ARNs).

ViewRepresentation

Description

A structure that contains the dialect of the view, and the query that defines the view.

Members
Dialect
Type: string

The dialect of the query engine.

DialectVersion
Type: string

The version of the dialect of the query engine. For example, 3.0.0.

IsStale
Type: boolean

Dialects marked as stale are no longer valid and must be updated before they can be queried in their respective query engines.

ViewExpandedText
Type: string

The expanded SQL for the view. This SQL is used by engines while processing a query on a view. Engines may perform operations during view creation to transform ViewOriginalText to ViewExpandedText. For example:

  • Fully qualify identifiers: SELECT * from table1 → SELECT * from db1.table1

ViewOriginalText
Type: string

The SELECT query provided by the customer during CREATE VIEW DDL. This SQL is not used during a query on a view (ViewExpandedText is used instead). ViewOriginalText is used for cases like SHOW CREATE VIEW where users want to see the original DDL command that created the view.
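
Because stale representations cannot be queried, code that reads a ViewDefinition may want to pick the first representation for a given dialect that is not stale. A minimal sketch follows; findUsableRepresentation() is a hypothetical helper, and the dialect names and SQL in the sample data are illustrative values, not output from the service.

```php
<?php
// Hypothetical helper: returns the first representation that matches
// $dialect and is not marked stale, or null if there is none.
function findUsableRepresentation(array $viewDefinition, string $dialect): ?array
{
    foreach ($viewDefinition['Representations'] ?? [] as $representation) {
        $sameDialect = ($representation['Dialect'] ?? null) === $dialect;
        $isStale     = $representation['IsStale'] ?? false;

        if ($sameDialect && !$isStale) {
            return $representation;
        }
    }

    return null;
}

// Sample data shaped like a ViewDefinition structure (values are made up).
$viewDefinition = [
    'IsProtected'     => false,
    'Representations' => [
        [
            'Dialect'          => 'ATHENA',
            'DialectVersion'   => '3',
            'IsStale'          => true,
            'ViewOriginalText' => 'SELECT * from table1',
            'ViewExpandedText' => 'SELECT * from db1.table1',
        ],
        [
            'Dialect'          => 'SPARK',
            'DialectVersion'   => '3.0.0',
            'IsStale'          => false,
            'ViewOriginalText' => 'SELECT * from table1',
            'ViewExpandedText' => 'SELECT * from db1.table1',
        ],
    ],
];

$spark  = findUsableRepresentation($viewDefinition, 'SPARK');
$athena = findUsableRepresentation($viewDefinition, 'ATHENA');
```

Here $spark holds the second representation, whose ViewExpandedText is what an engine would process, while $athena is null because the only ATHENA representation is stale.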

Workflow

Description

A workflow is a collection of multiple dependent Glue jobs and crawlers that are run to complete a complex ETL task. A workflow manages the execution and monitoring of all its jobs and crawlers.

Members
BlueprintDetails
Type: BlueprintDetails structure

This structure indicates the details of the blueprint that this particular workflow is created from.

CreatedOn
Type: timestamp (string|DateTime or anything parsable by strtotime)

The date and time when the workflow was created.

DefaultRunProperties
Type: Associative array of custom string keys (IdString) to strings

A collection of properties to be used as part of each execution of the workflow. The run properties are made available to each job in the workflow. A job can modify the properties for the next jobs in the flow.

Description
Type: string

A description of the workflow.

Graph
Type: WorkflowGraph structure

The graph representing all the Glue components that belong to the workflow as nodes and directed connections between them as edges.

LastModifiedOn
Type: timestamp (string|DateTime or anything parsable by strtotime)

The date and time when the workflow was last modified.

LastRun
Type: WorkflowRun structure

The information about the last execution of the workflow.

MaxConcurrentRuns
Type: int

You can use this parameter to prevent unwanted multiple updates to data, to control costs, or in some cases, to prevent exceeding the maximum number of concurrent runs of any of the component jobs. If you leave this parameter blank, there is no limit to the number of concurrent workflow runs.

Name
Type: string

The name of the workflow.

WorkflowGraph

Description

A workflow graph represents the complete workflow containing all the Glue components present in the workflow and all the directed connections between them.

Members
Edges
Type: Array of Edge structures

A list of all the directed connections between the nodes belonging to the workflow.

Nodes
Type: Array of Node structures

A list of the Glue components that belong to the workflow, represented as nodes.

WorkflowRun

Description

A workflow run is an execution of a workflow providing all the runtime information.

Members
CompletedOn
Type: timestamp (string|DateTime or anything parsable by strtotime)

The date and time when the workflow run completed.

ErrorMessage
Type: string

This error message describes any error that may have occurred in starting the workflow run. Currently the only error message is "Concurrent runs exceeded for workflow: foo."

Graph
Type: WorkflowGraph structure

The graph representing all the Glue components that belong to the workflow as nodes and directed connections between them as edges.

Name
Type: string

The name of the workflow that was run.

PreviousRunId
Type: string

The ID of the previous workflow run.

StartedOn
Type: timestamp (string|DateTime or anything parsable by strtotime)

The date and time when the workflow run was started.

StartingEventBatchCondition
Type: StartingEventBatchCondition structure

The batch condition that started the workflow run.

Statistics
Type: WorkflowRunStatistics structure

The statistics of the run.

Status
Type: string

The status of the workflow run.

WorkflowRunId
Type: string

The ID of this workflow run.

WorkflowRunProperties
Type: Associative array of custom string keys (IdString) to strings

The workflow run properties which were set during the run.
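
Two small checks on a WorkflowRun structure are often useful: how long the run took, and whether it failed to start because of the concurrency limit. The sketch below uses hypothetical helper names and made-up sample values. It assumes the timestamp members arrive as objects implementing DateTimeInterface, and it relies on the error message text quoted above, which may change.

```php
<?php
// Hypothetical helper: duration of a completed run in seconds,
// or null if either timestamp is missing.
function workflowRunDurationSeconds(array $run): ?int
{
    $started   = $run['StartedOn'] ?? null;
    $completed = $run['CompletedOn'] ?? null;

    if (!$started instanceof DateTimeInterface
        || !$completed instanceof DateTimeInterface) {
        return null;
    }

    return $completed->getTimestamp() - $started->getTimestamp();
}

// Hypothetical helper: true if the run could not start because the
// workflow's concurrent-run limit was reached.
function isConcurrencyLimitError(array $run): bool
{
    $message = $run['ErrorMessage'] ?? '';

    return strpos($message, 'Concurrent runs exceeded') === 0;
}

// Sample data shaped like WorkflowRun structures (values are made up).
$finishedRun = [
    'Name'        => 'foo',
    'StartedOn'   => new DateTimeImmutable('2024-01-01T00:00:00Z'),
    'CompletedOn' => new DateTimeImmutable('2024-01-01T00:05:30Z'),
];
$rejectedRun = [
    'Name'         => 'foo',
    'ErrorMessage' => 'Concurrent runs exceeded for workflow: foo.',
];

$duration = workflowRunDurationSeconds($finishedRun); // 330
$rejected = isConcurrencyLimitError($rejectedRun);    // true
```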

WorkflowRunStatistics

Description

Provides statistics about a workflow run.

Members
ErroredActions
Type: int

Indicates the count of job runs in the ERROR state in the workflow run.

FailedActions
Type: int

Total number of Actions that have failed.

RunningActions
Type: int

Total number of Actions in the running state.

StoppedActions
Type: int

Total number of Actions that have stopped.

SucceededActions
Type: int

Total number of Actions that have succeeded.

TimeoutActions
Type: int

Total number of Actions that timed out.

TotalActions
Type: int

Total number of Actions in the workflow run.

WaitingActions
Type: int

Indicates the count of job runs in the WAITING state in the workflow run.
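
The counters in WorkflowRunStatistics can be condensed into a short summary. This is a sketch with a hypothetical helper and made-up numbers. Treating a run as finished when no actions are running or waiting, and grouping failed, stopped, timed-out, and errored actions as unsuccessful, are assumptions of this example rather than definitions from the service.

```php
<?php
// Hypothetical helper: condenses a WorkflowRunStatistics structure.
function summarizeRunStatistics(array $stats): array
{
    $total     = $stats['TotalActions'] ?? 0;
    $succeeded = $stats['SucceededActions'] ?? 0;

    // Actions that have not reached an end state yet.
    $pending = ($stats['RunningActions'] ?? 0) + ($stats['WaitingActions'] ?? 0);

    // Actions that ended without succeeding (grouping is an assumption).
    $unsuccessful = ($stats['FailedActions'] ?? 0)
        + ($stats['StoppedActions'] ?? 0)
        + ($stats['TimeoutActions'] ?? 0)
        + ($stats['ErroredActions'] ?? 0);

    return [
        'finished'     => $pending === 0,
        'unsuccessful' => $unsuccessful,
        // Avoid division by zero for a run with no actions.
        'successRatio' => $total > 0 ? $succeeded / $total : null,
    ];
}

// Sample data shaped like a WorkflowRunStatistics structure (values are made up).
$summary = summarizeRunStatistics([
    'TotalActions'     => 8,
    'SucceededActions' => 4,
    'FailedActions'    => 1,
    'StoppedActions'   => 0,
    'TimeoutActions'   => 1,
    'ErroredActions'   => 0,
    'RunningActions'   => 1,
    'WaitingActions'   => 1,
]);
```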

XMLClassifier

Description

A classifier for XML content.

Members
Classification
Required: Yes
Type: string

An identifier of the data format that the classifier matches.

CreationTime
Type: timestamp (string|DateTime or anything parsable by strtotime)

The time that this classifier was registered.

LastUpdated
Type: timestamp (string|DateTime or anything parsable by strtotime)

The time that this classifier was last updated.

Name
Required: Yes
Type: string

The name of the classifier.

RowTag
Type: string

The XML tag designating the element that contains each record in an XML document being parsed. This can't identify a self-closing element (closed by />). An empty row element that contains only attributes can be parsed as long as it ends with a closing tag (for example, <row item_a="A" item_b="B"></row> is okay, but <row item_a="A" item_b="B" /> is not).

Version
Type: long (int|float)

The version of this classifier.