JobProps

class aws_cdk.aws_glue.JobProps(*, executable, connections=None, continuous_logging=None, default_arguments=None, description=None, enable_profiling_metrics=None, job_name=None, max_capacity=None, max_concurrent_runs=None, max_retries=None, notify_delay_after=None, role=None, security_configuration=None, spark_ui=None, tags=None, timeout=None, worker_count=None, worker_type=None)

Bases: object

(experimental) Construction properties for Job.

Parameters:
  • executable (JobExecutable) – (experimental) The job’s executable properties.

  • connections (Optional[Sequence[IConnection]]) – (experimental) The Connection objects used for this job. Connections are used to connect to other AWS services or to resources within a VPC. Default: [] - no connections are added to the job

  • continuous_logging (Union[ContinuousLoggingProps, Dict[str, Any], None]) – (experimental) Enables continuous logging with the specified props. Default: - continuous logging is disabled.

  • default_arguments (Optional[Mapping[str, str]]) – (experimental) The default arguments for this job, specified as name-value pairs. Default: - no arguments

  • description (Optional[str]) – (experimental) The description of the job. Default: - no value

  • enable_profiling_metrics (Optional[bool]) – (experimental) Enables the collection of metrics for job profiling. Default: - no profiling metrics emitted.

  • job_name (Optional[str]) – (experimental) The name of the job. Default: - a name is automatically generated

  • max_capacity (Union[int, float, None]) – (experimental) The number of AWS Glue data processing units (DPUs) that can be allocated when this job runs. Cannot be used for Glue version 2.0 and later; use worker_type and worker_count instead. Default: - 10 when job type is Apache Spark ETL or streaming, 0.0625 when job type is Python shell

  • max_concurrent_runs (Union[int, float, None]) – (experimental) The maximum number of concurrent runs allowed for the job. An error is returned when this threshold is reached. The maximum value you can specify is controlled by a service limit. Default: 1

  • max_retries (Union[int, float, None]) – (experimental) The maximum number of times to retry this job after a job run fails. Default: 0

  • notify_delay_after (Optional[Duration]) – (experimental) The number of minutes to wait after a job run starts, before sending a job run delay notification. Default: - no delay notifications

  • role (Optional[IRole]) – (experimental) The IAM role assumed by Glue to run this job. If providing a custom role, it needs to trust the Glue service principal (glue.amazonaws.com) and be granted sufficient permissions. Default: - a role is automatically generated

  • security_configuration (Optional[ISecurityConfiguration]) – (experimental) The SecurityConfiguration to use for this job. Default: - no security configuration.

  • spark_ui (Union[SparkUIProps, Dict[str, Any], None]) – (experimental) Enables Spark UI debugging and monitoring with the specified props. Default: - Spark UI debugging and monitoring is disabled.

  • tags (Optional[Mapping[str, str]]) – (experimental) The tags to add to the resources on which the job runs. Default: {} - no tags

  • timeout (Optional[Duration]) – (experimental) The maximum time that a job run can consume resources before it is terminated and enters TIMEOUT status. Default: cdk.Duration.hours(48)

  • worker_count (Union[int, float, None]) – (experimental) The number of workers of a defined WorkerType that are allocated when a job runs. Default: - differs based on specific Glue version/worker type

  • worker_type (Optional[WorkerType]) – (experimental) The type of predefined worker that is allocated when a job runs. Default: - differs based on specific Glue version

Stability:

experimental

Example:

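import aws_cdk.aws_glue as glue
import aws_cdk.aws_s3 as s3

# bucket: s3.Bucket (an existing bucket that holds the job script)

# Define a Python shell job; self is the enclosing construct or stack.
glue.Job(self, "PythonShellJob",
    executable=glue.JobExecutable.python_shell(
        glue_version=glue.GlueVersion.V1_0,
        python_version=glue.PythonVersion.THREE,
        script=glue.Code.from_bucket(bucket, "script.py")
    ),
    description="an example Python Shell job"
)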

Attributes

connections

(experimental) The Connection objects used for this job.

Connections are used to connect to other AWS services or to resources within a VPC.

Default:

[] - no connections are added to the job

Stability:

experimental

continuous_logging

(experimental) Enables continuous logging with the specified props.

Default:
  • continuous logging is disabled.

See:

https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-glue-arguments.html

Stability:

experimental

default_arguments

(experimental) The default arguments for this job, specified as name-value pairs.

Default:
  • no arguments

See:

https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-glue-arguments.html for a list of reserved parameters

Stability:

experimental
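
Because default_arguments is typed as Mapping[str, str], every value has to be a string. The following is a minimal sketch of one way to prepare such a mapping from ordinary Python values. The helper name to_glue_arguments, the sample argument names, and the lowercase "true"/"false" rendering of booleans are all illustrative assumptions, not part of the CDK API; the leading "--" follows the way Glue job parameters are conventionally written.

```python
from typing import Any, Dict, Mapping


def to_glue_arguments(values: Mapping[str, Any]) -> Dict[str, str]:
    """Hypothetical helper: turn Python values into string name-value pairs."""
    arguments: Dict[str, str] = {}
    for name, value in values.items():
        # Add the conventional "--" prefix unless the caller already did.
        key = name if name.startswith("--") else f"--{name}"
        if isinstance(value, bool):
            # Assumption: booleans are rendered as lowercase strings.
            text = "true" if value else "false"
        else:
            text = str(value)
        arguments[key] = text
    return arguments


# Sample (made-up) arguments for a job script.
job_arguments = to_glue_arguments({
    "source_path": "s3://example-bucket/input/",
    "batch_size": 500,
    "dry_run": False,
})
print(job_arguments)
```

The resulting dictionary could then be passed as default_arguments=job_arguments. Check the reserved parameter list linked above before choosing argument names.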

description

(experimental) The description of the job.

Default:
  • no value

Stability:

experimental

enable_profiling_metrics

(experimental) Enables the collection of metrics for job profiling.

Default:
  • no profiling metrics emitted.

See:

--enable-metrics at https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-glue-arguments.html

Stability:

experimental

executable

(experimental) The job’s executable properties.

Stability:

experimental

job_name

(experimental) The name of the job.

Default:
  • a name is automatically generated

Stability:

experimental

max_capacity

(experimental) The number of AWS Glue data processing units (DPUs) that can be allocated when this job runs.

Cannot be used for Glue version 2.0 and later; use worker_type and worker_count instead.

Default:
  • 10 when job type is Apache Spark ETL or streaming, 0.0625 when job type is Python shell

Stability:

experimental
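
The version rule above can be checked before the props are passed to Job. This is only a sketch of the documented restriction, not the validation that the construct itself performs; the function name check_capacity_settings and the use of a plain version string such as "2.0" are assumptions made for illustration.

```python
from typing import Optional


def check_capacity_settings(
    glue_version: str,
    max_capacity: Optional[float] = None,
    worker_type: Optional[str] = None,
    worker_count: Optional[int] = None,
) -> None:
    """Hypothetical pre-check: raise ValueError if max_capacity is set
    for Glue version 2.0 or later."""
    # Parse "2.0" into (2, 0) so versions compare numerically.
    version = tuple(int(part) for part in glue_version.split("."))
    if max_capacity is not None and version >= (2, 0):
        raise ValueError(
            f"max_capacity cannot be used with Glue {glue_version}; "
            "set worker_type and worker_count instead"
        )


# Accepted: max_capacity on a pre-2.0 Glue version.
check_capacity_settings("1.0", max_capacity=10)

# Accepted: worker-based sizing on a newer Glue version.
check_capacity_settings("3.0", worker_type="G.1X", worker_count=2)
```

Calling check_capacity_settings("2.0", max_capacity=10) raises ValueError under this sketch.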

max_concurrent_runs

(experimental) The maximum number of concurrent runs allowed for the job.

An error is returned when this threshold is reached. The maximum value you can specify is controlled by a service limit.

Default:

1

Stability:

experimental

max_retries

(experimental) The maximum number of times to retry this job after a job run fails.

Default:

0

Stability:

experimental

notify_delay_after

(experimental) The number of minutes to wait after a job run starts, before sending a job run delay notification.

Default:
  • no delay notifications

Stability:

experimental

role

(experimental) The IAM role assumed by Glue to run this job.

If providing a custom role, it needs to trust the Glue service principal (glue.amazonaws.com) and be granted sufficient permissions.

Default:
  • a role is automatically generated

See:

https://docs.aws.amazon.com/glue/latest/dg/getting-started-access.html

Stability:

experimental
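
For a custom role, the trust requirement described above corresponds to an IAM trust policy along the following lines. This is a sketch of the trust relationship only: the variable name glue_trust_policy is illustrative, and the permissions the role also needs (for example, access to the script location) are not shown.

```python
import json

# Trust policy that lets the Glue service principal assume the role.
glue_trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "glue.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}

print(json.dumps(glue_trust_policy, indent=2))
```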

security_configuration

(experimental) The SecurityConfiguration to use for this job.

Default:
  • no security configuration.

Stability:

experimental

spark_ui

(experimental) Enables Spark UI debugging and monitoring with the specified props.

Default:
  • Spark UI debugging and monitoring is disabled.

See:

https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-glue-arguments.html

Stability:

experimental

tags

(experimental) The tags to add to the resources on which the job runs.

Default:

{} - no tags

Stability:

experimental

timeout

(experimental) The maximum time that a job run can consume resources before it is terminated and enters TIMEOUT status.

Default:

cdk.Duration.hours(48)

Stability:

experimental

worker_count

(experimental) The number of workers of a defined WorkerType that are allocated when a job runs.

Default:
  • differs based on specific Glue version/worker type

Stability:

experimental

worker_type

(experimental) The type of predefined worker that is allocated when a job runs.

Default:
  • differs based on specific Glue version

Stability:

experimental