
Services or capabilities described in Amazon Web Services documentation might vary by Region. To see the differences applicable to the China Regions, see Getting Started with Amazon Web Services in China (PDF).

Using Amazon S3 Express One Zone with Amazon Glue

With Amazon Glue version 5.1 and higher, you can read and write data in Amazon S3 Express One Zone directory buckets from your ETL jobs. S3 Express One Zone is a high-performance, single-zone Amazon S3 storage class that delivers consistent, single-digit millisecond data access for latency-sensitive applications.

Prerequisites

Before you can use S3 Express One Zone with Amazon Glue, you must have the following:

  • An Amazon Glue job running version 5.1 or higher.

  • An S3 directory bucket created in the same Region as your Amazon Glue job. Directory buckets do not support cross-Region access. For more information, see Creating directory buckets in the Amazon S3 User Guide.

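  • The s3express:CreateSession permission on your IAM role. Access to directory buckets is authorized through sessions: when your job performs an operation on a directory bucket, the S3 client calls CreateSession on your behalf to obtain temporary session credentials.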
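The directory bucket prerequisite above can be checked before a job runs, at least as far as the bucket name goes. The following is a minimal sketch, assuming the directory bucket naming format base-name--az-id--x-s3 and Availability Zone IDs shaped like use1-az4; `parse_directory_bucket_name` and `DirectoryBucketName` are hypothetical helpers, not part of any AWS SDK. It checks only the shape of the name, not whether the bucket exists or is in your job's Region.

```python
import re
from typing import NamedTuple


class DirectoryBucketName(NamedTuple):
    """Parts of an S3 directory bucket name (hypothetical helper type)."""

    base_name: str
    az_id: str


# Assumption: names follow <base-name>--<az-id>--x-s3, where the AZ ID looks
# like "use1-az4" (letters and digits, then "-az" and a number).
_DIRECTORY_BUCKET_RE = re.compile(r"^(?P<base>.+)--(?P<az>[a-z0-9]+-az[0-9]+)--x-s3$")


def parse_directory_bucket_name(name: str) -> DirectoryBucketName:
    """Split a directory bucket name into its base name and AZ ID.

    Only the shape of the name is checked; this does not confirm that the
    bucket exists or that its zone is in the same Region as the Glue job.
    Raises ValueError for names without the --az-id--x-s3 suffix.
    """
    match = _DIRECTORY_BUCKET_RE.match(name)
    if match is None:
        raise ValueError(f"not a directory bucket name: {name!r}")
    return DirectoryBucketName(match.group("base"), match.group("az"))


parts = parse_directory_bucket_name("EXAMPLE-BUCKET--use1-az4--x-s3")
print(parts.base_name, parts.az_id)  # EXAMPLE-BUCKET use1-az4
```

A general purpose bucket name such as my-general-purpose-bucket raises ValueError, which is a cheap signal that a job is pointed at the wrong kind of bucket.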

IAM permissions

Add the following permission to your Amazon Glue job's IAM role to allow access to S3 Express One Zone directory buckets:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3express:CreateSession",
      "Resource": "arn:aws:s3express:*:*:bucket/EXAMPLE-BUCKET--az-id--x-s3"
    }
  ]
}
```

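Replace EXAMPLE-BUCKET with the base name of your directory bucket and az-id with its Availability Zone ID (for example, use1-az4). Together they form the full directory bucket name, which always ends in the suffix --az-id--x-s3.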
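If you manage the job's IAM role from a script, the policy above can be generated instead of edited by hand. This is a minimal sketch that keeps the same wildcard Region and account segments as the example policy; the function name `create_session_policy` is hypothetical.

```python
import json


def create_session_policy(base_name: str, az_id: str) -> str:
    """Return an IAM policy (as JSON text) that allows s3express:CreateSession
    on a single directory bucket. Hypothetical helper mirroring the example policy."""
    # Full directory bucket name: <base-name>--<az-id>--x-s3
    bucket_name = f"{base_name}--{az_id}--x-s3"
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "s3express:CreateSession",
                # Region and account are wildcards, as in the example policy.
                "Resource": f"arn:aws:s3express:*:*:bucket/{bucket_name}",
            }
        ],
    }
    return json.dumps(policy, indent=2)


print(create_session_policy("EXAMPLE-BUCKET", "use1-az4"))
```

The resulting string can be passed as the PolicyDocument argument of the IAM PutRolePolicy API (for example, boto3's `iam.put_role_policy`) to attach it as an inline policy; that step is omitted here.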

Reading and writing data

Amazon Glue version 5.1+ supports accessing S3 Express One Zone directory buckets using both the s3:// and s3a:// URI schemes. No additional configuration is required.
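Because both schemes address the same directory bucket, a path written with one scheme can be rewritten with the other without changing the bucket or key. The following sketch uses only the Python standard library to make that explicit; the helper `with_scheme` is hypothetical, and in a Glue job you would simply pass either URI string to the reader or writer.

```python
from urllib.parse import urlsplit, urlunsplit

# URI schemes that Amazon Glue 5.1+ accepts for directory buckets.
SUPPORTED_SCHEMES = ("s3", "s3a")


def with_scheme(uri: str, scheme: str) -> str:
    """Return `uri` with its scheme replaced, keeping bucket and key unchanged."""
    if scheme not in SUPPORTED_SCHEMES:
        raise ValueError(f"unsupported scheme: {scheme!r}")
    parts = urlsplit(uri)
    if parts.scheme not in SUPPORTED_SCHEMES:
        raise ValueError(f"not an s3:// or s3a:// URI: {uri!r}")
    # Bucket (netloc) and key (path) are carried over as-is.
    return urlunsplit((scheme, parts.netloc, parts.path, parts.query, parts.fragment))


express_path = "s3://EXAMPLE-BUCKET--use1-az4--x-s3/my-data/"
print(with_scheme(express_path, "s3a"))  # s3a://EXAMPLE-BUCKET--use1-az4--x-s3/my-data/
```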

The following example shows how to read data from and write data to an S3 Express One Zone directory bucket in an Amazon Glue ETL job:

```python
import sys

from pyspark.context import SparkContext
from awsglue.context import GlueContext

sc = SparkContext.getOrCreate()
glueContext = GlueContext(sc)
spark = glueContext.spark_session

# S3 Express One Zone directory bucket path
express_path = "s3://EXAMPLE-BUCKET--use1-az4--x-s3/my-data/"

# Read data from S3 Express One Zone
df = spark.read.parquet(express_path)

# Write data to S3 Express One Zone
df.write.mode("overwrite").parquet(express_path + "output/")
```

You can also use DynamicFrames with S3 Express One Zone:

```python
# Reuses glueContext and express_path from the previous example.

# Read with DynamicFrame
dynamicFrame = glueContext.create_dynamic_frame.from_options(
    connection_type="s3",
    connection_options={"paths": [express_path]},
    format="parquet",
)

# Write with DynamicFrame
glueContext.write_dynamic_frame.from_options(
    frame=dynamicFrame,
    connection_type="s3",
    connection_options={"path": express_path + "output/"},
    format="parquet",
)
```