Using Amazon S3 Express One Zone with Amazon Glue
With Amazon Glue version 5.1 and higher, you can read and write data in Amazon S3 Express One Zone directory buckets.
Prerequisites
Before you can use S3 Express One Zone with Amazon Glue, you must have the following:
- An Amazon Glue job running version 5.1 or higher.
- An S3 directory bucket created in the same region as your Amazon Glue job. Directory buckets do not support cross-region access. For more information, see Creating directory buckets in the Amazon S3 User Guide.
- The s3express:CreateSession permission on your IAM role. When S3 Express One Zone performs an action on a directory bucket, it calls CreateSession on your behalf.
IAM permissions
Add the following permission to your Amazon Glue job's IAM role to allow access to S3 Express One Zone directory buckets:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "s3express:CreateSession",
            "Resource": "arn:aws:s3express:*:*:bucket/EXAMPLE-BUCKET--az-id--x-s3"
        }
    ]
}
Replace EXAMPLE-BUCKET with your directory bucket name
and az-id with the Availability Zone ID (for example,
use1-az4).
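If you manage several directory buckets, generating the policy document can help you avoid typos in the bucket name. The following is a minimal sketch, not part of any Amazon Glue API: the helper names directory_bucket_name and create_session_policy are hypothetical, and the wildcard Region and account fields in the ARN are copied from the policy above.

```python
import json


def directory_bucket_name(base_name: str, az_id: str) -> str:
    """Compose a directory bucket name in the base-name--az-id--x-s3 form."""
    return f"{base_name}--{az_id}--x-s3"


def create_session_policy(base_name: str, az_id: str) -> dict:
    """Build the CreateSession policy shown above for one directory bucket."""
    bucket = directory_bucket_name(base_name, az_id)
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "s3express:CreateSession",
                # Wildcard Region and account ID, as in the policy above.
                "Resource": f"arn:aws:s3express:*:*:bucket/{bucket}",
            }
        ],
    }


if __name__ == "__main__":
    # Print the policy as JSON so that it can be pasted into the IAM console.
    print(json.dumps(create_session_policy("EXAMPLE-BUCKET", "use1-az4"), indent=2))
```

You may prefer to replace the wildcards with a specific Region and account ID to narrow the scope of the permission.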
Reading and writing data
Amazon Glue version 5.1+ supports accessing S3 Express One Zone directory
buckets using both the s3:// and s3a:// URI schemes. No
additional configuration is required.
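As an illustration of the two URI schemes, the following sketch builds a path for a directory bucket under either scheme. The function express_uri is a hypothetical helper, not an Amazon Glue or Spark API. It assumes only that directory bucket names end with the --x-s3 suffix, and it rejects other names so that a general purpose bucket is not used by mistake.

```python
def express_uri(bucket: str, key_prefix: str = "", scheme: str = "s3") -> str:
    """Return a scheme://bucket/prefix/ URI for an S3 Express One Zone directory bucket."""
    if scheme not in ("s3", "s3a"):
        raise ValueError(f"unsupported URI scheme: {scheme!r}")
    if not bucket.endswith("--x-s3"):
        raise ValueError(f"not a directory bucket name: {bucket!r}")
    # Normalize the prefix so the result has exactly one trailing slash.
    prefix = key_prefix.strip("/")
    if prefix:
        return f"{scheme}://{bucket}/{prefix}/"
    return f"{scheme}://{bucket}/"


if __name__ == "__main__":
    bucket = "EXAMPLE-BUCKET--use1-az4--x-s3"
    print(express_uri(bucket, "my-data"))
    print(express_uri(bucket, "my-data", scheme="s3a"))
```

Either resulting path can be passed to the Spark or DynamicFrame readers and writers in your job.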
The following example shows how to read data from and write data to an S3 Express One Zone directory bucket in an Amazon Glue ETL job:
import sys
from pyspark.context import SparkContext
from awsglue.context import GlueContext

sc = SparkContext.getOrCreate()
glueContext = GlueContext(sc)
spark = glueContext.spark_session

# S3 Express One Zone directory bucket path
express_path = "s3://EXAMPLE-BUCKET--use1-az4--x-s3/my-data/"

# Read data from S3 Express One Zone
df = spark.read.parquet(express_path)

# Write data to S3 Express One Zone
df.write.mode("overwrite").parquet(express_path + "output/")
You can also use DynamicFrames with S3 Express One Zone:
# Read with DynamicFrame
dynamicFrame = glueContext.create_dynamic_frame.from_options(
    connection_type="s3",
    connection_options={"paths": [express_path]},
    format="parquet"
)

# Write with DynamicFrame
glueContext.write_dynamic_frame.from_options(
    frame=dynamicFrame,
    connection_type="s3",
    connection_options={"path": express_path + "output/"},
    format="parquet"
)