Prepare data using Amazon Glue Interactive Sessions
Amazon Glue interactive sessions provide an on-demand, serverless Apache Spark runtime environment that data scientists and data engineers can use to rapidly build, test, and run data preparation and analytics applications.
You can initiate an Amazon Glue interactive session by starting a JupyterLab notebook in Studio or Studio Classic. When starting your notebook, choose the built-in Glue Python [PySpark and Ray] or Glue Spark kernel. This automatically starts an interactive, serverless Spark session; you do not need to provision or manage any compute cluster or infrastructure. After the session initializes, you can explore the Amazon Glue Data Catalog, run complex queries, and interactively analyze and prepare data using Spark within your Studio or Studio Classic notebooks. You can then use the prepared data to build, train, tune, and deploy models with the purpose-built ML tools in SageMaker.
Before starting your Amazon Glue interactive session in Studio or Studio Classic, you need to set up the appropriate roles and policies. You may also need to grant access to additional resources, such as an Amazon S3 bucket for storage. For more information about the required IAM policies, see Permissions for Amazon Glue interactive sessions in Studio or Studio Classic.
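As a rough illustration of the role setup, the following sketch builds two IAM policy documents with Python's standard `json` module: a trust policy that allows SageMaker and Amazon Glue to assume the Studio execution role, and an identity policy that grants `iam:GetRole` and `iam:PassRole` on that same role. The service principals, the placeholder account ID and role name, and the helper names `build_trust_policy` and `build_pass_role_policy` are assumptions for illustration only. The permissions page referenced above is the authority on what your role actually needs, including any managed policies and Amazon S3 access.

```python
import json

# Assumption: these service principals match the trust relationship your
# execution role needs; verify them for your partition and Region.
SERVICE_PRINCIPALS = ["sagemaker.amazonaws.com", "glue.amazonaws.com"]

# Hypothetical placeholder ARN; replace it with your Studio execution role's ARN
# (ARNs in some Regions use a different partition prefix than "aws").
EXECUTION_ROLE_ARN = "arn:aws:iam::111122223333:role/ExampleStudioExecutionRole"


def build_trust_policy(principals):
    """Return a trust policy that lets the listed services assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": sorted(principals)},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def build_pass_role_policy(role_arn):
    """Return an identity policy that allows reading and passing the given role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["iam:GetRole", "iam:PassRole"],
                "Resource": role_arn,
            }
        ],
    }


trust_policy = build_trust_policy(SERVICE_PRINCIPALS)
pass_role_policy = build_pass_role_policy(EXECUTION_ROLE_ARN)

# Print both documents as JSON, ready to paste into the IAM console.
print(json.dumps(trust_policy, indent=2))
print(json.dumps(pass_role_policy, indent=2))
```

The printed JSON can be pasted into the IAM console when you edit the role's trust relationship and add an inline policy; generating it in code mainly helps keep the role ARN consistent between the two documents.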
Studio and Studio Classic provide a default configuration for your Amazon Glue interactive session; however, you can use Amazon Glue’s full catalog of Jupyter magic commands to further customize your environment. For information about the default and additional Jupyter magics that you can use in your Amazon Glue interactive session, see Configure your Amazon Glue interactive session in Studio or Studio Classic.
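For example, a first notebook cell along the following lines could override a few session settings with Amazon Glue magics before any Spark code runs. This is a configuration sketch, not a recommended setup: the Glue version, worker type, worker count, and idle timeout (in minutes) shown here are illustrative assumptions rather than the Studio defaults, and the values available to you depend on your Region and account limits.

```python
# Session-configuration magics: run this cell before any code cell so the
# settings are applied when the interactive session is created.
%glue_version 4.0
%worker_type G.1X
%number_of_workers 5
%idle_timeout 60
```

To see every magic the kernel supports, along with its expected input, run `%help` in a notebook cell.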
- Studio Classic users initiating an Amazon Glue interactive session can select from the following images and kernels:
  - Images: SparkAnalytics 1.0, SparkAnalytics 2.0
  - Kernels: Glue Python [PySpark and Ray] and Glue Spark
- Studio users should use the default SageMaker Distribution image and select the Glue Python [PySpark and Ray] or Glue Spark kernel.