SQL extension data source connections - Amazon SageMaker AI
Services or capabilities described in Amazon Web Services documentation might vary by Region. To see the differences applicable to the China Regions, see Getting Started with Amazon Web Services in China (PDF).

SQL extension data source connections

Before using the SQL extension in JupyterLab notebooks, administrators or users must create Amazon Glue connections to their data sources. The SQL extension can connect to data sources such as Amazon Redshift, Amazon Athena, and Snowflake.
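The connection itself is created through the Amazon Glue CreateConnection API. The following is a minimal sketch that only builds the `ConnectionInput` argument locally, so it runs without AWS credentials. The connection type values (`REDSHIFT`, `ATHENA`, `SNOWFLAKE`), the `PythonProperties` key and its contents, and every name and ARN shown are assumptions for illustration. Confirm them against the Glue connection format documented for the SQL extension before use.

```python
import json

# Assumed connection type values for the data sources the SQL extension supports.
SUPPORTED_TYPES = {"REDSHIFT", "ATHENA", "SNOWFLAKE"}


def build_connection_input(name, connection_type, python_properties, description=""):
    """Return a ConnectionInput dict for glue_client.create_connection(...).

    Hypothetical helper: the "PythonProperties" layout is an assumption.
    """
    if connection_type not in SUPPORTED_TYPES:
        raise ValueError(f"unsupported connection type: {connection_type}")
    return {
        "Name": name,
        "Description": description,
        "ConnectionType": connection_type,
        # Glue connection properties are string-to-string, so the nested
        # properties are serialized into a single JSON string.
        "ConnectionProperties": {
            "PythonProperties": json.dumps(python_properties),
        },
    }


redshift_conn = build_connection_input(
    name="my-redshift-connection",  # hypothetical connection name
    connection_type="REDSHIFT",
    python_properties={
        # Hypothetical ARN of the secret that holds the database credentials.
        "aws_secret_arn": "arn:aws:secretsmanager:us-east-1:111122223333:secret:my-redshift-secret",
        "database": "dev",  # hypothetical database name
    },
    description="Redshift cluster for the SQL extension",
)
```

With boto3 installed and credentials configured, the dictionary would be passed as `boto3.client("glue").create_connection(ConnectionInput=redshift_conn)`. Keeping the payload construction separate from the API call makes it easy to review or template the connection definition before anything is created.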

To set up the connections, administrators must first ensure that their network configuration allows communication between Studio and the data sources, and then grant the IAM permissions that Studio needs to access those data sources. For information on how administrators can set up networking, see Configure network access between Studio and data sources (for administrators). For information on which policies must be set up, see Set up the IAM permissions to access the data sources (for administrators). Once the connections are set up, data scientists can use the SQL extension in their JupyterLab notebooks to browse and query the connected data sources.
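On the networking side, one common step for a data source inside a VPC, such as an Amazon Redshift cluster, is a security group rule that admits traffic from the security group attached to the Studio domain. The sketch below builds the arguments for the EC2 AuthorizeSecurityGroupIngress API without calling AWS. The security group IDs are hypothetical, and 5439 is only Redshift's default port; your cluster, or another data source, may need a different port or a different networking setup altogether.

```python
REDSHIFT_DEFAULT_PORT = 5439  # Redshift's default port; clusters can be configured differently


def build_ingress_request(data_source_sg_id, studio_sg_id, port=REDSHIFT_DEFAULT_PORT):
    """Return kwargs for ec2_client.authorize_security_group_ingress(...).

    Hypothetical helper: allows TCP traffic on `port` into the data source's
    security group from members of the Studio security group.
    """
    if not 0 < port < 65536:
        raise ValueError(f"invalid port: {port}")
    return {
        "GroupId": data_source_sg_id,  # security group attached to the data source
        "IpPermissions": [
            {
                "IpProtocol": "tcp",
                "FromPort": port,
                "ToPort": port,
                # Reference the Studio security group instead of a CIDR range,
                # so only resources in that group can connect.
                "UserIdGroupPairs": [{"GroupId": studio_sg_id}],
            }
        ],
    }


# Hypothetical security group IDs.
ingress_request = build_ingress_request("sg-0123redshift", "sg-0456studio")
```

The request would be applied with `boto3.client("ec2").authorize_security_group_ingress(**ingress_request)`. Referencing a security group rather than an IP range keeps the rule valid even when the addresses of Studio's network interfaces change.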

Note

We recommend storing your database access credentials as a secret in Secrets Manager. To learn how to create secrets for storing Amazon Redshift or Snowflake access credentials, see Create secrets for database access credentials in Secrets Manager.
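As a rough illustration of the recommendation above, a secret is created through the Secrets Manager CreateSecret API, with the credentials stored as a JSON string. This sketch only builds the request locally. The secret name, the credential values, and the credential key names (`username`, `password`) are assumptions; the exact keys expected for Amazon Redshift or Snowflake are listed in the Secrets Manager guide referenced above.

```python
import json


def build_secret_request(name, credentials, description="",
                         required_keys=("username", "password")):
    """Return kwargs for secrets_client.create_secret(...).

    Hypothetical helper: `required_keys` defaults to assumed key names and
    should be set to whatever your data source's secret format requires.
    """
    missing = set(required_keys) - set(credentials)
    if missing:
        raise ValueError(f"missing credential fields: {sorted(missing)}")
    return {
        "Name": name,
        "Description": description,
        # All credential fields are stored together as one JSON string.
        "SecretString": json.dumps(credentials),
    }


secret_request = build_secret_request(
    name="my-redshift-secret",  # hypothetical secret name
    credentials={"username": "analyst", "password": "example-password"},  # placeholders
    description="Redshift credentials for the SQL extension",
)
```

The request would be sent with `boto3.client("secretsmanager").create_secret(**secret_request)`. In practice, avoid hard-coding real passwords in notebooks or scripts; read them from a secure prompt or another secret store instead.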

This section explains how to set up an Amazon Glue connection and lists the IAM permissions required for the Studio JupyterLab application to access the data through the connection.
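To give a sense of the permissions involved, the following sketch builds a minimal identity-based policy that lets a Studio execution role read one Glue connection and the secret it references. It is deliberately incomplete: the SQL extension also needs data-source-specific permissions that are not shown here, so treat the IAM permissions guide as authoritative. The region, account ID, connection name, and secret name are hypothetical, and the `partition` parameter exists because ARNs in the China Regions use `aws-cn` instead of `aws`.

```python
import json


def build_connection_read_policy(region, account_id, connection_name, secret_name,
                                 partition="aws"):
    """Return a minimal IAM policy document (as a dict) for reading a Glue
    connection and its Secrets Manager secret. Hypothetical helper."""
    glue_prefix = f"arn:{partition}:glue:{region}:{account_id}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ReadGlueConnection",
                "Effect": "Allow",
                "Action": ["glue:GetConnection", "glue:GetConnections"],
                # Glue connection actions are authorized against both the
                # Data Catalog and the connection resource.
                "Resource": [
                    f"{glue_prefix}:catalog",
                    f"{glue_prefix}:connection/{connection_name}",
                ],
            },
            {
                "Sid": "ReadConnectionSecret",
                "Effect": "Allow",
                "Action": ["secretsmanager:GetSecretValue"],
                # Secrets Manager appends a hyphen and six random characters
                # to secret ARNs, matched here by "-??????".
                "Resource": (f"arn:{partition}:secretsmanager:{region}:{account_id}"
                             f":secret:{secret_name}-??????"),
            },
        ],
    }


policy = build_connection_read_policy(
    "us-east-1", "111122223333", "my-redshift-connection", "my-redshift-secret"
)
print(json.dumps(policy, indent=2))
```

One way to attach it is `boto3.client("iam").put_role_policy(RoleName=..., PolicyName=..., PolicyDocument=json.dumps(policy))`, where the role is the execution role used by the JupyterLab space. Scoping resources to named connections and secrets, rather than using `"*"`, limits what the role can read.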

Note

Amazon SageMaker Assets integrates Amazon DataZone with Studio. It includes a SageMaker AI blueprint that administrators can use to create Studio environments from Amazon DataZone projects within an Amazon DataZone domain.

Users of a JupyterLab application launched from a Studio domain created with the blueprint can automatically access Amazon Glue connections to data assets in their Amazon DataZone catalog when using the SQL extension. This lets them query those data sources without manually setting up connections.