

# Use an Amazon EMR Studio
<a name="use-an-emr-studio"></a>

This section contains topics that help you configure and interact with an Amazon EMR Studio.

The following video covers practical information such as how to create a new Workspace, and how to launch a new Amazon EMR cluster with a cluster template. The video also runs through a sample notebook.

[![AWS Videos](http://img.youtube.com/vi/rZ3zeJ6WKPY/0.jpg)](http://www.youtube.com/watch?v=rZ3zeJ6WKPY)


**Topics**
+ [Learn EMR Studio workspaces](emr-studio-configure-workspace.md)
+ [Configure Workspace collaboration in EMR Studio](emr-studio-workspace-collaboration.md)
+ [Run an EMR Studio Workspace with a runtime role](emr-studio-runtime.md)
+ [Run Amazon EMR Studio Workspace notebooks programmatically](emr-studio-run-programmatically.md)
+ [Browse data with SQL Explorer for EMR Studio](emr-studio-sql-explorer.md)
+ [Attach a compute to an EMR Studio Workspace](emr-studio-create-use-clusters.md)
+ [Link Git-based repositories to an EMR Studio Workspace](emr-studio-git-repo.md)
+ [Use the Amazon Athena SQL editor in EMR Studio](emr-studio-athena.md)
+ [Amazon CodeWhisperer integration with EMR Studio Workspaces](emr-studio-codewhisperer.md)
+ [Debug applications and jobs with EMR Studio](emr-studio-debug.md)
+ [Install kernels and libraries in an EMR Studio Workspace](emr-studio-install-libraries-and-kernels.md)
+ [Enhance kernels with magic commands in EMR Studio](emr-studio-magics.md)
+ [Use multi-language notebooks with Spark kernels](emr-multi-language-kernels.md)

# Learn EMR Studio workspaces
<a name="emr-studio-configure-workspace"></a>

When you use an EMR Studio, you can create and configure different *Workspaces* to organize and run notebooks. This section covers creating and working with Workspaces. For a conceptual overview, see [Workspaces](how-emr-studio-works.md#emr-studio-workspaces) on the [How Amazon EMR Studio works](how-emr-studio-works.md) page.

**Topics**
+ [Create an EMR Studio Workspace](emr-studio-create-workspace.md)
+ [Launch a Workspace in EMR Studio](emr-studio-use-workspace.md)
+ [Understand the Workspace user interface in EMR Studio](emr-studio-workspace-ui.md)
+ [Explore notebook examples in an EMR Studio workspace](emr-studio-notebook-examples.md)
+ [Save Workspace content in EMR Studio](emr-studio-save-workspace.md)
+ [Delete a Workspace and notebook files in EMR Studio](emr-studio-delete-workspace.md)
+ [Understand Workspace status](emr-studio-workspace-status.md)
+ [Resolve Workspace connectivity issues](emr-studio-workspace-stop-start.md)

# Create an EMR Studio Workspace
<a name="emr-studio-create-workspace"></a>

You can create EMR Studio Workspaces to run notebook code using the EMR Studio interface. 

**To create a Workspace in an EMR Studio**

1. Log in to your EMR Studio.

1. Choose **Create a Workspace**.

1. Enter a **Workspace name** and a **Description**. Naming a Workspace helps you identify it on the **Workspaces** page.

1. If you want to work with other Studio users in this Workspace in real time, enable Workspace collaboration. You can configure collaborators after you launch the Workspace.

1. If you want to attach a cluster to a Workspace, expand the **Advanced configuration** section. You can attach a cluster later, if you prefer. For more information, see [Attach a compute to an EMR Studio Workspace](emr-studio-create-use-clusters.md).
**Note**  
To provision a new cluster, you need access permissions from your administrator. 

   Choose one of the cluster options for the Workspace and attach the cluster. For more information about provisioning a cluster when you create a Workspace, see [Create and attach a new EMR cluster to an EMR Studio Workspace](emr-studio-create-use-clusters.md#emr-studio-create-cluster).

1. Choose **Create a Workspace** in the lower right of the page. 

After you create a Workspace, EMR Studio opens the **Workspaces** page. A green success banner appears at the top of the page, and the newly created Workspace appears in the list.

By default, a Workspace is shared and can be seen by all Studio users. However, only one user can open and work in a Workspace at a time. To work simultaneously with other users, you can [Configure Workspace collaboration in EMR Studio](emr-studio-workspace-collaboration.md).

# Launch a Workspace in EMR Studio
<a name="emr-studio-use-workspace"></a>

To start working with notebook files, launch a Workspace to access the notebook editor. The **Workspaces** page in a Studio lists all of the Workspaces that you have access to with details including **Name**, **Status**, **Creation time**, and **Last modified**. 

**Note**  
If you had EMR notebooks in the old Amazon EMR console, you can find them in the console as EMR Studio Workspaces. EMR Notebooks users need additional IAM role permissions to access or create Workspaces. If you recently created a notebook in the old console, you might need to refresh the Workspaces list to see it in the console. For more information about the transition, see [Amazon EMR Notebooks are available as Amazon EMR Studio Workspaces in the console](emr-managed-notebooks-migration.md) and [Managing Amazon EMR clusters with the console](whats-new-in-console.md).

**To launch a Workspace for editing and running notebooks**

1. On the **Workspaces** page of your Studio, find the Workspace. You can filter the list by keyword or by column value.

1. Choose the Workspace name to launch the Workspace in a new browser tab. It may take a few minutes for the Workspace to open if it's **Idle**. Alternatively, select the row for the Workspace and then select **Launch Workspace**. You can choose from the following launch options:
   + **Quick launch** – Quickly launch your Workspace with default options. Choose **Quick launch** if you want to attach clusters to the Workspace in JupyterLab.
   + **Launch with options** – Launch your Workspace with custom options. You can choose to launch in either Jupyter or JupyterLab, attach your Workspace to an EMR cluster, and select your security groups.
**Note**  
Only one user can open and work in a Workspace at a time. If you select a Workspace that is already in use, EMR Studio displays a notification when you try to open it. The **User** column on the **Workspaces** page shows the user working in the Workspace.

# Understand the Workspace user interface in EMR Studio
<a name="emr-studio-workspace-ui"></a>

The EMR Studio Workspace user interface is based on the [JupyterLab interface](https://jupyterlab.readthedocs.io/en/latest/user/interface.html) with icon-denoted tabs on the left sidebar. When you pause over an icon, you can see a tooltip that shows the name of the tab. Choose tabs from the left sidebar to access the following panels.
+ **File Browser** – Displays the files and directories in the Workspace, as well as the files and directories of linked Git repositories.
+ **Running Kernels and Terminals** – Lists all of the kernels and terminals running in the Workspace. For more information, see [Managing kernels and terminals](https://jupyterlab.readthedocs.io/en/latest/user/running.html) in the official JupyterLab documentation.
+ **Git** – Provides a graphical user interface for performing commands in the Git repositories attached to the Workspace. This panel is a JupyterLab extension called jupyterlab-git. For more information, see [jupyterlab-git](https://github.com/jupyterlab/jupyterlab-git).
+ **EMR clusters** – Lets you attach a cluster to or detach a cluster from the Workspace to run notebook code. The EMR cluster configuration panel also provides advanced configuration options to help you create and attach a *new* cluster to the Workspace. For more information, see [Create and attach a new EMR cluster to an EMR Studio Workspace](emr-studio-create-use-clusters.md#emr-studio-create-cluster).
+ **Amazon EMR Git Repository** – Helps you link the Workspace with up to three Git repositories. For details and instructions, see [Link Git-based repositories to an EMR Studio Workspace](emr-studio-git-repo.md).
+ **Notebook Examples** – Provides a list of notebook examples that you can save to the Workspace. You can also access the examples by choosing **Notebook Examples** on the **Launcher** page of the Workspace. 
+ **Commands** – Offers a keyboard-driven way to search for and run JupyterLab commands. For more information, see the [Command palette](https://jupyterlab.readthedocs.io/en/latest/user/commands.html) page in the JupyterLab documentation.
+ **Notebook Tools** – Lets you select and set options such as cell slide type and metadata. The **Notebook Tools** option appears in the left sidebar after you open a notebook file.
+ **Open Tabs** – Lists the open documents and activities in the main work area so that you can jump to an open tab. For more information, see the [Tabs and single-document mode](https://jupyterlab.readthedocs.io/en/latest/user/interface.html#tabs-and-single-document-mode) page in the JupyterLab documentation.
+ **Collaboration** – Lets you enable or disable Workspace collaboration, and manage collaborators. To see the **Collaboration** panel, you must have the necessary permissions. For more information, see [Set ownership for Workspace collaboration](emr-studio-user-permissions.md#emr-studio-workspace-collaboration-permissions).

# Explore notebook examples in an EMR Studio workspace
<a name="emr-studio-notebook-examples"></a>

Every EMR Studio Workspace includes a set of notebook examples that you can use to explore EMR Studio features. To edit or run a notebook example, you can save it to the Workspace.

**To save a notebook example to a Workspace**

1. From the left sidebar, choose the **Notebook Examples** tab to open the **Notebook Examples** panel. You can also access the examples by choosing **Notebook Examples** on the **Launcher** page of the Workspace. 

1. Choose a notebook example to preview it in the main work area. The example is read-only.

1. To save the notebook example to the Workspace, choose **Save to Workspace**. EMR Studio saves the example in your home directory. After you save a notebook example to the Workspace, you can rename, edit, and run it.

For more information about the notebook examples, see the [EMR Studio Notebook examples GitHub repository](https://github.com/aws-samples/emr-studio-notebook-examples).

# Save Workspace content in EMR Studio
<a name="emr-studio-save-workspace"></a>

When you work in the notebook editor of a Workspace, EMR Studio saves the content of notebook cells and output for you in the Amazon S3 location associated with the Studio. This backup process preserves work between sessions. 

You can also save a notebook by pressing **CTRL+S** in the open notebook tab or by using one of the save options under **File**.

Another way to back up the notebook files in a Workspace is to associate the Workspace with a Git-based repository and sync your changes with the remote repository. Doing so also lets you save and share notebooks with team members who use a different Workspace or Studio. For instructions, see [Link Git-based repositories to an EMR Studio Workspace](emr-studio-git-repo.md).

# Delete a Workspace and notebook files in EMR Studio
<a name="emr-studio-delete-workspace"></a>

When you delete a notebook file from an EMR Studio Workspace, you delete the file from the **File browser**, and EMR Studio removes its backup copy in Amazon S3. You do not have to take any further steps to avoid storage charges when you delete a file from a Workspace.

When you delete *an entire Workspace*, its notebook files and folders will remain in the Amazon S3 storage location. The files continue to accrue storage charges. To avoid storage charges, remove all backed-up files and folders that are associated with your deleted Workspace from Amazon S3.

**To delete a notebook file from an EMR Studio Workspace**

1. Select the **File browser** panel from the left sidebar in the Workspace.

1. Select the file or folder you want to delete. Right-click your selection and choose **Delete**. The file disappears from the list. EMR Studio removes the file or folder from Amazon S3 for you.

------
#### [ From the Workspace UI ]

**Delete a Workspace and its associated backup files from EMR Studio**

1. Log in to your EMR Studio with your Studio access URL and choose **Workspaces** from the left navigation.

1. Find your Workspace in the list, then select the check box next to its name. You can select multiple Workspaces to delete at the same time.

1. Choose **Delete** in the upper right of the **Workspaces** list, and then choose **Delete** again to confirm that you want to delete the selected Workspaces.

1. If you want to remove the notebook files that were associated with the deleted Workspace from Amazon S3, follow the instructions for [Deleting objects](https://docs.amazonaws.cn/AmazonS3/latest/user-guide/delete-objects.html) in the *Amazon Simple Storage Service* *Console User Guide*. If you did not create the Studio, consult your Studio administrator to determine the Amazon S3 backup location for the deleted Workspace.

------
#### [ From the Workspaces list ]

**Delete a Workspace and its associated backup files from the Workspaces list**

1. Navigate to the **Workspaces** list in the console.

1. Select the Workspace that you want to delete from the list and then choose **Actions**.

1. Choose **Delete**.

1. If you want to remove the notebook files that were associated with the deleted Workspace from Amazon S3, follow the instructions for [Deleting objects](https://docs.amazonaws.cn/AmazonS3/latest/user-guide/delete-objects.html) in the *Amazon Simple Storage Service* *Console User Guide*. If you did not create the Studio, consult your Studio administrator to determine the Amazon S3 backup location for the deleted Workspace.

------
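The Amazon S3 cleanup described above can also be scripted. The following is a minimal sketch, not an official procedure. It assumes that you pass in an object that behaves like `boto3.client("s3")`, and that you already know the bucket and the folder prefix where the Studio backed up the deleted Workspace. The function names and the `e-EXAMPLE/` prefix in the usage note are hypothetical. Confirm the real backup location with your Studio administrator before you delete anything.

```python
from typing import Iterable, Iterator, List


def chunked(keys: Iterable[str], size: int = 1000) -> Iterator[List[str]]:
    """Yield lists of at most `size` keys.

    The S3 DeleteObjects operation accepts up to 1000 keys per request.
    """
    batch: List[str] = []
    for key in keys:
        batch.append(key)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def delete_workspace_backup(s3_client, bucket: str, prefix: str, dry_run: bool = True) -> int:
    """Find every object under `prefix` and return how many keys were found.

    `s3_client` is assumed to behave like boto3.client("s3").
    With dry_run=True (the default), nothing is deleted.
    """
    if not prefix or not prefix.endswith("/"):
        # Guard against emptying a whole bucket or matching sibling folders.
        raise ValueError("prefix must be a non-empty folder path that ends in '/'")

    paginator = s3_client.get_paginator("list_objects_v2")
    keys = (
        obj["Key"]
        for page in paginator.paginate(Bucket=bucket, Prefix=prefix)
        for obj in page.get("Contents", [])
    )

    found = 0
    for batch in chunked(keys):
        found += len(batch)
        if not dry_run:
            s3_client.delete_objects(
                Bucket=bucket,
                Delete={"Objects": [{"Key": key} for key in batch], "Quiet": True},
            )
    return found
```

For example, `delete_workspace_backup(boto3.client("s3"), "amzn-s3-demo-bucket", "e-EXAMPLE/")` only counts the matching objects. Pass `dry_run=False` after you have reviewed the count. If versioning is enabled on the bucket, this call adds delete markers and earlier object versions remain in storage.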

# Understand Workspace status
<a name="emr-studio-workspace-status"></a>

After you create an EMR Studio Workspace, it appears as a row in the **Workspaces** list in your Studio with its name, status, creation time, and last modified timestamp. The following table describes Workspace statuses.



| Status | Description | 
| --- | --- | 
| Starting | The Workspace is being prepared, but is not yet ready to use. You can't open a Workspace when its status is Starting. | 
| Ready | You can open the Workspace to use the notebook editor, but you must attach the Workspace to an EMR cluster before you can run notebook code. | 
| Attaching | The Workspace is being attached to a cluster. | 
| Attached | The Workspace is attached to an EMR cluster and ready for you to write and run notebook code. If a Workspace's status is not Attached, you must attach it to a cluster before you can run notebook code. | 
| Idle | The Workspace has stopped. To reactivate an idle Workspace, select it from the Workspaces list. The status changes from Idle to Starting to Ready when you select the Workspace. | 
| Stopping | The Workspace is shutting down and will be set to Idle. When you stop a Workspace, it terminates any corresponding notebook kernels. EMR Studio stops notebooks that have been inactive for a long time.  | 
| Deleting | When you delete a Workspace, EMR Studio marks it for deletion and starts the deletion process. After the deletion process completes, the Workspace disappears from the list. When you delete a Workspace, its notebook files will remain in the Amazon S3 storage location. | 

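If you track Workspace statuses in your own tooling, the rules in the preceding table can be expressed as a small lookup. The following sketch is illustrative only. The `WorkspaceStatus` enum and both helper functions are hypothetical names and aren't part of any EMR Studio API.

```python
from enum import Enum


class WorkspaceStatus(Enum):
    """Statuses shown in the Workspaces list."""
    STARTING = "Starting"
    READY = "Ready"
    ATTACHING = "Attaching"
    ATTACHED = "Attached"
    IDLE = "Idle"
    STOPPING = "Stopping"
    DELETING = "Deleting"


def can_open_editor(status: WorkspaceStatus) -> bool:
    """The notebook editor is usable once the Workspace is Ready or Attached."""
    return status in {WorkspaceStatus.READY, WorkspaceStatus.ATTACHED}


def can_run_code(status: WorkspaceStatus) -> bool:
    """Notebook code runs only when the Workspace is attached to a cluster."""
    return status is WorkspaceStatus.ATTACHED


if __name__ == "__main__":
    for status in WorkspaceStatus:
        print(f"{status.value:<10} open={can_open_editor(status)} run={can_run_code(status)}")
```

An **Idle** Workspace reports `False` for both checks because it must pass through **Starting** and **Ready** again before you can use it.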
# Resolve Workspace connectivity issues
<a name="emr-studio-workspace-stop-start"></a>

To resolve Workspace connectivity issues, you can stop and restart a Workspace. When you restart a Workspace, EMR Studio launches the Workspace in a different Availability Zone or a different subnet that is associated with your Studio.

**To stop and restart an EMR Studio Workspace**

1. Close the Workspace in your browser.

1. Navigate to the **Workspaces** list in the console.

1. Select your Workspace from the list and choose **Actions**.

1. Choose **Stop** and wait for the Workspace status to change from **Stopping** to **Idle**.

1. Choose **Actions** again, and then choose **Start** to restart the Workspace.

1. Wait for the Workspace status to change from **Starting** to **Ready**, then choose the Workspace name to reopen it in a new browser tab.

# Configure Workspace collaboration in EMR Studio
<a name="emr-studio-workspace-collaboration"></a>

Workspace collaboration lets you write and run notebook code simultaneously with other members of your team. When you work in the same notebook file, you'll see changes as your collaborators make them. You can enable collaboration when you create a Workspace, or switch collaboration on and off in an existing Workspace. 

**Note**  
EMR Studio Workspace collaboration isn't supported with [EMR Serverless interactive applications](https://docs.amazonaws.cn/emr/latest/EMR-Serverless-UserGuide/interactive-workloads.html) or if trusted identity propagation is enabled.

**Prerequisites**

Before you configure collaboration for a Workspace, make sure you complete the following tasks:
+ Ensure that your EMR Studio admin has given you the necessary permissions. For example, the following statement allows a user to configure collaboration for any Workspace with the tag key `creatorUserId` whose value matches the user's ID (indicated by the policy variable `aws:userId`).

  ```
  {
      "Sid": "UserRolePermissionsForCollaboration",
      "Action": [
          "elasticmapreduce:UpdateEditor",
          "elasticmapreduce:PutWorkspaceAccess",
          "elasticmapreduce:DeleteWorkspaceAccess",
          "elasticmapreduce:ListWorkspaceAccessIdentities"
      ],
      "Resource": "*",
      "Effect": "Allow",
      "Condition": {
          "StringEquals": {
              "elasticmapreduce:ResourceTag/creatorUserId": "${aws:userid}"
          }
      }
  }
  ```
+ Ensure that the service role associated with your EMR Studio has the permissions required to enable and configure Workspace collaboration, as in the following example statement.

  ```
  {
      "Sid": "AllowWorkspaceCollaboration",
      "Effect": "Allow",
      "Action": [
          "iam:GetUser",
          "iam:GetRole",
          "iam:ListUsers",
          "iam:ListRoles",
          "sso:GetManagedApplicationInstance",
          "sso-directory:SearchUsers"
      ],
      "Resource": "*"
  }
  ```

  For more information, see [Create an EMR Studio service role](emr-studio-service-role.md).

**To enable Workspace collaboration and add collaborators**

1. In your Workspace, choose the **Collaboration** icon from the Launcher screen or the bottom of the left panel. 
**Note**  
You won't see the **Collaboration** panel unless your Studio administrator has given you permission to configure collaboration for the Workspace. For more information, see [Set ownership for Workspace collaboration](emr-studio-user-permissions.md#emr-studio-workspace-collaboration-permissions).

1. Make sure the **Allow Workspace collaboration** toggle is in the on position. When you enable collaboration, only you and the collaborators that you add can see the Workspace in the list on the Studio **Workspaces** page.

1. Enter a **Collaborator name**. Your Workspace can have a maximum of five collaborators including yourself. A collaborator can be any user with access to your EMR Studio. If you don't enter a collaborator, the Workspace is a private Workspace that is only accessible to you.

   The following table specifies the applicable collaborator values to enter based on the identity type of the owner.
**Note**  
An owner can only invite collaborators with the same identity type. For example, an IAM user can only add other IAM users, and an IAM Identity Center user can only add other IAM Identity Center users.  
[\[See the AWS documentation website for more details\]](http://docs.amazonaws.cn/en_us/emr/latest/ManagementGuide/emr-studio-workspace-collaboration.html)

1. Choose **Add**. The collaborator can now see the Workspace on their EMR Studio **Workspaces** page, and launch the Workspace to use it in real time with you.

**Note**  
If you disable Workspace collaboration, the Workspace returns to its shared state and can be seen by all Studio users. In the shared state, only one Studio user can open and work in the Workspace at a time. 

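The collaborator rules in the preceding procedure (at most five collaborators including the owner, all with the same identity type) can be checked before you enter names in the **Collaboration** panel. The following sketch is a local check only and doesn't call any service. The `StudioUser` class, the identity type strings, and `validate_collaborators` are hypothetical names used for illustration.

```python
from dataclasses import dataclass
from typing import List

MAX_COLLABORATORS = 5  # the limit includes the Workspace owner


@dataclass(frozen=True)
class StudioUser:
    name: str
    identity_type: str  # illustrative values: "IAM" or "IAM Identity Center"


def validate_collaborators(owner: StudioUser, collaborators: List[StudioUser]) -> List[str]:
    """Return a list of problems; an empty list means the invitations look valid."""
    problems: List[str] = []
    if len(collaborators) + 1 > MAX_COLLABORATORS:
        problems.append(
            f"too many collaborators: {len(collaborators) + 1} including the owner, "
            f"maximum is {MAX_COLLABORATORS}"
        )
    for user in collaborators:
        if user.identity_type != owner.identity_type:
            problems.append(
                f"{user.name} has identity type {user.identity_type!r}, "
                f"but the owner has {owner.identity_type!r}"
            )
    return problems
```

An empty collaborator list passes the check, which corresponds to a private Workspace that only the owner can access.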
# Run an EMR Studio Workspace with a runtime role
<a name="emr-studio-runtime"></a>

**Note**  
The runtime role functionality described on this page only applies to Amazon EMR running on Amazon EC2, and doesn't refer to the runtime role functionality in EMR Serverless interactive applications. To learn more about how to use runtime roles in EMR Serverless, see [Job runtime roles](https://docs.amazonaws.cn/emr/latest/EMR-Serverless-UserGuide/security-iam-runtime-role.html) in the *Amazon EMR Serverless User Guide*.

A *runtime role* is an Amazon Identity and Access Management (IAM) role that you can specify when you submit a job or query to an Amazon EMR cluster. The job or query that you submit to your EMR cluster uses the runtime role to access Amazon resources, such as objects in Amazon S3.

When you attach an EMR Studio Workspace to an EMR cluster that uses Amazon EMR 6.11 or higher, you can select a runtime role for the job or query that you submit to use when it accesses Amazon resources. However, if the EMR cluster doesn't support runtime roles, the EMR cluster won't assume the role when it accesses Amazon resources.

Before you can use a runtime role with an Amazon EMR Studio Workspace, an administrator must configure user permissions so that the Studio user can call the `elasticmapreduce:GetClusterSessionCredentials` API on the runtime role. Then, launch a new cluster with a runtime role that you can use with your Amazon EMR Studio Workspace.

**Topics**
+ [Configure user permissions for the runtime role](#emr-studio-runtime-setup-permissions)
+ [Launch a new cluster with a runtime role](#emr-studio-runtime-setup-cluster)
+ [Use the EMR cluster with a runtime role in Workspaces](#emr-studio-runtime-use)
+ [Considerations](#emr-studio-runtime-considerations)

## Configure user permissions for the runtime role
<a name="emr-studio-runtime-setup-permissions"></a>

Configure user permissions so that the Studio user can call the `elasticmapreduce:GetClusterSessionCredentials` API on the runtime role that the user wants to use. You must also complete the steps in [Configure EMR Studio user permissions for Amazon EC2 or Amazon EKS](emr-studio-user-permissions.md) before the user can start using the Studio.

**Warning**  
To grant this permission, create a condition based on the `elasticmapreduce:ExecutionRoleArn` context key when you grant a caller access to the `GetClusterSessionCredentials` API. The following examples demonstrate how to do so.

```
{
    "Sid": "AllowSpecificExecRoleArn",
    "Effect": "Allow",
    "Action": [
        "elasticmapreduce:GetClusterSessionCredentials"
    ],
    "Resource": "*",
    "Condition": {
        "StringEquals": {
            "elasticmapreduce:ExecutionRoleArn": [
                "arn:aws:iam::111122223333:role/test-emr-demo1",
                "arn:aws:iam::111122223333:role/test-emr-demo2"
            ]
        }
    }
}
```

The following example demonstrates how to allow an IAM principal to use an IAM role named `test-emr-demo3` as the runtime role. Additionally, the policy holder will only be able to access Amazon EMR clusters with the cluster ID `j-123456789`.

```
{
    "Sid":"AllowSpecificExecRoleArn",
    "Effect":"Allow",
    "Action":[
        "elasticmapreduce:GetClusterSessionCredentials"
    ],
    "Resource": [
        "arn:aws:elasticmapreduce:<region>:111122223333:cluster/j-123456789"
    ],
    "Condition":{
        "StringEquals":{
            "elasticmapreduce:ExecutionRoleArn":[
                "arn:aws:iam::111122223333:role/test-emr-demo3"
            ]
        }
    }
}
```

The following example lets an IAM principal use any IAM role with a name starting with the string `test-emr-demo4` as the runtime role. Additionally, the policy holder will only be able to access Amazon EMR clusters tagged with the key-value pair `tagKey: tagValue`.

```
{
    "Sid":"AllowSpecificExecRoleArn",
    "Effect":"Allow",
    "Action":[
        "elasticmapreduce:GetClusterSessionCredentials"
    ],
    "Resource": "*",
    "Condition":{
        "StringEquals":{
            "elasticmapreduce:ResourceTag/tagKey": "tagValue"
        },
        "StringLike":{
            "elasticmapreduce:ExecutionRoleArn":[
                "arn:aws:iam::111122223333:role/test-emr-demo4*"
            ]
        }
    }
}
```

## Launch a new cluster with a runtime role
<a name="emr-studio-runtime-setup-cluster"></a>

Now that you have the required permissions, launch a new cluster with a runtime role that you can use with your Amazon EMR Studio Workspace.

If you have already launched a new cluster with a runtime role, you can skip to the [Use the EMR cluster with a runtime role in Workspaces](#emr-studio-runtime-use) section.

1. First, complete the prerequisites in the [Runtime roles for Amazon EMR steps](emr-steps-runtime-roles.md#emr-steps-runtime-roles-configure) section.

1. Then, launch a cluster with the following settings to use runtime roles with Amazon EMR Studio Workspaces. For instructions on how to launch your cluster, see [Specify a security configuration for an Amazon EMR cluster](emr-specify-security-configuration.md).
   + Choose release label emr-6.11.0 or later.
   + Select Spark, Livy, and Jupyter Enterprise Gateway as your cluster applications.
   + Use the security configuration that you created in the previous step.
   + Optionally, you can enable Lake Formation for your EMR cluster. For more information, see [Enable Lake Formation with Amazon EMR](emr-lf-enable.md).

After you launch your cluster, you're ready to [use the runtime role-enabled cluster with an EMR Studio Workspace](#emr-studio-runtime-use).

**Note**  
The [ExecutionRoleArn](https://docs.amazonaws.cn/emr/latest/APIReference/API_ExecutionEngineConfig.html#EMR-Type-ExecutionEngineConfig-ExecutionRoleArn) value is currently not supported with the [StartNotebookExecution](https://docs.amazonaws.cn/emr/latest/APIReference/API_StartNotebookExecution.html) API operation when the `ExecutionEngineConfig.Type` value is `EMR`.

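If you launch clusters from a script, the cluster settings listed in this section can be assembled as request parameters. The following sketch builds only the part of a Boto3 `run_job_flow` request that those settings cover. It assumes that `JupyterEnterpriseGateway` is the application identifier for Jupyter Enterprise Gateway and that the security configuration from the prerequisites already exists. The function names are hypothetical, and a real request also needs values such as `Instances`, `JobFlowRole`, and `ServiceRole`.

```python
from typing import Any, Dict, Tuple

REQUIRED_APPLICATIONS = ("Spark", "Livy", "JupyterEnterpriseGateway")
MINIMUM_RELEASE = (6, 11, 0)


def parse_release_label(label: str) -> Tuple[int, ...]:
    """Turn a release label such as 'emr-6.11.0' into (6, 11, 0)."""
    if not label.startswith("emr-"):
        raise ValueError(f"unexpected release label: {label!r}")
    return tuple(int(part) for part in label[len("emr-"):].split("."))


def runtime_role_cluster_request(
    name: str, release_label: str, security_configuration: str
) -> Dict[str, Any]:
    """Build the runtime-role-related portion of a run_job_flow request."""
    if parse_release_label(release_label) < MINIMUM_RELEASE:
        raise ValueError("runtime roles in Workspaces need emr-6.11.0 or later")
    return {
        "Name": name,
        "ReleaseLabel": release_label,
        "Applications": [{"Name": app} for app in REQUIRED_APPLICATIONS],
        "SecurityConfiguration": security_configuration,
    }
```

You can merge the returned dictionary with your instance and role settings and pass the result to `boto3.client("emr").run_job_flow(**request)`.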
## Use the EMR cluster with a runtime role in Workspaces
<a name="emr-studio-runtime-use"></a>

Once you have set up and launched your cluster, you can use the runtime role-enabled cluster with your EMR Studio Workspace.

1. Create a new workspace or launch an existing workspace. For more information, see [Create an EMR Studio Workspace](emr-studio-create-workspace.md).

1. Choose the **EMR clusters** tab in the left sidebar of your open Workspace and expand the **Compute type** section. Then choose your cluster from the **EMR cluster on EC2** menu and the runtime role from the **Runtime role** menu.  
![\[The EMR Studio Workspace user interface, based on the JupyterLab interface, with icon-denoted tabs on the left sidebar.\]](http://docs.amazonaws.cn/en_us/emr/latest/ManagementGuide/images/emr-studio-jupyter-runtime.png)

1. Choose **Attach** to attach the cluster with runtime role to your Workspace.

**Note**  
When you choose a runtime role, note that it can have underlying managed policies associated with it. In most cases we recommend choosing limited resources, such as specific notebooks. If you choose a runtime role that includes access for all of your notebooks, for instance, the managed policy associated with the role provides full access.

## Considerations
<a name="emr-studio-runtime-considerations"></a>

Keep in mind the following considerations when you use a runtime role-enabled cluster with your Amazon EMR Studio Workspace:
+ You can only select a runtime role when you attach an EMR Studio Workspace to an EMR cluster that uses Amazon EMR release 6.11 or higher.
+ The runtime role functionality described on this page is only supported with Amazon EMR running on Amazon EC2, and isn't supported with EMR Serverless interactive applications. To learn more about runtime roles for EMR Serverless, see [Job runtime roles](https://docs.amazonaws.cn/emr/latest/EMR-Serverless-UserGuide/security-iam-runtime-role.html) in the *Amazon EMR Serverless User Guide*.
+ Although you need to configure additional permissions before you can specify a runtime role when submitting a job to a cluster, you don't need additional permissions to access the files generated by an EMR Studio Workspace. The permissions for such files are the same as files generated from clusters without runtime roles.
+ You can't use SQL Explorer in an EMR Studio Workspace with a cluster that has a runtime role. Amazon EMR disables SQL Explorer in the UI when a Workspace is attached to a runtime role-enabled EMR cluster.
+ You can't use collaboration mode in an EMR Studio Workspace with a cluster that has a runtime role. Amazon EMR disables Workspace collaboration capabilities when a Workspace is attached to a runtime role-enabled EMR cluster. The Workspace will remain accessible only to the user who attached the Workspace.
+ You can't use runtime roles in a Studio with IAM Identity Center trusted identity propagation enabled.
+ You might encounter a **"Page may not be safe!"** warning from the Spark UI for a runtime role-enabled cluster that uses Amazon EMR release 7.4.0 or lower. If this happens, bypass the alert to continue to the Spark UI.

# Run Amazon EMR Studio Workspace notebooks programmatically
<a name="emr-studio-run-programmatically"></a>

**Note**  
Programmatic execution of notebooks isn't supported with Amazon EMR Serverless interactive applications.

You can run your Amazon EMR Studio Workspace notebooks programmatically with a script or with the Amazon CLI. To learn how to run your notebook programmatically, see [Sample programmatic commands for EMR Notebooks](emr-managed-notebooks-headless.md).

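As a rough illustration of what a script-based run can look like, the following sketch assembles the parameters for the Boto3 `start_notebook_execution` operation on the EMR client. It assumes that you know the Workspace (editor) ID, the notebook path relative to the Workspace, the ID of a running EMR cluster, and a service role ARN. The helper name and the placeholder values in the usage note are hypothetical. The linked sample commands remain the authoritative reference.

```python
import json
from typing import Any, Dict, Optional


def notebook_execution_request(
    editor_id: str,
    relative_path: str,
    cluster_id: str,
    service_role: str,
    params: Optional[Dict[str, Any]] = None,
    name: Optional[str] = None,
) -> Dict[str, Any]:
    """Build keyword arguments for the EMR start_notebook_execution operation."""
    request: Dict[str, Any] = {
        "EditorId": editor_id,
        "RelativePath": relative_path,
        # Type "EMR" targets an EMR cluster running on Amazon EC2.
        "ExecutionEngine": {"Id": cluster_id, "Type": "EMR"},
        "ServiceRole": service_role,
    }
    if name:
        request["NotebookExecutionName"] = name
    if params:
        # NotebookParams is passed as a JSON-formatted string.
        request["NotebookParams"] = json.dumps(params)
    return request
```

A call such as `boto3.client("emr").start_notebook_execution(**notebook_execution_request("e-EXAMPLE", "demo.ipynb", "j-EXAMPLE", role_arn))` returns a response that includes a `NotebookExecutionId`, which you can pass to `describe_notebook_execution` to check progress.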
# Browse data with SQL Explorer for EMR Studio
<a name="emr-studio-sql-explorer"></a>

**Note**  
SQL Explorer for EMR Studio isn't supported with Amazon EMR Serverless interactive applications or in a Studio with IAM Identity Center trusted identity propagation enabled. 

This topic provides information to help you get started with SQL Explorer in Amazon EMR Studio. SQL Explorer is a single-page tool in your Workspace that helps you understand the data sources in your EMR cluster's data catalog. You can use SQL Explorer to browse your data, run SQL queries to retrieve data, and download query results.

SQL Explorer supports Presto. Before you use SQL Explorer, make sure you have a cluster that uses Amazon EMR version 5.34.0 or later or version 6.4.0 or later with Presto installed. The Amazon EMR Studio SQL Explorer doesn't support Presto clusters that you've configured with in-transit encryption. This is because Presto runs in TLS mode on these clusters.

## Browse your cluster's data catalog
<a name="emr-studio-sql-explorer-browse"></a>

SQL Explorer provides a catalog browser interface that you can use to explore and understand how your data is organized. For example, you can use the data catalog browser to verify table and column names before you write a SQL query.

**To browse your data catalog**

1. Open SQL Explorer in your Workspace.

1. Make sure your Workspace is attached to an EMR cluster running on EC2 that uses Amazon EMR version 6.4.0 or later with Presto installed. You can choose an existing cluster, or create a new one. For more information, see [Attach a compute to an EMR Studio Workspace](emr-studio-create-use-clusters.md).

1. Select a **Database** from the dropdown list to browse.

1. Expand a table in your database to see the table's column names. You can also enter a keyword in the search bar to filter table results.

## Run a SQL query to retrieve data
<a name="emr-studio-sql-explorer-run-query"></a>

**To retrieve data with a SQL query and download the results**

1. Open SQL Explorer in your Workspace.

1. Make sure your Workspace is attached to an EMR cluster running on EC2 with Presto and Spark installed. You can choose an existing cluster, or create a new one. For more information, see [Attach a compute to an EMR Studio Workspace](emr-studio-create-use-clusters.md).

1. Select **Open editor** to open a new editor tab in your Workspace.

1. Compose your SQL query in the editor tab.

1. Choose **Run**.

1. View your query results under **Result preview**. SQL Explorer displays the first 100 results by default. You can choose a different number of results to display (up to 1000) using the **Preview first 100 query results** dropdown.

1. Choose **Download results** to download your results in CSV format. You can download up to 1000 rows of results.
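
After you download the results, you might want to load them into a script for further analysis. The following Python sketch parses the CSV text with the standard library and warns when the row count reaches the 1000-row download limit, which can indicate that the full result set was larger. It assumes that the first line of the file is a header row; verify this against your own download.

```python
import csv
import io

MAX_DOWNLOAD_ROWS = 1000  # download limit described in the procedure above


def load_results(csv_text):
    """Parse downloaded CSV text into a list of dicts keyed by column name."""
    # Assumption: the first line of the CSV is a header row.
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if len(rows) >= MAX_DOWNLOAD_ROWS:
        # Reaching the limit suggests that the query returned more rows than were downloaded.
        print(f"Warning: loaded {len(rows)} rows; the results might be truncated.")
    return rows
```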

# Attach a compute to an EMR Studio Workspace
<a name="emr-studio-create-use-clusters"></a>

Amazon EMR Studio runs notebook commands using a kernel on an EMR cluster. Before you can select a kernel, you should attach the Workspace to a cluster that uses Amazon EC2 instances, to an Amazon EMR on EKS cluster, or to an EMR Serverless application. EMR Studio lets you attach Workspaces to new or existing clusters, and gives you the flexibility to change clusters without closing the Workspace.

**Topics**
+ [Attach an Amazon EC2 cluster](#emr-studio-attach-cluster)
+ [Attach an Amazon EMR on EKS cluster](#emr-studio-use-eks-cluster)
+ [Attach an EMR Serverless application](#emr-studio-use-serverless-studio)
+ [Create a cluster](#emr-studio-create-cluster)
+ [Detach a compute](#emr-studio-detach-cluster)

## Attach an Amazon EC2 cluster to an EMR Studio Workspace
<a name="emr-studio-attach-cluster"></a>

You can attach an EMR cluster running on Amazon EC2 to a Workspace when you create the Workspace, or attach a cluster to an existing Workspace. If you want to create and attach a *new* cluster, see [Create and attach a new EMR cluster to an EMR Studio Workspace](#emr-studio-create-cluster).

**Note**  
A Workspace in a Studio that has IAM Identity Center trusted identity propagation enabled can only attach to an EMR cluster with a security configuration that has Identity Center enabled.

------
#### [ On create ]

**Attach to an Amazon EMR compute cluster when you create a Workspace**

1. In the **Create a Workspace** dialog box, make sure you've already selected a subnet for the new Workspace. Expand the **Advanced configuration** section.

1. Choose **Attach Workspace to an EMR cluster**.

1. In the **EMR cluster** dropdown list, select an existing EMR cluster to attach to the Workspace.

After you attach a cluster, finish creating the Workspace. When you open the new Workspace for the first time and choose the **EMR clusters** panel, you should see your selected cluster attached.

------
#### [ On launch ]

**Attach to an Amazon EMR compute cluster when you launch the Workspace**

1. Navigate to the Workspaces list and select the row for the Workspace that you want to launch. Then, select **Launch Workspace** > **Launch with options**.

1. Choose an EMR cluster to attach to your Workspace.

After you attach a cluster, finish launching the Workspace. When the Workspace opens, choose the **EMR clusters** panel to confirm that your selected cluster is attached.

------
#### [ In JupyterLab ]

**Attach a Workspace to an Amazon EMR compute cluster in JupyterLab**

1. Select your Workspace, then select **Launch Workspace** > **Quick launch**.

1. Inside JupyterLab, open the **Cluster** tab in the left sidebar.

1. Select the **EMR on EC2 cluster** dropdown, or select an Amazon EMR on EKS cluster.

1. Select **Attach** to attach the cluster to your Workspace.

After you attach the cluster, choose the **EMR clusters** panel to confirm that your selected cluster is attached.

------
#### [ In the Workspace UI ]

**Attach a Workspace to an Amazon EMR compute cluster from the Workspace user interface**

1. In the Workspace that you want to attach to a cluster, choose the **EMR clusters** icon from the left sidebar to open the **Cluster** panel.

1. Under **Cluster type**, expand the dropdown and select **EMR cluster on EC2**.

1. Choose a cluster from the dropdown list. You might need to detach an existing cluster first to enable the cluster selection dropdown list.

1. Choose **Attach**. When the cluster is attached, you should see a success message appear.

------
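
Before you attach a cluster, it can help to see which clusters are currently active. The following Python sketch uses the `ListClusters` API through the Amazon SDK for Python (Boto3) and summarizes the first page of results. Treating `RUNNING` and `WAITING` as the states of interest is an assumption made for illustration, and the sketch doesn't check whether your Studio has permission to attach to a given cluster.

```python
# Assumption for illustration: clusters in these states are candidates for attaching.
ACTIVE_STATES = ["RUNNING", "WAITING"]


def summarize_clusters(list_clusters_response):
    """Return (ID, name, state) tuples from a ListClusters response."""
    return [
        (cluster["Id"], cluster["Name"], cluster["Status"]["State"])
        for cluster in list_clusters_response.get("Clusters", [])
    ]


def list_active_clusters(region_name):
    """Fetch the first page of active clusters. Requires boto3 and valid credentials."""
    import boto3  # imported lazily so summarize_clusters works without boto3

    emr = boto3.client("emr", region_name=region_name)
    return summarize_clusters(emr.list_clusters(ClusterStates=ACTIVE_STATES))
```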

## Attach an Amazon EMR on EKS cluster to an EMR Studio Workspace
<a name="emr-studio-use-eks-cluster"></a>

In addition to using Amazon EMR clusters running on Amazon EC2, you can attach a Workspace to an Amazon EMR on EKS cluster to run notebook code. For more information about Amazon EMR on EKS, see [What is Amazon EMR on EKS](https://docs.amazonaws.cn/emr/latest/EMR-on-EKS-DevelopmentGuide/emr-eks.html).

Before you can connect a Workspace to an Amazon EMR on EKS cluster, your Studio administrator must grant you access permissions.

**Note**  
You can't launch an Amazon EMR on EKS cluster in an EMR Studio that uses IAM Identity Center trusted identity propagation. 

------
#### [ On create ]

**To attach an Amazon EMR on EKS cluster when you create a Workspace**

1. In the **Create a Workspace** dialog box, expand the **Advanced configuration** section.

1. Choose **Attach Workspace to an Amazon EMR on EKS cluster**.

1. Under **Amazon EMR on EKS cluster**, choose a cluster from the dropdown list.

1. Under **Select an endpoint**, choose a managed endpoint to attach to the Workspace. A managed endpoint is a gateway that lets EMR Studio communicate with your chosen cluster.

1. Choose **Create a Workspace** to finish the Workspace creation process and attach the selected cluster.

After you attach a cluster, you can finish the Workspace creation process. When you open the new Workspace for the first time and choose the **EMR clusters** panel, you should see that your selected cluster is attached.

------
#### [ In the Workspace UI ]

**To attach an Amazon EMR on EKS cluster from the Workspace user interface**

1. In the Workspace that you want to attach to a cluster, choose the **EMR clusters** icon from the left sidebar to open the **Cluster** panel.

1. Expand the **Cluster type** dropdown and choose **EMR clusters on EKS**.

1. Under **EMR cluster on EKS**, choose a cluster from the dropdown list.

1. Under **Endpoint**, choose a managed endpoint to attach to the Workspace. A managed endpoint is a gateway that lets EMR Studio communicate with your chosen cluster.

1. Choose **Attach**. When the cluster is attached, you should see a success message appear.

------
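
If you aren't sure which managed endpoints exist for a virtual cluster, you can list them outside of the Studio. The following Python sketch calls the Amazon EMR on EKS `ListManagedEndpoints` API through the Amazon SDK for Python (Boto3) and keeps the endpoints in the `ACTIVE` state. It reads only the first page of results, and the virtual cluster ID that you pass in is your own value.

```python
def active_endpoints(list_endpoints_response):
    """Return (ID, name) tuples for endpoints in the ACTIVE state."""
    return [
        (endpoint["id"], endpoint["name"])
        for endpoint in list_endpoints_response.get("endpoints", [])
        if endpoint.get("state") == "ACTIVE"
    ]


def list_active_endpoints(virtual_cluster_id, region_name):
    """Fetch the first page of managed endpoints. Requires boto3 and valid credentials."""
    import boto3  # imported lazily so active_endpoints works without boto3

    client = boto3.client("emr-containers", region_name=region_name)
    response = client.list_managed_endpoints(virtualClusterId=virtual_cluster_id)
    return active_endpoints(response)
```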

## Attach an Amazon EMR Serverless application to an EMR Studio Workspace
<a name="emr-studio-use-serverless-studio"></a>

You can attach a Workspace to an EMR Serverless application to run interactive workloads. For more information, see [Using notebooks to run interactive workloads with EMR Serverless through EMR Studio](https://docs.amazonaws.cn/emr/latest/EMR-Serverless-UserGuide/interactive-workloads.html).

**Note**  
You can't attach an EMR Serverless application to an EMR Studio that uses IAM Identity Center trusted identity propagation. 

**Example Attach a Workspace to an EMR Serverless application in JupyterLab**  
Before you can connect a Workspace to an EMR Serverless application, your account administrator must grant you access permissions as described in [Required permissions for interactive workloads](https://docs.amazonaws.cn/emr/latest/EMR-Serverless-UserGuide/interactive-workloads.html#interactive-permissions).  

1. Navigate to EMR Studio, select your Workspace, then select **Launch Workspace** > **Quick launch**.

1. Inside JupyterLab, open the **Cluster** tab in the left sidebar.

1. Select **EMR Serverless** as a compute option, then select an EMR Serverless application and a runtime role.

1. To attach the application to your Workspace, choose **Attach**.

Now when you open this Workspace, you should see your selected application attached.
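
If an application doesn't appear as a compute option, one thing to check is whether it is enabled for interactive use from a Studio. The following Python sketch reads the `interactiveConfiguration` block returned by the EMR Serverless `GetApplication` API through the Amazon SDK for Python (Boto3). Treat the field names as an assumption to verify against the EMR Serverless API reference, and note that permissions and the application state can also affect whether you can attach.

```python
def is_studio_enabled(application):
    """Return True if the application description reports Studio access as enabled."""
    # Assumption: the flag lives at application["interactiveConfiguration"]["studioEnabled"].
    interactive = application.get("interactiveConfiguration", {})
    return bool(interactive.get("studioEnabled", False))


def check_application(application_id, region_name):
    """Describe the application and report the flag. Requires boto3 and valid credentials."""
    import boto3  # imported lazily so is_studio_enabled works without boto3

    client = boto3.client("emr-serverless", region_name=region_name)
    application = client.get_application(applicationId=application_id)["application"]
    return is_studio_enabled(application)
```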

## Create and attach a new EMR cluster to an EMR Studio Workspace
<a name="emr-studio-create-cluster"></a>

Advanced EMR Studio users can provision new EMR clusters running on Amazon EC2 to use with a Workspace. The new cluster has all of the big data applications that are required for EMR Studio installed by default. 

To create clusters, your Studio administrator must first give you permission using a session policy. For more information, see [Create permissions policies for EMR Studio users](emr-studio-user-permissions.md#emr-studio-permissions-policies).

You can create a new cluster in the **Create a Workspace** dialog box or from the **Cluster** panel in the Workspace UI. Either way, you have two cluster creation options:

1. **Create an EMR cluster** – Create an EMR cluster by choosing the Amazon EC2 instance type and count.

1. **Use a cluster template** – Provision a cluster by selecting a predefined cluster template. This option appears if you have permission to use cluster templates.
**Note**  
If you enabled trusted identity propagation with IAM Identity Center for your Studio, then you must use a template to create a cluster.

**To create an EMR cluster by providing a cluster configuration**

1. Choose a starting point.  
[\[See the AWS documentation website for more details\]](http://docs.amazonaws.cn/en_us/emr/latest/ManagementGuide/emr-studio-create-use-clusters.html)

1. Enter a **Cluster name**. Naming the cluster helps you find it later in the EMR Studio Clusters list.

1. For **Amazon EMR release**, choose an Amazon EMR release version for the cluster.

1. For **Instance**, select the type and number of Amazon EC2 instances for the cluster. For more information about selecting instance types, see [Configure Amazon EC2 instance types for use with Amazon EMR](emr-plan-ec2-instances.md). One instance will be used as the primary node.

1. Select a **Subnet** where EMR Studio can launch the new cluster. Each subnet option is preapproved by your Studio administrator, and your Workspace should be able to connect to a cluster in any listed subnet.

1. Choose an **S3 URI for log storage**.

1. Choose **Create EMR cluster** to provision the cluster. If you use the **Create a Workspace** dialog box, choose **Create a Workspace** to create the Workspace and provision the cluster. After EMR Studio provisions the new cluster, it attaches the cluster to the Workspace.
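
For comparison, the fields in this procedure correspond closely to parameters of the Amazon EMR `RunJobFlow` API. The following Python sketch shows one way that mapping could look. It isn't a description of what EMR Studio does internally, and it omits settings that a real request also needs, such as the service role, the instance profile, and the list of applications. All argument values are hypothetical.

```python
def build_cluster_request(name, release_label, instance_type, instance_count, subnet_id, log_uri):
    """Map the form fields above to RunJobFlow-style keyword arguments (partial sketch)."""
    if instance_count < 1:
        raise ValueError("instance_count must be at least 1")
    return {
        "Name": name,
        "ReleaseLabel": release_label,
        "LogUri": log_uri,
        "Instances": {
            # The same instance type is used for every node in this sketch.
            "MasterInstanceType": instance_type,
            "SlaveInstanceType": instance_type,
            # Total instance count; one of these instances is the primary node.
            "InstanceCount": instance_count,
            "Ec2SubnetId": subnet_id,
            "KeepJobFlowAliveWhenNoSteps": True,
        },
    }
```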

**To create a cluster using a cluster template**

1. Choose a starting point.    
[\[See the AWS documentation website for more details\]](http://docs.amazonaws.cn/en_us/emr/latest/ManagementGuide/emr-studio-create-use-clusters.html)

1. Select a cluster template from the dropdown list. Each available cluster template includes a brief description to help you make a selection.

1. The cluster template you choose may have additional parameters such as Amazon EMR release version or cluster name. You can choose or insert values, or use the default values that your administrator selected.

1. Select a **Subnet** where EMR Studio can launch the new cluster. Each subnet option is preapproved by your Studio administrator, and your Workspace should be able to connect to a cluster in any subnet.

1. Choose **Use cluster template** to provision the cluster and attach it to the Workspace. It will take a few minutes for EMR Studio to create the cluster. If you use the **Create a Workspace** dialog box, choose **Create a Workspace** to create the Workspace and provision the cluster. After EMR Studio provisions the new cluster, it attaches the cluster to your Workspace.

## Detach a compute from an EMR Studio Workspace
<a name="emr-studio-detach-cluster"></a>

To change the cluster attached to a Workspace, first detach the current cluster from the Workspace UI.

**To detach a cluster from a Workspace**

1. In the Workspace that you want to detach from a cluster, choose the **EMR clusters** icon from the left sidebar to open the **Cluster** panel.

1. Under **Select cluster**, choose **Detach** and wait for EMR Studio to detach the cluster. When the cluster is detached, you will see a success message.

**To detach an EMR Serverless application from an EMR Studio Workspace**

To change the compute attached to a Workspace, first detach the current application from the Workspace UI.

1. In the Workspace that you want to detach from an application, choose the **Amazon EMR compute** icon from the left sidebar to open the **Compute** panel.

1. Under **Select compute**, choose **Detach** and wait for EMR Studio to detach the application. When the application is detached, you will see a success message.

# Link Git-based repositories to an EMR Studio Workspace
<a name="emr-studio-git-repo"></a>

Associate up to three Git-based repositories with an Amazon EMR Studio Workspace to save and share notebook files.

## About Git repositories for EMR Studio
<a name="emr-studio-git-repo-about"></a>

You can associate a maximum of three Git repositories with an EMR Studio Workspace. By default, each Workspace lets you choose from a list of Git repositories that are associated with the same Amazon account as the Studio. You can also create a new Git repository as a resource for a Workspace.

You can run Git commands like the following using a terminal command while connected to the primary node of a cluster. 

```
!git pull origin <branch-name>
```

Alternatively, you can use the jupyterlab-git extension. Open it from the left sidebar by choosing the **Git** icon. For information about the jupyterlab-git extension for JupyterLab, see [jupyterlab-git](https://github.com/jupyterlab/jupyterlab-git).

## Prerequisites
<a name="emr-studio-git-prereqs"></a>
+ To associate a Git repository with a Workspace, the Studio must be configured to allow Git repository linking. Your Studio administrator should take steps to [Establish access and permissions for Git-based repositories](emr-studio-enable-git.md).
+ If you use a CodeCommit repository, you must use Git credentials and HTTPS. SSH keys and HTTPS with the Amazon Command Line Interface credential helper are not supported. CodeCommit also does not support personal access tokens (PATs). For more information, see [Using IAM with CodeCommit](https://docs.amazonaws.cn/IAM/latest/UserGuide/id_credentials_ssh-keys.html) in the *IAM User Guide* and [Setup for HTTPS users using Git credentials](https://docs.amazonaws.cn/codecommit/latest/userguide/setting-up-gc.html) in the *Amazon CodeCommit User Guide*.

## Instructions
<a name="emr-studio-link-git-repo"></a>

**To link an associated Git repository to a Workspace**

1. Open the Workspace that you want to link to a repository from the **Workspaces** list in the Studio.

1. In the left sidebar, choose the **Amazon EMR Git Repository** icon to open the **Git repository** tool panel.

1. Under **Git repositories**, expand the dropdown list and select a maximum of three repositories to link to the Workspace. EMR Studio registers your selection and begins linking each repository. 

It might take some time for the linking process to complete. You can see the status for each repository that you selected in the **Git repository** tool panel. After EMR Studio links a repository to a Workspace, you should see the files that belong to that repository appear in the **File browser** panel.

**To add a new Git repository to a Workspace as a resource**

1. Open the Workspace that you want to link to a repository from the Workspaces list in your Studio.

1. In the left sidebar, choose the **Amazon EMR Git Repository** icon to open the **Git repository** tool panel.

1. Choose **Add new Git repository**.

1. For **Repository name**, enter a descriptive name for the repository in EMR Studio. Names may only contain alphanumeric characters, hyphens, and underscores.

1. For **Git repository URL**, enter the URL for the repository. When you use a CodeCommit repository, this is the URL that is copied when you choose **Clone URL** and then **Clone HTTPS**. For example, `https://git-codecommit.us-west-2.amazonaws.com/v1/repos/[MyCodeCommitRepoName]`.

1. For **Branch**, enter the name of an existing branch that you want to check out.

1. For Git credentials, choose an option according to the following guidelines. EMR Studio accesses your Git credentials using secrets stored in Secrets Manager.
**Note**  
If you use a GitHub repository, we recommend that you use a personal access token (PAT) to authenticate. As of August 13, 2021, GitHub requires token-based authentication and no longer accepts passwords when authenticating Git operations. For more information, see the [Token authentication requirements for Git operations](https://github.blog/2020-12-15-token-authentication-requirements-for-git-operations/) post in *The GitHub Blog*.    
[\[See the AWS documentation website for more details\]](http://docs.amazonaws.cn/en_us/emr/latest/ManagementGuide/emr-studio-git-repo.html)

1. Choose **Add repository** to create the new repository. After EMR Studio creates the new repository, you will see a success message. The new repository appears in the dropdown list under **Git repositories**.

1. To link the new repository to your Workspace, choose it from the dropdown list under **Git repositories**.

It might take some time for the linking process to complete. After EMR Studio links the new repository to the Workspace, you should see a new folder with the same name as your repository appear in the **File browser** panel.

To open a different linked repository, navigate to its folder in the **File browser**. 
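
The naming rule, the three-repository limit, and the CodeCommit URL format described in this topic can be checked before you fill in the form. The following Python sketch is one way to do that. The helper names are hypothetical, and the URL builder only reproduces the format of the example URL shown in the procedure.

```python
import re

MAX_LINKED_REPOS = 3  # maximum number of repositories per Workspace


def is_valid_repo_name(name):
    """Allow only alphanumeric characters, hyphens, and underscores."""
    return re.fullmatch(r"[A-Za-z0-9_-]+", name) is not None


def codecommit_https_url(region, repo_name):
    """Build an HTTPS clone URL in the same format as the example above."""
    return f"https://git-codecommit.{region}.amazonaws.com/v1/repos/{repo_name}"


def check_selection(repo_names):
    """Raise ValueError if the selection breaks the limit or the naming rule."""
    if len(repo_names) > MAX_LINKED_REPOS:
        raise ValueError(f"Select at most {MAX_LINKED_REPOS} repositories.")
    invalid = [name for name in repo_names if not is_valid_repo_name(name)]
    if invalid:
        raise ValueError(f"Invalid repository names: {invalid}")
    return True
```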

# Use the Amazon Athena SQL editor in EMR Studio
<a name="emr-studio-athena"></a>

## Overview
<a name="emr-studio-athena-overview"></a>

You can use Amazon EMR Studio to develop and run interactive queries on Amazon Athena. That means that you can perform SQL analytics on Athena from the same EMR Studio interface that you use to run your Spark, Scala, and other workloads. With this integration, you can use auto-completion to develop queries quickly, browse data in your Amazon Glue Data Catalog, create saved queries, view your query history, and more.

For more information on using Amazon Athena, see [Using Athena SQL](https://docs.amazonaws.cn/athena/latest/ug/using-athena-sql.html) in the *Amazon Athena User Guide*.

## Use the Athena SQL editor in EMR Studio
<a name="emr-studio-athena-use"></a>

Use the following steps to develop and run interactive queries on Amazon Athena from your EMR Studio:

1. Add the required permissions to the user role for the users who access the Workspaces in this Studio. The permissions are listed in the [Amazon Identity and Access Management permissions for EMR Studio users](emr-studio-user-permissions.md#emr-studio-iam-permissions-table) table in the column **Access Amazon Athena SQL editor from your EMR Studio**. Alternatively, you can choose to copy the **Advanced** policy contents from the [Example user policies](emr-studio-user-permissions.md#emr-studio-example-policies) to grant users full permissions to EMR Studio capabilities including this one.

1. [Set up](emr-studio-set-up.md) and [create an EMR Studio](emr-studio-create-studio.md).

1. Navigate to your Studio and select **Query editor** from the sidebar.

You should now see the familiar Athena editor UI. For information on getting started and using Athena SQL to run interactive queries, see [Getting started](https://docs.amazonaws.cn/athena/latest/ug/getting-started.html) and [Using Athena SQL](https://docs.amazonaws.cn/athena/latest/ug/using-athena-sql.html) in the *Amazon Athena User Guide*.

**Note**  
If you have enabled trusted identity propagation through IAM Identity Center for your EMR Studio, then you must use Athena workgroups to control query access, and the workgroup that you use must also use trusted identity propagation. For steps to set up Identity Center and enable trusted identity propagation for your workgroup, see [Using IAM Identity Center enabled Athena workgroups](https://docs.amazonaws.cn/athena/latest/ug/workgroups-identity-center.html) in the *Amazon Athena User Guide*.
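
Queries that you develop in the editor run in an Athena workgroup, and you can submit the same SQL from a script if you need to. The following Python sketch builds a request for the Athena `StartQueryExecution` API and submits it with the Amazon SDK for Python (Boto3). The database and workgroup names are hypothetical, and the sketch assumes that the workgroup already defines a query result location.

```python
def build_query_request(sql, database, workgroup):
    """Assemble keyword arguments for the Athena StartQueryExecution API."""
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        # Assumption: this workgroup already has a query result location configured.
        "WorkGroup": workgroup,
    }


def run_query(request, region_name):
    """Submit the query and return its execution ID. Requires boto3 and valid credentials."""
    import boto3  # imported lazily so the builder above works without boto3

    athena = boto3.client("athena", region_name=region_name)
    return athena.start_query_execution(**request)["QueryExecutionId"]
```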

## Considerations for using the Athena SQL editor in EMR Studio
<a name="emr-studio-athena-considerations"></a>
+ Integration with Athena is available in all commercial Regions where EMR Studio and Athena are available.
+ The following Athena features are not available in EMR Studio:
  + Admin features like creating or updating Athena workgroups, data sources, or capacity reservations
  + Athena for Spark or Spark notebooks
  + Amazon DataZone integration
  + Cost Based Optimizer (CBO)
  + Step functions

# Amazon CodeWhisperer integration with EMR Studio Workspaces
<a name="emr-studio-codewhisperer"></a>

## Overview
<a name="emr-studio-codewhisperer-overview"></a>

You can use [Amazon CodeWhisperer](https://docs.amazonaws.cn/codewhisperer/latest/userguide/what-is-cwspr.html) with Amazon EMR Studio to get real-time recommendations as you write code in JupyterLab. CodeWhisperer can complete your comments, finish single lines of code, make line-by-line recommendations, and generate fully formed functions. 

**Note**  
When you use Amazon EMR Studio, Amazon might store data about your usage and content for service improvement purposes. For more information and instructions to opt out of data sharing, see [Sharing your data with Amazon](https://docs.amazonaws.cn/codewhisperer/latest/userguide/sharing-data.html) in the *Amazon CodeWhisperer User Guide*. 

## Considerations for using CodeWhisperer with Workspaces
<a name="emr-studio-codewhisperer-considerations"></a>
+ CodeWhisperer integration is available in the same Amazon Web Services Regions where EMR Studio is available, as documented in the [EMR Studio considerations](emr-studio-considerations.md).
+ Amazon EMR Studio automatically uses the CodeWhisperer endpoint in US East (N. Virginia) (us-east-1) for recommendations, regardless of the Region that your studio is in.
+ CodeWhisperer supports only Python language for coding ETL scripts for Spark jobs in EMR Studio. 
+ A client-side telemetry option quantifies your usage of CodeWhisperer. This functionality isn't supported with EMR Studio.

## Permissions required for CodeWhisperer
<a name="emr-studio-codewhisperer-permissions"></a>

To use CodeWhisperer, you must attach the following policy to your IAM user role for Amazon EMR Studio:

------
#### [ JSON ]


```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "CodeWhispererPermissions",
      "Effect": "Allow",
      "Action": [
        "codewhisperer:GenerateRecommendations"
      ],
      "Resource": [
        "*"
      ]
    }
  ]
}
```

------
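
If you manage roles with scripts, the following Python sketch shows one way to attach the policy above as an inline policy by using the IAM `PutRolePolicy` API through the Amazon SDK for Python (Boto3). The role name and policy name are hypothetical placeholders, and your organization might prefer a managed policy instead.

```python
import json


def codewhisperer_policy():
    """Return the policy document shown above as a Python dict."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "CodeWhispererPermissions",
                "Effect": "Allow",
                "Action": ["codewhisperer:GenerateRecommendations"],
                "Resource": ["*"],
            }
        ],
    }


def attach_inline_policy(role_name, policy_name="ExampleCodeWhispererPolicy"):
    """Attach the policy to a role. Requires boto3 and IAM permissions."""
    import boto3  # imported lazily so codewhisperer_policy works without boto3

    iam = boto3.client("iam")
    iam.put_role_policy(
        RoleName=role_name,
        PolicyName=policy_name,  # hypothetical name; choose your own
        PolicyDocument=json.dumps(codewhisperer_policy()),
    )
```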

## Use CodeWhisperer with Workspaces
<a name="emr-studio-codewhisperer-use"></a>

To display the CodeWhisperer reference log in JupyterLab, open the **CodeWhisperer** panel at the bottom of the JupyterLab window and choose **Open Code Reference Log**.

The following list contains shortcuts that you can use to interact with CodeWhisperer suggestions:
+ **Pause recommendations** – Use **Pause Auto-Suggestions** from the CodeWhisperer settings.
+ **Accept a recommendation** – Press **Tab** on your keyboard.
+ **Reject a recommendation** – Press **Escape** on your keyboard.
+ **Navigate recommendations** – Use the **Up** and **Down** arrows on your keyboard.
+ **Manual invoke** – Press **Alt** and **C** on your keyboard. If you're using a Mac, press **Cmd** and **C**.

You can also use CodeWhisperer to change settings like log level and get suggestions for code references. For more information, see [Setting up CodeWhisperer with JupyterLab](https://docs.amazonaws.cn/codewhisperer/latest/userguide/jupyterlab-setup.html) and [Features](https://docs.amazonaws.cn/codewhisperer/latest/userguide/features.html) in the *Amazon CodeWhisperer User Guide*.

# Debug applications and jobs with EMR Studio
<a name="emr-studio-debug"></a>

With Amazon EMR Studio, you can launch data application interfaces to analyze applications and job runs in the browser.

You can also launch the persistent, off-cluster user interfaces for Amazon EMR running on EC2 clusters from the Amazon EMR console. For more information, see [View persistent application user interfaces in Amazon EMR](app-history-spark-UI.md).

**Note**  
Depending on your browser settings, you might need to enable pop-ups for an application UI to open.

For information about configuring and using the application interfaces, see [The YARN Timeline Server](https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/TimelineServer.html), [Monitoring and instrumentation](https://spark.apache.org/docs/latest/monitoring.html), or [Tez UI overview](https://tez.apache.org/tez-ui.html).

## Debug Amazon EMR running on Amazon EC2 jobs
<a name="emr-studio-debug-ec2"></a>

------
#### [ Workspace UI ]

**Launch an on-cluster UI from a notebook file**

When you use Amazon EMR release versions 5.33.0 and later, you can launch the Spark web user interface (the Spark UI or Spark History Server) from a notebook in your Workspace. 

On-cluster UIs work with the PySpark, Spark, or SparkR kernels. The maximum viewable file size for Spark event logs or container logs is 10 MB. If your log files exceed 10 MB, we recommend that you use the persistent Spark History Server instead of the on-cluster Spark UI to debug jobs.
**Important**  
In order for EMR Studio to launch on-cluster application user interfaces from a Workspace, a cluster must be able to communicate with the Amazon API Gateway. You must configure the EMR cluster to allow outgoing network traffic to Amazon API Gateway, and make sure that Amazon API Gateway is reachable from the cluster.   
The Spark UI accesses container logs by resolving hostnames. If you use a custom domain name, you must make sure that the hostnames of your cluster nodes can be resolved by Amazon DNS or by the DNS server you specify. To do so, set the Dynamic Host Configuration Protocol (DHCP) options for the Amazon Virtual Private Cloud (VPC) that is associated with your cluster. For more information about DHCP options, see [DHCP option sets](https://docs.amazonaws.cn/vpc/latest/userguide/VPC_DHCP_Options.html) in the *Amazon Virtual Private Cloud User Guide*.

1. In your EMR Studio, open the Workspace that you want to use and make sure that it is attached to an Amazon EMR cluster running on EC2. For instructions, see [Attach a compute to an EMR Studio Workspace](emr-studio-create-use-clusters.md).

1. Open a notebook file and use the PySpark, Spark, or SparkR kernel. To select a kernel, choose the kernel name from the upper right of the notebook toolbar to open the **Select Kernel** dialog box. The name appears as **No Kernel!** if no kernel has been selected.

1. Run your notebook code. The following appears as output in the notebook when you start the Spark context. It might take a few seconds to appear. If you have started the Spark context, you can run the `%%info` command to access a link to the Spark UI at any time.
**Note**  
If the Spark UI links do not work or do not appear after a few seconds, create a new notebook cell and run the `%%info` command to regenerate the links.  
![\[Screenshot of the Spark application master information, with link to the Spark UI. The link appears in a notebook when you run a Spark application.\]](http://docs.amazonaws.cn/en_us/emr/latest/ManagementGuide/images/spark-app-ui-link.jpg)

1. To launch the Spark UI, choose **Link** under **Spark UI**. If your Spark application is running, the Spark UI opens in a new tab. If the application has completed, the Spark History Server opens instead.

   After you launch the Spark UI, you can modify the URL in the browser to open the YARN ResourceManager or the YARN Timeline Server. Add one of the following paths after `amazonaws.com`.  
[\[See the AWS documentation website for more details\]](http://docs.amazonaws.cn/en_us/emr/latest/ManagementGuide/emr-studio-debug.html)

------
#### [ Studio UI ]

**Launch the persistent YARN Timeline Server, Spark History Server, or Tez UI from the EMR Studio UI**

1. In your EMR Studio, select **Amazon EMR on EC2** on the left side of the page to open the **Amazon EMR on EC2** clusters list. 

1. Filter the list of clusters by **name**, **state**, or **ID** by entering values in the search box. You can also search by creation **time range**.

1. Select a cluster and then choose **Launch application UIs** to select an application user interface. The Application UI opens in a new browser tab and might take some time to load.

------

## Debug EMR Studio running on EMR Serverless
<a name="emr-studio-debug-serverless"></a>

Similar to Amazon EMR running on Amazon EC2, you can use the Workspace user interface to analyze your EMR Serverless applications. From the Workspace UI, when you use Amazon EMR releases 6.14.0 and higher, you can launch the Spark web user interface (the Spark UI or Spark History Server) from a notebook in your Workspace. For your convenience, we also provide a link to the driver log for quick access to the Spark driver logs.

## Debug Amazon EMR on EKS job runs with the Spark History Server
<a name="emr-studio-debug-eks"></a>

When you submit a job run to an Amazon EMR on EKS cluster, you can access logs for that job run using the Spark History Server. The Spark History Server provides tools for monitoring Spark applications, such as a list of scheduler stages and tasks, a summary of RDD sizes and memory usage, and environmental information. You can launch the Spark History Server for Amazon EMR on EKS job runs in the following ways:
+ When you submit a job run using EMR Studio with an Amazon EMR on EKS managed endpoint, you can launch the Spark History Server from a notebook file in your Workspace.
+ When you submit a job run using the Amazon CLI or Amazon SDK for Amazon EMR on EKS, you can launch the Spark History Server from the EMR Studio UI.

For information about how to use the Spark History Server, see [Monitoring and Instrumentation](https://spark.apache.org/docs/latest/monitoring.html) in the Apache Spark documentation. For more information about job runs, see [Concepts and components](https://docs.amazonaws.cn/emr/latest/EMR-on-EKS-DevelopmentGuide/emr-eks-concepts.html) in the *Amazon EMR on EKS Development Guide*.

**To launch the Spark History Server from a notebook file in your EMR Studio Workspace**

1. Open a Workspace that is connected to an Amazon EMR on EKS cluster.

1. Select and open your notebook file in the Workspace.

1. Choose **Spark UI** at the top of the notebook file to open the persistent Spark History Server in a new tab.

**To launch the Spark History Server from the EMR Studio UI**
**Note**  
The **Jobs** list in the EMR Studio UI displays only job runs that you submit using the Amazon CLI or Amazon SDK for Amazon EMR on EKS.

1. In your EMR Studio, select **Amazon EMR on EKS** on the left side of the page. 

1. Search for the Amazon EMR on EKS virtual cluster that you used to submit your job run. You can filter the list of clusters by **status** or **ID** by entering values in the search box.

1. Select the cluster to open its detail page. The detail page displays information about the cluster, such as ID, namespace, and status. The page also shows a list of all the job runs submitted to that cluster. 

1. From the cluster detail page, select a job run to debug.

1. In the upper right of the **Jobs** list, choose **Launch Spark History Server** to open the application interface in a new browser tab.
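
To find job runs worth debugging before you open the Studio UI, you can list them with the Amazon EMR on EKS `ListJobRuns` API. The following Python sketch uses the Amazon SDK for Python (Boto3) and keeps the job runs in the `FAILED` state. It reads only the first page of results, and focusing on failed runs is just one possible filter.

```python
def failed_job_runs(list_job_runs_response):
    """Return (ID, name) tuples for job runs in the FAILED state."""
    return [
        (job["id"], job.get("name", ""))
        for job in list_job_runs_response.get("jobRuns", [])
        if job.get("state") == "FAILED"
    ]


def list_failed_job_runs(virtual_cluster_id, region_name):
    """Fetch the first page of job runs. Requires boto3 and valid credentials."""
    import boto3  # imported lazily so failed_job_runs works without boto3

    client = boto3.client("emr-containers", region_name=region_name)
    return failed_job_runs(client.list_job_runs(virtualClusterId=virtual_cluster_id))
```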

# Install kernels and libraries in an EMR Studio Workspace
<a name="emr-studio-install-libraries-and-kernels"></a>

Each Amazon EMR Studio Workspace comes with a set of pre-installed libraries and kernels. 

## Kernels and libraries on clusters that run on Amazon EC2
<a name="emr-studio-ec2-kernels-libraries"></a>

You can also customize the environment for EMR Studio in the following ways when you use EMR clusters running on Amazon EC2:
+ **Install Jupyter Notebook kernels and Python libraries on a cluster primary node** – When you install libraries using this option, all Workspaces attached to the same cluster share those libraries. You can install kernels or libraries from within a notebook cell or while connected using SSH to the primary node of a cluster.
+ **Use notebook-scoped libraries** – When Workspace users install and use libraries from within a notebook cell, those libraries are available only to that notebook. This option lets different notebooks that use the same cluster work without conflicting library versions.

EMR Studio Workspaces have the same underlying architecture as EMR Notebooks. You can install and use Jupyter Notebook kernels and Python libraries with EMR Studio in the same way you would with EMR Notebooks. For instructions, see [Installing and using kernels and libraries in EMR Studio](emr-managed-notebooks-installing-libraries-and-kernels.md). 

## Kernels and libraries on Amazon EMR on EKS clusters
<a name="emr-studio-eks-kernels-libraries"></a>

Amazon EMR on EKS clusters include the PySpark and Python 3.7 kernels with a set of pre-installed libraries. Amazon EMR on EKS does not support installing additional libraries or kernels.

Each Amazon EMR on EKS cluster comes with the following Python and PySpark libraries installed:
+ **Python** – boto3, cffi, future, ggplot, jupyter, kubernetes, matplotlib, numpy, pandas, plotly, pycryptodomex, py4j, requests, scikit-learn, scipy, seaborn
+ **PySpark** – ggplot, jupyter, matplotlib, numpy, pandas, plotly, pycryptodomex, py4j, requests, scikit-learn, scipy, seaborn

## Kernels and libraries on EMR Serverless applications
<a name="emr-studio-serverless-kernels-libraries"></a>

Each EMR Serverless application comes with the following Python and PySpark libraries installed:
+ **Python** – ggplot, matplotlib, numpy, pandas, plotly, bokeh, scikit-learn, scipy, seaborn
+ **PySpark** – ggplot, matplotlib, numpy, pandas, plotly, bokeh, scikit-learn, scipy, seaborn
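
To confirm which of the listed libraries your current kernel can import, you can run a quick check from a notebook cell. The following is a minimal sketch, not an official utility: `missing_libraries` and `IMPORT_NAMES` are hypothetical names, and the mapping covers only the two listed packages whose import name differs from the package name.

```python
import importlib.util

# Packages whose import name differs from the package name.
IMPORT_NAMES = {"scikit-learn": "sklearn", "pycryptodomex": "Cryptodome"}


def missing_libraries(package_names):
    """Return the packages that the current kernel cannot import."""
    missing = []
    for package in package_names:
        module_name = IMPORT_NAMES.get(package, package)
        # find_spec returns None when a top-level module is not installed.
        if importlib.util.find_spec(module_name) is None:
            missing.append(package)
    return missing


expected = ["matplotlib", "numpy", "pandas", "plotly", "scikit-learn", "scipy", "seaborn"]
print("Not importable in this kernel:", missing_libraries(expected))
```

An empty list means that every package in `expected` can be imported. The result depends on the compute that your Workspace is attached to.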

# Enhance kernels with magic commands in EMR Studio
<a name="emr-studio-magics"></a>

## Overview
<a name="overview-magics"></a>

EMR Studio and EMR Notebooks support magic commands. *Magic* commands, or *magics*, are enhancements that the IPython kernel provides to help you run and analyze data. IPython is an interactive shell environment that is built with Python.

Amazon EMR also supports Sparkmagic, a package that provides Spark-related kernels (PySpark, SparkR, and Scala kernels) with specific magic commands and that uses Livy on the cluster to submit Spark jobs.

You can use magic commands as long as you have a Python kernel in your EMR notebook. Similarly, any Spark-related kernel supports Sparkmagic commands.

Magic commands come in two varieties:
+ **Line magics** – These magic commands are denoted by a single `%` prefix and operate on a single line of code.
+ **Cell magics** – These magic commands are denoted by a double `%%` prefix and operate on multiple lines of code.

For all available magics, see [List magic and Sparkmagic commands](#accessing-all-magic-commands).
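
To make the prefix rule concrete, the following sketch classifies the source of a notebook cell by its first non-empty line. The `classify_magic` helper is hypothetical and only illustrates the `%` and `%%` convention. It is not part of IPython or Sparkmagic.

```python
def classify_magic(cell_source):
    """Return 'cell', 'line', or 'none' based on the first non-empty line."""
    for line in cell_source.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        # Check the double prefix first, because "%%" also starts with "%".
        if stripped.startswith("%%"):
            return "cell"
        if stripped.startswith("%"):
            return "line"
        return "none"
    return "none"


print(classify_magic("%lsmagic"))   # line magic
print(classify_magic("%%sh\nls"))   # cell magic
print(classify_magic("x = 1"))      # plain code
```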

## Considerations and limitations
<a name="considerations-limitations-magics"></a>
+ EMR Serverless doesn't support `%%sh` to run `spark-submit`, and it doesn't support the EMR Notebooks magics.
+ Amazon EMR on EKS clusters don't support Sparkmagic commands for EMR Studio. This is because Spark kernels that you use with managed endpoints are built into Kubernetes, and they aren't supported by Sparkmagic and Livy. As a workaround, you can set the Spark configuration directly on the `spark` session object, as the following example demonstrates.

  ```
  spark.conf.set("spark.driver.maxResultSize", "6g")
  ```
+ The following magic commands and actions are prohibited by Amazon:
  + `%alias`
  + `%alias_magic`
  + `%automagic`
  + `%macro`
  + Modifying `proxy_user` with `%configure`
  + Modifying `KERNEL_USERNAME` with `%env` or `%set_env`
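
If you review notebooks before you run them, you can scan cell sources for the prohibited items listed above. The following is a rough sketch under stated assumptions: `find_prohibited_usage` is a hypothetical helper, and it conservatively flags any mention of `KERNEL_USERNAME` in an `%env` or `%set_env` line and any mention of `proxy_user` in a configure cell, whether or not the value actually changes.

```python
PROHIBITED_MAGICS = {"%alias", "%alias_magic", "%automagic", "%macro"}
ENV_MAGICS = {"%env", "%set_env"}
CONFIGURE_MAGICS = {"%configure", "%%configure"}


def find_prohibited_usage(cell_source):
    """Return (line_number, reason) pairs for prohibited magics in a cell."""
    lines = cell_source.splitlines()
    first_tokens = lines[0].split() if lines else []
    in_configure_cell = bool(first_tokens) and first_tokens[0] in CONFIGURE_MAGICS

    findings = []
    for number, line in enumerate(lines, start=1):
        tokens = line.split()
        name = tokens[0] if tokens else ""
        if name in PROHIBITED_MAGICS:
            findings.append((number, name))
        elif name in ENV_MAGICS and "KERNEL_USERNAME" in line:
            findings.append((number, f"{name} KERNEL_USERNAME"))
        elif in_configure_cell and "proxy_user" in line:
            findings.append((number, "%configure proxy_user"))
    return findings


sample_cell = '%%configure -f\n{"proxy_user": "someone-else"}'
print(find_prohibited_usage(sample_cell))
```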

## List magic and Sparkmagic commands
<a name="accessing-all-magic-commands"></a>

Use the following commands to list the available magic commands:
+ `%lsmagic` lists all currently available magic functions.
+ `%%help` lists currently available Spark-related magic functions provided by the Sparkmagic package.

## Use `%%configure` to configure Spark
<a name="using-configure-sparkmagic"></a>

One of the most useful Sparkmagic commands is the `%%configure` command, which configures the session creation parameters. Using `conf` settings, you can configure any Spark configuration that's mentioned in the [configuration documentation for Apache Spark](https://spark.apache.org/docs/latest/configuration.html).

**Example: Add an external JAR file to EMR Notebooks from a Maven repository or Amazon S3**  
You can use the following approach to add an external JAR file dependency to any Spark-related kernel that's supported by Sparkmagic.  

```
%%configure -f
{"conf": {
    "spark.jars.packages": "com.jsuereth:scala-arm_2.11:2.0,ml.combust.bundle:bundle-ml_2.11:0.13.0,com.databricks:dbutils-api_2.11:0.0.3",
    "spark.jars": "s3://amzn-s3-demo-bucket/my-jar.jar"
    }
}
```

**Example: Configure Hudi**  
You can use the notebook editor to configure your EMR notebook to use Hudi.  

```
%%configure
{ "conf": {
     "spark.jars": "hdfs:///apps/hudi/lib/hudi-spark-bundle.jar,hdfs:///apps/hudi/lib/spark-avro.jar",
     "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
     "spark.sql.hive.convertMetastoreParquet":"false"
     }
}
```
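
The body of a `%%configure` cell must be valid JSON, so a misplaced comma or quote causes the cell to fail. One way to avoid hand-editing mistakes is to generate the cell text from a Python dictionary and paste the result into a new cell. The following is a minimal sketch: `build_configure_cell` is a hypothetical helper, and the `force` parameter simply adds the `-f` flag shown in the first example above.

```python
import json


def build_configure_cell(conf, force=False):
    """Return the text of a %%configure cell for the given Spark settings."""
    header = "%%configure -f" if force else "%%configure"
    # json.dumps guarantees balanced braces, quoting, and commas.
    body = json.dumps({"conf": conf}, indent=4)
    return f"{header}\n{body}"


hudi_conf = {
    "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
    "spark.sql.hive.convertMetastoreParquet": "false",
}
cell = build_configure_cell(hudi_conf)
print(cell)
```

Spark configuration values are passed as strings here, which matches the examples above.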

## Use `%%sh` to run `spark-submit`
<a name="using-sh-sparkmagic"></a>

The `%%sh` magic runs shell commands in a subprocess on an instance of your attached cluster. Typically, you'd use one of the Spark-related kernels to run Spark applications on your attached cluster. However, if you want to use a Python kernel to submit a Spark application, you can use the following magic, replacing the bucket name with your bucket name in lowercase.

```
%%sh
spark-submit --master yarn --deploy-mode cluster s3://amzn-s3-demo-bucket/test.py
```

In this example, the cluster needs access to the location of `s3://amzn-s3-demo-bucket/test.py`, or the command will fail.
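
If you prefer to assemble the command in Python before you run it, you can build the argument list programmatically. The following sketch only constructs and prints the command. `build_spark_submit` is a hypothetical helper, and whether `spark-submit` is available to the Python process depends on where your kernel runs.

```python
import shlex


def build_spark_submit(script_uri, master="yarn", deploy_mode="cluster", app_args=()):
    """Return the spark-submit command as a list of arguments."""
    command = [
        "spark-submit",
        "--master", master,
        "--deploy-mode", deploy_mode,
        script_uri,
    ]
    # Arguments for the application itself follow the script location.
    command.extend(app_args)
    return command


command = build_spark_submit("s3://amzn-s3-demo-bucket/test.py")
print(shlex.join(command))
```

You can paste the printed line into a `%%sh` cell, or pass the list to `subprocess.run(command, check=True)` if the kernel process runs on a cluster node where `spark-submit` is installed.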

You can use any Linux command with the `%%sh` magic. If you want to run any Spark or YARN commands, use one of the following options to create an `emr-notebook` Hadoop user and grant the user permissions to run the commands:
+ You can explicitly create a new user by running the following commands.

  ```
  hadoop fs -mkdir /user/emr-notebook
  hadoop fs -chown emr-notebook /user/emr-notebook
  ```
+ You can turn on user impersonation in Livy, which automatically creates the user. See [Enabling user impersonation to monitor Spark user and job activity](emr-managed-notebooks-spark-monitor.md) for more information.

## Use `%%display` to visualize Spark dataframes
<a name="using-display-sparkmagic"></a>

You can use the `%%display` magic to visualize a Spark dataframe. To use this magic, run the following command. 

```
%%display df
```

Choose to view the results in a table format, as the following image shows.

![\[Output of using the %%display magic that shows results in a table format.\]](http://docs.amazonaws.cn/en_us/emr/latest/ManagementGuide/images/magic-display-table.png)


You can also choose to visualize your data with five types of charts. Your options include pie, scatter, line, area, and bar charts.

![\[Output of using the %%display magic that shows results in a chart format.\]](http://docs.amazonaws.cn/en_us/emr/latest/ManagementGuide/images/magic-display-chart.png)


## Use EMR Notebooks magics
<a name="emr-magics"></a>

Amazon EMR provides the following EMR Notebooks magics that you can use with Python 3 and Spark-based kernels:
+ `%mount_workspace_dir` – Mounts your Workspace directory to your cluster so that you can import and run code from other files in your Workspace.
**Note**  
With `%mount_workspace_dir`, only the Python 3 kernel can access your local file system. Spark executors don't have access to the mounted directory with this kernel.
+ `%umount_workspace_dir` – Unmounts your Workspace directory from your cluster.
+ `%generate_s3_download_url` – Generates a temporary download link in your notebook output for an Amazon S3 object.

### Prerequisites
<a name="emr-magics-prereqs"></a>

Before you install EMR Notebooks magics, complete the following tasks:
+ Make sure that your [Service role for cluster EC2 instances (EC2 instance profile)](emr-iam-role-for-ec2.md) has read access for Amazon S3. The `EMR_EC2_DefaultRole` with the `AmazonElasticMapReduceforEC2Role` managed policy fulfills this requirement. If you use a custom role or policy, make sure that it has the necessary S3 permissions.
**Note**  
EMR Notebooks magics run on a cluster as the notebook user and use the EC2 instance profile to interact with Amazon S3. When you mount a Workspace directory on an EMR cluster, all Workspaces and EMR notebooks with permission to attach to that cluster can access the mounted directory.  
Directories are mounted as read-only by default. While `s3fs-fuse` and `goofys` allow read-write mounts, we strongly recommend that you do not modify mount parameters to mount directories in read-write mode. If you allow write access, any changes made to the directory are written to the S3 bucket. To avoid accidental deletion or overwriting, you can enable versioning for your S3 bucket. To learn more, see [Using versioning in S3 buckets](https://docs.amazonaws.cn/AmazonS3/latest/userguide/Versioning.html).
+ Run one of the following scripts on your cluster to install the dependencies for EMR Notebooks magics. To run a script, you can either [Use custom bootstrap actions](emr-plan-bootstrap.md#bootstrapCustom) or follow the instructions in [Run commands and scripts on an Amazon EMR cluster](https://docs.amazonaws.cn/emr/latest/ReleaseGuide/emr-commandrunner.html) when you already have a running cluster.

  You can choose which dependency to install. Both [s3fs-fuse](https://github.com/s3fs-fuse/s3fs-fuse) and [goofys](https://github.com/kahing/goofys) are FUSE (Filesystem in Userspace) tools that let you mount an Amazon S3 bucket as a local file system on a cluster. The `s3fs` tool provides an experience similar to POSIX. The `goofys` tool is a good choice when you prefer performance over a POSIX-compliant file system.

  The Amazon EMR 7.x series uses Amazon Linux 2023, which doesn't support EPEL repositories. If you're running Amazon EMR 7.x, follow the [s3fs-fuse GitHub](https://github.com/s3fs-fuse/s3fs-fuse/blob/master/COMPILATION.md) instructions to install `s3fs-fuse`. If you use the 5.x or 6.x series, use the following commands to install `s3fs-fuse`.

  ```
  #!/bin/sh
  
  # Install the s3fs dependency for EMR Notebooks magics 
  sudo amazon-linux-extras install epel -y
  sudo yum install s3fs-fuse -y
  ```

  **OR**

  ```
  #!/bin/sh
  
  # Install the goofys dependency for EMR Notebooks magics 
  sudo wget https://github.com/kahing/goofys/releases/latest/download/goofys -P /usr/bin/
  sudo chmod ugo+x /usr/bin/goofys
  ```

### Install EMR Notebooks magics
<a name="emr-magics-install"></a>

**Note**  
With Amazon EMR releases 6.0 through 6.9.0 and 5.0 through 5.36.0, only `emr-notebooks-magics` package versions 0.2.0 and higher support the `%mount_workspace_dir` magic.

Complete the following steps to install EMR Notebooks magics.

1. In your notebook, run the following commands to install the [`emr-notebooks-magics`](https://pypi.org/project/emr-notebooks-magics/) package.

   ```
   %pip install boto3 --upgrade
   %pip install botocore --upgrade
   %pip install emr-notebooks-magics --upgrade
   ```

1. Restart your kernel to load the EMR Notebooks magics.

1. Verify your installation with the following command, which should display help text for `%mount_workspace_dir`.

   ```
   %mount_workspace_dir?
   ```

### Mount a Workspace directory with `%mount_workspace_dir`
<a name="emr-magics-mount-workspace"></a>

The `%mount_workspace_dir` magic lets you mount your Workspace directory onto your EMR cluster so that you can import and run other files, modules, or packages stored in your directory.

The following example mounts the entire Workspace directory onto a cluster and passes the optional `--fuse-type` argument to use goofys for mounting the directory.

```
%mount_workspace_dir . --fuse-type goofys
```

To verify that your Workspace directory is mounted, use the following example to list the contents of the current working directory with the `ls` command. The output should display all of the files in your Workspace.

```
%%sh
ls
```
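
After the directory is mounted, a Python 3 kernel can import code from other files in it. The following sketch shows one way to load a single file as a module with the standard `importlib` machinery. `load_module_from_file` and `helpers.py` are hypothetical names, and the demonstration writes a throwaway file that stands in for a file in your mounted Workspace.

```python
import importlib.util
import tempfile
from pathlib import Path


def load_module_from_file(module_name, file_path):
    """Import a single Python file as a module and return the module object."""
    path = Path(file_path)
    if not path.is_file():
        raise FileNotFoundError(f"No such file: {path}")
    spec = importlib.util.spec_from_file_location(module_name, str(path))
    if spec is None or spec.loader is None:
        raise ImportError(f"Cannot import {path} as a Python module")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


# Demonstration with a temporary file in place of a real Workspace file.
with tempfile.TemporaryDirectory() as workspace_dir:
    helper_path = Path(workspace_dir) / "helpers.py"
    helper_path.write_text("def double(x):\n    return 2 * x\n")
    helpers = load_module_from_file("helpers", helper_path)
    result = helpers.double(21)

print(result)
```

In a real Workspace, you would pass the path of an existing file in the mounted directory instead of creating a temporary one.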

When you're done making changes in your Workspace, you can unmount the Workspace directory with the following command:

```
%umount_workspace_dir
```

**Note**  
Your Workspace directory stays mounted to your cluster even when the Workspace is stopped or detached. You must explicitly unmount your Workspace directory.

### Download an Amazon S3 object with `%generate_s3_download_url`
<a name="emr-magics-generate-s3-download-url"></a>

The `%generate_s3_download_url` magic creates a presigned URL for an object stored in Amazon S3. You can use the presigned URL to download the object to your local machine. For example, you might run `%generate_s3_download_url` to download the result of a SQL query that your code writes to Amazon S3.

The presigned URL is valid for 60 minutes by default. You can change the expiration time by specifying a number of seconds for the `--expires-in` flag. For example, `--expires-in 1800` creates a URL that is valid for 30 minutes.

The following example generates a download link for an object by specifying the full Amazon S3 path: `s3://EXAMPLE-DOC-BUCKET/path/to/my/object`.

```
%generate_s3_download_url s3://EXAMPLE-DOC-BUCKET/path/to/my/object
```
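
Because `--expires-in` takes seconds, it can help to compute the value from minutes when you assemble the command. The following sketch builds the magic line as a string. `build_download_magic` is a hypothetical helper, and placing the flag after the S3 path is an assumption, so check the magic's help text for the exact syntax.

```python
def build_download_magic(s3_uri, valid_minutes=None):
    """Return a %generate_s3_download_url line, optionally with an expiry."""
    if not s3_uri.startswith("s3://"):
        raise ValueError("Expected a full Amazon S3 path that starts with s3://")
    command = f"%generate_s3_download_url {s3_uri}"
    if valid_minutes is not None:
        if valid_minutes <= 0:
            raise ValueError("valid_minutes must be positive")
        # The flag expects seconds, so convert from minutes.
        command += f" --expires-in {int(valid_minutes * 60)}"
    return command


print(build_download_magic("s3://EXAMPLE-DOC-BUCKET/path/to/my/object", valid_minutes=30))
```

When `valid_minutes` is omitted, the line carries no flag and the default validity of 60 minutes applies.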

To learn more about using `%generate_s3_download_url`, run the following command to display help text.

```
%generate_s3_download_url?
```

### Run a notebook in headless mode with `%execute_notebook`
<a name="headless-execution"></a>

With the `%execute_notebook` magic, you can run another notebook in headless mode and view the output for each cell that you've run. This magic requires additional permissions for the instance role that Amazon EMR and Amazon EC2 share. For more details on how to grant additional permissions, run the command `%execute_notebook?`.

During a long-running job, your system might go to sleep because of inactivity, or might temporarily lose internet connectivity. This might disrupt the connection between your browser and the Jupyter Server. In this case, you might lose the output from the cells that you've run and sent from the Jupyter Server.

If you run the notebook in headless mode with the `%execute_notebook` magic, EMR Notebooks captures output from the cells that have run, even if the local network experiences disruption. EMR Notebooks saves the output incrementally in a new notebook with the same name as the notebook that you've run. EMR Notebooks then places the notebook into a new folder within the Workspace. By default, headless runs occur on the same cluster and use the service role `EMR_Notebooks_DefaultRole`, but you can override these defaults with additional arguments.

To run a notebook in headless mode, use the following command:

```
%execute_notebook <relative-file-path>
```

To specify a cluster ID and service role for a headless run, use the following command:

```
%execute_notebook <notebook_name>.ipynb --cluster-id <emr-cluster-id> --service-role <emr-notebook-service-role>
```

When Amazon EMR and Amazon EC2 share an instance role, the role requires the following additional permissions:


```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "elasticmapreduce:StartNotebookExecution",
        "elasticmapreduce:DescribeNotebookExecution",
        "ec2:DescribeInstances"
      ],
      "Resource": [
        "*"
      ],
      "Sid": "AllowELASTICMAPREDUCEStartnotebookexecution"
    },
    {
      "Effect": "Allow",
      "Action": [
        "iam:PassRole"
      ],
      "Resource": [
        "arn:aws-cn:iam::123456789012:role/EMR_Notebooks_DefaultRole"
      ],
      "Sid": "AllowIAMPassrole"
    }
  ]
}
```
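
If you manage several accounts or use a custom service role, you can generate this policy document instead of editing the JSON by hand. The following is a minimal sketch: `build_execute_notebook_policy` is a hypothetical helper, and the defaults mirror the values in the policy above, including the `aws-cn` partition.

```python
import json


def build_execute_notebook_policy(account_id, role_name="EMR_Notebooks_DefaultRole",
                                  partition="aws-cn"):
    """Return the additional permissions for %execute_notebook as a dict."""
    role_arn = f"arn:{partition}:iam::{account_id}:role/{role_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowELASTICMAPREDUCEStartnotebookexecution",
                "Effect": "Allow",
                "Action": [
                    "elasticmapreduce:StartNotebookExecution",
                    "elasticmapreduce:DescribeNotebookExecution",
                    "ec2:DescribeInstances",
                ],
                "Resource": ["*"],
            },
            {
                "Sid": "AllowIAMPassrole",
                "Effect": "Allow",
                "Action": ["iam:PassRole"],
                # Only the notebook service role can be passed.
                "Resource": [role_arn],
            },
        ],
    }


policy = build_execute_notebook_policy("123456789012")
print(json.dumps(policy, indent=2))
```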


**Note**  
To use the `%execute_notebook` magic, install the `emr-notebooks-magics` package, version 0.2.3 or higher.

# Use multi-language notebooks with Spark kernels
<a name="emr-multi-language-kernels"></a>

Each Jupyter notebook kernel has a default language. For example, the Spark kernel's default language is Scala, and the PySpark kernel's default language is Python. With Amazon EMR 6.4.0 and later, EMR Studio supports multi-language notebooks. This means that each kernel in EMR Studio can support the following languages in addition to the default language: Python, Scala, R, and Spark SQL.

To activate this feature, specify one of the following magic commands at the beginning of any cell.



| Language | Command | 
| --- | --- | 
| Python | `%%pyspark` | 
| Scala | `%%scalaspark` | 
| R | `%%rspark` (not supported for interactive workloads with EMR Serverless) | 
| Spark SQL | `%%sql` | 

When invoked, these commands execute the entire cell within the same Spark session using the interpreter of the corresponding language.

The `%%pyspark` cell magic allows users to write PySpark code in all Spark kernels.

```
%%pyspark
a = 1
```

The `%%sql` cell magic allows users to execute Spark SQL code in all Spark kernels.

```
%%sql
SHOW TABLES
```

The `%%rspark` cell magic allows users to execute SparkR code in all Spark kernels.

```
%%rspark
a <- 1
```

The `%%scalaspark` cell magic allows users to execute Spark Scala code in all Spark kernels.

```
%%scalaspark
val a = 1
```
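
If you generate notebook cells programmatically, the language-to-magic table above maps naturally onto a lookup. The following sketch prefixes a snippet with the matching cell magic. `LANGUAGE_MAGICS` and `with_language_magic` are hypothetical names used only for illustration.

```python
# Mirrors the language and command table in this section.
LANGUAGE_MAGICS = {
    "python": "%%pyspark",
    "scala": "%%scalaspark",
    "r": "%%rspark",
    "spark sql": "%%sql",
}


def with_language_magic(language, code):
    """Return cell text that starts with the cell magic for the language."""
    key = language.strip().lower()
    if key not in LANGUAGE_MAGICS:
        raise ValueError(f"Unsupported language: {language}")
    # The magic must be the first line of the cell.
    return f"{LANGUAGE_MAGICS[key]}\n{code}"


print(with_language_magic("Scala", "val a = 1"))
```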

## Share data across language interpreters with temporary tables
<a name="emr-temp-tables"></a>

You can also share data between language interpreters using temporary tables. The following example uses `%%pyspark` in one cell to create a temporary table in Python and uses `%%scalaspark` in the following cell to read data from that table in Scala.

```
%%pyspark
df=spark.sql("SELECT * from nyc_top_trips_report LIMIT 20")
# create a temporary table called nyc_top_trips_report_view in python
df.createOrReplaceTempView("nyc_top_trips_report_view")
```

```
%%scalaspark
// read the temp table in scala
val df=spark.sql("SELECT * from nyc_top_trips_report_view")
df.show(5)
```