Configuring persistence for notebooks in Amazon S3

You can configure a JupyterHub cluster in Amazon EMR so that notebooks saved by a user persist in Amazon S3, outside of ephemeral storage on cluster EC2 instances.

You specify Amazon S3 persistence using the jupyter-s3-conf configuration classification when you create a cluster. For more information, see Configure applications.

In addition to enabling Amazon S3 persistence using the s3.persistence.enabled property, you specify the Amazon S3 bucket where notebooks are saved using the s3.persistence.bucket property. Notebooks for each user are saved to a jupyter/jupyterhub-user-name folder in the specified bucket. The bucket must already exist in Amazon S3, and the role for the EC2 instance profile that you specify when you create the cluster (by default, EMR_EC2_DefaultRole) must have permissions to the bucket. For more information, see Configure IAM roles for Amazon EMR permissions to Amazon services.
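
As a rough illustration of the folder layout described above, the following Python sketch composes the location where one user's notebooks would be saved. The function name user_notebook_location is hypothetical and is not part of any Amazon EMR or JupyterHub API, and the bucket name MyJupyterBackups and user name diego are illustrative values only.

```python
def user_notebook_location(bucket: str, username: str) -> str:
    """Compose the S3 folder for a user's notebooks.

    Hypothetical helper: it only mirrors the documented layout
    s3://<bucket>/jupyter/<jupyterhub-user-name> and does not call AWS.
    """
    if not bucket or not username:
        raise ValueError("bucket and username must be non-empty")
    return f"s3://{bucket}/jupyter/{username}"


# Illustrative values, not real resources.
print(user_notebook_location("MyJupyterBackups", "diego"))
```

The sketch performs only string formatting. It does not check that the bucket exists or that the instance profile role has permissions to it.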

When you launch a new cluster using the same configuration classification properties, users can open notebooks with the content from the saved location.

Note that if you import files as modules in a notebook while Amazon S3 persistence is enabled, the files are uploaded to Amazon S3. If you import files without Amazon S3 persistence enabled, they are uploaded to your JupyterHub container.

The following example enables Amazon S3 persistence. Each user's notebooks are saved in the s3://MyJupyterBackups/jupyter/jupyterhub-user-name folder, where jupyterhub-user-name is a user name, such as diego.


[
    {
        "Classification": "jupyter-s3-conf",
        "Properties": {
            "s3.persistence.enabled": "true",
            "s3.persistence.bucket": "MyJupyterBackups"
        }
    }
]
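
If you generate cluster configurations from a script, the same classification can be built programmatically. The following Python sketch is one possible approach, using only the standard library. The function name jupyter_s3_configuration is hypothetical, and the property values are written as strings to match the example above.

```python
import json


def jupyter_s3_configuration(bucket: str, enabled: bool = True) -> list:
    """Build the jupyter-s3-conf classification as a Python list.

    Hypothetical helper: it reproduces the JSON structure shown in the
    example above and does not contact AWS.
    """
    return [
        {
            "Classification": "jupyter-s3-conf",
            "Properties": {
                # Property values are strings, as in the JSON example.
                "s3.persistence.enabled": "true" if enabled else "false",
                "s3.persistence.bucket": bucket,
            },
        }
    ]


# Illustrative bucket name; the bucket must already exist in Amazon S3.
config = jupyter_s3_configuration("MyJupyterBackups")
print(json.dumps(config, indent=4))
```

The printed JSON could be saved to a file and supplied when you create the cluster, for example through the --configurations option of the aws emr create-cluster command. How you pass it depends on the tooling you use to create the cluster.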
