JupyterHub
Sparkmagic is a library of kernels that allows Jupyter notebooks to interact with Apache Spark.
The following diagram depicts the components of JupyterHub on Amazon EMR with corresponding authentication methods for notebook users and the administrator. For more information, see Adding Jupyter Notebook users and administrators.
The following table lists the version of JupyterHub included in the latest release of the Amazon EMR 7.x series, along with the components that Amazon EMR installs with JupyterHub.
For the version of components installed with JupyterHub in this release, see Release 7.6.0 Component Versions.
Amazon EMR Release Label | JupyterHub Version | Components Installed With JupyterHub |
---|---|---|
emr-7.6.0 | JupyterHub 1.5.0 | emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hudi, hudi-spark, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub |
The following table lists the version of JupyterHub included in the latest release of the Amazon EMR 6.x series, along with the components that Amazon EMR installs with JupyterHub.
For the version of components installed with JupyterHub in this release, see Release 6.15.0 Component Versions.
Amazon EMR Release Label | JupyterHub Version | Components Installed With JupyterHub |
---|---|---|
emr-6.15.0 | JupyterHub 1.5.0 | aws-sagemaker-spark-sdk, emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hudi, hudi-spark, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub |
The following table lists the version of JupyterHub included in the latest release of the Amazon EMR 5.x series, along with the components that Amazon EMR installs with JupyterHub.
For the version of components installed with JupyterHub in this release, see Release 5.36.2 Component Versions.
Amazon EMR Release Label | JupyterHub Version | Components Installed With JupyterHub |
---|---|---|
emr-5.36.2 | JupyterHub 1.4.1 | aws-sagemaker-spark-sdk, emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hudi, hudi-spark, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub |
The Python 3 kernel included with JupyterHub on Amazon EMR is 3.6.4.
The libraries installed within the jupyterhub container may vary between Amazon EMR release versions and Amazon EC2 AMI versions.
To list installed libraries using conda
Run the following command on the master node command line:
sudo docker exec jupyterhub bash -c "conda list"
To list installed libraries using pip
Run the following command on the master node command line:
sudo docker exec jupyterhub bash -c "pip freeze"
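Because the installed libraries can vary between releases, it can be useful to compare the listings from two clusters. The following is a minimal sketch, not something Amazon EMR provides: `freeze_diff` is a hypothetical helper name, and `freeze-a.txt` and `freeze-b.txt` are placeholder file names. It assumes each file holds saved `pip freeze` output, one `name==version` entry per line.

```shell
# Hypothetical helper (not part of Amazon EMR): show entries that differ
# between two saved `pip freeze` listings.
freeze_diff() {
  a_sorted=$(mktemp)
  b_sorted=$(mktemp)
  # comm expects sorted input; LC_ALL=C keeps the ordering consistent.
  LC_ALL=C sort "$1" > "$a_sorted"
  LC_ALL=C sort "$2" > "$b_sorted"
  # -3 hides lines common to both files, leaving only the differences.
  LC_ALL=C comm -3 "$a_sorted" "$b_sorted"
  rm -f "$a_sorted" "$b_sorted"
}

# Assumed workflow (file names are placeholders):
#   On each cluster's master node, save the container's package list:
#     sudo docker exec jupyterhub bash -c "pip freeze" > freeze-a.txt
#   With both files copied to one machine:
#     freeze_diff freeze-a.txt freeze-b.txt
```

In the output, entries found only in the first file are printed without indentation, and entries found only in the second file are indented with a tab. A package whose version differs between the two clusters therefore appears once in each column.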
Topics
- Create a cluster with JupyterHub
- Considerations when using JupyterHub on Amazon EMR
- Configuring JupyterHub
- Configuring persistence for notebooks in Amazon S3
- Connecting to the master node and Notebook servers
- JupyterHub configuration and administration
- Adding Jupyter Notebook users and administrators
- Installing additional kernels and libraries
- JupyterHub release history