Top 10 tips for working efficiently with Jupyter Notebook
Jupyter Notebook is a powerful tool used by data scientists, researchers, and analysts to write, visualize, and share their code and findings. While it’s a user-friendly platform, there are plenty of tips and tricks that can help you make the most out of your Jupyter Notebook experience.
In this post, we’ll take a look at some tips that can help you become more efficient and productive when using Jupyter Notebook, whether you’re a notebook newb or an experienced user.
What is the difference between Jupyter, JupyterLab, and JupyterHub?
Jupyter is a non-profit, open-source project that builds interactive tooling for science and analytics. Jupyter is the overarching project whereas JupyterLab and JupyterHub are both Jupyter subprojects.
JupyterLab is an IDE that supports Jupyter notebooks, code, data, and images. Web-based and desktop versions of JupyterLab are available.
JupyterHub is an open-source multiuser platform for Jupyter notebooks.
How to version control Jupyter notebooks?
Jupyter notebooks can be version controlled using the same version-control systems used for software development. Some widely used online version control platforms are GitHub, GitLab, and Bitbucket.
These platforms host your notebooks and store the changes you make to them over time. They provide tooling to compare versions of your notebooks, return to older versions, and collaboratively review notebook changes.
These platforms have certain challenges (notebook diffs, commenting) when it comes to version control for notebooks. You can visit our blog post where we talk about the notebook version control problem & suggest solutions.
How do I use Docker with Jupyter notebooks?
You can run Jupyter notebooks using Docker containers instead of installing Jupyter directly on your local machine or a cloud compute instance. Using Docker ensures that you are using the same versions of underlying dependencies and Python modules to provide consistent behavior across different environments.
The Jupyter project provides a suite of Jupyter-related Docker images that come preinstalled with dependencies that suit different use cases.
To run Jupyter notebooks using Docker:
- Build or download the Docker container you need. For example:
docker pull jupyter/minimal-notebook
- Run the container, exposing the Jupyter server port. For example:
docker run -p 8008:8888 jupyter/minimal-notebook
- In your browser, navigate to the exposed port on your local machine. For example:
The token can be found on your Jupyter console.
For step-by-step guidance on how to run Jupyter notebooks using Docker, you can follow the instructions provided in these articles:
- How to Run Jupyter Notebook on Docker.
- Tutorial: Running a Dockerized Jupyter Server for Data Science.
What are some good online Jupyter Notebook platforms?
Jupyter is an open-source project that can be set up for multi-user collaboration via Git / GitHub.
There are many commercial online Jupyter Notebook platforms as well. Here are a few options:
- Microsoft provides notebooks as part of their Azure Machine Learning Studio.
- AWS provides notebooks as part of its machine learning platform, Amazon SageMaker.
- Google Cloud provides notebooks as part of its machine learning platform, Vertex AI Workbench.
- Google also provides a more lightweight, free, non-enterprise notebook platform, Colaboratory, which forms part of Google Suite.
- Deepnote & Notable are newer Jupyter Notebook platforms for teams.
- JetBrains offers a suite of online data developer tools, which includes support for working with notebooks.
- CoCalc provides notebooks as part of their platform for collaborative computational work.
- Databricks is a data warehouse platform that also offers data processing tooling, including Jupyter notebooks.
- Qubole is a data processing platform for data lakes that also offers data processing tooling, including Jupyter notebooks.
- Binder is an online platform for turning public Git repositories of notebook code into sharable, executable Jupyter notebooks.
Why would I convert my notebooks to Python modules? And how?
There are a few benefits to converting your notebooks to modules:
- Code in modules can be imported to multiple notebooks across different projects, so you can avoid the error-prone practice of writing the same code snippets repeatedly in different notebooks.
- Code in modules can be tested. You can test your code by modularizing it (breaking it into smaller parts) and using tools like Python’s pytest to test each component. This leads to more reliable code.
- Modules are easier to collaborate on because version control tools like GitHub are optimized to work with code in modules.
- Modules can be bundled into Python packages, which can be version controlled and distributed through online package repositories.
If I decide to turn my Jupyter notebooks into modules, how would I go about it?
The simplest way to turn notebooks into modules is to take the Python code in your notebook and put it into a .py file. A convenient way to start is to clear your notebook’s output, then download it as a .py file using the Jupyter notebook toolbar option File > Download as > Python (.py).
You’ll need to go through the .py file, clean it up, and ensure that the code is grouped into meaningful functions that can be imported from the module.
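As a rough illustration of what such an export starts from, the sketch below pulls the code cells out of a notebook using only the standard library. It assumes the usual .ipynb layout (a JSON document with a top-level "cells" list whose entries carry "cell_type" and "source"); the function name and the sample notebook are made up for this example, and the result still needs the manual cleanup described above.

```python
import json


def notebook_code_to_source(notebook_json: str) -> str:
    """Join the source of every code cell in an .ipynb JSON string."""
    notebook = json.loads(notebook_json)
    chunks = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # skip markdown and raw cells
        source = cell.get("source", "")
        # "source" may be stored as one string or as a list of lines
        if isinstance(source, list):
            source = "".join(source)
        if source.strip():
            chunks.append(source.rstrip("\n"))
    return "\n\n".join(chunks) + "\n"


# Hypothetical two-cell notebook, just to exercise the function
sample = json.dumps({
    "cells": [
        {"cell_type": "markdown", "source": ["# Title"]},
        {"cell_type": "code", "source": ["def double(x):\n", "    return 2 * x"]},
    ]
})
module_source = notebook_code_to_source(sample)
```

The returned string could then be written to a file of your choosing (for example, a hypothetical my_module.py) and refactored into functions from there.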
How can I test Jupyter notebook code?
Some of the libraries available to test notebooks:
- treon runs notebooks end to end and flags execution errors. treon also runs unit tests present in the notebook.
- nbmake is a pytest plugin that also runs notebooks end to end and flags execution errors. It has a particular focus on Jupyter notebooks used for documentation.
- nbval is a pytest plugin that compares the output of a notebook run with the output stored in the notebook.
- testbook tests components of your notebook via a separate .py file containing unit tests, which are run using pytest.
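To make the unit-testing idea concrete, here is a minimal sketch of a pytest-style test. The clean_column_names helper is a hypothetical function of the kind you might move out of a notebook; it is not part of any of the libraries listed above.

```python
def clean_column_names(names):
    """Lowercase column names and replace spaces with underscores."""
    return [name.strip().lower().replace(" ", "_") for name in names]


def test_clean_column_names():
    # pytest collects functions whose names start with "test_"
    cleaned = clean_column_names([" Order ID", "Total Price "])
    assert cleaned == ["order_id", "total_price"]
```

Saved in a file such as test_cleaning.py (a hypothetical name), this test would be discovered and run by the pytest command.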
How do you import and use Plotly in Jupyter notebooks?
Plotly provides a Python package with two methods of creating plots:
- Plotly Express - a high-level interface for making plots. Plotly Express can be imported using:
import plotly.express as px
- Plotly graph objects - an interface to the lower-level Plotly graph components. Plotly graph objects can be imported using:
import plotly.graph_objects as go
How do you split cells in Jupyter notebooks?
You can split a Jupyter notebook cell into two by positioning your cursor at the location you want the split to occur, then you can either:
- Use the keyboard shortcut Ctrl + Shift + - on Windows or Cmd + Shift + - on Mac
- Use the toolbar, navigating to
How to schedule Jupyter notebooks?
The simplest way to schedule a Jupyter notebook is to convert it to a .py file and schedule it using a system method like cron or the Windows Task Scheduler.
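The conversion step can itself be scripted. The sketch below builds the jupyter nbconvert command that exports a notebook as a script; it assumes nbconvert is installed, and the helper names, the report.ipynb file name, and the crontab line in the comment are all hypothetical.

```python
import subprocess


def nbconvert_to_script_command(notebook_path: str) -> list[str]:
    """Build the nbconvert command that exports a notebook as a script."""
    return ["jupyter", "nbconvert", "--to", "script", notebook_path]


def convert_notebook(notebook_path: str) -> None:
    """Run the conversion; raises CalledProcessError if nbconvert fails."""
    subprocess.run(nbconvert_to_script_command(notebook_path), check=True)


# Hypothetical crontab entry that would run the exported script daily at 06:00:
# 0 6 * * * python /path/to/report.py
command = nbconvert_to_script_command("report.ipynb")
```

Calling convert_notebook("report.ipynb") would write the exported script next to the notebook, ready to be picked up by whichever scheduler you use.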
If you would prefer to schedule jobs with a user interface, you can try Jupyter scheduler, a JupyterLab extension that lets you run Jupyter notebooks on a schedule. Another option is Notebooker, a lightweight web application that lets you schedule notebooks.
Alternatively, many online data science or machine learning platforms that offer Jupyter notebooks also offer notebook scheduling. For example:
How to share Jupyter notebooks with others?
You can share your Jupyter notebook with others in a few ways.
One simple way is to send them the .ipynb notebook file, which they can then run themselves.
Uploading your notebook to a version control platform like GitHub gives you a few more options to share your notebook:
- You can directly share a link to the notebook in the GitHub repository.
- You can use the GitHub link on NBViewer to share a static version of your notebook.
- You can use the GitHub link on Binder to share an executable version of your notebook.
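For the NBViewer and Binder options, the share links can be assembled from the GitHub coordinates of the notebook. The sketch below is based on the URL patterns those services appear to generate at the time of writing, which is an assumption; the helper names and the repository details are hypothetical, and pasting the GitHub link into each site's form remains the safer route if the patterns change.

```python
from urllib.parse import quote


def nbviewer_url(user: str, repo: str, branch: str, path: str) -> str:
    """Static rendering of a notebook hosted on GitHub (assumed URL pattern)."""
    return f"https://nbviewer.org/github/{user}/{repo}/blob/{branch}/{quote(path)}"


def binder_url(user: str, repo: str, ref: str, path: str) -> str:
    """Executable session that opens the notebook (assumed URL pattern)."""
    # The notebook path goes in a query parameter, so "/" is percent-encoded
    encoded_path = quote(path, safe="")
    return f"https://mybinder.org/v2/gh/{user}/{repo}/{ref}?labpath={encoded_path}"


# Hypothetical repository, for illustration only
static_link = nbviewer_url("octocat", "demo", "main", "notebooks/analysis.ipynb")
live_link = binder_url("octocat", "demo", "main", "notebooks/analysis.ipynb")
```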
Alternatively, you can share your notebook on one of the many online Jupyter Notebook platforms, such as Azure Machine Learning Studio, Amazon SageMaker, Vertex AI Workbench, Deepnote, Notable, or JetBrains.
How to code review Jupyter notebooks?
If you’re looking for a way to enhance your collaboration and code review process for Jupyter notebooks, ReviewNB is an excellent tool to consider.
It provides visual diffs for Jupyter notebooks, making it easy to track changes made to a notebook by multiple users over time. With ReviewNB, you can quickly review diffs between notebook versions, comment on changes, and collaborate with your peers to improve your Jupyter Notebook projects.
In conclusion, Jupyter Notebook is a powerful tool for data analysis, research, and development. We hope our tips on getting the most out of your Jupyter Notebook experience will help you to streamline your workflows, increase productivity, and collaborate more efficiently.
Try ReviewNB to help you create compelling and interactive data analysis projects while improving your collaborative workflow.