Jupyter Development Tools for Code Linting, Debugging, Testing & Git Version Control


Tools to make Development in Jupyter much more pleasurable

Why?

When most people think of Jupyter, they think of a single tool: the Notebook. That is not quite accurate. Today, Jupyter has a wide ecosystem of tools that make it incredibly powerful.

This post looks at a selection of tools that can make your development experience much better. I’ve tried almost all of them and use several of them personally in my day-to-day data exploration and modeling work.

Readability & Navigation

Code Styling

Black

Black is arguably one of the most opinionated and popular Python auto-formatters. In Jupyter, it helps keep your code readable and consistent, with the same style enforced across all users. The Jupyter Notebook extension is called jupyter-black. It’s easy to use, but you’ll need to install black, which only works on Python 3.6+.

Autopep8

Autopep8 is an auto-formatter based on PEP 8, the Python style guide. The standard is well loved among Pythonistas and autopep8 enforces it. It is part of the Jupyter Contrib NB Extensions bundle: jupyter-autopep8

The contrib extension bundle has a few more nifty utilities. For example, auto-save is quite handy when you’re SSH’ed into a notebook VM over an unstable internet connection.

isort

The isort library sorts and arranges your imports. This makes your code consistent and more readable. You can also combine this with one of the auto-formatters e.g. black or autopep8. This is also part of the contrib-nbextensions: Sort imports using isort

In some cases, the extension throws an error when used from within the notebook. This usually happens with the latest isort release. In those cases, downgrade to isort==4.3.1.

One nice advantage of using these tools is that your diffs won’t be cluttered across commits because these tools will auto-format your code in a consistent manner.

Code Folding

With an IDE, you might collapse certain sections of code so that they’re easier to navigate or scroll past. E.g. you can collapse a function into its definition for faster navigation. You can do this in Jupyter notebooks as well. The keyboard shortcut for this is Alt+F by default. This is also available as part of the Jupyter Contrib NB Extensions bundle: Codefolding.

Debugging

%debug Magic Command

When you see an error, you can run %debug in a new cell to activate the IPython debugger. The standard debugger commands apply, such as c for continue, n for next and q for quit.
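Outside a notebook, the closest standard-library counterpart is pdb.post_mortem, which opens the debugger on a saved traceback. Here is a minimal sketch of what such a post-mortem session looks at: it walks a traceback down to the frame that raised the exception and returns that frame's local variables. The names failing_step and innermost_locals are made up for this illustration.

```python
import sys


def failing_step(values):
    # Hypothetical buggy function: divides by zero when `values` is empty.
    count = len(values)
    return sum(values) / count


def innermost_locals(tb):
    # Follow tb_next to the last traceback entry, i.e. the frame that raised.
    while tb.tb_next is not None:
        tb = tb.tb_next
    frame = tb.tb_frame
    return frame.f_code.co_name, dict(frame.f_locals)


try:
    failing_step([])
except ZeroDivisionError:
    # In an interactive session, pdb.post_mortem(sys.exc_info()[2]) would
    # open the debugger on this same traceback.
    frame_name, frame_locals = innermost_locals(sys.exc_info()[2])
    print(frame_name, frame_locals)
```

These are the same variables you would print by hand after typing %debug, just collected programmatically.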

Variable Inspector

Variable Inspector is a lightweight utility that helps you keep track of all the variables and their types while you’re writing code. While experimenting and actively exploring code, I’d sometimes lose track of whether a variable was a list or a tuple.

Instead of calling type(var), the variable inspector quickly shows me the type & the value of any variable.
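To make the idea concrete, here is a rough sketch of the kind of table such an inspector displays. It is not the extension's actual code; summarize_variables and example_namespace are hypothetical names, and in a notebook you would more likely pass globals() as the namespace.

```python
import types


def summarize_variables(namespace):
    """Return (name, type name, short repr) for each plain variable."""
    rows = []
    for name, value in sorted(namespace.items()):
        # Skip private names, imported modules and callables (functions, classes).
        if name.startswith("_") or isinstance(value, types.ModuleType) or callable(value):
            continue
        rows.append((name, type(value).__name__, repr(value)[:40]))
    return rows


# Stand-in for a notebook's global namespace.
example_namespace = {"points": [1, 2, 3], "shape": (2, 3), "_hidden": 0, "helper": len}

rows = summarize_variables(example_namespace)
for name, type_name, preview in rows:
    print(f"{name:<10}{type_name:<8}{preview}")
```

The list versus tuple question from above is answered at a glance by the second column.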

Testing

With Python, unittest and pytest are the most popular libraries for writing and maintaining tests.

Treon

Treon: This collects both Python unittest and doctest tests within the notebook and runs them. It also executes your notebook, cell by cell, and flags any errors. This makes it quite easy to add to a CI pipeline.
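For reference, this is roughly what a notebook cell containing both kinds of tests might look like. The function normalize and the helper run_checks are invented for this sketch; run_checks only exists so the example can be run on its own with the standard library, and it says nothing about how Treon discovers or runs tests internally.

```python
import doctest
import io
import unittest


def normalize(values):
    """Scale a list of numbers so that they sum to 1.

    >>> normalize([1, 1, 2])
    [0.25, 0.25, 0.5]
    """
    total = sum(values)
    return [v / total for v in values]


class TestNormalize(unittest.TestCase):
    def test_sums_to_one(self):
        self.assertAlmostEqual(sum(normalize([3, 5, 8])), 1.0)


def run_checks():
    # Run the doctest embedded in normalize's docstring.
    parser = doctest.DocTestParser()
    doc_test = parser.get_doctest(
        normalize.__doc__, {"normalize": normalize}, "normalize", None, 0
    )
    doc_result = doctest.DocTestRunner(verbose=False).run(doc_test)

    # Run the unittest test case, keeping the runner's report out of stdout.
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestNormalize)
    unit_result = unittest.TextTestRunner(stream=io.StringIO()).run(suite)

    unit_problems = len(unit_result.failures) + len(unit_result.errors)
    return doc_result.failed, unit_problems


doctest_failures, unittest_problems = run_checks()
```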

nbval

nbval: This Pytest plugin uses a clever approach where the output stored in .ipynb file is used as the ground truth for each cell. Every time you run the test, fresh output is compared against the ground truth stored in the notebook itself. The goal is to ensure that the notebook is behaving as expected and there are no regressions caused by changes to underlying source code.

This is quite neat, since you don’t have to write any tests manually. You can also configure this to work with a CI via pytest.
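The comparison idea can be sketched in a few lines. This is a deliberately simplified illustration, not nbval's implementation: the notebook is a hand-written dict that mimics the .ipynb cell layout, only printed (stdout) output is compared, and find_regressions is a hypothetical name. The second cell's stored output is intentionally stale so that the check has something to flag.

```python
import contextlib
import io

# Simplified stand-in for a parsed .ipynb file (assumption: real notebooks
# carry more fields and may store source/text as lists of lines).
notebook = {
    "cells": [
        {
            "cell_type": "code",
            "source": "x = 2 + 3\nprint(x)",
            "outputs": [{"output_type": "stream", "name": "stdout", "text": "5\n"}],
        },
        {
            "cell_type": "code",
            "source": "print(x * 2)",
            # Stale stored output: the code now prints 10, not 11.
            "outputs": [{"output_type": "stream", "name": "stdout", "text": "11\n"}],
        },
    ]
}


def find_regressions(nb):
    """Re-run each code cell and return indices whose fresh stdout differs."""
    namespace = {}  # shared across cells, like a notebook kernel
    mismatches = []
    for index, cell in enumerate(nb["cells"]):
        if cell["cell_type"] != "code":
            continue
        stored = "".join(
            out["text"] for out in cell["outputs"] if out.get("output_type") == "stream"
        )
        buffer = io.StringIO()
        with contextlib.redirect_stdout(buffer):
            exec(cell["source"], namespace)
        if buffer.getvalue() != stored:
            mismatches.append(index)
    return mismatches


mismatches = find_regressions(notebook)
print(mismatches)
```

With nbval installed, the real check is run through pytest, for example pytest --nbval followed by the notebook path.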

Working with Git

For data scientists coming from non-programming backgrounds, the JupyterLab ecosystem (not classic Jupyter) has two tools which make Git and GitHub workflows easier to navigate. This is much better than naming your file final_final_v12.ipynb and sharing via Slack, email, Google Drive etc. Past versions of your notebook are preserved with git commit.

If you’ve used VS Code/PyCharm, you’ll find that these two extensions combined mean that you don’t have to leave Jupyter and go to a terminal or IDE to perform git operations. This reduced context switching will save you some cognitive load.

The tools for working with Git are:

JupyterLab Git

  • JupyterLab-git: This is a JupyterLab Extension for Git. It provides a GUI for Git operations directly in the JupyterLab interface.
This gif is from JupyterLab-git repository

JupyterLab GitPlus

  • JupyterLab-gitplus: This is a JupyterLab extension for GitHub. It provides a GUI to push commits to GitHub and create pull requests from the JupyterLab UI.

Code Review

So you’ve committed your Jupyter notebook and pushed it to GitHub. How do you review notebook changes?

Say someone else has made a few changes to your Jupyter notebook. How do you merge these changes into your own notebook?

With Python files, you’d use your usual git diff to review changes and git merge to perform an automatic merge. Merge conflicts, if any, can be resolved manually.

This does not work well for notebooks because each .ipynb notebook is stored as a rich JSON document. The git diff for Jupyter Notebooks is horrible to review, and manually merging large JSON files is beyond human capability.

How do we solve this? With nbdime & ReviewNB!

nbdime

nbdime provides tools for notebook diff and merge. Identical to git diff in spirit, it makes an important change over git diff: It goes cell by cell instead of line by line. This means each diff is calculated on a Jupyter cell - which is now human readable!
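The cell-by-cell idea can be sketched with the standard library. This is only an illustration of diffing at cell granularity, not nbdime's algorithm; the two notebooks are hand-written dicts that keep just the cell sources, and diff_cells is a hypothetical helper.

```python
import difflib


def cell_sources(nb):
    # Treat each cell's source as one indivisible unit of comparison.
    return [cell["source"] for cell in nb["cells"]]


def diff_cells(old_nb, new_nb):
    """Return (operation, old cells, new cells) for every changed region."""
    old, new = cell_sources(old_nb), cell_sources(new_nb)
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            changes.append((tag, old[i1:i2], new[j1:j2]))
    return changes


old_nb = {"cells": [
    {"source": "import math"},
    {"source": "r = 2"},
    {"source": "print(math.pi * r)"},
]}
new_nb = {"cells": [
    {"source": "import math"},
    {"source": "r = 3"},
    {"source": "print(math.pi * r)"},
    {"source": "print(r)"},
]}

changes = diff_cells(old_nb, new_nb)
for change in changes:
    print(change)
```

The result reports one replaced cell and one inserted cell, with none of the surrounding JSON noise. nbdime itself ships command-line tools such as nbdiff and nbdiff-web for doing this on real notebooks.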

But for code reviews - nbdime has two key limitations:

  1. You cannot comment on diffs
  2. You can only view diffs locally, in your terminal or browser, for repositories you’ve cloned

ReviewNB

ReviewNB solves both: diffs and discussion for notebook changes in one go.

The ReviewNB diff is also richer, as it goes line by line within each cell, for both code and outputs.

You can review all the changes in the notebook - with a workflow identical to that of GitHub. First, you make all your comments on a PR and then when your review is done - ReviewNB pushes them to GitHub.

You’ll notice that on GitHub, the comment is attached to the start of a cell in the JSON. The tool adds a link back to the comment in ReviewNB, so that you can easily reply to it while looking at the code cell.

About Author

Nirant has been doing Machine Learning for over 5 years, in addition to being a published author.

Summary

We went through 2-3 tools across each important development task: from writing code to testing, debugging and then working with Git and finally code review.

This table maps some of the tooling which we discussed and the corresponding Notebook library or extension.

Function | Code | Notebook
--- | --- | ---
Code Styling | Black, isort, autopep8 | jupyter-black
Code Completion | GitHub Copilot, TabNine, Kite | TabNine for Jupyter, Kite for JupyterLab
Git commit, push | Git terminal commands | JupyterLab-git
GitHub push commits, raise a PR | VS Code/IDE | JupyterLab-gitplus
Diff and Merge | git diff | nbdime
Code Review | GitHub code review in PR | ReviewNB
Unit Tests | Python unittest | Treon
Pytest | Python pytest | nbval