Jupyter Development Tools for Code Linting, Debugging, Testing & Git Version Control
Tools to make Development in Jupyter much more pleasurable
Why?
When most people think of Jupyter, they think of a single tool: the Notebook. That is not quite accurate. Today, Jupyter has a wide ecosystem of tools which make it incredibly powerful.
We take a look at select tools which can make your development experience much better. I’ve tried almost all of them and use several of them personally in my day to day data exploration and modeling work.
Readability & Navigation
Code Styling
Black
Black is arguably among the most opinionated and popular Python auto-formatters. In Jupyter, it helps keep your code readable and consistent, with the same style enforced across all users. The Jupyter Notebook extension is called jupyter-black. It’s easy to use, but you’ll need to install black, which now only works on Python 3.6+.
Autopep8
Autopep8 is an auto-formatter based on PEP 8, the Python style guide. The standard is well loved among Pythonistas, and autopep8 enforces it. It is part of the Jupyter Contrib NB Extensions bundle: jupyter-autopep8
The contrib extension bundle has a few more nifty utilities. E.g. Auto-save is quite handy when you’re SSH’ed into a notebook VM with an unstable internet connection.
isort
The isort library sorts and arranges your imports. This makes your code consistent and more readable. You can also combine this with one of the auto-formatters e.g. black or autopep8. This is also part of the contrib-nbextensions: Sort imports using isort
In some cases, the extension throws an error when used from within the notebook. This is usually with the latest isort release. In those cases, downgrade to isort==4.3.1 instead.
One nice advantage of using these tools is that your diffs won’t be cluttered across commits because these tools will auto-format your code in a consistent manner.
Code Folding
With an IDE, you might collapse certain sections of code so that they’re easier to navigate or scroll past. E.g. you can collapse a function into its definition for faster navigation. You can do this in Jupyter notebooks as well. The keyboard shortcut for this is Alt+F by default. This is also available as part of the Jupyter Contrib NB Extensions bundle: Codefolding.
Debugging
%debug Magic Command
When you see an error, you can run %debug in a new cell to activate the IPython debugger. The standard debugger commands apply: c for continue, n for next, and q for quit.
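A minimal sketch of the situation where %debug helps, using a hypothetical `mean` function that fails on an empty list. In a notebook you would simply let the error surface and run %debug in the next cell. The snippet below only inspects the traceback so that it can run non-interactively, and the mention of `pdb.post_mortem` is an assumption about how you might get a similar session outside IPython.

```python
import traceback


def mean(values):
    """Hypothetical helper: raises ZeroDivisionError on an empty list."""
    return sum(values) / len(values)


try:
    mean([])
except ZeroDivisionError as exc:
    # In a notebook, the traceback would be printed at this point. Running
    # `%debug` in a new cell then opens the debugger in the frame that raised.
    # Outside IPython, pdb.post_mortem(exc.__traceback__) plays a similar role.
    frames = [frame.name for frame in traceback.extract_tb(exc.__traceback__)]
    print("Innermost failing frame:", frames[-1])
```

Once inside the debugger, you can print `values` to see what the function received, then use c, n or q as described above.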
Variable Inspector
Variable Inspector is a lightweight utility which helps you keep track of all the variables and their types while you’re writing code. While experimenting and actively exploring code, I’d sometimes lose track of whether a variable was a list or a tuple. Instead of calling type(var), the variable inspector quickly shows me the type and the value of any variable.
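To make the idea concrete, here is a rough sketch of the kind of table the inspector shows. This is not the extension's implementation: `summarize_variables` and the `session` dictionary are hypothetical names used only for illustration.

```python
def summarize_variables(namespace):
    """Return (name, type name, repr) rows for the variables in a namespace."""
    rows = []
    for name, value in sorted(namespace.items()):
        if name.startswith("_"):
            continue  # skip private and interpreter-internal names
        rows.append((name, type(value).__name__, repr(value)))
    return rows


# Hypothetical variables from an exploration session.
session = {"ids": [1, 2, 3], "shape": (3, 2), "_hidden": None}

for name, type_name, value in summarize_variables(session):
    print(f"{name:<8}{type_name:<8}{value}")
```

In a live notebook you could pass `globals()` instead of `session`, although that would also list imported modules and functions.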
Testing
With Python, unittest and pytest are the most popular libraries for writing and maintaining tests.
Treon
Treon: This collects both Python unittest tests and doctests within the notebook and runs them. It also executes your notebook cell by cell and flags any errors. This makes it quite easy to add to a CI pipeline.
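As an illustration, a notebook cell containing the two kinds of tests mentioned above might look like the sketch below. The `normalize` function and `NormalizeTest` class are hypothetical. Only the standard library is used here, and how Treon discovers and reports these tests is left to its own documentation.

```python
import unittest


def normalize(text):
    """Strip surrounding whitespace and lower-case the text.

    >>> normalize("  Hello ")
    'hello'
    """
    return text.strip().lower()


class NormalizeTest(unittest.TestCase):
    """A unittest-style test living next to the code in the same notebook."""

    def test_empty_string(self):
        self.assertEqual(normalize(""), "")
```

The docstring example is a doctest and the class is a regular unittest test case, so both styles can sit in ordinary code cells.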
nbval
nbval: This Pytest plugin uses a clever approach where the output stored in .ipynb file is used as the ground truth for each cell. Every time you run the test, fresh output is compared against the ground truth stored in the notebook itself. The goal is to ensure that the notebook is behaving as expected and there are no regressions caused by changes to underlying source code.
This is quite neat, since you don’t have to write any tests manually. You can also configure this to work with a CI via pytest.
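The stored-output-as-ground-truth idea can be sketched with the standard library alone. This is a simplified stand-in, not nbval's actual implementation: the `cells` list is a hypothetical in-memory substitute for an .ipynb file, and only printed output is compared.

```python
import contextlib
import io

# Hypothetical stand-in for notebook cells: each entry holds the cell source
# and the output that was stored when the notebook was last saved.
cells = [
    {"source": "print(1 + 1)", "stored_output": "2\n"},
    {"source": "print('a' * 3)", "stored_output": "aaa\n"},
]


def run_cell(source, namespace):
    """Execute one cell and return whatever it printed."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(source, namespace)
    return buffer.getvalue()


def find_regressions(cells):
    """Return indices of cells whose fresh output differs from the stored one."""
    namespace = {}  # shared across cells, like a notebook kernel
    return [
        index
        for index, cell in enumerate(cells)
        if run_cell(cell["source"], namespace) != cell["stored_output"]
    ]


print(find_regressions(cells))
```

With the plugin itself installed, the check is run through pytest, for example `pytest --nbval` followed by the notebook path.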
Working with Git
For data scientists coming from non-programming backgrounds, the JupyterLab ecosystem (not classic Jupyter) has two tools which make Git and GitHub workflows easier to navigate. This is much better than naming your file final_final_v12.ipynb and sharing it via Slack, email, Google Drive etc. Past versions of your notebook are preserved with git commit.
If you’ve used VS Code/PyCharm, you’ll find that these two extensions combined mean that you don’t have to leave Jupyter and go to a terminal or IDE to perform git operations. This reduced context switching will save you some cognitive load.
The tools for working with Git are:
JupyterLab Git
- JupyterLab-git: This is a JupyterLab Extension for Git. It provides a GUI for Git operations directly in the JupyterLab interface.
JupyterLab GitPlus
- JupyterLab-gitplus: JupyterLab extension for GitHub. It provides GUI to push GitHub commits & create pull requests from JupyterLab UI.
Code Review
So you’ve committed your Jupyter notebook and pushed it to GitHub. How do you review notebook changes?
Say someone else has made a few changes to your Jupyter notebook. How do you merge these changes into your own notebook?
With Python files, you’d use your usual git diff to review changes and git merge to perform an automatic merge. Merge conflicts, if any, can be resolved manually.
This does not work well for notebooks because each .ipynb notebook is stored as rich JSON. The git diff for Jupyter Notebooks is horrible to review, and manually merging large JSON files is beyond human capability.
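To see why a line-based diff gets noisy, consider this sketch. It builds two versions of a hypothetical one-cell notebook (the `make_notebook` helper and its field values are illustrative, trimmed down from the real notebook format) and diffs their JSON line by line, roughly as git diff would.

```python
import difflib
import json


def make_notebook(source, output, execution_count):
    """Build a minimal, hypothetical one-cell notebook as a dict."""
    return {
        "cells": [
            {
                "cell_type": "code",
                "execution_count": execution_count,
                "metadata": {},
                "outputs": [
                    {"name": "stdout", "output_type": "stream", "text": [output]}
                ],
                "source": [source],
            }
        ],
        "metadata": {},
        "nbformat": 4,
        "nbformat_minor": 4,
    }


before = json.dumps(make_notebook("print(1 + 1)", "2\n", 1), indent=1).splitlines()
after = json.dumps(make_notebook("print(1 + 2)", "3\n", 7), indent=1).splitlines()

# Keep only added/removed lines from a line-based diff of the raw JSON.
changed = [
    line
    for line in difflib.unified_diff(before, after, lineterm="", n=0)
    if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
]
print("\n".join(changed))
```

A one-character edit to the code also changes the execution count and the stored output, so the diff touches three separate places in the JSON. Real notebooks with images or large outputs are far noisier.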
How do we solve this? With nbdime & ReviewNB!
nbdime
nbdime provides tools for notebook diff and merge. Identical to git diff in spirit, it makes an important change over git diff: It goes cell by cell instead of line by line. This means each diff is calculated on a Jupyter cell - which is now human readable!
But for code reviews - nbdime has two key limitations:
- You can not comment on diffs
- You can only view the diffs locally on your terminal or browser for repositories you’ve cloned locally
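The cell-by-cell idea behind nbdime can be sketched roughly as follows. This is an illustration built on the standard library, not nbdime's algorithm: `diff_cells` is a hypothetical helper that pairs cells by position and ignores added or removed cells and outputs.

```python
import difflib


def diff_cells(old_cells, new_cells):
    """Compare two notebooks cell by cell and report changed source lines."""
    report = []
    for index, (old, new) in enumerate(zip(old_cells, new_cells)):
        if old == new:
            continue  # unchanged cells produce no diff at all
        delta = difflib.ndiff(old.splitlines(), new.splitlines())
        report.append((index, [line for line in delta if line[0] in "+-"]))
    return report


# Hypothetical cell sources from two versions of a notebook.
old_cells = ["import math", "radius = 2\nprint(math.pi * radius)"]
new_cells = ["import math", "radius = 2\nprint(math.pi * radius ** 2)"]

for index, lines in diff_cells(old_cells, new_cells):
    print(f"cell {index}:")
    print("\n".join(lines))
```

Only the edited cell shows up, with none of the surrounding JSON. nbdime itself ships command-line tools such as nbdiff, nbdiff-web and nbmerge for the real thing.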
ReviewNB
ReviewNB solves both: diffs and discussion for notebook changes in one go.
The ReviewNB diff is also richer, as it goes line by line both within each cell's source and within its outputs.
You can review all the changes in the notebook - with a workflow identical to that of GitHub. First, you make all your comments on a PR and then when your review is done - ReviewNB pushes them to GitHub.
You’ll notice that on GitHub, the comment is against a cell start in JSON. The tool adds a nice link to the comment in ReviewNB - so that you can reply to the same easily while looking at the code cell.
About Author
Nirant has been doing Machine Learning for over 5 years, in addition to being a published author.
Summary
We went through 2-3 tools across each important development task: from writing code to testing, debugging and then working with Git and finally code review.
This table maps some of the tooling which we discussed and the corresponding Notebook library or extension.
Function | Code | Notebook |
---|---|---|
Code Styling | Black, isort, autopep8 | jupyter-black |
Code Completion | Github Copilot, TabNine, Kite | TabNine for Jupyter, Kite for JupyterLab |
Git commit, push | Git - Terminal Commands | JupyterLab-git |
Github push commits, raise a PR | VS Code/IDE | JupyterLab-gitplus |
Diff and Merge | Git Diff | NBDime |
Code Review | Github Code Review in PR | ReviewNB |
Unit Tests | Python unittest | Treon |
Pytest | Python Pytest | nbval |