Rich Diffs for Jupyter Commits & Pull Requests
Version control is one of the major challenges with Jupyter Notebooks. You can use git to version control notebooks, but it’s hard to review notebook diffs, i.e. to see what changed from one notebook version to another. The issue stems from the fact that Jupyter uses JSON underneath & stores rich media (HTML, images) in the JSON itself. This kind of hybrid format is not well supported in git. Hence git diffs for Jupyter Notebooks are hard to review & resolving merge conflicts is a source of pain. We’re going to look at two tools that help solve this problem: nbdime & ReviewNB.
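To make the JSON problem concrete, here is a minimal sketch in Python using only the standard library. The notebook dicts are hand-built stand-ins for real .ipynb files, the base64 strings are shortened placeholders, and the helper names (make_notebook, changed_lines) are made up for this illustration. It compares two versions of a one-cell notebook the way a line-based diff would.

```python
import difflib
import json


def make_notebook(source, image_b64, execution_count):
    """Build a minimal nbformat-4-style notebook dict with one code cell.

    Hypothetical helper for illustration; real notebooks carry more metadata
    and image payloads that run to thousands of characters.
    """
    return {
        "cells": [
            {
                "cell_type": "code",
                "execution_count": execution_count,
                "metadata": {},
                "outputs": [
                    {
                        "output_type": "display_data",
                        "data": {"image/png": image_b64},
                        "metadata": {},
                    }
                ],
                "source": [source],
            }
        ],
        "metadata": {},
        "nbformat": 4,
        "nbformat_minor": 4,
    }


def changed_lines(old, new):
    """Return the added/removed lines of a line-based diff of the raw JSON."""
    old_text = json.dumps(old, indent=1, sort_keys=True).splitlines()
    new_text = json.dumps(new, indent=1, sort_keys=True).splitlines()
    diff = difflib.unified_diff(old_text, new_text, lineterm="", n=0)
    return [
        line
        for line in diff
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    ]


# One small code edit, followed by re-running the cell.
before = make_notebook("plt.plot(x, y)", "iVBORw0KGgo_OLD_PLACEHOLDER", 1)
after = make_notebook("plt.plot(x, y, color='red')", "iVBORw0KGgo_NEW_PLACEHOLDER", 2)

changes = changed_lines(before, after)
for line in changes:
    print(line)
```

Only two of the six changed lines are the code edit itself. The rest are the execution counter and the re-rendered image, and in a real notebook the image lines are long, unreadable base64 blobs.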
Disclaimer: I’m the founder of ReviewNB but this is an objective, factual review of both nbdime & ReviewNB.
Please note, there are some other approaches to work around notebook version control. They focus on converting notebooks to .py files (e.g. jupytext) or stripping output from notebooks (e.g. nbstripout). While these might be suitable for some, they take away the most useful feature of notebooks (embedded images, widgets, graphs) from version control, and subsequently from the diff & review process. Hence, we are going to stick to nbdime & ReviewNB, which actually work with the notebook format itself.
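As a rough illustration of what output stripping means in practice, the sketch below clears the outputs of a hand-built notebook dict. This is not how nbstripout is implemented; strip_outputs is a hypothetical helper, and the notebook content is a placeholder.

```python
import copy


def strip_outputs(notebook):
    """Return a copy of a notebook dict with code-cell outputs removed.

    Hypothetical helper: it only mimics the general idea of output stripping.
    """
    stripped = copy.deepcopy(notebook)
    for cell in stripped.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return stripped


# Placeholder notebook with one executed cell that produced a plot.
notebook = {
    "cells": [
        {
            "cell_type": "code",
            "execution_count": 3,
            "metadata": {},
            "outputs": [
                {
                    "output_type": "display_data",
                    "data": {"image/png": "iVBORw0KGgo_PLACEHOLDER"},
                    "metadata": {},
                }
            ],
            "source": ["plt.plot(x, y)"],
        }
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 4,
}

stripped = strip_outputs(notebook)
print(stripped["cells"][0]["source"])   # the code survives
print(stripped["cells"][0]["outputs"])  # the plot does not
```

The committed file is now easy to diff, but a reviewer can no longer see the plot the code produced, which is the trade-off described above.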
nbdime
nbdime provides tools for diff’ing & merging notebooks in your local environment. You can run the nbdiff or nbdiff-web commands to see notebook diffs on the command line or in a web browser respectively. nbmerge supports three-way merge of notebooks with automatic conflict resolution. You can also configure git to use nbdime’s diff & merge tools when git diff or git merge is run on a Jupyter notebook.
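If you want to drive these commands from a script, a minimal sketch could look like the following. It assumes nbdime is installed and on your PATH. The notebook file names are placeholders, the helper functions are made up for this example, and the git integration command is the one given in nbdime’s documentation.

```python
import shutil
import subprocess


def nbdiff_command(old_path, new_path, web=False):
    """Build the command line for a terminal diff or a browser diff."""
    tool = "nbdiff-web" if web else "nbdiff"
    return [tool, old_path, new_path]


def nbmerge_command(base_path, local_path, remote_path):
    """Build the command line for a three-way merge (base, local, remote)."""
    return ["nbmerge", base_path, local_path, remote_path]


# One-time setup so that git diff / git merge use nbdime for notebooks.
GIT_INTEGRATION_COMMAND = ["nbdime", "config-git", "--enable", "--global"]


def run_if_installed(command):
    """Run the command if its executable is on PATH, otherwise return None."""
    if shutil.which(command[0]) is None:
        return None
    return subprocess.run(command, capture_output=True, text=True, check=False)


# Placeholder file names; substitute your own notebooks.
diff_cmd = nbdiff_command("analysis_old.ipynb", "analysis_new.ipynb")
merge_cmd = nbmerge_command("base.ipynb", "local.ipynb", "remote.ipynb")

result = run_if_installed(diff_cmd)
if result is None:
    print("nbdiff not found; install nbdime first")
else:
    print(result.stdout)
```

Building the command as a list and passing it to subprocess.run avoids shell quoting problems with notebook paths that contain spaces.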
Some limitations of nbdime are:
- It can’t render pull request diffs (only commits or direct file names are supported with nbdiff)
- There’s no way to write comments or provide feedback (to be fair, nbdime was not built for this purpose)
ReviewNB
ReviewNB provides diff & commenting for Jupyter Notebooks on GitHub. You can see rich notebook diffs for any commit or pull request. You can comment on a notebook cell and the appropriate email notifications are sent to anyone watching the repository (of course they can unsubscribe). It’s useful to see diffs, ask questions, provide feedback & work collaboratively in the context of notebooks.
We can track all open discussions with conversation threads & have team conversations directly on notebooks (GDoc style comments for Jupyter). ReviewNB is a web application, so you don’t need to install anything on your own machine (you can self-host the ReviewNB server application if required). It integrates directly with your GitHub account, and the app is verified by GitHub & available for sale in the GitHub marketplace.
Some limitations of ReviewNB are:
- No way to merge notebooks (you’ll need nbdime for this)
- Only works with GitHub as of Dec-2020 (follow/upvote for GitLab & BitBucket support)
- It’s a paid service, although there are free plans for open source & education.
Summary
As you may have noticed, both nbdime & ReviewNB have their own strengths & limitations. ReviewNB is good at diff’ing your GitHub commits & PRs and at hosting discussions around notebooks, while nbdime is good at local diff’ing & merging. You can use both tools side by side to satisfy the notebook version control needs of your team!