Notebook users everywhere will be excited to know that GitHub recently launched rich diffs for Jupyter Notebooks. Previously, GitHub showed notebook diffs as textual diffs of the underlying notebook JSON, which was not very useful. The new functionality shows cell-by-cell rendered diffs and output diffs:
Very nice! The feature is in a pilot period as of January 2023; you can find the pilot sign-up link in this discussion thread.
Unfortunately, one problem the new functionality doesn’t solve is the rendering of large notebooks. GitHub times out and shows errors like “Unable to render rich display” or “The notebook took too long to render”.
GitHub can’t render notebooks larger than a few megabytes (we reviewed a few solutions to this problem in an earlier blog post). If you’re working with a large notebook, GitHub’s new rich diff feature doesn’t offer any improvements on this.
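To see why a notebook crosses into multi-megabyte territory, it can help to check how much of it is cell source and how much is stored output. The following is a minimal sketch, assuming the notebook has already been loaded as a dict in the standard nbformat 4 JSON layout; the function name `notebook_size_breakdown` is hypothetical and not part of any library.

```python
import json


def notebook_size_breakdown(nb: dict) -> dict:
    """Return approximate byte counts for cell sources and cell outputs."""
    source_bytes = 0
    output_bytes = 0
    for cell in nb.get("cells", []):
        # In nbformat 4, "source" may be a string or a list of strings.
        source = cell.get("source", "")
        if isinstance(source, list):
            source = "".join(source)
        source_bytes += len(source.encode("utf-8"))
        # Only code cells carry outputs; embedded figures are stored here as base64 text.
        outputs = cell.get("outputs", [])
        if outputs:
            output_bytes += len(json.dumps(outputs).encode("utf-8"))
    return {"source_bytes": source_bytes, "output_bytes": output_bytes}
```

To use it on a real file, load the notebook with `json.load` and pass the resulting dict in. In notebooks with many figures, the output count typically dwarfs the source count.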
Let’s open up an example large notebook that is too big to display on GitHub:
We make some changes to the notebook, and check the changes using the JupyterLab git extension:
Next, we commit our changes, push them to GitHub, and create a pull request. If we navigate to the file comparison page now, we can see whether the new rich diff functionality will show us the diff. It doesn’t:
Why does the rich diff not render?
The full notebook may be big, but the change we made is small. So why does the rich diff not render?
The reason is twofold. First, the new GitHub rich diff functionality uses nbdime under the hood. nbdime is a diff and merge tool for notebooks (you can take a look at our Git-Jupyter workflow tools blog post to read more about it). nbdime diffs can be very slow for large notebooks (as mentioned by others here and here).
Second, for performance reasons, GitHub imposes a 5-second limit on rich diff rendering. So if nbdime can’t produce a diff within 5 seconds, nothing is rendered at all.
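For a rough feel of whether a diff would fit into a budget like this, you could time a diff command locally. The sketch below is only an approximation: `fits_in_budget` is a hypothetical helper, and a local timing says nothing definite about GitHub's own hardware or rendering pipeline. It uses a trivially fast command as a stand-in; if nbdime is installed, its `nbdiff` command could be swapped in.

```python
import subprocess
import sys
import time


def fits_in_budget(cmd: list[str], budget_s: float = 5.0) -> tuple[bool, float]:
    """Run cmd and report whether it finished within budget_s seconds."""
    start = time.perf_counter()
    try:
        # subprocess.run kills the child and raises TimeoutExpired once the budget is exceeded.
        subprocess.run(cmd, timeout=budget_s, capture_output=True, check=False)
        finished = True
    except subprocess.TimeoutExpired:
        finished = False
    return finished, time.perf_counter() - start


# Stand-in command; with nbdime installed you might try
# ["nbdiff", "old.ipynb", "new.ipynb"] instead (file names are placeholders).
ok, elapsed = fits_in_budget([sys.executable, "-c", "pass"])
print(f"finished in budget: {ok} ({elapsed:.2f}s)")
```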
So what shall I do?
You can always fall back to the JSON source diffs on GitHub, or produce similar text-based source diffs with git diff from the command line. This isn’t ideal because notebook source diffs often contain a large amount of extraneous information (such as metadata changes and binary figure changes) that makes them difficult to work with.
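One way to cut that noise is to compare only the cell sources and ignore outputs and metadata entirely. Here is a minimal sketch using Python's standard `difflib`, assuming both notebook versions are loaded as dicts in the nbformat 4 JSON layout. The helper names `source_lines` and `source_diff` are made up for illustration, and this is not how nbdime, GitHub, or ReviewNB compute their diffs.

```python
import difflib


def source_lines(nb: dict) -> list[str]:
    """Flatten a notebook to its cell sources, dropping outputs and metadata."""
    lines = []
    for index, cell in enumerate(nb.get("cells", [])):
        source = cell.get("source", "")
        if isinstance(source, list):
            source = "".join(source)
        # A marker line per cell keeps cell boundaries visible in the diff.
        lines.append(f"# --- cell {index} ({cell.get('cell_type', 'unknown')}) ---")
        lines.extend(source.splitlines())
    return lines


def source_diff(old_nb: dict, new_nb: dict) -> str:
    """Return a unified diff of cell sources only (empty string if identical)."""
    diff = difflib.unified_diff(
        source_lines(old_nb),
        source_lines(new_nb),
        fromfile="old",
        tofile="new",
        lineterm="",
    )
    return "\n".join(diff)
```

With this approach, a re-run that changes only a figure or an execution count produces an empty diff, while an edited line of code still shows up as a normal `-`/`+` pair. The trade-off is that genuine output changes become invisible.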
Some new tools, such as nbdev and Jupytext, make source diffs easier to use. Take a look at our review of them here. Or you can run nbdime locally and wait the few minutes it might require to parse your large notebook. Another option is making use of code comparison options in local IDEs like Visual Studio Code.
Alternatively, ReviewNB is a reliable option for rendering large notebook diffs. ReviewNB uses its own diff algorithm that relies on git patch and doesn’t suffer from nbdime’s performance problems. It also doesn’t impose timeouts, so if a larger notebook takes longer to render, it will still render eventually.
ReviewNB integrates seamlessly with GitHub and also offers inline notebook commenting so you can do line-by-line reviews. When it comes to effective collaboration for Jupyter notebook development, ReviewNB is still the most fully featured option.