Jupyter Notebooks on GitHub: Real-World Examples


This is a guest post by András Novoszáth


Do you cringe when you think about comparing two versions of the same Jupyter Notebook? Do you find it frustrating to share and track Jupyter Notebooks with your team? Have you wondered if there is a better way to track changes in notebooks?

We feel your pain!

Tracking changes in Jupyter Notebooks is famously hard because of their special file format. Unlike plain scripts, notebooks are stored as rich JSON that mixes code, outputs, and metadata, which makes them painful to diff and review during development.
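To make the problem concrete, here is a minimal sketch using only the Python standard library. It builds two tiny notebook-like dictionaries (the helper names `make_notebook` and `raw_json_diff` are made up for this illustration) and runs a plain-text diff over the serialized JSON. A one-character code edit followed by a re-run also changes the execution count and the stored output, so the text diff is noisier than the edit itself.

```python
import difflib
import json


def make_notebook(source_line, execution_count, printed):
    """Build a minimal nbformat-4-style notebook dict with one code cell."""
    return {
        "cells": [
            {
                "cell_type": "code",
                "execution_count": execution_count,
                "metadata": {},
                "outputs": [
                    {"name": "stdout", "output_type": "stream", "text": [printed]}
                ],
                "source": [source_line],
            }
        ],
        "metadata": {},
        "nbformat": 4,
        "nbformat_minor": 5,
    }


def raw_json_diff(old_nb, new_nb):
    """Return only the added/removed lines of a text diff over the raw JSON."""
    old_lines = json.dumps(old_nb, indent=1, sort_keys=True).splitlines()
    new_lines = json.dumps(new_nb, indent=1, sort_keys=True).splitlines()
    diff = difflib.unified_diff(old_lines, new_lines, lineterm="", n=0)
    return [
        line
        for line in diff
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    ]


# Hypothetical example: one character of code changes, then the cell is re-run.
old_nb = make_notebook("print(1 + 1)", 1, "2\n")
new_nb = make_notebook("print(1 + 2)", 7, "3\n")

changed = raw_json_diff(old_nb, new_nb)
for line in changed:
    print(line)
```

In a real notebook, the same effect is spread across many cells and much larger outputs, which is why raw JSON diffs are so hard to review.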

We faced the same issue and created ReviewNB as the answer. It solves change tracking in notebooks by presenting only the relevant changes within their specific and meaningful context. You can also use it to start conversations about notebook changes to address them more cleanly and effectively.

In this article, we show you how some open source projects on GitHub are tracking changes & collaborating on their Jupyter Notebooks via ReviewNB. Let’s jump right in!

Use Cases

Thousands of GitHub repositories use ReviewNB for Jupyter Notebook collaboration. Today we’re going to look at some examples from TensorFlow, Google Cloud Platform & NumPy.

We have organized the examples into three groups:

  1. Track output diffs in notebooks
  2. Track code diffs in notebooks
  3. Discuss notebook changes

By the end of this article, you will have seen how open source projects use ReviewNB and how it can help you in your own work with Jupyter Notebooks. Let’s start with the most challenging one, tracking cell outputs!

Track Output Diffs in Notebooks

Perhaps the biggest challenge in tracking notebook changes is tracking cell outputs, especially when the output includes images or plots. Let’s see how ReviewNB can help you track output changes!

Rich Diffs

Text-based version control systems like git cannot show image-based changes in a meaningful way. When you use ReviewNB, you can see the original and the changed image side by side, in the context of the code cell that produced them. Here’s an example of image diffs in ReviewNB:

source

Because the two images differ only in small details, you might not even notice the change without this explicit side-by-side visual comparison. If the notebook feeds a report that people rely on for critical decisions, that comparison becomes very useful.
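Why can a text diff not help here? In the notebook file format, an image output is stored as base64-encoded text under a MIME-type key such as `image/png`. The sketch below imitates that encoding with the standard library. The helper names and the stand-in bytes are assumptions made for illustration, and this is not how ReviewNB itself is implemented.

```python
import base64


def encode_image_output(image_bytes):
    """Mimic how a notebook stores an image output: base64 text under a MIME key."""
    return {
        "output_type": "display_data",
        "data": {"image/png": base64.b64encode(image_bytes).decode("ascii")},
        "metadata": {},
    }


def decode_image_output(output):
    """Recover the raw image bytes that a rich diff could render as a picture."""
    return base64.b64decode(output["data"]["image/png"])


# Stand-in bytes, not a real PNG: the point is the encoding, not the picture.
old_bytes = bytes(range(48))
new_bytes = bytes(range(47)) + b"\xff"  # only the final byte differs

old_output = encode_image_output(old_bytes)
new_output = encode_image_output(new_bytes)

# A text diff can only say that these two opaque strings are different.
print(old_output["data"]["image/png"])
print(new_output["data"]["image/png"])
```

A line-based diff tells you that the encoded string changed, but not what changed in the picture. A rich diff has to decode both versions and render them as images.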

Track Experiment Output Differences

ReviewNB also lets you track text output, which makes it a powerful tool when you run multiple experiments.

Here, for example, the data scientist reran the notebook after making several changes to the model. One of the notebook cells calculates and prints the model’s performance metrics.

source

Because ReviewNB tracks experiment outputs, it allows reviewers to see how code changes affected model performance. This feature allows you to connect model experimentation with code version control.
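The idea of comparing printed metrics between two notebook versions can be sketched in a few lines of standard-library Python. The function names and metric values below are hypothetical, and the sketch assumes both versions have the same code cells in the same order. It illustrates the concept only and says nothing about ReviewNB’s internals.

```python
import difflib


def stream_text(notebook):
    """Collect the printed (stream) text of each code cell, in order."""
    texts = []
    for cell in notebook["cells"]:
        if cell.get("cell_type") != "code":
            continue
        chunks = []
        for output in cell.get("outputs", []):
            if output.get("output_type") == "stream":
                text = output.get("text", "")
                # nbformat allows either a string or a list of strings here.
                chunks.append(text if isinstance(text, str) else "".join(text))
        texts.append("".join(chunks))
    return texts


def output_changes(old_nb, new_nb):
    """Map code-cell position to diff lines for cells whose printed text changed."""
    changes = {}
    pairs = zip(stream_text(old_nb), stream_text(new_nb))
    for index, (old, new) in enumerate(pairs):
        if old != new:
            diff = difflib.ndiff(old.splitlines(), new.splitlines())
            # Drop ndiff's "?" hint lines to keep the result easy to read.
            changes[index] = [line for line in diff if not line.startswith("?")]
    return changes


def metrics_notebook(accuracy):
    """Hypothetical one-cell notebook that prints made-up model metrics."""
    text = f"accuracy: {accuracy}\nloss: 0.30\n"
    return {
        "cells": [
            {
                "cell_type": "code",
                "source": ["evaluate(model)"],
                "outputs": [
                    {"output_type": "stream", "name": "stdout", "text": text}
                ],
            }
        ]
    }


changes = output_changes(metrics_notebook("0.91"), metrics_notebook("0.93"))
for index, lines in changes.items():
    print(f"code cell {index}:")
    print("\n".join(lines))
```

Only the cells whose printed text differs are reported, so a reviewer can see the metric change without reading the rest of the file.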

We saw how ReviewNB allows you to track your notebooks’ output changes. In the next section you will also learn how it can be used to track changes in your code cells.

Track Code Diffs in Notebooks

Tracking code diffs in notebooks should not be that different from tracking diffs in regular scripts. However, the notebook’s format makes this hard by default.

ReviewNB lets users see code changes line by line, without the complex JSON formatting. Better still, it presents the diff in the usual notebook cell format, providing meaningful context & making notebook changes easier to comprehend.

Renaming variables and parameters is perhaps the most common “event” in the lifecycle of code. This is even more true for notebooks because people often use them as self-contained reports and experiments. In this example, the developer changed the names of two parameters and adjusted the following cells to mirror these changes.

source

ReviewNB shows the name changes cell by cell and line by line. This approach gives you the context to interpret the changes together and saves you error-prone guesswork.
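A cell-by-cell, line-by-line code diff can be approximated with the standard library by ignoring the JSON wrapper and comparing only each code cell’s source. This is a rough sketch under simplifying assumptions: cells are matched by position, and the names `lr`, `learning_rate`, and `train` are invented for the example. It is not ReviewNB’s actual algorithm.

```python
import difflib


def cell_source(cell):
    """Return a cell's source as a list of lines (nbformat allows str or list)."""
    source = cell.get("source", "")
    text = source if isinstance(source, str) else "".join(source)
    return text.splitlines()


def code_cell_diffs(old_nb, new_nb):
    """Map code-cell position to unified-diff lines for cells whose source changed."""
    old_cells = [c for c in old_nb["cells"] if c.get("cell_type") == "code"]
    new_cells = [c for c in new_nb["cells"] if c.get("cell_type") == "code"]
    diffs = {}
    for index, (old, new) in enumerate(zip(old_cells, new_cells)):
        old_lines, new_lines = cell_source(old), cell_source(new)
        if old_lines != new_lines:
            diffs[index] = list(
                difflib.unified_diff(
                    old_lines,
                    new_lines,
                    fromfile=f"cell {index} (old)",
                    tofile=f"cell {index} (new)",
                    lineterm="",
                )
            )
    return diffs


def training_notebook(rate_name):
    """Hypothetical notebook where a parameter name is used in two code cells."""
    return {
        "cells": [
            {"cell_type": "code", "source": [f"{rate_name} = 0.01\n", "epochs = 5\n"]},
            {"cell_type": "markdown", "source": ["Train the model"]},
            {"cell_type": "code", "source": [f"train({rate_name}, epochs)\n"]},
        ]
    }


diffs = code_cell_diffs(training_notebook("lr"), training_notebook("learning_rate"))
for index, lines in diffs.items():
    print("\n".join(lines))
```

Each changed cell gets its own small diff, so a rename that touches several cells reads as a series of related one-line changes rather than as scattered JSON edits.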

Discuss Notebook Changes

ReviewNB lets you comment on individual cells within the notebook. You can start a discussion under cells and under proposed changes in the notebook diff.

Comment on Cells

You can start a discussion on any notebook cell with ReviewNB. For example, data scientists could suggest new changes and track task status.

source

Using ReviewNB for these conversations makes the discussion place-specific and, therefore, more effective. For the same reason, it frees up the PR-level discussion, where you can focus on higher-level problems.

Comment on Proposed Changes

You can use ReviewNB to comment on specific proposed changes. Here, for example, developers give suggestions on a change right at the cell it affects.

source

This has the same benefits as commenting on a cell. In addition, it gives you space to explain and discuss the proposed changes themselves.

Summary

More than 100,000 Jupyter notebooks have been reviewed with ReviewNB across thousands of GitHub repositories. In this article, you have seen several examples of ReviewNB improving change tracking in Jupyter Notebooks.

Specifically, you saw how it tracks code and output changes and how it provides space for context-specific discussions. Imagine what it would be like to tackle the same issues by deciphering GitHub diffs or manually comparing notebooks!

But, of course, these are not the only possible ways to use ReviewNB. Is there a use case missing from the list? Let us know!

Stay on Top of Your Notebook Projects

Would you like to try ReviewNB?

It’s completely free for open source or academic use. If you want to track notebooks in private repositories, check out our options!