How to use Git / GitHub with Jupyter Notebook
This is a basic guide. If you’re already familiar with Git, check out our advanced Git ↔ Jupyter guide.
This is a comprehensive Git tutorial for Jupyter Notebook users. Feel free to skip any section you’re already familiar with. By the end you’ll be able to:
- Push your notebooks to a GitHub repository
- Start versioning your notebooks
- Review Jupyter notebook pull requests on GitHub
- Revert to a specific notebook version
- Get feedback & discuss notebook changes with your peers
- Easily share your notebooks for others to view
Create GitHub Account
If you don’t have a GitHub account please create one here.
Setup Git Locally
- Download and install the latest version of Git.
- Set up your name & email in Git by running the following commands in a terminal:
>> git config --global user.name "Mona Lisa"
>> git config --global user.email "email@example.com"
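- Connect your local Git client with GitHub by caching your credentials. GitHub no longer accepts account passwords for Git operations over HTTPS, so use a personal access token (or an SSH key) instead.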
Create New Repository
A GitHub repository is like your supercharged folder in the cloud. You can store files (notebooks, data, source code), look at historical changes to these files, open issues, discuss changes and much more. People typically create one repository per project.
Let’s go ahead & create a repository on GitHub. Once it’s created, GitHub shows the new repository’s page; copy the repository URL from it.
Clone Repository
Let’s clone the GitHub repository onto our machine by running the following in a terminal. It will create a projectA directory on our machine, linked to the amit1rrr/projectA repository on GitHub.
>> git clone https://github.com/amit1rrr/projectA.git
Cloning into 'projectA'...
warning: You appear to have cloned an empty repository.
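To try the clone step without touching a real GitHub repository, the sketch below uses a local bare repository as a stand-in for the remote. The temporary directory and the name `remote.git` are assumptions for illustration only. It then prints the remote that the clone is linked to.

```shell
# Scratch area; a bare repository plays the role of the GitHub remote (assumption).
work="$(mktemp -d)"
git init -q --bare "$work/remote.git"

# Clone it just like a GitHub URL; Git warns that the repository is empty.
git clone "$work/remote.git" "$work/projectA"

# Show where the clone will fetch from and push to.
git -C "$work/projectA" remote -v
```

With a real repository, the path passed to git clone is simply replaced by the URL copied from GitHub.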
Push Notebooks to GitHub
Our repository is empty right now, so let’s push some notebooks to it. First, we copy two notebooks into the directory where we cloned the projectA repository:
>> cp /some/path/analysis1.ipynb /path/of/projectA/
>> cp /some/path/scratch.ipynb /path/of/projectA/
Let’s say we want to push analysis1.ipynb to GitHub. We first need to tell the local Git client to start tracking the file.
>> git add analysis1.ipynb
You can check which files are staged with git status. In its output, analysis1.ipynb appears under “Changes to be committed:”, so it’s being tracked by our local Git client. Now let’s commit the changes:
# -m flag is used to provide a human friendly message describing the change
>> git commit -m "Adds customer data analysis notebook"
A commit simply creates a checkpoint that you can revert to at any time. Let’s push this commit to GitHub.
>> git push
Now you can visit the repository page on GitHub to see your commits.
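The add, commit & push sequence above can be rehearsed end to end on throwaway files. This is a minimal sketch under stated assumptions: a local bare repository stands in for GitHub, the notebooks are empty placeholder files, and the identity is set for the scratch repository only.

```shell
# Scratch setup (paths and file contents are illustrative assumptions).
work="$(mktemp -d)"
git init -q --bare "$work/remote.git"            # stand-in for the GitHub repository
git clone -q "$work/remote.git" "$work/projectA"
cd "$work/projectA"
git symbolic-ref HEAD refs/heads/master          # name the first branch master, as in this guide
git config user.name "Mona Lisa"                 # identity for this scratch repo only
git config user.email "email@example.com"

# Two placeholder notebooks; only one of them will be committed.
printf '{"cells": [], "metadata": {}, "nbformat": 4, "nbformat_minor": 5}\n' > analysis1.ipynb
cp analysis1.ipynb scratch.ipynb

git add analysis1.ipynb                          # stage the notebook
git status --short                               # staged files show "A", untracked files show "??"
git commit -q -m "Adds customer data analysis notebook"
git push -q origin master                        # publish the commit to the remote
```

The explicit `origin master` avoids relying on an upstream being configured in the scratch clone. Note that scratch.ipynb stays untracked, so it never leaves your machine.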
Develop in a Branch
Say you are working on a large project spanning multiple days, but you need to periodically push work-in-progress (WIP) commits as a backup. The way to do that is by creating a feature branch.
Each repository has a default branch (typically master or main) that stores the most up-to-date versions of completed work. Each member of your team can create their own feature branches to store their WIP commits. When their work in a feature branch is ready to be shared, they can create a pull request for peer review & subsequently merge the feature branch into master. Let’s unpack that with concrete steps.
Say I’m about to start working on a new project to analyse customer data. First, I will create a new branch,
>> git checkout -b customer_data_insights
Then I’ll create/edit some notebooks & other files to do the actual analysis. When I’m ready to commit my WIP, I’ll do the usual git add, git commit, git push. The first git push fails with an error saying the current branch has no upstream branch, since the branch does not exist on GitHub yet.
Simply push the branch first by copying the command shown in the error:
>> git push --set-upstream origin customer_data_insights
This creates the branch on GitHub and pushes your existing commits to it. From then on, a plain git push is enough to push new commits to this branch.
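The feature-branch flow can be sketched in full as follows. As an assumption for illustration, a local bare repository replaces GitHub and the notebook is a placeholder file; the branch name is the one used in this section.

```shell
# Scratch setup: a remote stand-in plus one commit on master (illustrative assumptions).
work="$(mktemp -d)"
git init -q --bare "$work/remote.git"
git clone -q "$work/remote.git" "$work/projectA"
cd "$work/projectA"
git symbolic-ref HEAD refs/heads/master
git config user.name "Mona Lisa"
git config user.email "email@example.com"
printf '{"cells": [], "metadata": {}, "nbformat": 4, "nbformat_minor": 5}\n' > analysis1.ipynb
git add analysis1.ipynb
git commit -q -m "Adds customer data analysis notebook"
git push -q origin master

# Start the feature branch and commit work in progress.
git checkout -q -b customer_data_insights
printf '{"cells": [], "metadata": {"draft": 1}, "nbformat": 4, "nbformat_minor": 5}\n' > analysis1.ipynb
git commit -q -am "WIP: first pass at customer insights"

# First push: create the branch on the remote and record it as the upstream.
git push -q --set-upstream origin customer_data_insights

# Later commits on this branch need only a plain "git push".
printf '{"cells": [], "metadata": {"draft": 2}, "nbformat": 4, "nbformat_minor": 5}\n' > analysis1.ipynb
git commit -q -am "WIP: second pass at customer insights"
git push -q
```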
Create Pull Request
Let’s say you’ve been working on a feature branch for a while, and it’s ready for prime time. Most likely, you’d want to first share it with your peers & get their feedback before merging it into the master branch. That’s what pull requests are for.
You can create pull requests from the GitHub UI. Go to your project page -> Pull requests tab -> click “New pull request”.
Choose which branch you’d like to merge into master. Verify the commits & the list of files changed. Click “Create pull request”.
On the next page, provide a title, briefly describe your changes & click “Create pull request” again.
Review Notebook Pull Request
GitHub pull requests are fantastic for peer review as they let you see changes side-by-side & comment on them. But in the case of Jupyter, GitHub shows JSON diffs, which are really hard to read.
March 2023 update - GitHub has introduced rich diffs for Jupyter but the diff can often fail to render. See solutions here & the details about the problem here.
You can use ReviewNB to solve the notebook diffing problem. It shows you rich diffs & lets you comment on any notebook cell to discuss changes with your team.
Once your changes are approved you can merge them from the GitHub UI.
Alternatively, run git merge followed by git push from the command line.
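A command-line merge might look like the sketch below. It assumes a local bare repository in place of GitHub and a placeholder notebook. The `git pull --ff-only` step is an addition not mentioned above; it picks up anything teammates merged into master in the meantime.

```shell
# Scratch setup: master plus a pushed feature branch (illustrative assumptions).
work="$(mktemp -d)"
git init -q --bare "$work/remote.git"
git clone -q "$work/remote.git" "$work/projectA"
cd "$work/projectA"
git symbolic-ref HEAD refs/heads/master
git config user.name "Mona Lisa"
git config user.email "email@example.com"
printf '{"cells": [], "metadata": {}, "nbformat": 4, "nbformat_minor": 5}\n' > analysis1.ipynb
git add analysis1.ipynb
git commit -q -m "Adds customer data analysis notebook"
git push -q origin master
git checkout -q -b customer_data_insights
printf '{"cells": [], "metadata": {"draft": 1}, "nbformat": 4, "nbformat_minor": 5}\n' > analysis1.ipynb
git commit -q -am "Adds customer insights"
git push -q --set-upstream origin customer_data_insights

# The merge itself: switch to master, bring it up to date, merge, publish.
git checkout -q master
git pull -q --ff-only origin master
git merge -q customer_data_insights
git push -q origin master
```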
Revert to a specific notebook version
If you want to temporarily go back to a commit, look at the files, and then come back to where you are, you can simply check out the desired commit. At the end, run “git checkout master” to go back to the current state.
If you want to actually revert to an old state and make some changes there, you can start a new branch from that commit.
>> git checkout -b old-state f33939cd63004e3e67b111f7bcb350ffd2b0608a
You can also browse old commits on GitHub by going to Your project page -> Commits. Open the desired commit and click “View File” to see the notebook status at that commit.
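The command-line steps in this section can be tried on a scratch repository. In this sketch the two notebook versions are placeholder files, and `HEAD~1` (the commit just before the latest one) stands in for a commit hash you would normally copy from git log.

```shell
# Scratch repository with two versions of a notebook (illustrative assumptions).
work="$(mktemp -d)"
git init -q "$work/projectA"
cd "$work/projectA"
git symbolic-ref HEAD refs/heads/master
git config user.name "Mona Lisa"
git config user.email "email@example.com"
printf '{"cells": [], "metadata": {"note": "version 1"}, "nbformat": 4, "nbformat_minor": 5}\n' > analysis1.ipynb
git add analysis1.ipynb
git commit -q -m "Adds customer data analysis notebook"
printf '{"cells": [], "metadata": {"note": "version 2"}, "nbformat": 4, "nbformat_minor": 5}\n' > analysis1.ipynb
git commit -q -am "Updates customer data analysis notebook"

# Find the commit to go back to.
git log --oneline -- analysis1.ipynb
old="$(git rev-parse HEAD~1)"

# Temporary visit: look at the old notebook, then return to the current state.
git checkout -q "$old"
cat analysis1.ipynb
git checkout -q master

# Lasting change: start a new branch from the old commit and work from there.
git checkout -q -b old-state "$old"
```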
Share read-only links to your notebook
When you browse notebooks in your repository on GitHub it renders them as HTML. So it’s very convenient to share read-only links to the notebook like this one. If it’s a private repository, the person you are sharing the link with needs to have a GitHub account and have permission to access your repository.
For security reasons, GitHub does not run any JavaScript in the notebook. You can use nbviewer or ReviewNB if your notebook contains interactive widgets and such.
Conclusion
If you are new to Git, it can take some time to get used to all the commands. But it’s a proven way of collaborating on software projects & is widely used in data science work as well. You can combine it with ReviewNB to remove some of the kinks in the workflow.
Happy Hacking!