Jupyter Notebook vs. Google Colab
If you’re working on a Data Science project in Python, the first thing you need to decide is which coding platform to use. While there are tons of options out there, two of the most popular choices are Jupyter Notebook and Google Colaboratory (Colab). People often don’t understand the exact difference between the two & don’t know when to use one over the other.
Today we’re going to review Jupyter Notebook and Google Colab and compare pros & cons of each to help you decide which platform is better for your next Data Science project.
How Does Jupyter Relate to Google Colab?
To start, it’s important to understand that Google Colab is actually based on the open source Jupyter project. In its most basic sense, Project Jupyter is free software for interactive computing across multiple programming languages. Project Jupyter includes Jupyter Notebook, the classic interface; JupyterLab, the latest interactive development environment; and JupyterHub, the multi-user version of the notebook.
Google Colab is a hosted Jupyter notebook service, meaning you can run your Jupyter Notebook online with no setup and access free computing resources, including GPUs.
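Since a Colab notebook is a Jupyter notebook underneath, the same notebook file can end up running in either place. Here is a minimal sketch for telling the two apart at runtime. It assumes that the Colab runtime has the google.colab module loaded while a local kernel does not; this is a widely used heuristic rather than a documented guarantee, and the function name in_colab is our own.

```python
import sys


def in_colab() -> bool:
    """Return True if this code appears to be running inside Google Colab.

    Assumption: the Colab runtime has the `google.colab` module loaded,
    while a local Jupyter kernel normally does not.
    """
    return "google.colab" in sys.modules


if in_colab():
    print("Running on Google Colab")
else:
    print("Running on a local Jupyter kernel (or plain Python)")
```

A check like this is handy when a notebook needs slightly different setup steps depending on where it runs.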
Jupyter Notebook and Google Colab: A Quick Comparison
The table below illustrates the basic features of Jupyter Notebook and Google Colab side-by-side so that you can quickly get an overview of their offerings.
Jupyter Notebook Features | Google Colab Features |
---|---|
Direct access to local file system | Files stored in Google Drive |
Uses your local hardware | 12 GB GPU RAM for up to 12 hours |
Install packages locally just once | Re-install packages for each session |
Considered safer in terms of data security | Usually easier for collaboration |
Git extension for version control | Revision history for version control |
As you can see, even though Google Colab is based on Project Jupyter, the two platforms differ in many ways. Let’s dive a little deeper into some of the features listed above.
Data Safety
When it comes to working with confidential data, you’d want to restrict access to your notebook & associated data. This is simple with Jupyter Notebook since everything is running locally on your own machine. However, Colab runs on Google servers. Their FAQ mentions that “code is executed in a virtual machine private to your account”, so it should be safe for the most part (unless some vulnerability is found that lets intruders access kernels in other VMs running on the same instance, but we’re being speculative here).
Access Control
If your team is on G Suite / Gmail, then access control is fairly easy on Colab. In that respect, you can think of Colab as Google Docs for Jupyter Notebooks.
With Jupyter Notebooks, you can use your company’s version control platform (GitHub, GitLab, Bitbucket), which will provide you with both access control & version control.
Commenting in Colab vs. Jupyter Notebooks
The conversation workflow in Colab is pretty similar to Google Docs. Colab lets you comment on a specific notebook cell to start a conversation. Your teammates can then jump in with their own comments, resolve / unresolve conversations & so on.
There’s no built-in support for commenting in Jupyter Notebooks. You can overcome this limitation by using ReviewNB, which provides diffs & commenting functionality for Jupyter Notebooks on GitHub / Bitbucket.
Google Colab - Resources are Not Guaranteed
While the Colab free tier is great, there’s no explicit guarantee on what resources you will get, even on the Colab Pro & Pro+ paid plans. In fact, there are some very specific limitations that users should be aware of when using Google Colab.
In the Colab documentation, they clearly state the following:
In order to be able to offer computational resources for free, Colab needs to maintain the flexibility to adjust usage limits and hardware availability on the fly. Resources available in Colab vary over time to accommodate fluctuations in demand, as well as to accommodate overall growth and other factors.
This means that although the typical limit is 12 GB of GPU RAM for up to 12 hours, this can change dynamically depending on demand. Colab prioritizes interactive use cases and prohibits actions related to bulk computing. In fact, the documentation also issues the following warning:
Users who use Colab for long-running computations, or users who have recently used more resources in Colab, are more likely to run into usage limits and have their access to GPUs and TPUs temporarily restricted.
This is a huge drawback for programmers who are trying to do heavy computing in their notebooks. At ~$50/month, I expected the Colab Pro+ plan to offer some resource guarantee for power users.
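Because the allocation can differ from one session to the next, it can help to check what you actually got before starting a long job. The sketch below is one way to do that with the standard library only. The helper name describe_resources is hypothetical, and the GPU check assumes an NVIDIA GPU whose driver ships the nvidia-smi command; on a CPU-only machine the "gpu" entry simply stays None.

```python
import os
import shutil
import subprocess


def describe_resources(path: str = ".") -> dict:
    """Collect a rough snapshot of the resources visible to this session."""
    disk = shutil.disk_usage(path)
    info = {
        "cpu_count": os.cpu_count(),
        "disk_total_gb": round(disk.total / 1e9, 1),
        "disk_free_gb": round(disk.free / 1e9, 1),
        "gpu": None,
    }
    # nvidia-smi is only present when an NVIDIA driver is installed;
    # on a CPU-only session we leave "gpu" as None.
    if shutil.which("nvidia-smi"):
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
            capture_output=True,
            text=True,
            check=False,
        )
        if result.returncode == 0:
            info["gpu"] = result.stdout.strip() or None
    return info


print(describe_resources())
```

The same cell works in a local Jupyter Notebook, where it reports your own hardware instead.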
Installing Packages
Google Colab comes pre-installed with many common libraries, e.g. Plotly, NumPy, SciPy, TensorFlow & Matplotlib are all available by default. You can install other libraries by running !pip install <library> in a code cell. These other libraries need to be re-installed manually for each new Colab session since the environment info is not saved anywhere.
For local Jupyter Notebooks, we can create an environment (virtualenv / conda) once & reuse it anytime we are working on the same project.
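To avoid re-installing things that are already present, a first cell can check which packages are missing before calling pip. This is only a sketch: the helper missing_packages and the REQUIRED mapping are made up for illustration, and the check only tests whether a module can be found, not which version is installed.

```python
import importlib.util


def missing_packages(required: dict) -> list:
    """Return pip names whose import name cannot be found in this environment.

    `required` maps an import name (e.g. "sklearn") to the name used with
    pip (e.g. "scikit-learn"), since the two sometimes differ.
    """
    return [
        pip_name
        for import_name, pip_name in required.items()
        if importlib.util.find_spec(import_name) is None
    ]


# Hypothetical project requirements, for illustration only.
REQUIRED = {"numpy": "numpy", "sklearn": "scikit-learn"}

missing = missing_packages(REQUIRED)
if missing:
    # In a notebook cell you would run this command with a leading "!".
    print("pip install " + " ".join(missing))
else:
    print("All required packages are already installed")
```

In Colab this cell would run at the start of every session; in a local virtualenv / conda environment it should report nothing missing after the first install.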
Version Control for Notebooks
Google Colab has a Revision history option to help with version control. Just click on File -> Revision history in the Colab UI to see historical changes made to the notebook file. It will even show a side-by-side diff between any two versions.
For Jupyter Notebook version control, your best bet is to use Git-based platforms (GitHub, GitLab, Bitbucket). There are JupyterLab extensions (jupyterlab-git & GitPlus) which make it easy to push commits and make pull requests directly from the JupyterLab UI. ReviewNB helps with notebook code reviews on GitHub & Bitbucket.
For more information about version controlling Jupyter Notebooks, refer to our previous post.
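One practice that is often paired with Git (it is not something the tools above require) is clearing cell outputs before committing, so that diffs show code changes rather than output noise. Here is a minimal sketch, assuming the notebook uses the nbformat 4 JSON layout where each code cell has "outputs" and "execution_count" fields. The function name strip_outputs and the sample notebook are our own.

```python
import copy
import json


def strip_outputs(notebook: dict) -> dict:
    """Return a copy of an nbformat-4 notebook with code-cell outputs removed."""
    cleaned = copy.deepcopy(notebook)
    for cell in cleaned.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return cleaned


# A tiny in-memory notebook, made up for illustration.
example = {
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
        {
            "cell_type": "code",
            "execution_count": 3,
            "metadata": {},
            "outputs": [{"output_type": "stream", "name": "stdout", "text": ["hi\n"]}],
            "source": ["print('hi')"],
        },
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 5,
}

print(json.dumps(strip_outputs(example), indent=1))
```

For a real file you would load the .ipynb with json.load, pass it through the function, and write the result back with json.dump.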
Key Takeaways
Here’s a summary of everything we discussed in this article:
- Colab is built as an online notebook service by Google on top of the open source Jupyter project.
- There’s no resource guarantee in Colab even on Pro / Pro+ paid plans.
- Colab notebooks can run for a maximum of 12 hours. Idle sessions are terminated well before that.
- In Colab, environment info (installed packages, files etc.) is not saved anywhere. You have to set up the environment again for each new session.
- Version control & commenting functionality is built into Colab, whereas we have to use GitHub / Bitbucket along with ReviewNB to achieve the same with Jupyter Notebooks.