How to load a Kaggle dataset in your Colab/Jupyter notebook without downloading it manually

Kaggle is a great source of data, and you can find almost any kind of dataset there for practice. But it can be a hassle to download a Kaggle dataset, place it in your current working directory, and extract it by hand. So today I will guide you through the process of downloading any Kaggle dataset right from your Jupyter or Colab notebook.

We will first set up Colab and then look at Jupyter Notebook. To download any Kaggle dataset you must have a Kaggle account.

Get Kaggle API Token

The first step is to download your “API token”. You can do this from your Kaggle account settings, where you will see a section called ‘API’.

Generating Kaggle API key

There you will find two buttons: ‘Create New API Token’ and ‘Expire API Token’. The second one invalidates a previous API token, so if you have already created one, click the second button first and then the first. If you haven’t, you can click the first button directly. This downloads a file called ‘kaggle.json’, which contains your username and API key.

{"username":"Your username","key":"2f4997fa1d8e4f56ad8eb7659aaf1c31"}
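Before going further, it can help to confirm that the downloaded file really contains both fields. Here is a minimal sketch, assuming the file sits in the current directory. The helper name `check_kaggle_token` is made up for this example and is not part of the Kaggle package.

```python
import json
from pathlib import Path


def check_kaggle_token(path="kaggle.json"):
    """Return the username if the token file has both expected fields."""
    token = json.loads(Path(path).read_text(encoding="utf-8"))
    # Collect any field that is absent or empty.
    missing = [field for field in ("username", "key") if not token.get(field)]
    if missing:
        raise ValueError(f"kaggle.json is missing: {', '.join(missing)}")
    return token["username"]
```

Calling `check_kaggle_token()` right after the download should return your Kaggle username, and it raises a `ValueError` if the file is incomplete.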

Setup Colab

After you get your token, the next step is to visit Colab and log in with your Google account. If you don’t have one, please create it first. The next steps are as follows.

Install Kaggle

!pip install kaggle

Upload downloaded JSON file

from google.colab import files
files.upload()

Now we have to move the kaggle.json file (the JSON file that contains your credentials) to the .kaggle folder in the home directory so that Kaggle can find it whenever we make a request.

# Make a directory named .kaggle and copy the kaggle.json file there.
!mkdir -p ~/.kaggle
!cp kaggle.json ~/.kaggle/
# change the permission of the file
!chmod 600 ~/.kaggle/kaggle.json
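If you prefer to stay in Python instead of shell commands, the same three steps can be sketched with the standard library. This is only an illustration: `install_kaggle_token` is a hypothetical helper name, and the optional `home` argument is an assumption added so the function can be tried against a scratch folder.

```python
import shutil
from pathlib import Path


def install_kaggle_token(token_file="kaggle.json", home=None):
    """Copy the token into <home>/.kaggle and restrict its permissions."""
    home = Path(home) if home is not None else Path.home()
    kaggle_dir = home / ".kaggle"
    kaggle_dir.mkdir(parents=True, exist_ok=True)  # same as mkdir -p
    target = kaggle_dir / "kaggle.json"
    shutil.copy(token_file, target)                # same as cp
    target.chmod(0o600)                            # same as chmod 600
    return target
```

Running `install_kaggle_token()` with no arguments after uploading the file should leave the token in the same place as the shell version.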

Get Kaggle dataset API command

Now everything is set up, and we just need the API command for the dataset you want to download. Open Kaggle and find the dataset you want to work with (I am using the Netflix dataset for this demonstration). Then click the three dots to the right of the ‘New Notebook’ button and choose ‘Copy API command’. Once you have it, run the command in a cell, prefixed with ‘!’.

Kaggle dataset API command

# !your dataset api command
!kaggle datasets download -d shivamb/netflix-shows  # here I am using the Netflix dataset API command
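The downloaded archive has to be referred to by name afterwards. As a small sketch, the name can be guessed from the copied command, assuming the archive is named after the part of the dataset slug that follows the slash (as with netflix-shows here). `expected_zip_name` is a hypothetical helper, not something Kaggle provides.

```python
def expected_zip_name(api_command):
    """Guess the archive name from a copied 'kaggle datasets download' command."""
    parts = api_command.lstrip("!").split()
    # The dataset slug is the first argument that looks like owner/dataset-name.
    slug = next(part for part in parts if "/" in part)
    return slug.split("/")[-1] + ".zip"


print(expected_zip_name("kaggle datasets download -d shivamb/netflix-shows"))
# netflix-shows.zip
```

If the guess does not match, check the actual file name in your working directory and use that instead.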

This will start downloading the dataset as a .zip file. So, let’s extract it with the following code.

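from zipfile import ZipFile
file_name = 'netflix-shows.zip'  # must match the exact name of the downloaded file
with ZipFile(file_name, 'r') as archive:  # avoid shadowing the built-in zip
  archive.extractall()  # extract into the current working directory
  print('Done')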
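If you would rather keep the extracted files out of your working directory, a variation is to extract into a separate folder and list the CSV files you got. This is a sketch under my own assumptions: the folder name `data` and the helper name `extract_dataset` are made up for illustration.

```python
from pathlib import Path
from zipfile import ZipFile


def extract_dataset(zip_path, dest="data"):
    """Extract the archive into dest and return the CSV files found there."""
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    with ZipFile(zip_path, "r") as archive:
        archive.extractall(dest)
    # Search dest and its subfolders for CSV files.
    return sorted(dest.rglob("*.csv"))
```

For example, `extract_dataset('netflix-shows.zip')` returns a list of paths, so you can see the exact CSV name to pass to pandas.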

And that’s it! You can now start working with the data by reading it with pandas.

import pandas as pd
data = pd.read_csv('netflix_titles.csv')

Jupyter notebook

Every step is the same in Jupyter Notebook; the only difference is how the commands are run. We will be working in the Command Prompt, so open it and follow these steps.

Install kaggle

pip install kaggle

Now we have to create the .kaggle folder in your home directory, which is usually also where the Command Prompt opens. If you are not in your home directory by default, you can change to it with the ‘cd’ command followed by your home directory.

cd C:\users\buggyprogrammer
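If you are not sure what your home directory is, Python can tell you. The sketch below prints the folder where kaggle.json is expected to live. It assumes, based on my reading of the Kaggle API README, that the client also honors a `KAGGLE_CONFIG_DIR` environment variable. Treat that part as an assumption and check it against your installed version. `kaggle_config_dir` is a hypothetical helper name.

```python
import os
from pathlib import Path


def kaggle_config_dir(environ=None):
    """Return the folder where kaggle.json is expected to be placed."""
    environ = os.environ if environ is None else environ
    # Assumption: a custom location can be set through KAGGLE_CONFIG_DIR.
    custom = environ.get("KAGGLE_CONFIG_DIR")
    if custom:
        return Path(custom)
    # Default: the .kaggle folder inside the user's home directory.
    return Path.home() / ".kaggle"


print(kaggle_config_dir())
```

On Windows this typically prints a path under your user folder that ends in .kaggle.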

Create .kaggle folder

mkdir .kaggle

Now go to the Downloads folder, or wherever the JSON file was downloaded, and move the file to the .kaggle folder. You can do it manually by copying the file in File Explorer and pasting it into your .kaggle folder, or you can do it with these commands.

Download directory

cd C:\users\buggyprogrammer\downloads

Move the JSON file

move kaggle.json C:\users\buggyprogrammer\.kaggle

Now we are all set. Go to your Jupyter Notebook/Lab and follow these steps to download the dataset.

import kaggle
!kaggle datasets download -d shivamb/netflix-shows
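The ‘!’ prefix only works inside a notebook. If you want the same download from a plain Python script, one option is to call the Kaggle command-line tool through `subprocess`. This is a rough sketch, not the official way: the helper names are made up, and it relies on the CLI's `-p` (download path) and `--unzip` options, which you may want to confirm with `kaggle datasets download -h`.

```python
import subprocess


def build_download_command(dataset, dest=".", unzip=True):
    """Build the argument list for 'kaggle datasets download'."""
    command = ["kaggle", "datasets", "download", "-d", dataset, "-p", dest]
    if unzip:
        command.append("--unzip")  # let the CLI extract the archive itself
    return command


def download_dataset(dataset, dest=".", unzip=True):
    """Run the Kaggle CLI and raise CalledProcessError if it fails."""
    subprocess.run(build_download_command(dataset, dest, unzip), check=True)
```

For example, `download_dataset("shivamb/netflix-shows", dest="data")` should download and unpack the files into a data folder, in which case the separate extraction step is not needed.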

Extract the archive

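from zipfile import ZipFile
file_name = 'netflix-shows.zip'  # must match the exact name of the downloaded file
with ZipFile(file_name, 'r') as archive:  # avoid shadowing the built-in zip
  archive.extractall()  # extract into the current working directory
  print('Done')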

And that’s it! You can now start working with the data by reading it with pandas.

import pandas as pd
data = pd.read_csv('netflix_titles.csv')

Congrats, 😀🎉

Aman Kumar

Data Scientist with 3+ years of experience in building data-intensive applications in diverse industries. Proficient in predictive modeling, computer vision, natural language processing, data visualization etc. Aside from being a data scientist, I am also a blogger and photographer.