Import and Export Files to and from GitHub via API

Henry Alpert
Plumbers Of Data Science
4 min read · Jan 12, 2023


Photo by Roman Synkevych 🇺🇦 on Unsplash

GitHub is typically used as a repository for code, but it can also serve as a locker for a project’s assets and files.

Uploading a file through the GitHub website is straightforward: every directory has an “Add file” button at the top right. Downloading isn’t as simple, however. You either have to view the raw file in a browser window and then do a “Save As”, or download the entire repo as a zip file, unzip it, and then locate the file you want.

Either way, data engineers and software developers sometimes need to access these assets without visiting the GitHub website at all, and they need to automate uploading and downloading a project’s assets. For these tasks, we can use GitHub’s APIs.

Create a Python Script to Upload a File to GitHub

Suppose I have a dataset as a csv file in a local folder, and I want to upload it to a GitHub repo. It only takes a relatively short block of code.

For the following scripts, I use PyGithub (installed with pip install PyGithub), a Python library that makes it easy to interact with the GitHub API.

This script uploads a csv file to a repo.
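A minimal sketch of such a script follows, assuming PyGithub 1.59 or later for the Auth.Token helper (older releases accept Github(token) directly). The helper names connect_repo and upload_text_file, the token placeholder, and the paths are hypothetical. The upload helper takes the repository object as a parameter, which keeps the PyGithub import and network access in one place.

```python
def connect_repo(token, full_name):
    """Authenticate with a personal access token and return a PyGithub Repository.

    full_name is "username/repo-name". PyGithub is imported here rather than
    at module level so the upload helper below can be reused without it.
    """
    from github import Auth, Github  # pip install PyGithub

    return Github(auth=Auth.Token(token)).get_repo(full_name)


def upload_text_file(repo, local_path, repo_path, message, branch=None):
    """Read a local text file and commit it to repo_path in the repository."""
    # Read the whole file as a string
    with open(local_path, "r", encoding="utf-8") as f:
        contents = f.read()

    # Only pass branch when given; otherwise GitHub uses the repo's default branch
    extra = {} if branch is None else {"branch": branch}
    # create_file returns a dict holding the new file ("content") and the "commit"
    return repo.create_file(repo_path, message, contents, **extra)


# Example (hypothetical token, repo, and paths):
# repo = connect_repo("YOUR_TOKEN", "username/repo-name")
# upload_text_file(repo, "data/sales.csv", "datasets/sales.csv", "Add sales data")
```

If a file already exists at that path, create_file fails; PyGithub’s update_file, which also needs the existing file’s sha, covers that case.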

Uploading a text-based file

Here’s a general overview of the steps:

  • Create a GitHub instance. In the scripts in this post, I’m authenticating my account with a token. (To generate a token, go to Settings > Developer settings in your GitHub account, or go to https://github.com/settings/apps when logged in. On the left sidebar, there’s a dropdown called “Personal access tokens.” The token needs write access to the repo’s contents.)
  • Get the repo you wish to access with get_repo(), passing its name in the format username/repo-name, and assign it to a variable.
  • Use a with statement and open() to read the file contents, and assign the contents to a variable.
  • Send the contents with the create_file method.

The create_file method takes:

  • the path in your GitHub repo, including the file’s desired filename
  • a commit message
  • the contents to write to the file
  • the branch name (optional), which defaults to the repo’s default branch (often main) if not included

After running this script, go to the repo in a browser, and you’ll see the new file there.

Use Variables to Assist Automation

If you’re using the API to upload multiple files to different repos owned by different users, it likely won’t be practical to hardcode the information into the script and rewrite it every time. Instead, you can create the skeleton of a script and pass in the details as variables. A text file or Python’s sys module are two ways to achieve this.

Pass in Variables with a Text File

Create a text file with the run-specific information, one item per line. In this example, I’m using a file called vars.txt, and I’m including:

  1. the token
  2. the GitHub repo to upload to
  3. the path to the local file
  4. the destination directory and filename on the GitHub repo
  5. a commit message

Then, have the Python script read this text file, assign each line to a variable, and use the variables in the appropriate places.
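A minimal sketch of this approach, assuming vars.txt holds exactly the five values above, one per line and in that order. The UploadSettings class and the read_settings and upload_from_settings helpers are hypothetical names, and the upload part assumes PyGithub 1.59 or later for Auth.Token.

```python
from dataclasses import dataclass


@dataclass
class UploadSettings:
    token: str       # personal access token
    repo_name: str   # "username/repo-name"
    local_path: str  # file on this machine
    repo_path: str   # destination directory and filename in the repo
    message: str     # commit message


def read_settings(path="vars.txt"):
    """Read the five settings from a text file, one value per line, in order."""
    with open(path, encoding="utf-8") as f:
        # strip() drops each line's trailing newline; blank lines are skipped
        values = [line.strip() for line in f if line.strip()]
    if len(values) != 5:
        raise ValueError(f"expected 5 non-empty lines in {path}, found {len(values)}")
    return UploadSettings(*values)


def upload_from_settings(settings):
    """Upload settings.local_path to settings.repo_path with PyGithub."""
    from github import Auth, Github  # pip install PyGithub

    repo = Github(auth=Auth.Token(settings.token)).get_repo(settings.repo_name)
    with open(settings.local_path, "r", encoding="utf-8") as f:
        contents = f.read()
    return repo.create_file(settings.repo_path, settings.message, contents)


# Example: upload_from_settings(read_settings("vars.txt"))
```

Because vars.txt holds a live token, it is worth keeping it out of version control, for example by listing it in .gitignore.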

Pass in Variables with the sys Module

If you would rather not create text files, Python’s sys module allows you to pass in variables directly from the command line.

Suppose my script file is called ghAPI_script.py. I can call the script from the command line and tack the values on at the end as strings, one after another.

In the script itself, I include import sys and assign each string to a variable using sys.argv[i], where i is the string’s position in the command. Note that sys.argv[0] is skipped: it holds the script’s own name, so the first value passed in is sys.argv[1].
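A minimal sketch of the argument handling, assuming the values are passed in the same order as in vars.txt (the post does not show its exact order). The parse_args helper and the USAGE string are hypothetical.

```python
import sys

USAGE = "usage: python ghAPI_script.py TOKEN USERNAME/REPO LOCAL_PATH REPO_PATH MESSAGE"


def parse_args(argv=None):
    """Map positional command-line arguments to named settings.

    argv[0] is the script's own name, so the values start at index 1.
    """
    argv = sys.argv if argv is None else argv
    if len(argv) != 6:
        # Exit with a usage message instead of a cryptic IndexError
        raise SystemExit(USAGE)
    return {
        "token": argv[1],
        "repo_name": argv[2],
        "local_path": argv[3],
        "repo_path": argv[4],
        "message": argv[5],
    }


# Example invocation (hypothetical values; quote arguments that contain spaces):
#   python ghAPI_script.py YOUR_TOKEN username/repo-name data/sales.csv datasets/sales.csv "Add sales data"
```

Calling parse_args() with no arguments reads sys.argv, and the returned values go into the same get_repo() and create_file() calls as in the hardcoded script. Keep in mind that a token typed on the command line can end up in your shell history.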

Run the command, and the file is uploaded.

Download a File from GitHub Using a Python Script

Let’s now go in the reverse direction and download a file.

Again, start by creating a GitHub instance, and assign the relevant repo to a variable. Then, use the get_contents() method with the file’s path in the repo to fetch that file, and assign the result to a variable.

This call returns a PyGithub ContentFile object, but its contents are not in usable form yet: the API delivers the file Base64-encoded, in the object’s content attribute. The decoded_content attribute reverses that encoding and returns the file’s raw bytes, which can then be decoded into text (for example with .decode("utf-8")) and written to a new file.
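A minimal sketch of the download step, assuming repo is a PyGithub Repository obtained as in the upload script and that the file is UTF-8 text. The helper name download_text_file and the paths are hypothetical.

```python
def download_text_file(repo, repo_path, local_path, ref=None, encoding="utf-8"):
    """Fetch a text file from the repo and write it to local_path.

    ref may be a branch, tag, or commit SHA; if None, the default branch is used.
    Returns the decoded text.
    """
    extra = {} if ref is None else {"ref": ref}
    content_file = repo.get_contents(repo_path, **extra)
    if isinstance(content_file, list):
        # get_contents returns a list of ContentFile objects for a directory
        raise IsADirectoryError(f"{repo_path} is a directory, not a file")

    # The API sends the file Base64-encoded in .content; .decoded_content undoes
    # that and yields raw bytes, which are then decoded into a string
    text = content_file.decoded_content.decode(encoding)
    # newline="" writes line endings exactly as they were stored in the repo
    with open(local_path, "w", encoding=encoding, newline="") as f:
        f.write(text)
    return text


# Example (hypothetical repo object and paths):
# download_text_file(repo, "datasets/sales.csv", "sales_copy.csv")
```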

Downloading a text-based file

Upload and Download Binary Files to GitHub

The above scripts work if I’m using the API to send and receive files that contain only text, such as csv files, markdown files, or files of code. But what if I want to send or receive a zip file or a media file like an image, a video, or a song?

Binary files like these require an extra step or two. Their bytes are not valid text, so they must be read and written in binary mode, and GitHub’s contents API transfers every file’s bytes as Base64-encoded text.
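To show where Base64 enters, here is a sketch of the raw REST request behind an upload, using only the standard library: GitHub’s “create or update file contents” endpoint (PUT /repos/{owner}/{repo}/contents/{path}) expects the file’s bytes Base64-encoded inside a JSON body. The helper name build_upload_request is hypothetical, and the request is only built here, not sent. PyGithub’s create_file performs this encoding step for you.

```python
import base64
import json
import urllib.parse
import urllib.request


def build_upload_request(token, full_name, repo_path, data, message, branch=None):
    """Build (but do not send) a PUT request that creates a file via GitHub's REST API.

    data is the file's raw bytes; the API requires them Base64-encoded.
    """
    body = {
        "message": message,
        # b64encode returns bytes; JSON needs a str, so decode the ASCII result
        "content": base64.b64encode(data).decode("ascii"),
    }
    if branch is not None:
        body["branch"] = branch
    # quote() escapes spaces and similar characters but keeps "/" separators
    url = f"https://api.github.com/repos/{full_name}/contents/{urllib.parse.quote(repo_path)}"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
            "Content-Type": "application/json",
        },
    )


# Sending it (hypothetical token and repo):
# with open("image.jpg", "rb") as f:
#     req = build_upload_request("YOUR_TOKEN", "username/repo-name",
#                                "images/image.jpg", f.read(), "Add image")
# with urllib.request.urlopen(req) as resp:
#     print(resp.status)  # 201 when a new file is created
```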

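When uploading with PyGithub, open the file in binary mode ("rb") and pass the resulting bytes to create_file, which performs the Base64 encoding itself. Encoding the bytes with the base64 module first would encode them twice, so the repo would store the Base64 text rather than the image. As an example, this script uploads an image called “image.jpg”.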
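A minimal sketch of the binary upload, assuming repo is a PyGithub Repository obtained as in the earlier upload script. The helper name upload_binary_file and the paths are hypothetical.

```python
def upload_binary_file(repo, local_path, repo_path, message, branch=None):
    """Commit a binary file (image, zip, audio, ...) to the repository.

    repo is a PyGithub Repository, e.g. Github(...).get_repo("username/repo-name").
    """
    # "rb" returns raw bytes; no text decoding is attempted
    with open(local_path, "rb") as f:
        data = f.read()

    extra = {} if branch is None else {"branch": branch}
    # create_file accepts bytes and Base64-encodes them itself before sending,
    # so the bytes are passed through unchanged here
    return repo.create_file(repo_path, message, data, **extra)


# Example (hypothetical repo object and paths):
# upload_binary_file(repo, "image.jpg", "images/image.jpg", "Add image")
```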

Uploading an Image

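When downloading, the file likewise arrives from the API Base64-encoded, in the ContentFile’s content attribute. The decoded_content attribute reverses that encoding and returns the image’s original bytes, so writing them to a file opened in binary mode ("wb") produces the actual image file.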
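A minimal sketch of the binary download, assuming repo is a PyGithub Repository as before. The helper name download_binary_file and the paths are hypothetical.

```python
def download_binary_file(repo, repo_path, local_path, ref=None):
    """Fetch a binary file (image, zip, ...) from the repo and save it unchanged.

    repo is a PyGithub Repository; ref may be a branch, tag, or commit SHA.
    Returns the number of bytes written.
    """
    extra = {} if ref is None else {"ref": ref}
    content_file = repo.get_contents(repo_path, **extra)
    if isinstance(content_file, list):
        # A directory path yields a list of ContentFile objects
        raise IsADirectoryError(f"{repo_path} is a directory, not a file")

    # .content is the Base64 text sent by the API; .decoded_content is that
    # text Base64-decoded, i.e. the file's original bytes
    data = content_file.decoded_content
    # "wb" writes the bytes exactly as received
    with open(local_path, "wb") as f:
        f.write(data)
    return len(data)


# Example (hypothetical repo object and paths):
# download_binary_file(repo, "images/image.jpg", "downloaded_image.jpg")
```

One caveat worth checking against GitHub’s documentation: the contents endpoint only returns a file’s content inline for files up to 1 MB, so larger images or videos come back without it and decoded_content cannot be used. Fetching the blob by its SHA with repo.get_git_blob() is one way around that.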

Downloading an Image

Summary

This Medium post explains how to upload files to and download files from a GitHub repo via GitHub’s API, using PyGithub, whether the files are text-based or binary.
