6. Introduction to Git

Let’s learn how to use Git to code collaboratively. We saw in the lectures what the main commands are and how they are used. Now let’s solidify these concepts by applying them to a list project! This project lists different data science resources, and gives us a way to practice collaboratively editing shared documents using Git.


Hopefully some of the lists might even be useful!


You can see the project we will edit here: tbonne/DS_lists

To edit this project we will follow these steps:

    1. Create a copy (a fork) of the repository on your GitHub account

    2. Make sure GitHub recognizes you!

    3. Create a branch for you to work on

    4. Edit/add one thing to a file (use a random number to choose the file!)

    5. Push your edits to the branch

    6. Submit a pull request

    7. Instructor/TA approves the pull request

Let’s go through each step in more detail!

6.1. Fork the DS_lists repository

Because the DS_lists repository is public, you can see all the lists and download them. But to edit it without being an explicit collaborator, you’ll need to create your own copy.

To do this, go to GitHub and fork a copy of the DS_lists project. The fork button is at the top right of the repository page.

6.2. Make sure GitHub recognizes you in Colab

We have to let GitHub know who is working in this Colab session. That way, when we make changes and push them to the repository, it knows who made them.

So first we have to get an access token from your GitHub account. To do so:

  • Log in to your GitHub account at github.com

  • Click on your profile picture (top right), then on ‘Settings’, then ‘Developer settings’ (bottom left).

  • Click on ‘Personal access tokens’, then on ‘Tokens (classic)’, then on ‘Generate new token’.

  • Give it a name (e.g., colab_project) and set the expiration to 7 days.

  • For the scope, we can just select ‘repo’; that should be enough for this project.

Now that we have the token, copy it. Run the code below and paste the token when prompted to authenticate yourself.

import os, getpass

# Enter your GitHub user name here (replace 'tbonne' with your own)
os.environ["GITHUB_USER"] = 'tbonne'

# Optional: enter once per runtime
os.environ["GITHUB_TOKEN"] = getpass.getpass("Token: ")
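
Before moving on, it can help to confirm that both values were captured, without printing the token itself. The following is a minimal sketch; the helper names `missing_credentials` and `mask_token` are hypothetical and not part of any library.

```python
import os

def missing_credentials(env, required=("GITHUB_USER", "GITHUB_TOKEN")):
    """Return the names of required variables that are unset or empty."""
    return [name for name in required if not env.get(name)]

def mask_token(token, visible=4):
    """Hide all but the last few characters of a token for safe display."""
    if len(token) <= visible:
        return "*" * len(token)
    return "*" * (len(token) - visible) + token[-visible:]

# In Colab you would pass os.environ; a plain dict works the same way.
missing = missing_credentials(os.environ)
print("Missing:", missing if missing else "nothing")
```

If `GITHUB_TOKEN` is reported as missing, re-run the cell above and paste the token again.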

We can then set our name and email. Git records these with every commit, which is useful for seeing who did what in the code.

# Git identity (for commits)
!git config --global user.name "First Last"
!git config --global user.email "student@example.com"

Ok now we are ready to start editing the project!

6.3. Copy the repository to Colab

Let’s first clone the project to Colab.

!git clone https://github.com/tbonne/DS_lists.git

You should now see the repository in the files folder on the top left.

For the next few commands, let’s make sure we are in the right directory by changing directories (cd) into the DS_lists directory.

%cd /content/DS_lists

Now let’s create a branch for you to work on.

!git switch -c docs/updatelist-LASTNAME
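
Git branch names cannot contain spaces, so a last name such as "Van Dyke" needs a small adjustment before it replaces LASTNAME. A minimal sketch of one way to do that follows; the function name `make_branch_name` is hypothetical, and the prefix is simply the one used in the command above.

```python
import re

def make_branch_name(last_name, prefix="docs/updatelist-"):
    """Build a branch name, replacing runs of whitespace with a hyphen."""
    cleaned = re.sub(r"\s+", "-", last_name.strip())
    if not cleaned:
        raise ValueError("last_name must not be empty")
    return prefix + cleaned

print(make_branch_name("Van Dyke"))
```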

You’ve created the branch locally. Now let’s push the branch online.

First we have to set up where to send the new branch, i.e., the origin. We cloned the original repository, so we now point the origin at your fork instead.

# Set the origin to your fork of the DS_lists repo
!git remote set-url origin "https://${GITHUB_TOKEN}@github.com/${GITHUB_USER}/DS_lists.git"
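
To make the structure of that URL explicit, here is a minimal sketch that assembles the same string in Python. The function name `build_remote_url` is hypothetical; the URL layout is taken from the command above.

```python
from urllib.parse import quote

def build_remote_url(user, token, repo="DS_lists"):
    """Return an HTTPS remote URL that embeds the token for authentication."""
    # quote() escapes any characters that are not safe inside a URL.
    safe_token = quote(token, safe="")
    safe_user = quote(user, safe="")
    return f"https://{safe_token}@github.com/{safe_user}/{repo}.git"

# Placeholder values for illustration only; never print a real token.
print(build_remote_url("student", "TOKEN"))
```

Keep in mind that a URL set this way is stored in plain text in the repository's .git/config file, which is one reason to use a short-lived token.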

Once the origin is set up, we then need to push the new branch.

!git push origin docs/updatelist-LASTNAME

Ok, we are now ready to make edits!

6.4. Edit a data science list

Now that you have cloned the repository and created a branch, open up your chosen list and make an addition or edit.

Use a random number to choose the file you will edit! In the lists folder of the repository you will see that the files are numbered. Note that np.random.randint(1, 10) returns an integer from 1 to 9, because the upper bound is excluded.

import numpy as np

np.random.randint(1,10)
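
If the number of files in the folder does not match the range of the random number, an alternative is to pick directly from the files that are actually there. This is a minimal sketch using only the standard library; the function name `pick_random_file` is hypothetical, and the path assumes the repository was cloned into /content as above.

```python
import random
from pathlib import Path

def pick_random_file(folder, seed=None):
    """Return one file from `folder`, chosen uniformly at random."""
    files = sorted(p for p in Path(folder).iterdir() if p.is_file())
    if not files:
        raise FileNotFoundError(f"No files found in {folder}")
    # A seed makes the choice repeatable; leave it as None for a fresh pick.
    return random.Random(seed).choice(files)

# Assumed location of the lists inside the cloned repository.
lists_dir = Path("/content/DS_lists/lists")
if lists_dir.is_dir():
    print(pick_random_file(lists_dir).name)
```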

Note: to open a list, go to the files folder on the left and open up the DS_lists/lists folder. You should see your list in there. Click on it and edit it. Then save the file.

6.5. Add and commit your edits locally

Ok, now that you’ve edited your group list, let’s save these changes.

Let’s learn how to use add to stage what you changed, and commit to record those changes in your local copy of the repository.

Before we use these commands, let’s use git status to get a summary of where Git is at the moment.

!git status

Nothing should be staged yet, but Git should show that you are on the branch you just created and list the file you edited as modified.
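
For scripting, `git status --porcelain` prints the same information in a compact form: a two-character status code, a space, then the path. The sketch below parses that form; the function name `parse_porcelain` and the sample output are hypothetical, made up for illustration.

```python
def parse_porcelain(output):
    """Split `git status --porcelain` output into (status, path) pairs."""
    entries = []
    for line in output.splitlines():
        if len(line) > 3:
            # The first two characters are the status code, then a space.
            entries.append((line[:2], line[3:]))
    return entries

# Hypothetical output: one edited tracked file and one new, untracked file.
sample = " M lists/3.md\n?? notes.txt\n"
for status, path in parse_porcelain(sample):
    print(repr(status), path)
```

Here " M" marks a file that is modified but not yet staged, and "??" marks a file Git is not tracking.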

We can even look at the current log to see what’s been done. Not much yet!

!git log --oneline

Above you should see one line per commit. Creating a branch does not add a commit of its own, so for now the log shows only the history that came with the clone.

Ok, now that we have checked the status of Git, let’s add and commit your changes.

#Add all changes
!git add --all

We could use:

  • git add filename.txt to add a specific file

  • git add --all to add anything that has been updated in this folder and any sub-folders

Once we’ve added all the changes let’s commit them to our local repository here on colab.

#Commit the changes
!git commit -m "Some short description of what you did..."

Each time you add and commit, the version is saved locally.

Let’s see how the logs have changed.

!git log --oneline

You should see an extra line showing your recent commit with your message. The alphanumeric strings at the start of the lines are abbreviated commit hashes, which identify each commit. You can always go back to an earlier commit if something goes wrong.
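
To make the layout of each log line concrete, here is a minimal sketch that splits `git log --oneline` output into the abbreviated hash and the commit message. The function name `parse_oneline_log` and the sample log are hypothetical; your hashes and messages will differ.

```python
def parse_oneline_log(output):
    """Split `git log --oneline` output into (short_hash, message) pairs."""
    commits = []
    for line in output.splitlines():
        if line.strip():
            # Everything before the first space is the abbreviated hash.
            short_hash, _, message = line.strip().partition(" ")
            commits.append((short_hash, message))
    return commits

# Hypothetical log output, newest commit first.
sample = "a1b2c3d Add a resource to list 3\n9f8e7d6 Initial commit\n"
for short_hash, message in parse_oneline_log(sample):
    print(short_hash, "->", message)
```

If Git adds branch labels such as (HEAD -> ...) to a line, this sketch leaves them at the start of the message.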

6.6. Push your local changes to your forked repository online

Now that you’ve edited a list, let’s upload that to GitHub. To do that we’ll first do a push to send these changes to your forked repository.

!git push origin docs/updatelist-LASTNAME

At this stage, if you are collaborating with others, sometimes you’ll find some conflicts here! For example, someone might have altered the same list as you. In these cases you’ll have to do some work to merge the two changes. Sometimes git will be able to automatically merge the differences, but sometimes it will simply mark where the conflicts are and get you to choose what to keep.
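
When Git cannot merge automatically, it writes conflict markers into the affected file: a line starting with <<<<<<<, a ======= separator, and a line starting with >>>>>>>. The sketch below finds those markers in a piece of text; the function name `find_conflict_markers` and the sample content are hypothetical.

```python
def find_conflict_markers(text):
    """Return (line_number, marker) pairs for Git conflict markers in text."""
    markers = ("<<<<<<<", "=======", ">>>>>>>")
    found = []
    for number, line in enumerate(text.splitlines(), start=1):
        for marker in markers:
            if line.startswith(marker):
                found.append((number, marker))
                break
    return found

# Hypothetical file content after a conflicting merge.
sample = (
    "- Resource A\n"
    "<<<<<<< HEAD\n"
    "- Your new resource\n"
    "=======\n"
    "- Someone else's resource\n"
    ">>>>>>> main\n"
)
print(find_conflict_markers(sample))
```

To resolve the conflict, you edit the file so it contains only the text you want to keep, delete the marker lines, and then add and commit as before. Note that this simple check can also flag a legitimate line of equals signs, such as a heading underline.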

6.7. Start a pull request

Now that you’ve edited a list, and updated your forked copy of the repository, it’s time to send these updates back to the original repository.

To do this we’ll have to start a pull request.

This can be done on GitHub. If you go to your GitHub page and open your forked repository, you should see a yellow banner at the top with a “Compare & pull request” button.

When you click on that you just want to make sure:

  • the ‘base fork’ points to the original repository (e.g., tbonne/DS_lists)

  • the ‘base’ points to the master branch

  • the ‘head fork’ points to your forked copy of the repository

  • the ‘compare branch’ points to your branch

For the most part these values will be set correctly automatically.
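
If the banner does not appear, the same comparison page can usually be opened directly. The sketch below builds such a link, assuming GitHub's cross-fork compare URL pattern of base...owner:branch; the function name `compare_url` and the user name "student" are hypothetical, and the master base branch is taken from the list above.

```python
def compare_url(base_owner, head_owner, head_branch,
                repo="DS_lists", base_branch="master"):
    """Build a GitHub compare link from a fork's branch to the base repo."""
    # Assumed pattern: /compare/<base_branch>...<head_owner>:<head_branch>
    return (
        f"https://github.com/{base_owner}/{repo}/compare/"
        f"{base_branch}...{head_owner}:{head_branch}"
    )

print(compare_url("tbonne", "student", "docs/updatelist-LASTNAME"))
```

The four arguments correspond to the four settings listed above: the base fork, the base branch, the head fork, and the compare branch.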

You can add a little note about what you’re updating.

Then you can click “create pull request”.

An instructor or TA will now be able to check your pull request and decide if the addition is something that should be added!

6.8. Learning more!

If you would like to go deeper into Git, here is a great series of videos that will expand on what we covered here:

A good next step would be to set up Git on your own computer and use it with VS Code.

6.9. Notes:

  • !: This prefix in Colab allows you to execute shell commands directly.
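
Outside a notebook the ! prefix is not available, and Python's subprocess module plays a roughly similar role. This is a minimal sketch, not what Colab does internally; it runs a small Python command so that it works without Git installed.

```python
import subprocess
import sys

# Run a command as a list of arguments and capture what it prints.
result = subprocess.run(
    [sys.executable, "-c", "print('hello from a child process')"],
    capture_output=True,  # collect stdout and stderr instead of printing them
    text=True,            # decode the output as str rather than bytes
    check=True,           # raise an error if the command fails
)
print(result.stdout.strip())
```

Replacing the argument list with something like ["git", "status"] would run a Git command in the same way, assuming Git is installed.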

Some other useful git commands:

View logs (what’s been done): git log --oneline

List all branches: git branch -a

Delete a branch: git branch -d theBranchToRemove