Goals

  • explore version control
  • learn to use Git
  • learn to use Gitlab

Version Control Systems

Up to this point in your software engineering career, many of your programming projects will have been small, individual endeavors. As software matures it often grows in complexity, as bugs are fixed, features are added, and the number of contributing developers increases. Managing this complexity becomes a mammoth task that crushes many fledgling startups. A similar fate is in store for projects that lack reliable version control (we received a bug report on version 1.1 … what WAS in version 1.1?). The ability to easily back out changes, or to accept only a subset of a group of changes, especially in a team environment, is also essential and it encourages experimentation. Another software development practice that version control aids is code reviews: proposed changes may be examined before being incorporated into the master source code repository. Tools like Git give programmers the ability to branch away from the main development track to try out a new feature or debug a quick fix, and later merge in the changes later or abandon them entirely. Finally, Version Control Systems also serve as a form of backup … just in case.

Some solo programmers might feel they don’t need version control. This is shortsighted … as the sole person doing the work, with no support or backup team, these folks need version control even more!

Version control systems (VCS) come in many flavors. You may have heard of some of the more popular, open-source incarnations, such as Concurrent Versions System (cvs), Mercurial (hg), or Subversion (svn). While these systems differ in implementation, they all provide methods to manage resources, such as tracking, reverting, and merging changes. Git is an open-source distributed version control system (DVCS).

Any text-based files can be managed by a VCS. Source code, configuration files, SQL scripts, test scripts, documentation written using LaTeX or Markdown, XML, web pages, and more assets are easily maintained in a VCS. (I store this website in Gitlab using git!) Files that are not human-readable can be problematic, but can sometimes be stored in a VCS, without the benefit of some of the text-based tools.

There are two primary groups of Version Control Systems:

  • centralized, such as Subversion and ClearCase, which generally require a network connection for operation, and
  • decentralized, such as Mercurial and git, which do not require a network connection. In fact, these systems always have a complete local version of the repository.

We will be using Git. The Git project was originally developed by (and somewhat named after) Linus Torvalds to manage the Linux kernel (15 million LOC). Many open-source and commercial projects currently rely on Git, including the Linux kernel, Google’s Android OS, and the MacOSX package manager Homebrew.

We will be using Git on the CS systems machines using a CS-managed Git server called Gitlab. We will not cover installation and setup of Git on your personal laptop or the use of third-party Git-related applications such as Github, BitBucket, or SourceTree. The enterprising student may explore these (and other advanced) topics on their own. You will find several free books and interactive tutorials on the resources page. I particularly recommend Pro Git by Scott Chacon and Ben Straub (available free online).

Why Version Control

Imagine the following scenario: You have a term paper to write. So, being a diligent student, you start early by writing a rough draft, saved as paper.doc. As you explore your topic, you think, “I like what I have, but what if I added…” You do not want to lose your current progress, so you copy your paper to a new file paper2.doc and begin exploring new ideas. As ideas come and go, you continue to replicate and rename your files. You reach the final week of classes with 5 different documents, each containing something you want to keep. You create a new file, named final.doc, that will contain all of the disparate pieces of thought linked together. Before the deadline, you realize you want to rework the intro, and thus create final2.doc. In the end, you submit something that looks like final2-good-final.doc. Now imagine you had to write this paper together with others?

Version control systems are designed to alleviate the hardships of managing resources, like source code files or term papers. When using a VCS, files are tracked so that any changes can be recorded. In the simple example above, you ended up with multiple files, some likely sharing large portions of the same text. In some VCS, storage space is saved by only saving changes, or “deltas”, to files. These changes can be logged and timestamped, so that going back to the version last Tuesday before 9 PM would be very easy. In the previous example, you wanted to combine parts of multiple files. This concept is known as merging. In addition to tracking files, and allowing easy reversion, VCS often have capabilities for easy merging.

When collaborating with multiple users, VCS come in two varieties: client-server and distributed. In the client-server model, the central repository is shared amongst all users. Here terms like “checkout” and “lock” come into play. When a user wants to edit a file, they checkout that file. This locks the file from being edited by any other users at the same time. Some VCS allow “stealing” of files for urgent or priority changes, but in general if a user has control, then no one else can have that file until they check it in. A common complaint with client-server systems is their being slow, because every operation must communicate with the central server and it is not generally safe for multiple users to modify the same files. CVS and SVN are examples of the client-server model. In distributed version-control systems, each user maintains their own local repository and changes are shared periodically amongst all other users. Merge conflicts can arise when multiple users modify the same file, and many VCS provide advanced features to simplify this process. Mercurial and Git are among the more popular distributed version control systems, both known for their speed and flexibility.

Git Configuration

Before you can start using Git, you will need to configure your development environment. Git stores environment settings in three different files:

  • /etc/gitconfig — contains settings for all users on a system
  • ~/.gitconfig — contains user-specific settings
  • project/.git/config — contains project-specific settings

Project settings override user settings, which in turn override system settings. There are many configuration options, but at a minimum you need to tell Git who you are (your name) and how to contact you (your email). Here I also set my preferred editor to vi.

[xia@flume ~]$ git config --global user.name "Xia Zhou"
[xia@flume ~]$ git config --global user.email "xia@cs.dartmouth.edu"
[xia@flume ~]$ git config --global core.editor vi
[xia@flume ~]$ git config --global color.ui true

You can view your current settings with the --list option. The last two options may be useful if you want certain commands to color their output.

[xia@flume ~]$ git config --list
user.name=Xia Zhou
user.email=xia@cs.dartmouth.edu
core.editor=vi
color.ui=true
[xia@flume ~]$ 

As you can see, the --global option writes your user specific settings to ~/.gitconfig.

[xia@flume ~]$ cat ~/.gitconfig
[user]
	name = Xia Zhou
	email = xia@cs.dartmouth.edu
[core]
	editor = vi
[color]
	ui = true
[xia@flume ~]$ 

A really useful self-documenting .gitconfig starting point may be found here

Getting Help

You can always get help:

[xia@flume ~]$  man git
[xia@flume ~]$  git help <command>

and there are many tutorials and references on the Internet.

Creating a repository

A repository, called repo for short, is a data structure that contains all of the information needed to manage a project. This often includes the project files and resources themselves, as well as any meta-data used by the VCS to manage them. It is very easy to create a Git repo with the git init command.

create a new (empty) local repository

You always need a local repository to work with git. When starting a new project from the beginning, you can either create a repo and fill it up or create a repo in an existing directory that already contains files.

[xia@flume ~]$ cd cs50/labs
[xia@flume ~/cs50/labs]$ mkdir labx
[xia@flume ~/cs50/labs]$ cd labx
[xia@flume ~/cs50/labs/labx]$ git init
Initialized empty Git repository in /net/nusers/xia/cs50/labs/labx/.git/
[xia@flume ~/cs50/labs/labx]$ ls -a
.  ..  .git
[xia@flume ~/cs50/labs/labx]$ 

create a repository in an existing development directory

Suppose you want to begin using a VCS after a project already had a lot of files in an existing directory tree.

[xia@flume ~]$ cd cs50/labs/tree6
[xia@flume ~/cs50/labs/tree6]$ git init
Initialized empty Git repository in /net/nusers/xia/cs50/labs/tree6/.git/
[xia@flume ~/cs50/labs/tree6]$ git add .
[xia@flume ~/cs50/labs/tree6]$ git status
On branch master

Initial commit

Changes to be committed:
  (use "git rm --cached <file>..." to unstage)

	new file:   .gitignore
	new file:   Makefile
	new file:   diff56
	new file:   tree.c
	new file:   tree.h
	new file:   treetest.c

[xia@flume ~/cs50/labs/tree6]$ git commit -m "Initial commit of tree code"
[master (root-commit) e098300] Initial commit of tree code
 6 files changed, 564 insertions(+)
 create mode 100644 .gitignore
 create mode 100644 Makefile
 create mode 100644 diff56
 create mode 100644 tree.c
 create mode 100644 tree.h
 create mode 100644 treetest.c
[xia@flume ~/cs50/labs/tree6]$ git status
On branch master
nothing to commit, working directory clean
[xia@flume ~/cs50/labs/tree6]$  

Files in git

git treats all content as being in one of three states:

  • ignored - git never even looks at it
  • untracked - git reports its presence but doesn’t track its changes
  • tracked - git tracks everything that happens to it

Content that is being tracked is always in one of four states:

  • changed
  • staged
  • committed
  • pushed

Workflow

git states

Git workflow figure from Jason Taylor Git Complete.

The general workflow in git is to (1) add/modify files, (2) stage the changes, (3) commit those staged changes to the repo, and sometimes (4) push the changes to a remote repository. Files that are newly created are referred to as “untracked” until they are added to Git. When files added to git are then changed they must be staged before they are committed. You add/modify files in your “working directory.” Git stages files by recording changes in a special file, often called the “index” or “staging area.” When you commit your changes the staged changes become permanently recorded in the Git directory.

Checking git status

At times during your development, it may be helpful to determine the status (untracked, modified, staged, committed, etc.) of the files within your directory and repo. As shown below, nothing has been added, staged, or committed to the repository. You can check the current state with the git status command.

[xia@flume ~/cs50/labs/tree6]$ git status
On branch master
nothing to commit, working directory clean
[xia@flume ~/cs50/labs/tree6]$ 

The status command will often give you hints; Watch for them - these tips can be very helpful.

Staging Files

To move files from your working directory to the staging area, you use the git add command. Below we create a README.md (like all good projects have) and stage it. The git status command shows the file has been staged, according to the “Changes to be committed” heading.

[xia@flume ~/cs50/labs/tree6]$ cat > README.md
# Binary-tree demo code
## Version 6: use function pointers and add tree_print()
 * demonstrates use of function pointers
 * adds tree_print() function
 * hints at how one might support tree_delete() function
[xia@flume ~/cs50/labs/tree6]$ ls
diff56  Makefile  README.md  tree.c  tree.h  treetest.c
[xia@flume ~/cs50/labs/tree6]$ git status
On branch master
Untracked files:
  (use "git add <file>..." to include in what will be committed)

	README.md

nothing added to commit but untracked files present (use "git add" to track)
[xia@flume ~/cs50/labs/tree6]$ git add README.md
[xia@flume ~/cs50/labs/tree6]$ git status
On branch master
Changes to be committed:
  (use "git reset HEAD <file>..." to unstage)

	new file:   README.md

[xia@flume ~/cs50/labs/tree6]$ git commit -m "Added README"
[master eefec89] Added README
 1 file changed, 5 insertions(+)
 create mode 100644 README.md
[xia@flume ~/cs50/labs/tree6]$ 

Hint: If you have several new, modified, or deleted files to git add, you can use git add --all

Notice that I used README.md rather than README above. A file with extension .md is assumed to be a text file in Markdown syntax, which provides very simple (and readable) markup for headings, lists, italics, bold, and code snippets. (This course website is written in Markdown.) Many VCS web portals (like our Gitlab and the popular Github) allow you to browse the files in your repository and render Markdown format, making such files much nicer to look at. Markdown is easy to learn; see Markdown resources.

Ignoring Files

Git works well with normal files, but executables and binary-format files (like images) can present a challenge. In addition, there are often certain files that you do not want to be under version control, like temporary files from your favorite editor. You should configure Git to ignore those files, so they will not be added to your repo.

Do it now! Create a file ~/.gitignore_global with the following contents:

# Object files and libraries
*.o
*.a

# Emacs
*~
\#*\#
.\#*

# MacOS X
.DS_Store
.AppleDouble
.LSOverride
Icon
._*
.Spotlight-V*
.Trashes

Github maintains a list of common ignore files, if you want to get more ideas.

Then you should tell git about project-specific files that should be ignored. At a minimum, this list would include the ‘executable binary’ file that is produced as a result of compiling your program. As a rule of thumb, you should tell git to ignore anything that make builds and make clean deletes.

We’ll create a .gitignore file for our new repo. Below, I focus on the need to ignore my executable file treetest that builds with make.

[xia@flume ~/cs50/labs/tree6]$ echo treetest > .gitignore
[xia@flume ~/cs50/labs/tree6]$ git status
On branch master
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)

	modified:   .gitignore

no changes added to commit (use "git add" and/or "git commit -a")
[xia@flume ~/cs50/labs/tree6]$ git add .gitignore
[xia@flume ~/cs50/labs/tree6]$ git commit -m "ignore executable"
[master 8c9e44e] ignore executable
 1 file changed, 1 deletion(-)
[xia@flume ~/cs50/labs/tree6]$ 

I didn’t have to add and commit the .gitignore file for Git to make use of it; I just didn’t want to forget to put it into the repo.

Now let’s make my executable and see whether Git ignores it. [Not shown: I first made a small change to treetest.c].

[xia@flume ~/cs50/labs/tree6]$ make
gcc -Wall -pedantic -std=c11 -ggdb   -c -o treetest.o treetest.c
gcc -Wall -pedantic -std=c11 -ggdb   -c -o tree.o tree.c
gcc -Wall -pedantic -std=c11 -ggdb treetest.o tree.o  -o treetest
[xia@flume ~/cs50/labs/tree6]$ ls
diff56  Makefile  README.md  tree.c  tree.h  tree.o  treetest  treetest.c  treetest.o
[xia@flume ~/cs50/labs/tree6]$ git status
On branch master
Untracked files:
  (use "git add <file>..." to include in what will be committed)

	tree.o
	treetest.o

nothing added to commit but untracked files present (use "git add" to track)

Uh-oh. It looks like Git properly ignored treetest but did not ignore the .o files. Let’s do the following:

[xia@flume ~/cs50/labs/tree6]$ cat >> .gitignore
*.o
[xia@flume ~/cs50/labs/tree6]$ git status
On branch master
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)

	modified:   .gitignore

no changes added to commit (use "git add" and/or "git commit -a")

It worked; now I will commit all modified files with commit -a:

[xia@flume ~/cs50/labs/tree6]$ git commit -a -m "tweak the test script"
[master d97bebe] tweak the test script
 1 file changed, 1 insertion(+)
[xia@flume ~/cs50/labs/tree6]$ 

Removing Files

The git rm command is the command to remove files from the repository. For example, you may rearrange your code and some source files are no longer needed; instead of rm you use git rm to remove the file from both your working directory and from the repository. (Prior versions of the file(s) are still in the repository, so you can always get the file back.)

As git-status output sometimes suggests, if you want to remove something from the staging area you use the git rm --cached <file> command to remove staged files. This command doesn’t remove the file from your directory, or from the repository - it’s just the opposite of git add.

Committing Changes

When you have made some changes and staged them, then you are ready to commit those changes (roughly) permanently. Each commit should be accompanied by a message explaining what changes have been made. The command to accomplish this is git commit -m "<message>", as we saw in the examples above.

If you run the git commit command without the -m option, you will be taken to your default editor to enter a commit message. If you commit too early and want to add something to the same commit, then you can use the git commit --amend command.

The commit message is very important, both to you and to your colleagues working with you. Your commits should be fairly granular with a straightforward present tense commit message. We encourage the use of present tense since you should think of the commit message as describing what the commit does. The commit messages become even more important as you work with a team, particularly when the members are geographically distributed.

Best Practice: Make atomic changes

Git doesn’t care why files are changing. It just tracks the content as it changes, allowing you to choose how to use it the best way. Git actions are atomic, and your changes should be also. If you move a function from one file to another that could be two commits: one to delete it in file A, and another to add it to file B. This is ok with Git, but between the two commits the function is missing entirely, so a later build based on that the code base at the first commit would not work. Instead, commit the new version of both files in the same commit.

Similarly, if you need to make several unrelated changes to various files in your project, run git commit separately for each set of changes, each commit with a different message, so that the message is relevant to the files being committed.

git diff

How can you tell what has changed since your last commit? With git diff! Below, I edit the README.md file and then use git diff to show me what changed. Just like diff, it prints a little bit of context and then uses lines beginning with + and - to show added and removed lines, respectively.

[xia@flume ~/cs50/labs/tree6]$ vi README.md
[xia@flume ~/cs50/labs/tree6]$ git diff
diff --git a/README.md b/README.mdESC[m
ESC[1mindex 18a7d0c..4c54f2d 100644ESC[m
ESC[1m--- a/README.mdESC[m
ESC[1m+++ b/README.mdESC[m
ESC[36m@@ -3,3 +3,4 @@ESC[m
  * demonstrates use of function pointersESC[m
  * adds tree_print() functionESC[m
  * hints at how one might support tree_delete() functionESC[m
ESC[32m+ESC[mESC[32m * added a test line

You can also add a specific filename, like git diff Makefile if you just want to see differences for one file rather than all files in the repo.

If you find the output of git diff mangled, it could be because of an incompatibility between git, the ‘pager’ program (less), the ssh connection, and your terminal program.

I worked around it by telling git to use cat as my pager:

	git config --global core.pager cat

git tags

Beginners can ignore this feature for now.

Once you’ve achieved some milestone, such as a deliverable or release, you need some mechanism for marking in your get log history. If you do a good log you’ll see the hashers on the left side that are used to identify each commit. Clearly, it would be unreasonable to expect everyone to remember that 5ba4851 was release 1.0. for this purpose you use the git tag command. This places a marker in the get log with the name you specify. You should try to use this command only when you’re in a committed, up-to-date state. Later, when you need to reconstruct a particular release, you can use this tag in the git checkout command.

The example above is for a “lightweight” tag - one that is just a name and a location in the git history. Another is the “annotated tag” just like the commit command does. These are generally used to note major releases since the annotation can include the release name, release notes, and other information related to the release.

For a nice overview of how to use this feature, please see the “Git Tag Extra” that we’ve added.

git stashes

Beginners can ignore this feature for now.

It’s a common occurrence to be happily working on the next release of the code when a high-priority critical situation comes up with an important customer. The boss wants you to stop what you’re doing and fix the problem. You’re anxious to fix it, but what about all the work you have in your working directory? Where should you save it so that you can do a git checkout of the code that corresponds to the release the customer is running? The answer is to do a git stash, which will save everything that isn’t committed, and reset the directory to the most recent committed version.

Once the high priority work is completed, you can restore what you had with git stash apply and your previous work will reappear. Once you’re finished with a stash, you should delete it using git stash drop which will delete the last stash. You may have multiple stashes saved (see them using the git stash list command), but do not use the stash for version control or backup!

Rolling back to a previous commit

Beginners should ignore this feature for now.

Thanks to Travis Peters for this tip.

To “roll back” your repository to a specified state (i.e., a particular commit) you’ll need to use the git reset command.

WARNING: Please exercise caution when running these commands as this is not something you can undo later if you don’t follow these instructions! This tutorial assumes you are currently on the master branch.

First, we want to ensure that we can come back to this state of our repository at a later point should we need to do so. A commit and its predecessors are always accessible so long as we have a pointer to the most recent commit in that “branch”. So, we first create a branch that points to the most recent commit in our current branch (ideally you are on the “master” branch right now):

$ git branch myoldhead

Next, we use the git reset command which allows you to reset your current HEAD to some specified state. According to the man page (git help reset), the --hard flag does the following: > Resets the index and working tree. Any changes to tracked files in the working tree since are discarded.

git reset --hard <tag/branch/commit hash-id>

Running this command effectively sets your HEAD back to the tag/branch/commit that you specify and completely clears out your working directory and index (i.e., staging area).

Finally, if you wish to commit this state of your local repository so that your remote repository is synchronized (i.e., the remote repository also points back to the commit that you just reset your local repository back to), you need to git push with the -f flag which will forcefully overwrite your remote repository (without the -f flag, git will complain about your current branch being “behind”; this is expected in our case).

git push -f

A Couple of Notes:

  • If you decide that the myoldhead branch is no longer needed and you really don’t want to keep the changes, you can delete the branch (git branch -D) and all of the commits that go with it by running the following command:
git branch -D myoldhead
  • When referencing commits, such as in the git reset command above, you don’t have to copy/paste (or write out!) the whole commit. Large (cryptographic) hashes are used by git for data integrity purposes (the details of which beyond the scope of these notes). What we are concerned with, however, is uniqueness of these commits. While not guaranteed, it is extremely unlikely that the (SHA-1) hashes that are generated for different commits will be the same. Thus, when referring to commits, we need only tell git enough of the commit hash that it can be confident that you have uniquely identified a specific commit. In practice, you’ll see people only use the first 7-10 characters of a git hash to uniquely identify some tag/branch/commit.

Other handy git features

Git has many, many more features than we have detailed here. We have only scratched the surface with the commands necessary to work collaboratively on your class project. The interested student is encouraged to explore the provided resources. Keywords/commands to investigate: show, diff, branch, and rebase. For example, here’s a git branch/merge example, from the folks who brought you gitgraph.js: PDF.

CS Gitlab Server

The CS GitLab server is hosted at: https://gitlab.cs.dartmouth.edu/. We’re using GitLab (instead of public servers elsewhere on the Internet) because it allows you to maintain your own private repositories and because it provides cool tools. For example, GitLab includes a graphical, web-based way of exploring your repositories as shown below.

You must register for a userid and then create a repository for Lab3 using the instructions that follow.

To get started with GitLab, follow these steps:

  1. Use a browser to go to https://gitlab.cs.dartmouth.edu/
  2. If you don’t already have a GitLab userid, create one that is identical to your CS login name.
  3. When the GitLab userid is created, go back and login. Here is the opening screen when I log in.
  4. Click on the green “New Project” button.
  5. Fill out the form by giving the project a name (required) and description (optional). Ensure the project is marked Private.
  6. Now the project has been successfully created and appears like this:
  7. Click on the SSH pop-up to choose using Git with HTTPS. (You can stick with SSH to avoid typing password every time. Check this page on setting up and copying your ssh key.)
  8. Scroll down to see the command-line instructions for using this project:
  9. Let’s assume you had already created a local git repository using the lecture notes above. Look at the instructions under Existing folder or Git repository. Copy the git remote add origin command you see there.
  10. Go to your Terminal window, and cd to the directory where you had earlier set up a local repository. Paste the git remote add origin command. This command ties your local repository to the new ‘remote’ you created on Gitlab, giving this relationship the name ‘origin’.
  11. Then copy and paste the command git push -u origin master. This pushes your local repository’s ‘master’ branch to the remote known as ‘origin’.
    1. Enter your Gitlab username and password.
  12. To ease future pushes, enter the command git config --global push.default simple.
  13. Back in the Gitlab browser window, let’s explore the files now there. Click on the little ‘home’ icon on the left side, then on the little ‘files’ icon. You should see the files of your new project. Notice, below, that it renders my README.md file nicely.

Suppose the first thing you want to do is update the README.md file. So you edit it, git add, and git commit it. Now a git status shows you’re ready to push to the remote.

[xia@flume ~/cs50/labs/tree6]$ cat >> README.md
# This code is now on Gitlab!
[xia@flume ~/cs50/labs/tree6]$ git status
On branch master
Your branch is up-to-date with 'origin/master'.
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)

	modified:   README.md

no changes added to commit (use "git add" and/or "git commit -a")
[xia@flume ~/cs50/labs/tree6]$ git add README.md 
[xia@flume ~/cs50/labs/tree6]$ git commit -m "updated README"
[master c5ae325] updated README
 1 file changed, 2 insertions(+)
[xia@flume ~/cs50/labs/tree6]$ git push
Username for 'https://gitlab.cs.dartmouth.edu': xia
Password for 'https://xia@gitlab.cs.dartmouth.edu': 
Counting objects: 3, done.
Delta compression using up to 8 threads.
Compressing objects: 100% (3/3), done.
Writing objects: 100% (3/3), 338 bytes | 0 bytes/s, done.
Total 3 (delta 2), reused 0 (delta 0)
To https://gitlab.cs.dartmouth.edu/xia/tree.git
   d97bebe..c5ae325  master -> master
[xia@flume ~/cs50/labs/tree6]$ git status
On branch master
Your branch is up-to-date with 'origin/master'.
nothing to commit, working directory clean
[xia@flume ~/cs50/labs/tree6]$ 

Jump back to the browser and hit ‘refresh’ to see the new README. Cool!

Why is Git always asking for my password?

This page answers that question, and provides some suggestions about how you can use SSH instead of HTTPS, or (on MacOSX) a password manager. The former method is a little tricky but worth some effort if you have time, and the latter method will only work for your MacOS laptop.

Github also provides some info and instructions about SSH vs HTTPS.

Activity

In today’s activity, each group sets up a shared repo on Gitlab.