name: inverse
layout: true
class: center, middle, inverse
---
# Cardinal git
## Jonas Juselius <jonas.juselius@uit.no>
---
layout: false
# git happens

* git is not a better cvs!
* do not try to use git like cvs!
* git is hard!
* git is not hard enough!
* git is simpler than cvs!
* git is to cvs like wordpad to vim!

---
## cvs

* cvs is not a proper version control system
* have you ever:
    * locally backed up a repo?
    * had multiple active repos (for each branch)?
    * committed only when a feature is ready and tested? (i.e. once a week to once a month)
* cvs is not helping us
* cvs is a publication tool, not an scm

---
## git

* git has two distinct modes of operation:
    * scm
    * communication
* the scm is feature rich and powerful, and takes time to master
* the communication part is small and simple
* to effectively use the scm we must understand the underlying machinery
* treat the communication part like any serious publication activity:
    1. edit the raw material to tell a story
    2. double check that it makes sense
    3. press send

---
# snap

* let's roll our own scm using only standard unix tools
* deltas and revisions are hard:
    * make a snapshot/backup manager instead

---
## snapshots

* make copies of the source tree every time you feel the need

```bash
#!/bin/bash
# copy the working tree into the next numbered snapshot
shopt -s extglob
[ ! -d .snap ] && exit 1
# sort numerically, otherwise 9 sorts after 10
n=$(($(ls -1 .snap/snapshots | sort -n | tail -1) + 1))
mkdir .snap/snapshots/$n
cp -a ./!(.snap|.|..) .snap/snapshots/$n
```

---
## messages

* this quickly becomes unmanageable
* it's impossible to remember what each snapshot was about
* let's add a file called ``message`` to each snapshot to record:
    * a message of what has changed since the previous snapshot
    * the time and date
    * the author

---
```bash
#!/bin/bash
# numbered snapshot, now with a message file
shopt -s extglob
[ ! -d .snap ] && exit 1
n=$(($(ls -1 .snap/snapshots | sort -n | tail -1) + 1))
mkdir .snap/snapshots/$n
cp -a ./!(.snap|.|..) .snap/snapshots/$n
echo -n "message: "; read msg
cat << EOF > .snap/snapshots/$n/message
author: $USER <$USER@$(hostname -f)>
date: $(date)
message: $msg
EOF
```

---
## the branching problem

1. at snapshot 100 you release v1.0 to clients
2. you create 10 new snapshots, increasing the value of the code by $10,000
3. a client reports a problem in v1.0
4. you go back to snapshot 100 and fix the problem, producing a new snapshot
5. what should the snapshot be called? development is not linear anymore

---
## sha to the rescue

* back to the drawing board:
    * let's identify each snapshot by the sha1 of the message file
    * since development is non-linear we also need to add the sha1 of the parent snapshot to the message
    * now the sha1 of the message file uniquely identifies not only the snapshot, but its whole history!
    * then we rename the snapshots to the sha1 of the message file (and put them in ``.snap/snapshots``)

---
## branches

* how do we find the heads of our branches?

        $ mkdir .snap/branches
        $ echo f3430024ae... > .snap/branches/master
        $ echo ae045feed4... > .snap/branches/bugfix
        $ echo master > .snap/HEAD
        ...

* every time we make a new snapshot, we update the corresponding branch file
* we can also make a directory called ``tags`` which contains files with the sha1 of particular snapshots we want to remember (e.g. v1.0)
* branches are cheap: each one is just a sha1 in a file

```bash
#!/bin/bash
# create branch $1 pointing at the current head
[ ! -d .snap ] && exit 1
[ -z "$1" ] && exit 1
[ -e .snap/branches/$1 ] && exit 1   # branch already exists
head=$(cat .snap/HEAD)
cat .snap/branches/$head > .snap/branches/$1
```

---
### make sha1 snapshot

```bash
#!/bin/bash
# snapshot the working tree; optional $1 is a merged branch (second parent)
[ ! -d .snap ] && exit 1
echo -n "message: "; read msg
branch=$(cat .snap/HEAD)
cat << EOF0 > .message
parent: $(cat .snap/branches/$branch)
author: $USER <$USER@$(hostname -f)>
date: $(date)
message: $msg
EOF0
[ -n "$1" ] && echo "parent: $(cat .snap/branches/$1)" >> .message
sha1=$(sha1sum .message | cut -d ' ' -f1)
mkdir .snap/snapshots/$sha1
cp -a * .snap/snapshots/$sha1
mv .message .snap/snapshots/$sha1/message
echo $sha1 > .snap/branches/$branch
```

---
## changing branches

* changing branches is easy:
    * remove all project files
    * copy the new branch snapshot to the project directory

```bash
#!/bin/bash
# switch the working tree to branch $1
shopt -s extglob
[ ! -d .snap ] && exit 1
[ ! -f .snap/branches/$1 ] && exit 1   # -f also rejects an empty $1
rm -rf ./!(.snap|.|..)
sha1=$(cat .snap/branches/$1)
cp -a .snap/snapshots/$sha1/!(message|.|..) .
echo $1 > .snap/HEAD
```

---
## merging

* merging two snapshots is a breeze
* note:
    * merged snapshots have more than one parent
    * merges can fail!
* focus has shifted from simple backups to snapshots and their relationships
* behold the dag!

```bash
#!/bin/bash
# merge branch $1 into the current branch
shopt -s extglob
[ ! -d .snap ] && exit 1
[ ! -f .snap/branches/$1 ] && exit 1
head=$(cat .snap/HEAD)
tmp=/tmp/snap.$$
mkdir -p $tmp/a $tmp/b
cp -a .snap/snapshots/$(cat .snap/branches/$head)/!(.|..) $tmp/a
cp -a .snap/snapshots/$(cat .snap/branches/$1)/!(.|..) $tmp/b
rm $tmp/a/message $tmp/b/message
# apply the differences between the two heads to the working tree
(cd $tmp && diff -ruN a b) | patch -p1
# record the merged branch as the second parent
.snap/bin/make_snapshot.sh $1
rm -rf $tmp
```

---
## sharing is caring

* we want to be able to work on multiple machines (e.g. laptop)
* simply copy the whole project, ``.snap`` and all, to the laptop (i.e. remote machine)
* on the laptop, copy the branch files to ``.snap/branches/desktop/``
* to get changes back:
    1. copy the new snapshots back (using branches/desktop on the laptop)
    2. copy the branch files to ``.snap/branches/laptop/``
    3. merge snapshots

---
## cloning

```bash
#!/bin/bash
# clone project $1 to $2 and file its branches under origin/
shopt -s extglob
scp -r $1 $2
cd $2/.snap/branches || exit 1
mkdir origin
mv ./!(origin) origin
cp origin/master .
cd ..
echo "master" > HEAD
```

---
## fetching remote snapshots

```bash
#!/bin/bash
# fetch branch $2 and its missing snapshots from remote project $1
remote=$1; branch=$2
[ ! -d .snap ] && exit 1
[ ! -f .snap/branches/origin/$branch ] && exit 1
scp $remote/.snap/branches/$branch .snap/branches/origin/$branch
# merged snapshots have several parents, so keep a queue of sha1s to visit
queue=$(cat .snap/branches/origin/$branch)
while [ -n "$queue" ]; do
    read -r sha1 queue <<< "$queue"
    # skip the root marker and snapshots we already have
    [ "$sha1" = 0 ] || [ -e .snap/snapshots/$sha1 ] && continue
    echo $sha1
    scp -r $remote/.snap/snapshots/$sha1 .snap/snapshots/ || exit 1
    parents=$(sed -n 's/^parent: //p' .snap/snapshots/$sha1/message)
    queue=$(echo $queue $parents)
done
```

---
## initialization is a snap

```bash
#!/bin/bash
# set up an empty .snap directory next to the project files
[ -d .snap ] && exit 1
script=$(readlink -f $0)
snapdir=$(dirname $script)
mkdir .snap
mkdir .snap/snapshots
mkdir .snap/bin
mkdir .snap/branches
mkdir .snap/tags
cp $snapdir/*.sh .snap/bin/
echo "master" > .snap/HEAD
echo "0" > .snap/branches/master   # 0 marks "no parent yet"
```

---
## optional optimizations

* saving complete snapshots is wasteful
* we can use sha1 to alleviate the problem

1. at a leaf, record the file name, permission and sha1 of all files in a file called ``tree``:

        100644 blob f74993... foo.c
        100644 blob 5dd4e1... bar.c
        100644 blob eef67a... CMakeLists.txt

2. rename the files (including ``tree``) to their sha1 and move them to ``snapshots``
3. go one level up, and repeat. compute the sha1 of all tree files and add them to the current tree file with the permissions and name of the corresponding directory:

        100644 blob c4f509... CMakeLists.txt
        100644 blob 94e477... README.md
        040000 tree 77394a... src

4. goto 2

---
## optional optimizations (contd.)

5. when the toplevel ``tree`` file has been moved, add the sha1 to the ``message`` file:

    ```
    tree: ee4c33...
    ```

6. compute the sha1 of the message file, move it to ``snapshots``, and put the sha1 in the branch file
7. behold the dag!
8. compress all new objects

* now the sha1 of every commit is not only dependent on its entire history, it's dependent on the entire history of each and every file!
* if a single bit changes anywhere in the history, the sha1 will not match anymore and we get an error

---
## snap vs. git

* what does snap have to do with git?

```shell
$ mv .snap/snapshots .snap/objects
$ mkdir .snap/refs
$ mv .snap/branches .snap/refs/heads
$ mv .snap/tags .snap/refs/tags
$ mv .snap .git
```

* that's essentially the core git
* the rest is user interface and plumbing
* (plus some optimizations)

---
## staging

* we often want to split the current changes into multiple commits
* git uses a staging area (called ``index`` [sic]) to prepare commits:
    1. ``git add`` copies new or modified files to the staging area
    2. ``git commit`` creates the actual commit (snapshot) and resets the index
* many commands (e.g. ``git diff``, ``git status``, ...) utilize the index

---
## conflicts

* sometimes merges can result in conflicts, when files have changes at the same locations
* when a conflict occurs:
    * merged, unconflicted files are added to the index
    * unmerged, conflicted files are left in place, with conflict markers added
* resolving conflicts:
    1. edit the conflicting files, fix the code and remove the conflict markers
    2. ``git add`` the conflicting files
    3. ``git commit`` without editing the commit message

---
## fast forward

* sometimes the originating branch has not changed since the branching point
* in such cases we only need to update the branch head to make a merge
* this is known as a fast-forward merge, since no actual merging is needed

---
## push

* when we fetch and merge changes from a remote repository, we risk conflicts which must be resolved by hand
* in the opposite case, when pushing commits to a remote, nobody is there to resolve conflicts
* a push must always result in a fast-forward merge in the remote
* this is easily achieved by a fetch and merge before pushing

---
## rebasing

* rebasing is an alternative to merging, and can result in a cleaner/nicer commit history
* merges:
    * result in a commit with two parents
    * tell what and how things **actually** happened
* rebase:
    * rewrites commits, as if they had happened on a different branch
    * tells a developer's fairy tale
* warning! never, ever, never rebase commits which have already been pushed to a shared repository! this will mess up history for everybody!

---
## good to know

* if you forget to add a file to a commit, or you find a typo in the commit message: ``git commit --amend`` (never do this after a push!)
* you push a bad commit: ``git revert``
* you want to throw away all changes and start over: ``git reset --hard HEAD``
* you want to unstage a file: ``git reset {file}`` (unstage everything: ``git reset HEAD``)
* you realize you should have branched 3 commits earlier:
    1. ``git branch mybranch``
    2. ``git reset --hard HEAD~3``
    3. ``git checkout mybranch``
* commits can be rewritten, edited, split, deleted, squashed: ``git rebase -i``

---
## my precious

```shell
$ git status
$ git log
$ git log --stat
$ git log --graph --abbrev-commit --oneline --decorate --all
$ git diff
$ git grep
$ for i in `git grep -l ...`; do sed -i 's/stuff/newstuff/g' $i; done
```

---
## falsifying history

```shell
$ git commit --amend
$ git rebase
$ git filter-branch
```

---
## picking cherries

* sometimes you want to merge only selected commits from a branch
* ``git cherry-pick`` allows you to apply specific changes to the current branch
* for cherry picking to be useful, commits must be "small"!
* ``git cherry`` lists missing commits between branches

---
## when things go south

```shell
$ git reflog
$ git blame
$ git bisect
$ git gc
```

---
## multiple remotes

```shell
$ git remote add forked git@github.com:me/forked.git
$ git remote set-url --push origin git@github.com:me/forked.git
```

---
## prompting

* bash: git@github.com:magicmonty/bash-git-prompt.git
* zsh: git@github.com:olivierverdier/zsh-git-prompt.git

---
## .gitconfig

```
[user]
    name = Rab Oof
    email = rab.oof@foo.bar
[merge]
    tool = diffuse
[color]
    branch = auto
    diff = auto
    status = auto
[pull]
    rebase = true
[alias]
    ll = log --stat
    co = checkout
    ci = commit
    st = status
    unstage = reset HEAD
    pick = cherry-pick
    history = log --graph --decorate --abbrev-commit --all
[core]
    editor = vim
[help]
    autocorrect = 1
```

---
## playing the game

* [Git Branching game](http://pcottle.github.com/learnGitBranching/?demo)
* [src](https://github.com/pcottle/learnGitBranching)
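---
## appendix: naming a file by its sha1

The snap slides name every object by the sha1 of its content, and git does the same. This can be checked from the shell. The following is a minimal sketch, assuming GNU ``sha1sum`` is available and that git hashes a file as the header ``blob <size>``, a NUL byte, and then the content. ``blob_sha1`` is a hypothetical helper name, not part of git or snap.

```bash
#!/bin/bash
# blob_sha1 (hypothetical helper): print the id git would give to the
# contents of file $1, using only standard unix tools
blob_sha1() {
    local file=$1
    local size
    # byte count of the file; tr strips the padding some wc versions add
    size=$(wc -c < "$file" | tr -d ' ')
    # git hashes "blob <size>", a NUL byte, then the raw content
    { printf 'blob %s\0' "$size"; cat "$file"; } | sha1sum | cut -d ' ' -f1
}
```

If git is installed, ``git hash-object <file>`` should print the same id as ``blob_sha1 <file>``.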