Conversation

there's something weird about git branches that "a branch is just a reference to a commit" does not capture and I've been struggling with it for weeks

like in this diagram I think most people would say that there are 3 branches (corresponding to the 3 commits at the top of the diagram), though technically in git you could have 0, 3, or 100 branches here, and it's not labelled so you have no way to know how many branches there are

(please don't try to explain branches to me ty)


@b0rk whoa, how can you have 0 branches here!?


@ajh

a branch in git is just a named commit, and you can make any sort of trees you like in detached head mode


@ajh I mean in a git repository you can literally delete all your branches but all the commits will still technically be there

(though of course there's no reason you would do that, and in general git makes it very hard to view commits that aren't on a branch so those commits are kind of "invisible")


@b0rk are you looking for other ways to express "a branch is just a reference to a commit"?


@JackEric no, I want to figure out how to express that I think people are actually right in a way to say that there are 3 branches in that diagram, even if most git nerds might say that they're wrong


@b0rk just for interest, which tool are you using for the graphics? Freeform?


the thing is I feel like it's not really wrong to say that there are 3 branches in that diagram, even though you can easily argue that it's not technically true

like I think people have an intuitive idea of branches that's different from "a branch is a reference to a commit" and it feels like a lot of git is actually built to support that intuitive idea and it seems weird to try to erase it

(2/?)


@b0rk branch is a misnomer. It's a tag. Mercurial calls them bookmarks.


@travisfw really not interested in git vs mercurial terminology arguments thanks


because we have a bunch of rules and expectations around branches like

* commits that aren't on a branch are almost impossible to see in git (without the reflog), so why would you put them on a diagram?
* commits that aren't on a branch get GC'd eventually
* having 2 branches that point to the same commit is of course possible but feels redundant in most cases
* having a branch that points at an ancestor of another branch is a bit weird (but more common)

(3/?)
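
The reachability idea behind the first two bullets can be sketched with a toy model. This is only an illustration in Python, not how git stores anything: the `commits` and `branches` dictionaries and the `reachable` helper are hypothetical names made up for this example, and real git also looks at tags, the reflog, and other refs before it garbage-collects a commit.

```python
# Toy model (hypothetical names): commit id -> list of parent ids.
commits = {"a": [], "b": ["a"], "c": ["b"], "d": ["b"], "e": ["b"]}
# Branch name -> the commit id it points at.
branches = {"main": "c", "feature-1": "d", "feature-2": "e"}


def reachable(commits, refs):
    """Return every commit that can be reached from some ref via parent links."""
    seen = set()
    stack = list(refs.values())
    while stack:
        commit_id = stack.pop()
        if commit_id not in seen:
            seen.add(commit_id)
            stack.extend(commits[commit_id])
    return seen


# Deleting a branch removes only the label; the commit itself is still stored.
del branches["feature-2"]
unreachable = set(commits) - reachable(commits, branches)
print(sorted(unreachable))  # commits that a garbage collector could eventually drop
```

In this model the commit that only `feature-2` pointed at is still in `commits`, but nothing leads to it any more, which is roughly what "not on a branch" means in the bullets above.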


anyway I feel like there's this intuitive definition of git branches like "a branch is a chain of development that contains commits that aren't on any other branch" that seems just as important as the technical "a branch is a reference to a commit" definition, even though it might be "wrong"

I guess I'm interested in what people's intuitive or "technically incorrect" definitions of a git branch are

(4/?)


@b0rk I think of them more like pointers or labels on top of a commit
which makes them as intuitive as memory pointers 😂


@b0rk I was staring at that phrasing for five minutes and I got a brick to the head when I thought “reference… like a pointer” A pointer to a location not in memory, but to an address in the space of all commit hashes?


@flyingsaceur yeah exactly! if you run `cat .git/refs/heads/BRANCHNAME` in a git repo, you'll see what commit hash it points to currently
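
The same lookup can be sketched in Python. This is a minimal sketch under some assumptions: `read_branch` is a hypothetical helper, `.git` is assumed to be a plain directory (not a worktree or submodule pointer file), and the branch is assumed to hold a plain hash rather than a symbolic ref. It also checks `.git/packed-refs`, because git sometimes packs refs into that single file, in which case the `cat` command above finds nothing.

```python
from pathlib import Path


def read_branch(repo_dir, name):
    """Return the commit hash a local branch points to, or None if it is not found."""
    git_dir = Path(repo_dir) / ".git"

    # Loose ref: one small file per branch containing the commit hash.
    loose = git_dir / "refs" / "heads" / name
    if loose.is_file():
        return loose.read_text().strip()

    # Packed refs: lines of "<hash> <refname>", plus "#" comments and "^" peeled-tag lines.
    packed = git_dir / "packed-refs"
    if packed.is_file():
        for line in packed.read_text().splitlines():
            if not line.strip() or line.startswith(("#", "^")):
                continue
            commit_hash, _, ref_name = line.partition(" ")
            if ref_name == f"refs/heads/{name}":
                return commit_hash
    return None
```

Calling `read_branch(".", "main")` from the top of a repository would return the hash that `main` currently points to, or `None` if there is no such local branch.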


@b0rk
having 2+ branches that point to the same commit is very common?

main and origin/main and newly-created feature branch that doesn't have any commits on it yet

main is also routinely the ancestor of origin/main and/or feature branches; origin/feature1 is commonly ancestor of feature1


@sabik yea totally! my guess is that some people don't intuitively think of main as a "branch" for those reasons (though of course it is)


@sabik @b0rk I do it by accident frequently!


@sabik @b0rk I think of the tree in your drawing as the "DAG" -- the underlying structure of the commits in the repository, and branches as sticky-notes that can be attached to any of the nodes.

But then you'd need to explain what a DAG is...

If you can get that concept across, it helps explain commits that aren't on a branch -- they just don't have a sticky note...


@pearofdoom the analogy to pointers is interesting, i never thought about comparing "why is git confusing" to "why are pointers confusing"


@b0rk I usually use “timelines” or “commit histories” when I need to get in the weeds about it. Another reason I prefer short lived feature branches is you mostly don’t need to clarify between these and the technical branch


@stormsweeper i really like "commit history", it seems more intuitive and also correct


@stormsweeper @b0rk
Possibly "line of work", referring to how we use them more than the details of implementation?

"History (of commits)" feels like it'd have unwanted baggage, particularly for workflows that consider feature branches to be proposals rather than history...


@b0rk okay. Regardless of alternatives, branches are not branches. That's the point most people don't seem to get about git. Sorry if you already know that, but a lot of people don't.


@travisfw what do you mean by "branches are not branches"? (i have no idea how mercurial works, I only know what a branch is in git and I don't know what else a branch could be)


@b0rk at work, we don't worry all that much about the git-mechanical definition of a branch because the bitbucket UI handling of it is where most of the training for our intuition comes from. This is a little weird because you can have a branch off of a branch, review it, and easily merge it back into that branch - which you can do "by hand" in git but a local checkout doesn't actually record the information. (So our intuition matches our *workflow* great, but not so much git itself)


@b0rk I think of them as bindings, sorta? So like tags and branches share that they’re pointers, except that branches move with HEAD and tags don’t? (I know that’s probably not exactly right, but it’s how my mental model treats it.)


@b0rk the mindset behind the vocab collision of git branches vs mercurial branches was one of my absolute biggest hangups when I made the switch.

(Also I've learned a lot not just from your finished products but from your public process of developing them.)


@powersoffour i should learn what a branch is in mercurial one day, i've tried to read about it a bunch of times but the words always kind of slide off my brain


@b0rk This is really interesting to me so thanks for wondering about it in public. It occurs to me that the branch metaphor implies the existence of a trunk, as does the shape of your diagram. But there’s no such thing as a trunk in git (though there have been attempts to bolt one on). For me, “line” instead of “branch” makes intuitive sense, better matches reality, and avoids a false cognate.


@santry @b0rk
You could say your typical bush has lots of branches but no trunk. You usually have a "main" branch if you really want to think of one as the trunk.


@santry the "there's no trunk in git" thing is so interesting to me because (while of course that's true!) i feel like so many of our git workflows are really centered on the idea of a main branch and I don't know why there's such a focus on pretending that isn't the case


@santry @b0rk
I mean, shrubs have branches but no trunk

Railways have branches


@b0rk

> * having 2 branches that point to the same commit is of course possible but feels redundant in most cases

Isn't that pretty common when you have a staging branch (i.e. a branch that is deployed to a staging environment)? I mean it's not *always* the same commit, but it's not unusual to have the same commit on staging and main.


@b0rk I said pretty much the same thing back in February, came to pretty much the same conclusion as you: https://blog.plover.com/prog/git/branches.html

My position is that a branch is a sequence of commits whose last commit has a name. I've been using that model for years now without encountering any serious problems with it.


@mjd how did I miss this blog post at the time! this is so good

been thinking about how it's weird that "branches are named sequences of commits" is kind of the same statement as "branches are commits" (because you can think of a commit as a sequence of all its parent commits)


@mjd but i guess what you mean by "named sequences of commits" is probably not literally "every parent commit of that commit", but just the parent commits until it joins with the main branch


@b0rk I think I'd have to go with "a branch is a pointer to a commit, doing a terrible job of pretending to represent a chain of development."


@jonathon why do you say it does a terrible job?


@b0rk Because you lose traceability of which commits were associated with a given branch on merge.

They're fine as long as all you care about is the current state of your code. They're also fine if you just want to look back and see *everything* that contributed to the current state of your code. But, if I want to ask a question like "which commits were involved in feature #29," if you've used branches to do that and haven't annotated your commits by hand, Git cannot answer that for you.


@santry (that sounded kind of judgemental but I really am curious about it)


@b0rk I think this stems again from git holding religiously to the idea that a commit is the state of the world, rather than a *change* from a previous state.

Related: once you merge things, you can't necessarily tell what branch a commit was made on.

Because git wants to fervently adhere to commit == state of the world, things like diffs, chains of commits, renames, deletes, etc. all have to be partially inferred, and the places where that fail make everything feel a bit weird.


@zellyn thanks, I hadn't thought of it that way


@b0rk Maybe it becomes more obvious when you add the references as labels to the diagram? We‘d be able to see exactly how many branches exist

maybe that helps bridge part of the disconnect between definition and intuitive understanding (I agree this gap exists)


@exterm i think these are the only 2 branch labellings that really make sense to me

(like there are theoretically an infinite number of possible setups but I think these are the only normal/realistic options)


@b0rk Isn’t having a branch that points at an ancestor of another branch the usual case over 90% of the time?

Creating a branch that doesn’t uphold this requires using the --orphan flag.


@whynothugo yeah totally! i just meant that usually you don't intend to leave the branch that way, you'll commit to it immediately after creating it


@alpha sounds pretty much exactly right to me


@b0rk I think you're 10000% right, and it's going to be a difficult topic. GitHub and Gitlab subvert the git model by duplicating commits across local and remote branches, but they are super popular and, for some people, the only encounter they have with git.


@rlb curious about what you mean by "subvert the git model" -- i've personally almost exclusively used git with github and I've never understood why some folks dislike the github/gitlab model


@b0rk A branch can indicate a few things:

1. A place to find commits, by tracing back from it, and stopping when you get to commits shared by another reference
2. A place to put new commits. The new commits reference the old commit as a parent, and then the branch is updated to point to the new commit.
3. An intent to merge into some target branch

You'll note that for two out of these three things, a branch is incomplete; it depends on some other reference. Only in case 2 does it stand alone.
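
Point 1 can be sketched as a set difference over a toy commit graph. The names here (`commits`, `branches`, `ancestors`, `own_commits`) are hypothetical and only model the idea; as far as I know this is roughly the set that `git log main..topic` lists, but treat the code as an illustration rather than git's implementation.

```python
# Toy model (hypothetical names): commit id -> list of parent ids.
commits = {"a": [], "b": ["a"], "c": ["b"], "x": ["b"], "y": ["x"]}
branches = {"main": "c", "topic": "y"}


def ancestors(commits, tip):
    """Return tip plus everything reachable from it through parent links."""
    seen = set()
    stack = [tip]
    while stack:
        commit_id = stack.pop()
        if commit_id not in seen:
            seen.add(commit_id)
            stack.extend(commits[commit_id])
    return seen


def own_commits(commits, branches, branch, other):
    """Commits found by tracing back from `branch`, minus those shared with `other`."""
    return ancestors(commits, branches[branch]) - ancestors(commits, branches[other])


print(sorted(own_commits(commits, branches, "topic", "main")))  # only topic's own work
print(sorted(ancestors(commits, branches["topic"])))            # everything behind the pointer
```

The function needs a second reference to answer at all, which is the sense in which the branch is incomplete on its own.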


@unlambda thanks this is an interesting way to think about it!


@b0rk Anyhow, not really sure what to make of it, but it feels that in many ways, a branch on its own is incomplete.

You'll see that when you use Git forges like GitHub or GitLab, it becomes a lot easier to deal with a branch once it's associated with a pull request or merge request, which adds in that extra information, the branch it's intended to merge with.


@b0rk And then if I recall correctly, other DVCS's like Monotone and Mercurial would record information about what branch a commit was associated with with each commit, rather than just as a pointer like it is in Git.

The flexibility of Git's model, where it's just a pointer, made complex things like rebasing easier; but also makes it a lot easier to shoot yourself in the foot or just get yourself very confused.


@b0rk I think that's exactly right. Try to explain why rebase is called "rebase" without using the intuitive meaning of branch - you can't because there isn't really any other name for that concept of "a line of commits that are forked off from another line of commits". If we weren't meant to be calling that "branch" then there would be another name for it. I think the actual definition of a git branch should be considered an implementation detail.


@toby1kenobi the point about the naming of rebase is really good, thanks


@b0rk To me, branching as soon as I start to work on something has good ergonomics. So it’s always temporary, but “this is my feature branch, it points to the same thing as main” is pretty normal.

Possibly related: depending on the issue, I don’t necessarily do “tiny method, tiny test, commit.” If I have to wander to find where to make a change, it might be hours or a day before I make that first commit on a branch.


@b0rk I appreciate you taking the time to engage. GitHub chooses to center people over projects. In bare git, branches tell a story, where each commit is precious. The right approach to feedback is to amend the branch and rebase the patchset. In the GitHub/Gitlab model, a branch is only as good as its last pull request, and the right way to contribute to a project is to fork it and maintain your own forest of branches.


@rlb ah that makes sense, i know a lot of people who also don’t like github’s approach to code review


@b0rk sorry, what I said was very unclear. I just meant that the metaphor of a branch (a wood stick on a tree) doesn't extend to git. I tried to recreate your illustration with some colors.

You know the green circles are Git branches. I would call the blue and red lines branches. I kinda hate Linus for not bothering to give us a way to track what I would think is obviously a branch.


@b0rk I think your intuition is correct here

If I were trying to explain it, I would maybe start with a diagram of a single series of commits.

O -> O -> O

“In git, you can begin a branch at any one of those circles.

These empty branches are not visible in our diagram. We need to remember that empty branches can be anywhere a commit is.

But for the rest of this tutorial, if we have an empty branch, we’ll call it out”

“there could be empty branches here but we don’t have any.”


@crazybutable thanks I like the idea of an “empty branch”


@b0rk @santry I think some of that is a reaction to earlier revision control systems, notably svn, where trunk was special and using other branches is significantly different to trunk. So it was a point of pride/differentiation that git treats all branches roughly equal.


@mpe ah yeah that makes sense, I never used svn so i’ve never understood any of the references to it


@b0rk that's pretty much where I sit.

Workflow wise I name a branch, then all my work on that branch is eventually either merged back into main at some point, used as a separate thing, or thrown out

But I definitely treat the branch as a whole thing that is defined by a set of commits, and the tools support working with branches that way quite well


@RandomDamage what do you mean by “used as a separate thing”?


a lot of people are saying that they think of branches as being "based" on another branch (often `main`)

and git's terminology supports this idea in a lot of ways -- you have "rebase" (which moves a branch to have a different base) and "merge base"

so intuitively I think it's pretty normal to think of there as being 2 kinds of branches (even if git doesn't represent it that way internally):

- standalone (like `main`)
- offshoots of a standalone branch

(5/?)
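
The "merge base" idea can be sketched on a toy graph: it is the most recent commit that both branches share. The names below (`commits`, `branches`, `ancestors`, `merge_bases`) are hypothetical, and this is a simplified model of the concept, not git's actual algorithm.

```python
# Toy model (hypothetical names): commit id -> list of parent ids.
commits = {"a": [], "b": ["a"], "c": ["b"], "x": ["b"], "y": ["x"]}
branches = {"main": "c", "topic": "y", "new-topic": "c"}


def ancestors(commits, tip):
    """Return tip plus everything reachable from it through parent links."""
    seen = set()
    stack = [tip]
    while stack:
        commit_id = stack.pop()
        if commit_id not in seen:
            seen.add(commit_id)
            stack.extend(commits[commit_id])
    return seen


def merge_bases(commits, left, right):
    """Return the shared commits that are not ancestors of another shared commit."""
    common = ancestors(commits, left) & ancestors(commits, right)
    return {
        candidate
        for candidate in common
        if not any(
            candidate != other and candidate in ancestors(commits, other)
            for other in common
        )
    }


print(merge_bases(commits, branches["main"], branches["topic"]))      # where topic forked off
print(merge_bases(commits, branches["main"], branches["new-topic"]))  # a freshly created branch
```

For a freshly created branch the base is the branch tip itself, which fits the idea of an offshoot that has no commits of its own yet.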


@b0rk Does a repository with just a single initial commit have a branch?


@andy_f yes, when you initialize a repo git will create a branch called `main` (or whatever you have configured as the default branch name) by default

that's what I'd think of as a "standalone" branch


@alpha @b0rk this matches my understanding and is said better than I've managed

when said that way, it explains why git won't let you literally check out a tag (detaching instead) — when you make a commit, you want HEAD to move with the change you just made, and tags shouldn't move, so you'd get weird behavior where you make a commit and HEAD stays still & the changes "disappear"

Which suggests detached HEAD might be better explained as being on an unnamed/anonymous/virtual branch
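
A toy model of that behaviour, as a sketch only: HEAD is either attached to a branch name or detached at a commit, and a new commit moves whatever HEAD is attached to. All names here (`commits`, `branches`, `tags`, `head`, and the three functions) are hypothetical and chosen for illustration.

```python
commits = {"a": None, "b": "a"}  # commit id -> parent id
branches = {"main": "b"}
tags = {"v1.0": "b"}
# HEAD is either attached to a branch name or detached at a commit id.
head = {"branch": "main", "commit": None}


def head_commit():
    """Resolve HEAD to a commit id."""
    if head["branch"] is not None:
        return branches[head["branch"]]
    return head["commit"]


def commit(new_id):
    """Create a commit on top of HEAD and advance whatever HEAD is attached to."""
    commits[new_id] = head_commit()
    if head["branch"] is not None:
        branches[head["branch"]] = new_id  # the branch label moves with the commit
    else:
        head["commit"] = new_id  # detached: only HEAD moves


def checkout_tag(name):
    """Check out a tag: HEAD detaches, because a tag is not supposed to move."""
    head["branch"] = None
    head["commit"] = tags[name]


commit("c")           # attached: main moves from b to c
checkout_tag("v1.0")  # detached at b
commit("d")           # only HEAD moves; no branch or tag points at d
print(branches, tags, head)
```

After the last step, `d` behaves like the tip of an unnamed branch: it has a parent and HEAD points at it, but nothing with a name does.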


@b0rk Hmm. Do you feel like git using the term ‘branch’ for that sort of thing is a mistake? Like, should that maybe be called a ‘trunk’ or something?

I’ve been following your toots about this and it’s got me questioning quite a lot about what I thought I knew about git 🤔

In a more general sense, do you think the way git does things is ideal? Or even just good? What would a ‘better’ system look like?


@andy_f I don't really know. Personally when I think about that question I think about things like the VSCode git extension -- they've made some different UI choices than the command line git tool does (for example there's a big "sync" button instead of push and pull) and I think that's interesting.


@b0rk this matches my mental model and corresponds to how tools like GitKraken visualize a repository.


@b0rk pardon me if you’ve already answered this, but are you working on something you plan to publish with all these git diagrams? If so, I’m really interested in the end result


@genebean yes! you can sign up here if you want to get an email when it's done https://wizardzines.com/zine-announcements/


@b0rk I was going to dispute that, because _technically_ the two sides are symmetric, but on a little reflection I realized you were right—that _is_ how we think of it and talk about it.

Previously I think I would have said that the `main` and `branch` branches in your picture share commits. But I think your description is closer to the way people usually think about it. Rebasing `branch` onto `main` would be routine. Rebasing `main` onto `branch` would be super-weird.


@mjd yeah I feel like git treating them as symmetric is actually the cause of a lot of confusion and mistakes (my guess is that it's relatively common for newcomers to mix up the order of a rebase and end up very confused)


@mjd @b0rk In my brain it's not so much two classes of branches, standalone and branched-off-standalone; rather, there's a notion of branch "seniority"; when you're sitting in branch 1 and create a new branch 2 off it, branch 2 is junior to branch 1, and e.g. it's more normal to rebase junior branches onto senior ones. If I branch 2 off 1 and then 3 off 2, then the relationship between 2 and 3 is the same as that between 1 and 2.

(But I am very far from being a git expert or a version-control expert more generally.)


@b0rk "a branch is a chain of development that contains commits that aren't on any other branch" — I'm not sure I like this, because it seems to me to exclude some very common use cases. What is happening when I do `git checkout -b new-topic main`? It doesn't seem to be described by your rubric there because `new-topic` doesn't contain commits that aren't on any other branch.


@mjd yeah I agree. I think I like the "a branch is an offshoot of another branch" definition better -- it leaves room for the offshoot to be "empty" (like when it's newly created) or to have commits on it


this has been a really great discussion, appreciate everyone's answers! I understand the disconnect between how people intuitively think about branches and the "reference to a commit" implementation better now


@b0rk I agree with all of this! There were some mercurial extensions I used while working at Mozilla that had nice support for things like "show me all of the heads in my local repository that aren't in my remote". That feels like a much more natural thing to care about! You could just checkout any revision, commit something, and it's a new head that wouldn't get lost.


@tedmielczarek some friends of mine have been using git-branchless that lets you work the same way with git! it seems cool


@gjm @mjd yeah that makes a lot of sense (i think i stated it the way i did just because i never do more than 1 level of branching)


@b0rk I think you’re 100% on the nose here. It’s why terms like “applicative endofunctor” are useful — they’re precise and don’t clash with common sense.

I’d say that a branch is any run of commits that has forked at an ancestor, or the single run of commits from the oldest ancestor. 🤔


@rjbs i'm not sure, personally I'd rather figure out how to align git with common sense instead of forcing people into weird terms

(personally I find category theory terms in programming to be extremely confusing, though it's dangerous to say that because it always attracts folks who are excited to try to explain monads)


@igrok @b0rk Apparently I’ve also ceased thinking of branches as parts of the commit graph entirely and only think of them as references.


@b0rk for me, main is no different than any other branch except in how we treat it. So I think of branches like bookmarks for specific commits. I often checkout a new branch as a way of backing up current state before embarking on more work. (I often split large tasks into a series of chained branches, each of which will get squashed down to a single commit.)

And I think of rebasing as "take all the work that happened between these two bookmarks and apply it on top of this third". Mostly I think the "branch" terminology is completely wrong, but we seem to be stuck with it.


@huxley can you say why the "main is no different from any other branch" thing is important to you?

i've always found it a bit confusing because we generally treat main quite differently (even if technically `main` has no special place in git) and I don't get where the focus on "main is not special" comes from

i'm wondering if the point is "git will not protect or help you in any way, keeping `main`'s special place is totally up to you and you can easily mess it up"


@hyperpape @b0rk I’ve recently stopped keeping a local trunk branch at all, finding that it’s largely unnecessary for my purposes. (For work, at least - for personal projects, I do trunk-based development and it’s a pretty high bar to use branches at all.) It started out because we used to allow pushes directly to trunk (for reasons that recently became unnecessary so now we use branch protection on GitHub), but since I never need to commit directly to trunk, there’s not really a reason to have the local reference.

I did have to tweak my new branch command to `git checkout --no-track -b branch_name origin/trunk` though. The trailing `origin/trunk` sets the branch “root”, and `--no-track` is there because otherwise upstream gets set to origin/trunk and my prompt gets confused.


@b0rk The only thing I feel this intuitive version glosses over is that whenever I _create_ a branch it's still a separate thing in my head even before it has any unique commits of its own. So I feel like it's not _just_ the presence of divergent commits that matters.


@nettles yeah agreed, I think maybe this version (linked) is better (because it allows the possibility of an "empty" branch that doesn't have any commits of its own yet) https://social.jvns.ca/@b0rk/111448711844617217


@b0rk @nettles That, plus the hierarchy isn’t just primary/secondary, but can have an arbitrary number of nested sub-branches. For example, bugfix branches, preview (or other kinds of) build-specific branches, experimental branches, etc. which might be offshoots from a feature branch.
You know, a preview build branch on a bugfix offshoot from a feature branch which has diverged from main.


@b0rk @nettles There’s no technical difference between any of these branches including the ones that don’t diverge from others. It’s all just convention.
In practice, we never rebase A on B and then later go on to rebase B on A, at least not for long-lived branches. So there’s kind of a hierarchy in our minds that a branch is always rebased on its “superior” not vice versa, but unless we use tools that add semantics on top of git’s semantics, it’s all arbitrary & cultural.


@b0rk Branches in a repo can be used to support any desired workflow. The standard tools reflect different workflows their authors had in mind, and I've had coworkers with other ideas about what we should accomplish. So use them as you see fit!

You can start a new branch at the latest commit to rush forward with a new feature, or base it on a past commit to remember that spot or do a maintenance fix from there.

And there's nothing to really say which is primary or original, other than the significance folk attach to them. For example, Linus publishing his version of the Linux kernel is of great interest to the rest of the community since he's the lead developer.


@adb i hear this "there's no way to know what's primary or original" perspective a lot and I'm always confused by it -- personally I've never worked on any project that didn't have a centralized main branch, and I think it's an extremely common workflow

do you work on a lot of projects that work that way? why is it important to you to not have a central repo?

(i understand the linux kernel is an example and maybe git itself but I don't know of a lot of others)


@mjd @b0rk I rebase main onto branch all the time; it applies other people's work so I can fix any conflicts in the branch before I create the PR that will move my commits to the shared remote.

The thing with git is that not only is there more than one way to do it, someone is doing it that way for entirely sensible reasons.


@graydon @mjd huh I don't understand --- i'd expect running `git rebase branch` on main to result in a weird state on `main` where now you have a weird mix of old commits (which have already been pushed) on top of new commits


@graydon @b0rk

That's a great idea, I'm going to do it. Thanks!


@b0rk @mjd `git rebase branch` on main does result in a weird state. Would not recommend.

`git rebase main` **on branch** updates your branch commits to be as they would have been had you started from where main is now, and that will let you find the places where your work is in conflict with the rest of the team's work.
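
What "as they would have been had you started from where main is now" means can be sketched with a toy rebase. Everything here is hypothetical and heavily simplified: commits are dictionaries with one parent and a `change` label, history is assumed to be linear, and there are no conflicts. Real `git rebase` does much more (conflict handling, skipping changes that are already upstream, and so on).

```python
import itertools

# Toy model (hypothetical names): each commit has one parent and a label for its change.
commits = {
    "a": {"parent": None, "change": "init"},
    "b": {"parent": "a", "change": "main work 1"},
    "c": {"parent": "b", "change": "main work 2"},
    "x": {"parent": "a", "change": "topic work 1"},
    "y": {"parent": "x", "change": "topic work 2"},
}
branches = {"main": "c", "topic": "y"}
_new_ids = (f"r{n}" for n in itertools.count(1))  # ids for the replayed commits


def chain(commits, tip):
    """Return commit ids from the root up to tip (linear history only)."""
    ids = []
    while tip is not None:
        ids.append(tip)
        tip = commits[tip]["parent"]
    return ids[::-1]


def rebase(commits, branches, branch, onto):
    """Replay the commits only `branch` has on top of `onto`, then move the branch label."""
    onto_history = set(chain(commits, branches[onto]))
    to_replay = [c for c in chain(commits, branches[branch]) if c not in onto_history]
    tip = branches[onto]
    for old_id in to_replay:
        new_id = next(_new_ids)
        # Same change, new parent, new id: the old commit is left untouched.
        commits[new_id] = {"parent": tip, "change": commits[old_id]["change"]}
        tip = new_id
    branches[branch] = tip


rebase(commits, branches, "topic", "main")  # like running `git rebase main` while on topic
print([commits[c]["change"] for c in chain(commits, branches["topic"])])
```

In this model `main` does not move at all; only `topic` gets new commits. Swapping the two arguments would instead rewrite `main`'s own commits on top of `topic`, which is the "weird state" described above.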

It's also stuff that has to be dealt with eventually so might as well do it locally.


@graydon @mjd ah yeah I agree. I would call `git rebase main` "rebasing branch onto main" and not “rebasing main onto branch", it's interesting that different people use the same terms to refer to completely different things
