Yet another Git rant

By PeterW

Date 2013-01-16 14:28 Edited 2013-01-16 14:35

Something that still baffles me about Git is how hard it is to merge in a meaningful way, given that merging is supposed to be its forte. Let's say I have four branches:
1. "master", which is the dev state of the project I have branched
2. "stable", a stable branch of the project
3. "master-dev", which is my branch of master
4. "stable-dev", my branch of stable

At some point I made the decision to start doing most development in "stable-dev", as "master" has diverged to the point that my patches don't make any sense anymore, as some modules got rewritten. Therefore master-dev is actually behind a few months at this point. Additionally, the same applies in reverse: My "master-dev" and "stable-dev" branches have refactored some modules to the point where some patches coming from "master" or "stable" need major rework in order to apply.

Now I want to clean this up, so I get a "master-dev" that I can actually work with. This means merging the changes from "stable-dev" and "master". This is where it starts to get complicated.

Let's first look at the "stable-dev" branch: It got merged with "stable" at a number of points, which means it imports a number of changes that were rebased from "master", with compatibility adjustments, mostly hidden in merge commits. Additionally, there are some commits that are actually exclusive to the "stable" branch (version numbers etc.) that I don't want in "master-dev", or it would get messy.

My decision here was (is?) to manually filter and rebase "my" changes, hoping that I don't miss anything important in the merge commits. I'm entirely unsure though what state of "master-dev" I should actually rebase to, given that I'm skipping the interleaved commits coming from "master". If I was going by the ideal of having buildable intermediate results as often as possible and minor headaches about hiding magic in merges, I would have to "replay" the equivalent commits from "master" in the right order, which is positively too much work.
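As a starting point for that kind of manual filtering, here is a minimal sketch of how the candidate commits could be listed: everything reachable from "stable-dev" but not from "stable", with merge commits left out. The throwaway repository, file names and commit messages are made up for illustration; only the branch names come from the post. Adjustments hidden inside merge commits will not show up in such a list and still need a manual look.

```shell
#!/bin/sh
# Sketch: list the non-merge commits that exist only on stable-dev.
# The demo repository is hypothetical; only the branch names are from the post.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/stable
git config user.name "Demo"
git config user.email "demo@example.com"

commit() {  # commit <file> <content> <message>
    echo "$2" > "$1"
    git add "$1"
    git commit -q -m "$3"
}

commit version.txt "1.0" "stable: initial release"
git checkout -q -b stable-dev
commit feature.txt "v1" "dev: add feature"
git checkout -q stable
commit version.txt "1.1" "stable: bump version"
git checkout -q stable-dev
git merge -q --no-edit stable
commit feature.txt "v2" "dev: extend feature"

# Commits reachable from stable-dev but not from stable, oldest first, no merges
own=$(git rev-list --reverse --no-merges stable..stable-dev)
for c in $own; do
    git log -1 --format='%h %s' "$c"
done
```

The resulting list is what one would then feed, commit by commit, into `git cherry-pick` or an interactive rebase.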

Second, there's the incompatible changes from the "master" branch itself. Now it's reasonably easy to say that I want to merge with "master", as that's the history that I'm caring about here. Yet doing this as a monolithic merge means that I would be hiding possibly thousands of lines of new code in a merge.

On the other hand, just "splitting" the merge into trivial and non-trivial changes (the former in a monolithic commit, the latter as new commits referencing the original changes) is also quite a nightmare. A bit down the line I find myself having to "amend" the merge constantly as changes move into and out of my "trivial" classification. And the intermediate states are completely nonsensical, which means that I have to fly blind for a week.

Right now I'm about to scrap all my work up to this point and move to an approach that has worked in the past - namely doing the merge in "stages" up to the "non-trivial" commits, then having one "merge" commit for each of the non-trivial commits that has the actual changes. Maybe then with the rebased commits interleaved.

Argh. Why is there no system where I can say "everything from master, master-dev and stable-dev, minus anything exclusive to stable, go figure out the dependencies and call me if something happens that's actually interesting". Darcs already does it, how frickin' hard can it be. Any better ideas about how to get this working better? :/

By Newton

Date 2013-01-16 22:58

What you did sounds like you made a fork of the main git repository. You have your own stable and master and want to cherrypick changesets from the others.

> Now I want to clean this up, so I get a "master-dev" that I can actually work with. This means merging the changes from "stable-dev" and "master". This is where it starts to get complicated.

In my understanding, you should merge/cherry-pick
* stable => stable-dev
* master => master-dev
You shouldn't have to merge from stable-dev to master-dev, since stuff that has been committed to stable should also be in master (from which you merge) in the main project anyway. The exception is a changeset that is exclusive to stable, for example one that fixes a bug in a piece of code that has already been changed all over in master. But in this case, you don't want to have it in master-dev either.

By PeterW

Date 2013-01-17 12:48

Yes, everything from stable is in master, but not all my commits from stable-dev are in master-dev, given that stable-dev was the branch that I did my development on, as master-dev was not suitable due to changes in master.

Do I have to draw a picture? ;)

By PeterW

Date 2013-01-17 18:28

Word of warning about staged merges - if you later figure out you have made a mistake somewhere in the middle, it seems you have to start over. Relying on git rebase -i -p for squashing commits into merges does seem to cause some major explosions - so far it doesn't seem to actually create merges, and produces quite nonsensical new conflicts. It also eats the "rebase-todo" file on every occasion without backup, which means that I had to rebuild my whole rebase plan a few times over before arriving at this insight.

Next up: Let's see whether rerere actually manages to make my life easier for a change.

By PeterW

Date 2013-01-17 19:33 Edited 2013-01-17 19:41

... it does not. Apparently it can't learn new solutions from existing merges.

I think I'm just going to write my own shell scripts now instead of hoping for Git to surprise me. Here's how I rebase merges - I'm basically creating a new merge commit, and resolve any conflicts using the diff of an existing merge, which you have to tell it manually. That together with a lot of "git show x | patch -p1" will probably give me something I can work with. Anybody have a better idea for how to achieve this?

#!/bin/sh
# Usage: <script> MERGE REBASE REBASE_BASE
BASE=$(git rev-parse HEAD)
MERGE=$1                  # The commit we want to merge with
REBASE=$2                 # The merge we want to rebase
REBASE_BASE=$3            # Our equivalent relative to the rebase commit
# A conflicting merge exits non-zero here; that is expected
git merge --no-commit "$MERGE"
# Note: file names containing whitespace are not handled
CONFLICTS=$(git diff --name-only --diff-filter=U)
if [ -n "$CONFLICTS" ]; then
    # Reset files to state before merge
    git checkout "$BASE" -- $CONFLICTS
    # Import diff from other merge
    git diff "$REBASE_BASE" "$REBASE" -- $CONFLICTS | patch -p1
    # Add to index
    git add -- $CONFLICTS
fi

By PeterW

Date 2013-01-17 20:12

Well, good news is that using this conflict resolution script minus the "merge" part I can actually assist git rebase -i -p to get over its strange merge problems.

Bad news is that it seemingly at random made non-merge commits out of merges. I'd swear I did the exact same thing every time. Maybe squashing kills merges even if rebasing managed to recreate them or something? Aaaargh...

By PeterW

Date 2013-01-19 19:29

Well, let's see whether Stack Overflow at large has any advice on this. So far I only got "Git merges suck, use rebase instead" :(

By PeterW

Date 2013-01-21 15:08

... that's a resounding "no". So the options are really
1. never make a mistake while merging
2. rebase instead, which means throwing away history
3. leave significant amounts of history which are known to be nonsensical (aka unbuildable)

Which all seem impracticable at large scale. What a disappointment. I will probably grit my teeth and go with option 3, even if it means hundreds of "oops" commits.

By Zapper

Date 2013-01-21 17:25

Just out of curiosity: would that have worked better with Mercurial?

By PeterW

Date 2013-01-21 18:04 Edited 2013-01-21 18:50

Given that Mercurial provides even less help with history modification than Git - I highly doubt it. As I said, Git and Mercurial are both equally bad as far as I'm concerned.

Basically, neither of them really addresses the issue of patch dependencies. Instead they reason about revision dependencies, which is pretty restrictive. And when people want to break out of that (as they always will want to do within about the first five minutes) they fall back to "rebase" mechanics, which just dump a whole lot of patches on you so you can construct another completely pointless revision history out of them.

The way I have come to look at it is that I am building a "story" - not of something that actually happened, but of something that could have happened in an ideal world. A world where I never had to go back on a change, or never had to completely rewrite a feature because the code it was based on was changed in parallel. That's unquestionably a good idea, as it documents your changes better than a jumbled mess of back-and-forth. Yet all this should have references to where development actually came from, in case a bug needs to be traced along that path (which is why I don't want to do a mega-rebase, as the people on Stack Overflow suggest).

So if that's what Git workflow demands, I try to construct it. Staged merges sort-of manage to get me there, but now the restrictions on squashing are really derailing my efforts...

(Addendum: Just to clarify, I am talking about work here, specifically merging this and this, which means getting through about 150k LOC in changes with conflicts all over the place. All this is way too subtle for a project the size of OpenClonk, as we don't even have that much code or activity in the first place.)

By PeterW

Date 2013-01-22 14:31

So just for general amusement, here's the picture:

This is the third time I have recreated this merge tree - going for option 4 that I didn't mention above, which is stop using Git's machinery and do stuff by hand using git diff and patch.

But now, again, I notice that there is a mistake in the highlighted merge. Yay me.

By Zapper

Date 2013-01-22 15:06

Just out of curiosity again: what did you work on that differs so much from the original tree? :o
Or was it just something that you took over from CR and where you did not do the path adjustments etc?

By PeterW

Date 2013-01-22 15:45 Edited 2013-01-22 15:47

I'm building a form of debugging support into the compiler, which means adding source mappings everywhere. As a result, there are very few important places in the compiler that I don't touch. Some (specifically the backends) I have to refactor heavily to even be able to support the sort of information I pass through.

I am trying to be as smart about it as humanly possible, but the merge nightmares are just unavoidable.

By Sven2

Date 2013-01-22 15:59

Huh, CR? I thought his ramblings were about his work on HGC?

By Zapper

Date 2013-01-22 16:13

HGC? GHC?

By Sven2

Date 2013-01-22 16:54

GHC! GHC!

By Zapper

Date 2013-01-22 17:22

Ah, okay, I thought it was Clonk-related

By Sven2

Date 2013-01-22 17:26

I don't know. Maybe it is. You talked about CR and Peter didn't deny it.

By PeterW

Date 2013-01-22 18:19

I was slightly confused as well. Finally settled for interpreting it as "something like the path adjustments we did when coming from CR".

By PeterW

Date 2013-01-22 18:25 Edited 2013-01-22 18:29

Hm, okay, now I'm going with option 5: Patching Git to do something sensible. The code is horribly naive, apparently - did they really think a simple git merge with the same parents was an appropriate way to rebase a merge...? They're not even looking at the contents of the merge in question...

I'm now doing "git merge --no-commit -s ours ...; git cherry-pick -m x -n ...; git commit ...", which seems a lot saner to me.
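Spelled out on a toy repository, that three-step sequence could look roughly like the sketch below. The repository layout, file names and commit messages are hypothetical, and `-m 1` assumes the side being rewritten is the first parent of the old merge; this is a reading of the one-liner above, not the actual Git patch.

```shell
#!/bin/sh
# Sketch: recreate an existing merge commit on top of a rewritten first parent.
# Repository layout, file names and messages are hypothetical.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo"
git config user.email "demo@example.com"

commit() {  # commit <file> <content> <message>
    echo "$2" > "$1"
    git add "$1"
    git commit -q -m "$3"
}

commit a.txt "a" "X: base"
git branch base
git checkout -q -b topic
commit b.txt "b2" "C: topic work"
git checkout -q master
commit a.txt "a1" "Y: our work"
# The original merge, with an extra fix that only lives in the merge commit
git merge -q --no-ff --no-commit topic
echo "fixup" > c.txt
git add c.txt
git commit -q -m "M: merge topic"
old_merge=$(git rev-parse HEAD)

# A rewritten version of our side that the merge should be moved onto
git checkout -q -b rebuilt base
commit a.txt "a1-fixed" "Y': our work, rewritten"

# 1. Record topic as second parent without touching the tree
git merge -q --no-commit -s ours topic
# 2. Replay the old merge's changes relative to its first parent
git cherry-pick -n -m 1 "$old_merge"
# 3. Conclude the pending merge with the replayed content
git commit -q -m "M': merge topic (recreated)"
git log --graph --oneline
```

The point of step 2 is that the content comes from the old merge itself (including anything fixed up by hand inside it), rather than from re-running the merge and hoping for the same result.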

By Günther [de]

Date 2013-02-28 18:22

>2. rebase instead, which means throwing away history

The msysgit developers have decided that the answer to not wanting to throw history away is "duplicate it". They rebase, and then keep the old history by manufacturing a merge commit that points to the old history, but takes the state of the repository entirely from the rebased branch.

https://github.com/msysgit/msysgit/blob/master/share/msysGit/merging-rebase.sh
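The core idea can be sketched with plain Git commands. This is not the linked script, just a minimal illustration on a made-up repository of "rebase, then tie the old history back in with a merge that changes nothing"; the branch name `topic-old` is an assumption for the demo.

```shell
#!/bin/sh
# Sketch of "rebase, then merge the old history back in with -s ours".
# Not the msysgit script itself; repository contents are hypothetical.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo"
git config user.email "demo@example.com"

commit() {  # commit <file> <content> <message>
    echo "$2" > "$1"
    git add "$1"
    git commit -q -m "$3"
}

commit base.txt "base" "base"
git checkout -q -b topic
commit topic.txt "topic work" "topic: my change"
git checkout -q master
commit upstream.txt "upstream work" "master: upstream change"

# Remember the pre-rebase history, then rebase as usual
git checkout -q topic
git branch topic-old
git rebase -q master

# Tie the old history in; -s ours keeps the rebased tree exactly as it is
git merge -q -s ours -m "Merge pre-rebase history (tree unchanged)" topic-old
git log --graph --oneline
```

After this, the old commits stay reachable from the branch tip, while the first-parent line shows only the rebased ones.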

By PeterW

Date 2013-02-28 20:01

Hm, which is roughly what I'm doing as well - with the difference that they split up everything into rebased commits and the merges (which become essentially NOPs providing Git some context). Not sure I like it better though: duplicating thousands of commits where 1% of them actually have non-trivial changes also seems like actively hiding the needle. Collecting trivial commits together would make this more manageable.

And it still doesn't provide any insight into the "and what if I screw up on the way there?" problem :(

By Günther [de]

Date 2013-03-01 03:21

> And it still doesn't provide any insight into the "and what if I screw up on the way there?" problem :(

Oh, that's easy: You rebase again. And now you have three sets of commits! * To make it look as if you did not screw up, simply wait until the master branch has a few more commits. Clearly, you rebased in order to bring your branch up to date with regards to the master branch! Nothing to see here, move along!

* Alternatively, only keep the original set of commits and throw the intermediate stage away. Since there's only one merge commit at the end, that should be possible.

I think the official answer is to have smaller topic branches that contain no commits that are not intended to land in the master branch, and rebase those regularly, or preferably merge them into master. I guess the theory is that the work to adapt the patches is bigger than the work to arrange them into a nice history, and once you avoid the bigger part the smaller goes away on its own.

By PeterW

Date 2013-03-01 12:19 Edited 2013-03-01 12:25

Yes, I am rebasing to bring my branch up to date - which consists of careful step-by-step creation of intermediate states, all of which I have to test and check for consistency. If there's a problem, I have to bisect/annotate back until I have found where the problem came from. The history here is completely synthetic, an abstraction tool on top of the patches I am processing, if you will. Squashing away all unnecessary detail is the very point of this whole operation - "just don't do it" doesn't really help me.

I might add that even though I put a week of work into this, I am still months away from catching up with HEAD. I suppose you could say the mistake was doing changes spread this badly over the source code - but I sort of have no choice at this point.

By PeterW

Date 2013-01-22 19:15

Fun fact...

    A
   / \
  B   C
   \ /
    D
    |
    E

If you try to rebase this interactively as follows:

pick A
pick E
pick B
pick C
pick D

Probably with the intention of getting

    A
   / \
  E   |
  |   C
  B   |
   \ /
    D

You instead get

    A - E
   / \
  B   C
   \ /
    D

With E dangling there, not reached by anything and therefore effectively removed. Reason is that the merge-enabled rebase process insists on re-using the same parents, being completely inconsistent with using rebase for reordering commits.

By PeterW

Date 2013-01-23 18:59 Edited 2013-01-23 19:02

Actually a pretty interesting topic, once you think a bit about it... The reason it's using existing parents is that it's the only way to reliably recreate branches that start within the rebase area. Say:

    X
    |
    A
   / \
  B   C
   \ /
    D

If we now want to edit A, like squash in a later commit, doing a "linear" rebase like I was suggesting would leave you with a situation like follows (here for the B branch):

    X
   / \
  A'  A
  |   |
  B'  C
   \ /
    D'

pulling the unmodified A commit in again as the parent of C. Hence the current rebase process will go down both paths - requiring it to be "magical" about the commit's parents and removing considerable flexibility in reordering patches.

Currently I'm thinking that something like follows would be a sensible way to put generalized rebases:

edit A
pick B
branch A
pick C
pick D

With the branch thingy being automatically generated in the initial version of the git-todo and marking points in the rebase process where Git should do a git checkout of the new version of A. This makes the branching explicit, but the merging implicit - which still leaves some space for surprises, but would be closer to the spirit of Git's rebase for my taste.
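For comparison: Git releases from 2.18 on ship `git rebase --rebase-merges`, whose todo list spells out branch points and merges with `label`, `reset` and `merge` lines, which is fairly close in spirit to the `branch A` idea. A minimal sketch, assuming such a Git version; the throwaway repository mirrors the X/A/B/C/D diagram, and the `base`, `side` and `newbase` branch names are made up:

```shell
#!/bin/sh
# Sketch: move a small diamond-shaped history onto a new base, keeping the merge.
# Assumes Git 2.18 or newer; repository contents are hypothetical.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo"
git config user.email "demo@example.com"

commit() {  # commit <file> <content> <message>
    echo "$2" > "$1"
    git add "$1"
    git commit -q -m "$3"
}

commit x.txt "x" "X"
git branch base
commit a.txt "a" "A"
git checkout -q -b side
commit c.txt "c" "C"
git checkout -q master
commit b.txt "b" "B"
git merge -q --no-ff -m "D" side

# A new base commit that A..D should be replayed onto
git checkout -q -b newbase base
commit x.txt "x2" "X2"

git checkout -q master
# Replays A, B and C and recreates merge D instead of flattening it
git rebase -q --rebase-merges --onto newbase base
git log --graph --oneline
```

Adding `-i` opens the generated todo list for editing, which is where reordering across the branch point would be expressed.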

By PeterW

Date 2013-01-23 20:01 Edited 2013-01-23 20:05

Hm, but what if we have

    X
   / \
  A   C
  |   |
  B   D
   \ /
    E
    |
    F

and now do

pick A
pick B
pick F
branch X
pick C
pick E

Therefore moving F towards A and removing D entirely. Apart from the discussion whether that sort of operation would be a terrible idea or not, the (I feel) best result should be

    X
   / \
  A   |
  |   |
  B   C
  |   |
  F   |
   \ /
    E

Meaning that the "magic" will have to recreate merge E detecting both that B has gotten a new child that it didn't have before and is a more likely target for the merge, and that D was removed and should be replaced by its parent. Hm.

(I hope nobody minds me randomly musing here. It's off-topic, I'm not breaking much, right? ;) )

By Zapper

Date 2013-01-23 21:22

>(I hope nobody minds me randomly musing here. It's off-topic, I'm not breaking much, right? ;) )

No, it's okay. I am enjoying the fishes!

By PeterW

Date 2013-04-03 18:53 Edited 2013-04-03 18:55

And the merge from hell continues... The reason that I so often have to correct merges is as follows - suppose I am in the process of step-wise merging in the changes from the ABC branch:

  A --- B --- C
         \
  - X --- Y

Now while reviewing & testing Y, I find that a bug is blocking me from proper testing - which is fixed in C. So I merge C:

  A --- B --- C
         \     \
  - X --- Y --- Z

And now I can figure out all the problems that Y had, and produce a fix F, which I really want to squash with Y, as it has nothing to do with Z:

  A --- B --- C
         \     \
  - X --- Y --- Z --- F

But this is precisely something that unmodified Git can't do.

By AlteredARMOR [ua]

Date 2013-04-04 07:54

Well, yeah... If you figured out that fixes you are going to apply have nothing to do with Z, you should have applied them to Y (by resetting to it and creating separate branch) and not Z.
But now that F already incorporates changes from Z (C), that is hard to achieve (unless you go through all of your modified files and reset some of them to the "pre-Z" state).

By PeterW

Date 2013-04-04 09:00

So you're essentially suggesting the following?

  A --- B --- C
         \     \
  - X --- Y --- Z
           \     \
            F --- Z'

I mean, I do want a branch with all changes together, so the merge Z' is what I would do next. I still don't like this extra complexity. The thing is - in most cases Z and F are completely independent as far as the code goes, therefore they should be interchangeable. I just have to test with Z because it otherwise breaks the build at some point.

By AlteredARMOR [ua]

Date 2013-04-04 11:01

Yes, that is exactly what I mean.

> Z and F are completely independent as far as the code goes therefore they should be interchangeable

That is why they should be implemented in separate branches. If you want interchangeability, you should keep them in separate branches and continue developing your code. When the "time comes", you can merge either Z or F (depending on which one is more appropriate), or perform a rebase in case you have already committed something on top of Z or F.

> I still don't like this extra complexity.

Well, this is the (in)famous "Git way": branches are cheap. Git encourages you to create a new branch for every change that you suspect may be discarded in the end.

By PeterW

Date 2013-04-04 12:53 Edited 2013-04-04 12:55

> If you want interchangeability you should keep them in separate branches an continue developing your code.

But I can't develop my code when I can't test. And testing requires Z.

> Branches are cheap.

I'm not concerned with breaking Git, just my own brain. This approach just doesn't scale for complicated merges where I have to juggle a good dozen of Fs and Zs.

By Günther [de]

Date 2013-04-05 12:50

> Now while reviewing & testing Y, I find that a bug is blocking me from proper testing - which is fixed in C. So I merge C:

I think that's the mistake. Instead, discard the broken merge and merge X and C directly. Git rerere (I don't know whether that's enabled by default yet) should automatically reuse the merge resolutions you committed in Y, so you're not discarding your work. Generally, only merging "known stable" commits like releases is recommended, though I don't know how feasible that is in your case.

By PeterW

Date 2013-04-05 13:22

Yes, rerere solves this problem partially. But it is apparently not enabled by default and can't learn from existing merges, which is why I got stuck the first time around.

Plus, stable commits are not only exceptionally tricky to identify - for documentation reasons I would also like to merge in the problematic commits in as much isolation as possible. If I include all fixes in the merge, there's a decent chance that a different non-trivial merge issue will pop up.

By Günther [de]

Date 2013-04-05 19:09

One could probably make rerere learn from existing merges by redoing the merge (git checkout mergecommit^; git merge --no-commit <whatever the syntax for the second parent of a commit is>; git checkout mergecommit -- .; git commit -m "teach rerere").
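A minimal sketch of that idea on a throwaway repository, with two adjustments that are assumptions on my part: the second parent is written as `<merge>^2`, and `git rerere` is invoked explicitly instead of creating a "teach rerere" commit. Git's source tree also carries a `contrib/rerere-train.sh` script built around the same loop. File names and contents below are made up.

```shell
#!/bin/sh
# Sketch: teach rerere the resolution stored in an existing merge commit.
# Repository contents are hypothetical.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo"
git config user.email "demo@example.com"

commit() {  # commit <file> <content> <message>
    echo "$2" > "$1"
    git add "$1"
    git commit -q -m "$3"
}

commit file.txt "original" "base"
git checkout -q -b topic
commit file.txt "topic version" "topic change"
git checkout -q master
commit file.txt "master version" "master change"

# An existing merge whose conflict was resolved by hand
git merge -q topic >/dev/null 2>&1 || true
echo "resolved by hand" > file.txt
git add file.txt
git commit -q -m "merge topic"
merge=$(git rev-parse HEAD)

# Teach rerere: redo the merge, then take the result from the old merge
git config rerere.enabled true
git checkout -q --detach "$merge^1"
git merge "$merge^2" >/dev/null 2>&1 || true   # conflicts; rerere notes them
git checkout "$merge" -- .                      # old resolution into the worktree
git rerere                                      # record it as the resolution
git reset -q --hard

# Redoing the same merge now reuses the recorded resolution
git merge "$merge^2" >/dev/null 2>&1 || true
cat file.txt
```

The last merge still stops and leaves file.txt unmerged in the index (unless `rerere.autoupdate` is set), but the working copy already holds the old resolution and only needs a `git add`.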

And sure, if the fix is only available together with lots of other changes, discarding the merge commit with the broken commit might not be the best idea - but it's probably worth a try. Minimizing broken commits in the history is just as useful as other aspects of a helpful history.

By Luchs

Date 2013-04-05 20:23

><whatever the syntax for the second parent of a commit is>

By PeterW

Date 2013-04-06 12:43

Yes, but note that that's not that far from doing the rebase manually in the first place - there you will also re-run all the merges manually just like you described, just using "cherry-pick -m x" instead of "checkout". In either case it's a mess when not automated, and I went for integrating the automation into rebase instead.

By PeterW

Date 2013-04-04 16:20 Edited 2013-04-04 16:22

Okay, now with GraphViz wizardry... Here's a pretty graph of the > 3000 nodes Git tree I'm dealing with.

The colors stand for commit authors, with me being light blue. At least Firefox is able to search the graph, so a few pointers...
- 7bf44c3 is the merge head I'm trying to push forward along the main branch that goes to the right - which would actually go a lot further to the right at this point.
- Some might note 251be09, which is actually a side-branch of my changes that would make the merges even harder, which is why I'm holding out on merging them.
- Around 4783ce7 you can see the situation I was referring to when I started ranting - these are mostly easy "1 line" commits that fix things that came up in the history following f1da701. All merges on the way are not easy by any stretch of imagination (each has a few hundred LOC code complexity because of conflicts). Squashing those together into the second development line starting with 312ae09 took me multiple days. Now I have finally a Git that can do this with --recreate-merges, so this is less scary now.
- My current situation is that I am wrestling with d92bd17, which is a massive 4000 LOC change - which only over the next dozen commits actually becomes stable. I should probably have selected a better merge point, but it's damn hard to know something like that in advance.
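For anyone who wants to draw something similar, a minimal sketch of dumping a commit graph as GraphViz DOT. This is not the script used for the graph above: the colouring by author is left out, and the demo repository is made up.

```shell
#!/bin/sh
# Sketch: dump the commit graph as GraphViz DOT (one edge per parent link).
# Demo repository is hypothetical; author-based colouring is left out.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo"
git config user.email "demo@example.com"

commit() {  # commit <file> <content> <message>
    echo "$2" > "$1"
    git add "$1"
    git commit -q -m "$3"
}

commit a.txt "a" "first"
git checkout -q -b side
commit b.txt "b" "side work"
git checkout -q master
commit c.txt "c" "main work"
git merge -q --no-edit side

{
    echo 'digraph git {'
    # %h = abbreviated commit hash, %p = abbreviated parent hashes
    git log --all --format='%h %p' | while read -r commit parents; do
        for parent in $parents; do
            echo "    \"$commit\" -> \"$parent\";"
        done
    done
    echo '}'
} > graph.dot
cat graph.dot
```

With GraphViz installed, something like `dot -Tsvg graph.dot -o graph.svg` turns the file into a picture a browser can search.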

By PeterW

Date 2013-05-20 16:25

Okay, I have finally finished the merge of doom (only took me half a year!). Also cleaned up my Git patch a bit, if somebody's interested.
