hackernews client

Git-absorb: Git commit –fixup, but automatic

451 pointsposted 9 months ago

277 Comments

burntsushi

9 months ago

The negativity in the comments here is unwarranted in my opinion. I've been using `git absorb` for years and it works amazingly well. I use it in addition to manual fixups. My most common uses of git-absorb, but definitely not the only, are when I submit a PR with multiple commits and it fails CI for whatever reason. If fixing CI requires changes across multiple commits (say, lint violations), then git-absorb will almost always find the right commit for each change automatically. It saves the tedium of finding the right commit for each change. False positives are virtually non-existent in my experience. False negatives do happen, but then you just fall back to the manual approach.

It seems like some would reply and say PRs should just be one commit. Or that they will be squashed anyway. And sure, that is sometimes the case. But not always. I tend to prefer logically small commits when possible, and it's not always practical to break them up across multiple PRs. Perhaps partially due to how GitHub works.

I use this workflow on all of my projects on GitHub.

menaerus

9 months ago

What is wrong with simply pushing a "Fix linting issues" in a new commit? It's self-contained and very well describes the (single) purpose of the commit.

I share the sentiment about the "logical small commits", and hence I don't see that adding a new fix commit is problematic as long as it is self-contained and purposeful, but perhaps I don't understand what is the problem that this tool is trying to solve.

It says

> You have fixes for the bugs, but you don't want to shove them all into an opaque commit that says fixes

So my understanding is that bugs were found in the PR and now you want to fix those bugs by not introducing separate "Fix A", "Fix B" and "Fix C" commits but you want to rewrite the history of existing N commits into N' commits so that it blends those A, B and C fixes in.

Maybe I can see this being somewhat useful only ever if those N commits are pushed either directly as series as patches to the main branch or as a single merge commit, and you want to keep the S/N ratio reasonable.

But otherwise I think it's a little bit problematic since it makes it harder for the reviewer to inspect and understand what changes have been done. But it also makes it harder for a developer since not all fixes are context-free, e.g. a fix for an issue found in the PR cannot always be attributed to the self-contained and single commit in your branch but it's the composition of multiple commits that actually makes this bug appear.

cryptonector

9 months ago

The issue isn't `Fix linting issues`, the issue is a `Fix linting issues` commit for issues you introduced in the code you'll be pushing with that `Fix linting issues` commit. Nobody wants to see you pre-push dev history -- it's not at all interesting. So rebase and squash/fixup such commits, split/merge commits -- do what you have to to make the history you do push to the upstream useful.

Now, if you find linting issues in the upstream and fix those, then a `Fix linting issues` commit is perfectly fine.

homebrewer

9 months ago

It makes git bisect more difficult than it needs to be.

menaerus

9 months ago

merge-commits or patch-series are already making it vastly more difficult for git-bisect than linear history with single self-contained commits. I used both and git-bisect on merge-commits is a nightmare.

WorldMaker

9 months ago

An option is `git bisect --first-parent` to start from your integration points. (Then drill down into that branch if you need to.)

codetrotter

9 months ago

Yeah. But for that we can squash everything into a single commit on merge. Instead of spending a bunch on time making every single commit in an MR perfect both in isolation and all-together. And if squashing everything in the MR causes a problem with the git history being too coarse, then that is almost certainly because the MR itself should have been split up into multiple MRs. Not because of how you organized the individual commits that belonged to one MR.

burntsushi

9 months ago

The goal isn't perfection. Splitting PRs has overhead, especially depending on what code hosting platform you're using.

Do you advocate for making code easier to read and understand by humans? What would you do if someone told you, "no I don't want to waste my time trying to make it perfect." It's the same misunderstanding. I try to treat code history like I treat code: I do my best to make it comprehensible to humans. And yes, sometimes multiple PRs is better. But sometimes multiple commits in one PR are better. I use both strategies.

codetrotter

9 months ago

I make multiple commits in the PR and squash it all on merge.

If the resulting squash is too big / touches too many things, it’s because I didn’t split the MR where I probably should have.

Because to me, the bigger waste of time is fiddling with all of the commits in the MR just to make it so that the MR can be merged without squashing.

If someone needs fine-grained history about how exactly a specific feature or bug fix came to be, they can always go to the MR itself and look at what happened there.

burntsushi

9 months ago

> I make multiple commits in the PR and squash it all on merge.

Sometimes I do that. But sometimes I want the history I've curated in my PR to be preserved without the overhead of creating separate PRs (which then often need to be stacked, introducing problems of its own). In which case, I avoid things like "fixup lint" commits. And that's where git-absorb shines. And saving time is exactly the point! git-absorb helps you curate history faster.

> If someone needs fine-grained history about how exactly a specific feature or bug fix came to be, they can always go to the MR itself and look at what happened there.

That's a lot more work than just reading the commit log. But I agree, one can do this ifneedbe. But it's nice to avoid when possible.

codetrotter

9 months ago

> That's a lot more work than just reading the commit log.

Yeah, that’s true. I suppose that we may be working in somewhat different environments.

For example, when someone has big public projects that get a lot of eyeballs on them, like your ripgrep and other projects, it makes a lot of sense to spend extra time making the git log a thing that can be read on its own completely offline.

For the things I work on at my job, there are just a few of us writing the code itself and those coworkers that work on the repos I do will usually check every MR from everyone else. And anyone else in the company working on other repos is usually likely to browse code via the GitLab anyway, if they are interested in actually looking at any of the code that are in the repos of my team. Onboarding new coworkers on the team is also mostly centered around the docs we have, and talking with other people in the team and the company, rather than digging through the git history.

And for me when I am looking to understand some code from our internal repos it’s usually that way too, that I use the GitLab UI a lot, and I look at the MRs that were merged, and I check the associated Jira ticket, and I might ask some coworkers on Slack to explain something.

Most of the time, no one outside of our team is very interested in the code of our repos at all. Discussions revolve around other aspects like the APIs we expose, and the schema of the data we store in tables in the databases, and the Kafka events we produce. That sort of thing.

And everyone at my job will necessarily need to be online when they are working anyways, so that they can use Slack etc. So we are one click away from our GitLab instance when we work. On the off-chance that someone is trying to do some work offline while being on the go and without Internet access available, they probably have some specific kind of work in mind that is sufficiently independent of specific details that they can do without that history.

burntsushi

9 months ago

Yeah folks work differently. I'm just mostly responding to this idea that (roughly paraphrasing) "hey we don't need this tool, you should just be squash merging instead."

FWIW, at my previous role, where I worked on an internal proprietary codebase, we all followed this same philosophy of curating commits. It was primarily to facilitate code review though, and not source history. Stacking PRs on GitHub is really truly annoying in my experience. (Although there is some tooling out there that purports to improve the experience.)

Gabrys1

9 months ago

Exactly this

baq

9 months ago

> What is wrong with simply pushing a "Fix linting issues" in a new commit?

if you want every individual commit be buildable then it is a no-go. it's also a no-go if you don't squash your prs.

recursive

9 months ago

What do the semantic commit purists do when a rebase causes some arbitrary commit to go from red to green? I've always wondered about that. Commit A becomes A' in a rebase. A is good, but A' is not. A' might be 20 commits ago in a feature branch.

burntsushi

9 months ago

It's hard to follow your example. You say "go from red to green" which I read as "go from failing to passing," but then you go on to say "A becomes A'" where "A is good, but A' is not." Either way, the answer is that if that commit is in your patch series that hasn't been merged yet, that it might make sense to just rewrite A'. But it might not. It could be a ton of work. But also, 20 commits in one branch is rather large, and maybe that's probably wrong too.

I suppose a purist might say you should go clean up the history so that you have no failing commits, but I don't know anyone who is a true purist. You don't have to be.

Instead of living in a word of black-or-white, join me in embracing the grey. I don't treat commit history as something that must be perfect. I just try to treat history like I treat the source code itself: I write it for other humans. Is the history always perfectly comprehensible? Nope. Just like the source code.

recursive

9 months ago

Yeah, that's what I meant.

I guess at the end it's all weighted tradeoffs for me too. I just put less weight on legibility and more on the ability to work on branches co-operatively without force-pushing.

burntsushi

9 months ago

If you're working on a branch cooperatively, then yes, don't rewrite history. But maybe you do rewrite history before merging to main to clean it up, depending.

And this is also why workflow questions are hard. Because "working on branches co-operatively" was previously unstated. It's not universal. I rarely work on branches co-operatively in a long term sort of way. Sometimes I might "take over" a branch to get it merged or similar, but rewriting history in that context is still fine.

It is always the case, even among so-called purists, that you don't rewrite history that you're sharing with others. (Unless you've both agreed to it. But even then, it better be short-lived because it's annoying.)

menaerus

9 months ago

Red main branch is not what I am talking about at all. I am referring to the code being developed in a feature/bug-fix branch that is yet to be merged with main branch. OP believes that even in the development branch you should not fix your WIP code by adding "Fix linting issues". I thought that was implied by the nature of discussion since rewriting history of the code that resides already on the main branch would be beyond my understanding.

nothrabannosir

9 months ago

“Fix lint” commits also taint git blame.

You could perhaps add some kind of filter to git blame to automatically skip commits whose message is “fix lint” but at some point the cure is worse than the disease.

I also see people argue that merge commits make git bisect harder than squashing in the first place but there is a third option: rebase and fast forward. If every commit in your branch is clean and stand-alone that’s viable. Linter fix commits break that paradigm.

paulddraper

9 months ago

> What is wrong with simply pushing a "Fix linting issues" in a new commit?

Everything.

1. git blame is obfuscated. "Fix lint" is not helpful or relevant. Tell me what actually changed.

2. git log is noisy. More stuff is harder to read than less stuff, and you're making me read more stuff.

3. git bisect is difficult. Interspersed broken commits requires more time to sift through. (Lint is a poor example, say it's "fix server config")

4. git cherry-pick is tedious. When you copy the dev change to a release branch (let's say you maintain 1.x, 2.x, etc), you must include errata.

greg0ire

9 months ago

5. git revert is tedious When one commit turns out to be bad, you can't revert it cleanly because your "Fix linting issues" builds upon it.

burntsushi

9 months ago

> What is wrong with simply pushing a "Fix linting issues" in a new commit? It's self-contained and very well describes the (single) purpose of the commit.

Because I care a lot more about the logical history of the source code versus the actual history of the source code. Commits like "fix linting issues" are more like artifacts of the development process. The commit exists likely because you forgot to run the linter earlier, and only found the issue after creating the commit. Logically speaking, those fixes belong with the relevant change.

Now, if you're just going to squash the PR you're working on, then sure. The extra commit totally doesn't matter. Make as many of those commits as you want.

But if your PR is 5 commits that you want to "rebase & merge" as-is (which is something I do pretty frequently), and you care about the history being easy to navigate, then it can totally make sense to do fixups before merging instead of creating new "fix linting issues" commits. If the follow the latter, then your commit history winds up being littered with those sorts of things. And fewer commits will likely pass tests (because you created new commits to fix tests).

I want to be VERY CLEAR, that I do not agree with your use of the word "wrong." I do not want to call your workflow wrong. Communicating the nuances and subtleties of workflows over text here is very difficult. For example, one take-away from my comment here is that you might think I always work this way. But I don't! I use "squash & merge" plenty myself, in which case, git-absorb is less useful. And sometimes breaking a change down into small atomic commits is actually a ton of work, and in that case, I might judge it to not be worth it. And still yet other times, I might stack PRs (although that's rare because I find it very annoying). It always depends.

> But otherwise I think it's a little bit problematic since it makes it harder for the reviewer to inspect and understand what changes have been done. But it also makes it harder for a developer since not all fixes are context-free, e.g. a fix for an issue found in the PR cannot always be attributed to the self-contained and single commit in your branch but it's the composition of multiple commits that actually makes this bug appear.

I'm having a hard time understanding your concern about the reviewer. But one aspect of breaking things down into commits is absolutely to make it easier for the reviewer. Most of my PRs should be read as a sequence of changes, and not as one single diff. This is extremely useful when, in order to make a change, you first need to do some refactoring. When possible (but not always, because sometimes it's hard), I like to split the refactor into one commit and then the actual interesting change into another commit. I believe this makes it easier for reviewers, because now you don't need to mentally separate "okay this change is just refactoring, so no behavior changes" and "okay this part is where the behavioral change is."

In the case where you can't easily attribute a fix to one commit, then absolutely, create a new commit! There aren't any hard rules here. It doesn't have to be perfect. I just personally don't want to live in a world where there are tons of "fix lint" commits scattered through the project's history. But of course, this is in tension with workflow. Because there are multiple ways to avoid "fix lint" commits. One of those is "squash & merge." But if you don't want to use "squash & merge" in a particular case, then the only way to avoid "fix lint" commits is to do fixups. And git-absorb will help you find the commits to create fixups for automatically. That's it.

beart

9 months ago

In some teams, you are not allowed to submit any commit that breaks the build, and a lint failure would be considered a broken build.

codetrotter

9 months ago

Do these teams run the pipeline on all the commits when you push multiple commits at the same time to your branch?

Say I have an open MR from a branch that I’ve pushed three commits to so far. I pushed these three commits individually and the pipeline ran green each time.

A coworker of mine points out some errors to me.

I have to touch files that I previously touched in the past three commits again.

I am tempted to commit these changes in one commit. But I decide to try git absorb instead.

So instead of adding one fourth, green commit to my MR, my use of git absorb rewrites all three of my previous commits.

But actually, the changes I was about to put in the fourth commit only work when taken together.

Splitting them up and rewriting the previous three commits will result in build failure if you try to build from the new first commit or the new second commit.

I don’t notice that because I’m on the new third commit.

I force push the three new commits to my branch for my MR. Gitlab runs the pipeline on the last of the commits. Everything looks fine to me.

Your team lead approves the MR and pushes the merge button.

Three months later he scolds me for a broken bisect.

barbazoo

9 months ago

You're right, most places I've worked this applies only to a subset of branches, usually main/master and sometimes other branches considered "stable" such as for the staging env.

menaerus

9 months ago

We are talking about the commit, or series of commits thereof, during the development of code in feature/bug-fix branch and not about the commit that you push as a post-fix because one of your previous commits broke something. That was not the discussion as far as my understanding goes.

masklinn

9 months ago

I’ve been using autofixup for this and it’s been ok but not great, it can be quite slow as things grown, and it doesn’t say anything when there was no match so it’s easy to miss. How does absorb surface that?

> Perhaps partially due to how GitHub works.

That’s definitely a major factor, I’d like to use stacked PRs they sound really neat, but GitHub.

Also even with stacked PRs I figure sometimes you’re at the top of the stack and you change things which affect multiple commits / prs anyway, surely in that case you need to fixup other commits than the ToS?

burntsushi

9 months ago

> I’ve been using autofixup for this and it’s been ok but not great, it can be quite slow as things grown, and it doesn’t say anything when there was no match so it’s easy to miss. How does absorb surface that?

I haven't used autofixup, but:

* git-absorb has always been pretty snappy. I don't think it scales with repository size.

* If there's no match, then the things that don't match stay in the staging area and don't make it into a commit. git-absorb will also note this in its output after running it.

Degorath

9 months ago

GitHub's quirks definitely make life much harder than it needs to be, but I've been using `git machete` for months now with great success in my team. The __one__ thing GitHub has that makes it all work is the fact that if you merge the parentmost branch, its immediate child will retarget its base branch.

I think if I had full "control" over my company's SCM workflows I would use a tool that considers a branch as a workspace and every commit in the branch becomes its own PR (personal preference, but in my experience it also motivates people to split changes more), but alas.

keybored

9 months ago

The term Stacked PRs already sounds like a term that was invented specifically in order to communicate in a GitHub-influenced context. Because Stacked PRs are just a reinvention of being able to review a commit at a time (the stack part is straightforward).

travisb

9 months ago

Stacked PRs are like being able to review a commit at a time, but add an additional layer of sequencing. It's most simply thought of as a patch series, where the evolution of each 'patch' is retained.

That additional layer allows finer grained history and to mostly avoid (unreviewed) rebasing. Many teams find those properties valuable.

keybored

9 months ago

In this day and age I don’t understand why we just can’t call tings commits instead of patches or changesets.

I guess another pet peeve for me.

https://news.ycombinator.com/item?id=41659650

masklinn

9 months ago

Stacked PRs is a way to surface the lifetime of the proposition as they get fixed or updated following reviews.

It has nothing to do with github.

keybored

9 months ago

It has nothing to do with GitHub in the sense that GitHub does not support it (I guess, I’m not up to touch). It does have something to do with GitHub in the sense that the name (PRs) and the benefits are framed from the standpoint of This is What GitHub Lacks.

Which is a GitHub-centric perspective.

https://news.ycombinator.com/item?id=41514663

Zitrax

9 months ago

I assume you refer to https://github.com/torbiak/git-autofixup. I have also used it, and its ok but not perfect.

krobelus

9 months ago

I use git autofixup; it was much better than git absorb last time I checked

> it doesn’t say anything when there was no match

that's what it should do

> it can be quite slow as things grown

How? All the slowness (on large repos) I've seen has been fixed.

masklinn

9 months ago

> that's what it should do

No it is not.

> How?

I don’t know, that’s just an observation from using it, semi regularly I autofixup changes and it takes a while to do anything.

krobelus

9 months ago

you're probably using an old version

dawg

9 months ago

Same workflow here, and it's become a breeze with autofix, rebase --update-refs, and a small command to push the whole stack. I'm using magit, so I directly see what could not be matched and remains staged.

JeremyNT

9 months ago

I am (what I assumed to be) an extensive user of fixup (usually invoked via interactive rebase). I'm intrigued by this but curious as to how it can really save so much time.

Are people fixup'ing a lot more than I do? I might do it once or twice per MR and it's never a large burden to fix the right commit.

If things get really out of hand such that the whole thing is a mess I just squash. Whatever history I had before is almost by definition gross and wrong in this scenario, and I don't mind losing it.

burntsushi

9 months ago

It's same kind of thing people tell me about ripgrep. "Why bother with ripgrep, grep has always been fast enough for me." That might well be true. Or maybe we value time differently. Or maybe none of your use cases involve searching more than 100K lines of code, in which case, the speed difference between GNU grep and ripgrep is likely imperceptible. Or maybe being faster unlocks different workflows. (The last one is my favorite. I think it's super common but little is written about it. Once something becomes fast enough, it often changes the way you interact with it in fundamental and powerful ways.)

Because I have git-absorb, I tend to be a bit more fearless when it comes to fixups. It works well enough that I can be pretty confident that it will take most of the tedium away. So in practice, maybe git-absorb means I wind up with fewer squashes and fewer "fix lint" commits because I don't want to deal with finding the right commit.

My use of git-absorb tends to be bursty. Usually when I'm trying to prepare a PR for review or something along those lines. It is especially useful when I have changes that I want to be fixed up into more than one distinct commit. git-absorb will just do it automatically for me. The manual approach is not just about finding the right commit. It also means I need to go through git-add in patch mode to select the right things to put into a fixup commit, and then select the other things for the other commits. So it actually winds up saving a fair bit of work in some cases.

Another example is renaming. Maybe PR review revealed the names of some functions were bad. Maybe I introduced two functions to be renamed in two distinct commits. I could do

first rename -> first commit -> second rename -> second commit

Or I could do:

both renames -> git add -p -> first commit -> second commit

Or I could do:

both renames -> git absorb

In practice, my workflow involves all three of these things. But that last git-absorb option eats into some of the first two options and makes it much nicer.

notlinus

9 months ago

Criticism isn't negativity. We're not Pollyannas here, we're adults who can handle critique.

burntsushi

9 months ago

Can you give me an example of criticism that is not negative? As far as I know, all forms of criticisms involve pointing out a flaw or fault. There's constructive criticism, but it's still fundamentally negative.

Either way, feel free to replace the word "negative" with "criticism" in my comment if you want. It expresses the same thing I intended to express: I disagree with the criticism.

If we're not Pollyannas and we're all "adults who can handle criticism," then you should also be able to handle criticism of criticism. It goes both ways.

beart

9 months ago

I've been thinking about your comment that all criticism is negative and I'm not sure where to go with that in my mind. I think it very much depends on your definitions of "criticism" and "negative". For example, a movie critic could praise a movie, 100% score, no faults. That's still a criticism by many definitions.

But the interesting thing about this is "the eye of the beholder". I suppose to some folks, all criticism does seem to be negative... which is why code reviews can turn into nightmares... and helps explain why I recently received a very angry phone call from another engineer because of some changes that were made to "his code".

For the purposes of this discussion, the oddity seems to be the folks jumping to defend the software they had no part in creating. Why does the criticism result in negative feelings for them? I can understand the author taking issue with it, but someone else's criticism should not impact another's ability to utilize the software for whatever purposes.

burntsushi

9 months ago

I don't want to play a definitions game and dissect word meanings here. I think it's usually a waste of time. That's why I implored folks to just accept a revision in wording in my previous comment. What I intended to convey was this:

1. There were many comments providing criticism that I disagreed with.

2. I felt that the vibe given by the comments at the time was not commensurate with how much utility the tool brought me in my own workflow.

So it seemed useful to point this out. That is, that there was a gap in the commentary that I thought I could fill.

Obviously I won't be using this phrasing again because it spawned off a huge meta sub-thread that I think is generally a huge waste of time and space.

> For the purposes of this discussion, the oddity seems to be the folks jumping to defend the software they had no part in creating. Why does the criticism result in negative feelings for them? I can understand the author taking issue with it, but someone else's criticism should not impact another's ability to utilize the software for whatever purposes.

Again, it cuts both ways. Why isn't it odd for folks to jump in and express criticism of software they had no part in creating? If folks can express negative criticism, then why can't I express positive feedback?

I didn't say people shouldn't make those comments. I said it was unwarranted. Or in other words, that it was undeserved and that I disagreed with it. Or that some balance was appropriate. This has literally nothing to do with my ability to utilize the software. I don't know why you're going down that road.

j_maffe

9 months ago

Negative comments are not always a product of negativity. Sometimes it's positive feedback to improve something that has potential.

burntsushi

9 months ago

I think that can be true and my top-level comment can be true simultaneously. I've also clarified what I meant at this point.

cryptonector

9 months ago

I think GP is saying that you didn't need to focus on the negativity (which is in itself negativity), just say the substantive thing that you wanted to say without editorializing about negativity. Your complaint (negativity) about negativity might be over-done, and anyways not useful. We can all shake our own heads at others' possibly-unnecessary negativity without having to be calling it out all the time. Meta arguments are not always useful.

Besides, consider this negative comment:

  https://news.ycombinator.com/item?id=41653797

is it one of the negative comments you didn't like? It seems like a possibly-useful negative comment -- "it didn't work for me because ..." is useful unless it was really a matter of undiagnosed PEBKAC. Would you rather than comment not have been posted? Surely not.

burntsushi

9 months ago

I think I've clarified my position in other follow-up comments sufficiently that everything you said here has already been addressed.

I even already said I would use different wording next time.

And I never said folks shouldn't make those comments. I clarified that too.

Y_Y

9 months ago

I'd be in favour of auto-stickying this. I see a lot of e-ink spilled over arguments that boil down to whether or not it's ok to comment about not liking some aspect of the subject under discussion. There are good reasons not to criticize in some situations, but I don't think they apply here. Either way the arguments are tiresome. We should agree to ban criticism, or agree not to argue about it (barring special circumstances).

tpoacher

9 months ago

I had to look up the reference, and based on the wikipedia plot summary at least, I admit I don't quite get the relevance. I expected a plot where someone handles criticism quite badly and suffers as a result, but in fact the plot was actually about someone who handled criticism very well instead, and improved the lives of others as a result?

So now I'm curious! In what way does Pollyanna relate to adults who can't handle critique? Have I got the wrong Pollyanna by any chance? xD

talideon

9 months ago

A Pollyanna is somebody who's cheerful and optimistic to a fault, i.e., even when it's unjustified.

The plot summary of the book is likely not what you should be reading as it's become an idiom. Something like Wiktionary or another dictionary would be a better place to look it up.

In this case, it's not about being able to receive criticism, but about being reticent about _giving_ it.

Y_Y

9 months ago

> The plot summary of the book is likely not what you should be reading as it's become an idiom.

This is a good and perhaps under-appreciated point. When I first read the term "Polyanna" I made the same mistake as GP. I think if you read "The Prince" to find out what "Machiavellian" meant you'd be no better than when you started. Even terms like "Kafkaesque" have taken on lives of their own and are probably better not thought of as mere literary references.

ekidd

9 months ago

Machiavelli's "The Prince" will give you a decent understanding of what people usually mean by "Machiavellian". The book explains what methods would allow an absolute ruler to stay in control of state. It does not generally make moral judgments about those methods.

Machiavelli's "Discourses" is the one that will really confuse a reader looking to understand the colloquial meaning of "Machiavellian". In this book, Machiavelli lays out a vision of a healthy "republic" (or more precisely, res publica) which benefits the people who live in it. Among other things, Machiavelli argues that republics actually benefit from multiple competing factions, and from some kind of checks and balances. Apparently these ideas influenced several of the people who helped draft the Constitution of the United States.

Now why Machiavelli had two such different books on how governments worked is another interesting question...

Y_Y

9 months ago

> Machiavellian

> adjective

> uk /ˌmæk.i.əˈvel.i.ən/ us /ˌmæk.i.əˈvel.i.ən/

> using clever but often dishonest methods that deceive people so that you can win power or control

(from https://dictionary.cambridge.org/dictionary/english/machiave... )

Ymmv, but I think that's far from the point of the book, and isn't even the main topic. It's hard for me to imagine taking a person who'd never heard the term, letting them read the book, and then asking them to propose a definition, would produce anything like the above.

unkulunkulu

9 months ago

I’ve also thought about that based on the same info, i.e. not reading the source completely.

There is a notion of a “Pollyanna mode” in schematherapy. What it means is ignoring negative facts and challenges with an outwardly positive attitude and avoiding addressing the issues themselves.

This certainly can be harmful to oneself. Another harmful thing is hating and bashing oneself for mistakes and faults and I won’t make a comparative judgement, but a healthy way is supposed to be along the lines of speaking up openly about what bothers you and thinking what can be done about it if at all.

tpoacher

9 months ago

This makes sense. Thank you for your comment! Learnt something new today :)

user

9 months ago

[deleted]

AndrewHampton

9 months ago

FWIW, I've been using this alias for the past couple years for fixup commits, and I've been happy with it:

> gfx='git commit --fixup $(git log $(git merge-base main HEAD)..HEAD --oneline| fzf| cut -d" " -f1)'

It shows you the commits on the current branch and lets you select one via fzf. It then creates the fixup commit based on the commit you selected.

nickspain

9 months ago

Nice! I use git revise[^1] a lot which does a similar thing but without the fixup commit. I I’ll try using fzf to make it interactive though. Thanks!

[^1]: https://github.com/mystor/git-revise

kccqzy

9 months ago

If it only lets you select one, that's strictly less powerful. What if I want some parts of it into one commit and another parts into another? The `hg absorb` works for this case.

AndrewHampton

9 months ago

Yeah, it's definitely less powerful that what absorb is doing. I wasn't trying to argue that it was equivalent. I just wanted to share a bash one-liner that I've had success with in case others find it helpful.

> What if I want some parts of it into one commit and another parts into another?

Looks like absorb will automatically break out every hunk into a separate fixup commit. My one-liner will create 1 fixup commit for everything that's staged. That's typically what I need, but on the occasions it's not, I use `git add -p`, as kadoban mentioned, to stage exactly what I want for each commit.

mst

9 months ago

Oh, hrm, looking at this description and the one liner, I rather like.

Once you mentioned `git add -p` I realised that this is pretty much what I do already, except with a far more efficient way of selecting the relevant commit to do it to.

Muchas gracias.

AndrewHampton

9 months ago

Yeah, I use about a dozen git aliases in my normal workflow. In case it's helpful, here are the relevant ones for this flow:

  alias git_main_branch='git rev-parse --abbrev-ref origin/HEAD | cut -d/ -f2'
  alias gapa='git add --patch'
  alias grbm='git rebase -i --autosquash $(git_main_branch)'
  alias gfx='git commit --fixup $(git log $(git_main_branch)..HEAD --oneline| fzf| cut -d" " -f1)'

Another favorite is:

  alias gmru="git for-each-ref --sort=-committerdate --count=50 refs/heads/ --format='%(HEAD) %(refname:short) | %(committerdate:relative) | %(contents:subject)'| fzf | sed -e 's/^[^[[:alnum:]]]*[[:space:]]*//' | cut -d' ' -f1| xargs -I _ git checkout _"

gmru (git most recently used) will show you the branches you've been working on recently and let you use fzf to select one to check out.

atq2119

9 months ago

Then you use `git gui`, which is part of the git distribution itself, or `tig` if TUIs are your thing. I have a key binding for `git commit --squash=%(commit)` in my tig config, so I can interactively select lines or hunks to stage and then the target commit for the squash.

kadoban

9 months ago

That sounds like what `git add -p` is for, stage part of the current changes.

kccqzy

9 months ago

That still requires you to manually select hunks. The point of `hg absorb` is to automatically select hunks even if these hunks are to be squashed into different commits.

account42

9 months ago

"Automatic" sound like it would fail about as often as it would help.

k3vinw

9 months ago

Might be able to use the multimode flag in the fzf command above and it should let you select more than one using Tab and Shift+Tab.

account42

9 months ago

Couldn't you just use --patch with the alias to achieve that?

psadauskas

9 months ago

I have this one in mine: https://github.com/paul/dotfiles/blob/master/git/.gitconfig#...

    # make a fixup commit for the last time the file was modified
    cff   = "!f() { [ -n $@ ] && git add $@ && git commit --fixup $(git last-sha $@); }; f"
    # Get latest sha for file(s)
    last-sha = log -n1 --pretty=format:%h --grep 'fixup!' --invert-grep

Given a file like `git cff path/to/file.rb`, It'll find the last commit that touched that file, and make a fixup commit for it. Its great for the try-this-change-on-CI cycle: Change the Dockerfile, git cff Dockerfile, push, repeat, without it screwing up because you changed a different file while you were working on it.

mdaniel

9 months ago

It won't matter until it does, but $@ expands arguments into separate words but those expansions are themselves only word-preserved if the $@ is itself quoted. The [ will probably get away with what you want, but its use in $(git add) and $(git last-sha) almost certainly will not

  $ cat > tmp.sh <<'FOO'
  for a in $@;   do echo "a=$a"; done
  echo
  for b in "$@"; do echo "b=$b"; done
  echo
  for c in $*;   do echo "c=$c"; done
  echo
  for d in "$*"; do echo "d=$d"; done
  echo
  FOO

  $ bash tmp.sh 'alpha beta' $'charlie\ndelta'
  a=alpha
  a=beta
  a=charlie
  a=delta

  b=alpha beta
  b=charlie
  delta

  c=alpha
  c=beta
  c=charlie
  c=delta

  d=alpha beta charlie
  delta

psadauskas

9 months ago

Yeah, you're probably right. I guess I haven't run it on any files with spaces in the 6 years since I added it to my dotfiles.

jonS90

9 months ago

Funny that I've been doing something nearly identical, but with way more boilerplate.

    fzfCommit() {
      local FZF_PROMPT="${FZF_PROMPT:=Commit: }"
      git log --oneline | fzf --border --prompt="$FZF_PROMPT" --height=10 --preview="git show {+1} --color=always" --no-sort --reverse | cut -d' ' -f1 | tr '\n' ' ' | sed 's/[[:space:]]$//';
    }
    function gfixup {
      local commit=$(FZF_PROMPT='Fixup Commit: ' fzfCommit)
      if [[ -z "$commit" ]]; then
        return 1
      fi
      set -x
      git commit --fixup "$commit" --allow-empty > /dev/null || return 1
      git rebase --interactive "$commit"~ --autosquash || return 1
    }

user

9 months ago

[deleted]

johnnypangs

9 months ago

I’ve been using this:

alias gfixup="git commit -v --fixup HEAD && GIT_SEQUENCE_EDITOR=touch git rebase -i --stat --autosquash --autostash HEAD~2"

From what I understand it does the same thing as this crate for the most part. All I do after is:

git push —force-with-lease

Not sure what you get from the crate otherwise

masklinn

9 months ago

Your alias seems like a completely unecessary complexity. If you want to meld new changes into your branch head you can just alias “git commit --amend”, you don’t need that mess.

Absorb will find the commits to fix up for each change in the working copy, it doesn’t just merge everything into the head.

johnnypangs

9 months ago

I see, the reason it’s that long complicated alias was that I didn’t want to open up the editor to change the commit every time I updated. “git commit —amend” does that.

I read the rough how it works and it now makes sense. I might give it a try. Thanks!

johnnypangs

9 months ago

Seems like you can add —no-edit and get the same behavior, now I can delete that alias. Thanks again :)

(Edit: typo)

masklinn

9 months ago

That is correct, and there is a `--edit` to revert that, so my personal alias is to `git ci --amend --no-edit` such that by default it just merges the staging into the HEAD, and then tacking on `-e` will open the commit message in an editor to expand it.

cryptonector

9 months ago

You can also set `EDITOR=true` for that `git commit --amend` if you forget about `--no-edit`.

user

9 months ago

[deleted]

johnnypangs

9 months ago

I guess the crate version is easier to soft reset?

emersion

9 months ago

With a shell such as fish, one can "git commit --fixup <tab>" and a list of commits will be displayed.

mckn1ght

9 months ago

sounds like how magit lets you create fixup commits in emacs

AndrewHampton

9 months ago

This was 100% my inspiration. I used emacs+magit for years. After switching away from emacs for dev work, I still cracked it open for git interactions for another year or so. Eventually, I moved entirely to the shell with a bunch of aliases to replicate my magit workflow.

masklinn

9 months ago

Worse in fact, since magit lets you fixup, squash, or instafix.

dangitman

9 months ago

[dead]

imiric

9 months ago

I tried using this tool after seeing recommendations for it, but IME it got the parent commit wrong enough times that the work to undo the damage was more than if I had looked up the commit myself and used `--fixup` instead. So I moved back to this manual workflow pretty quickly.

I prefer having full control over my commit history, and this tool is too much magic for my taste. I'm sure that it could be improved so that mistakes are rare, but I'm not sure I would trust it enough to not have to review its work anyway.

3eb7988a1663

9 months ago

I use lazygit for this. Keyboard driven TUI which lets you easily re-order/squash commits with minimum fuss. Being able to see them all laid out means no futzing with identifying the right ids.

imiric

9 months ago

I rarely refer to commits by their IDs in these cases, though. More often than not I use the relative `HEAD~n`, when it's easy to determine `n` visually. With a few aliases, the manual fixup workflow really doesn't take much effort.

arcanemachiner

9 months ago

I love lazygit so much. I use it all the time, and it's constantly improving my workflow.

insane_dreamer

9 months ago

lazygit also my go-to for this and git in general

montag

9 months ago

This sounds just right.

jesseduffield

9 months ago

Lazygit has its own built-in approach to this problem which is much more strict than what git-absorb does (it explicitly asks for confirmation if there's any ambiguity). There's an extensive writeup about it here: https://github.com/jesseduffield/lazygit/blob/master/docs/de...

globular-toast

9 months ago

The more worrying thing for me is how would you know whether it got it right or wrong? I suspect if you're using it the point is you're not going to go and inspect each commit manually. IMO something that looks right is far more dangerous than something that's merely wrong.

adastra22

9 months ago

If it just automatically created the fix up commit and moved it to the proper place in the commit tree, I’d be happy with that. You can then run a one-liner to merge the fix up commits after reviewing.

BeeOnRope

9 months ago

That's exactly what it does by default.

Only if you pass --and-rebase does it actually do the autosquash rebase for you.

dawg

9 months ago

Give git-autofixup a try, never had any incorrect fixup targets with it's hunk-based algorithm.

bradley13

9 months ago

Maybe I am being to much of a purist, but retroactively modifying commits and history? Why? Stuff happens, so do mistakes. Fix the mistakes, make another commit, and go on with your life.

strunz

9 months ago

Commits should be a contained change that can be understood as a logical piece of history, and reverted if necessary. If you have to look at multiple commits for 1 logical change to the code, it's much more difficult to figured out what the intention was, if it was correct, and how it can be reverted.

Terr_

9 months ago

I'd also point out that it's bad to have commits in the main branch where compilation or unit tests would fail, even when each bad-commit always arrives with an immediate "oops, fixed" commit trailing it.

It messes up the team's ability to use tools like git-bisect to pinpoint behavior changes, and is generally more noise than signal.

At a bare minimum, those spans should get squashed before they leave the PR or feature branch.

notlinus

9 months ago

Do you always make all of your logical code changes in single, atomic commits? What if you have to modify a feature later on? I'm sorry, but this is ridiculous. There's nothing wrong with having multiple commits in a row modifying something. That's how git works. GitHub's PR merge workflow really messed up the meta game, I tell you what...

trashburger

9 months ago

Yes, always. Of course I make mistakes, but the idea is that any patch I put up for review, if it cleanly applies, should not break any user and any tests. It's much more liberating knowing that bisects actually work and each commit does exactly what it says on the tin (no followup "fix"/"minor" commits that actually do something completely different). Incremental improvements are compatible with this approach. Also see https://gist.github.com/thoughtpolice/9c45287550a56b2047c631... .

brigandish

9 months ago

I don't think you're arguing against the point being made.

Logical change 1, implement Hello World:

    puts "Hello, World!"

Logical change 2, make it a method:

    def hello_world
      puts "Hello, World"
    end

Logical change 3, take an argument for the greeting:

    def hello_world(greeting)
      puts "#{greeting}, World"
    end

Logical change 4, take an argument for the recipient:

    def hello_world(greeting, recipient)
      puts "#{greeting}, #{recipient}"
    end

Logical change 5, improve the method name by showing the intent:

    def greet_recipient(greeting, recipient)
      puts "#{greeting}, #{recipient}"
    end

Logical change 6, defaults…:

    def greet_recipient(greeting="Hello", recipient)
      puts "#{greeting}, #{recipient}"
    end

And so on. These could each be their own feature or part of a feature etc. What they are, though, is logically atomic. If you want to add a default recipient later, or next, that is its own logical commit. If you roll it back, each rollback will lead you to a logical difference, not some typo fix, e.g.

    def greet_recipient(greeting="Hell", recipient)
      puts "#{greeting}, #{recipient}"
    end

plus the missing "o"

    def greet_recipient(greeting="Hello", recipient)
      puts "#{greeting}, #{recipient}"
    end

Are one logical commit, for which I'd use a fixup or `git commit --amend` or whatever. What one considers logically atomic may differ from person to person, language to language, feature to feature… but they can be differentiated from things like typos quite easily.

Personally, I make numerous commits in a feature branch, (transparently, too, my typos included - I'm not proud:) and before requesting review I'll clean up using rebase into as few logical commits as possible, and upon acceptance either squash it all to one or leave it as is, depending on the team's culture/needs.

Terr_

9 months ago

If those are all different commits, now imagine the fun of interactively rebasing them on top of a main branch where some other thing changed "Hello, World" to "Greetings dear Globe."

brigandish

9 months ago

Why is someone else writing my feature? Do we not have a ticket system? ;-)

More importantly, when using a feature branch one should really rebase master into the feature branch regularly, and any other branches that might touch the same places, it takes care of little surprises like this.

Terr_

9 months ago

To clarify, I meant a scenario where that very simplest version is already upstream, and your feature-branch contains 5-6 commits which are all different refactoring-steps or making the string literals override-able or abstracted away.

When you need to rebase that onto main/master, there would be 5-6 stops to fix conflicts in a 3-way diff, and I don't think the kind of work in that example is compelling enough to justify that as opposed to having one (squashed) commit that simply states that you improved the method in several ways.

P.S.: Sometimes more commits can make refactoring easier instead, often when it comes to splitting file rename/move operations from changing their content.

brigandish

9 months ago

I'm not against squashing, especially for merging into master (although, again, that should also constitute a logical whole), but doesn't rebasing a feature branch onto master first take care of this?

recursive

9 months ago

> Do we not have a ticket system?

Maybe we do, maybe we don't. Maybe there are two tickets with a mutual dependence.

Even without that, if you add a parameter to a method, and later someone uses it in a totally different PR, then you effectively can't revert the creation of the parameter. How are you going to track that dependency. I've heard the arguments about logical commits and the ability to cleanly revert them. I just haven't seen any code where that would actually work, even if the trouble was taking artisanally curate the commit history.

brigandish

9 months ago

> Maybe we do, maybe we don't.

If I'm in a workplace like that then I wouldn't be surprised to find a poor git flow.

> How are you going to track that dependency.

The commit graph (which is linear due to rebasing). The extra parameter commit comes before the commits using it. If you want to remove that parameter then you have to revert the features that rely on it or refactor them, there's no magical way to keep dependent code if you want to revert the code it depends on. Having logically atomic commits makes this easier, not more difficult.

> Maybe there are two tickets with a mutual dependence.

Then rebase on each other's work until its ready for review (or, even better, don't work on dependent features in the same sprint). Again, logical, atomic commits would make that easier.

The real point is, what do you do when someone introduces a parameter in two functions, one is related to your work but the other isn't, and they bundled both changes in one commit, and the changes aren't even related. Now that's a mess.

recursive

9 months ago

If the second parameter isn't supposed to be there I'd just make another PR removing it.

If we use commit ancestry to determine logical dependence then this "revert the bug commit" thing won't work unless the bug just so happens to be in the very last commit.

brigandish

9 months ago

You've created a straw man. No one can make code magically not depend on other code if it does - it's a tautology. No one can argue for better organisation if it's not wanted - feel free to not use a ticket system, don't communicate work, don't rebase work that is interconnected, and don't use logical, atomic commits. I'm not sure that Git is going to be a help given those requirements.

As I've already pointed out, I'm not sure what you're arguing against but it's none of the points made in this thread.

account42

9 months ago

`git rerere` is your friend.

account42

9 months ago

GitHub's PR flow is what brought this mess of fixup commits. In the previously common mailing list flow the normal practice is to amend your patches based on feedback. Translated to a PR flow this means rewriting the branch.

vasergen

9 months ago

I'd say pull request should be reverted if necessary, but an idea that each commit should be revertable and expectation that project should work after it is unneeded complexity for me.

ozim

9 months ago

I am not history purist in that way. I don’t care about fixes for mistakes, if they are in your local branch do the cleanup before publish and if it is your own published branch force push is ok.

Unless of course your changes were merged to common code already - then they become not touchable - just make a new commit with fix don’t try to be smart, common code rolls only forward and fixes are new commits.

gorset

9 months ago

The use case is when you look at a branch as a series of patches.

Reviewing a clean set of commits is much easier than a branch full of mistakes and wrong paths taken.

Useful when we optimize for reviewing and good history for future maintenance. This has been important and useful when I’ve worked on big mission critical backend system, but I also understand it might not be the most important factor for a new project’s success.

keybored

9 months ago

A branch with some base is already a series of commits. I don’t get where the conceptual re-imagining is here.

gorset

9 months ago

Patch series comes from the linux kernel workflow, which git was developed to support. https://kernelnewbies.org/PatchSeries

In this workflow you review every commit and not just the branch diff. Each commit is crafted carefully, and a well crafter series of commits can make even very large changes a brief to review.

It takes a certain skill to do this well. As the page above says > Crafting patches is one of the core activities in contributing code to the kernel, it takes practice and thought.

This is in contrast to using git more as a distributed filesystem where you don't care particularly much about the history, and you typically squash commits before merging etc. It's simpler and easier to work this way, but you lose some of the nice attributes of the linux kernel workflow.

keybored

9 months ago

That’s a nice summary.

What I don’t like about the Git documentation as I’ve read it is that they go between “patch” and “commit” in some places without stopping and explaining what the difference is. It makes sense to them. It’s obvious. But it isn’t necessarily obvious to most people.

A patch is a patch proper plus a commit message encoded in a format that git am understands. That’s fine. And the core developers understand that you cannot transmit a commit snapshot via email (or you shouldn’t). But I prefer to mostly stick to “commit” in the abstract sense, whether that to-be-commit is from a pull or from an email (or: it’s in the form of an email and it could be applied as a commit).

git rebase talks about “patch series” I think. Without explaining it. Why not “commit series”?

Sometimes it seems like talking about your changes by the way it happened to be transmitted. It’s like talking about “attachments” instead of commits because you happened to send them via email as an attachment (instead of inline).

Then you now have “stacked diffs” or “stacked commits”. Which are just a series of commits. Or a branch of commits (implicitly grounded by a base commit). For a while I was wondering what stacked diffs/stacked PRs/stacked patches and if I was missing out. When it just turned out to be, as you explain, essentially the Linux Kernel style of being able to review a commit in isolation. But in a sort of context that pull request inhabitants can understand.

I prefer to mostly talk about these things as “commits”.

(At several times writing those paragraphs above I wondered if I would be able to string together them in a coherent way)

travisb

9 months ago

I think part of the confusion is because 'patch' and 'commit' (really snapshot) are duals of each other, but in practice have important technical differences. When speaking abstractly about 'changes' it often doesn't much matter which term is used, but most interactions are with 'commits' so that tends to be the default term to use.

However, sometimes the details matter. For example, a 'patch' (diff + description) tends to be small enough to transfer conveniently, and human friendly. Patches do not describe their relationship to other patches, so it makes sense to talk about a series of patches which must be applied in sequence to accomplish some larger goal. 'patch' doesn't imply any particular storage format, so sometimes saying 'attachment' or 'patch file' or 'email body' is an important distinction. Git documentation assumes you know what you want. It might help to think of patches as "outside time", though of course any particular version of a patch will only apply to a subset of all snapshots.

A 'commit' (snapshot + description), on the other hand, tends to be large in that it describes the entire state of the codebase and contains (implicit or explicit) metadata describing its predecessor(s) -- possibly a large graph of them. People talk about a 'commit series' all the time, but use the terms 'history' and 'branch' instead.

Stacked PRs are built on top of commits and branches to gain the advantages of a patch series while retaining the advantages of snapshots. There's no industry consensus on if they are preferable, let alone how to best implement them (eg. rebase, rebase+squash, merge-squash, merge+first-parent, etc.), so different people have different ideas about what they look like. It isn't correct to say they are just a series of commits, because sometimes they are implemented as a series of (implicit or anonymous) branches. One of the few agreed-upon features of stacked PRs (or its other names) is that it is a sequence of 'changes' which are ordered to both satisfy change dependencies and broken into smaller pieces which "tells a story" to reviewers.

keybored

9 months ago

It certainly matters when you e.g. find a patch in the wild and have no idea what commit it was originally based on. Metadata fiddling becomes more important in email workflows.

account42

9 months ago

Conceptually, a branch is a series of tree states. The patches are derived from that.

keybored

9 months ago

Conceptually a branch is the last snapshot in a series of snapshots.

dclowd9901

9 months ago

So this wouldn’t work very well in workflows that flatten merges to a trunk?

travisjungroth

9 months ago

It would mostly be unnecessary. The separate commits won’t matter after the PR if they’re getting squashed.

Debatably, if you’re making changes during a PR review, it could be helpful to make those changes in relevant commits. That way if someone goes through them during the PR, they get one clean “story”, rather than see the pre-PR commits and the conversation after.

chippiewill

9 months ago

Tools where they allow you to squash merge tend to be fairly incompatible with stack diff workflows generally.

It's tricky to review individual commits on Github and Gitlab and they don't even allow you to leave comments on the commit messages.

In areas where people do review individual commits they tend to use tools that support that workflow. Git uses email patches where each commit is sent as a separate email. Tools like Gerrit do code review directly on a commit by commit basis and the usual strategy to get it into trunk is to either "fast forward" (advance the HEAD of the trunk branch to that commit) or to cherry-pick it.

mst

9 months ago

I like for non-trivial stuff to have a branch with a series of logical commits, cleanly rebased atop main, then use -no-ff to force a merge commit anyway. That way you can the whole branch appears as a single commit/diff atop main in primary history but you can dig in to the original components of it if/when that's useful.

The primary obstacle to doing this for me is, if I'm honest, not having automated it sufficiently that I can't forget to do that.

chippiewill

9 months ago

Amusingly it actually makes you less of a "purist" to not modify commits. Git was created by Linus Torvalds to support Linux Kernel development

The Linux Kernel is heavily built around reviewing each commit individually via patchsets sent as emails and rewriting history to make it cleaner.

Git history is incredibly valuable if its curated, at my previous employer we used Gerrit and pretty much every change would have a well crafted multi-line commit message that explained what was being changed and most importantly _why_. If I wanted to understand why a bit of the codebase had been authored in a particular way it was incredibly easy because I could just git blame the line and there'd be a detailed commit message explaining the rationale.

In the Pull Request world I do agree with what you say - but only if you squash merge. Commits that say "fix typo" or "oops" or "wip" aren't useful long term.

Izkata

9 months ago

For me it's not purism, but practicality: The top comment right now mentions using this to apply linting changes to the original commit that introduced the linting violation, but I've seen linting commits introduce bugs often enough that I think those should always be a new separate commit, so a future maintainer can easily see why that kind of bug was introduced and what the fix is.

theptip

9 months ago

It sounds like you have never heard of the Stacked Diffs workflow.

Here is an overview: https://newsletter.pragmaticengineer.com/p/stacked-diffs

It’s a perfectly reasonable preference to eschew this workflow, but definitely worth understanding the pros and cons.

dpkirchner

9 months ago

Every explanation of stacked diffs I've seen (including this one) makes me think there is some secret magic not being shared that will make the whole idea "click".

If the problem is devs having to wait too long for PRs to be reviewed, breaking tasks up in to smaller diffs should exacerbate the problem, wouldn't it?

If diff 2..n follow diff 1 then a review of 1 blocks everything else you're doing, eh? At least the traditional route, branching everything straight off of main, means you can merge 2..n and rebase 1 once it passes, leaving you less blocked.

There must be something I'm missing because there are a lot of people promoting stacking diffs.

travisb

9 months ago

In my experience with stacked diffs the smaller diffs doesn't exacerbate the problem because they are pipelined.

When getting merge approval takes at least one or two business days (say because of time zones or your colleagues are doing deep work which they don't want to drop every fifteen minutes to do a review) the pipelining is necessary to be unblocked.

johnmaguire

9 months ago

> If the problem is devs having to wait too long for PRs to be reviewed, breaking tasks up in to smaller diffs should exacerbate the problem, wouldn't it?

I find small diffs to be much easier to review and as a result I review them much practically as soon as they are posted. And I'm less likely to miss something.

dpkirchner

9 months ago

For sure, I agree. However there is also a context switching cost and IMO switching between your IC work and, say, a dozen small diffs is rougher than a couple larger diffs. That opinion said I haven't experienced fast reviews myself, so maybe that's what I'm missing.

theptip

9 months ago

In my experience five 100-line diffs get reviewed much faster than one 500-line diff.

Especially when you follow best practices like splitting refactors from functional work; the canonical stacked diff for me is “refactor to make the upcoming change easy; add some tests for the existing functionality I want to protect; small diff to actually make the change”.

1 & 2 don’t generally require caffeine to review. 3 probably requires engaging the brain. If you put everything together you need to be on high alert for the whole review.

steveklabnik

9 months ago

I recently mentally came around to stacked PRs, though I currently work mostly solo, and so don't use them. But I'm happy to try and talk with you about it. Maybe because I learned so recently it might help.

> If diff 2..n follow diff 1 then a review of 1 blocks everything else you're doing, eh? At least the traditional route, branching everything straight off of main, means you can merge 2..n and rebase 1 once it passes, leaving you less blocked.

I am a little confused about the scenario you're describing here: with the PR based flow, you can't merge the second half of the PR while waiting on some change to the first commit.

So let me try to suggest a scenario. We're working on a web app. We're going to add a new feature. This requires some backend changes, and then some front end changes that use those back end changes. For the immediate purposes here, we're going to presume these are in the same PR. I'll talk about them as two PRs in a moment. We're also going to assume for the sake of simplicity that there's two commits in this PR: one for backend, one for frontend.

With the PR workflow, you're asking for review from two people: a backend expert, and a frontend expert. Both are going to request changes, and you're gonna respond to all of them, and eventually things get to a good place and the PR is merged. Let's say that process takes a week.

There's a few possible issues here: the first is, both people are going to get notifications for every change in the PR, even if it's not in the code that they need to review. This creates additional review fatigue for both reviewers. They also may need to check what's happened since the last time they reviewed things, and github doesn't always make that easy, though they do help sometimes in some circumstances. Another issue is, say that the backend changes are good to go, but there's more frontend work to do: because it's in the same PR, the backend change will not be integrated until the frontend change is done. And the backend reviewer will still be getting those pings. Notably, if the frontend change is ready first, they'll also be getting pings from more backend reviews too.

In the stacked diff flow, and using those tools, each reviewer only needs to review their half individually. They will not get pinged for changes to the other half. They will be presented with only the diffs that affect the parts that they need to review. And, because you can land each diff in the stack whenever it's at the top (or bottom, depending on how you visualize things, I guess) of the stack, as soon as the backend commit is good to go, it can land, and the frontend commit can land at a later time. We've popped it off the stack. And no matter who finishes first, they won't get review pings for the other part's work.

So, you might say: yeah, that's exactly why I'd make them as two separate PRs. And that may be a good idea. But to do that, you have two choices: wait until the backend stuff is fully done, and then start on the frontend stuff. That can work, sure, but maybe what you want to do needs the frontend work to influence the backend work, so treating it as one logical change before any of it hits master makes sense. So to do that, you make a frontend PR that, instead of branching off of master, branches off of the backend PR. Well, now you've reinvented stacked diffs, but you don't have any of the tooling to make that nice! Once the backend lands, you'll have to rebase the frontend off of master, stuff like that.

... does any of that make sense?

dpkirchner

9 months ago

> I am a little confused about the scenario you're describing here: with the PR based flow, you can't merge the second half of the PR while waiting on some change to the first commit.

As I understand it, based on what's in the article anyway, diff 2 is started from diff 1 even when they are totally different tasks, so you'll be blocked in the sense that you'll have to rebase or merge if the review finds a problem.

I figure you might as well branch 2 off of main while you wait for the go-ahead to merge 1 into main. Stacked diffs don't seem to meaningfully change this flow.

> In the stacked diff flow, and using those tools, each reviewer only needs to review their half individually. They will not get pinged for changes to the other half. They will be presented with only the diffs that affect the parts that they need to review.

I guess part of it is I have never worked in an environment where the frontend and backend work didn't need to be closely coordinated anyway, so those pings aren't a big deal. Everyone needs to know what's going on, to minimize rework.

> ... does any of that make sense?

I dunno. Maybe my brain isn't compatible, heh.

steveklabnik

9 months ago

> I figure you might as well branch 2 off of main while you wait for the go-ahead to merge 1 into main.

Ah, so sure, if you're talking about two separate pull requests, then that makes sense. But the equivalent with stacked diffs just simply wouldn't be stacked. These discussions are inherently on work that is dependent on previous work.

> so those pings aren't a big deal.

Yeah I wasn't sure if my specific suggestion would be too specific. Examples are hard.

> Maybe my brain isn't compatible, heh.

It's all good, I felt the same way for a long time. Maybe if you are bored and play around with Gerrit, (which is admittedly a bit janky) it'll make more sense. Or maybe not :)

IshKebab

9 months ago

The point of history is so that other people (or your future self) can look back at the history of the code and understand how it changed and why changes were made.

There's zero value in retaining mistakes that were never merged into `master`. In fact there is negative value because it makes the history harder to follow.

For example would you rather see "review fixes" in Git blame, or the actual useful commit message?

You can argue that you can't be bothered to make your history nice; fine. But you can't say it's wrong to do that.

(Personally I think you should squash commits for a MR; if it's too big to review as one commit it's too big for one MR - except for branches that multiple people have worked on over an extended period.)

Terr_

9 months ago

> There's zero value in retaining mistakes that were never merged into `master`.

Not to be too pedantic about this, it sounds like you mean flawed commits that were merged into master, but never had a chance to be the HEAD of master? (In other words, they always arrived along with a fix-commit too, so that nobody checking out code could've hit the bug.)

> In fact there is negative value because it makes the history harder to follow.

It also hampers your ability to use tools like git-bisect to detect when behavior changed, especially if some of the bad commits won't compile or pass unit tests.

IshKebab

9 months ago

> flawed commits that were merged into master, but never had a chance to be the HEAD of master

Yeah exactly.

globular-toast

9 months ago

Why store git history at all? It's useless if you don't take care of it. Have you ever used git history for anything? People use it to find the source of regressions (you can do it quite quickly using git bisect).

keybored

9 months ago

Happens on all these threads.

- Thread: This helps you make a useful Git history

- Counter-point: Why are you using a version control system to make a useful history? Why does the history matter? I have been developing software for eighty years and neither I nor anyone I’ve met have cared.

froh

9 months ago

the git history of the linux kernel documents the Linux kernel detailed Design, the reasoning how things fit together.

I've used it several times to understand why things are done the way they are.

The discussion leading to this design is encoded in the revisions of the patch series, browsable on patchworks.

I agree to your point that the git history has to be taken care of to be useful.

globular-toast

9 months ago

My question was, of course, rhetorical. This and hunting regressions are two great reasons to keep history.

"What did John do on Tuesday morning" is not a good reason to keep history. Neither is "This change was made as part of a sprawling "fix" commit that happened on Thursday afternoon".

The important point is if you are going to keep history at all it needs to be kept in a state that is actually useful, meaning useful for the first two things, not the last two. Otherwise it's just a waste of disk space. The git history is meticulously maintained which is what makes it actually useful.

jeltz

9 months ago

I use the git history all the time. For example the git history of PostgreSQL is amazing but I have also worked with proprietary software with good git history.

And one key to good git history is people splitting up big PRs in multiple commits.

rom1v

9 months ago

https://www.mail-archive.com/dri-devel@lists.sourceforge.net...

nextaccountic

9 months ago

If it is unreleased it may make the life of reviewers easier. If they look at the commits at all, that is.

But if not even reviewers are looking at commits I question whether a PR should be chunked in commits at all - why not squash all commits into a single one, so that every PR is composed of exactly one commit? Could maybe separating the changes in commits convey some information?

toastal

9 months ago

Normally creating a new feature involves independent actions like updating dependency, refactoring some module beforehand, & so on. If these things can be broken up in a way that make review & reading logs easier, why would you squash the whole history—especially if that history is already good. In as much as it would be ridiculous to go to the other extreme of committing ever character/line as a separate commit, just collapsing the whole history isn’t a good approach. There isn’t a hardened rule about where the line is but that is okay & up to developer discretion on how they want to ‘tell the story’ to a reviewer (or their future self) about how certain chunks should be read together.

cryptonector

9 months ago

If you haven't pushed then you want to eventually push clean history. That's what rebasing is about.

eviks

9 months ago

Because it's easier to understand later on when it's not spread out

baq

9 months ago

the history you're merging to the main branch is much more useful clean than accurate.

metadat

9 months ago

I haven't used `git --fixup' or `git rebase --autosquash' before, but they sound pretty handy.

https://jordanelver.co.uk/blog/2020/06/04/fixing-commits-wit...

git-absorb appears to take it one level further. However from the README I'm not clear on exactly what it will do in specific situations:

Does git-absorb automatically associate the most recent commit unique to a branch for a given file and apply the diff to said commit? Or what is the precise workflow and outcome in terms of which edits go into which commits?

tcoff91

9 months ago

Just use magit and easily make fixup! commits with like 3 key presses. Even if you don't use emacs keeping it around just to use magit is worth it.

Edamagit for vscode users is not as good but it does this particular workflow great.

psanford

9 months ago

I use magit. I still use git absorb. The issue for me is not the speed at which I can run the interactive rebase, it is the amount of looking I have to do to find the correct commit to fixup onto in a big stack of commits. git absorb figures that out for me.

accelbred

9 months ago

magit supports git-absorb out of the box if its installed; see the magit-commit-absorb command. I find it quite useful.

bananapub

9 months ago

magit is fantastic but this isn't at all the same thing.

fixup is "stage changes, users selects past commit to fixup, commit staged changes"

absorb is "stage changes, git-absorb figures out which past commit to fixup, commit staged changes"

parasti

9 months ago

TIL about `git commit --fixup` and `git rebase --autosquash`.

Interactive git rebase is by far my favorite Git tool to use, it scratches a particular itch to create perfect logically atomic commits.

That said, sometimes this kind of history editing tends to backfire spectacularly because these crafted perfect commits have actually never been compiled and tested.

mst

9 months ago

I tend to go back through and build+test each of my newly crafted commits locally (and for preference have CI set up such that it'll go "hey, haven't seen these commits" and do it again itself).

Some people find doing so this boring and annoying. For whatever reason my brain finds it zen and satisfying.

ssijak

9 months ago

Do people actually check commit history in detail so often that they absolutely find so much value in ultra clean commit history? I never understood that obsession with 100% clean history.

mst

9 months ago

When I'm hunting a mistake (even/especially my own) finding the commit that introduced the bad line via blame and then looking at that specific diff is extremely handy. Also being able to bounce back and forth between the first broken one and the one before running tests while I work out exactly what I messed up and how.

I wouldn't say I put the effort in to get to 100% clean but something like 95% clean makes future me significantly less enraged at past me's mistakes.

asqueella

9 months ago

You don’t even have to do archaeology to benefit from clean commits, it helps tremendously during review too. How do you review nontrivial changes without splitting them in digestible chunks?

freedomben

9 months ago

Yes, absolutely. Writing out a change log when you have clean commit history is a trivial task. Doing so when it's not clean, can be arduous and take hours, and then still end up being inaccurate.

I also frequently try to find when some bug or change was introduced in the get history, so that I can understand why it was done through the context. Context. When it is just thrown in some random commit that doesn't even have much of a commit message, it is utterly useless. When it is part of a commit that is atomic and targeted at one thing, it is trivial to see why it was done. Even when the comment isn't super helpful (which is common despite intentions, because predicting the future and what will be useful in the future is very difficult) it's still valuable because you can see the code in its full context and it's often clear what it's doing.

So yes, I am a big believer in clean commit history with atomic, single commits that represent the task someone was doing. For example, If it was a bug fix, then the bug fix will be in its own commit with a commit message describing the bug that is being fixed.

borrow

9 months ago

It's self-reinforcing. If commit history is reasonably clean, the barrier to doing code archeology is lower, so you reach for it more often. And if you do code archeology often, you develop a better sense of what's a clean commit history and what makes commit messages useful.

The need for code archeology depends on a project. When you writing a lot of new code it's probably less important than in a legacy thing where most changes are convoluted tweaks made for non-obvious reasons.

playingalong

9 months ago

Do people wash their hands 15 times a day so that they absolutely fine so much value in getting rid of the bugs?

The answer is: some do, some don't.

syncsynchalt

9 months ago

Yes, absolutely.

If I'm doing a `git blame` or digging through the annotate gutter in GitHub I prefer to find the appropriate commit context, not a commit with title "typo" or "oops" or "trying something" or "WIP".

In the latter case you can still do the `git log` archaeology to find the actual context but I'd rather not.

Some teams will squash all merges to avoid this problem, but why not have the best of both worlds by merging relatively clean branches?

baruch

9 months ago

I do. To learn a code base it before tremendously, also to find what happened and why things were done it is off great help. I don't need it every day but I routinely do git archeology.

PhilipRoman

9 months ago

Doesn't have to be 100% clean, but history has tremendous value to me. Often the most important question for me when fixing a problem is "was this intentional" (aka did that guy (who left 10 years before I arrived) know what he was doing?).

As far as I remember, I've had to check history for about 2/3 of bugs I've fixed.

In a perfect world each system would be documented and tested well enough that correctness could be derived from first principles, but we do not live in a perfect world :(

cryptonector

9 months ago

Yes. When doing `git bisect`, for example.

jcon321

9 months ago

Yea teams that have poor tracking and no PR practices.

chippiewill

9 months ago

Only in codebases that have good commit history

gorjusborg

9 months ago

'git rebase -i' meets this need and more. Everyone using git should learn to use it eventually. With it you can squash, fixup, reword, or delete commits interactively.

pragma_x

9 months ago

Glad I'm not the only one. Squashing branches pre-merge, or post-merge solves the problems of commit purity (main/develop/prod/etc always builds at every commit) while preserving the intent of any MR/PR. Squashing also makes rebases _A LOT_ easier; every bad day you've had executing a rebase is likely solved by squashing your feature/fix branch, first.

Going back in time to fix the past just to sort your changes into the "correct" bin or vicinity of changes just seems unnecessarily complex. I am also unaware of what use-cases this supports. Are teams out there forcing commits to have a stricter scope than just "anything in the codebase"? Perhaps it's a monorepo pattern? I have no idea.

Either way, you have to `git push --force` to modify your PR/MR. May as well use the stock workflow and learn how to clean up your branch.

I also endorse `git commit --amend` for anyone that identifies as a "micro-committer" but doesn't need to preserve intermediate changes.

Edit: re-reading comments - it sounds sort of like some teams prefer to bin changes like this in order to make a PR/MR browsable by commit, rather than inspect one big broad diff. Is that the case?

syncsynchalt

9 months ago

It sounds like this will automatically find and associate with each commit that last touched the modified lines, which is nice.

There've been plenty of times I haven't done a fixup because I couldn't be arsed to check exactly which of the three ancestor commits in my branch the tweak actually belongs with. I wouldn't even consider splitting into three fixups if there were three tweaks that belonged to three different ancestors. It sounds like this tool/workflow will do all that work with a single invocation.

burntsushi

9 months ago

git-absorb is a complementary tool to `git rebase -i`. git-absorb will create the fixup commits (from your staging area) for you and set them up for use with `git rebase -i --autosquash`.

gorjusborg

9 months ago

It solves a problem that can be solved better by good habits and a solid understanding of git.

burntsushi

9 months ago

That suggests you don't understand what git-absorb does. git-absorb takes a pattern of good habits and makes one part of it faster and easier.

gorjusborg

9 months ago

Sure, I guess that's possible.

All I see is a tool that guesses where to aggregate a fixup commits when I can't be bothered to view and think about that myself.

burntsushi

9 months ago

About grep, all I see is a tool that looks for lines matching a query when I can't be bothered to just read the file myself.

I can trivialize useful software too. See how ridiculous you sound? That's what software does! It makes our job easier.

It's true that git-absorb guesses, but since you clearly haven't used the tool, you don't know how good it is at guessing. Moreover, false positives and false negatives are not the same in this scenario. A false positive would be very annoying, and I don't think that's ever happened for me in the years I've been using git-absorb. False negatives happen more frequently, but it's fine, because it tells you and then you just fall back to what you would have done manually for whatever it couldn't find a commit for.

keybored

9 months ago

The default behavior is to output a `fixup!` commit. Which is a command for the todo editor of git rebase.

MBlume

9 months ago

I've been using this workflow with hg and it's great, happy to see a git port

toastal

9 months ago

Darcs documentation suggests amending & fixing up WIP commits too

optymizer

9 months ago

One of my favorite Sapling commands at Meta

pdpi

9 months ago

Yeah, it singlehandedly turned me on to Mercurial when I was there, and has massively shaped the way I use git ever since.

OJFord

9 months ago

This makes no sense to me, why would a conflict-free modifiable commit within the last 10 be the one I want to fixup any more often than roughly 1 in 10?

I fixup^ frequently, often often with conflict to resolve in the process, and I have never ever thought 'if only something would automatically choose the target commit for me', even if it was advanced AI why would I trust it to be what I wanted?

^my alias is:

    !f(){ target="$(test -n "$1" && git rev-parse "$1" || git fzsha rev-parse)"; git commit --fixup="$target" ${@:2} && EDITOR=true git rebase -i --autostash --autosquash "$target^"; }; f

`git fzsha` being another alias of mine to choose the target commit with fzf if not given. I rarely use that though, because usually I know it's HEAD~5 or whatever from doing it a second ago, or I've already been looking at the log to work out what I want.

enw

9 months ago

In what situations does this solve a problem?

In our projects we only enable squash merge in GitHub and the PRs can have any commits you want. The squashed commit includes link to PR, and PR has detailed summary (which wouldn’t be practical in the commit message).

titusjohnson

9 months ago

Agreed. It's been a decade or more since I and the teams I worked with did anything other than squash-merging a feature branch into main. The PR body becomes the commit body for the squash, and we're done.

Developers can be as f'n messy as they want on their branches, no one cares because it never hits main.

keybored

9 months ago

Admittedly git absorb solves a smaller problem. Because if you do squash merges and leave the history in GitHub you have much bigger problems.

Aissen

9 months ago

This looks a lot like git-fixup[1], which I have been using for a few years. I might try git-absorb since it looks quite interesting.

[1] https://github.com/keis/git-fixup

cocoto

9 months ago

Small question to anyone with this workflow. Do you (re)run CI on every affected commit? If no, what is the point of small commits if you lose any guarantees? I much prefer the honest linear non-modified history.

keybored

9 months ago

Why do you ask that question as if that is the obviously most important factor?

My history is read much more than it is checked out. And definitely much more than it is stress-tested by some CI. Readability matters more.

I’m having a hard time imagining all the supposed “commit for coffee break” committers who also are die-hard bisect users that go nuts when someone doesn’t run the whole CI on every commit.[1]

And if you don’t have that many commits to check because you “squash merge”: `git bisect start --first-parent` gives you exactly the same result.

globular-toast

9 months ago

There seems to be an alternative implementation called git-autofixup: https://github.com/torbiak/git-autofixup

Has anyone compared the two?

dawg

9 months ago

Yes, I empirically found the hunk-based approach of git autofixup to work more reliably. I use it via magit. Paired with rebase --update-refs, it's particularly helpful to automatically fixup stacked branches.

krobelus

9 months ago

https://news.ycombinator.com/item?id=26507205

taberiand

9 months ago

This sounds very useful, I frequently reset soft and recommit in batches of related changes before PR and this sounds like it streamlines integrating updates quite nicely

adastra22

9 months ago

The thing that I never knew I needed, but I predict within a few days I won’t be able to live without. Thank you!

saagarjha

9 months ago

This seems neat in theory but I would be perpetually concerned that it is going to pull some change I don't mean to commit and put it into something for review. How well does it do at that, anecdotally?

mook

9 months ago

It deals with things that have been staged, so if you don't want to commit then don't stage it. I believe a dirty working tree is fine (and gets ignored).

juped

9 months ago

If I understand the logic it uses correctly this will nearly always attach the fixup to the right commit. It's not for me because I do this manually too fluidly but it seems like a good tool.

thecopy

9 months ago

If i understand this will break "changes since my last review" and disconnect PR review comments in GitHub?

jeltz

9 months ago

No, why would it?

thecopy

9 months ago

Perhaps i misunderstood it... looks to me like squashing commits and would require a force push, which would have this effect.

psanford

9 months ago

I'm a huge fan of git absorb. I love tools that do-the-right-thing so I can think less. That is what git absorb is.

dmead

9 months ago

This sounds great,but kind of an anti pattern in git.

I definitely want to have a "fixes" commit on my feature branch. You should do whatever you want on a feature branch so long as your trunk has a clean history.

This sounds like someone wanted to lift a feature of changesets in mercurial into git. I don't think this is safe and probably breaks a lot of people's mental model of git changelogs being an immutable data structure.

keybored

9 months ago

> I definitely want to have a "fixes" commit on my feature branch. You should do whatever you want on a feature branch so long as your trunk has a clean history.

This tool creates `fixup!` commits. These commits aren’t rewrite commits. They are commits which later can be used to rewrite the history.

You can use this tool, not rewrite your branch, and then rewrite the branch right before merge.

> This sounds like someone wanted to lift a feature of changesets in mercurial into git. I don't think this is safe and probably breaks a lot of people's mental model of git changelogs being an immutable data structure.

Rewriting is a huge part of the Git culture (and well-supported). Meanwhile Mercurial seems to have rewriting as a sort of add-on.

saagarjha

9 months ago

At some point the changes are going to get merged in, no? And that that point I would really like the commits to be nice.

snatchpiesinger

9 months ago

git log --first-parent

dmead

9 months ago

That's why you squash and make the commit message readable.

saagarjha

9 months ago

Squash all the commits I worked so hard to make atomic?

cryptonector

9 months ago

No, you don't want to squash everything into one commit. You want logical, clean history.

cryptonector

9 months ago

This is no more nor less safe than `git rebase`.

renewiltord

9 months ago

Pretty clever impl of a tool. I'll be using it, thanks.

tripdout

9 months ago

How does it figure out which commit to add each change to?

andsens

9 months ago

7 lines into the README buddy: > The command essentially looks at the lines that were modified, finds a changeset modifying those lines, and amends that changeset to include your uncommitted changes.

bananapub

9 months ago

it explains it in literally the first paragraph: https://github.com/tummychow/git-absorb?tab=readme-ov-file#g...

> The command essentially looks at the lines that were modified, finds a changeset modifying those lines, and amends that changeset to include your uncommitted changes.

a1o

9 months ago

Uhm, I do a lot of git rebase -i HEAD~2 where I just squash the commit on the latest or sometimes I need to reorder and move the fix commits in specific commits in Pars that multiple commits, which I then need to push force. Is this for a similar use-case? I am not familiar with fixup or how it works.

mckn1ght

9 months ago

fixup works with rebase if you add the -a flag for autosquash. try it and you’ll see the commits already reordered for you in the interactive menu.

also you can write HEAD~2 as @^^ if you want to save a couple keystrokes!

rom1v

9 months ago

Nit: Or @~2 (@ is HEAD, so @^^ is HEAD^^).

hoten

9 months ago

fixup lets you mark a change to be rolled back into a previous commit, without changing the history (yet). When you do `git rebase --autosquash` it then automatically moves those commits as `fixup` in the right places. It helps to queue up many fixups at once and only have a single rebase to do when you're done (or for reviewers to see just the changes in a review tool, not have to reassess the entire commit because you already rebased)

btw you could be doing `git commit --amend` if you just want to add to the last commit.

cryptonector

9 months ago

Yes. This saves you having to `git rebase -i` or `git rebase --autosquash` to get the fixup to be "absorbed" into the commit it is meant to fix up.

AlexCoventry

9 months ago

You can also just press `c`, then `F`, in magit.

user

9 months ago

[deleted]

lr4444lr

9 months ago

you don't want to shove them all into an opaque commit that says fixes, because you believe in atomic commits.

Sure I do. The whole branch will be squashed anyway before it's merged in, and a single "fixes" commit while still on its own branch will be easier to track in a PR for addressing everything pointed out earlier.

I mean, don't let me stop anyone from using this or --fixup if this is your flow, but this solves a problem neither I nor anyone in my last 10 years of working with code has.

imiric

9 months ago

Every team is free to choose what works best for them, but IMO always squashing PRs is not a good strategy. Sometimes you do want to preserve the change history, particularly if the PR does more than a single atomic change, which in practice is very common. There shouldn't be a static merge type preference at all, and this should be chosen on a case-by-case basis.

At the risk of sounding judgemental, I think this preference for always squashing PRs comes from a place of either not understanding atomic commits, not caring about the benefits of them, or just choosing to be lazy. In any case, the loss of history inevitably comes at a cost of making reverting and cherry-picking changes more difficult later, making `git bisect` pretty much worthless, and losing the context of why a change was made.

boolemancer

9 months ago

> At the risk of sounding judgemental, I think this preference for always squashing PRs comes from a place of either not understanding atomic commits, not caring about the benefits of them, or just choosing to be lazy. In any case, the loss of history inevitably comes at a cost of making reverting and cherry-picking changes more difficult later, as well as losing the context of why a change was made.

1) Why are you ever reverting/cherry-picking at a more granular level than an entire PR anyway? The PR is the thing that gets signed-off on, and the thing that goes through the CI build/tests, so why wouldn't that be the thing kept as an atomic unit?

2) I don't think I've ever cared about the context for a specific commit within a PR once the PR has been merged. What kind of information do you expect to get out of it?

Edit: How does it remove the context for a change or make `git bisect` useless? How big are your PRs that you can't get enough context from finding the PR commit to know why a particular change was made?

imiric

9 months ago

> The PR is the thing that gets signed-off on, and the thing that goes through the CI build/tests, so why wouldn't that be the thing kept as an atomic unit?

Because it often isn't. I don't know about your experience, but in all the teams I've worked in throughout my career the discipline to keep PRs atomic is almost never maintained, and sometimes just doesn't make sense. Sometimes you start working on a change, but spot an issue that is either too trivial to go through the PR/review process, or closely related to the work you started but worthy of a separate commit. Other times large PRs are unavoidable, especially for refactorings, where you want to propose a larger change but the history of the progress is valuable.

I find conventional commits helpful when deciding what makes an atomic change. By forcing a commit to be of a single type (feature, fix, refactor, etc.) it's easier to determine what belongs together and what not. But a PR can contain different commit types with related changes, and squashing them all when merging doesn't make the PR itself atomic.

> I don't think I've ever cared about the context for a specific commit within a PR once the PR has been merged. What kind of information do you expect to get out of it?

Oh, plenty. For one, when looking at `git blame` to determine why a change was made, I hope to find this information in the commit message. This is what commit messages are for anyway. If all commits have this information, following the history of a set of changes becomes much easier. This is helpful not just during code reviews, but after the merge as well, for any new members of the team trying to understand the codebase, or even the author themself in the future.

keybored

9 months ago

> I find conventional commits helpful when deciding what makes an atomic change.

I already know if I’m doing a fix, a refactor, a “chore” etc. Conventional commits just happen to be the ugliest way you can express those “types” in what looks like English.

imiric

9 months ago

Yeah, well, that's just, like... your opinion, man.

But I've worked with many, many developers who don't strictly separate commits by type this way. I myself am tempted to do a fix in the same commit as a refactor many times. Conventional commits simply suggest, well, a convention for how to make this separation cleaner and more explicit, so that the intent can be communicated better within a team. I've found this helpful as a guide for making atomic changes. Whether or not you write your commit messages in a certain way is beside the point. But let me know if you come up with a prettier way to communicate this in a team.

keybored

9 months ago

The prettier way is English.

boolemancer

9 months ago

> Because it often isn't. I don't know about your experience, but in all the teams I've worked in throughout my career the discipline to keep PRs atomic is almost never maintained, and sometimes just doesn't make sense. Sometimes you start working on a change, but spot an issue that is either too trivial to go through the PR/review process, or closely related to the work you started but worthy of a separate commit. Other times large PRs are unavoidable, especially for refactorings, where you want to propose a larger change but the history of the progress is valuable.

In my experience at least, PRs are atomic in that they always leave main in a "good state" (where good is pretty loosely defined as 'the tests had to pass once').

Sometimes you might make a few small changes in a PR, but they still go through a review. If they're too big, you might ask someone to split it out into two PRs.

Obviously special cases exist for things like large refactoring, but those should be rare and can be handled on a case by case basis.

But regardless, even if a PR has multiple small changes, I wouldn't revert or cherry-pick just part of it. Just do the whole thing or not at all.

> Oh, plenty. For one, when looking at `git blame` to determine why a change was made, I hope to find this information in the commit message. This is what commit messages are for anyway. If all commits have this information, following the history of a set of changes becomes much easier. This is helpful not just during code reviews, but after the merge as well, for any new members of the team trying to understand the codebase, or even the author themself in the future.

Yeah but the context for `git blame` is still there when doing a squash merge, and the commit message should still be relevant and useful.

My point isn't that history isn't useful, it's that the specific individual commits that make up a PR don't provide more useful context than the combined PR commit itself does.

I don't need to know that a typo was fixed in iteration 5 of feedback in the PR that was introduced in iteration 3. It's not relevant once the PR is merged.

imiric

9 months ago

I don't have time to reply to all your points, but regarding this:

> I don't need to know that a typo was fixed in iteration 5 of feedback in the PR that was introduced in iteration 3. It's not relevant once the PR is merged.

I agree, these commits should never exist beyond the PR. But I go back to my point about atomic commits being misunderstood. It's not about keeping the history of _all_ changes made, but about keeping history of the most relevant ones in order to make future workflows easier. A few months from now you likely won't care about a typo fix, but you will care about _why_ some change was made, which is what good commits should answer in their commit message. Deciding what "atomic" truly means is often arbitrary, but I've found that making more granular commits is much more helpful than having fewer and larger commits. The same argument applies to the scope and size of PRs as well.

keybored

9 months ago

> Why are you ever reverting/cherry-picking at a more granular level than an entire PR anyway? The PR is the thing that gets signed-off on, and the thing that goes through the CI build/tests, so why wouldn't that be the thing kept as an atomic unit?

The PR could contain three refactoring commits and one whitespace cleanup commit. One main change. The only thing that breaks anything is the main change. Because those others were no-ops as far as the code is concerned.

imp0cat

9 months ago

ad 1) I'd guess it depends on the size of the PR. If they're massive it kinda makes sense.

lr4444lr

9 months ago

If only we had a software methodology that championed incremental change...

lr4444lr

9 months ago

Since we're taking the risk of being judgmental, I'd suggest the preference for not squashing branches to preserve the developer's local commit history when merged into the main line suggests a misunderstanding between branches and commits, but if it works for you more power to you.

phist_mcgee

9 months ago

If you go all in on atomic commits, you're probably going to be rebasing a lot. IME, that's a lot of time spent trying to make a "clean" history all for the benefit of looking back infrequently at what broke.

Can't you just inspect your merged branches and bisect those instead if using the squash strat?

seadan83

9 months ago

What are your thoughts on the "ship, show, ask" workflow? [1]

In that workflow, small stuff is simply pushed, which allows PRs to be more single focused and more atomic. Perhaps your only objection is direct pushes to master? I am really curious if that workflow otherwise addresses all of the downsides you stated while still allowing for all PRs to be uniformly rebase-squash merged.

[1] https://martinfowler.com/articles/ship-show-ask.html

imiric

9 months ago

I wasn't aware of this particular workflow, but I've heard of similar ones before. The late Pieter Hintjens, of ZeroMQ fame, advocated for a strategy called "optimistic merging"[1], which essentially abandons the standard code review process in favor of merging changes ASAP, and fixing any issues as they arise later.

I'm not a fan of this. It allows contributors to abandon any established coding standards, while placing more burden on maintainers to fix issues later. This in practice never happens, so the quality of the codebase degrades over time. Not to mention that it allows malicious actors to merge changes much more easily.

As for "ship, show, ask" specifically, I have similar reservations. I think all changes should go through a review process, even if someone just glances at the changes. It not only gives the opportunity to leave feedback, but also serves as communication so that everyone is aware of the proposed changes. Also, making the decision of whether to choose "ship", "show" or "ask" might only work for senior and well disciplined developers. I think that in most teams you have a mixture of experience, so you'd inevitably end up in situations where a change should've been "show" or "ask", but was "ship", and viceversa. I don't think teams will ever align on a single strategy to make a correct decision, since it's based on subjective opinion to begin with. Always following the same workflow gets rid of these uncertainties.

[1]: http://hintjens.com/blog:106

seadan83

9 months ago

After having done a few thousand CRs, I've come to believe that CR is of mixed value. Not always good, nor all good. Removing the bad CRs and doing CR when it is useful IMO helps maximizes productivity.

First, I assume a few constraints on "ship, show, ask." First constraint is that all changes go for at least one, if not multiple self reviews. Second, the ability to ship is not outright granted. Until someone has demonstrated they can ship reasonable, working code- they are not given write permissions. They must "ask."

Next, ship is not a given. Things that are not interesting are shipped. This enhances the communication part of PRs. If a PR is sent, it is important or wants extra review. That removes the noise of everything being a PR, and removes the noise from PRs where minor changes are smuggled in for the sake of efficiency. This also removes a lot of nitpicking as well. When asked to CR, we often feel we need to CR, and if our only feedback are minor things- then that is the feedback. 'Perfect' becomes the enemy of good enough. Saving time, efficiency are huge. Do CR when it is useful, and only then.

Last, post merge reviews can still be done by the maintainers to ensure things are going smoothly. Hence, post merge review is an option for both ship and show. This means the coding lifecycle is blocked in CR only when the CR is valuable.

That CR is blocking I feel is a very important drawback. Typos linger because time is finite and only those that do not appreciate that will try to fix everything. The team then gets bogged down spending too much time fixing small things. Leaving the small things is not ideal either. If the overhead to fix small things is tiny, suddenly it takes 5 minutes to do what would otherwise require another person and possibly the next day.

For example, A PR preceded by renaming a few files is powerful. That is a game changer compared to a hard to review CR, staged PRs, or simply never doing the renaming because it runs the diff.

Blanket CR can also ossify a project. Most projects I've seen are written quickly and in large chunks. If they are reviewed at all, it is a lot of 2000 line "LGTM", zero comment reviews. Then when careful CR is done (often by the "scaling" team), suddenly things that are original become gospel. The CR debates the absolute best naming, original intent, spending magnitudes more time than was originally put into the code in the first place. Meanwhile the code was done quickly, slightly worse than good enough- and now good enough is not even good enough. The CR now demands a far higher quality bar compared to the original code when it was written. Fixing a 50k line code base at 500 lines per day is a way to _not_ fix that codebase. The original bad code is therefor fossilized into the project. Time and effort are not infinite.

That turned into a bit of a rant. I wanted to demonstrate that:

(A) there are guard rails on SSA so it is not just a free for all.

(B) selectively requesting CR when it is valuable amplifies the value of CR.

(D) post merge review of all code can still meet the goal of reviewing everything.

I'll tie this back to git absorb now. Git absorb rebases, which arguably invalidates any CR that was done. Ship-Show-Ask allows for there to be more explicit trust where someone does their git absorb, redoes the self review and then merges. Further, the workflow helps clean up PRs to not be as many commits.

Though, I do think the discussion of workflows and CR is really fascinating. Having been essentially just the one senior, or one of two on most all projects I've worked on in the last decade, - I'm certainly (overly) eager to discuss in detail the workflows.

imiric

9 months ago

Thanks for your perspective. It's interesting to hear a different take on this.

I do think there's no right or wrong answer with these things. Everyone will have different opinions about their ideal workflows, especially senior engineers. The important thing is to find a good balance that everyone on the team is satisfied with, but there will always be concessions, as in any relationship.

What I have found, though, is that the more uncertainties a workflow has, the higher the chances of misunderstanding, or someone not following it in the way someone else thinks they should. So with the way "ship, show, ask" is described, and what you mention here, is that there are many constraints, and decisions to be made about what constitutes an "interesting" change, or what is worthy to be shipped without a CR and what isn't. If the team is mostly senior, and there's a lot of trust involved, this workflow might work, and it might reduce the friction of blocking CRs. But there are a lot of variables involved there.

OTOH, by always requiring CRs and approvals for shipping, there are far less decisions to be made. The workflow is simple, fixed and known to everyone. Sure, sometimes CRs step into nitpicky territory, but this is usually solved by having some conventions, such as making clear what is a nitpick, what is a minor/major suggestion, and what is a blocker. This way nitpicks and minor suggestions can be safely ignored (as long as they're ackgnowledged in some way), and they're not blockers for merging.

In practice I've found that while this workflow does take considerable time and effort, in the vast majority of cases it leads to higher quality code. Even addressing nitpicks and typo fixes is an improvement in most cases. These can be easily automated by suggestions in GitHub, which can be batched and committed directly from the web UI, so it doesn't take much effort at all.

> Last, post merge reviews can still be done by the maintainers to ensure things are going smoothly.

Sure, but is this diligently done? And how well does it scale if the project has many contributors, but few maintainers? Placing the burden on maintainers to ensure certain quality standards is unrealistic. Moreover, it assumes that maintainers have the final say in these matters, instead of their changes also being worthy of discussion, which happens during a CR. I think that the responsibility for ensuring things are running smoothly should be shared by every member of the team, including external contributors.

> Most projects I've seen are written quickly and in large chunks. If they are reviewed at all, it is a lot of 2000 line "LGTM", zero comment reviews.

Those are examples of not doing CRs correctly, not a testament that CRs are not worth the time and effort. You can just as well run into this issue with the SSA workflow. As with any issue with team dynamics, this should be resolved by communicating and aligning on core values and practices the team finds important.

> The CR debates the absolute best naming, original intent, spending magnitudes more time than was originally put into the code in the first place.

Yes, this is to be expected. It takes much more effort to read and understand code than it does to write it. It takes additional time and effort to leave a written comprehensible review, and communicate all of this asynchronously. This scales linearly for every member in the team, and there's no way around it. But the question is whether deciding to invest the time and effort in a well established CR process ultimately results in higher quality software being shipped. In my experience, it does, and it's worth it.

That said, CRs are not the only or, even failproof, method of quality assurance. As mentioned in the SSA article, pair programming is another method, but it has its own drawbacks and limitations. All of these things help, and are best used in combination. They can be uncomfortable, tedious and seem pointless at times, but deciding to not do them because of this is doing a disservice to the final product.

seadan83

9 months ago

> what constitutes an "interesting" change, or what is worthy to be shipped without a CR

FWIW, I have not seen much trouble in the ambiguity. Things like "move method" refactor, typo fix, add additional test cases, comment clarification, are just shipped. If the update makes an important clarification on a comment that the team should know about, show is interesting. If the code fixes a bug, would be mentioned in a stand up- show is good. Ask is good for a large refactor where there is room for error, when something does not seem clean, merits any type of double check or second opinion, or and especially when it is a first foray into a different part of the codebase.

> Sure, but is (post merge reviews) diligently done?

The beauty I would say is that it does not have to be diligently done. Feature, not a bug.

If the lead maintainers are super busy, it is okay. Perhaps they will have time later and can do things in bulk, or even not at all. The process is no worse and does not stop when the majority of lead maintainers are on vacation.

Interesting changes are still notified via email (thru PRs). This highlights what a person should be looking at.

I think this attitude gives more trust to the team and contributors. It is not the responsibility of just the maintainers and seniors to (pre or post) review everything. No blame game of: "how did this bug get through, and more importantly, how did it get through review??"

> And how well does it scale if the project has many contributors, but few maintainers

Good question. Offhand I would say no worse. Contributors would all be required to ask, which is no different than 'review everything.' The difference would be for maintainers, they would not be required to always have other maintainers do CR in every case. A second potential difference, contributors might be promoted more quickly to have write permission. That in turn helps scale the efforts of the maintainers.

I think this shines on the other extreme when there are very few maintainers. OSS requiring week long turn around times will have trouble onboarding contributors to be regular maintainers. In industry, trusting developers relatively quickly I think is healthy.

The mentality to be looking to give people write permission sooner rather than later IMO is good.

> Those are examples of not doing CRs correctly, not a testament that CRs are not worth the time and effort.

My apologies for not conveying the example clearly. The examples I'm thinking of usually had no review. These are older code bases written long ago before CR. Or, these are startup quality code bases where a few founders jammed out tons of code. Or, code that was put out during a crunch, almost always successive crunches, where the constant pressure to ship was everything. Or, some Greenfield project that was jammed out and then handed over for maintenance and enhancement. Just large codebase that were never great. I'm also implying there is often a different standard for originally authored code compared to changing existing code.

My point is there is a quality bar that is suddenly higher, a lot higher, than the existing code.

> But the question is whether deciding to invest the time and effort in a well established CR process ultimately results in higher quality software being shipped. In my experience, it does, and it's worth it.

I agree, though not sure if actually always worth it. Projects are rarely labelled a failure, but many are. That can manifest as simply too much time going by without enough being shipped.

Though, good CR culture is indeed very valuable. I think it is more an exception than the rule.

I think that complements SSA at the same time. A strong CR culture with review everything might be hard to distinguish from SSA. In that sense, "ask" could be thought of as "second opinion." When a team gets crunched, there is more flexibility. Or, if the team is immature w.r.t to CR, or if there is a significant lack of reviewers- SSA removes barriers.

The idea is to capture 95% of the benefit of CR with 50% the cost.

> Sure, sometimes CRs step into nitpicky territory

Pointing out typos IMO is useful. Though, I've learned to personally delete any CR feedback I write that starts with "nitpick." I no longer believe it to actually be all that helpful or productive. It is a signal that a style discussion is due. It is important I think to respect that a CR is a blocking thing. Keeping someone an extra whatever amount of time for nitpicks is missing the bigger picture of what is important IMO

----

I very much appreciate the dialog and perspective! Thank you for your considered feedback here. I largely agree with your points, particularly the ones I did not respond to.

I suffer a bit too from the lesson of things taking too long is it's own failure mode. Success is not at all guaranteed, even if quality and solid code is shipped. At the same time, solid and quality code is required to scale, and is required for most success. That is to say that projects do fail very much in spite of quality engineering.

keybored

9 months ago

Why is the preferred goal to “rebase-squash” (redundant) merge? The implied onus here seems to be for the other party to move towards that strategy. But why this practice should be the goal does not seem to be mentioned.

Step one isn’t to find some way to accept squash-merge. Step one is to find some reason for squash-merge.

seadan83

9 months ago

I think you are coming at this from quite the different angle. To semi quote what I was responding to:

> always squashing PRs is not a good strategy [because of the following reason]

Hence my question,if workflow X mostly solves the following reasons, then always squashing is potentially a reasonable strategy.

I think you are asking, why do that at all?

The answer is IMO quite involved, nuanced, and not at all one sized fits all.

FWIW: (1) all commits to master become atomic if a passing CI is required before rebase squash-merge.

(2) rebase creates a linear history. Non linear history is very difficult to bisect, and can really be complicated. I don't see much positives from that complexity.

(3) generally PRs are functional sized pieces of work. (That is ambiguous.) If it helps, PRs are more boulders than pebbles. Having boulders in the history is more useful than every small pebble. Net benefit is the history then becomes a smattering of well marked minor changes and a set of functional changes.

Side note, I like anything that helps the workflow of "touch clean code only, touch only the code you need to, clean the code before you touch it". (Clean defined there as "not a pile of illogical, obtusely written crap - riddled with latent bugs). In other words, make the code simple and obvious first, then modify the simple and obvious code. I give little credit to those that spend days studying code to make a one-line "masterful" change. That does not scale, everyone has to do that, and there are thousands of lines. All things in their own context though...

(4) utility and communication for commits. IMO ideally a PR is initiated with a single commit that is written "for the history". This commit description is both descriptive for the reviewers, the git history, and is the complete text for the PR. This creates the most value IMO. I view PRs as ephemeral, any extra text written for the sake of PR is overhead. Then, any following commits are pushed to communicate to the reviewers. The audience is no longer a future maintainer, nor someone looking at a PR to get a handle of the in flight changes, but is someone that has looked at the diff. Thus, commit comments like, "address TOCTOU concern", "fix typo", "add null check", "add test case", "address feedback" all then make sense. On squash-merge, those comments are discarded and the merge is just the "commit written for the history". Otherwise, without squash - those commits have a much broader audience and become long lived. I believe this typically creates a much more useful history, eliminates a lot of noise, clarifies the audience of each commit message, and finally removes any overhead where a detailed description is written in a PR but then never captured in the history (wasteful &puts context in a location that is ephemeral and a step removed)

keybored

9 months ago

> I think you are coming at this from quite the different angle. To semi quote what I was responding to:

> > always squashing PRs is not a good strategy [because of the following reason]

I am at least coming from the same position that the OP is. Because I agree with this:

> > IMO always squashing PRs is not a good strategy. Sometimes you do want to preserve the change history, particularly if the PR does more than a single atomic change, which in practice is very common. There shouldn't be a static merge type preference at all, and this should be chosen on a case-by-case basis.

> Hence my question,if workflow X mostly solves the following reasons, then always squashing is potentially a reasonable strategy.

If OP has no problems with their current strategy (where they may squash but they are free not to) then you aren’t solving any problems. For them.

They don’t have a problem that they need to solve. You are the one who wants to fit this squash-square into their circle.

What onus do they have to read some Everything Is A Pattern dot com link in order to come to terms with a policy that they don’t need?

> I think you are asking, why do that at all?

> The answer is IMO quite involved, nuanced, and not at all one sized fits all.

Unlike squashing.

> FWIW: (1) all commits to master become atomic if a passing CI is required before rebase squash-merge.

I don’t know what atomic means. But `git log --first-parent` is equivalent to a fully squashed history. Which means that `git log --first-parent` is atomic if the squashed alternative reality is.

> (2) rebase creates a linear history.

`git bisect start --first-parent`.

There seems to be a pattern here.

> Non linear history is very difficult to bisect, and can really be complicated. I don't see much positives from that complexity.

Non-linear history can involve everything from twenty-head octopus merges to eight different root commits to backmerging twelve times before a feature branch is merged. But does saying no to the straightjacket of always-squash mean that you have to support every DAG under the sun? No.

Most sane everyday histories are limited to branching off the main branch, rebasing until it is done and then merging it back. One merge. Granted some people don’t like rebase or don’t know how to use it so they make back merges.

... And if those people make completely useless histories on that branch? Then they can squash of course. Because we’re arguing against the corporate always-squash policy. Not saying that you should never do it.

> (3) generally PRs are functional sized pieces of work. (That is ambiguous.) If it helps, PRs are more boulders than pebbles. Having boulders in the history is more useful than every small pebble. Net benefit is the history then becomes a smattering of well marked minor changes and a set of functional changes.

People can make however many commits are appropriate for the scope of the PR. It can be twenty or one.

See how the squash-always have to make the world around their “merge” strategy pristine (according to them) in order for it to work? “Just have perfect-sized PRs.” “Just have perfect-sized issues.”

> Side note, I like anything that helps the workflow of "touch clean code only, touch only the code you need to, clean the code before you touch it". (Clean defined there as "not a pile of illogical, obtusely written crap - riddled with latent bugs). In other words, make the code simple and obvious first, then modify the simple and obvious code. I give little credit to those that spend days studying code to make a one-line "masterful" change. That does not scale, everyone has to do that, and there are thousands of lines. All things in their own context though...

Okay.

> (4) utility and communication for commits. IMO ideally a PR is initiated with a single commit that is written "for the history". This commit description is both descriptive for the reviewers, the git history, and is the complete text for the PR. This creates the most value IMO.

Okay. You have still gotten to the explanation of why always-squash-merge is necessary for PRs. (We have covered the previous stuff. `--first-parent`)

We could talk about the ideal world. In an ideal world we could choose better tools that are good for this very granular PR structure. But in the real world we often end up with GitHub or some lookalike. Which is not good for granular PRs, dependent PRs, “stacked PRs” or whatever the trendy name is for the people who fetishize PRs over everything else.

In the real world we still have git(1) to be flexible around the limitations of these team-imposed tools.

And even with better tools I like being able to use several commits for one PR. Because sometimes you end up needing five preparation commits in order to solve the issue you were setting out to do. And that is often a hell of a lot more “agile” then going back to the issue tracker and making five issues in preparation for that final commit. Then making six pull requests that depend on each other.

> I view PRs as ephemeral, any extra text written for the sake of PR is overhead. Then, any following commits are pushed to communicate to the reviewers. The audience is no longer a future maintainer, nor someone looking at a PR to get a handle of the in flight changes, but is someone that has looked at the diff.

Okay.

I think that a PR can evolve to encompass more “atomic” changes then what was initially planned. That has happened to me. Me and the reviewer find out that it makes sense to make some more changes. Not “fix typos” on that same PR but changes that deserve to be commits in their own right.

There is also room for “fix typos” commits that you can then rebase away once you are done. With Git you have that flexibility.

> Thus, commit comments like, "address TOCTOU concern", "fix typo", "add null check", "add test case", "address feedback" all then make sense. On squash-merge, those comments are discarded and the merge is just the "commit written for the history". Otherwise, without squash - those commits have a much broader audience and become long lived.

Do we really have to tediously go through the same thing every time the squash-always camp tries to argue for their one-true-policy? Look at what OP said:

> > Sometimes you do want to preserve the change history, particularly if the PR does more than a single atomic change, which in practice is very common.

Which is what I just went through.

Yes. Rebase that typo fix which you introduced. That’s noise.

Probably don’t rebase that commit which made a refactor which should have been done three years ago.

> I believe this typically creates a much more useful history, eliminates a lot of noise, clarifies the audience of each commit message, and finally removes any overhead where a detailed description is written in a PR but then never captured in the history (wasteful &puts context in a location that is ephemeral and a step removed)

What me and OP has argued for allows you to do all that. And in addition to that it is also much more flexible.

hamandcheese

9 months ago

git bisect is still perfectly useful. It will locate the squashed commit/pr which is more than enough for me to zero in on whatever the bug is.

imiric

9 months ago

How helpful is it if it points you to a commit with 10 different changes? Will you go back to the PR to see the context of each change, and try to guess where the issue is based on the behavior? You're right back at manual debugging at that point.

This workflow really shines when it points you to the actual atomic commit that introduced the issue. Then you're simply a `git revert` away from undoing it, without risking breaking anything else, barring functional conflicts, of course. Try doing that with a squashed commit.

burntsushi

9 months ago

I'm with you (see my other top level comment), but

> Then you're simply a `git revert` away from undoing it, without risking breaking anything else

This needs careful qualification. On GitHub at least, it is difficult to ensure every commit passes CI. This can result in skipping during bisect for a busted commit. It doesn't happen often enough in my experience too convince me to give up a cleaner history, but it's a downside we should acknowledge.

Ideally GitHub would make testing each individual commit easier.

keybored

9 months ago

A revert two weeks after the fact will create a new and unique tree (untested) in any case. I don’t if you’re saying that the original commit or the revert might be untested.

In either case the brand new revert could break something. Who knows, it’s a new state.

> It doesn't happen often enough in my experience too convince me to give up a cleaner history, but it's a downside we should acknowledge.

There are tools for that.

https://github.com/mhagger/git-test

burntsushi

9 months ago

All I'm trying to do is qualify things so that the trade offs can be more honestly assessed. The bisect for finding that commit might not work as well as you hope if you need to skip over commits that don't build or whose tests fail for other reasons. Those things can happen when you aren't testing each individual commit.

I understand there are tools for testing each individual commit. You'll notice that I didn't say it's impossible to ensure every commit is tested. Needing to use random tools to do it is exactly what makes it difficult. And the tool you linked says literally nothing about using it in CI. How good is its CI integration? There's more to it than just running it. On top of all of that, there are more fundamental problems, like increasing CI times/costs dramatically for PRs split into multiple fine grained commits.

Again, anyone here can go look at my projects on GitHub. I try hard to follow atomic commits. I think it's worth doing, even with the downsides. But we shouldn't try to make things look rosier than they actually are.

recursive

9 months ago

I've not seen code where this revert thing would work. If it's in the PR something else now depends on it.

recursive

9 months ago

There can be another reason. I do care about the benefits, but I don't think there really are any.

notlinus

9 months ago

I wish people would stop saying "atomic commits" and start saying "main/master is stable", because that's what they actually mean. Every git commit is atomic, by definition... But people want every single possible revision to be green and buildable, which is different, and has nothing to do with git. I don't think it makes sense (tags are a lot more helpful for marking what is and isn't stable), but hey.

usr1106

9 months ago

Yes, atomic commits is not a very descriptive term. But main/master is stable is only a necessary condition, not a sufficient one.

If you squash (or mix from the beginning) several unrelated changes into a single commit, main/master would be stable.

AFAIK atomic commits means nothing can be taken away without breaking the change and nothing needs to be added to make the change work. How to express that without 2 clauses is indeed a good question.

assiniboine

9 months ago

Atomic commits ensure each change works independently.

usr1106

9 months ago

That cannot be achieved in general. Later commits often depend on some earlier one.

keybored

9 months ago

People say what they mean.

“Atomic” sometimes means conceptually: does one thing. Sometimes it means “builds”. Or “works”. And yes: they do mean every single commit builds. Not just on main.

You don’t have to put words in people’s mouth.

rom1v

9 months ago

> The whole branch will be squashed anyway before it's merged in

That looks like a very wrong process to me. Why would you even want to do that?

travisjungroth

9 months ago

That whole paragraph is the conditions under which this tool would be useful. If they’re not true for you, this isn’t for you. This is like quoting “You have a feature branch” and saying “No I don’t.”

krick

9 months ago

Good for you, but you (and, apparently, everyone in your last 10 years of working with) would have a problem if I was the one reviewing your commits. I mean, to be fair, I often did let it slide (for political/social/practical reasons) and use autosquash, but I always actively discouraged it, so if you are a junior or a new hire with uncertain usefulness status, I'd at least talk to you about that and ask you to fix it. This is not an ok practice, it's only acceptable because too many people don't have a habit of making their own bed and much bigger kinds of technical debt get ignored because in the end it's about shipping stuff, so you just sigh and say "oh, whatever".

hamandcheese

9 months ago

It has always been enough for me to trace a change back to its original pull request. Extra granularity might be nice sometimes but never essential. And I do a fair bit of digging through git history. Not sure what makes it "not an ok practice". Squashing works fine and lets the 80% of people who don't care spend their efforts thinking about something other than a perfect git history.

keybored

9 months ago

I personally agree that what you advocate is better. But apparently people get by without it. It’s better. But not essential. No bridges will collapse because of squash merges.

I also disagree with the framing: some policy can always be enforced if you have some kind of authority over the other person. Now we’re veering away from arguing on the technical merits.

notlinus

9 months ago

For what it's worth, you're right.

dangsux

9 months ago

[dead]

rglynn

9 months ago

As a frequent user of fixups, this feels like a solution for already broken workflows.

> Instead of manually finding commit SHAs for git commit --fixup

Assuming you are using fixups, is this actually a problem?

I could see this being a possibility if you are: A. not practicing atomic commits or B. have so many commits in your branch that this is a chore.

A. seems unlikely if you are already using fixups and B. seems like a problem worth solving properly rather than going around.

To sum up, I'm not convinced by the elevator pitch. However, I am keenly aware that the workflows of developers differ vastly across industry, company size, technology etc. I'd be interested to understand what problems this or similar tools solve?

seadan83

9 months ago

What solutions have you seen for problem (B)?

The open source example is hard to fix AFAIK. Everything needs to be a PR, some changes to older code bases are simply going to be either large or multi-stepped (many commits, and sending them all as stacked PRs is often not efficient enough to be effective). In industry, I think there are more solutions available. Though, overall, I am very curious how you would go about solving B.

keybored

9 months ago

> Assuming you are using fixups, is this actually a problem?

No. They made a whole tool around a not-actually problem.

Sure you can have a 30-commit branch. Find some OSS project where someone has been doing a large project on a branch for two months because the maintainer hasn’t accepted it yet.

adastra22

9 months ago

How would you manage long-lived branches that are periodically rebased against the master branch (when there is a release, for example)?

cryptonector

9 months ago

Tools like this are useful for users who are not power users.

DanielHB

9 months ago

Am I the only one who doesn't like atomic commits (or stacked PRs like graphite)? When I work on large PRs I often rewrite and move things around so much that trying to keep all commits in sync is a nightmare.

I do try to split the work if it is very clearly isolated, but that usually means less than 3 PRs. I have tried graphite `gt absorb` (which might use this project?) and it still creates a mess.

What I do that I wish more people did is that I heavily comment my own PRs with information that doesn't make sense in comments (for example on line X I add a comment: moved here from Y file).

> You have fixes for the bugs, but you don't want to shove them all into an opaque commit that says fixes

I actually like this, but split each fix in its own commit and during review I answer to comments with: "fixed in commit {commit-sha}". So _often_ bugs are introduced during PR review, if the fixes are isolated it is easier to see what changes between review rounds.

alex23478

9 months ago

I think this really boils down how your team is using Git and which code review tool you're using. (I've never used Gerrit personally, but as far as I understand it, we wouldn't have this conversation, since it aims to refine a single change by re-submitting a commit over and over again?)

For GitHub/GitLab reviews, I'm totally with you - this makes it more convenient for the reviewer to check that/how you've responded to feedback.

But now if you merge this without squashing, your Git history on the main branch contains all your revisions, which makes operations like bisecting or blame more complicated.

For me personally, the sweet spot is currently a mix of stacked-commits and the PR workflow: Use a single commit as the unit of review, and polish that commit using a PR, and use the commit descriptions within that PR to tell the story of you responding to feedback. Then, squash merge that commit.

This provides both a good review experience and a tidy history. If you find a strange bug later and can't grasp how someone could have come up with that, the PR still has all the history.

Together with tools such as git-branchless, this also makes working on stacked-PRs a breeze.

DanielHB

9 months ago

I have used standard github and graphite reviews. I tend to prefer what I mentioned in my original post than graphite stacked PR review (which are essentially atomic commits)

Yes I also advocate for squash-before-merge, so a lot of little commits is doesn't show up in the main history.

> For me personally, the sweet spot is currently a mix of stacked-commits and the PR workflow: Use a single commit as the unit of review, and polish that commit using a PR, and use the commit descriptions within that PR to tell the story of you responding to feedback. Then, squash merge that commit.

To me time spent on commit polishing (and dealing with conflicts) is time not spent on product. Author comments on PR review, squash-before-merge, and sit-together with reviewer for big PRs to me seems a better compromise. I don't think super polished git history is worth the extra effort for most types of product, as long as I can track a change down to a PR discussion that is enough to me. From there I can track PR review commit changes individually if needed.

Like it is so uncommon for me to go digging on git history that deeply, usually all I care is "code behaving weird && line changed < 1 month ago then probably a bug introduced then"

Of course if you are working on aviation software and the like maybe the priorities are different. But I have spent way too much time dealing with rebase conflicts when trying to chop up my PRs into smaller commits. Dealing with these conflicts often introduces bugs too.

usr1106

9 months ago

Why are people talking about stacked PRs/MRs? Shouldn't they be called queued? A stack is LIFO and a queue is FIFO.

(Of course in some special case you might want to merge a later one earlier, but I don't think that's the normal case people are talking about.)

DanielHB

9 months ago

Why is it a "Pull Request" instead of a "Push Request"?

Someone named it that way and it stuck.

usr1106

9 months ago

You request others/the maintainer to pull. That was the only way before the forges. I guess gitlab's merge request is more descriptive.