Say someone finds a bug in your software. They're running release 4.1.2. They've just upgraded from 2.6.4, a three-year-old release where what they're doing used to work.
You put together a test case and sure enough, it fails on 4.1.2 but passes on 2.6.4. But when did it get broken, and how? The software is complex and you can't see an obvious problem. And looking at the history, there are 4000 commits between those two tagged releases. It would take forever to test them all.
You can do something smarter, though. You pick a point somewhere in the middle. Maybe it's around version 3.2.0. You test that version, and it passes. So - assuming the bug only got introduced once - the problematic commit lies somewhere between 3.2.0 and 4.1.2. You pick a version between those, and repeat the process. Each test halves the range that's left, so rather than testing 4000 different versions, you only have to test about log2(4000) ≈ 12 versions of the code.
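To sanity-check that figure, here's a rough sketch in plain sh that counts how many halvings it takes to get from 4000 candidates down to one. The function name bisect_steps is made up for this example, and it assumes the worst case where the larger half survives each round.

```shell
# Hypothetical helper: count the halvings needed to narrow n candidate
# commits down to a single one, keeping the larger half each time.
bisect_steps() {
  n=$1
  steps=0
  while [ "$n" -gt 1 ]; do
    n=$(( (n + 1) / 2 ))   # round up: assume the bigger half remains
    steps=$((steps + 1))
  done
  echo "$steps"
}

bisect_steps 4000   # prints 12
```

That matches the estimate: 2^11 = 2048 is too few to cover 4000 commits, 2^12 = 4096 is enough.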
The git bisect command is designed to help you with this. If you run the following commands:
$ git bisect start
$ git checkout v3.2.0
$ <run test, which passes>
$ git bisect good
$ git checkout v4.1.2
$ <run test, which fails>
$ git bisect bad
...then git bisect will take over, choose the commit halfway between those, check it out for you and prompt you to run your tests and report the result with 'git bisect good' or 'git bisect bad'. Depending on the result, it will choose the next commit to test, and repeat the process until it can tell you exactly which commit introduced the bug.
But maybe your test takes a while to run. Even with fewer intermediate versions to test, you're going to spend all day running tests, waiting for the results and telling git bisect what to do next.
So you automate your test into a little script. If the test passes, it exits with status zero (success). If it fails, it exits with a non-zero status; git bisect run reads any code from 1 to 127, other than 125, as "bad".
And then you run:
$ git bisect run ./my-test-script
and go do something else with your day. When you come back, it will have automatically found the commit that introduced the error. Magic!
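If you want to watch this happen without touching a real project, here's a self-contained sketch that builds a throwaway repository and bisects it. Everything in it is invented for the demo: the tags c1 to c16, the marker file called broken, and the choice of commit 11 as the culprit. It assumes git is installed and that mktemp -d is available.

```shell
#!/bin/sh
# Demo only: a scratch repo with 16 commits and a "bug" planted in commit 11.
demo=$(mktemp -d)
cd "$demo"
git init -q

i=1
while [ "$i" -le 16 ]; do
  echo "$i" > counter.txt
  # Commit 11 introduces the "bug": a marker file the test looks for.
  if [ "$i" -eq 11 ]; then echo oops > broken; fi
  git add -A
  git -c user.name=demo -c user.email=demo@example.com \
      commit -q -m "commit $i"
  git tag "c$i"            # lightweight tag so commits are easy to name
  i=$((i + 1))
done

# Shorthand for start + bad + good: the bad revision comes first.
git bisect start c16 c1 >/dev/null

# The test exits 0 (good) while `broken` is absent, 1 (bad) once it exists.
git bisect run sh -c 'test ! -e broken' >/dev/null

# When the run finishes, refs/bisect/bad names the first bad commit.
first_bad=$(git rev-parse refs/bisect/bad)
found=$(git describe --tags --exact-match "$first_bad")

# Put the working tree back where it was before bisecting.
git bisect reset >/dev/null 2>&1

echo "first bad commit: $found"   # prints: first bad commit: c11
```

On a real project you'd drop the setup loop, name your own good and bad releases, and pass your own test script to git bisect run. Don't forget the git bisect reset at the end, otherwise you stay parked on whatever commit bisect last checked out.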
But there's a cost. For this to work, you need discipline from day one. You can't have commits in the history that say "WIP, changed some stuff, not finished yet" and others that fix things up later in the branch. You need to make sure each commit is a fully self-contained change that leaves the code in a working, testable state.
If you have a small number of individual commits at which the code won't build cleanly, or which break testing for whatever reason, you can work around them with 'git bisect skip' - and you can implement this in your test script by exiting with the special code 125. But if this happens too much, the whole approach becomes unmanageable.
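As a rough sketch of how a test script might map outcomes onto those exit codes: the function below takes a build command and a test command and returns what git bisect run should see. The name judge_commit is hypothetical, as are whatever commands you pass in; only the meaning of 0, 125 and the 1 to 127 range comes from git itself.

```shell
#!/bin/sh
# judge_commit BUILD_CMD TEST_CMD   (hypothetical helper)
# Returns the status `git bisect run` should see for the checked-out commit.
judge_commit() {
  # Build failure: 125 asks git bisect to skip this commit.
  sh -c "$1" >/dev/null 2>&1 || return 125
  # Test passes: 0 marks the commit good.
  sh -c "$2" >/dev/null 2>&1 && return 0
  # Test fails: 1 marks it bad, whatever the test command itself returned
  # (a raw status above 127 would abort the whole bisect run).
  return 1
}
```

A script for git bisect run could then end with something like judge_commit "make" "make check" followed by exit $?, with both commands swapped for whatever your project actually uses. Flattening every test failure to 1 is a deliberate choice here, so a crashing test can't accidentally return 125 or a code that stops the bisect.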
Sort of. The only issue with merges in GitHub tends to be that the merge or squashed commit itself isn’t tested before being committed. It’s always possible, though in practice this should be quite rare, that the merge or squash commit produces a different result than any of the individual commits did. I suppose you could do something like a pre-push hook that would run tests before sharing your code with everyone else, though I haven’t looked recently at what options are available with GitHub.
Edit: I stand somewhat corrected, seems most CI systems actually test the merged code - https://github.com/actions/checkout/issues/15#issuecomment-5... which I presume includes manually merged scenarios also. That said, they don’t appear to test squashed commits, under the assumption I suppose that any series of sequential changes will always cleanly squash with upstream as long as there are no merge conflicts.
When you use squash-and-merge approaches to GitHub PRs, you have to make sure you automatically delete old branches, as the connection between branch and commit only lives within the GitHub PR at that point (and maybe in a commit comment somewhere, not sure). But it’s not as explicit as merges which include the name of the branch in the merge commit.
> seems most CI systems actually test the merged code
The thrust of your point is right, though. CI tests the merged squash-commit after it's already on master, so you can still get into states where you have a failing build on master.
That risk is mitigated somewhat if you block the merging of out-of-date PR branches, but this isn't supported on all platforms.