Modifying Your Git History? Here Are a Few Things to Think About

Git has some popular features that make it easy to rewrite the commit history, and in some cases, this is a benefit. However, these features can be unnecessarily confusing, and if used incorrectly, they can cause data loss.

Avoiding lost code is one of the major benefits of version control, so you should use features that might cause data loss only after careful consideration.

Why Rewrite the History?

One of the most-cited reasons for using Git commands that alter history is that they can help maintain a clean Git log. If you’re working on a feature branch, you might end up with a bunch of commits with messages like “checkpoint” or “wip – breaking.” These can clutter your Git history and make it harder to understand if you read through it later to learn how the branch came together.

Another thing that can clutter your history is the merge commit. Each merge generates a commit, and depending on how fast your project is moving, you might be merging often and generating a lot of these commits.

git commit --amend

Coming up with a descriptive commit message for every single checkpoint commit is overkill. However, git commit --amend is a relatively safe command that helps you keep control of your checkpoint commits.

This command takes your staged changes, adds them to the previous commit, and lets you edit your commit message. If you’re inclined to make checkpoint commits, you could instead just amend your previous checkpoint commit until you’ve done enough to warrant a descriptive commit message.
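Here is a minimal sketch of that workflow, run in a throwaway repository so nothing real is touched. The file name, the commit messages, and the placeholder identity are all made up for illustration. Passing -m replaces the message without opening an editor; leave it off if you’d rather edit the message interactively.

```shell
set -eu

# Throwaway repository (hypothetical names throughout)
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Author"       # placeholder identity
git config user.email "author@example.com"

# A first checkpoint commit
echo "first draft" > notes.txt
git add notes.txt
git commit -q -m "checkpoint"

# More work on the same task: stage it, then fold it into the previous commit
echo "second draft" >> notes.txt
git add notes.txt
git commit -q --amend -m "Add notes with two drafts"

# The branch still has a single commit, now with the descriptive message
commit_count=$(git rev-list --count HEAD)
last_message=$(git log -1 --format=%s)
echo "$commit_count commit(s): $last_message"
```

Only staged changes are folded in, so anything you haven’t added with git add stays in your working directory.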

The trade-off is that when you amend your commits, you’re losing the history of your development on a particular piece of work. If you tried a few things that didn’t end up working out, all of that information disappears from the branch’s history. You might remember one of these dead ends a few weeks later and find yourself wishing you could dig through the Git history to try and find it, so make sure you’re okay with losing that information before you start amending commits.
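One caveat worth knowing: an amended commit vanishes from the branch, but your local clone usually remembers it for a while in the reflog. The sketch below, which uses a hypothetical solver.txt and made-up messages, shows the replaced commit still being reachable as HEAD@{1} right after an amend. Treat this as a short-lived, local-only safety net rather than real history: reflog entries expire, and they are never shared with anyone else.

```shell
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Author"       # placeholder identity
git config user.email "author@example.com"

# Checkpoint commit containing a dead-end attempt
echo "approach A" > solver.txt
git add solver.txt
git commit -q -m "checkpoint: try approach A"
before_amend=$(git rev-parse HEAD)

# Replace it with a different approach by amending
echo "approach B" > solver.txt
git add solver.txt
git commit -q --amend -m "Implement solver with approach B"

# The branch shows only the amended commit...
git log --oneline
# ...but the local reflog still lists the commit it replaced
git reflog

# HEAD@{1} is where HEAD pointed before the amend
recovered=$(git rev-parse 'HEAD@{1}')
old_content=$(git show "$recovered:solver.txt")
echo "recovered content: $old_content"
```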

Note: If you’ve already pushed a commit that you’ve amended, you’re going to have to push the new commit with the -f flag (force). This is true for all commands that change the Git history. Force pushing is dangerous because once you’ve done it, the single source of truth (the remote repository) loses all knowledge of the history you’ve rewritten or overwritten.
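To make the note above concrete, here is a sketch that uses a local bare repository as a stand-in for the remote (the paths, file name, and branch name are assumptions for the demo). It shows a plain push being rejected after an amend. For the forced push I’ve swapped in --force-with-lease instead of -f: it’s a variant that refuses to overwrite the remote branch if it has moved since you last saw it. It still rewrites the remote’s history, so the warning applies just the same.

```shell
set -eu

work=$(mktemp -d)
git init -q --bare "$work/remote.git"       # stand-in for the shared remote
git init -q "$work/local"
cd "$work/local"
git config user.name "Example Author"       # placeholder identity
git config user.email "author@example.com"
git remote add origin "$work/remote.git"

# Publish a checkpoint commit
echo "v1" > file.txt
git add file.txt
git commit -q -m "checkpoint"
git branch -M main
git push -q origin main

# Amend the commit that has already been pushed
echo "v2" > file.txt
git add file.txt
git commit -q --amend -m "Add file"

# A plain push is rejected because local and remote history have diverged
if git push -q origin main 2>/dev/null; then
  plain_push="accepted"
else
  plain_push="rejected"
fi
echo "plain push: $plain_push"

# Force the push, but only if the remote branch is still where we last saw it
git push -q --force-with-lease origin main

remote_head=$(git --git-dir="$work/remote.git" rev-parse main)
local_head=$(git rev-parse HEAD)
```

After the forced push, the remote points at the amended commit, and the original checkpoint is no longer reachable from its main branch.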

Git Rebase

Rebasing is a technique used to keep a repository clear of merge commits. When you rebase a branch against another branch, Git takes all of the commits on your branch and “replays” them on top of the most recent commit on the base branch.

It applies one commit at a time and prompts you to fix any conflicts that arise before moving to the next commit. This can be problematic for a number of reasons.
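As a rough illustration of the replay, the sketch below builds a tiny repository where a feature branch and main have diverged, then rebases the feature branch. Branch names, file names, and messages are invented for the demo, and the commits touch different files so no conflicts come up.

```shell
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Author"       # placeholder identity
git config user.email "author@example.com"

echo "base" > base.txt
git add base.txt
git commit -q -m "Initial commit"
git branch -M main

# Two commits on a feature branch
git checkout -q -b feature
echo "one" > feature.txt
git add feature.txt
git commit -q -m "Feature: step one"
echo "two" >> feature.txt
git add feature.txt
git commit -q -m "Feature: step two"

# Meanwhile, main moves ahead
git checkout -q main
echo "hotfix" > hotfix.txt
git add hotfix.txt
git commit -q -m "Hotfix on main"

# Replay the feature commits on top of the latest main
git checkout -q feature
git rebase -q main

git log --oneline --graph                    # a straight line, no merge commit
merge_commits=$(git rev-list --merges --count HEAD)
replay_base=$(git rev-parse 'HEAD~2')        # commit the two feature commits now sit on
main_tip=$(git rev-parse main)
```

The two feature commits get new hashes because they now have a different parent, which is exactly why a rebased branch that was already pushed needs a forced push.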

Too many conflicts

Say you’ve been working on a branch with 30 commits. Ten commits in, you discovered a much simpler way to accomplish what you wanted to do, so you change your approach and come up with something nicer.

Once you finish, you open your pull request. You start rebasing, and to your horror, Git is asking you to fix conflicts for each of those first 10 commits that didn’t even make the final cut.

This situation often drives people to another history-altering technique, the squash. You could squash these 30 commits down to one, and then you only have to resolve conflicts with the final version of your code, but then you lose the history of those 30 commits. Any knowledge contained in those dead-end attempts is lost.
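For reference, here is one non-interactive way to squash, sketched with three commits instead of 30. It uses git reset --soft back to the point where the branch left main, followed by a single new commit. Many people use an interactive rebase for this instead; I’ve picked the reset approach only because it runs without an editor. All names and messages are hypothetical.

```shell
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Author"       # placeholder identity
git config user.email "author@example.com"

echo "base" > base.txt
git add base.txt
git commit -q -m "Initial commit"
git branch -M main

# A feature branch with a dead end followed by the final approach
git checkout -q -b feature
echo "attempt 1" > feature.txt
git add feature.txt
git commit -q -m "wip"
echo "attempt 2" > feature.txt
git add feature.txt
git commit -q -m "checkpoint"
echo "final version" > feature.txt
git add feature.txt
git commit -q -m "wip - works now"

# Move the branch pointer back to where it left main, keeping the final
# state of the files staged, then record that state as one commit
git reset -q --soft "$(git merge-base main HEAD)"
git commit -q -m "Add feature"

feature_commits=$(git rev-list --count main..HEAD)
final_content=$(cat feature.txt)
echo "$feature_commits commit(s) on feature: $final_content"
```

The final code is intact, but the two earlier attempts are gone from the branch, which is the loss described above.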

Manually fixing merge conflicts is another way people lose code. Say you made a change that resulted in a big block of code getting indented, and this flags as a conflict. You might think it’s solely because of the indentation, but maybe while you were off on your own branch doing your own thing, someone changed something really subtle in that block of code.

A visual inspection won’t reveal it unless you read very carefully. Unless you’re using a diff tool that will give you a character-level diff, you might not see it there, either. You blindly accept the indented version of the code, and that subtle change is lost.
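One cheap check before accepting a re-indented block is a diff that ignores whitespace. The sketch below invents a small pricing.txt in which a block gets indented and, at the same time, a single character changes. A normal diff flags every line, while git diff -w leaves only the line with the real change. The file contents are made up purely for the demo.

```shell
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Author"       # placeholder identity
git config user.email "author@example.com"

# Original block
cat > pricing.txt <<'EOF'
total = price * quantity
if total > 100:
    apply_discount(total)
EOF
git add pricing.txt
git commit -q -m "Add pricing block"

# Same block, indented, with one subtle change (> became >=)
cat > pricing.txt <<'EOF'
    total = price * quantity
    if total >= 100:
        apply_discount(total)
EOF
git add pricing.txt
git commit -q -m "Indent pricing block"

# Count added lines, skipping the "+++" file header
all_changes=$(git diff HEAD~1 HEAD | grep -c '^+[^+]')
real_changes=$(git diff -w HEAD~1 HEAD | grep -c '^+[^+]')
echo "normal diff: $all_changes changed lines; ignoring whitespace: $real_changes"

# Show the one line that differs by more than indentation
git diff -w HEAD~1 HEAD
```

If you want finer detail than whole lines, git diff --word-diff highlights the changed words within a line.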

Guidelines

I suggest that you use these commands only on branches where you’re working alone. If you’re working with other people and someone pulls down your changes after you’ve pushed a checkpoint commit, and you then continue to amend that commit, the person you’re working with is going to run into issues pulling and pushing.

Getting an error when your push is rejected and then having to merge and potentially fix conflicts is a jarring experience that interrupts the flow of development. Be considerate of the other people on your team.

It’s best to avoid changing published history, so don’t push too often. Once you’ve pushed a bunch of commits to the remote repository, leave them as they are.

If you accidentally push up a checkpoint commit that you meant to amend, leave it alone. To keep this from happening in the first place, don’t push after every commit. Changing published history requires you to push with the -f flag, and that’s not something that you want to be part of your muscle memory. Using the -f flag should scare you, and it should never become frequent or automatic. There’s no way to avoid it when you’re rebasing a branch that’s already been pushed, but you can avoid amending or squashing commits that you’ve already pushed.

I hope these tips will come in handy when you’re thinking about modifying your Git history. If you have other guidelines, I’d love to hear about them.