10. How to Remove Commits by Their Commit Message

Published Apr 23, 2023


Recently we had to remove a large number of commits from our repository. We have implemented a job queue as a GitHub Action. The queue uses empty Git commits to store the job information, so you do not need any external service. We call it Git-Queue.

Since we were still using a beta version, we decided to make some changes that broke compatibility with the old version. But we already had two demo repositories using the old version. In order to upgrade them to the new version, we had to remove thousands of commits related to the queue.

Here you can read the original issue.

All the queue commits have a prefix, so the Git history looks like:

* 0830b3c2 - 📝✅: library-update: job.id.1 job.ref.264657b51fbfcf63a0267fa425fd121f5f6781a0 (2022-06-13 10:43:37 +0000) <NautilusCyberneering[bot]>
* 155125d8 - library aaa synced to commit 13db6c9f7c2f9ca7d8ae57b80cbbcc97a51a28a8 (2022-06-13 10:43:36 +0000) <A committer>
* af02b696 - update library aaa to commit 13db6c9f7c2f9ca7d8ae57b80cbbcc97a51a28a8 (2022-06-13 10:43:33 +0000) <A committer>
* 2b110030 - 📝👔: library-update: job.id.1 job.ref.264657b51fbfcf63a0267fa425fd121f5f6781a0 (2022-06-13 10:43:33 +0000) <NautilusCyberneering[bot]>
* 264657b5 - 📝🈺: library-update: job.id.1 (2022-06-13 11:43:00 +0100) <NautilusCyberneering[bot]>

We wanted to remove all commits with the prefix 📝, and also those with a different prefix we had been using before that.

After some initial research we found a Stack Overflow question:

Using git filter-branch to remove commits by their commit message

There are a couple of solutions, but both of them require writing a little bit of shell script.

One of them uses git filter-branch, but it seems that solution squashes the commits. That should not be a problem in our case, because all the commits we wanted to delete are supposed to be empty and we did not want to delete any actual changes.

The other solution uses git rebase and a custom shell script to filter commits and mark them to “drop”.
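The rebase approach comes down to rewriting the rebase todo list so that the unwanted commits are marked `drop` instead of `pick`. The following is a minimal sketch of that filtering step only, not the script from the Stack Overflow answer. The helper name `mark_drops` and the sample todo lines are hypothetical, and the prefix is interpolated into the pattern as a regular expression.

```shell
#!/bin/sh
# Hypothetical helper: read a rebase todo list from stdin and turn every
# "pick" line whose subject starts with the given prefix into a "drop" line.
# The prefix is treated as an extended regular expression.
mark_drops() {
    prefix="$1"
    sed -E "s/^pick ([0-9a-f]+) (${prefix})/drop \1 \2/"
}

# Sample todo list, shaped like the one `git rebase -i` opens in the editor.
printf '%s\n' \
    'pick 7533b82 add README' \
    'pick 9f606d6 drop: empty commit' | mark_drops 'drop:'
```

With Git, a filter like this could be plugged in through the `GIT_SEQUENCE_EDITOR` environment variable. That program receives the path of the todo file as its argument, so an in-place variant of the filter would be needed.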

Although those solutions were perfectly fine, we decided to use reposurgeon, which is a more powerful tool.

Install reposurgeon

If you are using Ubuntu, it should be as simple as:

sudo apt-get install reposurgeon

Reposurgeon has two modes: interactive and non-interactive. If you execute reposurgeon without any arguments, it starts in interactive mode, where the help command shows something like:

$ reposurgeon
reposurgeon% help
6. The Command Interpreter
  1. Command syntax                        syntax*
  2. Finding your way around               help, history, shell, quit
  3. Regular Expressions                   regexp*
  4. Selection syntax                      selection*, functions*
  5. Redirection and shell-like features   redirection*
7. Import and Export
  1. Reading and writing repositories      read, write
  2. Repository type preference            prefer, sourcetype
  3. Rebuilds in place                     rebuild
  5. File preservation                     preserve, unpreserve
  6. Incorporating release tarballs        incorporate
  7. The repository list                   choose, drop, rename
8. Information and reports
  1. Reports on the DAG                    list, index, names, stamp, tags, inspect, graph, lint, when
  2. Statistics                            stats, count, sizes
  3. Examining tree states                 manifest, checkout, diff
9. Surgical Operations
  1. Commit deletion                       squash, delete
  2. Commit mutation                       merge, unmerge, reparent, split, add, remove, tagify, reorder
  3. Branches                              branch, branchlift, debranch
  4. Tags, resets, and blobs               tag, reset, blob, dedup
  5. Repository splitting and merging      divide, expunge, unite, graft
  6. Metadata editing                      msgout, msgin, setfield, attribution, append, gitify, filter
  7. Path reports and modifications        path, setperm
  8. Timequakes and time offsets           timequake, timeoffset
  9. Miscellanea                           renumber, transcode
10. Artifact handling
  1. Attributions                          authors
  2. Ignore patterns                       ignores
  3. Reference lifting                     references, legacy
  4. Changelogs                            changelogs
  5. Clique coalescence                    coalesce
11. Control Options                        options*, set, clear
12. Scripting and debugging support
  1. Variables, macros, and scripts        assign, unassign, define, do, undefine, script, print
  2. Housekeeping                          gc
  3. Diagnostics                           log, logfile
  4. Debugging                             resolve, version, hash, sizeof, strip
  5. Profiling                             elapsed, timing, readlimit, memory, profile, exit
Starred topics are not commands.

You can also create your own script and execute it later. The simplest thing you can do is:

reposurgeon "read ." lint

That command will execute two reposurgeon commands:

1. `read .`: it reads the current folder into memory. The current folder must be a repository.
2. `lint`: it checks the repository for errors.

You can also enter the interactive mode and execute both commands manually.

Reposurgeon is a very big tool with a lot of options. You should read the basic documentation to understand how it works. Basically, it imports any kind of repo and creates an internal representation of it.

Then you can execute some commands to change that representation, and finally you can export the internal representation into a different repo. One of the common tasks it is used for is converting between repository formats, for example from [SVN](https://subversion.apache.org/) to [Git](https://git-scm.com/).

We are going to explain only one use case here: how to remove commits that start with a given prefix.

## Removing commits that start with a given prefix

Let's first create an empty repo:

mkdir /tmp/remove-commits-example
cd /tmp/remove-commits-example
git init

Now we can add some commits. In order to simplify the example the commits we want to delete start with the prefix drop.

echo "hello world!" > README.md
git add .
git commit -m "add README"
git commit --allow-empty -m "drop: empty commit"

After executing those commands we will have two commits:

* 9f606d6 - (HEAD -> main) drop: empty commit (2022-06-13 15:08:01 +0100) <Jose Celano>
* 7533b82 - add README (2022-06-13 15:06:27 +0100) <Jose Celano>
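Before deleting anything, it can help to check how many commits a pattern would hit. The sketch below rebuilds the same two-commit history in a scratch directory and counts the commits whose message has a line starting with drop, using `git rev-list --count --grep`. The temporary directory and the committer identity are assumptions made only to keep the example self-contained.

```shell
#!/bin/sh
set -e
# Scratch repository in a temporary directory (assumption for this sketch).
repo=$(mktemp -d)
cd "$repo"
git init -q
# Placeholder identity so the commits can be created anywhere.
git config user.name "Example"
git config user.email "example@example.com"

echo "hello world!" > README.md
git add .
git commit -q -m "add README"
git commit -q --allow-empty -m "drop: empty commit"

# Count the commits whose message has a line starting with "drop".
matches=$(git rev-list --count --grep='^drop' HEAD)
echo "$matches"
```

Running the same `git rev-list` command again on the rewritten repository is a quick way to confirm that no matching commit is left.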

Now we want to remove the commit starting with drop.

We can create a new file called remove-commits.rs with this content:

# Load the project into main memory
read /tmp/remove-commits-example

# Commit deletion
/drop/c delete

# We want to write a Git repository
prefer git

# Do it
rebuild /tmp/new-remove-commits-example

Then you can run the script with:

reposurgeon "script remove-commits.rs"

All the lines are self-explanatory except maybe for the commit deletion one: /drop/c delete.

The deletion command format is: {SELECTION} delete, where SELECTION defines what you want to delete. That pattern is very common for reposurgeon commands. The selection argument allows you to define which internal objects you want to act on. There are different types of selections. One of them is a “text search”, which is a regular expression.

A text search normally matches against the comment fields of commits and annotated tags, or against their author/committer names, or against the names of tags; also the text of passthrough objects.

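In our case the selection /drop/ means that we want to search for all objects containing the text drop. Note that the expression is not anchored, so it matches drop anywhere in the searched text, not only at the start.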

Since we only want to delete commits, we can add what reposurgeon calls a “qualifier letter”. The final command contains a c character after the regular expression: /drop/c. That changes the scope of the search to only the comment text of commits and tags.
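A pattern such as drop is a regular expression, so whether it is anchored decides which messages it selects. The sketch below illustrates the difference with plain `grep` on three made-up commit subjects. It does not run reposurgeon, and whether an anchored expression such as `^drop` behaves the same way inside a reposurgeon selection is an assumption that should be checked against its documentation.

```shell
#!/bin/sh
# Three made-up commit subjects; only the first one starts with "drop".
subjects='drop: empty commit
add README
fix drop-down menu'

# Unanchored: counts lines containing "drop" anywhere.
unanchored=$(printf '%s\n' "$subjects" | grep -c 'drop')
# Anchored: counts only lines that begin with "drop".
anchored=$(printf '%s\n' "$subjects" | grep -c '^drop')

echo "unanchored=$unanchored anchored=$anchored"
```

This prints `unanchored=2 anchored=1`: the unanchored pattern also picks up the unrelated "fix drop-down menu" subject, which is the kind of accidental match worth ruling out before deleting commits in bulk.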

After executing our script you can go to the newly generated repo and execute a git log command. You will see that the commit starting with drop has been removed:

cd /tmp/new-remove-commits-example/
* 3d7da50 - (HEAD -> main) add README (2022-06-13 15:06:27 +0100) <Jose Celano>

You can also notice that the commit hash of the remaining commit has changed from 7533b82 to 3d7da50. That means you have to force push the new repo version to the remote repo.

Another possible side effect is losing the commit signatures.
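The force push can be tried out safely against a throwaway remote first. The sketch below is only an illustration under stated assumptions: the bare repository in a temporary directory stands in for the real remote, the amended commit stands in for the history rewritten by reposurgeon, and the branch name main and the committer identity are placeholders.

```shell
#!/bin/sh
set -e
# Throwaway "remote" and local repositories (assumptions for this sketch).
work=$(mktemp -d)
git init -q --bare "$work/remote.git"
git init -q "$work/local"
cd "$work/local"
git config user.name "Example"
git config user.email "example@example.com"

# Publish an initial history.
git commit -q --allow-empty -m "add README"
git remote add origin "$work/remote.git"
git push -q origin HEAD:refs/heads/main

# Simulate rewritten history: the commit gets a new hash.
git commit -q --amend --allow-empty -m "add README (rewritten)"

# A normal push would be rejected as non-fast-forward, so force it.
git push -q --force origin HEAD:refs/heads/main

local_hash=$(git rev-parse HEAD)
remote_hash=$(git --git-dir="$work/remote.git" rev-parse refs/heads/main)
[ "$local_hash" = "$remote_hash" ] && echo "remote updated"
```

Git also offers `--force-with-lease`, which refuses to overwrite remote work it has not seen. Whether it fits here depends on how the rewritten repo was connected to the remote.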


Secure Git Guide © 2023