After importing all my emails from Gmail into my self-hosted IMAP server with Thunderbird (a post about it will follow), I noticed that many emails were duplicated.
After searching a bit, I found out that Gmail’s IMAP implementation exposes every label as a separate folder, so each email is listed in the folder of every label attached to it.
As you can imagine, every email with more than one label was copied more than once to my IMAP server.
I tried to delete the duplicates with the Remove Duplicate Messages add-on for Thunderbird, but it didn’t catch all of them.
I compared two identical emails with a diff tool, and the only differences were in some of the headers.
That’s why I decided to build a CLI tool that removes all duplicate emails from a Maildir directory. I wrote it in PHP, because it’s the language I have the most experience with.
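The core idea can be sketched in a few lines. This is a minimal Python sketch using the standard `mailbox` module, not the PHP tool itself. It assumes that duplicates are byte-identical once a few headers are dropped. The header names in `VOLATILE_HEADERS` are a hypothetical choice, because the post doesn’t say which headers differed. The function names `fingerprint` and `remove_duplicates` are also hypothetical.

```python
import email
import hashlib
import mailbox

# Hypothetical list: the headers that differ between copies are not named in
# the post, so these are illustrative assumptions only.
VOLATILE_HEADERS = ("Received", "X-Gmail-Labels", "Delivered-To")


def fingerprint(msg: mailbox.MaildirMessage) -> str:
    """Hash the message after dropping the headers assumed to vary between copies."""
    copy = email.message_from_bytes(msg.as_bytes())
    for name in VOLATILE_HEADERS:
        del copy[name]  # removes every occurrence; no error if the header is absent
    return hashlib.sha256(copy.as_bytes()).hexdigest()


def remove_duplicates(path: str) -> int:
    """Keep one message per fingerprint in the Maildir at `path`; return how many were removed."""
    box = mailbox.Maildir(path, create=False)
    seen = set()
    removed = 0
    # Snapshot the keys so removing messages does not disturb the iteration.
    for key in list(box.keys()):
        digest = fingerprint(box[key])
        if digest in seen:
            box.remove(key)  # deletes the message file from the Maildir
            removed += 1
        else:
            seen.add(digest)
    return removed
```

Hashing everything except the ignored headers is stricter than comparing `Message-ID` alone: two copies count as duplicates only if their bodies and all remaining headers match exactly.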
The project is hosted on GitLab, since I wanted automated testing and GitLab’s CI/CD seemed quite easy to set up.
There are still some features I want to implement, but it already works well: I managed to delete more than 1000 duplicate emails!