PaulHowarth/Blog/2007-03-30

Friday 30th March 2007

Mail Tidying

Just deleted around a gigabyte of mail (about 25% of the mail on my IMAP server). These were messages more than 6 months old from a few high-traffic mailing lists I'm on. I was comfortable deleting them because they're available on a variety of public archive sites. Writing a script to do this was easy since the messages are stored in maildir format on the server, one file per message. The script is careful to delete only messages matching a list-recognition regex, so as to avoid removing personal emails that are stored in the same folder for whatever reason.

The script itself (~/bin/list-cleanse):

#!/bin/bash
# list-cleanse; remove old mailing list messages from folders

# Source configuration
declare -a LIST_NAME LIST_REGEX LIST_FOLDER LIST_RETENTION
source ~/lib/list-cleanse.conf || exit 1

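# Generate tempfiles (mktemp avoids predictable names in /tmp)
TMPFILE1=$(mktemp /tmp/list-cleanse.XXXXXX) || exit 1
TMPFILE2=$(mktemp /tmp/list-cleanse.XXXXXX) || exit 1
# Remove the tempfiles if interrupted (HUP, INT, TERM)
trap 'rm -f "$TMPFILE1" "$TMPFILE2"; exit 1' 1 2 15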

# Iterate through lists
LISTNUM=0
while [ -n "${LIST_NAME[$LISTNUM]}" ]; do
        echo "list-cleanse: processing ${LIST_NAME[$LISTNUM]}"
        if [ -z "${LIST_REGEX[$LISTNUM]}" ] || [ -z "${LIST_FOLDER[$LISTNUM]}" ] || [ -z "${LIST_RETENTION[$LISTNUM]}" ]; then
                echo "list-cleanse: list info incomplete for ${LIST_NAME[$LISTNUM]}" 1>&2
                exit 1
        fi
        # Count everything in the folder, then list files older than the retention period
        find "${MAILDIR}/${LIST_FOLDER[$LISTNUM]}/cur" -type f > "$TMPFILE1"
        echo "list-cleanse: files in folder: $(wc -l < "$TMPFILE1")"
        rm -f "$TMPFILE1"
        find "${MAILDIR}/${LIST_FOLDER[$LISTNUM]}/cur" -type f -mtime +"${LIST_RETENTION[$LISTNUM]}" > "$TMPFILE1"
        echo "list-cleanse: candidates for deletion: $(wc -l < "$TMPFILE1")"
        # Keep only candidates that match the list regex; one filename per line
        xargs --delimiter='\n' --max-args=1 grep --files-with-matches -- "${LIST_REGEX[$LISTNUM]}" < "$TMPFILE1" > "$TMPFILE2"
        echo "list-cleanse: matched candidates: $(wc -l < "$TMPFILE2")"
        xargs --delimiter='\n' rm -f -- < "$TMPFILE2"
        echo "list-cleanse: folder cleansed"
        rm -f "$TMPFILE1" "$TMPFILE2"
        let LISTNUM+=1
done

# Clean up
rm -f "$TMPFILE1" "$TMPFILE2"
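
One possible refinement, not something the script above does: its grep scans the whole message file, so a personal email that happened to quote a List-Id line at the start of a line in its body would also match. A minimal sketch of a header-only check follows. The function name header_matches is made up for illustration, and it assumes messages use plain LF line endings and that the header being matched is not folded across lines.

```shell
#!/bin/bash
# Hypothetical helper (not part of list-cleanse): succeed only if the
# regex matches somewhere in the header block of a message file.
header_matches() {
        local regex=$1 file=$2
        # sed prints lines up to and including the first empty line, then
        # quits, so grep never sees the message body
        sed '/^$/q' "$file" | grep --quiet -- "$regex"
}
```

It could stand in for the grep step by looping over the candidate list and testing each file with header_matches "${LIST_REGEX[$LISTNUM]}" "$file", at the cost of an extra process per message.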

The configuration file (~/lib/list-cleanse.conf):

# Folders are relative to here
MAILDIR=$HOME/mail/inbox

LIST_NAME[0]=fedora-list
LIST_REGEX[0]="^List-Id:.*fedora-list\.redhat\.com"
LIST_FOLDER[0]=.Linux.fedora-list
LIST_RETENTION[0]=180

LIST_NAME[1]=fedora-devel-list
LIST_REGEX[1]="^List-Post:.*fedora-devel-list@redhat\.com"
LIST_FOLDER[1]=.Linux.fedora-devel-list
LIST_RETENTION[1]=180

LIST_NAME[2]=fedora-package-review
LIST_REGEX[2]="^List-Id:.*fedora-package-review\.redhat\.com"
LIST_FOLDER[2]=.Linux.fedora-package-review
LIST_RETENTION[2]=180

LIST_NAME[3]=fedora-extras-commits
LIST_REGEX[3]="^List-Id:.*fedora-extras-commits\.redhat\.com"
LIST_FOLDER[3]=.Linux.fedora-extras-commits
LIST_RETENTION[3]=180

LIST_NAME[4]=fedora-extras-list
LIST_REGEX[4]="^List-Id:.*fedora-extras-list\.redhat\.com"
LIST_FOLDER[4]=.Linux.fedora-extras-list
LIST_RETENTION[4]=180
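
The LIST_RETENTION values are passed straight to find's -mtime test, which has two properties worth knowing. It looks at each file's modification time rather than the Date header of the message, and it counts whole 24-hour periods, so +180 only matches files that are at least 181 days old. The following sketch shows this on throwaway files; it assumes GNU touch for the -d "N days ago" syntax, and the file names are made up.

```shell
#!/bin/bash
# Show which files "find -mtime +180" selects, using a scratch directory.
# Assumes GNU touch (for -d "N days ago"); file names are illustrative.
demo=$(mktemp -d) || exit 1
touch -d '179 days ago' "$demo/recent"   # inside the retention period
touch -d '182 days ago' "$demo/old"      # outside the retention period
# Only files whose age in whole days is greater than 180 are listed
old_files=$(find "$demo" -type f -mtime +180)
echo "$old_files"
rm -rf "$demo"
```

Only the path ending in /old should be printed.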

Result:

[paul@goalkeeper lib]$ du -ks ~/mail/inbox
3687784 /home/paul/mail/inbox
[paul@goalkeeper lib]$ list-cleanse 
list-cleanse: processing fedora-list
list-cleanse: files in folder: 10453
list-cleanse: candidates for deletion: 19
list-cleanse: matched candidates: 10
list-cleanse: folder cleansed
list-cleanse: processing fedora-devel-list
list-cleanse: files in folder: 21690
list-cleanse: candidates for deletion: 13893
list-cleanse: matched candidates: 13883
list-cleanse: folder cleansed
list-cleanse: processing fedora-package-review
list-cleanse: files in folder: 32573
list-cleanse: candidates for deletion: 14518
list-cleanse: matched candidates: 14517
list-cleanse: folder cleansed
list-cleanse: processing fedora-extras-commits
list-cleanse: files in folder: 59525
list-cleanse: candidates for deletion: 41129
list-cleanse: matched candidates: 41124
list-cleanse: folder cleansed
list-cleanse: processing fedora-extras-list
list-cleanse: files in folder: 29354
list-cleanse: candidates for deletion: 25577
list-cleanse: matched candidates: 25493
list-cleanse: folder cleansed
[paul@goalkeeper lib]$ du -ks ~/mail/inbox
2544208 /home/paul/mail/inbox
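
For anyone wanting to check the numbers, here is a quick bit of shell arithmetic on the figures copied from the output above. The percentage it gives is relative to the ~/mail/inbox tree measured by du, which is not necessarily all of the mail on the server.

```shell
#!/bin/bash
# Figures copied from the two "du -ks" runs above (kilobytes)
before_kb=3687784
after_kb=2544208
freed_kb=$(( before_kb - after_kb ))
freed_mb=$(( freed_kb / 1024 ))              # integer division, rounds down
freed_pct=$(( freed_kb * 100 / before_kb ))  # share of this tree only
# Sum of the five "matched candidates" lines
deleted=$(( 10 + 13883 + 14517 + 41124 + 25493 ))
echo "freed: ${freed_kb} KB (~${freed_mb} MB, ${freed_pct}% of the tree), messages deleted: ${deleted}"
```

This prints "freed: 1143576 KB (~1116 MB, 31% of the tree), messages deleted: 95027".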

I can already see the benefit in faster mail reading times, and of course there will be a further benefit next time I do a full backup of the server.

