How to fix emails for Cyrus LMTP and IMAP
Like many others, I have been bitten by Cyrus’ strictness when it comes to RFC-compliant email headers. Although it cost me about a full day, I still appreciate that Cyrus interprets the RFC strictly and thus forces email to be syntactically correct. It may not adhere to the “be liberal in what you accept” approach, but this way is less likely to cause problems later (with IMAP clients, indexing, searching, etc.).
My pain started when I tried to “quickly” transfer my old, legacy mail tree (there must be some emails from 1998 or so in there, converted at least 5 times by now between different mail spool formats) from a standard Maildir/ structure (previously served by courier imapd) to Cyrus with imapsync. This nice tool can synchronize from one IMAP box to another, and thus avoids the complexity of converting file formats and structures. I used something simple like
imapsync --host1=10.50.50.2 --ssl1 --user1=rene --host2=10.50.50.6 --ssl2 --user2=rene --syncinternaldates
and thought that, after about an hour or so (for the few GB worth of emails in a deep tree structure), I should be ready to switch the DNS entry to the new server. Wrong.
There are a few things that can happen when trying to import old emails into a Cyrus mail store:
- “Message contains invalid header”: This issue happens a lot. In my experience, the cause is a line in the header part of the email that either has no colon after the first word (the infamous “From …” and “>From …” first lines that stem from converting from mbox to Maildir format) or is a header with an empty value (“X-something-unimportant: ”).
- “Message contains invalid header”: The same error message can be caused by a “Message-ID: ” entry without a value, i.e. it is related to the second cause above. However, this problem affects not only IMAP but also LMTP delivery into the store.
- “Message contains NUL characters”: This message is pretty descriptive: the mail contains a NUL character (\0) somewhere.
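Before changing anything, it can help to see which files are affected by which problem. The following is a minimal sketch of such a check, not something Cyrus or imapsync provides: the function name check_mail and its output format are made up for illustration, and the tests only approximate what Cyrus actually rejects.

```shell
#!/bin/sh
# check_mail FILE: print one line per suspected problem in FILE.
# Hypothetical helper, not part of Cyrus or imapsync.
check_mail() {
  f=$1
  # Header block: everything before the first empty line, with NUL and CR
  # bytes dropped so the checks below see plain text lines.
  hdr=$(tr -d '\000\r' < "$f" | awk '/^$/ { exit } { print }')
  if [ -n "$hdr" ]; then
    # A header line must be "Name: value" or a continuation (leading
    # whitespace). mbox "From " and ">From " lines fail this test.
    if printf '%s\n' "$hdr" | grep -Evq '^([[:space:]]|[^[:space:]:]+:)'; then
      printf '%s: header line without a colon\n' "$f"
    fi
    # A header name followed by nothing but whitespace. This may also flag
    # folded headers whose value starts on the next line.
    if printf '%s\n' "$hdr" | grep -Eq '^[^[:space:]:]+:[[:space:]]*$'; then
      printf '%s: header with an empty value\n' "$f"
    fi
  fi
  # Stripping NUL bytes changes the file only if it contains any.
  if ! tr -d '\000' < "$f" | cmp -s - "$f"; then
    printf '%s: contains NUL characters\n' "$f"
  fi
}
```

To scan a whole tree, the function could be called from a loop such as `find Maildir -type f | while IFS= read -r m; do check_mail "$m"; done`. It only reports and never modifies a file.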
The number of emails in my tree was far too large to check manually why each one failed to import with imapsync, so I wrote a small shell script to take care of the issues I found. It’s a quick and dirty hack that will most probably not catch all possible errors, but it let me finally import all of my emails after fixing them with it. It copies files before touching them, so that, if anything goes wrong, you can recover by simply copying back the backup files. The script assumes a directory called Maildir/ in your home directory:
#!/bin/sh
cd
# this removes the invalid "From ..." first line left over from mbox file imports
mkdir -p Maildir-backup-files
cd Maildir
find . -type f | while IFS= read -r f; do
  if head -n 1 "$f" | grep -q '^From '; then
    mkdir -p "../Maildir-backup-files/$(dirname "$f")"
    cp "$f" "../Maildir-backup-files/$f"
    awk 'NR>1' "../Maildir-backup-files/$f" > "$f"
  fi
done
cd
# the same for ">From ..."
mkdir -p Maildir-backup-files4
cd Maildir
find . -type f | while IFS= read -r f; do
  if head -n 1 "$f" | grep -q '^>From '; then
    mkdir -p "../Maildir-backup-files4/$(dirname "$f")"
    cp "$f" "../Maildir-backup-files4/$f"
    awk 'NR>1' "../Maildir-backup-files4/$f" > "$f"
  fi
done
cd
# this removes empty headers (with nothing set); all three headers are
# handled in one pass so the backup always holds the unmodified file
mkdir -p Maildir-backup-files2
cd Maildir
empty='^(X-Keywords|X-MS-Has-Attach|X-MS-TNEF-Correlator):[[:space:]]*$'
find . -type f | while IFS= read -r f; do
  if grep -Eq "$empty" "$f"; then
    mkdir -p "../Maildir-backup-files2/$(dirname "$f")"
    cp "$f" "../Maildir-backup-files2/$f"
    grep -Ev "$empty" "../Maildir-backup-files2/$f" > "$f"
  fi
done
cd
# and this removes NUL characters
mkdir -p Maildir-backup-files3
cd Maildir
find . -type f | while IFS= read -r f; do
  # stripping NUL bytes changes the file only if it contains any
  if ! tr -d '\000' < "$f" | cmp -s - "$f"; then
    mkdir -p "../Maildir-backup-files3/$(dirname "$f")"
    cp "$f" "../Maildir-backup-files3/$f"
    tr -d '\000' < "../Maildir-backup-files3/$f" > "$f"
  fi
done
cd
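Recovering from the backups, as mentioned above, could look roughly like this. It is only a sketch: the function name restore_backup is made up, and it assumes that the backup directory mirrors the Maildir/ layout, which is how the script above creates it.

```shell
#!/bin/sh
# restore_backup BACKUP_DIR MAILDIR: copy every backed-up file back over
# the modified copy in MAILDIR. Hypothetical helper for illustration.
restore_backup() {
  backup=$1
  maildir=$2
  # List the backed-up files relative to BACKUP_DIR, then copy each one
  # to the same relative path below MAILDIR.
  ( cd "$backup" && find . -type f ) | while IFS= read -r f; do
    mkdir -p "$maildir/$(dirname "$f")"
    cp "$backup/$f" "$maildir/$f"
  done
}
```

A file that was touched by more than one step has a different copy in each backup directory, and only the earliest one is the untouched original. Restoring in the reverse order of the script (Maildir-backup-files3, Maildir-backup-files2, Maildir-backup-files4, then Maildir-backup-files) lets the oldest copy win, e.g. `cd; restore_backup Maildir-backup-files3 Maildir` and so on.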
This fixes the import problem, but the same issues can also occur when postfix (my choice of MTA) tries to deliver to the Cyrus mail spool via LMTP. Fortunately, postfix >= 2.3 comes with options to fix that. First, the empty “Message-ID: ” headers can just be discarded with header checks:
/^Message-ID:[[:space:]]*$/ IGNORE
e.g. in a file /etc/postfix/header_checks, which has to be listed in main.cf as
header_checks = regexp:/etc/postfix/header_checks
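A quick way to sanity-check the pattern before reloading postfix is to try it against a few sample header lines. The sketch below uses grep -Ei as a stand-in, which only approximates how postfix evaluates regexp tables (they match case-insensitively by default); the function name is made up for illustration.

```shell
#!/bin/sh
# is_empty_message_id LINE: succeed if LINE matches the header_checks
# pattern shown above. grep -Ei is an approximation of the postfix
# regexp table, not the real thing.
is_empty_message_id() {
  printf '%s\n' "$1" | grep -Eiq '^Message-ID:[[:space:]]*$'
}
```

For example, `is_empty_message_id 'Message-ID: ' && echo ignored` prints "ignored", while a line with a real ID does not match. To query the actual table instead of an approximation, `postmap -q "Message-ID: " regexp:/etc/postfix/header_checks` should print the action for a matching line.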
Getting rid of the NUL characters is even easier (starting with postfix 2.3) with an option in main.cf:
message_strip_characters = \0
This is the “be liberal in what you accept, but strict in what you send” approach. Postfix will accept emails with NUL characters, but remove them before sending them out again or delivering to the local mail spool. Another option (which is said to catch a lot of spam and produce no false positives - but I haven’t tried this yet) is to reject such messages outright:
message_reject_characters = \0