- Things to check or fix before the migration
- Launching the import process
- Solutions for migration problems
Things to check or fix before the migration
Mailman 2 was more lax about headers and we found problems which can hinder the migration.
Wrong date format
We found posts with dates using GMT+00:00, which is not a proper timezone specification, but you can easily fix this error with the following one-liner:
sed -ri 's/\(GMT\+00:00\)/(GMT)/' /var/lib/mailman/archives/private/*.mbox/*.mbox
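Before rewriting the archives in place, it can be useful to see how many lines would be touched. The following is a minimal Python sketch of the same substitution as the sed one-liner; the function names and the sample date in the usage note are illustrative assumptions, not part of any Mailman or Hyperkitty tool.

```python
import re

# Same pattern as the sed one-liner: a literal "(GMT+00:00)" comment.
BAD_TZ = re.compile(r"\(GMT\+00:00\)")


def fix_timezone_comment(line: str) -> str:
    """Replace the invalid "(GMT+00:00)" comment with "(GMT)"."""
    return BAD_TZ.sub("(GMT)", line)


def count_affected(lines) -> int:
    """Count the lines that the substitution would modify."""
    return sum(1 for line in lines if BAD_TZ.search(line))
```

For instance, passing the hypothetical header `Date: Mon, 01 Jan 2007 10:00:00 +0000 (GMT+00:00)` to `fix_timezone_comment` yields the same header ending in `(GMT)`, while lines without the bad comment are returned unchanged.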
Some messages lack a Message-Id field entirely, and the original value cannot be recovered. Without this field a message cannot be imported.
Hyperkitty ≥ 1.2 fixes this automatically, but earlier versions need the following workaround.
With this Ruby script (and associated Gemfile) you can generate a fake unique Message-Id field for posts lacking one. The association is kept in the new_message_ids.yml file, so it is safe to run the script multiple times: the generated values will be stable (useful if you sync the list mbox regularly before the final switch to production). The mbox files are found in the usual
Procedure to run the script:
bundle install
bundle exec ./mailman2_archive_fix.rb
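To illustrate why repeated runs give stable results, here is a minimal Python sketch of the idea. It is not the Ruby script itself: keying the mapping on a SHA-256 digest of the raw message, the `generated-` prefix, and the `example.invalid` domain are all assumptions made for this example.

```python
import hashlib
from email import message_from_bytes


def ensure_message_id(raw: bytes, known_ids: dict,
                      domain: str = "example.invalid") -> str:
    """Return the message's Message-Id, or a stable fake one if it is missing.

    known_ids plays the role of the persisted association file: it maps a
    digest of the raw message to the fake id generated for it.
    """
    msg = message_from_bytes(raw)
    existing = msg.get("Message-Id")  # header lookup is case-insensitive
    if existing:
        return existing
    digest = hashlib.sha256(raw).hexdigest()
    if digest not in known_ids:
        # Hypothetical id format, chosen only for this illustration.
        known_ids[digest] = f"<generated-{digest[:32]}@{domain}>"
    return known_ids[digest]
```

Because the fake id is looked up by message content before anything new is generated, processing the same mbox again after a sync returns the ids handed out the first time, provided the mapping is saved between runs.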
Cleaning the previous search index
If you attempted an import previously, it is recommended to purge the previous index first: regenerating the index only adds data to it, so stale entries would remain and can take up quite a lot of space.
rm -rf /var/www/mailman/fulltext_index
mkdir /var/www/mailman/fulltext_index
chown mailman_webui: /var/www/mailman/fulltext_index
chmod 0755 /var/www/mailman/fulltext_index
Launching the import process
To loop on each mailing-list and simplify the process it is recommended to use a script made by Fedora folks and installed by the
/var/www/mailman/bin/import-mm2.py -d <mail-domain> /var/lib/mailman/
If you need to exclude some lists from the import, you can provide a comma-separated list using the
Afterwards, the search index needs to be regenerated:
ionice -c3 django-admin update_index --pythonpath /var/www/mailman/config --settings settings_admin
This can take many hours depending on the size of the imported data, but the installation can go to production without waiting for it to complete.
Solutions for migration problems
UnicodeDecodeError: 'ascii' codec can't decode byte 0x?? in position ??: ordinal not in range(??)
This is caused by badly encoded mail headers. In our experience so far, only spam has produced such broken emails.
On Hyperkitty 1.1.5, it is possible to skip these emails and continue importing the rest of the mailbox using this patch.
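The general skip-and-continue approach can be sketched as follows. This is an illustration only, not the content of the patch and not Hyperkitty code; the function names are hypothetical, and headers are modelled as raw byte strings.

```python
def decode_header_or_none(raw: bytes):
    """Decode a raw header as ASCII; return None if it is badly encoded."""
    try:
        return raw.decode("ascii")
    except UnicodeDecodeError:
        # Signal the caller to skip this email instead of aborting the import.
        return None


def split_importable(raw_headers):
    """Separate headers that decode cleanly from those that must be skipped."""
    good, skipped = [], []
    for raw in raw_headers:
        decoded = decode_header_or_none(raw)
        if decoded is None:
            skipped.append(raw)
        else:
            good.append(decoded)
    return good, skipped
```

Keeping the skipped items in a separate list makes it possible to review afterwards which emails were left out of the archive.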
DataError: invalid byte sequence for encoding "UTF8":…
This is a variant of the previous problem, but in this case the importer script skips the bad email despite the traceback.
The previous patch is probably still needed, though, as the import script is otherwise likely to stop processing further lists.
RuntimeError: maximum recursion depth exceeded while calling a Python object
Hyperkitty links the posts of each thread together so that you can navigate between them. If a thread is very long (>1000 posts), the program will crash; we found this situation in archives of CI build notifications. It is possible to increase the maximum using this patch.
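The failure mode can be reproduced outside Hyperkitty. The sketch below is not the patch and does not use Hyperkitty's code: `thread_depth`, `make_chain` and `recursion_limit` are hypothetical helpers that walk a reply chain recursively and raise the interpreter limit with the standard `sys.setrecursionlimit` call. CPython's default limit is 1000, and on Python 3 the error is reported as RecursionError, a subclass of RuntimeError.

```python
import sys
from contextlib import contextmanager


@contextmanager
def recursion_limit(limit: int):
    """Temporarily change the interpreter's recursion limit."""
    previous = sys.getrecursionlimit()
    sys.setrecursionlimit(limit)
    try:
        yield
    finally:
        sys.setrecursionlimit(previous)


def thread_depth(parents: dict, msg_id) -> int:
    """Recursively count the posts from msg_id back to the thread starter."""
    parent = parents.get(msg_id)
    return 1 if parent is None else 1 + thread_depth(parents, parent)


def make_chain(length: int) -> dict:
    """Build a thread where each post replies to the previous one."""
    return {i: (i - 1 if i > 0 else None) for i in range(length)}
```

With a chain of 3000 posts, `thread_depth` fails under a limit of 1000 and succeeds inside `with recursion_limit(10000):`. Raising the limit trades safety for convenience, since very deep recursion can exhaust the process stack; an iterative traversal would avoid the problem altogether.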