     _   _          _  ____
    | | | |        (_)/ ___|     
    | |_| |__   ___ _/ /___  ___ 
    | __| '_ \ / _ \ | ___ \/ __|
    | |_| | | |  __/ | \_/ |\__ \
     \__|_| |_|\___| |_____/|___/
                  _/ |           
                 |__/            


Automatic E-Mail attachment extraction

I got a reusable notebook for Christmas which is accompanied by a simple app that makes scanning your notes really easy. Scans of your notes are converted into PDF files which you can send yourself via E-Mail.

All of that is neat, but I would prefer having the scans in a dedicated folder that is synced across my devices, as that folder is part of my weekly review.

So an idea popped into my head: could I configure a mail client that automatically saves attachments sent to a special mail address into a folder, similar to the special Kindle and Evernote E-Mail addresses that save their contents to the respective services? Turns out, I can.

What you need

  • An always-on Linux computer such as a Raspberry Pi, a server or a NAS
  • getmail
    • Package named getmail4 on Debian.
  • procmail
  • munpack
    • Package named mpack on Debian.

E-Mail is a pretty clearly defined system that works in a very Unix-y way on servers: multiple tools are involved, each doing a single thing, but doing that single thing very well. In this case our stack uses getmail for fetching mail from a server via IMAP or POP3, procmail for filtering those mails, and munpack for extracting their attachments.

Setting up getmail

In order to use getmail, we set up a configuration file at ~/.getmail/getmailrc with the following contents:

[retriever]
type=SimpleIMAPSSLRetriever
server=imap.myserver.com
username=my_username
password=my_password

[destination]
type=MDA_external
path=/usr/bin/procmail

[options]
verbose=0
read_all=false
delete=false
delete_after=0
delete_bigger_than=0
max_bytes_per_session=0
max_message_size=0
max_messages_per_session=0
delivered_to=false
received=false
message_log=~/getmail.log
message_log_syslog=false
message_log_verbose=true

The [retriever] section defines where the mails are fetched from: in this case via IMAP over SSL from imap.myserver.com, logging in as my_username with my_password. getmail ships with retrievers for POP3 and IMAP, with and without SSL, all of which are listed in its documentation.
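To make the [retriever] settings more concrete, here is a minimal Python sketch of a read-only IMAP-over-SSL fetch using the standard library's imaplib. It is not how getmail is implemented; the function name fetch_messages and its parameters are hypothetical. The mailbox is selected read-only (imaplib sends EXAMINE) and bodies are fetched with BODY.PEEK[], which per the IMAP specification does not set the \Seen flag.

```python
import imaplib
from typing import List, Tuple


def fetch_messages(host: str, username: str, password: str,
                   mailbox: str = "INBOX", port: int = 993,
                   timeout: float = 30.0) -> List[Tuple[bytes, bytes]]:
    """Return (uid, raw_message) pairs for every message in `mailbox`.

    Hypothetical helper: the mailbox is selected read-only and bodies are
    fetched with BODY.PEEK[], so the server's \\Seen flags stay untouched.
    """
    conn = imaplib.IMAP4_SSL(host, port, timeout=timeout)
    try:
        conn.login(username, password)
        status, data = conn.select(mailbox, readonly=True)  # sends EXAMINE
        if status != "OK":
            raise RuntimeError(f"cannot open {mailbox!r}: {data!r}")
        status, data = conn.uid("SEARCH", None, "ALL")
        if status != "OK":
            raise RuntimeError(f"UID SEARCH failed: {data!r}")
        messages: List[Tuple[bytes, bytes]] = []
        for uid in data[0].split():
            status, parts = conn.uid("FETCH", uid, "(BODY.PEEK[])")
            # A successful FETCH returns [(b'<envelope>', b'<raw message>'), b')']
            if status == "OK" and parts and isinstance(parts[0], tuple):
                messages.append((uid, parts[0][1]))
        return messages
    finally:
        try:
            conn.logout()
        except (imaplib.IMAP4.error, OSError):
            pass  # ignore errors while closing the session
```

For example, fetch_messages("imap.myserver.com", "my_username", "my_password") would return a list of (UID, raw message bytes) pairs without changing anything on the server.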

The [destination] section then defines what is called an MDA (Mail Delivery Agent): a separate application that delivers or otherwise processes the mails. getmail also supports other destination types, such as Maildir and Mboxrd, but MDA_external is what we need in order to pass the fetched mails on to procmail.
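Conceptually, an external MDA is simply a program that receives one message on its standard input. The following Python sketch assumes that model; deliver_to_mda is a hypothetical helper, not part of getmail, and getmail's own MDA_external has further options that are not shown here.

```python
import subprocess
from typing import Sequence


def deliver_to_mda(raw_message: bytes, command: Sequence[str]) -> None:
    """Run `command` and feed one raw message to its standard input.

    Hypothetical helper. check=True makes a non-zero exit status raise
    subprocess.CalledProcessError, so the caller can treat the message
    as not delivered.
    """
    subprocess.run(list(command), input=raw_message, check=True)
```

With the configuration above, the equivalent call would be deliver_to_mda(raw, ["/usr/bin/procmail"]). Raising on a non-zero exit keeps a failed delivery from being silently treated as a success.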

At this point we have successfully configured getmail in order to connect to the IMAP server and fetch E-Mails from it.

Sorting mails using procmail

procmail is a simple application that can be used as an MDA in order to filter and sort mails into different mailboxes, or to pass them on to other processes if they match certain criteria. It reads its configuration from ~/.procmailrc, which looks like this:

PATH=/usr/bin:/bin:/usr/local/bin:$HOME/bin:$PATH

# Process all mails that arrive for save-notes@mydomain.com
:0
* ^TOsave-notes@mydomain\.com
| munpack -q -t -C $HOME/dropping_area

The procmailrc format takes some getting used to.

  • :0 denotes the beginning of a new rule
  • * ^TOsave-notes@mydomain\.com defines a condition that must match. In this case, all mails sent to save-notes@mydomain.com are processed.
  • | munpack -q -t -C $HOME/dropping_area defines the action to take with a matching mail. In this case the mail is piped to munpack, which extracts all attachments into ~/dropping_area.
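The rule can be approximated in Python with the standard email package, which may help when reasoning about what procmail and munpack do together. This is a rough sketch under simplifying assumptions: is_for_target only looks at the To and Cc headers (procmail's ^TO also covers a few other destination headers), only parts that carry a filename are written (so plain-text bodies are skipped, unlike munpack with -t), and existing files are overwritten, whereas munpack without -f picks a new name. All function names are hypothetical.

```python
import os
from email import policy
from email.message import EmailMessage
from email.parser import BytesParser
from email.utils import getaddresses
from pathlib import Path
from typing import List

# The special address from the procmail rule; replace with your own.
TARGET = "save-notes@mydomain.com"


def is_for_target(msg: EmailMessage, target: str = TARGET) -> bool:
    """Rough stand-in for procmail's ^TO condition (checks To and Cc only)."""
    headers = [str(h) for h in msg.get_all("To", []) + msg.get_all("Cc", [])]
    addresses = [addr.lower() for _name, addr in getaddresses(headers)]
    return target.lower() in addresses


def extract_attachments(msg: EmailMessage, dest: Path) -> List[Path]:
    """Write every non-multipart MIME part that carries a filename into dest.

    dest is created if missing; files with the same name are overwritten.
    """
    dest.mkdir(parents=True, exist_ok=True)
    written: List[Path] = []
    for part in msg.walk():
        if part.is_multipart():
            continue
        filename = part.get_filename()
        if not filename:
            continue  # e.g. the plain-text body of the mail
        # Keep only the final path component so a crafted filename
        # cannot write outside dest.
        name = os.path.basename(filename)
        if name in ("", ".", ".."):
            continue
        target = dest / name
        target.write_bytes(part.get_payload(decode=True) or b"")
        written.append(target)
    return written


def process_mail(raw: bytes, dest: Path) -> List[Path]:
    """Parse one raw mail; extract attachments only if it is for TARGET."""
    msg = BytesParser(policy=policy.default).parsebytes(raw)
    if not is_for_target(msg):
        return []
    return extract_attachments(msg, dest)
```

Used as process_mail(raw_bytes, Path.home() / "dropping_area"), it returns the paths of the saved attachments, or an empty list if the mail was not addressed to the special address.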

It’s done

Now, every time getmail runs, new mails are fetched from the server and filtered, and their attachments are extracted. To run getmail periodically, a simple cron job can be added for the current user:

*/5 * * * * getmail
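For readers new to cron: the first field is the minute, and */5 means every fifth minute of the hour, so getmail runs twelve times per hour. The small Python sketch below expands such a field; expand_step_field is a hypothetical helper that only handles * and */N, not the ranges and lists real cron also accepts.

```python
from typing import List


def expand_step_field(field: str, low: int, high: int) -> List[int]:
    """Expand a cron field of the form '*' or '*/N' into the values it matches.

    Hypothetical helper: ranges ('1-5') and lists ('1,2,3') are not supported.
    """
    if field == "*":
        return list(range(low, high + 1))
    if field.startswith("*/"):
        step = int(field[2:])
        if step <= 0:
            raise ValueError("step must be positive")
        return list(range(low, high + 1, step))
    raise ValueError(f"unsupported field: {field!r}")
```

For the minute field (0 to 59), expand_step_field("*/5", 0, 59) returns [0, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55].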

Why not use fetchmail?

When researching this topic you will find a lot of solutions using fetchmail instead of getmail.

However, fetchmail has a major disadvantage here: it fetches all unread messages from the server and marks them as read. This is unwanted with ‘catchall’ mail addresses, where only a small portion of the mails in the inbox are actually sent to the special address: every other unread mail would be marked as read as well.

getmail, on the other hand, tracks which mails it has already processed by their message ID in a local state file instead of relying on the ‘read’ state on the server. Together with delete=false, it therefore does not modify any state on the server itself.
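The idea of remembering processed mails locally instead of via server flags can be sketched in a few lines of Python. The JSON state file and the function names are hypothetical and do not mirror getmail's actual state file format; with IMAP, the IDs would be server UIDs, which a real implementation would also have to pair with the mailbox's UIDVALIDITY.

```python
import json
from pathlib import Path
from typing import Iterable, List, Set


def load_seen(state_file: Path) -> Set[str]:
    """IDs of messages processed in earlier runs (empty on the first run)."""
    if not state_file.exists():
        return set()
    return set(json.loads(state_file.read_text()))


def save_seen(state_file: Path, seen: Set[str]) -> None:
    """Persist the processed IDs locally; the server is never touched."""
    state_file.write_text(json.dumps(sorted(seen)))


def select_new(server_ids: Iterable[str], seen: Set[str]) -> List[str]:
    """IDs present on the server but not yet processed, in server order."""
    return [msg_id for msg_id in server_ids if msg_id not in seen]
```

A run would call load_seen, fetch only the IDs returned by select_new, process them, add them to the set and call save_seen. Read and unread flags never enter the picture, so the rest of a catch-all inbox is left exactly as it was.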