Backup up your GoogleMail locally with getmail

Backup up your GoogleMail locally with getmail


To the chagrin of their competitors, GoogleMail seems to have become almost as synonymous with webmail as Google has with search engine (recently my six year old was explaining to me how he Googled for something at school). GoogleMail is a useful tool and has a lot of advantages over traditional client-server mail accounts, particularly if you are on the move. To be honest those sorts of advantages are present in pretty much any webmail setup: I'm just concentrating on GoogleMail because it's by my experience the most popular. But GoogleMail has one disadvantage, all your messages are stored on Google's servers. If you lose access to Google service or to your account then you lose your e-mails. Fear not oh free software lover, help is at hand in the form of the very useful getmail.

Getmail is a command-line tool for retrieving e-mail from remote servers and delivering into a local mailstore. That mailstore might be an mbox file or a Maildir directory. Personally I prefer the latter because it means I'm no longer relying on one ever-increasing file for my mail.

Installing

To be honest I'm not sure why I still include this section in these types of articles, installing software on modern GNU/Linux systems is such a breeze and I'm guessing most of you are using one. Search your usual package manager for getmail and install from there. You can download the source and build it from the getmail homepage if you want to. Me, I just ran apt-get install getmail from a shell. There are few dependencies for getmail. It's a python script, so Python (2.4) is a requirement but that's about it.

Configure GoogleMail

For getmail to retrieve your GoogleMail you'll need to tell Google to permit access from outside of their systems. Getmail supports POP3, POP3-over-SSL, IMAP4, IMAP4-over-SSL, and SDPS mail accounts either as single or multi-drop accounts (where several e-mail accounts end up in one inbox). For this tutorial I'm going to use POP3 but GoogleMail supports IMAP as well.

Getmail supports POP3, POP3-over-SSL, IMAP4, IMAP4-over-SSL, and SDPS mail accounts either as single or multi-drop accounts

  • First log into GoogleMail and then go to the Settings section (there should be a link on the top right for this)
  • Now goto the Forwarding and POP/IMAP tab
  • Select Enable POP for all mail. Later you can go back and change this to Enable POP for mail that arrives from now on but for now we want to back up all your GoogleMail. (see note below)
  • Choose an appropriate option under step 2. For mine I leave it as keep GMail's copy in the Inbox
  • Click Save changes

Configure getmail

You can have getmail retrieve your GoogleMail by passing it command-line parameters, but I'm guessing you'll want to do this again sometime, so it's best to use a getmail config file. Unless you provide a full path to the config file, getmail will presume it is within its default directory. On most system this will be .getmail under the user's home directory. You can of course put the files anywhere as long as they're readable by the user running getmail. If you do not pass it a config file path, getmail will look for the default one of ~/.getmail/getmailrc ("~" is a shortcut for the current user's home directory). For simplicity we'll use that, but if you have multiple accounts then you can create multiple config files and tell getmail to use them by starting it with getmail -r <your config file>.

You can have getmail retrieve your GoogleMail by passing it command-line parameters, but I'm guessing you'll want to do this again sometime, so it's best to use a getmail config file

For now create a file called ~/.getmail/getmailrc and enter the following into it, substituting your GoogleMail address as the username where required:

[retriever]
type = SimplePOP3SSLRetriever
server = pop.gmail.com
username = YourGoogleMailAddress

[destination]
type = Maildir
path = ~/gmailBackup/

[options]
verbose = 1
message_log = ~/.getmail/gmail.log

The message_log line is optional but can be useful if you want to keep track of what getmail has done. There are some other options you might like to consider here. The type setting can be Maildir or mbox if you prefer and the path setting can be any directory/file the user running getmail has write permissions for. the verbose setting dictates how much information is sent to the terminal while getmail is running. Setting this to 1 will echo the messages it is retrieving as it goes. Other options are 0 for only warnings and errors and 2 for pretty much everything. 1 seems a happy medium to me.

Running it

That's all there is to the config file so once it has been created you can run getmail. Open a terminal and enter getmail followed by Enter. If you have given the config file another name you'll need to tell getmail what it is with getmail -r filename. if the file is not in the default directory you should also give the full path getmail -r /path/to/your/config/file. When you run it, getmail will ask for your GoogleMail password. If you don't want to type this every time you can put it in the retriever section of the config file. Below the username line add password = YourGoogleMailPassword. Personally I don't like to do such things as somebody getting access to my laptop could then access my GoogleMail account, but if you run getmail as a cron job (scheduled event) then you might need to add the password line.

Each time you run getmail, it retrieves the mail it has not seen before. GoogleMail is setup so that mail you have sent is in the same mailbox as mail you have received. This means getmail will retrieve the All mail folder from GoogleMail. It will not retrieve anything from the Spam older. Once you have your GoogleMail on your local machine, you might want to read it. You can point pretty much any mail client at the backup file/directory. Personally to keep things simple I use the excellent mutt as it provides fast and simple access without needing to set up accounts and the like that some of the desktop mail clients require you to.

Automating it

If you use your GoogleMail a lot you'll probably want to perform this backup quite regularly. You can of course remember to run it every so often but you can also use the cron tool in your Unix-like system to automate the backup. This isn't a tutorial on cron, but for now here's a few simple steps. Type crontab -e into a shell and your cron file will open in a text editor. This is where you add commands you want to run at scheduled times. Move to the end of the file and add the following line:

10 10 * * 1-5 /usr/bin/getmail -r /full/path/to/your/config_file >/dev/null

The format from cron commands is minute hour day-of-month month day-of-week command so this will run at 10:10 AM every weekday of every month (* is a wildcard). Note that you must include the full path to any config file you give, shortcuts like "~" won't always work in cron environments. Of course you don't need to specify one at all if you are using the default config filename but I always think it's better safe than sorry. The 1>/dev/null bit on the end means that any output from the command will be discarded. This is fine because getmail is logging everything anyway, and if we didn't do this the system would likely send you an e-mail with the list of mail that was retrieved. Errors from running the command (for example if you've not entered the command properly) will be sent to you but you can discard them by changing > /dev/null to 2&1> /dev/null.

Notes and caveats

  • Leaving POP access on your GoogleMail account open does make it less secure than closing it. Someone would still need to either know your password or be able to crack it but the risk is there. If you want to be extra careful you can always go back and reverse these steps once getmail has done its thing.
  • Getmail can be used on any remote mail account that supports the protocols it supports. This means you should be able to use it on other webmail services, but I can't guarantee that as I've not tried.
  • Getmail as the name suggests is for mail only. It won't help you backup Google calendar or docs. To backup Google calendar you can export it similarly with Google docs.
  • I'm not responsible for your e-mail. if you follow these steps and something goes wrong I can't responsible as things change and I can only report what works when I try. That said, I hope this helps.
Category: 

Comments

Terry Hancock's picture

You know, this is something I've been increasingly concerned about -- I've been using Google mail for awhile now, and it's starting to represent a significant amount of my correspondence record, which is not so easy to download using the standard web interface.

Thanks for letting me know about a very useful tool!

Ryan Cartwright's picture

You're welcome!

One thing I forgot to mention is that Google seems to put a limit on the number of messages you can download via POP in any one attempt. This is largely undocumented but for me it seems to be around 400 messages.

You can workaround this by running getmail again afterwards at which point it retrieves the next 400 messages. Of course this will only be a problem if you leave it so long between POP downloads that you have in excess of 400 messages to download. In my case that's probably longer than a week. A regular cron job would resolve this also.

Ryan
--
Equitas IT Solutions - fairness, quality, freedom
http://www.equitasit.co.uk

pterion's picture
Submitted by pterion on

Thanks for a very informative article!

Once you've backed up your mail from GMail what format does it have. I.e. can you upload it onto a private webserver and still access it with the usual client from there? How read, search etc. the mail you've just backed up?

Regards,
Daniel

Ryan Cartwright's picture

Hi,

As I say in the article, the mail is in MailDir format. that means each message is a separate textfile and they are in turn stored in cur. new and tmp directories. To read it locally I use Mutt in a shell which is probably the simplest way to do it as you just run mutt -f <path to maildir> in the shell.

Putting the archive into another mailserver is possible but the method you use will depend upon which server (MTA) you are using. If you are looking to read your GMail via another account, Google provide a forwarding service which will save you downloading the messages as I explain here.

hope this helps

Ryan

Equitas IT Solutions - fairness, quality, freedom http://www.equitasit.co.uk

Ryan Cartwright's picture

Hmm interesting,

I've not encountered that before. Getmail should create the for you. What distro are you using?
--
Equitas IT Solutions - fairness, quality, freedom
http://www.equitasit.co.uk

Author information

Ryan Cartwright's picture

Biography

Ryan Cartwright heads up Equitas IT Solutions who offer fair, quality and free software based solutions to the voluntary and community (non-profit) and SME sectors in the UK. He is a long-term free software user, developer and advocate. You can find him on Twitter and Identi.ca.