Although I’m generally wary (and weary) of the GTD crowd, I recently read something interesting called “30 Days to a More Organized Life” and was motivated to implement day six, entitled “Scan Everything You Can”.
Between work, school, and general mail, there is a lot of paper in my life. Some of it is important, but most of it is trash. Nonetheless I hold on to it “just in case”. Take bank statements, for example. These are in that gray area between “totally dispensable” and “might need it someday.” It’s hard to say, and this creates two problems. First, I have a bunch of sensitive documents in my apartment, either lying around or filed in a filing cabinet. This is insecure and un-backed-up. Second, it’s just more clutter.
30 Days suggests a remedy:
Sit down at your computer with the scanner on your desk and all your miscelanious [sic] paperwork at your side. Go through each sheet and ask yourself: “do I need this?”
If you don’t need it, throw it in the shredder. If you do, ask yourself, “do I need the original?” If yes, then put it in a “to file” pile. If you don’t need the original, run the document through the scanner.
So when I saw a great deal on a wireless scanner with built-in email functionality, I jumped at the opportunity. Now I have a setup that allows me to do the following:
- Scan documents and email to a dedicated gmail document cache account at the push of a button;
- Download all new email from that account periodically (fetchmail);
- Extract the attachments and file them away intelligently (procmail/python).
The flow is simple; it only took me an hour or so to figure it out and iron out the kinks.
push: scanner -> gmail
pull: cron -> fetchmail -> procmail -> python
Fetchmail pulls the files down from gmail every hour and passes the result to procmail.
Procmail passes content-type matches to my python script, which does the real work. I get an email for each new scanned file, and it would be pretty trivial to include a link to that file and serve it up on a password-protected webserver.
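For a rough idea of what the attachment-extraction step involves, here is a minimal sketch using Python 3's standard email package. It is an illustration under assumptions, not the script described above: the function name save_attachments and the destination directory argument are hypothetical, and it assumes procmail hands over the complete raw message.

```python
import email
import os
from email import policy


def save_attachments(raw_bytes, dest_dir):
    """Write every named attachment in a raw email message into dest_dir.

    Hypothetical helper for illustration; returns the list of paths written.
    """
    msg = email.message_from_bytes(raw_bytes, policy=policy.default)
    os.makedirs(dest_dir, exist_ok=True)
    saved = []
    for part in msg.iter_attachments():
        filename = part.get_filename()
        if not filename:
            continue  # skip attachments that carry no file name
        payload = part.get_payload(decode=True)
        if payload is None:
            continue  # nested multipart parts have no direct payload
        # basename() drops any directory components smuggled into the name
        target = os.path.join(dest_dir, os.path.basename(filename))
        with open(target, "wb") as fh:
            fh.write(payload)
        saved.append(target)
    return saved
```

When procmail pipes a message to a script, the message arrives on standard input, so a script built around this sketch could call save_attachments(sys.stdin.buffer.read(), some_directory).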
I did say that the files are archived intelligently. That’s done by munging the subject header of the email message into an archive directory path. For example, if I scan a document and email it to my cache gmail account with the subject “bank.citi.statements”, then my python script will put the files attached to that email into $archive_root/bank/citi/statements. Pretty slick. Python, to its credit, makes this kind of stuff stupid-easy: it’s one of the better-documented languages out there, while also being highly expressive.
And that’s pretty much it. This email-first approach (as opposed to scanning directly to the local computer and then batch archiving to gmail) has a few ancillary benefits: the scanner doesn’t have to be made to work with Linux (no small benefit), and you still get the cloud-based archive for free. What I really have here is a completely self-contained system: I can email any file as an attachment to a gmail address, and it will eventually end up archived on my server.