http://downbe.at/nl/ atom 2010-07-19T00:02:39Z Noah http://downbe.at tag:downbe.at/nl,2010-07-19:/2010/07/18/riddim-a-python-streaming-audio-server/ RiDDiM: A streaming audio server in python 2010-07-19T00:02:39Z 2010-07-19T00:10:25Z <p><img width='200px' src='http://upload.wikimedia.org/wikipedia/commons/e/e3/Flag_of_Ethiopia_(1897).svg' alt='lion of judah.svg' /></p> <h3>1.0</h3> <p>For some time I was looking for a way to stream music from my home server to other computers across the internet. SHOUTcast accomplishes this, but it&#8217;s closed-source, seems essentially unlicensed, and comes with a pretty spooky <a href='http://www.shoutcast.com/disclaimer.phtml'>disclaimer</a>. At this point closed-source software is pretty much out of the question, especially for network services hung off the <span class="caps">WAN</span>. There are other projects out there, but most of them seem abandoned, suffer from the same opacity as SHOUTcast, or both.</p> <p>So what&#8217;s an enterprising hacker to do? Not sure. But I rolled up my sleeves, broke out $<span class="caps">EDITOR</span> and python, and came up with a streaming audio server that I call <a href='http://github.com/noah/riddim'>riddim</a>.* Right now it only handles mp3, but it has a few nice features, including persistent playlist management through an on-disk cache, fast forward/rewind, metadata, and, because it&#8217;s multi-threaded, support for multiple clients. I have also added remote procedure calls with python&#8217;s xmlrpclib, which is part of the standard library, so everything can be administered remotely.</p> <p>My design goal for this project was to keep it as simple as possible while still providing the functionality a streaming music server ought to have, all while minimizing dependencies. Right now the only non-standard python library required is <strong>mutagen</strong>, but riddim could be made to work without it.
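A rough sketch of that XML-RPC administration pattern, assuming Python 3 (where xmlrpclib became xmlrpc.client and SimpleXMLRPCServer became xmlrpc.server): a playlist object is registered with a server running in a background thread, and a ServerProxy client calls its methods. The Playlist class and its enqueue/playlist methods are hypothetical names for illustration, not riddim's actual RPC interface:

```python
# Sketch of remote administration over XML-RPC, in the spirit of riddim's
# xmlrpclib-based RPC. Playlist, enqueue() and playlist() are hypothetical.
import threading
from xmlrpc.client import ServerProxy          # Python 2: xmlrpclib
from xmlrpc.server import SimpleXMLRPCServer   # Python 2: SimpleXMLRPCServer


class Playlist:
    """Hypothetical in-memory playlist exposed to RPC clients."""

    def __init__(self):
        self._tracks = []
        self._lock = threading.Lock()  # a streaming thread may share this object

    def enqueue(self, path):
        """Append a track and return the new queue length."""
        with self._lock:
            self._tracks.append(path)
            return len(self._tracks)

    def playlist(self):
        """Return a copy of the queued tracks."""
        with self._lock:
            return list(self._tracks)


def start_rpc_server(playlist, host="127.0.0.1", port=0):
    """Serve the playlist's public methods in a daemon thread; port=0 picks a free port."""
    server = SimpleXMLRPCServer((host, port), logRequests=False)
    server.register_instance(playlist)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


if __name__ == "__main__":
    server = start_rpc_server(Playlist())
    host, port = server.server_address
    admin = ServerProxy(f"http://{host}:{port}/")
    admin.enqueue("/music/track01.mp3")
    print(admin.playlist())  # ['/music/track01.mp3']
    server.shutdown()
    server.server_close()
```

Because register_instance refuses to dispatch names that start with an underscore, the internal lock and track list stay private to the server process.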
I have not done any benchmarks for multiple clients, but it should scale to quite a few because the model is extremely lightweight. <span class="caps">RAM</span> and <span class="caps">CPU</span> usage are virtually zero with one client connection. That&#8217;s significant, because riddim replaces my old setup, which involved mounting directories remotely with sshfs, and that is pretty resource-intensive. And if you think about it, it really doesn&#8217;t make sense to waste cycles encrypting mp3 data. That wastefulness was probably my main motivation for spending time writing riddim.</p> <h3>Yes, but what does it <em>do</em>?</h3> <p>The basic flow is this: start up the riddim server and enqueue some tracks, both of which are done with riddim.py. The server waits for an <span class="caps">HTTP</span> connection on port 18944, and when a client connects it starts streaming mp3 data, with metadata chunked into it, to the client. Simultaneously, the server listens for <span class="caps">RPC</span> connections, which it delegates to the proper methods.</p> <h3>Patent (not) pending</h3> <p>As an aside: this is a project that&#8217;s been knocking around in my head for quite some time, and I&#8217;m pretty satisfied with how it turned out. I think it&#8217;s illuminating in this case to note how the unformed idea, months in incubation, swelled up into my consciousness until the imperative to make it real became overwhelming; at that point I expressed the idea in code. Intellectual-property lawyers make a lot of the idea/expression dichotomy, a basic tenet of copyright law: ideas are not protectable, while expressions are. Maybe now that I have expressed this idea I should try to get a patent on it. Assuming, of course, that there&#8217;s no prior art on the subject.
;) In sum, I&#8217;m not sure if riddim will be useful to other people, but it&#8217;s something I&#8217;m going to be using for years to come.</p> <p>*(According to Wikipedia, riddims are instrumental song versions, usually consisting of a bassline and percussion.)</p> tag:downbe.at/nl,2010-06-14:/2010/06/14/latex-tips-xetex-and-makefiles/ LaTeX Tips: XeTeX and Makefiles 2010-06-14T17:40:59Z 2010-06-14T18:23:56Z <p>Any geek worth his salt knows that there is only one word processor, and that is LaTeX, of course. After all, when <a href='http://en.wikipedia.org/wiki/LaTeX#Pronouncing_and_writing_.22LaTeX.22'>a pronunciation guide</a> is required to say the very <em>name</em> of a piece of software, how can potential challengers hope to compete?</p> <h4>Makefiles For Speed</h4> <p>Simple Makefiles and vim-wizardry add even more awesome to LaTeX:</p> <pre>
main:
	xelatex file.tex
</pre> <p>The above compiles file.tex into file.pdf with <a href='http://scripts.sil.org/cms/scripts/page.php?site_id=nrsi&id=xetex'>XeTeX</a>. (XeTeX is basically Knuth&#8217;s TeX plus Unicode.) Note that the recipe line under <code>main:</code> must be indented with a tab, not spaces.</p> <p>If you use evince as a pdf viewer, it reloads the document whenever file.pdf changes, so each compile becomes visible immediately. Thus, it&#8217;s nice to have an editor window and an evince window open side-by-side. This speeds up the edit-compile-view cycle about as much as is humanly possible.</p> <p>If you&#8217;re using vim, you can do this after editing:</p> <pre>
:w
CTRL-z
% make
&lt;look at pdf&gt;
% fg
&lt;make more edits&gt;
. . .
</pre> <p>However, this is faster:</p> <pre>
:w|!make
&lt;look at pdf&gt;
&lt;make more edits&gt;
. . .
</pre> <p>If you&#8217;re not using vim, kill yourself (I kid!)</p> <h4>XeTeX Makes It Hot</h4> <p>I&#8217;m using XeTeX exclusively these days because of its support for <a href='http://en.wikipedia.org/wiki/OpenType'>OpenType</a> fonts.
These fonts look great and are very easy to use.</p> <p>This is about as simple as it gets:</p> <script src="http://gist.github.com/438045.js"></script><p>The argument to <code>\setromanfont</code> can be the family name of any installed font. <code>fc-list : family</code> lists the installed families, and <code>otfinfo --family somefont.otf</code> prints the family name of a single font file.</p> tag:downbe.at/nl,2010-06-02:/2010/06/02/rtorrent-organize-torrents-by-tracker/ <code>rtorrent:</code> organize .torrents by tracker 2010-06-03T03:42:21Z 2010-06-05T15:53:41Z <p>In the words of its creator, <a href='http://libtorrent.rakshasa.no'>libtorrent</a> is:</p> <blockquote> <p>[A] BitTorrent library written in C++ for *nix, with a focus on high performance and good code. The library differentiates itself from other implementations by transfering directly from file pages to the network stack.</p> </blockquote> <p>I have been using rtorrent, a text-based ncurses client written on top of libtorrent, for a few years now, and it is one of the most solid pieces of software I have ever encountered. As of this writing, the latest version of rtorrent, which is what I have installed, is:</p> <pre>
% pacman -Q rtorrent
rtorrent 0.8.6-2
</pre> <p>One of the benefits of rtorrent is that it can seed huge numbers of torrents with a low memory footprint. I have had hundreds of torrents seeding simultaneously, and rtorrent only consumes a few megabytes of <span class="caps">RAM</span>. This encourages me to seed my torrents for a long time: weeks, and in some cases months. It&#8217;s also nice that rtorrent doesn&#8217;t require X. However, a problem created by this resource efficiency is that the download directories (of which there is only one by default) become quite crowded without constant pruning.
Reading 20 pages of output from <code>ls|less</code> to find out what&#8217;s been downloaded sucks.</p> <p>Luckily, the rtorrent devs have a solution for this problem: a series of <a href='http://libtorrent.rakshasa.no/wiki/RTorrentCommandsRaw'>hooks</a> that allow registering callbacks for events, such as a torrent download completing.</p> <p>Combining some of those hooks</p> <script src="http://gist.github.com/423418.js"></script><p>And this directory structure,</p> <pre>
/download
├── tracker1.com
├── tracker2.com
├── torrents
│   ├── .sessions
│   ├── tracker1.com
│   └── tracker2.com
└── queue
</pre> <p>makes rtorrent <code>mv</code> finished torrents to a directory on a per-tracker basis, and updates the torrent files with the new (target) directory path. For instance, when a torrent registered at tracker1.com finishes, it is <code>mv</code>ed into /download/tracker1.com, and the session torrent&#8217;s directory path is updated to reflect that change. This is important: when rtorrent is closed and re-opened, it checks the hash of the moved file instead of starting the download all over again. Thus, the .sessions directory is <span class="caps">NOT</span> optional.</p> <p>The <a href='http://code.google.com/p/automatic-save-folder/'>Automatic Save Folder</a> Firefox extension can be used to create the torrents/ hierarchy automatically.</p> <p>It&#8217;s hard to complain about rtorrent, but my chief criticism is the rtorrent.rc file&#8217;s hacky configuration syntax. It is positively <em>terrible</em>: it seems to have been made up by the authors in an attempt to create Yet Another Configuration Syntax, and it has the added demerit of being very finicky about escaping characters. (In my experience, this fondness for ad-hoc configuration languages is shared by many C/C++ programmers.) Sometimes syntax errors produce an error message; sometimes they fail silently.
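The hooks in the gist above do the actual moving inside rtorrent. Purely to illustrate the per-tracker idea in plain Python (not a replacement for those hooks, and not the author's configuration), the target directory can be derived from a .torrent file's announce URL. The minimal bencode decoder, the tracker_dir helper, and the /download root (borrowed from the tree above) are assumptions of this sketch; torrents that rely on a multi-tracker announce-list are ignored:

```python
# Sketch: derive a per-tracker directory from a .torrent's announce URL.
# bdecode() and tracker_dir() are hypothetical helpers, not part of rtorrent.
import os
from urllib.parse import urlparse


def bdecode(data, i=0):
    """Decode one bencoded value starting at data[i]; return (value, next_index)."""
    c = data[i:i + 1]
    if c == b"i":                        # integer: i<digits>e
        end = data.index(b"e", i)
        return int(data[i + 1:end]), end + 1
    if c == b"l":                        # list: l<items>e
        i, items = i + 1, []
        while data[i:i + 1] != b"e":
            item, i = bdecode(data, i)
            items.append(item)
        return items, i + 1
    if c == b"d":                        # dict: d<key><value>...e
        i, result = i + 1, {}
        while data[i:i + 1] != b"e":
            key, i = bdecode(data, i)
            result[key], i = bdecode(data, i)
        return result, i + 1
    colon = data.index(b":", i)          # byte string: <length>:<bytes>
    length = int(data[i:colon])
    start = colon + 1
    return data[start:start + length], start + length


def tracker_dir(torrent_bytes, root="/download"):
    """Return root/<tracker host> for the torrent's primary announce URL."""
    meta, _ = bdecode(torrent_bytes)
    host = urlparse(meta[b"announce"].decode("utf-8")).hostname
    return os.path.join(root, host)
```

This only computes the path; the move itself, and keeping the session file in sync, are still the hooks' job.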
Limited debugging help is available in the client by pressing lowercase &#8216;L&#8217;. With a program as otherwise clean and featureful as rtorrent, it seems like such a waste to have a configuration syntax that is in constant flux from version to version and is forever being rewritten. If the devs had used something like <span class="caps">YAML</span> from the get-go, this wouldn&#8217;t be a problem, and it wouldn&#8217;t make my eyes bleed.</p> tag:downbe.at/nl,2010-05-29:/2010/05/28/iptables-is-for-masochists/ <code>iptables</code> is for masochists 2010-05-29T11:01:57Z 2010-06-03T01:18:29Z <p><strong>Sun May 30 20:50:58 <span class="caps">CDT</span> 2010</strong>:<br /> This post seems to have stirred up some anti-shorewall, pro-iptables angst. For readers in that camp, I would recommend checking out <a href="http://conntrack-tools.netfilter.org/">conntrack-tools</a> and <a href="https://docs.google.com/viewer?url=http://people.netfilter.org/pablo/docs/login.pdf">netfilter hacks</a> (alpha-masochists only!).</p> <p>Also, <code>cat</code>ting <code>/proc/net/ip_conntrack</code> isn&#8217;t a bad idea.</p> <p>. . .</p> <p>Writing iptables rules by hand is for <strong>masochists</strong>. The <code>iptables</code> syntax is needlessly complicated, and this complexity has no benefit. Additionally, writing <code>iptables</code> rules by hand violates the <a href="http://www.faqs.org/docs/artu/ch01s06.html#id2878742">Rule of Generation</a>.
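The Rule of Generation says to avoid hand-hacking and write programs to write programs. As a toy illustration of that principle only (shorewall is far more capable and works differently), one compact line in a hypothetical "ACTION PROTO PORTS" format can be expanded into the equivalent raw iptables commands:

```python
# Toy generator: expand one compact, shorewall-like rule into raw iptables
# commands. The "ACTION PROTO PORT,PORT,..." format is hypothetical and much
# simpler than shorewall's real rules file.
VALID_ACTIONS = {"ACCEPT", "DROP", "REJECT"}


def expand_rule(line, chain="INPUT"):
    """Return one 'iptables -A' command string per port listed in the rule."""
    action, proto, ports = line.split()
    if action not in VALID_ACTIONS:
        raise ValueError(f"unknown action: {action}")
    return [
        f"iptables -A {chain} -p {proto} --dport {port} -j {action}"
        for port in ports.split(",")
    ]


if __name__ == "__main__":
    # Prints four commands, starting with:
    # iptables -A INPUT -p tcp --dport 21 -j ACCEPT
    for command in expand_rule("ACCEPT tcp 21,22,80,443"):
        print(command)
```

The point is that the human edits one short line and the repetitive part is generated, which is the same division of labor shorewall offers at a much larger scale.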
It is also difficult to tell whether <code>iptables</code> is actually working as it should.</p> <p>Enter <a href="http://www.shorewall.net/">shorewall</a>, a rule engine that abstracts away the nitty-gritty details of <code>iptables</code>. Rather than a mix of command-line entries and the raw syntax of <code>/etc/iptables/*.rules</code>, shorewall keeps the rules in plain text files, with macro support for common port configurations, which makes them more persistent and easier to read.</p> <p>Shorewall is &#8220;iptables made easy,&#8221; and it is also iptables made more powerful. For example, it is trivial to set up a <span class="caps">DMZ</span> with shorewall: what is <a href="http://www.cyberciti.biz/faq/linux-demilitarized-zone-howto/">overly complicated</a> with <code>iptables</code> is a few lines of configuration with shorewall.</p> <p>A syntactic example illustrates the above points nicely. Which is clearer, as an example of opening some oft-used ports? Iptables,</p> <pre>
iptables -A INPUT -p tcp --dport ftp -j ACCEPT
iptables -A INPUT -p tcp --dport ssh -j ACCEPT
iptables -A INPUT -p tcp --dport www -j ACCEPT
iptables -A INPUT -p tcp --dport https -j ACCEPT
</pre> <p>Or shorewall?</p> <pre>
ACCEPT net $FW tcp 21,22,80,443
</pre> <p>Aside from the verbosity of the rules, there is another reason why using <code>iptables</code> directly is a bad idea: as <a href="http://www.slideshare.net/linuxawy/packet-filtering-using-iptables">noted</a> by Ahmed Mekkawy, the main problem with userspace firewalls is that they are inherently <em>volatile</em>. In other words, because a software firewall has to be applied to a running system by an init script or daemon, a good sysadmin has cause to wonder, &#8220;Is my firewall working? Did I forget to restart the daemon after the last reboot?
Did an upgrade break it?&#8221;</p> <p>With <code>iptables</code>, the only surefire way to answer these questions is to use <code>nmap</code> or <code>iptables-save</code> to check that the rules are in place. However, a properly configured system may have hundreds of rules, which makes scanning the output for correctness a headache. And port scans take time.</p> <p>Shorewall provides an answer to the &#8220;am I running&#8221; question,</p> <pre>
# shorewall status
Shorewall-4.4.3 Status at downbe - Fri May 28 13:27:16 CDT 2010
Shorewall is running
State:Started (Fri Apr 9 17:24:07 CDT 2010)
</pre> <p>as well as the &#8220;which rules are defined&#8221; question:</p> <pre>
# shorewall show
Shorewall 4.4.3 filter Table at downbe - Fri May 28 13:28:58 CDT 2010
Counters reset Fri Apr 9 17:24:07 CDT 2010
Chain INPUT (policy DROP 0 packets, 0 bytes)
 pkts bytes target prot opt in out source destination
. . .
</pre> <p>In sum, accessing iptables directly is completely unnecessary, not to mention error-prone. Raw <code>iptables</code> rules are less persistent than other solutions and sit at too low a level of abstraction. So unless you are a masochist, you should probably be using shorewall.</p> tag:downbe.at/nl,2010-02-07:/2010/02/07/going_paperless_with_procmail_fetchmail_gmail_python_and_a_scanner/ Going paperless with procmail, fetchmail, gmail, python, and a scanner 2010-02-07T10:06:57Z 2010-05-29T11:01:39Z <p>Although I&#8217;m generally wary (and weary) of the <span class="caps">GTD</span> crowd, I recently read something interesting called &#8220;30 Days to a More Organized Life&#8221; and was motivated to implement day six, <a href="http://silverclipboard.com/time-management-tips/30-days-to-a-more-organized-life-day-6-scan-everything/">&#8220;Scan Everything You Can&#8221;</a>.</p> <p>Between work, school, and general mail, there is a lot of paper in my life. Some of it is important, but most of it is trash.
Nonetheless, I hold on to it &#8220;just in case&#8221;; bank statements, for example. These are in that gray area between &#8220;totally dispensable&#8221; and &#8220;might need it someday.&#8221; It&#8217;s hard to say which, and this creates two problems. First, I have a bunch of sensitive documents in my apartment, either lying around or filed in a filing cabinet. This is insecure, and none of it is backed up. Second, it&#8217;s just more clutter.</p> <p>30 Days suggests a remedy:</p> <blockquote> <p>Sit down at your computer with the scanner on your desk and all your miscelanious [sic] paperwork at your side. Go through each sheet and ask yourself: &#8220;do I need this?&#8221;</p> <p>If you don&#8217;t need it, throw it in the shredder. If you do, ask yourself, &#8220;do I need the original?&#8221; If yes, then put it in a &#8220;to file&#8221; pile. If you don&#8217;t need the original, run the document through the scanner.</p> </blockquote> <p>So when I saw a great deal on a wireless scanner with built-in email functionality, I jumped at the opportunity. Now I have a setup that allows me to do the following:</p> <ol> <li>Scan documents and email them to a dedicated gmail document cache account at the push of a button;</li> <li>Download all new email from that account periodically (fetchmail);</li> <li>Extract the attachments and file them away intelligently (procmail/python).</li> </ol> <p>The flow is simple; it only took me an hour or so to figure it out and iron out the kinks.</p> <pre>
push: scanner -&gt; gmail
pull: cron -&gt; fetchmail -&gt; procmail -&gt; python
</pre> <p>Fetchmail pulls new messages down from gmail every hour and passes them to procmail:<br /> <script src="http://gist.github.com/297360.js?file=gistfile1.sh"></script></p> <p>Procmail passes messages with matching content types to my python script, which does the real work.
I get an email for each new scanned file, and it would be pretty trivial to include a link to that file and serve it up on a password-protected webserver:<br /> <script src="http://gist.github.com/297362.js?file=gistfile1.sh"></script></p> <p>I did say that the files are archived intelligently; that&#8217;s done by munging the subject header of the email message into an archive directory path. For example, if I scan a document and email it to my cache gmail account with the subject &#8220;bank.citi.statements&#8221;, then my python script will put the files attached to that email into <code>$archive_root/bank/citi/statements</code>. Pretty slick. Python, to its credit, makes this kind of stuff stupid-easy: it is one of the better-documented languages out there, while also being highly expressive.<br /> <script src="http://gist.github.com/297363.js?file=scanner_cache.py"></script></p> <p>And that&#8217;s pretty much it. A few ancillary benefits of this email-first approach (as opposed to scanning directly to the local computer and then batch archiving to gmail) are that the scanner doesn&#8217;t have to be made to work with Linux (no small benefit), you still get the cloud-based archive for free, and the result is a completely self-contained system whereby I can email any file as an attachment to a gmail address and have it eventually archived on my server.</p>
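<p>The scanner_cache.py gist above is the actual script. For the general shape of the idea, here is a minimal sketch, assuming Python 3's email package and a hypothetical ARCHIVE_ROOT, of the two steps described: turning a dotted subject like &#8220;bank.citi.statements&#8221; into a directory path, and saving a message's attachments there. Dropping subject components that aren't plain words is an added precaution, not something the original describes.</p>

```python
# Sketch of the filing step: dotted Subject -> directory, attachments saved there.
# ARCHIVE_ROOT, subject_to_dir() and file_attachments() are hypothetical names;
# this is not the author's scanner_cache.py.
import email
import email.policy
import os
import re

ARCHIVE_ROOT = os.path.expanduser("~/archive")  # hypothetical archive location


def subject_to_dir(subject, root=ARCHIVE_ROOT):
    """Map 'bank.citi.statements' to root/bank/citi/statements."""
    parts = [part.strip() for part in subject.split(".")]
    # Keep only word-like components so a crafted subject such as
    # '../../etc' cannot point outside the archive root.
    safe = [part for part in parts if re.fullmatch(r"[\w-]+", part)]
    return os.path.join(root, *safe) if safe else os.path.join(root, "unfiled")


def file_attachments(raw_message, root=ARCHIVE_ROOT):
    """Save every attachment of a raw email message; return the written paths."""
    msg = email.message_from_bytes(raw_message, policy=email.policy.default)
    target = subject_to_dir(str(msg.get("Subject", "")), root)
    os.makedirs(target, exist_ok=True)
    written = []
    for part in msg.iter_attachments():
        # basename() strips any directory components from the sender's filename.
        name = os.path.basename(part.get_filename() or "") or "attachment.bin"
        path = os.path.join(target, name)
        with open(path, "wb") as fh:
            fh.write(part.get_payload(decode=True) or b"")
        written.append(path)
    return written
```

<p>Procmail hands a message to a piped command on standard input, so a small wrapper invoked from the recipe could simply call <code>file_attachments(sys.stdin.buffer.read())</code>.</p>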