Python tail-call optimization, done right

It occured to me this morning to revisit the issue of tail-call-optmization using function decorators in Python. Last time I checked, the working trick involves stack inspection (works only in CPython) and throwing an exception whenever a tail call is detected. In short: non-portable and slow. I posted an enhancement here that allows for mutual recursion (function A tail-calling function B tail-calling function A …), but it did not occur to me that the stack inspection hack, clever as it is, could be improved on.

Improved on it has: Miguel Perez is reporting that his solution runs pretty much as fast as normal looping. Supports mutual recursion and is completely portable too.

clipped from groups.google.com
Please critique this tail call optimizing decorator I’ve written. I’ve tried to fix the pitfalls of other proposed decorators, and the result is this one that supports mutual recursion, does not use exceptions, stack inspection or any implementation-dependent hack, and is pretty short and fast – the fastest out of the ones I could find and try. In fact, in tail-recursive environments I tested the impact of using the decorator is difficult to even measure, as the extra time the decorator takes to run is probably saved by the better use of cache memory. The only caveat is that if used in a function that’s not called in a tail-recursive fashion, bad things will happen.
  blog it

Wide Finder: take 2

Tim Bray’s revised Wide Finder project [ongoing.org] has been ongoing for a few weeks now, and I’ve finally took the time to design and prototype an implementation.

What
The goal is to evaluate the performance of middle-of-the-road, not embarrassingly parralelizable tasks on modern-day multi-core hardware. Such as the Sun T2000 servers. Fittingly, the task is to parse a multi-gigabyte web server log file and compile some aggregate statistics.

Design
The solution I came up with for the earlier iteration of the contest, coded in different versions (C++, OCaml and JoCaml) is fundamentally sound, though rather unoptimized (picking up two-and-a-half different languages in one weekend is a good way to find out how much there is to know about, say, C++ stream buffering). With the benefit of hindsight, and given that we are several weeks into the project and there are strong implementations already [wikis.sun.com], the idea is to find an unexplored niche.

Short recap of the main implementations:

  • OCaml: Fernandez is ahead of the pack again, the only solution in the 7 minutes
  • make+C+awk+sh: Perl is dead, but shell scripting is enjoying a renaissance with parallelizable tasks. 8 minutes
  • Java-based solutions: in the 13-17 minutes range are the various JVM solutions, from Java, Groovy and Scala to Fan, an interesting Ruby-like language for the JVM. Reminds me of .NET’s Boo.
  • Python, Ruby: in the 20+ minutes range. Python multiprocessing is not that efficient yet; I believe an improved Stackless Python solution might be forthcoming

I dabbled with a Common Lisp solution; it works and appears to be competitive, when tried on a partial log file. Exploring the available options for parallelism, however, revealed the disconcerting fact: no freely-available Common Lisp compilers have good multi-threading, or even multi-processing (without shared memory) on Solaris! Even worse, the SBCL incompatibility with GCC 4.3 means that even the Linux version on my Fedora machine is several months old, and does not have the threading library.

So it’s back to Java. Perusing the blogs of the Java and Scala programmers, it appears that the common complain is .. regular expressions. So the hunt was on for a good regular expression library. Joni, a port of the Ruby Oniguruma regex library to the JVM used by the JRuby project, appears ideal: low-level and supposedly very fast. Until one hits the total lack of documentation. So that’s off the table. Ended up using dk.brics.automaton, which appears to perform well enough, even when parsing Unicode strings.

The nice thing about using Java is that, if you hit a performance brick wall, chances are that many other people have been there before you. The problem I have, the need to have random access within a file (so different threads can start at different offsets (Java’s RandomAccessFile is good for this) combined with the need for buffered I/O (BufferedReader is good, but there is no RandomAccessReader !) is solved by the nice folks at Biojava.com. Great!

On my system (2 GHz Core 2, 2 GB RAM, 5400 rpm HDD, Fedora 9 x64), Ruby takes about 2.2 seconds, while my Java implementation running on OpenJDK 1.6 (64-bit) takes about 1.6-1.7 seconds with 1 thread and 1.4-1.5 seconds with 2 threads. Close to the 1.2+ seconds time that Alex reported for Python, but hey, we’re paying the Java start-up cost here.

Will update when I get an account on the test server. 40GB dataset, here I come! In the meantime, time to look for opportunities to use a JVM-based language better suited for the task. The Java code is a tad bit verbose.

Patents: to settle or not to settle

@Béranger:

The comparison between Red Hat’s patent settlement and the MS-Novell covenant is instructive, but the two settlements are not exactly identical.

The downstream impact of both certainly only protect the respective companies’ customers. It is the upstream protection of developers that make the settlement’s coverage “broad” (you used the term “visionary” under quotation as if it was claimed by Red Hat; they did no such thing). Contrast to the MS-Novell deal that only covers Novell developers!

It would have been better to fight the lawsuit to completion, I agree, and considering the FLOSS world consider settlements in GPL licensing disputes to be a victory in their favour, the nature of this settlement is definitely ambiguous at best. Hopefully RH legal is using their resources to fight bigger battles. Abolish software patents — and curtail most patents in general, it’s not as if they lead to significant innovation anyway (Boldrin and Levine [2008]).

C types 101

I was cleaning up the code of an application that I’m packaging for Fedora, and was Googling for information on size_t (in the code, a size_t variable was being printed as a normal integer (%d), which triggered a compiler warning, and I forgot what the relevant option is. Ended up finding it in printf’s manpage) when I discovered this rather well-written gem.
clipped from www.embedded.com

Using size_t appropriately can improve the portability, efficiency, or readability of your code. Maybe even all three.

Numerous functions in the Standard C library accept arguments or return values that represent object sizes in bytes. For example, the lone argument in malloc(n) specifies the size of the object to be allocated, and the last argument in memcpy(s1, s2, n) specifies the size of the object to be copied. The return value of strlen(s) yields the length of (the number of characters in) null-terminated character array s excluding the null character, which isn’t exactly the size of s, but it’s in the ballpark.

You might reasonably expect these parameters and return types that represent sizes to be declared with type int (possibly long and/or unsigned), but they aren’t. Rather, the C standard declares them as type size_t. According to the standard, the declaration for malloc should appear in <stdlib.h> as something equivalent to:

void *malloc(size_t n);
  blog it

The ultimate Fedora 9 setup: Part 1 – UI

Unlike the commercial OSes (and commercially-supported Linux distributions), community Linux distributions tend to have fast-paced release cycles. Notably, Fedora and Ubuntu releases every 6 months.

Every OS upgrade entails several decision: do you do a clean install, or upgrade your current installation? Do you start with a clean home directory, or re-use your previous one? Any combination works fine, though my personal preference is to do a clean install and use a clean home directory, having archived the older directory. I think of it as house-cleaning — and it’s nice to experience the desktop as it ships out of the box, before customization (naturally, I then restore my music database, my address books, browser and e-mail client profiles, etc. This is not Memento!)

So, now that Fedora 9 has been released, what needs to be added to / changed from the base setup? As it turns out, not that many:

Compositing
Some people swear by Compiz; I personally find Metacity much more usable (Compiz does not support cycling through all windows of a given application — Ctrl+F6 in Metacity; Cmd+~ in OS X). Metacity now has a compositing manager that’s turned off by default; turning it on involves either using gconftool-2 (only for advanced users) or gconf-editor, and setting the /apps/metacity/general/compositing_manager key to true.

The support in the stable version is a bit flaky still; the metacity package in Rawhide is much better behaved and appears quite stable. Upgrade by issuing yum --enablerepo=rawhide update metacity. As of the moment it does not pull in any other Rawhide package so you can rest easy.

Try pressing the volume up/down/mute keys on your keyboard (if you don’t have a multimedia keyboard, change the bindings in System->Preferences->Personal->Keyboard Shortcuts) and be amazed at the translucency coolness (no, this is not bling). The brightness pop-up windows have not been changed yet, alas.

Firefox
Ever cursed Firefox’s font rendering in silence? Type about:config in the address bar, and add the following boolean keys:

font.FreeType2.autohinted = true
font.FreeType2.enable = true

Keyboard
For the English-speakers among us specifically, and those who use the US keyboard layout in general (it’s the standard layout in Indonesia, for instance), the occasional times when one has to type an accented character is rather annoying.

There are various work-arounds — launch the character map (under Accessories), add the Character Palette applet to the panel (so that it consumes RAM even when you don’t use it!)…*or* you can just fix your keyboard layout. The die-hard command-line junkie would be able to tell you what option to pass to setxkbmap to achieve this. The rest of us can just use System->Preferences->Hardware->Keyboard. In the “Layouts” tab, select “Layout Options”. The option you want is “Compose key position”; I use Right Alt, but Caps Lock haters will rejoice to know that, yes, you can use that dreaded key as your compose key as well. To type an accented character, now the only thing you need to do is hit the Compose+accent followed by the letter you want to accent (using shift as necessary, e.g. for ^).

While you’re here, you might want to change the Alt/Win key behavior, and map either Meta, Super or Hyper to one of your Win-keys. The GNOME default is inexplicably for the Win-key to be a normal key and not a modifier (so it cannot be combined with other keys).

Coming up: Applications
Et voilà! You should have a nice-looking, and more importantly, functional desktop right now. In the next instalment, I’ll comment on the applications I use. Until then, à bientôt!