DocSix: Doctests on Python 2 & 3

I was first introduced to doctests working on Zope 3 at early PyCon sprints. At the time the combination of documentation, specification, and test in a single document seemed pretty interesting to me. These days I like to use them for testing my documentation.

Last week stvs2fork helpfully opened a pull request for Rebar, fixing some syntax that’s no longer valid in Python 3. I decided that it’d be interesting to add Python 3.3 to the automated test runs. Fixing the code to work with Python 3 was easy enough, but when I ran the doctests I discovered an issue I hadn’t thought of:

Unicode string output looks different in Python 2 vs. Python 3.

>>> validator = AgeValidator()
>>> validator.errors({'age': 'ten'})
{'age': [u'An integer is required.']}

This example works exactly the same in Python 2 and 3: in both cases the error messages are returned as a list of Unicode strings. But in Python 2 the output has the leading u indicator. Not so in Python 3.

What I needed to do is strip the Unicode indicator from the output strings before executing the test; then I’d have the Python 3 doctest I needed. So I wrote a tool that lets me do that.
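That stripping step is simple in principle; here is a minimal sketch of the idea (illustrative only, not DocSix's actual implementation):

```python
import re

# Match a u or U prefix immediately before a quote character.
U_PREFIX = re.compile(r"\b[uU](['\"])")

def normalize(expected):
    """Strip Unicode string prefixes from expected doctest output."""
    return U_PREFIX.sub(r"\1", expected)

print(normalize("{'age': [u'An integer is required.']}"))
# prints: {'age': ['An integer is required.']}
```

With the prefixes gone, the expected output matches what Python 3 actually prints, so the same doctest passes on both versions.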

DocSix lets you run your doctests on Python 2 and 3.

DocSix builds on Manuel, a library for mixing custom test syntax into doctests. DocSix can work with existing uses of Manuel, or it can load your doctests into a unittest TestSuite, ready to go:

from docsix import get_doctest_suite

test_suite = get_doctest_suite(
    'index.txt',  # the documents to load; paths here are illustrative
)


author:Nathan Yergler
tags:python, doctests, testing, python3

Revisiting Nested Formsets

It’s been nearly four years since I first wrote about nested formsets. At the time I must have been using Django 1.1 (based on correlating dates in the release notes and the original blog post), which means what I wrote has had four major releases of Django to drift out of date. And yet it’s still one of the most frequently visited posts on my blog, and one of the few that I receive email questions about. Four years later, it seemed like the time to revisit the original post to see if nested formsets still make sense and if so, what they look like now.

Formsets help manage the complexity of maintaining multiple instances of a Form on a single page. For example, if you’re editing a list of items on a single page, each individual item may be a copy of the same form. Formsets help manage things like HTML ID generation, flagging forms for deletion, and validating the entire set of forms together. When used with Models, they allow you to edit the members of a QuerySet all at once.

So what are nested formsets? The example I used previously was something along the lines of Block – Building – Tenant: one Block has many Buildings, and each Building has many Tenants. If you’re editing a Block, you want to see all the Buildings and all the Tenants at once. That’s a fine hypothetical, but one of the questions I get with some frequency is “what’s a good use case for a nested formset?” Four years later — two and a half of them spent doing web development full time — I have yet to encounter a situation where I needed a nested formset. In that time I’ve built some pretty complex forms, including Eventbrite’s event creation flow. That page was complex enough that I built Form Groups to support the interaction, and I think the jury is still out on whether that was a good idea or not. It’s possible that there are use cases for nested formsets in admin-style applications that I haven’t encountered. I think it’s also possible that there are reasons to use a nested formset alongside a Javascript framework to ease the user experience.

Note that if you only have one level of relationships on the page (ie, you’re editing all the Tenants for a single Building in our example) then you don’t need nested formsets: Django’s inline formsets will work just fine.

And why not nested formsets? From the questions people have asked and my experience building Form Groups (which borrowed some ideas), I’ve concluded that they’re difficult to get completely right, have edge cases that can be hard to manage, and create quite complicated user interfaces. In my original blog post I alluded to the fact that I spent most of a three day weekend trying to get the nested formsets to work right. Two thirds of that time was spent on work I eventually threw away, because I couldn’t manage the edge cases. It was only when I started using TDD that I managed to get something working. But I didn’t publish the tests with my previous code example, so no one else was able to benefit from that work.

If you’ve read this far and still think a nested formset is the best solution for your problem, what would that look like with Django 1.5? The answer is: simpler. I decided to rewrite my initial implementation using test driven development. The full implementation of the formset logic only overrides three methods from BaseInlineFormSet.

from django.forms.models import (
    BaseInlineFormSet,
    inlineformset_factory,
)

class BaseNestedFormset(BaseInlineFormSet):

    def add_fields(self, form, index):

        # allow the super class to create the fields as usual
        super(BaseNestedFormset, self).add_fields(form, index)

        form.nested = self.nested_formset_class(
            instance=form.instance,
            data=form.data if self.is_bound else None,
            prefix='%s-%s' % (
                form.prefix,
                self.nested_formset_class.get_default_prefix(),
            ),
        )

    def is_valid(self):

        result = super(BaseNestedFormset, self).is_valid()

        if self.is_bound:
            # look at any nested formsets, as well
            for form in self.forms:
                result = result and form.nested.is_valid()

        return result

    def save(self, commit=True):

        result = super(BaseNestedFormset, self).save(commit=commit)

        for form in self:
            # save the nested formset along with each parent form
            form.nested.save(commit=commit)
        return result

These three methods cover the four areas of functionality I called out in the previous post: validation (is_valid), saving (both existing and new objects are handled here by save), and instantiation (creating the nested formset instances, handled by add_fields).

By making it a general-purpose base class, I’m also able to write a simple factory function, to make using it more in tune with Django’s built-in model formset factories.

def nested_formset_factory(parent_model, child_model, grandchild_model):

    parent_child = inlineformset_factory(
        parent_model,
        child_model,
        formset=BaseNestedFormset,
    )

    parent_child.nested_formset_class = inlineformset_factory(
        child_model,
        grandchild_model,
    )

    return parent_child

You can find the source to this general purpose implementation on GitHub. I wrote tests at each step as I worked on this, so it may be interesting to go back and look at individual commits, as well.

So how would you use this with Django 1.5? With a class-based view, of course.

from django.views.generic.edit import UpdateView

class EditBuildingsView(UpdateView):
    model = models.Block

    def get_template_names(self):

        return ['blocks/building_form.html']

    def get_form_class(self):

        return nested_formset_factory(
            models.Block,
            models.Building,
            models.Tenant,
        )

    def get_success_url(self):

        return reverse('blocks-list')

Of course there’s more needed — templates, for one — but this shows just how easy it is to create the views and leverage a generic abstraction. The real keys here are specifying model = models.Block and the definition of get_form_class. Django’s UpdateView knows how to implement the basic form processing idiom (GET, POST, redirect), so all you need to do is tell it which form to use.

You can find a functional, albeit ugly, demo application in the demo directory of the git repository.

So that’s it: a general purpose, updated implementation of nested formsets. I advise using them sparingly :).

author:Nathan Yergler
tags:django, formsets, forms, python

Updating el-get and getelget.el

Last week one of my Emacs-using colleagues asked me how I managed my Emacs packages and configuration. Naturally I pointed him to el-get and my getelget.el bootstrap script. I’ve been happily managing my Emacs installation over the past five months using el-get and a private git repository for my configuration. However, when I tried to square my .emacs.d/init.el with the current el-get documentation, I got a little confused; el-get is now better at bootstrapping itself from within your Emacs configuration. When my colleague read this and asked why he might want getelget.el, my response was… well, lackluster; this is an attempt to document that a little better.

Last night I decided to do a little clean-up on my Emacs configuration, and see if I could get rid of getelget.el. The documentation for el-get is great, so I started there. What I quickly realized is that the included el-get bootstrap mechanism is great if you want to ensure el-get is installed and then use el-get-install, el-get-remove, etc to manage your packages. But if you define your package list in your config file, it’s not quite enough. Specifically, when you first bootstrap your configuration, you want to defer calling (el-get 'sync) until you’ve bootstrapped el-get. And on future runs, you want to go ahead and install any new packages that have been added to your list.

Luckily el-get has added support for hooks, which makes life a little easier. The new getelget.el (available here) looks something like this:

;; getelget -- el-get bootstrap script
;;
;; Checks to see if el-get has been checked out, and bootstraps it if
;; it has not. After bootstrapping, calls el-get to load specified
;; packages.
;;
;; el-get-packages should be defined before including this file. Any
;; definitions from el-get-sources will be appended to el-get-packages.
;;
;; Written in 2011 by Nathan R. Yergler
;;
;; To the extent possible under law, the person who associated CC0 with
;; getelget has waived all copyright and related or neighboring rights
;; to getelget.
;;
;; You should have received a copy of the CC0 legalcode along with this
;; work. If not, see .

;; add a hook listener for post-install el-get
(defun post-install-hook (pkg)
  ;; after installing el-get, load the local package list
  (if (string-equal pkg "el-get")
      (el-get 'sync
              (append el-get-packages
                      (mapcar 'el-get-source-name el-get-sources)))))
(add-hook 'el-get-post-install-hooks 'post-install-hook)

;; add the el-get directory to the load path
(add-to-list 'load-path
             (concat (file-name-as-directory user-emacs-directory)
                     (file-name-as-directory "el-get")))

;; try to require el-get
(if (eq (require 'el-get nil t) nil)

    ;; urp, need to bootstrap from the standard el-get installer
    (url-retrieve
     "https://raw.github.com/dimitri/el-get/master/el-get-install.el"
     (lambda (s)
       (end-of-buffer)
       (eval-print-last-sexp)))

  ;; successfully required el-get, load the packages!
  (post-install-hook "el-get"))

el-get also recommends splitting your package definitions from your local source recipes (which can themselves extend an included recipe). So getelget.el now expects you’ve defined two lists: el-get-packages, a list of packages to install from recipes, and el-get-sources, your local source list.

For example, I define a local recipe for magit that binds a key to magit-status and enables spell checking and fill mode for commit message editing:

(setq el-get-sources
      '((:name magit
               :after (lambda ()
                        (global-set-key "\C-x\r\r" 'magit-status)

                        ;; Enable spell checking, fill for log editing
                        (add-hook 'magit-log-edit-mode-hook
                                  (lambda ()
                                    (auto-fill-mode 1)
                                    (flyspell-mode 1)))))))

And my el-get-packages list is just a list of packages I’m installing from the included el-get recipes.

(setq el-get-packages
      '(el-get
        ;; ... the rest of your package list
        ))

Everything listed in both lists will be installed.


date:2011-09-26 09:59:24
category:development, tools
tags:emacs el-get getelget

super(self.__class__, self) # end of the line for subclassing

I’ve learned (and remembered) a lot in the past two months as I’ve gotten back to coding as my primary job. One thing that I guess I never quite internalized before is how super works. I have been bitten by code that looks something like the following a few times in the past month:
class A(object):
    def __init__(self):
        super(self.__class__, self).__init__()

class B(A):
    def __init__(self):
        super(B, self).__init__()
The surprise comes when I try to use my sub-class, B. Instantiating B() blows up the stack with: RuntimeError: maximum recursion depth exceeded while calling a Python object. What? According to the Python 2.7.2 standard library documentation, super “return[s] a proxy object that delegates method calls to a parent or sibling class of type.” So in the case of single inheritance, it delegates access to the super class; it does not return an instance of the super class. In the example above, this means that when you instantiate B, the following happens:
  1. enter B.__init__()
  2. call super on B and call __init__ on the proxy object
  3. enter A.__init__()
  4. call super on self.__class__ and call __init__ on the proxy object
The problem is that when we get to step four, self still refers to our instance of B, so calling super points back to A again. In technical terms: Ka-bloom.

TL;DR: super(self.__class__, self) may look like a neat trick, but it’s the end of the line for sub-classing.

Further reading: Raymond Hettinger’s excellent blog post on super provides a great overview of super and shows off the improved Python 3 syntax, which removes the need to write the class name as part of the super statement. I was really pleased to find the Python standard library documentation links directly to it.
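By contrast, naming the class explicitly anchors the lookup, so the chain terminates normally:

```python
class A(object):
    def __init__(self):
        # super(A, ...) always starts the lookup *after* A in the MRO,
        # no matter which subclass self actually is
        super(A, self).__init__()

class B(A):
    def __init__(self):
        super(B, self).__init__()

b = B()  # instantiates cleanly, no RuntimeError
```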
date: 2011-07-04 20:44:23
wordpress_id: 1990
layout: post
slug: super-self
category: development, python
tags: python, super

Managing my Emacs packages with el-get

Update (20 April 2011): I’ve now tried this on my old MacBook running OS X 10.5. The bootstrap script initially threw an error, which I tracked down to an outdated version of git. Once I upgraded git and installed bzr (used by the python-mode recipe), I started Emacs and was rewarded with a fully functioning installation, complete with the extensions I want.

I’m on vacation for two weeks between jobs, so of course this means it’s time to sharpen the tools (because writing programs to help you write programs is almost always more fun than actually writing programs). I’ve been an Emacs user for many years, and of course I’ve customized my installation with additional modes and extensions. Previously I would check out code that I needed into a vendor directory, and then load it manually in init.el. And this worked fine, but that doesn’t mean I [STRIKEOUT:can’t] won’t spend a chunk of my day making it better.

A friend mentioned el-get to me, and I decided to give it a try. I like the combination of recipes for installing common things, and the fact that your list of packages is very explicit in init.el (so if I need to dig into one of them, I know exactly where to begin). Additionally, since I’ll have a new computer issued for the new job, I also wanted to get things into shape so that I could easily replicate my preferred editing environment. I wound up creating a small bootstrap file to help things along, getelget.el.

getelget.el checks to see if el-get has been previously bootstrapped, and if not, performs the lazy installation procedure. After it makes sure el-get is available, it loads and executes el-get. So if you need to get a new machine up and going with Emacs and any extensions, you can drop in your init.el and getelget.el, and Emacs will take care of the rest.

To use getelget, define your el-get-sources like you normally would in init.el:

(setq el-get-sources
      '(;; your local recipes here
        ;; etc...
        ))

Then load getelget (the following assumes you have getelget.el in your user emacs directory along with init.el):

;; getelget -- bootstrap el-get if necessary and load the specified packages
(load-file
 (concat (file-name-as-directory user-emacs-directory) "getelget.el"))

getelget will handle bootstrapping, loading, and executing el-get.

getelget is pretty trivial; you can download it here, and I’ve waived any rights I may hold on the code using the CC0 Public Domain Dedication.

date:2011-04-19 22:17:05
category:development, tools
tags:el-get, emacs

CI at CC

I wrote about our roll-out of Hudson on the CC Labs blog. I wanted to note a few things about deploying that, primarily for my own reference. Hudson has some great documentation, but I found Joe Heck’s step by step instructions on using Hudson for Python projects particularly helpful. We’re using nose for most of our projects, and buildout creates a nosetest script wrapper that Hudson runs to generate pass/fail reports.

Setting up coverage is on the todo list, but it appears that our particular combination of libraries has at least one strange issue: when cc.license uses Jinja2 to load a template, coverage thinks it’s a Python source file (maybe it uses an import hook or something? haven’t looked) and tries to tokenize it when generating the xml report. Ka-boom. (This has apparently already been reported.)

Another item in the “maybe/someday” file is using Tox to run the tests using multiple versions of Python (example configuration for Tox + Hudson exists). I can see that this is a critical part of the process when releasing libraries for others to consume. We have slightly less surface area — all the servers run the same version of Python — but it’d be great to know exactly what our possible deployment parameters are.
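For reference, a minimal tox configuration for a nose-based project looks something like the following (the interpreter list and commands here are illustrative, not our actual setup):

```ini
[tox]
envlist = py25,py26

[testenv]
deps = nose
commands = nosetests
```

Each environment in envlist gets its own virtualenv, so the same test suite runs against every listed interpreter.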

Overall Hudson already feels like it’s adding to our sanity. I just received my copy of Continuous Delivery, so I think this is the start of something wonderful.

date:2010-08-20 10:37:43
category:cc, development
tags:cc, CI, coverage, Hudson, python, sanity

Back to the Future: Desktop Applications

One of the best prepared talks I saw at PyCon this year was on Phatch, a cross-platform photo processing application written in Python. Stani Michiels and Nadia Alramli gave a well rehearsed, compelling talk discussing the ins and outs of developing their application for Linux, Mac OS X, and Windows. The video is available from the excellent Python MiroCommunity.

The talk reminded of a blog post I saw late last year and never got around to commenting on, Ruby for Desktop Applications? Yes we can. Now I’m only a year late in commenting on it. This post caught my eye for two reasons. First, the software they discuss was commissioned by the AGI Goldratt Institute. I had heard about Goldratt from my father, whose employer, Trusted Manufacturing, was working on implementing constraints-based manufacturing as a way to reduce costs and distinguish themselves from the rest of the market. More interesting, though, was their discussion of how they built the application, and how it seemed to resonate with some of the work I did in my early days at CC.

Atomic wrote three blog posts (at least that I saw), and the one with the most text (as determined by my highly unscientific “page down” method) was all about how they “rolled” the JRuby application: how they laid out the source tree, how they compile Ruby source into Java JARs, and how they distribute a single JAR file with their application and its dependencies. I thought this was interesting because even though it uses a different language (Python instead of Ruby), GUI framework (wx instead of Swing/Batik), and runtime strategy (bundled interpreter instead of bytecode archive), the thing I spent the most time on when I was developing CC Publisher was deployment.

Like Atomic and Phatch, we had a single code base that we wanted to work across the major platforms (Windows, Linux, and Mac OS X in our case). The presentation about Phatch has some great information about making desktop-specific idioms work in Python, so I’ll let them cover that. Packaging and deployment was the biggest challenge, one we never quite got right.

On Windows, we used py2exe to bundle our Python runtime with the source code and dependencies. This worked most of the time, unless we forgot to specify a sub-package in our manifest, in which case it blew up in amazing and spectacular ways (not really). Like Atomic, we used NSIS for the Windows installer portion. On Mac OS X, we used py2app to do something similar, and distributed a disk image. On Linux… well, on Linux, we punted. We experimented with cx-freeze and flirted with autopackage. But nothing ever worked quite right [enough], so we wound up shipping tarballs.

The really appealing thing about Atomic’s approach is that by using a single JAR, you get to leverage a much bigger ecosystem of tools: the Java community has either solved, or has well defined idioms for, launching Java applications from JARs. You get launch4j and izpack, which look like great additions to the desktop developer’s toolbox.

For better or for worse, we [Creative Commons] decided CC Publisher wasn’t the best place to put our energy and time. This was probably the right decision, but it was a fun project to work on. (We do have rebooting CC Publisher listed as a suggested project for Google Summer of Code, if someone else is interested in helping out.) Given the maturity of Java’s desktop tool chain, and the vast improvements in Jython over the past year or two, I can imagine considering an approach very much like Atomic’s were I working on it today. Even though it seems like the majority of people’s attention is on web applications these days, I like seeing examples of interesting desktop applications being built with dynamic languages.

date:2010-03-30 09:04:03
tags:cc, ccpublisher, python

Using pip with buildout

I’ve been asked to add a blog to koucou, and this has turned out to be more of a learning experience than I expected. My first instinct was to use WordPress — I’m familiar with it, like the way it works, and I’m not interested in building my own. The one wrinkle was that we wanted to integrate the blog visually with the rest of the site, which is built on Django. I decided to give Mingus a try. This post isn’t about Mingus — I’ll write about that shortly — but rather about pip, which Mingus uses to manage dependencies. Mingus includes a requirements file with the stable dependencies for the application (one of its goals is application re-use, so there are a lot of them). As I mentioned previously, pip is the Python packaging/installation tool I have the least experience with, so I decided to try converting my existing project to pip as a starting point — to gain experience with pip, and to try and ease integration woes with Mingus.

When I started, the project used the following setup to manage dependencies and the build process:

  • Dependencies which have an egg or setuptools-compatible sdist available are specified in install_requires in setup.py:

        name = "soursop",
        # ... details omitted
        install_requires = ['setuptools',
                            # ...
                            ],
  • A buildout configuration that uses djangorecipe to install Django, and zc.recipe.egg to install the application egg and its dependencies

    [buildout]
    develop = .
    parts = django scripts
    unzip = true
    eggs = soursop

    [django]
    recipe = djangorecipe
    version = 1.1.1
    settings = settings
    eggs = ${buildout:eggs}
    project = soursop

    [scripts]
    recipe = zc.recipe.egg
    eggs = ${buildout:eggs}
    interpreter = python
    dependent-scripts = true
    extra-paths =
        ${django:location}
    initialization =
        import os
        os.environ['DJANGO_SETTINGS_MODULE'] = '${django:project}.${django:settings}'
  • Dependencies that didn’t easily install using setuptools (either they didn’t have a sane source-tree layout or weren’t available from PyPI) are either specified as git submodules or imported into the repository.

All this worked pretty well (although I’ve never really loved git submodules).

gp.recipe.pip is a buildout recipe which allows you to install a set of Python packages using pip. gp.recipe.pip builds on zc.recipe.egg, so it inherits all the functionality of that recipe (installing dependencies declared in setup.py, generating scripts, etc). So in that respect, I could simply replace the recipe line in the scripts part and start using pip requirements to install from source control, create editable checkouts, etc.

Previously, I used the ${buildout:eggs} setting to share a set of packages to install between the django part (which I used to generate a Django management script) and the scripts part (which I used to resolve the dependency list and install scripts defined as entry points). I didn’t spend much time looking into replicating this with gp.recipe.pip; it wasn’t immediately clear to me how to get a working set out of it that’s equivalent to an eggs specification (I’m not even sure it makes sense to expect such a thing).

Ignoring the issue of the management script, I simplified my buildout configuration, removing the django part and using gp.recipe.pip:

[buildout]
develop = .
parts = soursop
unzip = true
eggs = soursop
django-settings = settings
django-project = soursop

[soursop]
recipe = gp.recipe.pip
interpreter = python
eggs = ${buildout:eggs}
sources-directory = vendor
initialization =
    import os
    os.environ['DJANGO_SETTINGS_MODULE'] = '${buildout:django-project}.${buildout:django-settings}'

This allowed me to start specifying the resources I previously included as git submodules as pip requirements:

recipe = gp.recipe.pip
interpreter = python
install =      -r requirements.txt
eggs = ${buildout:eggs}
sources-directory = vendor

The install parameter specifies a series of pip dependencies that buildout will install when it runs. These can include version control URLs, recursive requirements (in this case, a requirements file, requirements.txt), and editable dependencies. In this case I’ve also specified a directory, vendor, in which editable dependencies will be installed.
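For illustration, a requirements file mixing those entry types might look like this (the package names and URLs are hypothetical):

```
# an exact version from PyPI
Django==1.1.1

# another requirements file, included recursively
-r more-requirements.txt

# an editable dependency, checked out from git into vendor/
-e git+git://github.com/example/some-app.git#egg=some-app
```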

That actually works pretty well: I can define my list of dependencies in a text file on its own, and I can move away from git submodules and vendor imports to specifying [D]VCS urls that pip will pull.

Unfortunately, I’m still missing my manage script. I wound up creating a small function and entry point to cause the script to be generated. In soursop/scripts.py, I created the following function:

def manage():
    """Entry point for Django manage command; assumes
    DJANGO_SETTINGS_MODULE has been set in the environment.

    This is a convenience for getting a ./bin/manage console script
    when using buildout."""

    from django.core import management
    from django.utils import importlib
    import os

    settings = importlib.import_module(os.environ.get('DJANGO_SETTINGS_MODULE'))

    management.execute_manager(settings)


In setup.py, I added an entry point:

entry_points = {
    'console_scripts': [
        'manage = soursop.scripts:manage',
    ],
},
Re-run buildout, and a manage script appears in the bin directory. Note that I’m still using the environment variable, DJANGO_SETTINGS_MODULE, to specify which settings module we’re using. I could specify the settings module directly in my manage script wrapper. I chose not to do this because I wanted to emulate the behavior of djangorecipe, which lets you change the settings module in buildout.cfg (i.e., from development to production settings). This is also the reason I have custom initialization code specified in my buildout configuration.

Generally I really like the way this works. I’ve been able to eliminate the tracked vendor code in my project, as well as the git submodules. I can easily move my pip requirements into a requirements file and specify it with -r in the install line, separating dependency information from build information.

There are a couple things that I’m ambivalent about. Primarily, I now have my dependencies declared in two different places, setup.py and a requirements file, and each has advantages (which are correspondingly disadvantages for the other). Specifying the requirements in the pip requirements file gives me more flexibility — I can install from subversion, git, or mercurial without even thinking about it. But if someone installs my package from a source distribution using easy_install or pip, the dependencies won’t necessarily be satisfied [1] [2]. And conversely, specifying the requirements in setup.py allows everyone to introspect them at installation time, but sacrifices the flexibility I’ve gained from pip.

I’m not sure that we’ll end up using Mingus for koucou, but I think we’ll stick with gp.recipe.pip. The disadvantage is a small one (at least in this situation), and it’s not really any worse than the previous situation.

[1] I suppose I could provide a bundle for pip that includes the dependencies, but the documentation doesn’t make that seem very appealing.
[2] Inability to install my Django application from an sdist isn’t really a big deal: the re-use story just isn’t good enough (in my opinion) to have it make sense. Generally, however, I like to be able to install a package and pull in the dependencies as well.
date:2010-03-28 13:05:22
tags:dependencies, django, koucou, pip, python, scm, zc.buildout

For Some Definition of “Reusable”

I read “Why I switched to Pylons after using Django for six months” yesterday, and it mirrors something I’ve been thinking about off and on for the past year or so: what is the right level of abstraction for reuse in web applications? I’ve worked on two Django-based projects over the past 12-18 months: CC Network and koucou. Neither is what I’d call “huge”, but in both cases I wanted to re-use existing apps, and in both cases it felt… awkward.

Part of this awkwardness is probably the impedance mismatch of the framework and the toolchain: Django applications are Python packages. The Python tools for packaging and installing (distutils, setuptools, distribute, and pip, I think, although I have the least experience with it) work on “module distributions”1: some chunk of code with a setup.py. This is as much a “social” issue as a technology one: the documentation and tools don’t encourage the “right” kind of behavior, so talk of re-usable applications is often just hand waving or, at best, reinvention2.

In both cases we consciously chose Django for what I consider its killer app: the admin interface. But there have been re-use headaches. [NB: What follows is based on our experience, which is setuptools and buildout based] The first one you encounter is that not every developer of a reusable app has made it available on PyPI. If they’re using Subversion you can still use it with setuptools, but when the code is in git, we have some additional work (a submodule or another buildout recipe). I understand pip just works with the most common [D]VCSs, but haven’t used it myself. Additionally, they aren’t all structured as projects, and those that are don’t always declare their dependencies properly3. And finally there’s the “real” issues of templates, URL integration, etc.

I’m not exactly sure what the answer is, but it’s probably 80% human (as opposed to technology). Part of it is practicing good hygiene: writing your apps with relocatable URLs, using proper URL reversal when generating intra-applications URLs, and making sure your templates are somewhat self-contained. But even that only gets you so far. Right now I have to work if I want to make my app easily consumable by others; work, frankly, sucks.

Reuse is one area where I think Zope 3 (and its derived frameworks, Grok and repoze.bfg) have an advantage: if you’re re-using an application that provides a particular type of model, for example, all you need to do is register a view for it to get a customized template. The liberal use of interfaces to determine context also helps smooth over some of the URL issues4. Just as, or more, importantly, they have a strong culture of writing code as small “projects” and using tools like buildout to assemble the final product.

Code reuse matters, and truth in advertising matters just as much or more. If we want to encourage people to write reusable applications, the tools need to support that, and we need to be explicit about what the benefits we expect to reap from reuse are.

1 Of course you never actually see these referred to as module distributions; always projects, packages, eggs, or something else.

2 Note that I’m not saying that Pylons gets the re-use story much better; the author admits choosing Django at least in part because of the perceived “vibrant community of people writing apps” but found himself more productive with Pylons. Perhaps he entered into that with different expectations? I think it’s worth noting that we chose Django for a project, in part, for the same reason, but with different expectations: not that the vibrant community writing apps would generate reusable code, but that it would educate developers we could hire when the time came.

3 This is partially due to the current state of Python packaging: setuptools and distribute expect the dependency information to be included in the package metadata; pip specifies it in a requirements file.

4 At least when dealing with graph-based traversal; it could be true in other circumstances, I just haven’t thought about it enough.

date:2010-03-09 18:38:54
tags:django, python, web, zope

i18n HTML: Bring the Pain

I have to stay up a little later this evening than I’d planned, so as a result I’m finally going through all the tabs and browser windows I’ve had open on my personal laptop. I think some of these have been “open” for months (yes, there have been browser restarts, but they’re always there when the session restores). One that I’ve been meaning to blog about is Wil Clouser’s post on string substitution in .po files. It’s actually [at least] his second post on the subject, recanting his prior advice, coming around to what others told him previously: don’t use substitution strings in .po files.

I wasn’t aware of Wil’s previous advice, but had I read it when first published, I would have nodded my head vigorously; after all, that’s how we did it. Er, that’s how we, uh, do it. And we’re not really in a position to change that at the moment, although we’ve certainly looked pretty hard at the issue.

A bit of background: One of the core pieces of technology we’ve built at Creative Commons is the license chooser. It’s a relatively simple application, with a few wrinkles that make it interesting. It manages a lot of requests, a lot of languages, and has to spit out the right license (type, version, and jurisdiction) based on what the user provides. The really interesting thing it generates is some XHTML with RDFa that includes the license badge, name, and any additional information the user gives us; it’s this metadata that we use to generate the copy and paste attribution HTML on the deed. So what does this have to do with internationalization? The HTML is internationalized. And it contains substitutions. Yikes.
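The shape of the problem, reduced to a single invented string: the translatable unit is HTML, and it carries substitutions, so both the markup and the placeholders have to survive translation intact.

```python
# An invented example of the kind of string the chooser deals with:
# the translatable text is HTML *and* contains named substitutions.
# The Spanish text and URL below are illustrative, not from our catalogs.
translated = (
    'Esta obra est\u00e1 bajo una '
    '<a href="%(license_url)s">%(license_name)s</a>.'
)

html = translated % {
    "license_url": "https://creativecommons.org/licenses/by/3.0/",
    "license_name": "Licencia Creative Commons Atribuci\u00f3n 3.0",
}
print(html)
```

If a translator mangles the anchor tag or misspells a placeholder, the output is broken markup or a KeyError at render time — which is exactly why substitutions in .po files make people nervous.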

To follow in the excellent example of AMO and Gnome, we’d start using English as our msgids, leaving behind the current symbolic keys of the past. Unfortunately it’s not quite so easy. Every time we look at this issue (and for my first year as CTO we really looked; Asheesh can attest we looked at it again and again) and think we’ve got it figured out, we realize there’s another corner case that doesn’t quite work.
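For anyone who hasn’t stared at both styles, the difference looks like this (entries invented for illustration):

```po
# Symbolic "CC style" msgid -- the key says nothing in any language:
msgid "license.chooser.title"
msgstr "Choisir une licence"

# English msgid, AMO/Gnome style -- the key *is* the source text:
msgid "Choose a License"
msgstr "Choisir une licence"
```

English msgids give translators context for free and let untranslated strings fall back to something readable; symbolic keys keep the source text out of the code, at the cost of a lookup layer.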

The real issue with the HTML is the HTML: zope.i18n, our XSLT selectors, the ZPT parse tree: none of them really play all that well with HTML msgids. The obvious solution would be to get rid of the HTML in translation, and we’ve tried doing that, although we keep coming back to our current approach. I guess we’re always seduced by keeping all the substitution in one place, and traumatized by the time we tried assembling the sentences from smaller pieces.

So if we accept that we’re stuck with the symbolic identifiers, what do we do? Build tools, of course. This wasn’t actually an issue until we started using a “real” translation tool — Pootle, to be specific. Pootle is pretty powerful, but some of the features depend on having “English” msgids. Luckily it has no qualms about HTML in those msgids, it has decent VCS support, and we know how to write post-commit hooks.

To support Pootle and provide a better experience for our translators, we maintain two sets of PO files: the “CC style” symbolic msgid files, and the “normal” English msgid files. We keep a separate “master” PO file where the msgid is the “CC style” msgid, and the “translation” is the English msgid. It’s this file that we update when we need to make changes, and luckily using that format actually makes the extraction work the way it’s supposed to. Or close. And when a user commits their work from Pootle (to the “normal” PO file), a post-commit hook keeps the other version in sync.
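The two-layer arrangement can be sketched with plain dicts standing in for the PO files (a real implementation would use a PO library such as polib; every string here is invented):

```python
# Plain dicts standing in for three PO catalogs. The "master" maps the
# symbolic CC-style msgid to its English text; translators work against
# the English msgids in Pootle; a sync step (our post-commit hook)
# rebuilds the CC-style catalog that the application actually loads.
master = {  # CC-style msgid -> English msgid
    "license.chooser.title": "Choose a License",
    "license.chooser.submit": "Select a License",
}

english_po = {  # English msgid -> translation (what Pootle edits)
    "Choose a License": "Choisir une licence",
    "Select a License": "S\u00e9lectionner une licence",
}

def sync(master, english_po):
    """Rebuild the CC-style catalog from the English-msgid catalog."""
    return {
        symbolic: english_po.get(english, english)  # fall back to English
        for symbolic, english in master.items()
    }

cc_style_po = sync(master, english_po)
print(cc_style_po["license.chooser.title"])
```

The fallback in sync() mirrors what gettext gives you for free with English msgids: an untranslated string degrades to English instead of to a bare symbolic key.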

While we’ve gotten a lot better at this and have learned to live with this system, it’s far from perfect. The biggest imperfection is its custom nature: I’m still the “expert”, so when things go wrong, I get called first. And when people want to work on the code, it takes some extra indoctrination before they’re productive. My goal is still to get to a single set of PO files, but for now, this is what we’ve got. Bring the pain.

For a while, at least. We’re working on a new version of the chooser driven by the license RDF. This will be better for re-use, but not really an improvement in this area.

This works great in English, but in languages where gender is more strongly expressed in the word forms, uh, not so much.

date:2010-03-01 23:21:20
category:cc, development
tags:cc, i18n, license engine, zope