Unicode output from Zope 3

The Creative Commons license engine has gone through several iterations, the most recent being a Zope 3 / Grok application. This has actually been a great implementation for us [1]_, but since the day it was deployed there’s been a warning in `README.txt <http://code.creativecommons.org/svnroot/cc.engine/trunk/README.txt>`_:

If you get a UnicodeDecodeError from the cc.engine (you’ll see this if it’s running in the foreground) when you try to access the http://host:9080/license/ then it’s likely that the install of python you are using is set to use ASCII as its default output.  You can change this to UTF-8 by creating the file /usr/lib/python<version>/sitecustomize.py and adding these lines:

  import sys
  sys.setdefaultencoding("utf-8")

This always struck me as a bit inelegant — having to muck with something outside my application directory. After all, this belief that the application should be self-contained is the reason I use zc.buildout and share Jim’s belief in the evil of the system Python. Like a lot of inelegant things, though, it never rose quite to the level of annoyance needed to motivate me to do it right.

Today I was working on moving the license engine to a different server [2]_ and ran into this problem again. I decided to dig in and see if I could track it down. In fact I did track down the initial problem: I was comparing an encoded string with a Unicode string without specifying an explicit codec to use for the decode. Unfortunately, once I fixed that I found it was turtles all the way down.
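
A minimal sketch of that kind of fix, not the actual cc.engine code: the values and the ``same_text`` helper are made up for illustration, and it is written for Python 3, where bytes are never decoded implicitly. The idea is simply to name the codec yourself rather than leave the decode to the interpreter's default.

```python
# Hypothetical values for illustration; not taken from cc.engine.
encoded = "Espa\u00f1ol".encode("utf-8")  # encoded bytes, e.g. read from disk
expected = "Espa\u00f1ol"                 # Unicode text


def same_text(raw, text, codec="utf-8"):
    """Compare encoded bytes with text, decoding with an explicit codec."""
    return raw.decode(codec) == text


# Decoding the same bytes as ASCII is the step that blows up.
try:
    encoded.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False
```

With the codec spelled out, the comparison no longer depends on how the local Python happens to be configured.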

Turns out the default Zope 3 page template machinery uses `StringIO <http://www.python.org/doc/lib/module-StringIO.html>`_ to collect the output. StringIO uses, uh, strings: 8-bit strings in the default system encoding. According to the module documentation, a StringIO can hold both 8-bit strings and Unicode, but if you mix the two, any 8-bit string that is not plain 7-bit ASCII raises a UnicodeError when ``getvalue()`` is called. That is exactly this sort of issue.
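
To make the failure mode concrete, here is a rough sketch of collecting output with every byte string decoded explicitly before it reaches the buffer. It is an assumption-laden illustration, not Zope's page template code: ``collect`` is a hypothetical helper, and it uses Python 3's ``io.StringIO``, which accepts only text.

```python
import io


def collect(chunks, codec="utf-8"):
    """Join output chunks, decoding any byte strings with an explicit codec."""
    out = io.StringIO()
    for chunk in chunks:
        if isinstance(chunk, bytes):
            # Decode here, so nothing downstream has to guess the encoding.
            chunk = chunk.decode(codec)
        out.write(chunk)
    return out.getvalue()


# A mix of text and UTF-8 encoded bytes, as a template might produce.
page = collect(["<p>", "caf\u00e9".encode("utf-8"), "</p>"])
```

The point of the sketch is that the decode happens once, at the boundary, with a codec chosen by the application rather than by the system default.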

Andres suggested marking my templates as UTF-8 XML using something like:

<?xml version="1.0" encoding="UTF-8"?>

but even after doing this and fixing the resulting entity errors, there are obviously still some 8-bit strings leaking into the output. In conversations on IRC the question was then asked: “is there a reason you don’t want a reasonable system wide encoding if your locale can support it?”

I guess not [3]_.

UPDATE Martijn has a tangentially related post which sheds some light on why Python does/should ship with ascii as the default codec. At least people smarter than me have problems with this sort of thing, too.

.. [1] Yes, I may be a bit biased: I wrote the Zope3/Grok implementation. Of course, I wrote the previous implementation, too, and I can say without a doubt it was… “sub-optimal”.
.. [2] We’re doing a lot of shuffling lately to complete a 32 to 64 bit conversion; see the CC Labs blog post for the harrowing details.
.. [3] So the warning remains.
date:2008-07-19 12:57:33
category:cc, development
tags:cc, development, license engine, python, zope

Sane Merging in SVN

We use Subversion for version control at work. We try to version control everything: code, content, graphics, site configuration. Everything. This does wonders for our sanity, but we can do more. Recently (n < 6 months) we’ve started doing something we should have done from Day 1: develop in one tree, deploy in another. In our case we’re developing in the trunk, and there’s a long-lived branch cleverly named production. This is great, with one little problem: cherry-picking revisions to merge in Subversion is a pain in the ass.

Last week I was looking at the upcoming features in Subversion 1.5. Asheesh had pointed out the merge tracking feature, which sounded lovely. And it probably will be. The thing I discovered, though, is that you can get it today in the form of svnmerge.

svnmerge allows you to track what revisions you’ve merged from a branch (or trunk), block certain revisions (features you might not want to deploy just yet), and perform the merges when that time arrives (including generating a nice commit message containing all the log messages you’re merging; handy). I spent an hour yesterday and an hour today getting the merges recorded for the packages I’m currently working on, and it already feels better. No more wondering if I remembered to merge something; just run svnmerge avail and see if anything shows up.

Sure, it’ll be great to get this feature into the core application (and an interactive mode à la darcs would be slick, too), but to paraphrase Scarlett O’Hara, “with god as my witness, I will never svn merge again”.

date:2007-11-06 13:32:25
tags:branch, development, subversion