Pre-read: Grok 1.0 Web Development

|image0|Late last month I received an email from Packt Publishing (en.wp), asking if I’d be interested in reviewing one of their new titles, `Grok 1.0 Web Development <http://www.packtpub.com/grok-1-0-web-development/book?utm_source=yergler.net&utm_medium=bookrev&utm_content=blog&utm_campaign=mdb_002632>`_, by Carlos de la Guardia. I immediately said yes, with the caveat that I’m traveling a lot over the next 30 days, so the review will be a little delayed (hence this pre-review). I said “yes” because Grok is one of the Python web frameworks that’s most interesting to me these days. It’s interesting because one of its underlying goals is to take concepts from [STRIKEOUT:Zope 3]Zope Toolkit, and make them more accessible and less daunting. These concepts — the component model, pluggable utilities, and graph-based traversal — are some of the most powerful tools I’ve worked with during my career. And of course, they can also be daunting, even to people with lots of experience; making them more accessible is a good thing.

I’ve read the first four chapters of Grok 1.0 Web Development, and so far there’s a lot to like. It’s the sort of documentation I wish I’d had when I ported the Creative Commons license chooser to Grok1. I’m looking forward to reading the rest, and will post a proper review when I return from Nairobi. In the mean time, check out Grok, Zope 3 for cavemen.

You can download a preview from Grok 1.0 Web Development, `Chapter 5: Forms </media/2010/03/7481-grok-1-0-Web-development-sample-chapter-5-forms.pdf>`_.


1 The CC license chooser has evolved a lot over the years; shortly after Grok was launched we adopted many of its features as a way to streamline the code. Grok’s simplified support for custom traversal, in particular, was worth the effort.

date:2010-03-16 09:14:50
wordpress_id:1567
layout:post
slug:pre-read-grok-1-0-web-development
comments:
category:reading
tags:cc, grok, pre-read, python, reading, zope

For Some Definition of “Reusable”

I read “Why I switched to Pylons after using Django for six months” yesterday, and it mirrors something I’ve been thinking about off and on for the past year or so: what is the right level of abstraction for reuse in web applications? I’ve worked on two Django-based projects over the past 12-18 months: CC Network and koucou. Neither is what I’d call “huge”, but in both cases I wanted to re-use existing apps, and in both cases it felt… awkward.

Part of this awkwardness is probably the impedance mismatch of the framework and the toolchain: Django applications are Python packages. The Python tools for packaging and installing (distutils, setuptools, distribute, and pip, I think, although I have the least experience with it) work on “module distributions1: some chunk of code with a setup.py. This is as much a “social” issue as a technology one: the documentation and tools don’t encourage the “right” kind of behavior, so talk of re-usable applications is often just hand waving or, at best, reinvention2.

In both cases we consciously chose Django for what I consider its killer app: the admin interface. But there have been re-use headaches. [NB: What follows is based on our experience, which is setuptools and buildout based] The first one you encounter is that not every developer of a reusable app has made it available on PyPI. If they’re using Subversion you can still use it with setuptools, but when re-using with git, we have some additional work (a submodule or another buildout recipe). I understand pip just works with the most commons [D]VCS, but haven’t used it myself. Additionally, they aren’t all structured as projects, and those that are don’t always declare their dependencies properly3. And finally there’s the “real” issues of templates, URL integration, etc.

I’m not exactly sure what the answer is, but it’s probably 80% human (as opposed to technology). Part of it is practicing good hygiene: writing your apps with relocatable URLs, using proper URL reversal when generating intra-applications URLs, and making sure your templates are somewhat self-contained. But even that only gets you so far. Right now I have to work if I want to make my app easily consumable by others; work, frankly, sucks.

Reuse is one area where I think Zope 3 (and it’s derived frameworks, Grok and repoze.bfg) have an advantage: if you’re re-using an application that provides a particular type of model, for example, all you need to do is register a view for it to get a customized template. The liberal use of interfaces to determine context also helps smooth over some of the URL issues4. Just as, or more, importantly, they have a strong culture of writing code as small “projects” and using tools like buildout to assemble the final product.

Code reuse matters, and truth in advertising matters just as much or more. If we want to encourage people to write reusable applications, the tools need to support that, and we need to be explicit about what the benefits we expect to reap from reuse are.


1 Of course you never actually see these referred to as module distributions; always projects, packages, eggs, or something else.

2 Note that I’m not saying that Pylons gets the re-use story much better; the author admits choosing Django at least in part because of the perceived “vibrant community of people writing apps” but found himself more productive with Pylons. Perhaps he entered into that with different expectations? I think it’s worth noting that we chose Django for a project, in part, for the same reason, but with different expectations: not that the vibrant community writing apps would generate reusable code, but that they would education developers we could hire when the time came.

3 This is partially due to the current state of Python packaging: setuptools and distribute expect the dependency information to be included in setup.py; pip specifies it in a requirements file.

4 At least when dealing with graph-based traversal; it could be true in other circumstances, I just haven’t thought about it enough.

date:2010-03-09 18:38:54
wordpress_id:1539
layout:post
slug:for-some-definition-of-reusable
comments:
category:development
tags:django, python, web, zope

i18n HTML: Bring the Pain

I have to stay up a little later this evening than I’d planned, so as a result I’m finally going through all the tabs and browser windows I’ve had open on my personal laptop. I think some of these have been “open” for months (yes, there have been browser restarts, but they’re always there when the session restores). One that I’ve meant to blog is Wil Clouser’s post on string substitution in .po files. It’s actually [at least] his second post on the subject, recanting his prior advice, coming around to what others told him previously: don’t use substitution strings in .po files.

I wasn’t aware of Chris’s previous advice, but had I read it when first published, I would have nodded my head vigorously; after all, that’s how we did it. Er, that’s how we, uh, do it. And we’re not really in a position to change that at the moment, although we’ve certainly looked pretty hard at the issue.

A bit of background: One of the core pieces of technology we’ve built at Creative Commons is the license chooser. It’s a relatively simple application, with a few wrinkles that make it interesting. It manages a lot of requests, a lot of languages, and has to spit out the right license (type, version, and jurisdiction) based on what the user provides. The really interesting thing it generates is some XHTML with RDFa that includes the license badge, name, and any additional information the user gives us; it’s this metadata that we use to generate the copy and paste attribution HTML on the deed. So what does this have to do with internationalization? The HTML is internationalized. And it contains substitutions. Yikes.

To follow in the excellent example of AMO and Gnome, we’d start using English as our msgids, leaving behind the current symbolic keys of the past. Unfortunately it’s not quite so easy. Every time we look at this issue (and for my first year as CTO we really looked; Asheesh can atest we looked at it again and again) and think we’ve got it figured out, we realize there’s another corner case that doesn’t quite work.

The real issue with the HTML is the HTML: zope.i18n, our XSLT selectors, the ZPT parse tree: none of them really play all that well with HTML msgids. The obvious solution would be to get rid of the HTML in translation, and we’ve tried doing that, although we keep coming back to our current approach. I guess we’re always seduced by keeping all the substitution in one place, and traumatized by the time we tried assembling the sentences from smaller pieces.

So if we accept that we’re stuck with the symbolic identifiers, what do we do? Build tools, of course. This wasn’t actually an issue until we started using a “real” translation tool — Pootle, to be specific. Pootle is pretty powerful, but some of the features depend on having “English” msgids. Luckily it has no qualms about HTML in those msgids, it has decent VCS support, and we know how to write post-commit hooks.

To support Pootle and provide a better experience for our translators, we maintain two sets of PO files: the “CC style” symbolic msgid files, and the “normal” English msgid files. We keep a separate “master” PO file where the msgid is the “CC style” msgid, and the “translation” is the English msgid. It’s this file that we update when we need to make changes, and luckily using that format actually makes the extraction work the way it’s supposed to. Or close. And when a user commits their work from Pootle (to the “normal” PO file), a post-commit hook keeps the other version in sync.

While we’ve gotten a lot better at this and have learned to live with this system, it’s far from perfect. The biggest imperfection is its custom nature: I’m still the “expert”, so when things go wrong, I get called first. And when people want to work on the code, it takes some extra indoctrination before they’re productive. My goal is still to get to a single set of PO files, but for now, this is what we’ve got. Bring the pain.


For a while, at least. We’re working on a new version of the chooser driven by our the license RDF. This will be better for re-use, but not really an improvement in this area.

This works great in English, but in languages where gender is more strongly expressed in the word forms, uh, not so much.

date:2010-03-01 23:21:20
wordpress_id:1501
layout:post
slug:i18n-html-bring-the-pain
comments:
category:cc, development
tags:cc, i18n, license engine, zope

Unicode output from Zope 3

The Creative Commons licene engine has gone through several iterations, the most recent being a Zope 3 / Grok application. This has actually been a great implementation for us[1]_, but since the day it was deployed there’s been a warning in `README.txt <http://code.creativecommons.org/svnroot/cc.engine/trunk/README.txt>`_:

If you get a UnicodeDecodeError from the cc.engine (you’ll see this if it’srunning in the foreground) when you try to access the http://host:9080/license/then it’s likely that the install of python you are using is set to use ASCIIas it’s default output.  You can change this to UTF-8 by creating the file/usr/lib/python<version>/sitecustomize.py and adding these lines:

  import sys
  sys.setdefaultencoding(“utf-8”)

This always struck me as a bit inelegant — having to muck with something outside my application directory. After all, this belief that the application should be self-contained is the reason I use zc.buildout and share Jim’s belief in the evil of the system Python. Like a lot of inelegant things, though, it never rose quite to the level of annoyance needed to motivate me to do it right.

Today I was working on moving the license engine to a different server[2]_ and ran into this problem again. I decided to dig in and see if I could track it down. In fact I did track down the initial problem — I was making a comparison between an encoded Unicode string and without specifying an explicit codec to use for the decode. Unfortunately once I fixed that I found it was turtles all the way down.

Turns out the default Zope 3 page template machinery uses `StringIO <http://www.python.org/doc/lib/module-StringIO.html>`_ to collect the output. StringIO uses, uh, strings — strings with the default system encoding. Reading the module documentation, it would appear that mixing String and Unicode input in your StringIO will cause this sort of issue.

Andres suggested marking my templates as UTF-8 XML using something like:

< ?xml version="1.0" encoding="UTF-8" ?>

but even after doing this and fixing the resulting entity errors, there’s still obviously some 8 bit Strings leaking into the output. In conversations on IRC the question was then asked: “is there a reason you don’t want a reasonable system wide encoding if your locale can support it?”

I guess not[3]_.

UPDATE Martijn has a tangentially related post which sheds some light on why Python does/should ship with ascii as the default codec. At least people smarter than me have problems with this sort of thing, too.


[1]Yes, I may be a bit biased — I wrote the Zope3/Grok implementation. Of course, I wrote the previous implementation, too, and I can say without a doubt it was… “sub-optimal”.
[2]We’re doing a lot of shuffling lately to complete a 32 to 64 bit conversion; see the CC Labs blog post for the harrowing details.
[3]So the warning remains.
date:2008-07-19 12:57:33
wordpress_id:563
layout:post
slug:unicode-output-from-zope-3
comments:
category:cc, development
tags:cc, development, license engine, python, zope