ccPublisher, Python and XML

So two days ago I launched the first Developer Preview of ccPublisher 2 for Linux, promising Windows and Mac OS X builds “within the day.” It’s been two days, they’re not uploaded, what’s going on? Funny you should ask. It actually has a lot to do with something else that’s been generating a lot of discussion lately on Python blogs: XML.

Philip J. Eby, the mastermind behind things like PEAK and Python Eggs, wrote a blog post last month titled “Chandler Begins Recovery from XML.” This follows his self-described rant from late last year, “Python is not Java,” where he took developers to task for, among other things, turning to XML as the solution to all your data and configuration woes. The gist was, it might work for Java, but when mixed with Python it’s nothing but a boat anchor. So how is Chandler “recovering” from XML? By dumping it. Their system for extending Chandler, parcels, previously used an XML file to define extension points and connections (roughly; I won’t claim really deep knowledge here). The new system, championed by PJE, uses Python syntax and code (descriptors, registrations, etc.) to accomplish the same thing. PJE’s argument, as I read it, hinges not on the idea that XML is inherently evil, but rather that using XML is often a sign of over-engineering. As a believer in YAGNI (Ya Ain’t Gonna Need It) in software development, I can agree with that.

So what does this have to do with ccPublisher 2, and more importantly the delayed Developer Preview packages? Let me address the two parts of that question in sequence.

First, what does it have to do with ccPublisher 2? A major design goal of ccPublisher 2 is enabling third-party contributions, in the form of extensions and derivative applications. We’re doing this in a number of ways, including basic things like improved documentation. A major tactic, though, is the use of loosely coupled pieces of code that are intentionally ignorant about one another. For example, an MP3 file contains metadata in the form of ID3 tags. The object that wraps the generic file doesn’t know this, but it knows it can say “Hey, all you components, anyone know anything about this here file-thingy?” and an adapter object will respond with everything it knows. So in theory (and in practice, actually; this mostly works already) you can swap out or add objects that respond without major surgery. A huge improvement over the ccPublisher 1 codebase.

All these bits of code are tied together by XML files that describe subscriptions, adapters and interfaces. I chose the ZCML format, developed as part of the Zope3 project, because I was familiar with it, and because I was reasonably confident I could use code from Zope3 to make my life easier. And it turns out I was right: ZCML was reasonably easy to separate from Zope3. It’s also made life somewhat easier, and it will let non-coders who need customized metadata fields add them relatively easily (note that I haven’t actually decided if non-coders will need to do this; it’s just the easiest rationalization right now).

So after reading Philip’s rant(s) and background on deprecating XML configuration files in Chandler, I started thinking about the suitability of ZCML for the task at hand. ZCML makes a lot of sense for Zope3 — a big advantage (in my mind) of Zope3 over previous versions is that (in theory) you can take existing classes that model data or behavior and use them in Zope without making them Zope-specific. In that case moving the configuration and registration into external files helps with that goal. ccPublisher doesn’t have that goal or that baggage — anything used in ccPublisher will probably be ccPublisher-ized in some way. I’m not convinced that ZCML is the wrong choice for ccPublisher, but the talk has had the effect of making me think about it more now than I did earlier.

Now, on to the second question: why the delay? Well, it turns out that ZCML makes life a bit more difficult when packaging your code. Linux wasn’t a problem; you just use distutils and specify a recursive-include. Windows is a different story. We’re using py2exe, which means there are two problems. First, py2exe ignores that directive when finding modules to include. This makes a certain perverse sense, but it still bites you in the ass. Second, after hacking up a script to include the ZCML alongside the Python byte-code, you [I] realize something: the byte-code is in a ZIP file, and your code doesn’t traverse into ZIP files (à la PEP 302) to retrieve the ZCML resources properly. Additionally, even though you can set up a dummy tree alongside containing the ZCML, the Python pathing makes things, well, ugly. Really ugly. Sigh.

So ccPublisher 2 Developer Preview is slightly delayed on Windows while we make some retrofits to the code. The solution I’ve decided on is Python Eggs. Eggs let you package your Python code, make explicit declarations about dependencies and (most importantly for this situation) access non-code resources stored in the package.
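To make the ZIP problem concrete, here is a minimal sketch of the difference between treating a package as a directory and asking its loader for the data. It uses the standard library’s `pkgutil.get_data` as a stand-in, not the Eggs runtime API, and the package name `demo_pkg` and the file contents are invented for the example.

```python
import os
import pkgutil
import sys
import tempfile
import zipfile

# Build a throwaway ZIP archive holding a tiny package plus a ZCML file.
# "demo_pkg" and the file contents are made up for this sketch.
workdir = tempfile.mkdtemp()
archive = os.path.join(workdir, "library.zip")
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("demo_pkg/__init__.py", "")
    zf.writestr("demo_pkg/configure.zcml", "<configure />\n")

# Put the archive on the import path, the way a zipped library would be.
sys.path.insert(0, archive)
import demo_pkg

# Naive approach: assume the package lives in a real directory on disk.
naive_path = os.path.join(os.path.dirname(demo_pkg.__file__), "configure.zcml")
naive_works = os.path.exists(naive_path)  # the path points inside the ZIP

# Loader-aware approach: let the import machinery (PEP 302) fetch the bytes.
zcml = pkgutil.get_data("demo_pkg", "configure.zcml")
```

The naive path points inside the archive, so ordinary file calls cannot see it, while the loader-aware call returns the file’s bytes wherever the package happens to live.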

So, interestingly, PJE appears to have the ability to spark concern as well as solve weird edge-case problems.

date:2005-09-08 16:56:35
category:development, python

ZConfig 2.0 Released

I just saw on the Python Daily URL that ZConfig 2.0 is out. I migrated our internal backup software to ZConfig from Python’s included ConfigParser over the summer after cursing one too many times at its inability to do intelligent type checking. While it was a pain in the ass to create the schema and grok the datatypes set up, it was well worth it. Finally, config files you can actually read!
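For anyone who hasn’t hit the type-checking complaint, here is a small sketch of it using the standard library parser (spelled `configparser` in Python 3). The `[backup]` section and its option names are invented for illustration; this is not our actual backup configuration, and it does not show ZConfig itself.

```python
import configparser

# A made-up config with one bad value: "retries" should be a number.
parser = configparser.ConfigParser()
parser.read_string("""
[backup]
retries = three
keep_days = 14
""")

# Loading succeeds; every value is just a string at this point.
raw_retries = parser.get("backup", "retries")

# Types are only checked when the calling code asks for a conversion.
keep_days = parser.getint("backup", "keep_days")
try:
    parser.getint("backup", "retries")
    bad_value_caught = False
except ValueError:
    bad_value_caught = True
```

The bad value only surfaces when some piece of code calls `getint`, which may be long after the file was loaded. A schema lets the file be checked once, up front.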

date:2003-11-03 09:30:43

XML generation made easy

Stoa is a project I work on for my day job; a Zope-based Student Information System we use at the school for everything from scheduling and attendance to posting online content for courses. Stoa uses XML for a few internal tasks, and right now it uses a module I wrote to do the handling. The module is ugly. And expensive. Really ugly, really expensive.

So I’ve been struggling with ideas on how to consume and emit XML in a “Python-ic”, Zope-friendly way. JAXML seems to be a good choice for emitting XML. It has an amazingly simple API, and doesn’t get caught up in the SAXiness that seems to plague other XML tools (I know, I know, SAX is supposed to make our lives better, but it just gets in my way). Now to find an equally simple way to consume XML.
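As a rough picture of what “simple” could look like on both sides, here is a sketch using the standard library’s `xml.etree.ElementTree`. It is a stand-in, not JAXML’s API, and the `course` element and its fields are invented rather than taken from Stoa.

```python
import xml.etree.ElementTree as ET

# Emit: build the tree from plain objects, with no handler classes involved.
course = ET.Element("course", {"id": "bio-101"})
ET.SubElement(course, "title").text = "Biology"
ET.SubElement(course, "teacher").text = "Ms. Example"
xml_text = ET.tostring(course, encoding="unicode")

# Consume: parse the text back and pull values out by element name.
parsed = ET.fromstring(xml_text)
title = parsed.findtext("title")
course_id = parsed.get("id")
```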

date:2003-10-20 09:07:40

Creative Commons License Validator

Well, what good is a weekend not spent coding? In this case, I managed to hack together a very rudimentary RDF parser/validator specialized for Creative Commons licenses. You can find the web interface here and the source code for the CGI here. The Python module which does most of the heavy lifting will be up soon. I’d love your feedback on any or all of it.

Right now it can parse raw RDF or retrieve a URL and scan it for RDF (using a simple regular expression). In either case it parses it and spits out the licensing and work information it finds.
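The scan-then-parse step might look something like the sketch below. It is not the validator’s actual code: the regular expression, the `find_licenses` helper, the sample page and the namespace URIs are assumptions about how Creative Commons RDF is typically embedded in a page.

```python
import re
import xml.etree.ElementTree as ET

# Namespace URIs assumed for this sketch.
CC_NS = "http://web.resource.org/cc/"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Non-greedy match so each embedded RDF block is captured separately.
RDF_BLOCK = re.compile(r"<rdf:RDF\b.*?</rdf:RDF>", re.DOTALL)


def find_licenses(page_text):
    """Return (work URL, license URL) pairs found in RDF inside page_text."""
    found = []
    for block in RDF_BLOCK.findall(page_text):
        root = ET.fromstring(block)
        for work in root.findall("{%s}Work" % CC_NS):
            work_url = work.get("{%s}about" % RDF_NS)
            for lic in work.findall("{%s}license" % CC_NS):
                found.append((work_url, lic.get("{%s}resource" % RDF_NS)))
    return found


# Made-up page with the RDF tucked inside an HTML comment.
SAMPLE_PAGE = """<html><body>
<!--
<rdf:RDF xmlns="http://web.resource.org/cc/"
         xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <Work rdf:about="http://example.org/song">
    <license rdf:resource="http://creativecommons.org/licenses/by/1.0/" />
  </Work>
</rdf:RDF>
-->
</body></html>"""

licenses = find_licenses(SAMPLE_PAGE)
```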

It still needs some work when it comes to parsing work description information, especially with sub-elements (like Agents). I’d also love to hear suggestions for improving the output mechanism; it currently runs as a simple CGI, so the result page’s HTML is manually emitted with print statements. Any suggestions for making this work smarter?

date:2003-10-15 09:00:00
category:ccValidator, python

Ahhh… Now I Understand

Descriptors, like meta-classes, are a feature of Python that I didn’t really understand up until now. Raymond Hettinger has written an excellent How-To Guide for Descriptors, which clearly explains and demonstrates their use. My biggest surprise is that (unlike meta-classes) I already knew how to use Descriptors; I just didn’t have the vocabulary to adequately explain them or an understanding of the underlying machinery.
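For the curious, a minimal descriptor looks something like this. It is a sketch written against current Python 3 (it relies on `__set_name__`), and the `Positive` and `Track` classes are invented for illustration rather than taken from the How-To.

```python
class Positive:
    """Data descriptor that only accepts values greater than zero."""

    def __set_name__(self, owner, name):
        # Store the real value on the instance under a private name.
        self.storage = "_" + name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self  # accessed on the class, not an instance
        return getattr(instance, self.storage)

    def __set__(self, instance, value):
        if value <= 0:
            raise ValueError("must be positive")
        setattr(instance, self.storage, value)


class Track:
    length = Positive()  # attribute access now goes through the descriptor

    def __init__(self, length):
        self.length = length


track = Track(215)
try:
    track.length = -1
    rejected = False
except ValueError:
    rejected = True
```

The built-in `property` is itself a descriptor, which is why the idea feels familiar even before you learn the name for it.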

Reading explanations of advanced Python features always leaves me feeling a little overwhelmed, yet incredibly empowered: “you mean I can do that with so little code?”

date:2003-10-14 12:36:17