I read that 2018 was the final year of Open Source Bridge. Reading that I felt sadness, as well as gratitude that the organizers were able to choose that ending. I spoke at the inaugural Open Source Bridge, and I remember it having such a refreshing vibe compared to the other conferences I was attending at the time (OSCON, Semantic World). There was space for self care (yoga), for weird ideas, and for community in a way I didn’t experience at other conferences. I started to write this as a status update, and realized that these feelings about Open Source Bridge are part of a larger wave of nostalgia for the late aughts I’ve been feeling lately.

The first Open Source Bridge took place in 2009. I had been living in San Francisco for two years and was working at Creative Commons. My role at CC had morphed from “figure out what we could build to engage people with the commons” to “figure out our technical strategy and how we fit with the YouTubes of the world”. I was a lot better at the former; at the very least I enjoyed it more. But there was still something there that I felt energized being part of. There was a community that I appreciated and valued. I’ve reflected in the past on what formed the core of this online community for me. It also lived on blogs and in the #cc IRC channels.

So I guess it’s appropriate that some of this nostalgia is undoubtedly triggered by all the awesome Indie* work being done. Just this week I learned about Indie Book Club and Indie Web Ring. And while both are simple, that’s sort of the point: I’m happy they both exist, because [I hypothesize] they help me connect with a larger community while being my whole self online. Being my whole self means that I “show up” in a solid, singular way: you come to my blog and get printmaking, scifi quotes, Python advice, sewing; you get me.

My talk at the inaugural Open Source Bridge was entitled “A Database Called the Web”. It was the second and last time I presented that talk, which was a shame because I don’t think I ever really got the kinks worked out. Creative Commons was founded with this technical layer undergirding the licenses, and “A Database Called the Web” was my attempt to articulate that decentralized, federated vision in a way that didn’t start with RDF, XHTML, etc. And that’s why in addition to feeling some nostalgia I also feel some hope: it seems like with ~10 years of time (and a lot of heartache) people have moved on to building decentralized things that they want to see exist in the world. And that makes me happy.

Minor update for OpenAttribute

I’ve just pushed a minor update to OpenAttribute to GitHub. It’s minor in terms of user-facing functionality, but improves support for one important use case: licensing of “objects” in a page.

*Summary:* OpenAttribute 0.8.1 (XPI) fixes Issues 1, 2, 3, and 5. All users should install this update.

CC REL is Creative Commons’ recommended way to describe licenses and licensed works. It builds on RDFa, a W3C recommendation, and allows publishers to specify the license and other related information about a work. One of the advantages of building on RDFa was that we could scope the assertions in the page, making it unambiguous (for software) what was being licensed: the page, a particular image or video, or even a specific portion of text. While the initial release of OpenAttribute properly detected licenses scoped to objects in a page, it was unable to display those results.

Creative Commons licenses are self-describing using CC REL: a tool (like OpenAttribute) can dereference the license and discover information like the human-readable name, version, and permissions/requirements/prohibitions. While Igor’s code from GSoC used the CC API to retrieve this information, it was clear to me that using the license itself is preferable: using self-describing resources on the web allows everyone to play, without registration or integration.
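To make the idea concrete, here’s a minimal sketch of what dereferencing a self-describing license looks like. The RDF/XML snippet is hand-written for illustration (a real tool would fetch the license URL over the network), and the property names follow my reading of the CC REL vocabulary, so treat them as assumptions rather than the add-on’s actual code:

```python
# Sketch: extracting a license's human-readable name and permissions from
# CC REL metadata. The RDF/XML below is an illustrative, hand-written
# snippet standing in for a dereferenced license document.
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
CC = "http://creativecommons.org/ns#"
DC = "http://purl.org/dc/elements/1.1/"

rdf_xml = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
                      xmlns:cc="http://creativecommons.org/ns#"
                      xmlns:dc="http://purl.org/dc/elements/1.1/">
  <cc:License rdf:about="http://creativecommons.org/licenses/by/3.0/">
    <dc:title>Attribution 3.0 Unported</dc:title>
    <cc:permits rdf:resource="http://creativecommons.org/ns#Reproduction"/>
    <cc:permits rdf:resource="http://creativecommons.org/ns#Distribution"/>
    <cc:requires rdf:resource="http://creativecommons.org/ns#Attribution"/>
  </cc:License>
</rdf:RDF>"""

def parse_license(xml_text):
    """Pull the basics a display tool needs out of a CC REL description."""
    root = ET.fromstring(xml_text)
    lic = root.find("{%s}License" % CC)
    return {
        "url": lic.get("{%s}about" % RDF),
        "title": lic.findtext("{%s}title" % DC),
        "permits": [e.get("{%s}resource" % RDF)
                    for e in lic.findall("{%s}permits" % CC)],
        "requires": [e.get("{%s}resource" % RDF)
                     for e in lic.findall("{%s}requires" % CC)],
    }

info = parse_license(rdf_xml)
print(info["title"])  # Attribution 3.0 Unported
```

The point of the exercise is the last line: everything a tool needs to render “Attribution 3.0 Unported” comes from the license itself, with no API registration required.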

OpenAttribute 0.8 had a somewhat naive implementation of license dereferencing and parsing, which caused problems when there were multiple licensed objects in a page: only information about the last object was displayed in the dialog. OpenAttribute 0.8.1 includes a new licenseloader component, which implements some simple serialization for these requests. If multiple requests are made, they’re queued and dereferenced/processed in order.
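The shape of that fix can be sketched in a few lines. This is a hypothetical model in the spirit of the licenseloader component, not the add-on’s actual JavaScript; the class and method names are mine:

```python
# Hypothetical sketch of serialized license dereferencing: requests are
# queued and processed strictly in order, so the result of one fetch
# can't clobber another's (the bug in 0.8, where only the last licensed
# object's information survived).
from collections import deque

class LicenseLoader:
    def __init__(self, fetch):
        self.fetch = fetch        # callable: url -> license metadata
        self.queue = deque()
        self.busy = False         # models "a request is in flight"
        self.results = []

    def request(self, url):
        self.queue.append(url)
        if not self.busy:
            self._drain()

    def _drain(self):
        self.busy = True
        while self.queue:
            url = self.queue.popleft()  # strictly FIFO
            self.results.append((url, self.fetch(url)))
        self.busy = False

loader = LicenseLoader(fetch=lambda url: "metadata for " + url)
loader.request("http://creativecommons.org/licenses/by/3.0/")
loader.request("http://creativecommons.org/licenses/by-sa/3.0/")
print([url for url, _ in loader.results])
```

Each licensed object gets its own entry in `results`, in request order, instead of each new dereference overwriting the previous one.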

You’ll note that OpenAttribute isn’t available from AMO yet. That’s partially because I’d like to have some more testing before making it available there. If you try it out and find bugs or have feature requests, you can file those on the GitHub project.

date:2010-12-31 13:02:03
tags:add-on, cc, drumbeat, firefox, mozcc, OpenAttribute

Licenses & Attribution in Firefox: OpenAttribute

Seven years ago I started working on MozCC, an add-on for Firefox that exposed Creative Commons license information embedded in web pages. Little did I know that add-on would be the start of a career with CC, eventually leading me to San Francisco, and subsequently around the globe to talk about CC’s technology. MozCC was dropped from active maintenance somewhere around Firefox 3, but of the tools I built during my first couple years at CC, it’s the one I still get the most questions about.

This summer, Igor Lukanin worked on a Google Summer of Code project for CC to develop a replacement for MozCC, an add-on for Firefox that would expose license and attribution information. While the project wasn’t totally successful, it did produce an add-on that detected CC licenses in pages, and exposed details about them.

Last month I had the opportunity to attend Learning, Freedom, and the Web, the first Mozilla Drumbeat Festival, in Barcelona, Spain. One of the issues identified by attendees was that, while there are plenty of works being licensed under CC licenses, knowing how to properly attribute re-use is still a challenge. How many times have we seen presentations made up of beautiful photos, with a simple “CC licensed, Flickr” under each (if that)? The proposed solution was an attribution generator: a tool that would generate reasonable attribution for CC licensed works, based on information available.

I’ve spent the past month or so hacking on Igor’s code in my spare time, using it as the basis of an attribution generator for Firefox. The result is OpenAttribute (working name, selected by the Drumbeat group), which is available for testing on Firefox 3.6 and above (I’ve been testing with 3.6 and 4.0b8 on Linux). As Dame Shirley Bassey sang, this is “all just a little bit of history repeating.”


OpenAttribute is an add-on that displays a small “CC” icon in the URL bar when license information is present. Clicking that icon displays the page’s license information and, importantly, copy and paste HTML you can use to attribute the work. You can click the “More Information” button to display the details on licensed objects in the page.


There’s still work to be done, but at this point I think it’s ready for broader testing. You can download the add-on and find the code on GitHub. Feedback, questions, and suggestions should all probably go to the attrib-generator Google Group.

date:2010-12-29 16:00:52
tags:add-on, cc, drumbeat, firefox, mozcc, OpenAttribute

CI at CC

I wrote about our roll-out of Hudson on the CC Labs blog. I wanted to note a few things about deploying that, primarily for my own reference. Hudson has some great documentation, but I found Joe Heck’s step by step instructions on using Hudson for Python projects particularly helpful. We’re using nose for most of our projects, and buildout creates a nosetest script wrapper that Hudson runs to generate pass/fail reports.
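For anyone unfamiliar with nose, the test modules it collects are just functions, which is part of why wiring it into Hudson is so painless. A toy sketch (the function under test is invented, standing in for real project code):

```python
# test_license.py -- the kind of module nose collects automatically:
# any test_*.py file, any function whose name starts with "test_".

def normalize_code(code):
    """Toy function under test (stands in for real project code)."""
    return code.strip().lower()

def test_normalize_strips_whitespace():
    assert normalize_code("  BY-SA  ") == "by-sa"

def test_normalize_lowercases():
    assert normalize_code("BY") == "by"
```

Hudson then runs something like `nosetests --with-xunit`, and the resulting JUnit-style XML report is what feeds the pass/fail charts.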

Setting up coverage is on the todo list, but it appears that our particular combination of libraries has at least one strange issue: when cc.license uses Jinja2 to load a template, coverage thinks it’s a Python source file (maybe it uses an import hook or something? haven’t looked) and tries to tokenize it when generating the xml report. Ka-boom. (This has apparently already been reported.)

Another item in the “maybe/someday” file is using Tox to run the tests under multiple versions of Python (example configuration for Tox + Hudson exists). I can see that this is a critical part of the process when releasing libraries for others to consume. We have slightly less surface area — all the servers run the same version of Python — but it’d be great to know exactly what our possible deployment parameters are.
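For reference, the Tox configuration for this sort of setup is pleasantly small. A minimal sketch (the interpreter list is an assumption based on what we were running at the time):

```ini
# tox.ini -- run the same nose suite under two interpreters
[tox]
envlist = py26,py27

[testenv]
deps = nose
commands = nosetests
```

Hudson would then invoke `tox` instead of `nosetests` directly, and each environment gets its own isolated virtualenv and report.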

Overall Hudson already feels like it’s adding to our sanity. I just received my copy of Continuous Delivery, so I think this is the start of something wonderful.

date:2010-08-20 10:37:43
category:cc, development
tags:cc, CI, coverage, Hudson, python, sanity

CiviCon Plenary: What Are We Paying For?

I’m spending part of my day today at the first ever CiviCon, a one day conference for the CiviCRM community. I was honored (and a little surprised) to be asked to give the opening plenary talk. My original travel schedule called for me to fly back from Istanbul yesterday, get up this morning and present (and then presumably crash from jet lag). Luckily (for me) the volcano derailed my trip to Istanbul, so I was able to present with a little more sleep. This is the text of the talk I prepared.

Good morning. I am incredibly happy to be here talking to you this morning. I’m excited to be here because CiviCRM has been and continues to be a very important part of our infrastructure at Creative Commons. Beyond CC, I think it’s an important piece of software for non-profit and grassroots organizations, one that should not be ignored when evaluating donor and constituent management applications. This morning I want to tell you about my experience with it, and why I think it’s so very, very important.

I’ve been at Creative Commons about six years now. I started working as an engineer, which basically meant I wrote code, deployed code, and babysat the server — there was only one then — when it got cranky. We started using Civi in 2006. Creative Commons is a 501(c)(3), which means we have to work towards passing the IRS’s public support test: 30% of our operating budget from “the public”, in our case. In 2005 we had a rather unpleasant realization: we hadn’t done shit, and we were getting close to the end of our five year grace period. I hacked together a contribution page, wired up PayPal for donation processing, and put together a terrible looking Plone backend that allowed people in the office to enter offline checks, promised contributions, etc., and to make the all-important thermometer work on the front page. And we started begging. Creative Commons is fortunate to have a generous, passionate community, and they responded to our call. After the dust had settled and we realized we were going to have to do this again, we knew we needed something better than what I could write in between other projects. It was clear that we didn’t have the resources to build something complete, and still fulfill our mission, so we started looking around. Mike Linksvayer, then our CTO, asked me if I’d used Raiser’s Edge before, and if so, what I thought of it. Raiser’s Edge had — perhaps has — the mindshare, and our development director at the time was certain Raiser’s Edge was the essential piece of technology to make the jump to the big leagues. If I recall correctly, my response was not suitable for mixed company.

Before working at CC, I worked for a very different kind of non-profit. Canterbury School is a private, K-12, college prep school. With small classes, dedicated teachers, and a commitment to education that includes computer programming for every student, they are the best game in private education in northeast Indiana. The faculty and staff there do an amazing job. Every year Canterbury holds an annual campaign, which raises funds to support the school operating budget. Part of my job there was supporting the development office and Raiser’s Edge. Our Raiser’s Edge installation had one primary user, Barbara. I am convinced Barbara is the direct descendant of the bearded UNIX sysadmin wizards. Unlike her ancestors, she does not keep mainframes online, but rather has her own special wizard-like skills: comprehensive knowledge of the Raiser’s Edge query language and interface. So you can imagine that days when Barbara called my office were not my favorite days at work. “The database is locked.” “My labels template won’t print right.” “The query isn’t returning data I know is in there.”

Now in all fairness, this was almost ten years ago, and we were running Raiser’s Edge on Sybase SQLAnywhere. But I’d also like to point out that we were paying about $3000 a year for a support contract, a contract sold with a statement along the lines of, “You want updates when they come out, don’t you?” These updates and improvements, when they did come out, were usually accompanied by new system requirements, indirect costs, and lots of time spent making sure the upgrade went correctly. And bug fixes? What bugs?

This, my friends, is what we call a protection racket.

I also had enough experience to know that expecting to deploy a CRM package out of the box with no customization was probably unreasonable. I had worked on a CRM and ERP deployment in 2003 for an international manufacturer that took the better part of a year, and whose customization costs were almost as high as the software licensing costs. I was lucky on that project: I got to write C++ DLLs to handle custom accounting information from pricing tables. And this was a product sold on how open and customizable it was. “Look, we have an API, you can write a DLL.” Gee, thanks.

So back to Creative Commons: after some consideration, CiviCRM was selected as a platform. When we started out with Civi, we paid for CivicActions’ development, customization, and hosting services. We also paid for updates, improvements, and upgrades, just like we would have with Raiser’s Edge. It was a couple of years later, in 2007, that the difference between the two really became apparent.

In 2007, we had been using Civi for a couple of years, and while we weren’t unhappy with it, it still wasn’t really a piece of software any of us looked forward to using. I personally tried to ignore it as much as possible except during the lead up to the annual campaign. It worked, it did its job, but it felt quirky and a little clunky, sort of like I was working against it sometimes. But there was hope. The 3.0 release was going to be a big upgrade. Lots of new features, improved user interface, and lots of attention paid to usability. We talked to CivicActions about a 3.0 upgrade and some additional customization we wanted to do. The quotes we got back were, well, expensive. Now I don’t think they were out of line, but it was more than we had budgeted at the time. Our situation internally had changed, as well. We were up to a handful of servers, two full time engineers, and a graphic designer with great technical chops. Instead of quoting out every change and upgrade, it started to feel like we might be ready to walk on our own.

So we checked out the source code — including all the custom code CivicActions built for us — and installed it on our own server.

I think this bears repeating. We downloaded the source code — the same source code our vendor was using — and installed it with our own tools. We didn’t have to ask for permission, pay a fee, or “upgrade” our license; it was already available to us.

You see, unlike the protection racket I lived under at Canterbury, at Creative Commons we’ve been paying for something that we own. That the community owns. We’re paying for value that we retain, that we can take elsewhere when we’re ready. This is huge.

Now, for those of you who are open source and free software developers, those of us who run Linux on our laptops, who build applications using open source tools, this sounds like business as usual. But CiviCRM is very special software: it is not software for Geeks, it’s software for Humans.

At Creative Commons, we build legal tools that help people share their creative works with the permissions that they choose, the “some rights reserved” that work best for them. But we also recognize that most of the time, those creators are not lawyers. They’re coming to us because they’re not lawyers, and our job is to reduce the number of hoops they have to jump through to share their work. So we make our tools available for two different audiences, which we jokingly refer to as Lawyers and Humans. Today I’d like to posit that we have a similar divide when it comes to free software. Apache, Linux, MySQL, CouchDB — these are all examples of software for Geeks. CiviCRM is software for Humans. And it’s really important software for Humans.

A friend of mine, Asheesh Laroia, runs OpenHatch, a website dedicated to helping people get involved with open source software. Talking about software like CiviCRM, he made the comment, “It’s important that we give communities tools that they can use and that they can control. Otherwise how do we expect them to be independent and self-sustaining?” Fundraising and constituent management is a critical part of any non-profit’s life cycle. Why do we think it’s OK for these organizations — many of which we personally contribute to — to enrich the coffers of for-profit businesses with no real long term return?

Please don’t misunderstand me: I am not saying that paying money for support is wrong, misguided, or unnecessary. I am saying that we — non-profits — can do a lot better than investing in a protection racket. We can pay for support, and at the same time invest in our tools. And when we’re ready to move to the next level, there is no upgrade fee. We already own it all.

Creative Commons’ story did not end by simply installing CiviCRM onto our server and moving on. And putting it on our own server has not resulted in the cost of Civi going to zero. In fact, we still “pay” for support, but we do it differently these days. Maintenance and support for Civi is a core responsibility for the technical team today. When our development team has questions, wants to run new reports, or wants to do something new like personal campaigns, they sit down with Chris, John, and Nathan, and figure out what we’ll need to do to Civi to make it work.

Last year we decided that the contribution workflow was too long. We were demanding information from our donors that we didn’t necessarily need, and duplicating information they might give PayPal when processing their payment. We wanted to streamline the process, reducing the number of clicks between us and a donor’s money. We invested the time to implement some custom code on Civi to make this happen. Right now we want to offer tighter integration between CiviCRM and CC Network accounts, a premium we offer donors. So we’re writing code to do that.

The click-streamline code is available to anyone who wants to use it. We’re writing our integration code with the hope we can contribute it to the CiviCRM core. It’s tempting to say that we’re no longer customers, that we’ve moved up to become partners. But the truth is that we were never just customers: with CiviCRM, we were partners from day one, we just didn’t necessarily realize it.

Now I know that lots of commercial vendors talk about partnering with customers. I’d like to call bullshit on that. If you don’t have the source code, you are not a partner. If you can’t be trusted to inspect the software you rely on, you are not a respected equal in that relationship. Companies go out of business. Companies are acquired. Management changes. This is the reality of business.

Let’s go back and look at Canterbury again. I sent Vern, Canterbury’s Director of Technology and a man who can really only be called my mentor, an email to fact-check my memory about what we were working with in 2001. Vern told me he’s part of a Google Group formed for schools to exchange SQL Server queries for accessing their data. Why SQL Server queries and not an API? It seems a little crude to me. Turns out, if you want to write your own code to access your data through their API, there’s a toll: $10,000 and another $1,000 per year in support.

If a customer wants to access their data in the safest, sanest manner, they have to pay. Again. If that were the reality for CiviCRM, we would not have been able to streamline our contribution process. We would not be integrating CC Network memberships as a premium for donating. Hell, I’m not even sure we’d be able to show the beloved thermometer on the front page. The cost is too high.

Let me wrap up with some thoughts about the future. If you work for a non-profit or grassroots organization, I believe it is incumbent upon you to think about things in terms of fiscal stewardship. Where is your money going today, and what value are you getting? Are you a customer, or are you a partner? As an officer at a non-profit, I see fundraising and development as critical to our ability to execute our core mission. Our mission at Creative Commons is not to make money, but if we can’t keep the lights on, we can’t help people share their work online.

If you’re using CiviCRM today, it’s time to start talking about it. I find that lots of non-profits I talk to are unaware that there’s a different way. They haven’t thought about the fact that they have a choice. We need to work to increase the mindshare that CiviCRM has. If you think Civi is only good for small organizations, that you can’t really recommend it for large scale installations, I’d like to challenge that misconception. Mozilla Foundation. Wikimedia Foundation. Creative Commons. We’re all using CiviCRM for all or part of our constituent management solution.

And if you’re a user or integrator who’s customizing it, it’s time to start thinking about how you give back. The AGPL already requires that you make your changes available. I think it’s worth your while to consider how you’re making those changes. Is this a customization someone else might want? If so, I encourage you to take a few minutes, talk to the developers on IRC or the forums, and think about how you might build your solution in a general way. The CiviCRM developers are some of the most available and responsive I’ve ever encountered. They can give you guidance about how to think about a problem. It may take some extra time to build a general solution. But if you can contribute that back to the core codebase, you’re going to make upgrades much easier. And the great thing is that at the same time you’re making things easier for yourself in the future, you’re also helping to sustain the community and the development process.

Finally, if you’re evaluating CiviCRM today, I have some advice for you. Do not think to yourself, “oh, I can download this for free, this isn’t going to cost me anything.” Do not look at CiviCRM as a zero cost solution. My experience is that there is no such thing in CRM, and I’d argue you wouldn’t want there to be: we all need experts we can call upon in an emergency, and paying for a tool enables the developers to improve it. Instead, do your checkbox evaluation and realize that Civi is competitive with commercial packages. Talk to people who are using CiviCRM and tell them how you plan to use it, ask them about their experience. Then ask yourself, “how much am I going to spend year over year, and what am I getting from that? How is my investment going to grow? Am I paying for protection, or am I investing in my organization’s future stability?” I think the answer is self-evident.

Working at Creative Commons has been an amazing opportunity for me, and I’m proud of a lot that I’ve accomplished over the past six years. But I’m especially proud of how we’ve used and contributed to CiviCRM. I’m proud to be a user, and I feel good every time I talk to my team about Civi and hear, “Oh, yeah, I just jumped on IRC and got an answer from one of the core developers.” You cannot buy that sort of dedication and support.

date:2010-04-22 11:41:29
tags:cc, civicon, civicrm, talks

Back to the Future: Desktop Applications

One of the best prepared talks I saw at PyCon this year was on Phatch, a cross-platform photo processing application written in Python. Stani Michiels and Nadia Alramli gave a well-rehearsed, compelling talk discussing the ins and outs of developing their application for Linux, Mac OS X, and Windows. The video is available from the excellent Python MiroCommunity.

The talk reminded me of a blog post I saw late last year and never got around to commenting on, Ruby for Desktop Applications? Yes we can. Now I’m only a year late in commenting on it. This post caught my eye for two reasons. First, the software they discuss was commissioned by the AGI Goldratt Institute. I had heard about Goldratt from my father, whose employer, Trusted Manufacturing, was working on implementing constraints-based manufacturing as a way to reduce costs and distinguish themselves from the rest of the market. More interesting, though, was their discussion of how they built the application, and how it seemed to resonate with some of the work I did in my early days at CC.

Atomic wrote three blog posts (at least that I saw), and the one with the most text (as determined by my highly unscientific “page down” method) was all about how they “rolled” the JRuby application: how they laid out the source tree, how they compile Ruby source into Java JARs, and how they distribute a single JAR file with their application and its dependencies. I thought this was interesting because even though it uses a different language (Python instead of Ruby), GUI framework (wx instead of Swing/Batik), and runtime strategy (bundled interpreter instead of bytecode archive), the thing I spent the most time on when I was developing CC Publisher was deployment.

Like Atomic and Phatch, we had a single code base that we wanted to work across the major platforms (Windows, Linux, and Mac OS X in our case). The presentation about Phatch has some great information about making desktop-specific idioms work in Python, so I’ll let them cover that. Packaging and deployment was the biggest challenge, one we never quite got right.

On Windows, we used py2exe to bundle our Python runtime with the source code and dependencies. This worked most of the time, unless we forgot to specify a sub-package in our manifest, in which case it blew up in amazing and spectacular ways (not really). Like Atomic, we used NSIS for the Windows installer portion. On Mac OS X, we used py2app to do something similar, and distributed a disk image. On Linux… well, on Linux, we punted. We experimented with cx_Freeze and flirted with autopackage. But nothing ever worked quite right [enough], so we wound up shipping tarballs.

The really appealing thing about Atomic’s approach is that by using a single JAR, you get to leverage a much bigger ecosystem of tools: the Java community has either solved, or has well defined idioms for, launching Java applications from JARs. You get launch4j and izpack, which look like great additions to the desktop developer’s toolbox.

For better or for worse, we [Creative Commons] decided CC Publisher wasn’t the best place to put our energy and time. This was probably the right decision, but it was a fun project to work on. (We do have rebooting CC Publisher listed as a suggested project for Google Summer of Code, if someone else is interested in helping out.) Given the maturity of Java’s desktop tool chain, and the vast improvements in Jython over the past year or two, I can imagine considering an approach very much like Atomic’s were I working on it today. Even though it seems like the majority of people’s attention is on web applications these days, I like seeing examples of interesting desktop applications being built with dynamic languages.

date:2010-03-30 09:04:03
tags:cc, ccpublisher, python

CiviCon Next Month in San Francisco

I’m honored to be asked to kick off the first ever CiviCon next month in San Francisco. CiviCon is a one day conference for users of CiviCRM, a free software constituent relationship management platform. CiviCRM is a key component of Creative Commons’ infrastructure (we use it as our donor management system), and I’m excited to see the community come together and talk about new features, integration techniques, and ideas for future development.

When I was asked to present, I thought about what I could talk about, beyond simply our deployment and customization of CiviCRM (which the other Nathan will do a great job of during his presentation). Creative Commons is not the first non-profit I’ve worked with, and CiviCRM is not the first constituent/donor management system I’ve worked with. As I thought about my past experience and my experience at Creative Commons, I realized that CiviCRM is a key piece of infrastructure that enables Creative Commons to fulfill its mission, and to do so in a responsible way. Using CiviCRM is not just a question of free vs. proprietary software: it’s a question of responsible stewardship. CiviCRM and other free software allows us to fulfill our mission in a responsible, sustainable way. I think this is important to think about, so I’ll be talking about why I think this is the case. I’ll touch on how CiviCRM fits into Creative Commons, how it supports our mission, why I think FLOSS infrastructure (especially Civi) is essential for non-profits and grassroots organizations, and what I think is on the horizon.

I hope you’ll join me next month at CiviCon; you can register now (space is limited). The list of proposed sessions is online, and it looks like a really interesting day.

date:2010-03-22 06:47:53
tags:cc, civicon, civicrm, san francisco, speaking

Pre-read: Grok 1.0 Web Development

Late last month I received an email from Packt Publishing (en.wp), asking if I’d be interested in reviewing one of their new titles, Grok 1.0 Web Development, by Carlos de la Guardia. I immediately said yes, with the caveat that I’m traveling a lot over the next 30 days, so the review will be a little delayed (hence this pre-review). I said “yes” because Grok is one of the Python web frameworks that’s most interesting to me these days. It’s interesting because one of its underlying goals is to take concepts from Zope 3 (now the Zope Toolkit), and make them more accessible and less daunting. These concepts — the component model, pluggable utilities, and graph-based traversal — are some of the most powerful tools I’ve worked with during my career. And of course, they can also be daunting, even to people with lots of experience; making them more accessible is a good thing.

I’ve read the first four chapters of Grok 1.0 Web Development, and so far there’s a lot to like. It’s the sort of documentation I wish I’d had when I ported the Creative Commons license chooser to Grok [1]. I’m looking forward to reading the rest, and will post a proper review when I return from Nairobi. In the meantime, check out Grok, Zope 3 for cavemen.

You can download a preview from Grok 1.0 Web Development, `Chapter 5: Forms </media/2010/03/7481-grok-1-0-Web-development-sample-chapter-5-forms.pdf>`_.

[1] The CC license chooser has evolved a lot over the years; shortly after Grok was launched we adopted many of its features as a way to streamline the code. Grok’s simplified support for custom traversal, in particular, was worth the effort.

date:2010-03-16 09:14:50
tags:cc, grok, pre-read, python, reading, zope

i18n HTML: Bring the Pain

I have to stay up a little later this evening than I’d planned, so as a result I’m finally going through all the tabs and browser windows I’ve had open on my personal laptop. I think some of these have been “open” for months (yes, there have been browser restarts, but they’re always there when the session restores). One that I’ve been meaning to blog about is Wil Clouser’s post on string substitution in .po files. It’s actually [at least] his second post on the subject, recanting his prior advice, coming around to what others told him previously: don’t use substitution strings in .po files.

I wasn’t aware of Wil’s previous advice, but had I read it when first published, I would have nodded my head vigorously; after all, that’s how we did it. Er, that’s how we, uh, do it. And we’re not really in a position to change that at the moment, although we’ve certainly looked pretty hard at the issue.

A bit of background: One of the core pieces of technology we’ve built at Creative Commons is the license chooser. It’s a relatively simple application, with a few wrinkles that make it interesting. It manages a lot of requests, a lot of languages, and has to spit out the right license (type, version, and jurisdiction) based on what the user provides. The really interesting thing it generates is some XHTML with RDFa that includes the license badge, name, and any additional information the user gives us; it’s this metadata that we use to generate the copy and paste attribution HTML on the deed. So what does this have to do with internationalization? The HTML is internationalized. And it contains substitutions. Yikes.
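To make the combination concrete, here’s a sketch of what such a string looks like — the markup and placeholder names are illustrative, not the actual CC template. The translatable unit is simultaneously HTML and a substitution template, so translators have to preserve both the tags and the named placeholders.

```python
# Illustrative only: an internationalized string that is both HTML and a
# substitution template. The markup and placeholder names are made up;
# the real license chooser's strings differ.
msgid_html = (
    '<a rel="license" href="%(license_url)s">'
    "This work is licensed under a %(license_name)s License.</a>"
)

print(msgid_html % {
    "license_url": "http://creativecommons.org/licenses/by/3.0/",
    "license_name": "Creative Commons Attribution 3.0",
})
```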

To follow in the excellent example of AMO and Gnome, we’d start using English as our msgids, leaving behind the symbolic keys we currently use. Unfortunately it’s not quite so easy. Every time we look at this issue (and for my first year as CTO we really looked; Asheesh can attest that we looked at it again and again) and think we’ve got it figured out, we realize there’s another corner case that doesn’t quite work.
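The difference between the two msgid styles is easy to show with a sketch. Plain dicts stand in for compiled gettext catalogs here, and the strings are made up for illustration:

```python
# "CC style": symbolic keys -- translators never see the English source text,
# and a missing translation leaks the key into the page.
symbolic_catalog = {
    "license.work_attribution": "%(title)s von %(author)s",  # German
}

# "AMO/Gnome style": the English string itself is the msgid, so an
# untranslated entry still falls back to something readable.
english_catalog = {
    "%(title)s by %(author)s": "%(title)s von %(author)s",
}

def translate(catalog, msgid):
    # gettext semantics: return the msgid itself when no translation exists
    return catalog.get(msgid, msgid)

params = {"title": "A Shared Culture", "author": "Jesse Dylan"}

# With English msgids, a missing translation still renders sensibly:
print(translate(english_catalog, "%(title)s by %(author)s") % params)
# With symbolic keys, a missing translation shows the bare key:
print(translate(symbolic_catalog, "license.missing_key") % params)
```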

The real issue with the HTML is the HTML: zope.i18n, our XSLT selectors, the ZPT parse tree: none of them really play all that well with HTML msgids. The obvious solution would be to get rid of the HTML in translation, and we’ve tried doing that, although we keep coming back to our current approach. I guess we’re always seduced by keeping all the substitution in one place, and traumatized by the time we tried assembling the sentences from smaller pieces.

So if we accept that we’re stuck with the symbolic identifiers, what do we do? Build tools, of course. This wasn’t actually an issue until we started using a “real” translation tool — Pootle, to be specific. Pootle is pretty powerful, but some of the features depend on having “English” msgids. Luckily it has no qualms about HTML in those msgids, it has decent VCS support, and we know how to write post-commit hooks.

To support Pootle and provide a better experience for our translators, we maintain two sets of PO files: the “CC style” symbolic msgid files, and the “normal” English msgid files. We keep a separate “master” PO file where the msgid is the “CC style” msgid, and the “translation” is the English msgid. It’s this file that we update when we need to make changes, and luckily using that format actually makes the extraction work the way it’s supposed to. Or close. And when a user commits their work from Pootle (to the “normal” PO file), a post-commit hook keeps the other version in sync.
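The core of that post-commit sync reduces to a small mapping step. A sketch, with plain dicts standing in for parsed PO files — the actual hook, of course, has to read and write real .po files and talk to the VCS, and the strings here are invented for illustration:

```python
# Sketch of the sync a post-commit hook performs: carry translations from
# the English-msgid file back into the symbolic-msgid file, using the
# "master" file that maps one onto the other.

def sync_symbolic(master, english_po):
    """master: symbolic msgid -> English msgid (its 'translation' field).
    english_po: English msgid -> translated string, as committed from Pootle.
    Returns the symbolic-msgid catalog the application actually loads."""
    symbolic_po = {}
    for symbolic_id, english_msgid in master.items():
        # An empty string is gettext's convention for "untranslated".
        symbolic_po[symbolic_id] = english_po.get(english_msgid, "")
    return symbolic_po

master = {"util.view_legal_code": "View Legal Code"}
english_po = {"View Legal Code": "Den Lizenzvertrag ansehen"}
print(sync_symbolic(master, english_po))
```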

While we’ve gotten a lot better at this and have learned to live with this system, it’s far from perfect. The biggest imperfection is its custom nature: I’m still the “expert”, so when things go wrong, I get called first. And when people want to work on the code, it takes some extra indoctrination before they’re productive. My goal is still to get to a single set of PO files, but for now, this is what we’ve got. Bring the pain.

For a while, at least. We’re working on a new version of the chooser driven by the license RDF. This will be better for re-use, but not really an improvement in this area.

This works great in English, but in languages where gender is more strongly expressed in the word forms, uh, not so much.

date:2010-03-01 23:21:20
category:cc, development
tags:cc, i18n, license engine, zope

Houston Connexions

I spent the first half of this week in Houston, Texas for the Connexions Consortium Meeting and Conference. What follows are my personal reflections.

Connexions is an online repository of learning materials — open educational resources (OER). Unlike many other OER repositories, Connexions has a few characteristics that work together to expand its reach and utility.

While it was founded by (and continues to be supported by) Rice University, the content in Connexions is larger in scope than a single university, and isn’t tied to a particular course the way, say, MIT OCW is. Attendees of the conference came from as far away as the Netherlands and Vietnam.

In addition to acting as a repository, Connexions is an authoring platform: content is organized into modules, which can then be re-arranged, re-purposed, and re-assembled into larger collections and works. This enables people to take content from many sources and assemble it into a single work that suits their particular needs; that derivative is also available for further remixing. At the authors’ panel at the conference, we heard about how some authors have used this to update or customize a work for the class they were teaching. [UPDATE 5 Feb 2010: See the Creative Commons blog for information on this, and thoughts from the author “Dr. Chuck” (Charles Severance), who was on the authors panel.]

Finally, Connexions is an exemplar when it comes to licensing: if you want your material to be part of Connexions, the license is CC Attribution 3.0. While OER is enabled by CC licenses generally, this choice provides a lot of leverage to users. The remixing, re-organizing, and re-purposing enabled by the authoring platform is far simpler with no license compatibility to worry about. Certainly you can imagine a platform that handled some of the compatibility questions for you — and the idea of developing such a system based on linked data is intriguing to me personally — but the use of a single, extremely liberal license means that when it comes to being combined and re-purposed, all authors are equal, all content is equal.
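As a toy illustration of the compatibility questions a single liberal license sidesteps, here is a deliberately simplified remix-compatibility check. It ignores license versions, jurisdiction ports, and plenty of other nuance — the broad strokes only:

```python
# Deliberately simplified sketch of CC remix compatibility; illustrative
# only, not a statement of the actual legal rules.

def can_remix(a, b):
    """Can works under licenses a and b be combined into one derivative?"""
    terms_a, terms_b = set(a.split("-")), set(b.split("-"))
    if "nd" in terms_a or "nd" in terms_b:
        return False  # NoDerivatives forbids remixing entirely
    if "sa" in terms_a and "sa" in terms_b:
        return a == b  # two different ShareAlike licenses conflict
    if "sa" in terms_a:
        # the remix must carry license a, so b's terms must already fit it
        return terms_b <= terms_a
    if "sa" in terms_b:
        return terms_a <= terms_b
    return True

print(can_remix("by", "by"))        # everything CC BY, as on Connexions
print(can_remix("by", "by-nc"))     # the remix just inherits NC
print(can_remix("by-sa", "by-nc"))  # SA and NC pull in opposite directions
```

With everything under CC Attribution, `can_remix` is trivially always true — which is exactly the leverage the single-license choice provides.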

This year was the second Connexions Conference, and from my perspective there were two themes: the consortium, and Rhaptos. The consortium is actually why I was in Houston. The Connexions Consortium is an, uh, consortium of organizations with a vested interest in Connexions: universities and colleges that are using it and companies that are using the content. And Creative Commons, who I was representing at the meeting. I’ve also been elected to the Technology Committee, a group of people representing consortium members who will provide guidance on technical issues to Connexions. During our meeting on Monday afternoon there was discussion of a variety of areas. One that we didn’t get to, but which is interesting to me, is how content in Rhaptos repositories can be made more discoverable, and how we can enable federated or aggregated search.

Rhaptos was the other prominent theme at the conference. Rhaptos is the code that runs Connexions, without the Connexions-specific look and feel/branding. While the source code behind Connexions has always been available, in the past year they’ve invested time and resources in making it easy (or at least straightforward) to deploy. Interestingly (to me), Rhaptos is a Plone (Zope 2) application, and the deployment process makes liberal use of buildout. It’s not clear to me exactly what the market is for Rhaptos. It’s definitely one of those “unsung” projects right now, with lots of potential, and one really high-profile user. I think it’ll be interesting to see how the Consortium and Rhaptos interact: right now all of the members are either using the flagship site to author content, or the content from the site to augment their commercial offerings. One signifier of Rhaptos adoption would be consortium members who are primarily users of the software, and interested in supporting its development.

Overall it was a great trip; I got to hear about interesting projects and see a lot of people I don’t get to see that often. I’m looking forward to seeing how both the consortium and Rhaptos develop over the next year.

If needed, and the evidence to date is that the staff is more than competent. I expect we’ll act more as a sounding board, at least initially.

This is an area that’s aligned with work we’re doing at CC right now, so it’s something I’ll be paying attention to.

date:2010-02-04 22:15:06
tags:cc, cnx, IAH, oer, travel