Thoughts on Deploying and Maintaining SMW Applications

In September or October of last year, I received an email from someone who had come across CC Teamspace and was wondering if there was a demo site available they could use to evaluate it. I told them, “No, but I can probably throw one up for you.” A month later I had to email them and say, “Sorry, but I haven’t found the time to do this, and I don’t see that changing.” This is clearly not the message you want to send to possible adopters of your software — “Sorry, even I can’t install it quickly.” Now part of the issue was my own meta/perfectionism: I wanted to figure out a DVCS driven upgrade and maintenance mechanism at the same time. But even when I faced the fact that I didn’t really need to solve both problems at the same time, I quickly became frustrated by the installation process. The XML file I needed to import seemed to contain extraneous pages, and things seemed to have changed between MediaWiki and/or extension versions since the export was created. I kept staring at cryptic errors, struggling to figure out if I had all the dependencies installed. This is not just a documentation problem.

If we think about the application life cycle, there are three stages a solution to this problem needs to address:[†]_

  1. Installation
  2. Customization
  3. Upgrade

If an extension is created using PHP, users can do all three (and make life considerably easier if they’re a little VCS savvy). But if we’re dealing with an “application” built using Semantic MediaWiki and other SMW Extensions, it’s possible that there’s no PHP at all. If the application lives purely in the wiki, we’re left with XML export/import[‡]_ as the deployment mechanism. With this we get a frustrating release process, Customization support, and a sub-par Installation experience.

The basic problem is that we currently have two deployment mechanisms: full-fledged PHP extensions, and XML dumps. If you’re not writing PHP, you’re stuck with XML export-import, and that’s just not good enough.

A bit of history: When Steren created the initial release of CC Teamspace, he did so by exporting the pages and hand tweaking the XML. This is not a straight-forward, deterministic process that we want to go through every time a bug fix release is needed.

For users of the application, once the import (Installation) is complete (assuming it goes better than my experience), Customization is fairly straight-forward: you edit the pages. When an Upgrade comes along, though, you’re in something of a fix: how do you re-import the pages, retaining the changes you may have made? Until MediaWiki is backed by a DVCS with great merge handling, this is a question we’ll have to answer.

We brainstormed about these issues at the same time we were thinking about Actions. Our initial thoughts were about making the release and installation process easier: how does a developer[◊]_ indicate these pages in my wiki make up my application, and here’s some metadata about it to make life easier.

We brainstormed a solution with the following features:

  1. An “Application” namespace: just as Forms, Filters, and Templates have their own namespace, an Application namespace would be used to define groups of pages that work together.
  2. Individual Application Pages, each one defining an Application in terms of Components. In our early thinking, a Component could be a Form, a Template, a Filter, or a Category; in the latter case, only the SMW-related aspects of the Category would be included in the Application (ie, not any pages in the Category, on the assumption that they contain instance-specific data).
  3. Application Metadata, such as the version[♦]_, creator, license, etc.

A nice side effect of using a wiki page to collect this information is that we now have a URL we can refer to for Installation. The idea was that a Special page (ie, Special:Install, or Special:Applications) would allow the user to enter the URL of an Application to install. Magical hand waving would happen, the extension dependencies would be checked, and the necessary pages would be installed.
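As a rough illustration of what an Application page might boil down to once parsed, here is a minimal Python sketch. Everything in it is hypothetical: the Application class, the requires field, and the helper functions are names invented for this example, not part of SMW or any existing extension.

```python
from dataclasses import dataclass, field

# Component kinds named in the feature list above.
COMPONENT_KINDS = {"Form", "Template", "Filter", "Category"}


@dataclass
class Application:
    """Hypothetical in-memory form of an Application page."""
    name: str
    version: str
    creator: str
    license: str
    # (kind, page title) pairs, e.g. ("Form", "Task")
    components: list = field(default_factory=list)
    # names of the extensions the application depends on (assumed field)
    requires: list = field(default_factory=list)


def missing_extensions(app, installed):
    """Return the required extensions that are not installed, in order."""
    return [ext for ext in app.requires if ext not in installed]


def pages_to_install(app):
    """Translate components into namespaced page titles."""
    pages = []
    for kind, title in app.components:
        if kind not in COMPONENT_KINDS:
            raise ValueError("unknown component kind: %s" % kind)
        pages.append("%s:%s" % (kind, title))
    return pages
```

An installer along the lines of Special:Install could refuse to proceed while missing_extensions returns anything, and only then fetch the pages named by pages_to_install.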

While we didn’t get too far with fleshing out the Upgrade scenario, I think that a good first step would be to simply show the edit diff if the page has changed since it was Installed, and let the user sort it out. It’s not perfect, but it’d be a start.
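The "show the diff and let the user sort it out" idea can be sketched in a few lines of Python with the standard difflib module. This assumes the installer kept a copy of each page as it was installed; the function name and the three action labels are made up for the example.

```python
import difflib


def upgrade_page(installed_text, current_text, new_text):
    """Decide what to do with one page during an Upgrade.

    installed_text: the page as it was originally installed
    current_text:   the page as it exists in the wiki now
    new_text:       the page as shipped in the new release

    Returns an (action, diff_lines) tuple.
    """
    if current_text == installed_text:
        # No local customization, so the page can simply be replaced.
        return ("replace", [])
    if current_text == new_text:
        # The wiki already matches the new release.
        return ("skip", [])

    # The page was customized: show the user their own edits and let
    # them decide how to reconcile them with the new release.
    diff = list(difflib.unified_diff(
        installed_text.splitlines(),
        current_text.splitlines(),
        fromfile="installed",
        tofile="current",
        lineterm="",
    ))
    return ("review", diff)
```

Anything marked "review" would be presented to the user along with the diff; the other two cases need no interaction.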

I’m not sure if this is exactly the right approach to take for packaging these applications. It does effectively invent a new packaging format, which I’m somewhat wary of. At the same time, I like that it seems to utilize the same technologies in use for building these applications; there’s a certain symmetry that seems reassuring. Maybe there are other, obvious solutions I haven’t thought of. If that’s the case, I hope to find them before I clear enough time from the schedule to start hacking on this idea.

date:2010-01-25 21:24:51
category:cc, development
tags:cc, mediawiki, semantic mediawiki, smw

“Actions” for SMW Applications (Hypothetically)

Talking about AcaWiki has me thinking some more about our experiences over the past couple years with Semantic MediaWiki, particularly about building “applications” with it. I suppose that something like AcaWiki could be considered an application of sorts — I certainly wrote about it as such earlier this week — but in this case I’m talking about applications as reusable, customizable pieces of software that do a little more than just CRUD data.

In 2008 we were using our internal wiki, Teamspace, for a variety of things: employee handbook, job descriptions, staff contact information, and grants. We decided we wanted to do a better job at tracking these grants, specifically the concrete tasks associated with each, things we had committed to do (and which potentially required some reporting). As we iterated on the design of a grant, task, and contact tracking system, we realized that a grant was basically another name for a project, and the Teamspace project tracking system was born.

As we began working with the system, it became obvious we needed to improve the user experience; requiring staff members to look at yet another place for information just wasn’t working. So Steren Giannini, one of our amazing interns, built Semantic Tasks.

Semantic Tasks is a MediaWiki extension, but it’s driven by semantic annotations on task pages. Semantic Tasks’ primary function is sending email reminders. One of the things I really like about Steren’s design is that it works with existing MediaWiki conventions: we annotate Tasks with the assigned (or cc’d) User page, and Semantic Tasks gets the email addresses from the User page.

There were two things we brainstormed but never developed in 2008. I think they’re both still areas of weakness that could be filled to make SMW even more useful as an application platform. The first is something we called Semantic Actions: actions you could take on a page that would change the information stored there.

Consider, for example, marking a task as completed. There are two things you’d like to do to “complete” a task: set the status to complete and record the date it was completed. The thought was that it’d be very convenient to have “close” available as a page action, one which would effect both changes at once without requiring the user to manually edit the page. Our curry-fueled brainstorm was that you could describe these changes using Semantic Mediawiki annotations[1]_. Turtles all the way down, so to speak.
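To make the "close" example concrete, here is a small Python sketch of an action described as data and applied to a page's annotations. The property names ("Status", "Completion date") and the dictionary format are assumptions for illustration; we never settled on an actual syntax.

```python
import datetime

# Hypothetical description of a "close" action: property -> new value.
# A callable value is computed at the moment the action runs.
CLOSE_ACTION = {
    "Status": "Complete",
    "Completion date": lambda today: today.isoformat(),
}


def apply_action(annotations, action, today=None):
    """Return a copy of a page's annotations with the action applied."""
    today = today or datetime.date.today()
    updated = dict(annotations)
    for prop, value in action.items():
        updated[prop] = value(today) if callable(value) else value
    return updated
```

Both changes happen in one step, and the original annotations are left untouched, so the caller decides when to write the result back to the page.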

The amount of explaining this idea takes, along with some distance, makes me uncertain that it’s the right approach. I do think that being able to easily write extensions that implement something more than CRUD is important to the story of SMW as a “real” application platform. One thing that makes me uncertain about this approach is the fear that we are effectively rebuilding Zope 2’s ZClasses, only crappier. ZClasses, for those unfamiliar, were a way to create classes and views through a web-based interface. A user with administrative rights could author an “application” through the web, getting lots of functionality for “free”. The problem was that once you exhausted ZClasses’ capabilities, you pretty much had to start from scratch when you switched to on-disk development. Hence Zope 2’s notorious “Z-shaped learning curve”. I think it’s clear to me now that building actions through the web is, by necessity, going to expose a limited feature set. The question is whether it’s enough, or if we should encourage people to write [Semantic] MediaWiki extensions that implement the features they need.

Maybe the right approach is simply providing really excellent documentation so that developers can easily retrieve the values of the SMW annotations on the pages they care about. You can imagine a skin that exists as a minor patch to Monobook or Vector, which uses a hook to retrieve the installed SMW “actions” for a page and displays them in a consistent manner.

Regardless of the approach taken, if SMW is going to be a platform, there has to be an extensibility story. That story already exists in some form; just look at the extensions already available. Whether the existing story is sufficient is something I’m interested in looking at further.

Next time: Thoughts on Installation and Deployment.

[1] The difference between Semantic Tasks and our hypothetical Semantic Actions is that the latter was concerned solely with making some change to the relevant wiki page.
date:2010-01-07 22:13:41
category:cc, development
tags:cc, mediawiki, semantic mediawiki, smw, teamspace

We called it “magic”

Just under ten years ago I started working at Canterbury School doing a variety of things. One thing I wound up doing was building the new Intro to Computer curriculum, based on Python. When Vern and I presented our approach at PyCon in 2003, we were asked what advantages we thought Python had over its predecessor in the curriculum, Java. The first answer was always, “Magic; a lack thereof.” There was less boilerplate, fewer incantations, a much shorter list of things you have to wave your hands about and say, “Don’t worry, we’ll talk about this later in the semester. For right now, it’s magic, just do it.” Magic distracts students, and makes them wonder what you’re hiding.

Seeing a comparison between Java and Clojure (albeit one you can read as more about succinctness than clarity), I was reminded that this lack of magic — boilerplate, ceremony, whatever — is still important.

date:2010-01-06 19:04:13
category:aside, development
tags:magic, python

Caching WSGI Applications to Disk

This morning I pushed the first release of wsgi_cache to the PyPI, laying the groundwork for increasing sanity in our deployment story at CC. wsgi_cache is disk caching middleware for WSGI applications. It’s written with our needs specifically in mind, but it may be useful to others, as well.

The core of Creative Commons’ technical responsibilities is the licenses: the metadata, the deeds, the legalcode, and the chooser. While the license deeds are mostly static and structured in a predictable way, there are some “dynamic” elements; we sometimes add more information to try and clarify the licenses, and volunteers are continuously updating the translations that let us present the deeds in dozens of languages. These are dynamic in a very gross sense: once generated, we can serve the same version of each deed to everyone. But there is an inherent need to generate the deeds dynamically at some point in the pipeline.

Our current toolset includes a script for [re-]generating all or some of the deeds. It does this by [ab]using the Zope test runner machinery to fire up the application and make lots of requests against it, saving the results in the proper directory structure. The result of this is then checked into Subversion for deployment on the web server. This works, but it has a few shortfalls and it’s a pretty blunt instrument. wsgi_cache, along with work Chris Webber is currently doing to make the license engine a better WSGI citizen, aims to streamline this process.

The idea behind wsgi_cache is that you create a disk cache for results, caching only the body of the response. We only cache the body for a simple reason: we want something else, something faster, like Apache or another web server, to serve the request when it’s a cache hit. We’ll use mod_rewrite to send the request to our WSGI application when the requested file doesn’t exist; otherwise it hits the on-disk version. And cache “invalidation” becomes as simple as rm (and as fine-grained as single resources).
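For readers who want to see the shape of the technique, here is a minimal sketch of body-only disk caching middleware. It is not the wsgi_cache code or its API; the class name, the constructor arguments, and the choice to cache only successful GET requests are assumptions made for the example.

```python
import os


class DiskCacheMiddleware(object):
    """Write successful GET response bodies to files under cache_dir."""

    def __init__(self, app, cache_dir):
        self.app = app
        self.cache_dir = os.path.abspath(cache_dir)

    def _cache_path(self, environ):
        # Map the request path onto the cache directory, refusing
        # directory-style paths and paths that would escape it via "..".
        rel = environ.get("PATH_INFO", "").lstrip("/")
        if not rel or rel.endswith("/"):
            return None
        path = os.path.abspath(os.path.join(self.cache_dir, rel))
        if not path.startswith(self.cache_dir + os.sep):
            return None
        return path

    def __call__(self, environ, start_response):
        captured = {}

        def capture(status, headers, exc_info=None):
            # Remember the status so we only cache successful responses.
            captured["status"] = status
            return start_response(status, headers, exc_info)

        result = self.app(environ, capture)
        try:
            body = b"".join(result)
        finally:
            if hasattr(result, "close"):
                result.close()

        path = self._cache_path(environ)
        cacheable = (environ.get("REQUEST_METHOD") == "GET"
                     and captured.get("status", "").startswith("200")
                     and path is not None)
        if cacheable:
            directory = os.path.dirname(path)
            if not os.path.isdir(directory):
                os.makedirs(directory)
            # Only the body is written; headers are deliberately dropped.
            with open(path, "wb") as f:
                f.write(body)

        return [body]
```

The middleware never reads the cache itself. On a hit the front end web server serves the file directly and the WSGI application is not involved at all, which is the whole point of the design.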

There are some limitations which might make this a poor choice for other applications. Because you’re only caching the response body, it’s impossible to store other header information. This can be a problem if you’re serving up different content types which can’t be inferred from the path (note that we use filenames that look like and, so we tell Apache to override the content type for everything; this works for our particular scenario). Additionally, this approach only makes sense if you have another front end server that can serve up the cached version faster; I doubt that wsgi_cache will win any speed challenges for serving cached versions.

We’re not quite ready to roll it out yet, and I expect we’ll find some things that need to be tweaked, but a test suite with 100% coverage makes that a challenge I’m up for. If you’re interested in taking a look (and adapting it for your own use), you can find the code in Creative Commons’ git repository.

date:2010-01-05 23:37:29
category:cc, development
tags:cache, cc, middleware, python, wsgi, wsgi_cache

AcaWiki: On Building Emerging Applications

I’m woefully late in noting the launch of AcaWiki. Mike does a good job exploring the sweet spot AcaWiki may fill between research blogging and open access journals, and where AcaWiki fits into the wiki landscape. AcaWiki is interesting to me for two reasons; first, I was the technical lead on the project, and second, it’s another recent example of building a site using MediaWiki as a platform. More specifically, we used MediaWiki along with Semantic MediaWiki, Semantic Forms, and several other related extensions as the platform for the site.

The idea of using a wiki for a community oriented site is far from new. The difference here is that Neeru came to us talking about specific ways people could interact with the site: specific structured data she wanted to organize and capture about academic articles. For anyone familiar with MediaWiki and Wikipedia, the obvious answer is templates; Wikipedia uses them extensively to provide a consistent presentation for parts of an article (messages about the article, citations, etc). The catch is that for someone coming to a site for the first time, who perhaps has not edited a wiki previously, templates are a bit of inside baseball: you need to know which one to use, and you need to know how to format them in your article. Of course these are trainable skills, but I suspect for many users they’re non-obvious. Semantic Forms lets us provide a form for entering these fields, which is then translated to a template.

The question that comes up when discussing this approach with non-wiki-philes is, “why use a wiki at all? if all you need are CRUD forms, why not just whip it up in Rails, Django, etc?” The question is a good one — a specialized tool almost always has the potential to look fantastic compared to an off the shelf one. And who wants to learn that weird markup syntax, anyway? The thing is, at the end of the day, AcaWiki isn’t a software project, it’s a community project. There isn’t a team of engineers available to help move the toolset forward. There isn’t staff available to fix bugs and write migration scripts. So using off the shelf tools with active communities is essential to achieving any amount of scalability.

As Mike points out, there are some niches AcaWiki seems primed to fill. While working on the site, however, it was clear there are lots of unanswered questions about how that will actually happen. AcaWiki, like many sites that seek to serve a community of interest in a given area, is an emerging application. The data schema isn’t well defined, and we don’t necessarily know how users are going to interact with the site. The goal is to get something that users can use in place; something that provides just enough structure to encourage newcomers, while retaining the plasticity and flexibility needed to grow and evolve.

As I mentioned before, this is not the first “application” we’ve built using this tool chain; we use MediaWiki and Semantic MediaWiki at Creative Commons in many places. We use it to track Events our community puts together, and we use it to track things we’d like developers to work on (NB: the latter is woefully out of date and stagnated; perhaps a negative use case for this sort of tool). We even built a system for tracking grants and projects using it.

Using MediaWiki and Semantic MediaWiki as an application platform isn’t appropriate for every project and it isn’t a cure all; there are real limitations, like any off the shelf system. In some cases these issues are magnified due to the fact that it’s not explicitly designed as a platform. For applications that rely on community involvement and that are only partially defined, it usually either gets the job done, or brings us far enough along with minimal effort that we can see what the real problem we’re trying to solve is.

AcaWiki is an exciting experiment in building community around academic research and knowledge. It’s also another in a line of interesting experiments with building applications in a different, organic manner. There’s some interesting work in the pipeline for AcaWiki, including data dumps, a shiny Vector-based skin, and improvements to the forms and templates used. The most interesting work, however, will be the work done by the community.

AcaWiki’s founder, Neeru Paharia, was one of CC’s earliest employees, and she turned to the CC technology team for help with this project.

date:2010-01-04 22:44:17
tags:acawiki, cc, mediawiki, platforms, semantic mediawiki, smw, wiki

Remembering with org-mode and Ubiquity

Yesterday evening I published my second set of Ubiquity commands which provide a Ubiquity interface between Firefox and Emacs — specifically org-mode — using org-protocol. Ubiquity is an experimental extension from Mozilla Labs that lets you interact with the browser by giving it short, plain text commands. For example, “share” to post a bookmark to Delicious, or “map” to open a map of the selected address.

Org-Mode is an Emacs mode that can be used to keep track of notes, agendas and task lists. I use it to maintain my task list for various projects and take notes when I’m in a meeting. I really like that while it’s an outline editor at heart, it lets me write lots of text and go back later and figure out what’s actually actionable, as opposed to maintaining separate notes and task lists. org-protocol is included in recent releases and lets you launch an instance of emacsclient with some additional information (i.e., the URL and title of a web page, etc) and take some action on it. One of the built in “protocols” is sending that information to remember mode, which org-mode augments.

The main command is simply remember. Invoking it will send the current URL and document title to org-mode’s Remember buffer. You can optionally type a note or select text in the page to be captured along with the link.

Once you’re in the buffer you can make any changes needed and then simply C-c C-c to save the note, or C-1 C-c C-c to interactively file the note someplace else. I’m using this command to quickly store links with some notes to project files. I hope this will be particularly useful when I run across something for a project I’m not actually able to spend time on at the moment.

Note that before using the commands you need to configure Firefox to understand org-protocol:// links, and need to configure a remember template. The template I use looks like:

(?w "* %?\n\n  Source: %u, %c\n\n  %i" nil "Notes")

This stores the information in the Notes section of my org-default-notes-file and positions the cursor ready to type a heading.

To install, visit the command page and click “Subscribe” in the upper right hand corner when prompted (this assumes you have Ubiquity already installed). You can find the JavaScript source on gitorious; I’ll be adding my RDFa commands to that repository as well.

date:2009-10-07 12:52:04
category:development, geek
tags:emacs, firefox, mozilla, orgmode, ubiquity

Nested Formsets with Django

I’ve published an updated post about nested formsets, along with a generic implementation and demo application on GitHub.

I spent Labor Day weekend in New York City working on a side project with Alex. The project is coming together (albeit slowly, sometimes), and there have been a few interesting technical challenges. Labor Day weekend I was building an interface for editing data on the site. The particular feature I’m working on uses a multi-level data model; an example of this kind of model would be modeling City Blocks, where each Block has one or more Buildings, and each Building has one or more Tenants. Using this as an example, I was building the City Block editor.

Django Formsets manage the complexity of multiple copies of a form in a view. They help you keep track of how many copies you started with, which ones have been changed, and which ones should be deleted. But what if you’re working with this hypothetical data model and want to allow people to edit the Buildings and Tenants for a Block, all on one page? In this case you want each form in the Building formset to have a complete Tenant formset, all its own. The Django Formset documentation is silent on this issue, possibly (probably?) because it’s an edge case and one that almost certainly requires some application-specific thought. I spent the better part of two days working on it — the first pretty much a throw away, the second wildly productive thanks to TDD — and this is what I came up with.

Formsets act as wrappers around Django forms, providing the accounting machinery and convenience methods needed for managing multiple copies of the form. My experience has been that, unlike forms where you have to write your form class (no matter how simple), you write a Formset class infrequently. Instead you use the factory functions which generate a default that’s suitable for most situations. As with regular Forms and Model Forms, Django offers Model Formsets, which simplify the task of creating a formset for a form that handles instances of a model. In addition to model formsets, Django also provides inline formsets, which make it easier to deal with a set of objects that share a common foreign key. So in the example data model, an instance of the inline formset might model all the Buildings on a Block, or all the Tenants in the Building. Even if you’re not interested in nested formsets, the inline formsets can be incredibly useful.

Let’s go ahead and define the models for our example:

class Block(models.Model):
    description = models.CharField(max_length=255)

class Building(models.Model):
    block = models.ForeignKey(Block)
    address = models.CharField(max_length=255)

class Tenant(models.Model):
    building = models.ForeignKey(Building)
    name = models.CharField(max_length=255)
    unit = models.CharField(max_length=255)

After we have our models in place we need to define the forms. The nested form is straight-forward — it’s just a normal inline formset.

from django.forms.models import inlineformset_factory

TenantFormset = inlineformset_factory(models.Building, models.Tenant, extra=1)

Note that inlineformset_factory not only creates the Formset class, but it also creates the ModelForm for the model (models.Tenant in this example).

The “host” formset which contains the nested one — BuildingFormset in our example — requires some additional work. There are a few cases that need to be handled:

  1. Validation: When validating an item in the formset, we also need to validate its sub-items (those on its nested formset).
  2. Saving existing data: When saving an item, changes to the items in the nested formset also need to be saved.
  3. Saving new parent objects: If the user adds “parent” data as well as sub-items (so adding a Building, along with Tenants), the nested form won’t have a reference back to the parent unless we add it ourselves.
  4. Finally, the very basic issue of creating the nested formset instance for each parent form.

Before delving into those issues, let’s look at the basic formset declaration.

from django.forms.models import BaseInlineFormSet, inlineformset_factory

class BaseBuildingFormset(BaseInlineFormSet):
    # the methods are filled in over the following sections
    pass

BuildingFormset = inlineformset_factory(models.Block, models.Building,
                                formset=BaseBuildingFormset, extra=1)

Here we declare a sub-class of the BaseInlineFormSet and then pass it to the inlineformset_factory as the class we want to base our new formset on.

Let’s start with the most basic piece of functionality: associating the nested formsets with each form. The super class defines an add_fields method which is responsible for adding the fields (and their initial values since this is a model-based Form) to a specific form in the formset. This seemed as good a place as any to add our formset creation code.

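class BaseBuildingFormset(BaseInlineFormSet):

    def add_fields(self, form, index):
        # allow the super class to create the fields as usual
        super(BaseBuildingFormset, self).add_fields(form, index)

        # create the nested formset
        try:
            # an existing Building: use it and its primary key
            instance = self.get_queryset()[index]
            pk_value = instance.pk
        except IndexError:
            # an "extra" form: no instance yet, so hash the form prefix
            instance = None
            pk_value = hash(form.prefix)

        # store the formset in the .nested property; pass the submitted
        # data along only when the parent formset is bound
        form.nested = [
            TenantFormset(data=self.data if self.is_bound else None,
                            instance = instance,
                            prefix = 'TENANTS_%s' % pk_value)]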

The heart of what we’re doing here is in the last statement: creating a form.nested property that contains a list of nested formsets (only one in our example and in the code I implemented; more than one would probably be a UI nightmare). In order to initialize the formset we need two pieces of information: the parent instance and a form prefix. If we’re creating fields for an existing instance we can use the get_queryset method to return the list of objects. If this is a form for a new instance (i.e., the form created by specifying extra=1), we need to specify None as the instance. We include the object’s primary key in the form prefix to make sure the formsets are named uniquely; if this is an extra form we hash the parent form’s prefix (which will also be unique). The Django documentation has instructions on using multiple formsets in a single view that are relevant here.
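The prefix calculation can be looked at on its own, outside Django. The sketch below is plain Python with a hypothetical helper name and example prefixes. It also swaps the built-in hash() for zlib.crc32, on the assumption that the code may run on Python 3, where string hashes are randomized per process by default and a prefix rendered by one worker process might not match the one computed by another when the form is posted back.

```python
import zlib


def nested_prefix(parent_prefix, pk=None):
    """Build a unique prefix for the formset nested in one parent form.

    parent_prefix: the parent form's own prefix, e.g. "building_set-0"
    pk:            the parent's primary key, or None for an extra form
    """
    if pk is None:
        # crc32 gives the same value in every process, unlike hash()
        # applied to a str under Python 3's default hash randomization
        pk = zlib.crc32(parent_prefix.encode("utf-8")) & 0xffffffff
    return "TENANTS_%s" % pk
```

Existing parents get a prefix based on their primary key, and each extra form gets one derived from its own (already unique) prefix, so no two nested formsets on the page share field names.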

Now that we have the nested formset created we can display it in the template.

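from django.shortcuts import get_object_or_404, redirect, render_to_response

def edit_block_buildings(request, block_id):
    """Edit buildings and their tenants on a given block."""

    block = get_object_or_404(models.Block, id=block_id)

    if request.method == 'POST':
        formset = forms.BuildingFormset(request.POST, instance=block)

        if formset.is_valid():
            rooms = formset.save_all()

            # the keyword argument expected by the 'block_view' URL
            # pattern is an assumption
            return redirect('block_view', block_id=block.id)
    else:
        formset = forms.BuildingFormset(instance=block)

    # 'buildings' is the name the template fragment iterates over;
    # the 'block' entry is an assumption
    return render_to_response('rentals/edit_buildings.html',
                              {'block': block, 'buildings': formset})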

edit_buildings.html (fragment)

{{ buildings.management_form }}
{% for building in buildings.forms %}

  {{ building }}

  {% if building.nested %}
  {% for formset in building.nested %}
  {{ formset.as_table }}
  {% endfor %}
  {% endif %}

{% endfor %}

When the page is submitted, the idiom is to call formset.is_valid() to validate the forms. We override is_valid on our formset to add validation for the nested formsets as well.

class BaseBuildingFormset(BaseInlineFormSet):

    def is_valid(self):
        result = super(BaseBuildingFormset, self).is_valid()

        for form in self.forms:
            if hasattr(form, 'nested'):
                for n in form.nested:
                    # make sure each nested formset is valid as well
                    result = result and n.is_valid()

        return result

Finally, assuming the form validates, we need to handle saving. As I mentioned earlier, there are two different situations here — saving existing data (and possibly adding new nested data) and saving completely new data.

For new data we need to override save_new and update the parent reference for any nested data after we save (well, instantiate) the parent.

class BaseBuildingFormset(BaseInlineFormSet):

    def save_new(self, form, commit=True):
        """Saves and returns a new model instance for the given form."""

        instance = super(BaseBuildingFormset, self).save_new(form, commit=commit)

        # update the form's instance reference
        form.instance = instance

        # update the instance reference on nested forms
        for nested in form.nested:
            nested.instance = instance

            # iterate over the cleaned_data of the nested formset and
            # update the foreign key reference (nested.fk is the
            # ForeignKey field the inline formset was built around)
            for cd in nested.cleaned_data:
                cd[nested.fk.name] = instance

        return instance

Finally, we add a save_all method for saving the parent formset and all nested formsets.

from django.forms.formsets import DELETION_FIELD_NAME

class BaseBuildingFormset(BaseInlineFormSet):

    def should_delete(self, form):
        """Convenience method for determining if the form's object will
        be deleted; cribbed from BaseModelFormSet.save_existing_objects."""

        if self.can_delete:
            raw_delete_value = form._raw_value(DELETION_FIELD_NAME)
            should_delete = form.fields[DELETION_FIELD_NAME].clean(raw_delete_value)
            return should_delete

        return False

    def save_all(self, commit=True):
        """Save all formsets along with their nested formsets."""

        # Save without committing (so self.saved_forms is populated).
        # We need self.saved_forms so we can go back and access
        # the nested formsets.
        objects = self.save(commit=False)

        # Save each instance if commit=True, then the many to many
        # fields, which can only be saved once the instances exist
        if commit:
            for o in objects:
                o.save()
            self.save_m2m()

        # save the nested formsets
        for form in set(self.initial_forms + self.saved_forms):
            if self.should_delete(form):
                continue

            for nested in form.nested:
                nested.save(commit=commit)

There are two methods defined here; the first, should_delete, is lifted almost directly from code in django.forms.models.BaseModelFormSet.save_existing_objects. It takes a form object in the formset and returns True if the object for that form is going to be deleted. We use this to short-circuit saving the nested formsets: no point in saving them if we’re going to delete their required ForeignKey.

The save_all method is responsible for saving (updating, creating, deleting) the forms in the formset, as well as all the nested formsets for each form. One thing to note is that regardless of whether we’re committing our save (commit=True), we initially save the forms with commit=False. When you save a model formset with commit=False, Django populates a saved_forms attribute with the list of all the forms saved — new and old. We need this list of saved forms to make sure we are able to save any nested formsets that are attached to newly created forms (ones that did not exist when the initial request was made). After we know saved_forms has been populated we can do another pass to commit if necessary.

There are certainly places this code could be improved, tightened up or generalized (for example, the nested formset prefix calculation and possibly save_all). It’s also entirely plausible that you could wrap much of this into a factory function. But this gets nested editing working and once you wrap your head around what needs to be done, it’s actually fairly straight forward.

date:2009-09-27 19:42:42
category:development, koucou
tags:django, formsets, howto, orm, python

gsc Bug Fixes

I announced gsc earlier this week because it worked for me. If you were brave and cloned the repository to try it out, you undoubtedly found that, well, it didn’t work for you. Thanks to Rob for reporting the installation problem, as well as a few other bugs.

I’ve pushed an update to the repository on gitorious which includes fixes for the installation issue, support for some [likely] common Subversion configurations, and a test suite. In addition to the installation issue, Rob also reported that he wasn’t able to clone his svn repository with gsc. Some investigation led me to realize the following cases weren’t supported:

  • svn:externals specified with nested local paths (i.e., “vendor/product”)
  • empty directories in the Subversion repository with nothing but svn:externals set on them

Both now clone correctly.
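
To make the two cases concrete, here is a rough sketch of reading externals definitions. It is an illustration, not gsc’s actual code: the function name parse_externals and the example URLs are hypothetical, and it assumes the older “local/path [-rREV] URL” definition syntax.

```python
import posixpath


def parse_externals(base_dir, propval):
    """Parse the text of an svn:externals property set on base_dir.

    Hypothetical helper, not part of gsc. Assumes the older definition
    syntax, one external per line: "local/path [-rREV] URL".
    Returns a list of (target_dir, url, revision) tuples.
    """
    externals = []
    for line in propval.splitlines():
        parts = line.split()
        if len(parts) < 2 or parts[0].startswith("#"):
            continue  # skip blank lines, comments and malformed entries

        local_path, url = parts[0], parts[-1]
        revision = None
        opts = parts[1:-1]
        for i, opt in enumerate(opts):
            if opt == "-r" and i + 1 < len(opts):
                revision = opts[i + 1]  # written as "-r 148"
            elif opt.startswith("-r") and len(opt) > 2:
                revision = opt[2:]  # written as "-r148"

        # A nested local path such as "vendor/product" is joined as-is.
        # The intermediate directories (or base_dir itself, if it holds
        # nothing but externals) may not exist in the git working tree,
        # so a caller would need to create them before cloning.
        target = posixpath.normpath(posixpath.join(base_dir, local_path))
        externals.append((target, url, revision))
    return externals
```

A caller would run this over the property text found for each directory and create any missing parent directories before fetching each target.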

One open question is what (if anything) gsc should do when you run it against an already cloned repository. I’ve envisioned it purely as a bootstrapping tool but received an email stating that it didn’t work when run a second time, so obviously it should do something, even if that’s just failing with an error message.

date:2009-07-25 18:41:27
tags:cc, git, git-svn, gsc, svn, svn:externals

git-svn and svn:externals

UPDATE I’ve pushed a few bug fixes; see this entry for details.

At Creative Commons we’re a dual-[D]VCS shop. Since we started self-hosting our repositories last year we’ve been using both Subversion and git. The rationale was pragmatic more than anything else: we have lots of code spread across many small projects and don’t have the time (or desire) to halt everything and cut over from one system to the other. This approach hasn’t been without its pain but I think that overall it’s been a good one. When we create projects we tend to create them in git and when we do major refactoring we move things over. It’s also given [STRIKEOUT:recalcitrant staff] me time to adjust my thinking to git. Adjustments like this usually involve lots of swearing, fuming and muttering.

As I’ve become more comfortable with git and its collection of support tools, I’ve found myself wanting to use git svn to work on projects that remain in Subversion. One issue I’ve run into is our reliance on svn:externals. We use externals extensively in our repository which has generally made it easy to share large chunks of code and data, and still be able to check out the complete dependencies for a project and get to work[1]_. More than once I’ve thought “oh, I’ll just clone that using git-svn so I can work on it on the plane[2]_,” only to realize that there are half a dozen externals I’d need to handle as well.

Last week I decided that tools like magit make git too useful not to use when I’m coding and that I needed to address the “externals issue”. I didn’t want to deal with a mass conversion; I just wanted to get the code from Subversion into the same layout in git. I found git-me-up which was close, but which baked in what I assume are Rails conventions that our projects don’t conform to. Something like this may already exist, but the result of my work is a little tool called **gsc** (“git subversion clone”).

gsc works by cloning a Subversion repository using git svn and then recursively looking for externals to fetch. When it finds an external, it does a shallow clone of the target (fetching only the most recent revision instead of the full history). The result is a copy of your project you can immediately start working on. It does inherit some of the constraints associated with svn:externals: if you want to work on code contained in an external (and push it back to the Subversion repository) you may need to check out the code manually[3]_. Of course, the beauty of DVCS is that there’s nothing stopping you from committing to the read-only clone locally and then sending the changes via email to a reviewer.
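
The process described above can be sketched roughly as follows. This is an illustration rather than gsc’s implementation: plan_clone and the externals_for callback are hypothetical names, the sketch only builds the command lines instead of running them, and it assumes “git svn clone -r HEAD” as the way to fetch just the latest revision.

```python
def plan_clone(url, target, externals_for, shallow=False, ancestors=()):
    """Return the git-svn commands needed to clone url and its externals.

    externals_for(url) is a hypothetical callback that returns a list of
    (local_path, external_url) pairs for the repository at url.
    """
    if url in ancestors:
        return []  # guard against externals that point back at a parent

    command = ["git", "svn", "clone"]
    if shallow:
        # Only fetch the most recent revision, not the full history.
        command += ["-r", "HEAD"]
    command += [url, target]

    commands = [command]
    for local_path, external_url in externals_for(url):
        # Externals are always cloned shallowly, and may themselves
        # define further externals, hence the recursion.
        commands += plan_clone(
            external_url,
            target + "/" + local_path,
            externals_for,
            shallow=True,
            ancestors=ancestors + (url,),
        )
    return commands
```

Each returned command could then be handed to something like subprocess.run(command, check=True), with the top-level project getting its full history and every external getting only its latest revision.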

You can grab gsc from gitorious. There are also installation instructions and basic usage information in the README.

[1] It’s also led to some sub-optimal software release practices, but that’s probably a different post.
[2] Yes, I’ve actually encountered the “airplane” scenario; this either means DVCS advocates are prescient or I’ve been traveling way too much lately.
[3] This is true because some repositories spell read-only and read-write access differently; both CC and Zope do this, so the svn:externals definitions are often written using the read-only syntax to make sure everyone can make a complete checkout.
date:2009-07-21 09:47:17
tags:cc, git, git-svn, gsc, svn, svn:externals

RDFa Bookmarklets for Ubiquity

I’ve been aware of Ubiquity since it launched and have meant to dig in and play with it for a while. I’m becoming increasingly reliant on my keyboard for fast interaction with the computer; I blame gnome-do. So using the keyboard to interact more quickly with my browser had a lot of appeal.

Today I finally installed Ubiquity 0.5 and looked at converting the RDFa bookmarklets to Ubiquity commands. The bookmarklets are invaluable for debugging and exploring RDFa, but I don’t use them often enough to feel like I want them on my bookmark bar all the time.

Turns out that Ubiquity makes it really easy to convert a bookmarklet to a command. I’ve converted the Get N3 and RDFa Highlight bookmarklets and made them available. I’d like to convert the fragment parser as well but I think that’ll be a little more involved.

To use the commands, just install Ubiquity 0.5 (or later for you visitors from the future) and visit the commands page. You’ll see a notification at the top of the browser window asking if you’d like to install the commands.

date:2009-07-11 09:58:28
category:development, geek, projects
tags:firefox, javascript, rdfa, ubiquity