| Author: | Nathan R. Yergler <nathan@creativecommons.org> |
|---|---|
| Date: | 2006-01-23 |
Building open source applications which meet all the needs of all users is becoming more and more difficult. One solution is to implement a system which allows third parties to easily extend your application. Although it was developed for use in web applications, the Zope 3 project provides the libraries and technology needed to create an application which can be extended in arbitrary ways.
Creative Commons is currently developing a new release of ccPublisher, an application which allows users to tag digital works with metadata and upload them to the Internet Archive for persistent hosting. The new release does not add any new features. Instead, the developers have rewritten most of the code base to support two new use cases: derivatives and extensions.
Derivatives are applications, such as Ourmedia Publisher and the original Jamloader, which build upon the ccPublisher code and add significant new features or functionality. Extensions are small pieces of code which add some functionality to the existing application without changing its overall behavior. An example of an extension would be enabling ccPublisher to post to a user's blog when she uploads a new work to the Internet Archive.
While derivatives were possible with the original ccPublisher release, the application was not designed to accommodate them easily. As a result, developing them was difficult, and keeping them in sync with each other was a monumental task. We believe that ccPublisher will not serve everyone, but users will benefit if multiple applications share code for common functionality, such as tagging and license selection.
This paper covers the ways Creative Commons leveraged code from the Zope 3 project to create an extensible, flexible application. In addition to implementation information, the paper covers the challenges such a system introduces for application distribution and the ways in which we have overcome them.
The Zope 3 project is a "redesign of Zope 2 and improves the Zope development experience through the use of a 'component architecture'." [1] This component architecture has been implemented independently of the web-specific portions of the project, and so it can be used with minimal additional dependencies.
Looking at the Zope 3 source repository (http://svn.zope.org/Zope3/trunk/src/zope), we can mentally divide the code into two segments: things in the app module, and everything else. Code specific to the Zope 3 web application server lives in the app module. Other modules may depend on one another but do not depend on the application server code.
As previously stated, the primary goals of ccPublisher 2 are to create an application that is easy to extend and easy to customize. Our first attempt at these goals used the zope.interface package to add interface support to Python and defined strong, Java-style interfaces for each object type ccPublisher uses. For example, the Internet Archive backend implemented a "storage" interface, and metadata field declarations all implemented a common interface. This approach did meet the goal of providing an easily customizable code base: a developer who wanted to create a customized version of ccPublisher would simply subclass the necessary components and modify their functionality as desired.
However, after the initial implementation we discovered two things. First, this approach did not lend itself to the problem of application extensions. It did not make them impossible or more difficult to implement; it simply did nothing to further that goal. Second, it proved difficult to separate the functionality sufficiently to write clean, straightforward interfaces. In short, imposing Java-like order on a dynamic language began to feel stifling.
As we looked for ways to keep the usefulness of interfaces while restoring some of the dynamic nature that initially drew us to Python, we began to consider adopting more of the Zope 3 project. In the end we adopted several Zope 3 packages to gain the flexibility we wanted. We still use interfaces, but mostly as a hint to developers that certain methods should be implemented in a class or subclass.
The core of the extensible approach comes from the zope.component package. At the highest level of abstraction, this package provides a component model and an event/dispatch mechanism. The component model provides a way to declare what objects support and how they can be adapted, along with a way to register adapters or subscribers for particular types of objects. The event/dispatch mechanism provides a basic way for objects to observe one another and to notify their observers of changes. In practice the line between the two is very blurry.
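To make the event/dispatch idea concrete, here is a minimal, stdlib-only sketch of a type-keyed subscriber registry. It is an analogue written for illustration, not zope.component's implementation; `subscribe`, `handle` and the two event classes are hypothetical stand-ins. Handlers are registered against an event class and are called for any published event that is an instance of that class, including subclasses.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

# Hypothetical registry: event class -> handlers subscribed to that class.
_subscribers: DefaultDict[type, List[Callable[[object], None]]] = defaultdict(list)


def subscribe(event_type: type, handler: Callable[[object], None]) -> None:
    """Register handler to be called for events of event_type (or its subclasses)."""
    _subscribers[event_type].append(handler)


def handle(event: object) -> None:
    """Publish event: call the handlers registered for each class in its MRO."""
    for cls in type(event).__mro__:
        for handler in _subscribers.get(cls, []):
            handler(event)


# Example event types, named after (but not taken from) P6's events.
class MetadataEvent:
    def __init__(self, field_id: str) -> None:
        self.field_id = field_id


class LoadMetadataEvent(MetadataEvent):
    pass


seen = []
subscribe(MetadataEvent, lambda e: seen.append(("any", e.field_id)))
subscribe(LoadMetadataEvent, lambda e: seen.append(("load", e.field_id)))
handle(LoadMetadataEvent("title"))
print(seen)  # [('load', 'title'), ('any', 'title')]
```

Walking the MRO means a subscriber for a general event type also hears about more specific ones, and the code that calls `handle` never learns who, if anyone, responded.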
The application architecture of ccPublisher 2 consists of a central framework package, P6, which provides the common component and event functionality that would likely be reused by derivative applications. The ccpublisher package contains ccPublisher-specific code, such as the Internet Archive backend implementation. In both the P6 and ccPublisher code, we view the application as a set of components which communicate with each other through minimal interface contracts or the event framework.
Examining how the application handles user entry of metadata provides a good overview of how the application is broken into components and how it leverages the Zope 3 packages to provide maximum flexibility. The P6 framework defines a window class which provides the wizard functionality, and a generic page which implements editing for a single metadata group (a set of fields). Focusing on the metadata group, we have the following set of actions taking place:
Looking at these tasks, there are obviously other questions lying beneath the surface: for example, what portion of the application is responsible for saving the field values, and where do we get our list of defaults? A benefit of dividing the application into segmented components is that we can be willfully ignorant about these questions. As we will see, the metadata editing page deals only with its own core functionality, leaving the other details to other portions of the framework.
The metadata editor takes a MetadataGroup object as a parameter to its constructor, and this group provides the list of fields to edit. We will see later how the MetadataGroup is instantiated and initialized. When the framework constructs the user interface object, it checks each field to find out whether its persistence flag is set [2]. If the flag is set, the framework publishes a LoadMetadataEvent, which notifies any metadata provider that it should load that particular field.
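```python
import zope.component
import p6.metadata.events

# Inside the metadata page's constructor, for each field the group provides:
if field.persist:
    # Ask any registered metadata provider to load this field's saved value.
    event = p6.metadata.events.LoadMetadataEvent(
        self.metagroup.appliesTo,
        self.metagroup,
        field,
    )
    zope.component.handle(event)
```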
The event itself is not as interesting as the way the application handles it. Any component can register a subscriber for a particular event type and be notified when events of that type are published. As a result, the application does not need to know where the persisted data is stored, or even whether it is stored at all; it just provides the hook for storage and retrieval to happen.
An interesting side-effect of this architecture is that an application extension may provide integration with other data sources simply by listening for and responding to events. For example, login credentials could potentially be integrated with an external user source by an extension which listened for the appropriate events and provided the correct credential information.
Once the user has edited the metadata fields, the framework performs the reverse process, storing the values back to a container. As with the initial load, the framework simply constructs an event and publishes it.
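```python
# After editing, publish each field's new value; widget is the
# toolkit control that was bound to this field on the page.
zope.component.handle(
    p6.metadata.events.UpdateMetadataEvent(
        self.metagroup.appliesTo,
        field,
        widget.GetValue(),
    )
)
```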
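A component on the receiving end of these two events might look like the following sketch: a hypothetical in-memory provider that records update events and answers load events. Only the constructor argument order of the event classes follows the snippets in this section; their attribute names, the `Field` stand-in, the tiny dispatch table and the convention that a provider writes the loaded value onto the event are all assumptions, since how P6 returns a loaded value to the page is not described here.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional, Tuple


@dataclass
class Field:
    """Hypothetical stand-in for a P6 metadata field."""
    id: str
    persist: bool = True


@dataclass
class LoadMetadataEvent:
    appliesTo: Any
    group: Any
    field: Field
    value: Optional[Any] = None  # assumption: the provider fills this in


@dataclass
class UpdateMetadataEvent:
    appliesTo: Any
    field: Field
    value: Any


class InMemoryMetadataStore:
    """Hypothetical provider that remembers field values."""

    def __init__(self) -> None:
        self._values: Dict[Tuple[Any, str], Any] = {}

    def on_update(self, event: UpdateMetadataEvent) -> None:
        # Key stored values by what they apply to and by field id.
        self._values[(event.appliesTo, event.field.id)] = event.value

    def on_load(self, event: LoadMetadataEvent) -> None:
        # Hand back a previously stored value, or None if there is none.
        event.value = self._values.get((event.appliesTo, event.field.id))


store = InMemoryMetadataStore()
# Minimal dispatch table standing in for zope.component's registry.
handlers = {UpdateMetadataEvent: store.on_update, LoadMetadataEvent: store.on_load}


def handle(event: Any) -> None:
    handlers[type(event)](event)


title = Field("title")
handle(UpdateMetadataEvent("work", title, "Sunset Over Water"))
load = LoadMetadataEvent("work", "workinfo", title)
handle(load)
print(load.value)  # Sunset Over Water
```

The page that publishes the events never references `InMemoryMetadataStore`; swapping it for a disk-backed provider, or adding a second provider, only changes what is registered.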
This demonstrates how publishing events can dramatically decouple an application, but it does not show how objects are initially instantiated and registered. Our initial implementation placed the initialization code in the application class. While this was an improvement in that the code was self-contained, we again found that it duplicated effort between the "core" application functionality and "extension" functionality. To use the same kind of solution for both, we added another Zope 3 package, zope.configuration.
The zope.configuration package provides basic support for the ZCML configuration language. ZCML is an XML-ish language defined by the Zope 3 project for configuring applications. Another way to look at this is that the code defines components, and the ZCML assembles them into the complete, functional application. ZCML can be extended with new directives, each declared as a combination of an interface describing its attributes and a Python implementation which is called whenever an instance of that directive is encountered.
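The division of labour between code and configuration can be illustrated without Zope at all. The sketch below is a stdlib-only analogue, not zope.configuration: it parses a ZCML-like document with `xml.etree.ElementTree` and hands each child element to a handler registered for that tag name, passing the element's attributes as keyword arguments. The `DIRECTIVES` registry, the `directive` decorator and the `storage` handler are hypothetical; real ZCML uses XML namespaces and validates attributes against the directive's interface, both of which are omitted here.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List

# Hypothetical registry: directive (tag) name -> handler called with its attributes.
DIRECTIVES: Dict[str, Callable[..., None]] = {}


def directive(name: str):
    """Decorator registering a handler for a ZCML-like directive."""
    def register(func: Callable[..., None]) -> Callable[..., None]:
        DIRECTIVES[name] = func
        return func
    return register


configured: List[str] = []


@directive("storage")
def storage_directive(name: str, factory: str) -> None:
    # A real handler would import and call `factory`; here we just record it.
    configured.append(f"{name} -> {factory}")


def process(zcml_text: str) -> None:
    """Run every directive found directly inside the root <configure> element."""
    root = ET.fromstring(zcml_text.strip())
    for element in root:
        handler = DIRECTIVES.get(element.tag)
        if handler is None:
            raise ValueError(f"unknown directive: {element.tag}")
        handler(**element.attrib)


process("""
<configure>
  <storage name="NoOp Storage" factory="p6.storage.basic.BasicStorage" />
</configure>
""")
print(configured)  # ['NoOp Storage -> p6.storage.basic.BasicStorage']
```

Adding a new directive means registering one more handler; the configuration file, not the code, decides which handlers run and in what order.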
P6 uses the core ZCML directives for registering adapters and subscribers for events, and also extends ZCML for some domain-specific configuration tasks. It reads the core application configuration from a file called app.zcml. The app.zcml file always starts with a `<configure>` directive, which is the container for the entire configuration. The only additional requirement is that the file include the line

```xml
<include package="p6" />
```

before any application-specific directives. The `include` tag tells the ZCML parser to traverse into the named Python package and look for a file named configure.zcml, which is in turn parsed. With this single include statement, the application loads the domain-specific directives and initializes the data structures needed to load the application.
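Locating the file for an `include` is essentially a matter of finding the package's directory on disk. A minimal sketch, assuming the package is importable from an ordinary directory rather than a zip file; `package_config_path` is a hypothetical helper, and zope.configuration's own lookup is more involved.

```python
import importlib.util
import os


def package_config_path(package: str, filename: str = "configure.zcml") -> str:
    """Return the path where `filename` would live inside `package`'s directory."""
    spec = importlib.util.find_spec(package)
    if spec is None or not spec.submodule_search_locations:
        # Either nothing is importable under that name, or it is a plain module.
        raise ImportError(f"{package!r} is not an importable package")
    package_dir = list(spec.submodule_search_locations)[0]
    return os.path.join(package_dir, filename)
```

For example, `package_config_path("p6")` would point at the configure.zcml shipped inside the p6 package. When the package has been imported from a zip archive, the returned path points inside the archive, which is exactly the situation that causes trouble for frozen applications.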
When an application such as ccPublisher 2 is configured, there are three primary areas we're concerned with instantiating: storage, metadata, and user interface. Storage is the term given to backends such as the Internet Archive. An application must have at least one storage instance. A storage class is instantiated with a directive such as
```xml
<storage name="NoOp Storage" factory="p6.storage.basic.BasicStorage" />
```
The name attribute is purely for logging and reporting; it is never exposed to the end user. The factory attribute specifies a callable which returns an object implementing the IStorage interface. In this case the factory is actually a class definition, so the framework just calls the constructor when it encounters this directive. There are specific events which a storage provider should register for; new storage implementations can get this behavior by simply subclassing the BasicStorage class.
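The `factory` attribute is a dotted name that has to be turned into a callable before it can be called. A minimal sketch of that step, using the hypothetical helpers `resolve` and `create_storage`; zope.configuration performs its own name resolution, and this is not its code.

```python
import importlib
from typing import Any, Callable


def resolve(dotted_name: str) -> Callable[..., Any]:
    """Import 'package.module.attr' and return attr."""
    module_name, _, attr = dotted_name.rpartition(".")
    if not module_name:
        raise ValueError(f"expected a dotted name, got {dotted_name!r}")
    module = importlib.import_module(module_name)
    return getattr(module, attr)


def create_storage(name: str, factory: str) -> Any:
    """Call the factory named by a storage directive and return the instance."""
    instance = resolve(factory)()
    # `name` is only used for logging and reporting, never shown to users.
    print(f"configured storage {name!r}: {type(instance).__name__}")
    return instance


create_storage("Demo", "collections.OrderedDict")
```

Because a class is itself a callable, "calling the factory" and "calling the constructor" are the same operation, which is why a class name works directly as a factory.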
Metadata collection is a core piece of ccPublisher's functionality. Recognizing that derivatives such as Ourmedia Publisher often want to collect different sets of metadata, we moved the declaration of these fields into the ZCML as well. Metadata fields are separated into groups, and each group is displayed as a page in the wizard. In addition to separating fields logically, groups allow application architects to declare which kind of object a metadata field is collected for (that is, what it applies to). For example, ccPublisher collects some metadata for the entire work, which may include more than one file: items such as author, title and year of copyright apply to the work as a whole. Other pieces of metadata, such as file format, apply only to specific files. The metadatagroup declaration shows how this container is created.
```xml
<metadatagroup id="workinfo"
               title="Tell Us About Your Files"
               description="Metadata helps others to find your works."
               for="p6.storage.interfaces.IWork" >
  <!-- field declarations go here -->
</metadatagroup>
```
In this declaration, the id must be unique among all metadatagroups; the title is used for the user interface, and the for attribute specifies the interface which the contained fields apply to. The specific field declarations take the following format.
```xml
<field id="title"
       label="Title of Work"
       type="p6.metadata.types.ITextField"
       validator="ccpublisher.validators.validateTitle"
       canonical="http://purl.org/dc/elements/1.1/title"
       />
```
This field declaration contains both required and optional elements: id, label and type are required; validator and canonical are optional. The type attribute specifies the interface the field should conform to, in this case a text field. Interfaces currently exist for choice fields and text fields; we specify an interface instead of a class because the actual implementation is considered toolkit-specific [3].
The validator attribute specifies a callable which, when passed the value for this field, returns True or False. The canonical attribute is a little more opaque: it provides a canonical URI for this type of field (in this case the Dublin Core URI for title). It exists primarily for extension authors. Since field ids and group ids are arbitrary, an extension that wants the title of a work needs some stable key to retrieve it by; the canonical URI serves this purpose.
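To see why `canonical` matters to extension authors, consider the following sketch. The `Field` and `MetadataGroup` classes and the `find_by_canonical` helper are hypothetical stand-ins modelled only on the attributes described above (the `type` attribute is left out), and `validate_title` is an invented example of a validator that returns True or False.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

DC_TITLE = "http://purl.org/dc/elements/1.1/title"


@dataclass
class Field:
    id: str
    label: str
    validator: Optional[Callable[[str], bool]] = None
    canonical: Optional[str] = None

    def is_valid(self, value: str) -> bool:
        # Fields without a validator accept any value.
        return self.validator is None or self.validator(value)


@dataclass
class MetadataGroup:
    id: str
    title: str
    fields: List[Field] = field(default_factory=list)


def validate_title(value: str) -> bool:
    """Hypothetical validator: a title must not be blank."""
    return bool(value.strip())


def find_by_canonical(groups: List[MetadataGroup], uri: str) -> Optional[Field]:
    """Locate a field by its canonical URI, whatever its group or field id."""
    for group in groups:
        for f in group.fields:
            if f.canonical == uri:
                return f
    return None


groups = [MetadataGroup("workinfo", "Tell Us About Your Files",
                        [Field("title", "Title of Work", validate_title, DC_TITLE)])]
title_field = find_by_canonical(groups, DC_TITLE)
print(title_field.id, title_field.is_valid("Sunset"), title_field.is_valid("   "))
# title True False
```

An extension written this way keeps working if a derivative renames the group or the field id, as long as the canonical URI stays the same.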
Finally, the P6 framework configures the application user interface through ZCML as well. Since the framework was designed for wizard-like applications, the container we use is the `pages` directive.
```xml
<pages appid="ccpublisher">
  <page factory="ccpublisher.ui.WelcomePage" />
  <fileselector />
  <metadatapages
      for="p6.storage.interfaces.IWork
           p6.storage.interfaces.IWorkItem
           p6.storage.interfaces.IStorage" />
  <storepage />
</pages>
```
This code sample is the entire user interface declaration for ccPublisher 2. It demonstrates the three types of pages: built-in, metadata and custom. The `fileselector` and `storepage` directives are both built-in page types. The file selector page simply allows the user to select one or more files, and the store page provides the progress bar and user feedback during the upload process [4].
The `metadatapages` directive instructs P6 to generate the necessary pages for the metadata fields which apply to the interfaces listed in its `for` attribute. In another instance of willful ignorance, it's interesting to note that the directive makes no assumptions about how many pages there are or how many fields are defined for each page.
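That willful ignorance is easy to express in code. In this hypothetical sketch, each group records the interface it applies to (plain dotted strings stand in for interfaces), and `metadata_pages` returns one page per matching group in declaration order, so adding a group in ZCML adds a page without touching the UI code. The function name, the `Group` tuple and the `fileinfo` and `internal` groups are assumptions; matching is by exact name here, whereas a real implementation would more likely check interface relationships.

```python
from typing import Iterable, List, NamedTuple


class Group(NamedTuple):
    id: str
    applies_to: str  # dotted interface name from the group's `for` attribute


def metadata_pages(groups: Iterable[Group], for_: str) -> List[str]:
    """Return one page id per group whose interface is listed in `for_`."""
    wanted = for_.split()  # the `for` attribute is a whitespace-separated list
    return [g.id for g in groups if g.applies_to in wanted]


groups = [
    Group("workinfo", "p6.storage.interfaces.IWork"),
    Group("fileinfo", "p6.storage.interfaces.IWorkItem"),
    Group("internal", "some.other.IThing"),
]
pages = metadata_pages(groups, """p6.storage.interfaces.IWork
    p6.storage.interfaces.IWorkItem
    p6.storage.interfaces.IStorage""")
print(pages)  # ['workinfo', 'fileinfo']
```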
Finally, the simple `page` directive defines a custom page, in this case one specified by our application package (ccpublisher). A custom page factory returns an object which subclasses the XrcWizPage class provided by the ccwx package.
While separating the application into more independent components and stitching them together with ZCML has proven advantageous to productivity, it is problematic from a packaging perspective. Previous releases of ccPublisher used py2app on Mac OS X and py2exe on Windows [5]. There are two primary, interrelated problems: bootstrapping an application which is not "assembled" until run time, and reliably finding the ZCML files needed to complete the configuration.
For developing ccPublisher we chose to punt on the first issue, and created a script which knows what the application-level class is and how to find the app.zcml file for further configuration. The application class constructor handles actually loading the file and the other necessary instantiation.
The second problem is more difficult to deal with. Consider first Windows and py2exe: py2exe bundles the Python byte-code files specified by distutils into library.zip and loads them at run time using the zipfile import mechanism. Unfortunately, ZCML files bundled with the packages end up in library.zip as well, and the ZCML file loading mechanism is unable to traverse into zip files to load them.
To solve the problem on Windows, we specify the required ZCML files as both package data and data files: the package data is placed alongside the packages in library.zip, and the data files are placed alongside the executable. The ZCML file loader is then monkey-patched so that if it cannot find a file in the regular Python path, it looks in the "side by side" path before declaring the file missing.
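The lookup order of that fallback can be sketched as a standalone function. The names `find_config` and `side_by_side_dir` are hypothetical, and the `<exe dir>/<package>/<filename>` layout for the side-by-side copies is an assumption; the actual patch replaces a function inside the ZCML loader rather than defining a new one.

```python
import os
import sys
from typing import Optional


def side_by_side_dir() -> str:
    """Directory holding the running program; py2exe sets sys.frozen."""
    if getattr(sys, "frozen", False):
        return os.path.dirname(sys.executable)
    script = sys.argv[0] if sys.argv else ""
    return os.path.dirname(os.path.abspath(script))


def find_config(package: str, filename: str, package_dir: str,
                side_dir: Optional[str] = None) -> str:
    """Return the first existing copy of a package's ZCML file.

    The regular package location is tried first; if the file is not there
    (for example because the package lives inside library.zip), fall back
    to the copy assumed to be installed at <side_dir>/<package>/<filename>.
    """
    candidates = [
        os.path.join(package_dir, filename),
        os.path.join(side_dir or side_by_side_dir(), package, filename),
    ]
    for path in candidates:
        if os.path.isfile(path):
            return path
    raise FileNotFoundError(f"ZCML file not found: {package}/{filename}")
```

Trying the package location first keeps the development checkout behaving exactly as before; the side-by-side copy only matters in the frozen build.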
ccPublisher 2 is already a much more stable application than the original release. This stability is due in large part to the division of the application into semi-independent components which each focus on a small piece of functionality. Not only does this make it easier to develop the pieces, it usually makes debugging easier, as functionality is isolated. This componentization allows us to develop an application which is highly extensible and customizable. This would not be possible without the code developed as part of the Zope 3 project.
| [1] | http://dev.zope.org/Zope3 |
| [2] | The persistence flag, field.persist, is a hint to the framework that it should remember the value of the field between application runs. |
| [3] | Yes, there is currently only one toolkit implementation (wxPython), and yes, the separation isn't complete, but it's the thought that counts, right? |
| [4] | Currently the storepage is also responsible for initiating the upload process. |
| [5] | Why isn't Linux listed? Because while we do our primary development on Linux, we never figured out a good way to ship a "built" application that would not require all the dependencies to be statically linked. This has been fixed for ccPublisher 2, thanks to some magic from the Straw project, which we are extending to be a little more generic. The end result will be an RPM distribution which installs into /usr/bin and acts like a "real" application. |