DocSix: Doctests on Python 2 & 3

I was first introduced to doctests working on Zope 3 at early PyCon sprints. At the time the combination of documentation, specification, and test in a single document seemed pretty interesting to me. These days I like to use them for testing my documentation.

Last week stvs2fork helpfully opened a pull request for Rebar, fixing some syntax that’s no longer valid in Python 3. I decided that it’d be interesting to add Python 3.3 to the automated test runs. Fixing the code to work with Python 3 was easy enough, but when I ran the doctests I discovered an issue I hadn’t thought of:

Unicode string output looks different in Python 3 than in Python 2.

>>> validator = AgeValidator()
>>> validator.errors({'age': 'ten'})
{'age': [u'An integer is required.']}

This example works exactly the same in Python 2 and 3: in both cases the error messages are returned as a list of Unicode strings. But in Python 2 the output has the leading u indicator. Not so in Python 3.

What I needed to do is strip the Unicode indicator from the output strings before executing the test; then I’d have the Python 3 doctest I needed. So I wrote a tool that lets me do that.
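The general idea can be sketched with nothing but the standard library's doctest module. This is a minimal illustration, not DocSix's implementation (DocSix builds on Manuel). The names `strip_unicode_prefixes` and `Py3OutputChecker` are hypothetical. The regular expression is a naive assumption: it only strips a `u`/`U` that sits directly before a quote and is not preceded by a word character or another quote.

```python
import doctest
import re
import sys

# A u/U immediately before a quote, not preceded by a word character or
# another quote (so "menu'" and the one-letter string 'u' are left alone).
_UNICODE_PREFIX = re.compile(r"(?<![\w'\"])[uU](?=['\"])")


def strip_unicode_prefixes(text):
    """Remove Python 2 style u'' prefixes from expected doctest output."""
    return _UNICODE_PREFIX.sub("", text)


class Py3OutputChecker(doctest.OutputChecker):
    """Compare doctest output, ignoring u'' prefixes when on Python 3."""

    def check_output(self, want, got, optionflags):
        if sys.version_info[0] >= 3:
            want = strip_unicode_prefixes(want)
        return doctest.OutputChecker.check_output(self, want, got, optionflags)


print(strip_unicode_prefixes("{'age': [u'An integer is required.']}"))
# {'age': ['An integer is required.']}
```

The standard library's doctest runners accept a checker like this, e.g. `doctest.DocFileSuite('index.rst', checker=Py3OutputChecker())`. Because it rewrites the expected output textually, it can in rare cases also alter a `u` that legitimately appears right before a quote inside string contents.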

DocSix lets you run your doctests on Python 2 and 3.

DocSix builds on Manuel, a library for mixing custom test syntax into doctests. DocSix can work with existing uses of Manuel, or it can load your doctests into a unittest TestSuite, ready to go:

from docsix import get_doctest_suite

test_suite = get_doctest_suite(
    'index.rst',
    'advanced.rst',
)

author:Nathan Yergler
category:development
tags:python, doctests, testing, python3
comments:

Revisiting Nested Formsets

It’s been nearly four years since I first wrote about nested formsets. At the time I must have been using Django 1.1 (based on correlating dates in the release notes and the original blog post), which means what I wrote has had four major releases of Django to drift out of date. Yet it’s still one of the most frequently visited posts on my blog, and one of the few that I receive email questions about. Four years later, it seemed like time to revisit the original post to see whether nested formsets still make sense and, if so, what they look like now.

Formsets help manage the complexity of maintaining multiple instances of a Form on a single page. For example, if you’re editing a list of items on a single page, each individual item may be a copy of the same form. Formsets help manage things like HTML ID generation, flagging forms for deletion, and validating the entire set of forms together. When used with Models, they allow you to edit the members of a QuerySet all at once.

So what are nested formsets? The example I used previously was something along the lines of Block – Building – Tenant: one Block has many Buildings, and each Building has many Tenants. If you’re editing a Block, you want to see all the Buildings and all the Tenants at once. That’s a fine hypothetical, but one of the questions I get with some frequency is “what’s a good use case for a nested formset?” Four years later (two and a half of them spent doing web development full time), I have yet to encounter a situation where I needed a nested formset. In that time I’ve built some pretty complex forms, including Eventbrite’s event creation flow. That page was complex enough that I built Form Groups to support the interaction, and I think the jury is still out on whether that was a good idea. It’s possible that there are use cases for nested formsets in admin-style applications that I haven’t encountered. It’s also possible that there are reasons to use a nested formset alongside a JavaScript framework to ease the user experience.

Note that if you only have one level of relationships on the page (i.e., you’re editing all the Tenants for a single Building in our example), then you don’t need nested formsets: Django’s inline formsets will work just fine.

And why not nested formsets? From the questions people have asked and my experience building Form Groups (which borrowed some ideas), I’ve concluded that they’re difficult to get completely right, have edge cases that can be hard to manage, and create quite complicated user interfaces. In my original blog post I alluded to the fact that I spent most of a three-day weekend trying to get the nested formsets to work right. Two thirds of that time was spent on work I eventually threw away, because I couldn’t manage the edge cases. It was only when I started using TDD that I managed to get something working. But I didn’t publish the tests with my previous code example, so no one else was able to benefit from that work.

If you’ve read this far and still think a nested formset is the best solution for your problem, what would that look like with Django 1.5? The answer is: simpler. I decided to rewrite my initial implementation using test driven development. The full implementation of the formset logic only overrides three methods from BaseInlineFormSet.

from django.forms.models import (
    BaseInlineFormSet,
    inlineformset_factory,
)


class BaseNestedFormset(BaseInlineFormSet):

    def add_fields(self, form, index):

        # allow the super class to create the fields as usual
        super(BaseNestedFormset, self).add_fields(form, index)

        form.nested = self.nested_formset_class(
            instance=form.instance,
            data=form.data if self.is_bound else None,
            prefix='%s-%s' % (
                form.prefix,
                self.nested_formset_class.get_default_prefix(),
            ),
        )

    def is_valid(self):

        result = super(BaseNestedFormset, self).is_valid()

        if self.is_bound:
            # look at any nested formsets, as well
            for form in self.forms:
                result = result and form.nested.is_valid()

        return result

    def save(self, commit=True):

        result = super(BaseNestedFormset, self).save(commit=commit)

        for form in self:
            form.nested.save(commit=commit)

        return result

These three methods cover the four areas of functionality I called out in the previous post: validation (is_valid), saving both existing and new objects (both handled by save), and instantiation (creating the nested formset instances, handled by add_fields).
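Here is a Django-free sketch that makes the prefix handling in `add_fields` concrete by showing how the names compose. It makes three assumptions:

- The default inline formset prefixes are `building_set` and `tenant_set`. These are Django's reverse accessor names when no `related_name` is set.
- Formset prefixes, form indexes and field names are joined with hyphens, as Django's formsets do.
- `add_prefix` and `nested_names` are hypothetical helpers, not Django APIs.

```python
def add_prefix(prefix, suffix):
    """Join two name parts with a hyphen, as Django's formsets do."""
    return "%s-%s" % (prefix, suffix)


def nested_names(parent_prefix, parent_index, nested_prefix, nested_index, field):
    """Return (management-form key, field key) for one nested form field."""
    # Prefix of the parent form, e.g. "building_set-0".
    form_prefix = add_prefix(parent_prefix, parent_index)
    # Prefix given to the nested formset in add_fields,
    # e.g. "building_set-0-tenant_set".
    nested_formset_prefix = add_prefix(form_prefix, nested_prefix)
    # Each nested form appends its own index, then each field its name.
    nested_form_prefix = add_prefix(nested_formset_prefix, nested_index)
    return (
        add_prefix(nested_formset_prefix, "TOTAL_FORMS"),
        add_prefix(nested_form_prefix, field),
    )


print(nested_names("building_set", 0, "tenant_set", 1, "name"))
# ('building_set-0-tenant_set-TOTAL_FORMS', 'building_set-0-tenant_set-1-name')
```

Every nested formset carries its parent form's index. As a result, the first Tenant of the first and second Buildings posts under distinct keys (`building_set-0-tenant_set-0-name` versus `building_set-1-tenant_set-0-name`), and each nested formset gets its own management form.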

By making it a general-purpose base class, I’m also able to write a simple factory function that makes using it feel more in tune with Django’s built-in model formset factories.

def nested_formset_factory(parent_model, child_model, grandchild_model):

    parent_child = inlineformset_factory(
        parent_model,
        child_model,
        formset=BaseNestedFormset,
    )

    parent_child.nested_formset_class = inlineformset_factory(
        child_model,
        grandchild_model,
    )

    return parent_child

You can find the source to this general purpose implementation on GitHub. I wrote tests at each step as I worked on this, so it may be interesting to go back and look at individual commits, as well.

So how would you use this with Django 1.5? With a class-based view, of course.

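from django.core.urlresolvers import reverse  # moved to django.urls in later Django versions
from django.views.generic.edit import UpdateView

# assumed locations: the app's models module and the factory defined above
from blocks import models
from nested_formset import nested_formset_factory

class EditBuildingsView(UpdateView):
    model = models.Block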

    def get_template_names(self):

        return ['blocks/building_form.html']

    def get_form_class(self):

        return nested_formset_factory(
            models.Block,
            models.Building,
            models.Tenant,
        )

    def get_success_url(self):

        return reverse('blocks-list')

Of course there’s more needed — templates, for one — but this shows just how easy it is to create the views and leverage a generic abstraction. The real keys here are specifying model = models.Block and the definition of get_form_class. Django’s UpdateView knows how to implement the basic form processing idiom (GET, POST, redirect), so all you need to do is tell it which form to use.

You can find a functional, albeit ugly, demo application in the demo directory of the git repository.

So that’s it: a general purpose, updated implementation of nested formsets. I advise using them sparingly :).

author:Nathan Yergler
category:development
tags:django, formsets, forms, python
comments:

Emacs & Jedi

Brandon Rhodes delivered the keynote at PyOhio yesterday[1]. He talked about sine qua nons: features without which a language is nothing to him. One of the things he mentioned was Jedi, a framework for extending editors with Python autocompletion, documentation lookup, and source navigation. I say “editors”, vaguely, because Jedi consists of a Python server that the editor communicates with via an editor-specific plugin. I’d seen Jedi before, but hadn’t managed to get it working with Emacs. After hearing Brandon speak of it so glowingly, I decided to give it another try. The actual installation was easy: using the master branch of el-get, the recipe installed the Jedi Emacs plugin and its dependencies seamlessly. And it seemed to just work for the standard library.

And once I’d enabled it for python-mode, I was indeed able to autocomplete things from the standard library, and jump to the source of members implemented in Python. But I found that I wasn’t able to navigate to the third party dependencies in my project, and eventually figured out there were three cases I needed to address.

  • Many things I work on use virtualenv. Jedi supports virtual environments if the VIRTUAL_ENV environment variable is set, but I tend to keep Emacs running and switch between several different projects, each with their own environment.
  • Some of my projects also use buildout. When I’m using buildout for a project, the dependencies end up in an eggs sub-directory, which Jedi (as far as I know) doesn’t actually know about.
  • Finally, the setup we use at Eventbrite requires some special handling, as well: we store our source checkouts on an encrypted disk image, which is then mounted into a Vagrant virtual machine, where the actual virtual environment lives. Since the virtual environment isn’t on the same “machine” that we’re editing on, I need to tell Jedi explicitly what directories to find source in.

The Jedi documentation includes an advanced example of customizing the server arguments on a per-buffer basis. It assumes static arguments, but it seemed like a dynamic solution was possible. I spent a couple of hours this afternoon working on my Emacs Lisp skills to make Jedi work in all three of my cases.

kenobi.el

kenobi.el (gist) is the result, and it does a few things:

  1. Fires hooks for each mode after the file or directory local variables have been set up. I found a StackOverflow post that confirmed what I had observed: any file or directory local variables weren’t set when the normal python-mode hook was fired. The mode-specific hooks are chained off of hack-local-variables-hook, which is fired after the local variables have been resolved.
  2. Walks up the directory hierarchy from the buffer file, looking for ./bin/activate at each level. If it finds one, it assumes this is the virtual env, and adds it to the list of virtual envs Jedi will look at.
  3. Walks up the directory hierarchy looking for an ./eggs/*.egg sub-directory at each level. If it finds one, it adds each of those .egg subdirectories to the sys.path Jedi will look at. This allows Jedi to work when you’re editing files in buildout-based projects.
  4. Looks to see if the additional_paths variable has been set as a list of other paths to add.
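The directory walking in steps 2 through 4 can be sketched in Python. This renders the logic described above; it is not the code in kenobi.el, which is Emacs Lisp. `parent_dirs` and `collect_jedi_paths` are hypothetical names. The result is simply the pair of lists that would then be handed to the Jedi server.

```python
import glob
import os


def parent_dirs(path):
    """Yield the directory containing `path`, then each ancestor up to the root."""
    directory = os.path.dirname(os.path.abspath(path))
    while True:
        yield directory
        parent = os.path.dirname(directory)
        if parent == directory:  # reached the filesystem root
            return
        directory = parent


def collect_jedi_paths(buffer_file, additional_paths=()):
    """Return (virtualenvs, extra sys.path entries) for one buffer's file."""
    virtualenvs = []
    sys_path = []
    for directory in parent_dirs(buffer_file):
        # A bin/activate script marks a virtualenv root.
        if os.path.isfile(os.path.join(directory, "bin", "activate")):
            virtualenvs.append(directory)
        # Buildout puts each dependency in eggs/<name>.egg.
        sys_path.extend(sorted(glob.glob(os.path.join(directory, "eggs", "*.egg"))))
    # Finally, anything configured explicitly (e.g. via .dir-locals.el).
    sys_path.extend(additional_paths)
    return virtualenvs, sys_path
```

Walking all the way to the root means a file nested inside several projects picks up every environment above it. That matches the "at each level" behavior described in the list.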

The first three bits are sort of implementation details: you can usually just ignore them, and Jedi will just work. The last one, though, needs a little explanation.

As I mentioned above, Eventbrite stores the source code on a disk image, which is mounted into a virtual machine where the actual virtualenv lives. That means I need to add specific paths to sys.path when I open a source file in that disk image. To get that to work, I create a .dir-locals.el in the root of the source tree, something like:

(
 (nil . ((additional_paths . (
          "/Volumes/eb_home/work/src"
          "/Volumes/eb_home/work/deps"
          ))))
)

I’m sure that my Emacs Lisp could be improved upon, but it felt pretty good to figure out enough to integrate Jedi into the way I use Emacs. I haven’t worked with Jedi extensively, but so far it seems to work pretty well. The autocomplete features seem to be minimally invasive, and the show-docstring and jump-to-definition features both work great.

[1] The fact that the video is up less than 36 hours after the keynote is a testament to how great Next Day Video is. They do amazing work at Python (and other) conferences and make it possible to enjoy the hallway track without worrying about missing a presentation.
author:Nathan Yergler
category:emacs
tags:python, virtualenv, buildout
comments:

“Effective Django” at OSCON

I’m going to be presenting my introductory Django tutorial, Effective Django at OSCON later this month. If you’re going to be at OSCON and haven’t selected your tutorials yet, or just think a trip to Portland, Oregon sounds nifty, there’s still time to sign up. You can find the details on the OSCON tutorial page.

In preparation for that I’ve been continuing to work on the content. I presented the tutorial to some of the Eventbrite engineering team a couple of weeks ago, and their feedback was very useful. In response, I’ve made a few changes. Specifically, I split up the Views chapter with a brief interlude on static assets and template inheritance. It’s something that I didn’t cover the first time around, but based on the questions, I think some guidance is useful.

The revisions for OSCON also include updating the sample code repository that goes with the tutorial. I developed a tool, Tut, to help manage that code as a series of stacked branches, and while making changes to early parts of the tutorial code, I realized it still requires significant work to really be a good workflow tool. One of the most important requirements for Tut is the ability to manage an ordered series of “checkpoints” and move between them.

When I started working on the sample code this time around, I was on a new laptop, so I had to start from a fresh clone. This was revealing and frustrating. I discovered Tut assumed all the branches were already local, which they obviously aren’t with a fresh clone. Worse, the git magic I was trying to use to get the branch list in the “right” order was pretty fragile, and broke when I tried to lean on it at all.

Here’s a common editing case. My intention is to manage each step, or checkpoint, in the tutorial as a branch. Each step builds on the previous one, so if I make a change to something early in the tutorial, I just need to merge the branches “forward” until I get to master. In this example I’ve checked out the contact_form_test branch and added a new commit. In order for Tut to help me merge that forward, I need to be able to generate the list of steps.

The correct order here (last step first) is master, custom_form_rendering, contact_form_test, edit_addresses, address_model, confirm_contact_email, contact_detail_view. But you can’t get that out with either date or topo ordering. You really need to walk back from master, looking for branches (head refs), and at each step look for any head refs reachable (as children) that you haven’t already seen. I haven’t figured out how to do that yet with git plumbing commands, so for the time being I’m just using a text file to record the correct order. [1]
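The ordering described above can be sketched over an in-memory commit graph. This is an illustration under several assumptions, not Tut's code:

- It never calls git.
- The graph is a hypothetical dict mapping each commit to its parents, first parent first.
- It walks master's first-parent chain, assuming the steps were merged forward onto it.

At each commit it picks up any unseen branch whose head is that commit or one of its descendants. Branches that first appear at the same commit are ordered alphabetically, which is arbitrary.

```python
def is_ancestor(parents, ancestor, commit):
    """True if `ancestor` is `commit` itself or reachable via parent links."""
    stack, seen = [commit], set()
    while stack:
        current = stack.pop()
        if current == ancestor:
            return True
        if current in seen:
            continue
        seen.add(current)
        stack.extend(parents.get(current, ()))
    return False


def order_steps(parents, heads, tip="master"):
    """Order branch names last step first by walking back from `tip`."""
    ordered = [tip]
    commit = heads[tip]
    while commit is not None:
        # Any unseen branch whose head descends from (or is) this commit
        # belongs at this point in the sequence.
        for branch in sorted(heads):
            if branch not in ordered and is_ancestor(parents, commit, heads[branch]):
                ordered.append(branch)
        # Follow the first parent (assumes steps were merged forward onto it).
        first_parents = parents.get(commit, ())
        commit = first_parents[0] if first_parents else None
    return ordered


# Hypothetical graph: contact_form_test has a new commit (c3a) not yet
# merged forward; master and custom_form_rendering both point at c4.
parents = {"c1": [], "c2": ["c1"], "c3": ["c2"], "c3a": ["c3"], "c4": ["c3"]}
heads = {
    "master": "c4",
    "custom_form_rendering": "c4",
    "contact_form_test": "c3a",
    "edit_addresses": "c2",
    "address_model": "c1",
}
print(order_steps(parents, heads))
# ['master', 'custom_form_rendering', 'contact_form_test', 'edit_addresses', 'address_model']
```

The unmerged commit on contact_form_test doesn't throw the order off. What matters is where the branch's history meets master's chain, not when its latest commit was made.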

I’m really excited about presenting Effective Django at OSCON, and appreciate the feedback and suggestions from everyone.

[1] I contemplated just using a text file in the repo as the solution, but realized that this has its own issues: if it’s under version control as well, then what’s the “right” version to look at when you’re in a branch? That branch’s? Master’s? It’s not clear to me.
author:Nathan Yergler
category:Effective Django
tags:talks, effective django, python, tut,
comments:

“Effective Django” at PyCon 2013

PyCon (the US variant, at least) is about a month and a half away, and once again I’m looking at the schedule of presentations and events and wondering how the community pulls it off every year. I’m also busy preparing my contribution to PyCon. This year I’m happy to be presenting a tutorial, Effective Django.

You may wonder what I mean by “Effective Django”. It’s an introduction to Django with a focus on good engineering practices. What I’ve noticed from my own experience over the years is that with all of its features and flexibility, Django makes it easy to get up and going really quickly. It also lets you write code that’s difficult to test, scale, and maintain. I have written plenty of code like that over the years, and the problem is that the real pain may come long after the initial implementation. From talking to engineers at Eventbrite and elsewhere I have learned that I’m not alone in this, so I’ve been working on documenting how to leverage Django effectively. My goal is that attendees of the tutorial will leave feeling like they’re able to work on a Django application and identify things to do (and avoid) that will help them write code that’s cohesive, testable, and scalable.

I’m enjoying putting together the material for PyCon, and I hope that if you’re new to Django and interested in starting off on the right foot you’ll join me in Santa Clara for the afternoon.

If you’re totally new to Django and want a complete introduction to web apps and Django, Effective Django pairs well with Karen Rustad’s “Going from web pages to web apps”.

author:Nathan Yergler
category:pycon
tags:talks, effective django, python,
comments:

Bytes, Characters, Codecs, and Strings

One of the persistent areas of confusion for many Python developers is Unicode strings, byte strings, and how they interact. While Python 3 should help ease some of that pain, many places (including Eventbrite) are still running on Python 2. The problem is made worse by the fact that modern web frameworks (Django, in our case) attempt to do the Right Thing by using Unicode strings, but legacy code may assume byte strings with a specific encoding. That bifurcation can lead to confusion, and make it difficult to refactor code in a way that can be shared between both “sides of the street”.
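Here is a short Python 3 illustration of the distinction:

- A `str` holds characters.
- `bytes` hold encoded data.
- A codec converts between them.

The example string is arbitrary. Decoding with a different codec from the one used to encode silently produces the wrong characters rather than an error.

```python
text = "café"                    # str: a sequence of Unicode characters
data = text.encode("utf-8")      # bytes: b'caf\xc3\xa9'

# Same codec in both directions: the text round-trips.
assert data.decode("utf-8") == text

# Wrong codec: no error, just mojibake ('cafÃ©').
mojibake = data.decode("latin-1")

# Python 3 refuses to mix the two types, where Python 2 would try to decode
# the byte string as ASCII (and, for these non-ASCII bytes, raise
# UnicodeDecodeError).
try:
    data + "!"
except TypeError as exc:
    print("TypeError:", exc)
```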

Towards the end of 2012 I developed a brief training for the web team to help establish a baseline common understanding of Unicode and string handling in Python. The notes from the presentation are available, and I hope others find them useful, as well. I’m currently working on some edits to expand areas that seemed to have lingering confusion, and to address the upcoming shift to Python 3 more directly. If you have suggestions, you can contact me with them.

author:Nathan Yergler
category:training
tags:python, unicode, i18n
comments:

hieroglyph: Easy, Beautiful Slides with Restructured Text

I was happy to have my talk proposal accepted for PyCon this year, and happy with the feedback I received on my talk (Django Forms Deep Dive). But as I was putting my talk together the distracting question was not, “what should I say”, but “what should I say it with”. As a mentor once pointed out, “it’s more fun to write programs to help you write programs than it is to write programs.” The corollary I found over the past couple weeks: “it’s more fun to write programs to help you write slides than it is to write slides.”

I was putting together notes using reStructured Text and kept thinking that it’d be nice to generate both slides and longer written documentation from the same source. I’ve used docutils’ S5 generator in the past, but was looking for something a little more polished looking. Something like the HTML5 Slides.

So I wrote Hieroglyph, a Sphinx builder for generating HTML5 Slides. I presented Hieroglyph at the Sunday morning lightning talks at PyCon: you can see the slides, the reStructured Text source, and the HTML documentation generated from the same source.

I’m really happy with the output — it looks great in the browser, projects well, and because I’m using the html5slides CSS, looks great on mobile devices, too. I’m even happier that I’m able to work on my content in plain text. You can find the source on GitHub.
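For reference, a Sphinx extension is enabled in the project's conf.py. This minimal sketch assumes Hieroglyph is installed and is enabled as an extension named `hieroglyph`, which is how I understand its documentation. Slides would then be built with a command along the lines of `sphinx-build -b slides <source> <output>`. Check the builder name against the installed version.

```python
# conf.py (Sphinx configuration): minimal sketch
extensions = [
    "hieroglyph",  # registers Hieroglyph's slide builders with Sphinx
]
```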

date:2012-03-13 22:31:16
wordpress_id:2028
layout:post
slug:hieroglyph
comments:
category:projects, hieroglyph
tags:python, rst, slides, sphinx

super(self.__class__, self) # end of the line for subclassing

I’ve learned (and remembered) a lot in the past two months as I’ve gotten back to coding as my primary job. One thing that I guess I never quite internalized before is how super works. I have been bitten by code that looks something like the following a few times in the past month:

class A(object):
    def __init__(self):
        super(self.__class__, self).__init__()

class B(A):
    def __init__(self):
        super(B, self).__init__()

The surprise comes when I try to use my subclass, B. Instantiating B() blows up the stack with: RuntimeError: maximum recursion depth exceeded while calling a Python object. What? According to the Python 2.7.2 standard library documentation, super “return[s] a proxy object that delegates method calls to a parent or sibling class of type.” So in the case of single inheritance, it delegates access to the superclass; it does not return an instance of the superclass. In the example above, this means that when you instantiate B, the following happens:
  1. enter B.__init__()
  2. call super on B and call __init__ on the proxy object
  3. enter A.__init__()
  4. call super on self.__class__ and call __init__ on the proxy object
The problem is that when we get to step four, self still refers to our instance of B, so super(self.__class__, self) is really super(B, self), which resolves to A.__init__ again. Steps three and four repeat until Python hits its recursion limit. In technical terms: Ka-bloom.

TL;DR: super(self.__class__, self) may look like a neat trick, but it’s the end of the line for subclassing.

Further reading: Raymond Hettinger’s excellent blog post on super provides a great overview of super and shows off the improved Python 3 syntax, which removes the need to write the class name as part of the super call. I was really pleased to find that the Python standard library documentation links directly to it.
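Below is a runnable Python 3 version of the example above, alongside the fix. `BrokenA` and `BrokenB` reproduce the problem. `A` and `B` name the defining class explicitly, or use Python 3's zero-argument `super()`, so the method lookup always starts after the class that contains the call. The `initialized_by` list is just instrumentation for the example.

```python
class BrokenA(object):
    def __init__(self):
        # self.__class__ is the *runtime* class (BrokenB), not BrokenA.
        super(self.__class__, self).__init__()


class BrokenB(BrokenA):
    def __init__(self):
        super(BrokenB, self).__init__()


class A(object):
    def __init__(self):
        self.initialized_by = ["A"]
        super(A, self).__init__()  # names the defining class explicitly


class B(A):
    def __init__(self):
        super().__init__()  # Python 3 zero-argument form
        self.initialized_by.append("B")


try:
    BrokenB()
except RecursionError:  # a RuntimeError subclass; Python 2 raises RuntimeError
    print("BrokenB() recursed until the interpreter gave up")

print(B().initialized_by)  # ['A', 'B']
```

Note that `BrokenA()` on its own works fine. The bug only appears once someone subclasses it, which is what makes it easy to ship.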
date: 2011-07-04 20:44:23
wordpress_id: 1990
layout: post
slug: super-self
comments:
category: development, python
tags: python, super

CI at CC

I wrote about our roll-out of Hudson on the CC Labs blog. I wanted to note a few things about deploying that, primarily for my own reference. Hudson has some great documentation, but I found Joe Heck’s step by step instructions on using Hudson for Python projects particularly helpful. We’re using nose for most of our projects, and buildout creates a nosetest script wrapper that Hudson runs to generate pass/fail reports.

Setting up coverage is on the todo list, but it appears that our particular combination of libraries has at least one strange issue: when cc.license uses Jinja2 to load a template, coverage thinks it’s a Python source file (maybe it uses an import hook or something? haven’t looked) and tries to tokenize it when generating the xml report. Ka-boom. (This has apparently already been reported.)

Another item in the “maybe/someday” file is using Tox to run the tests using multiple versions of Python (example configuration for Tox + Hudson exists). I can see that this is a critical part of the process when releasing libraries for others to consume. We have slightly less surface area — all the servers run the same version of Python — but it’d be great to know exactly what our possible deployment parameters are.

Overall Hudson already feels like it’s adding to our sanity. I just received my copy of Continuous Delivery, so I think this is the start of something wonderful.

date:2010-08-20 10:37:43
wordpress_id:1734
layout:post
slug:ci-at-cc
comments:
category:cc, development
tags:cc, CI, coverage, Hudson, python, sanity

Batteries Included (or, Maildir to mbox, again)

UPDATE 7 June 2010: Added usage information to docstring.

UPDATE 30 January 2012: Frédéric Grosshans has provided an updated version that supports nested maildirs; you can find it at github. YMMV.

My script for converting maildir to mbox continues to be one of the most popular pages on yergler.net (according to Google Analytics). Of course, even after I updated it slightly in February, it still had a couple of bugs, likely introduced when I converted the page from MoinMoin to WordPress. This afternoon I finally decided to clear out the pending comments about those bugs, and update it.

While looking at the Python documentation for the mailbox module (http://docs.python.org/library/mailbox.html) included in the standard library, I realized the script could probably be simplified even further by using the library’s native mbox support (http://docs.python.org/library/mailbox.html#mbox). I’m also more comfortable using the standard library’s implementation of mbox rather than my hacked-up raw file implementation (who knows, the standard library may do exactly what my script did: I’m not an mbox expert by any stretch of the imagination).

The new script is below. I should note that it’s received very little testing, and I make no guarantees. I also should note that there is nothing creative or original about this. It just uses Python’s excellent standard library. As they say, “batteries included”.

# -*- coding: utf-8 -*-

"""maildir2mbox.py
Nathan R. Yergler, 6 June 2010

This file does not contain sufficient creative expression to invoke
assertion of copyright.  No warranty is expressed or implied; use at
your own risk.

----

Uses Python's included mailbox library to convert mail archives from
maildir [http://en.wikipedia.org/wiki/Maildir] to
mbox [http://en.wikipedia.org/wiki/Mbox] format.

See http://docs.python.org/library/mailbox.html#mailbox.Mailbox for
full documentation on this library.

----

To run, save as maildir2mbox.py and run:

$ python maildir2mbox.py [maildir_path] [mbox_filename]

[maildir_path] should be the path to the actual maildir (containing
new, cur, tmp);

[mbox_filename] will be newly created.
"""

import mailbox
import sys
import email

# open the existing maildir and the target mbox file
# (on Python 2 the default Maildir factory is rfc822.Message, so ask for
# email.message.Message objects instead)
maildir = mailbox.Maildir(sys.argv[-2], email.message_from_file)
mbox = mailbox.mbox(sys.argv[-1])

# lock the mbox
mbox.lock()

# iterate over messages in the maildir and add to the mbox
for msg in maildir:
    mbox.add(msg)

# close and unlock
mbox.close()
maildir.close()
date:2010-06-06 13:04:29
wordpress_id:1707
layout:post
slug:batteries-included-or-maildir-to-mbox-again
comments:
category:geek
tags:maildir, mbox, python