On Product Engineering

I’ve been writing software and working in teams for well over a decade now. The places I’ve worked have been varied: schools, non-profits, consumer startups, education software, and B2B SaaS companies. And despite that diversity, in every case there was this question: how do engineers maximize the amount of meaningful work they do?

I was fortunate to work at Remind at an inflection point in the organization’s development. We were moving from reporting to a single individual (the VP of Engineering) to a structure with engineering management. That shift wasn’t one we took lightly; the stakes felt high, so we approached it with some trepidation. The structure we decided to try was based in part on Spotify’s model, and kept that question squarely in mind: how do we help engineers maximize their meaningful work while minimizing the bullshit?

The structure we landed on was optimized for engineer productivity and happiness. It helped us focus on what we wanted to spend our time on: building a great product.


During my time there, Remind was growing fast (and still is). Our goal was to connect every parent, student, and teacher, and in doing so improve the educational experience and outcomes. The core product — secure, one-way messaging — had demonstrated that those relationships were powerful and could be amplified with a great product, but we were still trying to figure out what the next step was. Was it parent-to-parent chat? Students organizing their own groups? Hierarchical message boards? 

As engineers, the most leveraged thing we could do to help figure that out was to ship something that worked, as quickly as possible. 

“Something that works” is a feature that users can see and use. It doesn’t have to be complete; it just has to be sufficiently useful to give us feedback about whether we’re on the right track.

Shipping quickly is important so we can shorten the time between hypothesis and validation. For example, we think parents chatting with each other will help engage parents. Is that actually true?

Shipping quickly also means we have more opportunities to iterate. Thinking in iterations forces us to think about what the smallest useful feature is, and build from there.

As we looked around, it was obvious that there were things getting in the way of our individual contributors. So we set out to minimize those obstacles. 


Contention

We avoid contention by working in cross-functional teams.

Contention occurs when teams are dependent on a single resource, often another team. For example, if an Android team and an iOS team are working on different parts of the application and both need help from the backend team, they’re both blocked from progressing until backend addresses their issue. By organizing into cross-functional teams we can limit that contention: each team has dedicated engineers with expertise in backend, native clients, and web.

Interrupts

We avoid interrupts by working from a backlog.

Interrupts, both externally and internally triggered, are costly. They force a context change, usually for more than one individual, and they make it easy to lose sight of what we’re trying to build.

Interrupts are triggered “internally” when an engineer needs more information to complete a task. In those situations, she’s forced to go looking for someone who has the information or can make the decision. If she can’t find that individual, or if the answer isn’t immediately available, she has to find something else to work on until she can continue with the first task.

Interrupts are triggered “externally” when someone interrupts work to ask an engineer to look at something. This can be a product manager, support, or another team.

Working from a backlog helps us avoid both of these: work only goes onto the backlog if it’s actionable, and others (product managers, support, etc.) know where to put work to make sure it gets looked at. At Remind this was the “ice box”, which we committed to reviewing weekly.

Uncertainty

We avoid uncertainty by reviewing the backlog together.

Uncertainty about what we’re building or why leads to a variety of other issues: it causes interrupts when questions need to be answered, and can lead to re-work if individuals aren’t in sync. We avoid uncertainty by reviewing the backlog together during a weekly planning meeting. During that meeting the team does a few things:

  • Review what’s still in the backlog from the previous week: does it still apply? Is it still important?
  • Review what’s in the “next” bucket (which we called the ice box). If items there have enough detail and specificity, they’re moved to the backlog and prioritized by the team and product manager.
  • New feature work is assigned a point value based on complexity. The entire team participates in assigning the point value to work, even individuals who don’t have expertise in that area. Involving everyone helps uncover unspoken assumptions, or details that exist in a single person’s head.

When we develop a new feature, it’s up to the Project Lead to break it down into stories ahead of the planning meeting. The Project Lead is an engineer who works with the product manager to figure out what the smallest iteration is, and what needs to be built.

If during planning something doesn’t have enough information for an engineer to start work immediately, it doesn’t go in the backlog.

Blocking

We avoid blocking by specifying work as user-facing units of work: features.

Blocking is like contention: it slows the team down and often forces context switches. Blocking occurs when team members are waiting on one another. When we started thinking about process, it wasn’t uncommon for an Android engineer to defer landing work because an endpoint wasn’t ready. Co-locating the expertise in a cross-functional team helps, but it’s not sufficient to avoid blocking. We avoid blocking between tasks by specifying them as user-facing units of work: features. As much as possible, items in the backlog should be something that is user-facing and shippable. This bears some unpacking.

A user-facing story will be visible and useful to the user when it is complete. The user in this context is who we’re building for; it could be the end user, another team (product, support), or a new constituency. We require that the work be “visible” because this gives us incremental signal about whether we’re on the right track. Remind relied on product managers to verify work, so requiring that the task be “visible” also meant it could be easily verified.

A story is shippable if we can merge it to the main branch when it is complete. Long-lived branches are a major source of blocking; stories should be constructed such that at the completion of each, the code can be merged.

Note that saying we work on user-facing, shippable stories does not mean that when the story is complete, the entire feature is complete, or that we’ll send it to the customer. When we release is a separate question, and something that the product manager and project lead decide.

When we were adding support for group chat to Remind, we needed to show the user a list of other people they could potentially chat with. Instead of breaking that down into two tasks — “implement an endpoint to return the list of potential chat partners” and “when you tap the New Chat button, show the list of potential chat partners” — we’d specify it as a single task: “when you tap the New Chat button, show the list of potential chat partners”. If a new endpoint is necessary, that’s an implementation detail, and ideally it’s all handled by the same individual.

When we do wind up in a blocking situation, engineers should take the opportunity to pair in a cross-functional manner: this helps reinforce cross-training and reduces future blocking.

What else is important to us?

Our team was organized to minimize contention, interrupts, uncertainty, and blocking. In support of that, we adopted a few additional practices.

We work as a single unit

Teams are a single unit that works on a feature together. We’re measured as a team, and we support one another. When it comes to what we work on next, we pull the next thing at the top of the backlog, not the next thing we are expert on. This, along with pairing, helps us build up institutional knowledge about how things work, and means individuals can take the time they need without jeopardizing the rest of the team’s work.

We evaluate our progress and health

We review our progress weekly to make sure we’re on the right track and moving forward. During our evaluation we ask ourselves questions:

  • What got in the way?
  • What supported our work?
  • Did I do as much as I thought? Why or why not?

Our goal is to get better at estimation at the team level, and understand what supports and impedes shipping.

We own our own process

Providing an understanding of what goes into building a feature and of what impedes our progress are two of the most important contributions a product engineer can make. Answering these questions helps product prioritize better. Identifying impediments helps engineering management understand where the process isn’t working, and how to best support their teams. We should all feel ownership of the process and of the product that we’re working on. If the process wasn’t working, it was our job to fix it.

How did we do?

Adopting this structure and process had benefits across the board. The time we spent integrating work from multiple teams for a release dropped from a week or two to a day or two. We developed a release cadence that allowed us to be more confident in our projections and in the code we were shipping. Knowing that the next release train was leaving in two weeks instead of six weeks removed much of the penalty, real and imagined, for slipping to the next release. All of that was good for Remind.

The process was good for engineers, too. Engineers who paired to work in new areas felt like they were growing and stretching their expertise. When I spent time working on the iOS app, I suddenly understood why we sent things in certain formats. My level of empathy went way up. 

Engineers felt more empowered. Whether or not it’s true, engineers sometimes perceive their relationship with Product as parental or hierarchical. In this structure, engineers felt like they were peers with product and had the ability to impact the product in significant ways.

In short, the team jelled. This is a framework for building software, but it’s also a framework for building empathy and vulnerability. 


Thanks to Jason Fischl for reading draft versions of this post and providing valuable feedback. Mistakes and misrepresentations are solely mine.

Advice for the New Guy

We have three people joining the team at Eventbrite in the next month, and we’ve been talking about how to “indoctrinate” them in the Eventbrite Way. This isn’t “how to program” or “how to engineer”, it’s “how to work well with the established team” and “where to look for dragons, hangnails, paper cuts, and worse”.

Today, though, one of them emailed me, asking what he should be studying and reading up on in the weeks before he joins us. He was planning to brush up on his Python and Django, but he was looking for suggestions. I thought about this a bit, and told him this:

First, rest up. Take a nap, or a walk around the park. Look at the grass and check out the breeze. Yes, really. It’s easy to wear yourself out in this line of work, and the first few weeks in particular are going to be challenging. You’ll be working different mental muscles than you were previously, and you’ll be meeting lots of new people. That takes a lot of energy. And it’ll be easier if you’re well rested.

When you wake up from your nap, if you still feel like doing preparatory work, here are a few suggestions:

One: Know what text editor you love and make sure your skills at using it are well honed.

You’ll be spending a lot of time in your editor, so knowing it backwards and forwards is a good thing. When I came to Eventbrite, I got to stop managing and write more code. And I realized that all the things that had been merely Emacs annoyances were suddenly really important to fix. I took the tutorial (M-x help-with-tutorial, natch), which actually was super helpful even though I’d been using Emacs for years.

In particular, if you’re coming straight out of college, you’ll be working with more files at one time than you probably have in the past. Knowing how to find something in a file (and not necessarily knowing what file it’s in) will be useful. (And again, if you’re using Emacs, grep-find is your friend.)

[NB: If you want a longer treatise on why your editor matters, read The Pragmatic Programmer.]

Two: git.

We’ll give you some basic training on git if needed (and on how we use it here), but knowing the basics will put you ahead. If you want a reference, you could do worse than Chapters 1-3 of Pro Git. Remember: it’s not a set of files, it’s a graph of changes.
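If the “graph” framing feels abstract, here is a toy sketch of it in Python. This is only an illustration of the mental model, not how git actually stores objects: the Commit class, the short made-up ids, and the ancestors helper are all invented for this example.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Commit:
    """A toy commit: an id, a message, and links to its parent commits."""
    sha: str                        # made-up short id, not a real hash
    message: str
    parents: Tuple[str, ...] = ()   # no parents = root, two parents = merge


def ancestors(repo: Dict[str, Commit], start: str) -> List[str]:
    """Return every commit reachable from `start`, each listed once."""
    seen = set()
    order = []
    stack = [start]
    while stack:
        sha = stack.pop()
        if sha in seen:
            continue
        seen.add(sha)
        order.append(sha)
        # Follow the parent links; this is the "graph" part.
        stack.extend(repo[sha].parents)
    return order


# A tiny history: two lines of work branch off a1 and are merged in d4.
repo = {
    c.sha: c
    for c in [
        Commit("a1", "initial commit"),
        Commit("b2", "add feature", ("a1",)),
        Commit("c3", "fix bug", ("a1",)),
        Commit("d4", "merge feature", ("c3", "b2")),
    ]
}

history = ancestors(repo, "d4")
```

Asking for the history of d4 walks both sides of the merge and reaches a1 only once, which is roughly the question “what went into this commit?” Branches in this picture are just names pointing at a node in the graph.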

Three: Read some code

You’ll spend as much time reading code as you do writing it, if not more. Find a project and read it until you understand how it works. Better yet, find a project that’s a little, ahem, imperfect. A few suggestions based on things I’ve worked on recently-ish:

  • Hieroglyph (Follow your nose down into Sphinx and figure out how .. noslides: works.)
  • MediaGoblin (How is meddleware used? How’s it different than Django’s middleware?)
  • OpenHatch (How would you take the new mission framework and make it work for things like the SVN mission?)

Really, just practice reading code.

And four, explore the poetry of Billy Collins, a former poet laureate. I suggest the CC-licensed album of him reading his own work, The Best Cigarette. Specifically “Another Reason I Don’t Keep a Gun In The House”, “Marginalia”, and “The Best Cigarette”. Maybe it’s just me, but listening to Collins calms me down and puts me in that space where I can be calm, creative, and make great software.

So there you have it, my advice for the new guy. If I had to sum it up, it’s “tools and skills, not technologies”. And poetry.

date:2012-05-22 18:10:20
wordpress_id:2118
layout:post
slug:advice-for-the-new-guy
comments:
category:engineering

New Foundations

I’ve written a little about how I think about technical debt, and what it means to live with it. I want to talk about some technical debt at Creative Commons, and how we handled it the wrong way. A project we thought would take a couple months stretched into years, and in the end never fulfilled the promise we thought it had. And it was supposed to be a straightforward project.

One of the things people don’t always know about Creative Commons is that there was a large technical component undergirding the licenses. Every license was prepared for three audiences (in talks, this is where I call them disjoint, in a lame attempt at humor): humans (the license “deed”), lawyers (the legal text), and machines. The output for machines was an RDF model of the license: its permissions, requirements, and prohibitions. In 2008 we had a technical all-hands meeting where the tech team came to the San Francisco office for a week. At that time porting (preparing a license for a new legal jurisdiction and translating the web tools) was in full swing, and as we talked about what the pain points were, launching these new jurisdictions came up as a major source of pain. As we started drawing the model of how things worked on the site, I arrived at the following diagram.

We had at least three different “products” — the license chooser, the API for 3rd parties, and the prepared licenses (deeds and RDF). And for hysterical and historical reasons, they didn’t really use the same information. Well, they did at a certain level: they all used the same translation files, but after that all bets were off. We had the “questions” used for selecting a license modeled as an XSLT transformation (why? don’t remember; wish I knew what we were thinking when we did that), but the transformation needed to have localized content, so we would generate a new XSLT document from a ZPT template (yes, really) when we updated the translations. The license RDF was stored as static files for performance, but there was increasing pressure to provide localized data there, too, which was going to cause a world of hurt. And the chooser had a thin wrapper, cc.license, around the XSLT. Except when it went directly to the XSLT for special cases.

If you look in the upper right hand corner, you’ll see something labeled “cc.licenze”. This was a prototype library I had written when adding support for CC0 to the site. The idea was this: We claim that the RDF is the technical model for our legal tools. If that’s true, can we put enough information in it to drive the entire process, and have a single source of information? After launching CC0, signs pointed to yes. We set out to build a glorious future.

We’d build a single wrapper around our RDF and use it everywhere. We’d update one thing when we launched a new jurisdiction, and all the changes would flow to all parts of the site. It sounded amazing. The thing is, we were talking about moving our core infrastructure — our house — to a new foundation, but that foundation wasn’t built yet. We hadn’t really even figured out if it’d support the house or not.

Undeterred, an engineer set out to start building out “cc.licenze”, filling in the gaps I’d left to make it do all the things that licenses need that CC0 does not. And he got most of the work done, and then he left. So the work languished while we focused on continuing to ship new jurisdictions and do everything an understaffed technical team has to deal with.

The problem isn’t that we wanted to improve our underlying infrastructure, or that we wanted a coherent and consistent model. Those are the right goals. The problem was trying to build an entirely new foundation, with similar but not exactly the same APIs as the original one, and thinking we were going to slip it in. Starting this project today, I’d look at the three ways we were doing things, find the one that had the least debt, and rebase the other services/products onto it. By choosing one currently in use, any improvements made (either by rebasing or fixing bugs) would show immediate benefit. There’s immediate, tangible benefit to going from three ways to do something to two, and from two to one. Once everything uses the same foundation, there’s only one thing to rebuild and replace, not three, and we probably have a better idea about everything it needs to do.

To successfully live with technical debt, this is the sort of maneuver you often have to use. I think of this as Lateral Refactoring: you’re not refactoring to the API/design you want to wind up with, you’re tacking along an orthogonal axis until you’re at the point where you can start moving forward again. By doing this you can realize some benefit sooner, and continue shipping new features and bug fixes.
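To make the maneuver a little more concrete, here is a minimal sketch in Python. It is not the Creative Commons code: the table contents and every function name are hypothetical, invented only to show one product being rebased onto a path that is already in production, without changing what callers see.

```python
# Path A: the existing, already-shipping lookup judged to carry the least debt.
# All names and data in this sketch are hypothetical.
_LICENSE_TABLE = {
    "by": {"name": "Attribution", "requires": ("attribution",)},
    "by-sa": {
        "name": "Attribution-ShareAlike",
        "requires": ("attribution", "share-alike"),
    },
}


def lookup_license(code):
    """The one path we keep; other products get rebased onto it."""
    return _LICENSE_TABLE[code]


# Before: a second product kept its own copy of the same information.
_API_NAMES = {"by": "Attribution", "by-sa": "Attribution-ShareAlike"}


def api_license_name_before(code):
    return _API_NAMES[code]


# After: same signature and same answers, but no private copy of the data.
def api_license_name(code):
    return lookup_license(code)["name"]


# The lateral step is safe only if callers cannot tell the difference.
codes = sorted(_API_NAMES)
same_answers = all(
    api_license_name(code) == api_license_name_before(code) for code in codes
)
```

Nothing about the API’s interface got better in this step, and that’s the point: three ways became two, the duplicate table can be deleted, and any fix to the shared path now benefits both products right away.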

date:2012-05-16 23:13:57
wordpress_id:2073
layout:post
slug:new-foundations
comments:
category:engineering, process, talks

Living With It

So now that I’ve talked about what I think of when I say “technical debt”, I want to dig in on the other half of the title, “Living With It”. What does it mean to live with technical debt? I want to be clear: it does not mean simply accepting or ignoring it. I’m certain that’s the wrong way to build long-lived, robust software. When we encounter technical debt, or something that feels hard, I think there are a few common, understandable, and dangerous reactions. These roughly fall into the categories of “I can do better”, “One more won’t hurt”, and “I can’t go on.”

When some engineers — even good (but not great) ones — encounter technical debt, their reaction may be “I can do better”. That is, “Oh, this is terrible, I can’t possibly work with code like this, I’ll rewrite this part of the system, and then I can get around to what I came here to do.” Rewriting or refactoring debt away may be the right decision, but this statement contains unspoken assumptions that better code is more important than new features or bug fixes for users. This is where the paradox of living with technical debt first shows itself: living with technical debt does not mean accepting it, but it also doesn’t mean fixing it. Right now. The business, the organization, has to make decisions about what’s most important. (Engineers need input into those decisions, and the business needs to respect that input, or the best engineers will go elsewhere, where their input will be respected.) It’s up to the business to decide “can we go dark for n days/weeks/months.” Sometimes the answer may be yes, and we’re free to improve the code with abandon. I think that’s a rare situation. More often the answer is “no”, so we need to live with the debt and develop strategies for improving it (more on that later).

Another reaction that I think is all too common is “I guess one more won’t hurt”. That is, “Well, we’re stupid in these five places, what’s one more?” Living with technical debt does not mean you continue to incur it. If anything, it’s essential to stop running up the tab. This requires rigor and strength of will, not just on the part of the engineer working on the code, but on the part of her peers. The team needs to decide that incurring additional debt is not acceptable: you can maintain or you can improve, but you can’t backslide. The danger of “one more won’t hurt” is that the problem spreads: you build new features that repeat past mistakes, instead of providing a model for future work.

Finally, sometimes we look at code and think, “I can’t go on”. I find that those are the times it’s helpful to step away from a project, take a break, and come back after a good night’s sleep. You don’t always have that luxury, but feelings of despair rarely coincide with my best work. I’ve observed that indulging in the first two ways of thinking (“I can do better” and “One more won’t hurt”) often leads to the final one: “I can’t go on”. “One more won’t hurt” just digs a deeper and deeper hole, until you can’t see your way out, and “I can do better” often leaves you with a piece of “perfect” code that doesn’t quite fit into the rest of the system, leaving you to shims and scotch tape, the very things you started out trying to avoid.

In “Good to Great”, Jim Collins writes about characteristics that separate good companies from great ones. One of the principles he identifies is “Confront the brutal facts, but never lose faith.” In other words, it does no good to pretend that your code (company in his case) is something that it isn’t. Collins talks about meeting Admiral Stockdale, and asking him, “Who didn’t make it out?” “Oh, that’s easy — the optimists.” Stockdale explains that the optimists were routinely disappointed, and eventually lost faith. “I can’t go on.” Collins quotes Stockdale as saying, “You must never confuse faith that you will prevail in the end — which you can never afford to lose — with the discipline to confront the most brutal facts of your current reality, whatever they might be.” Technical debt may be a far cry from Stockdale’s situation, but the principle holds as the heart of truly living with technical debt: we must confront things as they are, not as we wish they were. And we must believe that we can make things better, that we know where we’re going.

date:2012-05-15 21:32:00
wordpress_id:2085
layout:post
slug:living-with-it
comments:
category:engineering, process, talks

Living With Technical Debt, Part One

I’m speaking at Velocity next month on “Living with Technical Debt”. Like any mature codebase, our software at Eventbrite has technical debt. Like any project with rapidly shifting priorities, the code we built at Creative Commons had technical debt. It’s only in the last year or so that I’ve really come to see that and started to think about how one navigates technical debt. So there are a lot of ideas floating around in my head about what I want to talk about. This post (probably the first of several) is me trying to get those ideas out of my head and into text, so I can go about organizing my talk. Not everything in here is going to make it into the final talk, and I expect that whatever does will be re-organized and re-synthesized.

I don’t think it’s unreasonable to start with what I mean by “technical debt”. “Technical Debt” is a euphemism, usually trotted out when we’re talking about something we don’t like about software or systems. I say “don’t like” as if the label is undeserved: it’s not always clear when someone says “technical debt” if they’re talking about code that’s obviously difficult to work with, or just makes a different set of choices than they would have made. One definition I’ve been thinking about is this: technical debt is some aspect of your system that increases the cognitive overhead of understanding, improving, and maintaining it. It’s possible there should be a clause added about “for the majority of developers”, too: I know there’s code I’ve written that absolutely minimizes cognitive overhead for me, but the things I’m used to, idiomatic Nathan, make it harder for someone else to come and fix a bug or add functionality.

By speaking about technical debt in terms of cognitive overhead, we can start to detach ourselves from the situation emotionally. It’s pretty easy to become emotionally involved with the code we write. And usually that’s a good thing: it’s important for me to work on things that feel important, things that I feel like I can leave my mark on. I’d like to posit, however, that it’s possible to become emotionally co-dependent with your code. That may sound like a strange idea, so let me explain: whenever something I create becomes a proxy for my self — my individuality, my self worth — it is nearly impossible for me to see problems with it. It is nearly impossible for me to hear anything but glowing praise. And when I do hear glowing praise, it’s never enough. I’ve observed two different effects of these feelings. First, I start treating situations like a zero sum game: it’s not enough to succeed, others must fail. It’s not hard to see how this would lead to hypersensitivity and hypercriticality at the same time. Second, I don’t make smart decisions: I make them based on my feelings rather than on reality. I don’t know why this would be any less true of code than it is of other endeavors. So to really see technical debt in our systems, we need to detach ourselves emotionally: it’s not about who’s at fault, it’s about how we make it better.

(There’s a whole other topic around team building here; I’m going to assume for the purposes of this discussion that you have the people you want on your team, either because they’re operating at the level you need them to, or because you believe they can grow to that level.)

So what are some ways your system can add to the cognitive overhead needed to understand it? I can think of a few: inconsistency, duplication, and lack of cohesion all immediately come to mind. These all make it difficult for an engineer to understand, maintain, and improve a system. More on that later.

date:2012-05-14 21:29:05
wordpress_id:2080
layout:post
slug:living-with-technical-debt-part-one
comments:
category:engineering, process, talks