Wednesday, April 21, 2010

Agile Game Interview - Agile Engineering for Online Communities

James Birchler is an Engineering Director at IMVU, where he implemented and manages IMVU’s product development process using Agile and Lean Startup methodologies. Prior to joining IMVU in 2006, he redesigned the product development process at Bargain Network, Inc., helping complete the company’s acquisition by Vertrue, Inc. Previously, James was a Director at There, Inc., responsible for technical design, creative design, and content of the web applications and GUI for There’s virtual world application. James will be presenting on Lean Startup methodologies at the Startup Lessons Learned Conference on April 23 in San Francisco, and is a frequent contributor to the IMVU Engineering Blog.

Clinton Keith: What is IMVU?

James Birchler: IMVU, Inc. (www.imvu.com) is an online community where members use 3D avatars to meet new people, chat, create and play with their friends. IMVU has reached 40 million registered users, 10 million unique visitors per month and a $30+ million annualized revenue run rate. IMVU has the world's largest virtual goods catalog of more than 3 million items, almost all of which are created by its own members. IMVU achieved profitability in 2009 and is hiring aggressively with plans to double the size of its team in 2010.

CK: What is the overview of the IMVU engineering process?

JB: Our engineering team practices what people refer to as "Agile" or "Lean Startup" methodologies, including Test-Driven Development (TDD), pair programming, collective code ownership, and continuous integration. We refactor code relentlessly, and try to ship code in small batches. And we've taken continuous integration a step further: every time we check in server side code, we push the changes to our production servers immediately after the code passes our automated tests. In practice, this means we push code live to our production web servers 35-50 times per day. Taken together with our A/B split-testing framework, which we use to collect data to inform our product development decisions, we have the ability to quickly see and test the effects of new features live in production. We also practice "5 Whys" root cause analysis when something goes wrong, to avoid making the same mistakes twice.
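As a sketch of how an A/B split-testing framework can assign customers to experiment variants deterministically (the function name and hashing scheme here are illustrative assumptions, not IMVU's actual implementation):

```python
import hashlib

def experiment_bucket(user_id, experiment, variants):
    """Deterministically assign a user to a variant so they always
    see the same side of the split test on every visit."""
    digest = hashlib.md5(f"{experiment}:{user_id}".encode()).hexdigest()
    return variants[int(digest, 16) % len(variants)]
```

Hashing the user ID together with the experiment name keeps each member in the same variant across sessions, so business metrics can be compared cleanly between the control and treatment groups.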

CK: How do you get so many changes out in front of customers without causing problems, either for your customers or your production web servers?

JB: I think it’s important to point out that sometimes we actually do introduce regressions and new bugs that impact our customers.  However, we try to strike a balance between minimizing negative customer impacts and maximizing our ability to innovate and test new features with real customers as early as possible. We have several mechanisms we use to help us do that, and to control how customers experience our changes. It boils down to automation on one hand, and our QA engineers on the other.

First, the automation: we take TDD very seriously, and it’s an important part of our engineering culture. We try to write tests for all of our code, and when we fix bugs, we write tests to ensure they don’t regress. Next, we built our code deployment process to include what we call our "cluster immune system," which monitors and alerts on statistically significant changes in dozens of key error rates and business metrics. If a problem is detected, the code is automatically rolled back and our engineering and operations teams are notified. We also have the ability to limit rollout of a feature to one or more groups of customers--so we can expose new features to only QA or Admin users, or ad-hoc customer sets. Beyond that, we built an incremental rollout function that allows us to slowly increase exposure to customers while we monitor both technical and business metrics to ensure there are no big problems with how the features work in production. Finally, we build in a "kill switch" to most of our applications, so that if any problems occur later, for example, scaling issues, we have fine-grained control to turn off problematic features while we fix them.
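The rollout controls described above (group whitelists, incremental percentage-based exposure, and a kill switch) can be sketched as a simple feature-flag gate. This is a hypothetical illustration under assumed names; IMVU's internal implementation is not public:

```python
import hashlib

# Hypothetical in-memory flag store; a real system would back this
# with a database or config service so flags can change at runtime.
FLAGS = {
    "new_chat_ui": {"enabled": True, "rollout_pct": 10, "groups": {"qa", "admin"}},
}

def feature_on(flag, user_id, user_groups=frozenset()):
    cfg = FLAGS.get(flag)
    if cfg is None or not cfg["enabled"]:
        return False                      # "kill switch": turn a feature off instantly
    if user_groups & cfg["groups"]:
        return True                       # whitelisted groups (QA, Admin) always see it
    # A stable hash places each user at a fixed point in [0, 100); raising
    # rollout_pct exposes strictly more users without reshuffling anyone.
    bucket = int(hashlib.sha1(f"{flag}:{user_id}".encode()).hexdigest(), 16) % 100
    return bucket < cfg["rollout_pct"]
```

Because the bucketing is deterministic, ramping `rollout_pct` from 10 to 50 keeps the original 10% of users exposed while adding new ones, which makes before/after metric comparisons meaningful.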

CK: You mentioned QA engineering, too: how do you approach Quality Assurance at IMVU?

JB: Most features go through some level of manual testing. Minimally, the engineer that pushes code verifies the functionality in production. In many cases, features get dedicated testing time from a QA engineer. By the time a feature is ready for QA testing, our QA engineers can focus on using the feature much like a customer would. They expose bugs by doing complex, multi-dimensional manual testing based on more complicated use cases. People are really good at this work, which is more difficult to cover with automated testing.

In terms of our overall approach to QA, for the last 18 months we have tried to find ways to leverage the strengths of our small QA team--we have only two QA engineers currently. Our ratio of QA to engineering has ranged from about 1:9 to 1:14 in that time span. Initially this was because we didn’t have any QA engineers in our hiring plan. However, we decided to delay hiring until we got a clear signal from our product owners that they needed more QA bandwidth.  While this has been challenging in many ways, having limited QA resources forced us to innovate on process, and to triage and focus on features that are most important to our customers.

In practice, this means that our teams are always reviewing QA priorities to see which features get attention. In some cases there is so much important QA work that needs to be done that we organize group tests, either with the team or the entire company, to quickly find and fix issues. Our software engineers are also becoming skilled at carrying out manual testing and following our QA test plans. When this happens, it is more akin to classic Scrum, where all tasks (including QA tasks) are shared among the team.

From a process standpoint, we integrate QA engineers into our product design process as early as possible. They provide a unique and valuable point of view for developing customer use cases and exposing potential issues before engineering begins work. The collaboration between our product owners and QA engineers adds tremendous value. I’ve written more about IMVU’s Approach to QA on our IMVU Engineering Blog.

CK: What about the rest of your product development process--are you using a Scrum framework?

JB: We are now, but we didn't from the outset. We had mixed success using several different development styles, and we didn't standardize on a single set of methods for all our teams to follow until about 18 months ago. Before we standardized, we had many problems. One was that we didn’t have a good infrastructure for ensuring clear visibility about what teams were doing and why. Another was that there was a ramp-up time when moving people among teams, since people had to figure out how the new team operated. I think our process reflected what was a fairly decentralized product management organization--product level decision making authority was distributed among many product owners, each responsible principally for their own area of the product. This dynamic made it challenging to get agreement on how to allocate team resources. One result of this was that we shipped a lot of small features, but were unable to ship large features or major product improvements in response to customer needs. In Lean Startup parlance, we failed to execute a Pivot. For example, revamping the UI for our instant messenger product interface was impossible for a long time because it required implementing a new infrastructure for building and updating our UI—a major undertaking.

We've transitioned over the last 18 months to using a consistent set of Scrum practices for all our product development teams, accompanied by our key hire of a VP of Product. We don't follow Scrum dogmatically, though we adhere to some important tenets. Fundamentally, we use Scrum to facilitate communication. Maximizing face-to-face communication among team members while issues are minor adds a lot of efficiency.  Like all our processes, our Scrum is subject to iteration and incremental improvements.

CK: What are some of the modifications you've made to your Scrum process?

JB: For starters, we encourage two important meetings that we’ve found are particularly valuable. The first is a meeting involving the product owner, engineering technical lead and QA engineer, which occurs prior to sprint planning. These stakeholders review their respective design documents (project brief, technical design and QA test plan) to resolve inconsistencies and potential issues, and update those plans. By the time we have the sprint planning meeting, we have a high level of confidence that we have a solid technical plan for building the feature, that the team knows how the feature will be tested by QA, and that our customer use cases will be fulfilled.

The second meeting occurs during the sprint when a feature is nearly complete. The engineer that built the feature meets with the product owner and QA engineer to demonstrate the feature and ensure it is functioning as expected and that it has all possible automated test coverage in place. Demonstrating a feature to other team members almost always exposes issues that result in minor changes and optimizations.

We're currently at a stage of growth where we are planning to add a user experience design lead to both of these meetings--again, we're iterating on our own process based on experience and seeking to optimize and improve.

CK: How difficult was your transition to using Scrum exclusively in your product development process?

JB: The transition was fairly smooth, but not without problems. For example, it took a few sprints to get used to the daily Scrum meeting routine. We had to practice and iterate on how to give a daily update that made sense to the team. We even had to institute a $1/minute late fee to help ensure everyone showed up for the meeting. We had some great parties with all the money we collected.

I think our success with Scrum has hinged on two factors: first, we have unwavering executive-level support for using Scrum and constantly iterating to improve our processes. Second, our engineering team was already completely bought in to TDD, pairing, and continuous integration and deployment.  This made it relatively easy to add a consistent Scrum framework across all our teams, and proceed to iterate on the process sprint by sprint.

CK: How has your Scrum process evolved?

JB: Initially, we were quite strict about following classic Scrum ground rules and processes. For example, we tried to ensure that any team member was able to take any task from the board. I think this would be a good practice if you can make it work, but we discovered pretty quickly that sometimes it's most valuable to recognize specialties among the team. For example, we often have one person assigned to a task or even an entire feature at the start of the sprint. We also tried to keep new tasks from being added to the sprint.

This turned out to be really hard, and it caused a lot of stress. When a product owner wanted to add work, the team would push back hard, afraid that they wouldn’t be able to meet their commitment for the sprint. If the team tried to add tasks (typically addressing technical debt or fixing issues found in the code), the Product Owner would push back hard, too. Eventually we realized that no matter how hard we tried, we couldn’t anticipate everything that would happen during a sprint, and that we needed a way to accommodate issues that needed to be addressed immediately. Our solution was to add a “New” column on our task board, where anyone could add a task. We review all the new tasks during the daily Scrum, and agree as a team on which tasks will be added to the sprint backlog, and which are moved to the product backlog. When we add a task, the team agrees to either remove a task of similar size, or to change the team's commitment for the sprint.

Currently, we evaluate our Scrum practice each sprint in a retrospective, and we optimize our process each sprint.  One interesting thing we noticed was that sharing best practices among teams didn’t necessarily work. While a certain best practice might make perfect sense to one team that was reacting to their own recent experience, other teams often need to experience the same problem themselves before they successfully adopt the practice.  

Our careful practice of doing sprint retrospectives is key to our ability to evolve our process overall. We take time to review the events of each sprint, being careful to recognize patterns and make changes to improve.

CK: What are some of the other unique aspects of IMVU’s Scrum processes?

JB: One example is that our planning meetings involve a detailed walk-through of the technical design for each feature in development, so that everyone on the team participates in task estimation and understands the path each project is likely to take over the course of the sprint. In practice, this doesn't take as long as it might seem, since a fair amount of pre-planning has already taken place.

We also added a permanent lane to our Scrum board called "Engineering Project Follow-up," where we constantly track and make time to improve engineering infrastructure. It's a lane where an engineer can actually add tasks to the backlog and have them prioritized and implemented. We did this in part because as we have scaled our product development process, we noticed that our velocity was being negatively impacted by past technical debt. Making time for infrastructure improvements is also a key to ensuring a consistent velocity. Our product owners recognize and appreciate the value in doing this.

Another example is that we learned to be comfortable with our velocity without relying on a burn down chart. When we started off with Scrum, we spent a lot of time trying to create an accurate daily burn down chart—and it was useful initially as we figured out what our velocity was. After many sprints under our belts, our teams have developed a good sense of their velocity, and we routinely complete planned work each sprint.  Currently, we only create a burn down when we've had a problem delivering a product feature or when we have the feeling that something isn't quite right. That said, if we could find a simple, automated tool for tracking tasks and creating accurate burn down charts, we would find it useful.
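A minimal version of the kind of automated burn-down tracking mentioned here could look like the following sketch, assuming daily snapshots of remaining task-hours are available (the dates and figures are made up for illustration):

```python
from datetime import date

# Hypothetical daily snapshots of remaining task-hours for one sprint.
snapshots = {
    date(2010, 4, 12): 80,
    date(2010, 4, 13): 72,
    date(2010, 4, 14): 61,
}

def burn_down(snapshots, sprint_days):
    """Return (day, actual_remaining, ideal_remaining) rows for a chart.

    The ideal line falls linearly from the first snapshot to zero
    over sprint_days; comparing the two columns shows whether the
    team is ahead of or behind its commitment."""
    days = sorted(snapshots)
    start = snapshots[days[0]]
    return [
        (d, snapshots[d], round(start * (1 - i / sprint_days), 1))
        for i, d in enumerate(days)
    ]
```

Feeding this from a task tracker each evening would produce the chart automatically, rather than requiring someone to tally remaining hours by hand.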

CK: What is the relationship between Product Management and Software Engineering on your teams?

JB: This is an important and large topic, and I'll share just a few thoughts here. Overall, Engineering is a service organization supporting our Product Management team. Put simply, Product decides what to build, and Engineering decides what technologies we are going to use to build it.

In Scrum, product management is represented by a product owner on each team; each team consists of 6 to 12 software engineers. We have developed a strong sense of trust and respect between these disciplines over time. This trust and respect is important. For example, refactoring code is a strong part of our engineering culture at IMVU. Initially, product owners would question the introduction of work into a sprint by the engineers, asking, "Do we really need to fix this technical debt? Isn't there a faster way to implement this feature?"

It didn't take long for several concrete examples to appear highlighting the value of engineering things correctly from the outset, and of paying back technical debt that causes ongoing drag on engineering efficiency. At present, product owners trust engineers who say, "I'm delayed on my task because I needed to make time to refactor code.” Engineers also trust product owners to help ensure that the team has time to build and maintain solid engineering infrastructure and repay technical debt.

We've had plenty of conflict, though. To help manage that, we put a simple process in place: if the team technical lead and product owner couldn't agree on how to handle an issue, then the matter was simply resolved by escalating it to the management team. Overall, our product owners consistently report that our development teams are the most productive that they’ve ever worked with.

CK: What’s next for IMVU’s product development process?

JB: We’re scaling up our processes as we continue hiring, and we’re starting to find ways to integrate design functions into our processes now. Our overall strategy remains the same: continue paying close attention to how we are doing, and keep on iterating.

CK: James, thanks for sharing.  Congratulations on IMVU’s success!
