Wednesday, December 16, 2020

Avoiding Quality Disasters

On September 23, 1999, NASA's latest planetary probe, the Mars Climate Orbiter, began its orbital insertion maneuver to enter Mars orbit. It was never heard from again. The subsequent investigation determined that the likely cause was that one component of the software system was communicating measurements in imperial units, while the rest was communicating in metric. The discrepancy in the measurements resulted in the orbiter crashing on Mars rather than orbiting it. As summed up by a NASA executive: "The problem here was not the error; it was the failure of NASA's systems engineering, and the checks and balances in our processes, to detect the error."

This was not the first mission failure or mishap for NASA around that time. Failed rocket launches, blurry space telescopes, and the pair of shuttle tragedies underscored how the organization had let its quality slip. Subsequent studies of its culture coined the phrase "normalization of deviance" to describe how the increasing tolerance of known problems led to these failures. Managers and teams just got used to the lowered quality until it was too late.

While the consequences of low quality aren't as significant for game developers and studios, "normalization of deviance" impacts us as well...especially near our own launches. We increasingly tolerate bugs and crashes until there is a disastrous release.

Anyone who has read the reviews about some of the recently released games knows the impact of compromised quality. Players will ignore ingenious gameplay, technology, and art when bugs get in their way.


Bugs, glitches, bad art, unfinished mechanics, etc., are all forms of "debt". Ward Cunningham, one of the authors of the Agile Manifesto, once likened technology problems such as bugs and unmaintainable code to financial debt: the cost of paying it back grows over time.

Debt results in an ever-expanding amount of work that must be addressed before the game is deployed. Numerous forms of debt exist for game development, such as:

  • Technical debt: Bugs and slow and unmaintainable code.
  • Art debt: Assets such as models, textures, and audio needing replacement or tuning.
  • Design debt: Unproven mechanics waiting to be fully integrated into the game to prove their value.
  • Production debt: The amount of content that the game needs to be commercially viable (for example, 12 to 20 hours of gameplay).
  • Optimization debt: Work waiting to be done to run the game at acceptable frame rates on target platforms.

If unmanaged, the amount of work to address debt is unpredictable and results in crunch and a compromise in quality as key features or assets are dropped in favor of hitting a ship or deployment date. Worse yet, testing, which is often delayed until the end of development, is curtailed to meet a scheduled release.

Debt is always going to exist. The quality issues seen in recently released games are all instances of poorly managed debt. This article describes one approach to managing debt effectively.

Managing Debt

When studios decide that they need to improve debt management, usually it's when they are badly impacted by it. A release's sales are low, or a live game is shedding players who won't put up with the problems any longer. At that point, like a doctor, we have to employ some practices to stabilize the patient, fix some of the root problems, and set up a system where the patient slowly returns to health and stays that way.

The approach I find works the best has two parts:

  • Create a quality strike team to stabilize the game, establish metrics and testing tools.
  • Continuously roll out improved studio-wide practices.

Strike Team, Assemble!

The adage that "quality is everyone's responsibility" is correct. However, building that culture takes time, and without proper metrics and systems in place, it won't happen. Therefore putting together a team of developers with insight into the technology and a passion for fixing problems is a necessary step. It's useful to add a dedicated product owner to prioritize the effort, measure this team's cost, and protect it. Protection is crucial because a common cause of the massive debt is the pressure put on teams by stakeholders to add features as quickly as possible to the game. This pressure always results in the team skipping some practices that improve quality, such as iterating on a mechanic or refactoring messy unmaintainable code, or polishing an asset. This isn't a "cleanup crew." This team's responsibility isn't to refactor code, iterate on mechanics, or polish assets. That would be cruel to them and even encourage developers outside the team not to address debt...they'd just hand problems off to this team.

The focus of the strike team is to:

  • Implement metrics that clearly show the quality of the current build
  • Build test automation and a test flow that will catch debt quickly
  • Find the root causes of debt and address them

Strike Team Product Ownership Role

A Product Owner for the strike team is essential. The team will have its own backlog. This PO will work with other product POs to understand the emergent practices for developers and their impact on their short-term velocity. While boosting productivity over the long term, quality improvements will cost time in the short term (this is one reason teams often don't do enough of them).

The Initial Product Backlog

A Product Owner's primary responsibility is to maintain a Product Backlog that the team works from. Below is an example of the initial epic goals (goals too big to be completed in a single Sprint/iteration) of a strike team product backlog:

The overall epic user story: "As the Quality Product Owner, I want the current release to be free of priority one issues, so we don't lose existing and potential customers."

Epics are broken down into smaller epics as the team approaches the work on them. An example:

"As the Quality Product Owner, I want a set of metrics that show the quality of the current release."

Acceptance Criteria:

  • The rate of crashes and locked builds per hour, per 1000 users
  • The rate at which users encounter problems (e.g., microphone issues, etc.)


"As the Quality Product Owner, I want a set of automated tests that will "bless" a potential build as free of priority one issues before it is released."

Acceptance Criteria: App launches in a test suite, and the avatar runs through several tests (e.g., crossing room boundaries, testing audio, etc.).


"As the Quality Product Owner, I want a definition of done established, with regular retrospectives to catch defects and lead to improved development practices to avoid release issues."

These epic goals will often challenge the team to innovate in new areas. For example, they might ask, "how do we measure quality?" For this example, the team came up with the following solutions:

  • Have the game engine send an email containing debug information every time it crashes.
  • Build up a series of automated tests that catch problems.

Most modern engines and operating systems can catch asserts and use them to trigger emails sent to an address that collects the debug information (stack traces, player behaviors, hardware information, etc.). Additionally, a watchdog process can determine if the game is "locked up". Using these tools, you can capture measures that give you a reliable metric of stability.
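As a minimal sketch of how those crash reports and watchdog events could be turned into the "crashes and locked builds per hour, per 1000 users" number from the acceptance criteria, consider the following. The `StabilityWindow` type, its field names, and the sample counts are all hypothetical; the normalization shown is just one reasonable reading of that metric.

```python
from dataclasses import dataclass


@dataclass
class StabilityWindow:
    """Counts gathered for one reporting window (all field names are hypothetical)."""
    crashes: int        # crash reports received from the assert/crash handler
    lockups: int        # hangs reported by the watchdog process
    hours: float        # length of the reporting window in hours
    active_users: int   # users who ran the build during the window


def failures_per_hour_per_1000_users(window: StabilityWindow) -> float:
    """Return crashes plus lockups, normalized per hour and per 1000 users."""
    if window.hours <= 0 or window.active_users <= 0:
        raise ValueError("window must cover a positive time span and user count")
    failures = window.crashes + window.lockups
    return failures / window.hours / (window.active_users / 1000)


if __name__ == "__main__":
    # Made-up sample data for one day of a build.
    window = StabilityWindow(crashes=18, lockups=6, hours=24, active_users=5000)
    print(f"{failures_per_hour_per_1000_users(window):.2f} failures/hour/1000 users")
```

Tracking this one number per build makes it easy to see whether a change made stability better or worse.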

Create a target

What is our quality target? Why not shoot for 100%? This is nearly impossible, but that's OK. For those of you who use OKRs, this is called an "aspirational OKR." "Aspirational OKRs express how we'd like the world to look, even though we have no clear idea how to get there and/or the resources necessary to deliver the OKR." - Google OKR Playbook.

Aiming for 100% quality: it took us six months to go from 25% (the percentage of time a build had zero detectable priority one defects) to 95%. From there, it took another six months to get to 98%. The benefits were tremendous. The boost in productivity working with a stable and performing game was clear.

Strike Team Work

The work the strike team took on from here fell into three categories:

  1. Find and fix root causes
  2. Build test automation
  3. Collaborate with the rest of the developers on improved practices

Find and Fix Root Causes

When collecting metrics and crash data, it's useful to capture what the user was doing at the time of the crash. Using that information, the team can categorize the causes and, using root cause analysis, identify the most impactful culprits. In this example, the team found that poorly named assets were responsible for 25% of the crashes. With a simple fix to a few exporters, they eliminated a quarter of the crashes.
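Here is a rough sketch of that categorization step, assuming each crash report has already been tagged with a cause. The category names and counts are invented for illustration and are not the data from the project described above.

```python
from collections import Counter


def rank_crash_causes(crash_categories):
    """Return (category, count, share_of_total) tuples, most frequent first."""
    counts = Counter(crash_categories)
    total = sum(counts.values())
    return [(cat, n, n / total) for cat, n in counts.most_common()]


if __name__ == "__main__":
    # Hypothetical tags, one per crash report.
    reports = ["asset_name"] * 5 + ["out_of_memory"] * 3 + ["network"] * 2
    for category, count, share in rank_crash_causes(reports):
        print(f"{category:15s} {count:3d}  {share:.0%}")
```

Sorting by frequency points the team at the one or two causes worth fixing first.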

Build Test Automation

Test automation is another key to quality. Manual testing can't keep up when it takes a full day to test a game manually while developers are committing hundreds of changes. 

Test automation should be layered: quick tests run first and catch the most common problems, while each later layer takes longer to run and catches problems the earlier ones miss.

An example of the layers from simple/quick to complex/time consuming:

  1. Compile for all build configurations for all hardware. This will catch compile/link errors.
  2. Localized unit testing to test code around the areas committed
  3. Asset export/validation. Export assets to all hardware configurations and run tests (naming conventions and other standards, such as texel density).
  4. Smoke test all levels or modes on all platforms. Detect whether any of them crash the game.
  5. Scripted gameplay. Have a replay or scripted gameplay run through the entire game (or portions in levels or modes) and find crashes.
  6. Run full unit tests for the entire code base. This can take an entire night.

The number and type of these tests are determined by the frequency of the problems found in the root cause analysis. For example, a team found that code changes made often broke Android builds, so running tests that found those problems first was a good approach.
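The layering can be sketched as an ordered list of checks that stops at the first failure, so that fast layers give feedback before slow ones are even started. This is only an illustration: `run_layers` and the placeholder checks are hypothetical, and a real pipeline would call the build system, exporters, and test runners instead of the lambdas used here.

```python
from typing import Callable, List, Optional, Tuple

# A layer is a (name, check) pair; check() returns True when the layer passes.
Layer = Tuple[str, Callable[[], bool]]


def run_layers(layers: List[Layer]) -> Optional[str]:
    """Run layers in order; return the first failing layer's name, or None."""
    for name, check in layers:
        if not check():
            return name  # stop early so slower layers don't run on a broken build
    return None


# Placeholder checks ordered from quick to time consuming.
example_layers: List[Layer] = [
    ("compile all configurations", lambda: True),
    ("localized unit tests", lambda: True),
    ("asset export/validation", lambda: False),  # simulated failure
    ("smoke tests", lambda: True),
    ("scripted gameplay", lambda: True),
    ("full unit tests", lambda: True),
]

if __name__ == "__main__":
    failed = run_layers(example_layers)
    print("all layers passed" if failed is None else f"stopped at: {failed}")
```

Reordering the list is how a team would move the checks that fail most often (the Android build, in the example above) to the front.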

Test automation requires not only the cost to code the tests but an investment in test servers and, for large AAA games, an investment in improving a studio's network to handle the frequent transfer of huge code/asset files.

Scripted gameplay/replays can find problems that are subtle. For one racing game, we had all the AI vehicles race each other for all races. It took most of the night. The test would record all the finish times and flag discrepancies. One morning we found that none of the AI players had finished one of the races. Replaying the race, we found that a prop had been accidentally moved into an intersection. The collision with this prop set off a pileup among all the AI players that they could not recover from.
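A check like that one can be approximated with a comparison of the latest finish times against a known-good baseline. The sketch below is not the tool we used; the function name, the 5% tolerance, and the race names are assumptions made for the example.

```python
def flag_race_discrepancies(baseline, latest, tolerance=0.05):
    """Compare per-race finish times in seconds against a baseline.

    A missing or None entry in `latest` means no AI vehicle finished the race.
    Returns a dict mapping each flagged race to a short description.
    """
    flagged = {}
    for race, expected in baseline.items():
        actual = latest.get(race)
        if actual is None:
            flagged[race] = "did not finish"
        elif abs(actual - expected) > tolerance * expected:
            flagged[race] = f"finish time {actual:.1f}s vs baseline {expected:.1f}s"
    return flagged


if __name__ == "__main__":
    baseline = {"harbor": 100.0, "downtown": 200.0, "canyon": 150.0}
    latest = {"harbor": 101.0, "downtown": None, "canyon": 170.0}
    for race, problem in flag_race_discrepancies(baseline, latest).items():
        print(f"{race}: {problem}")
```

The flagged races are the ones worth replaying the next morning.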

Collaborate with developers on improved practices

The most challenging part of this approach is for development teams to change their practices and culture to support higher quality. Developers often resist changing the practices they've spent a career building up. Trying to force change on them will always fail.  

The following are some approaches that can better influence changes to improve quality.

Create a shared vision of why these changes are needed.

It's not a hard sell to convince developers that making better games with less "death-march crunch" is a good thing. That vision has to be connected with changing the way they work.

Work together to improve practices, starting with outcomes.

Scrum has a "definition of done," which all features that are considered complete in a Sprint must achieve. After every Sprint, the Product Owner and Development Team will meet to discuss what quality issues impact them and the game and explore improving that definition of done. Those improvements will lead to changes that the team will experiment with over the coming Sprints to determine whether they help and should be continued or don't help and should be abandoned. Keep these changes small so as not to overwhelm the team. They will typically aim to improve quality by 1-2% every Sprint, which doesn't sound like a lot, but can compound to roughly a 30% to 67% improvement over the course of a year (with 26 two-week Sprints per year, 1.01^26 ≈ 1.30).
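A quick check of that arithmetic, assuming the per-Sprint gains compound on each other and that there are 26 Sprints in a year (compounding gives somewhat more than simply adding the percentages up):

```python
def compounded_gain(per_sprint_gain: float, sprints: int) -> float:
    """Return the total fractional improvement after compounding each Sprint."""
    return (1 + per_sprint_gain) ** sprints - 1


if __name__ == "__main__":
    for gain in (0.01, 0.02):
        total = compounded_gain(gain, sprints=26)
        print(f"{gain:.0%} per Sprint over 26 Sprints: {total:.1%} total")
```

At 1% per Sprint the total comes to about 30%, and at 2% it comes to about 67%.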

Explore significant changes with beachhead teams

Some changes, such as implementing unit testing, are significant and are harder to roll out incrementally. To introduce these changes, having a single development team explore the value of and barriers to unit testing can be a valuable start. These teams, called beachhead teams in reference to the soldiers who land on an enemy's beach first, can refine the practices to best fit the current development effort and culture. When those changes are rolled out, they are also effective coaches to other teams.


Disaster is often the catalyst for dramatic change, but the cost paid for that is often too high. Years of work on a game can be wasted on a bad launch. Day one patches are not the solution. They're an admission of failure.

Changing a development culture before the impact of a disaster is not easy. It costs money and time. It requires courage, leadership, and patience.

Monday, May 04, 2020

It's Only a Model

“As to methods there may be a million and then some, but principles are few. The man who grasps principles can successfully select his own methods. The man who tries methods, ignoring principles, is sure to have trouble.”
~ Ralph Waldo Emerson

Recently, the following article has been making the rounds:

"Failed #SquadGoals Spotify doesn't use "the Spotify model," and neither should you."

The article claims, with backup from former employees, that the famous Spotify development model -- popularized by articles, videos, and books created largely by Henrik Kniberg -- was never very successful at Spotify, and organizations should ignore it.

If you are not familiar with the Spotify Model, Google it. It describes Spotify's culture and practices for scaling, team formation, engineering, Lean, Agile, etc. The articles describe overcoming the challenges of scaling engineering, understanding customer needs, dealing with stakeholders, etc. It was a beacon of light for any manager struggling in these areas...which means most of them.

The prestige of the Spotify model is undeniable. I've sat next to strangers on planes who gush about it. I've even had clients who insisted on cloning everything in the model, even naming their teams Squads as if the new naming convention would bestow higher productivity.

You can't blame Henrik Kniberg. Henrik is a master communicator. He's a very personable guy. His early book "Scrum and XP from the Trenches" inspired many, and I share clips of his videos and articles with clients.

But inspiration isn't the same as imitation.  Models, especially cultural models, can never be directly transplanted between organizations. All you end up with is a cargo cult of practices with no heart behind them. Cargo cults most famously existed after World War 2, when natives living near abandoned military airfields imitated the observed practices of the former airbase soldiers to entice the return of cargo planes that would bring them valuable things such as steel knives and candies. Unfortunately for them, imitating microphones and headsets with pineapples and coconuts did not work. Similarly, calling teams "Squads" and quoting articles without growing the heart of values and principles, which they are built upon, is equally futile.

Models are for Inspiration 

In 1968, Dee Hock helped form VISA International. VISA International (later renamed Visa) was established as a decentralized organization and was soon dominating the credit card industry. In his book "One from Many: VISA and the Rise of Chaordic Organization," Hock describes a self-organizing company with complete transparency. The "Chaordic Model" was an inspiration for many. Hock resigned in 1984, and Visa is no longer decentralized or a leader in the industry. It has reverted to a slow, hierarchical company that does not innovate very much.

Does this mean the Chaordic model was wrong? Does Spotify's current culture discredit the "Spotify model?"

Not at all. Organizations are ever-changing aggregations of individuals that evolve continuously. Culture is the collection of people and their interactions. Nothing more. Models are just for inspiration.

So, What Do We Do?

What many in the Agile consultant industry ignore is the first and most crucial Agile value:

"Individuals and interactions over processes and tools."  I often shorten this to "People over process".  As mentioned above, culture is the result of the people in the organization and their relationships with one another. Practices merely help guide and constrain some of those interactions, hopefully in a useful way, but often not.

Cultural growth requires clear and consistent leadership. It starts with a vision for the organization and a set of values and principles that guide it. The culture will evolve and grow around those. I've always liked the analogy that leadership and culture are like a gardener and a garden. A gardener cannot make a garden grow. It will grow on its own. The gardener can provide the conditions and remove the weeds and other threats to growth, but in the end, some plants grow best in the soil that exists.

Cultivating a productive and healthy culture takes work and skill. Many leaders don't know where to start (it's usually not on many MBA curriculums). It just seems easier to read a book or article and adopt the formula it's selling, but that never works. It takes engagement, constant course correction, and the willingness to hand over control with trust and to grow from failure.  Following a pattern sounds more comfortable, but we end up chattering into pineapple microphones hoping the World War 2 planes will once again land.

Thursday, April 16, 2020

Player Stories

In my agile coaching journeys, one of the things I see misused the most is the User Story.

User Stories were challenging for us to embrace at High Moon Studios. We were very document-driven, and User Stories just became another form of detailed documentation for us. Our coach Mike Cohn worked hard to wean us off of our desire to write everything down, but we fought him. At first, we stapled 10-page documents to the 3x5 index cards he told us to use. When he discouraged that, we found ways to fill the index cards with tiny printed fonts.

Even when we switched to the Connextra template:
"As a [someone], I want to do [something], so that [I get some benefit],"
we got it wrong. User Stories simply became structured features or tasks.

For example:
"As a User, I want an ammo pickup so I can keep using my weapon."

So what's wrong with this? As a feature, nothing. It's an excellent feature. But it doesn't represent what User Stories are meant for. They are intended to demonstrate why the user wants something. This finally hit home when we encountered the following User Story:
"As a player, I want police chasing me through an open city."

We might have ignored any problems with this, but we had a short schedule and someone on our team who wouldn't stop asking questions about it.

The first question was about the missing second part of the story:
"so that [I get some benefit]"

The question was, "what is the motivation created by police chasing the player?"

Why would a player want police chasing them? The answer is that police chasing the player is a means to an end.

So what is the motivation? Naturally enough, the motivation is to have a hair-raising time racing through an entire city and getting away, which you can't do if the police catch you.

Expressing that as a User Story became:
"As a Player, I want to race through a city so that I can achieve my goal of getting to the next level."
Notice the reintroduction of the word "to." This communicates a motivation for the player.
Let's instead call this a "Player Story."

Why Do This?
There turns out to be a beneficial reason for writing stories this way. When you write Stories about why the player wants something, developers come up with better solutions for fulfilling those desires. This was the case with Player Stories. When we started having conversations about "As a Player, I want to race through a city so that I can achieve my goal of getting to the next level," the focus became more about the player experience and less about the police feature.
With previous games, such as "Midnight Club," we'd spend person-years coming up with the best police AI because we were focused on the AI feature. The solution was never great. Police AI in a complex city is tough to implement and very expensive.

On this game, we asked ourselves, "what does the player really need to drive that motivation?" The answer was, "the need to drive fast enough to not get caught or crash." The developers focused on that by simply looking at the player's average speed. When the player drove too slowly, they'd play a siren sound. Then, if the player continued to drive too slowly, they'd ramp up the threat (flashing lights, police cars appearing and ramming you, etc.) until the player got arrested (cinematic). Not worrying about AI navigation, etc. shaved many months off of the work. The results were better because we kept focusing on the player experience.
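To show how little machinery that approach needs, here is a rough sketch of the escalation as a small state machine driven by average speed. It is not the shipped code. The `Threat` levels follow the description above, but the names, the speed threshold, and the choice to back off one level when the player speeds up again are all assumptions.

```python
from enum import IntEnum


class Threat(IntEnum):
    """Escalating police pressure; the level names are illustrative."""
    NONE = 0
    SIREN = 1
    LIGHTS = 2
    RAMMING = 3
    ARRESTED = 4


def next_threat(current: Threat, average_speed: float, min_speed: float) -> Threat:
    """Escalate one level while the player is too slow, otherwise back off one level."""
    if current is Threat.ARRESTED:
        return current  # the arrest cinematic ends the chase
    if average_speed < min_speed:
        return Threat(current + 1)
    return Threat(max(current - 1, Threat.NONE))  # assumed cool-down behavior


if __name__ == "__main__":
    threat = Threat.NONE
    # Hypothetical average speeds sampled at regular intervals.
    for speed in (95.0, 60.0, 55.0, 90.0, 50.0, 45.0, 40.0, 35.0):
        threat = next_threat(threat, speed, min_speed=80.0)
        print(f"speed {speed:5.1f} -> {threat.name}")
```

Nothing here needs to know where a police car is or how it navigates, which is where the months of savings came from.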

Other Benefits
Focusing on player motivation led to other benefits. First, splitting Player Stories like this into smaller Sprint-sized stories became easier. Most of the time, it was about a slice of the whole experience, such as having a cube car racing through a cube city filled with ping-ponging traffic cubes and siren sounds. That showed us what was working or not after the first sprint. The previous approach of splitting features would have resulted in parts of the experience, such as police AI navigation, that wouldn't prove their value for months.

Not all User Stories need to be Player Stories that capture motivation, but more Epic (large) Stories should. When the motivation of the player is at the forefront of design and planning at every level, you get better results for the player.

Tuesday, March 03, 2020

Managing Risk on Video Games

Back in the early nineties, I had the privilege of working at a game studio occasionally visited by Shigeru Miyamoto. Miyamoto arrived every few months to play the game we were developing for Nintendo. He didn't care about a schedule or budget. He only wished to know if we had "found the fun" yet.

Finding the fun is one of the most significant areas of uncertainty in a new game's development. Shipping a game that isn't fun is always a considerable risk.

Risk is the impact on your plan caused by uncertainty. Uncertainty is also an essential part of game development. Think of all the great games you've ever played. Did many of them do something that you'd never seen before? For every successful game that does something new, there are many more that embraced uncertainty and failed. This was key to Miyamoto's approach: find the fun or fail fast.

Many other stakeholders attempt to avoid risk by coming up with ways to minimize uncertainty.  Often this is done through:
  • Detailed design documents that try to answer every design question upfront in an attempt to reduce scope uncertainty.
  • Comprehensive schedules that identify work to be done, in an effort to minimize schedule and cost uncertainty.
Embracing Risk

A detailed set of practices for embracing risk is too long to describe here, but is detailed in the next edition of "Agile Game Development."

Here is an overall view of how risk can be managed:

The steps are:
  • Identify risk. There are a number of activities that can identify a broad set of risks based on what you know is uncertain and what might surprise you.
  • Classify areas of uncertainty into those we can plan for and those we can't. Knowing which is which helps us figure out how to handle them. For example, you can plan to port to a new platform, but can't really know if the game is fun until it is.
  • Prioritize. Risks have different levels of likelihood and impact on our game.  We need to sort those out and prioritize which we deal with first.
  • Find a root cause. Most risks have root causes, like purchasing that piece of middleware that was "supposed to be ported" by a specific date. By identifying root causes, we can come up with better ways of avoiding the impact of that risk, if it comes true.
  • Identify a trigger condition. What is the earliest we can know whether a risk has materialized? In the case of the undercooked middleware, we might have a trigger that says, "it fails to do so-and-so on our target by this date." Triggers should be testable and binary.
  • Create a mitigation strategy. What are you going to do if the risk triggers?  Are you going to buy the source code license to the middleware and port it yourself? Having a plan in place helps sell this approach and resolve risks.
  • Evaluate your triggers regularly. Make this a part of backlog refinement. If a risk is triggered, it'll probably change your next sprint, and the backlog refinement is where that is best handled.
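The steps above can be sketched as a small risk list in code. This is only an illustration: the field names, the 1 to 5 impact scale, and the use of likelihood multiplied by impact as a priority score are assumptions (a common convention, not something prescribed here), and the sample risks are made up.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Risk:
    """One entry in a risk list (field names and scales are illustrative)."""
    name: str
    likelihood: float   # estimated probability, 0.0 to 1.0
    impact: int         # estimated impact, 1 (minor) to 5 (severe)
    root_cause: str
    trigger: str        # a testable, binary condition
    mitigation: str     # what we do if the trigger fires
    triggered: bool = False

    @property
    def exposure(self) -> float:
        """Assumed priority score: likelihood multiplied by impact."""
        return self.likelihood * self.impact


def prioritize(risks: List[Risk]) -> List[Risk]:
    """Return risks ordered from highest to lowest exposure."""
    return sorted(risks, key=lambda r: r.exposure, reverse=True)


def needs_mitigation(risks: List[Risk]) -> List[Risk]:
    """Return the risks whose trigger has fired, highest exposure first."""
    return [r for r in prioritize(risks) if r.triggered]


if __name__ == "__main__":
    risks = [
        Risk("Middleware port slips", 0.5, 4, "vendor port not finished",
             "fails the target-platform test by the agreed date",
             "license the source and port it ourselves"),
        Risk("Core mechanic isn't fun", 0.3, 5, "unproven design",
             "playtest scores below target at the next milestone",
             "fall back to the prototype variant", triggered=True),
    ]
    for risk in prioritize(risks):
        print(f"{risk.exposure:.1f}  {risk.name}")
    print("act on:", [r.name for r in needs_mitigation(risks)])
```

Reviewing a list like this during backlog refinement keeps the triggers in front of the team.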
Make Stakeholders Your Partners in Risk Management
One of the best tips I received from a former boss of mine was, "don't come to me with problems.  Come to me with solutions".

Sometimes the hardest thing to do is admit that you "don't know" to your boss or publisher. Going to them with a list of risks is even harder. That's why the mitigation strategy described above is valuable. It has a solution associated with every prioritized risk. It might take a development cycle to prove the value, but I've found that with most well-developed risk management lists, at least 20% of them come true. Because those risks are triggered with enough time to solve them, the value becomes apparent.

A publisher once accused my risk management list of being a CYA ("Cover Your Ass") document. I agreed with him that it was partly that, but added that it covered his as well since everyone has a boss that they answer to.

Where to Start
  • Brainstorm risk. A favorite practice of mine is called a "PreMortem" (see the GearUp book). Gather all potential risks and prioritize them through the mapping practice described above.
  • Come up with triggers and mitigation plans for the higher priority risks
  • Set aside a regular time to identify new threats, evaluate the triggers, and retire any risks that have been mitigated or have otherwise been resolved.
How to Master It
  • Document and share the risk mitigation plan with your stakeholders.  Involve them in the regular evaluations.
  • Reduce your existing planning practices to move away from "documenting away uncertainty." This is a "cultural security blanket" that may take a few development cycles to wean stakeholders off of.
Learn More
  • The second edition of "Agile Game Development". As mentioned, there will be a lot more about this approach in the next edition of the book, coming out in summer.
  • Gear Up, 2nd edition. Over 100 practices for team, game, and development improvements you can immediately implement. On Amazon and LeanPub.
  • Me. I teach courses on improving game development, including integrating debt management into your existing process. Visit and contact me.

Tuesday, February 18, 2020

Ending Video Game Death Marches - #1 Managing Debt

We’ve all experienced it: A sink stacked high with dirty dishes. You rarely have time or incentive to tackle them. Usually, you’re in a rush to do it before a visitor arrives or when you run out of clean dishes for your next meal.

Eventually, most of us learn that washing the dishes once a day (or at least throwing them in the dishwasher) is a better approach. It takes a little discipline to get into the habit…similar to flossing your teeth every day, but it’s for the greater good.

The same principle applies to game development. We often let the crud in our games pile up. We call this crud debt. We often push off paying that debt until it’s an emergency, and that usually leads to crunch, and lots of crunch sucks.

Why Debt contributes to Crunch (getting stuck between a rock and a hard place)
Debt is unfinished work, whether it’s bugs that need fixing, stand-in art, or untuned mechanics. It’s called debt because, like financial debt, it has an interest rate whose payback grows over time. Debt piles up —usually inside some tracking software—where it stays, growing until some point in development, commonly called alpha (the “rock”), when teams dive in to address it. The problem is that alpha is close to a ship/deployment date that is fixed (the “hard place”). Teams quickly discover that there is not enough time to address all that debt, and management decides it’s time to crunch.
It Often Gets Worse 
Ironically, managers usually react to crunch, low quality, and missed deadlines by demanding more detailed task planning. This tends to squeeze out the slack that should be used to address emergent debt. That slack is essential; when was the last time you estimated the amount of work it took to fix a bug that hadn’t been found?

Setting Aside Time to Manage Debt Works, but it’s Not as Easy as You Might Think
It’s easy to see that if you have less debt, there’s less reason to crunch because of it. So what’s so hard about managing debt?
There are two significant reasons that we fail at managing debt. First, it seemingly slows down development. Practices like unit testing, creating automated test tools, and making sure new features introduce minimal debt take extra time upfront, but in the long run, they save a lot of time. However, it’s often hard to convince a stakeholder that doing the extra work now is a lot less expensive than doing it months from now.
Second, we developers can be lazy. Yes, I said it. Managing debt, such as optimizing polygon usage, replacing stand-in audio, or refactoring code that has gotten “crufty” is a lot less fun than tossing in something new. Like washing your dishes daily, it’s less fun than making them dirty, but it’s a discipline that needs to be built up to avoid the piles in the sink.

Where to Start
Debt is best managed by setting aside time to eliminate levels of it. You can start by establishing an agreed-upon time that’s set aside to address debt. For example:
  • Every day, eliminate any problems that cause the game to crash or otherwise be unplayable.
  • Every Sprint, make sure the game is demo-able to internal stakeholders. This could mean that the frame rate is stable and fast, and players can have fun.
  • Every Release or milestone, make the game worthy of showing the outside world. It could be missing key features, like a marketing demo, but what is there is polished.
How to Master It
  • Build test automation. The Gear Up book describes several useful ways automation can help
  • Metrics. There’s a lot of debate about whether tools that measure code quality, like unit test coverage, help. I feel they do. If you can measure something useful, you can improve it, but beware of the metric becoming the goal.
  • Educate. Try pair programming, or hold a “Wednesday Pizza Talk” (Gear Up) to educate developers about practices to reduce debt.

Monday, February 10, 2020

Six Signs your Game is in Trouble

We all want to make great games and not suffer making them, but sometimes it doesn't work out that way. Below is a list of 6 typical signs that the game you're working on is in trouble.

1. Your bug database is growing out of control
I'm not a fan of bug databases to begin with. They are often rugs to sweep dirt under, and that dirt gets more expensive to clean over time. All that debt has to be paid off, and it's often paid off with crunch and compromise. We should be spending time at the end fine-tuning the experience of the game, not making it barely shippable.

2. You Don't See the Big Picture
Often, especially on large games, individual developers don't understand the game they are working on. It's all in the head of some lead designer, who might even be in a different city. Without a shared vision, moving your game forward becomes like pulling a wagon with a hundred harnessed cats. Chaos ensues.

3. Gantt Charts
Detailed Gantt charts often just serve as project management theater. A complex, graphical chart can often placate a publisher. It's not terrible to think about the complexities of work and dependencies, but these artifacts, adopted from manufacturing, aren't well suited for creative work. For one, they don't lend themselves to change very well. The one thing I always look out for is a Gantt chart that magically slopes downward off in the future, when some miraculous burst of productivity is forecasted, which brings us to the next sign.

4. Wishful Thinking
I'm an optimist, but it usually takes the form of "yes, I'm positive something will go wrong!" Projects should embrace risk. It's more important than task management. Problems do not solve themselves and that reassuring management reserve set aside for problems will probably be gone by the time it's needed. If the game is not fun and on track now, it's unlikely it will suddenly become so "someday."

5. Building Final Content Based on Unproven Technology
This is probably the most expensive one. Technical promises and their schedules are often not worth the electrons used to store them in the project management tool. Even console manufacturers are guilty of this (who remembers the "Emotion Engine"?). If you are creating final, shippable content using budgets (graphics, physics, AI, etc.) that are beyond what your current engine can do, it's a good sign you'll be redoing all that work, in crunch, again.

6. Management "Tells"
As with poker, there are some tells that managers show, which are signs that there is trouble.

  • Mandated crunch. Scheduled crunch often means more is coming. It's panic time.
  • Moving people between projects to speed things up. This always slows things down. You know why.
  • Sudden micromanagement. I'm all for management being engaged daily with developers and communicating more, but when it instantly ramps up, they're worried.

This might sound a bit negative, but game teams get into trouble all the time, and this list could be much more extensive. Identifying the signs early on is the first step in solving them.

Well-proven solutions to these problems exist, but they are not easy. They involve changing approaches to how we think of game development and stakeholders. We need to focus on the game first and the project second. That requires courage from leadership and developers.

Friday, January 24, 2020

Solving Large Team Dependencies

Simulated Annealing for a Travelling Salesman

We've all seen it. The larger a game team, the more dependencies between developers and teams emerge to slow development down to a crawl.

The problem of dependencies is a complex one. Problems like this are called "NP-complete" problems, usually only solved by time-consuming brute force approaches. So forget about an easily managed solution.

The best approach has its analogy in a computer science technique called "simulated annealing," where you start with an approximate solution, add a bit of change from time to time, and see if it improves the solution. The GIF above shows simulated annealing as applied to the classic Travelling Salesman Problem. Instead of thinking of cities (groups of dots) and paths between them (lines), think of developers (dots), teams (groups), and dependencies (lines). Over time, as teams and individuals within teams experiment with ways to reduce dependencies, you see those inter-team dependencies reduce.
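For readers who haven't seen simulated annealing in code, here is a minimal Python sketch of it applied to a small Travelling Salesman Problem. It is only an illustration, not the code behind the GIF: the function names (tour_length, anneal), the segment-reversal move, and the temperature and cooling values are all assumptions chosen for the example.

```python
import math
import random


def tour_length(points, order):
    """Total length of the closed tour that visits points in the given order."""
    return sum(
        math.dist(points[order[i]], points[order[(i + 1) % len(order)]])
        for i in range(len(order))
    )


def anneal(points, steps=20000, start_temp=1.0, cooling=0.9995, seed=0):
    """Simulated annealing sketch; all parameter defaults are illustrative."""
    rng = random.Random(seed)
    order = list(range(len(points)))  # start with an approximate solution
    current = tour_length(points, order)
    best_order, best = order[:], current
    temp = start_temp
    for _ in range(steps):
        # Small random change: reverse one segment of the tour.
        i, j = sorted(rng.sample(range(len(points)), 2))
        candidate = order[:i] + order[i:j + 1][::-1] + order[j + 1:]
        cost = tour_length(points, candidate)
        # Always keep improvements; keep a worse tour only with a
        # probability that shrinks as the temperature cools.
        if cost < current or rng.random() < math.exp((current - cost) / temp):
            order, current = candidate, cost
            if current < best:
                best_order, best = order[:], current
        temp *= cooling
    return best_order, best


if __name__ == "__main__":
    rng = random.Random(42)
    cities = [(rng.random(), rng.random()) for _ in range(30)]
    start = tour_length(cities, list(range(len(cities))))
    _, improved = anneal(cities)
    print(f"initial tour: {start:.3f}, after annealing: {improved:.3f}")
```

The team analogy is the same loop: make a small change to team makeup or the backlog, keep it if dependencies go down, and occasionally tolerate a change that looks worse in the short term so you don't get stuck.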

For large teams, those changes are often to team makeup and the formation of the Product Backlog. The goal isn't to eliminate inter-team dependencies, but to move as many as you can within the individual cross-functional teams (Scrum-sized, 5-9 developers). Within those teams, they build accountability and better practices to address debt and reduce the cost of dependencies.

As you experiment with more self-contained teams, organize the Product Backlog to reflect those teams and to depend on fewer "future integrations," and build better development practices, dependencies will slowly diminish over time.

Practices to Try

  • Instead of creating a detailed release plan every few months, just define the major epics and let teams re-form around the epics they'll take on and refine the release plan themselves.
  • Identify dependent specialties in the release plan between each team instead of tasking out dependencies. Often this indicates where you don't have enough specialists or where there is an opportunity to cross-train someone and spread some skill.
  • Talk about dependencies in retrospectives. Encourage the team to come up with solutions.
  • Measure inter-team dependencies. If you don't measure it, it's harder to improve it.
  • Find ways to visualize dependencies. Program boards that use strings to identify inter-team dependencies are useful. A board that looks like the one below should horrify anyone.

This is not a good "after" shot IMO
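
As one possible way to measure inter-team dependencies, the Python sketch below counts how many tracked dependencies cross team boundaries. The data shapes (a developer-to-team mapping and a list of developer pairs), the function names, and the sample names are all hypothetical; in practice the pairs might come from your backlog tool or a program board.

```python
from collections import Counter


def inter_team_dependencies(team_of, dependencies):
    """Count dependencies that cross team boundaries, keyed by team pair."""
    crossings = Counter()
    for a, b in dependencies:
        team_a, team_b = team_of[a], team_of[b]
        if team_a != team_b:
            # Sort the pair so (ui, audio) and (audio, ui) count together.
            crossings[tuple(sorted((team_a, team_b)))] += 1
    return crossings


def inter_team_ratio(team_of, dependencies):
    """Fraction of all dependencies that cross team boundaries."""
    if not dependencies:
        return 0.0
    crossing = sum(inter_team_dependencies(team_of, dependencies).values())
    return crossing / len(dependencies)


if __name__ == "__main__":
    # Hypothetical sample data.
    team_of = {"ana": "gameplay", "ben": "gameplay", "cy": "ui", "dee": "audio"}
    dependencies = [("ana", "ben"), ("ana", "cy"), ("ben", "cy"), ("cy", "dee")]
    print(inter_team_dependencies(team_of, dependencies))
    print(inter_team_ratio(team_of, dependencies))
```

Tracking the ratio from Sprint to Sprint shows whether changes to team makeup are actually moving dependencies inside teams, and the per-pair counts point at where a specialist is stretched across teams.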

Dependencies on large games are a huge anchor that slows development down in a very opaque way. Your focus should be on those and less on tracking individual effort.