Tuesday, 19 November 2013

across the plains of software development

In this article, I want to introduce a graphical and symbolic notation that allows us to communicate about several phenomena in SCRUM. The notation is based on a metaphor for software development: fantasy role-playing games. The metaphor had existed in my head for a while, and in my everyday work as a SCRUM master I referred to it frequently in my imagination. By writing down some aspects of agile software development using this metaphor, I wanted to test how far it could be stretched (surprisingly far, in my opinion). I also hope the post has some entertainment value for people familiar with both subjects.

Team = Party

Let's start with the team. A bunch of brave adventurers, travelling the vast plains of software development, facing danger at every corner. 



As you can see, the team members are not identical, because everybody has his own specific skills. Think of the database expert, tester and GUI designer as warrior, wizard and rogue. Note that even though they are different, they are equally important. This is a question of good game design. If a priest can do nothing but heal (so he is no more than a discount on health potions), everybody wants to play the wizard.

But what about experience? Surely a senior database programmer is more important than a newbie mobile expert? Well, let us see. You all know this from D&D: as long as you are the only wizard or rogue at the table, you are important, no matter whether you are level 3 or 21. If nobody else can break the evil witch's curse, or pick that all-important prison lock, they all rely on you.

If you have a group that has been playing for years and your fighter has reached level 17, and now you have a new friend who wants to play a fighter as well, what do you do? He does a puny d6 damage while your old warrior deals 3d12 with his battle axe. If the team shows this to the new guy, he will soon lose interest and never gain the experience to become valuable. A master-padawan kind of relationship might work, but typically this only helps for a while, especially in real life, where the new guy just spent 5 years or so studying and now wants to make an impact for real, not be the apprentice again. In my opinion the best answer is subclassing. Your old warrior has more of a tribal background (remember the axe), so the new guy can be a noble knight who knows his banners and history and can even read. What has happened now is that you removed the generic fighter class from the game and introduced barbarians and knights as specializations. The same thing works for the team. You already have an extremely experienced backend programmer but you need two, so in comes a new guy. Instead of having a junior backend programmer for a long time, pick one aspect (or library, or subsystem or whatever) and have him specialize. Instead of a junior backend programmer, you now have a logging expert on the team. Cool. And as long as you are the only logging expert...


Remark: we had extensive discussions among the team members about specialization and how it should work. I want to stress the conclusion once more: specialization of skills is desirable. Specialization of responsibilities is not. The entire team benefits from having somebody who really is an expert on the logging system and can step in if everybody else fails. But that still means the guy will pick up any task not related to logging if the task needs to be done right now. What we cannot use is an attitude of "I am not in charge of logging, let Thomas do this, I only do database". Every team member will pick up every task that needs to be done, unless the task is so difficult he really cannot do it. And every team member is expected to build some basic skills in areas outside his expertise, so that the logging expert can do, say, 90% of the database tasks. For the remaining 10% he can then call the database expert.


Back to our fantasy metaphor. Let us consider a typical party with one warrior, one wizard and two rangers. Suddenly, the wizard dies. You know, the dungeon master told a story of a tragic accident on the eternal plane of fear, but we all know that Thilo, who plays the wizard, has a new girlfriend and has better things to do now than play D&D with his friends. In the project, that means that our GUI expert quit the company.


We now have a party with a warrior, two grass-eating elven rangers and no wizard. What can we do? If we continue like this, we are dead as soon as we face the first undead that can only be harmed by magic. So we put an issue on the SCRUM board. Now everybody is aware of the fact, but it does not really help. We can escalate to management that we have a resource problem because Thilo quit. Management will probably understand that we are one resource short. Being good managers, they will sit down, think, play some politics, pull some strings and finally find another resource. A good one, too: a senior guy with a good reputation and excellent bow shooting skills.



Fuck! So how could this have been avoided? In an ideal world, your boss would know D&D. He could tell the difference between ranged combat and melee fighting and would know the subtle differences between clerical and arcane magic. In reality, he probably last played when DSA belonged to Schmidt Spiele. Also, even if he knew, he probably got 2 applications when posting an external job offer, and both guys can start in six months at the earliest, so he really had to call in some favors from some buddies to organize the one guy he gave to you.

Now it is up to you, and this means Legolas needs to pick up a wizard's wand and become a junior magic user. By "becoming" I do not mean temporarily stepping in until Thilo is back, but becoming a wizard for good. After working for years to become an expert marksman, this sucks, but who said life is fair? And again: as long as you are the only magic user ...

Still, there is one thing that went wrong here. In my opinion, it is the usage of the term "resource", when really you wanted to say "we are one Thilo short". By labelling people as "resources", one implies a generic, replaceable workforce, which is a very poor image to have in your head when dealing with software development. As a solution, I propose banning the term "resource" when referring to human beings and replacing it with the term "skill". You might wonder why not go all the way and call humans what they are, namely humans. But in the situation above, you would have complained to your boss "we are one human being short since Thilo quit" and consequently he might have brought his 13-year-old daughter the next day, who clearly is a wonderful human being. But in the end, what the project is missing is not Thilo's winning personality and jokes, but the fact that nobody knows how to build the damn GUI anymore. So it is "skill", not "human". In the example above, the boss would have understood "we are one skill short since Thilo quit" and would immediately have asked "which skill?". Maybe (but just maybe) the whole situation would have gone better.


Product Owner = Blind Oracle

As mentioned before, the party is travelling the vast plains of software development. And they do so for a reason: they want to reach the mystic castle in the clouds, called the product vision.


Because nobody knows exactly where this castle is, they were given an old man in a brown cloak, his eyes blindfolded. The old man is a mystic seer, an oracle, who can see the castle on the horizon. He is dragged along on a donkey so he doesn't fall over a rock or get lost, and every evening he instructs the party about further directions on the map. Let us have a look at the picture.



With every sprint the party moves on, fighting through adventures and hazards, getting ever closer to the castle. We draw an arrow on the map to mark their progress. Here, everybody has his role. The party must understand that the oracle is the only one who really sees the castle in the clouds. Without him, they would never reach it. The oracle must understand that apart from his mystic skill, he is just a blind old man travelling with a bunch of veteran adventurers. They know how to cross rivers, deal with trolls on the way and avoid being robbed by hobgoblins. You might have guessed by now that the oracle is the product owner.

In the end, the party (including the product owner) is judged by how much progress it makes in the direction of the castle.



Progress that goes in a different direction is lost. Progress that goes in a very different direction (more than 90° away from the castle) is even harmful. In this picture, the responsibilities are clear. The team is responsible for how long the individual arrows are, which is called the velocity of the team. The product owner is responsible for making sure they point in the right direction.
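To put the picture into numbers, here is a minimal sketch. It is not part of SCRUM itself; it assumes the map is a flat 2D plane, positions are (x, y) tuples, and the function names `velocity` and `useful_progress` are made up for this illustration. The length of a sprint arrow is the team's share, and the part of the arrow that points at the castle is what the party is judged by.

```python
import math


def velocity(start, end):
    """Length of the sprint arrow from start to end (the team's share)."""
    return math.hypot(end[0] - start[0], end[1] - start[1])


def useful_progress(start, end, castle):
    """Part of the sprint arrow that points towards the castle.

    This is the projection of the arrow onto the direction from the
    starting position to the castle. It is zero for a step at exactly
    90 degrees and negative for anything beyond that.
    """
    step = (end[0] - start[0], end[1] - start[1])
    to_castle = (castle[0] - start[0], castle[1] - start[1])
    distance = math.hypot(to_castle[0], to_castle[1])
    if distance == 0:
        # Already standing at the castle: no direction left to measure against.
        return 0.0
    dot = step[0] * to_castle[0] + step[1] * to_castle[1]
    return dot / distance


if __name__ == "__main__":
    castle = (10, 0)
    # Same arrow length, but only part of it counts towards the castle.
    print(velocity((0, 0), (3, 4)), useful_progress((0, 0), (3, 4), castle))
    # A step away from the castle yields negative progress.
    print(useful_progress((0, 0), (-2, 1), castle))
```

In this reading, a long arrow with a small or negative projection means the team did its job and the direction was off, which is the product owner's side of the deal.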

Stakeholder = Pantheon

So what kind of strange mystic skill is seeing a castle in the clouds? How does the oracle produce this image? How does he know it is a mystic skill, and he is not just mad? How does the team know?

Let us stay in our fantasy setting. The oracle is a kind of priest, in touch with a number of deities, called stakeholders. They all follow their own schemes and plots for reasons that are not for us mortals to understand. But each of them has his own castle in the clouds and wants the adventurers to reach it. If he raises the anger of the gods, the oracle is in trouble. So first he finds out what his gods want and makes a mental map of this (being blind, it is the only map available to him, poor bastard). Obviously, he cannot reach all of the castles at the same time. So he then ranks the different deities by how important they are. If you anger a god, at least make sure it is a minor one. So what does this look like on a map?


Our mystic decides to guide our group to the red location, deeming that this is the spot that will get him into least trouble.
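One way to picture how such a spot could be chosen is sketched below. This is purely an assumption for illustration: I model "trouble" as each deity's importance times the squared distance between the chosen spot and that deity's castle, in which case the spot with the least total trouble is the importance-weighted average of the castles. The function name `compromise_spot` and the numeric weights are hypothetical; a real product owner weighs stakeholders with far less arithmetic.

```python
def compromise_spot(castles):
    """Return the (x, y) spot that minimises importance * squared distance.

    castles is a list of (x, y, importance) tuples, one per deity.
    Under the squared-distance assumption, the best spot is simply the
    importance-weighted average of the castle positions.
    """
    total = sum(importance for _, _, importance in castles)
    if total <= 0:
        raise ValueError("at least one deity needs a positive importance")
    x = sum(cx * importance for cx, _, importance in castles) / total
    y = sum(cy * importance for _, cy, importance in castles) / total
    return (x, y)


if __name__ == "__main__":
    # A minor god at the origin and a major god further east:
    # the spot lands much closer to the major god's castle.
    print(compromise_spot([(0, 0, 1), (10, 0, 3)]))
```

The more important a deity, the harder his castle pulls the red spot towards itself.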

However, gods are not equally important. In particular, every priest swears his allegiance to one god. In the real world, it is the one with the budget, called the project sponsor. Reaching his goal is required. In a sense, everything else is optional. This sounds as if the oracle really has very limited freedom in where to place the castle. In reality, this depends on how clear the cloud castle of the project sponsor is. Some sponsors have crystal-clear ideas of where they want the team, with very well-defined boundaries. They pick product owners only because they do not have the time to travel with the team themselves. But many sponsors produce only very blurry images, more clouds than castle.





In the second case, the product owner has a corridor in which to travel. He can construct his own castle in the clouds.

Now let us consider a more complex situation and discuss what a wise oracle would do.  


Phil, the sponsor, gave a vision to the product owner in the form of an EU grant, mostly buzzwords and very vague ideas (a) in the direction of "detailed virtual characters". Tom, a stakeholder, would have preferred to move in the direction of "massive worlds", where he has a very clear idea for a cool application (b), but his grant was rejected so he did not become sponsor. Swen has a project in the direction of virtual characters as well, with a focus on facial expression (c). It is in the corridor given by the sponsor's EU grant, and a deadline is approaching. Jörg also has a project in "virtual characters", but his focus is on physically plausible animation (d). His idea might just be in the corridor given by the EU grant, but it is quite different from Swen's idea.

So what should the product owner do?

a) First of all, whatever he does, he should stay within the corridor. As long as he has Phil on his side, he is protected by a higher being if he gets into trouble.

b) Tom's vision cannot be followed. Tom should be informed about this; maybe this will cool his anger. To be sure, the situation should be escalated to Phil, so he can protect the product owner should Tom be angry at him later and try to harass him. Apart from this, the product owner has several possibilities.



One option is to move to Swen's castle first and make sure he hits the deadline. After this, he changes the objective and moves in a direction close to Jörg's castle. This way, both Swen and Jörg will be happy with him. However, Phil will judge him by how much progress he made in the direction of Phil's own castle, and the detour reduces this progress.


So a different option would be to construct a castle somewhere between the ideas of Swen and Jörg and move straight at it. Both Swen and Jörg will be less happy with this, but progress in Phil's direction will be greater.

Both are valid plans, and it is these decisions the product owner must make.

Issues = Encounters

What would adventuring be without encounters? The oracle told the party to go east, across the great river. The bridge is broken, the river is running high, the ferry has been captured by pirates and a druid protects the trees that could be used to build a raft. A normal day in the life of an adventurer.

Now the party discusses what to do: kill the pirates, deal with the druid or go and look for another bridge. They do so in the sprint planning meeting and in the dailies; it is what they are used to doing, it is what they are good at. While they discuss this, the oracle is present some of the time, sitting on his donkey. Involving him might be a good idea at times. Maybe he has an idea everybody else missed (did we not get a treaty with the pirates last year?). When the solution involves diverting from the path, he should be consulted in every case. He can contribute information like: while we should try to get across the river, we are slightly too far north, so best look for a bridge further south.

But problems can arise as well. Many oracles were capable adventurers when they were younger and might start to smart-ass. "Just kill the pirates," they might say. "They are pussies and they always run. In my day, we used to slaughter pirates by the hundreds." In this case, the oracle should kindly be reminded to sit on his donkey and shut up because a) back in his day pirates did not carry rapid-fire crossbows, and b) these pirates number several dozen and have a swamp troll with them (which he did not see because he is blind). He might not give in. "You are cowards," he might say. "I could kill the pirates all by myself." But in fact, he could not. Everybody knows it, and above all the oracle himself proved it by hiring the adventurers. If he could travel alone, he would have done so long ago. At this point, the SCRUM master might step in and comfort the oracle, who probably just misses adventuring himself.

So normally it is the job of the team to solve issues. Sometimes, the best solution involves a small detour. In this case, they should consult the product owner. Sometimes he might come up with a solution like: if we need to include this gesture library anyway, let us put up the following five user stories for gestures. Then Jörg can report the software for his project deadline and we score extra points.

The map now looks like this. Originally, the oracle wanted to travel straight at the castle. But with the river blocked, they moved south to find another bridge. The oracle then decided that, from where they stand now, going all the way to Jörg's castle is easy extra credit, so they go there. This process is called re-planning, and it is built into the very core of SCRUM. In fact, it is the feature that gives agile methods their name and sets them apart from classical project management.

Let us keep in mind that the oracle is blind, so he does not see the river, nor the pirates, nor the druid. The adventurers on the other hand have no connection to the realm of the arcane, they only know about castles in the sky because the oracle keeps talking about them. It is a situation that requires mutual trust. Lots of it.

So how can the adventurers misuse the trust of the oracle? If they move close to a village they know, they could invent or exaggerate some issue to manipulate the group into moving there, so they can spend a night with their sweetheart or have a drink at the tavern. In the real world, this corresponds to exaggerating issues with some library the programmers dislike, or over-emphasising issues that could potentially be solved by some technology they have wanted to test for a long time for personal reasons. Even more so, it applies to spending significant time watching YouTube during working hours, or complaining about issues while in truth progress is small because they came in late and took long lunch breaks. The oracle might suspect that this is happening, but there really is no way he can tell. For situations like this, there is a SCRUM master.

And how could the oracle misuse the trust of the adventurers? Well, obviously he might not be an oracle at all, but just a drunk old asshole on a donkey. His vision might be unclear and fuzzy, and he might be unsure himself of where the journey should lead. While this is unfortunate, it only really betrays the party when he invents reasons of higher politics to explain why he keeps changing his mind. Again, there is no way they can tell, and if they really suspect their oracle might be a fraud, they should consult the SCRUM master.

A second thing that might happen is that the oracle could be an evil god in disguise (a kind of camouflaged stakeholder). In the real world, this is the case when the product owner has his own interests in the product that diverge from the stated mission of the sponsor. This situation is both dangerous and hard to identify. The entire team might get into serious trouble after the project when the question arises: "I sent you on a mission to build a browser-based 3D editor, so why the fuck do you deliver me a cross-platform word processor?". Once the situation has been identified by the SCRUM master, there is no way around replacing the product owner. If he had openly communicated from the beginning that he doubles as a stakeholder, there might have been a chance to deal with the situation. But once he has lied about his double role by keeping this information to himself, he needs to be replaced without mercy.

Problem Solving = Dungeons

So far, we have considered only overland adventures, where the party travels across a landscape on which, apart from an occasional river or mountain range, they are free to go wherever they want. In the real world, this corresponds to implementing features on a working platform. We go in small steps from one working version of the system to the next, and discussion is mostly about what makes sense for a user, less about how to get it going at all. It is the ideal of SCRUM.

Sometimes, as we all know, the project is not like this at all. We are trapped in a labyrinth of issues, have five branches of which none compiles, and still do not know which of the five Bluetooth stack libraries that we created the branches for is the best way to go.

In the fantasy world, this corresponds to dungeoneering. Underground, in the dark, there are no maps and no directions. You never know how far you are from the exit before you actually step outside and you never know what monsters lurk behind the next bend. When a party is dungeoneering, this has several consequences. First of all, the oracle is no more help, because knowing that the castle is east does not tell you if the left bend or right bend is right in the labyrinth. Second, the party never knows how far the exit is before they actually reach it. Both are good reasons not to go into dungeons at all.


So what got us into the underdark in the first place? Basically, there are three reasons. First, the adventure might start in a dungeon. In the real world, this means that for external reasons (the customer put up some constraints or the EU grant tells us to), we have to use technology we are unfamiliar with and need to integrate and master it. In this case, dungeoneering cannot be avoided. In order to succeed, we need to deal with the issues involved. In doing so, we should take care of three points.

First, we should take care of what we at the DFKI call the "user story zero". When starting underground, we always implement a user story first whose purpose is to integrate all involved technologies in a "hello world" application, built by a fully automated, functional build system. Only after this zero story is finished do we tackle stories that provide actual user value, which are counted starting with one.

Note that in a typical fantasy setting, the party would travel overland to reach a dungeon, then go underground to kill some monsters. This is because, from a storytelling perspective, you want the biggest risk at the end of the story. In the real world, we put every risk we cannot avoid at the beginning of the project, so at this point the nice analogy between agile software development and fantasy adventures breaks down somewhat.

Second, we need to make sure the product owner does not get too frustrated. As stated before, he is virtually useless underground and needs to somehow deal with it. The solution is easily written down: he just needs to sit tight, relax and wait. But as we all know, he is under a lot of pressure, so he will inevitably become more and more frustrated as long as user story zero fails to come together. Get him some coffee or liquor, take him to the cinema or buy him a cat, whatever helps. He really just needs to cope.

Third, we might be in a situation where the whole project period is not long enough to get out of the dungeon, or we might get out too late to reach an acceptable state in the end. Bearing this in mind, the product owner and SCRUM master need to keep an eye on this together and estimate the latest possible date by which user story zero must be completed so the project does not fail horribly. If this date passes with issues unsolved, the situation needs to be escalated to management firmly and without delay. But apart from this, neither SCRUM master nor product owner should get involved; after all, adventurers are experts at a good old-school dungeon crawl.

There are two more reasons to go dungeoneering. One is that the product owner and team decided to do so together. Think about Moria. Caradhras turned out to be impassable, the southern way took the fellowship far too close to Isengard, but still they needed to get past the Misty Mountains. So even though Gandalf knew about the dangers in the deep and was rightfully afraid of them, they decided to take their chances with the long dark. In the real world, the situation corresponds to replacing the GUI library in mid-project or starting a big reengineering that cannot easily be subdivided. It is not something you want to do, but sometimes all alternatives look worse. If the team goes this way, all the team members should be involved in the decision, as well as product owner and SCRUM master. Afterwards, the situation is basically the same as for "user story zero", except that there is one more option available all of the time, namely leaving the dungeon and continuing the long road south, dealing with Saruman. In the real world, this would mean dealing with the shortcomings of your old GUI library and implementing the missing widget manually. Note that in order to keep this way open, dungeoneering should always happen on a separate source branch, which is a good idea anyway. The get-the-hell-out-of-here option should be discussed at every sprint planning meeting while the party is underground.

There is a third reason to go into a dungeon. It is the ugly thing that might happen now and then; it is called side-questing, and it can happen intentionally or accidentally. How does it occur? The party might have dealt with a typical everyday issue, say a bunch of goblins in a hut. As they explore the hut, they find a cellar. Looking for some easy loot, they search the cellar, which turns out to be surprisingly large. In the fourth room, they find a tunnel. And somehow, they miss the point where they should have realized that the thing is a damn full-fledged dungeon and that clearing it out entirely is not really required to deal with the goblins. Note the subtle difference here between dealing with an encounter (solving issues and making the required everyday decisions to do so) and going into a dungeon. The first should be done by the team, and they need not consult the product owner to do so. The difference between cleaning out a cellar and exploring a dungeon is not a qualitative one; it is a question of how deep they dig in and how much time is spent underground. As a rule of thumb, everything that can be passed within the duration of one sprint is not a dungeon but an encounter. If going down a tunnel means zero story points for an entire sprint or more, it is a dungeon.
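The rule of thumb is simple enough to write down as a check. The sketch below is only an illustration under my own assumptions: the function name `classify_obstacle` is made up, and its input is the list of story points delivered in each full sprint the party spent on the obstacle (an empty list meaning it was dealt with inside a sprint alongside normal work).

```python
def classify_obstacle(points_per_full_sprint):
    """Classify an obstacle as "encounter" or "dungeon".

    points_per_full_sprint holds the story points delivered in every
    full sprint spent on the obstacle. A single sprint with zero
    story points is enough to call it a dungeon.
    """
    if any(points == 0 for points in points_per_full_sprint):
        return "dungeon"
    return "encounter"


if __name__ == "__main__":
    print(classify_obstacle([]))      # goblins in a hut, done within the sprint
    print(classify_obstacle([8]))     # slower sprint, but stories still delivered
    print(classify_obstacle([5, 0]))  # the cellar turned out to have a tunnel
```

A check like this could be run at every sprint review while something smells like a cellar, so the point where the cellar becomes a dungeon is not missed.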

In the real world of software development, side-questing typically occurs either when a refactoring gets out of hand (you know: one change leading to the next) or when a team member briefly wants to evaluate some technology and then somehow gets caught up in it. The thing is, the situation comes in a goodwill variety, in which the adventurers really did not notice the danger, and a smelly one, in which they noticed but, driven by curiosity, lust for adventure or hope for loot, continued anyway, ignoring the complaints of the oracle. Either way, the solution is simple. They need to quit the dungeon right away, no discussion and no regret. The SCRUM master might help with this decision and force them out if need be.

Fishy Technology = Swamps

Besides encounters and dungeons, there is another kind of obstacle, which I want to call a swamp. In the fantasy world, we pretty much know what swamps are, namely reduced movement rate and tons of encounters, typically at high combat level and with low loot. So adventurers take the long way around a swamp whenever they can.

In the real world of software development, swamps are what I call fishy technology. It can be a combination of nonfunctional requirements (high throughput plus mobile) or technologies (Bluetooth plus Windows) or even an entire algorithmic approach (maybe genetic algorithms) that just stinks of trouble.

The oracle has a bad feeling because he has been in the swamp in the past and had bad experiences, or knows somebody who did, or read about it. So he would like to silently maneuver the adventurers around the swamp. Sounds like a good idea? Nope. For the same reasons the adventurers should not make the decision to go into a dungeon by themselves, the oracle should not decide on his own to move around a swamp. He will inevitably get in the way of the team, who are trying to solve encounters. Worse, traveling around a swamp implies constantly changing direction, which will confuse the team and lead to mutual mistrust. So instead of silently trying to outmaneuver the swamp, he should tell the team that he suspects a swamp. The decision to pass through or circle around can then be taken together. After all, maybe a team member is an expert in Bluetooth technology and is convinced he can safely guide the party through. If the decision is made to go the long way, the party at least knows why they walk in circles.

Obviously, the same approach is taken if somebody other than the oracle smells a swamp. It is just that team members are typically aware that they have to share this information, so the temptation to take the silent road is bigger for the product owner.

The End = Release Party

Having successfully made their way past countless dangers and encounters, crossed rivers and avoided swamps and dungeons, the party finally reaches the mystic castle in the clouds. They enjoy a drink at the tavern and some serious XP (level-up!). But in the morning, they are woken by a new oracle who tells them about another castle, even higher in the clouds and even more beautiful. But this is another release, and that story should be told another time.

I hope you enjoyed reading this small article on the analogy between pen & paper role-playing games and SCRUM, an analogy that carried surprisingly far.