The software industry is known for high failure rates, late delivery, and low quality, yet it still produces products used by millions of people. At GoRetro, we have helped thousands of Agile teams run tens of thousands of retro meetings in recent years. That contrast puzzled us, and we wondered whether there are key factors that enable Agile teams to achieve strong results in software and value delivery. We therefore set out to interview various Agile teams and learn what makes them successful. Our quest took place during the coronavirus pandemic, which took its toll on value delivery. Many of us turned overnight from office workers into telecommuters, and after roughly two years of remote work, more and more companies are opting for the hybrid work model.
One thing that stood out to us was the way successful Agile teams incorporated data into retrospective meetings. It made perfect sense when we zoomed out: Would you plan a trip without checking the weather? Would you buy something without reading reviews? Is it logical to do your work without data? No, it isn’t.
Best-in-class Scrum teams incorporate objective data into their retrospectives, eliminating guesswork. The same is true in investing: the best investors in the world build data into their decision-making process. As one of them famously said:
"The more you learn the more you earn"
- Warren Buffett
There are many metrics and KPIs that measure the quality, velocity, and output of an Agile team, but we see the burndown chart as the most fundamental of these objective measurements.
Burndown chart recap and its data gap
Generally speaking, the burndown chart should consist of:
- X axis to display working days
- Y axis to display remaining effort; different companies use different attributes on the Y axis (story points, time units, number of tasks, etc.)
- Ideal effort as a guideline
- Real progress of effort
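To make the two lines concrete, here is a minimal sketch of how the ideal guideline and the real progress line could be computed. The function names and the sample numbers are our own illustration, not taken from any specific tool, and we assume the ideal line drops by the same amount every working day.

```python
def ideal_burndown(total_effort: float, working_days: int) -> list[float]:
    """Ideal remaining effort at the end of each working day (index 0 = sprint start)."""
    if working_days <= 0:
        raise ValueError("working_days must be positive")
    step = total_effort / working_days  # assumption: equal effort burned every day
    return [total_effort - step * day for day in range(working_days + 1)]


def actual_burndown(total_effort: float, completed_per_day: list[float]) -> list[float]:
    """Real remaining effort, given the effort completed on each working day."""
    remaining = [total_effort]
    for done in completed_per_day:
        remaining.append(remaining[-1] - done)
    return remaining


# Hypothetical 10-day sprint with 40 story points on the Y axis.
ideal = ideal_burndown(40, 10)
actual = actual_burndown(40, [0, 2, 5, 3, 4, 6, 4, 5, 6, 5])
```

Plotting `ideal` and `actual` against the working days gives the basic two-line chart described above.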
The burndown chart can be used both during a sprint iteration to track the development progress in a sprint, and after the sprint to align the Product Owner's expectations with what lies ahead.
However, the simple burndown chart one can usually find in platforms like Jira is not enough if process gaps need to be identified. In this case, two lines are not enough as they simply display a summary of work for all team members, and the gaps need to be identified on a task board.
It is often unclear whether a team is behind schedule or somebody added work. In particular, when equal amounts of work are added and completed at the same time, the chart shows no progress at all.
In this situation, viewing the total size of the sprint backlog should be helpful. Any change in the total size can provide a clear explanation for the actual events.
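As a rough illustration of why the total backlog size helps, the sketch below tracks both the remaining effort and the total sprint scope per day. `DayLog` and `track_sprint` are hypothetical names, and the numbers are made up for the example.

```python
from dataclasses import dataclass


@dataclass
class DayLog:
    completed: float       # effort finished that day
    added: float = 0.0     # effort added to the sprint that day
    removed: float = 0.0   # effort taken out of the sprint that day


def track_sprint(initial_scope: float, days: list[DayLog]) -> list[tuple[float, float]]:
    """Return (remaining effort, total sprint backlog size) after each day."""
    remaining = total = initial_scope
    history = []
    for day in days:
        total += day.added - day.removed
        remaining += day.added - day.removed - day.completed
        history.append((remaining, total))
    return history


# Day 2: 8 points completed and 8 points added at the same time.
history = track_sprint(50, [DayLog(completed=5), DayLog(completed=8, added=8), DayLog(completed=6)])
```

In this example the remaining effort stays at 45 on the second day, so the plain burndown line looks flat, while the total scope moves from 50 to 58 and reveals that work was both completed and added.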
GoRetro's burndown adds multiple indicators on top of the progress line to give a clear view of what happened during the sprint: newly added items (a.k.a. unplanned work), removed work, completed work, issues carried over from the previous sprint, and even a drill-down into the items and changes on each specific day.
However, the burndown chart can quickly become a riddle that is difficult to grasp. We’ve seen many certified Scrum Masters unable to correctly explain the situations described by these charts, let alone development teams.
Burndown charts are an easy way to gain visibility on the team’s path, but they should be read and understood correctly without premature conclusions. By seeing the path ahead we can ship our product to market as soon as possible, without losing our competitive advantage.
Understanding a burndown chart requires spatial gymnastics from your brain
Graphs and charts exist in a visual space, and we depend on our visual system to extract the relationships between variables from 2D information. Reading a burndown chart is not as straightforward as it may seem: even to interpret a graph with three bars, the viewer must consider multiple pieces of information.
If the public does not understand the logarithmic graphs used to portray COVID-19, how can we expect the dev team to easily understand the burndown chart?
Analyzing burndown charts?
As mentioned above, there are many factors that affect the trendline of a burndown chart. Is there anything we can do to make the retrospective meeting more pleasant while clearing the confusion and starting a discussion on our blockers?
In the following sections, we will examine a number of patterns that can help you quickly assess the situation and meet your objectives. Lastly, we will present our new, automatic method for analyzing the situation.
Burndown chart patterns
The burndown chart can teach us about what happened during the sprint, and looking at common patterns can help us decide what we can do in order to improve our workflow. Below are some of the common patterns we see dev teams experience and some suggestions for what could have gone wrong and what can be done about it.
The "perfect" burndown
How to recognize
The above example shows a pattern for a team that managed to plan well and meet its planning goals. The daily work capacity followed the expected progress trendline. During the sprint, there were no unplanned/unexpected tasks, and the team succeeded in completing all planned tasks.
Such a diagram indicates a great team that’s able to organize itself. It indicates a great product owner who understands the reason for a locked sprint backlog and a great Scrum Master who is capable of helping the team. The team is not over-committing and has finished the sprint backlog on time. The team is also able to estimate capacity correctly.
Possible actions to take
Some people believe that it is fine to take no corrective action in this situation, while others contend that 10% to 20% more work should be planned for the next sprint. By planning more, we can make sure there is no capacity “left on the table,” and we’d be happy to see an 80% to 90% completion rate instead of 100%.
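The arithmetic behind those percentages can be checked with a small sketch. It assumes, purely for illustration, that the team delivers the same amount of work next sprint and only the planned amount grows; `expected_completion` is a hypothetical helper.

```python
def expected_completion(delivered: float, extra_fraction: float) -> float:
    """Completion rate if the team delivers the same amount but plans extra_fraction more."""
    planned = delivered * (1 + extra_fraction)
    return delivered / planned


# A team that delivered 40 points plans 10% or 20% more next time.
for extra in (0.10, 0.20):
    print(f"plan {extra:.0%} more -> {expected_completion(40, extra):.0%} completion")
```

Under that assumption, planning 10% more gives roughly 91% completion and planning 20% more gives roughly 83%, which is where the 80% to 90% target comes from.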
The “Slow start” Team
How to recognize
In the above example you can see that the team made almost no progress until mid sprint, then towards the second week the team managed to burn most of the plan and make it to ~75% completion. This kind of pattern can occur due to several reasons:
- The team faced some challenges in the first part of the sprint, but managed itself well enough to overcome them and complete most of the work by the end of the sprint.
- The team did not break the stories/tasks into smaller sub-tasks, and therefore the work was less linear. When we focus on one big task, we can lose visibility of the overall progress (along with many other disadvantages), resulting in a nonlinear burndown chart.
- Team capacity was lower than planned. Another reason for a slow start can be fewer resources than originally planned in the first week of the sprint – for instance, unplanned PTO or unexpected quarantine.
Possible actions to take
As part of the retrospective, the team should discuss the reasons for the slow progress in the first half of the sprint and resolve issues to ensure that they do better in the next one. The team should also assess how much they can accomplish during the sprint.
In case of unexpected complexity, try to look at why this happens. In our experience, the answers can usually be found in these areas:
- Technical design: Tech design is an important part of the planning. Some teams tend to neglect it or not give it the right amount of attention.
- Last minute changes: Scope might change towards the beginning of the sprint and teams tend to look for shortcuts in order to start the sprint on time instead of “wasting time” on what they might consider redundant overhead and processes. As a result, the team can lose the handshake it had with the product and start the sprint when things are only half-baked.
- Wrong estimation: Sometimes we just don’t know what might go wrong and the team can’t anticipate how complex a task is going to be. In such cases the best thing to do would be to try and learn from it for the future and share this knowledge with the rest of the team.
- Not breaking stories into subtasks: As trivial as it sounds, most teams don’t do it, or do it wrong. Breaking tasks into subtasks is an art. You should probably invest resources and time in doing it, and get your senior engineers to train the more junior team members on how it should be done. Breaking down a task doesn’t only cover the task itself; for example, you can add a subtask for QAing the task, so that both the work and the QA effort can be measured. We’d recommend this as one of the first and most important steps a team should take to improve.
Breaking down tasks into smaller chunks helps in so many ways:
- Improve estimation accuracy: It's easier to estimate smaller tasks. By breaking tasks down into smaller subtasks, the team can estimate its work better.
- Improve predictability: Once estimations are more accurate, the team becomes more predictable and alignment and communication also improve.
- Improve visibility: The sprint and/or project progress becomes much easier to visualize and more granular, and cases like the above burndown will not happen.
- Improve efficiency: Breaking tasks down into smaller subtasks reduces cycle time (the time it takes an issue to be completed from start to finish). Reducing cycle time increases quality and efficiency.
- Improve team motivation: With shorter cycle times, the team can accomplish more. The feeling of completion on a daily basis has been proven to improve team motivation.
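Cycle time, as defined above, is simple to measure once you have a start and a finish timestamp for each issue. The sketch below uses made-up timestamps and a hypothetical `cycle_time_days` helper, and it counts calendar time rather than working days.

```python
from datetime import datetime
from statistics import median


def cycle_time_days(started: datetime, finished: datetime) -> float:
    """Elapsed calendar days between starting and finishing an issue."""
    return (finished - started).total_seconds() / 86400


# (started, finished) pairs for three hypothetical subtasks.
issues = [
    (datetime(2022, 3, 1, 9), datetime(2022, 3, 2, 9)),  # 1 day
    (datetime(2022, 3, 1, 9), datetime(2022, 3, 4, 9)),  # 3 days
    (datetime(2022, 3, 2, 9), datetime(2022, 3, 4, 9)),  # 2 days
]
cycle_times = [cycle_time_days(start, end) for start, end in issues]
median_cycle_time = median(cycle_times)
```

Comparing the median before and after the team starts splitting stories is one way to see whether smaller subtasks are actually shortening the cycle.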
Capacity change: 90% of the time, this is known in advance, so planning should account for PTO or any other absence of a team member. Some teams also add a buffer for unexpected cases.
The “Bump in the Road” Sprint
How to recognize
For most of the iteration, the team's work capacity was above the trendline. In this specific example, many unplanned issues were added to the sprint after the iteration had started. This is usually an example of planning that was not ready on time, with more work added at the last minute.
Possible actions to take
The first thing to check is why planning was not ready on time. Once we understand this we can approach the possible responses:
- If it was a one-time thing that happened due to last-minute changes, it might not be the right time to rock the boat. If it happens more than once or twice, you had better look into your planning process: make sure the process has an owner, that it happens on time, and try to identify bottlenecks.
- Another option is that the product depends on external variables which don’t leave us enough time to plan ahead. In this scenario, reaching the beginning of the sprint is a constraint we must face. We can treat new work that was added after the sprint started as planned work. This is exactly why we at GoRetro built the Grace Planning Period mechanism, which allows you to define a grace period of X hours in which new work will be counted as planned work.
- If we have no choice and these new unplanned tasks are high priority, it is usually suggested to move some lower priority items from the sprint backlog to the next sprint or back to the product backlog.
The Easy Sprint
How to recognize
In the above example you can see a sprint that started as planned, then after a week the team completed more than planned. This can happen due to an overestimated task, a bunch of removed issues or a scope change that simplified some issues.
Towards the last week of the sprint you can see that the team nearly completed all the work and some new work was added. Obviously we don’t want our team to sit and wait, right? It seems the team also managed to complete the new unplanned work.
You can also see that the sprint was unusual in terms of working days: there is a long holiday in the middle where no progress is expected (the gray areas); indeed, this was during New Year’s break.
There is either too little commitment by the team or not enough work from the Product Owner for the sprint.
Additionally, an overestimation of complexity could result in a sprint completion earlier than intended.
Possible actions to take
The above can definitely happen due to wrong expectations of the team's capacity during holidays. You can even see progress during the holiday, which means some of the team actually worked even though planning didn’t expect it. If this is the case, we’d recommend aligning expectations regarding PTO and other plans team members may have, in order to understand how much free capacity we are going to have in the sprint. This works both ways of course; usually the challenges we see are unexpected leave and not unexpected work :)
If the above repeats, it's better to reexamine the planning and try to carefully add more capacity to the team. The goal is for the team to reach a completion rate of 80% to 90% of the work.
The Never Ending Sprint
How to recognize
The above example may seem like an extreme case, but engineering is the land of the unexpected, and you’d be surprised to learn how many times we see such patterns.
You can see that 3 days after the sprint started there was an estimation change: the time estimate was increased, which is why the bar grew taller even though no new unplanned work was added. Then, after a few more days without completed work, the estimation changed again. After about one week of no real progress, some new unplanned work was added (the red bars), and one week later, when the team finally managed to complete some tasks, more new unplanned work was added.
The sprint started with ~150 estimated hours and although the team worked and completed some of the work, it ended with ~240 hours.
This pattern can occur due to the lack of status updates in Jira, or new work that was added which is similar to the amount of work that was already done, so the total amount of uncompleted work remains the same.
Possible actions to take
The above is a combination of several factors and is probably related to more than one issue, including but not limited to:
- Planning: Usually when estimations change more than once, it can be a sign of a lack of proper planning. This can happen due to vague scope from product, or a tech design that was too high-level and didn’t take every scenario into account.
- Unplanned: Sometimes unplanned is inevitable, especially when working closely with important customers who need it “here and now.” But even when reality means we know we are going to have more work mid sprint, it is probably best to plan less and leave enough of a buffer for the unplanned.
- The above represents a 3-week sprint. Since longer sprints leave more room for uncertainty (in this example, more and more unplanned work coming in mid-sprint), it might be worth considering shorter, 2-week sprints to plan shorter iterations and reduce the amount of unplanned work.
“To the Moon” Sprint
How to recognize
This is what a typical first sprint looks like. It should not be seen as a failure, but rather as a good source of knowledge and understanding.
There are many reasons why sprint goals aren't met. One example is adding new stories to the sprint backlog every day without tracking progress. Another possibility is that tasks were re-estimated too frequently during the sprint.
In the above example you can see that the sprint started pretty much as expected, but after 3 days most of the estimations changed and the total planned work jumped from ~100 estimated hours to ~230 estimated hours. Obviously this kind of aggressive change didn’t leave the team any chance to complete the sprint on time.
We can also see that most of the work is carried over from the previous sprint (dark purple), which makes the problem even worse. It might mean that the team knew about this work because it was already planned for the previous sprint, meaning they should have had enough time to deep dive into how much time it should take to complete.
Possible actions to take
The above is a combination of several factors and is probably related to more than one issue, including but not limited to:
- Planning: Usually when estimations change more than once, it can be a sign of a lack of proper planning. This can happen due to vague scope from product, or a tech design that was too high-level and didn’t take all scenarios into account.
- Unplanned: Sometimes unplanned is inevitable, especially when working closely with important customers who need it “here and now.” But even when reality means we know we are going to have more work mid sprint, it is probably best to plan less and leave enough of a buffer for the unplanned.
- Reevaluating and rearranging the sprint backlog as soon as possible: The need for this can arise from a vague product scope. A lack of detailed requirements leaves expectations about the requested feature unclear, which can lead to changes (and frustration) during the sprint.
- High level tech design that didn’t take into account important cases: There is always a tradeoff between a detailed tech design that covers every possible edge scenario and starting the task with no tech design at all. An estimation change of 150% usually means the planning should have been done better.
The “Illusion” Sprint
How to recognize
In the above example, things seem to work well: the chart burns down throughout the sprint and everybody is happy. But a deeper look tells us that many of the “burned issues” were simply removed from the sprint and not completed. :(
This is a great example of why drilling down to what is really behind the chart is important.
We often see many issues being removed in the last 2 days of the sprint, which means the team didn’t manage to complete most tasks and cleaned up the sprint by pushing the remaining work to the next one.
Possible actions to take
Drill down to understand which tasks were removed and why.
A high percentage of removed work can occur due to requirements change from the users, priority change or unexpected capacity change, e.g., team members got sick.
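A drill-down like this can start with a very simple split of the work that left the chart into completed and removed. The sketch below is only an illustration: the issue fields (`points`, `status`) and the `burn_breakdown` helper are hypothetical and do not correspond to any specific tool's data model.

```python
def burn_breakdown(issues: list[dict]) -> dict[str, float]:
    """Split the effort that left the chart into completed vs. removed work."""
    completed = sum(i["points"] for i in issues if i["status"] == "done")
    removed = sum(i["points"] for i in issues if i["status"] == "removed")
    burned = completed + removed
    return {
        "completed": completed,
        "removed": removed,
        # Share of the "burned" effort that was never actually done.
        "removed_share": removed / burned if burned else 0.0,
    }


sprint_issues = [
    {"key": "A-1", "points": 5, "status": "done"},
    {"key": "A-2", "points": 8, "status": "removed"},
    {"key": "A-3", "points": 3, "status": "done"},
    {"key": "A-4", "points": 4, "status": "removed"},
]
breakdown = burn_breakdown(sprint_issues)
```

Here 12 of the 20 burned points were removed rather than completed, which is exactly the situation a healthy-looking chart can hide.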
The “Late Planning” Sprint
How to recognize
In the above example you can see that the sprint started out with ~60 SP (story points), then on the second day about 50 SP were added to the sprint (the red bar). This is usually an example of planning that was not ready on time, with more work added at the last minute.
Possible actions to take
The first thing to check is why planning was not ready on time. Once we understand this we can approach the possible responses:
- If it was a one-time thing that happened due to last-minute changes, it might not be the right time to rock the boat. If it repeats more than once or twice, you had better look into your planning process: make sure the process has an owner, that it happens on time, and try to identify bottlenecks.
- Another option is that the product depends on external variables which don’t leave us enough time to plan ahead. In this scenario, reaching the beginning of the sprint is a constraint we must face. If that’s the case, we can treat new work that was added after the sprint started as planned work. This is exactly why we at GoRetro built the Grace Planning Period mechanism, which allows you to define a grace period of X hours in which new work will be counted as planned work.
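The idea of a grace period can be sketched in a few lines. This is not GoRetro's actual implementation, which is not described here; the function name, its parameters, and the dates are all hypothetical.

```python
from datetime import datetime, timedelta


def classify_added_work(sprint_start: datetime, added_at: datetime, grace_hours: float) -> str:
    """Label work as 'planned' if it was added before the grace period ended."""
    grace_end = sprint_start + timedelta(hours=grace_hours)
    return "planned" if added_at <= grace_end else "unplanned"


start = datetime(2022, 3, 7, 9, 0)
# Added 23 hours after the sprint started: inside a 24-hour grace period.
print(classify_added_work(start, datetime(2022, 3, 8, 8, 0), grace_hours=24))
# Added 49 hours after the sprint started: outside the grace period.
print(classify_added_work(start, datetime(2022, 3, 9, 10, 0), grace_hours=24))
```

With a rule like this, the ~50 SP added on the second day would count as planned work as long as it arrived within the chosen window.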
Danger Zone
Previously, we discussed the correlation between burndown charts and sprint health. However, you cannot overlook the macro level. An impressive burndown chart does not necessarily suggest a great team. Look at the bigger picture, judge the team based on consecutive sprints and don’t try to boost the velocity KPI for no good reason.
Management may ask a team to simply increase velocity and 'go faster' after reviewing a burndown chart. A team under such pressure might experience painful side effects such as 'estimate inflation' and morale issues.
The goal of the burndown chart is to give us visibility into the progress and events throughout the sprint. It should help the team understand what happened and where a process is broken; from there, the team should focus on the outcome and not on the output.
Epilogue
Our work environment is littered with alerts and noise, and the new standard of hybrid work does not make it easier. Aligning a team quickly in a remote meeting was never an easy task, and now it is even harder. There are tricks managers should know for solving burndown issues; however, you need to be able to track those issues in the first place, and that means deciphering a chart. That’s why GoRetro is releasing an automatic way to analyze the burndown chart.
It will help you have an honest retro and focus on improving your team’s performance rather than on collecting data, leveling the playing field for all Scrum teams.
We are on a mission to give you superpowers. We believe you deserve them. Join us!