Trying to understand why the software industry is so inefficient and ineffective #
Note: This is going to be a long post! If you don’t have time for (or don’t fancy) reading this much, the contents of this post are also available as a more concise slide deck.
1. The software industry is inefficient and ineffective. #
“Efficiency is doing things right; effectiveness is doing the right things” – Peter Drucker.
The Standish Group’s CHAOS Report is probably the most extensive and longest-running research study in the software industry. The CHAOS Report examined thousands of software projects over three decades and found some very disappointing performance metrics:
Over the last three decades, the success rate of software projects has never exceeded 38%.
The success rate of software projects increased between 1994 and 2015 but has decreased since then.
The problems go beyond project failures. Other reports put the spotlight on other equally depressing metrics:
- In 2018, Stripe estimated that there is ~$300 billion of Global GDP loss annually due to developer inefficiency.
As part of my job, I interact with many developers and technology businesses daily, and sadly, I believe the CHAOS report is accurate. What I have witnessed across our industry is mostly a widespread state of despair. The following list contains some of the most common problems:
As a developer: #
- Feeling unproductive
- Poor work-life balance
- Low autonomy
- Failing to see the impact of work
- Meaningless meetings
- Mundane, repetitive tasks
- Unrealistic arbitrary deadlines
- Constant interruptions
- Dealing with site reliability issues
- Working around technical debt
- Changing priorities
- Context switching
- Cognitive overload
- Slow feedback loops
- Lost trust in management
- Lost trust in other departments
As a business: #
- Unable to compete
- Struggling to retain customers
- Struggling to gain new customers
- Increasing customer acquisition costs
- Struggling to attract talent
- Struggling to retain talent
- Lacking innovation
- Lacking agility
- Low employee morale
- Business divisions are hostile towards each other
- Lost trust in development
2. How did we get here? #
“Risk comes from not knowing what you are doing” – Warren Buffett.
We have been building software products for a few decades now. I would expect our industry to have an excellent understanding of what is required to achieve high levels of efficiency and effectiveness by now. However, we are not there yet.
We have managed to identify and document the main risks involved in developing software products. We have developed Software Development Methodologies (SDMs) and other principles that allow us to mitigate some of these risks. So, how is it possible that after decades of progress, we have failed to improve the rate of success of software projects significantly? To attempt to answer this question, we first need to take a little trip through the history of SDMs and other principles.
2.1 Waterfall (1956-1995?) #
Waterfall is an SDM that uses a linear approach in which each phase must be completed before the next one can begin. The name “waterfall” comes from the idea of representing each stage as a “water dam”. As we progress, the dam fills. Only when the stage is complete can the water overflow and start to fall into a “lower dam”, representing the progress in the next stage.
Developing software without any plan is almost certainly a recipe for chaos. The waterfall SDM was one of the earliest attempts to prevent projects from falling into the abyss in which chaos resides.
The good thing about Waterfall is that it is undoubtedly better than chaos.
The bad thing is that it has a very rigid structure. It does a poor job managing uncertainty and leads to low customer and stakeholder engagement, infrequent and deferred testing, and scope creep.
Like in finance, in software, “Risk comes from not knowing what you are doing”. We create SDMs in the first place to “mitigate risks” or, in other words, “eliminate unknowns”. Since Waterfall does a poor job managing uncertainty, it is no surprise that Waterfall is today recognised as a recipe for failure in software projects.
2.2 The Agile Manifesto (2001) #
After many years of failed waterfall software projects, 17 renowned software developers met at a resort in Snowbird, Utah, to discuss lightweight development methods. Together they published the Manifesto for Agile Software Development.
Based on their combined experience of developing software and helping others do that, the authors of The Agile Manifesto declared that they valued:
- Individuals and interactions over processes and tools
- Working software over comprehensive documentation
- Customer collaboration over contract negotiation
- Responding to change over following a plan
That is to say, while both sides have value and the items on the right should be considered, the authors felt that the items on the left should have more influence on how people approach their work.
The Agile Manifesto is not a methodology. It is just a set of principles, but it significantly impacted the industry and popularised the use of Agile methodologies, such as Scrum and Kanban, which are now widely adopted by software development organisations worldwide.
2.3 Scrum (1995) #
Scrum is an Agile SDM that presents organisations with a prescriptive process. The organisation that governs Scrum did an excellent job documenting the process and facilitating its mainstream adoption through certification programmes.
Scrum quickly became the most adopted Agile SDM. Process changes in larger organisations are usually riskier, but Scrum provided executives with enough resources to mitigate some of their fears. Some of the leading businesses in the software industry adopted Scrum, and their influence quickly pushed smaller players to follow.
Scrum can be summarised as follows:
Scrum proposes dividing the delivery of software projects into smaller iterations known as Sprints.
The sprints are usually 2 to 4 weeks long. There is a repository of work items known as the Product backlog.
The backlog is prioritised by a team member known as the Product owner.
The product owner is a team member who profoundly understands the business and the product.
At the beginning of the Sprint, there is a Sprint planning session to determine the Sprint’s goal and scope. The work items that become part of the Sprint are moved from the Product Backlog into the Sprint backlog.
Every day, there is a meeting known as the Daily Standup in which the team members ensure that work can continue during the Sprint as planned. Resolving any impediments is prioritised during the Daily Standup.
At the end of the Sprint, the completed work (known as product increment) is released, and there is a meeting known as the Sprint retrospective that aims to identify ways to improve how the team operates over time.
The best thing about Scrum is that it facilitated the mainstream adoption of agile and managed to “kill” Waterfall. Scrum helped businesses embrace the idea of planning being something that adheres to the “law of diminishing returns”. At first, planning seems to improve things, but there is a point at which investing more in planning fails to provide any meaningful returns.
Some of the worst things about Scrum include the following:
Whether we like it or not, the reality of building software products is that there will always be a high level of unknowns that cannot be resolved by planning or estimating. The only way to resolve these unknowns is through discovery or experimentation (e.g. development of a prototype). While Scrum is highly prescriptive, it fails to establish a formal “discovery” phase. Scrum also failed to explicitly encourage user-centric design.
Some of Scrum’s metrics (such as burn-down charts) and rules (such as the time-boxed nature of the Sprints) subconsciously encourage organisations to emphasise outputs over outcomes, leading to decreased quality.
The combination of Scrum’s emphasis on time-boxes (Sprints) and estimates and the historical background that preceded it (Waterfall) made Scrum too easy to bastardise into mini-waterfall iterations by the executive teams in many organisations.
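To make the point about burn-down charts concrete, here is a minimal sketch of what such a chart tracks. The function names, the use of story points as the unit, and the sample numbers are all assumptions made for illustration; they are not part of Scrum itself.

```python
def burndown(total_points: int, completed_per_day: list[int]) -> list[int]:
    """Return the work remaining at the end of each day of a Sprint."""
    remaining = []
    left = total_points
    for done_today in completed_per_day:
        # Remaining work never drops below zero.
        left = max(left - done_today, 0)
        remaining.append(left)
    return remaining


def ideal_line(total_points: int, days: int) -> list[float]:
    """Straight line from the Sprint's total scope down to zero."""
    return [total_points * (days - day) / days for day in range(1, days + 1)]


# Hypothetical 5-day Sprint with 20 story points of scope.
actual = burndown(20, [2, 0, 5, 3, 4])
ideal = ideal_line(20, 5)

print(actual)  # [18, 18, 13, 10, 6]
print(ideal)   # [16.0, 12.0, 8.0, 4.0, 0.0]
```

Note that nothing in this calculation refers to users or customer value: the chart only reports how much planned work is left, which is one way to see why it tends to pull attention towards outputs.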
2.4 Lean UX (2008) #
As we have already mentioned, the reality of building software products is that there will always be a high level of unknowns. Planning and estimating can help clear some unknowns but never eliminate them. Scrum did an excellent job by recognising that a little bit of planning and estimating can help to make things better, but attempting to plan and estimate the entire product upfront is ultimately a waste of time.
The problem with Scrum is that it failed to formally recognise that introducing a discovery or experimentation phase into the product team’s workflow can eliminate more unknowns than planning or estimating, which is the main principle behind Lean UX.
Like the Agile Manifesto, Lean UX is more of a list of principles than an SDM. The main principles of Lean UX include the following:
Customer-centric: understand your customers and the problem you are solving for them before you build a product.
MVPs: start with a minimal product and add features gradually as you learn more.
Continuous innovation: continuously look for ways to improve your product or business through experimentation and iteration.
Data-driven: use data and customer feedback to make informed decisions.
Embrace failure: treat failure as an opportunity to learn and improve your product or business based on customer feedback and data.
Action over planning: prioritise taking action over creating detailed plans.
Lean encourages us to build as little as possible and collect as much feedback from users as early as possible. We must then decide our next move based on the collected user data.
The best thing about Lean UX is that it introduces the idea of using discovery, experimentation and data analytics over planning and estimation as the primary way to mitigate risk in software development projects.
The bad thing about Lean UX is that it is not as prescriptive as Scrum and requires an upfront research investment. The nature of experimentation makes it hard to plan and estimate. These reasons make Lean UX much scarier for management than Scrum, especially for large organisations.
2.5 Kanban (2010) #
Kanban was introduced as an alternative or complement to Scrum, which was already widely used in the industry. While both Kanban and Scrum are Agile SDMs that aim to improve the delivery of software products, they have different approaches and strengths.
Kanban is a pull-based approach that emphasises visualising and managing the flow of work. It does not have time-boxed iterations like Scrum and focuses on continuously improving the delivery process.
The following list includes the main principles of Kanban:
Make work visible: Use a board to visualise the flow of work
Limit work in progress: to prevent bottlenecks and optimise flow
Manage flow: rather than managing individual tasks or resources
Make policies explicit: everyone involved can understand how work should flow
Implement feedback loops: continuously improve the process and make adjustments as needed
Measure performance: use metrics such as lead time, cycle time, and throughput to improve performance:
- Lead time: the time it takes for a work item to be completed, from when it’s received until it’s marked as done.
- Cycle time: the time it takes for a work item to be completed once work on it has actually started, from when it is picked up until it’s marked as done.
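The metrics above can be sketched in a few lines of code. This is only an illustration under stated assumptions: the `WorkItem` class, its field names and the sample dates are hypothetical, cycle time is taken to start when work on the item begins (a common convention), and throughput is taken to mean the number of items completed per week.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class WorkItem:
    """Hypothetical work item with the dates a Kanban board might record."""
    name: str
    received: date  # when the request was received
    started: date   # when work on the item began
    done: date      # when the item was marked as done


def lead_time_days(item: WorkItem) -> int:
    """Days from the request being received until it is done."""
    return (item.done - item.received).days


def cycle_time_days(item: WorkItem) -> int:
    """Days from work starting on the item until it is done."""
    return (item.done - item.started).days


def throughput_per_week(items: list[WorkItem], period_start: date, period_end: date) -> float:
    """Items completed per week within an inclusive date range."""
    completed = [i for i in items if period_start <= i.done <= period_end]
    weeks = ((period_end - period_start).days + 1) / 7
    return len(completed) / weeks


# Made-up sample data.
items = [
    WorkItem("Login form", date(2023, 1, 2), date(2023, 1, 9), date(2023, 1, 11)),
    WorkItem("Export CSV", date(2023, 1, 3), date(2023, 1, 4), date(2023, 1, 13)),
]

for item in items:
    print(item.name, lead_time_days(item), cycle_time_days(item))

print(throughput_per_week(items, date(2023, 1, 2), date(2023, 1, 15)))
```

In this sample, "Login form" waited a week before anyone started it, so its lead time (9 days) is much longer than its cycle time (2 days). A gap like that points at queueing rather than at slow development.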
The best thing about Kanban is that it introduces the idea of reinforcing focus by limiting work in progress and removing time boxes, leading to increased quality.
Like Lean UX, the bad thing about Kanban is that it is also harder to implement than Scrum. Scrum has a more prescriptive framework and a stronger focus on Agile principles, which can make it more appealing for organisations looking to adopt Agile practices. Scrum also has a well-established certification process and training offerings, which can help organisations adopt and implement the methodology more effectively.
Another potential problem with Kanban is that its focus on metrics like cycle and lead time can reinforce the idea of outputs over outcomes, leading to reduced customer value and decreased quality.
2.6 The 12 Principles of Agile (2011) #
On the 10th anniversary of The Agile Manifesto, its original authors reunited and improved it by adding the following 12 principles:
Our highest priority is to satisfy the customer through early and continuous delivery of valuable software.
Welcome changing requirements, even late in development. Agile processes harness change for the customer’s competitive advantage.
Deliver working software frequently, from a couple of weeks to a couple of months, with a preference for a shorter timescale.
Business people and developers must work together daily throughout the project.
Build projects around motivated individuals. Give them the environment and support they need, and trust them to get the job done.
The most efficient and effective method of conveying information to and within a development team is face-to-face conversation.
Working software is the primary measure of progress.
Agile processes promote sustainable development. The sponsors, developers, and users should be able to maintain a constant pace indefinitely.
Continuous attention to technical excellence and good design enhances agility.
Simplicity–the art of maximising the amount of work not done–is essential.
The best architectures, requirements, and designs emerge from self-organising teams.
At regular intervals, the team reflects on how to become more effective, then tunes and adjusts its behaviour accordingly.
The best thing about The 12 Principles of Agile is that it incorporates learnings from other movements, such as Lean UX. Also, some of The 12 Principles of Agile set the stage for the mainstream adoption of DevOps.
As with The Agile Manifesto or Lean UX, the main problem with The 12 Principles of Agile is that adopting a set of principles without a prescriptive process, extensive documentation, certifications, etc. is a clear impediment to mainstream adoption.
2.7 The three ways (2013) #
The three ways are a set of principles designed to improve the efficiency of software development projects. These principles are highly influenced by The Agile Manifesto, Lean UX and Kanban and are grouped into three main categories.
2.7.1 The First Way: Systems thinking and the principles of flow #
Move work from Business, through Development, to Operations, and ultimately to the Customer (where the value is created) as quickly as possible.
- Use a board to make work visible.
- Limit work in progress.
- Reduce batch sizes.
- Reduce the number of handoffs.
- Identify and resolve constraints.
- Eliminate things that do not add value.
2.7.2 The Second Way: Feedback loops and the need for amplification #
Increase the feedback loops from right to left. Focus on increasing the number of feedback loops and their speed. Treat problems as opportunities to learn how to prevent them and create an ever safer and more resilient system instead of a cause for punishment and blame.
- Increase causality.
- Learn from your mistakes.
- Swarm and fix.
- Push quality closer to the source.
- Prioritise non-functional requirements as highly as user features.
2.7.3 The Third Way: Creating a culture of continual experimentation and learning #
Develop and foster a culture where constant experimentation and learning are encouraged and where people acknowledge that the way to mastery is through repetition and practice:
- Enable organisational learning and safety culture.
- Institutionalise the improvement of daily work.
- Transform local discoveries into global improvements.
- Make anti-fragility a habit.
The best thing about the three ways is that these principles leverage ownership within the development team to eliminate hostilities between technology disciplines such as frontend or backend development, testing, infrastructure, and site reliability engineering. The three ways also encourage implementing a high level of automation to prevent human errors, speed up the development feedback loops, increase anti-fragility and avoid repetitive tasks. The three ways can mitigate some inherent risks associated with developing technology products, particularly those associated with handovers and operations.
The bad thing about the three ways is that these principles are not a methodology. These principles are not prescriptive enough and are open to interpretation. As I have mentioned several times, not being prescriptive makes mainstream adoption much more complicated. However, the three ways and DevOps seem to have escaped this “curse”, and today, they are mainstream in businesses of all sizes across the world. How did the three ways gain popularity despite being hard to implement? Two factors can explain the adoption of the three ways:
It is possible to get started as an individual developer. You will need company-wide buy-in to create a highly mature implementation of the three ways. However, you can begin by implementing automated tests or a deployment script without the management team’s support. Your colleagues will experience some of the initial improvements and join the cause. Eventually, you and the rest of your team can attempt to convince your management team to support your initiative.
These principles are not prescriptive, but there is a lot of documentation about specific technologies that are closely related to these principles. The documentation about technologies such as Docker is highly prescriptive and facilitates the adoption of the three ways.
2.8 Product-led growth (PLG) (2016) #
The three ways principles and the DevOps movement leverage ownership to eliminate hostilities between technology disciplines. PLG aligns the development, marketing, and sales teams by focusing on creating a product that is the driving force behind customer acquisition, retention, and revenue growth. This gives all teams a shared objective to work towards, rather than relying on traditional, siloed sales and marketing tactics.
The following list contains some of the main principles of PLG:
Focus on UX and delivering value: This principle emphasises the importance of user experience and ensuring that the product provides tangible value to the user. The goal is to create a product that is easy to use, intuitive, and solves a real problem for the user.
Product-led customer acquisition: This principle focuses on using the product itself as the primary driver for acquiring new customers by creating a product that is so valuable that users naturally tell their friends and colleagues about it.
Focus on customer retention: This principle emphasises the importance of retaining customers over acquiring new ones. Companies can reduce customer churn and increase customer lifetime value by creating a product that provides real value.
Data-driven decisions: This principle stresses the importance of making data-driven decisions regarding product development and marketing. By analysing data and user feedback, companies can make informed decisions that lead to better products and more successful outcomes.
Continuous experimentation: This principle emphasises the importance of continuous experimentation and improvement. Companies should continuously test new ideas, gather data, and iterate on their products to stay ahead of the curve and remain relevant to their customers.
The best thing about PLG is that it aligns the sales, marketing and customer success departments with the development team. When all groups focus on delivering a high-quality user experience and adding value to the product, the marketing and sales teams can use the product as a selling point and reference for customer acquisition and retention. This ultimately leads to a virtuous cycle where the product drives customer acquisition, and customer acquisition drives product development and improvement. By aligning the development, marketing, and sales teams around a product-led growth strategy, companies can create a more cohesive and efficient growth engine that drives long-term success.
The bad thing about PLG is that, once more, it is not a methodology and is not very prescriptive, making it hard to gain widespread adoption. Like Lean UX, PLG is hard to predict and requires a significant upfront research investment.
3. Why are we still failing? #
“Ease of use may not be the most important feature, but it’s the one that’s most important to get right.” – Jef Raskin
Answering this question is a big challenge. Our collective failures cannot be attributed to a single cause. The following list contains the main reasons I believe are preventing our industry from achieving a greater rate of success:
The success of Scrum is holding us back: Scrum has done many good things for our industry, but, at this point, it is probably holding us back. If Scrum got one thing right, it is, without a doubt, its ease of use. The combination of being highly prescriptive and having extensive documentation and certifications gives organisations enough confidence to choose Scrum.
Leadership is not leading the cause: One of the reasons the leadership teams across many organisations like Scrum is that it strongly emphasises time boxes and estimates:
- Estimation and planning are valued more than discovery: The leadership team often says they are fully committed to making the organisation more Agile. Still, the reality is that they can’t let go of their old Waterfall ways.
- Lack of trust and ownership: The leadership team doesn’t fully trust the product team and fails to provide the team with the level of ownership that it deserves. The leadership team uses the stand-up meetings as status reports and deadlines to mitigate their lack of trust.
Metrics & practices that fail to reinforce principles: You have to be careful with what you measure. If you measure something, the entire organisation will change how it operates to hit the desired metrics. One of the main problems of Scrum is its use of practices, such as time boxes (Sprints), that enforce delivery dates, and of metrics, such as burn-down charts, that measure the amount of work remaining to be completed over time and help track progress towards project completion. The main problem is that completing many tasks doesn’t necessarily mean that we are delivering value to customers. The pressure to deliver more increases, and we no longer have time to deal with technical debt. The development team burns out, and the best developers quit, making the lives of the developers that stay even more miserable. Finally, the accumulated technical debt makes delivering value impossible, and the business loses against the competition. Our metrics and practices fail to reinforce our principles and often go against them. We need to use metrics and practices that strengthen and promote outcome-oriented behaviours instead of output-oriented ones.
Sales and marketing are setting the product team up for failure: When the organisation fails to deliver value and customer acquisition and retention start to struggle, the sales and marketing teams often make unrealistic promises. The product team is then tasked with an impossible mission that leads to further disappointment, increased technical debt and lower quality.
Technology should be a tool, not a goal: Many developers often feel pain in their jobs when technical debt reaches critical levels. The problem is that technical debt is not a disease but a symptom. A series of factors cause technical debt, but the most significant is time pressure. Time pressure itself is often the result of unhappy customers. Sometimes fixing technical debt is a waste of time because as long as we don’t cure the disease, we will continue to create unsustainable levels of technical debt. What is the disease? In most cases, the disease is a leadership/culture that focuses on outputs over outcomes.
In PART II, I will share how I believe we can solve these problems.