Team Topologies by Matthew Skelton & Manuel Pais covers how to design team structures and the “APIs” through which teams work with one another. It’s quite popular in modern software delivery circles. Below are the key concepts from the authors.
Broadly speaking, the central idea of the book is that we should organize teams along value streams so that they can deliver value to the customer without being impeded by another team. This is counterbalanced by the need to keep teams small so they don’t go over Dunbar’s number. This is analogous to Amazon’s two-pizza team concept.
There are three forms of organizational structure in all orgs, according to Niels Pflaeging in “Organize for Complexity”.
- Formal structure (org chart) :: facilitates compliance
- Informal structure :: “realm of influence” between individuals
- Value creation structure :: how work actually gets done based on inter-personal and inter-team reputation
The authors identify cognitive load as a significant contributor to inefficiency. They say monoliths need to be broken down when “any indivisible software part [exceeds] the cognitive capacity of any one team”. When a team is spread too thin (i.e. carries too much cognitive load), it lacks the bandwidth to pursue mastery of its trade and struggles with the cost of switching contexts. Cognitive load can be assessed with this question: “Do you feel like you’re effective and able to respond in a timely fashion to the work you are asked to do?”
They lean heavily into Conway’s Law, going so far as to quote Michael Nygard saying “Team assignments are the first draft of the architecture”. There was a novel concept introduced by Mike Cohn (from Scrum), who said that if two teams need to communicate but the architecture suggests they logically shouldn’t need to, then this is a smell for the architecture. I hadn’t heard of communication as a smell for bad architecture, but it’s fascinating to think about.
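One way to picture the communication smell is as a set difference between the team pairs the architecture says should talk and the pairs that actually do. This is only a minimal sketch: the function name, the team names, and the idea of recording communication as simple pairs are all assumptions made for illustration, not something the book prescribes.

```python
def unexpected_communication(architecture_links, observed_links):
    """Return team pairs that communicate without an architectural link.

    Both arguments are iterables of (team, team) pairs. The pair format
    is a hypothetical convention chosen for this sketch.
    """
    # Pairs are unordered: ("a", "b") and ("b", "a") describe the same link.
    expected = {frozenset(pair) for pair in architecture_links}
    observed = {frozenset(pair) for pair in observed_links}
    # Anything observed but not expected is a candidate architecture smell.
    return sorted(tuple(sorted(pair)) for pair in observed - expected)


# Hypothetical example data.
architecture_links = [("checkout", "payments"), ("search", "catalog")]
observed_links = [("payments", "checkout"), ("checkout", "search")]

smells = unexpected_communication(architecture_links, observed_links)
print(smells)
```

A flagged pair is a prompt for a conversation rather than a verdict: the teams may be talking for reasons that have nothing to do with the software.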
They believe that teams should be long lived and the work should flow to them (as opposed to re-forming a new team around the work). They caution against multiple teams having “ownership” over a single code base, saying “The danger of allowing multiple teams to change the same system or subsystem is that no one owns either the changes made or the resulting mess.”
They encourage diversity among teams, quoting “Peopleware” saying “a little bit of heterogeneity can be an enormous aid to create a jelled team”. A bit non-intuitive, but it echoes many of the same things as Managing Diversity.
When divvying up work/domains for teams, they recommend limiting the number. “Identify distinct domains that each team has to deal with and classify these domains into simple (most of the work has a clear path of action), complicated (changes need to be analyzed and might require a few iterations on the solution to get it right) or complex (solutions require a lot of experimentation and discovery)”. As a heuristic, an average team should be able to handle 2-3 simple domains [context switching isn’t bad because the work is boring] or 1 complex domain [context switching is too high a cost]. No team should be given 2 complicated domains.
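The heuristic above can be written down as a small check. This is a rough sketch under stated assumptions: the function name and the example domains are hypothetical, and the rule that a complex domain should not be combined with any other domain is my reading of the “or” in the heuristic, not something stated outright.

```python
from collections import Counter

ALLOWED = {"simple", "complicated", "complex"}


def domain_load_warnings(domains):
    """Return warnings for a team's domain assignment.

    `domains` maps a domain name to "simple", "complicated" or "complex".
    """
    counts = Counter(domains.values())
    unknown = set(counts) - ALLOWED
    if unknown:
        raise ValueError(f"unknown domain classification(s): {sorted(unknown)}")

    warnings = []
    # Stated in the notes: no team should be given two complicated domains.
    if counts["complicated"] >= 2:
        warnings.append("two or more complicated domains")
    # Assumption: a complex domain should be the team's only domain.
    if counts["complex"] >= 1 and len(domains) > 1:
        warnings.append("a complex domain combined with other domains")
    # Stated in the notes: 2-3 simple domains is the comfortable upper range.
    if counts["simple"] > 3:
        warnings.append("more than three simple domains")
    return warnings


# Hypothetical team assignment.
print(domain_load_warnings({"pricing": "complicated", "tax": "complicated"}))
```

An empty list means the assignment fits the heuristic. It says nothing about whether the domains were classified correctly in the first place, which is the harder part.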
There are three types of dependencies between teams (A Taxonomy of Dependencies in Agile Software Development, Strode & Huff, 2012):
- knowledge
- task
- resource dependencies
To enforce boundaries between teams, they talk about different “fracture planes”. They caution that splitting software along boundaries can lead to data duplication and disjointed user experiences, so that is something to be cognizant of. The fracture planes they discuss are:
- business domain bounded context
- regulatory compliance
- change cadence
- team location
- risk
- performance isolation
- technology (not the best)
- user personas
There are a few triggers for when to evolve the team structures:
- software has grown too large for one team (Dunbar’s number)
- delivery cadence is becoming slower
- multiple business services rely on a large set of underlying services (lots of integration of the parts which is painful)
Stream-aligned teams
These are teams which are aligned to a single stream (defined as the continuous flow of work aligned to a business domain or organizational capability).
They require many capabilities to get their work done. This seems to amount to “full stack development + product people + a platform that lets you operate the system”.
The list:
- application security
- commercial and operational viability analysis
- design and architecture
- development and coding
- infrastructure and operability
- metrics & monitoring
- product management & ownership
- testing and quality assurance
- user experience
The “expected behaviors” of a stream-aligned team are:
- they aim to produce a steady flow of feature delivery
- they are quick to course correct based on feedback from the latest changes
- they use an experimental approach to product evolution, expecting to constantly learn and adapt
- they have minimal (ideally zero) hand-offs of work to other teams
- they are evaluated on the sustainable flow of change they produce (together with some supporting technical and team-health metrics)
- they must have time and space to address code quality changes (tech debt) to ensure that changing the code remains safe and easy to do
- they proactively and regularly reach out to the supporting fundamental topology teams (platform/enabling/complicated subsystem)
- the members feel they have achieved or are on the path to achieving “autonomy, mastery and purpose”
Enabling team
Composed of specialists in a given area or domain. They are collaborative and offer guidance (not execution) to stream-aligned teams. They are driven by “servant leadership” rather than acting as ivory-tower types.
Enabling teams are expected to:
- proactively seek to understand the needs of stream-aligned teams, establishing regular checkpoints and jointly agreeing when more collaboration is needed
- stay ahead of the curve in understanding new approaches, tooling, and practices well before an actual need is expected from a stream-aligned team.
- act as a messenger for good news (new test framework will save us 50% effort!) and bad news (this library is now deprecated and we need to move away from it)
- act as an occasional proxy for internal/external services which are hard to use directly
- promote learning within their team and across stream-aligned teams, facilitating appropriate knowledge sharing inside the organization
Complicated subsystem team
These teams are responsible for building and maintaining a part of the system that requires deep specialist knowledge. They reduce cognitive load on stream-aligned teams that need to use the subsystem. Examples are video codec development, facial recognition model development, etc. These form when the subsystem requires so much specialized knowledge that the cognitive load is too much for a stream-aligned team.
They are expected to:
- be mindful of the current stage of development for their subsystem and act accordingly (high collab w/ stream teams early, focus on interfaces and evolution after they achieved stability)
- deliver the subsystem with higher speed and quality than when it was developed within the stream-aligned teams
- correctly prioritize and deliver upcoming work so stream teams aren’t blocked
Platform teams
They enable stream-aligned teams to deliver work with autonomy, reducing the cognitive load on stream-aligned teams. Broadly speaking, this is similar to platform engineering.
They are expected to:
- use strong collab w/ stream teams to understand their needs
- rely on fast prototyping and involve stream teams for fast feedback on what works and what doesn’t
- have a strong focus on usability and reliability for their services (treating them like a product) and regularly assess if they are fit for purpose and usable
- lead by example: dogfood where possible, partner with stream/enabling teams, and consume lower level platforms.
- understand that adoption of new services isn’t immediate but follows an adoption curve.
A good test for the devex of these platforms is the time it takes to onboard a new employee into them.
A platform is not just a collection of features that dev teams happened to ask for at specific points in the past, but a holistic, well-crafted, consistent thing that takes into account the direction of technology change in the industry as a whole and the changing needs of the organization.
Choice quotes
- “the organizational structure must coordinate accountabilities to support the goals of delivering high-quality impactful software” - 2016 paper, Thinking Environments
- monoliths need to be broken down “(in particular, any indivisible software part that exceeds the cognitive capacity of any one team)”
- When delivering changes rapidly, it is important to ensure that high trust is explicitly valued and designed for.
- They referred to outcome-oriented management style as “eyes on, hands off” (McChrystal, Team of Teams), which I hadn’t heard before.
- “A bounded context is a unit for partitioning a larger domain or system model into parts, each of which represents an internally consistent business domain area (the term was introduced in the book Domain-Driven Design by Eric Evans)”
- We’d argue that for a team to communicate efficiently, the options are between full colocation (all team members sharing the same physical space) or true remote-first approach (explicitly restricting communication to agreed channels - such as messaging and collaboration apps - that everyone on the team has access to and consults regularly). When neither of these options is feasible (full colocation or remote first), then it’s better to split off the monolith into separate subsystems for teams in different locations. In this way, an organization can leverage Conway’s law to align the system architecture with the communication constraints in real life.
- [Conway’s law] creates an imperative to keep asking: “Is there a better design that is not available to us because of our organization?” - Mel Conway, Towards Simplifying Application Development in a Dozen Lessons
- Too often, a platform is left to former system administrators to build and run without using well-defined software development techniques (agile practices, TDD, continuous delivery, product management, etc) or it receives so little funding and attention from the organization that it never helps other teams, only hinders them.