Distributed Computing Fundamentals
Distributed systems promise scalability, resilience, and flexibility, but they introduce significant complexity. Understanding the fundamental challenges and decision frameworks helps architects choose the right level of distribution for their context.
The Fallacies of Distributed Computing
Originally identified by Peter Deutsch at Sun Microsystems (1994) and later expanded by James Gosling and others
Developers new to distributed systems often make false assumptions that lead to serious problems in production. Peter Deutsch and colleagues at Sun Microsystems identified eight fallacies: incorrect assumptions that seem reasonable but break in real distributed systems.
The Original Eight Fallacies
1. The network is reliable
Networks fail constantly. Packets get lost, connections drop, switches crash, and cables get unplugged. Distributed systems must handle network failures as normal conditions, not exceptional cases.
Use retries with exponential backoff, timeouts on all network calls, and circuit breakers to prevent cascading failures. Design for failure as the default state.
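As a concrete illustration, here is a minimal retry-with-backoff sketch in Python. The attempt count, delays, and exception types are illustrative assumptions, not a prescription:

```python
import random
import time

def call_with_retries(operation, max_attempts=4, base_delay=0.2):
    """Run a network operation, retrying transient failures with
    exponential backoff plus jitter. Assumes `operation` raises
    ConnectionError or TimeoutError on transient network faults."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()  # the operation itself should set a timeout
        except (ConnectionError, TimeoutError):
            if attempt == max_attempts:
                raise  # exhausted: surface the failure (a circuit breaker could trip here)
            delay = base_delay * (2 ** (attempt - 1))     # 0.2s, 0.4s, 0.8s, ...
            time.sleep(delay + random.uniform(0, delay))  # jitter avoids thundering herds
```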
2. Latency is zero
Network calls are orders of magnitude slower than local method calls. A local method call takes nanoseconds. A network call takes milliseconds at best. That’s a factor of one million or more.
Minimize round trips between services. Batch operations when possible. Use asynchronous communication where immediate responses aren’t required. Design APIs to avoid chatty back-and-forth communication.
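For example, fetching user profiles one at a time costs one round trip per user, while a batch endpoint costs one round trip total. In the sketch below, the `api` client and both endpoints are hypothetical:

```python
def fetch_profiles_chatty(api, user_ids):
    # N round trips: latency grows linearly with the number of users.
    return [api.get(f"/users/{uid}") for uid in user_ids]

def fetch_profiles_batched(api, user_ids):
    # One round trip: the server resolves the whole set in a single response.
    return api.post("/users/batch", json={"ids": user_ids})
```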
3. Bandwidth is infinite
Network bandwidth is limited and often shared. Large payloads slow down everything using the same network. Bandwidth is also expensive in cloud environments.
Be mindful of payload sizes. Compress data when transferring large amounts. Avoid sending unnecessary data. Consider pagination for large result sets.
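One common pattern is cursor-based pagination, sketched below; `fetch_page` stands in for whatever call your API actually exposes:

```python
def iter_results(fetch_page, page_size=100):
    """Stream a large result set page by page instead of one huge payload.
    `fetch_page(cursor, limit)` is a hypothetical client call returning
    (items, next_cursor), where next_cursor is None on the last page."""
    cursor = None
    while True:
        items, cursor = fetch_page(cursor, page_size)
        yield from items
        if cursor is None:
            return
```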
4. The network is secure
Networks are inherently insecure. Data travels through switches, routers, and infrastructure you don’t control. Attackers can intercept, modify, or inject traffic.
Encrypt data in transit using TLS. Authenticate every request. Authorize every operation. Never trust network-level security alone.
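With the popular `requests` library, that looks roughly like this; the URL scheme and token handling are placeholders:

```python
import requests

def get_account(base_url: str, account_id: str, token: str) -> dict:
    response = requests.get(
        f"{base_url}/accounts/{account_id}",
        headers={"Authorization": f"Bearer {token}"},  # authenticate every request
        timeout=5,    # never wait forever on the network (see fallacy 1)
        verify=True,  # TLS certificate verification stays on (the default)
    )
    response.raise_for_status()  # treat auth failures as errors, not data
    return response.json()
```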
5. Topology doesn’t change
Services move, scale, fail, and get replaced continuously. IP addresses change. Services get deployed to new hosts. Load balancers route traffic differently.
Use service discovery mechanisms rather than hardcoded addresses. Design for services to come and go dynamically. Health checks and graceful degradation handle topology changes.
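In its simplest form, that means resolving a service by name at call time rather than pinning an IP in configuration. The sketch below assumes DNS-based discovery (as in Kubernetes cluster DNS), and the service name is hypothetical:

```python
import socket

def resolve_service(name: str, port: int) -> tuple[str, int]:
    """Look up the service's current address on each connection attempt.
    With DNS-based discovery, `name` is the service's DNS name
    (e.g. "orders.internal"), so instances can move without any
    client configuration change."""
    return socket.gethostbyname(name), port
```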
6. There is one administrator
In distributed systems, different teams manage different components. You don’t control every part of the infrastructure. Coordination across teams becomes a challenge.
Document dependencies clearly, communicate changes early, and establish contracts between services using API versioning and backward compatibility.
7. Transport cost is zero
Infrastructure, bandwidth, and serialization all have costs. Cloud providers charge for network traffic. Serializing and deserializing data consumes CPU and memory.
Consider transport costs in architectural decisions. Sometimes consolidating services makes sense to avoid network overhead. Profile and measure actual costs.
8. The network is homogeneous
Distributed systems run on a mix of hardware, operating systems, network equipment, and software versions. This heterogeneity creates compatibility challenges.
Handle incompatibilities gracefully. Use standard protocols. Version your APIs. Test across different environments.
Modern Additional Fallacies
As distributed systems evolved, practitioners identified additional fallacies:
9. Versioning is easy
Managing multiple versions of services in production is complex. Rolling deployments mean old and new versions run simultaneously. Clients may be on different versions than servers.
Plan for backward compatibility, use API versioning strategies, and support multiple versions during transition periods. Schema evolution requires careful design.
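A tolerant-reader sketch, with hypothetical field names: the consumer ignores unknown fields and defaults any field a newer version added, so v1 and v2 payloads both parse during a rolling deployment:

```python
def parse_order_event(payload: dict) -> dict:
    """Accept events from both old and new producers during a rollout."""
    return {
        "order_id": payload["order_id"],             # present in every version
        "currency": payload.get("currency", "USD"),  # added in v2; default keeps v1 valid
        # extra fields from future versions are simply ignored
    }
```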
10. Compensating transactions always work
Distributed systems often rely on eventual consistency and compensating transactions to roll back failed operations. However, some operations can’t be easily reversed.
Understand the limitations of compensating transactions. Some business operations are inherently difficult to undo. Design for idempotency where possible. Consider sagas carefully.
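One common safeguard is an idempotency key: a retried or replayed request returns the recorded result instead of repeating the side effect. A minimal in-memory sketch follows; a real system would persist the keys in a database:

```python
_results: dict[str, dict] = {}  # idempotency key -> recorded outcome

def charge(idempotency_key: str, amount_cents: int) -> dict:
    """Replaying the same key (e.g. a saga retry) is safe: the charge
    happens at most once and the recorded outcome is returned."""
    if idempotency_key in _results:
        return _results[idempotency_key]
    outcome = {"status": "charged", "amount_cents": amount_cents}  # side effect goes here
    _results[idempotency_key] = outcome
    return outcome
```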
11. Observability is optional
In monolithic systems, debugging is hard but possible. In distributed systems, debugging without observability is nearly impossible. Requests flow through multiple services, making problems difficult to diagnose.
Invest in observability from the start. Implement distributed tracing, structured logging, and metrics collection. Without observability, you’re debugging blind.
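A minimal structured-logging sketch: every log line is JSON and carries a trace ID propagated from the caller, so a single request can be followed across services. The event names are illustrative:

```python
import json
import logging
import sys
import uuid

logging.basicConfig(stream=sys.stdout, level=logging.INFO, format="%(message)s")
log = logging.getLogger("orders")

def handle_order(trace_id: str | None = None) -> None:
    trace_id = trace_id or str(uuid.uuid4())  # minted at the edge if absent
    log.info(json.dumps({"event": "order.received", "trace_id": trace_id}))
    # ... pass trace_id in a header on every downstream call ...
    log.info(json.dumps({"event": "order.completed", "trace_id": trace_id}))
```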
Architecture Quantum
Concept from Mark Richards & Neal Ford’s Fundamentals of Software Architecture (2020)
An architecture quantum is an independently deployable artifact with high functional cohesion and synchronous connascence. Think of it as the smallest useful piece of the system that can be deployed on its own.
Understanding the Components
Independent deployment means the quantum can be deployed without deploying other parts of the system. It has its own deployment pipeline, versioning, and release schedule.
High functional cohesion means the quantum does something purposeful and complete. It encompasses all the functionality needed to deliver a specific business capability.
Synchronous connascence includes all parts of the system that must work together synchronously. If component A makes synchronous calls to component B, they belong in the same quantum. Asynchronous communication creates quantum boundaries.
Examples of Architecture Quanta
Single quantum (monolith): An entire e-commerce application deployed as one unit. The shopping cart, checkout, inventory, and user management all deploy together. Changes to any component require deploying the entire application.
Multiple quanta (microservices): Separate Order Service, Payment Service, and Inventory Service. Each deploys independently. Each has its own database and communicates asynchronously with others through events.
Hybrid architecture: A modular monolith serving the main application with a separate background job processor. The monolith and job processor are two distinct quanta that can deploy independently but communicate asynchronously through a message queue.
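The asynchronous boundary can be sketched with an in-memory queue standing in for a real broker such as RabbitMQ or SQS; the event shape is hypothetical:

```python
import queue

events: queue.Queue = queue.Queue()  # stand-in for a real message broker

# Monolith side: publish and return immediately; no synchronous dependency
# on the job processor, so the two quanta can deploy on separate schedules.
events.put({"type": "report.requested", "report_id": 42})

# Job-processor side: consume whenever it is running, at its own pace.
job = events.get()
print(f"processing {job['type']} for report {job['report_id']}")
```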
Why Quanta Matter
The number of quanta in a system determines the architectural style, deployment strategy, scalability approach, and team organization.
More quanta provide more flexibility. Different quanta can use different technologies, scale independently, and deploy on different schedules. Different teams can own different quanta with minimal coordination.
However, more quanta mean more operational complexity. Distributed tracing, service discovery, network communication, data consistency, and failure handling all become harder. The operational overhead of managing ten services is significantly higher than managing one.
The right number of quanta depends on your context. Start with one quantum unless you have specific reasons to split. Add quanta only when you need the flexibility they provide.
Monolith vs Distributed: Making the Decision
The choice between monolithic and distributed architecture is not binary. It’s a spectrum with many options in between. Understanding when distribution makes sense prevents both over-engineering and under-engineering.
When to Choose Distributed Architecture
Different parts need different architecture characteristics: The order processing component needs high availability and scalability, while the administrative reporting component doesn’t. Separating them into different quanta allows independent optimization.
High scalability or availability requirements: When you need to scale specific components independently, distribution makes sense. Scaling a monolith means scaling everything, even parts that don’t need it.
Teams need independent deployment: Large organizations with multiple teams working on different business domains benefit from independent deployment. Teams can release changes without coordinating with every other team.
Domain boundaries are clear: Distribution works best when business domains have natural boundaries with minimal cross-domain transactions. Clear domains mean less distributed complexity.
When to Choose Monolithic Architecture
A single set of architecture characteristics suffices: If the entire system has similar performance, scalability, and availability needs, a monolith is simpler. You avoid distributed complexity for no benefit.
Simpler deployment is preferred: Monoliths are easier to deploy, test, and debug. One deployment pipeline, one version, one set of logs.
Smaller team or scope: Small teams are more productive working on a single codebase. The coordination overhead of distributed systems slows down small teams.
Cost and complexity must be minimized: Distributed systems require more infrastructure, more operational tooling, and more specialized skills. If budget or team capability is limited, monoliths make sense.
The Modular Monolith Middle Ground
A modular monolith provides many benefits of distributed systems while retaining monolithic simplicity. Organize code into well-defined modules with clear boundaries and interfaces. Modules communicate through defined APIs rather than direct method calls.
This approach provides:
- Clear domain boundaries within a single deployable unit
- Easier refactoring than a tangled monolith
- A path to future distribution if needed
- Lower operational complexity than microservices
- Faster development than managing multiple services
Many systems should start as modular monoliths and distribute only when specific needs justify the complexity.
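In code, a module boundary can be enforced with a small public facade that other modules import while everything else stays private. A sketch with hypothetical module and type names:

```python
# orders/api.py -- the only file other modules are allowed to import from.
# Internals (orders/_repository.py, orders/_pricing.py) stay behind it.
from dataclasses import dataclass

@dataclass(frozen=True)
class OrderSummary:
    order_id: int
    total_cents: int

def get_order_summary(order_id: int) -> OrderSummary:
    """Public, in-process API for the orders module. If orders is ever
    extracted into its own quantum, this signature becomes the remote
    service contract."""
    return OrderSummary(order_id=order_id, total_cents=0)  # stubbed for the sketch
```

Because callers already depend on the facade rather than module internals, extracting the module into its own service later changes the transport, not the contract.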
Decision Framework
Use this framework to evaluate whether distribution makes sense for your system:
| Question | Points toward monolith | Points toward distributed |
|---|---|---|
| Do different parts need different architecture characteristics? | No | Yes |
| Is extreme scalability critical? | Low-Medium needs | High needs |
| Do teams need independent deployment? | No | Yes |
| Are domain boundaries clear and stable? | No | Yes |
| What is the team size? | Small (1-10) | Multiple teams |
| What is the operational maturity? | Limited | High |
If most answers point toward monolithic, start there. You can always distribute later if needs change. Going from distributed back to monolithic is much harder.
The best architecture is the simplest one that meets your requirements. Distribution is a tool, not a goal. Use it when the benefits outweigh the costs.