When Infrastructure Constraints Doom Sound Architecture
How a SQL Server mandate undermined CQRS and Event Sourcing in a multi-tenant loan servicing platform
Executive Summary
A financial services company built a loan servicing engine using CQRS and Event Sourcing. The architecture delivered remarkable benefits: complete audit trails, trivial bug reproduction through event replay, and clean separation of concerns. Testing became straightforward because all code was decoupled from server datetime and bound instead to event timestamps.
The problem began when business leadership refused to use any data storage other than SQL Server. With hundreds of tenants and millions of users, event sourcing against a relational database became untenable. The team tried everything: table partitioning, query optimization, async command processing. None of it was enough.
The project was eventually abandoned despite costing over $5 million to build. Both CQRS and Event Sourcing became poisoned in the minds of leadership and developers. The company replaced the system with a third-party solution that was measurably inferior by every metric, but by then confidence had been lost and no amount of evidence could restore it.
Key lessons:
- Infrastructure decisions can undermine sound architecture regardless of implementation quality
- Event Sourcing requires storage optimized for append-only writes and partition-based reads
- Once confidence is lost, technical proof becomes irrelevant
- Without documented decisions, blame settles on whoever is most visible when projects fail
The Business Context
The company provided loan servicing capabilities to financial institutions. The platform needed to track every state change in a loan's lifecycle with complete audit trails, support hundreds of tenants with millions of users, satisfy federal audit requirements, and enable bug reproduction without production access.
The existing third-party solution was expensive and inflexible. Leadership saw an opportunity to build a superior in-house platform.
Previous Attempts
This wasn't the first attempt. Earlier iterations had failed: too slow, unable to replay state correctly, unable to satisfy audit needs. Each was rejected by the business.
The event sourcing approach was brought in specifically to fix these problems. It succeeded, and also solved future problems the business hadn't yet recognized. But this history meant the project was already operating under a cloud of previous failures.
The Constraint: SQL Server Only
Whether the SQL Server mandate was communicated early is unclear. The organization wasn't in the habit of documenting constraints, requirements, or decisions. Most developers didn't learn about it until near the end, when performance problems forced the conversation.
There's no way to prove who knew what and when, which is perhaps the larger issue. No documented constraints, no shared understanding of requirements, no explicit agreement on technical boundaries before work began.
When the architect presented the storage strategy, they recommended alternatives for the event store: MongoDB, EventStore, or similar. The reasoning was straightforward: event stores are append-only with partition-scoped reads. Relational overhead adds no value. SQL Server's transaction model, locking, and page-based storage are optimized for workloads that event sourcing doesn't have.
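To make that reasoning concrete, here is a minimal in-memory sketch (Python, with hypothetical names, not the project's code) of the only two operations an event store actually needs: append to the end of a stream, and read a stream back in order, scoped to a tenant partition. There are no joins, no updates in place, and no ad hoc predicates to optimize for.

# Illustrative sketch of the event-store access pattern described above.
from collections import defaultdict

class EventStore:
    """Append-only store, partitioned by (tenant_id, stream_id)."""

    def __init__(self):
        self._streams = defaultdict(list)

    def append(self, tenant_id: str, stream_id: str, event: dict) -> None:
        stream = self._streams[(tenant_id, stream_id)]
        if event["version"] != len(stream):        # optimistic concurrency check
            raise ValueError("concurrency conflict: unexpected version")
        stream.append(event)                       # streams only ever grow

    def read_stream(self, tenant_id: str, stream_id: str) -> list[dict]:
        """Partition-scoped read: one tenant, one aggregate, in order."""
        return list(self._streams[(tenant_id, stream_id)])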
The Business Veto
Leadership refused. Federal auditors knew SQL, and NoSQL felt too risky under regulatory scrutiny. Leadership believed they couldn't "have their cake and eat it too": that event sourcing was incompatible with keeping state queryable for auditors.
The specific technology didn't matter. The architect could have suggested a mythical database written by a sparkling unicorn cat and the response would have been the same. The decision wasn't about evaluating alternatives; it was about not deviating from SQL Server under any circumstances.
The Hybrid Approach (Rejected)
The architect proposed using a purpose-built event store for writes with SQL Server projections for audit queries:
┌────────────────────────────────────────────────────────────────────┐
│  Command Side: Event Store (EventStore or MongoDB)                 │
│    • Partitioned by TenantId, optimized for append-only writes     │
└─────────────────────────────────┬──────────────────────────────────┘
                                  │
                                  │  Event propagation
                                  ▼
┌────────────────────────────────────────────────────────────────────┐
│  Query Side: Projections (SQL Server)                              │
│    • Auditor-friendly, full SQL capabilities, familiar tooling     │
└────────────────────────────────────────────────────────────────────┘
This would have met all audit requirements while achieving performance targets. The architect addressed each concern directly: auditors would still have SQL access through projections, the causal chain would be preserved by design, and the regulatory requirement was data availability, not specific storage technology.
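As a sketch of what the projection half could have looked like (illustrative table, column, and function names, not the actual proposal), each event flowing out of the event store would be copied into a flat SQL Server table that auditors could query with ordinary SQL, with the causal chain carried along as plain columns:

# Illustrative only: each event propagating from the event store is written
# into an auditor-friendly SQL Server table via a standard DB-API cursor.
AUDIT_TABLE_DDL = """
CREATE TABLE LoanAuditTrail (
    TenantId      NVARCHAR(64)  NOT NULL,
    LoanId        NVARCHAR(64)  NOT NULL,
    Version       INT           NOT NULL,
    EventType     NVARCHAR(128) NOT NULL,
    OccurredAt    DATETIME2     NOT NULL,
    CausationId   NVARCHAR(64)  NULL,  -- the command or event that caused this one
    CorrelationId NVARCHAR(64)  NULL,  -- ties one business operation together
    Payload       NVARCHAR(MAX) NOT NULL
)
"""

def project_to_audit_table(cursor, event: dict) -> None:
    """Runs once per event as it propagates out of the event store."""
    cursor.execute(
        "INSERT INTO LoanAuditTrail (TenantId, LoanId, Version, EventType, "
        "OccurredAt, CausationId, CorrelationId, Payload) "
        "VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
        (event["tenant_id"], event["loan_id"], event["version"], event["type"],
         event["occurred_on"], event.get("causation_id"),
         event.get("correlation_id"), event["payload_json"]),
    )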
None of it mattered. The decision was made: SQL Server only.
The Architecture: CQRS and Event Sourcing
Despite the storage constraint, the team chose CQRS and Event Sourcing because they aligned with business requirements.
Why Event Sourcing
Event Sourcing stores every state change as an immutable event rather than overwriting current state. For loan servicing, this provided:
Complete audit trails by design. Every change to a loan's state was captured as an event with timestamp, causation, and context. Auditors could trace any current state back through the exact sequence of events that produced it.
Temporal decoupling for testing. All code was decoupled from server datetime and bound instead to event timestamps. Bug reproduction became trivial: replay events up to the problematic state, and the system is in exactly the condition it was in when the bug occurred.
Debugging without production access. When bugs were reported, the team could replay the event stream to recreate the exact conditions. No guessing about state, no "works on my machine" mysteries.
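A minimal sketch of what replay looks like, with invented event types and a deliberately tiny loan state. Note that the payment handler reads the event's own timestamp, never the server clock, which is why replaying last month's events today produces the same state they produced the first time.

# Hypothetical replay sketch: state is rebuilt by folding events in order.
from dataclasses import dataclass
from datetime import date

@dataclass
class LoanState:
    principal: float = 0.0
    balance: float = 0.0
    last_payment_on: date | None = None
    status: str = "new"

def apply(state: LoanState, event: dict) -> LoanState:
    if event["type"] == "LoanOriginated":
        state.principal = state.balance = event["amount"]
        state.status = "active"
    elif event["type"] == "PaymentReceived":
        state.balance -= event["amount"]
        state.last_payment_on = event["occurred_on"]  # event time, not the server clock
        if state.balance <= 0:
            state.status = "paid_off"
    return state

def replay(events: list[dict]) -> LoanState:
    """Fold the stream into current state; truncate the list to reproduce
    the exact state at any earlier point in the loan's history."""
    state = LoanState()
    for event in events:
        state = apply(state, event)
    return state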
Why CQRS
CQRS separates the write model (commands that produce events) from the read model (projections optimized for queries). For loan servicing:
Optimized read models for different consumers. Auditors needed different views than loan officers, who needed different views than batch processing systems. Each could have projections optimized for their access patterns.
Independent scaling. Command processing and query processing had different scaling characteristics. CQRS allowed each to scale independently.
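A small illustration of that separation, with hypothetical view names: the same event feeds several projections, each shaped for one consumer rather than for the write model.

# Illustrative projections only; real ones would write to SQL Server tables.
def project_auditor_view(audit_rows: list, event: dict) -> None:
    # Auditors need every change, verbatim and in order.
    audit_rows.append({
        "loan_id": event["loan_id"],
        "event_type": event["type"],
        "occurred_on": event["occurred_on"],
        "detail": event,
    })

def project_loan_officer_view(loans: dict, event: dict) -> None:
    # Loan officers need the current position, not the history.
    loan = loans.setdefault(event["loan_id"], {"balance": 0.0, "status": "new"})
    if event["type"] == "LoanOriginated":
        loan["balance"] = event["amount"]
        loan["status"] = "active"
    elif event["type"] == "PaymentReceived":
        loan["balance"] -= event["amount"]
        if loan["balance"] <= 0:
            loan["status"] = "paid_off"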
The Architecture in Practice
┌───────────────────────────────────────────────────────────────────────────────────────┐
│                                      API Gateway                                      │
└───────────────────────────────────────────┬───────────────────────────────────────────┘
                                            │
                     ┌──────────────────────┴──────────────────────┐
                     │                                             │
                     ▼                                             ▼
┌─────────────────────────────────────────┐   ┌─────────────────────────────────────────┐
│              Command Side               │   │               Query Side                │
│                                         │   │                                         │
│  ┌───────────────────────────────────┐  │   │  ┌───────────────────────────────────┐  │
│  │         Command Handlers          │  │   │  │      Read Model Projections       │  │
│  │                                   │  │   │  │                                   │  │
│  │  • Validate business rules        │  │   │  │  • Auditor View                   │  │
│  │  • Generate events                │  │   │  │  • Loan Officer View              │  │
│  │  • Append to store                │  │   │  │  • Batch Processing View          │  │
│  │                                   │  │   │  │  • Reporting View                 │  │
│  └─────────────────┬─────────────────┘  │   │  └─────────────────┬─────────────────┘  │
│                    │                    │   │                    │                    │
│                    ▼                    │   │                    ▼                    │
│  ┌───────────────────────────────────┐  │   │  ┌───────────────────────────────────┐  │
│  │            Event Store            │  │   │  │        Projection Storage         │  │
│  │           (SQL Server)            ├──┼───┼─►│           (SQL Server)            │  │
│  │                                   │  │   │  │                                   │  │
│  │  Append-only events               │  │   │  │  Denormalized for query patterns  │  │
│  │  Partitioned by tenant            │  │   │  │  Eventually consistent            │  │
│  └───────────────────────────────────┘  │   │  └───────────────────────────────────┘  │
└─────────────────────────────────────────┘   └─────────────────────────────────────────┘
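For concreteness, the command side of the diagram might look something like this in code. It is a hypothetical handler, not the project's implementation; it reuses the replay() sketch from the previous section and assumes a store object with read_stream and append methods.

# Hypothetical command handler: validate business rules, generate events, append.
def handle_record_payment(store, tenant_id: str, loan_id: str,
                          amount: float, received_on: str) -> None:
    history = store.read_stream(tenant_id, loan_id)   # the loan's past events, in order
    state = replay(history)                           # current state, rebuilt from the stream

    # 1. Validate business rules against the replayed state
    if state.status != "active":
        raise ValueError("payments can only be applied to active loans")
    if amount <= 0 or amount > state.balance:
        raise ValueError("invalid payment amount")

    # 2. Generate the event that records what happened
    event = {
        "type": "PaymentReceived",
        "loan_id": loan_id,
        "amount": amount,
        "occurred_on": received_on,   # business time travels with the event
        "version": len(history),
    }

    # 3. Append to the store; existing events are never updated or deleted
    store.append(tenant_id, loan_id, event)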
The architecture was sound, the implementation was clean, and the patterns delivered their promised benefits. The problem was underneath all of it.
The Descent
With SQL Server mandated, the team did their best to make it work.
The initial implementation hit problems at scale: write contention from page-level locking, read amplification when building projections, and transaction overhead on every event append.
Table partitioning by tenant helped initially but created new problems: partition management overhead with hundreds of tenants, wildly uneven partition sizes, and query optimizer confusion requiring manual hints.
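Roughly what this looked like in practice (illustrative T-SQL wrapped in Python for consistency with the other sketches, not the project's actual schema): an append-only Events table keyed by tenant, stream, and version, spread across tenant partitions. Every append pays for the primary-key check and its own transaction, and a busy tenant concentrates all of its writes onto the same hot pages.

# Illustrative only; names and partition boundaries are invented.
PARTITIONING_DDL = """
CREATE PARTITION FUNCTION TenantRange (INT)
    AS RANGE RIGHT FOR VALUES (100, 200, 300);  -- hundreds of boundaries in practice
CREATE PARTITION SCHEME TenantScheme
    AS PARTITION TenantRange ALL TO ([PRIMARY]);
"""

EVENT_TABLE_DDL = """
CREATE TABLE Events (
    TenantId   INT            NOT NULL,
    StreamId   NVARCHAR(64)   NOT NULL,
    Version    INT            NOT NULL,
    EventType  NVARCHAR(128)  NOT NULL,
    OccurredAt DATETIME2      NOT NULL,
    Payload    NVARCHAR(MAX)  NOT NULL,
    CONSTRAINT PK_Events PRIMARY KEY (TenantId, StreamId, Version)
) ON TenantScheme (TenantId)
"""

def append_event(cursor, tenant_id: int, stream_id: str, expected_version: int,
                 event_type: str, occurred_at, payload_json: str) -> None:
    """Optimistic concurrency: a duplicate (TenantId, StreamId, Version) is rejected
    by the primary key, but every insert still carries full transactional overhead."""
    cursor.execute(
        "INSERT INTO Events (TenantId, StreamId, Version, EventType, OccurredAt, Payload) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (tenant_id, stream_id, expected_version, event_type, occurred_at, payload_json),
    )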
The team's optimization options were severely limited. This wasn't a cloud environment with elastic scaling; it was on-premises SQL Server with fixed infrastructure. Many optimizations that might have helped weren't available in the deployed SQL Server version or required infrastructure changes the organization wouldn't approve.
Read replicas were considered but rejected due to licensing costs. The same cost sensitivity that drove the build-versus-buy decision now prevented the investment needed to make the build succeed.
None of it was enough.
The Human Cost
Leadership blamed middle management for the failure and let them go, along with most of the development team; others left on their own. This was deeply unfair: the developers didn't create the constraint that doomed the project, and they couldn't override it. They built exactly what they were asked to build, within constraints they didn't set and in many cases didn't know about until it was too late.
Without a paper trail showing who decided what and when, blame settled on whoever was most visible when the project collapsed.
Key Lessons
Infrastructure constrains architecture regardless of implementation quality
The team wrote excellent code, the architecture was sound, and the patterns were appropriate. None of it mattered because the storage layer couldn't support the access patterns. Patterns have storage requirements, and those requirements aren't negotiable through implementation heroics.
Pattern failure attribution matters
"CQRS and Event Sourcing failed" became the organizational narrative. The accurate narrative was "SQL Server couldn't support Event Sourcing at scale." These are very different lessons. Mis-attribution leads to avoiding good patterns while repeating actual mistakes.
Once confidence is lost, evidence becomes irrelevant
The in-house system was measurably superior by almost every metric. It didn't matter. Confidence had eroded past the point where evidence could restore it. Technical teams often focus on fixing problems while ignoring the confidence damage that accumulates during the fixing.
Alignment requires documentation, not assumptions
No one could prove who knew about the SQL Server constraint and when. When the project failed, there was no paper trail showing where the real failure occurred. Without explicit alignment at the start, blame becomes arbitrary. Documentation isn't bureaucracy; it's protection for everyone involved.
Conclusion
CQRS and Event Sourcing delivered everything they promise. The project's failure wasn't a failure of patterns; it was a failure of infrastructure alignment and organizational communication. The patterns became scapegoats for constraints that made them impossible to implement correctly.
For teams considering these patterns, the lesson isn't to avoid them. The lesson is to ensure infrastructure decisions support the patterns' requirements, and to engage business constraints as collaborative problems rather than obstacles to overcome through technical arguments alone.
The right storage for the right pattern isn't a nice-to-have. It's a prerequisite for success.