Data Management Patterns
Data management patterns address how distributed systems handle data storage, access, and consistency across multiple services and databases.
Database per Service
Each microservice owns its data and database, ensuring loose coupling and service autonomy.
Use When:
- Building microservices architecture
- Services have different data storage requirements
- Want to enable independent service evolution
- Need to prevent database-level coupling
Database per Service Challenges
- No ACID transactions across services
- Data consistency becomes more complex
- Reporting across services requires aggregation
Example: E-commerce system where user service uses SQL database, product catalog uses document database, and recommendation engine uses graph database.
User Service β PostgreSQL
Product Service β MongoDB
Recommendation Service β Neo4j
Shared Database
When to Use Shared Database
- Tight coupling between services is acceptable
- ACID transactions across services are required
- Migrating from monolithic applications
- Services have significant data overlap
Drawbacks of Shared Database
- Creates tight coupling between services
- Database becomes a bottleneck
- Reduces service autonomy
- Schema changes affect multiple services
Event Sourcing
Pattern popularized by Martin Fowler and Greg Young in early 2000s
Instead of storing just the current state, stores every change as an immutable event. The current state is derived by replaying all events from the beginning. Think of it like a bank statement: you can see every transaction, not just the current balance.
How It Works:
Traditional (state-based): Event Sourcing:
βββββββββββββββββββββββ βββββββββββββββββββββββββββββββββββ
β Account β β Event Store β
β βββββββββββββββββββ β β βββββββββββββββββββββββββββββββ β
β id: 123 β β 1. AccountOpened(id:123) β
β balance: $120 β β 2. Deposited($100) β
β status: active β β 3. Withdrawn($30) β
β β β 4. Deposited($50) β
β (only current state)β β β
βββββββββββββββββββββββ β Current state = replay all β
β Balance = 0 + 100 - 30 + 50 β
β = $120 β
βββββββββββββββββββββββββββββββββββ
Event sourcing maintains a complete audit trail of all changes, allowing you to replay events for testing, analytics, or answering temporal queries like "what was the state at time X?"
Use When:
- Need complete audit trail of all changes (finance, healthcare, legal)
- Want to replay events for testing, debugging, or analytics
- Building systems that answer temporal queries (βwhat was the state on March 1st?β)
- Implementing complex business domains where understanding βhow we got hereβ matters
Snapshotting for Performance:
Replaying millions of events to get current state is slow. Snapshots periodically save the computed state so you only replay events since the last snapshot.
Event Store with Snapshots:
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β Events 1-1000 β Snapshot @ 1000 β Events 1001-2000 β Snapshot @2000β
β β balance: $5000 β β balance: $7500β
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
To get current state (at event 2347):
1. Load snapshot @ 2000 (balance: $7500)
2. Replay only events 2001-2347
3. Much faster than replaying all 2347 events
Schema Evolution:
Events are immutable, but your event schema will change over time. Strategies:
| Strategy | How It Works | Trade-off |
|---|---|---|
| Upcasting | Transform old events to new schema on read | No data migration, runtime overhead |
| Versioned events | Store schema version with event, handle each version | Explicit handling, more code paths |
| Copy-transform | Migrate all events to new schema | One-time cost, breaks immutability |
Example: Adding a field to DepositedEvent
v1: { type: "Deposited", amount: 100 }
v2: { type: "Deposited", amount: 100, currency: "USD" }
Upcaster for v1 β v2:
if event.version == 1:
event.currency = "USD" // Default for legacy events
GDPR and Data Deletion
Event sourcing conflicts with "right to be forgotten" requirements. Solutions include crypto-shredding (encrypt PII per user, delete keys to make data unreadable) or keeping deletion tombstone events that indicate "treat as if this user never existed."
CQRS (Command Query Responsibility Segregation)
Pattern introduced by Greg Young (2010), based on Bertrand Meyerβs Command-Query Separation principle (1988)
Uses separate models for reading and writing data. Writes go to a normalized model optimized for consistency; reads come from a denormalized model optimized for queries. The two models are synchronized asynchronously.
How It Works:
Commands (writes) Queries (reads)
β β
βΌ βΌ
βββββββββββββββββ βββββββββββββββββ
β Command Model β β Query Model β
β (normalized) β β(denormalized) β
βββββββββ¬ββββββββ βββββββββββββββββ
β β²
β βββββββββββββββββββ β
ββββββ Sync Mechanism βββββββββ
β (events/CDC/ β
β polling) β
βββββββββββββββββββ
Write: CreatePost(userId, content)
β Command DB: INSERT into posts, users_posts, etc. (normalized)
β Publish: PostCreated event
Sync: PostCreated event received
β Query DB: UPDATE user_feed (denormalized: includes user name, avatar, etc.)
Read: GetUserFeed(userId)
β Query DB: SELECT * FROM user_feed WHERE user_id = ? (single table, fast)
Use When:
- Read and write patterns are significantly different (10:1 or higher read ratio)
- Need to scale reads and writes independently
- Complex reporting requirements that donβt fit the write model
- Different consistency requirements (strong for writes, eventual OK for reads)
Synchronization Mechanisms:
| Mechanism | How It Works | Latency | Complexity |
|---|---|---|---|
| Same transaction | Write to both in one transaction | 0ms | Low (but defeats purpose) |
| Events | Publish domain events, projector updates query model | 10-100ms | Medium |
| CDC (Change Data Capture) | Stream database changes to projector | 1-10s | Medium |
| Polling | Periodically query command DB for changes | 1-60s | Low |
Eventual Consistency Timeline:
T=0: User creates post
T=1ms: Post saved to Command DB
T=2ms: PostCreated event published
T=50ms: Event received by projector
T=55ms: Query DB updated
T=60ms: User's feed shows new post
Reader sees stale data for ~60ms
(Acceptable for most use cases; not acceptable for banking)
Example: E-commerce order history.
Command Model (normalized):
orders: id, user_id, status, total, created_at
order_items: order_id, product_id, quantity, price
products: id, name, description, current_price
Query Model (denormalized for "My Orders" page):
user_orders: user_id, order_id, status, total, created_at,
items: [{name, quantity, price, image_url}, ...]
Write: PlaceOrder β Insert into orders + order_items (normalized, ACID)
Sync: OrderPlaced event β Update user_orders (denormalized, fast reads)
Read: GetMyOrders β SELECT from user_orders (single query, no joins)
Warning
CQRS adds complexity: two models, synchronization logic, eventual consistency handling. Don't use unless you have a specific problem it solves (high read/write ratio, complex queries, independent scaling needs).
Materialized View
Pre-computed and stored query results that are periodically updated, optimizing read performance for complex queries.
Use When:
- Complex queries are expensive to compute
- Query results donβt need to be real-time
- Read-heavy workloads with predictable query patterns
Trade-offs:
- Data freshness vs. performance
- Storage space for additional views
- Complexity of keeping views updated
Example: E-commerce analytics dashboard showing daily sales summaries, computed overnight and stored for fast display during business hours.
Nightly Job: Compute sales by category, region, time period β Store in materialized view
Dashboard: SELECT * FROM daily_sales_summary WHERE date = today
Quick Reference
Pattern Comparison
| Pattern | Data Distribution | Consistency | Complexity | Use Case |
|---|---|---|---|---|
| Database per Service | Isolated | Eventual | High | Microservices independence |
| Shared Database | Shared | Strong | Low | Monolith or tight coupling OK |
| Event Sourcing | Event log | Eventual | High | Audit trail, event replay |
| CQRS | Separate read/write | Varies | Medium-High | Different read/write patterns |
| Materialized View | Cached | Stale | Low | Expensive query optimization |
Decision Tree
| Question | Pattern |
|---|---|
| Independent services? | Database per Service |
| Need ACID across services? | Shared Database (consider monolith instead) |
| Need full audit trail? | Event Sourcing |
| Different read/write patterns? | CQRS |
| Expensive queries? | Materialized View |
Consistency Trade-offs
Strong Consistency: Shared Database | CQRS (same DB) Eventual Consistency: Database per Service | Event Sourcing | CQRS (separate DBs) Stale Data OK: Materialized View
Found this guide helpful? Share it with your team:
Share on LinkedIn