Azure Cosmos DB for System Architects
What Is Azure Cosmos DB
Azure Cosmos DB is a fully managed, distributed database service that stores data in JSON-like documents (and other formats depending on the API chosen). Unlike traditional relational databases, data in Cosmos DB is schemaless, distributed across regions, and accessed through one of several APIs (NoSQL, MongoDB, Cassandra, Gremlin, Table) rather than a single relational SQL interface.
Cosmos DB addresses the problem of building systems that are both globally distributed and highly available. Rather than forcing you to build custom sharding layers, replicate data to multiple regions manually, or choose between consistency and availability, Cosmos DB manages these concerns at the platform level.
What Problems Cosmos DB Solves
Without Cosmos DB:
- Replicating data across regions requires custom code and operational complexity
- Consistency between replicas must be managed manually
- Choosing a single region creates availability and latency concerns
- Scaling to millions of requests per second requires application-level sharding
- Managing failover between data centers is a complex operational burden
- Different applications need different data models but no single platform supports them
With Cosmos DB:
- Automatic multi-region replication and failover at the platform level
- Configurable consistency levels allow trading off latency against consistency guarantees
- Single database can serve globally distributed users with low latency from any region
- Throughput scales from hundreds to millions of requests per second transparently
- Application code doesn’t need to handle replication, failover, or sharding logic
- Single platform supports multiple data models through different APIs
How Cosmos DB Differs from AWS DynamoDB
Architects familiar with AWS should note several important differences:
| Aspect | AWS DynamoDB | Azure Cosmos DB |
|---|---|---|
| Consistency models | Eventually consistent reads by default; strongly consistent reads available per request within a single region | Five configurable levels (Strong, Bounded Staleness, Session, Consistent Prefix, Eventual) |
| Multi-region | Global Tables (configured and managed separately) | Native multi-region replication built into the database |
| Global writes | Supported via Global Tables (last-writer-wins conflict resolution) | Supported; any region can accept writes, with configurable conflict resolution |
| APIs supported | DynamoDB API only (key-value/document) | NoSQL/Core SQL, MongoDB, Cassandra, Gremlin, Table |
| Throughput pricing | Per table; replicated tables billed per region | Single RU/s setting per container or database, billed in each replicated region |
| Query language | Query/Scan (and PartiQL) with limited expressiveness | SQL (Core API) with rich relational-like queries |
| Multi-model | Key-value and document | Documents, key-value, graph, column-family |
Core Concepts
What Cosmos DB Is, Architecturally
Cosmos DB is a write-optimized, distributed document store. Unlike relational databases that optimize for joins and normalized schemas, Cosmos DB optimizes for:
- Fast writes at scale. Writes are acknowledged to the client after committing to the local region, not after replicating everywhere.
- Configurable consistency. You choose how strongly consistent data needs to be in exchange for throughput and latency.
- Schemaless data. No schema enforcement; documents in the same container can have different structures (see the sketch after this list).
- Transparent distribution. The database handles multi-region replication; applications see a single logical database.
- Request-based throughput. Capacity is measured in Request Units (RUs), not CPU cores or storage.
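To make these properties concrete, here is a minimal sketch using the Python azure-cosmos SDK. The endpoint, key, and the appdb/orders names are placeholders and error handling is omitted; the point is that a container is created with nothing more than a partition key, and documents of different shapes coexist in it.

```python
from azure.cosmos import CosmosClient, PartitionKey

# Placeholder endpoint/key; in practice these come from configuration or a secret store.
client = CosmosClient("https://<your-account>.documents.azure.com:443/", credential="<your-key>")

database = client.create_database_if_not_exists(id="appdb")
container = database.create_container_if_not_exists(
    id="orders",
    partition_key=PartitionKey(path="/customerId"),  # partition key is required
    offer_throughput=400,                             # manual provisioned RU/s
)

# Two documents with different structures in the same container -- no schema to declare.
container.create_item(body={"id": "order-1", "customerId": "c-42", "total": 99.90})
container.create_item(body={"id": "order-2", "customerId": "c-7",
                            "items": [{"sku": "A1", "qty": 2}], "giftWrap": True})
```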
Request Units (RUs)
A Request Unit is Cosmos DB’s currency for throughput. Every operation (read, write, query) consumes a number of RUs based on the operation’s complexity.
RU consumption examples:
- A small document read: 1 RU
- A small document write: 5 RUs
- A query scanning 100 documents: 50-200 RUs (depends on filtering and projection)
- A write with complex triggers or stored procedures: 10-50+ RUs
You provision throughput in RUs per second (RU/s). If you provision 1,000 RU/s, the database allows 1,000 RUs of operations per second. Operations exceeding the provisioned throughput are rate-limited (with exponential backoff retries available to clients).
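As a sketch of inspecting RU charges and handling throttling, continuing the container from the earlier sketch: reading the x-ms-request-charge header via client_connection.last_response_headers is a common pattern with the Python azure-cosmos SDK, though the exact surface can vary by SDK version.

```python
from azure.cosmos import exceptions

try:
    item = container.read_item(item="order-1", partition_key="c-42")
    # The SDK exposes the RU charge of the last operation through a response header.
    charge = container.client_connection.last_response_headers.get("x-ms-request-charge")
    print(f"Point read consumed {charge} RUs")
except exceptions.CosmosHttpResponseError as err:
    if err.status_code == 429:
        # Throttled: the provisioned RU/s was exceeded and the SDK's automatic
        # retries were exhausted. Back off, or provision more throughput.
        print("Request rate limited (429)")
    else:
        raise
```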
Provisioned vs serverless vs autoscale:
| Model | Cost | Best For |
|---|---|---|
| Provisioned | Per RU/s per hour | Predictable workloads; development/testing |
| Autoscale | Billed hourly at the highest RU/s the container scaled to in that hour; scales automatically between 10% and 100% of the configured max | Workloads with variable throughput; production workloads with spiky traffic |
| Serverless | Per RU consumed (no hourly charge) | Sporadic or low-traffic workloads; development and testing |
Provisioned throughput can be shared across all containers in a database (if provisioned at the database level) or dedicated to a single container (if provisioned at the container level). Autoscale automatically scales between 10% and 100% of the max RU/s you specify.
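A sketch of creating an autoscale container, reusing the database handle from the earlier sketch. It assumes a recent azure-cosmos version that exposes ThroughputProperties; the container name and max RU/s are illustrative.

```python
from azure.cosmos import PartitionKey, ThroughputProperties

# Autoscale: the service adjusts throughput between 10% and 100% of the configured max.
events = database.create_container_if_not_exists(
    id="events",
    partition_key=PartitionKey(path="/deviceId"),
    offer_throughput=ThroughputProperties(auto_scale_max_throughput=4000),  # 400-4,000 RU/s
)
```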
Consistency Levels
Cosmos DB offers five consistency levels that trade off between data freshness and throughput. This is the core architectural lever for application design.
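The account's default consistency level is configured on the Cosmos DB account itself (portal, CLI, or infrastructure-as-code); an individual client can opt into a weaker level than the account default, but not a stronger one. A minimal Python sketch, with a placeholder endpoint and key:

```python
from azure.cosmos import CosmosClient

# Relax a Session-default account to Eventual for a client that only runs analytics reads.
analytics_client = CosmosClient(
    "https://<your-account>.documents.azure.com:443/",
    credential="<your-key>",
    consistency_level="Eventual",
)
```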
Strong Consistency
Guarantee: All reads return the most recently committed write.
The client never sees stale data. Every read sees the committed state at the time of the request. This matches the behavior developers expect from a traditional single-node relational database.
Trade-offs:
- Writes are acknowledged only after they are committed by a majority of replicas across all configured regions, so write latency grows with the distance between regions
- Highest latency
- Lowest throughput (fewer concurrent operations can be served)
- Write availability is affected if regions disconnect
When to use:
- Financial systems where reading stale data is unacceptable
- Authorization systems where permissions must be checked against the latest state
- Inventory systems where overselling is not acceptable
Bounded Staleness Consistency
Guarantee: Reads lag behind writes by a bounded amount. You configure both a time bound (e.g., 5 seconds) and a version/operation count bound (e.g., 100 operations); reads lag by no more than the tighter of the two.
A read will return data that is at most the specified amount behind the latest committed write. If replication is caught up, the read is as fresh as strong consistency. If replication is lagging, the read returns data no older than your specified bound.
Trade-offs:
- Latency between strong and session consistency
- Throughput between strong and session consistency
- Predictable freshness guarantees (useful for SLAs)
When to use:
- Systems requiring freshness guarantees (e.g., stock quotes must be no older than 10 seconds)
- Applications that need freshness without paying the full cost of strong consistency
- Financial dashboards where slightly stale data is acceptable if you know the maximum staleness
Session Consistency
Guarantee: Within a session (single client), reads see writes executed by the same client. Across different clients, reads are eventually consistent.
A client sees its own writes immediately. However, another client reading the same data might see stale data until the updates replicate. This matches how traditional web applications often work (client sees updates immediately, other users see them eventually).
Trade-offs:
- Lower latency than bounded staleness
- Better throughput than bounded staleness
- Reads are not guaranteed to be fresh for other clients
- No freshness guarantees across clients
When to use:
- Web applications where you cache data per user (user sees their updates, other users see eventual consistency)
- Multi-player games where each player sees their own updates immediately
- Content management systems where editors see updates immediately but readers tolerate slight delays
Consistent Prefix Consistency
Guarantee: Reads never see writes out of order. If write A happened before write B, reads never see B without seeing A first.
Updates are causally ordered but not necessarily up-to-date. This prevents the strange situation where you read a response to a question before seeing the question itself.
Trade-offs:
- Better latency and throughput than session consistency
- Order guarantees but not freshness guarantees
- Useful for applications with causal relationships between writes
When to use:
- Conversation threads where responses must appear after the original message
- Database change logs or audit trails where order matters
- Distributed counter implementations where increments must appear in order
Eventual Consistency
Guarantee: Reads will eventually reflect all writes, but there are no freshness or ordering guarantees. A read may return stale data indefinitely (in the absence of new writes).
This is the weakest consistency level and the cheapest: it offers the lowest latency and highest throughput, with the fewest guarantees.
Trade-offs:
- Lowest latency
- Highest throughput
- No freshness guarantees
- No ordering guarantees
When to use:
- Counters and analytics where approximate values are acceptable
- Page view counts or social media engagement metrics
- Recommendation systems where slightly stale data is acceptable
- Caches where staleness is inherent
Partitioning and Partition Keys
Cosmos DB scales by distributing data across physical partitions. A partition key is the document property that determines which partition a document belongs to.
Choosing a Partition Key
The partition key is the single most important architectural decision. Cosmos DB uses the partition key value to distribute documents across multiple physical partitions, each of which can serve up to 10,000 RU/s and store a bounded amount of data (currently 50 GB per physical partition).
Good partition keys distribute writes evenly:
| Scenario | Partition Key | Why It Works |
|---|---|---|
| Multi-tenant SaaS | tenantId | Each tenant’s data goes to a different partition; requests are isolated by tenant |
| Time-series metrics | deviceId | Each device’s metrics go to the same partition; writes are spread across devices |
| User activity log | userId | Each user’s activity goes to the same partition; writes are spread across many users |
Bad partition keys concentrate writes on a few partitions (hot partitions):
| Scenario | Bad Key | Why It’s Bad | Better Choice |
|---|---|---|---|
| User activity log | isActive (true/false) | Nearly all writes go to two partitions (true/false); rest of the cluster is idle | userId |
| Time-series metrics | metricType | All events of a type go to one partition; other partitions stay idle | deviceId + metricType |
| Blog posts | isPublished | All published posts concentrated on one partition | authorId or blogId |
| Orders | status (pending/completed) | Most writes concentrate on pending; completed orders don’t scale | customerId or orderId |
Hot partitions are the most common Cosmos DB performance issue. Requests hitting a partition that exceeds its share of the provisioned throughput (at most 10,000 RU/s per physical partition) are rate-limited, causing client backoff and poor latency even if the database has unused capacity.
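A minimal sketch of defining the partition key at container creation, reusing the database handle from earlier; the names are illustrative. The key is fixed at creation time, which is why the choice matters so much.

```python
from azure.cosmos import PartitionKey

# High-cardinality key: writes spread across many users, and per-user reads stay
# scoped to a single logical partition. (A key like /isActive, with only two values,
# would funnel nearly all traffic into one or two partitions.)
activity = database.create_container_if_not_exists(
    id="activity",
    partition_key=PartitionKey(path="/userId"),
)

activity.create_item(body={"id": "evt-1", "userId": "user-42", "action": "login"})
# Point reads that supply both id and partition key are the cheapest operations (~1 RU).
doc = activity.read_item(item="evt-1", partition_key="user-42")
```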
Hierarchical Partition Keys
Hierarchical partition keys allow you to specify multiple levels of partitioning, improving distribution for scenarios where a single key creates hot partitions.
With hierarchical keys, you specify a path like /tenantId/userId. This creates a two-level hierarchy: first partition by tenant, then by user within each tenant. This approach provides better isolation and allows different throughput management at each level.
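A sketch of a two-level hierarchical key, assuming a recent azure-cosmos version that supports subpartitioning via kind="MultiHash"; container and property names are illustrative.

```python
from azure.cosmos import PartitionKey

# Subpartitioned container: documents are distributed first by tenant, then by user.
tenant_data = database.create_container_if_not_exists(
    id="tenant-data",
    partition_key=PartitionKey(path=["/tenantId", "/userId"], kind="MultiHash"),
)

# Documents must carry both properties; queries can target the full key or just the
# tenant prefix without fanning out across every partition.
tenant_data.create_item(body={"id": "doc-1", "tenantId": "t-1", "userId": "u-9", "plan": "pro"})
```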
Multi-Model APIs
Cosmos DB supports multiple APIs, each optimized for different access patterns and migration scenarios. The API you choose affects the query language, consistency options, and default indexing.
Core API (NoSQL)
The Core API stores and queries data using JSON documents with a SQL-like query language.
Use Core API when:
- Building new applications that need flexible queries
- Migrating from other document databases
- Needing rich SQL-like queries with joins
- Working with hierarchical or nested documents
Advantages:
- Rich SQL query language similar to relational databases
- Full indexing flexibility
- Best analytics and reporting capabilities
- Native JSON support
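A sketch of a parameterized Core API query against the orders container from the earlier sketches; the query text and values are illustrative.

```python
# Parameterized SQL-like query; scoping it to one partition key keeps the RU cost low.
results = container.query_items(
    query="SELECT c.id, c.total FROM c WHERE c.customerId = @customerId AND c.total > @min",
    parameters=[
        {"name": "@customerId", "value": "c-42"},
        {"name": "@min", "value": 50},
    ],
    partition_key="c-42",
)
for row in results:
    print(row)
```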
MongoDB API
MongoDB API provides MongoDB wire-protocol compatibility, allowing applications written for MongoDB to connect with minimal code changes.
Use MongoDB API when:
- Migrating from MongoDB to Azure
- Team expertise is in MongoDB
- Existing MongoDB clients/drivers are used
Advantages:
- MongoDB driver compatibility
- Smooth migration path from MongoDB
- Team familiarity if using MongoDB in other projects
Limitations:
- Smaller feature set than MongoDB (some MongoDB features not supported)
- Slightly different consistency semantics
- Less rich querying than Core API
Cassandra API
Cassandra API provides Apache Cassandra wire-protocol compatibility for applications using Cassandra drivers.
Use Cassandra API when:
- Migrating from Apache Cassandra
- High-volume write scenarios where Cassandra excels
- Team has Cassandra expertise
Advantages:
- Cassandra driver compatibility
- Wide-column model good for time-series data
- Designed for extremely high write throughput
Gremlin API
Gremlin API supports graph queries using Apache Tinkerpop’s Gremlin language for traversing relationships.
Use Gremlin API when:
- Data has complex relationships (social networks, knowledge graphs)
- Queries involve traversing relationships (shortest path, recommendations)
- Building recommendation engines
Advantages:
- Native graph traversal queries
- Efficient relationship queries
- Good for knowledge graphs and social networks
Table API
Table API provides a key-value store that is wire-compatible with the Azure Table Storage API.
Use Table API when:
- Migrating from Azure Table Storage
- Existing code written against the Azure Tables SDKs
- Simple key-value access patterns
Advantages:
- Compatibility with existing Table Storage code
- Simple key-value model
Limitations:
- Most limited querying (no joins, no complex filters)
- Least flexible of the APIs
Global Distribution
Cosmos DB automatically replicates data to multiple regions you specify. You configure which regions hold replicas, and Cosmos DB manages replication automatically.
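A sketch of steering reads to nearby replicas with the Python SDK; the endpoint, key, and region names are placeholders. Writes still flow to the write region unless multi-region writes are enabled.

```python
from azure.cosmos import CosmosClient

# Prefer the closest configured replica for reads, in order of preference.
client = CosmosClient(
    "https://<your-account>.documents.azure.com:443/",
    credential="<your-key>",
    preferred_locations=["West Europe", "East US"],
)
```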
Single-Region Writes vs Multi-Region Writes
Single-Region Writes (more common):
- One region is designated as the write region
- All writes go to the write region; reads can use any region
- Other regions are read-only replicas
- Reduces write complexity and consistency concerns
- Failover to another write region is possible but requires configuration
Multi-Region Writes:
- Any region can accept writes
- Cosmos DB merges writes from different regions automatically
- Requires conflict resolution policies
- More complex but eliminates write-region bottleneck
- Useful for globally distributed teams that need local write latency
Conflict resolution policies (for multi-region writes):
- Last Write Wins: the version with the highest value of a designated numeric property (the _ts timestamp by default) wins; earlier writes are discarded (see the sketch after this list)
- Custom Procedure: Application-defined JavaScript function determines which version persists
- Custom Async: Application code outside Cosmos DB resolves conflicts asynchronously
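A sketch of opting a client into multi-region writes and creating a container with a last-write-wins policy, using the Python SDK. The multiple_write_locations and conflict_resolution_policy options shown here are assumptions about the SDK surface (and the account must already have multiple write regions enabled); names are placeholders.

```python
from azure.cosmos import CosmosClient, PartitionKey

# Client that writes to the nearest region when the account has multi-region writes enabled.
client = CosmosClient(
    "https://<your-account>.documents.azure.com:443/",
    credential="<your-key>",
    multiple_write_locations=True,  # assumed keyword; check your SDK version
)
database = client.create_database_if_not_exists(id="appdb")

# Last-write-wins conflict resolution keyed on the system timestamp property (_ts).
profiles = database.create_container_if_not_exists(
    id="profiles",
    partition_key=PartitionKey(path="/userId"),
    conflict_resolution_policy={"mode": "LastWriterWins", "conflictResolutionPath": "/_ts"},
)
```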
Change Feed
The Change Feed is a stream of changes (inserts and updates) applied to documents in a container; deletes are not exposed in the default change feed mode and are usually modeled with a soft-delete flag plus a TTL. Applications can subscribe to the change feed to react to changes.
Use cases:
- Event-driven architectures: Subscribe to changes and trigger downstream processes
- Cache invalidation: Update caches when source documents change
- Search index updates: Keep Elasticsearch or other search indices in sync with Cosmos DB
- Audit logging: Capture all changes for compliance
- Notifications: Notify users when data they follow changes
Change feed patterns:
- Push model: Cosmos DB change feed processor automatically pushes changes to your code
- Pull model: Application explicitly reads from a checkpoint and processes changes (a minimal pull-model sketch follows this list)
- Azure Functions trigger: Automatically invoke a function when documents change
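A rough pull-model sketch against the Python SDK's query_items_change_feed, reusing the container handle from earlier. The continuation handling via the response etag header and the exact parameters vary by SDK version, so treat this as illustrative only; in production, the change feed processor (for example via an Azure Functions trigger) manages leases and checkpoints for you. handle_change is a hypothetical handler.

```python
import time


def handle_change(doc: dict) -> None:
    # Hypothetical downstream handler; it must be idempotent (see Pitfall 6 below).
    print("changed:", doc["id"])


continuation = None
while True:
    # Read everything on the first pass, then only changes since the last checkpoint.
    changes = container.query_items_change_feed(
        is_start_from_beginning=(continuation is None),
        continuation=continuation,
    )
    for doc in changes:
        handle_change(doc)
    # The continuation token for the next poll is surfaced via the response headers.
    continuation = container.client_connection.last_response_headers.get("etag")
    time.sleep(5)
```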
Integrated Cache
Cosmos DB offers an integrated cache, hosted in the optional dedicated gateway, that keeps frequently accessed items and query results in memory, reducing RU consumption.
The cache is largely transparent to applications: requests routed through the dedicated gateway that are served from the cache incur no RU charge. This is particularly valuable for workloads with hot reads (a small set of documents read frequently).
How the cache works:
- Items and query results that pass through the dedicated gateway are cached on its nodes (an item cache and a query cache)
- Repeated reads served from the cache consume no RUs
- The cache lives in the gateway nodes, not in the client, and is not shared across gateway nodes
- Freshness is controlled by a configurable maximum staleness window; the cache supports session and eventual consistency
DocumentDB Comparison
Amazon DocumentDB is AWS’s MongoDB-compatible database. Architects comparing the two should understand these differences:
| Aspect | Cosmos DB | DocumentDB |
|---|---|---|
| Multi-region | Native multi-region built-in | Regional, requires read replicas |
| Global writes | Supported | Not supported; single write region |
| Consistency levels | Five configurable levels | MongoDB consistency only |
| APIs | Core SQL, MongoDB, Cassandra, Gremlin, Table | MongoDB only |
| Throughput model | Request Units (RU/s) | Instance-based (vCPU) |
| Scaling | Transparent; configure RU/s | Manual scaling; instance changes |
DocumentDB is MongoDB-compatible but lacks Cosmos DB’s flexibility around consistency models, multi-model support, and global write capabilities.
Architectural Patterns
Pattern 1: Single-Region, Single-API Application
Use case: New application with users in a single region, using Core API.
Architecture:
- Single Cosmos DB account in one region
- Core API for JSON documents
- Session consistency for web application behavior
- Autoscale provisioning (10%-100% of max RU/s)
Components:
- Single region reduces data transfer costs and latency concerns
- Core API provides rich querying
- Session consistency matches typical web app user experience (user sees their updates, others see eventual consistency)
- Autoscale handles traffic spikes without manual intervention
Trade-offs:
- No geographic redundancy (region failure causes outage)
- No multi-region read latency optimization
- Suitable for applications serving a single geographic area
Pattern 2: Multi-Region Global Application
Use case: Application serving users globally with read-heavy workloads.
Architecture:
Write Region (East US)
↓
Cosmos DB Container
↓
Replicate to → Read Region 1 (Europe West)
             → Read Region 2 (Asia Southeast)
             → Read Region 3 (Australia East)
Components:
- Primary write region where all writes occur
- Read replicas in regions near users for local read latency
- Configured multi-region failover (if primary fails, secondary becomes write region)
- Strong or bounded staleness consistency for read-freshness guarantees
- Private endpoints in each region for network isolation
Trade-offs:
- Cross-region replication latency (eventual consistency for remote replicas)
- Higher cost (provisioned throughput applies to all regions)
- Suitable for read-heavy applications where write latency is local and read latency must be global
Pattern 3: Multi-Region Writes with Conflict Resolution
Use case: Globally distributed system where different regions need to accept writes independently.
Architecture:
Region 1 (East US), Region 2 (Europe West), Region 3 (Asia Southeast)
↓ each region accepts writes from local application instances
Cosmos DB replicates writes between regions
↓ conflicts merged automatically (e.g., last-write-wins)
Applications in every region read and write with local latency
Components:
- All regions configured as write regions
- Automatic conflict resolution (last-write-wins or custom procedure)
- Multi-region writes enabled
- Higher provisioned throughput (each region’s writes count separately)
Trade-offs:
- Conflict resolution adds complexity (which version wins?)
- Higher cost (write throughput needed in every region)
- Lost updates possible depending on conflict resolution strategy
- Useful only when local write latency in every region is a critical requirement
Pattern 4: Event-Driven Architecture with Change Feed
Use case: Data changes in Cosmos DB trigger downstream processes.
Architecture:
Cosmos DB Container (Orders)
↓ Change Feed
Document inserted/updated
↓
Change Feed Processor
↓
Event Handler (Azure Function / Custom App)
↓
Downstream Systems (Elasticsearch, Cache, Email Service, etc.)
Components:
- Cosmos DB as the system of record
- Change feed processor (managed by Azure Functions or custom code) subscribes to changes
- Event handlers react to changes and update downstream systems
- Guarantees at-least-once delivery (handlers must be idempotent)
Trade-offs:
- Handlers must be idempotent (same event processed multiple times must be safe)
- Change feed ordering is guaranteed within a partition, not globally
- Eventual consistency in downstream systems (lag between source and derived data)
- Decouples Cosmos DB from downstream services (changes trigger workflows independently)
Common Pitfalls
Pitfall 1: Hot Partition from Poor Partition Key
Problem: Choosing a partition key that concentrates writes on a small number of partitions (e.g., status field with only two values).
Result: Rate limiting on hot partitions while other partitions stay idle. Requests hitting the hot partition are rate-limited and fail with 429 errors. The database appears throughput-constrained even with low overall RU utilization.
Solution: Choose partition keys with high cardinality (many distinct values) that distribute writes evenly. Use hierarchical partition keys if a single key creates hotness. Monitor partition usage with Azure Monitor metrics to detect hot partitions early.
Pitfall 2: Forgetting Consistency Model Impact
Problem: Using strong consistency everywhere, paying the latency and throughput cost of waiting for global replication.
Result: Unnecessary latency on reads that don’t need freshness guarantees. Reduced throughput because fewer concurrent operations can complete while waiting for replication.
Solution: Use strong consistency only where required (authorization, critical updates). Use session consistency for typical application reads (user sees their updates, others see eventual consistency). Use eventual consistency for analytics and non-critical data.
Pitfall 3: Underestimating RU Consumption for Complex Queries
Problem: Assuming simple reads cost 1 RU, writing queries that scan entire containers without proper filtering.
Result: Queries cost far more RUs than expected. Autoscale provisions much higher than anticipated. Bills spike unexpectedly.
Solution: Test queries in development environment to understand RU consumption. Use query metrics in the Azure portal to see estimated and actual RU cost. Use composite indexes for common filter/order combinations. Avoid unbounded scans.
Pitfall 4: Not Understanding Multi-Region Write Costs
Problem: Enabling multi-region writes assuming similar costs to single-region setup.
Result: Throughput is provisioned per region. Cost multiplies by the number of write regions. A 10,000 RU/s container becomes 10,000 RU/s per region, tripling or quadrupling costs.
Solution: Multi-region writes are expensive. Use only if write-region latency is truly critical. For most applications, single write region + read replicas provides better cost/benefit.
Pitfall 5: Ignoring Partition Throughput Limits
Problem: Provisioning 100,000 RU/s but not realizing throughput is distributed across partitions, each capped at 10,000 RU/s.
Result: If all requests target one partition, that partition’s 10,000 RU/s limit is hit first, rate-limiting requests even though 90,000 RU/s of database capacity is unused.
Solution: Partition key distribution matters as much as provisioned throughput. Ensure partition keys distribute evenly. Monitor partition metrics to detect hot partitions. Each physical partition can handle 10,000 RU/s; distributing to 10 partitions allows 100,000 RU/s across diverse partition keys.
Pitfall 6: Change Feed Handlers Not Idempotent
Problem: Change feed handlers that fail if the same change is processed twice.
Result: If a handler crashes mid-processing, the change feed processor restarts and replays the change. Non-idempotent handlers duplicate work or corrupt state.
Solution: Change feed guarantees at-least-once delivery, not exactly-once. All handlers must be idempotent: processing the same change multiple times produces the same result as processing it once. Use idempotency keys or check for duplicates before applying changes.
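A sketch of one idempotency approach with the Python SDK: record an idempotency marker keyed by document id and version before running the side effect, so a replayed change is detected and skipped. The processed_container and send_confirmation_email names are hypothetical; the marker container is assumed to be partitioned on /orderId.

```python
from azure.cosmos import exceptions


def send_confirmation_email(order: dict) -> None:
    # Stand-in for the real downstream side effect (email, search index update, etc.).
    print("confirmation sent for", order["id"])


def handle_order_change(order: dict, processed_container) -> None:
    """Handle an at-least-once change feed delivery safely."""
    # One marker per (order id, version); _ts changes on every update of the source document.
    marker = {"id": f"{order['id']}-v{order['_ts']}", "orderId": order["id"]}
    try:
        # create_item raises CosmosResourceExistsError (409) if this change was already handled.
        processed_container.create_item(body=marker)
    except exceptions.CosmosResourceExistsError:
        return  # duplicate delivery from the change feed; skip the side effect
    send_confirmation_email(order)
```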
Key Takeaways
- Cosmos DB differs significantly from relational databases. It’s a globally distributed, schema-free document store designed for scale. Understanding the distributed nature and eventual consistency implications is critical to using it effectively.
- Partition key selection is the most important architectural decision. A good partition key distributes writes evenly across physical partitions. A bad one creates hot partitions that limit throughput. This cannot be changed after container creation without migration.
- Consistency levels are the primary architectural lever. Five levels allow trading latency against freshness. Most applications should use session consistency for typical reads and strong consistency only where required.
- Request Units (RUs) are the throughput currency. Every operation consumes RUs based on complexity. Understanding RU consumption for your queries drives capacity planning and cost management.
- Autoscale is usually better than provisioned for variable workloads. It automatically scales from 10% to 100% of max RU/s, handling spikes without manual intervention. Provisioned throughput is better for predictable, steady workloads.
- Multi-region replication is transparent but not free. Replicating to additional regions increases costs and introduces replication latency. Use read replicas for scale, but understand the cost implications.
- Multi-region writes are powerful but expensive. Any region can write, and Cosmos DB merges conflicts automatically. This eliminates write-region bottlenecks but costs more and requires careful conflict resolution.
- The change feed enables event-driven architectures. Subscribe to document changes to trigger workflows, cache invalidation, or search index updates. Handlers must be idempotent because change feed guarantees at-least-once delivery.
- Choose the right API for your access patterns. Core API for new applications with rich queries, MongoDB API for migrations, Cassandra API for time-series workloads, Gremlin for graphs. The API determines query language, indexing, and consistency options.
- Monitor and optimize partition key distribution continuously. Hot partitions are the most common performance issue. Use Azure Monitor metrics to identify skew and consider hierarchical partition keys if distribution is uneven.