Graph Databases

📖 3 min read

What They Are

Graph databases store data as nodes (entities) and edges (relationships between entities). Unlike relational databases where relationships are computed at query time through joins, graph databases store relationships explicitly, making traversal a direct lookup rather than a search.

Graph databases solve a problem that relational databases handle poorly: highly connected data. In a social network, finding friends-of-friends in a relational database requires self-joining the friendship table, with performance degrading exponentially as you traverse more levels. In a graph database, each traversal step is a constant-time pointer lookup regardless of total graph size.


Data Structure

                    ┌─────────────────┐
                    │  :Person        │
                    │  name: "Alice"  │
                    │  age: 32        │
                    └────────┬────────┘
                             │
              ┌──────────────┼──────────────┐
              │              │              │
              ▼              ▼              ▼
       [:FOLLOWS]      [:WORKS_AT]    [:PURCHASED]
       since: 2023                    date: 2024-01
              │              │              │
              ▼              ▼              ▼
    ┌─────────────────┐  ┌─────────────┐  ┌─────────────────┐
    │  :Person        │  │  :Company   │  │  :Product       │
    │  name: "Bob"    │  │  name:      │  │  name: "Laptop" │
    │  age: 28        │  │  "Acme Inc" │  │  price: 999     │
    └────────┬────────┘  └─────────────┘  └─────────────────┘
             │
             ▼
       [:FOLLOWS]
             │
             ▼
    ┌─────────────────┐
    │  :Person        │
    │  name: "Carol"  │
    └─────────────────┘

Cypher Query: MATCH (a:Person)-[:FOLLOWS]->(b)-[:FOLLOWS]->(c)
              WHERE a.name = "Alice"
              RETURN c.name
              → Returns "Carol" (friend-of-friend)

Nodes have labels (:Person, :Product) and properties. Edges have types (FOLLOWS, PURCHASED) and can also have properties. Traversing relationships is a direct pointer lookup, not a join.


How They Work

Nodes and Properties

Nodes represent entities (people, products, locations). Each node can have properties (key-value pairs) and labels (categories like “Person” or “Product”).

Edges and Relationships

Edges connect nodes with typed, directed relationships. An edge from Alice to Bob might have type “FOLLOWS.” Edges can have properties too (the date the follow relationship was created).

Index-Free Adjacency

This is the key architectural feature. Each node physically stores pointers to its connected nodes. Traversing from one node to its neighbors doesn’t require index lookups or joins; it’s following pointers. This makes graph databases O(1) for traversals regardless of total data size.

Graph Query Languages

Specialized languages express graph patterns naturally. Cypher (Neo4j’s language) lets you write patterns like (alice)-[:FOLLOWS]->(bob)-[:FOLLOWS]->(charlie) to find people Alice follows who also follow Charlie. Gremlin takes a traversal-based approach where you step through the graph programmatically.

Graph Algorithms

Graph databases often include built-in algorithms for:

  • Pathfinding: Shortest path between nodes
  • Centrality: Identifying influential nodes
  • Community detection: Finding clusters
  • Similarity: Finding similar nodes based on neighborhood structure

Why They Excel

Relationship-Heavy Queries

Any query that asks about connections is natural and fast: “who knows whom,” “what’s connected to what,” or “how are these related.”

Variable-Length Paths

“Find all paths from A to B with up to 6 hops” is a simple query. In SQL, you’d need recursive CTEs or multiple self-joins.

Schema Flexibility

Add new node types and relationship types without migrations. The graph evolves with your domain.

Whiteboard-Friendly Modeling

Domain experts draw diagrams with boxes and arrows. Graph databases store those diagrams directly.


Why They Struggle

Aggregate Queries

“How many orders did we have last month?” requires touching every order node. Relational databases with proper indexes handle this better.

High-Volume Writes

The pointer-based structure that makes reads fast makes writes more complex. Updating a highly-connected node touches many pointers.

Global Operations

Anything that requires scanning the entire graph rather than traversing from a starting point will be slow.

Horizontal Scaling

Partitioning a graph across servers is hard because traversals cross partition boundaries constantly. Some graph databases offer clustering, but it’s more complex than partitioning key-value or document data.


When to Use Them

Graph databases shine for:

  • Social networks: Friends, followers, connections
  • Fraud detection: Identifying suspicious patterns across accounts, devices, and transactions
  • Recommendation engines: Collaborative filtering based on user-item graphs
  • Knowledge graphs: Representing complex domains with rich relationships
  • Network infrastructure: Dependencies, impact analysis, topology

When to Look Elsewhere

If your queries are primarily CRUD operations without relationship traversal, if you need high-throughput writes, or if your main access pattern is aggregation rather than navigation, graph databases add complexity without benefit.


Examples

Neo4j dominates the graph database market, with the Cypher query language, ACID transactions, and both community and enterprise editions.

Amazon Neptune provides a managed service supporting both property graphs (Gremlin) and RDF graphs (SPARQL).

JanusGraph is open source and runs on various storage backends (Cassandra, HBase, BerkeleyDB), allowing you to leverage existing infrastructure.

TigerGraph focuses on performance for large-scale analytics, with native parallel processing for graph algorithms.


Found this guide helpful? Share it with your team:

Share on LinkedIn