Subgraph Indexing for SIMD in Graph Databases

Graph databases normally answer multi-hop queries by pointer-chasing through indexes — one hop at a time. sutraDB auto-discovers repeated subgraph shapes and flattens them into columnar pseudo-tables, turning 3-hop joins into SIMD-accelerated column scans. Click "Run Demo" to watch the transformation.

Query

SELECT ?country ?mayor
WHERE {
  ?country :hasCapital ?city .
  ?city :hasMayor ?mayor .
  ?mayor :birthDate ?bd .
  FILTER(?bd < "1960-01-01")
}

The Problem

This 3-hop query normally requires 3 separate index lookups per candidate, each chasing pointers through a B-tree/LSM index. The CPU stalls waiting on cache misses — SIMD can't help because the data isn't contiguous.

The Innovation

sutraDB discovers that many nodes share the same subgraph shape (country → capital → mayor → birthDate) and materializes a columnar pseudo-table where each column is packed contiguously in memory. Now AVX2 can compare 4 TermIDs per cycle — what was 3 joins becomes a single vectorized scan.

Depth Threshold

Deeper pseudo-tables are riskier (wider invalidation surface), so the qualification threshold scales geometrically: depth 1 = base, depth 2 = base × 4, depth 3 = base × 9. Only overwhelmingly common patterns get materialized.

Steps