Data Model

On this page

Three Levels of Abstraction
Level 1: Tables (Most Developers Start Here)
Level 2: Events (Audit/History)
Level 3: Custom Projections (Advanced)
Event Structure
The Log
Segment Structure
Record Format
Hash Chaining
Projections (Derived Views)
How Projections Work
Example: Table Projection
Time-Travel Queries
Stream Model
Idempotency
Compaction (Future)
Key Takeaways
Related Documentation

Kimberlite’s data model is based on a single principle: all data is an immutable, ordered log; all state is a derived view.

Three Levels of Abstraction

Kimberlite presents three levels to developers, allowing you to work at the appropriate level of detail.

Level 1: Tables (Most Developers Start Here)

Most developers interact with Kimberlite through a familiar table abstraction:

-- DDL defines the projection
CREATE TABLE records (
    id          BIGINT PRIMARY KEY,
    name        TEXT NOT NULL,
    created_at  TIMESTAMP,
    region      TEXT
);

-- DML appends to the log and updates projection
INSERT INTO records (id, name, created_at, region)
VALUES (1, 'Record Alpha', '2024-01-15 10:30:00', 'us-east');

-- Queries read from projection
SELECT * FROM records WHERE id = 1;

Under the hood:

INSERT becomes an event appended to the tenant’s log
The projection engine applies the event to update the records projection
SELECT reads from the materialized projection (B+tree)

Level 2: Events (Audit/History)

When you need the audit trail, query the event stream directly:

-- See all events in a stream
SELECT * FROM __events
WHERE stream = 'records'
ORDER BY position DESC
LIMIT 10;

-- Point-in-time reconstruction
SELECT * FROM records AS OF POSITION 12345
WHERE id = 1;

Events are the source of truth. Tables are just a convenient view.

Level 3: Custom Projections (Advanced)

For complex read models, define custom projections:

-- Projection with JOIN (computed at write time, not query time)
CREATE PROJECTION entity_summary AS
SELECT
    e.id,
    e.name,
    COUNT(a.id) as activity_count,
    MAX(a.timestamp) as last_activity
FROM entities e
LEFT JOIN activities a ON a.entity_id = e.id
GROUP BY e.id, e.name;

-- Query the pre-computed view
SELECT * FROM entity_summary WHERE id = 1;

Projections are maintained incrementally as events arrive, not computed at query time.

Event Structure

Every event in the log has this structure:

struct Event {
    // Position in the log
    position: LogPosition,

    // Cryptographic chain
    prev_hash: Hash,
    hash: Hash,

    // Metadata
    tenant_id: TenantId,
    stream: StreamId,
    timestamp: Timestamp,
    caused_by: Option<EventId>,  // Correlation

    // Payload
    event_type: EventType,
    data: Bytes,

    // Integrity
    checksum: Crc32,
}

Key fields:

position - Global sequence number in the log
prev_hash - Links to previous event (tamper-evident chain)
hash - Hash of this event’s contents
tenant_id - Which tenant owns this event
stream - Logical grouping within tenant
timestamp - When the event was committed
caused_by - Optional correlation to parent event
event_type - What kind of event (INSERT, UPDATE, DELETE, etc.)
data - Application payload (serialized bytes)
checksum - CRC32 for integrity

The Log

The append-only log is the system of record. All state derives from it.

Segment Structure

The log is divided into segments for manageability:

data/
├── log/
│   ├── 00000000.segment     # Segment 0: positions 0-999999
│   ├── 00000001.segment     # Segment 1: positions 1000000-1999999
│   ├── 00000002.segment     # Segment 2: current active segment
│   └── index.meta           # Segment index metadata

Each segment is a sequential file of records:

┌─────────────────────────────────────────────────────────────────┐
│ Segment File                                                     │
│                                                                  │
│  ┌──────────┬──────────┬──────────┬──────────┬─────────────┐   │
│  │ Record 0 │ Record 1 │ Record 2 │   ...    │  Record N   │   │
│  └──────────┴──────────┴──────────┴──────────┴─────────────┘   │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘

Record Format

Each record is self-describing and integrity-checked:

┌──────────────────────────────────────────────────────────────────┐
│ Record (on disk)                                                  │
│                                                                   │
│  ┌─────────┬──────────┬───────────┬──────────┬────────────────┐ │
│  │ Length  │ Checksum │ Prev Hash │ Metadata │     Data       │ │
│  │ (4 B)   │ (4 B)    │ (32 B)    │ (var)    │    (var)       │ │
│  └─────────┴──────────┴───────────┴──────────┴────────────────┘ │
│                                                                   │
└──────────────────────────────────────────────────────────────────┘

Length: Total record size in bytes (u32)
Checksum: CRC32 of the entire record (excluding length field)
Prev Hash: SHA-256 hash of the previous record (32 bytes)
Metadata: Position, tenant, stream, timestamp, event type
Data: Application payload (serialized bytes)

Hash Chaining

Every record includes the hash of the previous record, creating a tamper-evident chain:

Record N-1          Record N            Record N+1
┌─────────┐         ┌─────────┐         ┌─────────┐
│ ...     │         │ ...     │         │ ...     │
│ hash ───┼────────►│ prev    │         │ prev    │
│         │         │ hash ───┼────────►│ hash    │
└─────────┘         └─────────┘         └─────────┘

If any record is modified, all subsequent hashes become invalid.

Projections (Derived Views)

Projections are materialized views derived from the log.

How Projections Work

┌─────────────────────────────────────────────────────────────────┐
│                        Append-Only Log                           │
│  [Event 1] → [Event 2] → [Event 3] → ... → [Event N]             │
└────────────────────┬───────────────────────────────────────────┘
                     │ Apply
                     ▼
┌─────────────────────────────────────────────────────────────────┐
│                        Projection                                │
│                                                                  │
│  ┌──────────────────────────────────────────────────────────┐   │
│  │ B+Tree Index                                              │   │
│  │  ┌────┐  ┌────┐  ┌────┐                                  │   │
│  │  │ K1 │  │ K2 │  │ K3 │  ...                              │   │
│  │  └──┬─┘  └──┬─┘  └──┬─┘                                  │   │
│  │     │       │       │                                     │   │
│  │     ▼       ▼       ▼                                     │   │
│  │  [Row 1] [Row 2] [Row 3] ...                              │   │
│  └──────────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────────┘

Key properties:

Incrementally maintained: New events update projections immediately
Rebuildable: Delete a projection and rebuild it from the log
Multiple projections: Different views for different query patterns
MVCC: Multiple versions for point-in-time queries

Example: Table Projection

CREATE TABLE patients (
    id BIGINT PRIMARY KEY,
    name TEXT NOT NULL,
    dob DATE,
    region TEXT
);

Behind the scenes:

Creates a B+tree projection keyed by id
On INSERT: Add row to B+tree
On UPDATE: Modify row (keeping old version for MVCC)
On DELETE: Mark row as deleted (keeping tombstone)

Time-Travel Queries

Query data as it existed at any point:

-- As of specific timestamp
SELECT * FROM patients
AS OF TIMESTAMP '2024-01-15 10:30:00'
WHERE id = 123;

-- As of specific log position
SELECT * FROM patients
AS OF POSITION 1000
WHERE region = 'us-east';

How it works:

Each row version stores created_at position
Query scans B+tree, filters versions by position
Returns rows visible at that position

Stream Model

Events are organized into streams:

Tenant 1
├── Stream A (e.g., "patients")
│   ├── Event 1
│   ├── Event 2
│   └── Event 3
├── Stream B (e.g., "appointments")
│   ├── Event 1
│   └── Event 2
└── Stream C (e.g., "billing")
    └── Event 1

Tenant 2
├── Stream A
│   └── Event 1
└── Stream B
    └── Event 1

Stream properties:

Logical grouping within a tenant
Enables per-stream queries
Supports event correlation
Maps to tables in SQL layer

Idempotency

Every command includes an idempotency ID to prevent duplicates:

// Client generates ID before first attempt
let idempotency_id = IdempotencyId::generate();

// First attempt
let result = client.execute_with_id(idempotency_id, cmd).await;

// If network fails, retry with SAME ID
let result = client.execute_with_id(idempotency_id, cmd).await;
// Returns same result without re-executing

The log stores idempotency IDs, ensuring exactly-once semantics.

Compaction (Future)

While the log is append-only, old segments can be compacted to reclaim space:

Snapshot projection at position P
Delete log segments before P
Store snapshot as new “base” for replays

This preserves audit trails while managing storage.

Status: Planned for v0.6.0. See [ROADMAP.md].

Key Takeaways

Log is truth, state is cache: Projections can always be rebuilt from the log
Immutability: Nothing is ever modified or deleted in the log
Time-travel: Query any point in history
Tamper-evident: Hash chains detect modifications
Idempotent: Commands can be safely retried

Architecture - System overview
Consensus - How VSR replicates the log
Compliance - Audit trails and tamper evidence
Storage Internals - Low-level storage format

Remember: In Kimberlite, there is no “current state.” There is only “what the log says.” Everything else is just a convenient view.

Getting started

Concepts

Coding

Quickstarts

Guides

Recipes

Reference

CLI reference

SQL reference

SDK reference

Operations

Cloud deployment

Internals

Architecture

Design docs

Testing

Formal verification

Compliance

Resources

Recent

Suggestions

Three Levels of Abstraction

Level 1: Tables (Most Developers Start Here)

Level 2: Events (Audit/History)

Level 3: Custom Projections (Advanced)

Event Structure

The Log

Segment Structure

Record Format

Hash Chaining

Projections (Derived Views)

How Projections Work

Example: Table Projection

Time-Travel Queries

Stream Model

Idempotency

Compaction (Future)

Key Takeaways

Three Levels of Abstraction

Level 1: Tables (Most Developers Start Here)

Level 2: Events (Audit/History)

Level 3: Custom Projections (Advanced)

Event Structure

The Log

Segment Structure

Record Format

Hash Chaining

Projections (Derived Views)

How Projections Work

Example: Table Projection

Time-Travel Queries

Stream Model

Idempotency

Compaction (Future)

Key Takeaways

Related Documentation