Kimberlite Reference

Frequently Asked Questions

On this page

General

What is Kimberlite?

Kimberlite is a compliance-first database built on a single principle: All data is an immutable, ordered log. All state is a derived view.

It combines:

  • Append-only storage (like event sourcing)
  • SQL interface (like PostgreSQL)
  • MVCC time-travel queries (like Datomic)
  • Deterministic consensus (VSR protocol)
  • Hash-chained tamper-evidence (like blockchain, without the blockchain)

Target users: Healthcare systems, financial services, legal tech, and any domain requiring audit trails and compliance.

Who should use Kimberlite?

Ideal for:

  • 🏥 Healthcare apps needing HIPAA compliance and audit trails
  • 🏛️ Legal tech requiring tamper-evident record keeping
  • 💰 Financial services with regulatory requirements (SOC 2, PCI DSS)
  • 🔬 Database researchers exploring immutable log architectures
  • 💻 Systems programmers learning deterministic design patterns

Not ideal for:

  • High-frequency trading (latency-sensitive)
  • Social media feeds (append-only is overkill)
  • Ephemeral data (caches, sessions)
  • Workloads requiring in-place updates

Is Kimberlite production-ready?

Current status (v0.4.0): Developer preview focused on learning and exploration.

  • Core is solid: 1,300+ tests, deterministic simulation testing, production-grade crypto
  • Architecture is stable: No major breaking changes planned
  • APIs are evolving: v0.x means breaking changes possible (follow SemVer)
  • Limited deployment experience: Not yet battle-tested at scale

Recommendation: Use for:

  • Internal tools and prototypes
  • Learning database internals
  • Compliance research and evaluation

Wait for v1.0 if you need:

  • Guaranteed API stability
  • Large-scale production deployments
  • 24/7 commercial support

Is Kimberlite a blockchain?

No. Kimberlite uses hash chains (like blockchain) but is fundamentally different:

FeatureBlockchainKimberlite
ConsensusProof-of-Work / PoSVSR (Viewstamped Replication)
Trust modelTrustless (anyone can join)Trusted replicas (permissioned)
PerformanceSlow (global consensus)Fast (cluster consensus)
PurposeDecentralized ledgerCentralized audit-first database
Energy useHigh (mining)Low (standard servers)

TL;DR: Kimberlite borrows hash chains for tamper-evidence, but runs on standard servers with trusted replicas. No mining, no tokens, no decentralization.

Architecture

Why append-only? Isn’t that wasteful?

Benefits outweigh costs for compliance workloads:

Audit trail is free - History is built-in, no separate audit table Time-travel queries - View state at any past timestamp Simpler concurrency - No locks, no in-place updates Tamper-evident - Hash chain detects modification Deterministic replay - Reproduce exact state from log

Storage trade-off:

  • 1M patient records ≈ 500 MB (with 5 updates each)
  • Use retention policies to prune old versions
  • Compression reduces storage by 60-70%

Real-world comparison:

  • PostgreSQL with audit triggers: Similar storage, worse query performance
  • Event sourcing: Same append-only model, but no SQL interface

How does MVCC work?

Each row version has two timestamps:

CREATE TABLE patients (
    id INTEGER,
    name TEXT,
    _created_at TIMESTAMP,  -- implicit
    _deleted_at TIMESTAMP   -- implicit, NULL if current
);

Query at time T:

SELECT * FROM patients AS OF TIMESTAMP '2026-02-03 10:00:00';

Returns rows where:

_created_at <= '2026-02-03 10:00:00'
AND (_deleted_at > '2026-02-03 10:00:00' OR _deleted_at IS NULL)

Under the hood:

  1. Log contains all versions: [v1, v2, v3, ...]
  2. Query specifies point-in-time
  3. Kernel filters versions by timestamp
  4. Returns consistent snapshot

No read locks needed - queries never block writes.

How fast is it compared to PostgreSQL?

Current benchmarks (v0.4.0, M1 Mac):

OperationPostgreSQLKimberliteNotes
Single insert150 μs200 μs33% slower (hash chain overhead)
Batch insert (1K)80 ms90 ms12% slower (CRC32 + hashing)
Point query50 μs60 μs20% slower (MVCC filtering)
Time-travel queryN/A80 μsFree (no audit table)
Full table scan (1M rows)300 ms450 ms50% slower (version filtering)

Key insight: Kimberlite trades 10-50% performance for audit trail + time-travel queries.

When Kimberlite is faster:

  • Audit queries (no separate audit table)
  • Historical analysis (MVCC built-in)
  • Compliance reports (tamper-evidence free)

When PostgreSQL is faster:

  • Pure transactional workloads (OLTP)
  • Workloads not needing history
  • In-place updates (Kimberlite always appends)

What’s the “Functional Core / Imperative Shell” pattern?

The kernel is a pure function:

fn apply_committed(
    state: State,
    cmd: Command
) -> Result<(State, Vec<Effect>)>

No IO inside the kernel:

  • No file operations
  • No network calls
  • No system clock (Instant::now())
  • No randomness (except via explicit RNG parameter)

Why?

  • Deterministic - Same inputs → same outputs (enables simulation testing)
  • Testable - No mocks needed, pure unit tests
  • Replayable - Rebuild state from log perfectly
  • Simple - Easy to reason about, no hidden state

The shell executes effects:

for effect in effects {
    match effect {
        Effect::WriteLog(entry) => storage.append(entry),
        Effect::SendMessage(msg) => network.send(msg),
    }
}

See docs/ARCHITECTURE.md for details.

Compliance

Does Kimberlite meet HIPAA requirements?

Kimberlite provides technical controls for HIPAA compliance:

Audit trails (§164.312(b)) - Immutable log records all data access Integrity controls (§164.312(c)(1)) - Hash chains detect tampering Access controls (§164.312(a)(1)) - Multi-tenant isolation Encryption (§164.312(a)(2)(iv)) - AES-256-GCM for data at rest

You still need:

  • Business Associate Agreement (BAA)
  • Administrative safeguards (policies, training)
  • Physical security (server room access)
  • Breach notification procedures

Kimberlite ≠ automatic HIPAA compliance. It’s a compliance-friendly tool, not a compliance solution.

See docs/COMPLIANCE.md for detailed guidance.

What about GDPR “right to be forgotten”?

GDPR Article 17 (right to erasure) seems incompatible with immutable logs. Here’s how Kimberlite handles it:

1. Logical deletion (preferred):

UPDATE patients SET name = '[REDACTED]', email = '[REDACTED]' WHERE id = 123;

Creates new version with redacted data. Old versions stay in log, but:

  • Marked as “superseded by redaction”
  • Not returned in queries
  • Can be physically purged after retention period

2. Tombstone marker:

DELETE FROM patients WHERE id = 123;

Inserts tombstone: {id: 123, _deleted_at: now(), reason: 'GDPR erasure'}

3. Physical purging (advanced):

Kimberlite can rewrite the log, removing specific entries:

kimberlite purge --entity-id 123 --reason "GDPR Art. 17 request" ./data

Creates new hash chain starting from purge point. Old chain archived for audit.

Legal note: Consult a lawyer. GDPR has exceptions for legal obligations and public interest.

Can I use it for SOC 2 compliance?

Yes! Kimberlite helps with SOC 2 Trust Service Criteria:

CriterionHow Kimberlite Helps
CC6.1 (Logical access controls)Multi-tenant isolation, role-based access
CC7.1 (Detect threats)Integrity monitoring via hash chains
CC7.2 (Monitor system)Event log records all operations
CC8.1 (Change management)Schema migrations tracked in log

See docs/COMPLIANCE.md for SOC 2 evidence templates.

Operations

How do I back up a Kimberlite database?

Simple approach (development):

# Stop the server
kimberlite stop ./data

# Copy the entire data directory
tar -czf backup-$(date +%Y%m%d).tar.gz ./data

# Restart server
kimberlite start --address 127.0.0.1:3000 ./data

Production approach:

  1. Streaming replication - Run 3+ replicas, one is always a backup
  2. Point-in-time recovery - Export log segments to S3/GCS:
    kimberlite export --since '2026-02-01' --output backup.kmb ./data
    
  3. Checkpoint + incremental - Export checkpoint + log delta

See docs/BACKUP.md for detailed procedures.

How do I monitor Kimberlite in production?

Kimberlite exposes Prometheus metrics at /metrics:

# Key metrics
kimberlite_log_offset         # Current log position
kimberlite_hash_chain_valid   # Hash chain integrity (0/1)
kimberlite_crc_errors_total   # Corruption detections
kimberlite_vsr_view           # Current consensus view
kimberlite_query_duration_seconds  # Query latency (histogram)

Alerting rules:

- alert: HashChainBroken
  expr: kimberlite_hash_chain_valid == 0
  severity: critical

- alert: HighCRCErrors
  expr: rate(kimberlite_crc_errors_total[5m]) > 0
  severity: critical

See docs/MONITORING.md for Grafana dashboards.

What’s the disaster recovery strategy?

RPO (Recovery Point Objective): Zero data loss if you have 3+ replicas

RTO (Recovery Time Objective): Depends on data size

  • 1 GB: ~30 seconds (hash chain verification)
  • 100 GB: ~5 minutes
  • 1 TB: ~30 minutes

Recovery scenarios:

  1. Single node failure → Automatic failover to replica (30s)
  2. Data corruption → Restore from last valid checkpoint (5 min)
  3. Complete cluster loss → Restore from offsite backup (30 min - 2 hours)
  4. Logical error (bad DELETE) → Time-travel query to recover data (instant)

See docs/DISASTER_RECOVERY.md for runbooks.

Development

How do I contribute?

  1. Read the docs:

  2. Pick an issue:

    • Good first issues: https://github.com/kimberlitedb/kimberlite/labels/good-first-issue
    • Join Discord to discuss before starting
  3. Follow the checklist:

    • No unsafe code (workspace lint enforces)
    • No recursion (use bounded loops)
    • 2+ assertions per function
    • Tests pass (just test)
    • Clippy clean (just clippy)
  4. Submit PR using the template (includes PRESSURECRAFT checklist)

When will APIs be stable?

Target timeline:

  • v0.5.0 (Q2 2026) - SQL dialect stabilization
  • v0.8.0 (Q4 2026) - Client API freeze
  • v1.0.0 (Q2 2027) - Full API stability guarantee

What changes between v0.x versions:

  • SQL syntax (new keywords, functions)
  • Storage format (will provide migration tools)
  • Client protocol (breaking changes documented)

What WON’T change:

  • Core invariants (immutable log, hash chains, FCIS)
  • VOPR testing framework
  • Compliance guarantees

Subscribe to releases: https://github.com/kimberlitedb/kimberlite/releases

Where’s the roadmap?

See ROADMAP.md for planned features:

v0.5.0 (Q2 2026):

  • Streaming replication
  • Retention policies
  • Full-text search

v0.8.0 (Q4 2026):

  • GraphQL interface
  • Change data capture (CDC)
  • Multi-region clustering

v1.0.0 (Q2 2027):

  • API stability guarantee
  • Commercial support options
  • Certified compliance templates

Comparisons

Kimberlite vs PostgreSQL?

FeaturePostgreSQLKimberlite
ModelRelational (mutable)Relational (immutable)
HistoryAudit triggers (manual)Built-in (MVCC)
Time-travelVia extensionsNative SQL
IntegrityChecksumsHash chains + CRC32
ConsensusStreaming replicationVSR (deterministic)
Best forGeneral OLTPCompliance-heavy workloads

When to use PostgreSQL: General purpose, mature ecosystem, need 100+ extensions

When to use Kimberlite: Audit trails, compliance, tamper-evidence, time-travel queries

Kimberlite vs EventStoreDB?

FeatureEventStoreDBKimberlite
ModelEvent sourcingEvent sourcing + SQL views
QueryStream reads, projectionsSQL (familiar)
SchemaSchemaless (JSON events)Typed (SQL DDL)
Time-travelReplay eventsSQL queries
Best forEvent-driven architecturesCompliance + traditional apps

Key difference: EventStoreDB is event streams with projections. Kimberlite is SQL database backed by event log.

Kimberlite vs Datomic?

FeatureDatomicKimberlite
ModelDatalog (facts)SQL (tables)
Time-travelAs-of queries (native)AS OF TIMESTAMP (SQL)
StoragePluggable (DynamoDB, etc.)Append-only log (native)
LicenseProprietaryApache 2.0 (open source)
Best forClojure apps, graph queriesCompliance, SQL familiarity

Key difference: Datomic is Datalog-native. Kimberlite targets SQL users needing immutability.


Didn’t find your question? Ask on Discord: https://discord.gg/QPChWYjD