SQL Engine Implementation Status
On this page
- Overview
- Current Implementation Status
- ✅ Fully Implemented
- Architecture
- Actual Flow (SQL API - Implemented)
- Implementation Details (Completed)
- DDL Implementation (Completed)
- DML Implementation (Completed)
- Server Integration (Completed)
- Testing (Completed)
- Event Format Design
- Event Structure
- Schema Event Format
- Constraints & Validation
- Primary Key Enforcement
- Foreign Keys
- CHECK Constraints
- NOT NULL
- Performance Considerations
- Batch Operations
- Schema Cache
- Migration from Event API
- Option 1: Dual API (Recommended)
- Option 2: SQL-Only
- Documentation Updates
- New Guides
- Updated Docs
- Implementation Timeline (Completed)
- ✅ Phase 1: DDL Foundation
- ✅ Phase 2: DML Implementation
- ✅ Phase 3: Server Integration
- ✅ Phase 4: Polish & Documentation
- Open Questions
- Success Metrics (Achieved)
- Functional ✅
- Testing ✅
- Quality ✅
- Performance 🚧
- Related Documents
Overview
The SQL engine provides a familiar SQL interface over Kimberlite’s event-sourcing core. It translates SQL statements into underlying stream operations while maintaining the immutable log and derived-view architecture.
Current Implementation Status
✅ Fully Implemented
SELECT Queries
- Parser: Full support via
sqlparsercrate - Planner: Query optimization with index selection
- Executor: Executes against projection store
- Features:
- Column selection (
SELECT id, nameorSELECT *) - Single-table queries (
FROM users) - WHERE predicates (
=,<,>,<=,>=,IN) - AND combinations
- ORDER BY (ASC/DESC)
- LIMIT
- Parameterized queries (
$1,$2, etc.) - Point-in-time queries (
query_at(position))
- Column selection (
Example:
SELECT id, name FROM users WHERE id = $1 ORDER BY name LIMIT 10
DDL (Data Definition Language)
- Status: Working - Full support for table and index management
- Implemented Commands:
-
CREATE TABLE- Define tables with columns, types, constraints, and primary keys -
DROP TABLE- Remove tables from the database -
CREATE INDEX- Create secondary indexes on columns
-
- Validation:
- Primary key requirement enforcement
- Column type validation
- Duplicate table detection
-
ALTER TABLE(ADD COLUMN, DROP COLUMN) - Schema evolution - Future:
CREATE PROJECTION
Example:
(
id BIGINT NOT NULL,
name TEXT NOT NULL,
created_at TIMESTAMP,
PRIMARY KEY (id)
);
ON patients (name);
patients;
DML (Data Manipulation Language)
- Status: Working - Full CRUD operations with parameter binding
- Implemented Commands:
-
INSERT INTO ... VALUES- Insert rows with literal or parameterized values -
UPDATE ... SET ... WHERE- Update rows matching WHERE predicates -
DELETE FROM ... WHERE- Delete rows matching WHERE predicates
-
- Features:
- Parameterized queries (
$1,$2) for all DML operations - Structured predicate serialization (not debug strings)
- Full projection materialization (INSERT/UPDATE/DELETE all update projections)
- Composite primary key support
- Parameterized queries (
- Validation:
- Column existence checking
- Type compatibility validation
- NOT NULL constraint enforcement
- Primary key NULL rejection
Examples:
-- Parameterized INSERT
INSERT INTO patients (id, name) VALUES ($1, $2);
-- UPDATE with WHERE clause
UPDATE patients SET name = 'Jane Smith' WHERE id = 1;
-- DELETE with WHERE clause
DELETE FROM patients WHERE id = 1;
-- Composite primary key
(
user_id BIGINT NOT NULL,
order_id BIGINT NOT NULL,
amount BIGINT,
PRIMARY KEY (user_id, order_id)
);
UPDATE orders SET amount = 6000 WHERE user_id = 1 AND order_id = 100;
Advanced Query Features (Completed)
HAVING Clause:
SELECT department, COUNT(*) as cnt
FROM employees
GROUP BY department
HAVING COUNT(*) > 5;
HavingConditionenum supports aggregate comparisons (COUNT, SUM, AVG, MIN, MAX)HavingOpenum: Eq, Lt, Le, Gt, Ge- Conditions evaluated after GROUP BY aggregation in executor
- 3 parser tests + executor integration
UNION / UNION ALL:
SELECT name FROM employees
UNION ALL
SELECT name FROM contractors;
ParsedUnionstruct with left/right SELECT andallflagQueryEngine::execute_union()executes both sides independently- UNION deduplicates via HashSet; UNION ALL keeps all rows
- Not supported in point-in-time queries (single-table only)
JOIN Support (INNER, LEFT):
SELECT p.name, a.appointment_date
FROM patients p
INNER JOIN appointments a ON p.id = a.patient_id;
QueryPlan::Joinvariant with nested scan plans- Hash join execution strategy
- LEFT JOIN with NULL-fill for non-matches
CTE (WITH) Support:
WITH active_patients AS (
SELECT * FROM patients WHERE status = 'active'
)
SELECT name, age FROM active_patients WHERE age > 30;
ParsedCtestruct with name and innerParsedSelectexecute_with_ctes()materializes each CTE as a temporary table, then executes the main queryWITH RECURSIVEexplicitly rejected
Subquery Support:
SELECT d.name, sub.total
FROM departments d
INNER JOIN (SELECT dept_id, COUNT(*) as total FROM employees GROUP BY dept_id) AS sub
ON d.id = sub.dept_id;
- Subqueries in FROM and JOIN clauses converted to inline CTEs
- Reuses CTE materialization infrastructure
ALTER TABLE:
patients ADD COLUMN email TEXT;
patients DROP COLUMN email;
ParsedAlterTablewith ADD COLUMN and DROP COLUMN operations
Code: crates/kimberlite-query/ and crates/kimberlite/
-
kimberlite-query/src/parser.rs- SQL parsing for SELECT, DDL, and DML -
kimberlite-query/src/planner.rs- Query planning -
kimberlite-query/src/executor.rs- Query execution -
kimberlite-query/src/schema.rs- Schema definitions -
kimberlite-query/src/key_encoder.rs- Lexicographic key encoding -
kimberlite/src/tenant.rs- DDL/DML execution and validation -
kimberlite/src/kimberlite.rs- Projection materialization for UPDATE/DELETE - 85+ tests passing (including comprehensive DML roundtrip tests)
Note: Planned SQL enhancements (advanced DDL, explicit transactions, query optimizations) are documented in ROADMAP.md.
Architecture
Actual Flow (SQL API - Implemented)
Client → CREATE TABLE → TenantHandle → Command::CreateTable → Kernel → Effect::TableMetadataWrite → Schema Update
Client → INSERT INTO → TenantHandle → Command::Insert → Kernel → Effect::StorageAppend → Log + Projection
Client → UPDATE/DELETE → TenantHandle → Command::Update/Delete → Kernel → Effect::UpdateProjection → Projection Update
Client → SELECT → QueryEngine → Projection Store (B+tree with MVCC)
Key Components:
- TenantHandle: SQL-to-Command translation layer
- Kernel: Pure state machine validates and produces effects
- Projection Store: Materialized views kept in sync via effects
- QueryEngine: Executes SELECT queries against projections
Implementation Details (Completed)
DDL Implementation (Completed)
Schema Commands
// crates/kimberlite-kernel/src/command.rs
DDL Parser
// crates/kimberlite-query/src/parser.rs
Table-to-Stream Mapping (Implemented)
Each table gets its own stream for event isolation:
CREATE TABLE patients → Command::CreateTable → kernel state tracks metadata
INSERT INTO patients → Command::Insert → appends to table's stream
Benefits:
- Clean separation of concerns per table
- Independent replay and recovery per table
- Simpler to reason about event ordering
DML Implementation (Completed)
INSERT Statement
SQL:
-- Literal values
INSERT INTO patients (id, name) VALUES (1, 'Alice');
-- Parameterized (recommended for security)
INSERT INTO patients (id, name) VALUES ($1, $2);
Implementation (crates/kimberlite/src/tenant.rs):
Projection Update (crates/kimberlite/src/kimberlite.rs):
"insert" =>
UPDATE Statement
SQL:
UPDATE patients SET name = 'Bob', status = 'active' WHERE id = 1;
Implementation:
Projection Update:
"update" =>
DELETE Statement
SQL:
DELETE FROM patients WHERE id = 1 AND status = 'inactive';
Implementation:
Projection Update:
"delete" =>
Server Integration (Completed)
The SQL engine is fully integrated with kimberlite-server and the REPL:
Request Handler
All SQL statements are routed through TenantHandle::execute():
// Client request
client.execute
// Server handler (kimberlite-server/src/handler.rs)
let result = tenant.execute?;
// TenantHandle routes to appropriate handler
match parse_statement?
REPL Support
The REPL (kimberlite-cli) supports all statement types:
;
|
|
;
|
|
;
|
|
;
|
|
;
|
|
Testing (Completed)
Unit Tests (85+ passing)
- DDL parsing (
CREATE TABLE,DROP TABLE,CREATE INDEX) - DML parsing (
INSERT,UPDATE,DELETE) with placeholders - Parameter binding for all DML operations
- Predicate serialization (structured JSON, not debug strings)
- Projection updates for INSERT/UPDATE/DELETE
- Column existence validation
- Type compatibility checking
- NOT NULL constraint enforcement
- Primary key NULL rejection
Integration Tests
Validation Tests
Event Format Design
Event Structure
Every DML operation becomes an event in the table’s stream:
Schema Event Format
Schema changes go to the __schema stream:
Constraints & Validation
Primary Key Enforcement
- Kernel validates uniqueness before append
- Return
KernelError::PrimaryKeyViolationif duplicate
Foreign Keys
- Phase 1: Not supported (too complex for initial implementation)
- Future: Add
CreateForeignKeycommand
CHECK Constraints
- Phase 1: Not supported
- Future: Validate in kernel before append
NOT NULL
- Validated at parse time, enforced in kernel
Performance Considerations
Batch Operations
Support bulk inserts for efficiency:
INSERT INTO patients (id, name) VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Charlie');
Translates to single AppendBatch with multiple events.
Schema Cache
Cache parsed schemas in memory to avoid repeated parsing:
Invalidate when __schema stream advances.
Migration from Event API
For users currently using CreateStream + AppendBatch:
Option 1: Dual API (Recommended)
Support both APIs indefinitely:
- Event API: Low-level, full control
- SQL API: High-level, familiar syntax
Option 2: SQL-Only
Deprecate event API, force migration:
- More consistent
- Simpler to maintain
- Breaking change for existing users
Recommendation: Option 1 - Keep both APIs, SQL as higher-level convenience.
Documentation Updates
New Guides
docs/guides/sql-quickstart.md- SQL tutorialdocs/guides/schema-management.md- DDL best practicesdocs/guides/data-modeling.md- Table design patterns
Updated Docs
docs/ARCHITECTURE.md- Add SQL layer diagramdocs/guides/quickstart-*.md- Use SQL examplesREADME.md- Show SQL in main example
Implementation Timeline (Completed)
✅ Phase 1: DDL Foundation
- Added
Command::CreateTable,Command::DropTable,Command::CreateIndex - Implemented kernel logic for schema commands
- Added schema state tracking in kernel
- Parsed DDL statements via sqlparser
- Unit tests for DDL (27+ tests)
✅ Phase 2: DML Implementation
- Added
Command::Insert,Command::Update,Command::Delete - Implemented DML-to-event translation with parameter binding
- Projection updates for all DML events (INSERT/UPDATE/DELETE)
- Parsed DML statements with placeholder support
- Unit tests for DML (50+ tests)
✅ Phase 3: Server Integration
- Wired SQL engine to
kimberlite-server - Updated REPL for full DDL/DML support
- Integration tests (DDL + DML + Query roundtrips)
- Comprehensive validation layer
✅ Phase 4: Polish & Documentation
- Error message improvements
- Documentation updates (this file, README, website)
- 🚧 Performance benchmarks (planned)
- 🚧 VOPR tests for crash recovery (planned)
Open Questions
Transaction Boundaries: Should
INSERTbe auto-commit or support explicit transactions?- Recommendation: Auto-commit for Phase 1, add
BEGIN/COMMITlater
- Recommendation: Auto-commit for Phase 1, add
Schema Evolution: How to handle
ALTER TABLE ADD COLUMN?- Resolved:
ALTER TABLE ADD COLUMNandDROP COLUMNimplemented in parser (v0.5.0)
- Resolved:
Query Planner: Should we optimize JOIN queries in projections?
- Recommendation: Yes, but only for projection definitions, not runtime queries
Error Recovery: What happens if projection update fails after event append?
- Recommendation: Projection rebuilds from log on next startup (already handled by architecture)
Success Metrics (Achieved)
Functional ✅
- All DDL statements parse and execute (CREATE TABLE, DROP TABLE, CREATE INDEX)
- All DML statements parse and execute (INSERT, UPDATE, DELETE)
- Queries return correct results after DML (verified via roundtrip tests)
- Parameterized queries work for all DML operations ($1, $2, etc.)
- Schema changes tracked in kernel state
- Point-in-time queries work via MVCC
Testing ✅
- 85+ unit tests passing (parser, validation, projection updates)
- 27+ integration tests (end-to-end DDL/DML workflows)
- Comprehensive validation tests (NULL rejection, type checking, column validation)
- No regressions in existing query tests (all 27 kimberlite tests pass)
- Parameterized query tests
- Composite primary key tests
- UPDATE/DELETE projection materialization verified
Quality ✅
- No panics in library code (all .expect() replaced with proper error handling)
- Structured predicate serialization (not debug strings)
- Memory safety (fixed allocation bug)
- Correct offset calculations
- Full parameter binding support
Performance 🚧
- 🚧 INSERT throughput benchmarks (planned)
- 🚧 Bulk INSERT performance (planned)
- 🚧 SELECT latency p99 (planned)
Related Documents
- ARCHITECTURE.md - System design overview
- ROADMAP.md - Future enhancements and roadmap
- TESTING.md - Testing strategy
- PRESSURECRAFT.md - Coding standards
Status: Implemented and Working (DDL + DML + Query + JOINs + HAVING + UNION + ALTER TABLE + CTEs + Subqueries) Last Updated: 2026-02-08 Author: Kimberlite Team