Production Assertion Strategy
Overview
This document provides comprehensive guidance on assertion usage in Kimberlite, covering when to use production assertions (assert!()) vs development assertions (debug_assert!()), and how to write effective assertions that catch bugs early.
Key Principle: Assertions are executable documentation of invariants. They detect corruption, Byzantine attacks, and state machine bugs BEFORE they propagate.
Table of Contents
- Decision Matrix
- The 38 Promoted Assertions
- Writing Effective Assertions
- Testing Assertions
- Performance Considerations
- Incident Response
Decision Matrix
Use assert!() (Production Enforcement) When:
Cryptographic Invariants:
- All-zero detection (keys, hashes, nonces, signatures)
- Key hierarchy integrity (Master→KEK→DEK wrapping)
- Ciphertext validation (auth tag presence, minimum sizes)
- Rationale: Cryptographic failures can compromise entire system security
Consensus Safety Properties:
- Leader-only operations (only leader can prepare)
- View number monotonicity (prevents rollback attacks)
- Commit number monotonicity (prevents uncommit)
- Sequential commit ordering (prevents gaps)
- Quorum validation (f+1 responses required)
- Rationale: Consensus violations lead to data loss or divergence
State Machine Invariants:
- Stream existence postconditions (stream must exist after creation)
- Effect count validation (ensures complete audit log)
- Offset monotonicity (append-only guarantee)
- Stream metadata consistency
- Rationale: State machine bugs propagate and corrupt derived state
Compliance-Critical Properties:
- Tenant isolation (no cross-tenant data leakage)
- Audit trail completeness (every operation logged)
- Data integrity (checksums match content)
- Rationale: Compliance violations have legal/regulatory consequences
Memory Safety in Unsafe Blocks:
- Buffer bounds checking
- Alignment requirements
- Null pointer checks
- Rationale: Safety violations cause undefined behavior
Use debug_assert!() (Development Only) When:
Performance-Critical Hot Paths:
- Tight loops with assertion overhead >1% of function time
- After profiling confirms assertion is bottleneck
- Example: Per-byte checksum validation in tight loop
Redundant Checks:
- Type system already prevents the error
- Previous production assertion guarantees the property
- Example: After validating array bounds with assert!(), subsequent accesses can use debug_assert!()
Developer Convenience:
- Precondition checks that are always satisfied in correct usage
- Internal helper function invariants
- Example: debug_assert!(sorted_array.is_sorted()) after sorting
Never Use Assertions For:
Input Validation: Use Result types and return errors
// WRONG:
assert!(user_input.len() <= MAX_SIZE);
// RIGHT:
if user_input.len() > MAX_SIZE { return Err(Error::InputTooLarge); }
Control Flow: Use if/else or match
// WRONG:
assert!(condition); do_something();
// RIGHT:
if condition { do_something(); }
Expected Errors: Use error handling
// WRONG:
let file = read_to_string(path); assert!(file.is_ok());
// RIGHT:
let file = read_to_string(path).context("reading file")?;
The 38 Promoted Assertions
Cryptography (25 assertions)
All-Zero Detection (11):
Location: crates/kimberlite-crypto/src/encryption.rs
// Representative forms (exact expressions live in encryption.rs):
// Encryption keys
assert!(!key.iter().all(|&b| b == 0), "encryption key is all zeros");
// Signing keys
assert!(!signing_key.iter().all(|&b| b == 0), "signing key is all zeros");
// Nonces (Initialization Vectors)
assert!(!nonce.iter().all(|&b| b == 0), "nonce is all zeros - possible reuse or RNG failure");
// Hashes
assert!(!hash.iter().all(|&b| b == 0), "hash is all zeros");
// Signatures
assert!(!signature.iter().all(|&b| b == 0), "signature is all zeros");
Why: All-zero cryptographic material indicates:
- RNG failure (not properly seeded)
- Memory corruption (zeroed memory)
- Uninitialized data
- Replay attack (reused nonce)
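The shape of these checks can be sketched as a small predicate plus a production assertion at the point where key material enters use. The helper name and key type below are illustrative, not Kimberlite's actual API:

```rust
/// Returns true when every byte is zero. Hypothetical helper;
/// the real predicate in kimberlite-crypto is not shown in this guide.
fn is_all_zero(bytes: &[u8]) -> bool {
    bytes.iter().all(|&b| b == 0)
}

/// Production assertion at the boundary where key material is accepted.
fn check_key_material(key: &[u8; 32]) {
    assert!(
        !is_all_zero(key),
        "encryption key is all zeros - RNG failure, memory corruption, or uninitialized data"
    );
}

fn main() {
    let key = [0x42u8; 32];
    check_key_material(&key); // non-zero material passes
    println!("key material accepted");
}
```

The predicate is O(n) over the key bytes, but keys are small (32 bytes), so the cost is a handful of comparisons on a cold branch.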
Key Hierarchy Integrity (9):
Location: crates/kimberlite-crypto/src/chain.rs
// Representative forms:
// Master Key → KEK wrapping
assert!(!wrapped_kek.is_empty(), "KEK wrapping produced empty output");
// KEK → DEK wrapping
assert!(!wrapped_dek.is_empty(), "DEK wrapping produced empty output");
// Unwrapping validation
assert!(unwrapped_key.len() == KEY_LEN, "unwrapped key has unexpected length");
Why: Key hierarchy violations compromise entire encryption scheme. If KEK is corrupted, all DEKs are unrecoverable.
Ciphertext Validation (5):
Location: crates/kimberlite-crypto/src/encryption.rs
// Representative forms:
// Minimum size check
assert!(
    ciphertext.len() >= TAG_LEN,
    "ciphertext too short: {} bytes, need at least {}",
    ciphertext.len(), TAG_LEN
);
// Auth tag presence
assert!(ciphertext.len() >= plaintext.len() + TAG_LEN, "ciphertext missing auth tag");
// Output buffer size
assert!(output.len() >= ciphertext.len(), "output buffer too small for ciphertext");
Why: Ciphertext format violations indicate:
- Truncated data (storage corruption)
- Missing authentication tag (forgery attempt)
- Buffer overflow vulnerability
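A minimal sketch of the minimum-size check, assuming a 16-byte AEAD auth tag (the length used by AES-GCM and ChaCha20-Poly1305); the constant and function names are illustrative:

```rust
/// AEAD auth tag length; illustrative constant, matching the
/// "need at least 16" failure message discussed later in this guide.
const TAG_LEN: usize = 16;

/// Reject ciphertext that cannot possibly contain an auth tag
/// before any decryption work is attempted.
fn validate_ciphertext(ciphertext: &[u8]) {
    assert!(
        ciphertext.len() >= TAG_LEN,
        "ciphertext too short: {} bytes, need at least {} - storage corruption or truncated write",
        ciphertext.len(),
        TAG_LEN
    );
}

fn main() {
    validate_ciphertext(&[0u8; 32]); // long enough: passes
    println!("ciphertext length ok");
}
```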
Consensus (9 assertions)
Leader-Only Operations (1):
Location: crates/kimberlite-vsr/src/replica/state.rs
// Representative form:
assert!(self.is_leader(), "non-leader attempted to prepare an operation");
Why: Followers preparing operations violates VSR protocol and causes divergence. This is either a Byzantine attack or a critical logic bug.
View Number Monotonicity (2):
Location: crates/kimberlite-vsr/src/replica/view_change.rs
// Representative forms:
assert!(new_view > self.view, "view regression: {} -> {}", self.view, new_view);
assert!(message.view >= self.view, "message from stale view {}", message.view);
Why: View number regression enables rollback attacks where Byzantine leader reverts committed operations.
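The monotonicity check can be sketched as follows; the struct and method names are illustrative, not the actual VSR replica API:

```rust
/// Minimal replica state for the sketch.
struct Replica {
    view: u64,
}

impl Replica {
    /// Hypothetical view-change entry point.
    fn enter_view(&mut self, new_view: u64) {
        // View numbers must strictly increase; a regression would let a
        // Byzantine leader roll back operations committed in later views.
        assert!(
            new_view > self.view,
            "view regression: {} -> {}",
            self.view,
            new_view
        );
        self.view = new_view;
    }
}

fn main() {
    let mut replica = Replica { view: 0 };
    replica.enter_view(1);
    println!("view advanced to {}", replica.view);
}
```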
Commit Number Monotonicity (2):
Location: crates/kimberlite-vsr/src/replica/state.rs
// Representative forms:
assert!(new_commit >= self.commit, "commit regression: {} -> {}", self.commit, new_commit);
assert!(op == self.commit + 1, "commit gap: expected {}, got {}", self.commit + 1, op);
Why: Commit regression or gaps violate linearizability and can cause data loss.
Quorum Validation (2):
Location: crates/kimberlite-vsr/src/checkpoint.rs, view_change.rs
// Representative forms:
assert!(checkpoint_acks >= quorum, "checkpoint quorum not reached");
assert!(do_view_change_count >= quorum, "view-change quorum not reached");
Why: Quorum violations break Byzantine fault tolerance guarantees (tolerates f failures in 2f+1 cluster).
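The quorum arithmetic is worth making explicit. For a cluster of 2f + 1 replicas, a quorum is f + 1, which is equivalent to a strict majority; the function names below are illustrative:

```rust
/// For a cluster of 2f + 1 replicas, a quorum is f + 1.
/// Integer division gives: (2f + 1) / 2 + 1 = f + 1.
fn quorum_size(cluster_size: usize) -> usize {
    cluster_size / 2 + 1
}

/// Production assertion before acting on collected responses.
fn check_quorum(responses: usize, cluster_size: usize) {
    assert!(
        responses >= quorum_size(cluster_size),
        "quorum not reached: {} responses, need {} of {}",
        responses,
        quorum_size(cluster_size),
        cluster_size
    );
}

fn main() {
    check_quorum(2, 3); // f = 1: two responses suffice
    println!("quorum satisfied");
}
```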
Cluster Membership (2):
Location: crates/kimberlite-vsr/src/replica/state.rs
// Representative forms:
assert!(self.configuration.contains(&sender), "message from unknown replica");
assert!(self.configuration.len() >= 3, "cluster too small: need at least 3 replicas");
Why: Messages from unknown replicas indicate configuration error or attack. Clusters <3 nodes cannot tolerate any failures.
State Machine (4 assertions)
Stream Existence Postconditions (1):
Location: crates/kimberlite-kernel/src/kernel.rs
// Representative form:
assert!(state.stream_exists(stream_id), "stream missing after successful creation");
Why: If stream creation succeeds but stream doesn’t exist, state machine is broken and subsequent operations will fail.
Effect Count Validation (1):
Location: crates/kimberlite-kernel/src/kernel.rs
// Representative form:
assert!(!effects.is_empty(), "state-modifying command produced zero effects");
Why: Every state-modifying command must produce at least one effect for audit trail completeness. Zero effects indicates bug.
Offset Monotonicity (1):
Location: crates/kimberlite-kernel/src/kernel.rs
// Representative form:
assert!(new_offset > last_offset, "offset regression breaks append-only guarantee");
Why: Append-only streams must have monotonically increasing offsets. Violation breaks append-only guarantee.
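A sketch of the append path, assuming an in-memory offset list; the struct and method names are illustrative, not the kernel's actual types:

```rust
/// Append-only log sketch.
struct Stream {
    offsets: Vec<u64>,
}

impl Stream {
    fn append(&mut self, offset: u64) {
        if let Some(&last) = self.offsets.last() {
            // Append-only guarantee: offsets strictly increase.
            assert!(
                offset > last,
                "offset regression: {} <= last committed {}",
                offset,
                last
            );
        }
        self.offsets.push(offset);
    }
}

fn main() {
    let mut s = Stream { offsets: Vec::new() };
    s.append(0);
    s.append(1);
    println!("appended {} records", s.offsets.len());
}
```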
Stream Metadata Consistency (1):
Location: crates/kimberlite-kernel/src/kernel.rs
// Representative form:
assert!(stream.tenant_id == request.tenant_id, "cross-tenant access attempt");
Why: Tenant isolation violation has compliance implications (HIPAA, GDPR). Must never allow cross-tenant access.
Writing Effective Assertions
Message Quality
Good assertion messages:
- State what failed (the invariant)
- Provide context (relevant values)
- Suggest possible causes
// EXCELLENT:
assert!(
    ciphertext.len() >= TAG_LEN,
    "ciphertext too short: {} bytes, need at least {} - storage corruption or truncated write",
    ciphertext.len(), TAG_LEN
);
// GOOD:
assert!(ciphertext.len() >= TAG_LEN, "ciphertext too short: {} bytes", ciphertext.len());
// BAD:
assert!(ciphertext.len() >= TAG_LEN, "invalid ciphertext");
// TERRIBLE:
assert!(ciphertext.len() >= TAG_LEN);
Assertion Density
Target: 2+ assertions per function (precondition + postcondition)
Pattern: Assert at both write and read sites
- Write site: Production assertion (invariant enforcement)
- Read site: Debug assertion (invariant verification)
Assertion Pairing
Write assertions in pairs at boundaries:
// Encryption side: assert the invariant on the value being produced
// Decryption side: assert the same invariant on the value being consumed
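One way to realize the pairing, assuming a 16-byte AEAD auth tag; both sides assert the same invariant, so a violation is caught at whichever boundary it first crosses. The function names here are illustrative:

```rust
const TAG_LEN: usize = 16; // AEAD auth tag; illustrative constant

// Write site (encryption): production assertion enforces the invariant
// on the value we are about to persist.
fn assert_encrypt_output(ciphertext: &[u8]) {
    assert!(
        ciphertext.len() >= TAG_LEN,
        "encryption produced ciphertext without auth tag ({} bytes)",
        ciphertext.len()
    );
}

// Read site (decryption): the same invariant is checked before use,
// catching corruption that happened in between.
fn assert_decrypt_input(ciphertext: &[u8]) {
    assert!(
        ciphertext.len() >= TAG_LEN,
        "ciphertext too short: {} bytes, need at least {}",
        ciphertext.len(),
        TAG_LEN
    );
}

fn main() {
    let ct = vec![0u8; 48];
    assert_encrypt_output(&ct);
    assert_decrypt_input(&ct);
    println!("paired assertions hold");
}
```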
Testing Assertions
Unit Tests with #[should_panic]
Every promoted assertion must have a corresponding test:
Test file location: crates/kimberlite-crypto/src/tests_assertions.rs contains all 38 tests.
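A minimal shape for such a test, using a hypothetical all-zero key check (the actual tests live in tests_assertions.rs and are not reproduced here):

```rust
/// Hypothetical production assertion under test.
fn check_key_not_zero(key: &[u8]) {
    assert!(
        !key.iter().all(|&b| b == 0),
        "encryption key is all zeros"
    );
}

#[cfg(test)]
mod tests {
    use super::*;

    // The test passes only if the assertion fires with the expected message.
    #[test]
    #[should_panic(expected = "all zeros")]
    fn all_zero_key_fires_assertion() {
        check_key_not_zero(&[0u8; 32]);
    }

    // The complementary test: valid input must not trip the assertion.
    #[test]
    fn nonzero_key_passes() {
        check_key_not_zero(&[0x42u8; 32]);
    }
}

fn main() {
    check_key_not_zero(&[7u8; 32]);
    println!("non-zero key accepted");
}
```

Writing both directions (fires on bad input, silent on good input) guards against an assertion that panics unconditionally or never.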
Property-Based Testing
Use proptest to verify assertions fire on invalid inputs:
use proptest::prelude::*;

// Representative shape; `validate_key` stands in for the real check.
proptest! {
    #[test]
    fn all_zero_material_is_rejected(len in 16usize..64) {
        let key = vec![0u8; len];
        prop_assert!(std::panic::catch_unwind(|| validate_key(&key)).is_err());
    }
}
Run with a high iteration count:
PROPTEST_CASES=10000 cargo test
Performance Considerations
Measurement
Benchmark before and after assertion promotion:
# Before promotion
cargo bench -- --save-baseline before
# After promotion
cargo bench -- --save-baseline after
# Compare (requires the critcmp tool)
critcmp before after
Acceptance criteria (from Phase 1 validation):
- Throughput regression < 1%
- p99 latency increase < 5μs
- p50 latency increase < 1μs
Actual Impact
After promoting 38 assertions:
- Throughput: <0.1% regression (within noise)
- p99 latency: +1μs
- p50 latency: <1μs
Conclusion: Production assertions have negligible performance impact when properly written.
Optimization Techniques
Early Exit: Check most likely failures first
Early Exit: Check the cheapest condition first
// Fast path: length check before iteration
assert!(bytes.len() >= MIN_LEN);
// Slow path: iterate only if needed
assert!(bytes.iter().any(|&b| b != 0));
Const Evaluation: Use const expressions when possible
const MIN_CLUSTER_SIZE: usize = 3;
assert!(cluster_size >= MIN_CLUSTER_SIZE);
Avoid Allocations: Don’t allocate in assertion messages
// GOOD (format arguments are evaluated only on failure):
assert!(ok, "key {} is all zeros", key_id);
// BAD (allocates a String even when the assertion passes):
let msg = format!("key {} is all zeros", key_id);
assert!(ok, "{}", msg);
Branch Prediction: Assertions are cold branches (never taken in correct execution)
- Modern CPUs predict not-taken by default
- No penalty on the happy path
Incident Response
When Production Assertions Fire
Immediate Actions:
Isolate the Node:
- Remove from cluster immediately
- Do NOT restart (preserves forensic state)
- Prevent client connections
Capture State Dump:
# Capture core dump
# Capture logs
# Capture replica state
Triage by Category:
Cryptographic Assertions (all-zero keys, key hierarchy violations):
- Likely causes: Storage corruption, RNG failure, memory corruption
- Investigation:
- Check storage device SMART status: smartctl -a /dev/sda
- Verify RNG entropy: cat /proc/sys/kernel/random/entropy_avail
- Memory test: memtester 1G 1
- Review storage write patterns (torn writes?)
Consensus Assertions (view monotonicity, commit ordering):
- Likely causes: Byzantine attack, logic bug, state corruption
- Investigation:
- Analyze message logs for Byzantine patterns
- Verify quorum agreement with other replicas
- Check for clock skew: chronyc tracking
- Review network partition history
State Machine Assertions (stream existence, offset monotonicity):
- Likely causes: Logic bug, concurrent modification, state corruption
- Investigation:
- Dump kernel state to JSON
- Check for race conditions in logs
- Verify serialization/deserialization correctness
- Review recent code changes
Root Cause Analysis
Use the assertion message to guide investigation:
assertion failed: ciphertext too short: 8 bytes, need at least 16 - storage corruption or truncated write
Investigation steps:
- Check disk space: df -h
- Check filesystem errors: dmesg | grep -i error
- Review storage layer logs for truncated writes
- Verify write atomicity guarantees
- Check backup integrity
Prevention
After root cause identified:
- Add Test Case: Reproduce the failure condition
- Add Monitoring: Detect early warning signs (disk errors, low entropy)
- Add Graceful Degradation: If possible, handle gracefully instead of panic
- Update Documentation: Document the incident and resolution
Examples by Crate
kimberlite-crypto
High value targets (already promoted):
- All-zero detection (keys, hashes, nonces)
- Key hierarchy integrity
- Ciphertext format validation
Still development-only (keep as debug_assert):
- Internal helper function invariants
- Performance-critical checks in tight loops
kimberlite-vsr
High value targets (already promoted):
- Leader-only operations
- View/commit monotonicity
- Quorum validation
- Cluster membership
Still development-only (keep as debug_assert):
- Internal state machine invariants after production checks
- Redundant checks after quorum validation
kimberlite-kernel
High value targets (already promoted):
- Stream existence postconditions
- Effect count validation
- Offset monotonicity
- Tenant isolation
Still development-only (keep as debug_assert):
- Internal stream metadata consistency after validation
- Index invariants (covered by type system)
FAQ
Q: Why promote assertions instead of returning Result?
A: Assertions are for invariants that MUST hold (bugs if violated). Results are for expected errors (invalid user input, network failures). If the error is recoverable, use Result. If it indicates a bug, use assertion.
Q: What’s the performance overhead?
A: Negligible. Our 38 promoted assertions added <0.1% throughput regression and +1μs p99 latency. Assertions are cold branches (predicted not-taken) and compile to simple comparisons.
Q: Should I panic in production?
A: For invariant violations, YES. Panicking prevents corruption propagation. Better to crash one replica and recover from healthy peers than to propagate corrupted state across the cluster.
Q: How do I test assertions?
A: Every assertion needs a #[should_panic] unit test that triggers it. Use property-based testing to verify assertions fire on all invalid inputs. See crates/kimberlite-crypto/src/tests_assertions.rs for examples.
Q: When should I use expect() instead of assert?
A: expect() is for Option/Result unwrapping with context. Use it when you KNOW the value is Some/Ok due to prior checks:
let value = map.get(&key).expect("key was inserted above, must be present");
Use assert!() for boolean conditions:
assert!(condition, "message describing the violated invariant");
Q: What about fuzzing?
A: Fuzzing is complementary. Assertions catch bugs during development and in production. Fuzzing finds test cases that trigger assertions. Both are essential.
Summary
Production assertions are:
- Executable documentation of invariants
- Early warning system for bugs and attacks
- Last line of defense against corruption propagation
- Negligible performance overhead (<0.1%)
38 promoted assertions protect:
- Cryptographic integrity (25 assertions)
- Consensus safety (9 assertions)
- State machine correctness (4 assertions)
Every assertion must have:
- Clear, informative message with context
- Corresponding #[should_panic] test
- Documented rationale in this guide
When in doubt:
- Ask: “If this fires in production, what does it mean?”
- If it means “bug or attack”, use assert!()
- If it means “invalid input”, use Result
- If it means “programming error”, use expect()
See also:
- website/content/blog/006-hardening-kimberlite-vsr.md - Lessons learned from hardening
- CLAUDE.md - Project coding guidelines
- docs/TESTING.md - Testing strategy
- docs/SECURITY.md - Security practices