Secure Data Sharing
Kimberlite includes first-party support for securely sharing data with third-party services while protecting sensitive information. This document describes the data sharing architecture, anonymization capabilities, and integration patterns.
Table of Contents
- Overview
- Why Secure Data Sharing?
- Architecture
- Anonymization Techniques
- Token-Based Access Control
- Export Audit Trail
- MCP Integration
- Compliance Considerations
Overview
Many applications need to interact with external services—analytics platforms, LLMs, third-party APIs, or partner systems—without compromising sensitive data. Kimberlite provides built-in capabilities to:
- Anonymize or pseudonymize data before sending it out
- Encrypt sensitive fields so only authorized recipients can decrypt them
- Audit all data exports to maintain a verifiable record of what was shared, when, and with whom
This ensures that even when you integrate with external systems, your compliance guarantees remain intact.
Why Secure Data Sharing?
The Problem
Traditional approaches to data sharing create compliance risks:
- Full access grants: Giving third parties direct database access exposes all data
- Manual exports: Ad-hoc CSV exports lack audit trails and consistency
- Application-level filtering: Business logic can miss edge cases, leading to data leaks
- No consent tracking: Difficult to prove what was shared and why
The Solution
Kimberlite treats data sharing as a first-class concern:
| Challenge | Kimberlite Solution |
|---|---|
| Over-exposure | Field-level access controls, scoped tokens |
| Audit gaps | Complete export audit trail with cryptographic proof |
| Consent tracking | Purpose and consent recorded for every export |
| Anonymization | Built-in redaction, generalization, pseudonymization |
| Revocation | Time-bound tokens, instant revocation |
Architecture
Data Sharing Layer
The data sharing layer sits between the protocol layer and the core database:
```
┌────────────────────────────────────────────────────────────────────┐
│                         Data Sharing Layer                         │
│                                                                    │
│  ┌──────────────────────────────────────────────────────────────┐  │
│  │  1. Access Control                                           │  │
│  │     • Validate access token                                  │  │
│  │     • Check scopes (tables, fields, time ranges)             │  │
│  │     • Verify consent/purpose                                 │  │
│  └──────────────────────────────────────────────────────────────┘  │
│                                 ↓                                  │
│  ┌──────────────────────────────────────────────────────────────┐  │
│  │  2. Query Rewriting                                          │  │
│  │     • Remove unauthorized fields from SELECT                 │  │
│  │     • Add filters for authorized time ranges                 │  │
│  │     • Enforce row-level security policies                    │  │
│  └──────────────────────────────────────────────────────────────┘  │
│                                 ↓                                  │
│  ┌──────────────────────────────────────────────────────────────┐  │
│  │  3. Transformation Pipeline                                  │  │
│  │     • Apply redaction rules                                  │  │
│  │     • Apply generalization rules                             │  │
│  │     • Apply pseudonymization with tenant-specific keys       │  │
│  │     • Apply field-level encryption for recipients            │  │
│  └──────────────────────────────────────────────────────────────┘  │
│                                 ↓                                  │
│  ┌──────────────────────────────────────────────────────────────┐  │
│  │  4. Audit                                                    │  │
│  │     • Log export with hash of contents                       │  │
│  │     • Record recipient, purpose, timestamp                   │  │
│  │     • Generate cryptographic proof                           │  │
│  └──────────────────────────────────────────────────────────────┘  │
│                                                                    │
└────────────────────────────────────────────────────────────────────┘
```
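Step 2 of the layer (query rewriting) can be illustrated with a small column projection. This is a sketch under stated assumptions, not Kimberlite's implementation: `Projection` and `project_columns` are hypothetical names, and the requested column list is assumed to have been parsed out of the SQL statement already.

```rust
use std::collections::BTreeSet;

/// Hypothetical result of projecting a requested column list onto a token's scope.
#[derive(Debug, PartialEq)]
pub struct Projection {
    /// Columns that survive rewriting, in the order they were requested.
    pub allowed: Vec<String>,
    /// Columns dropped because the token's scope does not cover them.
    pub removed: Vec<String>,
}

/// Keep only the requested columns that appear in `allowed_fields`.
pub fn project_columns(requested: &[&str], allowed_fields: &BTreeSet<String>) -> Projection {
    let mut allowed = Vec::new();
    let mut removed = Vec::new();
    for col in requested {
        if allowed_fields.contains(*col) {
            allowed.push(col.to_string());
        } else {
            removed.push(col.to_string());
        }
    }
    Projection { allowed, removed }
}
```

Returning the removed columns alongside the allowed ones lets the audit step record what was withheld as well as what was returned.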
Data Flow
```
┌──────────┐     ┌──────────────┐     ┌──────────────┐     ┌──────────┐
│ External │     │    Access    │     │Transformation│     │   Core   │
│ Service  │────►│   Control    │────►│   Pipeline   │◄────│ Database │
└──────────┘     └──────────────┘     └──────────────┘     └──────────┘
     │                  │                    │                  │
     │                  │                    │                  │
     │           ┌──────┴───────┐     ┌──────┴───────┐          │
     │           │ Token Store  │     │  Audit Log   │          │
     │           └──────────────┘     └──────────────┘          │
     │                                       │                  │
     └───────────────────────────────────────┘
                    Audit Trail
```
Anonymization Techniques
Kimberlite supports multiple anonymization strategies, configurable per access token or export.
Redaction
Complete removal of sensitive fields:
```rust
// Configuration
Redact
// Before: { "id": 1, "name": "Alice", "ssn": "123-45-6789" }
// After:  { "id": 1, "name": "Alice" }
```
Use cases: Untrusted third parties, public datasets, analytics where the field is irrelevant
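As an illustration only, redaction can be modelled as dropping named keys from a record. `Record` and `redact` are hypothetical names rather than Kimberlite's API, and the sketch assumes records are flat maps of string fields.

```rust
use std::collections::BTreeMap;

/// A record modelled as field name -> string value (an assumption for this sketch).
pub type Record = BTreeMap<String, String>;

/// Return a copy of `record` with every field named in `redacted_fields` removed.
pub fn redact(record: &Record, redacted_fields: &[&str]) -> Record {
    record
        .iter()
        .filter(|(name, _)| !redacted_fields.contains(&name.as_str()))
        .map(|(name, value)| (name.clone(), value.clone()))
        .collect()
}
```

Applied to the record above with `redacted_fields = ["ssn"]`, the result keeps `id` and `name` and carries no trace of the `ssn` key.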
Generalization
Reduce precision to prevent re-identification:
```rust
// Date generalization
Generalize
// Numeric generalization
Generalize
// Geographic generalization
Generalize
```
Use cases: Research datasets, aggregate analytics, demographic analysis
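The three kinds of generalization can be sketched as small pure functions. The function names, the `YYYY-MM-DD` date format, the bucket width, and the `*` masking character are all assumptions chosen for illustration; they are not taken from Kimberlite's configuration format.

```rust
/// True if `s` is non-empty and consists only of ASCII digits.
fn is_digits(s: &str) -> bool {
    !s.is_empty() && s.chars().all(|c| c.is_ascii_digit())
}

/// Date generalization: truncate "YYYY-MM-DD" to "YYYY-MM".
/// Returns `None` if the input does not have that shape.
pub fn generalize_date_to_month(date: &str) -> Option<String> {
    let parts: Vec<&str> = date.split('-').collect();
    match parts.as_slice() {
        [y, m, d]
            if y.len() == 4 && m.len() == 2 && d.len() == 2
                && is_digits(y) && is_digits(m) && is_digits(d) =>
        {
            Some(format!("{}-{}", y, m))
        }
        _ => None,
    }
}

/// Numeric generalization: replace a value with the bucket it falls in,
/// for example 37 with width 10 becomes "30-39".
pub fn generalize_to_bucket(value: u32, width: u32) -> String {
    assert!(width > 0, "bucket width must be positive");
    let low = (value / width) * width;
    format!("{}-{}", low, low.saturating_add(width - 1))
}

/// Geographic generalization: keep the first `keep` characters of a
/// postal code and mask the remainder with '*'.
pub fn generalize_postal_code(code: &str, keep: usize) -> String {
    code.chars()
        .enumerate()
        .map(|(i, c)| if i < keep { c } else { '*' })
        .collect()
}
```

Each function discards precision irreversibly, which is why the comparison table lists generalization as non-reversible.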
Pseudonymization
Replace identifiers with consistent tokens:
```rust
// Configuration
Pseudonymize
// Before: { "user_id": "alice@example.com", "action": "login" }
// After:  { "user_id": "tok_a7b3c9d2e1f0", "action": "login" }
```
Properties:
- Consistent: The same input always produces the same token (within a scope)
- Reversible: With the key, the original value can be recovered
- Scoped: Different tenants get different tokens for the same value
Use cases: Trusted partners, internal analytics, data that needs correlation
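The three properties can be demonstrated with a lookup-table sketch. Note the substitution: this example uses an in-memory table instead of the tenant-specific keys described in this document, so "having the key" is modelled as "having access to the table". `PseudonymVault`, its methods, and the sequential token format are hypothetical.

```rust
use std::collections::HashMap;

/// Hypothetical token vault. A keyed construction would replace the
/// tables in a real system; this sketch only demonstrates the properties.
#[derive(Default)]
pub struct PseudonymVault {
    /// (tenant, original value) -> token
    forward: HashMap<(u64, String), String>,
    /// token -> original value
    reverse: HashMap<String, String>,
    next_id: u64,
}

impl PseudonymVault {
    /// Return the token for `value` within `tenant_id`, creating one on first use.
    pub fn pseudonymize(&mut self, tenant_id: u64, value: &str) -> String {
        let key = (tenant_id, value.to_string());
        if let Some(token) = self.forward.get(&key) {
            return token.clone(); // consistent: same input, same token
        }
        // Sequential ids are for illustration only; they leak insertion order.
        let token = format!("tok_{:012x}", self.next_id);
        self.next_id += 1;
        self.forward.insert(key, token.clone());
        self.reverse.insert(token.clone(), value.to_string());
        token
    }

    /// Recover the original value for a token, if the vault issued it.
    pub fn reveal(&self, token: &str) -> Option<&str> {
        self.reverse.get(token).map(String::as_str)
    }
}
```

Because the tenant id is part of the lookup key, the same email address maps to different tokens for different tenants, so exported datasets cannot be joined across tenants by token.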
Comparison
| Technique | Reversible | Maintains Relationships | Data Utility | Privacy Level |
|---|---|---|---|---|
| Redaction | No | No | Low | Highest |
| Generalization | No | Partial | Medium | High |
| Pseudonymization | With key | Yes | High | Medium |
Token-Based Access Control
Access Token Structure
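A token carries the scopes checked by the access control step: tables, fields, a validity window, and a purpose. The struct below is a minimal sketch of what such a token might hold. Every field name, the `"table.field"` convention, and the `authorize` method are assumptions for illustration, not Kimberlite's actual token format.

```rust
use std::collections::BTreeSet;

/// Hypothetical shape of a scoped access token.
#[derive(Debug, Clone)]
pub struct AccessToken {
    pub id: String,
    pub tenant_id: u64,
    /// Why the data is being shared, recorded with every export.
    pub purpose: String,
    /// Tables the holder may read.
    pub tables: BTreeSet<String>,
    /// Fields the holder may read, written as "table.field".
    pub fields: BTreeSet<String>,
    /// Expiry as seconds since the Unix epoch.
    pub expires_at: u64,
}

#[derive(Debug, PartialEq)]
pub enum AccessError {
    Expired,
    TableNotInScope(String),
    FieldNotInScope(String),
}

impl AccessToken {
    /// Check whether this token permits reading `table.field` at time `now`.
    pub fn authorize(&self, table: &str, field: &str, now: u64) -> Result<(), AccessError> {
        if now >= self.expires_at {
            return Err(AccessError::Expired);
        }
        if !self.tables.contains(table) {
            return Err(AccessError::TableNotInScope(table.to_string()));
        }
        let qualified = format!("{}.{}", table, field);
        if !self.fields.contains(&qualified) {
            return Err(AccessError::FieldNotInScope(qualified));
        }
        Ok(())
    }
}
```

Denying by default (anything not listed is out of scope) keeps a newly added column from being exposed until a token explicitly names it.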
Creating Access Tokens
```rust
// Create a scoped token for analytics partner
let token = tenant.create_access_token?;
```
Token Lifecycle
```
┌─────────────┐     ┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│   Create    │────►│   Active    │────►│   Expired   │────►│   Deleted   │
└─────────────┘     └─────────────┘     └─────────────┘     └─────────────┘
                           │
                           │ (manual)
                           ▼
                    ┌─────────────┐
                    │   Revoked   │
                    └─────────────┘
```
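One way to model this lifecycle is to derive the state from stored facts instead of storing the state itself. In this sketch, `TokenState` and `TokenRecord` are hypothetical names, and the precedence (deleted, then revoked, then expired) is an assumption; the diagram does not say which wins when a revoked token also expires.

```rust
/// States from the lifecycle diagram. "Create" is the act of issuing,
/// so a freshly created token is already `Active`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TokenState {
    Active,
    Expired,
    Revoked,
    Deleted,
}

/// Hypothetical stored facts about a token.
pub struct TokenRecord {
    /// Expiry as seconds since the Unix epoch.
    pub expires_at: u64,
    pub revoked: bool,
    pub deleted: bool,
}

impl TokenRecord {
    /// Derive the current state at time `now`.
    pub fn state(&self, now: u64) -> TokenState {
        if self.deleted {
            TokenState::Deleted
        } else if self.revoked {
            TokenState::Revoked
        } else if now >= self.expires_at {
            TokenState::Expired
        } else {
            TokenState::Active
        }
    }

    /// Manual revocation takes effect on the next `state` check.
    pub fn revoke(&mut self) {
        self.revoked = true;
    }
}
```

Deriving expiry from the clock means no background job has to flip a flag at the moment a token expires.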
Revocation
Tokens can be revoked immediately:
```rust
// Revoke a specific token
tenant.revoke_access_token?;

// Revoke all tokens for a purpose
tenant.revoke_tokens_by_purpose?;

// List all active tokens
let tokens = tenant.list_access_tokens?;
```
Export Audit Trail
Audit Record Structure
Every data access is logged:
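The sketch below suggests what an audit record might contain, based on the columns used in the queries in this section (`tenant_id`, `token_id`, `tables_accessed`, `timestamp`) and on the recipient, purpose, and content hash named in the architecture diagram. The struct name, the remaining field names, the field types, and the helper function are assumptions.

```rust
/// Hypothetical in-memory form of one row of the export audit trail.
#[derive(Debug, Clone, PartialEq)]
pub struct ExportAuditRecord {
    pub export_id: String,
    pub tenant_id: u64,
    pub token_id: String,
    pub recipient: String,
    pub purpose: String,
    pub tables_accessed: Vec<String>,
    /// Seconds since the Unix epoch.
    pub timestamp: u64,
    /// Hash of the exported contents, hex-encoded.
    pub content_hash: String,
}

/// In-memory analogue of filtering the trail by table and time:
/// exports that touched `table` strictly after `after`.
pub fn exports_touching<'a>(
    records: &'a [ExportAuditRecord],
    table: &str,
    after: u64,
) -> Vec<&'a ExportAuditRecord> {
    records
        .iter()
        .filter(|r| r.timestamp > after && r.tables_accessed.iter().any(|t| t.as_str() == table))
        .collect()
}
```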
Querying the Audit Trail
```sql
-- All exports for a tenant
SELECT * FROM __export_audit
WHERE tenant_id = 123
ORDER BY timestamp DESC;

-- Exports by a specific token
SELECT * FROM __export_audit
WHERE token_id = 'tok_abc123';

-- Exports containing specific data
SELECT * FROM __export_audit
WHERE 'customers' = ANY(tables_accessed)
  AND timestamp > '2024-01-01';
```
Cryptographic Proof
Each export includes a proof that can be verified:
This proves:
- The exact data that was exported
- When the export occurred (linked to log position)
- The database attested to this export
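The first two properties can be illustrated by binding a content hash to a log position and to the previous log entry. This is a sketch only: `toy_hash` is FNV-1a, a non-cryptographic stand-in chosen so the example needs no external crates, and the attestation step (a signature by the database) is omitted. `ExportProof`, `make_proof`, and `verify_proof` are hypothetical names; a real proof would need a cryptographic hash such as SHA-256.

```rust
/// FNV-1a, used ONLY as a dependency-free stand-in.
/// It is not collision resistant and must not be used for real proofs.
fn toy_hash(bytes: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf29ce484222325;
    for b in bytes {
        h ^= *b as u64;
        h = h.wrapping_mul(0x100000001b3);
    }
    h
}

/// Hypothetical proof attached to an export.
#[derive(Debug, Clone, PartialEq)]
pub struct ExportProof {
    pub log_position: u64,
    pub prev_hash: u64,
    pub content_hash: u64,
    pub entry_hash: u64,
}

/// Hash the log position, the previous entry's hash, and the content hash together.
fn chain_hash(log_position: u64, prev_hash: u64, content_hash: u64) -> u64 {
    let mut buf = Vec::with_capacity(24);
    buf.extend_from_slice(&log_position.to_be_bytes());
    buf.extend_from_slice(&prev_hash.to_be_bytes());
    buf.extend_from_slice(&content_hash.to_be_bytes());
    toy_hash(&buf)
}

/// Build a proof for `content` exported at `log_position`.
pub fn make_proof(log_position: u64, prev_hash: u64, content: &[u8]) -> ExportProof {
    let content_hash = toy_hash(content);
    let entry_hash = chain_hash(log_position, prev_hash, content_hash);
    ExportProof { log_position, prev_hash, content_hash, entry_hash }
}

/// Check that `content` matches the proof and that the proof is internally consistent.
pub fn verify_proof(proof: &ExportProof, content: &[u8]) -> bool {
    toy_hash(content) == proof.content_hash
        && chain_hash(proof.log_position, proof.prev_hash, proof.content_hash) == proof.entry_hash
}
```

Because each entry hash covers the previous entry's hash, altering or reordering an earlier export changes every later entry hash, which is what ties an export to its position in the log.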
MCP Integration
Overview
Kimberlite will provide an MCP (Model Context Protocol) server for LLM and AI agent access. This allows AI systems to query data while respecting access controls.
MCP Tools
```
// Query tool - read data with automatic redaction
{
  name: "kmb_query",
  description: "Query Kimberlite with automatic access control enforcement",
  parameters: {
    sql: "string",    // The SQL query
    token: "string",  // Access token for authorization
  }
}

// Export tool - bulk data export
{
  name: "kmb_export",
  description: "Export data from Kimberlite with transformations",
  parameters: {
    tables: "string[]",
    format: "json | csv",
    token: "string",
  }
}

// Verify tool - verify data integrity
{
  name: "kmb_verify",
  description: "Verify the integrity of previously exported data",
  parameters: {
    export_id: "string",
    content_hash: "string",
  }
}
```
Safety Features
MCP access includes additional safety measures:
- Query validation: Prevents data exfiltration patterns
- Rate limiting: Per-token and global limits
- Result size limits: Configurable maximum rows per query
- Audit emphasis: All MCP access prominently logged
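Two of these measures, per-token rate limiting and result size limits, are simple enough to sketch. The fixed-window algorithm, the `RateLimiter` type, and the `cap_rows` helper are assumptions for illustration; this document does not specify which limiting strategy the MCP server will use.

```rust
use std::collections::HashMap;

/// Hypothetical fixed-window rate limiter keyed by token id.
pub struct RateLimiter {
    max_per_window: u32,
    window_secs: u64,
    /// token id -> (window index, requests seen in that window)
    counters: HashMap<String, (u64, u32)>,
}

impl RateLimiter {
    pub fn new(max_per_window: u32, window_secs: u64) -> Self {
        assert!(window_secs > 0, "window must be at least one second");
        RateLimiter { max_per_window, window_secs, counters: HashMap::new() }
    }

    /// Record a request at time `now` (seconds) and report whether it is allowed.
    pub fn allow(&mut self, token_id: &str, now: u64) -> bool {
        let window = now / self.window_secs;
        let entry = self.counters.entry(token_id.to_string()).or_insert((window, 0));
        if entry.0 != window {
            *entry = (window, 0); // a new window resets the count
        }
        if entry.1 < self.max_per_window {
            entry.1 += 1;
            true
        } else {
            false
        }
    }
}

/// Result size limit: return at most `max_rows` rows.
pub fn cap_rows<T>(mut rows: Vec<T>, max_rows: usize) -> Vec<T> {
    rows.truncate(max_rows);
    rows
}
```

A global limit could reuse the same type with a single shared key in place of the token id.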
Example MCP Session
User: What were our top 10 products last quarter?
LLM: [Calls kmb_query with analytics token]
Kimberlite:
- Validates token scope includes "products" and "orders"
- Rewrites query to exclude unauthorized fields
- Applies generalization to dates
- Logs the access
- Returns transformed results
LLM: Based on the data, your top products were...
Compliance Considerations
GDPR Data Sharing
| Requirement | Kimberlite Feature |
|---|---|
| Lawful basis | Purpose tracking in tokens |
| Data minimization | Field-level scoping |
| Right to be informed | Export audit trail |
| Accountability | Cryptographic proof of exports |
HIPAA Minimum Necessary
Kimberlite enforces the “minimum necessary” standard:
- Tokens specify exactly which fields are needed
- Generalization reduces precision where full values aren’t required
- Redaction removes fields entirely when not needed
Audit Requirements
For regulated industries, Kimberlite provides:
- Complete record of all data exports
- Cryptographic proof linking exports to database state
- Retention of audit records independent of data retention
- Export of audit trails for compliance review
Summary
Kimberlite’s secure data sharing capabilities ensure that:
- Data is protected: Anonymization techniques prevent over-exposure
- Access is controlled: Scoped tokens limit what can be accessed
- Everything is audited: Complete trail of all data sharing
- Compliance is maintained: Built-in support for regulatory requirements
For implementation details, see:
- ARCHITECTURE.md - System architecture
- COMPLIANCE.md - Regulatory compliance
- ROADMAP.md - Future enhancements and implementation roadmap