Data Classification
Tag and manage data by sensitivity level in Kimberlite.
Overview
Data classification helps you:
- Identify sensitive data (PHI, PII, PCI, etc.)
- Apply appropriate security controls
- Generate compliance reports
- Enforce access policies
Classification Levels
Common classification schemes:
Healthcare (HIPAA)
| Level | Description | Examples |
|---|---|---|
| PHI | Protected Health Information | Name + DOB, Medical records, SSN |
| De-identified | HIPAA Safe Harbor compliant | Age range, first 3 digits of ZIP code |
| Public | No PHI | Facility hours, Public health stats |
General Purpose
| Level | Description | Examples |
|---|---|---|
| Restricted | Highly sensitive | SSN, Credit cards, Passwords |
| Confidential | Sensitive business data | Financial records, Contracts |
| Internal | Internal use only | Employee directory, Policies |
| Public | Public information | Marketing materials, Website |
Schema Design
Add a classification column to track sensitivity:
```sql
CREATE TABLE patients (
    id BIGINT PRIMARY KEY,
    name TEXT NOT NULL,
    date_of_birth DATE,
    ssn_encrypted TEXT,
    classification TEXT DEFAULT 'PHI', -- Data classification
    created_at TIMESTAMP
);

-- Track classification at the column level
CREATE TABLE data_classifications (
    table_name TEXT NOT NULL,
    column_name TEXT NOT NULL,
    classification TEXT NOT NULL,
    reason TEXT,
    PRIMARY KEY (table_name, column_name)
);

-- Insert classification metadata
INSERT INTO data_classifications VALUES
    ('patients', 'name', 'PHI', 'Directly identifies patient'),
    ('patients', 'date_of_birth', 'PHI', 'Part of HIPAA identifiers'),
    ('patients', 'ssn_encrypted', 'RESTRICTED', 'Encrypted SSN');
```
Classification Types
Define an enum for type safety:
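A minimal sketch of such an enum in Rust, assuming the values stored in the `classification` column are the upper-case strings used in this page's SQL (`'PHI'`, `'DE_IDENTIFIED'`, `'RESTRICTED'`, `'PCI'`). The type name `DataClass` and the spellings `'PUBLIC'`, `'INTERNAL'`, and `'CONFIDENTIAL'` are hypothetical, not part of a documented Kimberlite API.

```rust
use std::fmt;
use std::str::FromStr;

/// Hypothetical classification levels (not a documented Kimberlite type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DataClass {
    Public,
    Internal,
    Confidential,
    Restricted,
    Phi,
    DeIdentified,
    Pci,
}

impl DataClass {
    /// Value stored in the `classification` column (assumed spelling).
    pub fn as_str(self) -> &'static str {
        match self {
            DataClass::Public => "PUBLIC",
            DataClass::Internal => "INTERNAL",
            DataClass::Confidential => "CONFIDENTIAL",
            DataClass::Restricted => "RESTRICTED",
            DataClass::Phi => "PHI",
            DataClass::DeIdentified => "DE_IDENTIFIED",
            DataClass::Pci => "PCI",
        }
    }
}

impl fmt::Display for DataClass {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(self.as_str())
    }
}

impl FromStr for DataClass {
    type Err = String;

    // Unknown or mis-cased values are rejected rather than mapped to a default.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "PUBLIC" => Ok(DataClass::Public),
            "INTERNAL" => Ok(DataClass::Internal),
            "CONFIDENTIAL" => Ok(DataClass::Confidential),
            "RESTRICTED" => Ok(DataClass::Restricted),
            "PHI" => Ok(DataClass::Phi),
            "DE_IDENTIFIED" => Ok(DataClass::DeIdentified),
            "PCI" => Ok(DataClass::Pci),
            other => Err(format!("unknown classification: {other}")),
        }
    }
}
```

Returning an error for unknown strings, instead of falling back to a default level, means a typo such as `'Phi'` surfaces immediately rather than silently mislabeling a record.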
Tagging Data
At Insert Time
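One way to make insert-time classification hard to skip is to require it when the row is built. The sketch below assumes a hand-written `NewPatient` type and `$1`-style placeholders; both are hypothetical, so adapt the statement to whichever client library you use with Kimberlite.

```rust
/// Hypothetical subset of levels used by this example.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Classification {
    Phi,
    DeIdentified,
    Public,
}

impl Classification {
    fn as_str(self) -> &'static str {
        match self {
            Classification::Phi => "PHI",
            Classification::DeIdentified => "DE_IDENTIFIED",
            Classification::Public => "PUBLIC",
        }
    }
}

/// Row for the `patients` table. Fields are private, so code outside this
/// module can only build one through `new`, which requires a classification.
#[derive(Debug)]
struct NewPatient {
    id: i64,
    name: String,
    classification: Classification,
}

impl NewPatient {
    fn new(id: i64, name: impl Into<String>, classification: Classification) -> Self {
        NewPatient { id, name: name.into(), classification }
    }

    /// Parameterized INSERT and its values in placeholder order.
    /// `$n` placeholders are an assumption; use your driver's syntax.
    fn insert_statement(&self) -> (&'static str, Vec<String>) {
        (
            "INSERT INTO patients (id, name, classification) VALUES ($1, $2, $3)",
            vec![
                self.id.to_string(),
                self.name.clone(),
                self.classification.as_str().to_string(),
            ],
        )
    }
}
```

Because the classification is a constructor argument, forgetting it becomes a compile error rather than a row that silently picks up the column default.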
Bulk Classification
```sql
-- Classify all existing patients as PHI
UPDATE patients SET classification = 'PHI';

-- Classify based on content
UPDATE patients
SET classification = 'DE_IDENTIFIED'
WHERE name IS NULL OR name = 'REDACTED';
```
Querying by Classification
```sql
-- Find all PHI records
SELECT * FROM patients WHERE classification = 'PHI';

-- Find all restricted data
SELECT table_name, column_name
FROM data_classifications
WHERE classification = 'RESTRICTED';

-- Count records by classification
SELECT classification, COUNT(*)
FROM patients
GROUP BY classification;
```
Access Control by Classification
Enforce policies based on classification:
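A deny-by-default policy sketch: every role starts with no access, public data is readable by everyone, and anything else requires an explicit grant. The `Role` variants, the `Policy` type, and its `grant`/`can_access` methods are hypothetical illustrations, not Kimberlite APIs.

```rust
use std::collections::{HashMap, HashSet};

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum Classification {
    Public,
    Internal,
    Confidential,
    DeIdentified,
    Phi,
    Pci,
    Restricted,
}

/// Hypothetical application roles.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum Role {
    Clinician,
    Billing,
    Analyst,
}

/// Deny by default: a role sees a classification only if it was granted.
#[derive(Default)]
struct Policy {
    grants: HashMap<Role, HashSet<Classification>>,
}

impl Policy {
    fn grant(&mut self, role: Role, classes: &[Classification]) {
        self.grants.entry(role).or_default().extend(classes.iter().copied());
    }

    fn can_access(&self, role: Role, class: Classification) -> bool {
        // Public data is always readable; everything else needs a grant.
        class == Classification::Public
            || self.grants.get(&role).map_or(false, |set| set.contains(&class))
    }
}
```

Keeping grants as data rather than hard-coded `match` arms lets the policy be loaded from configuration and reviewed alongside the `data_classifications` table.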
De-identification
Convert PHI to de-identified data (HIPAA Safe Harbor):
```rust
use chrono::Datelike; // chrono trait providing .year(), e.g. to keep only the birth year
```
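Under Safe Harbor, all elements of dates except the year must be removed; ages over 89, and any date element (including the year) that reveals such an age, must be removed or aggregated into a single "90 or older" category; and ZIP codes may keep only their first three digits, which must become `000` when the area sharing that prefix contains 20,000 or fewer people. The std-only sketch below applies just those rules plus dropping the name; it does not cover the other Safe Harbor identifiers. The record types, the `(year, month, day)` date tuple, and the 10-year age bands are assumptions, and the caller must supply the list of small ZIP prefixes from current Census data.

```rust
/// Hypothetical PHI row; `date_of_birth` is (year, month, day).
#[allow(dead_code)] // `name` is deliberately never copied to the output
struct PatientRecord {
    name: String,
    date_of_birth: (i32, u32, u32),
    zip: String,
}

#[derive(Debug, PartialEq)]
struct DeidentifiedRecord {
    birth_year: Option<i32>, // None when the age is 90 or older
    age_range: String,       // e.g. "40-49", or "90+"
    zip3: String,            // first three ZIP digits, or "000"
}

/// Age in completed years on `as_of`.
fn age_on(dob: (i32, u32, u32), as_of: (i32, u32, u32)) -> i32 {
    let mut age = as_of.0 - dob.0;
    // Birthday not yet reached in the `as_of` year.
    if (as_of.1, as_of.2) < (dob.1, dob.2) {
        age -= 1;
    }
    age
}

/// Drops the name, keeps only the birth year (none at 90+), and truncates
/// the ZIP code. `small_zip3` lists prefixes covering <= 20,000 people.
fn deidentify(r: &PatientRecord, as_of: (i32, u32, u32), small_zip3: &[&str]) -> DeidentifiedRecord {
    let age = age_on(r.date_of_birth, as_of);
    let (birth_year, age_range) = if age >= 90 {
        (None, "90+".to_string())
    } else {
        let low = age / 10 * 10;
        (Some(r.date_of_birth.0), format!("{}-{}", low, low + 9))
    };
    let prefix: String = r.zip.chars().take(3).collect();
    let zip3 = if small_zip3.iter().any(|z| *z == prefix) {
        "000".to_string()
    } else {
        prefix
    };
    DeidentifiedRecord { birth_year, age_range, zip3 }
}
```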
Compliance Reports
HIPAA Data Inventory
```sql
-- All PHI fields in the database, limited to tables that still exist
SELECT DISTINCT
    dc.table_name,
    dc.column_name,
    dc.classification
FROM data_classifications dc
JOIN information_schema.tables t ON t.table_name = dc.table_name
WHERE dc.classification = 'PHI'
ORDER BY dc.table_name, dc.column_name;
-- Per-table record counts require a separate COUNT(*) against each listed table.
```
PCI DSS Data Report
```sql
-- All PCI data (should be minimal)
SELECT * FROM data_classifications
WHERE classification = 'PCI';
```
Data Minimization Report
```sql
-- Check for unnecessary sensitive data
-- Assumes data_classifications also records a last_accessed TIMESTAMP column
SELECT
    table_name,
    column_name,
    classification,
    last_accessed
FROM data_classifications
WHERE classification IN ('RESTRICTED', 'PHI', 'PCI')
  AND last_accessed < NOW() - INTERVAL '1 year';
```
Automatic Classification
Use rules to classify data automatically:
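A minimal rule-based classifier sketch: column names are split on underscores and matched against keyword lists checked from most to least sensitive, and sampled values shaped like a US SSN override the name rules. The keyword lists, function names, and returned labels are assumptions for illustration; a `None` result means no rule matched and the column still needs a human decision.

```rust
/// A hypothetical naming rule: any matching token assigns `classification`.
struct Rule {
    keywords: &'static [&'static str],
    classification: &'static str,
}

/// Checked in order, most sensitive first. Illustrative, not exhaustive.
const RULES: &[Rule] = &[
    Rule { keywords: &["ssn", "password", "secret"], classification: "RESTRICTED" },
    Rule { keywords: &["card", "cvv", "pan"], classification: "PCI" },
    Rule { keywords: &["name", "birth", "dob", "mrn", "diagnosis"], classification: "PHI" },
];

/// Classifies by column name; `None` means no rule matched.
fn classify_column(column: &str) -> Option<&'static str> {
    // Split `date_of_birth` into ["date", "of", "birth"] so that `pan`
    // does not match inside `company`.
    let tokens: Vec<String> = column
        .split('_')
        .filter(|t| !t.is_empty())
        .map(|t| t.to_lowercase())
        .collect();
    RULES
        .iter()
        .find(|rule| rule.keywords.iter().any(|k| tokens.iter().any(|t| t.as_str() == *k)))
        .map(|rule| rule.classification)
}

/// True for values shaped like `ddd-dd-dddd`.
fn looks_like_ssn(value: &str) -> bool {
    let bytes = value.as_bytes();
    bytes.len() == 11
        && bytes.iter().enumerate().all(|(i, b)| match i {
            3 | 6 => *b == b'-',
            _ => b.is_ascii_digit(),
        })
}

/// Content rules win: an SSN-shaped sample makes any column RESTRICTED.
fn classify(column: &str, samples: &[&str]) -> Option<&'static str> {
    if samples.iter().any(|v| looks_like_ssn(v)) {
        return Some("RESTRICTED");
    }
    classify_column(column)
}
```

Rules like these are heuristics: record their output in `data_classifications` with a reason and have someone confirm it.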
Audit Classification Changes
```sql
-- Track classification changes
CREATE TABLE classification_audit (
    id BIGINT PRIMARY KEY,
    table_name TEXT NOT NULL,
    record_id BIGINT NOT NULL,
    old_classification TEXT,
    new_classification TEXT NOT NULL,
    changed_by BIGINT NOT NULL,
    changed_at TIMESTAMP NOT NULL,
    reason TEXT
);

-- Trigger on classification change.
-- The trigger name is illustrative. The INSERT omits id, so id must be
-- generated automatically, and current_user_id must resolve to the acting user.
CREATE TRIGGER audit_patient_classification
AFTER UPDATE OF classification ON patients
FOR EACH ROW
BEGIN
    INSERT INTO classification_audit (
        table_name, record_id, old_classification, new_classification,
        changed_by, changed_at, reason
    ) VALUES (
        'patients', NEW.id, OLD.classification, NEW.classification,
        current_user_id, CURRENT_TIMESTAMP, 'Classification updated'
    );
END;
```
Labeling and Watermarking
Add visual indicators for sensitive data:
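A sketch of two simple indicators: an inline tag for individual values, and a banner plus per-user watermark for exported documents, so a leaked copy shows both its sensitivity and who exported it. The function names and banner wording are hypothetical.

```rust
/// Tags a single value for display, e.g. in a UI table cell.
fn label_field(classification: &str, value: &str) -> String {
    format!("[{classification}] {value}")
}

/// Wraps exported content in a banner (top and bottom) and a watermark
/// naming the exporting user and time.
fn label_export(classification: &str, user_id: u64, exported_at: &str, body: &str) -> String {
    let banner = format!("=== {classification} - AUTHORIZED USE ONLY ===");
    let watermark = format!("Exported by user {user_id} at {exported_at}");
    format!("{banner}\n{body}\n{watermark}\n{banner}")
}
```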
Best Practices
1. Classify Early
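```sql
-- Good: classify at insert time (sample values are illustrative)
INSERT INTO patients (id, name, date_of_birth, classification, created_at)
VALUES (1, 'Jane Doe', '1980-06-15', 'PHI', CURRENT_TIMESTAMP);

-- Bad: insert now and classify later with a separate UPDATE (easy to forget)
INSERT INTO patients (id, name, date_of_birth, created_at)
VALUES (2, 'John Roe', '1975-03-02', CURRENT_TIMESTAMP);
```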
2. Document Classification Decisions
```sql
INSERT INTO data_classifications (table_name, column_name, classification, reason)
VALUES ('patients', 'ssn_encrypted', 'RESTRICTED', 'Contains encrypted SSN per HIPAA requirements');
```
3. Review Classifications Regularly
Schedule a quarterly data classification review.
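A sketch of how the review could be driven: flag every classification whose last review is at least 90 days old. It assumes a `last_reviewed_day` value (days since an arbitrary epoch) tracked per column, which is not part of the `data_classifications` schema shown earlier; the type and function names are hypothetical.

```rust
/// Hypothetical review record for one classified column.
struct ClassificationEntry {
    table_name: &'static str,
    column_name: &'static str,
    last_reviewed_day: u32, // days since an arbitrary epoch (assumed)
}

/// Roughly one quarter.
const REVIEW_INTERVAL_DAYS: u32 = 90;

/// Returns the entries whose last review is at least one interval old.
fn due_for_review(entries: &[ClassificationEntry], today: u32) -> Vec<&ClassificationEntry> {
    entries
        .iter()
        .filter(|e| today.saturating_sub(e.last_reviewed_day) >= REVIEW_INTERVAL_DAYS)
        .collect()
}
```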
4. Enforce Access Controls
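```rust
// Check before allowing access (`policy`, `user`, `record`, and
// `Error::AccessDenied` are illustrative names)
if !policy.can_access(user.role, record.classification) {
    return Err(Error::AccessDenied);
}
```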
5. Minimize Sensitive Data
```sql
-- Bad: Store full SSN when last 4 digits suffice
ssn TEXT

-- Good: Store only what's needed
ssn_last_4 TEXT
```
Related Documentation
- Compliance - Compliance architecture
- Encryption - Encrypting sensitive data
- Multi-tenancy - Tenant isolation
Key Takeaway: Data classification helps you understand what data you have, how sensitive it is, and what controls are needed. Start classifying from day one.