A safe-by-construction schema toolkit for ClickHouse — built for user-defined, multi-tenant schemas.
When your customers' data shapes are defined at runtime, you end up turning untrusted input into SQL. smooai-clickhouse-kit owns that boundary so the happy path makes SQL injection and unbounded tables impossible, not merely discouraged — an allowlisted type system, identifier validation, DDL generation, forward-only migrations, additive evolution, and drift detection. Rows stay Serde-native (use the clickhouse crate's #[derive(Row)]), so the kit never reimplements row mapping.
[dependencies]
smooai-clickhouse-kit = "0.1"
The crate is smooai-clickhouse-kit; it imports as clickhouse_kit (use clickhouse_kit::...).
A column type can come straight from a customer config / JSON. The allowlist is an enum — disallowed types like Decimal, FixedString, Tuple, or arbitrary expressions simply have no representation, so they fail to deserialize at the boundary. There is no path to an arbitrary type string reaching the DDL.
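To make "no representation" concrete, here is a minimal std-only sketch of a closed type set. It does not use serde and is not the crate's actual `ColumnTypeSpec`; the names `SketchType`, `parse_type`, and `render_type` are hypothetical, and the fixed `DateTime64(3)` precision is an assumption chosen for illustration.

```rust
/// Hypothetical closed set of column types (not the crate's real enum).
#[derive(Debug, Clone, PartialEq)]
enum SketchType {
    String,
    Uuid,
    Float64,
    DateTime64,
    LowCardinality(Box<SketchType>),
}

/// Parse an untrusted type name. Anything outside the closed set is an
/// error, so no caller-supplied string is ever carried forward.
fn parse_type(input: &str) -> Result<SketchType, String> {
    let s = input.trim();
    if let Some(inner) = s
        .strip_prefix("LowCardinality(")
        .and_then(|rest| rest.strip_suffix(')'))
    {
        return Ok(SketchType::LowCardinality(Box::new(parse_type(inner)?)));
    }
    match s {
        "String" => Ok(SketchType::String),
        "UUID" => Ok(SketchType::Uuid),
        "Float64" => Ok(SketchType::Float64),
        "DateTime64" => Ok(SketchType::DateTime64),
        other => Err(format!("type not in allowlist: {other}")),
    }
}

/// Render from the enum only: the output is assembled from fixed literals,
/// never from the original input text.
fn render_type(t: &SketchType) -> String {
    match t {
        SketchType::String => "String".to_string(),
        SketchType::Uuid => "UUID".to_string(),
        SketchType::Float64 => "Float64".to_string(),
        // Assumed fixed precision for this sketch.
        SketchType::DateTime64 => "DateTime64(3)".to_string(),
        SketchType::LowCardinality(inner) => {
            format!("LowCardinality({})", render_type(inner))
        }
    }
}
```

In this sketch, `parse_type("Decimal(10, 2)")` returns an error because no variant exists for it, and the DDL text comes only from `render_type`, so the input string itself is never interpolated.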
use clickhouse_kit::{
    to_create_table_sql, ColumnSpec, ColumnTypeSpec, ScalarType, SchemaLimits, TableSpec,
};
// `{"lowCardinality": "String"}` from untrusted JSON — `Decimal(...)` here would be rejected.
let org_type: ColumnTypeSpec = serde_json::from_str(r#"{"lowCardinality":"String"}"#)?;
let table = TableSpec {
    name: "events".into(),
    columns: vec![
        ColumnSpec { name: "id".into(), type_spec: ColumnTypeSpec::Scalar(ScalarType::Uuid), default: None },
        ColumnSpec { name: "org".into(), type_spec: org_type, default: None },
        ColumnSpec { name: "ts".into(), type_spec: ColumnTypeSpec::Scalar(ScalarType::DateTime64), default: None },
    ],
    engine: "MergeTree()".into(),
    order_by: vec!["id".into()],
};
let ddl = to_create_table_sql(&table, &SchemaLimits::default())?;
// CREATE TABLE IF NOT EXISTS events (
// id UUID,
// org LowCardinality(String),
// ts DateTime64(3)
// )
// ENGINE = MergeTree()
// ORDER BY (id)
Every identifier is validated (^[A-Za-z_][A-Za-z0-9_]*$ plus a length bound, backtick-quoted on render), column counts are bounded, and ORDER BY entries must be real columns, so a malicious table/column name can't inject SQL.
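The identifier rule above can be sketched without any dependencies. This is an illustration under assumptions, not the crate's implementation: `quote_identifier`, `check_order_by`, and the `max_len` parameter are hypothetical names, and the length is measured in bytes here.

```rust
/// Check `name` against ^[A-Za-z_][A-Za-z0-9_]*$ plus a length bound, then
/// return it wrapped in backticks. `max_len` is an assumed parameter.
fn quote_identifier(name: &str, max_len: usize) -> Result<String, String> {
    let mut chars = name.chars();
    // First character: ASCII letter or underscore (an empty name fails here).
    let first_ok = matches!(chars.next(), Some(c) if c.is_ascii_alphabetic() || c == '_');
    // Remaining characters: ASCII letters, digits, or underscore.
    let rest_ok = chars.all(|c| c.is_ascii_alphanumeric() || c == '_');
    if !first_ok || !rest_ok {
        return Err(format!("invalid identifier: {name:?}"));
    }
    if name.len() > max_len {
        return Err(format!("identifier longer than {max_len} bytes"));
    }
    // Safe to quote: the character set above cannot contain a backtick.
    Ok(format!("`{name}`"))
}

/// Every ORDER BY entry must name a declared column.
fn check_order_by(columns: &[&str], order_by: &[&str]) -> Result<(), String> {
    for key in order_by {
        if !columns.contains(key) {
            return Err(format!("ORDER BY references unknown column: {key:?}"));
        }
    }
    Ok(())
}
```

Because the accepted character set excludes backticks, spaces, and semicolons, a name such as `events; DROP TABLE x` is rejected before any SQL is assembled, and quoting the survivors is a second layer rather than the only one.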
The most-reused multi-tenant shape in one call — your mandatory + promoted typed columns, plus an attrs Map(String, String) catch-all and a raw String:
use clickhouse_kit::{flexible_table, FlexibleConfig, ColumnSpec, ColumnTypeSpec, ScalarType, SchemaLimits};
let table = flexible_table(
    "customer_events",
    FlexibleConfig {
        mandatory: vec![ColumnSpec { name: "ts".into(), type_spec: ColumnTypeSpec::Scalar(ScalarType::DateTime64), default: None }],
        promoted: vec![ColumnSpec { name: "amount".into(), type_spec: ColumnTypeSpec::Scalar(ScalarType::Float64), default: None }],
        engine: "MergeTree()".into(),
        order_by: vec!["ts".into()],
        reserved: None, // defaults to ["attrs", "raw"]
    },
    &SchemaLimits::default(),
)?;
Shape an arbitrary record to fit a (possibly dynamic) table. Known keys land in their columns, the long tail flattens into attrs, and raw captures the original:
use clickhouse_kit::{coerce_to_table, FlattenOptions};
let result = coerce_to_table(input_json, &table, &FlattenOptions::default());
// result.row: BTreeMap<String, Value> ready to insert · result.overflow_keys: what went to `attrs`
The I/O layer is written against a tiny ChExecutor trait, so the crate never depends on a concrete ClickHouse driver. Implement it over the clickhouse crate (or any client):
use clickhouse_kit::{run_migrations, check_drift};
// forward-only, tracked in `_ch_migrations`; already-applied files are skipped
let applied = run_migrations(&exec, std::path::Path::new("clickhouse/migrations")).await?;
// compare the live schema (system.columns) to your TableSpecs
let drift = check_drift(&exec, &[table]).await?;
For growing a per-tenant table, diff_columns + alter_add_columns_sql emit a guarded, additive-only ALTER TABLE … ADD COLUMN IF NOT EXISTS … (identifiers quoted; types from your trusted spec, never from the live DB).
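The additive-only idea can be pictured with a small std-only sketch. It is not the crate's `diff_columns` or `alter_add_columns_sql`; the helpers `missing_columns` and `add_columns_sql` are hypothetical, and the sketch assumes identifiers were already validated and type strings were rendered from the trusted spec.

```rust
/// Columns from the trusted spec, as (name, rendered type), that the live
/// table does not have yet. Columns that exist only in the live table are
/// ignored, so nothing is ever dropped or altered.
fn missing_columns<'a>(
    desired: &[(&'a str, &'a str)],
    live: &[&str],
) -> Vec<(&'a str, &'a str)> {
    desired
        .iter()
        .copied()
        .filter(|(name, _)| !live.iter().any(|l| l == name))
        .collect()
}

/// One guarded statement per missing column. Assumes `table` and each
/// column name have already passed identifier validation.
fn add_columns_sql(table: &str, missing: &[(&str, &str)]) -> Vec<String> {
    missing
        .iter()
        .map(|(name, ty)| {
            format!("ALTER TABLE `{table}` ADD COLUMN IF NOT EXISTS `{name}` {ty}")
        })
        .collect()
}
```

The type text in each statement comes from the `desired` side only. The live schema contributes nothing but column names to compare against, which mirrors the "never from the live DB" rule stated above.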
- Safe by construction. Types outside the allowlist are unrepresentable, identifiers are validated and quoted, and tables are bounded. The dangerous bits are impossible, not discouraged.
- Rows are Serde-native. Use #[derive(clickhouse::Row, Deserialize)] for reads; the kit doesn't reinvent row mapping.
- Forward-only. No auto-diff engine; schema changes are explicit migrations. The additive ALTER path for dynamic per-tenant tables is separate and bounded.
- Tested against real ClickHouse. The migration runner, drift gate, and DDL round-trip are verified via testcontainers in CI, not just string assertions.
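The forward-only rule in the list above boils down to "apply what has not been recorded yet, in order". Below is a minimal sketch of that selection step. It assumes migration files are ordered lexicographically by file name, which this README does not state, and `pending_migrations` is a hypothetical helper rather than part of the crate.

```rust
/// File names present on disk that are not yet recorded as applied,
/// returned in lexicographic order (an assumed ordering convention).
fn pending_migrations(on_disk: &[&str], applied: &[&str]) -> Vec<String> {
    let mut pending: Vec<String> = on_disk
        .iter()
        .filter(|name| !applied.iter().any(|a| a == *name))
        .map(|name| name.to_string())
        .collect();
    pending.sort();
    pending
}
```

There is deliberately no "down" direction in this sketch: an applied file is only ever skipped, never reverted or re-run.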
MIT © SmooAI