smooai-clickhouse-kit

A safe-by-construction schema toolkit for ClickHouse — built for user-defined, multi-tenant schemas.

When your customers' data shapes are defined at runtime, you end up turning untrusted input into SQL. smooai-clickhouse-kit owns that boundary so that, on the happy path, SQL injection and unbounded tables are impossible rather than merely discouraged. It provides an allowlisted type system, identifier validation, DDL generation, forward-only migrations, additive evolution, and drift detection. Rows stay Serde-native (use the clickhouse crate's #[derive(Row)]), so the kit never reimplements row mapping.

[dependencies]
smooai-clickhouse-kit = "0.1"

The crate is published as smooai-clickhouse-kit and imported as clickhouse_kit: use clickhouse_kit::....

Turn untrusted input into safe DDL

A column type can come straight from a customer config / JSON. The allowlist is an enum — disallowed types like Decimal, FixedString, Tuple, or arbitrary expressions simply have no representation, so they fail to deserialize at the boundary. There is no path to an arbitrary type string reaching the DDL.

use clickhouse_kit::{
    to_create_table_sql, ColumnSpec, ColumnTypeSpec, ScalarType, SchemaLimits, TableSpec,
};

// `{"lowCardinality": "String"}` from untrusted JSON — `Decimal(...)` here would be rejected.
let org_type: ColumnTypeSpec = serde_json::from_str(r#"{"lowCardinality":"String"}"#)?;

let table = TableSpec {
    name: "events".into(),
    columns: vec![
        ColumnSpec { name: "id".into(),  type_spec: ColumnTypeSpec::Scalar(ScalarType::Uuid),       default: None },
        ColumnSpec { name: "org".into(), type_spec: org_type,                                       default: None },
        ColumnSpec { name: "ts".into(),  type_spec: ColumnTypeSpec::Scalar(ScalarType::DateTime64), default: None },
    ],
    engine: "MergeTree()".into(),
    order_by: vec!["id".into()],
};

let ddl = to_create_table_sql(&table, &SchemaLimits::default())?;
// CREATE TABLE IF NOT EXISTS events (
//     id UUID,
//     org LowCardinality(String),
//     ts DateTime64(3)
// )
// ENGINE = MergeTree()
// ORDER BY (id)
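
The "no representation" idea can be illustrated without Serde. The sketch below is not the crate's code: `Scalar`, `ColumnType`, and the textual input format are hypothetical stand-ins (the kit itself deserializes JSON into `ColumnTypeSpec`). It only shows why a closed enum leaves no route from an arbitrary string to the rendered DDL.

```rust
/// Hypothetical closed set of scalar types (a stand-in for the crate's allowlist).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Scalar {
    String,
    Uuid,
    Float64,
    DateTime64,
}

/// Hypothetical column type: a scalar, optionally wrapped in LowCardinality.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ColumnType {
    Scalar(Scalar),
    LowCardinality(Scalar),
}

impl Scalar {
    /// Only exact, known tags map to a variant; everything else is rejected.
    fn parse(tag: &str) -> Option<Scalar> {
        match tag {
            "String" => Some(Scalar::String),
            "UUID" => Some(Scalar::Uuid),
            "Float64" => Some(Scalar::Float64),
            "DateTime64" => Some(Scalar::DateTime64),
            _ => None,
        }
    }

    /// Rendering uses fixed literals, never the caller's input text.
    fn render(self) -> &'static str {
        match self {
            Scalar::String => "String",
            Scalar::Uuid => "UUID",
            Scalar::Float64 => "Float64",
            Scalar::DateTime64 => "DateTime64(3)",
        }
    }
}

impl ColumnType {
    /// Accepts `Tag` or `LowCardinality(Tag)` (an assumed textual format).
    fn parse(input: &str) -> Option<ColumnType> {
        if let Some(inner) = input
            .strip_prefix("LowCardinality(")
            .and_then(|rest| rest.strip_suffix(')'))
        {
            return Scalar::parse(inner).map(ColumnType::LowCardinality);
        }
        Scalar::parse(input).map(ColumnType::Scalar)
    }

    fn render(self) -> String {
        match self {
            ColumnType::Scalar(s) => s.render().to_string(),
            ColumnType::LowCardinality(s) => format!("LowCardinality({})", s.render()),
        }
    }
}
```

Because `render` only ever emits literals chosen by the enum variant, an input such as `Decimal(10, 2)` or a string carrying extra SQL has nothing to parse into and stops at the boundary.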

Every identifier is validated (^[A-Za-z_][A-Za-z0-9_]*$ + a length bound, backtick-quoted on render), column counts are bounded, and ORDER BY entries must be real columns — so a malicious table/column name can't inject SQL.
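
As a rough illustration of the identifier rule, the check can be written with the standard library alone. This is a sketch, not the crate's implementation: `MAX_IDENT_LEN`, `is_valid_identifier`, and `quote_identifier` are hypothetical names, and the bound of 64 is an assumed value.

```rust
/// Assumed length bound for this sketch; the crate's real limit is not shown here.
const MAX_IDENT_LEN: usize = 64;

/// True if `name` matches ^[A-Za-z_][A-Za-z0-9_]*$ and fits the length bound.
fn is_valid_identifier(name: &str) -> bool {
    let mut chars = name.chars();
    // The first character must be an ASCII letter or underscore (this also rejects "").
    let first_ok = matches!(chars.next(), Some(c) if c.is_ascii_alphabetic() || c == '_');
    first_ok
        && name.len() <= MAX_IDENT_LEN
        // The remaining characters must be ASCII letters, digits, or underscores.
        && chars.all(|c| c.is_ascii_alphanumeric() || c == '_')
}

/// Validates first, then backtick-quotes; invalid names never reach the SQL text.
fn quote_identifier(name: &str) -> Result<String, String> {
    if is_valid_identifier(name) {
        Ok(format!("`{name}`"))
    } else {
        Err(format!("invalid identifier: {name:?}"))
    }
}
```

Validating before quoting means the quoting step never has to escape anything: a name containing a backtick, space, or semicolon is rejected outright instead of being sanitized.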

The flexible (hybrid) table

The most-reused multi-tenant shape in one call — your mandatory + promoted typed columns, plus an attrs Map(String, String) catch-all and a raw String:

use clickhouse_kit::{flexible_table, FlexibleConfig, ColumnSpec, ColumnTypeSpec, ScalarType, SchemaLimits};

let table = flexible_table(
    "customer_events",
    FlexibleConfig {
        mandatory: vec![ColumnSpec { name: "ts".into(), type_spec: ColumnTypeSpec::Scalar(ScalarType::DateTime64), default: None }],
        promoted:  vec![ColumnSpec { name: "amount".into(), type_spec: ColumnTypeSpec::Scalar(ScalarType::Float64), default: None }],
        engine: "MergeTree()".into(),
        order_by: vec!["ts".into()],
        reserved: None, // defaults to ["attrs", "raw"]
    },
    &SchemaLimits::default(),
)?;
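
Conceptually, the hybrid shape is the caller's typed columns followed by the two catch-all columns. The sketch below is an assumption-laden illustration, not `flexible_table` itself: `flexible_columns` is a hypothetical helper, types are plain strings for brevity, and rejecting a user column that collides with a reserved name is an assumed behavior.

```rust
/// Reserved catch-all column names (the defaults named in the config comment above).
const RESERVED: [&str; 2] = ["attrs", "raw"];

/// Builds (name, rendered type) pairs: mandatory, then promoted, then attrs and raw.
/// Assumption: a user column that reuses a reserved name is an error.
fn flexible_columns(
    mandatory: &[(&str, &str)],
    promoted: &[(&str, &str)],
) -> Result<Vec<(String, String)>, String> {
    let mut columns = Vec::new();
    for (name, ty) in mandatory.iter().chain(promoted.iter()) {
        if RESERVED.iter().any(|r| *r == *name) {
            return Err(format!("column name {name:?} is reserved"));
        }
        columns.push((name.to_string(), ty.to_string()));
    }
    // The long tail of unknown keys lands in `attrs`; `raw` keeps the original record.
    columns.push(("attrs".to_string(), "Map(String, String)".to_string()));
    columns.push(("raw".to_string(), "String".to_string()));
    Ok(columns)
}
```

Calling it with one mandatory column (`ts`) and one promoted column (`amount`) yields four columns, with `attrs` and `raw` always last.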

Ingest: flatten + coerce

Shape an arbitrary record to a (possibly dynamic) table — known keys land in their columns, the long tail flattens into attrs, and raw captures the original:

use clickhouse_kit::{coerce_to_table, FlattenOptions};

let result = coerce_to_table(input_json, &table, &FlattenOptions::default());
// result.row: BTreeMap<String, Value> ready to insert · result.overflow_keys: what went to `attrs`
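
The routing step can be pictured with a small standard-library model. Everything here is hypothetical: `Val` stands in for a JSON value, `coerce` for `coerce_to_table`, and the dotted key separator for nested objects is an assumption. Type coercion and the `raw` capture are left out to keep the sketch short.

```rust
use std::collections::BTreeMap;

/// Minimal stand-in for a JSON value: a string leaf or a nested object.
#[derive(Debug, Clone, PartialEq)]
enum Val {
    Str(String),
    Obj(BTreeMap<String, Val>),
}

/// Result of routing a record: typed columns, flattened attrs, and overflow keys.
struct Coerced {
    columns: BTreeMap<String, String>,
    attrs: BTreeMap<String, String>,
    overflow_keys: Vec<String>,
}

/// Flattens nested objects into dotted keys, e.g. geo.city (assumed separator).
fn flatten(prefix: &str, value: &Val, out: &mut BTreeMap<String, String>) {
    match value {
        Val::Str(s) => {
            out.insert(prefix.to_string(), s.clone());
        }
        Val::Obj(map) => {
            for (k, v) in map {
                flatten(&format!("{prefix}.{k}"), v, out);
            }
        }
    }
}

/// Known scalar keys go to their columns; everything else is flattened into attrs.
fn coerce(record: &BTreeMap<String, Val>, known: &[&str]) -> Coerced {
    let mut columns = BTreeMap::new();
    let mut attrs = BTreeMap::new();
    for (key, value) in record {
        match value {
            Val::Str(s) if known.contains(&key.as_str()) => {
                columns.insert(key.clone(), s.clone());
            }
            // Unknown keys, and known keys holding nested objects, overflow to attrs.
            other => flatten(key, other, &mut attrs),
        }
    }
    let overflow_keys = attrs.keys().cloned().collect();
    Coerced { columns, attrs, overflow_keys }
}
```

With `amount` as the only known column, a record holding `amount`, `plan`, and a nested `geo.city` would place `amount` in its column and report `geo.city` and `plan` as overflow keys.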

Migrations + drift — bring your own client

The I/O layer is written against a tiny ChExecutor trait, so the crate never depends on a concrete ClickHouse driver. Implement it over the clickhouse crate (or any client):

use clickhouse_kit::{run_migrations, check_drift};

// forward-only, tracked in `_ch_migrations`; already-applied files are skipped
let applied = run_migrations(&exec, std::path::Path::new("clickhouse/migrations")).await?;

// compare the live schema (system.columns) to your TableSpecs
let drift = check_drift(&exec, &[table]).await?;
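
To make the forward-only, skip-if-applied behavior concrete, here is a synchronous sketch against an in-memory executor. It is not the crate's API: the `Executor` trait below is a hypothetical stand-in (the real `ChExecutor` is async and its methods are not shown in this README), and applying files in lexicographic name order is an assumption.

```rust
use std::collections::BTreeSet;

/// Hypothetical, synchronous stand-in for an executor; NOT the crate's `ChExecutor`.
trait Executor {
    fn execute(&mut self, sql: &str) -> Result<(), String>;
    fn applied_migrations(&mut self) -> Result<BTreeSet<String>, String>;
    fn record_migration(&mut self, name: &str) -> Result<(), String>;
}

/// Applies pending (name, sql) migrations in name order and returns what was applied.
fn run_pending<E: Executor>(
    exec: &mut E,
    migrations: &[(&str, &str)],
) -> Result<Vec<String>, String> {
    let already = exec.applied_migrations()?;
    let mut sorted = migrations.to_vec();
    sorted.sort(); // tuple ordering compares the file name first
    let mut applied = Vec::new();
    for (name, sql) in sorted {
        if already.contains(name) {
            continue; // forward-only: never re-run or roll back an applied file
        }
        exec.execute(sql)?;
        exec.record_migration(name)?;
        applied.push(name.to_string());
    }
    Ok(applied)
}

/// In-memory executor for illustration: logs statements instead of running them.
#[derive(Default)]
struct MemoryExecutor {
    applied: BTreeSet<String>,
    log: Vec<String>,
}

impl Executor for MemoryExecutor {
    fn execute(&mut self, sql: &str) -> Result<(), String> {
        self.log.push(sql.to_string());
        Ok(())
    }
    fn applied_migrations(&mut self) -> Result<BTreeSet<String>, String> {
        Ok(self.applied.clone())
    }
    fn record_migration(&mut self, name: &str) -> Result<(), String> {
        self.applied.insert(name.to_string());
        Ok(())
    }
}
```

Running `run_pending` twice over the same file list applies each file at most once; the second call returns an empty list. Keeping the runner generic over a small trait is what lets the same logic sit on top of any client.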

For growing a per-tenant table, diff_columns + alter_add_columns_sql emit a guarded, additive-only ALTER TABLE … ADD COLUMN IF NOT EXISTS … (identifiers quoted; types from your trusted spec, never from the live DB).
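
The additive path can be sketched as a set difference followed by string assembly. The helpers below are hypothetical (`missing_columns` and `alter_add_sql` are not the crate's `diff_columns` and `alter_add_columns_sql`), types are plain strings, and identifiers are assumed to have been validated already.

```rust
/// Returns the desired (name, type) pairs whose names are absent from the live table.
/// Types always come from `desired` (the trusted spec), never from the live schema.
fn missing_columns<'a>(
    desired: &[(&'a str, &'a str)],
    live: &[&str],
) -> Vec<(&'a str, &'a str)> {
    desired
        .iter()
        .copied()
        .filter(|(name, _)| !live.iter().any(|l| *l == *name))
        .collect()
}

/// Emits one additive ALTER statement, or None when nothing is missing.
/// Assumes `table` and column names already passed identifier validation.
fn alter_add_sql(table: &str, missing: &[(&str, &str)]) -> Option<String> {
    if missing.is_empty() {
        return None;
    }
    let clauses: Vec<String> = missing
        .iter()
        .map(|(name, ty)| format!("ADD COLUMN IF NOT EXISTS `{name}` {ty}"))
        .collect();
    Some(format!("ALTER TABLE `{table}` {}", clauses.join(", ")))
}
```

The diff only ever adds columns: a column that exists live but not in the spec is ignored here, which keeps the operation non-destructive. The `IF NOT EXISTS` guard makes the statement safe to retry.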

Design

  • Safe by construction. The type allowlist is unrepresentable-by-default; identifiers are validated + quoted; tables are bounded. The dangerous bits are impossible, not discouraged.
  • Rows are Serde-native. Use #[derive(clickhouse::Row, Deserialize)] for reads — the kit doesn't reinvent row mapping.
  • Forward-only. No auto-diff engine; schema changes are explicit migrations. The additive ALTER path for dynamic per-tenant tables is separate and bounded.
  • Tested against real ClickHouse. The migration runner, drift gate, and DDL round-trip are verified via testcontainers in CI, not just string assertions.

License

MIT © SmooAI
