
Proposal: Schema-driven plan-time validation for airbyte_connector_config via Dynamic config (Speakeasy-generated provider) #235

@aaronsteers

Description

Context

We are designing the next major version of the Airbyte Terraform provider. The provider must support:

  • 600+ connectors
  • 100+ versions per connector
  • Connector versions are immutable once published (no republish / no schema drift for a given (connector, version)).

A connector config schema can change across versions, but often does not. We want early feedback (plan-time when possible) for invalid configurations rather than deferring errors to apply.

We are generating the provider via Speakeasy (Terraform provider generation from OpenAPI). The OpenAPI spec alone does not capture our desired plan-time validation behaviors, so we need to leverage Speakeasy’s customization paradigms (extensions and/or injected custom code).

Goal

Create a generic resource (or equivalent abstraction) that:

  1. Accepts a connector identifier + version
  2. Accepts a connector config value in Terraform-native HCL (not primarily a JSON string)
  3. Fetches the corresponding JSON Schema at plan time (when (connector, version) are known)
  4. Validates the user-provided config against that schema during plan as early as Terraform allows
  5. Supports partially unknown/deferred values (e.g., secrets interpolated from other resources) by validating known portions and deferring unknown ones, without causing false plan failures.

Proposed Resource API

Resource name (placeholder): airbyte_connector_config

Attributes:

  • connector_type (string) — "source" | "destination"
  • connector_name (string) — canonical name
  • connector_version (string) — immutable version identifier
  • config (Dynamic) — Terraform-native object/map/list structure (HCL object at top level)

Key decision: config must be a DynamicAttribute so users can supply nested “JSON-like” structures in HCL while allowing leaf values to remain unknown/deferred at plan time.

Note: We initially considered MapAttribute(ElementType = DynamicType), but terraform-plugin-framework does not support dynamic values as collection element types; the recommended approach is a DynamicAttribute.

User syntax (inline, non-JSON-string)

Users can supply:

  • Top-level object with primitive values
  • Object containing lists
  • Nested object/list structures

Example:

resource "airbyte_connector_config" "example" {
  connector_type    = "source"
  connector_name    = "postgres"
  connector_version = "1.2.3"

  config = {
    host = "db.example.com"
    port = 5432
    ssl  = true

    replication = {
      method  = "INCREMENTAL"
      streams = [
        { name = "users", cursor_field = "updated_at" },
        { name = "orders", cursor_field = "updated_at" },
      ]
    }
  }
}

Provider behavior: enforce that config is a top-level object (JSON Schema root is expected to be "object" for connector configs).
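A minimal Go sketch of the root check. It assumes the provider has already converted the Dynamic value into plain Go values (objects as `map[string]any`) and marks not-yet-known values with a hypothetical `Unknown` sentinel. In terraform-plugin-framework the real input would be a `types.Dynamic`, and an HCL `{ ... }` literal arrives as an object value rather than a Go map, so `checkRootIsObject` is illustrative only.

```go
package main

import "fmt"

// unknownValue is a hypothetical stand-in for a Terraform value that is not known at plan time.
type unknownValue struct{}

// Unknown is the single sentinel instance used in this sketch.
var Unknown = unknownValue{}

// checkRootIsObject enforces that config is an object at the top level.
// A wholly unknown config cannot be checked yet, so it is deferred (nil error).
func checkRootIsObject(config any) error {
	switch config.(type) {
	case unknownValue:
		return nil
	case map[string]any:
		return nil
	default:
		return fmt.Errorf("config: expected an object at the top level, got %T", config)
	}
}
```

A list, a primitive, or a null at the root is rejected at plan time; only a fully unknown root is let through.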

Plan-time Validation Design

Where validation runs

Implement plan-time validation via generated provider hooks consistent with Speakeasy and terraform-plugin-framework:

  • Resource-level ValidateConfig (or an equivalent hook such as ModifyPlan) to validate cross-attribute logic: connector type, name, and version determine the schema, and the schema validates config.
  • Speakeasy: use Speakeasy’s supported customization mechanism to inject custom plan validation logic (e.g., plan validators / code injection points).

Validation semantics

Terraform plans frequently include unknown values at plan time (e.g., values derived from other resources). Validation must:

  • Validate known values against schema constraints
  • Skip constraints requiring concrete values when values are unknown
  • Fail plan when values are known-invalid
  • Fail plan when required fields are known-missing
  • Do not fail plan when a required field is present but unknown (could be provided at apply time)
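The rules above amount to a three-way outcome rather than pass/fail. The Go sketch below makes the "deferred" outcome explicit for a single string field with an optional enum-style constraint. It assumes the same representation as elsewhere (Go maps plus a hypothetical `Unknown` sentinel); `Outcome`, `checkField`, and the enum values used with it are illustrative, not an existing API.

```go
package main

// unknownValue is a hypothetical stand-in for a Terraform value not yet known at plan time.
type unknownValue struct{}

var Unknown = unknownValue{}

// Outcome of checking one constraint at plan time.
type Outcome int

const (
	Pass     Outcome = iota // known and valid
	Fail                    // known-invalid or known-missing: fail the plan
	Deferred                // unknown: re-check at apply, do not fail the plan
)

// checkField applies the rules to one field of an object whose keys are known.
// allowed is an enum-style constraint; an empty slice means "any string".
func checkField(obj map[string]any, name string, required bool, allowed []string) Outcome {
	v, present := obj[name]
	if !present {
		if required {
			return Fail // required field is known-missing
		}
		return Pass // optional and absent
	}
	if _, unknown := v.(unknownValue); unknown {
		return Deferred // present but unknown: may be provided at apply time
	}
	s, ok := v.(string)
	if !ok {
		return Fail // known value of the wrong type
	}
	if len(allowed) == 0 {
		return Pass
	}
	for _, a := range allowed {
		if s == a {
			return Pass
		}
	}
	return Fail // known value outside the allowed set
}
```

Keeping `Deferred` distinct from `Pass` lets unit tests assert that an unknown leaf was skipped rather than silently treated as valid.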

JSON Schema evaluation strategy

We need a validator that can handle partial/unknown inputs. Standard JSON Schema libraries generally assume a complete JSON document (no concept of Terraform “unknown” values). Therefore we need one of the following approaches:

Option A (preferred): Partial schema validation via provider-side traversal

  • Fetch full JSON Schema for (connector, version)
  • Traverse the Terraform value tree (Dynamic value) and schema tree together
  • Validate constraints at each known node:
    • type correctness
    • enums
    • min/max, minLength/maxLength, pattern
    • object properties / additionalProperties
    • array items constraints
    • required fields (with unknown-aware semantics)
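A condensed Go sketch of Option A, covering `type`, `properties`, `additionalProperties: false`, `required`, `enum`, `minimum`/`maximum`, `pattern`, and `items`; the remaining keywords of the supported subset would follow the same shape. Assumptions: config has been decoded into `map[string]any` / `[]any` / `string` / `float64` / `bool` with a hypothetical `Unknown` sentinel, and the schema into the hypothetical `Schema` struct. Go's `regexp` package implements RE2, not the ECMA-262 dialect JSON Schema specifies for `pattern`, so some patterns may need translation.

```go
package main

import (
	"fmt"
	"math"
	"reflect"
	"regexp"
	"sort"
)

// unknownValue is a hypothetical stand-in for a Terraform value not yet known at plan time.
type unknownValue struct{}

var Unknown = unknownValue{}

// Schema models part of the supported JSON Schema subset (hypothetical type, not a library API).
type Schema struct {
	Type                 string // "object", "array", "string", "number", "integer", "boolean"
	Properties           map[string]*Schema
	AdditionalProperties *bool // nil means additional properties are allowed
	Required             []string
	Enum                 []any
	Minimum, Maximum     *float64
	Pattern              string
	Items                *Schema
}

// Validate walks the value and schema trees together, returning path-qualified errors.
func Validate(path string, v any, s *Schema) []string {
	if _, unknown := v.(unknownValue); unknown || s == nil {
		return nil // unknown subtree (checked at apply) or unconstrained node
	}
	var errs []string
	fail := func(format string, args ...any) {
		errs = append(errs, path+": "+fmt.Sprintf(format, args...))
	}
	if len(s.Enum) > 0 {
		found := false
		for _, e := range s.Enum {
			found = found || reflect.DeepEqual(e, v)
		}
		if !found {
			fail("value %v is not one of %v", v, s.Enum)
		}
	}
	switch s.Type {
	case "object":
		m, ok := v.(map[string]any)
		if !ok {
			fail("expected object")
			break
		}
		for _, name := range s.Required {
			if _, present := m[name]; !present { // present-but-unknown counts as present
				fail("missing required field %q", name)
			}
		}
		keys := make([]string, 0, len(m))
		for k := range m {
			keys = append(keys, k)
		}
		sort.Strings(keys) // deterministic diagnostic order
		for _, k := range keys {
			child, declared := s.Properties[k]
			if !declared && s.AdditionalProperties != nil && !*s.AdditionalProperties {
				fail("unexpected field %q", k)
				continue
			}
			errs = append(errs, Validate(path+"."+k, m[k], child)...)
		}
	case "array":
		items, ok := v.([]any)
		if !ok {
			fail("expected array")
			break
		}
		for i, item := range items {
			errs = append(errs, Validate(fmt.Sprintf("%s[%d]", path, i), item, s.Items)...)
		}
	case "string":
		// An empty pattern matches everything; an invalid pattern is reported as a mismatch.
		if str, ok := v.(string); !ok {
			fail("expected string")
		} else if re, err := regexp.Compile(s.Pattern); err != nil || !re.MatchString(str) {
			fail("does not match pattern %q", s.Pattern)
		}
	case "number", "integer":
		if n, ok := v.(float64); !ok || (s.Type == "integer" && n != math.Trunc(n)) {
			fail("expected %s", s.Type)
		} else if (s.Minimum != nil && n < *s.Minimum) || (s.Maximum != nil && n > *s.Maximum) {
			fail("value %v is out of range", n)
		}
	case "boolean":
		if _, ok := v.(bool); !ok {
			fail("expected boolean")
		}
	}
	return errs
}
```

Because an unknown node returns early, a required field that is present but unknown counts as present and nothing beneath it is checked, which is the unknown-aware `required` behavior listed above. For a config shaped like the HCL example, with `password` unknown, one stream's `name` unknown, and that stream's `cursor_field` set to `"Updated-At"` against a hypothetical `^[a-z_]+$` pattern, the only error is reported at `config.replication.streams[1].cursor_field`.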

Option B: Transform + validate

  • Create a concrete JSON doc by omitting unknowns
  • Validate against schema, and separately enforce “known-missing required” logic
  • Risk: subtle incorrectness with arrays, oneOf/anyOf, and required semantics
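To make the Option B risk concrete, here is a Go sketch of the omit-unknowns transform. `pruneUnknowns` is a hypothetical helper using the same assumed value representation (Go maps and slices plus an `Unknown` sentinel).

```go
package main

// unknownValue is a hypothetical stand-in for a Terraform value not yet known at plan time.
type unknownValue struct{}

var Unknown = unknownValue{}

// pruneUnknowns builds the "concrete" document Option B would hand to an
// off-the-shelf JSON Schema validator: every unknown leaf or subtree is omitted.
// The bool result is false when the node itself should be dropped.
func pruneUnknowns(v any) (any, bool) {
	switch t := v.(type) {
	case unknownValue:
		return nil, false
	case map[string]any:
		out := make(map[string]any, len(t))
		for k, child := range t {
			if pruned, keep := pruneUnknowns(child); keep {
				out[k] = pruned // unknown keys simply disappear
			}
		}
		return out, true
	case []any:
		out := make([]any, 0, len(t))
		for _, child := range t {
			if pruned, keep := pruneUnknowns(child); keep {
				out = append(out, pruned) // later elements shift down one index
			}
		}
		return out, true
	default:
		return v, true
	}
}
```

Pruning `{host = "db.example.com", password = <unknown>, streams = [<unknown>, "orders"]}` yields a document with no `password` key and a one-element `streams` list. A standard validator would then report `password` as missing (a false failure the separate required-field logic must suppress), any error about `streams[0]` would really refer to the original `streams[1]`, and array-length constraints such as `minItems` could fail spuriously.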

We should expect to implement Option A unless we can prove Option B is correct for our schema subset.

Scope control: supported JSON Schema features

To reduce complexity and risk, define an explicit supported subset (initially):

  • type
  • properties, additionalProperties
  • required
  • enum
  • numeric bounds: minimum, maximum
  • string constraints: minLength, maxLength, pattern
  • arrays: items, minItems, maxItems, uniqueItems

If Airbyte schemas include advanced features (oneOf, anyOf, allOf, if/then/else), we should explicitly decide whether to support them in the first iteration.
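Whether Airbyte schemas actually use these keywords (open question 2) could be answered empirically by scanning fetched schemas. A hedged Go sketch that reports where keywords outside the initial subset appear in a decoded schema; the keyword list mirrors the paragraph above, and `tunnel_method` in the usage example is a made-up field name. It only treats keys directly under `properties` as field names; other name-keyed maps (for example `definitions`) would need the same treatment.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// unsupported lists keywords outside the initial supported subset.
var unsupported = map[string]bool{
	"oneOf": true, "anyOf": true, "allOf": true, "if": true, "then": true, "else": true,
}

// findUnsupported returns JSON-pointer-like locations of unsupported keywords.
func findUnsupported(node any, at string, underProperties bool) []string {
	var hits []string
	switch t := node.(type) {
	case map[string]any:
		keys := make([]string, 0, len(t))
		for k := range t {
			keys = append(keys, k)
		}
		sort.Strings(keys) // deterministic output
		for _, k := range keys {
			// Keys directly under "properties" are field names, not keywords.
			if unsupported[k] && !underProperties {
				hits = append(hits, at+"/"+k)
			}
			hits = append(hits, findUnsupported(t[k], at+"/"+k, k == "properties" && !underProperties)...)
		}
	case []any:
		for i, child := range t {
			hits = append(hits, findUnsupported(child, fmt.Sprintf("%s/%d", at, i), false)...)
		}
	}
	return hits
}

// auditSchema decodes a raw JSON Schema document and reports unsupported keywords.
func auditSchema(raw []byte) ([]string, error) {
	var doc any
	if err := json.Unmarshal(raw, &doc); err != nil {
		return nil, err
	}
	return findUnsupported(doc, "", false), nil
}
```

Run across all (connector, version) schemas, the hit count per keyword would show whether `oneOf` and friends are needed in v1 or can be rejected with an explicit "unsupported schema" diagnostic.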

Schema Retrieval

At plan time, when connector_type, connector_name, and connector_version are all known:

  • Fetch JSON Schema from Airbyte control plane / registry (implementation detail)
  • Cache in-memory per plan run to avoid repeated fetches across multiple resources
  • Since connector versions are immutable, we can safely treat the schema as deterministic per (connector_type, connector_name, connector_version). The type belongs in the key because a source and a destination can share a name (e.g., postgres).
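A minimal Go sketch of the cache, assuming a hypothetical `fetch` function that wraps the (still undecided) registry client; `schemaKey`, `SchemaCache`, and `NewSchemaCache` are illustrative names. The key includes connector type as well as name and version. Holding the mutex across the fetch keeps the sketch simple but serializes fetches for different keys; a per-key scheme (for example `golang.org/x/sync/singleflight` plus a map) would avoid that.

```go
package main

import "sync"

// schemaKey identifies an immutable connector version.
type schemaKey struct {
	ConnectorType, ConnectorName, ConnectorVersion string
}

// SchemaCache memoizes schema fetches for the lifetime of the provider process.
// Because published versions are immutable, entries never need invalidation.
type SchemaCache struct {
	mu      sync.Mutex
	fetch   func(schemaKey) ([]byte, error) // hypothetical registry client call
	entries map[schemaKey][]byte
}

// NewSchemaCache wraps a fetch function with an in-memory cache.
func NewSchemaCache(fetch func(schemaKey) ([]byte, error)) *SchemaCache {
	return &SchemaCache{fetch: fetch, entries: map[schemaKey][]byte{}}
}

// Get returns the cached schema or fetches it once. Errors are not cached,
// so a transient registry failure can be retried by a later call.
func (c *SchemaCache) Get(k schemaKey) ([]byte, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if s, ok := c.entries[k]; ok {
		return s, nil
	}
	s, err := c.fetch(k)
	if err != nil {
		return nil, err
	}
	c.entries[k] = s
	return s, nil
}
```

A caller would only invoke `Get` once all three attributes are known; if any of them is unknown at plan time, schema validation for that resource presumably has to wait until apply.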

Speakeasy Integration Plan

We need to implement this within Speakeasy’s generation model:

  1. Identify Speakeasy-supported customization points (extensions, overlays/patches, injected code regions, plan validators).
  2. Ensure the generated Terraform schema can declare config as a DynamicAttribute.
  3. Inject:
    • a schema fetcher module (Airbyte registry client)
    • an unknown-aware JSON Schema validation implementation
    • a plan validator hook that ties it all together

Deliverables

  • New resource (or updated resource design) that exposes config as Dynamic
  • Plan-time validation path:
    • Fetch schema using (connector_name, connector_version)
    • Validate known config values; skip unknowns
    • Emit high-quality diagnostics with path-qualified errors (e.g., config.replication.streams[0].cursor_field)
  • Unit tests:
    • Known-invalid leaf fails plan
    • Unknown leaf does not fail plan
    • Known-missing required field fails plan
    • Nested arrays/objects work up to typical depth (2–3 layers)
  • Performance checks:
    • Schema fetch caching
    • Validation on “large configs” remains acceptable

Acceptance Criteria

  • Users can declare configs using HCL objects/lists (no forced JSON string)
  • For known inputs, invalid config fails at plan with actionable diagnostics
  • For partially unknown inputs, provider validates known portions and never fails the plan because a value is merely unknown (no false positives)
  • Works cleanly within Speakeasy-generated provider codebase (customization approach documented and reproducible)

Open Questions

  1. Where is the canonical schema hosted (registry endpoint), and what auth is required?
  2. Do Airbyte connector schemas use advanced JSON Schema features (e.g., oneOf, anyOf, allOf, if/then/else)? If yes, do we need them in v1 of validation?
  3. Should we also accept config_json (string) as a compatibility escape hatch, or require object-only in vNext?
