Proposal: Schema-driven plan-time validation for airbyte_connector_config via Dynamic config (Speakeasy-generated provider)
Context
We are designing the next major version of the Airbyte Terraform provider. The provider must support:
- 600+ connectors
- 100+ versions per connector
- Immutable connector versions: once published, a version is never republished, so the schema for a given (connector, version) never drifts.
A connector config schema can change across versions, but often does not. We want early feedback (plan-time when possible) for invalid configurations rather than deferring errors to apply.
We are generating the provider via Speakeasy (Terraform provider generation from OpenAPI). The OpenAPI spec alone does not capture our desired plan-time validation behaviors, so we need to leverage Speakeasy’s customization paradigms (extensions and/or injected custom code).
Goal
Create a generic resource (or equivalent abstraction) that:
- Accepts a connector identifier + version
- Accepts a connector config value in Terraform-native HCL (not primarily a JSON string)
- Fetches the corresponding JSON Schema at plan time (when (connector, version) are known)
- Validates the user-provided config against that schema during plan as early as Terraform allows
- Supports partially unknown/deferred values (e.g., secrets interpolated from other resources), by validating known portions and deferring unknowns without causing false plan failures.
Proposed Resource API
Resource name (placeholder): airbyte_connector_config
Attributes:
- connector_type (string): "source" | "destination"
- connector_name (string): canonical name
- connector_version (string): immutable version identifier
- config (Dynamic): Terraform-native object/map/list structure (HCL object at top level)
Key decision: config must be a DynamicAttribute so users can supply nested “JSON-like” structures in HCL while allowing leaf values to remain unknown/deferred at plan time.
Note: We initially considered MapAttribute(ElementType = DynamicType), but terraform-plugin-framework does not support dynamic values as collection element types; the recommended approach is a DynamicAttribute.
User syntax (inline, non-JSON-string)
Users can supply:
- Top-level object with primitive values
- Object containing lists
- Nested object/list structures
Example:
resource "airbyte_connector_config" "example" {
  connector_type    = "source"
  connector_name    = "postgres"
  connector_version = "1.2.3"
  config = {
    host = "db.example.com"
    port = 5432
    ssl  = true
    replication = {
      method = "INCREMENTAL"
      streams = [
        { name = "users", cursor_field = "updated_at" },
        { name = "orders", cursor_field = "updated_at" },
      ]
    }
  }
}
Provider behavior: enforce that config is a top-level object (JSON Schema root is expected to be "object" for connector configs).
Plan-time Validation Design
Where validation runs
Implement plan-time validation via generated provider hooks consistent with Speakeasy and terraform-plugin-framework:
- Resource-level ValidateConfig (or equivalent) to validate cross-attribute logic (name+version determine schema, schema validates config).
- Speakeasy: use Speakeasy’s supported customization mechanism to inject custom plan validation logic (e.g., plan validators / code injection points).
Validation semantics
Terraform plans frequently include unknown values at plan time (e.g., values derived from other resources). Validation must:
- Validate known values against schema constraints
- Skip constraints requiring concrete values when values are unknown
- Fail the plan when values are known-invalid
- Fail the plan when required fields are known-missing
- Not fail the plan when a required field is present but unknown (it could be provided at apply time)
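The required-field rules above amount to a three-way classification. A minimal sketch in Go, assuming a decoded config is held as a map[string]any and using a hypothetical Unknown marker type as a stand-in for Terraform's unknown values (neither is a terraform-plugin-framework type):

```go
package main

// Unknown stands in for a Terraform value that is only known at apply time.
// It is a placeholder for this sketch, not a terraform-plugin-framework type.
type Unknown struct{}

// Presence classifies a required field at plan time.
type Presence int

const (
	Missing  Presence = iota // known to be absent: fail the plan
	Deferred                 // present but unknown: re-check at apply
	Present                  // present and known: validate now
)

// presence reports how a required key should be treated during plan.
func presence(obj map[string]any, key string) Presence {
	value, ok := obj[key]
	if !ok {
		return Missing
	}
	if _, unknown := value.(Unknown); unknown {
		return Deferred
	}
	return Present
}
```

With a config of host = "db.example.com" and a password interpolated from another resource, host would classify as Present, password as Deferred, and an absent port as Missing. How explicit nulls should be classified is left open in this sketch.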
JSON Schema evaluation strategy
We need a validator that can handle partial/unknown inputs. Standard JSON Schema libraries generally assume a complete JSON document (no concept of Terraform “unknown” values). Therefore we need one of the following approaches:
Option A (preferred): Partial schema validation via provider-side traversal
- Fetch the full JSON Schema for (connector, version)
- Traverse the Terraform value tree (Dynamic value) and schema tree together
- Validate constraints at each known node:
  - type correctness
  - enums
  - min/max, minLength/maxLength, pattern
  - object properties / additionalProperties
  - array items constraints
  - required fields (with unknown-aware semantics)
Option B: Transform + validate
- Create a concrete JSON doc by omitting unknowns
- Validate against schema, and separately enforce “known-missing required” logic
- Risk: subtle incorrectness with arrays, oneOf/anyOf, and required semantics
We should expect to implement Option A unless we can prove Option B is correct for our schema subset.
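To make the risk in Option B concrete, here is a minimal sketch of the "omit unknowns" transform, again assuming plain Go values and a hypothetical Unknown marker type. Dropping an unknown list element shifts the indices of the elements after it and shortens the list, so any diagnostic path or minItems check computed on the transformed document no longer lines up with the user's config.

```go
package main

// Unknown stands in for a Terraform value that is only known at apply time.
// It is a placeholder for this sketch, not a terraform-plugin-framework type.
type Unknown struct{}

// stripUnknown returns a copy of value with unknown object members and list
// elements removed. The second result is false when value itself is unknown.
func stripUnknown(value any) (any, bool) {
	switch v := value.(type) {
	case Unknown:
		return nil, false
	case map[string]any:
		out := make(map[string]any, len(v))
		for key, child := range v {
			if kept, ok := stripUnknown(child); ok {
				out[key] = kept
			}
		}
		return out, true
	case []any:
		out := make([]any, 0, len(v))
		for _, child := range v {
			if kept, ok := stripUnknown(child); ok {
				out = append(out, kept) // later elements move to lower indices
			}
		}
		return out, true
	default:
		return v, true
	}
}
```

For a list whose first element is unknown and whose second is "orders", the result holds "orders" at index 0. A stock validator would also report the stripped password as a missing required field, which is exactly the false failure the separate "known-missing required" pass would have to suppress.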
Scope control: supported JSON Schema features
To reduce complexity and risk, define an explicit supported subset (initially):
- type
- properties, additionalProperties
- required
- enum
- numeric bounds: minimum, maximum
- string constraints: minLength, maxLength, pattern
- arrays: items, minItems, maxItems, uniqueItems
If Airbyte schemas include advanced features (oneOf, anyOf, allOf, if/then/else), we should explicitly decide whether to support them in the first iteration.
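One way to keep the subset explicit is to scan each fetched schema for keywords outside it, so the provider can warn or skip validation rather than silently ignore a constraint. A minimal sketch, assuming the schema is a decoded JSON document. The supported set mirrors the list above. The set of ignorable annotation keywords (title, description, default, examples) is an assumption for illustration and would need to be checked against real connector schemas.

```go
package main

import "sort"

// supported mirrors the proposed initial subset of validation keywords.
var supported = map[string]bool{
	"type": true, "properties": true, "additionalProperties": true,
	"required": true, "enum": true, "minimum": true, "maximum": true,
	"minLength": true, "maxLength": true, "pattern": true,
	"items": true, "minItems": true, "maxItems": true, "uniqueItems": true,
}

// annotations are keywords assumed to carry no validation meaning.
var annotations = map[string]bool{
	"title": true, "description": true, "default": true, "examples": true,
}

// unsupportedKeywords returns the sorted paths of every keyword in schema
// (and its nested subschemas) that is outside the supported subset.
func unsupportedKeywords(path string, schema map[string]any) []string {
	var found []string
	for key, raw := range schema {
		switch {
		case key == "properties":
			props, _ := raw.(map[string]any)
			for name, sub := range props {
				if child, ok := sub.(map[string]any); ok {
					found = append(found, unsupportedKeywords(path+".properties."+name, child)...)
				}
			}
		case key == "items" || key == "additionalProperties":
			// Both may hold a subschema; boolean forms need no recursion.
			if child, ok := raw.(map[string]any); ok {
				found = append(found, unsupportedKeywords(path+"."+key, child)...)
			}
		case !supported[key] && !annotations[key]:
			found = append(found, path+"."+key)
		}
	}
	sort.Strings(found)
	return found
}
```

A schema whose replication property uses oneOf would yield a single entry, schema.properties.replication.oneOf, when scanned with path "schema". Running this scan over the existing connector catalog would also help answer whether advanced features are needed in the first iteration.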
Schema Retrieval
At plan time, when connector_name and connector_version are known:
- Fetch JSON Schema from Airbyte control plane / registry (implementation detail)
- Cache in-memory per plan run to avoid repeated fetches across multiple resources
- Since connector versions are immutable, we can safely treat the schema as deterministic per (name, version).
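The per-run cache described above can be sketched as a small map guarded by a mutex. The names SchemaCache and NewSchemaCache and the fetch callback are hypothetical; the actual registry client is left as an implementation detail, as in the text.

```go
package main

import "sync"

// schemaKey identifies one immutable schema.
type schemaKey struct{ name, version string }

// SchemaCache memoizes fetched schemas for the lifetime of the provider process.
type SchemaCache struct {
	mu      sync.Mutex
	fetch   func(name, version string) (map[string]any, error)
	entries map[schemaKey]map[string]any
}

// NewSchemaCache wraps fetch, a hypothetical registry lookup, with a cache.
func NewSchemaCache(fetch func(name, version string) (map[string]any, error)) *SchemaCache {
	return &SchemaCache{fetch: fetch, entries: make(map[schemaKey]map[string]any)}
}

// Get returns the cached schema for (name, version), fetching it at most once.
// Failed fetches are not cached, so a later call retries.
func (c *SchemaCache) Get(name, version string) (map[string]any, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	key := schemaKey{name, version}
	if schema, ok := c.entries[key]; ok {
		return schema, nil
	}
	schema, err := c.fetch(name, version)
	if err != nil {
		return nil, err
	}
	c.entries[key] = schema
	return schema, nil
}
```

Holding the lock during the fetch keeps the sketch simple but serializes lookups. If sources and destinations can share a connector name, connector_type would also need to be part of the key.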
Speakeasy Integration Plan
We need to implement this within Speakeasy’s generation model:
- Identify Speakeasy-supported customization points (extensions, overlays/patches, injected code regions, plan validators).
- Ensure the generated Terraform schema can declare config as a DynamicAttribute.
- Inject:
  - a schema fetcher module (Airbyte registry client)
  - an unknown-aware JSON Schema validation implementation
  - a plan validator hook that ties it all together
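Deliverables
- Resource schema with config as Dynamic
- Schema fetch and cache keyed by (connector_name, connector_version)
- Plan-time diagnostics that point at the offending attribute path (e.g., config.replication.streams[0].cursor_field)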
Acceptance Criteria
- Users can declare configs using HCL objects/lists (no forced JSON string)
- For known inputs, invalid config fails at plan with actionable diagnostics
- For partially unknown inputs, provider validates known portions and does not fail the plan on values that are merely unknown (no false plan failures)
- Works cleanly within Speakeasy-generated provider codebase (customization approach documented and reproducible)
Open Questions
- Where is the canonical schema hosted (registry endpoint), and what auth is required?
- Do Airbyte connector schemas use advanced JSON Schema features (e.g., oneOf, anyOf, allOf, if/then/else)? If yes, do we need them in v1 of validation?
- Should we also accept config_json (string) as a compatibility escape hatch, or require object-only in vNext?