feat(engine): let plugin column configs declare all model aliases for the startup health check #606

@nabinchha

Priority Level

Medium (Nice to have)

Is your feature request related to a problem? Please describe.

Plugin column generators that inherit from ColumnGeneratorWithModelRegistry and depend on more than one model alias (e.g. a generator + judge pattern) cannot opt their secondary aliases into the standard startup model health check. The builder collects aliases for health checks like this:

# packages/data-designer-engine/src/data_designer/engine/dataset_builders/dataset_builder.py
def _run_model_health_check_if_needed(self) -> None:
    model_aliases: set[str] = set()
    for config in self.single_column_configs:
        if column_type_is_model_generated(config.column_type):
            model_aliases.add(config.model_alias)
        if isinstance(config, CustomColumnConfig) and config.model_aliases:
            model_aliases.update(config.model_aliases)
    ...
    self._resource_provider.model_registry.run_health_check(list(model_aliases))

This has two practical consequences for plugin authors:

  1. Only the primary model_alias is endpoint-checked at startup. Secondary aliases (e.g. judge_model_alias, critic_model_alias) on a plugin config are never passed to ModelRegistry.run_health_check. A typo, missing API key, or unreachable endpoint on a secondary model only surfaces at the first generation call, potentially after partial work has already been done.
  2. config.model_alias is implicitly required. Any column type whose impl inherits from ColumnGeneratorWithModelRegistry triggers model_aliases.add(config.model_alias) unconditionally. A plugin config without a model_alias field raises AttributeError from inside _run_model_health_check_if_needed rather than a friendly registry error.
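Both consequences can be reproduced with a minimal sketch. The config classes and the collect_aliases_current helper below are hypothetical stand-ins, not the real engine types; the only thing borrowed from the builder is the direct config.model_alias access.

```python
from dataclasses import dataclass


@dataclass
class GeneratorJudgeConfig:
    """Hypothetical plugin config using a generator + judge pattern."""
    model_alias: str
    judge_model_alias: str


@dataclass
class JudgeOnlyConfig:
    """Hypothetical plugin config that has no model_alias field at all."""
    judge_model_alias: str


def collect_aliases_current(configs: list) -> set[str]:
    """Mimics the direct attribute access in the current builder loop."""
    aliases: set[str] = set()
    for config in configs:
        # Raises AttributeError when the config has no model_alias field.
        aliases.add(config.model_alias)
    return aliases


# Consequence 1: the secondary alias never reaches the health check.
checked = collect_aliases_current([GeneratorJudgeConfig("gen", "judge")])
print(sorted(checked))  # ['gen']

# Consequence 2: a config without model_alias fails with a raw AttributeError.
try:
    collect_aliases_current([JudgeOnlyConfig("judge")])
    error_name = None
except AttributeError as exc:
    error_name = type(exc).__name__
print(error_name)  # AttributeError
```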

The current workaround in the new plugin docs (docs/plugins/models.md on #603) tells plugin authors to call self.get_model_config(alias) from _validate() for each secondary alias. That only verifies the alias is registered; it does not exercise the endpoint.
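The gap between "registered" and "reachable" can be illustrated with a toy registry. This is a sketch under stated assumptions: FakeModelRegistry, its constructor, and the exceptions it raises are invented for illustration, and only the method names get_model_config and run_health_check are taken from this issue. The real registry's behavior is not reproduced here.

```python
class FakeModelRegistry:
    """Hypothetical stand-in; not the real ModelRegistry."""

    def __init__(self, endpoints: dict[str, bool]) -> None:
        # Maps alias -> whether the simulated endpoint responds.
        self._endpoints = endpoints

    def get_model_config(self, alias: str) -> dict:
        # Registration lookup only: nothing is sent to the endpoint.
        if alias not in self._endpoints:
            raise KeyError(f"unknown model alias: {alias!r}")
        return {"alias": alias}

    def run_health_check(self, aliases: list[str]) -> None:
        # Simulated ping of every requested endpoint.
        for alias in aliases:
            self.get_model_config(alias)
            if not self._endpoints[alias]:
                raise ConnectionError(f"endpoint for {alias!r} is unreachable")


# "judge" is registered, but its simulated endpoint is down.
registry = FakeModelRegistry({"gen": True, "judge": False})

# The _validate()-style lookup succeeds because the alias is registered.
registry.get_model_config("judge")
lookup_passed = True

# Only a health check that includes the alias exposes the bad endpoint.
try:
    registry.run_health_check(["gen", "judge"])
    health_check_error = None
except ConnectionError as exc:
    health_check_error = str(exc)

print(lookup_passed, health_check_error)
```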

The CustomColumnConfig.model_aliases: list[str] field already proves the engine knows how to roll multiple aliases into the central health check — packaged plugins just don't have the same hook.

Describe the solution you'd like

Add a single overridable accessor on SingleColumnConfig:

# packages/data-designer-config/src/data_designer/config/column_configs.py
class SingleColumnConfig(...):
    def get_model_aliases(self) -> list[str]:
        """Return every model alias this column depends on.

        The startup health check uses this to decide which endpoints to ping.
        Override on configs that depend on more than one model.
        """
        alias = getattr(self, "model_alias", None)
        return [alias] if alias else []

Plugin configs that depend on more than one model override it:

class PairwiseJudgeColumnConfig(SingleColumnConfig):
    column_type: Literal["pairwise-judge"] = "pairwise-judge"
    model_alias: str
    judge_model_alias: str

    def get_model_aliases(self) -> list[str]:
        return [self.model_alias, self.judge_model_alias]

CustomColumnConfig overrides it to absorb the existing isinstance special case in the builder:

class CustomColumnConfig(SingleColumnConfig):
    model_aliases: list[str] | None = None

    def get_model_aliases(self) -> list[str]:
        return self.model_aliases or []

The builder collapses to one loop:

def _run_model_health_check_if_needed(self) -> None:
    model_aliases: set[str] = set()
    for config in self.single_column_configs:
        if column_type_is_model_generated(config.column_type):
            model_aliases.update(config.get_model_aliases())
    ...

Behavior after the change:

  • All built-in model-backed configs continue to work unchanged — the default get_model_aliases() reads model_alias exactly like today.
  • Plugin authors with multi-model configs get the same endpoint-level health check as the primary alias, with no manual _validate() workaround.
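  • Plugin configs without a model_alias field no longer crash the health-check loop, because the default get_model_aliases() returns an empty list when the field is absent. They can override it to report the aliases they actually depend on.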
  • The isinstance(config, CustomColumnConfig) branch in _run_model_health_check_if_needed is removed.
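The behavior listed above can be exercised end to end with plain-Python stand-ins. This is a minimal sketch, not the real implementation: the classes below are simplified (no pydantic), TextColumnConfig and collect_aliases are hypothetical names, and the is_model_generated predicate is a stub for column_type_is_model_generated.

```python
from typing import Callable, Optional


class SingleColumnConfig:
    """Simplified stand-in for the real config base class (assumption)."""

    column_type = "base"

    def __init__(self, **fields: object) -> None:
        for name, value in fields.items():
            setattr(self, name, value)

    def get_model_aliases(self) -> list[str]:
        # Default: report model_alias if present, otherwise nothing.
        alias = getattr(self, "model_alias", None)
        return [alias] if alias else []


class TextColumnConfig(SingleColumnConfig):
    """Hypothetical built-in config that relies on the default accessor."""
    column_type = "text"


class PairwiseJudgeColumnConfig(SingleColumnConfig):
    column_type = "pairwise-judge"

    def get_model_aliases(self) -> list[str]:
        return [self.model_alias, self.judge_model_alias]


class CustomColumnConfig(SingleColumnConfig):
    column_type = "custom"
    model_aliases: Optional[list[str]] = None

    def get_model_aliases(self) -> list[str]:
        return self.model_aliases or []


def collect_aliases(configs, is_model_generated: Callable[[str], bool]) -> set[str]:
    """Mirrors the proposed single-loop builder logic."""
    aliases: set[str] = set()
    for config in configs:
        if is_model_generated(config.column_type):
            aliases.update(config.get_model_aliases())
    return aliases


configs = [
    TextColumnConfig(model_alias="gen"),
    PairwiseJudgeColumnConfig(model_alias="gen", judge_model_alias="judge"),
    CustomColumnConfig(model_aliases=["embedder"]),
    SingleColumnConfig(),  # no model_alias field: contributes nothing, no crash
]

all_gated_in = collect_aliases(configs, lambda column_type: True)
custom_gated_out = collect_aliases(configs, lambda column_type: column_type != "custom")
print(sorted(all_gated_in))      # ['embedder', 'gen', 'judge']
print(sorted(custom_gated_out))  # ['gen', 'judge']
```

One detail worth confirming before merging: in the current builder the CustomColumnConfig branch is a sibling if, not nested under the column_type_is_model_generated check. The collapsed loop therefore keeps custom-column aliases only if that predicate returns True for custom columns, as the second call illustrates.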

The plugin docs (docs/plugins/models.md) would be updated to recommend overriding get_model_aliases() instead of validating manually in _validate().

Describe alternatives you've considered

  • Pydantic field annotation — mark fields with Annotated[str, ModelAlias()] and have the builder walk model_fields. More declarative, but more machinery and a one-way door. Reasonable to revisit if a second consumer (fingerprinting, secret resolution, redaction) wants to enumerate aliases.
  • Classmethod on the generator impl — e.g. ColumnGeneratorWithModelRegistry.get_required_model_aliases(config). Indirected: alias values still come from config, so the impl ends up reading config fields anyway. Useful if dynamic alias selection (e.g. "only include judge_model_alias if enable_critic=True") becomes a real requirement.
  • Convention-based scan — auto-collect any field ending in _alias. Conflicts with tool_alias (MCP), surprises plugin authors who name fields differently, and fails the explicit-over-implicit smell test. Skipping.
  • Status quo: validate manually in _validate() — the current docs recommendation. Works for "is the alias registered?" but does not exercise the endpoint, so a bad credential on a secondary model surfaces only at generation time.
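For comparison, the field-annotation and convention-based alternatives can be sketched side by side. This is an illustration under assumptions: it uses stdlib dataclasses and typing.get_type_hints as a stand-in for walking pydantic's model_fields, and the ModelAlias marker plus both helper functions are hypothetical.

```python
from dataclasses import dataclass
from typing import Annotated, get_type_hints


class ModelAlias:
    """Hypothetical marker for fields that hold a model alias."""


@dataclass
class PairwiseJudgeColumnConfig:
    name: str
    model_alias: Annotated[str, ModelAlias()]
    judge_model_alias: Annotated[str, ModelAlias()]
    tool_alias: str  # an alias, but not a model alias, so it has no marker


def annotated_model_aliases(config: object) -> list[str]:
    """Collect values of fields whose annotation carries a ModelAlias marker."""
    hints = get_type_hints(type(config), include_extras=True)
    aliases = []
    for field_name, hint in hints.items():
        metadata = getattr(hint, "__metadata__", ())
        if any(isinstance(item, ModelAlias) for item in metadata):
            aliases.append(getattr(config, field_name))
    return aliases


def suffix_scan(config: object) -> list[str]:
    """Convention-based scan: every field whose name ends in _alias."""
    return [value for name, value in vars(config).items() if name.endswith("_alias")]


config = PairwiseJudgeColumnConfig(
    name="pairwise",
    model_alias="gen",
    judge_model_alias="judge",
    tool_alias="mcp-tools",
)

print(annotated_model_aliases(config))  # ['gen', 'judge']
print(suffix_scan(config))              # ['gen', 'judge', 'mcp-tools']
```

The suffix scan picks up tool_alias, which is the conflict described above; the annotation walk avoids it at the cost of the extra marker machinery.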

Agent Investigation

Current findings from the codebase:

  • Health check site: _run_model_health_check_if_needed in packages/data-designer-engine/src/data_designer/engine/dataset_builders/dataset_builder.py:749 runs in build() (line 208) and build_preview() (line 245), before _initialize_generators_and_graph(). The health check therefore cannot rely on generator instances; alias enumeration has to live on the config.
  • Plugin discovery: column_type_is_model_generated in packages/data-designer-engine/src/data_designer/engine/column_generators/utils/generator_classification.py:32 flags any plugin whose impl_cls inherits from ColumnGeneratorWithModelRegistry as model-generated. This is the gate that turns on the config.model_alias access.
  • Existing parallel: CustomColumnConfig.model_aliases: list[str] | None (packages/data-designer-config/src/data_designer/config/custom_column.py:38) is already collected into the same health-check set via an isinstance check, which the proposed accessor cleanly subsumes.
  • Plugin docs context: surfaced in #603, specifically docs/plugins/models.md lines 83–89 and 155–159.

Additional context

  • This is a small, additive API change: one new method on SingleColumnConfig, one updated method on CustomColumnConfig, and a one-line change in the dataset builder.
  • It does not change any existing CLI, builder, or plugin entry-point surface.
  • It improves the docs story for the multi-model plugin pattern that #603 (docs: graduate plugins out of experimental mode) introduces: the models.md guidance can switch from "manually validate aliases in _validate()" to "override get_model_aliases() and the health check pings every alias".

Checklist

  • I've reviewed existing issues and the documentation
  • This is a design proposal, not a "please build this" request
