Skip to content

Add a schema registry#264

Draft
TeresiaOlsson wants to merge 52 commits into
mainfrom
schema-registry
Draft

Add a schema registry#264
TeresiaOlsson wants to merge 52 commits into
mainfrom
schema-registry

Conversation

@TeresiaOlsson

@TeresiaOlsson TeresiaOlsson commented May 20, 2026

Copy link
Copy Markdown
Contributor

This PR adds a schema registry that can be used for validation and generating json schemas for dynamic nested pydantic models.

Motivation:

  • Better separation of concerns. Validation of the configuration is separated into different classes than the ones responsible for storing and using the configuration, resulting in a separate validation layer.

  • It is possible to make validation option. The entire validation layer can be skipped if the user wished. For example, if they know that the input data has already been validated once and not changed.

  • More lose dependency of pydantic. Pydantic is only used at the edge of the core and not everywhere in line with what has been discussed for how to manage the dependency of pint.

  • Solving two problems with pydantic:

    1. It is no good at handling nested models where the available types are only known at runtime.
    2. It is no good at generating json schemas for arbitrary types and including all available subclasses of a model.

Features:

  • A schema registry which maps a class path to the schema that should be used for validation of the input data to the class.

  • A decorator to automatically register schemas in the schema registry.

  • A schema validator which validates nested dynamic models with the use of the registry.

  • A custom json schema generator which generates json schemas including all available subclasses with the use of the registry.

Major changes:

  • All ConfigModels have been separate from the domain classes and turned into schema classes which only has the purpose to define the schema used for validation. This is a major refactoring since the ConfigModels are spread out in the codebase. A temporary legacy handler had to be implemented to handle the packages for the bindings. All domain classes also had to be changed since they now need to explicitly declare their attributes.

  • A baseclass ConfigurationSchema has been added that all schemas must inherit from. This is to ensure that all schemas registered in the registry has the minimum required fields and same behaviour.

  • The yaml file format has been changed to instead of defining the type by the module it is defined by the class. This is to be compatible with external packages where a certain module structure can't be enforced and to allow several similar classes to be grouped in the same module. A temporary function to translate between yaml file formats have been added.

Example usage:
Examples of the usage is available in https://github.com/python-accelerator-middle-layer/pyaml/tree/schema-registry/examples/validation.

The json schema can be visualized and tested with MetaConfigurator to easier understand what it includes. You can import the schema directly from here: https://github.com/python-accelerator-middle-layer/pyaml/blob/schema-registry/pyaml/validation/schema.json

@TeresiaOlsson TeresiaOlsson marked this pull request as draft May 20, 2026 08:50
@TeresiaOlsson TeresiaOlsson changed the title Add a Schema registry Add a schema registry May 20, 2026
@gubaidulinvadim

Copy link
Copy Markdown
Contributor

We have a working solution without duplication and that handle repr in a simple and efficient manner. The only missing added value is the json schema but I would do in a different manner because writing an expert tools for pyaml configuration based on json schema will be a nightmare.

There are several problems with the actual proposal:

* 4 duplications of the parameters (Model, constructor, fields for repr, attachment). This is too much.

* Strong dependance to pydantic with an overriding of **get_union_of_schema** and **model_schema** that can end in same problem than overriding **get_pydantic_core_schema**

* Apparently no way to handle same schema with same signature and a different name that will be a problem with pyaml_cs_oa

However lots of good other points.... I agree... Personally, I can already build a full configuration file (or dict) from our lattice files. At ESRF we even already have device name in our lattice file but this is not a standard.

@JeanLucPons this is a draft, so we have to give Teresia a chance to find solutions to the problems you flagged here. I agree with you that we should not run in circles reimplementing things that already work. However, in the case of configuration we were explicitly given feedback that it needs improvement and some rework is possible (already lots of new features like catalogs, regular expressions in the config file, etc.). It is the only part of the project that was flagged as a bit too complex after the workshop. So, we have to suffer a bit. This PR has some potentially nice features and some annoying things too. However, when it's a draft it's too hard to review because nothing works and there are too many changes. I remind you that with this PR @TeresiaOlsson is somewhat learning how everything is structured right now, which can be somewhat challenging. But afterwards it should be easier to contribute to different parts of the project.

@TeresiaOlsson In general I like the idea of Jean Luc that some big PR should start small by changing for example only a part of the config (single element). This does make the review a bit more straighforward. In the case of configuration it's very hard to do (and too late) but maybe for some other proposals it would make sense.

@JeanLucPons @TeresiaOlsson @gupichon @gupichon-soleil Maybe I'll ask the steering committee to put a (reasonable) deadline for freezing configuration (no large reworks, only improvements). That way it will be clear for everyone that we have to converge on a working solution in a reasonable time. What do you think? I can also give you the deadline myself :) Let's say the week of July 13th we freeze. I would say that we should also ask the steering committee that the next community meeting (September?) should be an online-hackathon where people should try the code and give their feedback.

@JeanLucPons

JeanLucPons commented Jun 8, 2026

Copy link
Copy Markdown
Contributor

I quickly tested the creation of a dynamic model with arbitrary type and it worked.
So it is possible to a have single source of truth which is the constructor signature.

from pydantic import create_model, ConfigDict
import inspect
import json

class MyClass1:
    
    def __init__(self,param1: str,param2: int):
        ...

class MyClass2:

    def __init__(self,param1:str,param2:MyClass1):
        ...

def register_schema(cls):

    # Inspect constructor signature and create dyammic schemas
    # for both validation and json schema export
    sig = inspect.signature(cls.__init__)
    fields = {}
    sch_fields = {}
    for param_name, param in sig.parameters.items():
        # Jump self
        if param_name == 'self':
            continue
        # Default value,
        if param.default != inspect.Parameter.empty:
            default = param.default
        else:
            default = ... # No default value specified -> Required field
        # Type
        annotation = param.annotation if param.annotation != inspect.Parameter.empty else str        
        sch_fields[param_name] = (annotation, default)
        fields[param_name] = (annotation, default)
        if annotation is MyClass1:
            # Arbitrary type example
            # Replace by its ConfigModel (for json schema export)
            annotation = MyClass1._model
        sch_fields[param_name] = (annotation, default)

    # Create dynamic model for the class
    model_name = cls.__module__ + "." + cls.__name__
    cls._model = create_model(model_name, __config__=ConfigDict(arbitrary_types_allowed=True), **fields)
    cls._schemamodel = create_model(model_name+"Schema", __config__=ConfigDict(arbitrary_types_allowed=True), **sch_fields)

    # Override __init__ to handle validation
    original_init = cls.__init__

    @classmethod
    def custom_init(cls, *args, **kwargs):        
        # Validate using dynamic model
        sig = inspect.signature(original_init)
        cls._model.model_validate(kwargs)
        original_init(cls, *args, **kwargs)

    cls.__init__ = custom_init
    return cls


register_schema(MyClass1)
register_schema(MyClass2)

mc1 = MyClass1(param1="test1",param2=10)
mc2 = MyClass2(param1="test2",param2=mc1)

top_level_json = MyClass2._schemamodel.model_json_schema()
with open('sch.json', 'w') as fp:
    json.dump(top_level_json, fp)

@TeresiaOlsson

Copy link
Copy Markdown
Contributor Author

Personally I don't see how we can freeze for refactoring. Part of the culture of open source is that everyone is allowed to make PRs about everything at any time if they have a suggestion. It doesn't mean the PR has to be accepted and merged in. But if you want the vague "before the summer vacations" deadline the steering committee has given the maintainers for simplification of configuration to mean the week of the 13th July that sounds like a good idea to make the deadline concrete.

For the community meetings the steering committee currently isn't so involved in it. They only approve a date that I set and then I come up with the agenda myself. If you think it would be a good idea to make it into a hackathon we can do that. The only thing I have lined up for a future meeting is a presentation about osprey but that doesn't have to be on the September meeting.

I agree that making smaller changes to merge things in in steps (like you are doing for the catalog) would be preferable. I will try to do that. Only issue is that we need to rewrite the tests since many of them currently relies on loading a configuration file. I think we should try to make unit tests which rely on as few other parts of the code as possible so parts can be tested individually. And then we can have integration tests which tests things together. I don't see the need to test loading the configuration in many tests.

@gubaidulinvadim

Copy link
Copy Markdown
Contributor

Personally I don't see how we can freeze for refactoring. Part of the culture of open source is that everyone is allowed to make PRs about everything at any time if they have a suggestion. It doesn't mean the PR has to be accepted and merged in. But if you want the vague "before the summer vacations" deadline the steering committee has given the maintainers for simplification of configuration to mean the week of the 13th July that sounds like a good idea to make the deadline concrete.

For the community meetings the steering committee currently isn't so involved in it. They only approve a date that I set and then I come up with the agenda myself. If you think it would be a good idea to make it into a hackathon we can do that. The only thing I have lined up for a future meeting is a presentation about osprey but that doesn't have to be on the September meeting.

I agree that making smaller changes to merge things in in steps (like you are doing for the catalog) would be preferable. I will try to do that. Only issue is that we need to rewrite the tests since many of them currently relies on loading a configuration file. I think we should try to make unit tests which rely on as few other parts of the code as possible so parts can be tested individually. And then we can have integration tests which tests things together. I don't see the need to test loading the configuration in many tests.

There are no issues with PRs and changes for me. It's only "freezing" to keep us on track so we can meet all the milestones. I think the PR is timed well because it is now that we need to improve the configuration.

@JeanLucPons

JeanLucPons commented Jun 8, 2026

Copy link
Copy Markdown
Contributor

I updated my code on the above post. I did 2 schema to handle validation and json schema.
There are definitely plenty of other ways to handle this with one single source of truth and arbitrary type.
We should then fall back in a situation similar to my first proposal.

@TeresiaOlsson

TeresiaOlsson commented Jun 8, 2026

Copy link
Copy Markdown
Contributor Author

I quickly tested the creation of a dynamic model with arbitrary type and it worked. So it is possible to a have single source of truth which is the constructor signature.

from pydantic import create_model, ConfigDict
import inspect
import json

class MyClass1:
    
    def __init__(self,param1: str,param2: int):
        ...

class MyClass2:

    def __init__(self,param1:str,param2:MyClass1):
        ...

def register_schema(cls):

    # Inspect constructor signature and create dyammic schemas
    # for both validation and json schema export
    sig = inspect.signature(cls.__init__)
    fields = {}
    sch_fields = {}
    for param_name, param in sig.parameters.items():
        # Jump self
        if param_name == 'self':
            continue
        # Default value,
        if param.default != inspect.Parameter.empty:
            default = param.default
        else:
            default = ... # No default value specified -> Required field
        # Type
        annotation = param.annotation if param.annotation != inspect.Parameter.empty else str        
        sch_fields[param_name] = (annotation, default)
        fields[param_name] = (annotation, default)
        if annotation is MyClass1:
            # Arbitrary type example
            # Replace by its ConfigModel (for json schema export)
            annotation = MyClass1._model
        sch_fields[param_name] = (annotation, default)

    # Create dynamic model for the class
    model_name = cls.__module__ + "." + cls.__name__
    cls._model = create_model(model_name, __config__=ConfigDict(arbitrary_types_allowed=True), **fields)
    cls._schemamodel = create_model(model_name+"Schema", __config__=ConfigDict(arbitrary_types_allowed=True), **sch_fields)

    # Override __init__ to handle validation
    original_init = cls.__init__

    @classmethod
    def custom_init(cls, *args, **kwargs):        
        # Validate using dynamic model
        sig = inspect.signature(original_init)
        cls._model.model_validate(kwargs)
        original_init(cls, *args, **kwargs)

    cls.__init__ = custom_init
    return cls


register_schema(MyClass1)
register_schema(MyClass2)

mc1 = MyClass1(param1="test1",param2=10)
mc2 = MyClass2(param1="test2",param2=mc1)

top_level_json = MyClass2._schemamodel.model_json_schema()
with open('sch.json', 'w') as fp:
    json.dump(top_level_json, fp)

So the approach is to not have the ConfigModel but instead generate it automatically from the signature of the domain class as part of registering it in the schema registry? That sounds like an interesting idea and would also be possible to combine with the option to write your own dedicated schema class in case you want to use some of the more advanced validation features of pydantic like model_validator and field_validator. But I don't fully understand why you want to overwrite the constructor? Is it just because you want validation to happen automatically when creating the object instead of making the validation a separate step that is optional to do before?

I like this approach better than what I tested in #273 because that suddenly mixed validation and object creation and allowed pydantic to resolve nested models by itself and start to act like a factory. This approach doesn't seem to do that since you need to first create an object of MyClass1 and give that to MyClass2. So the object creation is still separated from the validation.

@TeresiaOlsson

Copy link
Copy Markdown
Contributor Author

I think the difference compared to adding __get_pydantic_core_schema__ is that mc2 = MyClass2(param1="test2",param2=mc1) will only validate so mc1 is of the correct type when running cls._model.model_validate(kwargs) and not do validation of the fields inside mc1. But that is not a major problem for me since the SchemaValidator will anyway do that type of nested validation. I don't think we need pydantic to do it for us.

@gupichon

gupichon commented Jun 8, 2026

Copy link
Copy Markdown
Contributor

I'm not sure that overriding the constructor is a good idea. Model validation should not be done inside the class constructor, as you may want to validate data before the actual instantiation. This validation could be shared between the configuration validation layer and the constructor (as it might be used directly). Maybe in a class method with a decorator?

@TeresiaOlsson

Copy link
Copy Markdown
Contributor Author

I agree. But I think overriding the constructor is not needed. We only need that part if we want the class to validate automatically when an object is created. If not, we can keep the validation in a separate step using the SchemaValidator. That only requires the schema to be validated against to be registered in the SchemaRegistry.

@gupichon

gupichon commented Jun 8, 2026

Copy link
Copy Markdown
Contributor

That only requires the schema to be validated against to be registered in the SchemaRegistry.

Yes, that's why I thinking about two decorators: one for registration and another for validation. If the second one is absent, it means no validation, which is a good first-step approach.

@JeanLucPons

Copy link
Copy Markdown
Contributor

Personally I would prefer to have the validation enabled by default for both "on the fly" construction and when loading.
Then a way to disable it globally by using the registry or for one single class.

@JeanLucPons

Copy link
Copy Markdown
Contributor

For configuration param getter/setter I imagine something like below.
The param1 property can be automatically defined to avoid supplication but , if we do like that, the code completion is lost in VS code :(

class ConfigReadWriteStringScalar:
    def __init__(self,parent: object,parameter:str):
        self.parent = parent
        self.parameter = parameter
    def set(self,value:str):
        setattr(self.parent,self.parameter,value)
    def get(self):
        return getattr(self.parent,self.parameter)


def assign_param(object,frame):
    ai:inspect.ArgInfo = inspect.getargvalues(frame)
    for argname,value in ai.locals.items():
        if argname != "self":
            setattr(object,"_pyaml_"+argname,value) # We can define a syntax usable for repr as well

class MyClass1:
    
    def __init__(self,param1: str,param2: int):
        # Assign parameter to local fields
        # To use this helper function, the constructor cannot be overloaded as above (lost of signature)
        assign_param(self,inspect.currentframe())

    # Follow the pyaml coding style and code completion is OK
    @property
    def param1(self) -> ConfigReadWriteStringScalar:
        return ConfigReadWriteStringScalar(self,"_pyaml_param1")

@TeresiaOlsson

Copy link
Copy Markdown
Contributor Author

Personally I would prefer to have the validation enabled by default for both "on the fly" construction and when loading. Then a way to disable it globally by using the registry or for one single class.

What do you think about separating it into two different decorators like @gupichon suggested? One which registers into the schema registry and one which adds the validation to the constructor? That feels easier to understand to me than one which is doing both things.

@TeresiaOlsson

Copy link
Copy Markdown
Contributor Author

For configuration param getter/setter I imagine something like below. The param1 property can be automatically defined to avoid supplication but , if we do like that, the code completion is lost in VS code :(

class ConfigReadWriteStringScalar:
    def __init__(self,parent: object,parameter:str):
        self.parent = parent
        self.parameter = parameter
    def set(self,value:str):
        setattr(self.parent,self.parameter,value)
    def get(self):
        return getattr(self.parent,self.parameter)


def assign_param(object,frame):
    ai:inspect.ArgInfo = inspect.getargvalues(frame)
    for argname,value in ai.locals.items():
        if argname != "self":
            setattr(object,"_pyaml_"+argname,value) # We can define a syntax usable for repr as well

class MyClass1:
    
    def __init__(self,param1: str,param2: int):
        # Assign parameter to local fields
        # To use this helper function, the constructor cannot be overloaded as above (lost of signature)
        assign_param(self,inspect.currentframe())

    # Follow the pyaml coding style and code completion is OK
    @property
    def param1(self) -> ConfigReadWriteStringScalar:
        return ConfigReadWriteStringScalar(self,"_pyaml_param1")

I'm not sure I understand the purpose of this. What is ConfigReadWriteStringScalar? for?

@JeanLucPons

JeanLucPons commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

What do you think about separating it into two different decorators like @gupichon suggested? One which registers into the schema registry and one which adds the validation to the constructor? That feels easier to understand to me than one which is doing both things.

I'm sure to see how you want to implement it but overriding the constructor creates the issue I mentioned above + the potential lost of completion in IPython. The validation can be handled in the assign_param helper function instead.

I'm not sure I understand the purpose of this. What is ConfigReadWriteStringScalar? for?

It allows to write:

mc1 = MyClass1(param1="test1",param2=10)
mc2 = MyClass2(param1="test2",param2=mc1)
print(mc1.param1.get())
mc1.param1.set("new value")
print(mc1.param1.get())

@gupichon

gupichon commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

Hi @TeresiaOlsson and @JeanLucPons,
Here is an example that offers more flexibility with a single decorator (maybe easier than using two, by leveraging decorator parameters).
I have added some examples.

from pydantic import create_model, ConfigDict
from functools import wraps
import inspect
import json
import yaml


def register_schema(
    cls=None,
    *,
    override_constructor=True,
    custom_validation_method=None,
):
    def decorate(target_cls):
        # Inspect constructor signature and create dynamic schemas
        # for both validation and json schema export
        sig = inspect.signature(target_cls.__init__)
        fields = {}
        sch_fields = {}
        for param_name, param in sig.parameters.items():
            # Jump self
            if param_name == "self":
                continue
            # Default value,
            if param.default != inspect.Parameter.empty:
                default = param.default
            else:
                default = ...  # No default value specified -> Required field
            # Type
            annotation = (
                param.annotation if param.annotation != inspect.Parameter.empty else str
            )
            fields[param_name] = (annotation, default)

            # If another class has already been registered, use its schema model
            # for config validation and JSON Schema export.
            schema_annotation = getattr(annotation, "_schemamodel", annotation)
            sch_fields[param_name] = (schema_annotation, default)

        # Create dynamic model for the class
        model_name = target_cls.__module__ + "." + target_cls.__name__
        target_cls._model = create_model(
            model_name, __config__=ConfigDict(arbitrary_types_allowed=True), **fields
        )
        target_cls._schemamodel = create_model(
            model_name + "Schema",
            __config__=ConfigDict(arbitrary_types_allowed=True),
            **sch_fields,
        )

        def run_custom_validation(validated_config):
            if custom_validation_method is None:
                return
            custom_validation = getattr(target_cls, custom_validation_method)
            custom_validation(validated_config)

        def _validate_constructor_config(class_, config):
            validated_config = class_._model.model_validate(config)
            run_custom_validation(validated_config)
            return validated_config

        def validate_config(class_, config, *, input_format=None):
            if input_format not in (None, "json", "yaml"):
                raise ValueError("input_format must be None, 'json', or 'yaml'")

            if isinstance(config, str):
                if input_format == "json":
                    validated_config = class_._schemamodel.model_validate_json(config)
                elif input_format == "yaml":
                    validated_config = class_._schemamodel.model_validate(
                        yaml.safe_load(config)
                    )
                else:
                    try:
                        parsed_config = json.loads(config)
                    except json.JSONDecodeError:
                        parsed_config = yaml.safe_load(config)
                    validated_config = class_._schemamodel.model_validate(parsed_config)
            else:
                validated_config = class_._schemamodel.model_validate(config)

            run_custom_validation(validated_config)
            return validated_config

        target_cls._validate_constructor_config = classmethod(_validate_constructor_config)
        target_cls.validate_config = classmethod(validate_config)

        if override_constructor:
            # Override __init__ to handle validation
            original_init = target_cls.__init__
            original_sig = inspect.signature(original_init)

            @wraps(original_init)
            def custom_init(self, *args, **kwargs):
                # Validate using dynamic model
                bound_args = original_sig.bind_partial(self, *args, **kwargs)
                validation_data = {
                    name: value
                    for name, value in bound_args.arguments.items()
                    if name != "self"
                }
                target_cls._validate_constructor_config(validation_data)
                original_init(self, *args, **kwargs)

            target_cls.__init__ = custom_init
        return target_cls

    if cls is None:
        return decorate
    return decorate(cls)

@register_schema
class MyClass1:
    def __init__(self, param1: str, param2: int): ...


@register_schema(override_constructor=False, custom_validation_method="custom_validate")
class MyClass2:
    def __init__(self, param1: str, param2: MyClass1): ...

    @classmethod
    def custom_validate(cls, config):
        if config.param1 == config.param2.param1:
            raise ValueError("MyClass2.param1 must differ from MyClass1.param1")


@register_schema(override_constructor=False)
class MyClass3:
    def __init__(self, param1: str, param2: MyClass2): ...


# register_schema(MyClass1)
# register_schema(MyClass2)

mc1 = MyClass1(param1="test1", param2=10)
mc2 = MyClass2(param1="test2", param2=mc1)
mc3 = MyClass3(param1="test3", param2=mc2)

top_level_json = MyClass2._schemamodel.model_json_schema()
with open("sch.json", "w") as fp:
    json.dump(top_level_json, fp)


my_class1_json = """
{
  "param1": "from-json",
  "param2": 20
}
"""

validated_mc1 = MyClass1.validate_config(my_class1_json, input_format="json")
mc1_from_json = MyClass1(**validated_mc1.model_dump())


my_class2_yaml = """
param1: from-yaml
param2:
  param1: nested-from-yaml
  param2: 30
"""

validated_mc2 = MyClass2.validate_config(yaml.safe_load(my_class2_yaml))
validated_mc2_data = validated_mc2.model_dump()
mc2_param2 = MyClass1(**validated_mc2_data["param2"])
mc2_from_yaml = MyClass2(
    param1=validated_mc2_data["param1"],
    param2=mc2_param2,
)


my_class3_yaml = """
param1: top-from-yaml
param2:
  param1: nested-myclass2-from-yaml
  param2:
    param1: nested-myclass1-from-yaml
    param2: 40
"""

validated_mc3 = MyClass3.validate_config(my_class3_yaml, input_format="yaml")
validated_mc3_data = validated_mc3.model_dump()
mc3_param2_param2 = MyClass1(**validated_mc3_data["param2"]["param2"])
mc3_param2 = MyClass2(
    param1=validated_mc3_data["param2"]["param1"],
    param2=mc3_param2_param2,
)
mc3_from_yaml = MyClass3(
    param1=validated_mc3_data["param1"],
    param2=mc3_param2,
)

@gupichon

gupichon commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

I've done a few tests and it's a bit trickier with inheritance...

@TeresiaOlsson

Copy link
Copy Markdown
Contributor Author

What do you think about separating it into two different decorators like @gupichon suggested? One which registers into the schema registry and one which adds the validation to the constructor? That feels easier to understand to me than one which is doing both things.

I'm sure to see how you want to implement it but overriding the constructor creates the issue I mentioned above + the potential lost of completion in IPython. The validation can be handled in the assign_param helper function instead.

I'm not sure I understand the purpose of this. What is ConfigReadWriteStringScalar? for?

It allows to write:

mc1 = MyClass1(param1="test1",param2=10)
mc2 = MyClass2(param1="test2",param2=mc1)
print(mc1.param1.get())
mc1.param1.set("new value")
print(mc1.param1.get())

My personal opinion is that it will be too difficult for the users to understand. They will forget to use assign_param and it is confusing that you input one type and then the property for that attribute needs to return another type if it is just something simple as a string. We need to remember that many of them are going to be new to python and their default way of writing their own custom devices is going to be to follow the normal python conventions for how to assign attributes and how they are used to writing classes in pyEPICS/pyTANGO.

The only way I see it working is by doing something similar to how it is done in HAPPI where every attribute is an object of a custom class which takes care of the functionality. I think the users can understand if they need to write something like this:

class MyClass1:
    
    def __init__(self,param1: str, param2: int):
        self._param1 = pyAMLAttribute(param1)
        self._param2 = pyAMLAttribute(param2)

    @property
    def param1(self) -> pyAMLAttribute:
        return self._param1

and then the pyAMLAttribute will take care of the functionality. That I have seen people understanding since ophyd-async does something similar with signals. I think they understand it because the class makes it clearer that it is something which exist to provide some functionality. And it is much easier to teach people that every attribute of a class need to be of a specific type than to teach them that they need to write things a specific way.

But that is a different way to write classes which would affect the whole hardware abstraction layer. I'm not sure it's worth it compared to just doing the validation in a separate step before you initialize an object.

@JeanLucPons

JeanLucPons commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

and then the pyAMLAttribute will take care of the functionality...

What you propose is OK for me.

@TeresiaOlsson

Copy link
Copy Markdown
Contributor Author

I've done a few tests and it's a bit trickier with inheritance...

I think when using create_model to dynamically create the schema from the class the information about the inheritance must be kept or the JSON schema will not be able to resolve abstract classes into it's subclasses. Without that the JSON schema becomes useless. My current thinking is try to use virtual subclasses for it but not sure if it will work out.

@TeresiaOlsson

Copy link
Copy Markdown
Contributor Author

and then the pyAMLAttribute will take care of the functionality...

What you propose is OK for me.

In that case I would still leave it out of this PR. It already includes too many things and probably needs to be separated into steps somehow to be able to review and merge. One way might be to first implement the parts needed for the schema registry with as few changes as possible to the existing ConfigModels and classes (maybe even do it module by module if possible like you do for the catalog) and then in a second step remove the ConfigModels + add dynamic generation of them. Then in the third step implement the things needed for having the automatic validation when creating objects. What do you think?

I think there might be a way to make this PR smaller by using the same approach as for the dynamic generation to turn the ConfigModels into schema classes. Then they can be registered and the SchemaRegistry, SchemaValidator and SchemaGenerator can be reviewed and merged in without having to modify a lot of classes in the same PR. I'm working on that part now.

@JeanLucPons

JeanLucPons commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

I think when using create_model to dynamically create the schema from the class the information about the inheritance must be kept or the JSON schema will not be able to resolve abstract classes into it's subclasses. Without that the JSON schema becomes useless. My current thinking is try to use virtual subclasses for it but not sure if it will work out.

Yes. in my first example I create 2 models for that. The idea is to rebuild dynamically exactly what you did manually in your PR to export json format but it cannot handle validation. You can update your PR, get rid off configmodel and build it dynamically but it will be a bit difficult to handle correctly the following part will all registered classes (including compound type).

        if annotation is MyClass1:
            # Arbitrary type example
            # Replace by its ConfigModel (for json schema export)
            annotation = MyClass1._schemamodel  # sorry for the mistake (due to testing)

@JeanLucPons

JeanLucPons commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

In that case I would still leave it out of this PR.

I would do it at same time and cut the PR into categories rather than features as we do for hardware abstraction. Otherwise you will face repr issue (and likely others).
You are changing something fundamental that has a strong impact everywhere.

@gupichon

gupichon commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

I have noticed a major issue here. If you want to inherit from an existing class without adding any attributes, but simply want to change its behavior, it won't work. You would have to change the signature, which is quite annoying for high-level PyAML objects.

@JeanLucPons

Copy link
Copy Markdown
Contributor

Yes I think @TeresiaOlsson already mentioned this issue during the last meeting.
This is why she removed all magnet types (correctors, quad,...) and make them Magnet.
It will be an issue with pyaml_cs_oa but we will try to solve this issue.

@TeresiaOlsson

Copy link
Copy Markdown
Contributor Author

I just removed them because a JSON schema validator can't handle two schemas where everything is the same. In the eyes of the validator it is the same schema. So I thought that were no reason to maintain several classes. And my goal was also eventually to manage to have oneOf instead of anyOf in the schema but so far that has turned out to be difficult.

@gupichon

gupichon commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

There is one way to do this by storing the registered schemas, but it's ugly and quite complex...

Also, if the schema is the same (inheritance without new attributes), it would be possible to have an optionnal attribute like __class__ added by the decorator to specify that you want to use a specific version of the implementation, while using standard PyAML classes by default.

param1: top-MyClass3
param2:
  __class__: MyClass2Special
  param1: nested-special
  param2:
    param1: nested-myclass1
    param2: 45

Actually, it's a bit like the "type" field we are using today.

@JeanLucPons

JeanLucPons commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

... to manage to have oneOf instead of anyOf in the schema but so far that has turned out to be difficult...

Is oneOf a part of the json pydantic schema ?

@JeanLucPons

Copy link
Copy Markdown
Contributor

Is oneOf a part of the json pydantic schema ?

it is, at least, a part of JSON schema standard:
https://json-schema.org/draft/2020-12/json-schema-validation#section-7.3.7

@TeresiaOlsson

Copy link
Copy Markdown
Contributor Author

Is oneOf a part of the json pydantic schema ?

it is, at least, a part of JSON schema standard: https://json-schema.org/draft/2020-12/json-schema-validation#section-7.3.7

I think pydantic can give it but only if you use the discriminated unions with a discriminator field but I haven't tried. Mostly pydantic gives anyOf. Since we anyway need a custom JSON schema generator to replace the abstract classes in the schema with their implementation I just added to it to replace anyOf to oneOf. That was easy. The problem occurred when I tried to use it in the Metaconfigurator because it crashed and became weird when using oneOf instead of anyOf for some of the subschemas.

However, I'm still trying to debug it because I'm not sure that the problem is with the schema. It could equally well be a bug on their side with nested models. In the generator I also added a Literal with the string of each class which I think should make the schemas unique and fulfill oneOf. But that had the same bugs.

I wanted oneOf to work because it renders much nicer in the GUI. Instead of being allowed to click several options, the user is only allowed to click one. For example, it is confusing for the user why they can choose something to be both a Quadrupole and a Sextupole at the same time. That only makes sense to someone who has looked into the details of JSON schema validation and know the difference between anyOf and oneOf.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants