feat: XSD sequence order validation, extension merging for imports, and xsd:any wildcards #24
Open
AlexanderWillner wants to merge 9 commits into
Add XSD 1.0 section 3.3.6 substitution group support to the XSD validator. When element B declares substitutionGroup='A', B can appear anywhere A is expected in a content model. This is transitive: if C substitutes for B, C also substitutes for A.

Changes:
- Add substitution_group and is_abstract fields to XsdElement
- Add substitution_groups index to XsdSchema (head -> members map)
- Parse substitutionGroup/abstract attributes in parse_element_decl
- Build substitution index after schema parse via build_substitution_index
- Extend element_matches_decl to accept substitution group members
- Add is_substitution_member for transitive chain resolution
- Resolve instance element type in validate_sequence_element for correct content validation of substituted elements
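A minimal sketch of the head -> members index and the transitive membership check described above. It reuses the function names from the commit message, but the signatures, the simplified `ElementDecl` type and the plain-string element names are assumptions for illustration, not the PR's actual `XsdSchema` API. The visited set stops the walk if a (malformed) schema declares a cycle.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical, simplified element declaration: only what the index needs.
struct ElementDecl {
    name: String,
    substitution_group: Option<String>,
}

/// Build a head -> direct members map from `substitutionGroup` attributes.
fn build_substitution_index(elements: &[ElementDecl]) -> HashMap<String, Vec<String>> {
    let mut index: HashMap<String, Vec<String>> = HashMap::new();
    for el in elements {
        if let Some(head) = &el.substitution_group {
            index.entry(head.clone()).or_default().push(el.name.clone());
        }
    }
    index
}

/// True if `candidate` may appear where `head` is expected, directly or transitively.
fn is_substitution_member(
    index: &HashMap<String, Vec<String>>,
    head: &str,
    candidate: &str,
) -> bool {
    let mut visited = HashSet::new();
    let mut stack = vec![head.to_string()];
    while let Some(current) = stack.pop() {
        if !visited.insert(current.clone()) {
            continue; // already expanded; also breaks cycles
        }
        if let Some(members) = index.get(&current) {
            for m in members {
                if m == candidate {
                    return true;
                }
                stack.push(m.clone()); // follow the chain: members of members
            }
        }
    }
    false
}
```

With B substituting for A and C for B, `is_substitution_member(&index, "A", "C")` is true, while the reverse direction (`"B"` as head, `"A"` as candidate) is not.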
Parse <xs:complexContent><xs:extension base='...'> in complex type definitions. After all schemas are loaded, merge base-type content model particles with extension particles in derivation order. Post-processing step merge_extension_bases() resolves the full inheritance chain recursively (with cycle detection) and prepends base-type particles to the derived type's sequence. Adds parse_complex_content() handler, extension_base field on ComplexType, resolve_base_particles_impl() with visited-set guard, and 3 unit tests covering simple extension, multi-level chains, and empty-base extension.
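The recursive base resolution with a visited-set guard can be sketched as follows, assuming a simplified `ComplexType` whose content is just a list of particle names (a hypothetical stand-in for the real particle tree). It follows the XSD rule that an extension's effective content is the base type's content followed by the derived type's own particles.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical simplified complex type: particle names in sequence order.
struct ComplexType {
    extension_base: Option<String>,
    particles: Vec<String>,
}

/// Effective content of `name`: base-type particles first, then its own.
fn resolve_base_particles(
    types: &HashMap<String, ComplexType>,
    name: &str,
    visited: &mut HashSet<String>,
) -> Result<Vec<String>, String> {
    if !visited.insert(name.to_string()) {
        return Err(format!("circular extension chain at {name}"));
    }
    let ty = types
        .get(name)
        .ok_or_else(|| format!("unknown type {name}"))?;
    // Recurse up the chain first so the most basic type's particles come first.
    let mut merged = match &ty.extension_base {
        Some(base) => resolve_base_particles(types, base, visited)?,
        None => Vec::new(),
    };
    merged.extend(ty.particles.iter().cloned());
    Ok(merged)
}
```

For a chain Base <- Mid <- Leaf with particles a, b, c respectively, resolving `Leaf` yields `a, b, c`; a type that extends itself returns an error instead of recursing forever.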
When a schema uses targetNamespace and elementFormDefault='qualified', type references like adv:DerivedType now correctly resolve to local types instead of only searching imported namespaces. Adds targetNamespace self-check in resolve_type_name and resolve_element_ref, plus a last-resort local-name fallback in resolve_type_name. Also adds find_complex_type helper that searches both local and imported types for base particle resolution. New tests: complex content extension with targetNamespace, optional element ordering detection.
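The lookup order described above (targetNamespace self-check, then imported namespaces, then a local-name fallback) might look like this. The `TypeLocation` enum, the prefix map and the set-based type tables are assumptions made for the sketch. Note that the last-resort fallback is deliberately lenient: it accepts a local type even when the prefix is unknown or wrong.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical result type: where a type reference was found.
#[derive(Debug, PartialEq)]
enum TypeLocation {
    Local(String),
    Imported { namespace: String, name: String },
}

/// Resolve a QName such as "adv:DerivedType".
fn resolve_type_name(
    qname: &str,
    prefixes: &HashMap<String, String>, // prefix -> namespace URI
    target_ns: &str,
    local_types: &HashSet<String>,
    imported: &HashMap<String, HashSet<String>>, // namespace -> type names
) -> Option<TypeLocation> {
    let (prefix, local) = qname.split_once(':').unwrap_or(("", qname));
    if let Some(ns) = prefixes.get(prefix) {
        // Self-check: the prefix may point back at this schema's own namespace.
        if ns == target_ns && local_types.contains(local) {
            return Some(TypeLocation::Local(local.to_string()));
        }
        if imported.get(ns).map_or(false, |names| names.contains(local)) {
            return Some(TypeLocation::Imported {
                namespace: ns.clone(),
                name: local.to_string(),
            });
        }
    }
    // Last resort: match on the local name alone.
    local_types
        .contains(local)
        .then(|| TypeLocation::Local(local.to_string()))
}
```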
Three bugs prevented substitution group members declared in imported schemas from being recognized during XSD validation:

1. build_substitution_index() only scanned local schema.elements, missing imported elements that declare substitutionGroup membership. Fix: also iterate imported_namespaces.*.elements.
2. element_matches_decl() rejected same-named elements from different namespaces without checking substitution group membership. Fix: when the namespace differs but the local name matches, fall back to an is_substitution_member() check.
3. is_substitution_member() only looked up transitive member declarations in local schema.elements. Fix: also search imported_namespaces.*.elements for member decls.

Fixes: FeatureCollection substitution group, AbstractCRS abstract element.
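Fix 1 amounts to gathering substitution declarations from the main schema and every imported namespace into one index keyed by expanded name. A sketch, assuming hypothetical `Schema`/`ElementDecl` shapes in which imported declarations are grouped by namespace and each `substitutionGroup` head has already been resolved to a (namespace, local name) pair:

```rust
use std::collections::HashMap;

/// Expanded name: (namespace URI, local name).
type QName = (String, String);

/// Hypothetical simplified declaration; the head is already a resolved QName.
struct ElementDecl {
    name: String,
    substitution_group: Option<QName>,
}

/// Hypothetical simplified schema with its imports grouped by namespace.
struct Schema {
    target_namespace: String,
    elements: Vec<ElementDecl>,
    imported_namespaces: HashMap<String, Vec<ElementDecl>>,
}

/// Head -> direct members, gathered from local *and* imported declarations,
/// so a member declared in another namespace is not missed.
fn build_substitution_index(schema: &Schema) -> HashMap<QName, Vec<QName>> {
    let local = std::iter::once((&schema.target_namespace, &schema.elements));
    let all = local.chain(schema.imported_namespaces.iter());
    let mut index: HashMap<QName, Vec<QName>> = HashMap::new();
    for (ns, decls) in all {
        for decl in decls {
            if let Some(head) = &decl.substitution_group {
                index
                    .entry(head.clone())
                    .or_default()
                    .push((ns.clone(), decl.name.clone()));
            }
        }
    }
    index
}
```

Keying by expanded name rather than local name keeps two same-named elements from different namespaces distinct in the index.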
element_matches_decl() now resolves the namespace of element declarations referenced via ref= attributes (e.g. ref="wfs:FeatureCollection") instead of always checking against the main schema's targetNamespace. This fixes validation of documents where imported elements have different namespaces than the main schema, such as WFS FeatureCollection in NAS/AAA schemas.

Also:
- Allow unqualified child elements for element_ref declarations
- build_substitution_index scans imported elements
- is_substitution_member looks up transitive members in imports
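The core of this fix is to compare the instance element's expanded name against the namespace bound to the ref's own prefix, not against the main schema's targetNamespace. A sketch with hypothetical helpers `resolve_ref` and `element_matches_ref`, assuming the in-scope prefix bindings are available as a map and that unprefixed refs use a caller-supplied default namespace:

```rust
use std::collections::HashMap;

/// Resolve ref="prefix:local" to (namespace, local) via the in-scope prefixes.
/// Unprefixed refs fall back to `default_ns` (a simplifying assumption).
fn resolve_ref(
    reference: &str,
    prefixes: &HashMap<&str, &str>,
    default_ns: &str,
) -> Option<(String, String)> {
    match reference.split_once(':') {
        Some((prefix, local)) => prefixes
            .get(prefix)
            .map(|ns| (ns.to_string(), local.to_string())),
        None => Some((default_ns.to_string(), reference.to_string())),
    }
}

/// Does the instance element (namespace, local name) match the ref'd declaration?
fn element_matches_ref(
    instance: (&str, &str),
    reference: &str,
    prefixes: &HashMap<&str, &str>,
    default_ns: &str,
) -> bool {
    // An unknown prefix resolves to None and therefore never matches.
    resolve_ref(reference, prefixes, default_ns)
        .map_or(false, |(ns, local)| ns == instance.0 && local == instance.1)
}
```

Here a `wfs:FeatureCollection` child in the WFS namespace matches `ref="wfs:FeatureCollection"`, while an element with the same local name in the main schema's namespace does not.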
Verifies that the FeatureCollection substitution group is correctly resolved when validating NAS/AAA files. Known remaining limitations are documented: AbstractCRS via xlink:href, and boundedBy in FeatureCollection.
validate_sequence() now detects when elements appear in wrong order within a sequence. When a child doesn't match the current particle, checks if it matches a later particle. If not, reports an ordering error instead of silently skipping. This catches cases like hatDirektUnten appearing before optional extension properties (bauwerksfunktion, ergebnisDerUeberpruefung, qualitaetsangaben) in AAA/NAS schemas. Also removes debug eprintln from element_matches_decl.
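The order check described above can be sketched over a flat sequence of element particles with `minOccurs`/`maxOccurs`. `Particle`, `SeqError` and this `validate_sequence` are simplified, hypothetical stand-ins. Only `matches_later_particle` is named in the commit, and the real one also treats `Any` as matching; wildcards and nested groups are omitted here. A child that fits neither the current particle nor any later one is reported instead of being skipped.

```rust
/// One element particle in an xs:sequence (hypothetical simplified type).
struct Particle {
    name: &'static str,
    min: u32,
    max: Option<u32>, // None = maxOccurs="unbounded"
}

#[derive(Debug, PartialEq)]
enum SeqError {
    /// A required particle was skipped or never satisfied.
    MissingRequired(String),
    /// The child fits neither the current particle nor any later one.
    Misplaced(String),
}

/// True if `child` matches any particle after index `current`.
fn matches_later_particle(particles: &[Particle], current: usize, child: &str) -> bool {
    particles[current + 1..].iter().any(|p| p.name == child)
}

fn validate_sequence(particles: &[Particle], children: &[&str]) -> Result<(), SeqError> {
    let mut p = 0; // index of the current particle
    let mut count = 0; // occurrences consumed by the current particle
    for &child in children {
        loop {
            let cur = match particles.get(p) {
                Some(cur) => cur,
                None => return Err(SeqError::Misplaced(child.to_string())),
            };
            let has_room = cur.max.map_or(true, |m| count < m);
            if cur.name == child && has_room {
                count += 1;
                break;
            }
            // Leaving the current particle is only allowed once it is satisfied.
            if count < cur.min {
                return Err(SeqError::MissingRequired(cur.name.to_string()));
            }
            // Instead of silently skipping, require the child to belong later.
            if !matches_later_particle(particles, p, child) {
                return Err(SeqError::Misplaced(child.to_string()));
            }
            p += 1;
            count = 0;
        }
    }
    // The rest of the sequence must be satisfiable with no further children.
    if let Some(cur) = particles.get(p) {
        if count < cur.min {
            return Err(SeqError::MissingRequired(cur.name.to_string()));
        }
    }
    if let Some(missing) = particles.iter().skip(p + 1).find(|q| q.min > 0) {
        return Err(SeqError::MissingRequired(missing.name.to_string()));
    }
    Ok(())
}
```

For the sequence `a, b?, c*, d`, the children `a, c, b, d` fail with `Misplaced("b")`: once `c` has been consumed, `b` no longer fits any remaining particle. This mirrors the hatDirektUnten case.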
merge_extension_bases() now also processes complexContent extension chains in imported namespaces, not just the main schema. This fixes FeatureCollectionType (WFS) which extends SimpleFeatureCollectionType to include boundedBy + member particles. Also adds sequence order validation that detects misplaced elements within xs:sequence (e.g. hatDirektUnten before optional extension properties). Removes debug eprintln statements.
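Walking every namespace while rewriting types inside the same map is where Rust's borrow checker pushes back. One way around it is sketched below: compute all flattened contents in a read-only pass, then write them back in a second pass. This two-phase split is an assumed pattern, not necessarily how the PR's `merge_type_extension()` helper is structured. `Namespaces`, `ComplexType` and `effective_particles` are hypothetical simplifications in which a base may live in a different namespace.

```rust
use std::collections::{HashMap, HashSet};

type QName = (String, String); // (namespace, local name)

/// Hypothetical simplified type; the base may live in another namespace.
struct ComplexType {
    extension_base: Option<QName>,
    particles: Vec<String>,
}

/// namespace -> (type name -> type), covering the main schema and its imports.
type Namespaces = HashMap<String, HashMap<String, ComplexType>>;

/// Base particles first, then the type's own; `seen` guards against cycles.
fn effective_particles(all: &Namespaces, ty: &QName, seen: &mut HashSet<QName>) -> Vec<String> {
    if !seen.insert(ty.clone()) {
        return Vec::new(); // cycle: stop expanding
    }
    let def = match all.get(&ty.0).and_then(|types| types.get(&ty.1)) {
        Some(def) => def,
        None => return Vec::new(), // unresolved base contributes nothing here
    };
    let mut merged = match &def.extension_base {
        Some(base) => effective_particles(all, base, seen),
        None => Vec::new(),
    };
    merged.extend(def.particles.iter().cloned());
    merged
}

fn merge_extension_bases(all: &mut Namespaces) {
    // Phase 1: read-only pass over every namespace, including imports.
    let snapshot: &Namespaces = &*all;
    let mut updates: Vec<(QName, Vec<String>)> = Vec::new();
    for (ns, types) in snapshot {
        for (name, def) in types {
            if def.extension_base.is_some() {
                let key = (ns.clone(), name.clone());
                let merged = effective_particles(snapshot, &key, &mut HashSet::new());
                updates.push((key, merged));
            }
        }
    }
    // Phase 2: the shared borrow has ended, so entries can be mutated.
    for ((ns, name), merged) in updates {
        if let Some(def) = all.get_mut(&ns).and_then(|types| types.get_mut(&name)) {
            def.particles = merged;
            def.extension_base = None; // already flattened
        }
    }
}
```

Because phase 1 reads only the original definitions, a multi-level chain is flattened correctly regardless of the order in which the map is iterated.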
Adds XsdParticle::Any variant with namespace constraints (##any, ##other, explicit list) and processContents modes (strict/lax/skip).

- parse_any_wildcard() parses <xsd:any> declarations
- validate_any_wildcard() consumes matching child elements
- Choice validation accepts wildcard as valid alternative
- matches_later_particle() treats Any as always matching

This unblocks validation of NAS features inside <wfs:member>, which uses <xsd:any processContents="lax" namespace="##other"/>.
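A sketch of the namespace test and the processContents decision for an `<xsd:any>` wildcard. The enum names `XsdAnyNamespace` and `XsdProcessContents` come from the commit, but their shape here is assumed. `parse_namespace_attr`, `namespace_allowed`, `WildcardAction` and `process_contents_action` are hypothetical helpers. Per XSD 1.0, `##other` excludes both the target namespace and unqualified elements. `lax` validates only when a declaration is available, `strict` requires one, and `skip` never validates.

```rust
/// Namespace constraint of an <xsd:any> wildcard (shape assumed for this sketch).
#[derive(Debug, PartialEq)]
enum XsdAnyNamespace {
    Any,
    /// ##other: any namespace except the target one; unqualified is excluded too.
    Other,
    /// Explicit list; `None` stands for ##local (no namespace).
    List(Vec<Option<String>>),
}

#[derive(Debug, PartialEq, Clone, Copy)]
enum XsdProcessContents {
    Strict,
    Lax,
    Skip,
}

/// Parse the `namespace` attribute; an absent attribute means ##any.
fn parse_namespace_attr(value: Option<&str>, target_ns: Option<&str>) -> XsdAnyNamespace {
    match value.map(str::trim) {
        None | Some("##any") => XsdAnyNamespace::Any,
        Some("##other") => XsdAnyNamespace::Other,
        Some(list) => XsdAnyNamespace::List(
            list.split_whitespace()
                .map(|token| match token {
                    "##local" => None,
                    "##targetNamespace" => target_ns.map(str::to_string),
                    uri => Some(uri.to_string()),
                })
                .collect(),
        ),
    }
}

/// Does an element in namespace `ns` (None = unqualified) satisfy the wildcard?
fn namespace_allowed(c: &XsdAnyNamespace, ns: Option<&str>, target_ns: Option<&str>) -> bool {
    match c {
        XsdAnyNamespace::Any => true,
        XsdAnyNamespace::Other => ns.is_some() && ns != target_ns,
        XsdAnyNamespace::List(allowed) => allowed.iter().any(|a| a.as_deref() == ns),
    }
}

/// Hypothetical outcome for an element admitted by the wildcard.
#[derive(Debug, PartialEq)]
enum WildcardAction {
    Validate,
    Accept,
    Reject,
}

fn process_contents_action(mode: XsdProcessContents, has_declaration: bool) -> WildcardAction {
    match (mode, has_declaration) {
        (XsdProcessContents::Skip, _) => WildcardAction::Accept,
        (_, true) => WildcardAction::Validate,
        (XsdProcessContents::Lax, false) => WildcardAction::Accept,
        (XsdProcessContents::Strict, false) => WildcardAction::Reject,
    }
}
```

With `namespace="##other" processContents="lax"` inside `<wfs:member>`, a NAS feature from another namespace passes the namespace test. It is then validated if its declaration is known and accepted as-is otherwise.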
Summary
Four XSD validation improvements that build on top of PR #23 (substitution groups + complexContent).
1. Sequence order validation (921372c)
   `validate_sequence()` now detects misplaced child elements. When a child does not match the current particle, `matches_later_particle()` checks whether it matches a later particle in the sequence; if it does not, a sequence order violation is reported. Previously, out-of-order elements were silently accepted.

2. Extension base merging for imported types (75efca9)
   `merge_extension_bases()` now processes complexContent extension chains in imported namespaces, not just the main schema. An extracted `merge_type_extension()` helper avoids borrow conflicts. This is critical for schemas where a type in namespace A extends a type in namespace B (e.g., WFS `FeatureCollectionType` extends GML `SimpleFeatureCollectionType`).

3. NAS regression test (7b791d7)
   Adds a test that validates a real AAA/NAS XML document against its XSD schema, exercising substitution groups, cross-namespace refs, and extension merging.

4. xsd:any wildcard support (640ece3)
   New `XsdParticle::Any(XsdAny)` variant with:
   - `XsdAnyNamespace`: `##any`, `##other`, or an explicit namespace list
   - `XsdProcessContents`: `strict`, `lax`, `skip`
   - `parse_any_wildcard()`: parses `<xsd:any>` elements
   - `validate_any_wildcard()`: consumes matching child elements per namespace constraint

   This is required for validating WFS FeatureCollections where `<xsd:any processContents="lax" namespace="##other"/>` appears inside `<wfs:member>`.

Files changed
`src/validation/xsd.rs`: ~407 lines added across 4 commits

Testing