Merged
64 commits
5f2bdb8
Add metadata import module with format registry
rich-iannone Jun 14, 2026
286a988
Add file extension mapping and XML format set
rich-iannone Jun 14, 2026
6310ed2
Add _detect_format to infer metadata formats
rich-iannone Jun 14, 2026
6e2e148
Add JSON format detector for Frictionless/CSVW
rich-iannone Jun 14, 2026
b4cf2f3
Add XML format detection for CDISC files
rich-iannone Jun 14, 2026
564e58f
Add import_metadata dispatcher for formats
rich-iannone Jun 14, 2026
aabb127
Add metadata export module
rich-iannone Jun 14, 2026
8b5b3d8
Add export_metadata function (frictionless)
rich-iannone Jun 14, 2026
500b062
Add export to Frictionless Table Schema
rich-iannone Jun 14, 2026
1fd81d8
Add metadata-to-schema conversion helper
rich-iannone Jun 14, 2026
d6d0756
Add Validate builder from Metadata
rich-iannone Jun 14, 2026
f785b81
Add Frictionless & CSVW metadata reader
rich-iannone Jun 14, 2026
6dad8aa
Add CSVW datatype to Pointblank dtype map
rich-iannone Jun 14, 2026
f3796e9
Add Frictionless metadata reader
rich-iannone Jun 14, 2026
a984985
Add _extract_resource_schema helper
rich-iannone Jun 14, 2026
5c9b953
Add Frictionless table schema parser
rich-iannone Jun 14, 2026
9d8d673
Add CSVW (CSV on the Web) metadata reader
rich-iannone Jun 14, 2026
caafe6f
Add SPSS/SAS/Stata metadata readers
rich-iannone Jun 14, 2026
6328480
Add metadata types module scaffold
rich-iannone Jun 14, 2026
41339d6
Add Codelist and CodelistEntry dataclasses
rich-iannone Jun 14, 2026
b4442e3
Add MissingValueCode dataclass
rich-iannone Jun 14, 2026
b1df03c
Add CDISC Define-XML metadata reader
rich-iannone Jun 14, 2026
afde65a
Require lxml and detect Define-XML version
rich-iannone Jun 14, 2026
76dc15c
Add Define-XML metadata reader
rich-iannone Jun 14, 2026
e28c719
Add _parse_codelists to parse CodeList elements
rich-iannone Jun 14, 2026
bcfd558
Add _parse_item_defs for ItemDef parsing
rich-iannone Jun 14, 2026
0fc04cf
Add _parse_item_groups for ItemGroupDef parsing
rich-iannone Jun 14, 2026
cd2ec45
Add CDISC CT metadata reader
rich-iannone Jun 14, 2026
cac26fd
Add _build_ct_namespaces helper
rich-iannone Jun 14, 2026
2553ce6
Add _parse_ct_codelists for CDISC CT files
rich-iannone Jun 14, 2026
abb1ae7
Update _types.py
rich-iannone Jun 14, 2026
e60772f
Update _types.py
rich-iannone Jun 14, 2026
9899307
Update _types.py
rich-iannone Jun 14, 2026
325c491
Add VariableMetadata dataclass
rich-iannone Jun 14, 2026
7c309e3
Add MetadataImport dataclass for external metadata
rich-iannone Jun 14, 2026
6a67204
Add MetadataImport.to_schema() method
rich-iannone Jun 14, 2026
3806bcb
Add to_validate to MetadataImport
rich-iannone Jun 14, 2026
97be3c8
Add get_variable to MetadataImport
rich-iannone Jun 14, 2026
f245a1b
Add summary and accessors to MetadataImport
rich-iannone Jun 14, 2026
1889abd
Add MetadataPackage dataclass for multi-dataset
rich-iannone Jun 14, 2026
910f6f4
Add ADaM templates module
rich-iannone Jun 14, 2026
a3ba87e
Add ADaM dataset templates and structure validator
rich-iannone Jun 14, 2026
4cfc793
Add SDTM templates dataclass and exports
rich-iannone Jun 14, 2026
e162d89
Add SDTMDomainTemplate dataclass
rich-iannone Jun 14, 2026
32e8880
Add SDTM domain templates and validator
rich-iannone Jun 14, 2026
78a9bac
Update _adam_templates.py
rich-iannone Jun 14, 2026
838d0a0
Add ADaM dataset to MetadataImport converter
rich-iannone Jun 14, 2026
7322ca7
Add validate_adam workflow for ADaM datasets
rich-iannone Jun 14, 2026
b7ec5a7
Add SDTM metadata and validation helpers
rich-iannone Jun 14, 2026
e9a592c
Add pointblank.metadata package exports
rich-iannone Jun 14, 2026
a21817c
Expose metadata API in package __init__
rich-iannone Jun 14, 2026
ad95e80
Add cdisc extra with lxml dependency
rich-iannone Jun 14, 2026
7419813
Add metadata fixtures for tests
rich-iannone Jun 14, 2026
821cd2f
Robustly detect PySpark availability in tests
rich-iannone Jun 14, 2026
a25bd49
Improve PySpark test availability checks
rich-iannone Jun 14, 2026
e1cea3c
Add comprehensive metadata import/export tests
rich-iannone Jun 14, 2026
0e23b25
Add end-to-end metadata tests
rich-iannone Jun 14, 2026
4fe898a
Add metadata integration tests
rich-iannone Jun 14, 2026
23315e7
Add metadata import user guide
rich-iannone Jun 14, 2026
cf6e28a
Add statistical package metadata guide
rich-iannone Jun 14, 2026
6b3931d
Add CDISC validation user guide
rich-iannone Jun 14, 2026
48b927a
Add pyreadstat and lxml to dev dependencies
rich-iannone Jun 14, 2026
7960a90
Use correct key for SDTM unknown variables
rich-iannone Jun 14, 2026
e81628f
Add SDTM/ADaM and metadata reference to docs
rich-iannone Jun 14, 2026
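The early commits above mention a file extension mapping and a JSON detector for Frictionless/CSVW. The following is only a rough sketch of how that kind of detection could work, not the code from this PR: the names `EXTENSION_FORMATS`, `sniff_json_format`, and `detect_format`, and the format labels they return, are hypothetical. The sketch relies on two facts from the public specs: CSVW metadata documents declare the `http://www.w3.org/ns/csvw` context, and a Frictionless Table Schema carries a top-level `fields` key (a Data Package carries `resources`).

```python
from __future__ import annotations

import json
from pathlib import Path

# Hypothetical extension map; the PR's actual mapping is not shown on this page.
EXTENSION_FORMATS = {
    ".sav": "spss",
    ".xpt": "sas_xport",
    ".dta": "stata",
}

CSVW_CONTEXT = "http://www.w3.org/ns/csvw"


def sniff_json_format(doc: dict) -> str | None:
    """Guess whether a parsed JSON document is CSVW or Frictionless."""
    context = doc.get("@context")
    # CSVW allows the context to be a string or a list that includes the URL.
    contexts = context if isinstance(context, list) else [context]
    if CSVW_CONTEXT in contexts:
        return "csvw"
    # Table Schema has top-level "fields"; a Data Package has "resources".
    if "fields" in doc or "resources" in doc:
        return "frictionless"
    return None


def detect_format(path: str | Path) -> str | None:
    """Infer a format label from the extension, reading JSON files if needed."""
    p = Path(path)
    suffix = p.suffix.lower()
    if suffix in EXTENSION_FORMATS:
        return EXTENSION_FORMATS[suffix]
    if suffix == ".json":
        with p.open(encoding="utf-8") as fh:
            doc = json.load(fh)
        return sniff_json_format(doc) if isinstance(doc, dict) else None
    return None
```

Binary statistical formats are resolved from the extension alone, while JSON needs a look at the content because both Frictionless and CSVW use `.json`. XML files would need a similar content check, which is left out here.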
102 changes: 78 additions & 24 deletions great-docs.yml
@@ -116,10 +116,10 @@ reference:
sections:
- title: Validate
desc: >
-When performing data validation, use the `Validate` class to get the process started.
-It takes the target table and options for metadata and failure thresholds (using the
-`Thresholds` class or shorthands). The `Validate` class has numerous methods for
-defining validation steps and for obtaining post-interrogation metrics and data.
+When performing data validation, use the `Validate` class to get the process started. It
+takes the target table and options for metadata and failure thresholds (using the
+`Thresholds` class or shorthands). The `Validate` class has numerous methods for defining
+validation steps and for obtaining post-interrogation metrics and data.
contents:
- name: Validate
members: false
@@ -150,9 +150,9 @@ reference:

- title: Contract Import/Export
desc: >
-Import external schema definitions (JSON Schema, Frictionless Table Schema, and more)
-into Pointblank validation workflows, or export Pointblank contracts to those formats.
-Use `import_contract()` as the entry point, `export_contract()` for the reverse, and
+Import external schema definitions (JSON Schema, Frictionless Table Schema, and more) into
+Pointblank validation workflows, or export Pointblank contracts to those formats. Use
+`import_contract()` as the entry point, `export_contract()` for the reverse, and
`register_adapter()` to add support for custom formats.
contents:
- import_contract
@@ -166,9 +166,9 @@ reference:

- title: Validation Steps
desc: >
-Validation steps are sequential validations on the target data. Call Validate's
-validation methods to build up a validation plan: a collection of steps that provides
-good validation coverage.
+Validation steps are sequential validations on the target data. Call `Validate`'s validation
+methods to build up a validation plan: a collection of steps that provides good validation
+coverage.
contents:
- Validate.col_vals_gt
- Validate.col_vals_lt
@@ -223,9 +223,9 @@ reference:

- title: Column Selection
desc: >
-Use the `col()` function along with column selection helpers to flexibly select columns
-for validation. Combine `col()` with `starts_with()`, `matches()`, etc. for selecting
-multiple target columns.
+Use the `col()` function along with column selection helpers to flexibly select columns for
+validation. Combine `col()` with `starts_with()`, `matches()`, etc. for selecting multiple
+target columns.
contents:
- col
- starts_with
@@ -245,8 +245,8 @@ reference:

- title: Interrogation and Reporting
desc: >
-The validation plan is executed when `interrogate()` is called. After interrogation,
-view validation reports, extract metrics, or split data based on results.
+The validation plan is executed when `interrogate()` is called. After interrogation, view
+validation reports, extract metrics, or split data based on results.
contents:
- Validate.interrogate
- Validate.set_tbl
@@ -271,9 +271,9 @@ reference:

- title: Inspection and Assistance
desc: >
-Functions for getting to grips with a new data table. Use DataScan for a quick
-overview, `preview()` for first/last rows, `col_summary_tbl()` for column summaries,
-and `missing_vals_tbl()` for missing value analysis.
+Functions for getting to grips with a new data table. Use `DataScan` for a quick overview,
+`preview()` for first/last rows, `col_summary_tbl()` for column summaries, and
+`missing_vals_tbl()` for missing value analysis.
contents:
- DataScan
- preview
@@ -286,9 +286,9 @@ reference:

- title: Table Pre-checks
desc: >
-Helper functions for use with the `active=` parameter of validation methods. These
-inspect the target table before a step runs and conditionally skip the step when
-preconditions are not met.
+Helper functions for use with the `active=` parameter of validation methods. These inspect
+the target table before a step runs and conditionally skip the step when preconditions are
+not met.
contents:
- has_columns
- has_rows
@@ -338,11 +338,65 @@ reference:
- send_slack_notification
- emit_otel

+- title: Metadata Import/Export
+desc: >
+Import variable-level metadata from external data standards files (CDISC Define-XML,
+Controlled Terminology, SPSS `.sav`, SAS XPORT, Stata `.dta`, and more) and export metadata
+to various formats. Use `import_metadata()` as the entry point and `export_metadata()` for
+the reverse.
+contents:
+- import_metadata
+- export_metadata
+- name: MetadataImport
+members: true
+- name: MetadataPackage
+members: true
+- name: VariableMetadata
+members: true
+- name: Codelist
+members: true
+- name: CodelistEntry
+members: true
+- name: MissingValueCode
+members: true
+
+- title: SDTM Validation
+desc: >
+Validate clinical datasets against CDISC SDTM domain templates. Use `validate_sdtm()` to
+generate a full `Validate` workflow, or `validate_sdtm_structure()` for a quick structural
+conformance check. Retrieve domain templates with `get_sdtm_domain()` and
+`list_sdtm_domains()`.
+contents:
+- validate_sdtm
+- validate_sdtm_structure
+- sdtm_to_metadata
+- get_sdtm_domain
+- list_sdtm_domains
+- name: SDTMDomainTemplate
+members: true
+- name: SDTMVariableSpec
+members: true
+
+- title: ADaM Validation
+desc: >
+Validate analysis datasets against CDISC ADaM templates. Use `validate_adam()` to generate a
+full `Validate` workflow, or `validate_adam_structure()` for a quick structural conformance
+check. Retrieve dataset templates with `get_adam_dataset()` and `list_adam_datasets()`.
+contents:
+- validate_adam
+- validate_adam_structure
+- adam_to_metadata
+- get_adam_dataset
+- list_adam_datasets
+- name: ADaMDatasetTemplate
+members: true
+- name: ADaMVariableSpec
+members: true
+
- title: Integrations
desc: >
-Classes for integrating Pointblank with external observability and monitoring
-systems. Use `OTelExporter` to export validation results as OpenTelemetry
-metrics, traces, and logs.
+Classes for integrating Pointblank with external observability and monitoring systems. Use
+`OTelExporter` to export validation results as OpenTelemetry metrics, traces, and logs.
contents:
- name: integrations.otel.OTelExporter
members: true
49 changes: 49 additions & 0 deletions pointblank/__init__.py
@@ -57,6 +57,30 @@
from pointblank.generate.base import GeneratorConfig
from pointblank.inspect import has_columns, has_rows
from pointblank.integrations.otel import emit_otel
+from pointblank.metadata import (
+    ADaMDatasetTemplate,
+    ADaMVariableSpec,
+    Codelist,
+    CodelistEntry,
+    MetadataImport,
+    MetadataPackage,
+    MissingValueCode,
+    SDTMDomainTemplate,
+    SDTMVariableSpec,
+    VariableMetadata,
+    adam_to_metadata,
+    export_metadata,
+    get_adam_dataset,
+    get_sdtm_domain,
+    import_metadata,
+    list_adam_datasets,
+    list_sdtm_domains,
+    sdtm_to_metadata,
+    validate_adam,
+    validate_adam_structure,
+    validate_sdtm,
+    validate_sdtm_structure,
+)
from pointblank.pipeline import Pipeline, PipelineResult
from pointblank.schema import Schema, generate_dataset, schema_from_tbl
from pointblank.segments import seg_group
@@ -162,4 +186,29 @@
"export_contract",
"list_adapters",
"register_adapter",
+# Metadata standards import/export
+"import_metadata",
+"export_metadata",
+"MetadataImport",
+"MetadataPackage",
+"VariableMetadata",
+"Codelist",
+"CodelistEntry",
+"MissingValueCode",
+# SDTM domain validation
+"SDTMDomainTemplate",
+"SDTMVariableSpec",
+"get_sdtm_domain",
+"list_sdtm_domains",
+"validate_sdtm_structure",
+"sdtm_to_metadata",
+"validate_sdtm",
+# ADaM dataset validation
+"ADaMDatasetTemplate",
+"ADaMVariableSpec",
+"get_adam_dataset",
+"list_adam_datasets",
+"validate_adam_structure",
+"adam_to_metadata",
+"validate_adam",
]
53 changes: 53 additions & 0 deletions pointblank/metadata/__init__.py
@@ -0,0 +1,53 @@
+from __future__ import annotations
+
+from pointblank.metadata._adam_templates import (
+    ADaMDatasetTemplate,
+    ADaMVariableSpec,
+    get_adam_dataset,
+    list_adam_datasets,
+    validate_adam_structure,
+)
+from pointblank.metadata._adam_validate import adam_to_metadata, validate_adam
+from pointblank.metadata._export import export_metadata
+from pointblank.metadata._import import import_metadata
+from pointblank.metadata._sdtm_templates import (
+    SDTMDomainTemplate,
+    SDTMVariableSpec,
+    get_sdtm_domain,
+    list_sdtm_domains,
+    validate_sdtm_structure,
+)
+from pointblank.metadata._sdtm_validate import sdtm_to_metadata, validate_sdtm
+from pointblank.metadata._types import (
+    Codelist,
+    CodelistEntry,
+    MetadataImport,
+    MetadataPackage,
+    MissingValueCode,
+    VariableMetadata,
+)
+
+__all__ = [
+    "CodelistEntry",
+    "Codelist",
+    "MissingValueCode",
+    "VariableMetadata",
+    "MetadataImport",
+    "MetadataPackage",
+    "SDTMDomainTemplate",
+    "SDTMVariableSpec",
+    "ADaMDatasetTemplate",
+    "ADaMVariableSpec",
+    "import_metadata",
+    "export_metadata",
+    "get_sdtm_domain",
+    "list_sdtm_domains",
+    "validate_sdtm_structure",
+    "sdtm_to_metadata",
+    "validate_sdtm",
+    "get_adam_dataset",
+    "list_adam_datasets",
+    "validate_adam_structure",
+    "adam_to_metadata",
+    "validate_adam",
+]
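Because this PR re-exports the same 22 names in both `pointblank/metadata/__init__.py` and `pointblank/__init__.py`, a small consistency check can catch a name that is listed in `__all__` but never imported, or listed twice. The sketch below is one possible way to write such a check using only the standard library. The helper names `unresolved_exports` and `duplicate_exports` are hypothetical and not part of Pointblank, and the demo uses a stand-in module rather than the real package.

```python
from __future__ import annotations

import types


def unresolved_exports(module: types.ModuleType) -> list[str]:
    """Return names listed in `__all__` that are not attributes of the module."""
    return [name for name in getattr(module, "__all__", []) if not hasattr(module, name)]


def duplicate_exports(module: types.ModuleType) -> list[str]:
    """Return names that appear more than once in `__all__`, in first-repeat order."""
    seen: set[str] = set()
    dupes: list[str] = []
    for name in getattr(module, "__all__", []):
        if name in seen and name not in dupes:
            dupes.append(name)
        seen.add(name)
    return dupes


# Stand-in module for demonstration only; a real check would pass the imported package.
demo = types.ModuleType("demo_metadata")
demo.import_metadata = lambda path: None
demo.__all__ = ["import_metadata", "export_metadata"]

print(unresolved_exports(demo))  # ['export_metadata']
```

In a test suite, asserting that both helpers return an empty list for the package would keep the import block and `__all__` from drifting apart as more names are added.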