Releases: kaparoo/kaparoo-python

v0.9.1

22 Jun 02:36

Changed

  • kaparoo.filesystem.hierarchy.locate now yields in a fully deterministic
    order: each directory's entries sorted by name and subdirectories descended
    in that same order (so locate_map's iteration order is deterministic too).
    Previously only siblings within a level were sorted; the order across sibling
    subtrees followed the OS directory order, so an open-depth match could vary
    by filesystem. validate's report was already sorted and is unchanged.
  • kaparoo.utils.aggregate.Aggregator.update no longer adds weight to the
    grand total for an empty values={} batch -- with nothing folded in, the
    call contributes no weight (the weight property counts weight actually
    folded in). A non-empty update is unchanged.
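
The ordering described for locate (each directory's entries sorted by name, subdirectories descended in that same order) can be sketched with the standard library alone. This is an illustration of the idea, not kaparoo's implementation: walk_sorted is a hypothetical helper, and yielding a directory before its contents is an assumption.

```python
import tempfile
from collections.abc import Iterator
from pathlib import Path


def walk_sorted(root: Path) -> Iterator[Path]:
    """Yield every path under root in a filesystem-independent order."""
    # Sorting by name removes the dependence on the OS directory order.
    for entry in sorted(root.iterdir(), key=lambda p: p.name):
        yield entry
        # Descend right after yielding the directory, so a whole subtree is
        # emitted before the next sibling (depth-first, pre-order).
        if entry.is_dir() and not entry.is_symlink():
            yield from walk_sorted(entry)


# Demo on a throwaway tree, created in a deliberately unsorted order.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for rel in ("b/x.txt", "a/z.txt", "a/y.txt"):
        target = root / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        target.touch()
    order = [p.relative_to(root).as_posix() for p in walk_sorted(root)]

print(order)  # ['a', 'a/y.txt', 'a/z.txt', 'b', 'b/x.txt']
```

Sorting only within a level while letting the OS pick the descent order would give the same per-directory listing but a different overall sequence, which is the difference the entry above describes.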

Fixed

  • kaparoo.filesystem.make_dirs now detects a duplicated path in its
    validate-first pass and raises FileExistsError before creating anything,
    under strict-create (exist_ok=False, clean=False). Previously the second
    occurrence's mkdir failed only after the first had already created the
    directory, leaving a partial side effect. A repeat stays harmless (idempotent)
    under exist_ok=True or clean=True and is still accepted there.
  • kaparoo.filesystem.wrap_path / wrap_paths now reject a Windows
    drive-relative prepend target or append value (e.g. C:foo -- a drive
    with no root) with ValueError, instead of silently discarding the other
    component (Path("base", "C:foo") collapses to Path("C:foo")). The guard
    moved from os.path.isabs to a Path.anchor check, which is platform-aware:
    C:foo stays an ordinary relative name on POSIX and is unaffected.
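
The difference between an is-absolute test and an anchor test can be checked on any platform with pathlib's pure path classes. The snippet below is a small sketch of that distinction only; has_anchor is a hypothetical helper, not the guard used by wrap_path.

```python
from pathlib import PurePath, PurePosixPath, PureWindowsPath


def has_anchor(value: str, flavour: type[PurePath]) -> bool:
    """True when value carries a drive and/or a root under the given flavour."""
    return bool(flavour(value).anchor)


# Under Windows semantics "C:foo" has a drive but no root: it is not
# absolute, yet joining it onto a base silently drops the base.
drive_relative = PureWindowsPath("C:foo")
print(drive_relative.is_absolute())           # False
print(PureWindowsPath("base", "C:foo"))       # C:foo
print(has_anchor("C:foo", PureWindowsPath))   # True  -> would be rejected
print(has_anchor("C:foo", PurePosixPath))     # False -> ordinary relative name
```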

v0.9.0

21 Jun 16:55

Added

  • kaparoo.filesystem.utils.normalize_extension / normalize_extensions /
    file_extension: extension-string helpers. normalize_extension strips
    surrounding whitespace and leading dots (" .BIN " -> "BIN"), keeping case
    unless lowercase=True; normalize_extensions maps it over an iterable
    (threading lowercase; empties and duplicates deliberately kept -- that
    policy is the caller's). file_extension(path) returns up to the last
    level suffixes of the path, dot-joined and normalized -- level=2 yields
    "tar.gz" from data.tar.gz, lowercase=False keeps case, and a path with
    no suffix gives "". ensure_file_extension
    now builds on these. All are re-exported from the top-level
    kaparoo.filesystem namespace.
  • kaparoo.filesystem.exceptions.UnsupportedExtensionError (also re-exported
    from kaparoo.filesystem): a ValueError subclass for an extension that is
    none of the supported ones. The constructor normalizes supported (strips
    surrounding whitespace and leading dots, case preserved), de-duplicates it,
    and drops empties; an optional kind labels the message, rendering e.g.
    "unsupported extension 'gif' (supported: 'jpg', 'png')" (with
    "for <kind>" inserted when kind is given). It exposes ext / supported /
    kind.
  • kaparoo.filesystem.hierarchy.scaffold gains two options. on_create is a
    callback on_create(path, file_node) run once for each file actually
    created -- the seam for writing a file's content (scaffold otherwise leaves
    an empty skeleton); it is not called for an untouched existing file, under
    dry_run, or with dirs_only. dirs_only=True creates only the directory
    skeleton, skipping every file (including required ones); pairing it with
    on_create raises ValueError.
  • kaparoo.filesystem.hierarchy.Entry.is_direct_child: a read-only property,
    True only when the entry is pinned to exactly depth 1 (min_depth and
    max_depth both 1) -- the default, unranged position scaffold requires to
    create a node.
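
The normalization rules quoted for the extension helpers are simple enough to sketch with the standard library. The functions below reuse the names from the notes for readability, but they are a stand-in written from the description above; the default values (level=1, lowercase=True for file_extension) are assumptions.

```python
from pathlib import PurePath


def normalize_extension(ext: str, *, lowercase: bool = False) -> str:
    """Strip surrounding whitespace and leading dots; optionally lowercase."""
    cleaned = ext.strip().lstrip(".")
    return cleaned.lower() if lowercase else cleaned


def file_extension(path: str, *, level: int = 1, lowercase: bool = True) -> str:
    """Join up to the last `level` suffixes of path into one extension."""
    if level < 1:
        raise ValueError("level must be at least 1")
    # PurePath.suffixes is e.g. [".tar", ".gz"]; keep at most `level` of them.
    suffixes = PurePath(path).suffixes[-level:]
    return normalize_extension("".join(suffixes), lowercase=lowercase)


print(normalize_extension(" .BIN "))               # BIN
print(file_extension("data.tar.gz", level=2))      # tar.gz
print(file_extension("Photo.JPG", lowercase=False))  # JPG
print(repr(file_extension("README")))              # ''
```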

Changed

  • kaparoo.filesystem.utils.ensure_file_extension now raises the new
    UnsupportedExtensionError (a ValueError subclass, so existing
    except ValueError still catches it) instead of a plain ValueError when a
    path's final suffix is none of the accepted extensions. The empty-ext
    argument still raises a plain ValueError.
  • kaparoo.filesystem.hierarchy.scaffold now raises NotADirectoryError (a
    file where a directory is described) or NotAFileError (a directory where a
    file is described) for a wrong-kind conflict, instead of a plain ValueError,
    aligning with the rest of kaparoo.filesystem. Breaking for callers that
    caught these as ValueError -- NotADirectoryError / NotAFileError are
    OSError subclasses, not ValueError.
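
The compatibility claims in both entries follow from the exception hierarchy, which can be verified without the library. In this sketch UnsupportedExtensionError is a local stand-in class (the real one lives in kaparoo.filesystem.exceptions) and check is a hypothetical function.

```python
class UnsupportedExtensionError(ValueError):
    """Stand-in for the library's exception: a ValueError subclass."""


def check(ext: str) -> None:
    if ext not in {"jpg", "png"}:
        raise UnsupportedExtensionError(f"unsupported extension {ext!r}")


# Existing handlers written against ValueError keep working.
try:
    check("gif")
except ValueError as exc:
    caught = type(exc).__name__

# The scaffold change is different: NotADirectoryError is an OSError, so an
# `except ValueError` written for the old behaviour no longer matches it.
print(caught)                                      # UnsupportedExtensionError
print(issubclass(NotADirectoryError, OSError))     # True
print(issubclass(NotADirectoryError, ValueError))  # False
```

Callers that must support both the old and the new scaffold behaviour can catch (ValueError, OSError) during the transition.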

Fixed

  • kaparoo.utils.SpanTimer.measure now raises a clear, actionable RuntimeError
    when its block ends while still paused (a pause() left open across the block
    boundary), pointing to suspend(), instead of the misleading "Cannot record a
    lap while paused" it surfaced from the trailing lap.

v0.8.0

18 Jun 21:01

Added

  • kaparoo.filesystem.search (search_paths / search_files /
    search_dirs) gains an exclude= argument: paths to skip, as a StrPath
    (absolute under root, or root-relative), a Filter (matched on the
    root-relative POSIX path), a callable on the candidate Path (the real,
    filesystem-valid path), or an iterable of these (OR-combined). An
    excluded directory is pruned -- its subtree is never walked -- which
    name_filter cannot do (a directory failing name_filter is still
    descended). The excluder engine is shared with kaparoo.filesystem.hierarchy
    via the new internal kaparoo.filesystem.exclude module.

  • kaparoo.utils.checks: small validation guards, re-exported from
    kaparoo.utils. ensure_one_of(value, options, *, name) checks discrete
    membership (pass a range for an integer grid);
    ensure_in_range(value, *, lower, upper, step, inclusive, name) checks
    int / float bounds, with either side optional (half-open), inclusivity
    as a shared bool or a per-side tuple, and an optional step grid spacing
    (base + k*step, float-robust via math.isclose).

  • kaparoo.filters gains an enumerable filter family: LiteralFilter,
    OneOfFilter, TemplateFilter, and WithoutFilter (with short aliases
    Literal / OneOf / Template / Without, matching the rest of the
    package) implement an Expandable capability (expand()) that lists
    the finite set of names a filter matches, on top of the usual matches
    (Expandable is now a Filter subtype).
    Literal / OneOf are the case-sensitive, always-enumerable
    counterparts of Equals / EqualsAny; Template enumerates
    template.format(*combo) over the cartesian product of one or more
    value axes (Template("shard_{:03d}", range(8)),
    Template("{}_{}.png", ["real", "fake"], range(3)));
    Without(base, *excluded) is the enumerable form of And(base, Not(...)),
    expanding base minus anything the excluded filters match. They register as
    ordinary filter kinds ("literal" / "one_of" / "template" /
    "without") and each gets a matching TypedDict in
    kaparoo.filters.types (LiteralFilterDict, OneOfFilterDict,
    TemplateFilterDict, WithoutFilterDict) for statically-checked dict
    authoring.

  • kaparoo.filesystem.hierarchy: a new subpackage describing a filesystem
    tree declaratively. File / Directory nodes compose into a tree whose
    node names are kaparoo.filters filters — the full DSL (Glob,
    Regex, And / Or / Not, the enumerable Literal / OneOf /
    Template, ...) describes which siblings a node matches. As name sugar,
    a bare str becomes a Literal and a list[str] a OneOf, so one
    node can stand for several literally-named siblings that share a
    structure (Directory(["train", "val"], layout)); a sugar name must be
    a single path component (a / or \ separator raises ValueError).
    Nodes are immutable value objects (==, hash, repr) and take a
    keyword-only depth (default 1, a direct child) describing how far below
    the parent the entry sits, past intermediate directories of unknown
    name: an int is an exact level, None is any depth (the tree-level **),
    and a (min, max) tuple is an inclusive range (max=None unbounded),
    exposed as min_depth / max_depth. Each entry also takes a keyword-only
    required flag (default False) asserting it must be present. A Directory
    additionally takes a keyword-only allow_extra (default False, a
    bool | Filter):
    True makes validate / conformer ignore its on-disk contents that match
    none of its children (instead of reporting them unexpected), while a
    Filter ignores only those whose name it matches; a matched subdirectory
    keeps its own strictness. Two sibling constraints can sit among a
    directory's children: Exclusive (the present siblings may come from at
    most one of its alternatives, each a set of independent nodes on one
    side of the exclusion; required=True requires at least one;
    on_conflict="priority" resolves a multi-side conflict by declaration
    order -- the first present alternative wins and the rest become
    unexpected -- instead of the default "error") and Together (its members
    are all-or-nothing -- all present or all absent; required=True requires
    all). Both take Nodes, so constraints nest --
    Exclusive(Together(a, b), c) is "{a and b} or c". File / Directory
    (named, under the Entry base) and the constraint nodes Exclusive /
    Together (under a Group base that carries required and an
    entries accessor flattening to the leaf entries a constraint
    references, descending through nesting) share a common Node base, so a
    directory's children hold any Node. A whole tree round-trips through
    a "node"-discriminated dict (to_dict / Node.from_dict, mirroring
    the filter registry), so specs can be stored as JSON. The package
    depends on kaparoo.filters but nothing in kaparoo.filesystem.search.
    This first cut is the representation plus name-level semantics and the
    disk operations locate, validate, conformer, and scaffold (below).

  • kaparoo.filesystem.hierarchy.locate(tree, root): the first operation
    that applies a spec to a real filesystem. It maps each on-disk path
    under root (the container) to the spec node(s) it matches — by name
    filter, type (File ↔ file, Directory ↔ directory), and depth
    (intermediate levels of unknown name skipped) — yielding one
    (path, node) pair per match. It reports only what is present:
    Groups are treated as "any entry may appear," so Exclusive /
    Together enforcement and missing-required reporting are left to
    validate. A path may match several nodes (overlapping filters);
    locate yields one pair per node (lazily, duplicates kept by default; pass
    unique=True to suppress identical pairs), while the companion
    locate_map(tree, root) groups the results into a {path: (node, ...)}
    mapping (distinct nodes, spec-traversal order). Both take exclude= to
    drop paths from the results (e.g. specific cells of a Template product):
    an exclude rule — or an iterable of them, OR-combined — is a StrPath
    (absolute under root, or root-relative), a kaparoo.filters Filter
    matched on the root-relative POSIX string (the serializable counterpart of
    a callable), or a callable taking the candidate's own Path (the real,
    filesystem-valid path, so it may inspect the file), and a dropped directory
    has its whole subtree pruned. Pass root_as_top=True to
    treat root as the realized top node itself (you point at the top
    directly) rather than its container; the top must be an Entry (a Group
    raises TypeError) and root realizes it only when its leaf name / kind
    match, otherwise nothing is yielded.

  • kaparoo.filesystem.hierarchy.validate(tree, root): checks a real
    directory against a spec, returning a ValidationReport with matched
    (as locate_map), unexpected (paths matching no node — anything not
    matched and not an ancestor of a match, so contents of an unspecified
    directory count), missing (a required entry, or a required
    Exclusive / Together with nothing present), and violations (an
    Exclusive with more than one side present, or a partly-present
    Together). report.ok (and its truthiness) is True only when the
    last three are empty. A required entry is satisfied by one present match
    — an enumerable name (OneOf / Template) by any one listed name, an open
    name (Glob / Regex) by any one matching path. validate also accepts the
    same exclude= as locate, so excluded paths are dropped from matched
    and not reported unexpected. It also takes the same root_as_top=True to
    validate root as the realized top entry itself (a Group top raises
    TypeError); a leaf name / kind mismatch reports the top as missing
    without descending. A top-level allow_extra (bool | Filter) applies
    blanket leniency to every directory (and the container root), as if each
    carried it, combined with each Directory's own. The subpackage also
    exports the ValidationReport and Violation result types. Two reports
    combine with + (problem lists concatenate and matched merges, so the
    result is ok only when both are) for accumulating independent
    validations.

  • kaparoo.filesystem.hierarchy.conformer(spec): builds a path predicate (a
    search predicate) that accepts a path realizing spec's top node — a
    file matching a top File's name, or a directory matching a top
    Directory's name whose subtree conforms (via validate); a top Group
    is realized by any one of its alternatives / members. The path is always
    tested as the top of spec, never an inner node. Takes the same
    allow_extra as validate to accept a top whose subtree carries extra,
    unspecified entries. (Checking whether a path or sub-spec is contained
    within a spec is a separate future capability.)

  • kaparoo.filesystem.hierarchy.scaffold(tree, root): the write operation —
    creates the structure a spec describes under root (the container, made if
    absent) and returns the newly created paths in creation order. Only
    enumerable nodes are materialized: a node is creatable when its name is
    an Expandable filter (Literal / OneOf / Template / Without and the
    str / list[str] sugar) and it sits at a fixed depth of 1; open
    names (Glob, Regex) and non-fixed depths are acceptance patterns, so
    they are skipped when optional and raise when required. Together creates
    all members (all-or-nothing — a non-creatable member skips the whole set
    unless required); Exclusive creates the first fully-creatable
    alternative (declaration order is the priority). Files are created empty;
    creation is idempotent (existing directories are descended, existing files
    never clobbered) and a wrong-...
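
The enumeration rule quoted for Template above (template.format(*combo) over the cartesian product of the value axes) maps directly onto itertools.product. The function below is a minimal sketch of that rule, not the Template class itself: expand_template is a hypothetical name, and the output order (last axis varying fastest) is an assumption about ordering that the notes do not state.

```python
from collections.abc import Iterable, Iterator
from itertools import product


def expand_template(template: str, *axes: Iterable[object]) -> Iterator[str]:
    """Yield template.format(*combo) for every combo in the product of axes."""
    if not axes:
        raise ValueError("at least one value axis is required")
    # product() varies the last axis fastest, like nested for-loops.
    for combo in product(*axes):
        yield template.format(*combo)


shards = list(expand_template("shard_{:03d}", range(8)))
images = list(expand_template("{}_{}.png", ["real", "fake"], range(3)))
print(shards[0], shards[-1])  # shard_000 shard_007
print(images)
```

Because the set of names is finite and known up front, an operation such as scaffold can create each one, which is not possible for an open pattern like a glob.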

v0.7.0

04 Jun 18:29

Added

  • kaparoo.filesystem.utils.ensure_file_extension: a pure (no filesystem)
    extension check requiring a case-insensitive .<ext> final suffix
    (raising ValueError otherwise). ext may be a single extension or an
    iterable of acceptable ones (e.g. ("jpg", "jpeg")). add=True (mirroring
    make on ensure_dir_exists) appends the first extension when the path
    has no suffix instead of raising (np.save-style); a wrong suffix still
    raises. The leading dot on ext is optional.

Changed

  • Renamed SegmentTimer -> SpanTimer and SegmentRecord -> SpanRecord
    (module kaparoo.utils.timer). "Span" fits both lap (contiguous spans)
    and measure (arbitrary spans) without implying a partition, and avoids
    the "periodic timer" reading of interval. The lap / measure methods,
    the duration field, and all behavior are unchanged. Breaking: update
    imports from SegmentTimer / SegmentRecord to SpanTimer / SpanRecord.

v0.6.0

03 Jun 15:25

Added

  • kaparoo.data.sequences.TransformedSequence: a lazy view that applies a
    transform callable to each item of source. get_meta passes through
    source.get_meta by default (M_out = M_in); override in a subclass when
    M_out differs. T_out and M_out default to T_in / M_in (PEP 696).
  • kaparoo.data.sequences.ZippedSequence: element-wise zip of two
    sequences — item i is (first[i], second[i]) and metadata i is
    (M1, M2) (the "paired image + label" pattern ConcatSequence cannot
    express). strict=True (default) requires equal lengths and raises
    ValueError on a mismatch; strict=False truncates to the shorter
    length like the builtin zip. get_items / get_metas bulk-delegate to
    each source. For three or more, nest the pairs.
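
The strict / non-strict length rule and the nesting advice for ZippedSequence can be illustrated with a small view over two plain sequences. ZippedView below is a hypothetical stand-in built on collections.abc.Sequence; it omits metadata and the bulk get_items / get_metas methods.

```python
from collections.abc import Sequence
from typing import Any


class ZippedView(Sequence):
    """Lazy element-wise pairing of two sequences."""

    def __init__(
        self, first: Sequence[Any], second: Sequence[Any], *, strict: bool = True
    ) -> None:
        if strict and len(first) != len(second):
            raise ValueError(f"length mismatch: {len(first)} != {len(second)}")
        self._first, self._second = first, second
        # Non-strict mode truncates to the shorter source, like builtin zip.
        self._length = min(len(first), len(second))

    def __len__(self) -> int:
        return self._length

    def __getitem__(self, index: int) -> tuple[Any, Any]:
        if not isinstance(index, int):
            raise TypeError("this sketch supports integer indices only")
        if index < 0:
            index += self._length
        if not 0 <= index < self._length:
            raise IndexError(index)
        return self._first[index], self._second[index]


images = ["img0", "img1", "img2"]
labels = [0, 1]
pairs = ZippedView(images, labels, strict=False)
print(len(pairs), pairs[1])  # 2 ('img1', 1)
# Three sources: nest the pairs, giving items shaped ((a, b), c).
nested = ZippedView(ZippedView(images, images), ["x", "y", "z"])
print(nested[2])             # (('img2', 'img2'), 'z')
```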

Changed

  • WindowedSequence[T, M_in, M_out]: M_out now defaults to M_in (PEP
    696), so the common case of M_out == M_in no longer requires the third
    type argument. Existing explicit three-argument usage is unaffected.
  • FileFolderSequence is now a subclass of FileListSequence — the folder
    case is just a FileListSequence whose list is discovered under a root
    and stored root-relative. Its API and behavior are unchanged (paths are
    still kept relative and get_file re-prepends root), but
    isinstance(seq, FileListSequence) is now True for folder sequences.

v0.5.0

02 Jun 05:47

Added

  • kaparoo.utils.aggregate (still experimental): Var and Std reductions
    -- weighted population variance and standard deviation, accumulated online
    (Welford) and merged exactly (Chan's parallel algorithm), so they nest
    across loop levels like the other reductions.
  • kaparoo.data.sequences.FileListSequence: a "one file per item"
    DataSequence over an explicit, ordered list of files. Unlike
    FileFolderSequence it takes the files directly (no root discovery),
    so they may live in unrelated directories -- or, on Windows, different
    drives -- which FileFolderSequence cannot represent. Subclasses
    implement only load_file / get_meta; the input order is preserved
    verbatim (duplicates kept) and files are loaded lazily.
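
The Var / Std entry names two standard techniques: a Welford-style online update and Chan's formula for merging partial results. The class below is a self-contained sketch of those two formulas for weighted population variance; RunningVar and its method names are hypothetical and do not reflect the Reduction interface in kaparoo.utils.aggregate.

```python
import math
from dataclasses import dataclass


@dataclass
class RunningVar:
    """Weighted population variance, accumulated online."""

    weight: float = 0.0
    mean: float = 0.0
    m2: float = 0.0  # weighted sum of squared deviations from the mean

    def update(self, value: float, weight: float = 1.0) -> None:
        # Welford-style single-sample update (weighted form).
        if weight <= 0:
            return
        self.weight += weight
        delta = value - self.mean
        self.mean += delta * weight / self.weight
        self.m2 += weight * delta * (value - self.mean)

    def merge(self, other: "RunningVar") -> None:
        # Chan et al.'s pairwise combination of two partial results.
        total = self.weight + other.weight
        if total == 0:
            return
        delta = other.mean - self.mean
        self.m2 += other.m2 + delta * delta * self.weight * other.weight / total
        self.mean += delta * other.weight / total
        self.weight = total

    def variance(self) -> float:
        return self.m2 / self.weight if self.weight else math.nan

    def std(self) -> float:
        return math.sqrt(self.variance())


# One pass over all data versus two partial passes merged afterwards.
whole, left, right = RunningVar(), RunningVar(), RunningVar()
for x in (1.0, 2.0, 3.0, 4.0):
    whole.update(x)
for x in (1.0, 2.0):
    left.update(x)
for x in (3.0, 4.0):
    right.update(x)
left.merge(right)
print(whole.variance(), left.variance())  # both 1.25
```

The merge reproduces the single-pass result, which is the property that lets such a reduction nest across loop levels (per-batch results merged into an epoch, epochs into a run).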

Fixed

  • make_dirs now raises NotADirectoryError (matching make_dir) when a
    path exists but is not a directory, instead of the divergent
    FileExistsError that mkdir produced.
  • make_dir / make_dirs validate every path before any directory is
    wiped or created, so a deterministically bad entry (e.g. a file in the
    list) no longer leaves earlier directories already cleaned or created.
  • make_dir(clean=True) / make_dirs(clean=True) reject a symlink with
    NotADirectoryError rather than failing deep inside shutil.rmtree;
    cleaning never operates through a link.
  • reserve_path / reserve_paths treat a symlink -- including a broken
    one, which Path.exists reports as absent -- as occupying the path.
  • StagedFile.commit (with overwrite=False) no longer fails outright on a
    filesystem without hardlink support (FAT/exFAT, some network mounts): it
    falls back to an existence check plus replace instead of losing the staged
    content to a raw OSError.
  • StagedFile.commit / StagedDirectory.commit now fsync the destination's
    parent directory after the move, so the committed result survives a crash
    on POSIX (a no-op where directories cannot be fsynced, e.g. Windows).
  • StagedDirectory.commit with overwrite=True now restores the original
    directory if moving the staged one into place fails, instead of leaving
    the destination missing with the old contents stranded under a <name>.old
    name; the backup removal is best-effort.
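
The validate-first behavior described for make_dir / make_dirs above amounts to a two-pass structure: check every path, then start the side effects. The sketch below shows that structure with pathlib only. make_dirs_checked is a hypothetical function, the duplicate check mirrors the strict-create rule from the v0.9.1 notes, and the clean option and symlink handling are left out.

```python
import tempfile
from collections.abc import Iterable
from pathlib import Path


def make_dirs_checked(paths: Iterable[Path], *, exist_ok: bool = False) -> list[Path]:
    """Create directories only after every path has passed validation."""
    targets = [Path(p) for p in paths]
    seen: set[Path] = set()
    # Pass 1: validate everything; nothing on disk is touched yet.
    for target in targets:
        if target.exists() and not target.is_dir():
            raise NotADirectoryError(f"not a directory: {target}")
        if not exist_ok and (target.exists() or target in seen):
            raise FileExistsError(f"already exists or duplicated: {target}")
        seen.add(target)
    # Pass 2: every check passed, so the side effects start here.
    for target in targets:
        target.mkdir(parents=True, exist_ok=True)
    return targets


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "notes.txt").touch()
    try:
        make_dirs_checked([root / "a", root / "notes.txt"])
    except NotADirectoryError:
        pass
    created_a = (root / "a").exists()  # False: the bad entry was caught first
    try:
        make_dirs_checked([root / "b", root / "b"])
    except FileExistsError:
        pass
    created_b = (root / "b").exists()  # False: the duplicate was caught up front
    make_dirs_checked([root / "c", root / "c"], exist_ok=True)
    created_c = (root / "c").is_dir()  # True: a repeat is harmless here

print(created_a, created_b, created_c)  # False False True
```

A validation pass only protects against deterministic problems; another process can still change the filesystem between the two passes.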

v0.4.0

01 Jun 17:46

Added

  • kaparoo.filesystem.staged.StagedFile: a safe (atomic) file writer.
    Content is staged in a temporary file in the destination's directory and
    moved into place only on commit, so readers never see a half-written file
    and a failed write leaves any existing file untouched. Usable as a context
    manager (commit on clean exit, discard on exception) or explicitly like
    a file object (write / seek / tell / flush, plus commit /
    abort, path, committed, and the underlying file). Text by default
    (StagedFile[str]) with optional encoding / newline; binary=True
    gives a binary writer (StagedFile[bytes]), the type parameter tracking
    the mode. overwrite=False (default) fails fast on an existing destination
    and creates the file atomically; overwrite=True replaces it, keeping its
    permissions; make_parents=True creates a missing parent directory. An
    uncommitted writer discards its staged file on garbage collection.
  • kaparoo.filesystem.staged.StagedDirectory: the directory counterpart of
    StagedFile. Files are written into a temporary workdir in the
    destination's parent and moved into place on commit. Same context-manager /
    explicit usage and commit / abort / path / committed API (plus
    workdir), and the same overwrite / make_parents options. Creating a
    new directory is atomic (single rename); replacing an existing one
    (overwrite=True) swaps the old aside and removes it, which is not fully
    atomic. An uncommitted builder discards its staging directory on garbage
    collection.
  • kaparoo.filesystem.utils.reserve_path / reserve_paths: a guard (and
    its bulk form) for a path that should not yet exist, returning it
    (optionally stringified) so the caller can create something there.
    exist_ok (named as in make_dir / Path.mkdir) is a non-destructive
    bypass (nothing is deleted) and make_parents creates the parent
    directory when missing. Raises FileExistsError on conflict.
    reserve_paths is fail-fast and takes no root (compose with
    wrap_paths(prepend=...)). For directory destinations prefer
    make_dir(exist_ok=...); for exclusive file creation the stdlib
    open(path, "x") suffices.
  • clean option on make_dir / make_dirs: when an existing directory
    is present, remove its contents and recreate it empty (a fresh slate).
    Destructive, and only ever wipes a directory -- a non-directory at
    the path still raises NotADirectoryError. clean=True makes exist_ok
    moot, since the directory is removed and remade.
  • kaparoo.filesystem directory checks dir_not_empty,
    dir_not_empty_unsafe, dirs_not_empty, and dirs_not_empty_unsafe,
    the negated counterparts of the dir_empty series. dirs_not_empty
    is True only when every directory is non-empty.
  • kaparoo.utils.aggregate module (experimental -- the API may change in
    a later release): Aggregator for nested, pluggable metric aggregation
    (the batch → epoch → run pattern). Each metric is
    reduced by a Reduction -- built-ins Mean (weighted), Sum, Min,
    Max, Last, and Fold (a scalar monoid from a callable) -- with
    per-metric overrides. Reductions are online (constant memory); nested
    levels compose via merge (exact sample-weighted pooling) or
    update(child.compute(), ...) (different reduction per level). Custom
    reductions subclass Reduction / UnweightedReduction.
  • SegmentTimer.measure(label): a stopwatch-style context manager (and
    decorator) that records a segment covering only the wrapped block, so
    time spent outside any measure block is excluded from records /
    summary. Complements lap, which splits the timeline into
    contiguous segments. Pauses inside the block are excluded; a block
    that raises records nothing.
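
The staging pattern behind StagedFile (write to a temporary file in the destination's directory, move it into place only on commit) can be sketched with tempfile and os.replace. This is a simplified illustration under a hypothetical name, staged_write; unlike the behavior described above, its overwrite=False check is a plain existence test and is not race-free.

```python
import os
import tempfile
from collections.abc import Iterator
from contextlib import contextmanager, suppress
from pathlib import Path
from typing import TextIO


@contextmanager
def staged_write(dest: Path, *, overwrite: bool = False) -> Iterator[TextIO]:
    """Write to a temp file beside dest; move it into place on clean exit."""
    dest = Path(dest)
    if not overwrite and dest.exists():
        raise FileExistsError(dest)  # simplification: not race-free
    # Staging in the destination's directory keeps the final move on one
    # filesystem, where os.replace is atomic.
    fd, tmp_name = tempfile.mkstemp(dir=dest.parent, prefix=f".{dest.name}.")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            yield handle
            handle.flush()
            os.fsync(handle.fileno())
        os.replace(tmp_name, dest)  # commit
    except BaseException:
        with suppress(FileNotFoundError):
            os.unlink(tmp_name)  # discard the staged content
        raise


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "config.txt"
    with staged_write(target) as out:
        out.write("v1")
    try:
        with staged_write(target, overwrite=True) as out:
            out.write("half-written")
            raise RuntimeError("simulated failure")
    except RuntimeError:
        pass
    content = target.read_text(encoding="utf-8")
    leftovers = sorted(p.name for p in Path(tmp).iterdir())

print(content, leftovers)  # v1 ['config.txt']
```

The failed second write leaves the first version intact and no temporary file behind, which is the guarantee the context-manager form is meant to give.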

Changed

  • Renamed LapTimer -> SegmentTimer, LapRecord -> SegmentRecord,
    and the record field lap_time -> duration, reflecting that the
    timer now records named segments via both lap (split) and the new
    measure (block). The lap method keeps its name.
  • Timer.resume / SegmentTimer.resume now return None instead of
    the pause duration in nanoseconds. The value had no consumer
    (suspend discarded it) and leaked a raw-nanosecond figure that broke
    the timer's unit abstraction. Subclasses that need the pause
    interval override the new protected _resume hook instead.

v0.3.0

27 May 18:49

Added

  • kaparoo.data.sequences subpackage: a Sequence-based foundation for
    dataset code.
    • DataSequence[T, M] ABC with abstract get_item / get_meta and
      default get_items / get_metas / get_pair / get_pairs.
      __getitem__ returns the item only.
    • Composers: SlicedSequence (stable-length view at given indices,
      duplicates allowed and order preserved); ConcatSequence
      (O(log N) lookup over multiple sources via cumulative lengths +
      bisect_right); WindowedSequence[T, M_in, M_out] (abstract
      sliding window with size / step / skip; get_item is
      implemented, get_meta is left abstract).
    • Templates: FileFolderSequence (folder-rooted, one file per item;
      subclasses implement list_files / load_file / get_meta;
      supports the "set state BEFORE super().__init__()" pattern for
      parameterized subclasses); SingleFileSequence (thin ABC for
      "one file, many records" formats).

Changed

  • generate_batches: step, skip, start, stop, and drop_last
    are now keyword-only. Empty ranges (start == stop) are accepted
    and yield no batches. Docstring expanded.

Fixed

  • register_filter decorator now preserves the decorated subclass's
    type. Previously it widened to type[Filter], so static checkers
    rejected subclass-specific constructor calls on decorated classes.
  • generate_batches with drop_last=False: the final partial window
    no longer extends past stop when stop < len(sequence).

Removed

  • kaparoo.data.sequence (single module) and kaparoo.data.utils were
    replaced by the kaparoo.data.sequences subpackage. The previous
    DataSequence.by_index / by_indices API was a placeholder and
    has been superseded by get_item / get_items / get_meta /
    get_metas / get_pair / get_pairs.

v0.2.1

27 May 08:45

Added

  • Filter serialization: Filter.to_dict() / Filter.from_dict() with
    a "kind"-discriminated polymorphic dispatcher. Each concrete
    filter round-trips through a JSON-compatible dict.
  • register_filter(kind) decorator for registering custom Filter
    subclasses with the polymorphic dispatcher.
  • Filter.parse(value) — normalizes either a Filter instance
    (passed through) or a FilterDict into a Filter.
  • FilterDict TypedDict family at
    kaparoo.filesystem.search.filters.types: FilterDict (base,
    kind-only), PatternFilterDict, MultiPatternFilterDict,
    LogicalChildrenFilterDict, LogicalChildFilterDict. User-defined
    filter dicts extend these to type-check against Filter.parse.
  • Search.run / search_paths / search_files / search_dirs
    accept a FilterDict for part_filter and name_filter in
    addition to a Filter instance.
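
A "kind"-discriminated registry of the sort described above can be sketched in a few lines. Everything here is hypothetical (Rule, GlobRule, NotRule, register, and the dict layouts are illustrative names, not kaparoo's classes); the sketch shows only the pattern of a registering decorator plus a dispatcher that rebuilds nested objects from JSON-compatible dicts.

```python
import json
from typing import Any, Callable, ClassVar, TypeVar

R = TypeVar("R", bound="Rule")
_REGISTRY: dict[str, type["Rule"]] = {}


def register(kind: str) -> Callable[[type[R]], type[R]]:
    """Register a Rule subclass under kind, returning the class unchanged."""
    def decorate(cls: type[R]) -> type[R]:
        cls.kind = kind
        _REGISTRY[kind] = cls
        return cls  # typed as type[R], so the subclass type is preserved
    return decorate


class Rule:
    kind: ClassVar[str]

    def to_dict(self) -> dict[str, Any]:
        raise NotImplementedError

    @classmethod
    def _from_fields(cls, data: dict[str, Any]) -> "Rule":
        raise NotImplementedError

    @staticmethod
    def from_dict(data: dict[str, Any]) -> "Rule":
        # Dispatch on the "kind" discriminator.
        try:
            cls = _REGISTRY[data["kind"]]
        except KeyError:
            raise ValueError(f"unknown or missing kind: {data!r}") from None
        return cls._from_fields(data)


@register("glob")
class GlobRule(Rule):
    def __init__(self, pattern: str) -> None:
        self.pattern = pattern

    def to_dict(self) -> dict[str, Any]:
        return {"kind": self.kind, "pattern": self.pattern}

    @classmethod
    def _from_fields(cls, data: dict[str, Any]) -> "GlobRule":
        return cls(data["pattern"])


@register("not")
class NotRule(Rule):
    def __init__(self, child: Rule) -> None:
        self.child = child

    def to_dict(self) -> dict[str, Any]:
        return {"kind": self.kind, "child": self.child.to_dict()}

    @classmethod
    def _from_fields(cls, data: dict[str, Any]) -> "NotRule":
        return cls(Rule.from_dict(data["child"]))  # children recurse


spec = NotRule(GlobRule("*.tmp"))
payload = json.dumps(spec.to_dict())
restored = Rule.from_dict(json.loads(payload))
print(payload)
```

Typing the decorator as Callable[[type[R]], type[R]] is what keeps subclass-specific constructors visible to static checkers after decoration.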

v0.2.0

26 May 18:40

Published on PyPI: https://pypi.org/project/kaparoo-python/0.2.0/

uv add kaparoo-python   # or: pip install kaparoo-python

Requires Python 3.14+.

See CHANGELOG.md for the full list of changes.