Releases: kaparoo/kaparoo-python
v0.9.1
Changed
- kaparoo.filesystem.hierarchy.locate now yields in a fully deterministic
  order: each directory's entries are sorted by name and subdirectories are
  descended in that same order (so locate_map's iteration order is
  deterministic too). Previously only siblings within a level were sorted;
  the order across sibling subtrees followed the OS directory order, so an
  open-depth match could vary by filesystem. validate's report was already
  sorted and is unchanged.
- kaparoo.utils.aggregate.Aggregator.update no longer adds weight to the
  grand total for an empty values={} batch -- with nothing folded in, the
  call contributes no weight (the weight property counts only weight
  actually folded in). A non-empty update is unchanged.
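The ordering rule for locate (entries sorted by name, subdirectories descended in that same order) amounts to a sorted depth-first walk. Below is a minimal sketch in plain pathlib. It assumes a hypothetical helper, sorted_walk, and is not kaparoo's implementation, which also applies the spec's name filters and depths:

```python
import tempfile
from collections.abc import Iterator
from pathlib import Path


def sorted_walk(root: Path) -> Iterator[Path]:
    """Hypothetical helper: yield every path under ``root`` depth-first.

    Each directory's entries are sorted by name, and each subdirectory is
    descended right after it is yielded, so the overall order does not
    depend on the OS directory order.
    """
    for entry in sorted(root.iterdir(), key=lambda p: p.name):
        yield entry
        # Do not follow directory symlinks, which could create cycles.
        if entry.is_dir() and not entry.is_symlink():
            yield from sorted_walk(entry)


def demo_order() -> list[str]:
    """Create entries in the order b, a, a/z, a/y, c, then walk them."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "b").mkdir()
        (root / "a").mkdir()
        (root / "a" / "z").touch()
        (root / "a" / "y").touch()
        (root / "c").touch()
        # Materialize the walk before the temporary directory is removed.
        return [p.relative_to(root).as_posix() for p in sorted_walk(root)]


print(demo_order())  # ['a', 'a/y', 'a/z', 'b', 'c']
```

Creation order has no effect on the result. Only the names decide the order, which is the property the fix guarantees across filesystems.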
Fixed
- kaparoo.filesystem.make_dirs now detects a duplicated path in its
  validate-first pass and, under strict-create (exist_ok=False,
  clean=False), raises FileExistsError before creating anything.
  Previously the second occurrence's mkdir failed only after the first had
  already created the directory, leaving a partial side effect. A repeat
  stays harmless (idempotent) under exist_ok=True or clean=True and is
  still accepted there.
- kaparoo.filesystem.wrap_path / wrap_paths now reject a Windows
  drive-relative prepend target or append value (e.g. C:foo -- a drive
  with no root) with ValueError, instead of silently discarding the other
  component (Path("base", "C:foo") collapses to Path("C:foo")). The guard
  moved from os.path.isabs to a Path.anchor check, which is platform-aware:
  C:foo stays an ordinary relative name on POSIX and is unaffected.
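The wrap_path entry swaps os.path.isabs for Path.anchor. The reason is that a drive-relative Windows path such as C:foo is not absolute, yet joining it still discards the base. The sketch below uses the pure path classes so it runs on any OS. safe_join is a hypothetical helper that illustrates the guard; it is not kaparoo's wrap_path:

```python
import ntpath
from pathlib import PurePath, PurePosixPath, PureWindowsPath

# The problem: "C:foo" is not absolute on Windows, yet joining it onto a
# base silently drops the base, because it carries a drive.
assert not ntpath.isabs("C:foo")
assert PureWindowsPath("base", "C:foo") == PureWindowsPath("C:foo")


def safe_join(base: PurePath, part: PurePath) -> PurePath:
    """Hypothetical guard: refuse any component that has an anchor.

    ``anchor`` is drive + root. It is non-empty for absolute paths and for
    drive-relative Windows paths, but empty for ``C:foo`` on POSIX, where
    that string is just an ordinary file name.
    """
    if part.anchor:
        raise ValueError(f"expected a plain relative component, got {str(part)!r}")
    return base / part


print(safe_join(PurePosixPath("base"), PurePosixPath("C:foo")))  # base/C:foo
try:
    safe_join(PureWindowsPath("base"), PureWindowsPath("C:foo"))
except ValueError as exc:
    print(exc)
```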
v0.9.0
Added
- kaparoo.filesystem.utils.normalize_extension / normalize_extensions /
  file_extension: extension-string helpers. normalize_extension strips
  surrounding whitespace and leading dots (" .BIN " -> "BIN"), keeping case
  unless lowercase=True; normalize_extensions maps it over an iterable
  (threading lowercase; empties and duplicates are deliberately kept --
  that policy is the caller's). file_extension(path) returns the path's
  last (up to) level suffix(es), dot-joined and normalized: level=2 yields
  "tar.gz" from data.tar.gz, lowercase=False keeps case, and no suffix
  gives "". ensure_file_extension now builds on these. All are re-exported
  from the top-level kaparoo.filesystem namespace.
- kaparoo.filesystem.exceptions.UnsupportedExtensionError (also
  re-exported from kaparoo.filesystem): a ValueError subclass for an
  extension that is none of the supported ones. The constructor normalizes
  supported (strips surrounding whitespace and leading dots, case
  preserved), de-duplicates it, and drops empties; an optional kind labels
  the message, rendering e.g. "unsupported extension 'gif' (supported:
  'jpg', 'png')" (with "for <kind>" inserted when kind is given). It
  exposes ext / supported / kind.
- kaparoo.filesystem.hierarchy.scaffold gains two options. on_create is a
  callback on_create(path, file_node) run once for each file actually
  created -- the seam for writing a file's content (scaffold otherwise
  leaves an empty skeleton); it is not called for an untouched existing
  file, under dry_run, or with dirs_only. dirs_only=True creates only the
  directory skeleton, skipping every file (including required ones);
  pairing it with on_create raises ValueError.
- kaparoo.filesystem.hierarchy.Entry.is_direct_child: a read-only
  property, True only when the entry is pinned to exactly depth 1
  (min_depth and max_depth both 1) -- the default, unranged position
  scaffold requires to create a node.
Changed
- kaparoo.filesystem.utils.ensure_file_extension now raises the new
  UnsupportedExtensionError (a ValueError subclass, so an existing
  except ValueError still catches it) instead of a plain ValueError when a
  path's final suffix is none of the accepted extensions. An empty ext
  argument still raises a plain ValueError.
- kaparoo.filesystem.hierarchy.scaffold now raises NotADirectoryError (a
  file where a directory is described) or NotAFileError (a directory where
  a file is described) for a wrong-kind conflict, instead of a plain
  ValueError, aligning with the rest of kaparoo.filesystem. Breaking for
  callers that caught these as ValueError: NotADirectoryError /
  NotAFileError are OSError subclasses, not ValueError.
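Both compatibility notes come down to the exception hierarchy. The sketch below shows this with a stand-in class. UnsupportedExtensionErrorSketch is hypothetical. Its normalization and message format follow the description under Added, and it leaves out the optional kind label:

```python
class UnsupportedExtensionErrorSketch(ValueError):
    """Hypothetical stand-in for a ValueError subclass like the one described."""

    def __init__(self, ext: str, supported: list[str]) -> None:
        # Normalize (strip whitespace and leading dots, keep case),
        # de-duplicate in first-seen order, and drop empties.
        cleaned = (s.strip().lstrip(".") for s in supported)
        self.ext = ext
        self.supported = tuple(s for s in dict.fromkeys(cleaned) if s)
        listed = ", ".join(repr(s) for s in self.supported)
        super().__init__(f"unsupported extension {ext!r} (supported: {listed})")


# Existing "except ValueError" handlers keep working with the subclass ...
try:
    raise UnsupportedExtensionErrorSketch("gif", [".jpg", " png ", "jpg", ""])
except ValueError as exc:
    print(exc)  # unsupported extension 'gif' (supported: 'jpg', 'png')

# ... but NotADirectoryError is an OSError, so a ValueError handler misses it.
print(issubclass(NotADirectoryError, OSError), issubclass(NotADirectoryError, ValueError))
```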
Fixed
- kaparoo.utils.SpanTimer.measure now raises a clear, actionable
  RuntimeError when its block ends while still paused (a pause() left open
  across the block boundary), pointing to suspend(), instead of the
  misleading "Cannot record a lap while paused" it surfaced from the
  trailing lap.
v0.8.0
Added
- kaparoo.filesystem.search (search_paths / search_files / search_dirs)
  gains an exclude= argument: paths to skip, as a StrPath (absolute under
  root, or root-relative), a Filter (matched on the root-relative POSIX
  path), a callable on the candidate Path (the real, filesystem-valid
  path), or an iterable of these (OR-combined). An excluded directory is
  pruned -- its subtree is never walked -- which name_filter cannot do (a
  directory failing name_filter is still descended). The excluder engine
  is shared with kaparoo.filesystem.hierarchy via the new internal
  kaparoo.filesystem.exclude module.
- kaparoo.utils.checks: small validation guards, re-exported from
  kaparoo.utils. ensure_one_of(value, options, *, name) checks discrete
  membership (pass a range for an integer grid); ensure_in_range(value, *,
  lower, upper, step, inclusive, name) checks int/float bounds, with either
  side optional (half-open), inclusivity as a shared bool or a per-side
  tuple, and an optional step grid spacing (base + k*step, float-robust via
  math.isclose).
- kaparoo.filters gains an enumerable filter family: LiteralFilter,
  OneOfFilter, TemplateFilter, and WithoutFilter (with short aliases
  Literal / OneOf / Template / Without, matching the rest of the package)
  implement an Expandable capability (expand()) that lists the finite set
  of names a filter matches, on top of the usual matches (Expandable is now
  a Filter subtype). Literal / OneOf are the case-sensitive,
  always-enumerable counterparts of Equals / EqualsAny; Template enumerates
  template.format(*combo) over the cartesian product of one or more value
  axes (Template("shard_{:03d}", range(8)),
  Template("{}_{}.png", ["real", "fake"], range(3))); Without(base,
  *excluded) is the enumerable form of And(base, Not(...)), expanding base
  minus anything the excluded filters match. They register as ordinary
  filter kinds ("literal" / "one_of" / "template" / "without") and each
  gets a matching TypedDict in kaparoo.filters.types (LiteralFilterDict,
  OneOfFilterDict, TemplateFilterDict, WithoutFilterDict) for
  statically-checked dict authoring.
- kaparoo.filesystem.hierarchy: a new subpackage describing a filesystem
  tree declaratively. File / Directory nodes compose into a tree whose node
  names are kaparoo.filters filters -- the full DSL (Glob, Regex,
  And / Or / Not, the enumerable Literal / OneOf / Template, ...) describes
  which siblings a node matches. As name sugar, a bare str becomes a
  Literal and a list[str] a OneOf, so one node can stand for several
  literally-named siblings that share a structure
  (Directory(["train", "val"], layout)); a sugar name must be a single
  path component (a / or \ separator raises ValueError).

  Nodes are immutable value objects (==, hash, repr) and take a
  keyword-only depth (default 1, a direct child) describing how far below
  the parent the entry sits, past intermediate directories of unknown
  name: an int is an exact level, None is any depth (the tree-level **),
  and a (min, max) tuple is an inclusive range (max=None unbounded),
  exposed as min_depth / max_depth. Each entry also takes a keyword-only
  required flag (default False) asserting it must be present. A Directory
  additionally takes a keyword-only allow_extra (default False, a
  bool | Filter): True makes validate / conformer ignore its on-disk
  contents that match none of its children (instead of reporting them
  unexpected), while a Filter ignores only those whose name it matches; a
  matched subdirectory keeps its own strictness.

  Two sibling constraints can sit among a directory's children. Exclusive:
  the present siblings may come from at most one of its alternatives, each
  a set of independent nodes on one side of the exclusion; required=True
  requires at least one; on_conflict="priority" resolves a multi-side
  conflict by declaration order (the first present alternative wins and
  the rest become unexpected) instead of the default "error". Together: its
  members are all-or-nothing -- all present or all absent; required=True
  requires all. Both take Nodes, so constraints nest:
  Exclusive(Together(a, b), c) is "{a and b} or c". File / Directory
  (named, under the Entry base) and the constraint nodes Exclusive /
  Together (under a Group base that carries required and an entries
  accessor flattening to the leaf entries a constraint references,
  descending through nesting) share a common Node base, so a directory's
  children hold any Node. A whole tree round-trips through a
  "node"-discriminated dict (to_dict / Node.from_dict, mirroring the
  filter registry), so specs can be stored as JSON. The package depends on
  kaparoo.filters but nothing in kaparoo.filesystem.search. This first cut
  is the representation plus name-level semantics and the disk operations
  locate, validate, conformer, and scaffold (below).
- kaparoo.filesystem.hierarchy.locate(tree, root): the first operation
  that applies a spec to a real filesystem. It maps each on-disk path
  under root (the container) to the spec node(s) it matches -- by name
  filter, type (File ↔ file, Directory ↔ directory), and depth
  (intermediate levels of unknown name skipped) -- yielding one
  (path, node) pair per match. It reports only what is present: Groups are
  treated as "any entry may appear," so Exclusive / Together enforcement
  and missing-required reporting are left to validate. A path may match
  several nodes (overlapping filters); locate yields one pair per node
  (lazily, duplicates kept by default; pass unique=True to suppress
  identical pairs), while the companion locate_map(tree, root) groups the
  results into a {path: (node, ...)} mapping (distinct nodes,
  spec-traversal order). Both take exclude= to drop paths from the results
  (e.g. specific cells of a Template product). An exclude rule, or an
  iterable of them (OR-combined), is a StrPath (absolute under root, or
  root-relative), a kaparoo.filters Filter matched on the root-relative
  POSIX string (the serializable counterpart of a callable), or a callable
  taking the candidate's own Path (the real, filesystem-valid path, so it
  may inspect the file); a dropped directory has its whole subtree pruned.
  Pass root_as_top=True to treat root as the realized top node itself (you
  point at the top directly) rather than its container; the top must be an
  Entry (a Group raises TypeError), and root realizes it only when its leaf
  name / kind match; otherwise nothing is yielded.
- kaparoo.filesystem.hierarchy.validate(tree, root): checks a real
  directory against a spec, returning a ValidationReport with matched (as
  locate_map), unexpected (paths matching no node -- anything not matched
  and not an ancestor of a match, so contents of an unspecified directory
  count), missing (a required entry, or a required Exclusive / Together
  with nothing present), and violations (an Exclusive with more than one
  side present, or a partly-present Together). report.ok (and its
  truthiness) is True only when the last three are empty. A required entry
  is satisfied by one present match: an enumerable name (OneOf / Template)
  by any one listed name, an open name (Glob / Regex) by any one matching
  path. validate also accepts the same exclude= as locate, so excluded
  paths are dropped from matched and not reported unexpected. It also
  takes the same root_as_top=True to validate root as the realized top
  entry itself (a Group top raises TypeError); a leaf name / kind mismatch
  reports the top as missing without descending. A top-level allow_extra
  (bool | Filter) applies blanket leniency to every directory (and the
  container root), as if each carried it, combined with each Directory's
  own. The module also exports the ValidationReport and Violation result
  types. Two reports combine with + (problem lists concatenate and matched
  merges, so the result is ok only when both are) for accumulating
  independent validations.
- kaparoo.filesystem.hierarchy.conformer(spec): builds a path predicate (a
  search predicate) that accepts a path realizing spec's top node -- a file
  matching a top File's name, or a directory matching a top Directory's
  name whose subtree conforms (via validate); a top Group is realized by
  any one of its alternatives / members. The path is always tested as the
  top of spec, never an inner node. It takes the same allow_extra as
  validate to accept a top whose subtree carries extra, unspecified
  entries. (Checking whether a path or sub-spec is contained within a spec
  is a separate future capability.)
- kaparoo.filesystem.hierarchy.scaffold(tree, root): the write operation
  -- creates the structure a spec describes under root (the container,
  made if absent) and returns the newly created paths in creation order.
  Only enumerable nodes are materialized: a node is creatable when its name
  is an Expandable filter (Literal / OneOf / Template / Without and the
  str / list[str] sugar) and it sits at a fixed depth of 1; open names
  (Glob, Regex) and non-fixed depths are acceptance patterns, so they are
  skipped when optional and raise when required. Together creates all
  members (all-or-nothing -- a non-creatable member skips the whole set
  unless required); Exclusive creates the first fully-creatable alternative
  (declaration order is the priority). Files are created empty; creation
  is idempotent (existing directories are descended, existing files never
  clobbered) and a wrong-...
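The kaparoo.filters entry says Template enumerates template.format(*combo) over the cartesian product of its value axes. Those enumerable names are the ones scaffold can materialize. Below is a minimal sketch of that expansion using itertools.product. expand_template is a hypothetical helper, not the kaparoo.filters API:

```python
from collections.abc import Iterable
from itertools import product


def expand_template(template: str, *axes: Iterable[object]) -> list[str]:
    """Hypothetical helper: format ``template`` with every axis combination.

    The first axis varies slowest, as in itertools.product.
    """
    return [template.format(*combo) for combo in product(*axes)]


print(expand_template("shard_{:03d}", range(3)))
# ['shard_000', 'shard_001', 'shard_002']
print(expand_template("{}_{}.png", ["real", "fake"], range(3)))
# ['real_0.png', 'real_1.png', 'real_2.png', 'fake_0.png', 'fake_1.png', 'fake_2.png']
```

A Without over such a base would then amount to filtering this list by the excluded filters, which keeps the result finite and enumerable.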
v0.7.0
Added
- kaparoo.filesystem.utils.ensure_file_extension: a pure (no filesystem)
  extension check requiring a case-insensitive .<ext> final suffix
  (raising ValueError otherwise). ext may be a single extension or an
  iterable of acceptable ones (e.g. ("jpg", "jpeg")). add=True (mirroring
  make on ensure_dir_exists) appends the first extension when the path has
  no suffix instead of raising (np.save-style); a wrong suffix still
  raises. The leading dot on ext is optional.
Changed
- Renamed SegmentTimer -> SpanTimer and SegmentRecord -> SpanRecord
  (module kaparoo.utils.timer). "Span" fits both lap (contiguous spans) and
  measure (arbitrary spans) without implying a partition, and avoids the
  "periodic timer" reading of "interval". The lap / measure methods, the
  duration field, and all behavior are unchanged. Breaking: update imports
  from SegmentTimer / SegmentRecord to SpanTimer / SpanRecord.
v0.6.0
Added
- kaparoo.data.sequences.TransformedSequence: a lazy view that applies a
  transform callable to each item of source. get_meta passes through
  source.get_meta by default (M_out = M_in); override it in a subclass when
  M_out differs. T_out and M_out default to T_in / M_in (PEP 696).
- kaparoo.data.sequences.ZippedSequence: element-wise zip of two sequences
  -- item i is (first[i], second[i]) and metadata i is (M1, M2) (the
  "paired image + label" pattern ConcatSequence cannot express).
  strict=True (default) requires equal lengths and raises ValueError on a
  mismatch; strict=False truncates to the shorter length like the builtin
  zip. get_items / get_metas bulk-delegate to each source. For three or
  more sequences, nest the pairs.
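ZippedSequence's length rule mirrors the builtin zip: strict mode rejects a mismatch, and non-strict mode truncates. Below is a minimal lazy sketch over collections.abc.Sequence. ZippedView is hypothetical and covers only items, not kaparoo's get_meta / get_items protocol:

```python
from collections.abc import Sequence
from typing import Generic, TypeVar

A = TypeVar("A")
B = TypeVar("B")


class ZippedView(Sequence[tuple[A, B]], Generic[A, B]):
    """Hypothetical lazy, element-wise zip of two sequences."""

    def __init__(self, first: Sequence[A], second: Sequence[B], *, strict: bool = True) -> None:
        if strict and len(first) != len(second):
            raise ValueError(f"length mismatch: {len(first)} != {len(second)}")
        self._first, self._second = first, second
        self._len = min(len(first), len(second))  # truncate like zip()

    def __len__(self) -> int:
        return self._len

    def __getitem__(self, index: int) -> tuple[A, B]:  # slices not supported here
        if index < 0:
            index += self._len
        if not 0 <= index < self._len:
            raise IndexError(index)
        # Items are fetched only on access, so the view stays lazy.
        return self._first[index], self._second[index]


images = ["img0", "img1", "img2"]
labels = [0, 1]
print(list(ZippedView(images, labels, strict=False)))  # [('img0', 0), ('img1', 1)]
```

Negative indices are normalized against the truncated length rather than each source's own length, so pairs never drift apart.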
Changed
- WindowedSequence[T, M_in, M_out]: M_out now defaults to M_in (PEP 696),
  so the common case of M_out == M_in no longer requires the third type
  argument. Existing explicit three-argument usage is unaffected.
- FileFolderSequence is now a subclass of FileListSequence -- the folder
  case is just a FileListSequence whose list is discovered under a root
  and stored root-relative. Its API and behavior are unchanged (paths are
  still kept relative and get_file re-prepends root), but
  isinstance(seq, FileListSequence) is now True for folder sequences.
v0.5.0
Added
- kaparoo.utils.aggregate (still experimental): Var and Std reductions --
  weighted population variance and standard deviation, accumulated online
  (Welford) and merged exactly (Chan's parallel algorithm), so they nest
  across loop levels like the other reductions.
- kaparoo.data.sequences.FileListSequence: a "one file per item"
  DataSequence over an explicit, ordered list of files. Unlike
  FileFolderSequence it takes the files directly (no root discovery), so
  they may live in unrelated directories -- or, on Windows, different
  drives -- which FileFolderSequence cannot represent. Subclasses implement
  only load_file / get_meta; the input order is preserved verbatim
  (duplicates kept) and files are loaded lazily.
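The Var / Std entry relies on two standard results. The first is a weighted Welford update for streaming samples. The second is Chan et al.'s pairwise formula for merging two partial states exactly. A minimal sketch of both follows. WeightedMoments is a hypothetical class, not kaparoo's Reduction API:

```python
import math
from dataclasses import dataclass


@dataclass
class WeightedMoments:
    """Hypothetical online accumulator for weighted population variance."""

    weight: float = 0.0
    mean: float = 0.0
    m2: float = 0.0  # weighted sum of squared deviations from the mean

    def update(self, x: float, w: float = 1.0) -> None:
        """Weighted Welford step (West's form) for one sample."""
        if w <= 0:
            return
        self.weight += w
        delta = x - self.mean
        self.mean += (w / self.weight) * delta
        self.m2 += w * delta * (x - self.mean)

    def merge(self, other: "WeightedMoments") -> None:
        """Chan et al.'s exact combination of two partial states."""
        if other.weight == 0:
            return
        total = self.weight + other.weight
        delta = other.mean - self.mean
        self.mean += delta * other.weight / total
        self.m2 += other.m2 + delta * delta * self.weight * other.weight / total
        self.weight = total

    @property
    def var(self) -> float:
        return self.m2 / self.weight if self.weight else math.nan

    @property
    def std(self) -> float:
        return math.sqrt(self.var)


# Two "batches" accumulated separately, then merged at the "epoch" level.
a, b = WeightedMoments(), WeightedMoments()
for x in (1.0, 2.0):
    a.update(x)
for x in (3.0, 4.0):
    b.update(x)
a.merge(b)
print(a.mean, a.var)  # 2.5 1.25
```

The merged result equals a single pass over 1, 2, 3, 4, which is what lets such a reduction nest across loop levels without storing samples.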
Fixed
- make_dirs now raises NotADirectoryError (matching make_dir) when a path
  exists but is not a directory, instead of the divergent FileExistsError
  that mkdir produced.
- make_dir / make_dirs validate every path before any directory is wiped
  or created, so a deterministically bad entry (e.g. a file in the list)
  no longer leaves earlier directories already cleaned or created.
- make_dir(clean=True) / make_dirs(clean=True) reject a symlink with
  NotADirectoryError rather than failing deep inside shutil.rmtree;
  cleaning never operates through a link.
- reserve_path / reserve_paths treat a symlink -- including a broken one,
  which Path.exists reports as absent -- as occupying the path.
- StagedFile.commit (with overwrite=False) no longer fails outright on a
  filesystem without hardlink support (FAT/exFAT, some network mounts): it
  falls back to an existence check plus replace instead of losing the
  staged content to a raw OSError.
- StagedFile.commit / StagedDirectory.commit now fsync the destination's
  parent directory after the move, so the committed result survives a
  crash on POSIX (a no-op where directories cannot be fsynced, e.g.
  Windows).
- StagedDirectory.commit with overwrite=True now restores the original
  directory if moving the staged one into place fails, instead of leaving
  the destination missing with the old contents stranded under a
  <name>.old name; the backup removal is best-effort.
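The staged-commit fixes above build on a common pattern: write to a temporary file in the same directory, fsync it, move it into place with os.replace, then fsync the parent directory. Below is a minimal sketch for a single text file. atomic_write_text is a hypothetical helper. It always overwrites and omits the hardlink-based overwrite=False path that StagedFile.commit falls back from:

```python
import os
import tempfile
from pathlib import Path


def atomic_write_text(dest: Path, text: str, encoding: str = "utf-8") -> None:
    """Hypothetical helper: readers see either the old file or the new one."""
    # Stage in the destination's own directory so os.replace stays on one
    # filesystem (a rename, not a copy).
    fd, tmp = tempfile.mkstemp(dir=dest.parent, prefix=f".{dest.name}.", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding=encoding) as f:
            f.write(text)
            f.flush()
            os.fsync(f.fileno())  # content is durable before it becomes visible
        os.replace(tmp, dest)  # atomic rename on POSIX; also overwrites on Windows
    except BaseException:
        Path(tmp).unlink(missing_ok=True)  # never leave the staged file behind
        raise
    _fsync_dir(dest.parent)


def _fsync_dir(directory: Path) -> None:
    """Persist the rename itself; skipped where a directory cannot be opened."""
    try:
        dir_fd = os.open(directory, os.O_RDONLY)
    except OSError:  # e.g. Windows, which cannot open a directory this way
        return
    try:
        os.fsync(dir_fd)
    except OSError:
        pass  # some filesystems refuse fsync on directories
    finally:
        os.close(dir_fd)
```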
v0.4.0
Added
- kaparoo.filesystem.staged.StagedFile: a safe (atomic) file writer.
  Content is staged in a temporary file in the destination's directory and
  moved into place only on commit, so readers never see a half-written
  file and a failed write leaves any existing file untouched. Usable as a
  context manager (commit on clean exit, discard on exception) or
  explicitly like a file object (write / seek / tell / flush, plus commit /
  abort, path, committed, and the underlying file). Text by default
  (StagedFile[str]) with optional encoding / newline; binary=True gives a
  binary writer (StagedFile[bytes]), the type parameter tracking the mode.
  overwrite=False (default) fails fast on an existing destination and
  creates the file atomically; overwrite=True replaces it, keeping its
  permissions; make_parents=True creates a missing parent directory. An
  uncommitted writer discards its staged file on garbage collection.
- kaparoo.filesystem.staged.StagedDirectory: the directory counterpart of
  StagedFile. Files are written into a temporary workdir in the
  destination's parent and moved into place on commit. Same
  context-manager / explicit usage and commit / abort / path / committed
  API (plus workdir), and the same overwrite / make_parents options.
  Creating a new directory is atomic (single rename); replacing an existing
  one (overwrite=True) swaps the old one aside and removes it, which is not
  fully atomic. An uncommitted builder discards its staging directory on
  garbage collection.
- kaparoo.filesystem.utils.reserve_path / reserve_paths: a guard (and its
  bulk form) for a path that should not yet exist, returning it
  (optionally stringified) so the caller can create something there.
  exist_ok (named as in make_dir / Path.mkdir) is a non-destructive bypass
  (nothing is deleted) and make_parents creates the parent directory when
  missing. Raises FileExistsError on conflict. reserve_paths is fail-fast
  and takes no root (compose with wrap_paths(prepend=...)). For directory
  destinations prefer make_dir(exist_ok=...); for exclusive file creation
  the stdlib open(path, "x") suffices.
- clean option on make_dir / make_dirs: when an existing directory is
  present, remove its contents and recreate it empty (a fresh slate).
  Destructive, and only ever wipes a directory -- a non-directory at the
  path still raises NotADirectoryError. clean=True makes exist_ok moot,
  since the directory is removed and remade.
- kaparoo.filesystem directory checks dir_not_empty, dir_not_empty_unsafe,
  dirs_not_empty, and dirs_not_empty_unsafe: the negated counterparts of
  the dir_empty series. dirs_not_empty is True only when every directory is
  non-empty.
- kaparoo.utils.aggregate module (experimental -- the API may change in a
  later release): Aggregator for nested, pluggable metric aggregation (the
  batch → epoch → run pattern). Each metric is reduced by a Reduction --
  built-ins Mean (weighted), Sum, Min, Max, Last, and Fold (a scalar monoid
  from a callable) -- with per-metric overrides. Reductions are online
  (constant memory); nested levels compose via merge (exact sample-weighted
  pooling) or update(child.compute(), ...) (a different reduction per
  level). Custom reductions subclass Reduction / UnweightedReduction.
- SegmentTimer.measure(label): a stopwatch-style context manager (and
  decorator) that records a segment covering only the wrapped block, so
  time spent outside any measure block is excluded from records / summary.
  Complements lap, which splits the timeline into contiguous segments.
  Pauses inside the block are excluded; a block that raises records
  nothing.
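The measure semantics map directly onto contextlib.contextmanager: only the wrapped block counts, and a block that raises records nothing. A contextmanager-based object also works as a decorator. Below is a minimal sketch with time.perf_counter_ns. SpanRecorder and its records list are hypothetical names, and pause handling is left out:

```python
import time
from collections.abc import Iterator
from contextlib import contextmanager


class SpanRecorder:
    """Hypothetical stopwatch: records (label, seconds) only for measured blocks."""

    def __init__(self) -> None:
        self.records: list[tuple[str, float]] = []

    @contextmanager
    def measure(self, label: str) -> Iterator[None]:
        start = time.perf_counter_ns()
        yield  # if the block raises, the exception propagates from here ...
        # ... so this line is skipped and nothing is recorded.
        self.records.append((label, (time.perf_counter_ns() - start) / 1e9))


timer = SpanRecorder()
with timer.measure("load"):
    sum(range(1000))
try:
    with timer.measure("fail"):
        raise RuntimeError("boom")
except RuntimeError:
    pass


@timer.measure("step")  # context managers from @contextmanager double as decorators
def step() -> int:
    return 1


step()
print([label for label, _ in timer.records])  # ['load', 'step']
```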
Changed
- Renamed LapTimer -> SegmentTimer, LapRecord -> SegmentRecord, and the
  record field lap_time -> duration, reflecting that the timer now records
  named segments via both lap (split) and the new measure (block). The lap
  method keeps its name.
- Timer.resume / SegmentTimer.resume now return None instead of the pause
  duration in nanoseconds. The value had no consumer (suspend discarded
  it) and leaked a raw-nanosecond figure that broke the timer's unit
  abstraction. Subclasses that need the pause interval override the new
  protected _resume hook instead.
v0.3.0
Added
- kaparoo.data.sequences subpackage: a Sequence-based foundation for
  dataset code.
  - DataSequence[T, M] ABC with abstract get_item / get_meta and default
    get_items / get_metas / get_pair / get_pairs. __getitem__ returns the
    item only.
  - Composers: SlicedSequence (stable-length view at given indices,
    duplicates allowed and order preserved); ConcatSequence (O(log N)
    lookup over multiple sources via cumulative lengths + bisect_right);
    WindowedSequence[T, M_in, M_out] (abstract sliding window with size /
    step / skip; get_item is implemented, get_meta is left abstract).
  - Templates: FileFolderSequence (folder-rooted, one file per item;
    subclasses implement list_files / load_file / get_meta; supports the
    "set state BEFORE super().__init__()" pattern for parameterized
    subclasses); SingleFileSequence (thin ABC for "one file, many records"
    formats).
Changed
- generate_batches: step, skip, start, stop, and drop_last are now
  keyword-only. Empty ranges (start == stop) are accepted and yield no
  batches. Docstring expanded.
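The ConcatSequence lookup listed under Added above works by precomputing cumulative lengths once, so that each global index resolves with a single bisect_right in O(log N). Below is a minimal sketch of that index mapping. ConcatIndex is a hypothetical helper, not kaparoo's class:

```python
from bisect import bisect_right
from itertools import accumulate


class ConcatIndex:
    """Hypothetical global-index -> (source, local index) mapper."""

    def __init__(self, lengths: list[int]) -> None:
        # ends[k] is the total length of sources 0..k, computed once.
        self.ends = list(accumulate(lengths))

    def locate(self, index: int) -> tuple[int, int]:
        total = self.ends[-1] if self.ends else 0
        if index < 0:
            index += total
        if not 0 <= index < total:
            raise IndexError(index)
        # The first source whose end lies strictly past the index holds it;
        # bisect_right also skips over empty sources.
        source = bisect_right(self.ends, index)
        start = self.ends[source - 1] if source else 0
        return source, index - start


ci = ConcatIndex([3, 0, 2])  # the middle source is empty
print([ci.locate(i) for i in range(5)])
# [(0, 0), (0, 1), (0, 2), (2, 0), (2, 1)]
```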
Fixed
- register_filter decorator now preserves the decorated subclass's type.
  Previously it widened to type[Filter], so static checkers rejected
  subclass-specific constructor calls on decorated classes.
- generate_batches with drop_last=False: the final partial window no
  longer extends past stop when stop < len(sequence).
Removed
- kaparoo.data.sequence (single module) and kaparoo.data.utils -- replaced
  by the kaparoo.data.sequences subpackage. The previous
  DataSequence.by_index / by_indices API was a placeholder and has been
  superseded by get_item / get_items / get_meta / get_metas / get_pair /
  get_pairs.
v0.2.1
Added
- Filter serialization: Filter.to_dict() / Filter.from_dict() with a
  "kind"-discriminated polymorphic dispatcher. Each concrete filter
  round-trips through a JSON-compatible dict.
- register_filter(kind) decorator for registering custom Filter subclasses
  with the polymorphic dispatcher.
- Filter.parse(value) -- normalizes either a Filter instance (passed
  through) or a FilterDict into a Filter.
- FilterDict TypedDict family at kaparoo.filesystem.search.filters.types:
  FilterDict (base, kind-only), PatternFilterDict, MultiPatternFilterDict,
  LogicalChildrenFilterDict, LogicalChildFilterDict. User-defined filter
  dicts extend these to type-check against Filter.parse.
- Search.run / search_paths / search_files / search_dirs accept a
  FilterDict for part_filter and name_filter in addition to a Filter
  instance.
v0.2.0
Published on PyPI: https://pypi.org/project/kaparoo-python/0.2.0/
uv add kaparoo-python  # or: pip install kaparoo-python
Requires Python 3.14+.
See CHANGELOG.md for the full list of changes.