/txs/compare: multi-currency transaction summaries & wallet fingerprinting #98
Open
frederik-raphael wants to merge 15 commits into
Plumb the new fields added by raw_utxo schema migration 2->3 through the
async service layer. TxValue gains per-input sequence; TxUtxo gains
transaction-level version and lock_time. Read sites in io_from_rows and
std_tx_from_row pull them off the row with safe getattr/get fallbacks so
older keyspaces still load.
Internal-only - no REST model changes; FastAPI response_model filtering
keeps /txs/{tx_hash} unchanged.
Implement signals, lineage, and tests; tune weights. The comparison is now optional: requesting only the summary is possible.
Extend /txs/compare to handle account chains (ETH/TRX) in summary-only mode
(include_analysis=false). The fingerprinting analysis stays UTXO-only;
account currencies are rejected only when analysis is requested.

Reshape ComparisonSummary to be currency- and asset-aware:
- total_value: native base unit (satoshi/wei); sums native transfers only,
  since token transfers carry no native-unit amount
- total_value_usd: USD fiat summed across all transfers (incl. tokens), so
  it is comparable across assets; partial-summed and flagged when some txs
  lack a rate
- total_fee: native unit (gas is always native)
- total_inputs/total_outputs: now optional, null for account chains
- notes: flags excluded token transfers and partial/unavailable USD totals

Renames total_output_sat -> total_value.

A missing tx hash (including a hash from another chain, absent from the
queried keyspace) aborts the whole comparison with
TransactionNotFoundException rather than yielding a partial summary.

Regenerates the Python client for the new ComparisonSummary schema.
A repeated hash was fetched twice, double-counted in the summary, and trivially compared as linked to itself. Dedup the hash list (order-preserving) up front and require 2+ distinct hashes, returning 400 otherwise.
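The order-preserving dedup described in this commit could look roughly like the sketch below. The function name `dedup_hashes` and the use of `ValueError` as a stand-in for the HTTP 400 are assumptions for illustration, not the code in this PR.

```python
from typing import Iterable, List


def dedup_hashes(tx_hashes: Iterable[str], min_distinct: int = 2) -> List[str]:
    """Drop repeated hashes while keeping first-seen order (hypothetical helper)."""
    # dicts preserve insertion order, so fromkeys() is an order-preserving dedup.
    distinct = list(dict.fromkeys(tx_hashes))
    if len(distinct) < min_distinct:
        # Stand-in for the 400 response the endpoint returns.
        raise ValueError(
            f"need at least {min_distinct} distinct tx hashes, got {len(distinct)}"
        )
    return distinct


print(dedup_hashes(["aa", "bb", "aa", "cc"]))  # ['aa', 'bb', 'cc']
```

Deduplicating before any fetch means a repeated hash is neither fetched twice nor compared against itself.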
The account summary fetched only the base/native tx per hash (get_tx returns the first trace), so token transfers (e.g. USDT) never reached build_summary: their USD was missing from total_value_usd and the token-exclusion note never fired. For account chains in summary-only mode, fetch the full asset-flow set per hash (get_asset_flows_within_tx: base + token legs) so build_summary folds token USD into total_value_usd and flags the excluded token transfers. Adds a regression test that feeds the orchestration a token leg and asserts the token USD is summed and the note is emitted.
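A minimal sketch of the folding behaviour this commit describes, under stated assumptions: the `Leg` shape, the function name `fold_legs`, and the exact note wording are hypothetical and do not mirror the real `build_summary` signature.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Leg:
    """One asset flow inside a tx (hypothetical shape, not the project's model)."""
    is_token: bool
    native_value: int            # base units; 0 for token legs
    usd_value: Optional[float]   # None when no rate is available


def fold_legs(legs: List[Leg]) -> Tuple[int, Optional[float], List[str]]:
    """Return (total_value, total_value_usd, notes) for a set of legs."""
    notes: List[str] = []
    # Native total: token legs carry no native-unit amount, so they are skipped.
    total_value = sum(leg.native_value for leg in legs if not leg.is_token)
    token_count = sum(1 for leg in legs if leg.is_token)
    if token_count:
        notes.append(f"{token_count} token transfer(s) excluded from total_value")
    # USD total: sum every priced leg and flag the gap when some lack a rate.
    priced = [leg.usd_value for leg in legs if leg.usd_value is not None]
    total_usd = sum(priced) if priced else None
    if len(priced) < len(legs):
        notes.append(
            "total_value_usd is partial" if priced else "total_value_usd unavailable"
        )
    return total_value, total_usd, notes


legs = [Leg(False, 10, 5.0), Leg(True, 0, 7.5), Leg(False, 3, None)]
print(fold_legs(legs))
```

The token leg contributes nothing to the native total but still counts toward the USD total, which is the behaviour the regression test in this commit asserts.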
get_asset_flows_within_tx(include_internal_txs=False) synthesized the base leg from trace[0]. ETH writes a synthetic outermost trace for every tx so this worked, but TRX only emits trace rows for internal contract calls -- plain native transfers and most TRC20 transfers have none. The compare endpoint therefore 404'd on those with "Found no traces in tx X:None", which made any multi-asset TRX summary that mixed a native or WTRX leg unusable. Delegate the base-leg lookup to get_tx, which already encodes the ETH-vs-TRX split (trace[0] for ETH, transaction row for TRX). Switch get_tx's branch from `currency == "eth"` to `currency_to_schema_type[currency] == "account"` so the rule is tied to a config invariant rather than a literal name. Adds a regression test that wires db.fetch_transaction_trace to raise if hit and asserts the TRX base leg comes back from the tx row.
The fingerprinting analysis was advertised as UTXO-only, but several signals (wasabi/whirlpool/joinmarket coinjoin detection, exchange-input overlap, change-address heuristics) are tuned to BTC. BCH/LTC have only the change heuristics wired up; ZEC has none. Running the analysis on those chains passed the gate but produced degraded, partially-empty fingerprints rather than honest "not supported" feedback. Reject include_analysis=true for any non-BTC currency with a 400. The summary-only mode (include_analysis=false) continues to work for every supported chain. Updates the endpoint description, the 400 description, and the docstring to match. Expands the existing eth/trx rejection tests to also cover bch/ltc/zec. Regenerates the Python client to pick up the new endpoint description.
…uery

_fetch_input_address_exchange_flags paid get_best_cluster_tag per unique
input cluster (through get_tag_summaries_by_subject_ids with
include_best_cluster_tag=true). The underlying tagstore query ranks all
cluster-definer tags by confidence and hydrates the winner -- cost scales
with per-cluster tag count, so any tx with an input in a huge cluster (e.g. a
major exchange, ~23M addresses) added ~2 s. Latency on the full tier was
strongly bimodal: each compare call landed in either a ~150 ms fast path or a
~2.5-3.5 s cliff path depending on whether any input belonged to a heavy
cluster.

The exchange flag is consumed only as a boolean (broad_category ==
"exchange") to drive the signal_exchange_input_overlap demoting qualifier; we
never use the ranked best tag itself. Replace the digest path with a focused
existence check:
- _get_clusters_with_concept_stmt: correlated EXISTS subquery driven by
  unnest(cluster_ids) so Postgres short-circuits at the first matching tag
  per cluster. Cost is bounded by len(cluster_ids), not by per-cluster tag
  count.
- Tagstore.get_clusters_with_concept: thin wrapper returning the subset of
  input cluster_ids that match.
- TagsService.which_clusters_have_concept: service-layer wrapper.
- _fetch_input_address_exchange_flags now consumes addr_to_cluster from the
  caller instead of re-resolving clusters, and uses the new path.

compare_txs chains the exchange-flag fetch after the parallel
cluster/parent-ref resolution; the lost parallelism costs tens of ms vs the
seconds saved.

Semantic shift: the previous check fired only when the weighted-most-common
concept across an address's tags was "exchange"; the new one fires when any
cluster-definer tag on the cluster carries the concept. More inclusive, which
is the right direction for the demoting qualifier: if there is meaningful
evidence the cluster is an exchange, the shared-cluster linkage evidence
should be weakened.
Measured impact (scripts/fingerprint_perf.py, 30 hash-sets x 3 calls per N,
tier=full, fresh seed):

| N   | before (median/p95 ms) | after (median/p95 ms)       |
|-----|------------------------|-----------------------------|
| 2   | 141 / 2228             | 96 / 157                    |
| 5   | 182 / 2859             | 114 / 153                   |
| 10  | 3361 / 3554            | 131 / 280 (~26x at median)  |
| 20  | (not in prior sweep)   | 158 / 279                   |
| 50  | (not in prior sweep)   | 261 / 560                   |
| 100 | (not in prior sweep)   | 458 / 697                   |

Per-set bimodal split (set median >= 1 s): before 3/9/19 of 30 sets at
N=2/5/10; after 0 of 30 at every tested N (one outlier at N=50 peaked at
1.27 s, consistent with ordinary tail noise rather than a cluster-size
cliff). The latency distribution is now unimodal and the cliff is gone.

Adds 5 unit tests in tests/db/test_comparison_service.py covering: no
tags_service, no addresses, exchange/non-exchange mix, unresolved clusters
(-1), cluster-id dedup.
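To illustrate the existence-check idea from this commit, here is a small runnable sketch that swaps Postgres for SQLite. The table and column names (`cluster_tag`, `input_cluster`, `concept`) are hypothetical, and a helper table stands in for `unnest(cluster_ids)`, which SQLite does not have; the real statement in the tagstore may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE cluster_tag (cluster_id INTEGER, concept TEXT);
    CREATE INDEX idx_cluster_tag ON cluster_tag (cluster_id, concept);
    CREATE TABLE input_cluster (cluster_id INTEGER PRIMARY KEY);
    """
)
conn.executemany(
    "INSERT INTO cluster_tag VALUES (?, ?)",
    [(1, "exchange"), (1, "exchange"), (1, "service"), (2, "mixer"), (3, "exchange")],
)


def clusters_with_concept(conn, cluster_ids, concept):
    """Return the subset of cluster_ids that carry at least one tag with `concept`."""
    # Helper table stands in for Postgres' unnest(cluster_ids).
    conn.execute("DELETE FROM input_cluster")
    conn.executemany(
        "INSERT OR IGNORE INTO input_cluster VALUES (?)",
        [(cid,) for cid in set(cluster_ids)],
    )
    rows = conn.execute(
        """
        SELECT ic.cluster_id
        FROM input_cluster AS ic
        WHERE EXISTS (
            SELECT 1 FROM cluster_tag AS t
            WHERE t.cluster_id = ic.cluster_id AND t.concept = ?
        )
        """,
        (concept,),
    ).fetchall()
    return {row[0] for row in rows}


print(clusters_with_concept(conn, [1, 2, 2, 4], "exchange"))  # {1}
```

The `EXISTS` subquery only has to find one matching tag per input cluster, so the work is tied to the number of input clusters rather than to how many tags a heavy cluster carries.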
soad003
requested changes
May 29, 2026
Member
Here are some high-level notes:
- The summary fiat currency should be configurable.
- Maybe consider using a list of features to activate rather than a bool per feature (to be consistent with the tx_heuristics).
- Scope: since we talked about introducing a details view for multi-select, I would consider extending the scope of the endpoint, e.g. moving it from the /txs/ path to something more generic like /{network}/subgraph/ (not sure), so that in the longer run we can push subgraphs, or just a list of txs and addresses, to the endpoint and get a summary to display in the UI, including the tx compare signals. Let's discuss next week. New suggestion for naming: /{network}/graph_summary
Swap the four boolean query params (include_details,
include_characteristics, include_signals, include_analysis) on
GET /txs/compare for a single include list, mirroring the
include_heuristics pattern: include=characteristics|details|signals|
lineage|verdict, or include=all. Defaults to characteristics, signals,
lineage and verdict (details excluded). Signals, lineage and verdict are
always computed internally (the verdict depends on the signals); the
list only controls what is returned.
Drop the summary-only mode: the summary and include_analysis are removed
from the compare response, so compare is now analysis-only and therefore
BTC-only. Chain-agnostic aggregate stats move to a forthcoming
POST /{currency}/subgraph/summary endpoint. Regenerate the python client.
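As a rough illustration of the include-list semantics in this commit, the sketch below resolves a requested list into the sections to return. The function name `resolve_include`, the treatment of an empty list as "use defaults", and the `ValueError` on unknown values are assumptions; the commit does not say how the values arrive on the query string.

```python
from typing import Iterable, List, Optional

SECTIONS = ("characteristics", "details", "signals", "lineage", "verdict")
DEFAULT = ("characteristics", "signals", "lineage", "verdict")  # details excluded


def resolve_include(include: Optional[Iterable[str]]) -> List[str]:
    """Map the raw include values to the response sections to return."""
    if not include:
        return list(DEFAULT)
    requested = set(include)
    if "all" in requested:
        return list(SECTIONS)
    unknown = requested - set(SECTIONS)
    if unknown:
        # Assumed behaviour: reject values outside the documented set.
        raise ValueError(f"unknown include value(s): {sorted(unknown)}")
    # Keep a stable, documented order regardless of request order.
    return [section for section in SECTIONS if section in requested]


print(resolve_include(None))                    # the four default sections
print(resolve_include(["verdict", "details"]))  # ['details', 'verdict']
```

Because signals, lineage and verdict are always computed internally, a resolver like this would only filter the response and never change what the engine runs.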
Add a chain-agnostic aggregate-stats endpoint over a set of transactions,
relocating the summary that /txs/compare used to return. The POST body
{ txs, addresses } defines the subgraph; the node set must hold 2-100
distinct nodes. addresses is reserved for a future extension and rejected
(400) for now, so the field name is locked in the API contract. Unlike
/txs/compare the summary works for every supported chain: UTXO header
aggregation, and account asset-flow USD folding (token legs included).
Move build_summary (+ _usd_fiat) out of comparison_service into the new
subgraph DB service; rename ComparisonSummaryInternal ->
SubgraphSummaryInternal and drop the now-orphaned ComparisonSummary API
model. Share tx-model test builders via tests/db/helpers.py.
Regenerate the python client (new SubgraphApi).
Add a fiat_currency parameter (Literal["usd","eur"], default usd) to the
POST /{currency}/subgraph/summary request body. Rename the response field
total_value_usd -> total_value_fiat and add a fiat_currency field echoing
which currency the total is in, so the name no longer hard-codes USD.
build_summary/_fiat now look the rate up by the requested code and word the
notes accordingly; the value is summed from each tx's fiat_values (usd/eur
are the only rates GraphSense stores). Regenerate the python client.
Restructure the subgraph/summary response from a flat tx summary into an
envelope { currency, txs, addresses }. The tx aggregates move under a new
SubgraphTxSummary block (currency lifted to the top level); addresses is a
reserved, null-until-implemented slot so adding per-address stats later is
not a breaking change. Regenerate the python client.
Contributor
Author
Quick overview of the changes:
TL;DR
A new read-only endpoint, `GET /{currency}/txs/compare`, takes 2 to 100
transaction hashes and ships two features behind one call:
- Transaction summary (all chains): an aggregate rollup over the whole
  set: total value in native units and in USD, total fee, tx count, and the
  block/timestamp range (plus input/output counts for UTXO). Works for UTXO
  and account chains (ETH/TRX). Because it sums USD across every
  transfer (incl. tokens), mixed-asset sets stay comparable. This answers
  "tell me about this set of txs" and is always returned.
- Wallet fingerprinting (UTXO-only, opt-in): answers "are these
  transactions likely produced by the same actor?" by extracting per-tx
  fingerprint characteristics, running pairwise signals, and rolling them
  into a single relation verdict (`linked` ... `unlinked`) with confidence
  and notes. On-chain spending links between the compared txs are returned as
  lineage edges.

The two are gated by `include_analysis`. With `include_analysis=false` the
endpoint skips all the expensive cluster/spending/exchange lookups and returns
only the `summary`, the lightweight, multi-currency path. With
`include_analysis=true` (UTXO only) it additionally runs the full
fingerprinting analysis.
Currency support. The summary is multi-currency (UTXO + ETH/TRX). The full
fingerprinting analysis (signals, lineage, verdict) is UTXO-only; requesting
the analysis on an account chain returns a 400.
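Feature 1: transaction summary (all chains)
- `total_value`: native transfers, in the chain's base unit
- `total_value_usd`: USD across all transfers (incl. tokens); flagged in `notes` when a rate is missing
- `total_fee`: fee in the native unit
- `total_inputs` / `total_outputs`: UTXO only
- `tx_count`, block/timestamp range
- `notes`: caveats (token transfers excluded from `total_value`, partial USD totals)

Feature 2: wallet fingerprinting (UTXO-only, opt-in)
- verdict: a single relation from `linked` to `unlinked`, plus confidence + notes
- lineage: `output_spent_by_input` edges between the compared txs (with output/input indices)

The scoring spec (signal weights and the verdict decision tree) is maintained
internally as the single source of truth for the rules implemented here.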
Request / response shape
Response (`TransactionComparison`):
- `txs[]`: per-tx items, each with optional `characteristics` and `details`
- `signals[]`: the pairwise signal table (omitted body when `include_signals=false`)
- `lineage[]`: `output_spent_by_input` edges between compared txs
- `summary`: aggregate stats, always present and currency-aware:
  - `total_value` / `total_fee` in the chain's base unit (satoshi for UTXO,
    wei/sun for account). `total_value` sums native transfers only, since
    token transfers carry no native-unit amount.
  - `total_value_usd` sums USD fiat across all transfers (incl. tokens), so
    it is comparable across assets; partial-summed and flagged in `notes` when
    some txs lack a rate.
  - `total_inputs` / `total_outputs` are UTXO-only, omitted for account chains.
  - `notes` flags caveats (token transfers excluded from `total_value`, partial
    USD totals).
  - `tx_count` and the block/timestamp range.
- `verdict`: the relation/confidence rollup, omitted when `include_analysis=false`

In the docs, `/txs/compare` is intentionally ordered after the regular
`/txs/{tx_hash}` endpoints (it has to be registered before them so Starlette
does not match `compare` as a `tx_hash`; the order is corrected in the OpenAPI
post-processing).
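The registration-order constraint mentioned above can be shown with a small stand-alone model of first-match routing. This is a simplified stand-in written for illustration, not Starlette's router: `compile_path` and `first_match` are hypothetical helpers, and the only behaviour borrowed from Starlette is that routes are tried in registration order.

```python
import re
from typing import Dict, List, Optional, Tuple


def compile_path(template: str):
    """Turn '/txs/{tx_hash}' into a regex where each {param} matches one segment."""
    pattern = re.sub(r"\{(\w+)\}", r"(?P<\1>[^/]+)", template)
    return re.compile(f"^{pattern}$")


def first_match(routes: List[str], path: str) -> Optional[Tuple[str, Dict[str, str]]]:
    """Return the first registered route that matches, with its path params."""
    for template in routes:
        match = compile_path(template).match(path)
        if match:
            return template, match.groupdict()
    return None


# Parametrized route first: the literal path is swallowed as a tx_hash.
print(first_match(["/txs/{tx_hash}", "/txs/compare"], "/txs/compare"))
# Literal route first: it wins, and real hashes still reach the other route.
print(first_match(["/txs/compare", "/txs/{tx_hash}"], "/txs/compare"))
```

This is why the route is registered first and only reordered afterwards in the generated OpenAPI document, where ordering affects presentation rather than matching.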
How it works
The request flows through the standard four-layer REST pipeline:
- `web/routes/txs.py` declares the query params and delegates.
- `web/service/txs_service.py` calls the DB-layer engine and runs the translator.
- `db/asynchronous/services/comparison_service.py` is the engine: fetch txs,
  extract characteristics, compute signals, aggregate the verdict, build lineage.
- `web/translators.py` maps the internal models to the slim API models.
The summary (always computed, all chains)
`build_summary` is currency-aware and runs for every request regardless of
`include_analysis`. For account txs (ETH/TRX) it sums the flat transfer
`value`, takes `fee` straight off each tx, and leaves the UTXO input/output
counts unset (no UTXO IO is fetched). UTXO txs keep the input/output
decomposition (value = summed outputs, fee = inputs − outputs on non-coinbase
txs).

`total_value_usd` is summed across all transfers incl. tokens via the
rates service, so mixed-asset sets stay comparable; it is partial-summed and
flagged in `notes` when some txs lack a rate.

A missing tx hash, including a hash from another chain that is absent from the
queried keyspace, aborts the whole comparison with a 404
(`TransactionNotFoundException`) rather than returning a partial summary.

The fingerprinting analysis (UTXO-only, `include_analysis=true`)
The four stages below run only when the analysis is requested.
Stage 1: per-tx characteristics
`extract_characteristics(tx)` produces one `TxCharacteristicsInternal` per tx
(pure, no extra DB calls): script types, witness presence (ground-truth
`has_witness` preferred, script-type inference as fallback), tx version, RBF
signaling, locktime classification, BIP69 output ordering, and the coinjoin
flag.
Stage 2: signals
Each `signal_*` function takes the whole list of characteristics and
returns one `ComparisonSignalInternal` whose `verdict` is the comparison
result and whose `per_tx` column shows each tx's value. Signals come in three
kinds:

| Kind          | Signals |
|---------------|---------|
| discriminator | `script_type`, `tx_version`, `rbf`, `locktime_pattern` |
| score         | `witness_present`, `output_count_shape`, `bip69_outputs_sorted` |
| linkage       | `shared_cluster`, `direct_input_overlap`, `change_chain`, `common_ancestor`, `utxo_linkage`, `exchange_input_overlap` |

Discriminators and scores contribute to a weighted mismatch/match sum; linkage
signals are categorical gates (counted, not weighted). `exchange_input_overlap`
is a demoting qualifier: shared exchange-tagged inputs weaken cluster overlap
as evidence.
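The weighted sum can be checked against the first captured response in this description (the different-actors BTC pair). The sketch below is an illustration only: the tuple layout and the helper name `weighted_sums` are assumptions, and the real engine's aggregation may carry more state. The weights are copied from that captured response.

```python
from typing import List, Tuple

Signal = Tuple[str, str, int]  # (name, kind, weight); hypothetical layout

# Weights as reported in the captured "different actors" response.
SIGNALS: List[Signal] = [
    ("script_type", "discriminator", -80),
    ("witness_present", "score", -20),
    ("tx_version", "discriminator", -30),
    ("rbf", "discriminator", 3),
    ("locktime_pattern", "discriminator", 4),
    ("bip69_outputs_sorted", "score", 2),
    ("output_count_shape", "score", 3),
    ("shared_cluster", "linkage", 0),  # linkage signals are gates, not weights
]


def weighted_sums(signals: List[Signal]) -> Tuple[int, int, int]:
    """Return (mismatch_sum, match_sum, total) over non-linkage signals."""
    weights = [w for _, kind, w in signals if kind != "linkage"]
    mismatch = sum(w for w in weights if w < 0)
    match = sum(w for w in weights if w > 0)
    return mismatch, match, mismatch + match


print(weighted_sums(SIGNALS))  # (-130, 12, -118)
```

The net of -118 agrees with the `score_total` in that captured response, which suggests the reported total is the mismatch and match sums combined.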
Stage 3: verdict
`aggregate_verdict(...)` selects one of seven tiers from
(linkage gates, weighted mismatch sum, weighted match sum, cluster verdict).

`cluster_verdict` (same / different / unknown) is the single source of
truth for the `shared_cluster` gate. Thresholds on the weighted mismatch sum:
`<= -60` reaches `likely_unlinked`, `< -30` reaches `potential_unlink`.

Stage 4: lineage
`output_spent_by_input` edges are built from the `get_spending_txs` references
already fetched during orchestration, restricted to pairs within the compared
set. Each edge carries `from_idx` / `to_idx` (positions in the compared list) and
`out_index` / `in_index` (the spent output and the spending input on those txs).

Skipping the analysis (`include_analysis=false`)
`include_analysis=false` short-circuits the engine: it skips
`_fetch_input_address_clusters`, `_fetch_parent_refs`, and
`_fetch_input_address_exchange_flags`, computes no signals/verdict/lineage, and
returns only the `summary`. If `include_characteristics=false` as well, the
per-tx fetch drops IO and heuristics entirely (header-only), which is the
cheapest path. This is also the only mode allowed for account chains (ETH/TRX).
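Returning to Stage 4, the lineage construction can be sketched as a filter over spending references. Everything about the input shape here is an assumption: `SpendRef` and `build_lineage` are hypothetical names, and the edge direction (from the tx whose output is spent to the tx that spends it) is a guess at what `from_idx` / `to_idx` mean.

```python
from typing import Dict, List, NamedTuple


class SpendRef(NamedTuple):
    """One 'output X of tx A is spent by input Y of tx B' reference (hypothetical)."""
    spent_tx: str
    out_index: int
    spending_tx: str
    in_index: int


def build_lineage(compared: List[str], refs: List[SpendRef]) -> List[Dict[str, object]]:
    """Keep only references whose two ends are both in the compared set."""
    position = {tx_hash: idx for idx, tx_hash in enumerate(compared)}
    edges: List[Dict[str, object]] = []
    for ref in refs:
        if ref.spent_tx in position and ref.spending_tx in position:
            edges.append(
                {
                    "kind": "output_spent_by_input",
                    "from_idx": position[ref.spent_tx],   # assumed direction
                    "to_idx": position[ref.spending_tx],
                    "out_index": ref.out_index,
                    "in_index": ref.in_index,
                }
            )
    return edges


refs = [SpendRef("aa", 1, "bb", 0), SpendRef("aa", 0, "zz", 3)]
print(build_lineage(["aa", "bb"], refs))  # one edge: aa:1 -> bb:0
```

Restricting to pairs inside the compared set means no extra lookups are needed beyond the references already fetched.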
Worked examples
All hashes below are real, public chain data (ORBITAAL-tagged entities and a
public Kraken exchange address); none are from any live investigation. The two
BTC examples embed captured responses from the live backend; the ETH summary
is illustrative (see the note under that example).
Different actors, correctly separated (BTC)
Two txs from unrelated tagged entities, from the cross-entity benchmark:
- `1f8f3416e06984f5d4470dad4e637ab55caeb421dc6f04a7e75ede2c5f8779aa` (HappyCoins.com, exchange)
- `ef5d192f3cc3e2a746671f3402707d5baacaa393408009544a7df05aef31ba98` (AlphaBayMarket, darknet market)

Different input clusters, and the `script_type` (-80) and `tx_version` (-30)
discriminators contradict, so the weighted score is -118 and the verdict is
`unlinked`.

Captured response (verdict + signals)
{
  "verdict": {
    "relation": "unlinked",
    "confidence": 95,
    "cluster_verdict": "different",
    "discriminator_hits": ["script_type", "tx_version"],
    "score_total": -118.0,
    "notes": ["Cluster splits these txs and discriminators contradict — strong evidence of separate actors."]
  },
  "signals": [
    {"name": "script_type", "kind": "discriminator", "verdict": "mismatch", "weight": -80, "per_tx": ["P2PKH,P2SH", "P2PKH"]},
    {"name": "witness_present", "kind": "score", "verdict": "mismatch", "weight": -20, "per_tx": ["true", "false"]},
    {"name": "tx_version", "kind": "discriminator", "verdict": "mismatch", "weight": -30, "per_tx": ["v2", "v1"]},
    {"name": "rbf", "kind": "discriminator", "verdict": "match", "weight": 3, "per_tx": ["final", "final"]},
    {"name": "locktime_pattern", "kind": "discriminator", "verdict": "match", "weight": 4, "per_tx": ["anti_sniping", "anti_sniping"]},
    {"name": "bip69_outputs_sorted", "kind": "score", "verdict": "match", "weight": 2, "per_tx": ["sorted", "sorted"]},
    {"name": "output_count_shape", "kind": "score", "verdict": "match", "weight": 3, "per_tx": ["pay_plus_change", "pay_plus_change"]},
    {"name": "shared_cluster", "kind": "linkage", "verdict": "mismatch", "weight": 0, "per_tx": ["5963948", "92291698"]},
    {"name": "exchange_input_overlap", "kind": "linkage", "verdict": "mismatch", "weight": 0, "per_tx": ["exchange", "non_exchange"]},
    {"name": "direct_input_overlap", "kind": "linkage", "verdict": "mismatch", "weight": 0, "per_tx": [null, null]},
    {"name": "change_chain", "kind": "linkage", "verdict": "mismatch", "weight": 0, "per_tx": [null, null]},
    {"name": "common_ancestor", "kind": "linkage", "verdict": "mismatch", "weight": 0, "per_tx": [null, null]},
    {"name": "utxo_linkage", "kind": "linkage", "verdict": "mismatch", "weight": 0, "per_tx": [null, null]}
  ]
}

Same actor, correctly linked (BTC)
Two txs that both spend from the same input address
`1EVvJ9uKhHPtWodk1QFvESmLUeB2RimzfA` (cluster `34430663`), so they are
genuinely the same actor:
- `121b74f14630ed865899f07bb7d53bb48510604d9a68cd7671edc54600570567`
- `056d557b39ff137935c6051aa1807efd55188b549ee00e62c1483bc7fba231d3`

Same input cluster (`shared_cluster` and `direct_input_overlap` both match) and
every discriminator agrees, so the weighted score is +25 and the verdict is
`linked`.

Captured response (verdict + signals)
{
  "verdict": {
    "relation": "linked",
    "confidence": 96,
    "cluster_verdict": "same",
    "discriminator_hits": [],
    "score_total": 25.0,
    "notes": ["All compared txs share at least one input cluster."]
  },
  "signals": [
    {"name": "script_type", "kind": "discriminator", "verdict": "match", "weight": 5, "per_tx": ["P2PKH", "P2PKH"]},
    {"name": "witness_present", "kind": "score", "verdict": "match", "weight": 3, "per_tx": ["false", "false"]},
    {"name": "tx_version", "kind": "discriminator", "verdict": "match", "weight": 5, "per_tx": ["v1", "v1"]},
    {"name": "rbf", "kind": "discriminator", "verdict": "match", "weight": 3, "per_tx": ["final", "final"]},
    {"name": "locktime_pattern", "kind": "discriminator", "verdict": "match", "weight": 4, "per_tx": ["anti_sniping", "anti_sniping"]},
    {"name": "bip69_outputs_sorted", "kind": "score", "verdict": "match", "weight": 2, "per_tx": ["sorted", "sorted"]},
    {"name": "output_count_shape", "kind": "score", "verdict": "match", "weight": 3, "per_tx": ["pay_plus_change", "pay_plus_change"]},
    {"name": "shared_cluster", "kind": "linkage", "verdict": "match", "weight": 0, "per_tx": ["34430663", "34430663"]},
    {"name": "exchange_input_overlap", "kind": "linkage", "verdict": "mismatch", "weight": 0, "per_tx": ["non_exchange", "non_exchange"]},
    {"name": "direct_input_overlap", "kind": "linkage", "verdict": "match", "weight": 0, "per_tx": ["1EVvJ9uKhHPtWodk1QFvESmLUeB2RimzfA", "1EVvJ9uKhHPtWodk1QFvESmLUeB2RimzfA"]},
    {"name": "change_chain", "kind": "linkage", "verdict": "mismatch", "weight": 0, "per_tx": [null, null]},
    {"name": "common_ancestor", "kind": "linkage", "verdict": "mismatch", "weight": 0, "per_tx": [null, null]},
    {"name": "utxo_linkage", "kind": "linkage", "verdict": "mismatch", "weight": 0, "per_tx": [null, null]}
  ]
}

Multi-asset summary (ETH, summary-only)
Four ETH transactions, one of them a USDT token transfer, in summary-only mode
(account chains do not run the fingerprinting analysis):
The USDT transfer adds 0 to `total_value` (it moves no native ETH), but its
fiat value still counts toward `total_value_usd`, keeping that figure comparable
across native and token activity; `notes` records the exclusion. UTXO-only
fields (`total_inputs` / `total_outputs`) and any unset native-unit fields
(`total_fee` when no rate was applied) are omitted on account chains.
`tx_count` counts every aggregated leg, so 4 input hashes become 5 legs (4
native bases + 1 USDT token transfer).
Summary response
{
  "summary": {
    "tx_count": 5,
    "currency": "eth",
    "total_value": 17811110000000000000000,
    "total_value_usd": 22365601.2,
    "block_min": 15954101,
    "block_max": 25142863,
    "timestamp_min": 1668258119,
    "timestamp_max": 1779357383,
    "notes": [
      "total_value covers native transfers only; 1 token transfer(s) excluded (their value is in total_value_usd)"
    ]
  }
}

Benchmarks
Same-cluster consistency (should be `linked`)
Spends sampled across known small clusters (50-200 addresses); every pair
should read as linked (2026-05-26, k=20 spends/cluster, 8 workers).

Result buckets: `linked` + `likely_linked` versus `inconclusive` / `likely_unlinked` / `unlinked`.

Every evaluated same-cluster pair reads `linked` or `likely_linked`, with
zero false negatives across 2,142 pairs spanning 18 clusters. The minimum
verdict on any pair was `likely_linked`, never `inconclusive`.

Cross-entity separation (should be `unlinked`)
unlinked)Pairs an address from entity A with one from entity B (A != B) — known
negatives (e.g. Binance vs Kraken), so anything more positive than
likely_unlinkedis a false positive (2026-05-21; 100 pairs, 30 entities,5 addrs/entity, seed 42, 0 errors).
verdict.relationunlinkedlikely_unlinkedlikely_linked(false positive)All 100 known-negative pairs land on the correct negative side, with zero
false-positive
likely_linked.Performance
Measured against a live instance (May 2026). Sweeps draw random tx hashes
from random blocks, group them into `N`-sized hash-sets, and call
`/{currency}/txs/compare` repeatedly. The four `include_` tiers are reused
across the same hash-sets so the tiers are comparable on identical inputs.
Tiers: `summary`, `characteristics`, `full`, and `full_details` (`full` plus
embedded per-tx details).

BTC: `summary` and `characteristics` tiers (median / mean / p95 ms, 100 calls per cell)
`summary` and `characteristics` scale near-linearly with N up to N=100. The
full N-sweep ran end-to-end with 0 errors.

BTC: `full` tier, before and after the cliff fix (median / mean / p95 / max ms)
The original `full`-tier implementation paid `get_best_cluster_tag` per input
address, which scales with cluster size: any compared tx whose input belonged
to a huge cluster (e.g. a major exchange, ~23M addresses) added ~2 s. Latency
was strongly bimodal rather than long-tailed: each compare call landed in
either a ~150 ms fast path or a ~2.5-3.5 s cliff path. This PR replaces the
expensive ranked best-cluster-tag digest with a cheap correlated `EXISTS`
query (`get_clusters_with_concept` → `which_clusters_have_concept`,
src/graphsenselib/tagstore/db/queries.py), making the cost independent of
per-cluster tag count. See cliff fix below.
Same sweep shape (30 distinct hash-sets × 3 calls per N, fresh random seed,
540 calls total, 0 errors):
(`max` rows in the after column are the entire 90-call distribution's max,
not just p95.)

`full_details` ≈ `full` (adds <1%; per-tx details are already-fetched data
attached at the response layer).
The cliff fix — bimodal distribution gone
Per-set medians, threshold 1000 ms ("slow"):
The bimodal distribution is gone. The one slow N=50 set in the after-run
peaked at 1.3 s (vs the old cliff plateau at 3-5 s) and is consistent with
ordinary tail noise, not a cluster-size cliff. p95 and max now sit close to
the median across all N — the latency distribution is unimodal.
Summary tier across chains (median / mean / p95 ms, 0 errors)
BTC < TRX < ETH at the summary tier. ETH is ~3-4× slower than TRX and ~4-5×
slower than BTC; the overhead is the account-chain trace fetch and
multi-asset USD aggregation, not fingerprinting itself (analysis is off here).
Notes for reviewers
- Models live in both layers (internal `db/.../models` and API
  `web/models/compare.py`) plus the translator, per the project's dual-model
  convention.
- `web/app.py` is extended with a paths reorder for `/txs/compare`; the
  snake_case/union/example post-processing is preserved.
- The Python client is regenerated (`make run-codegen`) and committed
  alongside this change for the codegen pre-commit hook to pass. (The generator
  now runs as the host user via `--user`, so it no longer leaves root-owned
  files.)
- `build_summary` and `ComparisonSummary` were generalized beyond BTC:
  `total_output_sat` became `total_value` (native, native-transfers only);
  `total_value_usd` (USD across all transfers incl. tokens) and `total_fee`
  were added; `total_inputs` / `total_outputs` are now optional (UTXO-only);
  and a `notes` list flags caveats. A missing or cross-chain tx hash aborts
  with 404 (no partial summary). This is what enables summary-only support for
  account chains.