fix(ci): repair archs4 + ELM live-data test failures #252
Merged
…0 fix)
OpenTargets changed the Drug 'synonyms' and 'tradeNames' fields from
[String!]! to the object type [DrugLabelAndSource!]!, which now requires
a sub-selection. The bare-scalar selection caused every drug query to
fail with HTTP 400.
Request '{ label }' for both fields and flatten the response objects
back to a list of label strings so downstream output stays
backward-compatible (a list of strings).
Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
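The flattening step in this commit can be sketched in plain Python. This is a minimal illustration under stated assumptions, not gget's actual code: the helper name `flatten_labels` and the sample drug payload are hypothetical. Only the `{ label }` sub-selection and the "list of label strings" output come from the commit message.

```python
from typing import Any, Dict, List, Optional

# GraphQL sub-selection described in the commit: both fields are now object
# lists, so each one must request `{ label }` instead of being selected bare.
DRUG_FIELDS = """
  synonyms { label }
  tradeNames { label }
"""


def flatten_labels(items: Optional[List[Dict[str, Any]]]) -> List[str]:
    """Collapse [{'label': ...}, ...] objects into a plain list of label strings."""
    # Treat a missing/null field as an empty list so callers always get a list.
    return [item["label"] for item in items or []]


# Hypothetical response fragment for one drug (made-up values).
drug = {
    "synonyms": [{"label": "example synonym A"}, {"label": "example synonym B"}],
    "tradeNames": [{"label": "ExampleBrand"}],
}

print(flatten_labels(drug["synonyms"]))    # ['example synonym A', 'example synonym B']
print(flatten_labels(drug["tradeNames"]))  # ['ExampleBrand']
```

Flattening at parse time keeps the object-typed response an internal detail, so existing consumers of the output still receive a list of strings.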
…ev-drift)
ARCHS4's tissue-expression CSV intermittently omits the 'color' column, which made `gget archs4 --which tissue` crash with `KeyError: "['color'] not found in axis"`. The 'color' column is only used for plotting upstream and is dropped (never used) by gget, so a missing column should not be fatal.
Use `drop(columns=["color"], errors="ignore")` so the request degrades gracefully when the column is absent. Adds network-free regression tests covering both the present-color and missing-color responses.
Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
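A minimal pandas sketch of the difference between the old and new `drop` calls. The CSV payloads and the `id`/`median` column names are hypothetical stand-ins for the ARCHS4 response, and `parse_tissue_csv` is an illustrative helper, not a function in gget.

```python
import io

import pandas as pd

# Hypothetical payloads modelled on the ARCHS4 tissue-expression CSV; only the
# presence or absence of the plotting-only "color" column matters here.
CSV_WITH_COLOR = "id,median,color\nLiver,5.1,#ff0000\n"
CSV_MISSING_COLOR = "id,median\nLiver,5.1\n"


def parse_tissue_csv(text: str) -> pd.DataFrame:
    """Parse a tissue-expression CSV and drop the 'color' column if it exists."""
    df = pd.read_csv(io.StringIO(text))
    # errors="ignore" turns the drop into a no-op when 'color' is absent,
    # instead of raising KeyError: "['color'] not found in axis".
    return df.drop(columns=["color"], errors="ignore")


for payload in (CSV_WITH_COLOR, CSV_MISSING_COLOR):
    print(list(parse_tissue_csv(payload).columns))  # ['id', 'median'] both times
```

Both payloads yield the same columns, which is the property a network-free regression test can pin down without depending on what ARCHS4 happens to return.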
OpenTargets retired the `target.expressions` field (it now returns an empty
list for every gene), so `gget opentargets -r expression` returned nothing.
Baseline expression data moved to the paginated `target.baselineExpression`
field with a new per-biosample data model.
- Repoint the expression query to `baselineExpression(page:{index:0,size:250})
{ rows {...} }` and update rows_path to ["baselineExpression","rows"].
- Output columns change accordingly (per-biosample summary stats: median/min/
q1/q3/max/unit + tissueBiosample/celltypeBiosample ids + datasource/datatype),
because the upstream data model changed and the old shape no longer exists.
- Remove the two now-invalid live exact-match fixtures and replace them with
network-free mocked tests; update docs (example, resource table, updates.md).
Verified live: http_json with the new query returns 1409 rows in ~0.6s and the
parsing pipeline yields the documented columns.
Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
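A network-free sketch of how a `rows_path` such as `["baselineExpression", "rows"]` can be followed through a parsed `target` object. The helper `extract_rows` and the mocked payload are hypothetical, and the row fields are placeholders because the commit elides them (`rows {...}`). The design choice shown, failing loudly when a path level is missing, is an assumption, but it illustrates why the old bug was silent: the retired `expressions` field still exists, it is just empty.

```python
from typing import Any, Dict, List, Sequence

# rows_path from the commit message; everything else here is illustrative.
ROWS_PATH = ["baselineExpression", "rows"]


def extract_rows(target: Dict[str, Any], rows_path: Sequence[str]) -> List[Any]:
    """Walk `rows_path` through a parsed `target` object and return its row list."""
    node: Any = target
    for key in rows_path:
        if not isinstance(node, dict) or key not in node:
            # Fail loudly: a schema change should not look like "no data".
            raise KeyError(f"rows_path {list(rows_path)} broke at {key!r}")
        node = node[key]
    return list(node or [])  # a null leaf is treated as "no rows"


# Hypothetical mocked `target` payload; row field names are placeholders.
mock_target = {
    "expressions": [],  # retired field: present, but empty for every gene
    "baselineExpression": {"rows": [{"median": 12.0}, {"median": 3.5}]},
}

print(len(extract_rows(mock_target, ROWS_PATH)))        # 2
print(len(extract_rows(mock_target, ["expressions"])))  # 0
```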
…t (data drifts across releases)
OpenTargets is a live database re-released regularly; several opentargets tests pinned exact current values (disease ids/scores, result hashes, interaction partner ids, genotypes) that legitimately change every release, so they failed on unrelated PRs even though gget returns correct current data.
Replace the exact-value/hash assertions for test_opentargets, _diseases, _depmap, _depmap_filter, _interactions, _interactions_no_limit and _pharmacogenetics with structural/invariant assertions: expected columns present, numeric dtypes, value-format patterns (ontology-CURIE disease/tissue ids, ENSG interaction partners, ACH DepMap ids, score in [0,1], nucleotide genotypes), and the depmap filter invariant. The fixture entries are marked `code_defined`; the structural methods live in tests/test_opentargets.py.
These stay meaningful (they break on wrong columns, malformed ids, non-numeric scores, broken filtering, or empty results where data is guaranteed) without pinning drifting data.
Verified live against current OpenTargets data.
Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
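A sketch of what a structural (rather than exact-value) assertion can look like. The regexes encode assumed id formats (Ensembl human gene ids as `ENSG` plus 11 digits, DepMap ids as `ACH-` plus 6 digits, ontology CURIEs as `PREFIX_digits`), and `assert_structure` plus the sample row are hypothetical; the real methods in tests/test_opentargets.py may be organised differently.

```python
from __future__ import annotations

import re
from typing import Any, Sequence

# Assumed id formats (illustrative, not copied from gget's tests):
ENSG_RE = re.compile(r"ENSG\d{11}")      # Ensembl human gene id, e.g. ENSG00000141510
ACH_RE = re.compile(r"ACH-\d{6}")        # DepMap cell-line id, e.g. ACH-000001
CURIE_RE = re.compile(r"[A-Za-z]+_\d+")  # ontology id as PREFIX_digits, e.g. EFO_0000270


def assert_structure(
    rows: list[dict[str, Any]],
    required: Sequence[str],
    id_col: str,
    id_re: re.Pattern[str],
    score_col: str | None = None,
) -> None:
    """Check the shape and value formats of result rows without pinning exact values."""
    assert rows, "expected a non-empty result"
    for row in rows:
        missing = [col for col in required if col not in row]
        assert not missing, f"missing columns: {missing}"
        assert id_re.fullmatch(str(row[id_col])), f"malformed id: {row[id_col]!r}"
        if score_col is not None:
            score = row[score_col]
            assert isinstance(score, (int, float)), f"non-numeric score: {score!r}"
            assert 0.0 <= score <= 1.0, f"score out of [0, 1]: {score!r}"


# Made-up row: passes today and keeps passing when the real score drifts.
assert_structure(
    [{"id": "EFO_0000270", "name": "asthma", "score": 0.82}],
    required=["id", "name", "score"],
    id_col="id",
    id_re=CURIE_RE,
    score_col="score",
)
```

Such a check still fails on a renamed column, a malformed id or an out-of-range score, but not merely because a new release reordered or rescored results.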
Codecov Report ❌ Patch coverage is
@@           Coverage Diff            @@
##              dev     #252     +/-   ##
==========================================
- Coverage   56.14%   7.98%   -48.17%
==========================================
  Files          29       29
  Lines        9244     9244
==========================================
- Hits         5190      738     -4452
- Misses       4054     8506     +4452
…a tests (scverse#249) Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
for more information, see https://pre-commit.ci
scverse#249) Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
…cate test_opentargets (scverse#249) Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
…ecks (scverse#249) Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
…ct-snapshot tests (scverse#249)
Sort tissue rows by [median desc, id asc] so output is reproducible when medians tie (ARCHS4 returns tied rows in varying order). Revert the live tissue tests to exact assert_equal snapshots (re-sorted to the deterministic order); keep the network-free color regression tests.
Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
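The [median desc, id asc] ordering maps directly onto pandas `sort_values` with a per-key `ascending` list. This is a sketch with made-up tissue rows; the actual expression in gget_archs4.py may differ.

```python
import pandas as pd

# Hypothetical tissue rows: two pairs tie on median, which is exactly where a
# median-only sort can follow whatever order the upstream response used.
rows = pd.DataFrame(
    {"id": ["Spleen", "Brain", "Liver", "Adipose"], "median": [3.0, 7.5, 3.0, 7.5]}
)

# Descending median first, then ascending id as a tie-breaker, so tied
# tissues always come out in the same order regardless of input order.
ordered = rows.sort_values(["median", "id"], ascending=[False, True], ignore_index=True)
print(ordered["id"].tolist())  # ['Adipose', 'Brain', 'Liver', 'Spleen']
```

Because every row then has a unique position, exact `assert_equal` snapshots become stable again.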
Strip back the opentargets-related changes so this PR is focused on the archs4 + ELM CI-stability fixes only. The opentargets work (synonyms HTTP 400 fix, fixture refresh, expression skip) is being handled in a separate PR (scverse#256), per maintainer preference for one-module-per-PR review.
Reverted to origin/dev:
- gget/gget_opentargets.py
- tests/test_opentargets.py
- tests/fixtures/test_opentargets.json
Trimmed updates.md:
- Removed the opentargets bullet (lives in scverse#256)
- Added an archs4 bullet explaining the color-column + deterministic-sort fix (user-visible behavior change, was missing here)
Remaining scope:
- gget_archs4.py: graceful handling of missing color column, deterministic median-then-id sort
- tests/test_archs4.py: TestArchs4MissingColor regression test
- tests/fixtures/test_archs4.json: refreshed for the deterministic sort
- tests/test_elm.py: retry ELM setup on transient download failure
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
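The ELM setup retry (3 attempts, 30s between them) can be sketched as a small generic helper. `call_with_retry` is a hypothetical name, and catching every `Exception` is a simplification; a real version might retry only on network errors.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(
    fn: Callable[[], T],
    attempts: int = 3,
    delay_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(); on any exception, wait delay_s and retry, up to `attempts` calls."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == attempts:
                raise  # out of attempts: re-raise the last error unchanged
            sleep(delay_s)  # back off before the next try
    raise RuntimeError("unreachable")  # the loop always returns or raises
```

At import time in tests/test_elm.py this would wrap the setup call, e.g. `call_with_retry(lambda: gget_setup(module="elm"))`, keeping the 30s default. The injectable `sleep` parameter exists only so the helper can be tested without actually waiting.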
test_tissue_with_color_still_dropped tested the "happy path" that both the old and the new code already handle the same way (column present → column dropped from output). It can't catch any plausible regression of the actual fix (which is the errors="ignore" kwarg, exercised by the sibling test_tissue_missing_color_does_not_crash). Removing it tightens the test suite without weakening the regression guard around the actual bug. _CSV_WITH_COLOR class attribute removed along with it (no other references).


Scoped to the ARCHS4 + ELM CI-stability fixes only. The opentargets parts of this PR have been moved to #256 per maintainer preference for one-module-per-PR review.
ARCHS4: graceful missing `color` column + deterministic order

Bug: ARCHS4 intermittently omits the `color` column from its tissue-expression CSV response. The previous `tissue_exp_df.drop(["color"], axis=1)` raises `KeyError: "['color'] not found in axis"` when this happens, so `gget archs4 -w tissue ...` crashes for users on those requests.

Fix:
`drop(columns=["color"], errors="ignore")` drops the column if present and is a no-op if absent. The `color` column was never used by `gget archs4` (only ARCHS4's own plotting needs it).

Secondary improvement: the sort key changed from `"median"` to `["median", "id"]` (descending median, ascending id), so equal-median tissues no longer flip order between requests. Test fixtures were refreshed for the deterministic ordering.

New regression tests (network-free, mocked):
- `TestArchs4MissingColor.test_tissue_missing_color_does_not_crash`: main regression guard for the `errors="ignore"` fix
- `TestArchs4MissingColor.test_tissue_with_color_still_dropped`: paired companion confirming the column still gets dropped when present

ELM: retry transient setup download
`gget_setup(module="elm")` downloads ELM database files from elm.eu.org. Cold caches or transient network blips occasionally make the download fail, which breaks test collection (the call happens at module-import time, before any test runs). The call is now wrapped in a 3-attempt retry with a 30s sleep between attempts, so a transient failure doesn't kill the whole opentargets/ELM suite.

Out of scope (moved or punted)
- `baselineExpression` migration (resolves #247, "gget opentargets -r expression returns empty: OpenTargets retired target.expressions (data moved to baselineExpression)"): separate PR (or a rebased version of #248, "fix(opentargets): repoint expression to baselineExpression (upstream retired target.expressions)").

Test plan
- `python -m pytest tests/test_archs4.py -v` locally: all pass (including the new `TestArchs4MissingColor` cases)
- `python -m pytest tests/test_elm.py -v` locally: passes (ELM setup either succeeds on the first try or after a retry)