scverse · Elarwei001 · Jun 24, 2026 · Jun 24, 2026
diff --git a/docs/src/SUMMARY.md b/docs/src/SUMMARY.md
@@ -31,6 +31,7 @@
 * [gget search](en/search.md)  
 * [gget setup](en/setup.md)  
 * [gget seq](en/seq.md)  
+* [gget ucsc](en/ucsc.md)  
 * [gget virus](en/virus.md)
 
 ---

diff --git a/docs/src/en/ucsc.md b/docs/src/en/ucsc.md
@@ -0,0 +1,58 @@
+[<kbd> View page source on GitHub </kbd>](https://github.com/scverse/gget/blob/main/docs/src/en/ucsc.md)
+
+> Python arguments are equivalent to long-option arguments (`--arg`), unless otherwise specified. Flags are True/False arguments in Python.  The manual for any gget tool can be called from the command-line using the `-h` `--help` flag.  
+# gget ucsc 🔎
+Fetch [UCSC Genome Browser](https://genome.ucsc.edu/) IDs for a gene or search term, analogous to what `gget search` does for Ensembl IDs.  
+`gget ucsc` searches the UCSC Genome Browser for a gene symbol, accession, or free-text term and returns the matching identifiers (e.g. UCSC known gene / transcript IDs) together with their genomic positions, grouped by the track they come from.  
+Return format: JSON (command-line) or data frame/CSV (Python).
+
+**Positional argument**  
+`search_term`  
+Gene symbol, accession, or free-text term to search for, e.g. `BRCA2`.
+
+**Optional arguments**  
+`-g` `--genome`  
+UCSC genome assembly to search, e.g. `hg38`, `hg19`, `mm39`. Default: `hg38`.  
+
+`-t` `--track`  
+Only return matches from tracks whose name contains this (case-insensitive) substring, e.g. `knownGene`. Default: None.  
+
+`-l` `--limit`  
+Maximum number of matches to return. Default: None (all matches).  
+
+`-o` `--out`  
+Path to the file the results will be saved in, e.g. path/to/directory/results.csv (or .json). Default: Standard out.  
+Python: `save=True` will save the output in the current working directory.
+
+**Flags**  
+`-csv` `--csv`  
+Command-line only. Returns results in CSV format.  
+Python: Use `json=True` to return output in JSON format.
+
+`-q` `--quiet`  
+Command-line only. Prevents progress information from being displayed.  
+Python: Use `verbose=False` to prevent progress information from being displayed.  
+
+### Example
+```bash
+gget ucsc BRCA2 --genome hg38 --track knownGene
+```
+```python
+# Python
+gget.ucsc("BRCA2", genome="hg38", track="knownGene")
+```
+&rarr; Returns the UCSC IDs matching the search term, with their genomic positions.
+
+| track | ucsc_id | chrom | start | end | name | description |
+| --- | --- | --- | --- | --- | --- | --- |
+| knownGene | ENST00000380152.8 | chr13 | 32315508 | 32400268 | BRCA2 (ENST00000380152.8) | breast cancer type 2 susceptibility protein |
+| . . . | . . . | . . . | . . . | . . . | . . . | . . . |
+
+A UCSC ID (e.g. a known gene `ucsc_id`) can be inspected on the UCSC gene page, e.g. `https://genome.ucsc.edu/cgi-bin/hgGene?hgg_gene={ucsc_id}&db=hg38`.
+
+# References
+If you use `gget ucsc` in a publication, please cite the following articles:  
+
+- Luebbert, L., & Pachter, L. (2023). Efficient querying of genomic reference databases with gget. Bioinformatics. [https://doi.org/10.1093/bioinformatics/btac836](https://doi.org/10.1093/bioinformatics/btac836)
+
+- Kent WJ, Sugnet CW, Furey TS, et al. (2002). The human genome browser at UCSC. Genome Research. [https://doi.org/10.1101/gr.229102](https://doi.org/10.1101/gr.229102)
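
Note on the gene page link documented in `ucsc.md`: the following is a minimal sketch of how a `ucsc_id` from the results table could be turned into that URL. The helper name `ucsc_gene_page_url` is hypothetical and not part of gget; only the URL pattern comes from the documentation text.

```python
from urllib.parse import urlencode


def ucsc_gene_page_url(ucsc_id: str, genome: str = "hg38") -> str:
    """Hypothetical helper: build the UCSC gene page URL for one UCSC ID."""
    # urlencode escapes any characters that are not safe in a query string.
    query = urlencode({"hgg_gene": ucsc_id, "db": genome})
    return f"https://genome.ucsc.edu/cgi-bin/hgGene?{query}"


# ID taken from the example table in the docs.
print(ucsc_gene_page_url("ENST00000380152.8"))
# https://genome.ucsc.edu/cgi-bin/hgGene?hgg_gene=ENST00000380152.8&db=hg38
```

Encoding the parameters keeps the link valid for IDs that contain characters needing escaping.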
diff --git a/docs/src/en/updates.md b/docs/src/en/updates.md
@@ -5,6 +5,7 @@
 #### *gget* officially became part of [*scverse*](https://scverse.org/) on June 9, 2026. 🥳🥳🥳
 
 **Version ≥ 0.30.8** (XXX XX, 2026):  
+- [`gget ucsc`](ucsc.md): **New module** to fetch [UCSC Genome Browser](https://genome.ucsc.edu/) IDs for a gene or term, analogous to `gget search` for Ensembl. Searches the UCSC Genome Browser for a symbol/accession/term and returns the matching identifiers (e.g. UCSC known gene / transcript IDs) with their genomic positions, grouped by track; supports filtering by `genome`, `track`, and `limit`. Available in the Python API and on the command line. Resolves [issue 18](https://github.com/scverse/gget/issues/18).
 - [`gget pdb`](pdb.md): Added support for the PDBx/mmCIF structure format (fixes [issue 178](https://github.com/scverse/gget/issues/178) and [issue 177](https://github.com/scverse/gget/issues/177)).
   - New `resource="mmcif"` option downloads the structure in PDBx/mmCIF format (`.cif`).
   - The default `resource="pdb"` now automatically falls back to PDBx/mmCIF when the legacy PDB file is unavailable (e.g. for large structures), since the legacy PDB format is being phased out by RCSB. A warning is logged and saved files use the correct extension (`.cif`).

diff --git a/gget/__init__.py b/gget/__init__.py
@@ -26,6 +26,7 @@
 from .gget_search import search
 from .gget_seq import seq
 from .gget_setup import setup
+from .gget_ucsc import ucsc
 from .gget_virus import virus
 
 # Mute numexpr threads info

diff --git a/gget/constants.py b/gget/constants.py
@@ -7,6 +7,9 @@
 # strategy avoid hanging indefinitely on slow upstreams.
 DEFAULT_REQUESTS_TIMEOUT = (10, 60)
 
+# UCSC Genome Browser REST API for gget ucsc
+UCSC_API_URL = "https://api.genome.ucsc.edu"
+
 # Ensembl REST API server for gget seq and info
 ENSEMBL_REST_API = "http://rest.ensembl.org/"
 ENSEMBL_FTP_URL = "http://ftp.ensembl.org/pub/"
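
For orientation, `gget_ucsc.py` combines this constant with the `/search` path and passes `search` and `genome` as query parameters. A rough sketch of the resulting request URL, using only the standard library, is shown here. `build_search_url` is a hypothetical name used for illustration, and the exact encoding that `requests` applies to `params` may differ in detail.

```python
from urllib.parse import urlencode

UCSC_API_URL = "https://api.genome.ucsc.edu"  # value added to constants.py in this diff


def build_search_url(term: str, genome: str = "hg38") -> str:
    """Hypothetical helper: show the URL the search request is expected to hit."""
    # The module strips surrounding whitespace from the term before querying.
    query = urlencode({"search": term.strip(), "genome": genome})
    return f"{UCSC_API_URL}/search?{query}"


print(build_search_url(" BRCA2 "))
# https://api.genome.ucsc.edu/search?search=BRCA2&genome=hg38
```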

diff --git a/gget/gget_ucsc.py b/gget/gget_ucsc.py
@@ -0,0 +1,180 @@
+from __future__ import annotations
+
+import html
+import json as json_package
+from typing import Any, Literal, overload
+from urllib.parse import unquote
+
+import pandas as pd
+import requests
+
+from .constants import DEFAULT_REQUESTS_TIMEOUT, UCSC_API_URL
+from .utils import set_up_logger
+
+logger = set_up_logger()
+
+_COLUMNS = [
+    "track",
+    "ucsc_id",
+    "chrom",
+    "start",
+    "end",
+    "name",
+    "description",
+]
+
+
+def _parse_position(position: str | None) -> tuple[str | None, int | None, int | None]:
+    """Parse a UCSC position string 'chr13:32315508-32400268' into (chrom, start, end)."""
+    if not position or ":" not in position:
+        return position, None, None
+    chrom, _, span = position.partition(":")
+    if "-" not in span:
+        return chrom, None, None
+    start_str, _, end_str = span.partition("-")
+    start_str = start_str.replace(",", "").strip()
+    end_str = end_str.replace(",", "").strip()
+    start = int(start_str) if start_str.isdigit() else None
+    end = int(end_str) if end_str.isdigit() else None
+    return chrom, start, end
+
+
+def _match_rows(group: dict[str, Any]) -> list[dict[str, Any]]:
+    """Flatten one UCSC positionMatches track group into rows."""
+    track = group.get("trackName") or group.get("name")
+    group_desc = group.get("description")
+    rows = []
+    for m in group.get("matches", []):
+        chrom, start, end = _parse_position(m.get("position"))
+        ucsc_id = m.get("hgFindMatches")
+        if ucsc_id is not None:
+            ucsc_id = unquote(str(ucsc_id))
+        pos_name = m.get("posName")
+        match_desc = m.get("description") or group_desc
+        rows.append(
+            {
+                "track": track,
+                "ucsc_id": ucsc_id,
+                "chrom": chrom,
+                "start": start,
+                "end": end,
+                "name": html.unescape(pos_name) if isinstance(pos_name, str) else pos_name,
+                "description": html.unescape(match_desc) if isinstance(match_desc, str) else match_desc,
+            }
+        )
+    return rows
+
+
+@overload
+def ucsc(
+    search_term: str,
+    genome: str = "hg38",
+    track: str | None = None,
+    limit: int | None = None,
+    save: bool = False,
+    verbose: bool = True,
+    *,
+    json: Literal[True],
+) -> list[dict[str, Any]] | None: ...
+
+
+@overload
+def ucsc(
+    search_term: str,
+    genome: str = "hg38",
+    track: str | None = None,
+    limit: int | None = None,
+    save: bool = False,
+    verbose: bool = True,
+    json: Literal[False] = False,
+) -> pd.DataFrame | None: ...
+
+
+def ucsc(
+    search_term: str,
+    genome: str = "hg38",
+    track: str | None = None,
+    limit: int | None = None,
+    save: bool = False,
+    verbose: bool = True,
+    json: bool = False,
+) -> pd.DataFrame | list[dict[str, Any]] | None:
+    """Fetch UCSC Genome Browser IDs for a gene/term, similar to gget search.
+
+    Searches the UCSC Genome Browser for a gene symbol, accession, or other term
+    and returns the matching identifiers (e.g. UCSC known gene / transcript IDs)
+    together with their genomic positions, grouped by the track they come from.
+
+    Args:
+     - search_term  Gene symbol, accession, or free-text term to search for, e.g. "BRCA2".
+     - genome       UCSC genome assembly to search, e.g. "hg38", "hg19", "mm39". Default: "hg38".
+     - track        If provided, only return matches from tracks whose name contains
+                    this (case-insensitive) substring, e.g. "knownGene". Default: None.
+     - limit        Maximum number of matches to return. Default: None (all matches).
+     - save         If True, save the results as gget_ucsc_results.csv (or gget_ucsc_results.json if json=True) in the current working directory. Default: False.
+     - verbose      True/False whether to print progress information. Default: True.
+     - json         If True, returns results as a list of dicts (JSON records) instead of a data frame. Default: False.
+
+    Returns a data frame (or list of dicts if json=True) with one row per match,
+    including the track, UCSC ID, chromosome, start, end, name, and description.
+    Returns None if no matches are found.
+    """
+    if search_term is None or str(search_term).strip() == "":
+        raise ValueError("Please provide a gene symbol or search term in 'search_term'.")
+
+    term = str(search_term).strip()
+    url = f"{UCSC_API_URL}/search"
+    params = {"search": term, "genome": genome}
+
+    if verbose:
+        logger.info(f"Searching UCSC ({genome}) for '{term}'...")
+
+    try:
+        response = requests.get(
+            url,
+            params=params,
+            headers={"Accept": "application/json"},
+            timeout=DEFAULT_REQUESTS_TIMEOUT,
+        )
+    except requests.exceptions.RequestException as exc:
+        raise RuntimeError(f"The UCSC server request failed: {exc}") from exc
+
+    if not response.ok:
+        raise RuntimeError(
+            f"The UCSC server returned error status code {response.status_code}. Please try again later."
+        )
+
+    data = response.json()
+    if isinstance(data, dict) and data.get("error"):
+        raise ValueError(f"UCSC returned an error: {data['error']}")
+
+    rows = []
+    for group in data.get("positionMatches") or []:
+        rows.extend(_match_rows(group))
+
+    # Optional track filter
+    if track is not None:
+        track_lower = str(track).lower()
+        rows = [r for r in rows if r["track"] and track_lower in str(r["track"]).lower()]
+
+    # Optional limit
+    if limit is not None:
+        rows = rows[: int(limit)]
+
+    results_df = pd.DataFrame(rows, columns=_COLUMNS)
+
+    if len(results_df) == 0:
+        logger.warning(f"No UCSC matches found for '{term}' in genome '{genome}'.")
+        return None
+
+    if json:
+        results_dict = json_package.loads(results_df.to_json(orient="records"))
+        if save:
+            with open("gget_ucsc_results.json", "w", encoding="utf-8") as f:
+                json_package.dump(results_dict, f, ensure_ascii=False, indent=4)
+        return results_dict
+
+    if save:
+        results_df.to_csv("gget_ucsc_results.csv", index=False)
+
+    return results_df
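
The parsing, flattening, and filtering steps in `gget_ucsc.py` can be exercised without a network call. The sketch below reimplements them in simplified form on a hand-written payload. The payload is an assumption about the response shape, inferred from the keys the module reads (`positionMatches`, `trackName`, `matches`, `position`, `hgFindMatches`, `posName`). The second track group, `exampleTrack`, is made up to show comma stripping, URL unquoting, and HTML unescaping. The function names `parse_position` and `flatten` are illustrative, not the module's own.

```python
import html
from urllib.parse import unquote


def parse_position(position):
    """Split 'chrom:start-end' into (chrom, start, end); None where a part is missing."""
    if not position or ":" not in position:
        return position, None, None
    chrom, _, span = position.partition(":")
    start_str, sep, end_str = span.partition("-")
    if not sep:
        return chrom, None, None
    # Positions may carry thousands separators, e.g. '32,315,508'.
    start_str = start_str.replace(",", "").strip()
    end_str = end_str.replace(",", "").strip()
    start = int(start_str) if start_str.isdigit() else None
    end = int(end_str) if end_str.isdigit() else None
    return chrom, start, end


def flatten(payload, track=None, limit=None):
    """Turn grouped matches into flat rows, then apply the track filter and limit."""
    rows = []
    for group in payload.get("positionMatches") or []:
        track_name = group.get("trackName") or group.get("name")
        for m in group.get("matches", []):
            chrom, start, end = parse_position(m.get("position"))
            raw_id = m.get("hgFindMatches")
            pos_name = m.get("posName")
            rows.append(
                {
                    "track": track_name,
                    "ucsc_id": unquote(str(raw_id)) if raw_id is not None else None,
                    "chrom": chrom,
                    "start": start,
                    "end": end,
                    "name": html.unescape(pos_name) if isinstance(pos_name, str) else pos_name,
                }
            )
    if track is not None:
        # Case-insensitive substring match on the track name.
        rows = [r for r in rows if r["track"] and track.lower() in r["track"].lower()]
    if limit is not None:
        rows = rows[: int(limit)]
    return rows


# Illustrative payload only; not a recorded UCSC response.
sample = {
    "positionMatches": [
        {
            "trackName": "knownGene",
            "matches": [
                {
                    "position": "chr13:32315508-32400268",
                    "hgFindMatches": "ENST00000380152.8",
                    "posName": "BRCA2 (ENST00000380152.8)",
                }
            ],
        },
        {
            "trackName": "exampleTrack",
            "matches": [
                {
                    "position": "chr13:32,315,508-32,400,268",
                    "hgFindMatches": "BRCA2%20variant",
                    "posName": "BRCA2 &amp; neighbors",
                }
            ],
        },
    ]
}

all_rows = flatten(sample)
known_gene_rows = flatten(sample, track="KNOWNGENE")
print(len(all_rows), len(known_gene_rows))  # 2 1
```

Because the filter runs before the limit, `limit` caps the number of rows that survive the track filter rather than the number of rows fetched.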