Paired eye (intraocular fluid) and blood single-cell RNA and adaptive immune receptor (TCR/BCR) analysis comparing viral and non-infectious uveitis (NIU). The pipeline builds a full immune atlas, an eye sub-atlas, and per-compartment sub-atlases (myeloid, B/plasma, T cell), then runs differential expression, differential abundance, cell-cell communication, repertoire, and TCR motif/distance/generation-probability analyses, and renders the manuscript figures.
- Paper: coming soon
- Data archive (Zenodo): https://zenodo.org/records/20783773
- Paley Lab: https://paleylab.com/members/
This repo holds code only. The data needed to re-use or rebuild the analysis (integrated Seurat objects, per-sample QC, raw CellRanger inputs, and the VDJdb reference) is distributed through Zenodo; derived tables and figures are regenerated locally from the objects. See docs/DATA_AVAILABILITY.md for the full Zenodo-to-directory map.

| Path | Contents |
|---|---|
| `R/` | 57 pipeline modules, numbered by execution phase |
| `config/` | `config.yml` (master) and `config.run.yml` (viz-only re-render) |
| `scripts/` | Setup and convenience utilities |
| `docs/` | Data dictionary and data-availability guide |
| `run_pipeline.R` | Sources every module, then runs the config-gated steps |

- R 4.4+ on macOS or Linux.
- R packages install on first run via `R/00_setup_packages.R` (CRAN + Bioconductor + a few GitHub packages). There is no lock file; see Reproducibility.
- TCR generation-probability and distance tools (OLGA, SoNNia, tcrdist3) run through immLynx, which manages its own Python environment via `basilisk`.
- BCR lineage reconstruction needs an IMGT germline reference; point `paths$imgt_dir` in the config at it. Those steps skip cleanly if the reference is absent.

1. Clone this repository.
2. Get the data. Download the Zenodo archive and unpack each part into the matching directory (`outputs/objects/`, `outputs/tables/`, `inputs/data/`, ...). Follow docs/DATA_AVAILABILITY.md. The empty directories with `.gitkeep` placeholders mark where each part goes.
3. Run from the repository root (the directory that contains `R/` and `config/`): `Rscript run_pipeline.R config/config.yml`

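
Before launching a run, it can help to confirm that the working directory is the repository root and that the unpacked data landed where expected. The following is a minimal base-R sketch, not part of the pipeline: `missing_dirs` is a hypothetical helper, and the directory list is an assumption based on the paths named in this README (docs/DATA_AVAILABILITY.md remains the authoritative map).

```r
# Hypothetical helper (not part of the pipeline): report which of the
# expected directories are absent under a given root.
missing_dirs <- function(root, dirs) {
  present <- dir.exists(file.path(root, dirs))
  dirs[!present]
}

# Assumed list; adjust it to match docs/DATA_AVAILABILITY.md.
expected <- c("R", "config", "outputs/objects", "inputs/data")

absent <- missing_dirs(".", expected)
if (length(absent) > 0) {
  message("Missing under the current directory: ", paste(absent, collapse = ", "))
} else {
  message("All expected directories found.")
}
```

Run it from the directory where you intend to call `Rscript run_pipeline.R`; a missing `R` or `config` entry usually means the working directory is not the repository root.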
Every step is gated by a flag under steps: (and steps_fig6:) in
config/config.yml. Set a flag to true to run that step. Most flags ship
false because their outputs are already provided via Zenodo. To re-render
figures only, use config/config.run.yml (viz steps, atlases untouched).
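
To see which steps a config would actually run, the flags can be read and filtered. This is a sketch under assumptions: it uses the `yaml` package to parse the text (the pipeline may read its config differently), and the flag names below are invented for illustration, not the real step names. Only the `steps:` and `steps_fig6:` block names and `seed: 42` come from this README.

```r
library(yaml)  # assumed to be installed; used only to parse YAML text

# Illustrative config text. The flag names are hypothetical.
cfg_text <- "
seed: 42
steps:
  example_ingest: false
  example_viz: true
steps_fig6:
  example_panel: false
"

cfg <- yaml.load(cfg_text)

# Return the names of the flags set to true in one block of the config.
enabled_steps <- function(cfg, block = "steps") {
  flags <- cfg[[block]]
  names(flags)[vapply(flags, isTRUE, logical(1))]
}

enabled_steps(cfg)                # flags switched on under steps:
enabled_steps(cfg, "steps_fig6")  # flags switched on under steps_fig6:
```

For a real file, `yaml::read_yaml("config/config.yml")` would replace the inline text.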
inputs/data/metadata.csv holds one row per sample and points each sample at its
CellRanger output directories (RNA_CRoutput, TCR_CRoutput, BCR_CRoutput).
The de-identified metadata table is published as a supplementary table in the
manuscript — recreate inputs/data/metadata.csv from it and remap the
*_CRoutput paths to your local CellRanger outputs before running ingest. The
full column schema is in docs/data_dictionary.md, and
a header-only example is at
inputs/data/metadata_template.csv.
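
The path remapping can be done in a few lines of base R. The sketch below is illustrative only: `remap_cr_paths` is a hypothetical helper, the `sample_id` column and every path value are made up, and whether the published table carries path values at all is an assumption. Only the three `*_CRoutput` column names come from this README; docs/data_dictionary.md defines the real schema.

```r
# Toy stand-in for the supplementary metadata table (values are made up).
meta <- data.frame(
  sample_id    = c("S1_eye", "S1_blood"),
  RNA_CRoutput = c("/original/cr/S1_eye_rna", "/original/cr/S1_blood_rna"),
  TCR_CRoutput = c("/original/cr/S1_eye_tcr", "/original/cr/S1_blood_tcr"),
  BCR_CRoutput = c("/original/cr/S1_eye_bcr", NA),
  stringsAsFactors = FALSE
)

# Swap an old path prefix for a local one in every *_CRoutput column.
# Missing values and paths without the prefix are left unchanged.
remap_cr_paths <- function(meta, old_root, new_root) {
  cr_cols <- grep("_CRoutput$", names(meta), value = TRUE)
  for (col in cr_cols) {
    has_prefix <- !is.na(meta[[col]]) & startsWith(meta[[col]], old_root)
    meta[[col]][has_prefix] <- paste0(
      new_root, substring(meta[[col]][has_prefix], nchar(old_root) + 1)
    )
  }
  meta
}

local_meta <- remap_cr_paths(meta, "/original/cr", "/data/cellranger")
# write.csv(local_meta, "inputs/data/metadata.csv", row.names = FALSE)
```

The `write.csv` call is commented out so the sketch does not overwrite an existing metadata file.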
File-number prefixes follow execution phase:
| Range | Phase |
|---|---|
| 00-03 | Setup, package management, data ingest, AIRR annotation |
| 10-12 | Full-atlas integration, annotation, cluster merge |
| 14-19 | QC and read-only diagnostics (repertoire QC, lens filter, lineage gate/validation, index-hop QC) |
| 20-23 | Eye subset, re-integration, compartment split, substate labels |
| 30-41 | Markers, DGE, escape (ssGSEA), milo DA, composition |
| 45-59 | Compartment PCA, cross-compartment ligand-receptor (LIANA/NicheNet), repertoire, BCR lineage |
| 64-80 | TCR motif / distance / generation-probability analyses |
| 81-93 | Visualization (shared helpers first, then per-target panels) |
run_pipeline.R sources every module in this order, then runs the config-gated
steps grouped by the same phases.
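
As a quick way to place a module in its phase, the prefix ranges from the table can be turned into a lookup. This is a sketch, not pipeline code: `phase_of` is a hypothetical helper, the phase labels are shortened from the table, and the second file name in the usage lines is invented.

```r
# Prefix ranges copied from the table above; labels are abbreviated.
phases <- data.frame(
  from  = c(0, 10, 14, 20, 30, 45, 64, 81),
  to    = c(3, 12, 19, 23, 41, 59, 80, 93),
  phase = c("Setup and ingest", "Full-atlas integration", "QC and diagnostics",
            "Eye subset and compartments", "Markers, DGE, DA",
            "Compartment PCA, communication, repertoire",
            "TCR motif / distance / Pgen", "Visualization"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: look up the phase for a module file name.
# Returns NA when there is no numeric prefix or it falls outside every range.
phase_of <- function(file) {
  prefix <- suppressWarnings(
    as.integer(sub("^([0-9]+).*$", "\\1", basename(file)))
  )
  if (is.na(prefix)) return(NA_character_)
  hit <- which(prefix >= phases$from & prefix <= phases$to)
  if (length(hit) == 0) NA_character_ else phases$phase[hit[1]]
}

phase_of("R/00_setup_packages.R")
phase_of("R/85_example_panels.R")  # hypothetical file name
```

Prefixes that fall in the gaps between ranges (for example 13) return `NA`, which makes stray or misnumbered files easy to spot.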
The Zenodo deposit delivers the data needed to re-use or rebuild the analysis; tables and figures are regenerated locally from the objects (see docs/DATA_AVAILABILITY.md).
| Directory | Contents | Source |
|---|---|---|
| `outputs/objects/` | Seurat `.rds` atlases (full, `eye/`, per-compartment), Milo object, TCR result objects | Zenodo |
| `outputs/qc/` | Per-sample QC reports | Zenodo |
| `outputs/tables/` | Analysis CSVs, sub-foldered by target | Regenerated by the analysis steps |
| `outputs/viz/` | Figures (PDF + PNG), sub-foldered by target and panel group | Regenerated by the viz steps |
| `outputs/processed/` | Per-sample intermediate Seurat objects | Regenerated from raw during a from-scratch run |

- Seed. `seed: 42` in the config is applied globally and before each stochastic step (clustering, UCell, bootstrap and permutation tests, rarefaction).
- UMAP. Embeddings are not seeded (`uwot` with `fast_sgd`), so coordinates vary between from-scratch rebuilds. Cluster IDs are stable. The Zenodo objects preserve the published embeddings, and the subset/re-integration steps default to `false`, so a normal run does not overwrite them.
- Package versions are not pinned with a lock file. `R/00_setup_packages.R` installs current CRAN/Bioconductor/GitHub versions. To capture the exact environment for a run, save `sessionInfo()` (or `sessioninfo::session_info()`) to a file and keep it alongside your results.
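
One way to capture the environment, sketched in base R. `save_session_info` is a hypothetical helper, not part of the pipeline, and the timestamped file name is just one possible convention.

```r
# Hypothetical helper: write the session details to a timestamped text
# file in the given directory and return the file path invisibly.
save_session_info <- function(out_dir = ".") {
  dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)
  out_file <- file.path(
    out_dir,
    paste0("sessionInfo_", format(Sys.time(), "%Y%m%d_%H%M%S"), ".txt")
  )
  writeLines(capture.output(sessionInfo()), out_file)
  invisible(out_file)
}

# tempdir() keeps the example side-effect free; point it at your results
# directory in practice.
path <- save_session_info(tempdir())
```

Calling it at the end of a run, after all packages have been loaded, records the versions that were actually attached.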
Run the helper scripts from the repository root. All analysis and figure steps live in the pipeline
modules (`R/`) and are toggled via the config; the files under `scripts/` are setup and convenience
utilities only.
- `refresh_figures.sh`: re-render all figures from existing objects/tables via `config/config.run.yml` (viz steps only, atlases untouched).
- `generate_tcrdist_pw_beta.R`: one-time precompute of the tcrdist pairwise beta matrix; `run_pipeline.R` sources it automatically if a GLIPH distance step needs it and the matrix is missing.
- `run_vdj_b_realignment.sh`: one-time BCR VDJ re-alignment utility.
- `copy_eye_blood_pairs.sh`: local CellRanger-output copy helper (edit the mounted source/destination paths for your environment).

- Code: MIT — see LICENSE.
- Data and figures: CC BY 4.0 — see LICENSE-DATA.md.
If you use this code or data, please cite the paper and the Zenodo archive. See CITATION.cff (GitHub renders a "Cite this repository" button once the DOIs are filled in).