How Effective is In-Context Learning with Large Language Models for Rare Cell Identification in Single-Cell Expression Data?

Overview

Recent advances in single-cell genomics call for more powerful computational tools to distinguish between cell phenotypes, and rare cell identification is one of the central challenges in this area. Traditional data-driven approaches typically rely on feature selection to identify key genes for anomaly detection, and often require extensive training data or domain-specific knowledge.

In contrast, large language models (LLMs) have demonstrated strong generalization abilities in various scientific research fields, presenting new opportunities for rare cell identification. This repository accompanies our paper, where we conduct the first comprehensive evaluation of in-context learning with LLMs for rare cell identification. Our approach employs a chain-of-thought prompting strategy, integrating latent space analysis and cross-query comparisons to generate scores for identifying rare cells.

Key Contributions

First evaluation of LLMs for rare cell identification using in-context learning.
Novel prompting strategy combining chain-of-thought reasoning with latent space analysis and cross-query comparisons.
Competitive performance of LLMs compared with traditional optimization-based methods on benchmark datasets.
Minimal dependence on extensive training data or expert-defined feature selection, demonstrating the generalization potential of LLMs in genomics.

Repository Structure

├── data/                  # Benchmark datasets for rare cell identification
├── src/                   # Implementation of our methodology
│   ├── preprocessing.py   # Data preprocessing scripts
│   ├── llm_prompting.py   # Chain-of-thought prompting strategy
│   └── evaluation.py      # Performance evaluation scripts
├── results/               # Experimental results and analysis
├── README.md              # Project documentation
└── requirements.txt       # Required dependencies

Installation

To set up the environment, clone this repository, change into its directory, and install the required dependencies:

$ cd RareCellAgent
$ pip install -r requirements.txt

Usage

1. Data Preprocessing

Prepare the single-cell expression datasets and apply preprocessing:

$ python src/preprocessing.py --input data/raw_data.csv --output data/processed_data.csv
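
The README does not spell out what the preprocessing step does. As a rough illustration only, the sketch below shows one common recipe for single-cell count matrices: library-size normalization, a log(1 + x) transform, and selection of the most variable genes. The function names `normalize_log1p` and `top_variable_genes`, the `target_sum` default, and the choice of steps are assumptions for illustration and are not taken from `src/preprocessing.py`.

```python
import math


def normalize_log1p(counts, target_sum=10_000.0):
    """Scale each cell (row) to `target_sum` total counts, then apply log(1 + x)."""
    normalized = []
    for row in counts:
        total = sum(row)
        # Cells with zero total counts stay all zeros, which avoids dividing by zero.
        scale = target_sum / total if total > 0 else 0.0
        normalized.append([math.log1p(value * scale) for value in row])
    return normalized


def top_variable_genes(matrix, n_genes):
    """Return the column indices of the `n_genes` highest-variance genes, in ascending order."""
    n_cells = len(matrix)
    variances = []
    for gene in range(len(matrix[0])):
        column = [row[gene] for row in matrix]
        mean = sum(column) / n_cells
        # Population variance of this gene across all cells.
        variances.append(sum((v - mean) ** 2 for v in column) / n_cells)
    ranked = sorted(range(len(variances)), key=lambda g: variances[g], reverse=True)
    return sorted(ranked[:n_genes])


if __name__ == "__main__":
    # Toy matrix: 3 cells x 3 genes (hypothetical values).
    toy_counts = [[1, 1, 2], [0, 0, 0], [5, 0, 5]]
    toy_normalized = normalize_log1p(toy_counts, target_sum=4.0)
    print(top_variable_genes(toy_normalized, n_genes=2))
```

Reducing the matrix to a small set of variable genes keeps the per-cell description short, which matters if that description is later placed in a prompt.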

2. Running LLM-Based Rare Cell Identification

Execute the LLM-based rare cell identification pipeline:

$ python src/llm_prompting.py --input data/processed_data.csv --output results/llm_predictions.csv
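
To make the prompting idea more concrete, here is a minimal sketch of how a chain-of-thought prompt could combine a query cell's latent coordinates with other cells from the same batch (the cross-query comparison), and how a numeric score could be read back from the model's reply. Everything in it is hypothetical: the function names, the prompt wording, and the `SCORE:` reply convention are assumptions, not the contents of `src/llm_prompting.py`, and no LLM API call is shown.

```python
import re


def build_rare_cell_prompt(cell_id, latent_coords, batch_cells):
    """Assemble a chain-of-thought style prompt for one query cell.

    `batch_cells` is a list of (cell_id, latent_coords) pairs used for comparison.
    """
    coords = ", ".join(f"{value:.3f}" for value in latent_coords)
    lines = [
        "You are analysing single-cell expression data.",
        f"Query cell {cell_id} has latent coordinates [{coords}].",
        "Other cells in the same batch, for cross-query comparison:",
    ]
    for other_id, other_coords in batch_cells:
        other = ", ".join(f"{value:.3f}" for value in other_coords)
        lines.append(f"- cell {other_id}: [{other}]")
    lines += [
        "Reason step by step about how isolated the query cell is in latent space.",
        "Finish with a line of the form 'SCORE: <number between 0 and 1>',",
        "where a higher number means the cell is more likely to be rare.",
    ]
    return "\n".join(lines)


# Assumed reply convention: the model ends its reasoning with "SCORE: <number>".
SCORE_PATTERN = re.compile(r"SCORE:\s*([0-9]*\.?[0-9]+)")


def parse_score(response_text):
    """Return the last reported score clamped to [0, 1], or None if no score is found."""
    matches = SCORE_PATTERN.findall(response_text)
    if not matches:
        return None
    return min(1.0, max(0.0, float(matches[-1])))


if __name__ == "__main__":
    prompt = build_rare_cell_prompt("c1", [0.1, -2.0], [("c2", [0.0, 0.5])])
    print(prompt)
    print(parse_score("The cell lies far from its neighbours.\nSCORE: 0.85"))
```

Taking the last match rather than the first guards against the model mentioning intermediate scores while it reasons.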

3. Evaluating Model Performance

Assess the performance of the LLM-based approach against traditional methods:

$ python src/evaluation.py --predictions results/llm_predictions.csv --ground_truth data/labels.csv
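
The README does not state which metrics the evaluation script reports. As an illustrative sketch, the function below thresholds per-cell rarity scores and computes precision, recall, and F1 for the rare class against binary ground-truth labels. The name `rare_cell_metrics`, the 0.5 default threshold, and the label encoding (1 = rare, 0 = common) are assumptions rather than the behaviour of `src/evaluation.py`.

```python
def rare_cell_metrics(scores, labels, threshold=0.5):
    """Compute precision, recall, and F1 for the rare class (label 1).

    A cell is predicted rare when its score is at or above `threshold`.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")

    predicted = [score >= threshold for score in scores]
    true_pos = sum(1 for p, y in zip(predicted, labels) if p and y == 1)
    false_pos = sum(1 for p, y in zip(predicted, labels) if p and y == 0)
    false_neg = sum(1 for p, y in zip(predicted, labels) if not p and y == 1)

    # Fall back to 0.0 when a denominator is zero (no predicted or no true rare cells).
    precision = true_pos / (true_pos + false_pos) if (true_pos + false_pos) else 0.0
    recall = true_pos / (true_pos + false_neg) if (true_pos + false_neg) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Hypothetical scores and labels for five cells.
    print(rare_cell_metrics([0.9, 0.8, 0.2, 0.6, 0.1], [1, 0, 0, 1, 1]))
```

Because rare cells are a small minority by definition, class-specific metrics such as these tend to be more informative than overall accuracy.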

Benchmark Datasets

We evaluate our approach on publicly available single-cell expression datasets, including:

Chung
Darmanis
Goolam
Immuno
