Initial Object Generation
We started with .h5 output files from the 10x Genomics Cell Ranger multi v9.1.0-beta pipeline, which separated cells from each 10x well by sample. For processing, we merged samples according to their original column position in the 96-well plate. Each merged file was run through Scrublet (Wolock, et al. 2019) for doublet detection, and the predicted doublet score for each cell was added to the corresponding object. The same files were also run through CellTypist (Domínguez Conde, et al. 2022) for cell labeling, using a T cell-specific model generated from the Immunobiology of Aging dataset with labels from the Human Immune Health Atlas (Gong, et al. 2025). The per-column objects were then merged into a single object, retaining the plate-column information for downstream processing. The combined object contained ~4.2M cells.
Processing and Filtering
We processed the combined single-cell dataset using a standard Scanpy (Wolf, et al. 2018) analysis workflow, starting from raw gene expression counts. First, we assessed basic quality metrics and retained cells with > 200 and < 600 detected genes and a ribosomal read fraction ≤ 50%, to remove potential multiplets, empty droplets, and low-quality cells. We then performed dimensionality reduction with PCA and applied Harmony (Korsunsky, et al. 2019) with sample ID as the integration variable. Finally, we grouped cells with similar expression profiles using Leiden clustering and visualized the results with UMAP.
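The filtering thresholds above can be read as a per-cell predicate. The following is a minimal sketch of that logic in plain Python, not the pipeline code itself. It assumes that ribosomal genes are recognised by the RPS/RPL symbol prefixes (the text does not state how they were defined); the names cell_qc and passes_qc are hypothetical.

```python
from dataclasses import dataclass
from typing import Mapping

# Thresholds taken from the text above.
MIN_GENES = 200            # exclusive lower bound on detected genes
MAX_GENES = 600            # exclusive upper bound on detected genes
MAX_RIBO_FRACTION = 0.50   # inclusive upper bound on ribosomal read fraction

# Assumption: ribosomal genes are identified by these symbol prefixes.
RIBO_PREFIXES = ("RPS", "RPL")


@dataclass
class CellQC:
    n_genes: int           # number of genes with at least one count
    ribo_fraction: float   # ribosomal counts divided by total counts


def cell_qc(counts: Mapping[str, int]) -> CellQC:
    """Compute QC metrics for one cell from a gene -> raw count mapping."""
    total = sum(counts.values())
    n_genes = sum(1 for c in counts.values() if c > 0)
    ribo = sum(c for g, c in counts.items()
               if g.upper().startswith(RIBO_PREFIXES))
    return CellQC(n_genes, ribo / total if total else 0.0)


def passes_qc(qc: CellQC) -> bool:
    """Apply the gene-count window and the ribosomal fraction cap."""
    return (MIN_GENES < qc.n_genes < MAX_GENES
            and qc.ribo_fraction <= MAX_RIBO_FRACTION)
```

Note that the gene-count bounds are strict, so a cell with exactly 200 or exactly 600 detected genes is removed, while a cell with a ribosomal fraction of exactly 50% is kept.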
Cell Labeling
In addition to the CellTypist predicted labels, we used CyteType (Ahuja, et al. 2025) to inform cell type annotation. CyteType combines gene expression data with biological and experimental context to generate cell type annotations. We used these annotations to guide cluster identification based on the detection levels of marker genes, and identified 11 T cell subtypes: Naive CD4 T cell, Central Memory (CM) CD4 T cell, Effector Memory (EM) CD4 T cell, Naive Treg, Memory Treg, Naive CD8 T cell, CM CD8 T cell, EM CD8 T cell, CD8aa, MAIT, and gdT.
Differential Gene Expression
We used Scanpy’s built-in functions to perform differential gene expression analysis. For each drug treatment and cell type, cells exposed to the drug were compared against unstimulated DMSO-treated control cells by subsetting the AnnData object to include only the relevant treatment group (cell type + drug) and the control group (DMSO). Differential expression testing was conducted using the Wilcoxon rank-sum test with sc.tl.rank_genes_groups, with the DMSO control specified as the reference condition. For each comparison, gene-level statistics (test scores, log fold changes, p-values, and adjusted p-values) were extracted and compiled into individual results tables. CSV files for each cell type + drug vs. control comparison (preserving all genes in the limited panel) were used as input for downstream gene set enrichment analysis with fgsea (Korotkevich, et al. 2021).
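One way such a per-comparison test could be set up is sketched below. This is an illustration under stated assumptions, not the authors' script: the obs column names "cell_type" and "treatment", the helper names comparison_pairs and de_vs_control, and the choice to restrict the control cells to the same cell type are all hypothetical. It also assumes adata.X holds log-normalized expression, as sc.tl.rank_genes_groups expects.

```python
from itertools import product


def comparison_pairs(cell_types, drugs, control="DMSO"):
    """Enumerate one (cell type, drug) comparison per non-control treatment."""
    return [(ct, d) for ct, d in product(cell_types, drugs) if d != control]


def de_vs_control(adata, cell_type, drug, control="DMSO",
                  celltype_key="cell_type", treatment_key="treatment"):
    """Wilcoxon rank-sum test of drug vs. control cells within one cell type.

    The obs column names are assumptions made for this sketch.
    """
    # Imported lazily so comparison_pairs can be used without scanpy installed.
    import scanpy as sc

    # Keep only cells of this type that received the drug or the control.
    keep = ((adata.obs[celltype_key] == cell_type)
            & adata.obs[treatment_key].isin([drug, control]))
    sub = adata[keep].copy()

    # Test the drug group against the control as the reference condition.
    sc.tl.rank_genes_groups(sub, groupby=treatment_key, groups=[drug],
                            reference=control, method="wilcoxon")

    # Columns: names, scores, logfoldchanges, pvals, pvals_adj.
    df = sc.get.rank_genes_groups_df(sub, group=drug)
    df["cell_type"] = cell_type
    df["drug"] = drug
    df["control"] = control
    return df
```

Each returned table could then be written out with df.to_csv(...), giving one file per cell type + drug comparison.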
Gene Set Enrichment Analysis
Pathway enrichment analysis was performed using fgsea on the differential expression results for each cell type + drug treatment vs. DMSO control comparison. Genes were ranked by the effect size of differential expression (log fold change). The ranked gene list was used as input to the fgseaMultilevel algorithm to test for enrichment of predefined gene sets. Enrichment result files include normalized enrichment scores, p-values, and adjusted p-values, annotated with the corresponding cell type, drug, and control.
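To make the ranking step concrete, the sketch below ranks genes by log fold change and computes the classical GSEA running-sum enrichment score for one gene set. It is a plain-Python illustration of the statistic that preranked enrichment methods build on, not a reimplementation of fgsea, which is an R package and additionally estimates p-values and normalized enrichment scores. The function names and the toy gene labels are hypothetical.

```python
def rank_genes(logfc):
    """Sort (gene, log fold change) pairs from most up- to most down-regulated."""
    return sorted(logfc.items(), key=lambda kv: kv[1], reverse=True)


def enrichment_score(ranked, gene_set, weight=1.0):
    """Signed maximum deviation from zero of the GSEA running sum.

    Genes in the set increase the running sum in proportion to
    |score| ** weight; genes outside the set decrease it by a constant step.
    """
    hits = [abs(score) ** weight if gene in gene_set else 0.0
            for gene, score in ranked]
    total_hit = sum(hits)
    n_miss = sum(1 for gene, _ in ranked if gene not in gene_set)
    if total_hit == 0 or n_miss == 0:
        raise ValueError("gene set must partially overlap the ranked list")

    running, best = 0.0, 0.0
    for (gene, _), hit in zip(ranked, hits):
        running += hit / total_hit if gene in gene_set else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


# Toy example: six genes, with a gene set concentrated at the top of the ranking.
toy_logfc = {"A": 3.0, "B": 2.0, "C": 1.0, "D": -1.0, "E": -2.0, "F": -3.0}
toy_ranked = rank_genes(toy_logfc)
toy_es_up = enrichment_score(toy_ranked, {"A", "B"})    # positive score
toy_es_down = enrichment_score(toy_ranked, {"E", "F"})  # negative score
```

A positive score indicates that the gene set is concentrated among genes with the largest positive log fold changes in the drug vs. DMSO comparison, and a negative score indicates concentration among the most negative.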
References
Ahuja G, Antill A, Su Y, Dall’Olio GM, Basnayake S, Karlsson G, et al. Multi-agent AI enables evidence-based cell annotation in single-cell transcriptomics. bioRxiv. 2025. p. 2025.11.06.686964.
doi:10.1101/2025.11.06.686964
Domínguez Conde C, Xu C, Jarvis LB, Rainbow DB, Wells SB, Gomes T, et al. Cross-tissue immune cell analysis reveals tissue-specific features in humans. Science. 2022;376: eabl5197.
doi:10.1126/science.abl5197
Gong Q, Sharma M, Glass MC, Kuan EL, Chander A, Singh M, et al. Multi-omic profiling reveals age-related immune dynamics in healthy adults. Nature. 2025;648: 696–706.
doi:10.1038/s41586-025-09686-5
Korotkevich G, Sukhov V, Budin N, Shpak B, Artyomov MN, Sergushichev A. Fast gene set enrichment analysis. bioRxiv. 2021. p. 060012.
doi:10.1101/060012
Korsunsky I, Millard N, Fan J, Slowikowski K, Zhang F, Wei K, et al. Fast, sensitive and accurate integration of single-cell data with Harmony. Nat Methods. 2019;16: 1289–1296.
doi:10.1038/s41592-019-0619-0
Wolf FA, Angerer P, Theis FJ. SCANPY: large-scale single-cell gene expression data analysis. Genome Biol. 2018;19: 15.
doi:10.1186/s13059-017-1382-0
Wolock SL, Lopez R, Klein AM. Scrublet: Computational Identification of Cell Doublets in Single-Cell Transcriptomic Data. Cell Syst. 2019;8: 281-291.e9.
doi:10.1016/j.cels.2018.11.005