scRNA-seq Data

scRNA-seq data was generated on the 10x Genomics 3' scRNA-seq Platform (v3.1). For data collection and processing details, see the Cohorts, Experimental Methods, and Data Analysis Methods sections.

Below, we provide labeled and annotated PBMC scRNA-seq data from our Immune Health Atlas. The full set of all ~1.8 million cells is provided, as well as subsets based on major cell classes.

Each file contains sample-level metadata, as well as cell-level cell type labels and QC metrics. The following values are stored in the .obs section of these .h5ad files as descriptions of observations:

Sample Identifiers
cohort.cohortGuid: A Globally Unique Identifier (GUID) for the Cohort in which the Subject enrolled for our study
subject.subjectGuid: A GUID for the Subject
sample.sampleKitGuid: A GUID for the Sample Kit, representing all material collected at a visit
specimen.specimenGuid: A GUID for the specific aliquot used for the experiment
pipeline.fileGuid: A GUID for the specific analysis pipeline output file used for analysis

Subject Metadata
subject.biologicalSex: The biological sex of the Subject
subject.birthYear: The Birth Year of the Subject
subject.ageAtFirstDraw: The Age of the Subject at their first on-study sample collection
subject.ageGroup: The Age Group of the Subject (Young Adult or Older Adult)
subject.race: The self-reported Race of the Subject
subject.ethnicity: The self-reported Ethnicity of the subject
subject.cmv: The CMV Status of the subject, as determined by an HCMV assay
subject.bmi: The BMI of the Subject

Sample Metadata
sample.visitName: The name of the study visit (i.e. time point)
sample.drawYear: The year of the study visit (e.g. 2021)
sample.subjectAgeAtDraw: The approximate age of the Subject in years at time of sample collection based on subject.birthYear and sample.drawYear
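As a minimal sketch of how sample.subjectAgeAtDraw could be derived, the function below assumes it is simply the difference between the two year fields. That formula is an assumption, not a confirmed description of the pipeline, and the function name and example values are hypothetical.

```python
def approximate_age_at_draw(birth_year: int, draw_year: int) -> int:
    """Approximate age in years, using only calendar years.

    Because no month or day is available, the result can be one year
    higher than the true age if the draw preceded that year's birthday.
    """
    if draw_year < birth_year:
        raise ValueError("draw_year must not precede birth_year")
    # Assumed formula: difference of the two year fields.
    return draw_year - birth_year


# Hypothetical subject born in 1980 and sampled in 2021.
age = approximate_age_at_draw(birth_year=1980, draw_year=2021)
```

Working from years alone is what makes the value approximate: the true age at collection may be one year lower.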

Process Identifiers
batch_id: A GUID for the batch of samples processed together (e.g. B039)
pool_id: A GUID for the pool of samples combined for Cell Hashing (e.g. B039-P1)
chip_id: A GUID for the 10x Genomics chip the cells were loaded into (e.g. B039-P1C2)
well_id: A GUID for the 10x Genomics well the cells were loaded into within the chip (e.g. B039-P1C2W4)
*barcodes: A GUID for the individual cell
original_barcodes: The original, sequence-based barcode generated by 10x Genomics Cell Ranger software
cell_name: A quasi-unique, memorable cell identifier generated using an adjective-adjective-animal structure

*used as the primary cell index in our .h5ad files
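The example values above suggest that the batch, pool, chip, and well identifiers are nested, with each one extending the previous. The sketch below splits a well_id into its parent identifiers. The pattern is inferred only from the examples shown here (B039, B039-P1, B039-P1C2, B039-P1C2W4), so treat it as an assumption and not a documented format. The names WellId and split_well_id are hypothetical.

```python
import re
from typing import NamedTuple


class WellId(NamedTuple):
    batch_id: str
    pool_id: str
    chip_id: str
    well_id: str


# Pattern inferred from the example identifiers; not a documented specification.
_WELL_ID = re.compile(r"^(?P<batch>B\d+)-(?P<pool>P\d+)(?P<chip>C\d+)(?P<well>W\d+)$")


def split_well_id(well_id: str) -> WellId:
    """Recover the batch, pool, and chip identifiers nested inside a well_id."""
    match = _WELL_ID.match(well_id)
    if match is None:
        raise ValueError(f"unrecognized well_id format: {well_id!r}")
    batch = match["batch"]
    pool = f"{batch}-{match['pool']}"
    chip = f"{pool}{match['chip']}"
    return WellId(batch, pool, chip, well_id)


parts = split_well_id("B039-P1C2W4")
```

Since batch_id, pool_id, and chip_id are already stored as separate .obs columns, a helper like this is mainly useful for sanity-checking that the columns agree with one another.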

Cell QC Metrics
n_reads: Number of reads assigned to the cell barcode
n_umis: Number of Unique Molecular Identifiers (unique molecules) detected
n_genes: Number of genes with at least 1 UMI detected
total_counts_mito: Total number of reads that were assigned to mitochondrial genes
pct_counts_mito: Percent of reads that were assigned to mitochondrial genes
doublet_score: Doublet score assigned by Scrublet for doublet detection
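To illustrate how these columns might be used together, the sketch below flags cells in a pandas DataFrame shaped like the .obs table. The thresholds and the toy values are hypothetical, chosen only for illustration. They are not the cutoffs used to build the atlas, and the function name flag_low_quality is an assumption.

```python
import pandas as pd


def flag_low_quality(
    obs: pd.DataFrame,
    max_pct_mito: float = 10.0,      # hypothetical threshold
    min_genes: int = 200,            # hypothetical threshold
    max_doublet_score: float = 0.25  # hypothetical threshold
) -> pd.Series:
    """Return a boolean Series that is True for cells failing any check."""
    return (
        (obs["pct_counts_mito"] > max_pct_mito)
        | (obs["n_genes"] < min_genes)
        | (obs["doublet_score"] > max_doublet_score)
    )


# Toy stand-in for adata.obs; the values are made up.
toy_obs = pd.DataFrame(
    {
        "n_genes": [1500, 120, 2200, 1800],
        "pct_counts_mito": [3.2, 4.0, 25.0, 5.1],
        "doublet_score": [0.05, 0.02, 0.10, 0.60],
    }
)
flags = flag_low_quality(toy_obs)
```

With a loaded file, the same function could be applied to adata.obs, and adata[~flags.to_numpy()] would keep only the unflagged cells.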

Cell Labeling Results
AIFI_L1: Final broad class cell type label (9 types)
AIFI_L1_score: AIFI_L1 prediction score generated by CellTypist
predicted_AIFI_L1: Predicted AIFI_L1 type assigned by CellTypist
AIFI_L2: Final mid resolution cell type label (29 types)
AIFI_L2_score: AIFI_L2 prediction score generated by CellTypist
predicted_AIFI_L2: Predicted AIFI_L2 type assigned by CellTypist
AIFI_L3: Final high resolution cell type label (71 types)
AIFI_L3_score: AIFI_L3 prediction score generated by CellTypist
predicted_AIFI_L3: Predicted AIFI_L3 type assigned by CellTypist
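Because each level stores both a final label and a CellTypist prediction, one simple check is how often the two agree. The sketch below assumes the final label can differ from the prediction and runs on a toy DataFrame standing in for .obs. The cell type names and scores in the toy data are made up, and the function name label_agreement is hypothetical.

```python
import pandas as pd


def label_agreement(obs: pd.DataFrame, level: str = "AIFI_L1") -> float:
    """Fraction of cells whose final label equals the CellTypist prediction."""
    # Cast to str so categorical columns with different category sets compare cleanly.
    final = obs[level].astype(str)
    predicted = obs[f"predicted_{level}"].astype(str)
    return float((final == predicted).mean())


# Toy stand-in for adata.obs; labels and scores are illustrative only.
toy_obs = pd.DataFrame(
    {
        "AIFI_L1": ["T cell", "T cell", "B cell", "Monocyte"],
        "predicted_AIFI_L1": ["T cell", "NK cell", "B cell", "Monocyte"],
        "AIFI_L1_score": [0.99, 0.41, 0.97, 0.95],
    }
)
agreement = label_agreement(toy_obs)
```

The same call with level="AIFI_L2" or level="AIFI_L3" would work on a table that contains those columns.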

We are providing our scRNA-seq data in AnnData (.h5ad) format. For more details about AnnData, see the AnnData Documentation Page.

These files contain both normalized high-variance genes and raw count data. Normalized data is the active layer by default. In Python, the raw counts can be accessed using:

adata = adata.raw.to_adata()

Each file provided below contains either the complete Immune Health Atlas of ~1.8 million cells, or a subset of cell types. Sample counts, cell counts, and approximate file sizes are below:

File Name	N Subjects	N Samples	N Cells	File Size
immune_health_atlas_full.h5ad	108	108	1,821,725	40 GB
immune_health_atlas_b-plasma.h5ad	108	108	160,632	3.4 GB
immune_health_atlas_cd4t-treg-dnt.h5ad	108	108	743,615	16 GB
immune_health_atlas_cd8t-gdt-mait.h5ad	108	108	406,730	8.9 GB
immune_health_atlas_dc.h5ad	108	108	23,287	1 GB
immune_health_atlas_mono.h5ad	108	108	327,919	11 GB
immune_health_atlas_nk-ilc.h5ad	108	108	148,605	3.5 GB
immune_health_atlas_other.h5ad	108	108	10,937	0.15 GB
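The cell counts in the table indicate that the subset files partition the full atlas: the seven subset counts sum exactly to the full count. The short check below reproduces that arithmetic and computes each subset's share of the atlas. The dictionary keys are shorthand taken from the file names.

```python
# Cell counts copied from the table above.
FULL_ATLAS_CELLS = 1_821_725
SUBSET_CELLS = {
    "b-plasma": 160_632,
    "cd4t-treg-dnt": 743_615,
    "cd8t-gdt-mait": 406_730,
    "dc": 23_287,
    "mono": 327_919,
    "nk-ilc": 148_605,
    "other": 10_937,
}

# Total number of cells across all subset files.
subset_total = sum(SUBSET_CELLS.values())

# Percent of the full atlas contributed by each subset, rounded to one decimal.
subset_percent = {
    name: round(100 * n_cells / FULL_ATLAS_CELLS, 1)
    for name, n_cells in SUBSET_CELLS.items()
}
```

If only one cell class is of interest, downloading the matching subset avoids handling the 40 GB full file.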

Human Immune Health Atlas .h5ad Files

File Name	Description	Download Link
human_immune_health_atlas_b-plasma.h5ad	B cells and Plasma cells
human_immune_health_atlas_cd4t-treg-dnt.h5ad	CD4 T cells, Tregs, and DN T cells
human_immune_health_atlas_cd8t-gdt-mait.h5ad	CD8 T cells, gdT cells, and MAIT cells
human_immune_health_atlas_dc.h5ad	Dendritic cells
human_immune_health_atlas_full.h5ad	All cells in the Immune Health Atlas
human_immune_health_atlas_mono.h5ad	Monocytes
human_immune_health_atlas_nk-ilc.h5ad	NK cells and ILCs
human_immune_health_atlas_other.h5ad	Other cell types

2025-06-10: .h5ad files were updated to replace the sample.drawDate field with a sample.drawYear field to better align with PII standards. Files were also reprocessed so that raw data is stored as unsigned integers and normalized data is stored after applying Scanpy's normalize_total function. This reduced the file size of the full dataset (from 72 GB to 40 GB) and ensures that subsets are consistently processed.

scRNA-seq Batch Controls and QC Reports

As part of our scRNA-seq pipeline, we include a batch control sample, which is an aliquot of PBMCs derived from a single leukapheresis draw. Here, we provide the scRNA-seq data for these controls in .h5 format so that they can be compared across batches to identify batch effects.

Immune Health Atlas Batch Controls

File Name	Description	Download Link
immune_health_atlas_batch-control-h5.tar	Tar bundle of .h5 files for batch controls
immune_health_atlas_batch-reports-html.tar	Tar bundle of .html files for batch QC reports
