scRNA-seq Data
scRNA-seq data was generated on the 10x Genomics 3' scRNA-seq Platform (v3.1). For data collection and processing details, see the Cohorts, Experimental Methods, and Data Analysis Methods sections.
Below, we provide labeled and annotated PBMC scRNA-seq data from our Immune Health Atlas. The full set of all ~1.8 million cells is provided, as well as subsets based on major cell classes.
Each file contains sample-level metadata, as well as cell-level cell type labels and QC metrics. The following values are stored in the .obs section of these .h5ad files as descriptions of observations:
Sample Identifiers
- cohort.cohortGuid: A Globally Unique Identifier (GUID) of the Cohort the subject enrolled in for our study
- subject.subjectGuid: A GUID for the Subject
- sample.sampleKitGuid: A GUID for the Sample Kit, representing all material collected at a visit
- specimen.specimenGuid: A GUID for the specific aliquot used for the experiment
- pipeline.fileGuid: A GUID for the specific analysis pipeline output file used for analysis

Subject Metadata
- subject.biologicalSex: The biological sex of the Subject
- subject.birthYear: The Birth Year of the Subject
- subject.ageAtFirstDraw: The Age of the Subject at their first on-study sample collection
- subject.ageGroup: The Age Group of the Subject (Young Adult or Older Adult)
- subject.race: The self-reported Race of the Subject
- subject.ethnicity: The self-reported Ethnicity of the Subject
- subject.cmv: The CMV Status of the Subject, as determined by an HCMV assay
- subject.bmi: The BMI of the Subject

Sample Metadata
- sample.visitName: The name of the study visit (i.e. time point)
- sample.drawYear: The year of the study visit (e.g. 2021)
- sample.subjectAgeAtDraw: The approximate age of the Subject in years at time of sample collection, based on subject.birthYear and sample.drawYear

Process Identifiers
- batch_id: A GUID for the batch of samples processed together (e.g. B039)
- pool_id: A GUID for the pool of samples combined for Cell Hashing (e.g. B039-P1)
- chip_id: A GUID for the 10x Genomics chip the cells were loaded into (e.g. B039-P1C2)
- well_id: A GUID for the 10x Genomics well the cells were loaded into within the chip (e.g. B039-P1C2W4)
- *barcodes: A GUID for the individual cell
- original_barcodes: The original, sequence-based barcode generated by 10x Genomics Cell Ranger software
- cell_name: A quasi-unique, memorable cell identifier generated using an adjective-adjective-animal structure

*used as the primary cell index in our .h5ad files

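The example identifiers above suggest that each process identifier extends the one before it (batch, then pool, then chip, then well). As a minimal sketch under that assumption, the parent identifiers can be recovered from a well_id. The regular expression and the `split_well_id` helper are illustrative, inferred only from the four examples, and are not a documented specification of the identifier format.

```python
import re
from typing import NamedTuple

# Pattern inferred from the examples B039, B039-P1, B039-P1C2, B039-P1C2W4.
# The number of digits in each part is an assumption.
WELL_ID_PATTERN = re.compile(
    r"^(?P<batch>B\d+)-(?P<pool>P\d+)(?P<chip>C\d+)(?P<well>W\d+)$"
)


class ProcessIds(NamedTuple):
    batch_id: str
    pool_id: str
    chip_id: str
    well_id: str


def split_well_id(well_id: str) -> ProcessIds:
    """Derive batch_id, pool_id, and chip_id from a well_id string."""
    match = WELL_ID_PATTERN.match(well_id)
    if match is None:
        raise ValueError(f"Unrecognized well_id format: {well_id!r}")
    batch = match["batch"]
    pool = f"{batch}-{match['pool']}"
    chip = f"{pool}{match['chip']}"
    return ProcessIds(batch, pool, chip, well_id)


ids = split_well_id("B039-P1C2W4")
print(ids.batch_id, ids.pool_id, ids.chip_id, ids.well_id)
```

In practice the batch_id, pool_id, and chip_id columns are already stored in .obs, so a helper like this is mainly useful as a consistency check.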
Cell QC Metrics
- n_reads: Number of reads assigned to the cell barcode
- n_umis: Number of Unique Molecular Identifiers (unique molecules) detected
- n_genes: Number of genes with at least 1 UMI detected
- total_counts_mito: Total number of reads that were assigned to mitochondrial genes
- pct_counts_mito: Percent of reads that were assigned to mitochondrial genes
- doublet_score: Doublet score assigned by Scrublet for doublet detection

Cell Labeling Results
- AIFI_L1: Final broad class cell type label (9 types)
- AIFI_L1_score: AIFI_L1 prediction score generated by CellTypist
- predicted_AIFI_L1: Predicted AIFI_L1 type assigned by CellTypist
- AIFI_L2: Final mid resolution cell type label (29 types)
- AIFI_L2_score: AIFI_L2 prediction score generated by CellTypist
- predicted_AIFI_L2: Predicted AIFI_L2 type assigned by CellTypist
- AIFI_L3: Final high resolution cell type label (71 types)
- AIFI_L3_score: AIFI_L3 prediction score generated by CellTypist
- predicted_AIFI_L3: Predicted AIFI_L3 type assigned by CellTypist

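Because each level stores both a final label and a CellTypist prediction, one natural check is how often the two agree. The sketch below is illustrative only: `label_agreement` is a hypothetical helper, the label strings are made up for the demonstration, and it assumes (which the text does not state) that the final and predicted labels can differ for some cells. It accepts any two equal-length iterables, so columns such as `adata.obs["AIFI_L1"]` and `adata.obs["predicted_AIFI_L1"]` could be passed directly.

```python
from collections import Counter
from typing import Iterable, Tuple


def label_agreement(
    final: Iterable[str], predicted: Iterable[str]
) -> Tuple[float, Counter]:
    """Return the fraction of cells whose final label equals the prediction,
    plus counts of each (predicted, final) pair that disagrees."""
    total = 0
    matches = 0
    changed = Counter()
    for final_label, predicted_label in zip(final, predicted):
        total += 1
        if final_label == predicted_label:
            matches += 1
        else:
            changed[(predicted_label, final_label)] += 1
    fraction = matches / total if total else 0.0
    return fraction, changed


# Hypothetical labels for illustration; real values would come from the
# AIFI_L1 and predicted_AIFI_L1 columns in .obs.
final = ["T cell", "T cell", "B cell", "Monocyte"]
predicted = ["T cell", "B cell", "B cell", "Monocyte"]

fraction, changed = label_agreement(final, predicted)
print(fraction)               # 0.75
print(changed.most_common())  # [(('B cell', 'T cell'), 1)]
```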
We are providing our scRNA-seq data in AnnData (.h5ad) format. For more details about AnnData, see the AnnData Documentation Page.
These files contain both normalized high-variance genes and raw count data. Normalized data is the active layer by default. In Python, the raw counts can be accessed using:
adata = adata.raw.to_adata()
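The one-line form above replaces the normalized object with the raw counts. A minimal sketch that keeps both available is shown here. The helper names `load_atlas` and `counts_and_normalized` are hypothetical, and the file path is a placeholder; only `anndata.read_h5ad` and `.raw.to_adata()` are actual AnnData calls.

```python
def load_atlas(path):
    """Read an .h5ad file into memory (requires the anndata package)."""
    # Imported here so the helper below can be used without anndata installed.
    import anndata as ad

    return ad.read_h5ad(path)


def counts_and_normalized(adata):
    """Return (raw_counts, normalized) as two separate objects.

    Binding the raw counts to a new name keeps the normalized object
    available, unlike reassigning `adata` itself.
    """
    if adata.raw is None:
        raise ValueError("This object has no .raw attribute; raw counts are unavailable.")
    return adata.raw.to_adata(), adata


# Usage (the path is a placeholder for wherever a downloaded file was saved):
# adata = load_atlas("path/to/downloaded_subset.h5ad")
# counts, normalized = counts_and_normalized(adata)
# print(counts.shape, normalized.shape)
```

Since the full atlas file is large, trying this on one of the smaller subset files first may be more practical.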
Each file provided below contains either the complete Immune Health Atlas of ~1.8 million cells, or a subset of cell types. Sample counts, cell counts, and approximate file sizes are below:
| File Name | N Subjects | N Samples | N Cells | File Size |
|---|---|---|---|---|
| immune_health_atlas_full.h5ad | 108 | 108 | 1,821,725 | 40 GB |
| immune_health_atlas_b-plasma.h5ad | 108 | 108 | 160,632 | 3.4 GB |
| immune_health_atlas_cd4t-treg-dnt.h5ad | 108 | 108 | 743,615 | 16 GB |
| immune_health_atlas_cd8t-gdt-mait.h5ad | 108 | 108 | 406,730 | 8.9 GB |
| immune_health_atlas_dc.h5ad | 108 | 108 | 23,287 | 1 GB |
| immune_health_atlas_mono.h5ad | 108 | 108 | 327,919 | 11 GB |
| immune_health_atlas_nk-ilc.h5ad | 108 | 108 | 148,605 | 3.5 GB |
| immune_health_atlas_other.h5ad | 108 | 108 | 10,937 | 0.15 GB |
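As a quick arithmetic check on the table above, the cell counts of the seven subset files add up to the cell count of the full file, which suggests that each cell appears in exactly one subset. The sketch below only restates numbers from the table; the dictionary keys are shorthand for the file name suffixes.

```python
# Cell counts copied from the table above (file name suffix -> N Cells).
SUBSET_CELLS = {
    "b-plasma": 160_632,
    "cd4t-treg-dnt": 743_615,
    "cd8t-gdt-mait": 406_730,
    "dc": 23_287,
    "mono": 327_919,
    "nk-ilc": 148_605,
    "other": 10_937,
}
FULL_CELLS = 1_821_725

subset_total = sum(SUBSET_CELLS.values())
print(subset_total == FULL_CELLS)  # True

# Share of the full atlas contributed by each subset, largest first.
for name, n_cells in sorted(SUBSET_CELLS.items(), key=lambda item: -item[1]):
    print(f"{name}: {100 * n_cells / FULL_CELLS:.1f}%")
```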
Human Immune Health Atlas .h5ad Files
| File Name | Description | Download Link |
|---|---|---|
| human_immune_health_atlas_b-plasma.h5ad | B cells and Plasma cells | |
| human_immune_health_atlas_cd4t-treg-dnt.h5ad | CD4 T cells, Tregs, and DN T cells | |
| human_immune_health_atlas_cd8t-gdt-mait.h5ad | CD8 T cells, gdT cells, and MAIT cells | |
| human_immune_health_atlas_dc.h5ad | Dendritic cells | |
| human_immune_health_atlas_full.h5ad | All cells in the Immune Health Atlas | |
| human_immune_health_atlas_mono.h5ad | Monocytes | |
| human_immune_health_atlas_nk-ilc.h5ad | NK cells and ILCs | |
| human_immune_health_atlas_other.h5ad | Other cells | |
2025-06-10: The .h5ad files were updated to replace the sample.drawDate field with a sample.drawYear field, to better align with PII standards. Files were also reprocessed so that raw data is stored as unsigned integers and normalized data is stored after applying Scanpy's normalize_total function. This reduced the file size of the full dataset (from 72 GB to 40 GB) and ensures that subsets are consistently processed.
scRNA-seq Batch Controls and QC Reports
As part of our scRNA-seq pipeline, we include a batch control sample: an aliquot of PBMCs derived from a single leukapheresis draw. Here, we provide the batch control scRNA-seq data in .h5 format for use in comparisons across batches to identify batch effects.
Immune Health Atlas Batch Controls
| File Name | Description | Download Link |
|---|---|---|
| immune_health_atlas_batch-control-h5.tar | Tar bundle of .h5 files for batch controls | |
| immune_health_atlas_batch-reports-html.tar | Tar bundle of .html files for batch QC reports | |