scRNA-seq Data
scRNA-seq data was generated on the 10x Genomics 3' scRNA-seq Platform (v3.1). For data collection and processing details, see the Cohorts, Experimental Methods, and Data Analysis Methods sections.
Below, we provide labeled and annotated PBMC scRNA-seq data from our Immune Health Atlas. The full set of all ~1.8 million cells is provided, as well as subsets based on major cell classes.
Each file contains sample-level metadata, as well as cell-level cell type labels and QC metrics. The following values are stored in the .obs section of these .h5ad files as descriptions of observations:
Sample Identifiers
cohort.cohortGuid: A Globally Unique Identifier (GUID) of the Cohort the subject enrolled in for our study
subject.subjectGuid: A GUID for the Subject
sample.sampleKitGuid: A GUID for the Sample Kit, representing all material collected at a visit
specimen.specimenGuid: A GUID for the specific aliquot used for the experiment
pipeline.fileGuid: A GUID for the specific analysis pipeline output file used for analysis
Subject Metadata
subject.biologicalSex: The biological sex of the Subject
subject.birthYear: The Birth Year of the Subject
subject.ageAtFirstDraw: The Age of the Subject at their first on-study sample collection
subject.ageGroup: The Age Group of the Subject (Young Adult or Older Adult)
subject.race: The self-reported Race of the Subject
subject.ethnicity: The self-reported Ethnicity of the Subject
subject.cmv: The CMV Status of the Subject, as determined by an HCMV assay
subject.bmi: The BMI of the Subject
Sample Metadata
sample.visitName: The name of the study visit (i.e. time point)
sample.drawDate: The date of the study visit (Month and Year; e.g. 2021-03)
sample.subjectAgeAtDraw: The age of the Subject in years at the time of sample collection
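Because sample.drawDate records only a year and month, date arithmetic on it needs a small parsing step. The following is a minimal sketch that assumes the value is a string in the YYYY-MM form shown in the example above. The helper names `parse_draw_date` and `approx_age_at_draw` are hypothetical, and since only subject.birthYear is available, the computed age can differ from sample.subjectAgeAtDraw by up to a year.

```python
from datetime import datetime


def parse_draw_date(draw_date: str) -> datetime:
    """Parse a 'YYYY-MM' string; the day defaults to the 1st of the month."""
    return datetime.strptime(draw_date, "%Y-%m")


def approx_age_at_draw(birth_year: int, draw_date: str) -> int:
    """Approximate age in years from a birth year and a 'YYYY-MM' draw date.

    Hypothetical helper: ignores birth month, which the metadata does not provide.
    """
    return parse_draw_date(draw_date).year - birth_year


draw = parse_draw_date("2021-03")
print(draw.year, draw.month)
print(approx_age_at_draw(1980, "2021-03"))  # birth year invented for illustration
```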
Process Identifiers
batch_id: A GUID for the batch of samples processed together (e.g. B039)
pool_id: A GUID for the pool of samples combined for Cell Hashing (e.g. B039-P1)
chip_id: A GUID for the 10x Genomics chip the cells were loaded into (e.g. B039-P1C2)
well_id: A GUID for the 10x Genomics well the cells were loaded into within the chip (e.g. B039-P1C2W4)
*barcodes: A GUID for the individual cell
original_barcodes: The original, sequence-based barcode generated by 10x Genomics Cell Ranger software
cell_name: A quasi-unique, memorable cell identifier generated using an adjective-adjective-animal structure
*Used as the primary cell index in our .h5ad files
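The example values above suggest that the process identifiers are nested: each well_id appears to embed its chip, pool, and batch. The sketch below splits a well_id on that basis. The pattern is inferred only from the four example IDs shown here, so treat it as an assumption and not a documented format; `split_well_id` is a hypothetical helper.

```python
import re

# Pattern inferred from the example IDs (e.g. B039-P1C2W4); an assumption,
# not a documented specification.
WELL_ID_PATTERN = re.compile(
    r"^(?P<batch>B\d+)-(?P<pool>P\d+)(?P<chip>C\d+)(?P<well>W\d+)$"
)


def split_well_id(well_id: str) -> dict:
    """Derive batch_id, pool_id, and chip_id from a well_id string."""
    match = WELL_ID_PATTERN.match(well_id)
    if match is None:
        raise ValueError(f"Unexpected well_id format: {well_id!r}")
    batch, pool, chip = match["batch"], match["pool"], match["chip"]
    return {
        "batch_id": batch,
        "pool_id": f"{batch}-{pool}",
        "chip_id": f"{batch}-{pool}{chip}",
        "well_id": well_id,
    }


print(split_well_id("B039-P1C2W4"))
```

Raising on an unexpected format, instead of returning partial results, makes it obvious if some IDs in the data do not follow the inferred pattern.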
Cell QC Metrics
n_reads: Number of reads assigned to the cell barcode
n_umis: Number of Unique Molecular Identifiers (unique molecules) detected
n_genes: Number of genes with at least 1 UMI detected
total_counts_mito: Total number of reads that were assigned to mitochondrial genes
pct_counts_mito: Percent of reads that were assigned to mitochondrial genes
doublet_score: Doublet score assigned by Scrublet for doublet detection
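These QC columns can be combined into a boolean mask when stricter filtering is wanted for a particular analysis. The sketch below assumes pandas is installed and uses a small invented table in place of the real .obs. The thresholds are hypothetical placeholders, not the settings used to build the atlas.

```python
import pandas as pd

# Toy stand-in for adata.obs; all values are invented for illustration.
obs = pd.DataFrame(
    {
        "n_genes": [1200, 150, 2500],
        "pct_counts_mito": [3.5, 22.0, 6.1],
        "doublet_score": [0.05, 0.10, 0.62],
    },
    index=["cell_1", "cell_2", "cell_3"],
)

# Hypothetical thresholds; choose values appropriate to your own analysis.
MIN_GENES = 200
MAX_PCT_MITO = 10.0
MAX_DOUBLET_SCORE = 0.5

passes_qc = (
    (obs["n_genes"] >= MIN_GENES)
    & (obs["pct_counts_mito"] <= MAX_PCT_MITO)
    & (obs["doublet_score"] <= MAX_DOUBLET_SCORE)
)

# Names of the cells that satisfy all three conditions.
print(obs[passes_qc].index.tolist())
```

With an AnnData object, the same mask could be applied as `adata[passes_qc.to_numpy()]`, provided the mask is computed from `adata.obs` itself.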
Cell Labeling Results
AIFI_L1: Final broad class cell type label (9 types)
AIFI_L1_score: AIFI_L1 prediction score generated by CellTypist
predicted_AIFI_L1: Predicted AIFI_L1 type assigned by CellTypist
AIFI_L2: Final mid resolution cell type label (29 types)
AIFI_L2_score: AIFI_L2 prediction score generated by CellTypist
predicted_AIFI_L2: Predicted AIFI_L2 type assigned by CellTypist
AIFI_L3: Final high resolution cell type label (71 types)
AIFI_L3_score: AIFI_L3 prediction score generated by CellTypist
predicted_AIFI_L3: Predicted AIFI_L3 type assigned by CellTypist
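Since each level stores both a final label and the CellTypist prediction, the two can be compared to find cells whose final label differs from the prediction. The sketch below assumes pandas is installed and uses an invented table. The label strings are placeholders and may not match the exact values in the files.

```python
import pandas as pd

# Toy stand-in for adata.obs; labels and scores are invented placeholders.
obs = pd.DataFrame(
    {
        "AIFI_L1": ["B cell", "T cell", "Monocyte", "T cell"],
        "predicted_AIFI_L1": ["B cell", "T cell", "Monocyte", "NK cell"],
        "AIFI_L1_score": [0.99, 0.97, 0.95, 0.41],
    },
    index=["cell_1", "cell_2", "cell_3", "cell_4"],
)

# Cast to str first: categorical columns with different category sets
# cannot be compared directly in pandas.
agrees = obs["AIFI_L1"].astype(str) == obs["predicted_AIFI_L1"].astype(str)

print(f"Fraction of cells where final and predicted labels agree: {agrees.mean():.2f}")
print(obs.loc[~agrees])
```

The same comparison applies to the AIFI_L2 and AIFI_L3 column triplets by swapping the column names.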
We are providing our scRNA-seq data in AnnData (.h5ad) format. For more details about AnnData, see the AnnData Documentation Page.
These files contain both normalized expression values for high-variance genes and raw count data. The normalized data is the active matrix by default, and the raw counts are stored in .raw. In Python, the raw counts can be retrieved as an AnnData object using:
adata = adata.raw.to_adata()
Each file provided below contains either the complete Immune Health Atlas of ~1.8 million cells, or a subset of cell types. Subject counts, sample counts, cell counts, and approximate file sizes are listed below:
File Name | N Subjects | N Samples | N Cells | File Size |
---|---|---|---|---|
immune_health_atlas_full.h5ad | 108 | 108 | 1,821,725 | 72 GB |
immune_health_atlas_b-plasma.h5ad | 108 | 108 | 160,632 | 2.9 GB |
immune_health_atlas_cd4t-treg-dnt.h5ad | 108 | 108 | 743,615 | 14 GB |
immune_health_atlas_cd8t-gdt-mait.h5ad | 108 | 108 | 406,730 | 8.6 GB |
immune_health_atlas_dc.h5ad | 108 | 108 | 23,287 | 0.8 GB |
immune_health_atlas_mono.h5ad | 108 | 108 | 327,919 | 8.1 GB |
immune_health_atlas_nk-ilc.h5ad | 108 | 108 | 148,605 | 3 GB |
immune_health_atlas_other.h5ad | 108 | 108 | 10,937 | 0.4 GB |
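As a quick consistency check on the table above, the cell counts of the seven subset files add up to the cell count of the full file, which suggests the subsets do not overlap and together cover the whole atlas. The counts in this sketch are copied from the table; the dictionary keys are shortened file names.

```python
# Cell counts copied from the table above, keyed by the file name suffix.
subset_cells = {
    "b-plasma": 160_632,
    "cd4t-treg-dnt": 743_615,
    "cd8t-gdt-mait": 406_730,
    "dc": 23_287,
    "mono": 327_919,
    "nk-ilc": 148_605,
    "other": 10_937,
}
full_cells = 1_821_725

total = sum(subset_cells.values())
print(total, total == full_cells)
```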
Immune Health Atlas .h5ad Files
File Name | Description | Download Link |
---|---|---|
immune_health_atlas_b-plasma.h5ad | B cells and Plasma cells | |
immune_health_atlas_cd4t-treg-dnt.h5ad | CD4 T cells, Tregs, and DN T cells | |
immune_health_atlas_cd8t-gdt-mait.h5ad | CD8 T cells, gdT cells, and MAIT cells | |
immune_health_atlas_dc.h5ad | Dendritic cells | |
immune_health_atlas_full.h5ad | All cells in the Immune Health Atlas | |
immune_health_atlas_mono.h5ad | Monocytes | |
immune_health_atlas_nk-ilc.h5ad | NK cells and ILCs | |
immune_health_atlas_other.h5ad | Other cell types | |
scRNA-seq Batch Controls and QC Reports
As part of our scRNA-seq pipeline, we include a batch control sample: an aliquot of PBMCs derived from a single leukapheresis draw. Here, we provide the batch control scRNA-seq data in .h5 format so that batches can be compared to identify batch effects.
Immune Health Atlas Batch Controls
File Name | Description | Download Link |
---|---|---|
immune_health_atlas_batch-control-h5.tar | Tar bundle of .h5 files for batch controls | |
immune_health_atlas_batch-reports-html.tar | Tar bundle of .html files for batch QC reports | |
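Before unpacking a tar bundle, it can help to list what it contains. The sketch below uses only the Python standard library and runs against a small toy bundle built on the fly. The member names in the demo are invented, since the internal layout of the real bundles is not described here, and `list_h5_members` is a hypothetical helper.

```python
import io
import tarfile
import tempfile
from pathlib import Path


def list_h5_members(tar_path):
    """Return the sorted names of .h5 files inside a tar bundle."""
    with tarfile.open(tar_path) as bundle:
        return sorted(
            member.name
            for member in bundle.getmembers()
            if member.isfile() and member.name.endswith(".h5")
        )


# Demo on a toy bundle; file names are invented, not the real bundle contents.
with tempfile.TemporaryDirectory() as tmp:
    toy_tar = Path(tmp) / "toy_batch_controls.tar"
    with tarfile.open(toy_tar, "w") as bundle:
        for name in ["B001_control.h5", "B002_control.h5", "README.txt"]:
            payload = b"placeholder"
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            bundle.addfile(info, io.BytesIO(payload))
    h5_names = list_h5_members(toy_tar)

print(h5_names)
```

For the downloaded bundle, the call would be `list_h5_members("immune_health_atlas_batch-control-h5.tar")`, assuming the file sits in the working directory.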