CellTypist Labeling Models
To enable you to label your own PBMC datasets, we supply models for the CellTypist cell labeling framework. Below are links for each of the 3 cell type resolutions generated for our Healthy Immune Atlas:
Level 1 (AIFI_L1): 9 broad cell classes
Level 2 (AIFI_L2): 29 intermediate resolution cell types
Level 3 (AIFI_L3): 71 high resolution cell types
Here, we provide models that use the full set of 10x Genomics 3' scRNA-seq features, as well as models for use with data generated using the 10x Genomics Single Cell Gene Expression Flex scRNA-seq system, which uses a subset of the genes available for 3' scRNA-seq.
Immune Health Atlas 10x 3' CellTypist Models
File Name | Description | Download Link |
---|---|---|
ref_pbmc_clean_celltypist_model_AIFI_L1_2024-04-18.pkl | Broad cell class labeling model for CellTypist | |
ref_pbmc_clean_celltypist_model_AIFI_L2_2024-04-19.pkl | Intermediate cell type labeling model for CellTypist | |
ref_pbmc_clean_celltypist_model_AIFI_L3_2024-04-19.pkl | High resolution cell type labeling model for CellTypist |
Immune Health Atlas 10x Flex CellTypist Models
File Name | Description | Download Link |
---|---|---|
ref_pbmc_clean_celltypist_model_flex-features_AIFI_L1_2024-04-18.pkl | | |
ref_pbmc_clean_celltypist_model_flex-features_AIFI_L2_2024-04-19.pkl | | |
ref_pbmc_clean_celltypist_model_flex-features_AIFI_L3_2024-04-19.pkl | | |
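As a rough illustration of how one of these models might be applied, the sketch below looks up a model file by label level and feature set, then passes it to `celltypist.annotate`. It makes several assumptions that are not stated above: the .pkl files have been downloaded to a local directory (here called `models`), the input AnnData holds raw counts, and the helper names `model_path` and `label_cells` and the keys `"3prime"` and `"flex"` are hypothetical. CellTypist expects log1p-transformed expression normalized to 10,000 counts per cell, so the sketch normalizes a copy of the input first.

```python
from pathlib import Path

# File names copied from the tables above, keyed by (feature set, label level).
# The keys "3prime" and "flex" are labels chosen for this sketch.
MODEL_FILES = {
    ("3prime", "AIFI_L1"): "ref_pbmc_clean_celltypist_model_AIFI_L1_2024-04-18.pkl",
    ("3prime", "AIFI_L2"): "ref_pbmc_clean_celltypist_model_AIFI_L2_2024-04-19.pkl",
    ("3prime", "AIFI_L3"): "ref_pbmc_clean_celltypist_model_AIFI_L3_2024-04-19.pkl",
    ("flex", "AIFI_L1"): "ref_pbmc_clean_celltypist_model_flex-features_AIFI_L1_2024-04-18.pkl",
    ("flex", "AIFI_L2"): "ref_pbmc_clean_celltypist_model_flex-features_AIFI_L2_2024-04-19.pkl",
    ("flex", "AIFI_L3"): "ref_pbmc_clean_celltypist_model_flex-features_AIFI_L3_2024-04-19.pkl",
}


def model_path(level, features="3prime", model_dir="models"):
    """Return the local path of the model for a label level and feature set."""
    try:
        name = MODEL_FILES[(features, level)]
    except KeyError:
        raise ValueError(f"No model for features={features!r}, level={level!r}") from None
    return Path(model_dir) / name


def label_cells(adata, level="AIFI_L1", features="3prime", model_dir="models"):
    """Predict labels for an AnnData object assumed to hold raw counts."""
    # Third-party imports are kept local so the lookup helper works without them.
    import celltypist
    import scanpy as sc

    adata = adata.copy()  # leave the caller's object untouched (costs memory)
    sc.pp.normalize_total(adata, target_sum=1e4)  # 10,000 counts per cell
    sc.pp.log1p(adata)
    result = celltypist.annotate(adata, model=str(model_path(level, features, model_dir)))
    return result.predicted_labels  # pandas DataFrame indexed by cell
```

For Flex data, the same call with `features="flex"` would select the reduced-feature model instead.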
Cell Type Colorset
In addition to our labeling models, we provide the colorsets that we used to generate our scRNA-seq visualizations. These colors were selected in collaboration with Heidi Gustafson, an artist and pigment foraging expert. More information about Heidi's work can be found at https://earlyfutures.com/.
Immune Health Atlas Colorsets
File Name | Description | Download Link |
---|---|---|
AIFI_L1_imm_health_atlas_type_order_colors.csv | | |
AIFI_L2_imm_health_atlas_type_order_colors.csv | | |
AIFI_L3_imm_health_atlas_type_order_colors.csv | | |
AIFI_imm_health_atlas_type_order_colors.csv | | |
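One way to use a colorset file is to turn it into a lookup from cell type label to color. The column layout of these .csv files is not described here, so the sketch below takes the label and color column names as arguments. The helper name `read_color_map`, the column names, and the rows in the demonstration are placeholders, not the real contents of the files.

```python
import csv
import io


def read_color_map(csv_file, label_column, color_column):
    """Build a {label: color} dict from an open CSV file object."""
    reader = csv.DictReader(csv_file)
    missing = {label_column, color_column} - set(reader.fieldnames or [])
    if missing:
        raise KeyError(f"Columns not found in CSV header: {sorted(missing)}")
    return {row[label_column]: row[color_column] for row in reader}


# Demonstration with placeholder data. Inspect the real header first and
# substitute the actual column names.
example_csv = io.StringIO(
    "example_label,example_color\n"
    "example_type_1,#000000\n"
    "example_type_2,#ffffff\n"
)
colors = read_color_map(example_csv, "example_label", "example_color")
print(colors)
```

With a downloaded file, the same function can be called on `open(path, newline="")`, and the resulting dict used as a category-to-color lookup in plotting code.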
scRNA-seq Data
scRNA-seq data was generated on the 10x Genomics 3' scRNA-seq Platform (v3.1). For data collection and processing details, see the Cohorts, Experimental Methods, and Data Analysis Methods sections.
Below, we provide labeled and annotated PBMC scRNA-seq data from our Immune Health Atlas. The full set of all ~1.8 million cells is provided, as well as subsets based on major cell classes.
Each file contains sample-level metadata, as well as cell-level cell type labels and QC metrics. The following values are stored in the .obs section of these .h5ad files as descriptions of observations:
Sample Identifiers
cohort.cohortGuid: A Globally Unique Identifier (GUID) of the Cohort the subject enrolled in for our study
subject.subjectGuid: A GUID for the Subject
sample.sampleKitGuid: A GUID for the Sample Kit, representing all material collected at a visit
specimen.specimenGuid: A GUID for the specific aliquot used for the experiment
pipeline.fileGuid: A GUID for the specific analysis pipeline output file used for analysis
Subject Metadata
subject.biologicalSex: The biological sex of the Subject
subject.birthYear: The Birth Year of the Subject
subject.ageAtFirstDraw: The Age of the Subject at their first on-study sample collection
subject.ageGroup: The Age Group of the Subject (Young Adult or Older Adult)
subject.race: The self-reported Race of the Subject
subject.ethnicity: The self-reported Ethnicity of the Subject
subject.cmv: The CMV Status of the Subject, as determined by an HCMV assay
subject.bmi: The BMI of the Subject
Sample Metadata
sample.visitName: The name of the study visit (i.e. time point)
sample.drawDate: The date of the study visit (Month and Year; e.g. 2021-03)
sample.subjectAgeAtDraw: The age of the Subject in years at the time of sample collection
Process Identifiers
batch_id: A GUID for the batch of samples processed together (e.g. B039)
pool_id: A GUID for the pool of samples combined for Cell Hashing (e.g. B039-P1)
chip_id: A GUID for the 10x Genomics chip the cells were loaded into (e.g. B039-P1C2)
well_id: A GUID for the 10x Genomics well the cells were loaded into within the chip (e.g. B039-P1C2W4)
*barcodes: A GUID for the individual cell
original_barcodes: The original, sequence-based barcode generated by 10x Genomics Cell Ranger software
cell_name: A quasi-unique, memorable cell identifier generated using an adjective-adjective-animal structure
*Used as the primary cell index in our .h5ad files
Cell QC Metrics
n_reads: Number of reads assigned to the cell barcode
n_umis: Number of Unique Molecular Identifiers (unique molecules) detected
n_genes: Number of genes with at least 1 UMI detected
total_counts_mito: Total number of reads that were assigned to mitochondrial genes
pct_counts_mito: Percent of reads that were assigned to mitochondrial genes
doublet_score: Doublet score assigned by Scrublet for doublet detection
Cell Labeling Results
AIFI_L1: Final broad class cell type label (9 types)
AIFI_L1_score: AIFI_L1 prediction score generated by CellTypist
predicted_AIFI_L1: Predicted AIFI_L1 type assigned by CellTypist
AIFI_L2: Final mid resolution cell type label (29 types)
AIFI_L2_score: AIFI_L2 prediction score generated by CellTypist
predicted_AIFI_L2: Predicted AIFI_L2 type assigned by CellTypist
AIFI_L3: Final high resolution cell type label (71 types)
AIFI_L3_score: AIFI_L3 prediction score generated by CellTypist
predicted_AIFI_L3: Predicted AIFI_L3 type assigned by CellTypist
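The example values given for the process identifiers (B039, B039-P1, B039-P1C2, B039-P1C2W4) suggest that each identifier extends the one before it. Assuming that pattern holds for every well_id, which the descriptions above do not state explicitly, the batch, pool, and chip identifiers could be recovered from a well_id as in this sketch. The function name `split_well_id` and the regular expression are illustrative.

```python
import re

# Pattern inferred from the single example "B039-P1C2W4"; other formats
# may exist in the data and would be rejected by this sketch.
WELL_ID_PATTERN = re.compile(
    r"^(?P<batch>B\d+)-(?P<pool>P\d+)(?P<chip>C\d+)(?P<well>W\d+)$"
)


def split_well_id(well_id):
    """Split a well_id into the nested batch, pool, chip, and well identifiers."""
    match = WELL_ID_PATTERN.match(well_id)
    if match is None:
        raise ValueError(f"Unexpected well_id format: {well_id!r}")
    batch_id = match["batch"]
    pool_id = f"{batch_id}-{match['pool']}"
    chip_id = pool_id + match["chip"]
    return {
        "batch_id": batch_id,
        "pool_id": pool_id,
        "chip_id": chip_id,
        "well_id": well_id,
    }


print(split_well_id("B039-P1C2W4"))
# {'batch_id': 'B039', 'pool_id': 'B039-P1', 'chip_id': 'B039-P1C2', 'well_id': 'B039-P1C2W4'}
```

A check like this can be useful for confirming that the identifier columns in .obs agree with each other before grouping cells by batch or well.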
We are providing our scRNA-seq data in AnnData (.h5ad) format. For more details about AnnData, see the AnnData Documentation Page.
These files contain both normalized high-variance genes and raw count data. Normalized data is the active layer by default. In Python, the raw counts can be accessed using:
adata = adata.raw.to_adata()
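Note that the line above rebinds `adata`, so the normalized values are no longer reachable through that name. If both versions are needed, they can be kept under separate names, as in the following sketch. The helper names `raw_counts` and `load_normalized_and_raw` are hypothetical, and the file is read fully into memory, so it is probably sensible to start with one of the smaller subset files.

```python
def raw_counts(adata):
    """Return the raw counts as a new AnnData without touching the input."""
    if adata.raw is None:
        raise ValueError("This AnnData object has no .raw attribute")
    return adata.raw.to_adata()


def load_normalized_and_raw(path):
    """Read an .h5ad file and return (normalized, raw) AnnData objects."""
    import anndata as ad  # third-party; imported here so raw_counts has no dependencies

    normalized = ad.read_h5ad(path)  # loads the whole file into memory
    return normalized, raw_counts(normalized)


# Example call (the path is a placeholder for a downloaded file):
# normalized, raw = load_normalized_and_raw("immune_health_atlas_other.h5ad")
```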
Each file provided below contains either the complete Immune Health Atlas of ~1.8 million cells, or a subset of cell types. Sample counts, cell counts, and approximate file sizes are below:
File Name | N Subjects | N Samples | N Cells | File Size |
---|---|---|---|---|
immune_health_atlas_full.h5ad | 108 | 108 | 1,821,725 | 72 GB |
immune_health_atlas_b-plasma.h5ad | 108 | 108 | 160,632 | 2.9 GB |
immune_health_atlas_cd4t-treg-dnt.h5ad | 108 | 108 | 743,615 | 14 GB |
immune_health_atlas_cd8t-gdt-mait.h5ad | 108 | 108 | 406,730 | 8.6 GB |
immune_health_atlas_dc.h5ad | 108 | 108 | 23,287 | 0.8 GB |
immune_health_atlas_mono.h5ad | 108 | 108 | 327,919 | 8.1 GB |
immune_health_atlas_nk-ilc.h5ad | 108 | 108 | 148,605 | 3 GB |
immune_health_atlas_other.h5ad | 108 | 108 | 10,937 | 0.4 GB |
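The cell counts in the table are consistent with the subset files partitioning the full atlas: the seven subset counts sum to exactly 1,821,725. The short check below reproduces that arithmetic from the numbers listed above and prints each subset's share of the total.

```python
# Cell counts copied from the table above.
SUBSET_CELLS = {
    "immune_health_atlas_b-plasma.h5ad": 160_632,
    "immune_health_atlas_cd4t-treg-dnt.h5ad": 743_615,
    "immune_health_atlas_cd8t-gdt-mait.h5ad": 406_730,
    "immune_health_atlas_dc.h5ad": 23_287,
    "immune_health_atlas_mono.h5ad": 327_919,
    "immune_health_atlas_nk-ilc.h5ad": 148_605,
    "immune_health_atlas_other.h5ad": 10_937,
}
FULL_ATLAS_CELLS = 1_821_725

total = sum(SUBSET_CELLS.values())
# The subsets add up to the full atlas, so no cell is counted twice or dropped.
assert total == FULL_ATLAS_CELLS

# Report subsets from largest to smallest with their fraction of all cells.
for name, n_cells in sorted(SUBSET_CELLS.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {n_cells:,} cells ({n_cells / total:.1%})")
```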
Immune Health Atlas .h5ad Files
File Name | Description | Download Link |
---|---|---|
immune_health_atlas_b-plasma.h5ad | B cells and Plasma cells | |
immune_health_atlas_cd4t-treg-dnt.h5ad | CD4 T cells, Tregs, and DN T cells | |
immune_health_atlas_cd8t-gdt-mait.h5ad | CD8 T cells, gdT cells, and MAIT cells | |
immune_health_atlas_dc.h5ad | Dendritic cells | |
immune_health_atlas_full.h5ad | All cells in the Immune Health Atlas | |
immune_health_atlas_mono.h5ad | Monocytes | |
immune_health_atlas_nk-ilc.h5ad | NK cells and ILCs | |
immune_health_atlas_other.h5ad | Other cell types |
scRNA-seq Batch Controls and QC Reports
As part of our scRNA-seq pipeline, we include a batch control sample, which is an aliquot of PBMCs derived from a single leukapheresis draw. Here, we provide the batch control scRNA-seq data in .h5 format so that batches can be compared to identify batch effects.
Immune Health Atlas Batch Controls
File Name | Description | Download Link |
---|---|---|
immune_health_atlas_batch-control-h5.tar | Tar bundle of .h5 files for batch controls | |
immune_health_atlas_batch-reports-html.tar | Tar bundle of .html files for batch QC reports |
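After downloading, the batch control bundle needs to be unpacked before the .h5 files can be read. A minimal sketch using only the Python standard library is shown below. The internal layout of the tar file is not documented here, so the sketch copies every regular member ending in `.h5` into one output directory, keeping only the base file name. The function name `extract_h5_files` and the output directory are illustrative.

```python
import shutil
import tarfile
from pathlib import Path


def extract_h5_files(tar_path, out_dir):
    """Copy every .h5 member of a tar bundle into out_dir and return the paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with tarfile.open(tar_path) as tar:
        for member in tar:
            if not (member.isfile() and member.name.endswith(".h5")):
                continue
            # Use only the base name so archive paths cannot escape out_dir.
            # Members sharing a base name would overwrite one another.
            target = out_dir / Path(member.name).name
            with tar.extractfile(member) as src, open(target, "wb") as dst:
                shutil.copyfileobj(src, dst)
            written.append(target)
    return sorted(written)


# Example call (paths are placeholders):
# files = extract_h5_files("immune_health_atlas_batch-control-h5.tar", "batch_controls")
```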
Clinical Lab Results and Sample Metadata
Along with a blood draw for PBMCs, several clinical lab panels are performed for each subject visit, including Complete Blood Counts (CBCs), blood chemistries, and a cholesterol panel.
Here, we provide a .csv file with subject and sample metadata along with the results of these clinical measures.
Immune Health Atlas Clinical Labs and Metadata
File Name | Description | Download Link |
---|---|---|
immune_health_atlas_metadata_clinical_labs.csv | Subject and sample metadata and clinical lab results |