CellTypist Labeling Models

To help you label your own PBMC datasets, we supply models for the CellTypist cell labeling framework. Below are links to models for each of the three cell type resolutions generated for our Immune Health Atlas:

Level 1 (AIFI_L1): 9 broad cell classes
Level 2 (AIFI_L2): 29 intermediate resolution cell types
Level 3 (AIFI_L3): 71 high resolution cell types

Here, we provide models that use the full set of 10x Genomics 3' scRNA-seq features, as well as models for use with data generated using the 10x Genomics Single Cell Gene Expression Flex scRNA-seq system, which uses a subset of the genes available for 3' scRNA-seq.

Immune Health Atlas 10x 3' CellTypist Models
File Name | Description | Download Link
ref_pbmc_clean_celltypist_model_AIFI_L1_2024-04-18.pkl | Broad cell class labeling model for CellTypist
ref_pbmc_clean_celltypist_model_AIFI_L2_2024-04-19.pkl | Intermediate cell type labeling model for CellTypist
ref_pbmc_clean_celltypist_model_AIFI_L3_2024-04-19.pkl | High resolution cell type labeling model for CellTypist

Immune Health Atlas 10x Flex CellTypist Models
File Name | Description | Download Link
ref_pbmc_clean_celltypist_model_flex-features_AIFI_L1_2024-04-18.pkl
ref_pbmc_clean_celltypist_model_flex-features_AIFI_L2_2024-04-19.pkl
ref_pbmc_clean_celltypist_model_flex-features_AIFI_L3_2024-04-19.pkl
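
As a rough illustration of how one of these models might be applied, the sketch below assumes that the model files have been downloaded to a local directory (the `models` default is a placeholder), that the `celltypist` and `scanpy` packages are installed, and that the input AnnData object holds raw counts. The helper names `model_filename` and `label_cells` are ours and are not part of CellTypist. The normalization step follows CellTypist's general input expectation; we have not verified it against how these specific models were trained.

```python
from pathlib import Path

# Release dates taken from the file names listed in the tables above.
MODEL_DATES = {
    "AIFI_L1": "2024-04-18",
    "AIFI_L2": "2024-04-19",
    "AIFI_L3": "2024-04-19",
}


def model_filename(level, flex=False):
    """Build the model file name for a label level (AIFI_L1, AIFI_L2, AIFI_L3)."""
    features = "flex-features_" if flex else ""
    return f"ref_pbmc_clean_celltypist_model_{features}{level}_{MODEL_DATES[level]}.pkl"


def label_cells(adata, level="AIFI_L1", flex=False, model_dir="models"):
    """Return a copy of `adata` with a predicted_<level> column in .obs.

    Assumes `adata.X` contains raw counts and that the model file has
    already been downloaded into `model_dir` (a placeholder location).
    """
    # Imported here so that model_filename() works without these packages.
    import celltypist
    import scanpy as sc

    adata = adata.copy()
    # CellTypist expects log1p-transformed data normalized to 10,000 counts per cell.
    sc.pp.normalize_total(adata, target_sum=1e4)
    sc.pp.log1p(adata)

    model_path = Path(model_dir) / model_filename(level, flex=flex)
    model = celltypist.models.Model.load(model=str(model_path))
    predictions = celltypist.annotate(adata, model=model, majority_voting=False)

    # predicted_labels is a DataFrame indexed by cell barcode.
    adata.obs[f"predicted_{level}"] = predictions.predicted_labels["predicted_labels"]
    return adata
```

For data generated with the Flex system, passing `flex=True` selects the corresponding `flex-features` model file.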

Cell Type Colorset

In addition to our labeling models, we provide the colorsets that we used to generate our scRNA-seq visualizations. These colors were selected in collaboration with Heidi Gustafson, an artist and pigment foraging expert. More information about Heidi's work can be found at https://earlyfutures.com/.

Immune Health Atlas Colorsets
File Name | Description | Download Link
AIFI_L1_imm_health_atlas_type_order_colors.csv
AIFI_L2_imm_health_atlas_type_order_colors.csv
AIFI_L3_imm_health_atlas_type_order_colors.csv
AIFI_imm_health_atlas_type_order_colors.csv
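
To reuse these colors in your own plots, a colorset file can be read into a label-to-color mapping. The sketch below uses only the Python standard library. The column layout of the colorset files is not described here, so the column names are passed in as arguments, and the names in the example call are placeholders that should be replaced after checking the CSV header.

```python
import csv


def read_colorset(path, label_col, color_col):
    """Map each cell type label in a colorset CSV to its color string.

    label_col and color_col must match the CSV header; this function does
    not assume any particular column names.
    """
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle)
        available = reader.fieldnames or []
        missing = {label_col, color_col} - set(available)
        if missing:
            raise KeyError(
                f"columns not found: {sorted(missing)}; available: {available}"
            )
        return {row[label_col]: row[color_col] for row in reader}


# Example call with placeholder (hypothetical) column names:
# colors = read_colorset(
#     "AIFI_L1_imm_health_atlas_type_order_colors.csv", "AIFI_L1", "AIFI_L1_color"
# )
```

The resulting dictionary can be passed to most plotting libraries wherever a per-category palette is accepted.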

scRNA-seq Data

scRNA-seq data was generated on the 10x Genomics 3' scRNA-seq Platform (v3.1). For data collection and processing details, see the Cohorts, Experimental Methods, and Data Analysis Methods sections.

Below, we provide labeled and annotated PBMC scRNA-seq data from our Immune Health Atlas. The full set of all ~1.8 million cells is provided, as well as subsets based on major cell classes.

Each file contains sample-level metadata, as well as cell-level cell type labels and QC metrics. The following values are stored in the .obs section of these .h5ad files as descriptions of observations:

Sample Identifiers
cohort.cohortGuid: A Globally Unique Identifier (GUID) of the Cohort the subject enrolled in for our study
subject.subjectGuid: A GUID for the Subject
sample.sampleKitGuid: A GUID for the Sample Kit, representing all material collected at a visit
specimen.specimenGuid: A GUID for the specific aliquot used for the experiment
pipeline.fileGuid: A GUID for the specific analysis pipeline output file used for analysis

Subject Metadata
subject.biologicalSex: The biological sex of the Subject
subject.birthYear: The Birth Year of the Subject
subject.ageAtFirstDraw: The Age of the Subject at their first on-study sample collection
subject.ageGroup: The Age Group of the Subject (Young Adult or Older Adult)
subject.race: The self-reported Race of the Subject
subject.ethnicity: The self-reported Ethnicity of the subject
subject.cmv: The CMV Status of the subject, as determined by an HCMV assay
subject.bmi: The BMI of the Subject

Sample Metadata
sample.visitName: The name of the study visit (i.e. time point)
sample.drawDate: The date of the study visit (Month and Year; e.g. 2021-03)
sample.subjectAgeAtDraw: The age of the Subject in years at the time of sample collection

Process Identifiers
batch_id: A GUID for the batch of samples processed together (e.g. B039)
pool_id: A GUID for the pool of samples combined for Cell Hashing (e.g. B039-P1)
chip_id: A GUID for the 10x Genomics chip the cells were loaded into (e.g. B039-P1C2)
well_id: A GUID for the 10x Genomics well the cells were loaded into within the chip (e.g. B039-P1C2W4)
*barcodes: A GUID for the individual cell
original_barcodes: The original, sequence-based barcode generated by 10x Genomics Cell Ranger software
cell_name: A quasi-unique, memorable cell identifier generated using an adjective-adjective-animal structure

*used as the primary cell index in our .h5ad files

Cell QC Metrics
n_reads: Number of reads assigned to the cell barcode
n_umis: Number of Unique Molecular Identifiers (unique molecules) detected
n_genes: Number of genes with at least 1 UMI detected
total_counts_mito: Total number of reads that were assigned to mitochondrial genes
pct_counts_mito: Percent of reads that were assigned to mitochondrial genes
doublet_score: Doublet score assigned by Scrublet for doublet detection

Cell Labeling Results
AIFI_L1: Final broad class cell type label (9 types)
AIFI_L1_score: AIFI_L1 prediction score generated by CellTypist
predicted_AIFI_L1: Predicted AIFI_L1 type assigned by CellTypist
AIFI_L2: Final mid resolution cell type label (29 types)
AIFI_L2_score: AIFI_L2 prediction score generated by CellTypist
predicted_AIFI_L2: Predicted AIFI_L2 type assigned by CellTypist
AIFI_L3: Final high resolution cell type label (71 types)
AIFI_L3_score: AIFI_L3 prediction score generated by CellTypist
predicted_AIFI_L3: Predicted AIFI_L3 type assigned by CellTypist
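
Because each level stores both a final label and the label predicted by CellTypist, one quick check is how often the two agree for each cell type. The following is a minimal sketch using only the standard library; `label_agreement` is a hypothetical helper that accepts any two equal-length sequences, such as two columns of `adata.obs`.

```python
from collections import Counter


def label_agreement(final_labels, predicted_labels):
    """Per final label, the fraction of cells whose predicted label matches it."""
    totals = Counter()
    matches = Counter()
    for final, predicted in zip(final_labels, predicted_labels):
        totals[final] += 1
        matches[final] += int(final == predicted)
    return {label: matches[label] / totals[label] for label in totals}


# Example with columns from a loaded atlas file (adata is assumed to exist):
# agreement = label_agreement(adata.obs["AIFI_L1"], adata.obs["predicted_AIFI_L1"])
```

A low fraction for a given type indicates that many of its final labels differ from the CellTypist prediction.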

We are providing our scRNA-seq data in AnnData (.h5ad) format. For more details about AnnData, see the AnnData Documentation Page.

These files contain both normalized high-variance genes and raw count data. Normalized data is the active layer by default. In Python, the raw counts can be accessed using:

adata = adata.raw.to_adata()
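
Note that the assignment above replaces the normalized object with the raw counts. If both are needed, they can be kept side by side. The sketch below assumes the `anndata` package is installed and that a file such as `immune_health_atlas_dc.h5ad` has been downloaded to the working directory; `split_layers` and `load_atlas` are hypothetical helper names.

```python
def split_layers(adata):
    """Return (normalized, raw) AnnData objects from one atlas object.

    The normalized object is `adata` itself; the raw object is built from
    the .raw slot and holds the raw count data.
    """
    if adata.raw is None:
        raise ValueError("this AnnData object has no .raw slot")
    return adata, adata.raw.to_adata()


def load_atlas(path):
    """Read an .h5ad file and return (normalized, raw) AnnData objects."""
    import anndata as ad  # imported here so split_layers() works without it

    return split_layers(ad.read_h5ad(path))


# Example (file name from the table below; local path is an assumption):
# normalized, raw = load_atlas("immune_health_atlas_dc.h5ad")
```

Starting with one of the smaller subset files is a reasonable way to test a workflow before loading the full atlas.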

Each file provided below contains either the complete Immune Health Atlas of ~1.8 million cells, or a subset of cell types. Sample counts, cell counts, and approximate file sizes are below:

File Name | N Subjects | N Samples | N Cells | File Size
immune_health_atlas_full.h5ad | 108 | 108 | 1,821,725 | 72 GB
immune_health_atlas_b-plasma.h5ad | 108 | 108 | 160,632 | 2.9 GB
immune_health_atlas_cd4t-treg-dnt.h5ad | 108 | 108 | 743,615 | 14 GB
immune_health_atlas_cd8t-gdt-mait.h5ad | 108 | 108 | 406,730 | 8.6 GB
immune_health_atlas_dc.h5ad | 108 | 108 | 23,287 | 0.8 GB
immune_health_atlas_mono.h5ad | 108 | 108 | 327,919 | 8.1 GB
immune_health_atlas_nk-ilc.h5ad | 108 | 108 | 148,605 | 3 GB
immune_health_atlas_other.h5ad | 108 | 108 | 10,937 | 0.4 GB

Immune Health Atlas .h5ad Files
File Name | Description | Download Link
immune_health_atlas_b-plasma.h5ad | B cells and Plasma cells
immune_health_atlas_cd4t-treg-dnt.h5ad | CD4 T cells, Tregs, and DN T cells
immune_health_atlas_cd8t-gdt-mait.h5ad | CD8 T cells, gdT cells, and MAIT cells
immune_health_atlas_dc.h5ad | Dendritic cells
immune_health_atlas_full.h5ad | All cells in the Immune Health Atlas
immune_health_atlas_mono.h5ad | Monocytes
immune_health_atlas_nk-ilc.h5ad | NK cells and ILCs
immune_health_atlas_other.h5ad | Other cell types

scRNA-seq Batch Controls and QC Reports

As part of our scRNA-seq pipeline, we include a batch control sample, which is an aliquot of PBMCs derived from a single leukapheresis draw. Here, we provide the batch control scRNA-seq data in .h5 format so that batches can be compared to identify batch effects.

Immune Health Atlas Batch Controls
File Name | Description | Download Link
immune_health_atlas_batch-control-h5.tar | Tar bundle of .h5 files for batch controls
immune_health_atlas_batch-reports-html.tar | Tar bundle of .html files for batch QC reports
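
Both bundles are plain tar archives, so they can be unpacked with standard tools. As one option, the sketch below uses Python's standard `tarfile` module; `extract_bundle` is a hypothetical helper, and the output directory name in the example is a placeholder.

```python
import tarfile
from pathlib import Path


def extract_bundle(tar_path, out_dir):
    """Unpack a tar bundle into out_dir and return the sorted file names inside."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    with tarfile.open(tar_path) as bundle:
        names = [member.name for member in bundle.getmembers() if member.isfile()]
        # Use the safer "data" extraction filter when this Python version supports it.
        kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
        bundle.extractall(out_dir, **kwargs)
    return sorted(names)


# Example (assumes the bundle was downloaded to the working directory):
# files = extract_bundle("immune_health_atlas_batch-control-h5.tar", "batch_controls")
```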

Clinical Lab Results and Sample Metadata

Along with a blood draw for PBMCs, several clinical lab panels are performed for each subject visit, including Complete Blood Counts (CBCs), blood chemistries, and a cholesterol panel.

Here, we provide a .csv file with subject and sample metadata along with the results of these clinical measures.

Immune Health Atlas Clinical Labs and Metadata
File Name | Description | Download Link
immune_health_atlas_metadata_clinical_labs.csv | Subject and sample metadata and clinical lab results