scRNA-seq Data

Sound Life scRNA-seq Data

scRNA-seq data from the Sound Life cohort was generated on the 10x Genomics 3' scRNA-seq Platform (v3.1). For data collection and processing details, see the Cohorts, Experimental Methods, and Data Analysis Methods sections.

Below, we provide labeled and annotated PBMC scRNA-seq data from this healthy adult cohort. We provide the full dataset, as well as data for each major class of cell types.

All .h5ad files for this project contain sample and subject metadata, as well as cell-level cell type labels and QC metrics. These values are stored in the .obs section of each .h5ad file (one row per cell) and are described below:

Sample Identifiers
cohort.cohortGuid: A Globally Unique Identifier (GUID) for the study Cohort in which the Subject enrolled
subject.subjectGuid: A GUID for the Subject
sample.sampleKitGuid: A GUID for the Sample Kit, representing all material collected at a visit
specimen.specimenGuid: A GUID for the specific aliquot used for the experiment
pipeline.fileGuid: A GUID for the specific analysis pipeline output file used for analysis

Subject Metadata
subject.biologicalSex: The biological sex of the Subject
subject.birthYear: The Birth Year of the Subject
subject.ageAtFirstDraw: The Age of the Subject at their first on-study sample collection
subject.ageGroup: The Age Group of the Subject (Young Adult or Older Adult)
subject.race: The self-reported Race of the Subject
subject.ethnicity: The self-reported Ethnicity of the subject
subject.cmv: The CMV Status of the subject, as determined by an HCMV assay
subject.bmi: The BMI of the Subject

Sample Metadata
sample.visitName: The name of the study visit (i.e. time point)
sample.drawYear: The year of the study visit (e.g. 2021)
sample.subjectAgeAtDraw: The age of the Subject in years at the time of sample collection

Process Identifiers
batch_id: An identifier for the batch of samples processed together (e.g. B039)
pool_id: An identifier for the pool of samples combined for Cell Hashing (e.g. B039-P1)
chip_id: An identifier for the 10x Genomics chip the cells were loaded into (e.g. B039-P1C2)
well_id: An identifier for the 10x Genomics well within the chip that the cells were loaded into (e.g. B039-P1C2W4)
*barcodes: A GUID for the individual cell
original_barcodes: The original, sequence-based barcode generated by 10x Genomics Cell Ranger software
cell_name: A quasi-unique, memorable cell identifier generated using an adjective-adjective-animal structure

*used as the primary cell index in our .h5ad files
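As the examples above suggest, the process identifiers are nested: a well_id begins with its chip_id, which begins with its pool_id, which begins with its batch_id. A minimal sketch for recovering the higher-level identifiers from a well_id, assuming every well_id follows the pattern of the examples shown (e.g. B039-P1C2W4); the regular expression and the helper name parse_well_id are hypothetical and are not part of the released files or tooling:

```python
import re
from typing import NamedTuple

# Layout inferred from the documented examples (batch "B039", pool "P1",
# chip "C2", well "W4"); any other well_id format is rejected.
WELL_ID_PATTERN = re.compile(r"^(B\d+)-(P\d+)(C\d+)(W\d+)$")


class ProcessIds(NamedTuple):
    batch_id: str
    pool_id: str
    chip_id: str
    well_id: str


def parse_well_id(well_id: str) -> ProcessIds:
    """Split a well_id into its nested batch, pool, and chip identifiers."""
    match = WELL_ID_PATTERN.match(well_id)
    if match is None:
        raise ValueError(f"Unrecognized well_id format: {well_id!r}")
    batch, pool, chip, well = match.groups()
    pool_id = f"{batch}-{pool}"   # e.g. B039-P1
    chip_id = f"{pool_id}{chip}"  # e.g. B039-P1C2
    return ProcessIds(batch, pool_id, chip_id, f"{chip_id}{well}")


if __name__ == "__main__":
    print(parse_well_id("B039-P1C2W4"))
    # ProcessIds(batch_id='B039', pool_id='B039-P1', chip_id='B039-P1C2', well_id='B039-P1C2W4')
```

Since batch_id, pool_id, and chip_id are already separate .obs columns, this is mainly useful as a consistency check.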

Cell QC Metrics
n_reads: Number of reads assigned to the cell barcode
n_umis: Number of Unique Molecular Identifiers (unique molecules) detected
n_genes: Number of genes with at least 1 UMI detected
total_counts_mito: Total number of reads that were assigned to mitochondrial genes
pct_counts_mito: Percent of reads that were assigned to mitochondrial genes
doublet_score: Doublet score assigned by Scrublet for doublet detection
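Because .obs is an ordinary pandas DataFrame, these QC metrics can be summarized per sample without touching the expression matrix. A minimal sketch, assuming specimen.specimenGuid is used to identify a sample; the helper name summarize_qc is hypothetical:

```python
import pandas as pd

# QC columns documented above; n_reads and total_counts_mito could be added.
QC_COLUMNS = ["n_umis", "n_genes", "pct_counts_mito", "doublet_score"]


def summarize_qc(obs: pd.DataFrame, sample_col: str = "specimen.specimenGuid") -> pd.DataFrame:
    """Median of each QC metric and the number of cells, one row per sample."""
    grouped = obs.groupby(sample_col, observed=True)
    summary = grouped[QC_COLUMNS].median()
    summary["n_cells"] = grouped.size()
    return summary
```

For a loaded file, summarize_qc(adata.obs) returns one row per specimen, which makes outlier samples easy to spot.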

Cell Labeling Results
AIFI_L1: Final broad class cell type label (9 types)
AIFI_L1_score: AIFI_L1 prediction score generated by CellTypist
predicted_AIFI_L1: Predicted AIFI_L1 type assigned by CellTypist
AIFI_L2: Final mid resolution cell type label (29 types)
AIFI_L2_score: AIFI_L2 prediction score generated by CellTypist
predicted_AIFI_L2: Predicted AIFI_L2 type assigned by CellTypist
AIFI_L3: Final high resolution cell type label (71 types)
AIFI_L3_score: AIFI_L3 prediction score generated by CellTypist
predicted_AIFI_L3: Predicted AIFI_L3 type assigned by CellTypist
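Each level stores both the final label and the CellTypist prediction with its score, so the two can be compared. A minimal pandas sketch that reports, for each final label, the number of cells, the fraction whose CellTypist prediction matches the final label, and the median prediction score. The helper name label_agreement is hypothetical; columns are cast to strings because pandas cannot compare categorical columns whose category sets differ:

```python
import pandas as pd


def label_agreement(obs: pd.DataFrame, level: str = "AIFI_L3") -> pd.DataFrame:
    """Per final label: cell count, fraction matching the CellTypist
    prediction, and median CellTypist prediction score."""
    final = obs[level].astype(str)
    predicted = obs[f"predicted_{level}"].astype(str)
    table = pd.DataFrame({
        "label": final,
        "matches_prediction": final == predicted,
        "score": obs[f"{level}_score"],
    })
    return table.groupby("label").agg(
        n_cells=("score", "size"),
        frac_matching=("matches_prediction", "mean"),
        median_score=("score", "median"),
    )
```

The same function works for AIFI_L1 and AIFI_L2 by changing level.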

Subject Group .h5ad files

Here, we group samples based on Cohort, Subject Biological Sex, and CMV status to generate subsets of data for use in analysis.

We are providing our scRNA-seq data in AnnData (.h5ad) format. For more details about AnnData, see the AnnData Documentation Page.

To reduce download size, normalized data are not provided. Normalization and log transformation can be performed using the scanpy package for Python with:
import scanpy
scanpy.pp.normalize_total(adata, target_sum=1e4)
scanpy.pp.log1p(adata)
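For reference, these two scanpy calls rescale each cell to 10,000 total counts and then apply a natural log(1 + x). A simplified numpy sketch of that arithmetic for a small dense cells × genes matrix follows; it is not scanpy's implementation (it ignores sparse input and options such as exclude_highly_expressed), and normalize_log1p is a hypothetical name:

```python
import numpy as np


def normalize_log1p(counts: np.ndarray, target_sum: float = 1e4) -> np.ndarray:
    """Scale each cell (row) to target_sum total counts, then apply log(1 + x)."""
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)
    # Leave all-zero cells as zeros instead of dividing by zero.
    scale = np.divide(target_sum, totals, out=np.zeros_like(totals), where=totals > 0)
    return np.log1p(counts * scale)


if __name__ == "__main__":
    print(normalize_log1p(np.array([[1, 3], [2, 2]])))
```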

Each file provided below contains a subset of the full dataset of more than 13 million cells. Subject counts, sample counts, cell counts, and approximate file sizes are below:

File Name	N Subjects	N Samples	N Cells	File Size
SoundLife_OlderAdult_Female_CMVneg.h5ad	10	91	1,350,748	25 GB
SoundLife_OlderAdult_Female_CMVpos.h5ad	17	163	2,596,111	49 GB
SoundLife_OlderAdult_Male_CMVneg.h5ad	12	116	1,750,565	34 GB
SoundLife_OlderAdult_Male_CMVpos.h5ad	8	80	1,322,061	25 GB
SoundLife_YoungAdult_Female_CMVneg.h5ad	18	162	2,616,824	49 GB
SoundLife_YoungAdult_Female_CMVpos.h5ad	10	76	1,234,234	16 GB
SoundLife_YoungAdult_Male_CMVneg.h5ad	12	107	1,712,244	34 GB
SoundLife_YoungAdult_Male_CMVpos.h5ad	9	73	1,206,761	16 GB

Sound Life Subject Group .h5ad Files

File Name	Description	Download Link
SoundLife_OlderAdult_Female_CMVneg.h5ad	Older Adult Female CMV- Subjects
SoundLife_OlderAdult_Female_CMVpos.h5ad	Older Adult Female CMV+ Subjects
SoundLife_OlderAdult_Male_CMVneg.h5ad	Older Adult Male CMV- Subjects
SoundLife_OlderAdult_Male_CMVpos.h5ad	Older Adult Male CMV+ Subjects
SoundLife_YoungAdult_Female_CMVneg.h5ad	Young Adult Female CMV- Subjects
SoundLife_YoungAdult_Female_CMVpos.h5ad	Young Adult Female CMV+ Subjects
SoundLife_YoungAdult_Male_CMVneg.h5ad	Young Adult Male CMV- Subjects
SoundLife_YoungAdult_Male_CMVpos.h5ad	Young Adult Male CMV+ Subjects

Cell population .h5ad files

Here, we group cells by major population category. These files contain cells from all samples.


Each file provided below contains a subset of the full dataset of more than 13 million cells. Cell counts and approximate file sizes are below:

File Name	N Cells	File Size
SoundLife_b_plasma.h5ad	1,205,085	11 GB
SoundLife_dc_monocyte.h5ad	2,643,674	58 GB
SoundLife_nk.h5ad	1,114,975	11 GB
SoundLife_other.h5ad	94,244	0.6 GB
SoundLife_t_cd4_memory.h5ad	2,640,499	40 GB
SoundLife_t_cd4_naive.h5ad	2,839,099	38 GB
SoundLife_t_cd8.h5ad	2,264,102	34 GB
SoundLife_t_other.h5ad	987,870	9 GB

Sound Life Cell Class .h5ad Files

File Name	Description	Download Link
sound_life_b_plasma.h5ad	B cells and Plasma cells
sound_life_dc_mono.h5ad	DCs and Monocytes
sound_life_nk.h5ad	NK cells and ILCs
sound_life_other.h5ad	Other cell types
sound_life_t_cd4_memory.h5ad	CD4 memory T cells
sound_life_t_cd4_naive.h5ad	Naive CD4 T cells
sound_life_t_cd8.h5ad	CD8 T cells
sound_life_t_other.h5ad	Other T cell types

Cell type frequency data

After labeling cell types, we tabulated the number of cells of each type in each of the 868 samples used in our study, at each level of resolution of our Immune Health Atlas. Below, we provide these cell counts, the fraction of cells of each type in each sample, and the centered log ratio (CLR) transformation of those fractions that we used in our analyses. CLR values are provided both with and without a pseudocount, which adjusts for cell types with no cells detected in a sample.

In addition, we used the absolute lymphocyte counts (ALC) from the clinical blood count lab results for each sample (available below) to estimate cell type abundance by normalizing to ALC. For more details, see our Data Analysis Methods.
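As an illustration of ALC normalization, the sketch below assumes that alc_ratio is bc.lymphocyte_count divided by scrna.lymphocyte_count and that each AIFI_L1_alc value is the cell type count multiplied by that ratio, using the column names from the frequency tables described below. The exact computation is given in our Data Analysis Methods; this version writes new *_check columns so the result can be compared with the provided values rather than replacing them. The helper name add_alc_estimates is hypothetical:

```python
import pandas as pd


def add_alc_estimates(freq: pd.DataFrame, level: str = "AIFI_L1") -> pd.DataFrame:
    """Scale each cell type count by clinical ALC per scRNA-seq lymphocyte
    (assumed formula); returns a copy with *_check columns added."""
    out = freq.copy()
    out["alc_ratio_check"] = out["bc.lymphocyte_count"] / out["scrna.lymphocyte_count"]
    out[f"{level}_alc_check"] = out[f"{level}_count"] * out["alc_ratio_check"]
    return out
```

If the assumption holds, alc_ratio_check and AIFI_L1_alc_check should agree with the provided alc_ratio and AIFI_L1_alc columns.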

Descriptions of the columns in the cell type frequency tables are listed below:

Sample Metadata columns
cohort.cohortGuid: Cohort ID (BR1 or BR2)
subject.subjectGuid: Subject ID
subject.biologicalSex: Subject Sex (Female or Male)
subject.cmv: Subject CMV Status (Negative or Positive)
subject.bmi: Subject BMI (integer)
subject.race: Subject race
subject.ethnicity: Subject ethnicity
subject.birthYear: Subject Birth Year
subject.ageAtFirstDraw: Subject Age at earliest blood draw in study
sample.sampleKitGuid: Sample Kit ID
sample.visitName: Sample Visit Name
sample.drawDate: Sample Draw Date (Year-Month)
sample.subjectAgeAtDraw: Subject age at time of draw, based on year of Draw Date and Birth Year
specimen.specimenGuid: Specimen ID (pbmc_sample_id in .h5 files)

Frequency-related columns
(for AIFI_L1 as an example; AIFI_L1 is replaced with AIFI_L2 and AIFI_L3 for those levels)

AIFI_L1: Cell Type assignment
AIFI_L1_count: Count of cells within this sample with cell type assignment
total_cells: Total cells within this sample
scrna.lymphocyte_count: Sum of T, NK, and B cells
bc.lymphocyte_count: Absolute Lymphocyte Count (ALC) from clinical Blood Counts (bc.)
alc_ratio: ALC per scRNA Lymphocyte Count
AIFI_L1_frac_total: Count of cells with this cell type assignment divided by Total cells for this sample
AIFI_L1_alc: ALC-normalized abundance estimate for this cell type assignment
AIFI_L1_clr: Centered Log Ratio computed using AIFI_L1_frac_total for all types within this sample
AIFI_L1_count_pseudo: Count of cells adjusted with a pseudocount (AIFI_L1_count + 1)
AIFI_L1_clr_pseudo: Centered Log Ratio computed using AIFI_L1_count_pseudo
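The CLR columns can be recomputed from the counts as a check. CLR is unchanged when all parts of a sample are rescaled by the same factor, so computing it from AIFI_L1_count or from AIFI_L1_frac_total gives the same values. The sketch below assumes the natural log, groups rows by sample.sampleKitGuid, and, to mirror the no-pseudocount columns, computes CLR over detected cell types only while leaving undetected types as NaN. These choices are assumptions, and the helper names are hypothetical:

```python
import numpy as np
import pandas as pd


def clr_detected(values: np.ndarray) -> np.ndarray:
    """Centered log ratio over nonzero parts; zero parts are returned as NaN."""
    values = np.asarray(values, dtype=float)
    out = np.full(values.shape, np.nan)
    detected = values > 0
    if detected.any():
        logs = np.log(values[detected])
        out[detected] = logs - logs.mean()
    return out


def add_clr_columns(freq: pd.DataFrame, level: str = "AIFI_L1",
                    sample_col: str = "sample.sampleKitGuid") -> pd.DataFrame:
    """Recompute CLR per sample from a long-format frequency table."""
    out = freq.copy()
    counts = out[f"{level}_count"].astype(float)
    by_sample = counts.groupby(out[sample_col])

    def per_sample(s: pd.Series, pseudocount: float) -> pd.Series:
        return pd.Series(clr_detected(s.to_numpy() + pseudocount), index=s.index)

    # Without a pseudocount, undetected cell types stay NaN.
    out[f"{level}_clr_check"] = by_sample.transform(per_sample, pseudocount=0.0)
    # With a pseudocount of 1, every cell type receives a finite value.
    out[f"{level}_clr_pseudo_check"] = by_sample.transform(per_sample, pseudocount=1.0)
    return out
```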

Sound Life scRNA-seq Cell Type Frequencies

File Name	Description	Download Link
sound_life_AIFI_L1_frequencies.csv	Broad classes
sound_life_AIFI_L2_frequencies.csv	Intermediate resolution cell types
sound_life_AIFI_L3_frequencies.csv	High resolution cell types

2025-05-23: Cell types that were not detected in a sample were previously excluded from our cell type frequency tables and CLR calculations. We now include all cell types for each sample, including those with no detected cells (recorded as 0). CLR calculations were updated to include pseudocount-adjusted values that account for these zeros. The original values without a pseudocount are also provided; for undetected cell types, these CLR values are missing (NA).

scRNA-seq Batch Controls and QC Reports

As part of our scRNA-seq pipeline, we include a batch control sample: an aliquot of PBMCs derived from a single leukapheresis draw. Here, we provide the batch control scRNA-seq data in .h5 format for comparisons across batches to identify batch effects.

In total, this dataset contains samples from 143 hashed sample pools processed in 99 scRNA-seq batches. For additional details about our multiplexing and batching approach, see our Multiplexed scRNA-seq methods.
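One generic way to use the control data, independent of the provided QC reports, is to build a pseudobulk profile of the control sample in each batch and correlate it with the median profile across batches; a batch whose control correlates poorly stands out for closer inspection. This sketch is not the method used to generate our reports. It assumes the control counts for each batch have already been loaded as cells × genes arrays sharing the same gene order (reading the .h5 files is not shown), and the helper names are hypothetical:

```python
from __future__ import annotations

import numpy as np


def pseudobulk_log_cpm(counts: np.ndarray) -> np.ndarray:
    """Sum a cells x genes count matrix over cells, then log1p of counts per million."""
    totals = np.asarray(counts, dtype=float).sum(axis=0)
    return np.log1p(totals / totals.sum() * 1e6)


def batch_correlations(controls: dict[str, np.ndarray]) -> dict[str, float]:
    """Pearson correlation of each batch's control pseudobulk profile with the
    gene-wise median profile across all batches."""
    profiles = {batch: pseudobulk_log_cpm(m) for batch, m in controls.items()}
    reference = np.median(np.vstack(list(profiles.values())), axis=0)
    return {batch: float(np.corrcoef(p, reference)[0, 1]) for batch, p in profiles.items()}
```

Correlation of pseudobulk profiles is a coarse check; differences in cell type composition of the control between batches would also lower it.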

Sound Life Batch Control and QC Report Files

File Name	Description	Download Link
sound_life_batch_control_h5.tar
sound_life_batch_report_html.tar

Immunobiology of Aging scRNA-seq Data

scRNA-seq data from the Immunobiology of Aging cohort was generated on the 10x Genomics Flex Gene Expression platform.

All .h5ad files for this project contain sample and subject metadata, in addition to cell type labels and QC metrics. Descriptions of these metadata are listed below:

Subject Metadata
subject.biologicalSex: The biological sex of the Subject
subject.ageAtFirstDraw: The Age of the Subject at their first on-study sample collection
subject.race: The self-reported Race of the Subject
subject.ethnicity: The self-reported Ethnicity of the subject
subject.cmv: The CMV Status of the subject, as determined by an HCMV assay

Sample Metadata
sample.visitName: The name of the study visit (i.e. time point)
sample.subjectAgeAtDraw: The age of the Subject in years at the time of sample collection


Cell Labeling Results
AIFI_L1: Final broad class cell type label (9 types)
predicted_AIFI_L1: Predicted AIFI_L1 type assigned by CellTypist
AIFI_L2: Final mid resolution cell type label (29 types)
predicted_AIFI_L2: Predicted AIFI_L2 type assigned by CellTypist
AIFI_L3: Final high resolution cell type label (71 types)
predicted_AIFI_L3: Predicted AIFI_L3 type assigned by CellTypist

Cell population .h5ad files

Here, we group cells by major population category. These files contain cells from all samples.

We are providing our scRNA-seq data in AnnData (.h5ad) format. For more details about AnnData, see the AnnData Documentation Page.

To reduce download size, the full dataset (imm-of-aging_all_cells.h5ad) does not include normalized data; only raw counts are provided. Normalized data can be obtained using the scanpy package for Python with:
import scanpy
scanpy.pp.normalize_total(adata, target_sum=1e4)
scanpy.pp.log1p(adata)

Cell type subset files contain both normalized data for highly variable genes and raw count data. The normalized data are the active matrix (.X) by default. In Python, the raw counts can be accessed using:

adata = adata.raw.to_adata()

Each file provided below contains the full set of ~3.8 million cells, or a subset for a major cell population category. Each file contains cells from the full set of 234 samples. Cell counts and approximate file sizes are below:

File Name	N Cells	File Size
imm-of-aging_all_cells.h5ad	3,758,514	87 GB
imm-of-aging_b-plasma_cells.h5ad	455,893	14 GB
imm-of-aging_cd4-memory-treg_cells.h5ad	854,753	30 GB
imm-of-aging_cd4-naive_cells.h5ad	770,809	25 GB
imm-of-aging_cd8-gdt-mait-dnt_cells.h5ad	717,559	25 GB
imm-of-aging_dc-monocyte_cells.h5ad	389,187	16 GB
imm-of-aging_nk-ilc_cells.h5ad	560,959	19 GB
imm-of-aging_other_cells.h5ad	9,354	0.25 GB

Immunobiology of Aging Fixed scRNA-seq .h5ad

File Name	Description	Download Link
imm-of-aging_all_cells.h5ad	All cells
imm-of-aging_b-plasma_cells.h5ad	B and Plasma cells
imm-of-aging_cd4-memory-treg_cells.h5ad	CD4 Memory and Treg cells
imm-of-aging_cd4-naive_cells.h5ad	Naive CD4 T cells
imm-of-aging_cd8-gdt-mait-dnt_cells.h5ad	CD8 T, gdT, MAIT, and dnT cells
imm-of-aging_dc-monocyte_cells.h5ad	Monocytes and Dendritic cells
imm-of-aging_nk-ilc_cells.h5ad	NK cells and ILCs
imm-of-aging_other_cells.h5ad	Other cell types

Cell type frequency data

After labeling cell types, we tabulated the number of cells of each type in each of the 234 Immunobiology of Aging samples used in our study, at each level of resolution of our Immune Health Atlas. Below, we provide these cell counts, the fraction of cells of each type in each sample, and the centered log ratio (CLR) transformation of those fractions that we used in our analyses. CLR values are provided both with and without a pseudocount, which adjusts for cell types with no cells detected in a sample.

For this cohort, we did not have matched absolute lymphocyte counts (ALC) available, so these samples lack the estimated cell type abundance that is available for the Sound Life dataset.

Descriptions of the columns in the cell type frequency tables are listed below:

Sample Metadata columns
cohort.cohortGuid: Cohort ID (SF4)
subject.subjectGuid: Subject ID
subject.biologicalSex: Subject Sex (Female or Male)
subject.cmv: Subject CMV Status (Negative or Positive)
subject.race: Subject race
subject.ethnicity: Subject ethnicity
subject.ageAtFirstDraw: Subject Age at earliest blood draw in study
sample.sampleKitGuid: Sample Kit ID
sample.visitName: Sample Visit Name
sample.subjectAgeAtDraw: Subject age at time of draw, based on year of Draw Date and Birth Year

Frequency-related columns
(for AIFI_L1 as an example; AIFI_L1 is replaced with AIFI_L2 and AIFI_L3 for those levels)

AIFI_L1: Cell Type assignment
AIFI_L1_count: Count of cells within this sample with cell type assignment
total_cells: Total cells within this sample
AIFI_L1_frac_total: Count of cells with this cell type assignment divided by Total cells for this sample
AIFI_L1_clr: Centered Log Ratio computed using AIFI_L1_frac_total for all types within this sample
AIFI_L1_count_pseudo: Count of cells adjusted with a pseudocount (AIFI_L1_count + 1)
AIFI_L1_clr_pseudo: Centered Log Ratio computed using AIFI_L1_count_pseudo

Immunobiology of Aging Fixed scRNA-seq Cell Type Frequencies

File Name	Description	Download Link
imm-of-aging_AIFI_L1_frequencies.csv	Broad classes
imm-of-aging_AIFI_L2_frequencies.csv	Intermediate resolution cell types
imm-of-aging_AIFI_L3_frequencies.csv	High resolution cell types
