Clinical lab results and Sample Metadata
At clinical visits at which subjects contributed blood samples, we also collected samples for a battery of clinical laboratory tests including Blood Chemistry, Blood Counts, Inflammatory Markers, and Lipid Profiles. These clinical laboratory results are available to download below.
Descriptors of the columns in our clinical lab results table can be found in the downloadable "clinical_lab_descriptors" table, and are listed below:
Subject and Sample Metadata
Our clinical lab results contain some sample-level metadata to enable connection between these results and samples used in other modalities. The sample.sampleKitGuid identifier is used across many data modalities:
Sample Identifiers
cohort.cohortGuid: A Globally Unique Identifier (GUID) of the Cohort the subject enrolled in for our study
subject.subjectGuid: A GUID for the Subject
sample.sampleKitGuid: A GUID for the Sample Kit, representing all material collected at a visit
sample.sampleGuid: A GUID for the sample collected for clinical lab testing
Subject Metadata (prefix: subject)
subject.biologicalSex: The biological sex of the Subject
subject.birthYear: The Birth Year of the Subject
subject.ageAtFirstDraw: The Age of the Subject at their first on-study sample collection
subject.race: The self-reported Race of the Subject
subject.races: The self-reported Race of the Subject
subject.ethnicity: The self-reported Ethnicity of the subject
Sample Metadata (prefix: sample)
sample.visitName: The name of the study visit (i.e. time point)
sample.visitDetails: Additional visit purpose/details, where available
sample.drawDate: The date of the study visit (Month and Year; e.g. 2021-03)
sample.subjectAgeAtDraw: The age of the Subject in years at the time of sample collection
sample.daysSinceFirstVisit: Timing of the sample collection relative to the first visit date (usually Flu Year 1 Day 0)
sample.diseaseStatesRecordedAtVisit: Disease state recorded at the date of sample collection
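As an illustration of how sample.sampleKitGuid can connect tables, here is a minimal sketch that joins clinical lab rows to rows from another modality. The GUID values, the second table, and its n_cells column are hypothetical stand-ins, not values from the actual files.

```python
import csv
import io

# Toy stand-ins for two tables. All values, and the "n_cells" column,
# are made up for illustration.
clinical_csv = """sample.sampleKitGuid,subject.subjectGuid,bc.lymphocyte_count
KT00001,BR1001,1850
KT00002,BR2002,2100
"""
other_csv = """sample.sampleKitGuid,n_cells
KT00001,15230
KT00003,14877
"""


def index_by_kit(text):
    """Map sample.sampleKitGuid -> row dict for one CSV table."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["sample.sampleKitGuid"]: row for row in reader}


clinical = index_by_kit(clinical_csv)
other = index_by_kit(other_csv)

# Inner join: keep only Sample Kits present in both tables.
joined = {
    kit: {**clinical[kit], **other[kit]}
    for kit in sorted(clinical.keys() & other.keys())
}
print(joined)
```

An inner join is used here so that only visits with data in both tables are kept; a left join on the clinical table may be preferable when missing modalities should remain visible.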
Clinical Lab Results
Data in the following columns of our table are results from clinical laboratory tests. Below, we show the column name, full name, units used, and normal ranges (where available). Normal ranges are provided as a guide, but may vary between clinical laboratories.
Anthropometric measures (prefix: am)
Column Name | Full Name | Units |
---|---|---|
am.bmi | Body Mass Index (BMI) | |
am.height | Height | cm |
am.weight | Weight | kg |
Blood chemistry (prefix: chem)
Column Name | Full Name | Units | Normal Range |
---|---|---|---|
chem.alt | Alanine Transaminase (ALT) | Units per L | 7 - 52 |
chem.albumin | Albumin | g per dL | 3.5 - 5.7 |
chem.alkaline_phosphatase | Alkaline Phosphatase | Units per L | 39 - 117 |
chem.ast | Aspartate Aminotransferase (AST) | Units per L | 12 - 39 |
chem.t_bili | Bilirubin, Total (T-Bili) | mg per dL | 0.1 - 1.3 |
chem.bun | Blood Urea Nitrogen (BUN) | mg per dL | 7 - 25 |
chem.calcium | Calcium | mg per dL | 8.6 - 10.3 |
chem.co2 | Carbon Dioxide (CO2) | mmol per L | 21 - 31 |
chem.cl | Chloride (Cl) | mmol per L | 98 - 108 |
chem.creatinine | Creatinine | mg per dL | 0.7 - 1.3 |
chem.egfr_aa | Estimated Glomerular Filtration Rate (eGFR) - African American | ml per min per 1.73 m2 | < 60 |
chem.egfr_non_aa | Estimated Glomerular Filtration Rate (eGFR) - Non-African American | ml per min per 1.73 m2 | < 60 |
chem.globin | Globulin | g per dL | 2 - 3.5 |
chem.glucose | Glucose | mg per dL | 70 - 199 |
chem.ldh | Lactate Dehydrogenase | Units | 0 - 210 |
chem.magnesium | Magnesium | mg per dL | 1.8 - 2.4 |
chem.phosphate | Phosphate | mg per dL | 2.5 - 4.5 |
chem.potassium | Potassium | mmol per L | 3.5 - 5.1 |
chem.protein | Protein, Total | g per dL | 6.4 - 8.9 |
chem.sodium | Sodium | mmol per L | 133 - 145 |
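The normal ranges in the table above can be used to flag values, keeping in mind that ranges may vary between clinical laboratories. The sketch below parses "low - high" range strings and labels a value as low, normal, or high. Treating the bounds as inclusive is an assumption, the example row is hypothetical, and one-sided entries such as "< 60" are not handled.

```python
def parse_range(text):
    """Parse a 'low - high' range string such as '7 - 52' into two floats."""
    low, high = (float(part) for part in text.split(" - "))
    return low, high


def flag_value(value, range_text):
    """Return 'low', 'normal', or 'high' (bounds treated as inclusive)."""
    low, high = parse_range(range_text)
    if value < low:
        return "low"
    if value > high:
        return "high"
    return "normal"


# Ranges copied from the blood chemistry table above.
NORMAL_RANGES = {
    "chem.alt": "7 - 52",
    "chem.albumin": "3.5 - 5.7",
    "chem.sodium": "133 - 145",
}

# Hypothetical result row for one sample.
row = {"chem.alt": 61.0, "chem.albumin": 4.2, "chem.sodium": 131.0}

flags = {column: flag_value(row[column], NORMAL_RANGES[column]) for column in row}
print(flags)
```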
Blood Counts (prefix: bc)
Column Name | Full Name | Units | Normal Range |
---|---|---|---|
bc.perc_basophils | % Basophils | percent | |
bc.perc_eosinophils | % Eosinophils | percent | |
bc.perc_lymphocytes | % Lymphocytes | percent | |
bc.perc_monocytes | % Monocytes | percent | |
bc.perc_neutrophils | % Neutrophils | percent | |
bc.basophil_count | Absolute Basophil Count | count per uL | 0 - 202 |
bc.eosinophil_count | Absolute Eosinophil Count (AEC) | count per uL | 0 - 400 |
bc.lymphocyte_count | Absolute Lymphocyte Count (ALC) | count per uL | 1000 - 4800 |
bc.monocyte_count | Absolute Monocyte Count (AMC) | count per uL | 200 - 900 |
bc.neutrophil_count | Absolute Neutrophil Count (ANC) | count per uL | 1800 - 6600 |
bc.hematocrit | Hematocrit | percent | 39.2 - 50.2 |
bc.hemoglobin | Hemoglobin | g per dL | 12.1 - 18.1 |
bc.mch | Mean Corpuscular Hemoglobin (MCH) | pg | 27.5 - 35.1 |
bc.mchc | Mean Corpuscular Hemoglobin Concentration (MCHC) | g per dL | 32 - 36 |
bc.mcv | Mean Corpuscular Volume (MCV) | fL | 80 - 100 |
bc.mpv | Mean Platelet Volume (MPV) | fL | 7.2 - 11.7 |
bc.platelet_count | Platelet Count | thousand count per uL | 150 - 400 |
bc.red_blood_cell_count | Red Blood Cell Count | million count per uL | 4.18 - 6.09 |
bc.rdw | Red Cell Distribution Width (RDW) | percent | 11.7 - 14.2 |
bc.wbc | White Blood Cell Count (WBC) | thousand count per uL | 4 - 11.1 |
HCMV Serology (prefix: cmv)
Column Name | Full Name |
---|---|
cmv.igg_serology | CMV IgG Serology Result (ELISA) |
cmv.igg_serology_interpretation | CMV IgG Serology Result Interpretation |
Inflammatory Markers (prefix: infl)
Column Name | Full Name |
---|---|
infl.anti_ccp3 | Anti-CCP3 |
infl.anti_ccp31 | Anti-CCP31 |
infl.hs_crp | C-Reactive Protein, High-Sensitivity (HS-CRP) |
infl.rf_iga_interpretation | RFIgA Interpretation |
infl.rf_iga_result | RFIgA Result |
infl.rf_igm_interpretation | RFIgM Interpretation |
infl.rf_igm_result | RFIgM Result |
infl.esr | SED Rate-Westergren (ESR) |
Lipid Profile (prefix: lip)
Column Name | Full Name | Units | Normal Range |
---|---|---|---|
lip.cholesterol_hdl | Cholesterol, HDL | mg per dL | 40 - 59 |
lip.cholesterol_ldl | Cholesterol, LDL | mg per dL | 57 - 129 |
lip.cholesterol_non_hdl | Cholesterol, Non-HDL | mg per dL | 0 - 189 |
lip.cholesterol_total | Cholesterol, Total | mg per dL | 136 - 239 |
lip.chlesterol_hdl_ratio | Cholesterol/HDL Ratio | ratio | |
lip.triglycerides | Triglycerides | mg per dL | 48 - 149 |
Sound Life Clinical Lab Results and Metadata
File Name | Description | Download Link |
---|---|---|
aifi_sound-life_clinical_lab_results.csv | ||
aifi_sound_life_clinical_lab_descriptors.csv |
scRNA-seq Data
scRNA-seq data was generated on the 10x Genomics 3' scRNA-seq Platform (v3.1). For data collection and processing details, see the Cohorts, Experimental Methods, and Data Analysis Methods sections.
Below, we provide labeled and annotated PBMC scRNA-seq data from our healthy adult cohorts. Because this is a large dataset, we have divided the data to enable data transfer.
All .h5ad files for this project contain sample and subject metadata, in addition to cell-level cell type labels and QC metrics. The following values are stored in the .obs section of these .h5ad files as descriptions of observations:
Sample Identifiers
cohort.cohortGuid: A Globally Unique Identifier (GUID) of the Cohort the subject enrolled in for our study
subject.subjectGuid: A GUID for the Subject
sample.sampleKitGuid: A GUID for the Sample Kit, representing all material collected at a visit
specimen.specimenGuid: A GUID for the specific aliquot used for the experiment
pipeline.fileGuid: A GUID for the specific analysis pipeline output file used for analysis
Subject Metadata
subject.biologicalSex: The biological sex of the Subject
subject.birthYear: The Birth Year of the Subject
subject.ageAtFirstDraw: The Age of the Subject at their first on-study sample collection
subject.ageGroup: The Age Group of the Subject (Young Adult or Older Adult)
subject.race: The self-reported Race of the Subject
subject.ethnicity: The self-reported Ethnicity of the subject
subject.cmv: The CMV Status of the subject, as determined by an HCMV assay
subject.bmi: The BMI of the Subject
Sample Metadata
sample.visitName: The name of the study visit (i.e. time point)
sample.drawDate: The date of the study visit (Month and Year; e.g. 2021-03)
sample.subjectAgeAtDraw: The age of the Subject in years at the time of sample collection
Process Identifiers
batch_id: A GUID for the batch of samples processed together (e.g. B039)
pool_id: A GUID for the pool of samples combined for Cell Hashing (e.g. B039-P1)
chip_id: A GUID for the 10x Genomics chip the cells were loaded into (e.g. B039-P1C2)
well_id: A GUID for the 10x Genomics well the cells were loaded into within the chip (e.g. B039-P1C2W4)
*barcodes: A GUID for the individual cell
original_barcodes: The original, sequence-based barcode generated by 10x Genomics Cell Ranger software
cell_name: A quasi-unique, memorable cell identifier generated using an adjective-adjective-animal structure
*used as the primary cell index in our .h5ad files
Cell QC Metrics
n_reads: Number of reads assigned to the cell barcode
n_umis: Number of Unique Molecular Identifiers (unique molecules) detected
n_genes: Number of genes with at least 1 UMI detected
total_counts_mito: Total number of reads that were assigned to mitochondrial genes
pct_counts_mito: Percent of reads that were assigned to mitochondrial genes
doublet_score: Doublet score assigned by Scrublet for doublet detection
Cell Labeling Results
AIFI_L1: Final broad class cell type label (9 types)
AIFI_L1_score: AIFI_L1 prediction score generated by CellTypist
predicted_AIFI_L1: Predicted AIFI_L1 type assigned by CellTypist
AIFI_L2: Final mid resolution cell type label (29 types)
AIFI_L2_score: AIFI_L2 prediction score generated by CellTypist
predicted_AIFI_L2: Predicted AIFI_L2 type assigned by CellTypist
AIFI_L3: Final high resolution cell type label (71 types)
AIFI_L3_score: AIFI_L3 prediction score generated by CellTypist
predicted_AIFI_L3: Predicted AIFI_L3 type assigned by CellTypist
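The examples given for the process identifiers suggest a nested naming scheme in which each well_id embeds its batch, pool, and chip. The sketch below splits a well_id on that basis. The pattern is inferred only from the examples above (B039, B039-P1, B039-P1C2, B039-P1C2W4) and is an assumption, not a documented specification.

```python
import re

# Assumed pattern, inferred from the example identifiers only.
WELL_ID_PATTERN = re.compile(
    r"^(?P<batch>B\d+)-(?P<pool>P\d+)(?P<chip>C\d+)(?P<well>W\d+)$"
)


def split_well_id(well_id):
    """Derive batch_id, pool_id, and chip_id from a well_id string."""
    match = WELL_ID_PATTERN.match(well_id)
    if match is None:
        raise ValueError(f"Unexpected well_id format: {well_id!r}")
    batch_id = match["batch"]
    pool_id = f"{batch_id}-{match['pool']}"
    chip_id = f"{pool_id}{match['chip']}"
    return {
        "batch_id": batch_id,
        "pool_id": pool_id,
        "chip_id": chip_id,
        "well_id": well_id,
    }


parts = split_well_id("B039-P1C2W4")
print(parts)
```

Because batch_id, pool_id, and chip_id are already stored as separate .obs columns, a helper like this is mainly useful as a consistency check.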
Subject Group .h5ad files
Here, we group samples based on Cohort, Subject Biological Sex, and CMV status to generate subsets of data for use in analysis.
We are providing our scRNA-seq data in AnnData (.h5ad) format. For more details about AnnData, see the AnnData Documentation Page.
Each file provided below contains a subset of the full > 13 million cell dataset. Sample counts, cell counts, and approximate file sizes are below:
File Name | N Subjects | N Samples | N Cells | File Size |
---|---|---|---|---|
SoundLife_OlderAdult_Female_CMVneg.h5ad | 10 | 91 | 1,350,748 | 21 GB |
SoundLife_OlderAdult_Female_CMVpos.h5ad | 17 | 163 | 2,596,111 | 41 GB |
SoundLife_OlderAdult_Male_CMVneg.h5ad | 12 | 116 | 1,750,565 | 29 GB |
SoundLife_OlderAdult_Male_CMVpos.h5ad | 8 | 80 | 1,322,061 | 21 GB |
SoundLife_YoungAdult_Female_CMVneg.h5ad | 18 | 162 | 2,616,824 | 41 GB |
SoundLife_YoungAdult_Female_CMVpos.h5ad | 10 | 76 | 1,234,234 | 12 GB |
SoundLife_YoungAdult_Male_CMVneg.h5ad | 12 | 107 | 1,712,244 | 28 GB |
SoundLife_YoungAdult_Male_CMVpos.h5ad | 9 | 73 | 1,206,761 | 12 GB |
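Given the file sizes listed above, it may help to avoid loading a whole file into memory. The sketch below assumes a recent version of the third-party anndata package: it opens a file in backed mode, selects cells by a label column, and loads only that subset. The function name and its arguments are hypothetical, and the label value passed in must match the labels actually stored in the file.

```python
def load_labeled_subset(path, label_column, label_value):
    """Load only the cells whose .obs[label_column] equals label_value.

    anndata is imported lazily so this sketch can be defined without it.
    """
    import anndata as ad  # third-party dependency (assumed installed)

    # Backed mode keeps the expression matrix on disk; .obs is read into memory.
    adata = ad.read_h5ad(path, backed="r")
    mask = (adata.obs[label_column] == label_value).to_numpy()
    # Subset the backed object, then pull only the selected cells into memory.
    return adata[mask].to_memory()
```

For example, `load_labeled_subset("SoundLife_YoungAdult_Male_CMVpos.h5ad", "AIFI_L1", some_label)` would return the cells with that AIFI_L1 label, where `some_label` stands for one of the 9 AIFI_L1 values in the file.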
Sound Life Subject Group .h5ad Files
File Name | Description | Download Link |
---|---|---|
Sound_Life_OlderAdult_Female_CMVneg.h5ad | Female CMV-negative Older Adult Subjects | |
Sound_Life_OlderAdult_Female_CMVpos.h5ad | Female CMV-positive Older Adult Subjects | |
Sound_Life_OlderAdult_Male_CMVneg.h5ad | Male CMV-negative Older Adult Subjects | |
Sound_Life_OlderAdult_Male_CMVpos.h5ad | Male CMV-positive Older Adult Subjects | |
Sound_Life_YoungAdult_Female_CMVneg.h5ad | Female CMV-negative Young Adult Subjects | |
Sound_Life_YoungAdult_Female_CMVpos.h5ad | Female CMV-positive Young Adult Subjects | |
Sound_Life_YoungAdult_Male_CMVneg.h5ad | Male CMV-negative Young Adult Subjects | |
Sound_Life_YoungAdult_Male_CMVpos.h5ad | Male CMV-positive Young Adult Subjects |
Cell type .h5ad files
Here, we group cells by major cell type category. These files contain cells from all samples.
We are providing our scRNA-seq data in AnnData (.h5ad) format. For more details about AnnData, see the AnnData Documentation Page.
Each file provided below contains a subset of the full > 13 million cell dataset. Cell counts and approximate file sizes are below:
File Name | N Cells | File Size |
---|---|---|
SoundLife_b_plasma.h5ad | 1,205,085 | 11 GB |
SoundLife_dc_monocyte.h5ad | 2,643,674 | 58 GB |
SoundLife_nk.h5ad | 1,114,975 | 11 GB |
SoundLife_other.h5ad | 94,244 | 0.6 GB |
SoundLife_t_cd4_memory.h5ad | 2,640,499 | 40 GB |
SoundLife_t_cd4_naive.h5ad | 2,839,099 | 38 GB |
SoundLife_t_cd8.h5ad | 2,264,102 | 34 GB |
SoundLife_t_other.h5ad | 987,870 | 9 GB |
Sound Life Cell Type Group .h5ad Files
File Name | Description | Download Link |
---|---|---|
SoundLife_b_plasma.h5ad | ||
SoundLife_dc_monocyte.h5ad | ||
SoundLife_nk.h5ad | ||
SoundLife_other.h5ad | ||
SoundLife_t_cd4_memory.h5ad | ||
SoundLife_t_cd4_naive.h5ad | ||
SoundLife_t_cd8.h5ad | ||
SoundLife_t_other.h5ad |
Cell type frequency data
After labeling cell types, we tabulated the cell count for each of the 868 samples utilized in our study at each level of resolution in our Immune Health Atlas. Below, we provide these cell counts, the fraction of counts for each sample, and the centered log ratio (CLR) transformation of those fractions that we utilized for our analyses.
In addition, we utilized the absolute lymphocyte counts (ALC) provided in our clinical blood count lab results for each sample (available below) to compute estimated cell type abundance based on normalization to ALC. For more details, see our Data Analysis Methods.
Descriptors of columns in the cell type frequency tables are listed below:
Sample Metadata columns
cohort.cohortGuid: Cohort ID (BR1 or BR2)
subject.subjectGuid: Subject ID
subject.biologicalSex: Subject Sex (Female or Male)
subject.cmv: Subject CMV Status (Negative or Positive)
subject.bmi: Subject BMI (integer)
subject.race: Subject race
subject.ethnicity: Subject ethnicity
subject.birthYear: Subject Birth Year
subject.ageAtFirstDraw: Subject Age at earliest blood draw in study
sample.sampleKitGuid: Sample Kit ID
sample.visitName: Sample Visit Name
sample.drawDate: Sample Draw Date (Year-Month)
sample.subjectAgeAtDraw: Subject age at time of draw, based on year of Draw Date and Birth Year
specimen.specimenGuid: Specimen ID (pbmc_sample_id in .h5 files)
Frequency-related columns (for AIFI_L1 as an example; AIFI_L1 is replaced with AIFI_L2 and AIFI_L3 for those levels)
AIFI_L1: Cell Type assignment
AIFI_L1_count: Count of cells within this sample with cell type assignment
total_cells: Total cells within this sample
scrna.lymphocyte_count: Sum of T, NK, and B cells
bc.lymphocyte_count: Absolute Lymphocyte Count (ALC) from clinical Blood Counts (bc.)
alc_ratio: ALC per scRNA Lymphocyte Count
AIFI_L1_frac_total: Fraction of cells with cell type assignment divided by Total cells for this sample
AIFI_L1_alc: ALC estimate for this cell type assignment
AIFI_L1_clr: Centered Log Ratio computed using AIFI_L1_frac_total for all types within this sample
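To make the frequency-related columns concrete, here is a minimal sketch that computes them for one toy sample. The cell type names and counts are hypothetical. Computing the ALC estimate as the cell count multiplied by alc_ratio is an assumption based on the column descriptions, and the handling of zero counts in the CLR (this sketch requires all fractions to be positive) is not specified here; see the Data Analysis Methods for the actual procedure.

```python
import math


def clr(fractions):
    """Centered log ratio: log of each part minus the mean log across parts."""
    logs = [math.log(f) for f in fractions]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


# Hypothetical counts for one sample; cell type names are illustrative only.
counts = {"T cell": 600, "B cell": 100, "NK cell": 100, "Monocyte": 200}
lymphocyte_types = {"T cell", "B cell", "NK cell"}

total_cells = sum(counts.values())
frac_total = {name: n / total_cells for name, n in counts.items()}
clr_values = dict(zip(frac_total, clr(list(frac_total.values()))))

# scrna.lymphocyte_count: sum of T, NK, and B cells in the scRNA-seq data.
scrna_lymphocyte_count = sum(
    n for name, n in counts.items() if name in lymphocyte_types
)
bc_lymphocyte_count = 2000.0  # hypothetical clinical ALC, count per uL
alc_ratio = bc_lymphocyte_count / scrna_lymphocyte_count

# Assumed formula: scale each cell type count by the ALC ratio.
alc_estimate = {name: n * alc_ratio for name, n in counts.items()}
print(frac_total, clr_values, alc_estimate)
```

By construction, CLR values within a sample sum to zero, which is a quick check on any recomputation.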
Sound Life scRNA-seq Cell Type Frequencies
File Name | Description | Download Link |
---|---|---|
sound_life_AIFI_L1_frequencies.csv | ||
sound_life_AIFI_L2_frequencies.csv | ||
sound_life_AIFI_L3_frequencies.csv |
Pseudobulk scRNA-seq data
To perform analysis and display data per cell type for each sample, we assembled pseudobulk expression values for each high resolution (AIFI_L3) cell type in our cell type labels.
For use with differential expression tests using the DESeq2 framework, we assembled total UMI counts for each sample and cell type. For display of average expression, we normalized and log-transformed scRNA-seq count data, then computed mean values for each sample and cell type.
Below, we provide matrices of pseudobulk expression for each of these metrics, as well as sample and cell type metadata that are necessary for analysis.
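The two pseudobulk metrics described above can be sketched on toy data as follows. The per-cell normalization used here (scaling each cell to 10,000 counts, then log1p) is an assumption chosen because it is a common convention; the sample names, cell type names, and counts are hypothetical, and the Data Analysis Methods describe the procedure actually used.

```python
import math
from collections import defaultdict

# Toy cells: UMI counts for three genes per cell (hypothetical values).
cells = [
    {"sample": "S1", "AIFI_L3": "typeA", "counts": [3, 0, 1]},
    {"sample": "S1", "AIFI_L3": "typeA", "counts": [1, 2, 1]},
    {"sample": "S1", "AIFI_L3": "typeB", "counts": [0, 5, 5]},
]
SCALE = 10_000  # assumed normalization target per cell


def log_normalize(counts):
    """Scale one cell to SCALE total counts, then apply log1p."""
    total = sum(counts)
    return [math.log1p(c / total * SCALE) for c in counts]


# Group per-cell count vectors by (sample, cell type).
groups = defaultdict(list)
for cell in cells:
    groups[(cell["sample"], cell["AIFI_L3"])].append(cell["counts"])

# Sum of raw UMI counts per gene: input suited to count-based tests.
pseudobulk_sum = {
    key: [sum(gene) for gene in zip(*rows)] for key, rows in groups.items()
}

# Mean of log-normalized values per gene: suited to display.
pseudobulk_mean = {
    key: [
        sum(gene) / len(rows)
        for gene in zip(*(log_normalize(r) for r in rows))
    ]
    for key, rows in groups.items()
}
print(pseudobulk_sum)
```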
Sound Life Pseudobulk scRNA-seq Data
File Name | Description | Download Link |
---|---|---|
sound-life_pseudobulk_mean-log-norm.tar.gz | ||
sound-life_pseudobulk_sum.tar.gz |
Differential expression test results
We utilized DESeq2 to perform comparisons between groups of pseudobulk scRNA-seq samples in our healthy cohorts. Most of these comparisons were cross-sectional; that is, we used data from a single time point (Flu Year 1 Day 0) to compare age groups, CMV status groups, and biological sex groups. For the Flu Vaccine comparison, we performed comparisons between 7 days post-vaccination (Flu Year 1 Day 7) and the day of vaccine administration (Flu Year 1 Day 0) for all subjects with longitudinal collection across both time points.
In each case, we provide a "Foreground" and "Background" group designation. Differential expression results with positive log2(Fold Change) indicate higher expression in the "Foreground" group, while results with negative values indicate higher expression in the "Background" group.
Comparison | scRNA-seq column | Background Group | Foreground Group |
---|---|---|---|
Age Group | cohort.cohortGuid | BR1 (Young Adult) | BR2 (Older Adult) |
Flu Vaccine | sample.visitName | Flu Year 1 Day 0 | Flu Year 1 Day 7 |
CMV Status | subject.cmv | Negative | Positive |
Biological Sex | subject.biologicalSex | Female | Male |
For each comparison, we provide DESeq2 results for each cell type and gene. For additional details, see the Data Analysis Methods section. Descriptions of the columns in these results are listed below.
DEG Results columns
AIFI_L3: Level 3 (high resolution) cell type used for differential expression test
fg: Foreground group in the contrast
bg: Background group in the contrast
gene: Gene symbol for the contrast test
log2fc: Log2(Fold Change) of average expression between groups
padj: Adjusted P-value for the contrast test (using the Benjamini and Hochberg FDR method)
pvalue: Original P-value generated by DESeq2 for the contrast test
stat: Test statistic generated by DESeq2 for the contrast test
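As an example of reading these columns, the sketch below keeps rows passing an adjusted P-value cutoff and reports which group shows higher expression, following the Foreground/Background sign convention described above. The 0.05 cutoff, the toy rows, the gene names, and the exact strings stored in fg and bg are assumptions for illustration.

```python
import csv
import io

# Toy results table with the documented columns; all values are made up.
results_csv = """AIFI_L3,fg,bg,gene,log2fc,padj,pvalue,stat
typeA,BR2,BR1,GENE1,1.8,0.001,0.00002,4.3
typeA,BR2,BR1,GENE2,-0.9,0.03,0.004,-2.9
typeA,BR2,BR1,GENE3,0.2,0.6,0.4,0.8
typeA,BR2,BR1,GENE4,0.5,NA,0.2,1.3
"""
PADJ_CUTOFF = 0.05  # assumed significance threshold


def is_significant(row):
    """True if padj is present and below the cutoff (DESeq2 may report NA)."""
    padj = row["padj"]
    return padj not in ("", "NA") and float(padj) < PADJ_CUTOFF


def higher_group(row):
    """Positive log2fc means higher in Foreground; negative, in Background."""
    return row["fg"] if float(row["log2fc"]) > 0 else row["bg"]


rows = list(csv.DictReader(io.StringIO(results_csv)))
higher_in = {row["gene"]: higher_group(row) for row in rows if is_significant(row)}
print(higher_in)
```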
Sound Life Differential Gene Expression Results
File Name | Description | Download Link |
---|---|---|
sound_life_age-group_deseq2_results.csv | ||
sound_life_biological-sex_deseq2_results.csv | ||
sound_life_cmv-status_deseq2_results.csv | ||
sound_life_flu-vaccine_deseq2_results.csv |
scRNA-seq Batch Controls and QC Reports
As part of our scRNA-seq pipeline, we include a batch control sample: an aliquot of PBMCs derived from a single leukapheresis draw. Here, we provide the batch control scRNA-seq data in .h5 format for use in batch comparisons to identify batch effects.
In total, this dataset contains samples from 99 scRNA-seq batches and 143 hashed sample pools. For additional details about our multiplexing and batching approach, see our Multiplexed scRNA-seq methods.
Sound Life Batch Control and QC Report Files
File Name | Description | Download Link |
---|---|---|
sound_life_batch_control_h5.tar | ||
sound_life_batch_report_html.tar |
GEO .h5 data access
In addition to the assembled data above, we have deposited the demultiplexed, per-sample Cell Ranger output .h5 files in the Gene Expression Omnibus (GEO) for public access.
These .h5 files can be found on GEO at accession ID GSE271896.
These files can be read into analysis packages using functions designed for reading 10x Genomics Cell Ranger .h5 files.
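One option in Python is the `read_10x_h5` function from the third-party scanpy package. The sketch below is one way to use it, not the only supported route; the wrapper function name is hypothetical and the path is supplied by the caller.

```python
def read_cellranger_h5(path):
    """Read one Cell Ranger .h5 file into an AnnData object.

    scanpy is imported lazily so this sketch can be defined without it.
    """
    import scanpy as sc  # third-party dependency (assumed installed)

    adata = sc.read_10x_h5(path)
    # Gene symbols can repeat in Cell Ranger output; make them unique.
    adata.var_names_make_unique()
    return adata
```

In R, Seurat's `Read10X_h5` plays a similar role.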
Raw data in dbGaP
Raw data in FASTQ format will be available in dbGaP for controlled access. We are currently in the process of data deposition.