Analysis Methods

Flow Cytometry Data Processing

Flow cytometry gating and data analyses for both BMMC and PBMC samples were conducted using CellEngine. T follicular helper (Tfh) cells were defined uniquely as CD185+ and CD279+ (PD-1) within the non-Treg population (CD4+, CD25-), while plasma cells were gated as CD19+, CD20-, CD27+, and CD319+. CD38 was not used due to degradation of CD38 signal in patients treated with daratumumab. Hematopoietic stem cells (HSCs) in the bone marrow were identified as CD34+, lin-, CD38-, and CD45RA-. After concatenating samples across the four PBMC and two BMMC panels, cell type nomenclature was uniformly standardized. For subjects missing the 60-day post-transplant timepoint, the available 90-day measurement was used for analysis.

To estimate absolute cell quantities, the flow cytometry event counts were normalized based on each patient's clinical Absolute Lymphocyte Count (ALC). This was achieved by determining the total lymphocyte count (T cells + B cells + NK cells) from the flow data, and then computing the absolute count for any specific cell type by multiplying its event count by the clinical ALC, divided by the total flow lymphocyte count. Finally, to account for the constraints of compositional data, cell frequencies were stabilized using a centered log-ratio (CLR) transformation, where each cell type's frequency is divided by the geometric mean of all cell type frequencies in that sample.

Bone Marrow scRNA-seq QC and Cell Labeling

Bone marrow mononuclear cell (BMMC) single-cell data were processed using Scanpy, with doublets identified with Scrublet. High-quality cells were retained by applying median absolute deviation outlier thresholds: removing cells exceeding 7 MADs for total counts and detected genes, cells with >6% mitochondrial gene content, and cells with a >0.35 doublet score. The data was normalized to 10,000 counts per cell, log1p-transformed, and scaled with a maximum value cutoff of 10 followed by dimensionality reduction by PCA. Harmony was applied for batch correction. Neighborhood graphs were constructed using 50 neighbors and 20 PCs, followed by TSNE, UMAP (min_dist=0.45), and Leiden clustering (resolution=2.0).

Cell type annotations were mapped using CellTypist referencing three distinct datasets: an internal healthy PBMC reference, a published bone marrow dataset, and an internal 5' Flex kit bone marrow reference. Annotations were manually refined based on major canonical lineages (CD3D for T cells, NKG7 for NK cells, CD19 for B cells, CD14 for monocytes, CST3 for dendritic cells, LEPR for stromal cells, and CD34 for progenitors).

Malignant Plasma Cell Isolation & Copy Number Variation (CNV) Analysis

For CNV mapping data from labeled BMMC malignant and non-malignant plasma cells were subsetted based on canonical markers (CD38, TNFRSF17, SDC1, PRDM1, JCHAIN) and immunoglobulin gene expression across unsupervised Leiden clusters, along with publicly available data from healthy plasma cells was selected. The transcripts were normalized, followed by scaling, variable feature selection, PCA, and UMAP embedding as described in Scanpy (cite). A healthy bone marrow external atlas dataset from researchers at the University of Pennsylvania was subsetted down to plasma cells (n=16K) (http://cell.com/cell/fulltext/S0092-8674(24)00408-2) and merged with our NDMM plasma cells and harmonized on data source. Chromosome locations for each gene were added to run InferCNV. InferCNV ran with reference keys Healthy vs NDMM Plasma and reference group being the healthy external atlas dataset (window size = 200). The CNV object was processed with PCA, UMAP embedding, leiden clustering and nearest neighbor clustering and CNV score was calculated using the inferCNV.

Peripheral Blood scRNA-seq QC and Cell Labeling

PBMCs were filtered using the following thresholds: <10% mitochondrial content, 200-5000 genes detected, and predicted doublet status removed. A secondary round of doublet detection was performed on the filtered dataset, and additional doublets were removed to ensure high-quality singlets. Labels were transferred from the Allen Institute healthy PBMC reference atlas using CellTypist (link). Following manual curation to remove ambiguous labels, the finalized PBMC dataset contains 5,413,474 high-quality single cells.

Single-Cell Pseudobulking, Differential Expression Analysis, and Pathway Enrichment

For differential gene expression, single-cell RNA-seq data were pseudobulked by aggregating the raw count sum of each gene within every sample and specific cell type, filtering for a minimum of 10 cells per cell type and at least 1,000 cells per overall sample. Differential expression testing was performed in R (v4.3.3) using DESeq2. For paired longitudinal NDMM comparisons across clinical visits, the model inherently regressed subject-specific effects (GEX ~ Subject + Visit). When comparing NDMM states to healthy controls, the DESeq2 formula adjusted for covariates including cohort, sex, age, and CMV status (GEX ~ Cohort + Sex + Age + CMV). Differentially expressed genes were defined as having a Benjamini-Hochberg adjusted p-value of <= 0.05. Pathway enrichment was calculated utilizing the Fast Gene Set Enrichment Analysis (FGSEA) in R (v4.3.3). Gene sets were sourced from MSigDB_Hallmark_2020, Reactome_Pathways_2024, ENCODE_and_ChEA_Consensus_TFs_from_ChIP-X, and KEGG_2021_Human. Plotting significant pathways defined by an adjusted p-value <= 0.05.

B Cell Isotype Classification

Normalized expression of nine IgH constant genes (IGHM, IGHD, IGHG1-4, IGHA1-2, and IGHE) were used to classify B cells by their isotype. If cells were positive for either IGHM or IGHD, or both IGHM and IGHD, they received isotype assignment of IGHM, IGHD, or IGHMD. Otherwise, the cell received the isotype of the IGH genes for which they are positive. In cases of no expression of any IGH genes, the cell was classified as undetermined.

Olink and MSD Proteomics

Statistical differences between healthy and cancer patients (VRd/DVRd) for each timepoint were calculated using an unpaired Mann-Whitney U test. Statistical differences between timepoints within the same treatment group (Healthy, VRd, or DVRd) were calculated using a paired Wilcoxon Signed-Rank test. Benjamini-Hochberg p-value correction was applied to control the false discovery rate (FDR), with significant differences in composition defined by FDR < 0.05. Olink proteomic pathway enrichment was calculated via Over-Representation Analysis (ORA) utilizing the WebGestaltR package.