Deconvolution using a reference of normal B cell development
Previous studies have used normal hematopoietic populations along B cell development directly as a reference for deconvolution, rather than populations derived from B-ALL patients (Beder et al. Hemasphere 2023; Hu et al. Haematologica 2024; Huang et al. Cancer Cell 2024; Khabirova et al. Nature Medicine 2022). We have shown in the context of myeloid malignancies that this approach leads to inferior deconvolution performance (Zeng et al. Nature Medicine 2022). Nonetheless, as an additional benchmark, we performed deconvolution with a normal hematopoietic reference using each of the four approaches (BayesPrism, CIBERSORTx, DWLS, SVR).
To perform deconvolution with a normal hematopoietic reference, we used single-cell transcriptomes from our B cell development atlas. Specifically, we condensed the numerous cell states into seven developmental states spanning B cell development (HSC/MPP, Myeloid Progenitor, Pre-pDC, Early Lymphoid, Pro-B, Pre-B, Mature B). We then generated pseudobulk profiles by pooling cells from each developmental state within each healthy donor, retaining only pseudobulk profiles comprising 100 or more cells.
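The pooling and filtering step described above can be sketched as follows. This is a minimal illustration rather than the code used in this study: the function name `make_pseudobulk`, the cells-by-genes count table, and the toy labels are all hypothetical, and summation of raw counts is assumed as the pooling operation.

```python
import numpy as np
import pandas as pd


def make_pseudobulk(counts, donor, state, min_cells=100):
    """Pool cells into one profile per (donor, developmental state).

    counts : cells x genes DataFrame of raw counts (assumed layout)
    donor, state : per-cell labels, in the same order as the rows of counts
    min_cells : minimum number of cells required to keep a profile
    """
    groups = [np.asarray(donor), np.asarray(state)]
    grouped = counts.groupby(groups)
    pooled = grouped.sum()      # summed counts per (donor, state)
    n_cells = grouped.size()    # number of cells behind each profile
    keep = (n_cells >= min_cells).to_numpy()
    return pooled[keep], n_cells[keep]


# Toy example (hypothetical data): 11 cells, 3 genes, every count equal to 1.
toy_counts = pd.DataFrame(
    np.ones((11, 3), dtype=int), columns=["geneA", "geneB", "geneC"]
)
toy_donor = ["D1"] * 7 + ["D2"] * 4
toy_state = ["Pro-B"] * 5 + ["Pre-B"] * 2 + ["Pro-B"] * 4

# A threshold of 3 is used here only so that the toy data shows the filter;
# the threshold stated in the text is 100 cells.
toy_pooled, toy_sizes = make_pseudobulk(
    toy_counts, toy_donor, toy_state, min_cells=3
)
print(toy_pooled)
print(toy_sizes)
```

In the toy data, the (D1, Pre-B) group contains only two cells and is dropped, while the two Pro-B profiles are retained.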
For signature matrix generation prior to deconvolution with CIBERSORTx, DWLS, or SVR, AutoGeneS was applied in hierarchical mode to 3,031 batch-aware highly variable genes identified across normal B cell development. The first round of AutoGeneS (500 generations) sought to separate all populations, while the second round (100 generations) focused on discriminating between Pro-B and Pre-B populations. Feature sets from each round were combined to constitute a final signature matrix of 637 genes representing developmental states spanning normal B cell development. Deconvolution with CIBERSORTx (B-mode), DWLS, and SVR was performed on the 85 bulk RNA-seq profiles as described above.
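Combining the feature sets from the two rounds can be illustrated with the sketch below. It assumes that the combination is a union of the two gene lists and that the reference is held as a genes-by-states table of expression values; the function name `build_signature` and the toy gene names are hypothetical, and the AutoGeneS optimization itself is not reproduced here.

```python
import pandas as pd


def build_signature(state_profiles, round1_genes, round2_genes):
    """Subset a genes x states reference to the union of two feature sets.

    state_profiles : genes x states DataFrame (assumed layout)
    round1_genes : genes selected to separate all populations
    round2_genes : genes selected to separate Pro-B from Pre-B
    """
    # Order-preserving union: genes chosen in both rounds are kept once.
    combined = list(dict.fromkeys(list(round1_genes) + list(round2_genes)))
    # Keep only genes that are present in the reference table.
    present = [g for g in combined if g in state_profiles.index]
    return state_profiles.loc[present]


# Toy example with hypothetical gene names and values.
toy_profiles = pd.DataFrame(
    {"Pro-B": [1.0, 2.0, 3.0, 4.0], "Pre-B": [4.0, 3.0, 2.0, 1.0]},
    index=["g1", "g2", "g3", "g4"],
)
toy_signature = build_signature(toy_profiles, ["g1", "g2"], ["g2", "g3"])
print(toy_signature)
```

Because genes selected in both rounds are counted once, the size of the combined signature can be smaller than the sum of the two feature sets.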
For BayesPrism deconvolution with the normal hematopoietic reference, pseudobulk profiles from normal B cell developmental states were subjected to standard BayesPrism preprocessing and feature selection as described above, resulting in a final matrix of 6,550 genes across 233 normal pseudobulk profiles used to run BayesPrism. Final theta predictions of population abundance were used as the BayesPrism estimates.