STATISTICS AND DATA ANALYSIS
Thirty volunteers will be enrolled.
Ultrasound (US) images will be collected using the software provided with the US machine's operating system. Anonymised images may be shared securely for storage on a password-protected computer using US applications such as ApliGate, developed by Canon for the secure sharing of images. Each image will be stored with the participant's unique study identifier, the date of the scan and the initials of the person performing the examination.
For single-cell RNA-seq experiments, we anticipate capturing ~3,000-5,000 cells per sample after quality control (QC) and filtering. Sequencing, batch correction, filtering, QC, doublet removal, ambient RNA correction and correction for multiple testing will be performed according to single-cell RNA-seq best practices using established pipelines and tools (e.g. https://github.com/DendrouLab; COMBAT Consortium 2022). Statistical and outcome measures will typically be assessed using ANOVA, Wilcoxon rank tests and Fisher’s exact tests. Dimensionality reduction will be used to understand data structure and to identify clusters and cell types for statistical testing. Differential cluster abundance and differential gene expression analyses using pseudobulk counts will be performed with the edgeR package, applying a Benjamini-Hochberg multiple testing correction. Power calculations for single-cell RNA-seq data analysis are based on relevant available data (e.g., Turner et al. 2020). For differential abundance and gene expression, for instance, we estimate that n=10 individuals per group are required to have >80% power to detect 30% differences between longitudinal samples (pre and post vaccination), with a false discovery rate of 5%. Because sample yield after fine needle aspiration is expected to vary, possibly for anatomical reasons, each group includes a 50% overage to allow for this and for any participants lost to follow-up, bringing the total to n=30.
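As an illustration of the Benjamini-Hochberg correction named above, the following is a minimal Python sketch of the standard step-up adjustment. It is not the edgeR implementation that will be used in the study; the function name and the example p-values are hypothetical and chosen only to show how a 5% false discovery rate threshold is applied.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values for five genes (illustrative only)
p = [0.001, 0.008, 0.039, 0.041, 0.6]
q = benjamini_hochberg(p)

# Genes called significant at a false discovery rate of 5%
significant = [qi <= 0.05 for qi in q]
```

In this hypothetical example, the third and fourth genes have raw p-values below 0.05 but are not called significant after adjustment, which is the behaviour the correction is intended to provide.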
Single-cell repertoire analyses will be performed as previously described (refs. 22-24). Briefly, T cell receptor (TCR) and B cell receptor (BCR) outputs from CellRanger will be further filtered and processed to remove homotypic doublets and low-quality droplets and annotated using IMGT. Single-cell TCR clonality measurements will include Shannon diversity calculation using the entropy R package, and mean clone size estimation by bootstrapped down-sampling. Single-cell BCR clonality measurements will include clonal expansion index calculation based on the Gini index of the number of total BCRs per clone, whilst the clonal diversification index is calculated as the Gini index of the number of unique BCRs per clone.
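The two clonality summaries named above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the Shannon calculation is the plain plug-in estimate (the entropy R package offers further estimators that are not reproduced here), the Gini index uses the standard formula on sorted values, and the function names and clone counts are hypothetical.

```python
import math


def shannon_diversity(clone_sizes):
    """Plug-in Shannon entropy (natural log) of a clone size distribution."""
    total = sum(clone_sizes)
    return -sum(
        (n / total) * math.log(n / total) for n in clone_sizes if n > 0
    )


def gini_index(values):
    """Gini index of non-negative values: 0 = perfectly even, near 1 = dominated by one."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Rank-weighted sum over the sorted values
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return (2 * weighted) / (n * total) - (n + 1) / n


# Hypothetical counts per clone (illustrative only)
total_bcrs_per_clone = [40, 3, 2, 1, 1]    # input to a clonal expansion index
unique_bcrs_per_clone = [12, 2, 1, 1, 1]   # input to a clonal diversification index

expansion_index = gini_index(total_bcrs_per_clone)
diversification_index = gini_index(unique_bcrs_per_clone)
tcr_shannon = shannon_diversity([10, 5, 5, 1])
```

Because both measures depend on the number of cells sampled, comparisons between samples are usually made after down-sampling to a common depth, consistent with the bootstrapped down-sampling described above.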
For bulk sequencing, standard pipelines will be used for data processing (COMBAT Consortium 2022). Principal component analysis will be used for initial analyses of the normalised and filtered data. Differential expression analysis of the normalised data will be performed using, for example, the limma R package, and pathway enrichment analysis will be performed against Reactome pathways via the XGR R package with Fisher’s exact test. Weighted gene correlation network analysis will be applied to identify modules of highly correlated genes. Bulk repertoire analyses will include filtering for base quality amongst other quality control checks, followed by TCR chain/Ig isotype frequency estimates and usage, CDR3 length characterisation, and somatic hypermutation and BCR class-switching analyses.
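To make the pathway enrichment test concrete, the sketch below computes a one-sided Fisher’s exact p-value for over-representation using the hypergeometric distribution. It is an illustration only and does not reproduce the XGR implementation; the function name and all gene counts are hypothetical.

```python
from math import comb


def enrichment_p_value(overlap, n_selected, n_pathway, n_background):
    """One-sided Fisher's exact p-value for over-representation.

    Probability of observing at least `overlap` pathway genes among
    `n_selected` genes drawn without replacement from a background of
    `n_background` genes, of which `n_pathway` belong to the pathway.
    """
    upper = min(n_selected, n_pathway)
    denominator = comb(n_background, n_selected)
    numerator = sum(
        comb(n_pathway, i) * comb(n_background - n_pathway, n_selected - i)
        for i in range(overlap, upper + 1)
    )
    return numerator / denominator


# Hypothetical counts (illustrative only): 400 differentially expressed genes
# out of a background of 20,000, with 12 falling in a 150-gene pathway.
p_enrichment = enrichment_p_value(
    overlap=12, n_selected=400, n_pathway=150, n_background=20000
)
```

When many pathways are tested, the resulting p-values would themselves need a multiple testing correction.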
For flow cytometric analyses, data will be analysed using FlowJo, with appropriate control/validation using non-specific isotype controls and beads to ensure stable equipment calibration over time, as well as the use of gating templates. For functional/serological analyses, assays will be performed in duplicate or triplicate with appropriate repeat experiments to ensure reproducibility of findings. Appropriate tests (e.g., t-tests, Wilcoxon rank-sum tests) for assessing inter-group differences will be performed using R, Python or GraphPad Prism as required.
Data generated and all appropriate documentation will be stored for a minimum of 10 years after the completion of the study, including the follow-up period.
22. Corridoni D, Antanaviciute A, Gupta T, et al. (2020) Single-cell atlas of colonic CD8+ T cells in ulcerative colitis. Nat Med 26: 1480-1490.
23. Huang B, Chen Z, Geng L, et al. (2019) Mucosal profiling of pediatric-onset colitis and IBD reveals common pathogenics and therapeutic pathways. Cell 179: 1160-1176.
24. COvid-19 Multi-omics Blood ATlas (COMBAT) Consortium. (2022) A blood atlas of COVID-19 defines hallmarks of disease severity and specificity. Cell 185: 916-938.