The STARsolo algorithm is used as a fast, memory-efficient alternative to 10x Genomics CellRanger.
STARsolo directly aligns raw sequencing reads from scRNA-seq data to the human reference genome.
Like CellRanger, STARsolo performs:
Barcode and UMI (Unique Molecular Identifier) extraction: Identifies cell barcodes and UMIs from the raw sequencing reads.
Read alignment: Aligns the cDNA reads to the reference genome using STAR.
Transcript quantification: Assigns reads to genes based on the alignment to compute gene expression per cell.
Gene-cell matrix generation: Produces a matrix similar to CellRanger's output, containing gene expression counts across cells.
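The steps above can be illustrated with a small toy sketch. It is not STARsolo's implementation: it assumes a 16-base cell barcode followed by a 12-base UMI (the 10x Chromium v3 layout), uses exact whitelist matching only, and takes the gene assignment of each read as given rather than performing alignment. The function names and the example reads are hypothetical.

```python
from collections import defaultdict

CB_LEN = 16   # assumed cell-barcode length (10x Chromium layout)
UMI_LEN = 12  # assumed UMI length (10x v3 chemistry; v2 uses 10)


def split_barcode_read(seq, cb_len=CB_LEN, umi_len=UMI_LEN):
    """Return (cell_barcode, umi) from a barcode read, or None if it is too short."""
    if len(seq) < cb_len + umi_len:
        return None
    return seq[:cb_len], seq[cb_len:cb_len + umi_len]


def count_matrix(records, whitelist):
    """Build a sparse gene-cell count table.

    records: iterable of (barcode_read_sequence, gene) pairs, where gene is
    the gene the paired cDNA read was assigned to (None if unassigned).
    Returns {(cell_barcode, gene): number_of_distinct_UMIs}.
    """
    umis = defaultdict(set)
    for barcode_seq, gene in records:
        parsed = split_barcode_read(barcode_seq)
        if parsed is None or gene is None:
            continue
        cell_barcode, umi = parsed
        if cell_barcode not in whitelist:
            continue  # exact matching only; no mismatch correction here
        umis[(cell_barcode, gene)].add(umi)
    # Each distinct UMI counts once, so PCR duplicates are collapsed.
    return {key: len(umi_set) for key, umi_set in umis.items()}


# Hypothetical toy input
cb_a = "A" * 16
cb_b = "C" * 16
records = [
    (cb_a + "ACGTACGTACGT", "GeneX"),
    (cb_a + "ACGTACGTACGT", "GeneX"),      # PCR duplicate of the read above
    (cb_a + "TTTTACGTACGT", "GeneX"),      # second molecule of GeneX
    (cb_b + "ACGTACGTACGT", "GeneY"),
    ("G" * 16 + "ACGTACGTACGT", "GeneX"),  # barcode not in the whitelist
]
counts = count_matrix(records, whitelist={cb_a, cb_b})
```

In this toy input, the two identical reads for the first barcode collapse into one count and the read with a non-whitelisted barcode is dropped.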
Achieving CellRanger v3.0 Output:
To match the output of CellRanger v3.0, specific parameters are set within the STARsolo command, particularly:
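--soloType CB_UMI_Simple: Specifies the barcode layout, here one cell barcode and one UMI of fixed length and position in the barcode read, as in 10x Genomics Chromium libraries.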
--soloCBwhitelist: Uses the 10x Genomics whitelist to ensure correct identification of valid barcodes.
--soloUMIdedup: Sets the UMI deduplication method, so that reads sharing the same cell barcode, UMI, and gene are collapsed into a single count and PCR duplicates do not inflate expression values.
--soloFeatures: Selects which genomic features are quantified; here it is set to produce gene-level expression matrices based on the gene annotations.
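--soloCellFilter EmptyDrops_CR: Filters cell barcodes with an EmptyDrops-style procedure modeled on CellRanger v3 cell calling, separating cell-containing droplets from empty ones.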
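As a minimal sketch, the parameters listed above could be assembled into a STAR command as follows. The text does not state the values used for --soloUMIdedup and --soloFeatures, so 1MM_CR and Gene are assumptions here, as are the barcode geometry (16-base barcode, 12-base UMI, 10x v3 chemistry), the thread count, and every file path. The function build_starsolo_command is a hypothetical helper; it only builds the argument list and does not run STAR.

```python
import shlex


def build_starsolo_command(genome_dir, cdna_fastq, barcode_fastq,
                           whitelist, out_prefix, threads=8):
    """Return a STAR argument list for a STARsolo run (values partly assumed)."""
    return [
        "STAR",
        "--runThreadN", str(threads),
        "--genomeDir", genome_dir,
        # STARsolo expects the cDNA read first, then the barcode/UMI read.
        "--readFilesIn", cdna_fastq, barcode_fastq,
        "--readFilesCommand", "zcat",          # assumes gzipped FASTQ input
        "--outFileNamePrefix", out_prefix,
        "--soloType", "CB_UMI_Simple",
        "--soloCBwhitelist", whitelist,
        # Assumed 10x v3 geometry: 16-base barcode, then 12-base UMI.
        "--soloCBstart", "1", "--soloCBlen", "16",
        "--soloUMIstart", "17", "--soloUMIlen", "12",
        "--soloUMIdedup", "1MM_CR",            # assumed value
        "--soloFeatures", "Gene",              # assumed value
        "--soloCellFilter", "EmptyDrops_CR",
    ]


# Hypothetical paths, for illustration only
cmd = build_starsolo_command(
    genome_dir="star_index/",
    cdna_fastq="sample_R2.fastq.gz",
    barcode_fastq="sample_R1.fastq.gz",
    whitelist="barcode_whitelist.txt",
    out_prefix="sample_",
)
command_line = shlex.join(cmd)  # shell-quoted string for logging or inspection
```

Keeping the command as a list makes it straightforward to pass to subprocess.run without shell quoting issues, while command_line gives a readable record of the exact invocation.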
Post-Processing:
Once alignment and quantification are complete, the resulting gene-cell matrix is further processed with SoupX to remove ambient RNA contamination.
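The idea behind ambient RNA removal can be sketched in a few lines. SoupX is an R package, and the code below is not its implementation: it is a simplified illustration that assumes the ambient expression profile is estimated from pooled empty-droplet counts and that the contamination fraction (rho) is already known. All function names and numbers are hypothetical.

```python
def soup_profile(empty_droplet_counts):
    """Fraction of ambient UMIs contributed by each gene, from pooled empty droplets."""
    total = sum(empty_droplet_counts.values())
    if total == 0:
        return {gene: 0.0 for gene in empty_droplet_counts}
    return {gene: count / total for gene, count in empty_droplet_counts.items()}


def remove_ambient(cell_counts, profile, rho):
    """Subtract the expected ambient counts from one cell's gene counts.

    rho is the assumed fraction of the cell's UMIs that come from ambient RNA.
    Corrected values are floored at zero.
    """
    n_umi = sum(cell_counts.values())
    corrected = {}
    for gene, observed in cell_counts.items():
        expected_ambient = rho * n_umi * profile.get(gene, 0.0)
        corrected[gene] = max(0.0, observed - expected_ambient)
    return corrected


# Hypothetical toy numbers
profile = soup_profile({"HBB": 80, "CD3E": 20})
corrected = remove_ambient({"HBB": 10, "CD3E": 90}, profile, rho=0.1)
```

In this toy case most of the HBB signal in the cell is attributed to the ambient pool, while the CD3E count is barely changed.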