At least 1,000 virus RNA copies are required as input to cDNA synthesis, and more input is better. Where possible, normalize the number of virus RNA copies between samples so that they are easier to compare.
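One way to normalize inputs is to pick the largest copy number that every sample can supply and then compute the RNA volume each sample needs to reach it. The sketch below assumes you already have virus RNA copies per µL for each sample (for example from qRT-PCR). The function name `normalize_inputs` and the 10 µL maximum RNA volume per reaction are hypothetical; only the 1,000-copy minimum comes from the protocol.

```python
from typing import Dict, Tuple

MIN_INPUT_COPIES = 1000  # minimum input into cDNA synthesis stated in the protocol


def normalize_inputs(
    copies_per_ul: Dict[str, float], max_volume_ul: float = 10.0
) -> Tuple[float, Dict[str, float]]:
    """Choose one common input copy number and the RNA volume (µL) each sample needs.

    copies_per_ul: virus RNA copies per µL for each sample (assumed, e.g. from qRT-PCR).
    max_volume_ul: hypothetical maximum RNA volume that fits in the cDNA reaction.
    """
    # The common target is limited by the most dilute sample at the maximum volume.
    target = min(c * max_volume_ul for c in copies_per_ul.values())
    if target < MIN_INPUT_COPIES:
        too_low = sorted(
            s for s, c in copies_per_ul.items() if c * max_volume_ul < MIN_INPUT_COPIES
        )
        raise ValueError(f"samples cannot reach {MIN_INPUT_COPIES} copies: {too_low}")
    # Volume of each sample that delivers exactly `target` copies.
    volumes = {s: target / c for s, c in copies_per_ul.items()}
    return target, volumes


target, volumes = normalize_inputs({"A": 500.0, "B": 2000.0})
print(target, volumes)
```

Normalizing to the most dilute sample keeps every sample at the same input, at the cost of not using the extra copies available in more concentrated samples.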
Process each RNA sample through the protocol twice so that the two libraries can be sequenced as technical replicates. Calling only the variants present in both replicates reduces the number of false positives (mainly from sequencing errors) and increases the accuracy of variant frequency measurements.
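The replicate comparison amounts to intersecting the two variant call sets. The minimal sketch below assumes each replicate's calls are a dictionary keyed by (position, reference base, alternate base) with the variant frequency as the value; the name `shared_variants` and the choice to report the mean of the two frequencies are illustrative assumptions, not iVar's exact output.

```python
from typing import Dict, Tuple

Variant = Tuple[int, str, str]  # (1-based position, reference base, alternate base)


def shared_variants(
    rep1: Dict[Variant, float], rep2: Dict[Variant, float]
) -> Dict[Variant, Tuple[float, float, float]]:
    """Keep only variants called in both replicates.

    Returns {variant: (freq_rep1, freq_rep2, mean_freq)}; a variant seen in only
    one replicate is treated as a likely false positive and dropped.
    """
    shared = rep1.keys() & rep2.keys()
    return {v: (rep1[v], rep2[v], (rep1[v] + rep2[v]) / 2) for v in sorted(shared)}


rep1 = {(100, "A", "G"): 0.10, (250, "C", "T"): 0.05}
rep2 = {(100, "A", "G"): 0.12, (900, "G", "A"): 0.04}
print(shared_variants(rep1, rep2))  # only position 100 is kept
```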
Obtain at least 400x coverage at each nucleotide position. Because amplification efficiency differs between amplicons, this typically requires ~1M 250 nt paired-end reads. Amplification from high-input samples (>10,000 virus RNA copies) is more even across amplicons and therefore requires fewer total reads.
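A quick way to check the 400x target is to scan per-position depths for positions that fall short. This sketch assumes tab-separated lines of chromosome, position, and depth, the layout written by `samtools depth -a` (which also reports zero-depth positions); the function name `positions_below_depth` is hypothetical.

```python
from typing import Iterable, List, Tuple

MIN_DEPTH = 400  # minimum per-position coverage recommended by the protocol


def positions_below_depth(
    depth_lines: Iterable[str], min_depth: int = MIN_DEPTH
) -> List[Tuple[str, int]]:
    """Return (chromosome, position) for every position with depth below min_depth.

    Each line is assumed to be 'chrom<TAB>pos<TAB>depth', as from `samtools depth -a`.
    """
    low = []
    for line in depth_lines:
        if not line.strip():
            continue  # skip blank lines
        chrom, pos, depth = line.rstrip("\n").split("\t")[:3]
        if int(depth) < min_depth:
            low.append((chrom, int(pos)))
    return low


lines = ["ref\t1\t350", "ref\t2\t400", "ref\t3\t1200"]
print(positions_below_depth(lines))
```

Runs of low-depth positions usually line up with a poorly amplifying amplicon, which is where extra reads (or more input copies) help most.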
During our validation, the lowest intrahost variant frequency that we could measure accurately and consistently was 3%. Measuring variants below this frequency requires more input copies, greater coverage depth, and additional validation.
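The 3% floor can be applied as a final filter on replicate-supported calls. The sketch below assumes each variant carries its frequency in both replicates and, as an illustrative choice, keeps a variant only if it reaches the threshold in both; the name `above_detection_limit` is hypothetical.

```python
from typing import Dict, Tuple

Variant = Tuple[int, str, str]  # (1-based position, reference base, alternate base)
MIN_FREQ = 0.03  # lowest frequency validated in the protocol (3%)


def above_detection_limit(
    variants: Dict[Variant, Tuple[float, float]], min_freq: float = MIN_FREQ
) -> Dict[Variant, Tuple[float, float]]:
    """Keep variants whose frequency is at least min_freq in both replicates."""
    return {v: freqs for v, freqs in variants.items() if min(freqs) >= min_freq}


calls = {(1, "A", "G"): (0.05, 0.04), (2, "C", "T"): (0.02, 0.06)}
print(above_detection_limit(calls))  # position 2 falls below 3% in one replicate
```

Lowering `min_freq` below 0.03 is exactly the case the protocol says needs more input, deeper coverage, and its own validation.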
Beware of intrahost virus variants within primer binding sites: a primer mismatch can decrease the amplification efficiency of that particular virus haplotype. Because primer sequences are trimmed from the reads and each primer site is also covered by an overlapping amplicon, the variants within the primer sites themselves can still be measured accurately. However, the frequencies of all variants within the amplicon whose primer carries the mismatch can be significantly altered. This is the major limitation of any PCR-based protocol for analyzing virus population diversity.
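The flagging logic can be sketched in two steps: find variants that fall inside a primer, then mark every variant inside the amplicons those primers belong to as possibly distorted. This assumes primer coordinates are 0-based, half-open (BED-style) intervals, variant positions are 1-based, and each amplicon is described by its left and right primer; the `Amplicon` class and both function names are hypothetical, not iVar's interface.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set, Tuple

Interval = Tuple[int, int]  # 0-based, half-open (BED-style) coordinates


@dataclass(frozen=True)
class Amplicon:
    name: str
    left: Interval   # left primer binding site
    right: Interval  # right primer binding site


def _contains(interval: Interval, pos1: int) -> bool:
    # Convert the 1-based variant position to 0-based before comparing.
    start, end = interval
    return start <= pos1 - 1 < end


def primer_site_variants(
    positions: Iterable[int], amplicons: List[Amplicon]
) -> Tuple[Set[int], Set[str]]:
    """Return (variant positions inside a primer, names of amplicons with such a primer)."""
    in_primer: Set[int] = set()
    affected: Set[str] = set()
    for pos in positions:
        for amp in amplicons:
            if _contains(amp.left, pos) or _contains(amp.right, pos):
                in_primer.add(pos)
                affected.add(amp.name)
    return in_primer, affected


def possibly_distorted(
    positions: Iterable[int], amplicons: List[Amplicon], affected: Set[str]
) -> Set[int]:
    """Variants anywhere in an affected amplicon, from left primer start to right primer end."""
    spans = [(a.left[0], a.right[1]) for a in amplicons if a.name in affected]
    return {p for p in positions if any(_contains(span, p) for span in spans)}


amps = [Amplicon("amp1", (0, 20), (380, 400)), Amplicon("amp2", (350, 370), (730, 750))]
variants = [10, 200, 600]
in_primer, affected = primer_site_variants(variants, amps)
print(in_primer, affected, possibly_distorted(variants, amps, affected))
```

Position 10 sits in amp1's left primer, so both it and position 200 (elsewhere in amp1) are flagged; position 600 lies only in amp2 and is left alone.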
Use our data pipeline, iVar (intrahost variant analysis from replicates), to process and analyze the data. It aligns reads to the reference (or calls a consensus), trims primers, calls variants, compares variants between replicates, and flags variants within primer sites.
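For orientation, the sketch below only builds (and does not run) command lines for a two-replicate iVar run using the `ivar trim`, `ivar variants`, and `ivar filtervariants` subcommands fed by `samtools mpileup`. It assumes the BAM files are already aligned to the reference and sorted, and the output prefixes (`rep1`, `shared_variants`) are hypothetical; check every flag against the iVar manual for your installed version before use.

```python
import shlex
from typing import List


def ivar_commands(
    replicate_bams: List[str], primer_bed: str, reference_fa: str, min_freq: float = 0.03
) -> List[str]:
    """Build shell commands for trimming, variant calling, and replicate comparison.

    Assumes each input BAM is already aligned to reference_fa and sorted.
    Output prefixes are hypothetical naming choices.
    """
    cmds = []
    variant_tables = []
    for i, bam in enumerate(replicate_bams, start=1):
        prefix = f"rep{i}"
        trimmed = f"{prefix}.trimmed"
        # Soft-clip primer sequences listed in the BED file; writes <trimmed>.bam.
        cmds.append(shlex.join(["ivar", "trim", "-i", bam, "-b", primer_bed, "-p", trimmed]))
        # Re-sort the trimmed alignments before building the pileup.
        cmds.append(shlex.join(["samtools", "sort", "-o", f"{trimmed}.sorted.bam", f"{trimmed}.bam"]))
        # Pileup over all positions, piped into the variant caller; writes <prefix>.variants.tsv.
        pileup = shlex.join(["samtools", "mpileup", "-aa", "-A", "-d", "0", "-B", "-Q", "0",
                             "--reference", reference_fa, f"{trimmed}.sorted.bam"])
        call = shlex.join(["ivar", "variants", "-p", f"{prefix}.variants",
                           "-t", str(min_freq), "-r", reference_fa])
        cmds.append(f"{pileup} | {call}")
        variant_tables.append(f"{prefix}.variants.tsv")
    # Keep variants found across the replicate tables.
    cmds.append(shlex.join(["ivar", "filtervariants", "-p", "shared_variants", *variant_tables]))
    return cmds


for cmd in ivar_commands(["rep1.bam", "rep2.bam"], "primers.bed", "reference.fa"):
    print(cmd)
```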