Long Amplicon Nanopore Sequencing for Dual-Typing RdRp and VP1 Genes of Norovirus Genogroups I and II in Wastewater

George Scott; David Ryder; Mary Buckley; Richard Hill; Samantha Treagus; Tina Stapleton; David Walker; James Lowther; Frederico Batista

Mar 26, 2024

Long Amplicon Nanopore Sequencing for Dual-Typing RdRp and VP1 Genes of Norovirus Genogroups I and II in Wastewater

DOI

dx.doi.org/10.17504/protocols.io.8epv5xpmjg1b/v1

George Scott¹,
David Ryder¹,
Mary Buckley¹,
Richard Hill¹,
Samantha Treagus²,
Tina Stapleton¹,
David Walker¹,
James Lowther¹,
Frederico Batista¹

¹Centre for Environment, Fisheries and Aquaculture Science;
²Faculty of Environment, Science and Economy, University of Exeter

George Scott

Centre for Environment, Fisheries and Aquaculture Science

DOI: dx.doi.org/10.17504/protocols.io.8epv5xpmjg1b/v1

Protocol Citation: George Scott, David Ryder, Mary Buckley, Richard Hill, Samantha Treagus, Tina Stapleton, David Walker, James Lowther, Frederico Batista 2024. Long Amplicon Nanopore Sequencing for Dual-Typing RdRp and VP1 Genes of Norovirus Genogroups I and II in Wastewater. protocols.io https://dx.doi.org/10.17504/protocols.io.8epv5xpmjg1b/v1

License: This is an open access protocol distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited

Protocol status: Working

We use this protocol and it's working

Created: January 17, 2024

Last Modified: March 26, 2024

Protocol Integer ID: 93653

Keywords: Norovirus, Oxford Nanopore Technologies, Long Amplicon, Wastewater, Sequencing

Funders Acknowledgement:

Funded by His Majesty’s Treasury Shared Outcome Fund PATH-SAFE programme

Disclaimer

DISCLAIMER – FOR INFORMATIONAL PURPOSES ONLY; USE AT YOUR OWN RISK

The protocol content here is for informational purposes only and does not constitute legal, medical, clinical, or safety advice, or otherwise; content added to protocols.io is not peer reviewed and may not have undergone a formal approval of any kind. Information presented in this protocol should not substitute for independent professional judgment, advice, diagnosis, or treatment. Any action you take or refrain from taking using or relying upon the information presented here is strictly at your own risk. You agree that neither the Company nor any of the authors, contributors, administrators, or anyone else associated with protocols.io, can be held responsible for your use of the information contained in or linked to this protocol or any of our Sites/Apps and Services.

Abstract

This protocol outlines the procedures for dual-typing (genotyping and polymerase typing) of norovirus genogroups I and II (GI and GII) from wastewater. It assumes that wastewater sample collection, viral concentration and nucleic acid extraction has already been performed. The wastewater samples used during method development were processed using an ammonium sulphate precipitation (150 mL sample) followed by nucleic acid extraction using Kingfisher Flex‱ (Thermo Scientific‱, UK) and NucliSENS‱ reagents (BioMérieux, France).  
Adapting this technique for use with differing viral concentration methods and nucleic acid extraction techniques should still be suitable. The analysis of different sample matrices will also likely be possible following sample processing (up to and including nucleic acid extraction) with methods appropriate for that matrix.
This protocol starts with reverse transcription followed by semi-nested PCR amplifying the RdRp+VP1 region
of norovirus GI and GII independently. GI and GII amplicons are combined into a single library for simultaneous sequencing of the ≈1000 bp PCR products using either the Oxford Nanopore Technologies MinION or GridION. 
Consensus sequences are generated using NGSpeciedID, grouped at 95% then dual-typed giving both genotype and polymerase type (e.g.GI.2[P2]). Reads are aligned to consensus sequences and PCR chimeras are filtered using reference, de novo and manual approaches.

Materials

 



  
ABC
Material
      Manufacturer
      Product code
   
Mag-Bind TotalPure NGS
      OmegaBIO-TEK
      M1378-00
   
Ethanol, Molecular Biology Grade
      Sigma-Aldrich
      1085430250
   
Water, Molecuar Biology Grade
      Sigma-Aldrich
      W4502-1L
   
LunaScript RT SuperMix Kit
      New England Biolabs
      E3010
   
TE buffer, Molecular Biology Grade
      Sigma-Aldrich
      574793
   
Platinum Taq Polymerase
      Invitrogen
      10966018
   
D5000 Reagents, TapeStation
      Agilent
      5067-5589
   
D5000 ScreenTapes, Tapestation
      Agilent
      5067-5588
   
ExoSAP-IT
      Applied Biosystems
      78200
   
1X dsDNA High Sensitivity Kit, Qubit
      Invitrogen
      Q32851
   
Native Barcoding Kit 96 V14
      Oxford Nanopore 
      NBD114.96
   
R10.4.1 Flow Cell
      Oxford Nanopore 
      FLO-MIN114
   
Blunt/TA Ligase Master Mix
      New England Biolabs
      M0367
   
NEBNext Ultra II End repair/dA-tailing Module
      New England Biolabs
      E7546
   
NEBNext Quick Ligation Module
      New England Biolabs
      E6056
   
Bovine Serum Albumin, UltraPure
      Invitrogen
      AM2616
   
dNTP Deoxynucleotide (dNTP) Solution Mix      New England Biolabs   N0447S
 

Before start

This protocol assumes that wastewater sample collection, viral concentration and nucleic acid extraction has already been performed. 

Inhibitor Removal

Note
For best practice, a process control negative and positive sample should be implemented from this point onwards. Norovirus GI and GII positive reference materials can be obtained from
the UK Health Security Agency in the form of Virus LENTICULE‱ Discs. They will need dissolving in 975 µL PBS and then nucleic acid extraction prior to inputting into this process.

Note
The inhibitor removal and reverse transcription procedures in this protocol were adapted from the citation below:

CITATION
Harry T Child, Paul A O'Neill, Karen Moore, Hubert Denise, Matthew Loose, Steve Paterson, Ronny van Aerle, Aaron Jeffries. Wastewater Sequencing using the EasySeq™ RC-PCR SARS CoV-2 (Nimagen) V3.0. protocols.io.LINK
https://protocols.io/view/wastewater-sequencing-using-the-easyseq-rc-pcr-sar-cih2ub8e

Pipette Amount25 µL   of RNA extract into a 96 well PCR plate

Add Amount45 µL   (1.8X) of Mag-Bind® Total Pure NGS beads

Pipette carefully 10x to mix

Leave to stand at room temperature for Duration00:05:00  

If there is liquid on the side of the wells, cover the plate with a PCR adhesive seal and briefly spin down  plate 

Place on magnetic rack for Duration00:03:00  

Remove Amount68 µL   of supernatant

Add Amount80 µL   of Concentration80 % (v/v)   ethanol for Duration00:00:30   (make EtOH fresh every time with nuclease free water)

30s

Remove Amount82 µL   of supernatant being careful to avoid the pellet

Repeat clean once more Go togo to step #9   

Set pipette to Amount20 µL   and remove any residual ethanol

Leave to dry for Duration00:02:00  

Remove from magnetic rack

Add Amount27 µL   of molecular grade water and mix by pipetting 10x

Allow to stand for Duration00:05:00   

Place plate back on magnetic rack for Duration00:03:00  

Recover Amount25 µL   of the supernatant

Reverse Transcription

Add Amount2.5 µL   of LunaScript® RT SuperMix to a 0.2 mL PCR reaction tube placed on ice or on a cool block

Add Amount10 µL   of cleaned nucleic acid extract and mix gently by pipetting 5x

Cover or cap the reaction tubes and briefly spin down

Incubate the combined reaction mixture in a thermocycler with a heated lid (105°C) for:
Temperature25 °C   for Duration00:02:00  
Temperature55 °C   for Duration00:45:00  
Temperature95 °C   for Duration00:01:00  
Hold at Temperature4 °C  

48m

Briefly spin down the reaction tubes

Add Amount7.5 µL   of molecular biology grade water to each sample

Note
The first-strand cDNA can be stored in the fridge at Temperature4 °C    overnight or in the freezer at Temperature-20 °C   if longer term storage is required.

Primer Preparation

Note
Cartridge purified primers should be purchased to prevent loss of the degenerate
oligonucleotides

Reconstitute the lyophilised primers in Table 1 to Concentration100 micromolar (µM)  using TE buffer Ph8 

Table 1. Norovirus GI and GII Primers
ABCD
NameGenogroupF/R
  Sequence (5’-3’)
  
  NV4478m
    GI
    F1
  AARYTVCCHATHAARGTTGGNATG
  NV4562m
    GI
    F2
  GATGCDGAYTAYACRGCHTGGG
  GISKRm
    GI
    R1,2
  CCIACCCAICCATTRTACA
  NV4611 
    GII
    F1
  CWGCAGCMCTDGAAATCATGG
  NV4692
    GII
    F2
  GTGTGRTKGATGTGGGTGACTT
  GIISKR
    GII
    R1,2
  CCRCCNGCATRHCCRTTRTACAT
1) first round primers and 2) semi-nested primers

Dilute primers to a working concentration of Concentration10 micromolar (µM)  in molecular biology grade water and aliquot into suitable volumes to avoid repeated freeze-thaw cycles

First-Round PCR

Note
All PCR cycling conditions used a heated lid at Temperature105 °C   and a ramping speed of 3°C/s. 


Note
Prepare enough mastermix for all of the samples plus the process and PCR positive and negative controls.

Note
GI and GII PCRs are performed independently for each wastewater sample under analysis.

Prepare the mastermixes as indicated in Table 2 and Table 3 for the first round PCRs

Table 2. Forward and reverse primers for the GI and GII first-round PCRs
ABC
  Genogroup
    Forward Primer
    Reverse Primer
  
  GI
    NV4478m
    GISKRm
  
  GII
    NV4611
    GIISKR
  

 Table 3. Mastermix Recipe for all PCRs
ABC
  Reagents
    Final concentration
    Volume (µL)
  
  H20 (molecular biology grade)
    -
    15.15
  
  PCR Buffer, without Mg (10X)
    1
    2.5
  
  dNTP (10 mM)
    0.2 mM
    0.5
  
  MgCl2 (50 mM)
    1.5 mM
    0.75
  
  Forward primer (10 µM)
    0.2 µM
    0.5
  
  Reverse
  primer (10 µM)
    0.2 µM
    0.5
  
  Platinum
  Taq polymerase (10 U/µL)
    1 unit/tube
    0.1
  

Remove all mastermix components from the freezer and place the polymerase on TemperatureOn ice  

Defrost all other components at TemperatureRoom temperature  

Briefly vortex and spin down all components

Add the components as outlined in Table 2 and 3 reserving the polymerase until the end

When adding the polymerase, slowly aspirate and pre-wet pipette tip x3

After dispensing the polymerase rinse the pipette tip 10x

Vortex then briefly spin down the prepared mastermix

Distribute Amount20 µL   of PCR mastermix 0.2 mL reaction tubes

Add Amount5 µL   of cDNA for each sample

AddAmount5 µL   of molecular biology grade water or positive control cDNA for your PCR positive and negative controls

Seal or cap the reaction tubes then spin down ensuring no bubbles are present

Run the reactions in a thermocycler using the conditions in Table 4 and Table 5  for GI and GII

Table 4. GI first-round PCR cycling conditions
ABCD
  Stage
    Temperature (°C)
    Time
    Cycles
  
  Initial denature
    95.0
    60 s
    1
  
  Denature
    95.0
    30 s
  40
  Anneal
    47.4
    30 s
  
  Elongation
    72.0
    30 s
  
  Final elongation
    72.0
    7 min
    1
  
  Hold
    4.0
    
  
 
Table 5. GII first-round PCR cycling conditions
ABCD
  Stage
    Temperature (°C)
    Time
    Cycles
  
  Initial denature
    95.0
    60 s
    1
  
  Denature
    95.0
    30 s
    40
  
  Anneal
    55.7
    30 s
  
  Elongation
    72.0
    30 s
  
  Final elongation
    72.0
    7 min
    1
  
  Hold
    4.0
  
 

Semi-Nested PCR

Note
Prepare enough mastermix for all of the samples and controls from the first-round PCR plus an additional PCR negative control for the semi-nested PCR.

Prepare the mastermixes as indicated in Table 3 and Table 6 following 

Table 6. GI and GII primer pairs for the semi-nested PCR
ABC
  Genogroup
    Forward Primer
    Reverse Primer
  
  GI
    NV4562m
    GISKRm
  
  GII
    NV4692
    GIISKR
  
 

Remove all mastermix components from the freezer and place the polymerase on TemperatureOn ice  

Defrost all other components at TemperatureRoom temperature  

Briefly vortex and spin down all components

Add the components as outlined in Table 2 and 3 reserving the polymerase until the end

When adding the polymerase, slowly aspirate and pre-wet pipette tip x3

After dispensing the polymerase rinse the pipette tip 10x

Vortex then briefly spin down the prepared mastermix

Distribute Amount20 µL   of PCR mastermix into a 96-well plate

Vortex and spin down the plate from the first round PCR

Add Amount5 µL   of first-round PCR product to each of the sample wells

Add Amount5 µL   of molecular biology grade water for your negative control

Seal or cap the plate then spin down ensuring no bubbles are present

Run using the conditions in Table 7 and Table 8 for GI and GII

Table 7. GI semi-nested PCR cycling conditions
ABCD
Stage
  Temperature (°C)TimeCycles
  Initial denature
    95.0
    60 s
    1
  
  Denature
    95.0
    30 s
    40
  
  Anneal
    57.2
    30 s
  
  Elongation
    72.0
    30 s
  
  Final elongation
    72.0
    7 min
    1
  
  Hold
    4.0
  
 
Table 8. GII semi-nested PCR cycling conditions
ABCD
  Stage
  Temperature (°C)Time
  Cycles
  Initial denature
    95.0
    60 s
    1
  
  Denature
    95.0
    30 s
    40
  
  Anneal
    55.7
    30 s
  
  Elongation
    72.0
    30 s
  
  Final elongation
    72.0
    7 min
    1
  
  Hold
    4.0
  
 

Analysis of PCR Controls

Note
This step uses a TapeStation for PCR product analysis, but gel electrophoresis with 2% agarose tris-borate EDTA gel run at 125 V for 35 min with a 100 bp ladder (Promega, USA), can be used.

Remove ScreenTape from packaging, ensure there are no bubbles in any of the electrophoresis lanes, flicking ScreenTape to remove bubbles if neccesary

Place ScreenTape into the TapeStation and allow to come to room temperature Duration00:30:00  

30m

Load Amount10 µL   of D5000 reagents into the 0.2 mL TapeStation reaction tubes

Add Amount1 µL   of ladder into the tube representing position TapeStation position A1

Load Amount1 µL   of your positive and negative PCR and process controls into the 0.2 mL TapeStation reaction tubes

Cap tubes, vortex and then spin down ensuring no bubbles are present

Place tubes into the machine and analyse the samples

Note
All controls should give their expected results and appropriate actions should be taken if any of the negative controls show contamination.

PCR Product Clean-Up

Note
This is performed for both the GI and GII semi-nested PCR products.

Remove ExoSAP-IT from the Temperature-20 °C    freezer and allow to defrost TemperatureOn ice  

Briefly vortex and spin down reaction tubes or plates from the semi-nested PCR

Briefly vortex and spin down ExoSap-IT

Aliquot Amount10 µL   of semi-nested PCR product into a new reaction tube or plate

Add Amount4 µL   of ExoSap-IT to each PCR product

Seal or cap reaction vessel

Incubate with a heated lid (Temperature105 °C  ):
Temperature37 °C   for Duration00:15:00  
Temperature80 °C   for Duration00:15:00  
Hold at Temperature4 °C  

30m

PCR Product Quantification

    
Note
This is performed for both the GI and GII cleaned semi-nested PCR products.

Add Amount198 µL   High-Sensitivity dsDNA Qubit™ reagents to Qubit tubes for each sample being quantified

Load Amount2 µL   of cleaned, semi-nested PCR product onto the side of the tube ensuring residual nucleic acids aren't present on the exterior of the pipette tip

Add Amount190 µL   High-Sensitivity dsDNA Qubit™ reagents to Qubit tubes for your standards

Load Amount10 µL   of each standard

Cap tubes, vortex and briefly spin down

Check for presence of bubbles, flick tubes and spin down again if necessary

Incubate for Duration00:02:00   and then measure with the Qubit™

GI and GII PCR Product Pooling

Note
At the end of this process, for every sample being analysed, you will have a combined 200 fmol of GI and GII PCR products comprised of 85.3 and 114.7 fmol of GI and GII amplicons, respectively. The final volume will be Amount11.5 µL  ; ready for use directly in end-prep for library preparation.


Note
For samples which do not have sufficient PCR yield (200 fmol of combined GI and GII cleaned, semi-nested PCR products) undiluted, cleaned GI and GII PCR products can pooled and input into end-prep. Adjustment of the quantity of end-prepped DNA input into native barcoding can then be made by varying the sample volume from 0.75-3 µl, reducing the volume of water accordingly, to retain 10 fmol of input DNA.

Calculate the volume of water required to dilute Amount8 µL   of the cleaned semi-nested GI and GII PCR products to 14.83 fmol/µL and 19.94 fmol/µL. 

For each sample, convert the GI and GII Qubit derived PCR product concentrations from ng/µL to fmol/µL by inputting into software such as the NEBioCalculator ensuring to enter the amplicon lengths of 1110 bp for GI and 971 bp for GII.

Calculate the GI dilution factor by dividing the fmol/µL of the PCR product by 14.83

Calculate the GII dilution factor by dividing the fmol/µL of the PCR product by 19.94

Calculate the volumes of water required by subtracting 1 from the dilution factor and then multiplying by 8

Vortex and then briefly spin down the cleaned, semi-nested PCR products

Separately aliquot Amount8 µL   of cleaned, semi-nested GI and GII PCR products into new reaction tubes

Add the volumes of water calculated in the previous steps to the GI and GII amplicons

Cap the reaction vessels

Briefly vortex and then spin down

In new reaction tubes, for every sample, combine Amount5.75 µL   of both the diluted GI and GII PCR products for every sample undergoing analysis

Briefly vortex and then spin down

Native Barcoding and Sequencing

Perform end-prep and barcoding following the manufacturer’s instructions for ligation sequencing of amplicons in the Native Barcoding Kit 96 V14 by Oxford Nanopore Technologies

Load 45 fmol of library per flow cell, using an amplicon size of 1 kb for molarity calculation

Run sequencing at 260 bps using super-accurate basecalling until ≈60,000 reads per barcode have been generated

Set Up Software Environment

Install all the required software using mamba:

$ mamba create --name=amplion_analysis_norovirus duplex-tools=0.2.14 cutadapt=3.4 seqtk=1.3 minimap2=2.24 yacrd=1.0.0 kma=1.4.9 seqkit=2.3.0 samtools=1.13 bedtools=2.30.0 cd-hit=4.8.1 pyfastx=0.8.4

$ mamba create --name=NGSpeciesID medaka=1.2.2 bcftools=1.11 python=3.6.10 perl=5.32.0 openblas=0.3.3 spoa=4.0.7 racon=1.4.20 minimap2=2.17 tensorflow=2.4.1

$ mamba activate NGSpeciesID

$ pip install NGSpeciesID==0.1.3

Edit the NGSpeciesID script:

$ nano ~/mambaforge/envs/NGSpeciesID/bin/NGSpeciesID

Change the behaviour of the abundance_cutoff parameter so that it is based on the minimum coverage, rather than the relative abundance of a sequence within the sample:


-    abundance_cutoff = int( args.abundance_ratio * len(read_array))
+    abundance_cutoff = args.abundance_ratio

Edit NGSpeciesID's consensus module:

$ nano ~/mambaforge/envs/NGSpeciesID/lib/python3.6/site-packages/modules/consensus.py

Change the behaviour of NGSpeciesID so it is possible to output consensus sequences from spoa without polishing the consensus sequences using Medaka:


+    polishing = False
     if args.medaka:
         polishing_pattern = os.path.join(args.outfolder, "medaka_cl_id_*")
+        polishing = True
     elif args.racon:
         polishing_pattern = os.path.join(args.outfolder, "racon_cl_id_*")
+        polishing = True

-    for folder in glob.glob(polishing_pattern):
-        shutil.rmtree(folder)
+    if (polishing == True):
+        for folder in glob.glob(polishing_pattern):
+            shutil.rmtree(folder)

     spoa_pattern = os.path.join(args.outfolder, "consensus_reference_*")
     for file in glob.glob(spoa_pattern):

Split and Trim Reads

Create a folder to store results from the analysis.

$ mkdir Analysis_Results

$ cd Analysis_Results

Copy the fastq_pass or pass folder from the sequencing run to the Analysis_Results folder.

Store the name of the barcode and sample to be analysed as a variable:

$ barcodeID=barcode09

$ sampleID=Sample-4_PCR-Norm_PoolingType-1

Activate the correct software environment:

$ mamba activate amplion_analysis_norovirus

Use duplex_tools to split the reads:

$ mkdir  Split_Reads

$ duplex_tools split_on_adapter --threads 12 --allow_multiple_splits \
fastq_pass/${barcodeID} Split_Reads/${barcodeID} Native

Create a text file to store the sequence of different primers being used in the experiment:

$ nano primer_sequences.fasta

Copy the following into the file and then save it


>Long_GI_PCR1
AARYTVCCHATHAARGTTGGNATG...TGTAYAATGGNTGGGTNGG
>Long_GI_PCR2
GATGCDGAYTAYACRGCHTGGG...TGTAYAATGGNTGGGTNGG
>Long_GII_PCR1
CWGCAGCMCTDGAAATCATGG...ATGTAYAAYGGDYATGCNGGYGG
>Long_GII_PCR2
GTGTGRTKGATGTGGGTGACTT...ATGTAYAAYGGDYATGCNGGYGG

Use cutadapt to trim primer sequences from reads:


$ mkdir Trim_Reads

$ cat Split_Reads/${barcodeID}/*_split.fastq.gz > Split_Reads/${barcodeID}/combined.fastq.gz

$ cutadapt -j12 --action=trim -n 1 -e 0.30 -O 12 --revcomp \
-g file:primer_sequences.fasta --discard-untrimmed \
--output Trim_Reads/trimmed-${sampleID}-{name}.fastq.gz \
Split_Reads/${barcodeID}/combined.fastq.gz \
> Trim_Reads/trimmed-${sampleID}.log 2>&1

Update the sampleID and barcodeID variables, and repeat steps described in this section for any other samples in the sequencing library.

Detect Chimeras

Select reads which include primers from the 2nd round of PCR and are longer than 800bp, then randomly sample 90,000 reads for subsequent filtering and chimera removal:


$ cat Trim_Reads/trimmed-${sampleID}-*_G*_PCR2.fastq.gz \
> Trim_Reads/trimmed-${sampleID}-combined-PCR2.fastq.gz

$ seqtk seq -L 800 Trim_Reads/trimmed-${sampleID}-combined-PCR2.fastq.gz |\
seqtk sample - 90000 > Trim_Reads/trimmed-${sampleID}-combined-PCR2-gt800bp.fastq

Use minimap2 to carry out an all-vs-all alignment of reads:


$ mkdir Detect_Chimeric_Reads

$ minimap2 -k19 -Xw19 -e0 -m100 -r100 -I 30M --cap-kalloc=8000m --cap-sw-mem=100m -t 12 \
Trim_Reads/trimmed-${sampleID}-combined-PCR2-gt800bp.fastq \
Trim_Reads/trimmed-${sampleID}-combined-PCR2-gt800bp.fastq \
> Detect_Chimeric_Reads/trimmed-${sampleID}-combined-PCR2.overlap.paf

Use yacrd to identify chimeras and reads with poor support:


$ yacrd -t12 -c 10 -n 0.2 \
-i Detect_Chimeric_Reads/trimmed-${sampleID}-combined-PCR2.overlap.paf \
-o Detect_Chimeric_Reads/trimmed-${sampleID}-combined-PCR2.overlap-gt800bp.report.yacrd 

$ awk '$1=="NotBad"' \
Detect_Chimeric_Reads/trimmed-${sampleID}-combined-PCR2.overlap-gt800bp.report.yacrd |\
cut -f2 \
> Detect_Chimeric_Reads/trimmed-${sampleID}-combined-PCR2.overlap-gt800bp.report.lst

$ seqtk subseq Trim_Reads/trimmed-${sampleID}-combined-PCR2-gt800bp.fastq \
Detect_Chimeric_Reads/trimmed-${sampleID}-combined-PCR2.overlap-gt800bp.report.lst \
> Trim_Reads/trimmed-${sampleID}-combined-PCR2-gt800bp.filtered.fastq

$ rm  Detect_Chimeric_Reads/trimmed-${sampleID}-combined-PCR2.overlap.paf

Update the sampleID and barcodeID variables, and repeat the steps described in this section for any other samples in the sequencing library.

Creating Representative Sequences

Cluster reads to generate consensus sequences:


$ mkdir -p Consensus_Reads/${sampleID}/

$ mamba activate NGSpeciesID

$ NGSpeciesID --t 20 --q 15 --ont --consensus --max_seqs_for_consensus 100000 \
--abundance_ratio 100 --m 1100 --s 200 \
--fastq Trim_Reads/trimmed-${sampleID}-combined-PCR2-gt800bp.filtered.fastq \
--outfolder Consensus_Reads/${sampleID}/ \
> Consensus_Reads/${sampleID}/NGSpeciesID.log 2>&1 \
|| echo "No Assembly"

Align reads against consensus sequences, call variants and identify any poorly supported regions:


$ mamba activate amplion_analysis_norovirus

$ cat Consensus_Reads/${sampleID}/consensus_reference_*.fasta \
> Consensus_Reads/${sampleID}_consensus_seqs.fasta

$ kma index -NI -i Consensus_Reads/${sampleID}_consensus_seqs.fasta \
-o Consensus_Reads/${sampleID}_consensus_seqs

$ mkdir Align_vs_Consensus_Seqs

$ kma -i Trim_Reads/trimmed-${sampleID}-combined-PCR2.fastq.gz \
-o Align_vs_Consensus_Seqs/${sampleID}_consensus_seqs_kma \
-t_db Consensus_Reads/${sampleID}_consensus_seqs \
-vcf 1 -ConClave 2 -bcNano -bc 0.7 -bcd 100 -t 20 -md 100 -1t1 \
-ont -ml 900 -xl 1100 -ef -mrs 0.92 -mrc 0.90 \
> Align_vs_Consensus_Seqs/${sampleID}_consensus_seqs_kma.log 2>&1

Update the sampleID and barcodeID variables, and repeat the last two steps for any other samples in the sequencing library.

Combine consensus sequences from every sample into a single file and rename them:

$ mkdir Cluster_Consensus_Sequences

$ cat Align_vs_Consensus_Seqs/*kma.fsa > Cluster_Consensus_Sequences/combined_seqs.fasta

$ seqtk rename Cluster_Consensus_Sequences/combined_seqs.fasta OTU_ \
> Cluster_Consensus_Sequences/combined_seqs_renamed.fasta

Trim consensus sequences to remove any poorly supported regions identified by kma:


$ seqkit locate --bed -P -r -p '^[agct]+' -p '[agct]+$' \
Cluster_Consensus_Sequences/combined_seqs_renamed.fasta \
> Cluster_Consensus_Sequences/combined_seqs_renamed.bed

$ samtools faidx Cluster_Consensus_Sequences/combined_seqs_renamed.fasta

$ bedtools sort -i Cluster_Consensus_Sequences/combined_seqs_renamed.bed \
-g Cluster_Consensus_Sequences/combined_seqs_renamed.fasta.fai \
> Cluster_Consensus_Sequences/combined_seqs_renamed_sorted.bed

$ bedtools complement -i Cluster_Consensus_Sequences/combined_seqs_renamed_sorted.bed \
-g Cluster_Consensus_Sequences/combined_seqs_renamed.fasta.fai \
> Cluster_Consensus_Sequences/combined_seqs_renamed_sorted_inverse.bed

$ bedtools getfasta -fi Cluster_Consensus_Sequences/combined_seqs_renamed.fasta \
-bed Cluster_Consensus_Sequences/combined_seqs_renamed_sorted_inverse.bed \
-fo Cluster_Consensus_Sequences/combined_seqs_renamed_trimmed.fasta

Cluster consensus sequences at 95 percent sequence identity:


$ cd-hit-est -i Cluster_Consensus_Sequences/combined_seqs_renamed_trimmed.fasta \
  -o Cluster_Consensus_Sequences/combined_seqs_renamed_trimmed_clustered.fasta \
  -G 0 -c 0.95 -n 10 -d 0 -M 100000 -T 80 -g 1 -aL 0.9

Calculating Coverage

Align reads against consensus sequences to calculate coverage:


$ mkdir Align_vs_Single_Set_Consensus_Reads

$ kma index -NI \
-i Cluster_Consensus_Sequences/combined_seqs_renamed_trimmed_clustered.fasta \
-o Cluster_Consensus_Sequences/combined_seqs_renamed_trimmed_clustered

$ kma -i Trim_Reads/trimmed-${sampleID}-combined-PCR2.fastq.gz \
-o Align_vs_Single_Set_Consensus_Reads/${sampleID}_consensus_seqs_kma \
-t_db Cluster_Consensus_Sequences/combined_seqs_renamed_trimmed_clustered \
-nc -vcf 2 -ConClave 2 -bcNano -bc 0.7 -bcd 100 -t 20 -md 100 -1t1 \
-ont -ml 900 -xl 1100 -ef -mrs 0.92 -mrc 0.90 \
> Align_vs_Single_Set_Consensus_Reads/${sampleID}_consensus_seqs_kma.log 2>&1

Use the following files for downstream statistical analysis:

The coverage of each representative sequence in each sample:

Align_vs_Single_Set_Consensus_Reads/*_consensus_seqs_kma.mapstat

The sequence of each representative sequence:

Cluster_Consensus_Sequences/combined_seqs_renamed_trimmed_clustered.fasta

Collating Coverage, Sequencing and Typing Results

Use the CDC's calicivirus typing tool to type each representative sequence:

https://calicivirustypingtool.cdc.gov/

Upload the entire fasta file

Click on 'type it'

Once the input sequences have been typed click on 'Download EXCEL'

Install R (≥ 4.1.2), RStudio (≥ 1.4.1717) and the openxlsx package (≥ 4.2.5.2)

Setup a new project directory using RStudio

Download the coverage files to a sub-folder within the project directory called 'Coverage Results'

Copy the Excel file downloaded from the CDC's calicivirus typing tool website to the project directory

Create a new R script in your RStudio project

Copy the following code into the R script:


# Load library for opening xlsx files 
library(openxlsx)

# List coverage files
coverage_results = 
  list.files(path = 'Coverage Results', pattern = "\\_consensus_seqs_kma.mapstat", 
             recursive = TRUE, include.dirs = FALSE)

# List samples
samples = gsub(x = coverage_results, 
               pattern = "^(.+)_consensus_seqs_kma.mapstat", replacement = "\\1")

# Collate coverage results into a single file
combined_coverage_results = NULL

for (sampleID in 1:length(samples)) {
  print(sampleID)
  coverage_filename = 
    paste0("Coverage Results/", samples[sampleID], "_consensus_seqs_kma.mapstat")
  
  coverage_file = readLines(con = coverage_filename)
  
  coverage_file[[7]] = gsub(x = coverage_file[[7]], pattern = "# ", replacement = "")
  
  noRecords = sum(!grepl(pattern = "##", x = coverage_file,fixed = TRUE)) - 1
  
  if (noRecords > 0) {
    coverage = read.table(file = textConnection(coverage_file), 
                          header = TRUE, stringsAsFactors = FALSE, sep = "\t", skip = 6)
    
    coverage$sample = samples[sampleID]
    
    combined_coverage_results = rbind(combined_coverage_results, coverage)
  }
  
}

# Import file linking reference sequences to genotype
taxa_results_field_names = 
  c('query','name','sequence','length','Genus','Genotype-B-region-score',
    'Genotype-B-region-plot','Genotype','Genotype-C-region-plot',
    'Genotype-C-region-score','Genotype-plot')

combined_taxa_results = 
  read.xlsx(xlsxFile = "calicivirus_typing_output.xlsx", 
            sheet = "sheet1", startRow = 2)

colnames(combined_taxa_results) = taxa_results_field_names

# Merge coverage results with genotype
combined_coverage_results2 = 
  merge(x = combined_coverage_results, 
        y = combined_taxa_results[,c('name','Genotype')], 
        by.x = c('refSequence'), by.y = c('name'), all.x = TRUE)

combined_coverage_results2[
  is.na(combined_coverage_results2$Genotype),
  c('Genotype')
  ] = "Unknown"

# Summarise by genotype
coverage_per_genotype = 
  aggregate(formula = readCount ~ Genotype + refSequence + sample, 
            data = combined_coverage_results2, 
            FUN = sum)

coverage_per_genotype_wide = 
  reshape(data = coverage_per_genotype, 
          v.names = "readCount", 
          idvar = c("Genotype","refSequence"),
          timevar = "sample", direction = "wide")

coverage_per_genotype_wide[is.na(coverage_per_genotype_wide)] = 0

colnames(coverage_per_genotype_wide) = 
  gsub(x = colnames(coverage_per_genotype_wide),
       pattern = "^readCount\\.(.+)$", replacement = "\\1")

# Save results as a csv file
write.csv(x = coverage_per_genotype_wide, 
          file = "coverage_per_genotype_wide.csv", row.names = FALSE)

Run the R script, check for errors and that the output looks correct. 

Read Depth Filtering

Using the coverage_per_genotype_wide.csv file generated above, for GI and GII independently, sum the norovirus-aligned reads for each sample

Find the median of the total aligned reads per sample across the data set

Filter data on a sample-to-sample basis removing reads associated with consensus sequences that have fewer than 0.1% of the median total aligned reads per sample

Reference Database Removal of PCR Chimeras

Note
PCR chimeras should be present at relatively low abundances. It is worth taking note of the No. of reads aligned with any putative chimeras to avoid the removal of novel recombinants without further investigation.

Collate a list of all known types of norovirus GI and GII from the Centre for Disease Control’s (CDC) Human Calicivirus Typing Tool – this should be performed prior the analysis of any new data sets to prevent missing recently observed types

Remove all coverage (coverage_per_genotype_wide.csv) and sequence data (combined_seqs_renamed_trimmed_clustered.fasta) relating to consensus sequences that are types not previously reported by the CDC

Note
This can be done manually for small data sets or using R and tidyverse 2.0.0.

De Novo Removal of PCR Chimeras

For each wastewater sample under analysis, annotate the consensus sequences (combined_seqs_renamed_trimmed_clustered.fasta) with the read depths (coverage_per_genotype_wide.csv)
Note
This can be done manually or with R using seqinr 4.2.3 and tidyverse 2.0.0.

Perform chimera filtering with USEARCH v11 using the following parameters:
-uchime3_denovo -chimeras chimeras.fa

Open chimeras.fa to identify potentially chimeric sequences for that sample

Remove all coverage (coverage_per_genotype_wide.csv) and sequence data (combined_seqs_renamed_trimmed_clustered.fasta) relating to consensus sequences identified as chimeras in chimeras.fa

Manual Screening for PCR Chimeras

For each sample, identify types that may be generated from two other (parent) types in the sample using the data in coverage_per_genotype_wide.csv
Note
E.g. GI.6[P13] may be a chimera of GI.6[P11] and GI.3[P13] if the latter two have a higher number of reads associated with them.

Assess the putative chimeras and parent sequences using a multiple sequence alignment tool such as NCBI Multiple Sequence Alignment Viewer v1.25.0  with the putative chimera sequence anchored 

Check the parent sequences for breakpoints within the terminal or proximal regions of RdRp and VP1 and child-parent sequence similarities ≥95%

Remove all coverage (coverage_per_genotype_wide.csv) and sequence data (combined_seqs_renamed_trimmed_clustered.fasta) relating to consensus sequences identified as  putative chimeras which match these criteria

Note
Your data is now ready for downstream analysis. It should be noted, however, that at present this method is only suitable for qualitative analysis and read-depth/coverage should not be used as a quantitative measurement of the different types detected in a sample. 

Protocol references

Primers:
Ollivier, J. et al. (2022) Application of Next Generation Sequencing on Norovirus- contaminated oyster samples, EFSA Supporting Publications. doi: 10.2903/sp.efsa.2022.en-7348.

Yuen, Catton et al, 2001

Kojima et al, 2002

Citations

Step 1

Harry T Child, Paul A O'Neill, Karen Moore, Hubert Denise, Matthew Loose, Steve Paterson, Ronny van Aerle, Aaron Jeffries. Wastewater Sequencing using the EasySeq™ RC-PCR SARS CoV-2 (Nimagen) V3.0

https://protocols.io/view/wastewater-sequencing-using-the-easyseq-rc-pcr-sar-cih2ub8e

A	B	C
Material	Manufacturer	Product code
Mag-Bind TotalPure NGS	OmegaBIO-TEK	M1378-00
Ethanol, Molecular Biology Grade	Sigma-Aldrich	1085430250
Water, Molecuar Biology Grade	Sigma-Aldrich	W4502-1L
LunaScript RT SuperMix Kit	New England Biolabs	E3010
TE buffer, Molecular Biology Grade	Sigma-Aldrich	574793
Platinum Taq Polymerase	Invitrogen	10966018
D5000 Reagents, TapeStation	Agilent	5067-5589
D5000 ScreenTapes, Tapestation	Agilent	5067-5588
ExoSAP-IT	Applied Biosystems	78200
1X dsDNA High Sensitivity Kit, Qubit	Invitrogen	Q32851
Native Barcoding Kit 96 V14	Oxford Nanopore	NBD114.96
R10.4.1 Flow Cell	Oxford Nanopore	FLO-MIN114
Blunt/TA Ligase Master Mix	New England Biolabs	M0367
NEBNext Ultra II End repair/dA-tailing Module	New England Biolabs	E7546
NEBNext Quick Ligation Module	New England Biolabs	E6056
Bovine Serum Albumin, UltraPure	Invitrogen	AM2616
dNTP Deoxynucleotide (dNTP) Solution Mix	New England Biolabs	N0447S

A	B	C	D
Name	Genogroup	F/R	Sequence (5’-3’)
NV4478m	GI	F1	AARYTVCCHATHAARGTTGGNATG
NV4562m	GI	F2	GATGCDGAYTAYACRGCHTGGG
GISKRm	GI	R1,2	CCIACCCAICCATTRTACA
NV4611	GII	F1	CWGCAGCMCTDGAAATCATGG
NV4692	GII	F2	GTGTGRTKGATGTGGGTGACTT
GIISKR	GII	R1,2	CCRCCNGCATRHCCRTTRTACAT

A	B	C
Genogroup	Forward Primer	Reverse Primer
GI	NV4478m	GISKRm
GII	NV4611	GIISKR

A	B	C
Reagents	Final concentration	Volume (µL)
H20 (molecular biology grade)	-	15.15
PCR Buffer, without Mg (10X)	1	2.5
dNTP (10 mM)	0.2 mM	0.5
MgCl2 (50 mM)	1.5 mM	0.75
Forward primer (10 µM)	0.2 µM	0.5
Reverse primer (10 µM)	0.2 µM	0.5
Platinum Taq polymerase (10 U/µL)	1 unit/tube	0.1

A	B	C	D
Stage	Temperature (°C)	Time	Cycles
Initial denature	95.0	60 s	1
Denature	95.0	30 s	40
Anneal	47.4	30 s
Elongation	72.0	30 s
Final elongation	72.0	7 min	1
Hold	4.0

Public workspaceLong Amplicon Nanopore Sequencing for Dual-Typing RdRp and VP1 Genes of Norovirus Genogroups I and II in Wastewater

Long Amplicon Nanopore Sequencing for Dual-Typing RdRp and VP1 Genes of Norovirus Genogroups I and II in Wastewater