Jan 04, 2024

Norovirus genotyping and phylogeny analysis_ViroTrakr workflow 1_v.1

Office of Applied Research and Safety Assessment, Center for Food Safety and Applied Nutrition, U.S. Food and Drug Administration
Open access
Protocol Citation: Jayanthi Gangiredla, Mark Mammel, Zhihui Yang 2024. Norovirus genotyping and phylogeny analysis_ViroTrakr workflow 1_v.1. protocols.io https://dx.doi.org/10.17504/protocols.io.261ged1eov47/v1
License: This is an open access protocol distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Protocol status: Working
We use this protocol and it's working
Created: January 04, 2024
Last Modified: January 04, 2024
Protocol Integer ID: 92954
Disclaimer
Please note that this protocol is in the public domain, which supersedes the CC-BY license default used by protocols.io.
Abstract
This workflow provides step-by-step instructions for norovirus analysis on the GalaxyTrakr platform. It covers quality assessment of raw sequencing data (from most next-generation sequencing platforms), de novo assembly, and reporting of genotyping and phylogenetic results. The workflow was designed for norovirus, one of the major targets of our ViroTrakr database.

This protocol covers how to:
Set up an account in GalaxyTrakr;
Create a new history/workspace for a new submission;
Upload raw data from local folders or download it from NCBI;
Execute the ViroTrakr workflow 1 (norovirus);
Interpret the results.


ViroTrakr:


Reference: Quality control assessment for microbial genomes: GalaxyTrakr MicroRunQC workflow V.5:


Log into your GalaxyTrakr account.
Create a GalaxyTrakr account if you are a first-time user:



Log into your GalaxyTrakr account if you already have one:



Get familiar with Galaxy components: Tools, Menu and History.






Create a new history.





Upload data.
The raw sequencing data in FASTQ files can be imported into GalaxyTrakr directly from your local folder (instructions in 3.1) or, if the files have already been submitted to ViroTrakr at NCBI, downloaded from SRA (instructions in 3.2; submission protocol: NCBI submission protocol for foodborne virus surveillance (protocols.io)). Once uploaded to GalaxyTrakr, the files remain in your account until they are deleted.
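Before uploading, it can help to confirm locally that each FASTQ file is structurally intact (for example, not truncated during transfer). The sketch below is not part of the GalaxyTrakr workflow; it assumes standard four-line FASTQ records (no wrapped sequence lines), either uncompressed or gzip-compressed with a .gz extension, and the function names are hypothetical.

```python
import gzip
from pathlib import Path


def open_fastq(path):
    """Open a FASTQ file as text, handling gzip compression by file extension."""
    path = Path(path)
    if path.suffix == ".gz":
        return gzip.open(path, "rt")
    return open(path, "r")


def check_fastq(path, max_records=1000):
    """Check up to max_records four-line FASTQ records for basic well-formedness.

    Returns (records_checked, errors), where errors is a list of messages.
    """
    errors = []
    n = 0
    with open_fastq(path) as fh:
        while n < max_records:
            header = fh.readline()
            if not header:
                break  # end of file reached
            seq = fh.readline().rstrip("\r\n")
            plus = fh.readline()
            qual = fh.readline().rstrip("\r\n")
            n += 1
            if not header.startswith("@"):
                errors.append(f"record {n}: header line does not start with '@'")
            if not plus.startswith("+"):
                errors.append(f"record {n}: separator line does not start with '+'")
            if len(seq) != len(qual):
                errors.append(f"record {n}: sequence and quality lengths differ")
    return n, errors
```

For example, check_fastq("sample_R1.fastq.gz") (a hypothetical file name) returns the number of records checked and a list of problems; an empty list only means the records are well-formed. Read quality itself is assessed by the workflow's QC steps.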
Upload raw data from a local folder.
3.1.1. Click the “Upload Data” button, then “Choose local files”.



3.1.2. Select fastq files from your local folder.



3.1.3. Select the files and click “Start” to upload.



3.1.4. Check the status of data upload.



3.1.5. Build a list of Dataset Pairs, pairing the forward and reverse files of each sample for batch analysis (follow steps 1, 2, and 3).
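Pairing works best when forward and reverse files share a sample name and differ only in a read marker. As a local preview of which files would pair up (and which would be left unpaired), the sketch below groups file names by sample. It assumes a hypothetical Illumina-style naming convention, <sample>_R1... and <sample>_R2..., ending in .fastq, .fq, or either with .gz; adjust PAIR_PATTERN to match your own files. It does not reproduce Galaxy's pairing logic.

```python
import re
from collections import defaultdict

# Hypothetical naming convention, e.g. "NV01_S1_L001_R1_001.fastq.gz" or "NV02_R2.fq".
PAIR_PATTERN = re.compile(
    r"^(?P<sample>.+?)_R(?P<read>[12])(?P<rest>(_\d+)?\.f(ast)?q(\.gz)?)$"
)


def pair_fastq_names(filenames):
    """Group forward (R1) and reverse (R2) file names by sample.

    Returns (pairs, unpaired): pairs maps sample -> (R1 name, R2 name);
    unpaired lists names that did not match the pattern or lack a mate.
    """
    groups = defaultdict(dict)
    unpaired = []
    for name in filenames:
        m = PAIR_PATTERN.match(name)
        if m is None:
            unpaired.append(name)
            continue
        groups[m.group("sample")][m.group("read")] = name
    pairs = {}
    for sample, reads in sorted(groups.items()):
        if "1" in reads and "2" in reads:
            pairs[sample] = (reads["1"], reads["2"])
        else:
            unpaired.extend(reads.values())  # a mate is missing
    return pairs, unpaired
```

Any file reported as unpaired is worth checking before building the collection, since a missing mate usually means a file was not selected or uploaded.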



3.1.6. Create a collection of paired datasets.


3.1.6. (cont.) Create a collection of paired datasets.



3.1.7. The data collection will be created in the history.



3.1.8. Import the reference data files from the Shared Data folder following steps 1-3 as shown below.



3.1.9. Select all files in the folder and export them as Datasets to your current history following steps 1-3 as shown below.



3.1.10. Click the Workflow tab in the main menu, then select and run the Norovirus_Genotyping_Phylogeny workflow.



3.1.11. Select the appropriate file from each dropdown menu and run the workflow.



3.1.12. Once the workflow run completes successfully (green status), the results will appear in the history.



3.1.13. Select and view the result files in the middle panel.



3.1.14. Download the result files to your local folder.



Result files include:
• Assembly with MEGAHIT: metagenomic assemblies.
• Report: Kraken2: Kraken2 classification reports.
• Report_blasthits_Genotype: best BLAST hits against the reference sequences.
• Noro_genotyping_report: final report containing QC statistics and genotyping results.
• Norovirus_contigs: norovirus-specific contigs extracted from the metagenomic assembly.
• Reference_query_phylogenetic_tree: phylogenetic tree of the input genomes together with the reference genomes from all groups, in .png and .txt formats.
• Group4_phylogenetic_tree: phylogenetic tree of the input genomes together with the GroupII.4 reference genomes from all groups, in .png and .txt formats.
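The Kraken2 report can be used to see how many reads were classified as norovirus before turning to the genotyping report. Assuming the "Report: Kraken2" file is Kraken2's standard tab-separated report (percentage, clade reads, directly assigned reads, rank code, NCBI taxid, and a space-indented name, optionally with two extra minimizer columns), the sketch below parses it and filters rows by name. The function names are hypothetical and this step is not part of the workflow.

```python
from dataclasses import dataclass


@dataclass
class KrakenRow:
    percent: float     # % of reads in the clade rooted at this taxon
    clade_reads: int   # reads assigned to this clade
    taxon_reads: int   # reads assigned directly to this taxon
    rank: str          # rank code, e.g. "U", "R", "F", "G", "S"
    taxid: int         # NCBI taxonomy ID
    name: str          # scientific name without indentation
    depth: int         # indentation level in the taxonomy tree


def parse_kraken2_report(lines):
    """Parse lines of a Kraken2 standard report into KrakenRow objects."""
    rows = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        cols = line.split("\t")
        raw_name = cols[-1]
        name = raw_name.lstrip(" ")
        rows.append(KrakenRow(
            percent=float(cols[0]),
            clade_reads=int(cols[1]),
            taxon_reads=int(cols[2]),
            rank=cols[-3],    # last three columns: rank, taxid, name
            taxid=int(cols[-2]),
            name=name,
            depth=(len(raw_name) - len(name)) // 2,  # two spaces per level
        ))
    return rows


def find_taxa(rows, keyword):
    """Return rows whose name contains keyword (case-insensitive)."""
    kw = keyword.lower()
    return [r for r in rows if kw in r.name.lower()]
```

A typical call would be find_taxa(parse_kraken2_report(open(path)), "Norovirus"), where path is the downloaded report.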
Download data from the SRA database (if you submitted your raw data to ViroTrakr before the GalaxyTrakr analysis).
3.2.1. SRA database link to norovirus sequence files for BioProject PRJNA490509:


3.2.2. In the SRA database, (1) select the SRR samples of your choice, then (2) click “Accession List”.



3.2.3. A text file named SRR_Acc_List.txt, listing the selected SRR accessions one per line, will be downloaded.
For example:
SRR26123209
SRR26123210
SRR26123211
SRR26123212
SRR26123214
SRR26123215
SRR26123216
SRR26123217
SRR26123218
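Because the download tool expects one accession per line, a quick local check of SRR_Acc_List.txt can catch blank lines, duplicates, or stray entries (such as a BioProject ID pasted in by mistake) before upload. The sketch below is a hypothetical helper, not part of the workflow; it accepts run accessions with the SRR, ERR, or DRR prefix used by the INSDC archives.

```python
import re
from pathlib import Path

# Run accessions: SRR (NCBI), ERR (ENA), or DRR (DDBJ) followed by digits.
RUN_ACCESSION = re.compile(r"^[SED]RR\d+$")


def read_accession_list(path):
    """Read an SRR_Acc_List.txt-style file (one run accession per line).

    Returns (accessions, problems): unique valid accessions in file order,
    and messages describing invalid or duplicate lines.
    """
    accessions, problems = [], []
    seen = set()
    for lineno, raw in enumerate(Path(path).read_text().splitlines(), start=1):
        acc = raw.strip()
        if not acc:
            continue  # ignore blank lines
        if not RUN_ACCESSION.match(acc):
            problems.append(f"line {lineno}: {acc!r} is not a run accession")
        elif acc in seen:
            problems.append(f"line {lineno}: duplicate {acc}")
        else:
            seen.add(acc)
            accessions.append(acc)
    return accessions, problems
```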


3.2.4. To upload the accession list, select SRR_Acc_List.txt and click the “Start” button:



3.2.5. Once the upload is complete, the text file will be added to the history.



3.2.6. From “Get Data” in the left menu, select “Faster Download and Extract reads in FASTQ from NCBI SRA”, then choose “List of SRA accessions, one per line” from the dropdown menu.




3.2.6. (cont.) Select the file SRR_Acc_List.txt and click the “Execute” button.
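We assume this Galaxy tool is built on fasterq-dump from NCBI's SRA Toolkit. If the same runs are ever needed on a local machine with SRA Toolkit installed, the sketch below turns an accession list into one fasterq-dump command per run (with --split-files, so forward and reverse reads go to separate files). It only builds and prints the commands; the helper name, output directory, and thread count are hypothetical choices, and this path is outside the GalaxyTrakr workflow.

```python
import shlex
import sys
from pathlib import Path


def fasterq_dump_commands(accession_file, outdir="fastq", threads=4):
    """Build one fasterq-dump command (as an argument list) per accession.

    accession_file is an SRR_Acc_List.txt-style file with one accession per line.
    """
    commands = []
    for line in Path(accession_file).read_text().splitlines():
        acc = line.strip()
        if not acc:
            continue  # skip blank lines
        commands.append([
            "fasterq-dump",
            "--split-files",        # write forward and reverse reads to separate files
            "--outdir", outdir,     # directory for the FASTQ output
            "--threads", str(threads),
            acc,
        ])
    return commands


if __name__ == "__main__" and len(sys.argv) > 1:
    # Print the commands for review instead of running them.
    for cmd in fasterq_dump_commands(sys.argv[1]):
        print(shlex.join(cmd))
```

Printing rather than executing keeps the sketch safe to try; each argument list could be passed to subprocess.run once the commands look right.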



3.2.7. The data files will be downloaded from the NCBI SRA database. Download time varies depending on the number of files and the NCBI server status.



3.2.8. Follow steps 3.1.8 to 3.1.14 to run the workflow and collect the results.