NCBI submission protocol for foodborne virus surveillance

Zhihui Yang; Ruth Timme; Maria Balkey; Zhihui Yang

Jun 08, 2022

Version 1

NCBI submission protocol for foodborne virus surveillance V.1

DOI

dx.doi.org/10.17504/protocols.io.j8nlkkdbxl5r/v1

Zhihui Yang¹,
Ruth Timme²,
Maria Balkey²,
Zhihui Yang³

¹FDA/CFSAN/OARSA/DMB;
²CFSAN/ORS/DM/MMSB;
³CFSAN/OARSA/DMB

ViroTrakr

Zhihui Yang

FDA

DOI: https://dx.doi.org/10.17504/protocols.io.j8nlkkdbxl5r/v1

External link: ViroTrakr: foodborne viruses (ID 396739) - BioProject - NCBI (nih.gov)

Protocol Citation: Zhihui Yang, Ruth Timme, Maria Balkey, Zhihui Yang 2022. NCBI submission protocol for foodborne virus surveillance. protocols.io https://dx.doi.org/10.17504/protocols.io.j8nlkkdbxl5r/v1

License: This is an open access protocol distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited

Protocol status: Working

We use this protocol and it’s working

Created: April 08, 2022

Last Modified: June 08, 2022

Protocol Integer ID: 60501

Keywords: ViroTrakr, Foodborne virus surveillance, NCBI submission protocol, ncbi submission protocol for microbial pathogen surveillance, submission protocol for foodborne virus surveillance introduction, foodborne virus surveillance introduction, foodborne virus, bacterial pathogen surveillance in multiple country, bacterial pathogen surveillance, microbial pathogen surveillance, virotrakr contributor, international genomic reference database, virotrakr, pathogen, norovirus, new bioproject at ncbi, foodborne illness, data to ncbi, raw sequence data, genometrakr, sapovirus, genomic database, raw sequence data to the sra database, data to genbank, time reference sequences for phylogenetic analysis, ncbi submission protocol, ncbi submission, new bioproject, ncbi, phylogenetic analysis, biosample number

Abstract

INTRODUCTION:
This protocol outlines the steps that ViroTrakr contributors follow to submit their data to NCBI. It covers how to:
- establish your new BioProject at NCBI;
- link it to ViroTrakr;
- create BioSample numbers for your submission and submit raw sequence data to the SRA database;
- submit assembled data to GenBank and link them to ViroTrakr (optional).

ViroTrakr, a genomic database initiated by CFSAN and housed at NCBI, aims to (1) cover sequences of a wide range of foodborne viruses (e.g., norovirus, hepatitis A virus, sapovirus) from clinical, food and/or environmental specimens and (2) provide real-time reference sequences for phylogenetic analysis and epidemiologic studies linked to foodborne illnesses. Pipelines are under development to eventually integrate ViroTrakr with GenomeTrakr, an international genomic reference database that has been successfully employed for bacterial pathogen surveillance in multiple countries.

ViroTrakr: foodborne viruses (ID 396739) - BioProject - NCBI (nih.gov)

GenomeTrakr: Multispecies (ID 593772) - BioProject - NCBI (nih.gov)



Reference: NCBI submission protocol for microbial pathogen surveillance V.5:
NCBI submission protocol for microbial pathogen surveillance (protocols.io)

ViroTrakr data structure

ViroTrakr database structure: the ViroTrakr database was established as an umbrella BioProject at NCBI with the structure shown below:
Note: The steps involved in your ViroTrakr submission are highlighted in green.
One data level BioProject is created per lab or per collaboration project.

Database structure (cont.), for each data level BioProject:

NCBI sign in

Getting started

Please refer to “NCBI submission protocol for microbial pathogen surveillance” Section 1
NCBI submission protocol for microbial pathogen surveillance (protocols.io) for details.

For new users, directly create an account using one of the 3rd party sign-in options:
Sign up / NCBI (nih.gov)

For existing users, sign in to your NCBI account: NCBI Sign In Page (nih.gov)

Note for existing users: NCBI-managed credentials are the username and password you set at NCBI. These will be retired in June 2022, after which access to any My NCBI account without a linked 3rd-party login will require going through an access recovery process. Federated account credentials are those set through eRA Commons, Google, or a university or institutional point of access. Your NCBI account and its contents will not change; you simply need to log in a different way, through a 3rd party option. See the FAQ for more information: https://ncbiinsights.ncbi.nlm.nih.gov/ncbi-login-retirement-faqs/
For existing users, use the steps below to link a 3rd party login to your account:

1. Sign in directly to NCBI with your username and password.

2. Click your username, which is located on the top right of the browser page.

3. Click “Change” in the “Linked Account” portal.

4. Locate the 3rd party account of your choice using the search bar.

5. You will be transferred to the 3rd party’s sign in page. Enter your credentials there for the 3rd party account.

You may group, organize and manage your NCBI submission environment for your lab: please refer to “NCBI submission protocol for microbial pathogen surveillance” Section 1
NCBI submission protocol for microbial pathogen surveillance (protocols.io) for details.

Log into your NCBI account and you are ready for your NCBI submission.

Creating BioProjects at NCBI

Establish your new data level BioProject under the umbrella BioProject ViroTrakr:

Please refer to “NCBI submission protocol for microbial pathogen surveillance” NCBI submission protocol for microbial pathogen surveillance (protocols.io) step 3 for details.

Log into your NCBI account at Submissions | BioProject | Submission Portal (nih.gov):

Establish a new BioProject by clicking “New submission”:

There are seven tabs under each BioProject submission.

Populate “Submitter” tab: (a submission group is highly recommended* for your laboratory)

*Note: establishing and using a user group for all your submissions related to microbial genome surveillance is highly recommended. The reasons, as stated in NCBI submission protocol for microbial pathogen surveillance (protocols.io), are:

“- it will link your laboratory's NCBI data ownership to the user group and not to individuals, allowing anyone in the current group to perform updates or retractions and answer inquiries from the NCBI staff, even if there's been a complete turnover of staff since the original data submission. 

- it also ensures consistent data ownership across BioProjects, BioSamples, and sequence data. If your laboratory has non-overlapping research groups submitting and managing data at NCBI, multiple user groups can be established to track these efforts separately.” 

You may use a submission group that your laboratory has already established. Check the “Group” tab in the submission portal, https://submit.ncbi.nlm.nih.gov/groups/ for this information, and ask your colleagues to do the same to make sure your laboratory doesn't already have one in place.

If your laboratory does not yet have a suitable submission group, please refer to NCBI submission protocol for microbial pathogen surveillance (protocols.io) sections 1.2 and 1.3 for details on:
- how to request and create a new user group by emailing NCBI help staff at submit-help@ncbi.nih.gov
- how to manage your NCBI submission user group from the “Group” tab of the submission portal https://submit.ncbi.nlm.nih.gov/groups/

You may contact NCBI by emailing submit-help@ncbi.nih.gov if you have any further questions regarding submission groups or need additional help.

 
Populate “Project type” tab (e.g. Raw sequence reads):
* Required fields are marked with * asterisk.

Populate “Target” tab: move the cursor over the question marks for a description of each item.
Required fields are marked with * asterisk; fields without * asterisk may be left blank.

Note: choose the most descriptive and valid organism name for your study. For example, use “Norwalk virus” instead of “norovirus” and “Homo sapiens” instead of “human”. See Organism information - BioSample - NCBI (nih.gov) and Home - Taxonomy - NCBI (nih.gov) for more information about providing a valid organism name.

Populate “General Info” tab:
•Choose “release immediately following processing” or a specified date to release your submission;
•Provide a description (e.g., Norwalk virus sequencing) of the study goals and relevance (e.g., NGS of clinical samples as part of norovirus surveillance) under “Public description”;
•Choose a “Relevance” from the provided options;
•Click “Yes” to question “Is your project part of a larger initiative which is already registered with NCBI?”:
- enter BioProject accession number PRJNA433975 if norovirus sequence data;
- enter BioProject accession number PRJNA433976 if hepatitis A virus sequence data;
- enter BioProject accession number PRJNA433977 if sapovirus sequence data;
- enter BioProject accession number PRJNA817226 if rotavirus sequence data;
- enter BioProject accession number PRJNA817227 if astrovirus sequence data;
- enter BioProject accession number PRJNA817228 if hepatitis E virus sequence data. 

•You may leave other fields blank. 
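When several people in a lab create BioProjects, it can help to keep the umbrella accessions listed above in one place instead of retyping them. The following is a minimal sketch only: the accession numbers are copied from the list above, while the helper name `umbrella_accession` and the lower-case lookup keys are illustrative assumptions, not part of any NCBI tool.

```python
# Umbrella BioProject accessions, copied from the list in this step.
VIROTRAKR_UMBRELLA = {
    "norovirus": "PRJNA433975",
    "hepatitis a virus": "PRJNA433976",
    "sapovirus": "PRJNA433977",
    "rotavirus": "PRJNA817226",
    "astrovirus": "PRJNA817227",
    "hepatitis e virus": "PRJNA817228",
}


def umbrella_accession(virus):
    """Return the umbrella accession for a virus name (hypothetical helper)."""
    # Normalise case and whitespace so "Hepatitis A virus" and
    # "hepatitis  a  virus" resolve to the same key.
    key = " ".join(virus.lower().split())
    if key not in VIROTRAKR_UMBRELLA:
        raise ValueError(f"No ViroTrakr umbrella BioProject listed for {virus!r}")
    return VIROTRAKR_UMBRELLA[key]


print(umbrella_accession("Hepatitis A virus"))
```

A virus that is not in the list raises an error rather than returning a guess, so a mistyped name cannot silently link a project to the wrong umbrella.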
 

 

Leave the "BioSample" tab blank; BioSamples will be created through a different submission portal.

Leave "Publications" tab blank or add relevant publications from your group.

Check your input in the “Review and Submit” tab. Edit if needed, or click “Submit” to complete your submission.

The BioProject accession number “PRJNAxxxxxx” will be available within a few minutes on “my submission” page. Meanwhile, you will receive an NCBI email containing these accession numbers, usually within 12 hours. 

Creating BioSamples at NCBI

Reads and their sequence metadata are submitted to SRA, and the sample metadata to BioSample, in a single step.

Please refer to “NCBI submission protocol for microbial pathogen surveillance” NCBI submission protocol for microbial pathogen surveillance (protocols.io) step 2 for details.

Log into your NCBI account at Submissions | Sequence Read Archive (SRA) | Submission Portal (nih.gov).
Establish an SRA submission by clicking “New Submission”:

There are initially five tabs under each SRA submission.

Populate “Submitter” tab: (a submission group is highly recommended for your laboratory, see details in step 2.3)

Populate “General info” tab:

•Click “Yes” under BioProject and enter the BioProject accession number established in step 2.

•Click “No” under BioSample to indicate that you do not have an existing BioSample to associate with this sequence data and will create the BioSample in one of the next steps.

•* Click “Release immediately following processing” or specify a date to release if preferred.

Note: this is important for your first submission, especially for data from clinical samples. To protect subject privacy, human genomic reads can be removed from your raw sequencing data with the automated human-read scrubbing tool available at NCBI, https://github.com/ncbi/sra-human-scrubber. To do so, a flag is set on the BioProject along with the first data submission, so that this first submission and all subsequent data submissions for that BioProject are scrubbed automatically. Specifically, choose “Release on specified date” for your first submission and enter a date one week or more in the future (you can change the date later). Then send the following email to sra@ncbi.nlm.nih.gov as soon as possible:

“Hi SRA help desk, 
Please add the human read scrubbing analysis flag to my BioProject PRJNAXXXXXX, then release my HUPed (delayed release) SRA submissions.
Thanks,
Your name”

Once the flag is set for that BioProject, you may click “Release immediately following processing” for subsequent data submissions. 

•Click “Continue” to next page.

Populate “BioSample type” tab:

Preview the BioSample types and attributes on the template page, and select the package that best describes your samples (e.g., select “Pathogen” and then the “Pathogen: clinical or host-associated” type):

* Note: two additional tabs, “BioSample type” and “BioSample attributes”, are added at this point to collect information for BioSample creation.

Click “Continue” at the bottom of the “BioSample type” tab:

Populate “BioSample attributes” tab:

•Click “Upload a file using Excel or text format (tab-delimited) that includes the attributes for each of your BioSamples”.

•Click “Download Excel” button under “Attributes file”.

•Fill out the downloaded BioSample attributes sheet in Excel and save it in a local folder.

•Click “Choose file” button under “Attributes file”, then upload the populated and saved attributes sheet.

•At bottom of the “Attributes” tab, click “Continue”:
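Before uploading, it can save a round trip with the portal to check the attributes sheet for blank required cells and repeated sample names. The sketch below assumes the sheet has been saved as tab-delimited text, that comment lines start with `#`, and that required headers may carry a leading `*`. The column names in `REQUIRED_COLUMNS` are assumptions for illustration; use the headers marked as required in the template you downloaded.

```python
import csv

# Assumed column names; replace with the required headers from your template.
REQUIRED_COLUMNS = ["sample_name", "organism", "collection_date", "geo_loc_name"]


def check_attributes(tsv_text, required=REQUIRED_COLUMNS):
    """Return a list of problems found in a tab-delimited attributes sheet."""
    # Drop empty lines and comment lines (assumed to start with '#').
    lines = [ln for ln in tsv_text.splitlines()
             if ln.strip() and not ln.startswith("#")]
    rows = list(csv.reader(lines, delimiter="\t"))
    if not rows:
        return ["sheet is empty"]
    # Strip the leading '*' that may mark a required column in the header.
    header = [h.strip().lstrip("*") for h in rows[0]]
    missing = [c for c in required if c not in header]
    if missing:
        return ["missing column(s): " + ", ".join(missing)]
    problems, seen = [], set()
    for number, cells in enumerate(rows[1:], start=1):
        row = dict(zip(header, cells))
        for column in required:
            if not row.get(column, "").strip():
                problems.append(f"data row {number}: empty {column}")
        name = row.get("sample_name", "").strip()
        if name and name in seen:
            problems.append(f"data row {number}: duplicate sample_name {name!r}")
        seen.add(name)
    return problems
```

An empty list means none of these particular problems were found; the portal still runs its own validation after upload.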

Populate “SRA metadata” tab:

•Click “Upload a file using Excel or text format (tab-delimited)”.

•Click “Download Excel spreadsheet” under “Metadata file”.

•Read the instructions on the first tab (Contact Info and Instructions) on how to fill out the spreadsheet (next page), fill out the second tab (SRA_data), and save it as a TSV (tab-delimited) file in a local folder.

•Click “Choose file” button under “Metadata file”, then upload the populated spreadsheet that you saved in your local folder.

•At the bottom of the “SRA metadata” tab, click “Continue”:

4.7.1: The downloaded metadata Excel spreadsheet:
Note: You must save the second tab (SRA_data) of the spreadsheet as a TSV (tab-delimited) file and upload that TSV file on the “SRA metadata” tab.
4.7.2: Example: fill out the metadata Excel spreadsheet. Get the file name of each fastq file and fill out the corresponding column:
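Collecting the fastq file names by hand is error-prone when a run has many samples. The sketch below lists a folder, pairs forward and reverse reads, and writes the names to a tab-delimited file that can be pasted into the spreadsheet. It assumes an Illumina-style naming convention (`<sample>_R1...fastq.gz` and `<sample>_R2...fastq.gz`), and the three column names it writes are assumptions; the downloaded SRA_data sheet has further columns that still need to be filled in.

```python
import csv
import re
from pathlib import Path

# Assumed naming convention: <sample>_R1[_001].fastq[.gz] and the matching _R2 file.
READ_PATTERN = re.compile(
    r"^(?P<sample>.+?)_R(?P<read>[12])(?:_\d+)?\.fastq(?:\.gz)?$"
)


def pair_fastq(names):
    """Group fastq file names by sample: {sample: {"1": name, "2": name}}."""
    pairs = {}
    for name in sorted(names):
        match = READ_PATTERN.match(name)
        if match:  # files that do not look like fastq reads are ignored
            pairs.setdefault(match["sample"], {})[match["read"]] = name
    return pairs


def write_filename_columns(fastq_dir, out_tsv):
    """Write sample_name, filename and filename2 as a tab-delimited file."""
    names = [p.name for p in Path(fastq_dir).iterdir() if p.is_file()]
    with open(out_tsv, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle, delimiter="\t")
        writer.writerow(["sample_name", "filename", "filename2"])
        for sample, reads in pair_fastq(names).items():
            writer.writerow([sample, reads.get("1", ""), reads.get("2", "")])
```

For example, `write_filename_columns("fastq", "sra_filenames.tsv")` would scan a local folder named `fastq` (a placeholder). A sample with only one read file gets an empty `filename2` cell, which makes unpaired files easy to spot.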

Populate “Files” tab:

•Click “FTP or Aspera Command Line file preload”.

•Click “FTP upload Instructions”.

4.8.1: Read and follow “FTP upload Instructions”. Select a suitable FTP tool (e.g., FileZilla) to upload your data:

4.8.2: Open FileZilla

• Copy and paste “Host, Username, Password” to establish FTP connection.

• Port: default for FTP is 21; default for SFTP is 22. Click “Quick connect”.

• Copy and paste your directory name “uploads/. …”.

• Create a subfolder (required!) with a meaningful name. 

• Start uploading your sequence data from your local folder to the created subfolder.

4.8.3: When the upload is complete, return to the SRA submission page and click “Select preload folder”.

4.8.4: Click “Continue” to upload (note: it takes at least 10 minutes for uploaded files to become available):
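For labs that prefer a script to a graphical FTP client, the same FileZilla steps (connect, change to the account directory, create a subfolder, upload) can be sketched with Python's standard `ftplib`. This is an illustration under assumptions, not an NCBI-provided tool: the host, username, password and directory name must be copied from your own “FTP upload Instructions” page, and the function names here are hypothetical.

```python
from ftplib import FTP
from pathlib import Path


def upload_reads(ftp, account_dir, subfolder, local_files):
    """Upload files into account_dir/subfolder over an open FTP connection."""
    ftp.cwd(account_dir)   # directory name given in the FTP upload instructions
    ftp.mkd(subfolder)     # the required subfolder, with a meaningful name
    ftp.cwd(subfolder)
    uploaded = []
    for path in map(Path, local_files):
        with open(path, "rb") as handle:
            ftp.storbinary(f"STOR {path.name}", handle)
        uploaded.append(path.name)
    return uploaded


def connect_and_upload(host, username, password, account_dir, subfolder, fastq_dir):
    """Open a plain FTP connection (port 21) and upload every *.fastq.gz file."""
    files = sorted(Path(fastq_dir).glob("*.fastq.gz"))
    with FTP(host) as ftp:
        ftp.login(username, password)
        return upload_reads(ftp, account_dir, subfolder, files)
```

Neither function runs on import; call `connect_and_upload` with your own credentials when you are ready. After the upload finishes, the subfolder is selected on the submission page through “Select preload folder”, as described in 4.8.3.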

Please review your submission, make necessary changes on any tab, then click the “Submit” button:

The SRA accession number “SRRxxxxxxxx” will be available within a few minutes on “my submission” page. You may download the “metadata file with SRA accessions” for your record. Meanwhile, you will receive an NCBI email containing these accession numbers, usually within 12 hours. 

Submission of raw sequencing data to NCBI Sequence Read Archive (SRA)

Submission of the assembled data to GenBank. 

Raw sequencing data is required as input for the ViroTrakr database deposit and subsequent data analysis. Assemblies or consensus sequences are produced as part of the data analysis in our workflow; their submission to GenBank is optional but encouraged. The GenBank submission of your assembled sequences will:

•“Make your sequence data available in the International Nucleotide Sequence Database Collaboration (INSDC) for global use;
•Ensure your data contribution is included in NCBI Virus, BLAST, RefSeq and other resources;
•Follow FAIR data-sharing principles.”

Reference: 

Note: this GenBank submission requires and assumes that you already have established a BioProject and BioSample(s) from step 1 and step 2 of this protocol.

Log into your NCBI account at Submissions | GenBank | Submission Portal (nih.gov).
Establish a GenBank submission by clicking “New Submission”:
 

Note: Norovirus assemblies can be submitted directly through the submission portal. All other submission types can use one of the alternate submission tools (such as BankIt or tbl2asn), which follow similar submission steps.

There are nine tabs under each GenBank submission. 

Populate “Submitter” tab: (a submission group is highly recommended for your laboratory, see details in step 2.3)

Populate “Sequencing technology” tab:
•Choose the method used to obtain these sequences;
•Click “Assembled sequences”;
•Fill in the Assembly information (Assembly program and version/date);
•Click “Continue” to next page.

Populate “Sequences” tab: 

•Click “Release immediately following processing” or specify a date to release if preferred.

•Upload a prepared nucleotide FASTA file by clicking “Choose file”.

•Click “Continue” to next page.

Notes:

•Organize your sequence files by type or locus and make one submission for each type.

•Plain text (.txt) nucleotide FASTA files are accepted.

•Use a text editor (for example, Notepad or WordPad) to prepare a file containing the set of nucleotide sequences in FASTA format, and save the file as plain text.

•You may use the strain, isolate, specimen-voucher, or clone IDs as the sequence_IDs in your FASTA file. If you do this, do not include extra information in the sequence_ID, such as the organism name.

For more information on how to format and organize FASTA files, please see FASTA file help.
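A quick local check of the FASTA file can catch common slips before upload. The sketch below takes the sequence_ID to be the first whitespace-delimited token after `>` and reports repeated IDs, records without sequence, and characters outside the IUPAC nucleotide codes. These three checks are assumptions chosen for illustration; they are not the portal's full validation rules.

```python
import re

# IUPAC nucleotide codes (no gap characters), matched case-insensitively.
NUCLEOTIDE = re.compile(r"^[ACGTURYSWKMBDHVN]+$", re.IGNORECASE)


def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            records.append([line[1:], ""])
        elif records:
            records[-1][1] += line
        else:
            raise ValueError("sequence data found before the first '>' line")
    return [(header, sequence) for header, sequence in records]


def check_fasta(text):
    """Return a list of problems found in nucleotide FASTA text."""
    problems, seen = [], set()
    for header, sequence in read_fasta(text):
        tokens = header.split()
        seq_id = tokens[0] if tokens else ""
        if not seq_id:
            problems.append("record with empty sequence ID")
        elif seq_id in seen:
            problems.append(f"{seq_id}: duplicate sequence ID")
        seen.add(seq_id)
        if not sequence:
            problems.append(f"{seq_id}: no sequence")
        elif not NUCLEOTIDE.match(sequence):
            problems.append(f"{seq_id}: non-nucleotide characters")
    return problems
```

Passing the contents of the plain-text file to `check_fasta` returns an empty list when none of these problems are present.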

Populate “Source Info” tab:

•You may find more information on the question “Do your sequences IDs represent one of these?” by clicking “description of these fields”. 

•Click “None of these” if your sequence IDs don’t contain information as described.

•Click “Continue” to next page.

Populate “Source modifiers” tab:

•Click the “Upload a tab-delimited table (template file provided)” button. 

* Notes:

•GenBank source modifier template:
Below is a custom version containing direct linkage to the respective BioSample and BioProject records. Populate the template as guided and save it as a tab-delimited text (.txt) file.

GenBank submission modifers_ViroTrakr.xlsx  

•Click “Choose file” to upload your saved source modifier file.

•Click “Continue” to next page.
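Since the source modifier table and the FASTA file are prepared separately, a mismatch between their sequence IDs is an easy mistake to make. The sketch below compares the two and reports IDs that appear in only one of them. It assumes the saved table is tab-delimited with a header row and that its ID column is named `Sequence_ID`; check the column name against the template you populated and pass a different `id_column` if needed.

```python
import csv


def fasta_ids(fasta_text):
    """Return the first token of every '>' line in FASTA text."""
    ids = []
    for line in fasta_text.splitlines():
        tokens = line[1:].split() if line.startswith(">") else []
        if tokens:
            ids.append(tokens[0])
    return ids


def modifier_ids(table_text, id_column="Sequence_ID"):
    """Return the non-empty IDs in the assumed ID column of the table."""
    rows = csv.DictReader(table_text.splitlines(), delimiter="\t")
    return [row[id_column].strip() for row in rows
            if (row.get(id_column) or "").strip()]


def compare_ids(fasta_text, table_text, id_column="Sequence_ID"):
    """Return (IDs only in the FASTA, IDs only in the table), both sorted."""
    in_fasta = set(fasta_ids(fasta_text))
    in_table = set(modifier_ids(table_text, id_column))
    return sorted(in_fasta - in_table), sorted(in_table - in_fasta)
```

Two empty lists mean every sequence has a row in the table and every row refers to a sequence in the file.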

Technical Assistance

If you are having trouble finalizing your submission, contact the relevant NCBI database for assistance and include your submission ID in the email subject (SUB#######):

BioProject (for any BioProject issues): bioprojecthelp@ncbi.nlm.nih.gov

BioSample (for issues on source metadata): biosamplehelp@ncbi.nlm.nih.gov

SRA (for issues on raw sequencing data): sra@ncbi.nlm.nih.gov

GenBank (for issues on assembled sequences): gb-admin@ncbi.nlm.nih.gov

GenomeTrakr: genomeTrakr@fda.hhs.gov

ViroTrakr: ViroTrakr@fda.hhs.gov

NCBI help desk and account issues: info@ncbi.nlm.nih.gov
