Feb 14, 2024

Populating NCBI template for submissions using BioNumerics

  • 1 US Food and Drug Administration
  • 2 New York State Department of Agriculture & Markets
Open access
Protocol Citation: Ruth Timme, Maria Balkey, Julie Haendiges, Brian Sauders, Tina Lusk Pfefer 2024. Populating NCBI template for submissions using BioNumerics. protocols.io https://dx.doi.org/10.17504/protocols.io.3byl4qn4ovo5/v1
License: This is an open access protocol distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Protocol status: Working
We use this protocol and it's working
Created: February 09, 2024
Last Modified: February 14, 2024
Protocol Integer ID: 94941
Keywords: NCBI submission, BioNumerics, biosample, SRA, metadata, bioproject
Abstract
PURPOSE: to define the standard operating procedure for collecting isolate metadata using BioNumerics for submission of food/environmental isolates to NCBI.

SCOPE: to provide a standardized procedure to collect isolate metadata using BioNumerics for submission of food/environmental isolates to NCBI.

RESPONSIBILITIES: SOP Responsible Officials: Ruth Timme, Maria Balkey

The GenomeTrakr Network Management is responsible for monitoring GenomeTrakr submissions processed through BioNumerics and for ensuring that all GT labs are familiar with the mandatory metadata fields required for submission of GenomeTrakr sequencing records to NCBI.

V3: Added dropdown menus from controlled vocabulary for sequenced by and project name to the metadata template PulseNet_Bionumerics_Isolate_Metadata.
V4: Changes in the metadata template PulseNet_Bionumerics_Isolate_Metadata:
- Added dropdown menus from controlled vocabulary for collected_by and SourceCountryState.
- Added fields: collected by, isolation source.
- Added mapping table of attribute names.
- Removed the requirement to send a BioSample update to NCBI to make changes to sequenced by and project name.


Metadata SampleSheet preparation

Before uploading your sequencing run or linking NCBI sequencing records in the BioNumerics platform, make sure to fill out the metadata spreadsheet form.

Please download the template; the guidelines are included in the file.
Download: PulseNet_Bionumerics_Isolate_Metadata.xlsx (64 KB)

Create the fields NCBI_bioproject, Attribute_package, Organism_name, NCBI_LabID, Collected by, SourceCountryState, Latitude_longitude, ProjectName, SequencedBy, and Isolation source if they do not already exist in the BioNumerics interface.

Once you have filled out the template information, save the template sheet as .csv and import the metadata to BioNumerics.
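Before importing, it can help to confirm that the saved .csv still carries every field listed above. The following is a minimal sketch, assuming the CSV header uses exactly the field names given in this protocol; the function name `missing_fields` and the example row are hypothetical, and the list should be adjusted if your database uses different names.

```python
import csv
import io

# Field names as listed in this protocol. This is an assumption about the
# CSV header; adjust the list to match your own BioNumerics database.
REQUIRED_FIELDS = [
    "NCBI_bioproject", "Attribute_package", "Organism_name", "NCBI_LabID",
    "Collected by", "SourceCountryState", "Latitude_longitude",
    "ProjectName", "SequencedBy", "Isolation source",
]


def missing_fields(csv_text, required=REQUIRED_FIELDS):
    """Return the required field names that are absent from the CSV header row."""
    reader = csv.reader(io.StringIO(csv_text))
    header = [name.strip() for name in next(reader, [])]
    return [name for name in required if name not in header]


# Hypothetical in-memory example with only three of the expected columns.
example = (
    "Key,NCBI_bioproject,Organism_name\n"
    "CFSAN123456,PRJNAxxxxxx,Salmonella enterica\n"
)
for name in missing_fields(example):
    print("Missing column:", name)
```

When reading a file saved from Excel as "CSV UTF-8", opening it with `encoding="utf-8-sig"` removes the byte-order mark that would otherwise be attached to the first column name.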

The metadata fields created in BioNumerics map to metadata fields at NCBI. Table 1 describes each field submitted to NCBI, along with the corresponding field names in the BioNumerics templates.
| Field Name at BioNumerics NCBI Submission Prompt | Field Name at NCBI | Field Name in BioNumerics Submission Metadata Template | Description |
|---|---|---|---|
| BioProject accession | BioProject | NCBI bioproject | The accession number of the BioProject(s) to which the BioSample belongs (PRJNAxxxxxx). Double check that you are submitting to the correct BioProject (the organism name must match the one designated for your BioProject). For species that fall outside of NCBI Pathogen Detection, we recommend establishing a separate multi-species "research" BioProject for publishing data outside of the structured Pathogen Detection surveillance effort. |
| Attribute package | attribute_package | attribute_package | This field provides the pathogen type (or “isolation type”). Allowed values are “Pathogen.cl” (for human clinical pathogens) or “Pathogen.env” (for environmental, food, or animal clinical isolates). The value provided in this field drives validation of other fields and cannot be left blank. |
| Strain name | strain | Key | This is the authoritative ID used for foodborne pathogen genomic epidemiology and within NCBI Pathogen Detection. Although the strain ID can have any format, we suggest that it be unique, concise, and consistent within your laboratory (e.g. CFSAN123456). |
| Serovar | serovar | Serovar | The organism serovar/serotype name should include the most descriptive information you have at time of submission, adhering to proper nomenclature in the NCBI taxonomy database: https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi. Check spelling carefully! |
| Isolate name alias | isolate name alias | isolate_name_alias | Other IDs associated with this isolate. Separate with ';' if more than one. |
| Project name | project name | ProjectName | Name of the project within which the sequencing was organized. |
| Collected by | collected by | collected by | Full name of the laboratory or agency that collected the sample or has taken over curation of the physical isolate. The name should be written out in full (with minor exceptions) and be consistent across multiple submissions. Example: Washington State Department of Health. |
| Collection / Isolate date | collection date | IsolateDate | Date on which the sample was collected. Populate using the ISO 8601 standard: “YYYY-mm-dd”, “YYYY-mm” or “YYYY” (e.g., 1990-10-30, 1990-10, or 1990). Including the month or month/day of collection is extremely valuable for assessing seasonality in the database. |
| Geographical origin | geographic location | SourceCountryState | Populate the geographic origin of the food product. Include the country name if imported, or the "Country: state/territory/province" if domestic. Include multiple locations if necessary, delimited by semicolons. |
| Geographical coordinates | latitude and longitude | lat_long | The geographical coordinates of the location where the sample was collected. Specify as degrees latitude and longitude in the format "d[d.dddd] N\|S d[dd.dddd] W\|E", e.g., 38.98 N 77.11 W. If information is unavailable for any mandatory field, please enter 'not collected', 'not applicable' or 'missing' as appropriate. |
| Isolation source | isolation source | isolation source | Free text, short description of the sample source. Avoid generic terms such as patient, sample, food, surface, clinical, product, source, or environment. Example: bagged romaine lettuce. |
| Host | host | host | For human, animal, and plant hosts, include the full taxonomic name of the host when available, e.g. "Homo sapiens" or "Bos taurus". Animal livestock terms are also acceptable entries, e.g. porcine, bovine, equine, etc. |
| Host disease | host disease | host_disease | Name of the relevant disease, e.g. Salmonella gastroenteritis. Choose an ontological term from https://bioportal.bioontology.org/ontologies/DOID or https://www.ncbi.nlm.nih.gov/mesh. The attribute is mandatory for Pathogen.cl isolates (human clinical isolates); enter "missing" if unknown. Leave blank if not relevant. |
| Sequenced by | sequenced by | SequencedBy | The name of the agency that generated the sequence, e.g., Centers for Disease Control and Prevention. |
| Source name/ type | source type | SourceType | Controlled vocabulary describing the isolation_source. Choose the best fit term: Human, Animal, Food, Environmental, Other. |
| Organism name | Organism | organism | The organism name should include the most descriptive information you have at time of submission, adhering to proper nomenclature in the NCBI taxonomy database: https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi. Check spelling carefully! Levels of valid organism names are as follows. Genus species: Salmonella enterica; Listeria monocytogenes. Genus species and subspecies: Salmonella enterica subsp. enterica. Determined serotype or serovar (traditional or WGS-based): Escherichia coli O104:H7; Salmonella enterica subsp. enterica serovar Agona; Salmonella enterica subsp. diarizonae serovar 16:z10:e,n,x,z15; Listeria monocytogenes serotype 1/2a. If NCBI doesn’t have the desired organism name, enter the name determined by your laboratory. After submission, a “taxonomy consult” will take place to evaluate the new name. Sometimes the organism name is changed to a canonical serovar name and the submission proceeds. It is also possible that the serovar is a novel one not currently in the NCBI database, in which case the Taxonomy team will work with the submitter to get the new name added to the database. |

Table 1: Metadata attributes for GenomeTrakr
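The value formats described in Table 1 can be checked before import. The sketch below is illustrative only: the function names are hypothetical, the latitude/longitude pattern is one reading of the "d[d.dddd] N|S d[dd.dddd] W|E" format, and NCBI's own validation remains the authority.

```python
import re
from datetime import datetime

# Placeholders Table 1 allows when a mandatory value is unavailable.
MISSING_TERMS = {"not collected", "not applicable", "missing"}
ALLOWED_PACKAGES = {"Pathogen.cl", "Pathogen.env"}

DATE_SHAPE = re.compile(r"\d{4}(-\d{2}(-\d{2})?)?")
DATE_FORMATS = {4: "%Y", 7: "%Y-%m", 10: "%Y-%m-%d"}
LAT_LON = re.compile(r"(\d{1,2}(?:\.\d+)?) [NS] (\d{1,3}(?:\.\d+)?) [WE]")


def valid_collection_date(value):
    """Accept YYYY, YYYY-mm or YYYY-mm-dd with a real calendar date."""
    if value in MISSING_TERMS:
        return True
    if not DATE_SHAPE.fullmatch(value):
        return False
    try:
        datetime.strptime(value, DATE_FORMATS[len(value)])
    except ValueError:
        return False
    return True


def valid_lat_lon(value):
    """Accept values such as '38.98 N 77.11 W' within geographic bounds."""
    if value in MISSING_TERMS:
        return True
    match = LAT_LON.fullmatch(value)
    if match is None:
        return False
    return float(match.group(1)) <= 90 and float(match.group(2)) <= 180


def valid_attribute_package(value):
    """The attribute package cannot be blank and is case-sensitive here."""
    return value in ALLOWED_PACKAGES


# Hypothetical spot checks using the examples from Table 1.
for date in ("1990-10-30", "1990-10", "1990", "10/30/1990"):
    print(date, valid_collection_date(date))
print("38.98 N 77.11 W", valid_lat_lon("38.98 N 77.11 W"))
```

Keeping each rule in its own small function makes it straightforward to apply them column by column to the rows of the metadata sheet.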

NCBI Submission Settings (Manage submission template)

Create the NCBI metadata template in BioNumerics following PulseNet instructions, making sure fields are populated according to GT requirements, which are described in the following steps.
BioProject and Organization: GenomeTrakr labs that submit independently become the owners of their data and are responsible for managing an individual BioProject for each sequenced organism. The term 'field content' denotes that a template value (e.g. BioProject accession) maps to a field in BioNumerics (e.g. NCBI_bioproject).


Fig 1. NCBI Submission Template: BioProject and Organization




Laboratories will be submitting to specific bioprojects for lab/organisms. Find the organism/lab specific bioproject under each of the GenomeTrakr umbrella bioprojects included at https://www.ncbi.nlm.nih.gov/bioproject/593772

Make sure to submit to your lab bioproject. Please don't submit to umbrella bioprojects.
BioSample: Metadata associated with the isolate might require the creation of new fields in BioNumerics. The term 'field content' denotes that a template value (e.g. Organism name) maps to a field in BioNumerics (e.g. OrganismName). Template values might instead map to default values (e.g. Pathogen: environmental/food/other; version 1.0). Make sure to include the metadata associated with the isolates in the mandatory fields, such as: Submitter Provided Unique ID, BioSample accession (output), Organism name, Title, Attribute package, Strain name, Isolate name alias and Project name. Isolate name alias is a mandatory field for GenomeTrakr submissions. Provide serovar when available.

Fig 2. NCBI Submission Template: BioSample



BioSample: Make sure to include the metadata associated with the isolates in the mandatory fields, such as: Collected by, Collection / Isolate date, Collection / Isolate date format, Title, Geographical origin, Isolate source, Sequenced by and Source name/type. Isolate name alias is a mandatory field for GenomeTrakr submissions. Provide Geographical coordinates when available. For human, animal, and plant hosts, include the full taxonomic name of the host when available, e.g. "Homo sapiens" or "Bos taurus". Animal livestock terms are also acceptable entries, e.g. porcine, bovine, equine, etc.
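The mandatory-field requirements above can be sketched as a per-row check on the metadata sheet. This is a rough illustration, assuming each row is a dictionary keyed by the template column names from Table 1; the `MANDATORY_COLUMNS` list and the `row_problems` helper are hypothetical and should be aligned with the field names in your own database.

```python
# Template column names taken from Table 1 (an assumption about your sheet).
MANDATORY_COLUMNS = [
    "organism", "attribute_package", "Key", "isolate_name_alias",
    "ProjectName", "collected by", "IsolateDate", "SourceCountryState",
    "isolation source", "SequencedBy", "SourceType",
]


def is_blank(row, column):
    """True when the column is absent, None, or only whitespace."""
    return not (row.get(column) or "").strip()


def row_problems(row):
    """List the mandatory-field problems found in one metadata row."""
    problems = [f"{col} is empty" for col in MANDATORY_COLUMNS if is_blank(row, col)]
    # Table 1: host_disease is mandatory for human clinical isolates.
    if row.get("attribute_package") == "Pathogen.cl" and is_blank(row, "host_disease"):
        problems.append("host_disease is required for Pathogen.cl; enter 'missing' if unknown")
    return problems


# Hypothetical row with a blank isolate name alias.
example_row = {col: "example" for col in MANDATORY_COLUMNS}
example_row["attribute_package"] = "Pathogen.env"
example_row["isolate_name_alias"] = ""
for problem in row_problems(example_row):
    print(problem)
```

Fields that the submission template generates itself, such as Title or the BioSample accession output, are left out of the check because they are not typed into the sheet.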


Fig 3. NCBI Submission Template: BioSample

NCBI submission settings – SRA Experiment and Run

Populate fields for SRA Experiment and Run according to PulseNet instructions.

Fig 4. NCBI Submission Template for BioNumerics, SRA Experiment and Run: Make sure to map collection attributes to the corresponding fields.



NCBI submission settings – Submission Template

Save submission template according to PulseNet Instructions as -GenomeTrakr-Template-.
Import data
Import the GenomeTrakr Metadata form for BioNumerics according to PulseNet Instructions.
In the import rules, each source field should match its destination field.
In the importing links section, choose the -key- for linking records to database entries.
Proceed with sequencing data import according to PulseNet Instructions.
Submit data to NCBI according to PulseNet Instructions. If NCBI accessions are not available in BioNumerics within 1 business day, please contact NCBI and PulseNet to troubleshoot the submission.
Contact GenomeTrakr by email at genometrakr@fda.hhs.gov if submission issues remain unresolved for more than 3 days. GenomeTrakr can support urgent submissions if needed.
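Because the import step links records to database entries through the key, duplicated or blank keys in the metadata sheet are worth catching before import. A minimal sketch, assuming rows are dictionaries and the key column is named "Key" as in Table 1; both helper names are hypothetical.

```python
from collections import Counter


def duplicate_keys(rows, key_field="Key"):
    """Return keys that appear in more than one row, sorted alphabetically."""
    counts = Counter((row.get(key_field) or "").strip() for row in rows)
    return sorted(key for key, n in counts.items() if key and n > 1)


def blank_key_count(rows, key_field="Key"):
    """Count rows whose key is absent or only whitespace."""
    return sum(1 for row in rows if not (row.get(key_field) or "").strip())


# Hypothetical rows: one repeated key and one blank key.
rows = [
    {"Key": "CFSAN123456"},
    {"Key": "CFSAN123457"},
    {"Key": "CFSAN123456"},
    {"Key": ""},
]
print("Duplicated keys:", duplicate_keys(rows))
print("Rows without a key:", blank_key_count(rows))
```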