Dec 12, 2023

ORFanID Web-based Search Engine to Identify Orphan and Taxonomically Restricted Genes V.2

Peer-reviewed method
  • 1Biola University;
  • 2Chesalon USA, Inc.;
  • 3Emanuel University of Oradea
Open access
Protocol Citation: Thushara Galbadage, Vinodh Gunasekera, Emanuel Tundrea, Richard S. Gunasekera 2023. ORFanID Web-based Search Engine to Identify Orphan and Taxonomically Restricted Genes. protocols.io https://dx.doi.org/10.17504/protocols.io.14egn37jql5d/v2
Version created by Thushara Galbadage
Manuscript citation:
Gunasekera RS, Raja KKB, Hewapathirana S, Tundrea E, Gunasekera V, et al. (2023) ORFanID: A web-based search engine for the discovery and identification of orphan and taxonomically restricted genes. PLOS ONE 18(10): e0291260. https://doi.org/10.1371/journal.pone.0291260
License: This is an open access protocol distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Protocol status: Working
We use this protocol and it's working
Created: December 11, 2023
Last Modified: December 12, 2023
Protocol Integer ID: 92138
Keywords: ORFan, genes, ORFanID, NCBI, genomes
Abstract
ORFanID is a web-based software engine designed to identify ORFan genes in genomes of interest from a given list of DNA or protein sequences in the NCBI databases. The taxonomy level selected by the user defines the scope of the search for orphan genes. For each candidate gene, the software searches the NCBI databases for detectable homologous sequences. Based on these findings, the ORFanID engine identifies and depicts orphan genes. Results can be viewed and analyzed graphically for scientific research and inquiry. As the enigma of orphan genes unravels, we believe ORFanID will provide critical insights into the origin, function, and prevalence of ORFan genes in genomes.

The last step contains a supplemental video with extra context and tips, as part of the protocols.io Spotlight series, featuring conversations with protocol authors.
Materials
ORFan Genes
Orphan genes, also known as taxonomically restricted genes, lack ancestral ties in other species at specific taxonomy levels. These genes present DNA and/or protein sequences lacking homology with those archived in prominent DNA databases like GenBank. Despite conventional beliefs attributing the emergence of new genes to processes like gene duplication or recombination, the widespread presence of orphan genes in sequenced genomes remains a mystery. This ubiquity represents a challenging question in the realm of life sciences.

Biological Implications
Traditionally, genes have been understood to dictate functions through proteins. Intriguingly, certain organisms, including Hydra, various mollusks, and salamanders, express unique proteins stemming from orphan genes. For instance:
• Hydra's anatomy is influenced by proteins produced by orphan genes.
• The mantles of specific mollusks owe their unique features to proteins resulting from orphan genes.
• The regenerative capability of salamander limbs can be attributed to proteins encoded by orphan genes.

Discovery through ORFanID
ORFanID stands as a powerful tool to unearth the origin, function, and broader implications of orphan genes. Capable of recognizing genes unique to diverse taxonomical levels such as genus, family, and species, ORFanID ensures precision by allowing adjustments in classification parameters. Thus, while some genes might be recognized as taxonomy-restricted based on set criteria, they might not strictly qualify as ORFans. This precision aids in pinpointing the sequence and functionality of de novo genes across various taxonomical spectra.
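The idea of a gene being restricted to a given taxonomic level can be illustrated with a minimal Python sketch. This is not ORFanID's actual implementation: the function name `restriction_level`, the `RANKS` list, and the dictionary-based lineage format are all assumptions made for illustration. The sketch takes the lineage of the query organism and the lineages of the organisms in which homologs were detected, and returns the narrowest rank at which every hit still falls inside the query's own taxon.

```python
from typing import Dict, List, Optional

# Ranks ordered from most specific to most general (illustrative subset).
RANKS = ["species", "genus", "family", "order", "class", "phylum"]


def restriction_level(
    query_lineage: Dict[str, str],
    hit_lineages: List[Dict[str, str]],
) -> Optional[str]:
    """Return the narrowest rank whose taxon contains every homolog hit.

    Returns None when the hits extend beyond every rank listed in RANKS,
    meaning the gene is not restricted at any of those levels.
    """
    for rank in RANKS:
        taxon = query_lineage.get(rank)
        if taxon is None:
            continue  # rank not recorded for the query organism
        # The gene is restricted at this rank if no hit lies outside the taxon.
        if all(hit.get(rank) == taxon for hit in hit_lineages):
            return rank
    return None


# Hypothetical input: homologs found in human and chimpanzee only.
human = {"species": "Homo sapiens", "genus": "Homo", "family": "Hominidae"}
chimp = {"species": "Pan troglodytes", "genus": "Pan", "family": "Hominidae"}

print(restriction_level(human, [human, chimp]))  # family
```

Under this simplified model, a gene whose only hits come from the query species itself is reported as species-restricted, which corresponds to the strictest sense of an ORFan.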

Accessing ORFanID
Navigate to ORFanID's webpage at http://www.orfangenes.com/
Click "Get Started" on the home page to access the search system.



Sample Searches
If you don't have a specific gene sequence or an accession number, use the sample options provided. Four sample icons are located at the bottom left of the search screen. Clicking on any of these will pre-fill the input field with either a gene sequence or an accession number. Click "Search" to proceed.
Input Methods
Choose whether to search by gene sequence or by accession number, using the toggle switch to specify your preference (for instance, when searching the Homo sapiens sample).
ORFanID offers three search methods:
a. Uploading a FASTA file.
b. Entering one or more accession numbers (e.g., the E. coli sample provides three accession numbers).
c. Directly submitting gene sequences.
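For the FASTA upload option, sequences held in a script or spreadsheet first need to be written out in FASTA format. The following Python sketch shows one way to do that. The function name `to_fasta`, the record names, the sequences, and the 60-character line width are illustrative assumptions; the protocol does not state any particular requirements for the uploaded file beyond it being FASTA.

```python
from typing import Dict


def to_fasta(records: Dict[str, str], width: int = 60) -> str:
    """Format {header: sequence} pairs as FASTA text.

    Whitespace inside each sequence is removed, letters are upper-cased,
    and sequence lines are wrapped at `width` characters.
    """
    lines = []
    for header, seq in records.items():
        cleaned = "".join(seq.split()).upper()
        if not cleaned:
            raise ValueError(f"record {header!r} has an empty sequence")
        lines.append(f">{header}")
        lines.extend(
            cleaned[i:i + width] for i in range(0, len(cleaned), width)
        )
    return "\n".join(lines) + "\n"


# Hypothetical records; replace with your own sequences.
example = {"candidate_1": "atg cgt", "candidate_2": "MKV"}
fasta_text = to_fasta(example, width=4)
print(fasta_text)

# To save the text for upload:
# with open("queries.fasta", "w") as handle:
#     handle.write(fasta_text)
```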
Search Guidelines
When searching multiple gene sequences, separate each with a new line or space.
You'll need to specify the organism for your search. If the desired organism isn't listed, refer to the NCBI taxonomy database to obtain its full scientific name and taxonomy ID. Enter this information, ensuring the taxonomy ID is in parentheses.
Choose between searching by protein or gene using the provided radio selection boxes.
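The two formatting rules above (the taxonomy ID in parentheses after the scientific name, and multiple queries separated by new lines) could be prepared with small helpers like the ones below. This is only a sketch: the function names `organism_field` and `join_queries` are hypothetical, the accession strings are placeholders, and the exact spacing ORFanID expects between the name and the parenthesized ID is an assumption.

```python
from typing import Iterable


def organism_field(scientific_name: str, taxonomy_id: int) -> str:
    """Build an organism entry such as 'Homo sapiens (9606)'."""
    name = " ".join(scientific_name.split())  # collapse stray whitespace
    if not name:
        raise ValueError("scientific name must not be empty")
    if taxonomy_id <= 0:
        raise ValueError("taxonomy ID must be a positive integer")
    return f"{name} ({taxonomy_id})"


def join_queries(items: Iterable[str]) -> str:
    """Join accession numbers or sequences, one per line, skipping blanks."""
    return "\n".join(item.strip() for item in items if item.strip())


# 9606 is the NCBI taxonomy ID for Homo sapiens.
print(organism_field("Homo  sapiens", 9606))  # Homo sapiens (9606)

# Placeholder accession strings, not real identifiers.
print(join_queries(["ACCESSION_1 ", "", "ACCESSION_2"]))
```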
Additional Options
There's an optional Nickname field on the upper left. Use it to label and identify your search results for future reference.
Adjust the three advanced parameters if needed. By default, the maximum e-value for the BLAST algorithm is set to three, and the maximum number of target sequences is set to 550. Hover over a parameter to see a pop-up with detailed information.
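For readers more familiar with command-line BLAST, the maximum e-value and maximum target sequences correspond to the `-evalue` and `-max_target_seqs` options of the NCBI BLAST+ programs. The Python sketch below only assembles such a command as a list of arguments and does not run it. How ORFanID invokes BLAST internally is not described in this protocol, so the function name `blast_command`, the `nr` database, the tabular output format, and the e-value in the usage line are all assumptions chosen for illustration.

```python
from typing import List


def blast_command(
    query_path: str,
    evalue: float,
    max_target_seqs: int,
    program: str = "blastp",
    db: str = "nr",
) -> List[str]:
    """Assemble a BLAST+ command line with the two thresholds applied."""
    if evalue <= 0:
        raise ValueError("e-value threshold must be positive")
    if max_target_seqs < 1:
        raise ValueError("max_target_seqs must be at least 1")
    return [
        program,
        "-query", query_path,
        "-db", db,
        "-evalue", f"{evalue:g}",                 # discard weaker hits
        "-max_target_seqs", str(max_target_seqs),  # cap hits per query
        "-outfmt", "6",                            # tabular output
    ]


# Illustrative values only; the e-value here is not ORFanID's setting.
command = blast_command("query.fasta", evalue=0.001, max_target_seqs=550)
print(" ".join(command))
```

A stricter (smaller) e-value threshold admits fewer distant homologs, so more genes tend to be reported as orphans; a looser threshold has the opposite effect.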
Submission and Results
After confirming your search criteria, the organism, and any other fields, click "Submit". A pop-up will prompt you to review your search. If satisfied, proceed to submit. Depending on the complexity of your query, results may take between 3 and 15 minutes.
You'll be redirected to the "Results" tab. A light green box indicates that processing is ongoing. Once complete, it turns dark green. To view your results, click the graph icon on the right of the results table.



Spotlight video
Protocol references
Altschul, S.F., Gish, W., Miller, W., Myers, E.W. & Lipman, D.J. (1990) "Basic local alignment search tool." J. Mol. Biol. 215:403-410
Ekstrom, A. & Yin, Y. (2016) "ORFanFinder: automated identification of taxonomically restricted orphan genes." Bioinformatics; 32 (13): 2053-2055. doi: 10.1093/bioinformatics/btw122
Clamp, M., Fry, B., Kamal, M., Xie, X., Cuff, J., Lin, M.F., Kellis, M., Lindblad-Toh, K. & Lander, E.S. (2007) "Distinguishing protein-coding and noncoding genes in the human genome." PNAS 104 (49): 19428-19433
Gunasekera, R.S., Hewapathirana, S., Gunasekera, V., Dias, S. & Nelson, P. (2018) "A Web-Based Computational Algorithm, ORFanID, for Discovering and Cataloging Orphan and Taxonomically Restricted Genes in Various Species." International Society for Computational Biology, Chicago, USA, B-847