Protocol Citation: Mark Stanworth, Shu-Dong Zhang 2024. Elucidating the roles of SOD3 correlated genes and reactive oxygen species in rare human diseases using a bioinformatic-ontology approach. protocols.io https://dx.doi.org/10.17504/protocols.io.rm7vzxp14gx1/v1
License: This is an open access protocol distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Protocol status: Working
We use this protocol and it's working
Created: February 07, 2024
Last Modified: February 12, 2024
Protocol Integer ID: 94798
Disclaimer
DISCLAIMER – FOR INFORMATIONAL PURPOSES ONLY; USE AT YOUR OWN RISK
The protocol content here is for informational purposes only and does not constitute legal, medical, clinical, or safety advice, or otherwise; content added to protocols.io is not peer reviewed and may not have undergone a formal approval of any kind. Information presented in this protocol should not substitute for independent professional judgment, advice, diagnosis, or treatment. Any action you take or refrain from taking using or relying upon the information presented here is strictly at your own risk. You agree that neither the Company nor any of the authors, contributors, administrators, or anyone else associated with protocols.io, can be held responsible for your use of the information contained in or linked to this protocol or any of our Sites/Apps and Services.
Abstract
This is a gene discovery protocol utilising a single seed gene to create correlation lists. These lists are used to identify novel genes in rare diseases.
Create another column calculating the chi-squared p-value for the Jarque-Bera normality test
Note
=CHISQ.DIST.RT(abs(JBTS),2)
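The spreadsheet formula above is the right-tail probability of a chi-squared distribution with 2 degrees of freedom, which has the closed form exp(-x/2). The following is a minimal Python sketch of the same calculation. It assumes the Jarque-Bera statistic (JBTS) is built from population skewness and kurtosis; the protocol does not state how JBTS was computed in the spreadsheet, and the function names are illustrative only.

```python
import math

def jarque_bera_statistic(values):
    """Jarque-Bera statistic using population (biased) skewness and kurtosis.

    ASSUMPTION: the protocol's spreadsheet may use a different estimator.
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    m4 = sum((v - mean) ** 4 for v in values) / n  # fourth central moment
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    return n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)

def jarque_bera_p_value(jb_statistic):
    """Same quantity as =CHISQ.DIST.RT(ABS(JBTS), 2).

    For 2 degrees of freedom the chi-squared right tail is exp(-x / 2).
    """
    return math.exp(-abs(jb_statistic) / 2.0)
```

A small p-value indicates departure from normality, so the value returned here can be compared directly with the chosen significance level.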
Expression exclusions
Apply statistical exclusions:
1. Exclude all expressions with SOD3 correlation p > 2.3 × 10⁻⁵ (Bonferroni-corrected α = 0.05)
2. Exclude all expressions with SOD3 correlation |ρ| ≤ 0.34
3. Exclude all expressions failing the Jarque-Bera normality test
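The three statistical exclusions can be sketched as a single filter. This is an illustration only: the record layout, the probe identifiers, and the 0.05 cut-off for the Jarque-Bera p-value are assumptions, since the protocol does not state the normality threshold.

```python
ALPHA_CORRELATION = 2.3e-5  # Bonferroni-corrected threshold quoted in the protocol
MIN_ABS_RHO = 0.34          # expressions with |rho| <= 0.34 are excluded
ALPHA_NORMALITY = 0.05      # ASSUMPTION: cut-off not stated in the protocol

def passes_statistical_exclusions(probe):
    """Return True if a probe record survives all three exclusions."""
    return (
        probe["p_value"] <= ALPHA_CORRELATION        # exclusion 1
        and abs(probe["rho"]) > MIN_ABS_RHO          # exclusion 2
        and probe["jb_p_value"] >= ALPHA_NORMALITY   # exclusion 3
    )

# Hypothetical records; identifiers and values are made up for illustration.
probes = [
    {"probe_id": "probe_a", "rho": 0.45, "p_value": 1e-8, "jb_p_value": 0.40},
    {"probe_id": "probe_b", "rho": 0.30, "p_value": 1e-6, "jb_p_value": 0.40},
    {"probe_id": "probe_c", "rho": -0.50, "p_value": 1e-9, "jb_p_value": 0.01},
    {"probe_id": "probe_d", "rho": 0.36, "p_value": 1e-3, "jb_p_value": 0.40},
]

kept = [p["probe_id"] for p in probes if passes_statistical_exclusions(p)]
```

Here only `probe_a` is retained: `probe_b` fails the correlation magnitude, `probe_c` fails the normality test, and `probe_d` fails the corrected p-value.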
Gene exclusions
Use GSE2109 supplementary file GPL570-9999 to assign gene symbols to expression probes
Update gene names using the HUGO Gene Nomenclature Committee website (https://www.genenames.org/) and note the gene class (e.g. protein coding, pseudogene, etc.)
For genes represented by duplicate probes, identify the optimal probe by loading the 'jetset' package in R
Note
>library(jetset)
# Best fit from duplicates
>jscores('hgu133plus2', symbol = 'CAND1')
>jmap('hgu133plus2', symbol = "CAND1")
>jscores('hgu133plus2', symbol = 'FBXO28')
>jmap('hgu133plus2', symbol = "FBXO28")
>jscores('hgu133plus2', symbol = 'HSPB6')
>jmap('hgu133plus2', symbol = "HSPB6")
>jscores('hgu133plus2', symbol = 'MREG')
>jmap('hgu133plus2', symbol = "MREG")
>jscores('hgu133plus2', symbol = 'MTF2')
>jmap('hgu133plus2', symbol = "MTF2")
>jscores('hgu133plus2', symbol = 'MYH11')
>jmap('hgu133plus2', symbol = "MYH11")
>jscores('hgu133plus2', symbol = 'PLN')
>jmap('hgu133plus2', symbol = "PLN")
>jscores('hgu133plus2', symbol = 'QSER1')
>jmap('hgu133plus2', symbol = "QSER1")
Apply gene exclusions:
1. Exclude expressions with no GPL570-9999 identified gene symbol
2. Exclude suboptimal duplicate gene probes identified by 'Jetset'
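The two gene exclusions can be sketched as follows, assuming the annotation has already been reduced to a probe-to-symbol mapping and the best probe per duplicated gene has been copied from the jetset output into a dictionary. Both mappings and all probe identifiers below are hypothetical placeholders.

```python
def apply_gene_exclusions(probe_to_symbol, best_probe_for_symbol):
    """Keep probes that have a gene symbol and are the preferred probe for it.

    probe_to_symbol: probe id -> gene symbol (None or "" when unannotated).
    best_probe_for_symbol: gene symbol -> preferred probe id, for duplicated
        genes only; genes missing from this mapping keep all their probes.
    """
    kept = {}
    for probe_id, symbol in probe_to_symbol.items():
        if not symbol:
            continue  # exclusion 1: no identified gene symbol
        best = best_probe_for_symbol.get(symbol, probe_id)
        if probe_id == best:
            kept[probe_id] = symbol  # survives exclusion 2
    return kept

# Hypothetical example data (probe ids are placeholders, not real array ids).
probe_to_symbol = {
    "probe_1": "CAND1",
    "probe_2": "CAND1",
    "probe_3": "PLN",
    "probe_4": "",
}
best_probe_for_symbol = {"CAND1": "probe_2"}

retained = apply_gene_exclusions(probe_to_symbol, best_probe_for_symbol)
```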
>plot(robustness, hang=-1, cex=0.5, main="GSE2109 Correlation (|ρ|≥0.34) Cluster with p-values (%)")
>pvrect(result,alpha=0.95)
Incrementally increase the correlation threshold by 0.01 from |ρ| > 0.34 to |ρ| > 0.41
List 1: Genes with |ρ| > 0.41
List 2: Genes with |ρ| > 0.40
List 3: Genes with |ρ| > 0.39
List 4: Genes with |ρ| > 0.38
List 5: Genes with |ρ| > 0.37
List 6: Genes with |ρ| > 0.36
List 7: Genes with |ρ| > 0.35
List 8: Genes with |ρ| > 0.34
For all lists, separate genes by correlation direction and denote the daughter lists with superscript '+' for positive and '-' for negative correlations
List 1+: Genes with ρ > 0.41
List 2+: Genes with ρ > 0.40
List 3+: Genes with ρ > 0.39
List 4+: Genes with ρ > 0.38
List 5+: Genes with ρ > 0.37
List 6+: Genes with ρ > 0.36
List 7+: Genes with ρ > 0.35
List 8+: Genes with ρ > 0.34
List 1-: Genes with ρ < -0.41
List 2-: Genes with ρ < -0.40
List 3-: Genes with ρ < -0.39
List 4-: Genes with ρ < -0.38
List 5-: Genes with ρ < -0.37
List 6-: Genes with ρ < -0.36
List 7-: Genes with ρ < -0.35
List 8-: Genes with ρ < -0.34
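The signed lists above are nested, with each list containing every gene from the lists at higher thresholds. The following is a minimal sketch of how they could be generated from a gene-to-ρ mapping. The gene names other than SOD3 and all correlation values are hypothetical.

```python
# Thresholds for Lists 1 to 8, in the order used by the protocol.
THRESHOLDS = [0.41, 0.40, 0.39, 0.38, 0.37, 0.36, 0.35, 0.34]

def build_signed_lists(gene_rho):
    """Return {'1+': [...], '1-': [...], ..., '8+': [...], '8-': [...]}."""
    lists = {}
    for number, threshold in enumerate(THRESHOLDS, start=1):
        lists[f"{number}+"] = sorted(
            gene for gene, rho in gene_rho.items() if rho > threshold
        )
        lists[f"{number}-"] = sorted(
            gene for gene, rho in gene_rho.items() if rho < -threshold
        )
    return lists

# Hypothetical correlations with the seed gene (SOD3 correlates with itself at 1).
gene_rho = {"SOD3": 1.0, "GENE_A": 0.405, "GENE_B": -0.36, "GENE_C": 0.345}
signed_lists = build_signed_lists(gene_rho)
```

Because the comparisons are strict, a gene with ρ exactly equal to a threshold is not included at that threshold, matching the inequalities in the lists above.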
Significant disorders
Enter all positive and negative correlation gene lists into Enrichr (maayanlab.cloud), and for each list:
1. In the Diseases / Drugs tab select Orphanet Augmented 2021
2. Copy the Name and Adjusted P-value of every significant disorder into a spreadsheet. For positive lists, the gene overlap must contain SOD3 and at least two other genes; for negative lists, the overlap need not contain SOD3 but must contain at least three list genes (hover over the disorder to check)
3. Identify minimum viable positive and negative lists (i.e. the smallest signed gene lists which have a significant overlap with disorder(s)).
4. Delete from the spreadsheet any disorders from parent lists not in the minimum viable positive and negative lists
5. For each disorder, identify the gene list with the greatest overlap and associate the disorder with that list, noting the overlap genes in the spreadsheet
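The overlap rule in step 2 can be expressed as a small check. This is a sketch under the reading that negative lists need at least three overlapping genes; the function name and the example gene names other than SOD3 are hypothetical.

```python
SEED_GENE = "SOD3"

def overlap_qualifies(overlap_genes, direction):
    """Check a disorder's overlap genes against the protocol's rule.

    direction '+': overlap must include SOD3 plus at least two other genes.
    direction '-': overlap must include at least three list genes.
    """
    genes = set(overlap_genes)
    if direction == "+":
        return SEED_GENE in genes and len(genes - {SEED_GENE}) >= 2
    if direction == "-":
        return len(genes) >= 3
    raise ValueError("direction must be '+' or '-'")
```

For example, `overlap_qualifies(["SOD3", "GENE_A"], "+")` is false because only one gene accompanies SOD3, whereas the same rule applied to three non-seed genes from a negative list is satisfied.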
Enter each disorder-associated gene list individually into Enrichr (maayanlab.cloud), adding the causal gene(s) for the disorder under consideration.
1. In the Pathways tab, use the Elsevier Pathway Collection table. Note the significant pathways, and record the adjusted p-value of any pathway containing both a causal gene and a non-overlapping list gene
2. In the Ontologies tab, use the Biological Process, Molecular Function, and Human Phenotype tables. Note the significant entries, and record the adjusted p-value of any entry containing both a causal gene and a non-overlapping list gene
Associate the qualifying entries with the disorders associated with each list
Note the non-overlapping gene(s) (potentially novel genes) linked to the disorders via the causal gene/list ontologies
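The qualification rule for pathway and ontology entries can be sketched as below. The record layout, the 0.05 significance level for the adjusted p-value, and all gene and term names are assumptions for illustration; they are not taken from Enrichr output.

```python
def qualifying_entries(entries, causal_genes, list_genes, overlap_genes, alpha=0.05):
    """Return entries that link a causal gene to a non-overlapping list gene.

    entries: iterable of dicts with 'term', 'adjusted_p' and 'genes' keys.
    alpha: ASSUMED significance level for the adjusted p-value.
    """
    causal = set(causal_genes)
    # Candidate (potentially novel) genes: in the list but not in the disorder overlap.
    candidates = set(list_genes) - set(overlap_genes) - causal
    results = []
    for entry in entries:
        genes = set(entry["genes"])
        novel = genes & candidates
        if entry["adjusted_p"] < alpha and genes & causal and novel:
            results.append((entry["term"], entry["adjusted_p"], sorted(novel)))
    return results

# Hypothetical example data.
entries = [
    {"term": "term_1", "adjusted_p": 0.01, "genes": ["CAUSAL_1", "GENE_X"]},
    {"term": "term_2", "adjusted_p": 0.01, "genes": ["CAUSAL_1", "SOD3"]},
    {"term": "term_3", "adjusted_p": 0.20, "genes": ["CAUSAL_1", "GENE_X"]},
    {"term": "term_4", "adjusted_p": 0.01, "genes": ["GENE_X", "GENE_Y"]},
]
hits = qualifying_entries(
    entries,
    causal_genes=["CAUSAL_1"],
    list_genes=["SOD3", "GENE_A", "GENE_B", "GENE_X", "GENE_Y"],
    overlap_genes=["SOD3", "GENE_A", "GENE_B"],
)
```

Only `term_1` qualifies in this example: `term_2` pairs the causal gene with an overlap gene, `term_3` is not significant, and `term_4` contains no causal gene.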
Literature associations
Search the literature for previous work that may suggest mechanisms or pathways linking the potentially novel genes to disorder presentation. Repeat the search for superoxide dismutase, superoxide, and hydrogen peroxide. Include gene aliases and previous symbols as far as the time allocated to the research allows.