Elucidating the roles of SOD3 correlated genes and reactive oxygen species in rare human diseases using a bioinformatic-ontology approach

markstanworth; Shu-Dong Zhang

Feb 12, 2024

Elucidating the roles of SOD3 correlated genes and reactive oxygen species in rare human diseases using a bioinformatic-ontology approach

DOI

dx.doi.org/10.17504/protocols.io.rm7vzxp14gx1/v1

Mark Stanworth¹,
Shu-Dong Zhang¹

¹Ulster University

Mark Stanworth

DOI: dx.doi.org/10.17504/protocols.io.rm7vzxp14gx1/v1

Protocol Citation: Mark Stanworth, Shu-Dong Zhang 2024. Elucidating the roles of SOD3 correlated genes and reactive oxygen species in rare human diseases using a bioinformatic-ontology approach. protocols.io https://dx.doi.org/10.17504/protocols.io.rm7vzxp14gx1/v1

License: This is an open access protocol distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited

Protocol status: Working

We use this protocol and it's working

Created: February 07, 2024

Last Modified: February 12, 2024

Protocol Integer ID: 94798

Disclaimer

DISCLAIMER – FOR INFORMATIONAL PURPOSES ONLY; USE AT YOUR OWN RISK

The protocol content here is for informational purposes only and does not constitute legal, medical, clinical, or safety advice, or otherwise; content added to protocols.io is not peer reviewed and may not have undergone a formal approval of any kind. Information presented in this protocol should not substitute for independent professional judgment, advice, diagnosis, or treatment. Any action you take or refrain from taking using or relying upon the information presented here is strictly at your own risk. You agree that neither the Company nor any of the authors, contributors, administrators, or anyone else associated with protocols.io, can be held responsible for your use of the information contained in or linked to this protocol or any of our Sites/Apps and Services.

Abstract

This is a gene discovery protocol utilising a single seed gene to create correlation lists. These lists are used to identify novel genes in rare diseases.

Software and packages

Download and install the required software

Software
R programming language
NAME
The R Foundation
DEVELOPER
Comprehensive R Archive Network
SOURCE LINK
Cost£0.00  

Software
7-zip
NAME
Igor Pavlov
DEVELOPER
Cost£0.00  

Software
Microsoft 365
NAME
Microsoft
DEVELOPER
   Cost£59.99 per year (personal)  

Expression preparation

Download the RAW file for GSE2109
Dataset
Expression Project for Oncology (expO)
NAME
https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE2109
LINK
Cost£0.00  


Extract the CEL files using 7-zip

Add 'affy' to the R library and normalise the data


Note
>Library(Affy)

#Normalise data
>raw.data<- ReadAffy(celfile.path = "GSE2109_RAW/")
>normalised.data<- rma(raw.data)
>normalised.expression<- as.data.frame(exprs(normalised.data))
>write.table(normalised.expression, file = "GSE2109_normalised.txt", quote = FALSE, sep ="\t", row.names = TRUE, col.names = TRUE)

SOD3 correlation data

Import the GSE2109_normalised.txt file into excel

Create a new column to calculate the Pearson correlation with the SOD3 probe (205236_at)

Note
=PEARSON(X1:X2,Y1:Y2)

Create another column calculating the two-tail t-test statistic for the correlations

Note
=rho*SQRT(n-2)/SQRT(1-rho^2)

Create another column calculating the two tail t-test p-value
    
Note
=T.DIST.2T(ABS(tstat),n-2)

Create another column calculating the Jarque-Bera normality test statistic

Note
=(COUNT(X1:X2)/6)*(SKEW(X1:X2) ^2+(KURT(X1:X2)^2-3)/4)

    
Create another column calculating the chi-squared p-value for the Jarque-Bera normality test     

Note
=CHISQ.DIST.RT(abs(JBTS),2)

Expression exclusions

Apply statistical exclusions:
Exclude all expressions with SOD3 correlation p > 2.3x10-5 (Bonferroni corrected alpha 0.05)
Exclude all expressions with SOD3 correlation ρ ≤ |0.34|
Exclude all expressions failing the Jarque-Bera normality test

Gene exclusions

Use GSE2109 supplementary file GPL570-9999 to assign gene symbols to expression probes

Update gene names using the HUGO Gene Nomenclature Committee website (https://www.genenames.org/) and note gene class (e.g. protein coding, pseudogene, etc)

Identify optimal gene of duplicate probes by adding 'Jetset' to the R library

Note
>library(jetset)

# Best fit from duplicates
>jscores('hgu133plus2', symbol = 'CAND1')
>jmap('hgu133plus2', symbol = "CAND1")
>jscores('hgu133plus2', symbol = 'FBXO28')
>jmap('hgu133plus2', symbol = "FBXO28")
>jscores('hgu133plus2', symbol = 'HSPB6')
>jmap('hgu133plus2', symbol = "HSPB6")
>jscores('hgu133plus2', symbol = 'MREG')
>jmap('hgu133plus2', symbol = "MREG")
>jscores('hgu133plus2', symbol = 'MTF2')
>jmap('hgu133plus2', symbol = "MTF2")
>jscores('hgu133plus2', symbol = 'MYH11')
>jmap('hgu133plus2', symbol = "MYH11")
>jscores('hgu133plus2', symbol = 'PLN')
>jmap('hgu133plus2', symbol = "PLN")
>jscores('hgu133plus2', symbol = 'QSER1')
>jmap('hgu133plus2', symbol = "QSER1")
        
Apply gene exclusions:
1. Exclude expressions with no GPL570-9999 identified gene symbol
1. Exclude suboptimal duplicate gene probes identified by 'Jetset'
2. Exclude non-specific/promiscuous probes, non-coding genes

Gene lists

Verify robustness of the remaining gene expressions using 'pvclust' in R

Note
>Library(pvclust)

# Test for robustness
>data <- as.matrix(read.table("GSE2109_rho_34_CLEAN.txt",
header=TRUE,row.names=1))
>robustness <- pvclust((as.matrix(t(data))),method.dist="correlation", use.cor="pairwise.complete.obs", method.hclust="ward.D2",nboot=1000)
>plot(robustness, hang=-1,cex=0.5, main="GSE2109 Correlation (|ρ|≥0.34) Cluster with
p-values (%)")
>pvrect(result,alpha=0.95)

Incrementally increase the correlation threshold by 0.01 from ρ>|0.34| to ρ>|0.41| 

List 1: Genes with ρ>|0.41|
List 2: Genes with ρ>|0.40|
List 3: Genes with ρ>|0.39|
List 4: Genes with ρ>|0.38|
List 5: Genes with ρ>|0.37|
List 6: Genes with ρ>|0.36|
List 7: Genes with ρ>|0.35|
List 8: Genes with ρ>|0.34|

For all lists, separate genes by correlation direction and denote the daughter lists with superscript '+' for positive and '-' for negative correlations

List 1+: Genes with ρ > 0.41
List 2+: Genes with ρ > 0.40
List 3+: Genes with ρ > 0.39
List 4+: Genes with ρ > 0.38
List 5+: Genes with ρ > 0.37
List 6+: Genes with ρ > 0.36
List 7+: Genes with ρ > 0.35
List 8+: Genes with ρ > 0.34
List 1-: Genes with ρ < -0.41
List 2-: Genes with ρ < -0.40
List 3-: Genes with ρ < -0.39
List 4-: Genes with ρ < -0.38
List 5-: Genes with ρ < -0.37
List 6-: Genes with ρ < -0.36
List 7-: Genes with ρ < -0.35
List 8-: Genes with ρ < -0.34

Significant disorders

Enter all positive and negative correlation gene lists into Enrichr (maayanlab.cloud), and for each list:

In the Diseases / Drugs tab select Orphanet Augmented 2021
Copy all significant disorders Name and Adjusted P-value into a spreadsheet. Positive lists gene overlap must contain SOD3 and at two other genes, whereas negative correlation lists do not have to have SOD3 as an overlap gene but must contain 3 list genes in the overlap (hover over the disorder to check)
Identify minimum viable positive and negative lists (i.e. the smallest signed gene lists which have a significant overlap with disorder(s)).    
Delete from the spreadsheet any disorders from parent lists not in the minimum viable positive and negative lists
Identify the gene lists with the greatest overlap and associate the disorder to that list, noting the overlap genes in the spreadsheet
Cross reference the Orphanet disorder names (Orphanet: Search by disease name) with the OMIM names (Home - OMIM) noting the causal genes

Significant ontologies

Individually enter each gene list associated with a disorder into Enrichr (maayanlab.cloud) and add the causal gene(s) for the disorder considered.

1. In the pathways tab, use Elsevier Pathway Collection, from the table note the significant pathways and the adjusted p-value of any pathways containing both a causal gene and a non-overlapping list gene
2. In the ontologies tab note from the Biological Process, Molecular Function, and Human Phenotype tables significant entries and adjusted p-values of any entries containing both a causal gene and a non-overlapping list gene

Associate qualifying entries with the list associated disorders

Note the non-overlapping gene(s) (potentially novel genes) linked to the disorders via the causal gene/list ontologies

Literature associations

Search the literature for previous works which may lead to potential mechanisms or pathways from the potentially novel genes to disorder presentation. Repeat for superoxide dismutase, superoxide, and hydrogen peroxide. Consider gene aliases / previous symbols in respect of time allocated to the research

Public workspaceElucidating the roles of SOD3 correlated genes and reactive oxygen species in rare human diseases using a bioinformatic-ontology approach

Elucidating the roles of SOD3 correlated genes and reactive oxygen species in rare human diseases using a bioinformatic-ontology approach