Dec 15, 2023

FASTQ alignment, gene counts, and differential expression analysis and functional annotation of DEGs

  • University of Florida
Open access
Protocol Citation: Rebecca Wallings 2023. FASTQ alignment, gene counts, and differential expression analysis and functional annotation of DEGs. protocols.io https://dx.doi.org/10.17504/protocols.io.dm6gp3mzdvzp/v1
License: This is an open access protocol distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Protocol status: Working
We use this protocol and it's working
Created: December 14, 2023
Last Modified: December 15, 2023
Protocol Integer ID: 92337
Funders Acknowledgement:
ASAP
Grant ID: ASAP-020527
Abstract
FASTQ alignment, gene counts, and differential expression analysis and functional annotation of DEGs
FASTQ alignment, gene counts, and differential expression analysis
FASTQ files were aligned to the mouse genome (GRCm39) with the GRCm39.107 annotation using STAR to generate BAM files. Gene counts were generated from the BAM files using Rsamtools (Bioconductor package) and the summarizeOverlaps function from the GenomicAlignments package v1.36.0. Differential gene expression analysis was performed with the DESeq2 package v1.40.2 using the “DESeq” function with default settings, which fits a generalized linear model for each gene. The resulting Wald test P-values were adjusted for multiple comparisons using the Benjamini–Hochberg method (adjusted P-value). Pair-wise changes in gene expression levels between groups were used to identify differentially expressed genes (DEGs). DEGs were defined by an absolute log2 fold change ≥0.5 and an adjusted P-value ≤0.05.
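The adjustment and thresholding step described above can be illustrated with a minimal Python sketch. This is not the DESeq2 implementation (DESeq2 is an R package and, by default, also applies independent filtering before adjustment, so its adjusted P-values can differ). The function names `bh_adjust` and `call_degs` and the gene labels in the usage note are hypothetical and chosen only for illustration.

```python
def bh_adjust(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping adjusted values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def call_degs(genes, log2fc, pvalues, lfc_cutoff=0.5, padj_cutoff=0.05):
    """Select genes with |log2FC| >= lfc_cutoff and adjusted p <= padj_cutoff."""
    padj = bh_adjust(pvalues)
    return [
        gene
        for gene, lfc, q in zip(genes, log2fc, padj)
        if abs(lfc) >= lfc_cutoff and q <= padj_cutoff
    ]
```

For example, `call_degs(["geneA", "geneB"], [1.2, -0.3], [0.01, 0.04])` keeps only `geneA`, because `geneB` fails the fold-change cutoff even though its adjusted P-value passes.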
Functional annotation of DEGs
Gene Ontology (GO) enrichment analysis was performed with goseq to identify enrichment in GO categories and KEGG pathways. Up- and down-regulated DEG lists were analyzed separately. Over-represented P-values were adjusted for multiple comparisons using the Benjamini–Hochberg (BH) procedure to control the false-discovery rate (FDR). An enrichment score was calculated for each gene list as an observed-over-expected ratio. GO biological process (GO-BP) categories and KEGG pathways were plotted if their BH-adjusted FDR was ≤0.05 and the number of DEGs within the category/pathway was greater than 4. To eliminate large, broad GO-BP parent categories, categories with more than 250 total genes were not plotted. To eliminate redundancy, pathways with significant overlap in genes and semantics were merged.
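The enrichment score and plotting filter can be sketched as follows. The protocol does not spell out how the expected count was computed, so this sketch assumes one common definition: the number of DEGs expected in a category if DEGs were spread across the background in proportion to category size. It also assumes the significance cutoff is read as a BH-adjusted value of at most 0.05. The function and parameter names are hypothetical, and goseq itself (an R package) is not used here.

```python
def enrichment_score(n_deg_in_cat, n_cat, n_deg, n_background):
    """Observed-over-expected ratio for one category.

    Assumed definition: expected = n_deg * n_cat / n_background, i.e. the
    DEG count expected by chance given the category's share of the background.
    """
    expected = n_deg * n_cat / n_background
    return n_deg_in_cat / expected


def keep_for_plot(padj, n_deg_in_cat, n_cat,
                  padj_cutoff=0.05, min_degs=4, max_cat_size=250):
    """Apply the plotting filter described in the text.

    Keeps a category if it is significant, contains more than min_degs DEGs,
    and has no more than max_cat_size total genes.
    """
    return (
        padj <= padj_cutoff
        and n_deg_in_cat > min_degs
        and n_cat <= max_cat_size
    )
```

With 200 DEGs in a background of 20,000 genes, a 100-gene category is expected to contain one DEG by chance under this definition, so observing 10 DEGs gives a score of 10.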