Mar 14, 2024

Public workspaceDESEq2 for time series

This protocol is a draft, published without a DOI.
  • 1Fred Hutch
Open access
Protocol CitationKarina Jhingan 2024. DESEq2 for time series. protocols.io https://protocols.io/view/deseq2-for-time-series-dajh2cj6
License: This is an open access protocol distributed under the terms of the Creative Commons Attribution License,  which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited
Protocol status: Working
We use this protocol and it's working
Created: March 12, 2024
Last Modified: March 14, 2024
Protocol Integer ID: 96585
Abstract
This is a differential analysis for a time series experiment using DESeq2.
Introduction
Introduction

Note that DESeq and any other differential analysis requires replicates.
Imports/Libraries
Imports/Libraries
Citation: “ashr” for LFC shrinkage Stephens, M. (2016) False discovery rates: a new deal. Biostatistics, 18:2. https://doi.org/10.1093/biostatistics/kxw041
if (!("DESeq2" %in% installed.packages())) {
# Install this package if it isn't installed yet
BiocManager::install("DESeq2", update = FALSE)
}
if (!("apeglm" %in% installed.packages())) {
# Install this package if it isn't installed yet
BiocManager::install("apeglm", update = FALSE)
}
if (!("ashr" %in% installed.packages())) {
# Install this package if it isn't installed yet
BiocManager::install("ashr", update = FALSE)
}

# Attach the DESeq2 library
library(DESeq2)

# Attach the ggplot2 library for plotting
library(ggplot2)

# We will need this so we can use the pipe: %>%
library(magrittr)

library(tidyverse)

#set seed for DESeq2::plotCounts() for reproducibility
set.seed(12345)

Data Set up
Data Set up
Set Project Path

#replace with your directory
projPath="/fh/fast/greenberg_p/user/kjhingan/DESeq2"

Gene Expression Data Matrix
A csv file where rows are genes and columns are samples/time stamps
# Replace with the path to your dataset file
data_file <- file.path(projPath, "gsva_data.csv")

Metadata File A csv file with nrows(metadata) == ncols(datafile) three columns: -id: sample name, i.e the column names from the data file -group (i.e time stamp or control group)
-replicate

# Replace with the path to your metadata file
metadata_file <- file.path(projPath, "gsva_metadata.csv")


# Check if the gene expression matrix file is at the path stored in `data_file`
file.exists(data_file)

# Check if the metadata file is at the file path stored in `metadata_file`
file.exists(metadata_file)
You should see TRUE outputted twice
Read Data

# Read in metadata CSV file
metadata <- readr::read_csv(metadata_file)

# Read in data CSV file
expression_df <- readr::read_csv(data_file)

Set up data matrix

# ensure unique row names for the next function
expression_df <- expression_df[!(duplicated(expression_df[[1]]) |
duplicated(expression_df[[1]], fromLast=TRUE)),]


expression_df <-expression_df %>%
# store the gene IDs as row names so we have a numeric matrix to perform calculations on later
tibble::column_to_rownames("Gene")

Set up metadata
We are using the naive samples ("0 hours") as a reference/control.
metadata <- metadata %>%
dplyr::mutate(time_stamp = dplyr::case_when(
stringr::str_detect(group, "naïve") ~ "reference",
stringr::str_detect(group, "24") ~ "24_hours",
stringr::str_detect(group, "48") ~ "48_hours",
stringr::str_detect(group, "72") ~ "72_hours"
))


# Let's take a look at the original metadata column's info
# and our new `mutation_status` column
dplyr::select(metadata, group, time_stamp)



# Print out a preview of `mutation_status`
str(metadata$time_stamp)




# Make mutation_status a factor and set the levels appropriately
metadata <- metadata %>%
dplyr::mutate(
# Here we define the values our factor variable can have and their order.
time_stamp = factor(time_stamp, levels = c("reference", "24_hours", "48_hours", "72_hours"))
)

levels(metadata$time_stamp)
You should see the following printed out: [1] "reference" "24_hours" "48_hours" "72_hours"

# Define a minimum counts cutoff and filter the data to include
# only rows (genes) that have total counts above the cutoff
filtered_expression_df <- expression_df %>%
dplyr::filter(rowSums(.) >= 10)

# round all expression counts
gene_matrix <- round(filtered_expression_df)

DESeq2
DESeq2
Create DESeq2 Object
ddset <- DESeqDataSetFromMatrix(
# Here we supply non-normalized count data
countData = gene_matrix,
# Supply the `colData` with our metadata data frame
colData = metadata,
# Supply our experimental variable to `design`
design = ~time_stamp
)

Run DESeq2
deseq_object <- DESeq(ddset)

View Results

deseq_results <- results(deseq_object)
resultsNames(deseq_object)
[1] "Intercept" "time_stamp_24_hours_vs_reference" "time_stamp_48_hours_vs_reference" [4] "time_stamp_72_hours_vs_reference"
head(deseq_results)



Normalize results by log fold change
Here we are shrinking log fold change by ashr, for more information on ashr go to step 2 and view the citation.
deseq_results <- lfcShrink(
deseq_object, # The original DESeq2 object after running DESeq()
contrast=c("reference","24_hours","48_hours", "72_hours"),
type = "ashr",
res = deseq_results # The original DESeq2 results table
)
head(deseq_results)



Set up DESeq results for visualization
Convert the results into a dataframe
deseq_df <- deseq_results %>%
# make into data.frame
as.data.frame() %>%
# the gene names are row names -- let's make them a column for easy display
tibble::rownames_to_column("Gene") %>%
# add a column for significance threshold results
dplyr::mutate(threshold = padj < 0.05) %>%
# sort by statistic -- the highest values will be genes with
# higher expression in RPL10 mutated samples
dplyr::arrange(dplyr::desc(log2FoldChange))

head(deseq_df)



Visualizations
Visualizations
Plot counts
IL2RA was only chosen as an example, change to your gene of interest into the gene parameter.
plotCounts(ddset, gene = "IL2RA", intgroup = "time_stamp")



Save Results
Save Results
Save DESeq2 results
readr::write_tsv(
deseq_df,
file.path(
projPath,
"SRP123625_diff_expr_results.tsv" # Replace with a relevant output file name
)
)