Protocol Citation: Martina Albuja-Quintana, Gabriela Pozo, Milton Gordillo-Romero, Carolina E. Armijos, Maria de Lourdes Torres 2024. Vaccinium floribundum Genome Assembly and Annotation Script. protocols.io https://dx.doi.org/10.17504/protocols.io.n92ldmo4nl5b/v1
License: This is an open access protocol distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited
Protocol status: Working
We use this protocol and it's working
Created: March 14, 2024
Last Modified: April 03, 2024
Protocol Integer ID: 96735
Funders Acknowledgement:
Fondos COCIBA USFQ
Abstract
Oxford Nanopore (ONT) long reads and Illumina short reads, obtained by sequencing DNA from a Vaccinium floribundum specimen from the Paramo region of Ecuador, were used to assemble and annotate the whole genome of this species. ONT long reads were filtered and trimmed with Nanofilt and Porechop, and sequencing statistics were visualized with Nanoplot. Illumina short reads were evaluated with FastQC. Different assemblers and polishers were used with both long and short reads. The resulting Flye assembly, polished with POLCA and Medaka, was then assessed for genome completeness and quality with Quast, BUSCO, the LTR Assembly Index (LAI), and coverage graphs. The assembly was subsequently annotated with Maker in three consecutive rounds using the ab initio gene predictor SNAP.
Oxford Nanopore Sequencing - Raw Reads Filtering, Trimming, and Statistics
Raw Read Adapter Filtering
Porechop v0.2.4 (RRID:SCR_016967)
Raw Read Quality and Length Trimming
Nanofilt v2.8.0 (RRID:SCR_016966)
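The quality and length trimming step can be illustrated with a minimal Python sketch. It is not the Nanofilt implementation: it assumes reads are already parsed into (header, sequence, quality) tuples, that qualities use the Phred+33 encoding, and that mean read quality is computed by averaging error probabilities rather than raw Phred scores. The thresholds in the usage note are hypothetical, since the protocol does not state the values used.

```python
import math
from typing import Iterable, Iterator, Tuple

Read = Tuple[str, str, str]  # (header, sequence, quality string)


def mean_phred(quality: str, offset: int = 33) -> float:
    """Mean read quality, averaged over error probabilities (assumed Phred+33)."""
    if not quality:
        return 0.0
    probs = [10 ** (-(ord(ch) - offset) / 10) for ch in quality]
    return -10 * math.log10(sum(probs) / len(probs))


def filter_reads(reads: Iterable[Read], min_quality: float,
                 min_length: int) -> Iterator[Read]:
    """Yield only the reads that pass both the length and the quality cutoff."""
    for header, seq, qual in reads:
        if len(seq) >= min_length and mean_phred(qual) >= min_quality:
            yield header, seq, qual
```

For example, `list(filter_reads(reads, min_quality=10, min_length=1000))` would keep reads of at least 1,000 bp with a mean quality of at least Q10 (both cutoffs are placeholders).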
Raw Read Dataset Statistics
LongQC v1.2.0c
Nanoplot v1.33.0 (RRID:SCR_024128)
Illumina Sequencing - Raw Reads Statistics
Raw Read Dataset Statistics
FastQC (RRID:SCR_014583)
Genome Size Estimation
k-mer based analysis
Jellyfish v2.3.0 (RRID:SCR_005491)
k-mer profile visualization
GenomeScope v2.0 (RRID:SCR_017014)
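As a rough illustration of k-mer based size estimation, the sketch below derives a genome size from a two-column k-mer histogram (depth, number of distinct k-mers), the layout written by Jellyfish's histogram output. It is a simplification and not what GenomeScope does: GenomeScope fits a mixture model, whereas this sketch only discards the low-depth error k-mers before the first valley and divides the remaining k-mer total by the depth of the main peak. For a heterozygous genome the tallest peak may not be the homozygous one, so treat the result as a sanity check only.

```python
from typing import Dict, Iterable, Tuple


def parse_histo(lines: Iterable[str]) -> Dict[int, int]:
    """Parse 'depth count' lines into a {depth: count} dictionary."""
    histo = {}
    for line in lines:
        parts = line.split()
        if len(parts) == 2:
            histo[int(parts[0])] = int(parts[1])
    return histo


def estimate_genome_size(histo: Dict[int, int]) -> Tuple[int, int]:
    """Return (peak depth, estimated genome size in bp)."""
    depths = sorted(histo)
    # The first depth at which counts start rising again marks the end of
    # the error k-mer tail.
    valley = depths[0]
    for cur, nxt in zip(depths, depths[1:]):
        if histo[nxt] > histo[cur]:
            valley = cur
            break
    kept = [d for d in depths if d >= valley]
    peak = max(kept, key=lambda d: histo[d])
    total_kmers = sum(d * histo[d] for d in kept)
    return peak, round(total_kmers / peak)
```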
De novo Genome Assembly and Polishing
Assembly
SMARTdenovo v1.0.0 (RRID:SCR_017622)
Flye v2.9.2 (RRID:SCR_017016)
MaSuRCA v4.1.0 (RRID:SCR_010691)
Polishing
Medaka v1.11.1
POLCA (MaSuRCA v4.1.0, RRID:SCR_010691)
Genome assembly quality, continuity, and completeness assessment
Quast v5.2.0 (RRID:SCR_001228)
BUSCO v5.4.7 (RRID:SCR_015008)
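Among the continuity metrics reported by Quast, N50 is the contig length at which contigs of that length or longer contain at least half of the assembled bases. The sketch below computes it from a list of contig lengths. It is meant only to make the metric concrete; the function name is illustrative and the code is not part of Quast.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Length L such that contigs of length >= L cover at least 50% of the assembly."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0  # empty assembly
```

With contig lengths of 10, 8, 6, 4, and 2 bp (30 bp in total), the two longest contigs already cover 18 bp, so the N50 is 8.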
Long Terminal Repeat (LTR) Assembly Index (LAI)
LTRharvest v1.6.2 (RRID:SCR_018970)
1. Create an "Enhanced Suffix Array"
2. Run LTRharvest
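The two steps above can be sketched as a small Python wrapper around the GenomeTools `gt` executable. The file name `genome.fa` and the output path are hypothetical, and the protocol does not state which LTRharvest parameters were used, so only the index option is passed here. The suffixerator flags follow the commonly documented invocation for preparing an LTRharvest index.

```python
import subprocess
from typing import List


def ltrharvest_commands(genome: str) -> List[List[str]]:
    """Build the two gt commands: index construction, then LTRharvest."""
    build_index = [
        "gt", "suffixerator",
        "-db", genome,          # input assembly (FASTA)
        "-indexname", genome,   # index files are written next to the assembly
        "-tis", "-suf", "-lcp", "-des", "-ssp", "-sds", "-dna",
    ]
    run_harvest = ["gt", "ltrharvest", "-index", genome]
    return [build_index, run_harvest]


def run_ltrharvest(genome: str, out_path: str) -> None:
    """Run both commands; LTRharvest results are captured from stdout."""
    build_index, run_harvest = ltrharvest_commands(genome)
    subprocess.run(build_index, check=True)
    with open(out_path, "w") as out:
        subprocess.run(run_harvest, check=True, stdout=out)
```

Calling `run_ltrharvest("genome.fa", "genome.fa.harvest.scn")` requires `gt` to be on the PATH; any additional LTRharvest options would be appended to the second command.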
LTR_FINDER v1.0.7 (RRID:SCR_015247)
Combine the output files from LTR_FINDER and LTRharvest
LTR_retriever v2.8.7 (RRID:SCR_017623)
Coverage Graphs of ONT and Illumina Reads
1. Rename the contigs in the assembly file
2. Identify longest contig
3. Extract the longest contig from the assembly
Samtools package v1.18 (RRID:SCR_002105)
4. Extract 10% of the ONT and Illumina reads (total read counts were obtained from the previous Nanoplot run)
BWA (RRID:SCR_010910)
5. Map ONT reads and Illumina reads, separately, to the assembly
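Steps 1 to 4 can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the commands used in the protocol (which relies on Samtools and BWA): the `contig_N` naming scheme, the function names, and the fixed random seed are all hypothetical. The subsampling function draws an exact number of reads from a known total, mirroring the use of a previously obtained read count.

```python
import random
from typing import Iterable, Iterator, List, Sequence, Tuple

Record = Tuple[str, str]  # (name, sequence)


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (name, sequence) pairs; the name is the first word of the header."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def rename_contigs(records: Iterable[Record]) -> List[Record]:
    """Step 1: replace contig names with contig_1, contig_2, ... (hypothetical scheme)."""
    return [(f"contig_{i}", seq) for i, (_, seq) in enumerate(records, start=1)]


def longest_contig(records: Sequence[Record]) -> Record:
    """Steps 2 and 3: identify and return the longest contig."""
    return max(records, key=lambda rec: len(rec[1]))


def subsample(reads: Iterable, total: int, fraction: float, seed: int = 0) -> list:
    """Step 4: keep an exact fraction of reads, given the known total read count."""
    k = round(total * fraction)
    keep = set(random.Random(seed).sample(range(total), k))
    return [read for i, read in enumerate(reads) if i in keep]
```

For example, `subsample(reads, total=len(reads), fraction=0.1)` returns 10% of the reads in their original order; the extracted contig can then be written to its own FASTA file and used as the mapping reference.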