Mar 22, 2022

Plant assemble - Plant de novo genome assembly, scaffolding and annotation for genomic studies

Australian National University
Collection Citation: Scott Ferguson, Ashley Jones, Justin Borevitz 2022. Plant assemble - Plant de novo genome assembly, scaffolding and annotation for genomic studies. protocols.io https://dx.doi.org/10.17504/protocols.io.81wgb6zk3lpk/v1
License: This is an open access collection distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Protocol status: Working
We use this collection and it’s working
Created: March 21, 2022
Last Modified: March 22, 2022
Collection Integer ID: 59709
Abstract
With the advancement of long-read sequencing technologies and associated bioinformatics tools, it is now possible to assemble complex plant genomes de novo with unrivalled contiguity, completeness and correctness. Because read lengths can now surpass repeat lengths, de novo assembly has improved dramatically, and complex plant genomes of widely variable size and repeat content have benefited greatly. Despite these improvements, challenges remain in performing de novo assembly, namely in developing a reliable workflow and in choosing tools. Here we present a protocol collection of bioinformatic workflows detailing plant genome assembly using Oxford Nanopore Technologies long reads with a de novo assembler (Canu), syntenic or Hi-C scaffolding, and RNA and/or gene homology-based annotation. We have developed and tested these protocols on multiple plant genomes. Using these protocols with sufficient long-read coverage, a highly contiguous, complete, and correct plant genome can be assembled. These genomes can further genomic research into structural variation among groups, and SNP genotyping and association studies among populations.
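The abstract refers to "sufficient coverage" without giving a formula. As a rough illustration only, mean depth of coverage is commonly estimated as total sequenced bases divided by estimated genome size. The function name, the read lengths, and the genome size below are hypothetical and are not taken from the protocols.

```python
def estimated_coverage(read_lengths, genome_size_bp):
    """Return mean depth of coverage: total sequenced bases / genome size."""
    if genome_size_bp <= 0:
        raise ValueError("genome_size_bp must be positive")
    return sum(read_lengths) / genome_size_bp


# Hypothetical toy input: three reads against a 1 kb "genome".
reads = [400, 600, 1000]
print(estimated_coverage(reads, 1000))  # 2.0
```

In practice the read lengths would come from the long-read fastq files and the genome size from an independent estimate, such as the k-mer based estimate listed in the materials.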
Materials

Tool | Version | What?
BEDTools | Latest | Soft masking.
Bioawk | Latest | Extract sequence names and lengths.
BLAST | Latest | Contamination filter.
Blobtools | 1.12.x | Contamination filter.
BRAKER2 | 2.1.5 | Gene annotation.
BUSCO | 5.x | Genome assessment.
bwa mem | Latest | Align short reads during polishing.
EDTA | v1.9.6 | Predict transposon sequences.
Flye | Latest | Long read genome assembly, used for genome size estimate.
GenomeScope 2.0 | Latest | K-mer based genome size and ploidy estimator.
GenomeTools | Latest | Used by LAI.
Hapo-G | Latest | Haplotype-aware short read polisher.
Jellyfish | Latest | K-mer counting for k-mer based genome size estimate.
Juicer | 1.6 | Hi-C quality control and scaffolding.
LAI | Latest | Genome assessment.
LTR_FINDER_parallel | Latest | Used by LAI.
LTR_retriever | Latest | Used by LAI.
Miniasm | Latest | Long read genome assembly, used for genome size estimate.
Minimap2 | Latest | Long read aligner.
MUMmer | 4 | Sequence aligner and visualisation.
NanoPack | Latest | Long read fastq assessment and quality control.
NextPolish | Latest | Short read polisher.
hic_qc (Phase Genomics) | Latest | Hi-C quality assessment.
purge_haplotigs | Latest | Find and filter duplicate genomic regions in assembly.
Qualimap | 2 | Assess quality of alignment of validation reads.
R | Latest | Used by GenomeScope 2.0.
Racon | Latest | Long read polisher.
RaGOO/RagTag | Latest | Syntenic scaffolder.
RepeatMasker | Latest | Finds and masks TE and SSR regions in genomes.
Samtools | Latest | Processing of sam/bam files.
Seqtk | Latest | Sub-sample fasta/q files.
STAR | Latest | RNA aligner for gene annotation.
Tools/programs used by pipelines and versions that have worked for us. For citations see publication.
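Before running the pipelines it can help to confirm that the listed tools are installed. The following is a minimal sketch, not part of the protocols: it checks a subset of the tools using Python's standard `shutil.which`. The executable names in `TOOLS` are assumptions about a typical installation and may differ on your system (for example, when tools are wrapped in containers or environment modules).

```python
import shutil

# Assumed executable names for a subset of the tools in the table above;
# actual names depend on how each tool was installed.
TOOLS = {
    "BEDTools": "bedtools",
    "bwa mem": "bwa",
    "Flye": "flye",
    "Jellyfish": "jellyfish",
    "Minimap2": "minimap2",
    "Racon": "racon",
    "Samtools": "samtools",
    "Seqtk": "seqtk",
}


def find_missing(tools):
    """Return the sorted names of tools whose executable is not on PATH."""
    return sorted(name for name, exe in tools.items() if shutil.which(exe) is None)


missing = find_missing(TOOLS)
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("All checked tools were found on PATH.")
```

This only confirms that an executable is reachable. It does not verify the versions given in the table.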

Files
Protocol: Plant assemble - Plant de novo genome assembly: assembly
Version 2
Scott Ferguson, Australian National University
Protocol: Plant assemble - Plant de novo genome assembly: scaffolding
Version 2
Scott Ferguson, Australian National University
Protocol: Plant assemble - Plant de novo genome assembly: quality assessment
Version 2
Scott Ferguson, Australian National University
Protocol: Plant assemble - Plant de novo genome assembly: annotation
Version 2
Scott Ferguson, Australian National University