License: This is an open access protocol distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Protocol status: Working
We use this protocol and it's working
Created: August 26, 2024
Last Modified: November 22, 2024
Protocol Integer ID: 112587
Keywords: docker, bioinformatics, dnalinux, fungi
Abstract
Protocol to annotate a fungal genome.
Setup
Install Docker
If you don't have Docker already, install it. There are two versions: Docker Engine (also known as Docker CE, Community Edition) and Docker Desktop. The Desktop version is more user-friendly, but because it may require a commercial license for use in large enterprises, this tutorial is based on Docker Engine. Both versions work with this protocol. Linux users can install either Docker Engine or Docker Desktop, while macOS and Windows users should install Docker Desktop.
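To confirm the installation before continuing, a small check like the one below can help. It is a minimal sketch, not part of the original protocol: it assumes the client is installed as `docker` on your PATH and simply runs `docker --version`.

```python
import shutil
import subprocess
from typing import Optional


def docker_version(executable: str = "docker") -> Optional[str]:
    """Return the output of `<executable> --version`, or None if it is unavailable."""
    path = shutil.which(executable)  # look the executable up on PATH
    if path is None:
        return None
    result = subprocess.run([path, "--version"], capture_output=True, text=True, check=False)
    if result.returncode != 0:
        return None
    return result.stdout.strip()


if __name__ == "__main__":
    version = docker_version()
    if version is None:
        print("Docker was not found on PATH; install Docker Engine or Docker Desktop first.")
    else:
        print(version)
```

This only checks the Docker client. Running `docker info` additionally confirms that the Docker daemon is running and reachable.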
You will need long-read FASTQ data, short reads, and the genome assembly. In the following code, the assembly file is called assembly.fasta and the long-reads file is called ID.fastq. The short reads should be in two files (ID_R1.fastq.gz and ID_R2.fastq.gz).
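As a quick pre-flight check, the sketch below lists which of the four expected input files are missing from your base directory. It is an illustrative helper, not part of the original protocol; `ID` and `/your_dir` are the protocol's own placeholders and should be replaced with your sample name and data directory.

```python
from pathlib import Path
from typing import List

SAMPLE_ID = "ID"  # placeholder sample name used throughout the protocol


def expected_inputs(sample_id: str) -> List[str]:
    """File names this protocol expects in the base directory."""
    return [
        "assembly.fasta",             # genome assembly
        f"{sample_id}.fastq",         # long reads
        f"{sample_id}_R1.fastq.gz",   # short reads, read 1
        f"{sample_id}_R2.fastq.gz",   # short reads, read 2
    ]


def missing_inputs(base_dir: str, sample_id: str) -> List[str]:
    """Return the expected file names that are not present in base_dir."""
    base = Path(base_dir)
    return [name for name in expected_inputs(sample_id) if not (base / name).is_file()]


if __name__ == "__main__":
    missing = missing_inputs("/your_dir", SAMPLE_ID)
    if missing:
        print("Missing input files:", ", ".join(missing))
    else:
        print("All input files found.")
```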
If your short reads are split across more than two files, concatenate them so you end up with two files (one for R1 and one for R2). For example, if you have ID_L001_R1.fastq.gz, ID_L002_R1.fastq.gz, ID_L001_R2.fastq.gz and ID_L002_R2.fastq.gz, you can concatenate them with these commands:
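The concatenation can be sketched in Python as follows. This is an illustrative alternative, not the protocol's original commands; it assumes the lane files follow the `ID_L001_R1.fastq.gz` naming shown above, and `lane_files` and `concatenate_gz` are hypothetical helper names. Gzip files can be joined byte for byte (the result is a valid multi-member gzip file), so nothing needs to be decompressed.

```python
import shutil
from pathlib import Path
from typing import Iterable, List


def lane_files(base_dir: str, sample_id: str, read: str) -> List[str]:
    """Return lane files such as ID_L001_R1.fastq.gz, ID_L002_R1.fastq.gz, sorted by name."""
    pattern = f"{sample_id}_L*_{read}.fastq.gz"
    # Sorting keeps R1 and R2 lanes in the same order, so read pairs stay in sync.
    return sorted(str(p) for p in Path(base_dir).glob(pattern))


def concatenate_gz(parts: Iterable[str], output: str) -> None:
    """Byte-concatenate gzip files; the result is a valid multi-member gzip file."""
    with open(output, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)


if __name__ == "__main__":
    base_dir, sample_id = "/your_dir", "ID"  # placeholders from the protocol
    if Path(base_dir).is_dir():
        for read in ("R1", "R2"):
            parts = lane_files(base_dir, sample_id, read)
            if parts:
                concatenate_gz(parts, str(Path(base_dir) / f"{sample_id}_{read}.fastq.gz"))
```

The important design point is that R1 and R2 lanes are joined in the same order; otherwise the read pairs in ID_R1.fastq.gz and ID_R2.fastq.gz no longer match.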
Download FamDB HDF5 database, InterProScan database and GeneMark license
FamDB HDF5 database
The FamDB HDF5 database is needed for the RepeatMasker step. The database is partitioned by taxonomic group, and the partition needed for fungi is partition 0. For more information about partitions, see the attached README.txt file.
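After downloading partition 0, it can be worth confirming that the file is really HDF5 and not still compressed. The sketch below reads the first bytes of the file and compares them with the HDF5 signature and the gzip magic number. It assumes the HDF5 superblock starts at offset 0 (files with a user block would report "unknown"), and the host path in the example is hypothetical.

```python
from pathlib import Path

HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"  # 8-byte HDF5 file signature
GZIP_MAGIC = b"\x1f\x8b"               # first two bytes of any gzip file


def famdb_status(path: str) -> str:
    """Classify a FamDB file as 'missing', 'gzip-compressed', 'hdf5' or 'unknown'."""
    p = Path(path)
    if not p.is_file():
        return "missing"
    with open(p, "rb") as fh:
        head = fh.read(8)
    if head.startswith(GZIP_MAGIC):
        return "gzip-compressed"  # decompress it before use
    if head == HDF5_SIGNATURE:
        return "hdf5"
    return "unknown"


if __name__ == "__main__":
    # Hypothetical host directory that will later be mounted as /ftmp in the container.
    print(famdb_status("/your_dir/dfam/dfam38_full.0.h5"))
```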
If you don't have a GeneMark license, get one from the GeneMark website. The license key file must be named gm_key and placed in /your_dir. This license is needed to run the Funannotate Predict step.
Run sspace_longread
Run the following command (replace /your_dir with the base directory where you have your data). Remember that this step requires the dfam38_full.0.h5 database, stored in a directory that is available as /ftmp inside the Docker container.
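The requirement that the database sit in /ftmp "in the docker" is usually met with a bind mount (`-v host_dir:/ftmp`). The sketch below only illustrates that mount layout; it is not the protocol's sspace_longread command. The image name, the host directory holding dfam38_full.0.h5 and the container path `/data` are hypothetical placeholders, and the example container command just lists /ftmp (assuming the image provides `sh`) so you can check that the database is visible.

```python
import shlex
from typing import List, Sequence


def docker_run_command(data_dir: str, dfam_dir: str, image: str,
                       container_cmd: Sequence[str], data_mount: str = "/data") -> List[str]:
    """Build a `docker run` argument list with two bind mounts (data_mount is hypothetical)."""
    return [
        "docker", "run", "--rm",           # remove the container when it exits
        "-v", f"{data_dir}:{data_mount}",  # your data directory inside the container
        "-v", f"{dfam_dir}:/ftmp",         # directory holding dfam38_full.0.h5, seen as /ftmp
        image,
        *container_cmd,
    ]


if __name__ == "__main__":
    cmd = docker_run_command("/your_dir", "/your_dir/dfam", "your/image:tag",
                             ["sh", "-c", "ls -l /ftmp"])
    print(shlex.join(cmd))  # paste into a shell, or pass cmd to subprocess.run
```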