Sep 23, 2022

fastANI analysis protocol

This protocol is a draft, published without a DOI.
  • University of Exeter
Protocol Citation: Jamie Harrison, David J Studholme 2022. fastANI analysis protocol. protocols.io https://protocols.io/view/fastani-analysis-protocol-cgritv4e
License: This is an open access protocol distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Protocol status: Working
We use this protocol and it's working
Created: September 16, 2022
Last Modified: September 23, 2022
Protocol Integer ID: 70154
Abstract
This protocol describes how to run an average nucleotide identity (ANI) analysis between groups of genomes using fastANI, and how to produce a heatmap figure of the results in R using the pheatmap package.
Create directory structure and link files
create directories



Note
Two files are needed for this step: one listing the query genomes and one listing the reference genomes, each with a single filename per line. Suggested names for these files are query_list.txt and reference_list.txt.
Command
Create directories for the analysis, the query genomes and the reference genomes
mkdir analysis query_genomes reference_genomes
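
As a rough illustration of the list-file format described in the note above, the sketch below writes a three-line query_list.txt into a temporary directory and checks that every line holds exactly one name. The genome filenames and the variable names are hypothetical placeholders, not files from this protocol.

```shell
# Hypothetical example: a list file with one FASTA filename per line.
workdir=$(mktemp -d)
printf '%s\n' genomeA.fasta genomeB.fasta genomeC.fasta > "$workdir/query_list.txt"

# Count non-empty lines, then count lines containing whitespace
# (whitespace would suggest more than one name on a line).
n_entries=$(grep -c . "$workdir/query_list.txt")
n_bad=$(grep -c '[[:space:]]' "$workdir/query_list.txt" || true)
echo "entries: $n_entries, malformed lines: $n_bad"
```

The same check can be pointed at reference_list.txt before any links are created.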

move to query dir


Command
change dir to query_genomes
cd query_genomes


Command
Copy query_list.txt to query_genomes dir, where XXXXXXXX is location of query_list.txt
cp XXXXXXXX/query_list.txt .


Command
Loop through query_list.txt and link each query genome into the query_genomes dir. YYYYYYYY is the location of the genome FASTA files
while read -r i; do ln -s "YYYYYYYY/${i}" .; done < query_list.txt
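
A slightly more defensive variant of the linking loop might look like the sketch below, which skips blank lines and reports listed genomes that cannot be found instead of creating dangling links. It runs against mock data in temporary directories; genome_dir stands in for YYYYYYYY, and all file and variable names are assumptions made for illustration.

```shell
# Mock setup so the sketch runs on its own.
genome_dir=$(mktemp -d)   # stands in for YYYYYYYY
link_dir=$(mktemp -d)     # stands in for query_genomes
touch "$genome_dir/genomeA.fasta" "$genome_dir/genomeB.fasta"
printf '%s\n' genomeA.fasta genomeB.fasta missing.fasta > "$link_dir/query_list.txt"

n_linked=0
n_missing=0
while IFS= read -r name; do
    [ -n "$name" ] || continue                    # skip blank lines
    if [ -e "$genome_dir/$name" ]; then
        ln -s "$genome_dir/$name" "$link_dir/"    # link only files that exist
        n_linked=$((n_linked + 1))
    else
        echo "not found: $name" >&2               # report instead of linking
        n_missing=$((n_missing + 1))
    fi
done < "$link_dir/query_list.txt"
echo "linked: $n_linked, missing: $n_missing"
```

Checking before linking means no broken links are created in the first place, although the clean-up step that follows is still a useful safety net.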

check soft links and remove bad links

Command
check softlinks and remove any that don't work
find . -xtype l -delete
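
Before deleting anything it can help to list the broken links first. The sketch below does this on mock data, using a test -e check in place of the GNU-specific -xtype option; this is a swapped-in alternative for illustration, not the command used in this protocol, and the directory and file names are hypothetical.

```shell
# Mock directory with one working and one broken soft link.
demo=$(mktemp -d)
touch "$demo/real.fasta"
ln -s "$demo/real.fasta" "$demo/good.fasta"
ln -s "$demo/does_not_exist.fasta" "$demo/broken.fasta"

# List links whose target is missing (test -e follows the link).
broken=$(find "$demo" -type l ! -exec test -e {} \; -print)
echo "broken links: $broken"

# Remove them once the list looks right.
find "$demo" -type l ! -exec test -e {} \; -exec rm -- {} \;
```

Only the dangling link is removed; the working link and the real file are left in place.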


Command
Regenerate query_list.txt so that it lists only the query genomes that were linked successfully
ls *fasta > query_list.txt

move to reference dir


Command
change dir to reference_genomes dir
cd ../reference_genomes


Command
Copy reference_list.txt to reference_genomes dir, where ZZZZZZZZ is location of reference_list.txt
cp ZZZZZZZZ/reference_list.txt .


Command
Loop through reference_list.txt and link each reference genome into the reference_genomes dir. YYYYYYYY is the location of the genome FASTA files
while read -r i; do ln -s "YYYYYYYY/${i}" .; done < reference_list.txt


Command
check softlinks and remove any that don't work
find . -xtype l -delete


Command
Create the reference genome list file (ref_list.txt) for fastANI input
ls *fasta > ref_list.txt


Command
move to directory to be used for the analysis step
cd ../analysis



Command
Soft-link all files needed for the fastANI analysis into the analysis directory: both list files and the reference and query genomes
ln -s ../reference_genomes/ref_list.txt .
ln -s ../reference_genomes/*fasta .
ln -s ../query_genomes/query_list.txt .
ln -s ../query_genomes/*fasta .


Command
Run fastANI for the query genomes against the reference genomes. --rl specifies the list of reference genomes; --ql specifies the list of query genomes; --matrix also outputs the results as a lower triangular matrix; -o specifies the output file prefix. This step can also be submitted to a job queue on an HPC cluster.
fastANI --rl ref_list.txt --ql query_list.txt --matrix -o fastANI_out
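
For a quick look at the results before reformatting, the sketch below filters pairs by ANI with awk. It assumes fastANI's standard five-column, tab-separated output (query, reference, ANI, matched fragments, total query fragments) and runs on made-up rows; the filenames, values and the threshold are illustrative only and are not results from this protocol.

```shell
# Made-up rows in the assumed five-column, tab-separated layout.
out=$(mktemp)
printf 'q1.fasta\tr1.fasta\t98.71\t1450\t1500\n' >  "$out"
printf 'q1.fasta\tr2.fasta\t84.20\t900\t1500\n'  >> "$out"
printf 'q2.fasta\tr1.fasta\t96.05\t1300\t1480\n' >> "$out"

threshold=95   # illustrative cut-off, not prescribed by this protocol

# Print query, reference and ANI for pairs at or above the threshold.
hits=$(awk -F '\t' -v t="$threshold" '$3 >= t { print $1, $2, $3 }' "$out")
echo "$hits"
n_hits=$(printf '%s\n' "$hits" | grep -c .)
echo "pairs at or above ${threshold}: $n_hits"
```

On real data, the temporary file would be replaced by the fastANI output file.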


Command
The fastANI output is not in a suitable format for producing the figure; a simple script available from GitHub addresses this
git clone https://github.com/jh288/fastANI_reformatter.git


Command
reformat fastANI output for use in the R package pheatmap to create the figure
fastANI_reformater.pl fastANI_out > fastANI_out_reformat.tab


produce heatmap figure in R


Command
Remove ".fasta" from the taxon names and replace underscores with spaces
sed 's/\.fasta//g' fastANI_out_reformat.tab > fastANI_out_reformat_ed.tab

sed -i 's/_/ /g' fastANI_out_reformat_ed.tab
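
To see why the dot in a ".fasta" pattern is worth escaping, the sketch below applies both forms to a single hypothetical name. An unescaped dot matches any character, so it can also strip text from inside a name. The filename is invented for the demonstration.

```shell
name='strain_Xfasta_01.fasta'   # hypothetical filename

# Unescaped dot: "." matches any character, so "Xfasta" is removed as well.
unescaped=$(printf '%s\n' "$name" | sed 's/.fasta//g')

# Escaped dot: only the literal ".fasta" suffix is removed.
escaped=$(printf '%s\n' "$name" | sed 's/\.fasta//g')

# Replace underscores with spaces to get the label used in the figure.
label=$(printf '%s\n' "$escaped" | sed 's/_/ /g')

echo "unescaped dot: $unescaped"
echo "escaped dot:   $escaped"
echo "final label:   $label"
```

Names that do not contain "fasta" anywhere except in the extension give the same result with either pattern.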


Command
R code to produce a heatmap of the fastANI results.
## load libraries

library("pheatmap")

library("RColorBrewer")

## load the matrix into a data frame
b1 <- read.delim("fastANI_out_reformat_ed.tab", header = TRUE, row.names = 1, check.names = FALSE)

## set the output file and its dimensions in pixels
png("fastANI_out_reformat_ed.png", height = 1500, width = 750)

## run pheatmap on the transposed matrix, printing each value to two decimal places
pheatmap(t(as.matrix(b1)), color = brewer.pal(n = 7, name = "Blues"), display_numbers = TRUE, number_format = "%.2f", number_color = "black")

## close the graphics device so the PNG file is written
dev.off()