As the SARS-CoV-2 pandemic emerged, Philippine Genome Center Mindanao (PGC Mindanao), in collaboration with Project Accessible Genomics and Genomic Epidemiology of COVID in the Philippines (GECO), acquired an Oxford Nanopore MinION sequencer in a previous project to sequence SARS-CoV-2 whole genomes from samples across Mindanao. The samples were collected from patients in various Sub-National Laboratories (SNLs). PGC Mindanao developed a workflow to generate and identify SARS-CoV-2 whole-genome sequences from these samples, up to submission of the sequences, along with the associated metadata, to the public database GISAID. The previous project was successful, generating and publicly releasing about 100 sequences from these samples.
However, the previously developed workflow, though functional, still takes considerable time to run because each step requires substantial human intervention (e.g. generating scripts and issuing terminal commands) before the next step can start. PGC Mindanao and Project Accessible Genomics therefore sought a collaborator offering an automated workflow that covers everything from the input of raw sequencing data to assembled sequences and automated report generation. This was the basis for applying for the Public Health Alliance for Genomic Epidemiology (PHA4GE) subgrant.
For this grant, PGC Mindanao collaborated with BugSeq because of the latter's capacity to automate the workflow and its previous experience with submission to public databases. With BugSeq's automated workflow, the collaboration is expected to decrease runtime, improve bioinformatics capacity to generate immediately actionable insights from the data, and ease submission to databases. In addition, PHA4GE grant funds went toward improving the bioinformatics infrastructure within PGC Mindanao, such as fiber optic structured cabling for the sequencers and internet connectivity, and upgrades to the bioinformatics workstations.
The protocol below outlines the PGC Mindanao workflow (Section 2) for the lineage assignment of assembled sequences, from patient sample collection (Section 2.1), to whole-genome sequencing of samples (Section 2.2), generation, quality control, and processing of raw sequence data (Section 2.3), and quality assessment and control of assembled sequence data (Section 2.4), then to the different approaches to annotating the assembled sequence data: PANGOLIN (Section 2.5), Nextclade (Section 2.6), and GISAID (Section 2.7). The BugSeq workflow (Section 3) is also outlined, from FASTQ sequence file upload through the BugSeq website interface (Section 3.1), to quality control and processing of FASTQ data (Section 3.2), and generation of results, output data (especially the assembled sequence FASTA files), and reports that include PANGO lineage assignment for annotating the assembled sequence data (Section 3.3). Results from the PGC Mindanao and BugSeq workflows are then compared (Section 4).