Jul 03, 2019

A step-by-step beginner’s protocol for whole genome sequencing of human bacterial pathogens

  • Sanjay Gautam1,
  • Rajendra KC1,
  • Kelvin WC Leong2,
  • Micheál Mac Aogáin3,
  • Ronan F. O’Toole1
  • 1School of Medicine, College of Health and Medicine, University of Tasmania, Hobart, Australia;
  • 2School of Molecular Sciences, College of Science, Health and Engineering, La Trobe University, Australia;
  • 3Department of Clinical Microbiology, School of Medicine, Trinity College Dublin, Ireland
Protocol Citation: Sanjay Gautam, Rajendra KC, Kelvin WC Leong, Micheál Mac Aogáin, Ronan F. O’Toole 2019. A step-by-step beginner’s protocol for whole genome sequencing of human bacterial pathogens. protocols.io https://dx.doi.org/10.17504/protocols.io.4xxgxpn
Manuscript citation:

License: This is an open access protocol distributed under the terms of the Creative Commons Attribution License,  which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited
Protocol status: Working
We use this protocol and it's working
Created: June 29, 2019
Last Modified: July 03, 2019
Protocol Integer ID: 25303
Keywords: whole genome sequencing, Enterococcus faecium, Haemophilus influenzae, Mycobacterium tuberculosis, whole genome, sequencing, bacterial, pathogens
Abstract
Bacterial whole genome sequencing (WGS) is becoming a widely-used technique in research, clinical diagnostic, and public health laboratories. It enables high resolution characterization of bacterial pathogens in terms of properties that include antibiotic resistance, molecular epidemiology, and virulence. The introduction of next-generation sequencing instrumentation has made WGS attainable in terms of costs. However, the lack of a beginner’s protocol for WGS still represents a barrier to its adoption in some settings. Here, we present detailed step-by-step methods for obtaining WGS data from a range of different bacteria (Gram-positive, Gram-negative, and acid-fast) using the Illumina platform. Modifications have been performed with respect to DNA extraction and library normalization to maximize the output from the laboratory consumables invested. The protocol represents a simplified and reproducible method for producing high quality sequencing data. The key advantages of this protocol include simplicity of the protocol for users with no prior genome sequencing experience and reproducibility of the protocol across a wide range of bacteria.




Guidelines
Background

Using Sanger sequencing, the Human Genome Project expended approximately USD $2.7 billion and took more than 10 years to produce the first human genome sequence. Today, a human genome can be sequenced in a matter of days for less than USD $1000 on a single next-generation sequencing (NGS) machine. This step change in throughput and per-base cost has transformed the use of DNA sequencing in biomedical research and is being translated in an expanding number of ways into medicine. NGS is increasingly being applied to understanding and managing infectious diseases. This includes the sequencing of microbial genomes for the purposes of laboratory identification of infectious agents [1], detection of antibiotic resistance markers [2], and the public health surveillance of epidemiological clusters and outbreaks [3]. Examples include its deployment in public health surveillance and control of community cases of Escherichia coli [4], Campylobacter jejuni [5], Legionella pneumophila [6] and Mycobacterium tuberculosis [7] disease, or global and regional epidemics caused by influenza [8], Ebola [9], and Zika [10] viruses. It has also been utilised to track the source and spread of healthcare-associated infections caused by Staphylococcus aureus [11], Pseudomonas aeruginosa [12], Acinetobacter baumannii [13], and Enterococcus faecium [14] in order to guide infection prevention and control in hospitals. In addition to its whole genome (WGS), whole exome (WES), transcriptome (RNA-Seq), bisulphite methylome, and metagenomic sequencing capabilities, NGS can be directed to the detection of specific genes or mutations associated with human disease through targeted-panel amplicon screening. However, barriers remain with regard to establishing NGS in a laboratory for the first time and this hinders its uptake in clinical microbiology and other settings.
One of these challenges is the lack of a simplified step-by-step protocol that can be picked up by laboratory personnel with no prior training or experience in NGS and used to generate reliable, high quality sequence data. Illumina dye-sequencing is currently considered the gold standard internationally in terms of read depth and base-calling accuracy, genome coverage, scalability, and the range of sequencing applications it delivers. In this work, we produced an easy-to-follow, step-by-step NGS protocol with consistent genome coverage and average read depth that was applicable to a range of bacterial pathogens, i.e., Gram-positive vancomycin-resistant Enterococcus faecium, Gram-negative non-typeable Haemophilus influenzae, and acid-fast high-GC content Mycobacterium tuberculosis. This protocol can be used to generate Illumina-based WGS data for clinical isolates of bacterial pathogens of importance to human health. Figure 1 is the graphical summary of the process of obtaining whole genome sequence data from bacterial culture. This wet laboratory procedure generated FastQ reads from the sequencer within three days of start. We modified a number of the DNA extraction steps to obtain a sufficient quantity of contamination-free template. Similarly, we replaced library normalization plates and Nextera XT tagment amplicon (NTA) plates with conventional polymerase chain reaction (PCR) tubes, which may represent a cost-effective alternative. In addition, we have recommended the use of equal DNA concentrations of each library during library normalization to ensure better coverage and minimize bias. Simplification of bacterial NGS may assist in its uptake by beginner users.
Figure 1. Graphical summary of the process of obtaining whole genome sequence data from a bacterial culture.

Anticipated Results

Coverage refers to the percentage of reference genome bases covered by mapped sequence reads. Mean read depth indicates the mean number of times each base is mapped by a sequence read. Reference genomes used were E. faecium ST18 DO (TX16) (accession number NC_017960), Haemophilus influenzae 86-028NP (nontypeable) (accession number NC_007146), and Mycobacterium tuberculosis H37Rv (accession number NC_000962). VRE, vancomycin-resistant Enterococcus faecium; NTHi, non-typeable Haemophilus influenzae; MTBC, Mycobacterium tuberculosis complex.
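The two metrics defined above can be illustrated with a minimal sketch. It assumes a hypothetical list of per-base read depths (one integer per reference position, such as a mapper might export) and assumes that mean read depth is averaged over all reference positions, including uncovered ones; Geneious may compute its reported values differently.

```python
def coverage_and_mean_depth(depths):
    """Return (percent of reference bases with depth >= 1, mean depth).

    `depths` is a hypothetical list with one read-depth value per
    reference position; it is not a Geneious export format.
    """
    if not depths:
        raise ValueError("depths must contain at least one position")
    covered = sum(1 for d in depths if d > 0)
    coverage_pct = 100.0 * covered / len(depths)
    # Assumption: the mean is taken over every reference position.
    mean_depth = sum(depths) / len(depths)
    return coverage_pct, mean_depth


# Toy 10-base reference in which two positions have no mapped reads.
toy_depths = [0, 3, 5, 5, 6, 4, 0, 2, 3, 2]
cov, depth = coverage_and_mean_depth(toy_depths)
print(f"coverage = {cov:.1f}%, mean read depth = {depth:.1f}x")
```

For the toy input, 8 of 10 positions are covered and the depths sum to 30, so the sketch reports 80.0% coverage and a mean depth of 3.0x.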

A consensus sequence was generated for each of the isolates analysed in Geneious. The Geneious report provided information on the percentage coverage of test sequence to the reference genome and the mean read depth (Table 1). Each contiguous sequence is viewable in Geneious and can be analysed for coverage with respect to the reference genome. Quality control checks of raw sequence data were also performed using FastQC [22]. This freely-available software provided information regarding per base sequence content and quality, per base and sequence GC content, and highlighted the parameters of the sequence quality.

Initial typing analysis

We used open source databases to analyze the sequence data. For example, Geneious mapped contiguous sequences were imported into PubMLST (https://pubmlst.org/) for sequence typing of Haemophilus influenzae and vancomycin-resistant Enterococcus faecium. This can also be achieved using raw fastq reads in the MLST profiling tool from the Center for Genomic Epidemiology (CGE) database (http://www.genomicepidemiology.org/). The ResFinder tool (https://cge.cbs.dtu.dk/services/ResFinder/) was used to identify acquired antimicrobial resistance genes from raw fastq files. For example, PubMLST typing classified NTHi 1 as sequence type 46 and ResFinder did not detect the presence of any antimicrobial resistance determining mutations. Mycobacterium tuberculosis complex raw fastq.gz files were uploaded to the TGS-TB database (https://gph.niid.go.jp/tgs-tb/) to predict drug susceptibility, in silico spoligotype, lineage type, and phylogenetic classification. This database also enabled detection of IS6110 insertion sites, and 43 loci for variable number tandem repeat (VNTR) typing. The drug resistance profile of the MTBC isolates was further confirmed using the PhyResSE database (http://phyresse.org/). For example, TGS-TB identified MTBC1 as a drug susceptible Mycobacterium bovis isolate.

Troubleshooting




Possible problems and their troubleshooting solutions are listed in Table 2. There are a number of limitations associated with the protocol that should be noted. These include: effective results with the protocol are reliant on the efficacy of the extraction procedure in producing a sufficient quantity of genomic DNA; analysis of sequences generated on an Illumina platform can be affected by the presence of highly repetitive regions; and depending on the output information sought, genome assembly can be influenced by the reference genome selected for the mapping of reads. Nevertheless, the protocol was effective in generating high quality sequencing data for the range of bacterial species tested.

Acknowledgments

This research was supported by funding from the Royal Hobart Hospital Research Foundation (17-104) and the Tasmanian Community Fund (36Medium00014).


Materials
Reagents

  • Lysozyme (VWR, Australia, Cat.# 0663-10G)
  • Ethanol, Pure (Sigma-Aldrich, Australia, Cat. # E7023)
  • 2-Propanol (Sigma-Aldrich, Australia, Cat. # I8912)
  • Phosphate Buffered Saline (Gibco™, Thermo Fisher Scientific, UK, Cat. # 10010023)
  • Ultrapure™ DNase/RNase Free Distilled Water (Invitrogen, Australia Cat. # 10977-015)
  • DNeasy® Blood and Tissue Kit (Qiagen, Germany, Cat. # 69504)
  • High Pure PCR Template Preparation Kit (Roche, Germany, Cat. # 11796828001)
  • Qubit™ dsDNA HS (High Sensitivity) Assay Kit (Invitrogen, Australia, Cat. # Q32851)
  • Nextera® DNA Library Preparation Kit (Illumina, USA, Cat. # FC-121-1030)
  • Nextera® XT Library Preparation Kit (Illumina, USA, Cat. # FC-131-1024)
  • Nextera® XT Index Kit (Illumina, USA, Cat. # FC-131-1001)
  • MiSeq Reagent Kit v2 (300 cycles) (Illumina, USA, Cat. # MS-102-2002)
  • KAPA Library Quantification Kit (Illumina, USA, Cat. # 07960140001)
  • Agencourt® AMPure XP beads (Beckman Coulter, USA, Cat. # A63880)


Recipes

  • Qubit working solution: dilute Qubit dsDNA HS Reagent 1:200 in Qubit dsDNA HS buffer.
For n samples, prepare n × 200 µL of working solution.

  • 80% ethanol: add 8 mL absolute ethanol to 2 mL distilled water.

  • 0.2 M NaOH: weigh 0.04 g of NaOH pellet and dissolve it in 5 mL distilled water.
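The recipe arithmetic can be checked with a short sketch. The function names are hypothetical, the molar mass of NaOH is taken as approximately 40 g/mol, and the tube count passed to the Qubit helper is assumed to include the two standard tubes as well as the samples.

```python
NAOH_MOLAR_MASS_G_PER_MOL = 40.0  # approximate value, stated as an assumption


def qubit_working_solution_ul(n_tubes, volume_per_tube_ul=200.0, dilution=200):
    """Return (reagent_ul, buffer_ul) for a 1:200 Qubit working solution."""
    total_ul = n_tubes * volume_per_tube_ul
    reagent_ul = total_ul / dilution      # 1 part reagent in 200 parts total
    return reagent_ul, total_ul - reagent_ul


def naoh_mass_g(molarity, volume_ml):
    """Mass of NaOH needed for a given molarity and final volume."""
    return molarity * (volume_ml / 1000.0) * NAOH_MOLAR_MASS_G_PER_MOL


reagent, buffer_ = qubit_working_solution_ul(n_tubes=10)
print(f"Qubit reagent: {reagent:.1f} µL, buffer: {buffer_:.1f} µL")
print(f"NaOH for 5 mL of 0.2 M: {naoh_mass_g(0.2, 5):.3f} g")
```

In practice a small excess of working solution is usually prepared to allow for pipetting losses; the sketch does not add one.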


Equipment

  • Qubit™ assay tubes (Life-technologies, USA, Cat. # Q32856)
  • PCR tubes (Molecular Bioproducts, USA, Cat. # MBP3412)
  • Qubit® 2.0 Fluorometer (Invitrogen, Australia, Cat. # Q32866)
  • Agencourt Magnetic stand (Beckman Coulter, USA, Cat. # A32782)
  • Applied Biosystems® Veriti 96-Well thermal cycler (Thermo Fisher Scientific, USA)
  • Rotor-Gene 6000 real-time thermocycler (Corbett Research, Australia)

Safety warnings
See SDS (Safety Data Sheet) for safety warnings and hazards.

All bacterial cultures should be treated as potentially pathogenic to the laboratory worker and colleagues. Therefore, the use of appropriate aseptic techniques, and the wearing of appropriate personal protective equipment are strongly recommended to maintain acceptable work health and safety standards and minimise exposure to harmful agents.
Extraction of bacterial genomic DNA
Pellet the liquid culture (200 µL) by centrifuging at 8000 × g for 8 min in a sterile microfuge tube.
Note
CRITICAL STEP: All bacterial cultures should be treated as potentially pathogenic to the laboratory worker and colleagues. Therefore, the use of appropriate aseptic techniques, and the wearing of appropriate personal protective equipment are strongly recommended to maintain acceptable work health and safety standards and minimise exposure to harmful agents.

Resuspend the pellet in 600 µL phosphate-buffered saline (1×) until the absorbance at 600 nm (A600) is between 1.0 and 2.0.
Lyse the cells by adding 30 µL lysozyme (50 mg/ml), vortex, and incubate at 37 °C for 1 h.

Elute the DNA in a 100 µL volume.
Treat it with 2 µL RNase (100 mg/ml) (Qiagen, Hilden, Germany) and incubate at room temperature for 1 h.

Purify RNase-treated DNA using the High Pure PCR Template Preparation Kit.
Note
TIP: Perform only 4 DNA spin-wash steps instead of the 9 recommended steps. Pre-incubate the elution buffer in a heat block set at 70 °C.


Add 100 µL of binding buffer to the RNase-treated DNA and incubate at 70 °C for 10 min.

Add 50 µL of 2-propanol, transfer the contents to a Roche spin column, and spin at 8000 × g for 1 min.
Discard the flow through and insert the spin column into a new collection tube.

Wash by adding 500 µL wash buffer and spin at 8000 × g for 1 min.


Discard the flow through and insert the spin column into a new collection tube.
Perform a final spin at 8000 × g for 1 min.
Finally, insert the column into a 1.5 ml sterile microfuge tube, add 50 µL of pre-heated elution buffer, and spin at 8000 × g for 1 min to elute the purified DNA for next generation sequencing.
Note
CRITICAL STEP: For next generation sequencing, contaminant-free, high-molecular weight DNA with an absorbance (260 nm/280 nm) ratio between 1.8 and 2.0 is considered a high-quality template DNA.


Quantification of bacterial genomic DNA
Dispense 190 µL and 198 µL of Qubit working solution into the standard and sample tubes, respectively.

Add 10 µL of standards (1 and 2) and 2 µL of sample to separate Qubit assay tubes.
Vortex the mixture for 3 s and incubate at room temperature for 2 min before taking the reading.

Adjust the DNA concentration of each sample to 0.2 ng/μl by diluting with a required volume of distilled water.
Note
CRITICAL STEP: The use of an accurate concentration of DNA is crucial for bacterial DNA genomic library preparation.
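The dilution to 0.2 ng/μl follows the usual C1 × V1 = C2 × V2 relationship. The sketch below is illustrative only: the function name and the example stock concentration are hypothetical and are not values from this protocol.

```python
def water_to_add_ul(stock_ng_per_ul, stock_volume_ul, target_ng_per_ul=0.2):
    """Volume of water to add so that the stock reaches the target concentration.

    Uses C1 * V1 = C2 * V2, where V2 is the final volume after dilution.
    """
    if stock_ng_per_ul < target_ng_per_ul:
        raise ValueError("stock is already below the target concentration")
    final_volume_ul = stock_ng_per_ul * stock_volume_ul / target_ng_per_ul
    return final_volume_ul - stock_volume_ul


# Hypothetical Qubit reading of 12.4 ng/µL, diluting 2 µL of stock.
water = water_to_add_ul(stock_ng_per_ul=12.4, stock_volume_ul=2.0)
print(f"Add {water:.1f} µL water to 2.0 µL of stock")
```

For concentrated stocks, a two-step serial dilution may be easier to pipette accurately than a single large dilution; that choice is left to the user.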

Tagmentation and PCR amplification of bacterial genomic DNA
TIP: For all of the methods below, the recommended 96-well TYC plate can be replaced with 0.2 ml thin wall clear, flat capped PCR tubes. In addition, multichannel pipettes and the high-speed micro plate shaker can be replaced with single channel pipettes and a bench top centrifuge, respectively.
Nextera XT tagment amplicon construction
In a PCR tube, add 5 µL tagmentation DNA buffer and 2.5 µL amplification tagmentation mix to 2.5 µL (0.2 ng/μl) input DNA.
Briefly vortex the contents and transfer the tube to a thermocycler programmed, with heated lid and for a volume of 10 µL, for one step at 55 °C for 5 min followed by a hold at 10 °C.

Neutralization of Nextera XT tagment amplicon
Immediately after reaching the hold temperature of 10 °C in the above step, neutralize the NTA by adding 2.5 µL neutralization tagmentation buffer and incubate at room temperature for 5 min.

PCR amplification
For amplification, add 7.5 µL Nextera® PCR mastermix and 2.5 µL of each index primer, 1 and 2, to a tube containing neutralized NTA.
Note
CRITICAL STEP: The primer combinations S502 with N705/N706 and S503 with N701/N702 should be avoided. Avoid any repeated combinations and carefully note the primers used for each sample.
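A simple bookkeeping check can flag the combinations named in the note above before the sample sheet is written. This is a minimal sketch: the sample names, the dictionary layout, and the function name are hypothetical, and the list of avoided pairs is taken only from the note, not from Illumina documentation.

```python
# Pairs the note above says to avoid, written as (S5xx primer, N7xx primer).
AVOIDED_PAIRS = {
    ("S502", "N705"), ("S502", "N706"),
    ("S503", "N701"), ("S503", "N702"),
}


def check_index_pairs(assignments):
    """Return a list of problems found in {sample: (s5_primer, n7_primer)}."""
    problems = []
    first_user = {}
    for sample, pair in assignments.items():
        if pair in AVOIDED_PAIRS:
            problems.append(f"{sample}: avoided combination {pair[0]}/{pair[1]}")
        if pair in first_user:
            problems.append(
                f"{sample}: combination {pair[0]}/{pair[1]} already used by {first_user[pair]}"
            )
        else:
            first_user[pair] = sample
    return problems


# Hypothetical plan for three isolates.
plan = {
    "VRE_1": ("S502", "N701"),
    "NTHi_1": ("S503", "N703"),
    "MTBC_1": ("S502", "N705"),   # flagged: listed in the note as a pair to avoid
}
for line in check_index_pairs(plan):
    print(line)
```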

Gently pipette the content and perform a quick spin.
Proceed to amplification in a thermocycler programmed for a working volume of 25 µL with the following settings and a heated lid:
Cycles    Temperature (°C)    Time
1         72                  3 min
1         95                  30 s
12        95                  10 s
12        55                  30 s
12        72                  30 s
1         72                  5 min
1         10                  hold
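For planning purposes, the cycling program above can be written as data and its minimum run time estimated. The sketch reads the rows as (cycles, temperature in °C, seconds), treats the three 12-cycle rows as one repeated block, and ignores ramping time and the final hold, so the true run time on any thermocycler will be longer.

```python
# (cycles, temperature in °C, duration in seconds); the final 10 °C hold is omitted.
AMPLIFICATION_PROGRAM = [
    (1, 72, 180),
    (1, 95, 30),
    (12, 95, 10),
    (12, 55, 30),
    (12, 72, 30),
    (1, 72, 300),
]


def minimum_runtime_seconds(program):
    """Sum cycles x step duration over all rows, ignoring ramp times."""
    return sum(cycles * seconds for cycles, _temp, seconds in program)


total = minimum_runtime_seconds(AMPLIFICATION_PROGRAM)
print(f"Minimum run time: {total} s ({total / 60:.1f} min)")
```

With these values the programmed steps add up to 1350 s, or 22.5 min, before ramping is taken into account.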

Note
The amplified, tagmented library can be stored at 4 °C overnight for PCR clean-up the next day.


Cleaning up the PCR product
NOTE: Bring AMPure XP beads to room temperature (for 20 min).
CRITICAL STEP: Prepare fresh 80% (v/v) ethanol and 0.2 M NaOH.

To 22.5 µL of PCR product, add 11.25 µL of vortexed (30 s) AMPure XP beads and mix by pipetting (10 times).
Incubate at room temperature for 5 min.

Place the tube on a magnetic stand for 2 min.
While leaving the PCR tubes on the magnetic stand, carefully aspirate the supernatant.
Note
CRITICAL STEP: Do not aspirate beads. If aspirated, redo steps 24 and 25.

Add 100 µL of 80% ethanol and leave the tube on the magnetic stand for 30 s.
Note
CRITICAL STEP: Do not resuspend the beads.

Aspirate out the supernatant carefully.
Add 100 µL of 80% ethanol and leave the tube on the magnetic stand for 30 s.


Aspirate out the supernatant carefully.
Remove the tube from the magnetic stand and allow it to air dry in a tube stand for approximately 5 min.
Note
CRITICAL STEP: Visually check for cracks as over drying the beads will significantly reduce elution efficiency.

Add 26.15 µL of resuspension buffer and gently pipette 20 times to mix.
Incubate the tubes at room temperature for 2 min and then place on a magnetic stand for 2 min (until the supernatant has cleared).
Transfer the supernatant (25 µL) to a new PCR tube.
Note
NOTE: The final supernatant can be stored at −15 °C to −20 °C for up to 1 week but we recommend proceeding to library normalization immediately.

Library normalization
Perform the Qubit DNA quantification method as described above to determine the genomic DNA concentration in cleaned up product.
Pool the genomic DNA from all of the tubes.
Note
NOTE: The sample with the lowest DNA concentration can be used in a volume of 10 µL as the reference to prepare a library pool using the formula:

Volume required (V2) = Concentration original (S1) x Volume total (V1 = 10 μl) / Concentration required (S2).
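One way to read the formula above is sketched here. The interpretation is an assumption: S1 is taken to be the concentration of the lowest-concentration library (used at V1 = 10 μl), and S2 the concentration of the library whose volume is being calculated, so that every library contributes the same mass of DNA to the pool. The library names and concentrations are hypothetical.

```python
def pooling_volumes_ul(concentrations_ng_per_ul, reference_volume_ul=10.0):
    """Volume of each library giving the same DNA mass as the reference.

    The library with the lowest concentration is the reference and is used
    at `reference_volume_ul`; V2 = S1 * V1 / S2 for every other library.
    """
    if any(c <= 0 for c in concentrations_ng_per_ul.values()):
        raise ValueError("all concentrations must be positive")
    lowest = min(concentrations_ng_per_ul.values())
    return {
        name: lowest * reference_volume_ul / conc
        for name, conc in concentrations_ng_per_ul.items()
    }


# Hypothetical Qubit readings (ng/µL) for three cleaned-up libraries.
readings = {"lib1": 2.0, "lib2": 4.0, "lib3": 8.0}
for name, volume in pooling_volumes_ul(readings).items():
    print(f"{name}: {volume:.2f} µL")
```

Here lib1 is the reference at 10 µL, and the more concentrated libraries are added in proportionally smaller volumes (5 µL and 2.5 µL).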
To x μl of library pool, add x μl of freshly prepared 0.2 M NaOH (final concentration 0.1 M) and incubate for 5 min at room temperature.

To the NaOH treated suspension add an equal volume (2x μl) of LNS1. Label the tube as pooled amplified library (PAL).
Note
TIPS: In this modified step, normalize the library by using LNS1 (Library Normalisation Storage Buffer 1) only.

Dilute the PAL 1:1000 by adding 1 µL of NaOH-LNS1 treated suspension to 999 µL of ultrapure distilled water.
Use KAPA library quantification kit (No ROX) to check the concentration of diluted pooled library in a real time PCR system using the following set up:
Cycles    Temperature (°C)    Time
1         95                  10 min
40        95                  10 s
40        60                  30 s
Note
NOTE: Include a set of six DNA standards (with concentrations ranging from 20 pM to 0.0002 pM), three sets of negative control (ultrapure distilled water), and three sets of the DNA library in the qPCR run.

Determine the concentration of DNA in the pooled library by the standard curve method and calculate concentration in picomolar (pM) for each tube.
Note
NOTE: To calculate the original concentration of the pooled library we applied the formula:

[Average sample concentration (in pM) × insert size of standards (452 bp) × dilution factor (1000)] divided by the insert size of the pooled library (500 bp)

For example, for a qPCR determined concentration of 2.36 pM in a 1:1000 dilution of the pooled library, the library DNA concentration will be:

(2.36 pM x 452 bp x 1000) / 500 bp = 2133.44 pM

The value obtained from the calculation represents the concentration of DNA in the pooled library.
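The size-adjusted calculation above can be reproduced with a few lines of code. The function and parameter names are hypothetical; the default values (452 bp standards, 500 bp library, 1:1000 dilution) are the ones used in the worked example.

```python
def undiluted_library_pm(qpcr_pm, standard_bp=452, library_bp=500, dilution_factor=1000):
    """Size-adjusted concentration (pM) of the undiluted pooled library."""
    return qpcr_pm * standard_bp * dilution_factor / library_bp


stock_pm = undiluted_library_pm(2.36)
print(f"Pooled library concentration: {stock_pm:.2f} pM")
```

With the qPCR value of 2.36 pM from the example, this gives 2133.44 pM, matching the figure above.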

To estimate the dilution factor required to achieve a final library concentration of 15 pM in a 600 µL volume, use the formula:

Volume required = (Concentration required × Volume total) / Concentration original
= (15 pM × 600 μl) / 2133.44 pM
= 4.22 μl
Note
NOTE: Therefore, 4.22 µL is added to 595.78 µL of HT buffer to produce a final concentration of 15 pM in a final volume of 600 µL.
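The loading dilution can be recomputed for any pooled library concentration with the sketch below. The function name is hypothetical; the defaults of 15 pM and 600 µL are the targets stated in this protocol.

```python
def loading_volumes_ul(stock_pm, target_pm=15.0, final_volume_ul=600.0):
    """Return (library_ul, buffer_ul) needed to reach target_pm in final_volume_ul."""
    if stock_pm < target_pm:
        raise ValueError("pooled library is more dilute than the loading target")
    library_ul = target_pm * final_volume_ul / stock_pm
    return library_ul, final_volume_ul - library_ul


library, buffer_ = loading_volumes_ul(stock_pm=2133.44)
print(f"Pooled library: {library:.2f} µL, HT buffer: {buffer_:.2f} µL")
```

For the 2133.44 pM example this reproduces the 4.22 µL of library and 595.78 µL of buffer given above.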

The diluted library is then ready to be heat denatured and loaded into the MiSeq reagent cartridge.

Preparing pooled library for loading onto MiSeq
Thaw the PAL at room temperature and mix by pipetting up and down (5 times) followed by brief centrifugation.
Based on the library concentration example above, transfer 595.78 µL of HT buffer to a 1.5 mL diluted amplified library (DAL) tube containing 4.22 µL PAL.
Mix using a pipette (5 times).
Vortex the DAL tube at top speed, centrifuge briefly, and incubate for exactly 2 min at 96 °C ± 2 °C.
Immediately transfer the DAL tube to ice for at least 5 min or until loading.
Note
CRITICAL STEP: Put the Illumina MiSeq sequencer through a short wash cycle to avoid cross-contamination of the DAL from previous usage.

Thaw the MiSeq reagent cartridge at room temperature.
Generate a MiSeq sample sheet using the Illumina Experiment Manager. See step 22 to identify primer sets for each sample.
Use the following configuration to set up the MiSeq machine: Generate FASTQ workflow; FASTQ Only application; NexteraXT assay; 151 insert reads; assignment of the samples with a unique identifier and index-pair combination.
Note
CRITICAL STEP: Rinse the flow cell with MilliQ water and remove traces of water using soft tissue paper before inserting into the machine.

Transfer the entire 600 µL of DAL to the “Load here” well of the MiSeq reagent cartridge.
Following the setup procedure of the Illumina Experiment Manager, insert the cartridge into MiSeq instrument for sequencing to commence.
Note
TIPS: The raw FastQ sequence reads from whole-genome sequencing can be stored on the local computer as well as on the Illumina BaseSpace server (https://basespace.illumina.com/) for further analysis.

Bioinformatic analyses
NOTE: The selection of bioinformatics software for the analysis of WGS data will be determined by the objective of the study. Here, we used Geneious 9.1.8 (Biomatters Ltd.), a desktop software to analyse our sequence data. Geneious was used to map the Fastq sequence reads to a publicly available reference genome for each species as follows:
Download Geneious.
Go to File | Import | From File. Import raw-read files (Sample_xx_R1.fastq.gz and Sample_xx_R2.fastq.gz) into Geneious.
Download the Reference Genome from the NCBI database. For example, Enterococcus faecium NC_017960. In the Left panel | Go to NCBI | Nucleotide.
Enter NC_017960 | Click Search.
Once the genome has been found, click Download Full Sequence(s) .

Download the NC_017960 reference genome (The icon changes to a green circular genome when completed).

Drag and drop the NC_017960 reference genome into the working folder.
Mapping the isolate sequence to the reference genome
Hold CTRL and select both the R1 and R2 raw read files (imported) and the reference genome (NC_017960) (downloaded).
Click Align/Assemble | Map to Reference.

Check the settings
Reference Sequence = NC_017960
Mapper = Bowtie2–fast and accurate read mapper

Trim Before Mapping = Do not trim

Results: Select all options

Results | Save consensus sequences | Options

Threshold = Highest Quality
Threshold for sequences without quality = 95%

No coverage call = ‘–’

When mapping to reference is complete, a new folder will be created containing four files:

Assembly Report

Consensus

Contig
Unused Reads

NOTE: Setting may vary depending on objective of analyses and quality of fastq reads.

NOTE: We also used open source databases, for example, TGS-TB, PhyResSE and the Center for Genomic Epidemiology’s [20] ResFinder and VirulenceFinder, to further analyse the whole genome sequence data of our selection of bacterial pathogens. These freely-available databases enable the acquisition of information on bacterial pathogens, including genotype and phylogeny, antibiotic-resistance mutations, and the presence of known virulence genes.
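As a rough complement to the quality control checks described under Anticipated Results, a FASTQ file can be summarised with a few lines of standard-library Python before it is imported or uploaded. This sketch is not a substitute for FastQC: it assumes four-line FASTQ records with Phred+33 quality encoding, the function name is hypothetical, and the demonstration runs on a tiny toy file written to a temporary directory rather than on real sequencer output.

```python
import gzip
import os
import tempfile


def summarize_fastq(path):
    """Return read count, base count, GC percentage and mean Phred quality."""
    reads = bases = gc = quality_sum = 0
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        while True:
            header = handle.readline()
            if not header.strip():        # end of file (or a trailing blank line)
                break
            sequence = handle.readline().strip().upper()
            handle.readline()             # '+' separator line
            quality = handle.readline().strip()
            reads += 1
            bases += len(sequence)
            gc += sequence.count("G") + sequence.count("C")
            quality_sum += sum(ord(ch) - 33 for ch in quality)  # Phred+33 assumed
    if bases == 0:
        raise ValueError("no bases found in FASTQ file")
    return {
        "reads": reads,
        "bases": bases,
        "gc_percent": 100.0 * gc / bases,
        "mean_quality": quality_sum / bases,
    }


# Demonstration on a two-read toy file, not real data.
with tempfile.TemporaryDirectory() as tmp:
    toy_path = os.path.join(tmp, "toy_R1.fastq.gz")
    with gzip.open(toy_path, "wt") as out:
        out.write("@read1\nACGT\n+\nIIII\n@read2\nGGCC\n+\n!!!!\n")
    summary = summarize_fastq(toy_path)
print(summary)
```

To use it on real data, call summarize_fastq with the path to a file such as Sample_xx_R1.fastq.gz; an unexpected GC percentage for the species being sequenced can be an early hint of contamination.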