Abbreviations used: DAL, diluted amplified library; NGS, next-generation sequencing; NTA, Nextera XT tagment amplicon; PAL, pooled amplified library; PCR, polymerase chain reaction; VNTR, variable number tandem repeat; WES, whole exome sequencing; WGS, whole genome sequencing
Received 2018 Sep 28; Revised 2019 Jan 13; Accepted 2019 Jan 28. Copyright © 2013-2019 The Journal of Biological Methods, All rights reserved. This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License: http://creativecommons.org/licenses/by-nc-sa/4.0
Bacterial whole genome sequencing (WGS) is becoming a widely used technique in research, clinical diagnostic, and public health laboratories. It enables high-resolution characterization of bacterial pathogens in terms of properties that include antibiotic resistance, molecular epidemiology, and virulence. The introduction of next-generation sequencing instrumentation has made WGS affordable. However, the lack of a beginner’s protocol for WGS still represents a barrier to its adoption in some settings. Here, we present detailed step-by-step methods for obtaining WGS data from a range of different bacteria (Gram-positive, Gram-negative, and acid-fast) using the Illumina platform. We modified the DNA extraction and library normalization steps to maximize the output from the laboratory consumables invested. The protocol represents a simplified and reproducible method for producing high-quality sequencing data. Its key advantages are simplicity for users with no prior genome sequencing experience and reproducibility across a wide range of bacteria.
Keywords: whole genome sequencing, Enterococcus faecium, Haemophilus influenzae, Mycobacterium tuberculosis
Using Sanger sequencing, the Human Genome Project expended approximately USD $2.7 billion and took more than 10 years to produce the first human genome sequence. Today, a human genome can be sequenced in a matter of days for less than USD $1000 on a single next-generation sequencing (NGS) machine. This step change in throughput and per-base cost has transformed the use of DNA sequencing in biomedical research and is being translated into medicine in an expanding number of ways. NGS is increasingly being applied to understanding and managing infectious diseases. Applications include the sequencing of microbial genomes for laboratory identification of infectious agents [1], detection of antibiotic resistance markers [2], and public health surveillance of epidemiological clusters and outbreaks [3]. Examples include its deployment in public health surveillance and control of community cases of Escherichia coli [4], Campylobacter jejuni [5], Legionella pneumophila [6] and Mycobacterium tuberculosis [7] disease, and of global and regional epidemics caused by influenza [8], Ebola [9], and Zika [10] viruses. It has also been utilised to track the source and spread of healthcare-associated infections caused by Staphylococcus aureus [11], Pseudomonas aeruginosa [12], Acinetobacter baumannii [13], and Enterococcus faecium [14] in order to guide infection prevention and control in hospitals.
In addition to its whole genome (WGS), whole exome (WES), transcriptome (RNA-Seq), bisulphite methylome, and metagenomic sequencing capabilities, NGS can be directed to the detection of specific genes or mutations associated with human disease through targeted-panel amplicon screening. However, barriers remain to establishing NGS in a laboratory for the first time, and these hinder its uptake in clinical microbiology and other settings. One of these barriers is the lack of a simplified step-by-step protocol that can be picked up by laboratory personnel with no prior training or experience in NGS and used to generate reliable, high-quality sequence data. Illumina dye-sequencing is currently considered the gold standard internationally in terms of read depth and base-calling accuracy, genome coverage, scalability, and the range of sequencing applications it delivers.
In this work, we developed an easy-to-follow, step-by-step NGS protocol that delivers consistent genome coverage and average read depth and is applicable to a range of bacterial pathogens, namely Gram-positive vancomycin-resistant Enterococcus faecium, Gram-negative non-typeable Haemophilus influenzae, and acid-fast, high-GC-content Mycobacterium tuberculosis. This protocol can be used to generate Illumina-based WGS data for clinical isolates of bacterial pathogens of importance to human health.
Figure 1 provides a graphical summary of the process of obtaining whole genome sequence data from a bacterial culture. This wet laboratory procedure generated FASTQ reads from the sequencer within three days of starting. We modified a number of the DNA extraction steps to obtain a sufficient quantity of contamination-free template. Similarly, we replaced library normalization plates and Nextera XT tagment amplicon (NTA) plates with conventional polymerase chain reaction (PCR) tubes, which may represent a cost-effective alternative. In addition, we recommend using equal DNA concentrations of each library during library normalization to ensure better coverage and minimize bias. Simplifying bacterial NGS in this way may assist its uptake by beginner users.
Figure 1. Graphical summary of the process of obtaining whole genome sequence data from a bacterial culture.
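The recommendation to bring every library to an equal DNA concentration before pooling comes down to simple dilution arithmetic (C1 × V1 = C2 × V2). The following Python sketch is illustrative only and is not part of the published protocol. The function names, the sample identifiers, the measured concentrations, the default 20 µL final volume, and the choice of the least concentrated library as the common target are all assumptions made for the example. The concentrations and volumes the protocol actually uses are not given in this section.

```python
def dilution_volumes(conc_ng_per_ul, target_ng_per_ul, final_volume_ul):
    """Return (library_ul, diluent_ul) that bring one library to the target.

    Uses C1 * V1 = C2 * V2, where C1 is the measured library concentration,
    C2 the target concentration, and V2 the final volume.
    """
    if conc_ng_per_ul <= 0 or target_ng_per_ul <= 0 or final_volume_ul <= 0:
        raise ValueError("concentrations and volume must be positive")
    if conc_ng_per_ul < target_ng_per_ul:
        raise ValueError("library is below the target; dilution cannot raise it")
    library_ul = target_ng_per_ul * final_volume_ul / conc_ng_per_ul
    return library_ul, final_volume_ul - library_ul


def normalize_libraries(concentrations, target_ng_per_ul=None, final_volume_ul=20.0):
    """Map each library name to its (library_ul, diluent_ul) pair.

    If no target is given, the lowest measured concentration is used
    (an assumption of this sketch) so that every library can reach it.
    """
    if not concentrations:
        raise ValueError("no libraries supplied")
    if target_ng_per_ul is None:
        target_ng_per_ul = min(concentrations.values())
    return {
        name: dilution_volumes(conc, target_ng_per_ul, final_volume_ul)
        for name, conc in concentrations.items()
    }


if __name__ == "__main__":
    # Hypothetical fluorometric readings in ng/uL; not data from the paper.
    measured = {"isolate_A": 4.0, "isolate_B": 2.0, "isolate_C": 8.0}
    for name, (lib_ul, dil_ul) in normalize_libraries(measured).items():
        print(f"{name}: {lib_ul:.1f} uL library + {dil_ul:.1f} uL diluent")
```

Once each library has been diluted to the same concentration, pooling equal volumes contributes equal DNA mass per library. Libraries with very different fragment-size distributions may additionally need a molarity-based correction, which this sketch does not attempt.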