EMBARGOED UNTIL: Tuesday 5/21, 10:45 AM MDT
(Symposium Session 165)
Wyndmoor, PA, United States
Bacteria are an integral part of our environment and even of our bodies. Most bacteria, including most strains of Escherichia coli, are beneficial or at least harmless. However, some bacteria can cause illness. E. coli strains fall into at least 187 different types, or “serogroups,” defined by how their surface polysaccharide antigen reacts to different antisera. These range from the most common occupants of our guts to highly virulent strains that cause diarrhea, renal failure, or even death. Currently, to identify a strain, researchers have to isolate and/or enrich the target from complex clinical, food, or environmental samples, a labor-intensive process that can take days and can be performed only in specialized laboratories.
To prevent or track outbreaks of illness caused by food-borne pathogenic bacteria, such as toxin-producing E. coli, it is important to detect and identify potentially harmful strains quickly and accurately, directly in foods or patient samples. Metagenomics, an emerging field that lets investigators study genetic material recovered directly from microbial communities, offers a new way of identifying microorganisms in foods and patients. Moreover, recent advances in low-cost next-generation DNA sequencing (NGS), also called high-throughput sequencing, make accurate, rapid, and comprehensive detection and identification of bacterial pathogens potentially feasible. Identifying a pathogenic bacterium from its DNA sequence is a bit like finding someone on Facebook. Each E. coli strain (a “Facebook user”) is unique and has its own “Facebook page” and “user profile” based on its relevant gene content (i.e., the genes that define its serogroup). If we have DNA sequence information from strains isolated from food or associated with an outbreak, we can search our database and identify the type of strain from its profile.
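To make the “profile search” idea concrete, the following minimal Python sketch assigns a sequencing read to whichever reference gene shares the most short subsequences (k-mers) with it. Everything in it is an assumption for illustration: the gene names, the toy sequences, the k-mer length `K`, and the `min_shared` cutoff are hypothetical, and exact k-mer matching is a simple stand-in, not necessarily the search method the ARS team used.

```python
from collections import defaultdict

K = 5  # toy k-mer length (hypothetical); real pipelines use much longer k-mers

# Hypothetical toy "profiles": gene name -> (label, sequence). These are
# made-up sequences, not real E. coli genes, which are far longer.
REFERENCE = {
    "wzx_O157": ("O157", "ATGCGTACGTTAGCCTAGGA"),
    "wzy_O26": ("O26", "TTGACCGGTAACGTTCAGGT"),
    "stx2": ("virulence", "GGCATCCGATTACAGTCCAA"),
}


def kmers(seq, k=K):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(reference, k=K):
    """Map each k-mer to the set of reference genes containing it."""
    index = defaultdict(set)
    for gene, (_label, seq) in reference.items():
        for km in kmers(seq, k):
            index[km].add(gene)
    return index


def classify_read(read, index, k=K, min_shared=2):
    """Return the gene sharing the most distinct k-mers with the read,
    or None if no gene shares at least min_shared k-mers."""
    votes = defaultdict(int)
    for km in kmers(read, k):
        for gene in index.get(km, ()):
            votes[gene] += 1
    if not votes:
        return None
    best = max(votes, key=votes.get)
    return best if votes[best] >= min_shared else None
```

For example, `classify_read("CGTACGTTAGCC", build_index(REFERENCE))` returns `"wzx_O157"`, because that read is a substring of the toy O157 sequence. Production tools use much longer k-mers or full alignment and tolerate sequencing errors, which this sketch does not.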
Building a comprehensive database with complete profiles for every E. coli serogroup is a crucial and time-consuming process. Preliminary testing of this database for identifying E. coli serogroups/serotypes is described by Drs. Xianghe Yan, Chin-Yi Chen, and Pina Fratamico at the USDA, Agricultural Research Service (ARS), Eastern Regional Research Center in Wyndmoor, PA, and collaborator Dr. Jing Hu at Franklin & Marshall College, Lancaster, PA, in the study “A Simulated Metagenomic Approach for Bacterial Serotyping Using Shotgun Genome Sequences Coupled with O-Antigen Gene Cluster Analysis,” presented by Dr. Xianghe Yan at the 113th General Meeting of the American Society for Microbiology, Denver, CO, on Monday, May 20, 2013.
Scientists at USDA-ARS and their collaborator have built a database containing O-antigen gene cluster sequences for all 187 E. coli serogroups, including those of pathogenic Shiga toxin-producing E. coli (STEC) strains, along with sequences of various virulence factors. The database is a first step toward using NGS and computational technologies to streamline the processing and analysis of NGS data for pathogen detection and identification. Minimally processed NGS data will be compared with the database entries; based on the number of hits to specific genes of interest, the search will return the likely O-antigen gene(s) and a score for each potential virulence factor. Parameters can be set so that the system issues a warning when the number and type of virulence genes detected point to a potentially pathogenic strain.
In the study, pooled raw NGS sequences were accurately reclassified into the appropriate serogroups. The read coverage provided a numerical readout of the O-antigen and stx genes, enabling rapid detection of pathogenic STEC and an estimate of the richness of certain serogroups in the simulated metagenomic datasets. The database can also be expanded as more bacterial genome sequences become available. This approach has great potential for comprehensive molecular serotyping, genotyping, and identification of emerging pathogenic strains in clinical, food, or environmental samples.
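The hit counting, scoring, and alerting steps described above could be sketched in Python as follows, under stated assumptions. The function takes one gene assignment per read (for instance, from a read classifier), counts hits per gene, approximates depth of coverage as (reads × read length) / gene length, reports each serogroup's share of O-antigen reads as a rough richness estimate, and flags virulence genes whose read count reaches a threshold. The gene annotations, the use of raw read counts as the virulence “score,” and the `min_reads` threshold are hypothetical choices, not the study's published parameters.

```python
from collections import Counter

# Hypothetical annotation of reference genes (illustrative names only):
# O-antigen marker genes mapped to their serogroup, and virulence genes.
O_ANTIGEN_GENES = {"wzx_O157": "O157", "wzy_O26": "O26"}
VIRULENCE_GENES = {"stx1", "stx2", "eae"}


def summarize(hits, gene_lengths, read_len, min_reads=3):
    """Summarize per-read gene assignments.

    hits: gene name per classified read (None for unassigned reads)
    gene_lengths: reference length in bases for every gene that appears in hits
    read_len: read length in bases
    min_reads: hypothetical alert threshold (reads per virulence gene)
    """
    counts = Counter(h for h in hits if h is not None)

    # Approximate depth of coverage: total read bases on a gene / gene length.
    coverage = {g: n * read_len / gene_lengths[g] for g, n in counts.items()}

    # O-antigen readout: each serogroup's share of all O-antigen reads,
    # used here as a rough proxy for relative serogroup richness.
    sero_reads = Counter()
    for gene, n in counts.items():
        if gene in O_ANTIGEN_GENES:
            sero_reads[O_ANTIGEN_GENES[gene]] += n
    total_o = sum(sero_reads.values())
    serogroups = {s: n / total_o for s, n in sero_reads.items()} if total_o else {}

    # Virulence "score" = raw read count (an assumption, not the study's metric).
    virulence = {g: counts[g] for g in VIRULENCE_GENES if counts[g] > 0}
    warnings = [
        f"{g} detected in {n} reads"
        for g, n in sorted(virulence.items())
        if n >= min_reads
    ]
    return {
        "coverage": coverage,
        "serogroups": serogroups,
        "virulence": virulence,
        "warnings": warnings,
    }
```

With `hits = ["wzx_O157"] * 6 + ["wzy_O26"] * 2 + ["stx2"] * 4 + [None] * 3`, hypothetical gene lengths of 1,200, 1,100, and 1,240 bases, and 150-base reads, the report gives O157 a 0.75 share of O-antigen reads and O26 a 0.25 share, and raises one warning for stx2 (4 reads). A fuller version might also weight warnings by gene type, such as stx versus other factors, rather than relying on read counts alone.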