A deletion in the STA1 promoter determines maltotriose and starch utilization in STA1+ Saccharomyces cerevisiae strains

Mapping Intimacies ◽

10.1101/654681 ◽

2019 ◽

Author(s):

Kristoffer Krogerus ◽

Frederico Magalhães ◽

Joosu Kuivanen ◽

Brian Gibson

Keyword(s):

Sequence Data ◽

PCR Primers ◽

Evolutionary Strategy ◽

Detection Methods ◽

Whole Genome Sequence ◽

Genetic Determinants ◽

Strain Development ◽

Beer Brewing ◽

Starch Utilization ◽

Hydrolysis Of

Abstract Diastatic strains of Saccharomyces cerevisiae are common contaminants in beer fermentations and are capable of producing an extracellular STA1-encoded glucoamylase. Recent studies have revealed variable diastatic ability in strains tested positive for STA1, and here we elucidate genetic determinants behind this variation. We show that poorly diastatic strains have a 1162 bp deletion in the promoter of STA1. With CRISPR/Cas9-aided reverse engineering, we show that this deletion greatly decreases the ability to grow in beer and consume dextrin, as well as the expression of STA1. New PCR primers were designed for differentiation of highly and poorly diastatic strains based on the presence of the deletion in the STA1 promoter. In addition, using publicly available whole genome sequence data, we show that the STA1 gene is prevalent among the ‘Beer 2’/‘Mosaic Beer’ brewing strains. These strains utilize maltotriose efficiently, but the mechanisms for this have been unknown. By deleting STA1 from a number of highly diastatic strains, we show here that extracellular hydrolysis of maltotriose through STA1 appears to be the dominant mechanism enabling maltotriose use during wort fermentation in STA1+ strains. The formation and retention of STA1 seems to be an alternative evolutionary strategy for efficient utilization of sugars present in brewer’s wort. The results of this study allow for the improved reliability of molecular detection methods for diastatic contaminants in beer, and can be exploited for strain development where maltotriose use is desired.

Kaptive 2.0: updated capsule and LPS locus typing for the Klebsiella pneumoniae species complex

10.1101/2021.11.05.467534 ◽

2021 ◽

Author(s):

Margaret M.C. Lam ◽

Ryan R. Wick ◽

Kathryn E. Holt ◽

Kelly L Wyres

Keyword(s):

Klebsiella Pneumoniae ◽

Species Complex ◽

Sequence Data ◽

Control Strategies ◽

Surface Antigens ◽

Whole Genome Sequence ◽

Reference Database ◽

Genetic Determinants ◽

Extensive Search ◽

First Time

The outer polysaccharide capsule and lipopolysaccharide antigens are key targets for novel control strategies targeting Klebsiella pneumoniae and related taxa from the K. pneumoniae species complex (KpSC), including vaccines, phage and monoclonal antibody therapies. Given the importance of and growing interest in these highly diverse surface antigens, we previously developed Kaptive, a tool for rapidly identifying and typing capsule (K) and outer lipopolysaccharide (O) loci from whole genome sequence data. Here, we report two significant updates, now freely available in Kaptive 2.0 (github.com/katholt/kaptive): i) the addition of 16 novel K locus sequences to the K locus reference database following an extensive search of >17,000 KpSC genomes; and ii) enhanced O locus typing to enable prediction of the clinically relevant O2 antigen (sub)types, for which the genetic determinants have been recently described. We applied Kaptive 2.0 to a curated dataset of >12,000 public KpSC genomes to explore for the first time the distribution of predicted O (sub)types across species, sampling niches and clones, which highlighted key differences in the distributions that warrant further investigation. As the uptake of genomic surveillance approaches continues to expand globally, the application of Kaptive 2.0 will generate novel insights essential for the design of effective KpSC control strategies.

Identifying genetic determinants of complex phenotypes from whole genome sequence data

10.1101/181222 ◽

2017 ◽

Cited By ~ 1

Author(s):

George S. Long ◽

Mohammed Hussen ◽

Jonathan Dench ◽

Stéphane Aris-Brosou

Keyword(s):

Machine Learning ◽

Sequence Data ◽

Association Studies ◽

Machine Learning Algorithms ◽

Whole Genome Sequence ◽

Genome Wide Association Studies ◽

Genetic Determinants ◽

Data Set ◽

Adaptive Boosting ◽

Complex Phenotypes

Abstract A critical goal in biology is to relate the phenotype to the genotype, that is, to find the genetic determinants of various traits. However, while simple monofactorial determinants are relatively easy to identify, the underpinnings of complex phenotypes are harder to predict. While traditional approaches rely on genome-wide association studies based on Single Nucleotide Polymorphism data, the ability of machine learning algorithms to find these determinants in whole proteome data is still not well known. To better understand the applicability of machine learning in this case, we implemented two such algorithms, adaptive boosting (AB) and repeated random forest (RRF), and developed a chunking layer that facilitates the analysis of whole proteome data. We first assessed the performance of these algorithms and tuned them on an influenza data set, for which the determinants of three complex phenotypes (infectivity, transmissibility, and pathogenicity) are known based on experimental evidence. This allowed us to show that chunking improves runtimes by an order of magnitude. Based on simulations, we showed that chunking also increases sensitivity of the predictions, reaching 100% with as few as 20 sequences in a small proteome as in the influenza case (5k sites), but may require at least 30 sequences to reach 90% on larger alignments (500k sites). While RRF has less specificity than RF, it was never < 50%, and RRF sensitivity was significantly higher at smaller chunk sizes. We then used these algorithms to predict the determinants of three types of drug resistance (to Ciprofloxacin, Ceftazidime, and Gentamicin) in a bacterium, Pseudomonas aeruginosa. While both algorithms performed well in the case of the influenza data, results were more nuanced in the bacterial case, with RRF making more sensible predictions, with smaller error rates, than AB.
Altogether, we demonstrated that ML algorithms can be used to identify genetic determinants in small proteomes (viruses), even when trained on small numbers of individuals. We further showed that our RRF algorithm may deserve more scrutiny, which should be facilitated by the decreasing costs of both sequencing and phenotyping of large cohorts of individuals.

Identifying genetic determinants of complex phenotypes from whole genome sequence data

BMC Genomics ◽

10.1186/s12864-019-5820-0 ◽

2019 ◽

Vol 20 (1) ◽

Cited By ~ 2

Author(s):

George S. Long ◽

Mohammed Hussen ◽

Jonathan Dench ◽

Stéphane Aris-Brosou

Keyword(s):

Genome Sequence ◽

Sequence Data ◽

Whole Genome Sequence ◽

Whole Genome ◽

Genetic Determinants ◽

Genome Sequence Data ◽

Complex Phenotypes

Faculty Opinions recommendation of Optimal algorithms for haplotype assembly from whole-genome sequence data.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.13339986.14707085 ◽

2011 ◽

Author(s):

Alejandro Schaffer

Keyword(s):

Genome Sequence ◽

Sequence Data ◽

Whole Genome Sequence ◽

Whole Genome ◽

Optimal Algorithms ◽

Genome Sequence Data ◽

Haplotype Assembly

TIGER: inferring DNA replication timing from whole-genome sequence data

Bioinformatics ◽

10.1093/bioinformatics/btab166 ◽

2021 ◽

Cited By ~ 1

Author(s):

Amnon Koren ◽

Dashiell J Massey ◽

Alexa N Bracci

Keyword(s):

DNA Replication ◽

Genome Sequence ◽

Genomic DNA ◽

Sequence Data ◽

Replication Timing ◽

Whole Genome Sequence ◽

Supplementary Information ◽

Whole Genome ◽

Genome Sequence Data ◽

DNA Replication Timing

Abstract Motivation: Genomic DNA replicates according to a reproducible spatiotemporal program, with some loci replicating early in S phase while others replicate late. Despite being a central cellular process, DNA replication timing studies have been limited in scale due to technical challenges. Results: We present TIGER (Timing Inferred from Genome Replication), a computational approach for extracting DNA replication timing information from whole genome sequence data obtained from proliferating cell samples. The presence of replicating cells in a biological specimen leads to non-uniform representation of genomic DNA that depends on the timing of replication of different genomic loci. Replication dynamics can hence be observed in genome sequence data by analyzing DNA copy number along chromosomes while accounting for other sources of sequence coverage variation. TIGER is applicable to any species with a contiguous genome assembly and rivals the quality of experimental measurements of DNA replication timing. It provides a straightforward approach for measuring replication timing and can readily be applied at scale. Availability and Implementation: TIGER is available at https://github.com/TheKorenLab/TIGER. Supplementary information: Supplementary data are available at Bioinformatics online.

Whole genome sequence data of Bacillus australimaris strain B28A, isolated from Marine Water in India

Data in Brief ◽

10.1016/j.dib.2021.107240 ◽

2021 ◽

pp. 107240

Author(s):

Wael Ali Mohammed Hadi ◽

Boby T Edwin ◽

A Jayakumaran Nair

Keyword(s):

Genome Sequence ◽

Sequence Data ◽

Marine Water ◽

Whole Genome Sequence ◽

Whole Genome ◽

Genome Sequence Data

Whole genome sequence data of Mycobacterium tuberculosis XDR strain, isolated from patient in Kazakhstan

Data in Brief ◽

10.1016/j.dib.2020.106416 ◽

2020 ◽

Vol 33 ◽

pp. 106416

Author(s):

Asset Daniyarov ◽

Askhat Molkenov ◽

Saule Rakhimova ◽

Ainur Akhmetova ◽

Zhannur Nurkina ◽

...

Keyword(s):

Mycobacterium Tuberculosis ◽

Genome Sequence ◽

Sequence Data ◽

Whole Genome Sequence ◽

Whole Genome ◽

Genome Sequence Data

Engineering of Streptoalloteichus tenebrarius 2444 for Sustainable Production of Tobramycin

Molecules ◽

10.3390/molecules26144343 ◽

2021 ◽

Vol 26 (14) ◽

pp. 4343

Author(s):

Lena Mitousis ◽

Hannes Maier ◽

Luka Martinovic ◽

Andreas Kulik ◽

Sigrid Stockert ◽

...

Keyword(s):

Aminoglycoside Antibiotic ◽

Biosynthetic Gene Cluster ◽

Genetically Engineered ◽

Sustainable Production ◽

Biosynthetic Gene ◽

Strain Development ◽

Antibiotic Agent ◽

Phenotypic Stability ◽

High Concentrations ◽

Hydrolysis Of

Tobramycin is a broad-spectrum aminoglycoside antibiotic agent. The compound is obtained from the base-catalyzed hydrolysis of carbamoyltobramycin (CTB), which is naturally produced by the actinomycete Streptoalloteichus tenebrarius. However, the strain uses the same precursors to synthesize several structurally related aminoglycosides. Consequently, the production yields of tobramycin are low, and the compound’s purification is very challenging, costly, and time-consuming. In this study, the production of the main undesired product, apramycin, in the industrial isolate Streptoalloteichus tenebrarius 2444 was decreased by applying the fermentation media M10 and M11, which contained high concentrations of starch and dextrin. Furthermore, the strain was genetically engineered by the inactivation of the aprK gene (∆aprK), resulting in the abolishment of apramycin biosynthesis. In the next step of strain development, an additional copy of the tobramycin biosynthetic gene cluster (BGC) was introduced into the ∆aprK mutant. Fermentation by the engineered strain (∆aprK_1-17L) in M11 medium resulted in a 3- to 4-fold higher production than fermentation by the precursor strain (∆aprK). The phenotypic stability of the mutant without selection pressure was validated. The use of the engineered S. tenebrarius 2444 facilitates a step-saving, efficient, and, thus, more sustainable production of the valuable compound tobramycin on an industrial scale.

Whole-genome sequence data suggests environmental adaptation of Ethiopian sheep populations

Genome Biology and Evolution ◽

10.1093/gbe/evab014 ◽

2021 ◽

Author(s):

Pamela Wiener ◽

Christelle Robert ◽

Abulgasim Ahbara ◽

Mazdak Salavati ◽

Ayele Abebe ◽

...

Keyword(s):

High Altitude ◽

Environmental Variables ◽

Large Scale ◽

Sequence Data ◽

Strong Association ◽

Environmental Adaptation ◽

Whole Genome Sequence ◽

Single Nucleotide Variants ◽

High Altitude Adaptation ◽

Altitude Adaptation

Abstract Great progress has been made over recent years in the identification of selection signatures in the genomes of livestock species. This work has primarily been carried out in commercial breeds for which the dominant selection pressures are associated with artificial selection. As agriculture and food security are likely to be strongly affected by climate change, a better understanding of environment-imposed selection on agricultural species is warranted. Ethiopia is an ideal setting to investigate environmental adaptation in livestock due to its wide variation in geo-climatic characteristics and the extensive genetic and phenotypic variation of its livestock. Here, we identified over three million single nucleotide variants across 12 Ethiopian sheep populations and applied landscape genomics approaches to investigate the association between these variants and environmental variables. Our results suggest that environmental adaptation for precipitation-related variables is stronger than that related to altitude or temperature, consistent with large-scale meta-analyses of selection pressure across species. The set of genes showing association with environmental variables was enriched for genes highly expressed in human blood and nerve tissues. There was also evidence of enrichment for genes associated with high-altitude adaptation, although no strong association was identified with hypoxia-inducible-factor (HIF) genes. One of the strongest altitude-related signals was for a collagen gene, consistent with previous studies of high-altitude adaptation. Several altitude-associated genes also showed evidence of adaptation with temperature, suggesting a relationship between responses to these environmental factors. These results provide a foundation to investigate further the effects of climatic variables on small ruminant populations.

Elucidating the genetic basis of an oligogenic birth defect using whole genome sequence data in a non-model organism, Bubalus bubalis

Scientific Reports ◽

10.1038/srep39719 ◽

2017 ◽

Vol 7 (1) ◽

Cited By ~ 10

Author(s):

Lynsey K. Whitacre ◽

Jesse L. Hoff ◽

Robert D. Schnabel ◽

Sara Albarella ◽

Francesca Ciotola ◽

...

Keyword(s):

Genome Sequence ◽

Birth Defect ◽

Genetic Basis ◽

Sequence Data ◽

Model Organism ◽

Bubalus Bubalis ◽

Whole Genome Sequence ◽

Whole Genome ◽

Genome Sequence Data
