pong: fast analysis and visualization of latent clusters in population genetic data

2015 ◽  
Author(s):  
Aaron A. Behr ◽  
Katherine Z. Liu ◽  
Gracie Liu-Fang ◽  
Priyanka Nakka ◽  
Sohini Ramachandran

Abstract

1 Motivation: A series of methods in population genetics use multilocus genotype data to assign individuals membership in latent clusters. These methods belong to a broad class of mixed-membership models, such as latent Dirichlet allocation used to analyze text corpora. Inference from mixed-membership models can produce different output matrices when repeatedly applied to the same inputs, and the number of latent clusters is a parameter that is often varied in the analysis pipeline. For these reasons, quantifying, visualizing, and annotating the output from mixed-membership models are bottlenecks for investigators across multiple disciplines from ecology to text data mining.

2 Results: We introduce pong, a network-graphical approach for analyzing and visualizing membership in latent clusters with a native D3.js interactive visualization. pong leverages efficient algorithms for solving the Assignment Problem to dramatically reduce runtime while increasing accuracy compared to other methods that process output from mixed-membership models. We apply pong to 225,705 unlinked genome-wide single-nucleotide variants from 2,426 unrelated individuals in the 1000 Genomes Project, and identify previously overlooked aspects of global human population structure. We show that pong outpaces current solutions by more than an order of magnitude in runtime while providing a customizable and interactive visualization of population structure that is more accurate than those produced by current tools.

3 Availability: pong is freely available and can be installed using the Python package management system pip. pong’s source code is available at https://github.com/abehr/[email protected],[email protected]
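The cluster-matching idea in the abstract can be illustrated with a toy example. The following is a minimal sketch, not pong's implementation: it assumes two runs of a mixed-membership model on the same individuals, each giving a membership matrix whose columns (clusters) may be labeled in a different order, and it matches columns by solving a small Assignment Problem. The function names, the squared-difference cost, and the data are all hypothetical, and the brute-force search over permutations stands in for the efficient solvers the abstract refers to.

```python
from itertools import permutations

def column(q, k):
    """Return column k of a row-major membership matrix."""
    return [row[k] for row in q]

def distance(u, v):
    """Sum of squared differences between two membership columns (assumed cost)."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def align_clusters(q_ref, q_other):
    """Find the column permutation of q_other that best matches q_ref.

    Brute force over all K! permutations: acceptable for a toy example,
    but realistic K calls for an efficient assignment solver.
    """
    k = len(q_ref[0])
    cost = [[distance(column(q_ref, i), column(q_other, j)) for j in range(k)]
            for i in range(k)]
    best = min(permutations(range(k)),
               key=lambda perm: sum(cost[i][perm[i]] for i in range(k)))
    return list(best)

def reorder(q, perm):
    """Reorder the columns of q so that new column i is old column perm[i]."""
    return [[row[j] for j in perm] for row in q]

# Hypothetical data: two runs on the same four individuals with K = 3.
# run_b carries the same structure as run_a with its cluster labels shuffled.
run_a = [[0.9, 0.1, 0.0],
         [0.8, 0.1, 0.1],
         [0.1, 0.8, 0.1],
         [0.0, 0.1, 0.9]]
run_b = [[0.1, 0.0, 0.9],
         [0.1, 0.1, 0.8],
         [0.8, 0.1, 0.1],
         [0.1, 0.9, 0.0]]

perm = align_clusters(run_a, run_b)
aligned_b = reorder(run_b, perm)
print(perm)  # [2, 0, 1]
```

After alignment, the columns of `aligned_b` refer to the same clusters as those of `run_a`, so the two runs can be compared or drawn with consistent colors. For larger K, a polynomial-time solver such as SciPy's `linear_sum_assignment` would replace the permutation search.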

2015 ◽  
Vol 2015 ◽  
pp. 1-11 ◽  
Author(s):  
Duc-Thuan Vo ◽  
Vo Thuan Hai ◽  
Cheol-Young Ock

Classifying events on Twitter is challenging because tweet texts contain a large amount of temporal data with considerable noise and a wide variety of topics. In this paper, we propose a method to classify events from Twitter. We first find the distinguishing terms between tweets in events and measure their similarities using learned language models such as ConceptNet and a latent Dirichlet allocation method for selectional preferences (LDA-SP), which have been widely studied on large text corpora for computational linguistic relations. The relationships among term words in tweets are discovered by checking them under each model. We then propose a method to compute the similarity between tweets based on the tweets’ features, including common term words and relationships among their distinguishing term words. This similarity is explicit and convenient to use with k-nearest neighbor techniques for classification. We ran experiments on the Edinburgh Twitter Corpus, which show that our method achieves competitive results for classifying events.
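The similarity-plus-k-nearest-neighbor step can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' method: plain Jaccard overlap of term sets stands in for the paper's similarity measure, the ConceptNet and LDA-SP term relationships are not modeled, and the tweets, labels, and function names are invented for the example.

```python
from collections import Counter

def jaccard(terms_a, terms_b):
    """Jaccard overlap of two term lists (assumed stand-in similarity)."""
    a, b = set(terms_a), set(terms_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def knn_classify(query_terms, labeled_tweets, k=3):
    """Label a tweet by majority vote among its k most similar labeled tweets."""
    ranked = sorted(labeled_tweets,
                    key=lambda item: jaccard(query_terms, item[0]),
                    reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical labeled tweets, already reduced to their term words.
training = [
    (["earthquake", "japan", "tsunami"], "disaster"),
    (["earthquake", "magnitude", "coast"], "disaster"),
    (["flood", "rescue", "coast"], "disaster"),
    (["final", "goal", "match"], "sports"),
    (["match", "league", "win"], "sports"),
]

query = ["earthquake", "coast", "warning"]
label = knn_classify(query, training, k=3)
print(label)  # disaster
```

A richer similarity function, for instance one that also scores related but non-identical terms, could be passed in place of `jaccard` without changing the voting step.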


Microbiology ◽  
2005 ◽  
Vol 151 (6) ◽  
pp. 1875-1881 ◽  
Author(s):  
Naiel Bisharat ◽  
Nicola Jones ◽  
Dror Marchaim ◽  
Colin Block ◽  
Rosalind M. Harding ◽  
...  

The population structure of group B streptococcus (GBS) from a low-incidence region for invasive neonatal disease (Israel) was investigated using multilocus genotype data. The strain collection consisted of isolates from maternal carriage (n=104) and invasive neonatal disease (n=50), resolving into 46 sequence types. The most prevalent sequence types were ST-1 (17·5 %), ST-19 (10·4 %), ST-17 (9·7 %), ST-22 (8·4 %) and ST-23 (6·5 %). Serotype III was the most common, accounting for 29·2 % of the isolates. None of the serotypes was significantly associated with invasive neonatal disease. BURST analysis resolved the 46 sequence types into seven lineages (clonal complexes), of which only lineage ST-17, expressing serotype III only, was significantly associated with invasive neonatal disease. Lineage ST-22 expressed mainly serotype II, and was significantly associated with carriage. The distribution of the various sequence types and lineages, and the association of lineage ST-17 with invasive disease, are consistent with the results of analyses from a global GBS isolate collection. These findings could imply that the global variation in disease incidence is independent of the circulating GBS populations, and may be more affected by other risk factors for invasive GBS disease, or by different prevention strategies.


2019 ◽  
Author(s):  
Elora H. López ◽  
Stephen R. Palumbi

Abstract: One challenge for multicellular organisms is maintaining genome stability in the face of mutagens across long life spans. Imperfect genome maintenance leads to mutation accumulation in somatic cells, which is associated with tumors and senescence in vertebrates. Colonial reef-building corals are often large, can live for hundreds of years, rarely develop recognizable tumors, and are thought to convert somatic cells into gamete producers, so they are a pivotal group in which to understand long-term genome maintenance. To measure rates and patterns of somatic mutations, we analyzed transcriptomes from 17 to 22 branches from each of four Acropora hyacinthus colonies, determined putative single nucleotide variants, and verified them with Sanger resequencing. Unlike in human skin carcinomas, there is no signature of mutations caused by UV damage, indicating either higher efficiency of repair than in vertebrates, or strong sunscreen protection in these shallow-water tropical animals. The somatic mutation frequency per nucleotide in A. hyacinthus is on the same order of magnitude (10⁻⁷) as in noncancerous human somatic cells, and accumulation of mutations with age is similar. Unlike in mammals, loss-of-heterozygosity variants outnumber gain-of-heterozygosity mutations about 2:1. Although the mutation frequency is similar in mammals and corals, the preponderance of loss-of-heterozygosity changes and potential selection may reduce the frequency of deleterious mutations in colonial animals like corals. This may limit the deleterious effects of somatic mutations on the coral organism as well as on potential offspring.


2020 ◽  
Author(s):  
Roberta D’Agata ◽  
Noemi Bellassai ◽  
Matteo Allegretti ◽  
Andrea Rozzi ◽  
Saša Korom ◽  
...  

By exploiting a liquid biopsy approach, we developed an ultrasensitive nanoparticle-enhanced plasmonic method for detecting RAS single nucleotide variants (SNVs) in the plasma of colorectal cancer (CRC) patients. The PCR-free method is based on an imaging platform and allows the direct detection of ~1 attomolar RAS sequences in plasma with a sandwich hybridization assay using peptide nucleic acid probes. The assay involves a simple pre-analytical procedure that does not require the extraction of tumor DNA from plasma, and it detects tumor DNA in volumes as low as 40 µL of plasma, at least an order of magnitude smaller than the volume required by state-of-the-art liquid biopsy technologies. The most prevalent RAS SNVs are detected in DNA from tumor tissue with 100% sensitivity and 83.33% specificity. Spike-in experiments in human plasma further encouraged application of the assay to clinical specimens. Assay performance was then tested in plasma from CRC patients and healthy donors, demonstrating its promise for cancer monitoring.


2019 ◽  
Vol 28 (3) ◽  
pp. 263-272 ◽  
Author(s):  
Tobias Hecking ◽  
Loet Leydesdorff

Abstract: We replicate and analyze the topic model that was commissioned from King’s College and Digital Science for the Research Excellence Framework (REF 2014) in the United Kingdom: 6,638 case descriptions of societal impact were submitted by 154 higher-education institutes. We compare the Latent Dirichlet Allocation (LDA) model with Principal Component Analysis (PCA) of document-term matrices using the same data. Since topic models are almost by definition applied to text corpora that are too large to read, validation of the results of these models is hardly possible; furthermore, the models are irreproducible for a number of reasons. However, removing a small fraction of the documents from the sample (a test for reliability) has on average a larger impact in terms of decay on LDA than on PCA-based models. The semantic coherence of LDA models outperforms that of PCA-based models. In our opinion, results of the topic models are statistical and should not be used for grant selections and micro decision-making about research without follow-up using domain-specific semantic maps.


Author(s):  
Anthony Pannullo ◽  
Zhian N. Kamvar ◽  
Thomas J.J. Miorini ◽  
James R Steadman ◽  
Sydney E Everhart

The clonal, necrotrophic plant pathogen Sclerotinia sclerotiorum is the causal agent of white mold on soybean, causing significant losses for Brazilian farmers each year. While assessments of population structure and clonal dynamics can be beneficial for determining effective management strategies, few studies have been performed. In this paper, we present a broad-scale population genetic analysis with 11 microsatellite loci of 94 isolates of S. sclerotiorum from soybean fields in nine Brazilian states (N=74), with Argentina (N=5) and the United States (N=15) as outgroups. Genotyping identified 87 multilocus genotypes, 81 of which were represented by a single isolate. The pattern of genetic diversity observed suggested that populations were not strongly differentiated: despite the high genetic diversity, there were few private alleles or genotypes. No multilocus genotype was found in both South and North America, while one multilocus genotype was shared between Argentina and Brazil. Pairwise analysis of molecular variance between populations in Brazil revealed that nine out of 15 pairs were significantly different (P < 0.05). The population from the U.S. was the most strongly differentiated across all measures of population differentiation. Overall, our results found evidence for gene flow across populations, with a moderate amount of population structure within states in Brazil. We additionally found shared genotypes across populations in Brazil and Argentina, suggesting that sclerotia may be transferred across states either through seeds or shared equipment. This represents the first population genetic study to cover a wide area in Brazil.
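The genotype counts reported above (distinct multilocus genotypes, genotypes represented by a single isolate, genotypes shared between populations) come from a simple tally over microsatellite profiles. The sketch below is a minimal illustration with invented data, assuming each isolate is described by a tuple of allele sizes, one per locus; it uses three loci rather than the study's 11, and the isolate names, allele sizes, and function names are hypothetical.

```python
from collections import Counter

def summarize_mlgs(isolates):
    """Return (number of distinct MLGs, number of MLGs seen in one isolate only).

    isolates maps an isolate id to its multilocus genotype (tuple of allele sizes).
    """
    counts = Counter(isolates.values())
    singletons = sum(1 for n in counts.values() if n == 1)
    return len(counts), singletons

def shared_mlgs(isolates, population):
    """Return the MLGs found in more than one population."""
    pops_by_mlg = {}
    for iso, mlg in isolates.items():
        pops_by_mlg.setdefault(mlg, set()).add(population[iso])
    return {mlg: pops for mlg, pops in pops_by_mlg.items() if len(pops) > 1}

# Hypothetical isolates typed at three microsatellite loci.
isolates = {
    "iso1": (172, 160, 350),
    "iso2": (172, 160, 350),
    "iso3": (174, 160, 352),
    "iso4": (176, 162, 350),
    "iso5": (174, 164, 354),
}
population = {"iso1": "BR", "iso2": "AR", "iso3": "BR", "iso4": "US", "iso5": "BR"}

n_mlg, n_single = summarize_mlgs(isolates)
shared = shared_mlgs(isolates, population)
print(n_mlg, n_single)  # 4 3
print(shared)
```

In this toy data set, one genotype is shared between the "BR" and "AR" populations, which mirrors the kind of cross-population sharing the abstract describes.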



