pong: fast analysis and visualization of latent clusters in population genetic data

Exploiting Language Models to Classify Events from Twitter

Computational Intelligence and Neuroscience ◽

10.1155/2015/401024 ◽

2015 ◽

Vol 2015 ◽

pp. 1-11 ◽

Cited By ~ 4

Author(s):

Duc-Thuan Vo ◽

Vo Thuan Hai ◽

Cheol-Young Ock

Keyword(s):

Latent Dirichlet Allocation ◽

Nearest Neighbor ◽

Language Models ◽

K Nearest Neighbor ◽

Text Corpora ◽

Common Term ◽

Selectional Preferences ◽

Linguistic Relations ◽

Relationship Of ◽

Learning Language

Classifying events on Twitter is challenging because tweet texts contain a large amount of temporal data with considerable noise and a wide variety of topics. In this paper, we propose a method to classify events from Twitter. We first find the distinguishing terms between tweets in events and measure their similarities with learning language models such as ConceptNet and a latent Dirichlet allocation method for selectional preferences (LDA-SP), which have been widely studied based on large text corpora within computational linguistic relations. The relationships among term words in tweets are discovered by checking them under each model. We then propose a method to compute the similarity between tweets based on the tweets’ features, including common term words and relationships among their distinguishing term words. This similarity is explicit and convenient to apply in k-nearest neighbor classification. Experiments on the Edinburgh Twitter Corpus show that our method achieves competitive results for classifying events.
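As a rough illustration of the last step described in this abstract (k-nearest neighbor classification driven by a tweet-to-tweet similarity), the sketch below uses plain Jaccard overlap of common terms as a stand-in similarity. It is not the authors' measure, which also draws on ConceptNet and LDA-SP relationships between distinguishing terms. The function names `jaccard` and `knn_classify`, the toy tweets, and the event labels are all assumptions made up for this example.

```python
from collections import Counter


def jaccard(a: set[str], b: set[str]) -> float:
    """Share of terms two tweets have in common (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def knn_classify(query: str, labeled: list[tuple[str, str]], k: int = 3) -> str:
    """Label a tweet by majority vote among its k most similar labeled tweets."""
    q_terms = set(query.lower().split())
    # Rank labeled tweets from most to least similar to the query.
    ranked = sorted(
        labeled,
        key=lambda item: jaccard(q_terms, set(item[0].lower().split())),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy, made-up tweets and event labels, for illustration only.
train = [
    ("earthquake hits city centre", "disaster"),
    ("strong earthquake shakes the coast", "disaster"),
    ("aftershock felt after earthquake", "disaster"),
    ("team wins the cup final", "sports"),
    ("late goal wins the final", "sports"),
]

print(knn_classify("earthquake shakes city", train, k=3))
```

Swapping `jaccard` for a similarity that also scores relationships between non-shared terms would bring the sketch closer to the approach the abstract describes; the voting step would stay the same.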

Population structure of group B streptococcus from a low-incidence region for invasive neonatal disease

Microbiology ◽

10.1099/mic.0.27826-0 ◽

2005 ◽

Vol 151 (6) ◽

pp. 1875-1881 ◽

Cited By ~ 36

Author(s):

Naiel Bisharat ◽

Nicola Jones ◽

Dror Marchaim ◽

Colin Block ◽

Rosalind M. Harding ◽

...

Keyword(s):

Population Structure ◽

Disease Incidence ◽

Group B Streptococcus ◽

Invasive Disease ◽

Multilocus Genotype ◽

Neonatal Disease ◽

Strain Collection ◽

Low Incidence ◽

Group B ◽

Sequence Types

The population structure of group B streptococcus (GBS) from a low-incidence region for invasive neonatal disease (Israel) was investigated using multilocus genotype data. The strain collection consisted of isolates from maternal carriage (n=104) and invasive neonatal disease (n=50), resolving into 46 sequence types. The most prevalent sequence types were ST-1 (17·5 %), ST-19 (10·4 %), ST-17 (9·7 %), ST-22 (8·4 %) and ST-23 (6·5 %). Serotype III was the most common, accounting for 29·2 % of the isolates. None of the serotypes was significantly associated with invasive neonatal disease. BURST analysis resolved the 46 sequence types into seven lineages (clonal complexes), from which only lineage ST-17, expressing serotype III only, was significantly associated with invasive neonatal disease. Lineage ST-22 expressed mainly serotype II, and was significantly associated with carriage. The distribution of the various sequence types and lineages, and the association of lineage ST-17 with invasive disease, are consistent with the results of analyses from a global GBS isolate collection. These findings could imply that the global variation in disease incidence is independent of the circulating GBS populations, and may be more affected by other risk factors for invasive GBS disease, or by different prevention strategies.

Inference of population structure using multilocus genotype data: dominant markers and null alleles

Molecular Ecology Notes ◽

10.1111/j.1471-8286.2007.01758.x ◽

2007 ◽

Vol 7 (4) ◽

pp. 574-578 ◽

Cited By ~ 2091

Author(s):

DANIEL FALUSH ◽

MATTHEW STEPHENS ◽

JONATHAN K. PRITCHARD

Keyword(s):

Population Structure ◽

Null Alleles ◽

Multilocus Genotype ◽

Genotype Data ◽

Dominant Markers

Somatic mutations and genome stability maintenance in clonal coral colonies

10.1101/799643 ◽

2019 ◽

Author(s):

Elora H. López ◽

Stephen R. Palumbi

Keyword(s):

Loss Of Heterozygosity ◽

Genome Stability ◽

Mutation Frequency ◽

Somatic Mutations ◽

Somatic Cells ◽

Genome Maintenance ◽

Single Nucleotide Variants ◽

The Face ◽

Order Of Magnitude ◽

Multicellular Organisms

One challenge for multicellular organisms is maintaining genome stability in the face of mutagens across long life spans. Imperfect genome maintenance leads to mutation accumulation in somatic cells, which is associated with tumors and senescence in vertebrates. Colonial reef-building corals are often large, can live for hundreds of years, rarely develop recognizable tumors, and are thought to convert somatic cells into gamete producers, so they are a pivotal group in which to understand long-term genome maintenance. To measure rates and patterns of somatic mutations, we analyzed transcriptomes from 17-22 branches from each of four Acropora hyacinthus colonies, determined putative single nucleotide variants, and verified them with Sanger resequencing. Unlike for human skin carcinomas, there is no signature of mutations caused by UV damage, indicating either higher efficiency of repair than in vertebrates, or strong sunscreen protection in these shallow water tropical animals. The somatic mutation frequency per nucleotide in A. hyacinthus is on the same order of magnitude (10⁻⁷) as in noncancerous human somatic cells, and accumulation of mutations with age is similar. Unlike in mammals, loss of heterozygosity variants outnumber gain of heterozygosity mutations about 2:1. Although the mutation frequency is similar in mammals and corals, the preponderance of loss of heterozygosity changes and potential selection may reduce the frequency of deleterious mutations in colonial animals like corals. This may limit the deleterious effects of somatic mutations on the coral organism as well as potential offspring.

Direct Plasmonic Detection of Circulating RAS Mutated DNA in Colorectal Cancer Patients

10.26434/chemrxiv.12625439.v1 ◽

2020 ◽

Author(s):

Roberta D’Agata ◽

Noemi Bellassai ◽

Matteo Allegretti ◽

Andrea Rozzi ◽

Saša Korom ◽

...

Keyword(s):

Liquid Biopsy ◽

Direct Detection ◽

Single Nucleotide Variants ◽

Single Nucleotide ◽

Sandwich Hybridization ◽

Healthy Donors ◽

Order Of Magnitude ◽

Sandwich Hybridization Assay ◽

Colorectal Cancer Patients ◽

Promising Avenue

By exploiting a liquid biopsy approach, we developed an ultrasensitive nanoparticle-enhanced plasmonic method for detecting RAS single nucleotide variants (SNVs) in the plasma of CRC patients. The PCR-free method we developed is based on an imaging platform and allows the direct detection of ~1 attomolar RAS sequences in plasma with a sandwich hybridization assay using peptide nucleic acid probes. The assay involves a simple pre-analytical procedure that does not require the extraction of tumor DNA from plasma and detects it in volumes as low as 40 µL of plasma, which is at least an order of magnitude smaller than that required by state-of-the-art liquid biopsy technologies. The most prevalent RAS SNVs are detected in DNA from tumor tissue with 100% sensitivity and 83.33% specificity. Spike-in experiments in human plasma further encouraged assay application on clinical specimens. Assay performance was then demonstrated in plasma from CRC patients and healthy donors, showing its promise as an avenue for cancer monitoring.

Can topic models be used in research evaluations? Reproducibility, validity, and reliability when compared with semantic maps

Research Evaluation ◽

10.1093/reseval/rvz015 ◽

2019 ◽

Vol 28 (3) ◽

pp. 263-272 ◽

Cited By ~ 1

Author(s):

Tobias Hecking ◽

Loet Leydesdorff

Keyword(s):

Latent Dirichlet Allocation ◽

Topic Model ◽

Research Evaluation ◽

Topic Models ◽

Principal Component ◽

Evaluation Framework ◽

Validity And Reliability ◽

Text Corpora ◽

Semantic Maps ◽

Semantic Coherence

We replicate and analyze the topic model which was commissioned from King’s College and Digital Science for the Research Evaluation Framework (REF 2014) in the United Kingdom: 6,638 case descriptions of societal impact were submitted by 154 higher-education institutes. We compare the Latent Dirichlet Allocation (LDA) model with Principal Component Analysis (PCA) of document-term matrices using the same data. Since topic models are almost by definition applied to text corpora which are too large to read, validation of the results of these models is hardly possible; furthermore, the models are irreproducible for a number of reasons. However, removing a small fraction of the documents from the sample (a test for reliability) has on average a larger impact in terms of decay on LDA than on PCA-based models. The semantic coherence of LDA models outperforms that of PCA-based models. In our opinion, results of the topic models are statistical and should not be used for grant selections and micro decision-making about research without follow-up using domain-specific semantic maps.

Genetic variation and structure of Sclerotinia sclerotiorum populations from soybean in Brazil

10.7287/peerj.preprints.26600 ◽

2018 ◽

Cited By ~ 1

Author(s):

Anthony Pannullo ◽

Zhian N. Kamvar ◽

Thomas J.J. Miorini ◽

James R Steadman ◽

Sydney E Everhart

Keyword(s):

Genetic Diversity ◽

Population Structure ◽

Sclerotinia Sclerotiorum ◽

Population Genetic ◽

The United States ◽

Multilocus Genotype ◽

Population Genetic Study ◽

Population Genetic Analysis ◽

Moderate Amount ◽

Multilocus Genotypes

The clonal, necrotrophic plant pathogen Sclerotinia sclerotiorum is the causal agent of white mold on soybean, causing significant losses for Brazilian farmers each year. While assessments of population structure and clonal dynamics can be beneficial for determining effective management strategies, few studies have been performed. In this paper, we present a broad-scale population genetic analysis with 11 microsatellite loci of 94 isolates of S. sclerotiorum from soybean fields in nine Brazilian states (N=74) with Argentina (N=5) and the United States (N=15) as outgroups. Genotyping identified 87 multilocus genotypes with 81 represented by a single isolate. The pattern of genetic diversity observed suggested populations were not strongly differentiated because, despite the high genetic diversity, there were few private alleles/genotypes. No multilocus genotypes were identified in both South and North America, while one multilocus genotype was shared between Argentina and Brazil. Pairwise analysis of molecular variance between populations in Brazil revealed nine out of 15 pairs significantly different (P < 0.05). The population from the U.S. was most strongly differentiated across all measures of population differentiation. Overall, our results found evidence for gene flow across populations with a moderate amount of population structure within states in Brazil. We additionally found shared genotypes across populations in Brazil and Argentina, suggesting that sclerotia may be transferred across states either through seeds or shared equipment. This represents the first population genetic study to cover a wide area in Brazil.

Genetic variation and structure of Sclerotinia sclerotiorum populations from soybean in Brazil

10.7287/peerj.preprints.26600v1 ◽

2018 ◽

Cited By ~ 1

Author(s):

Anthony Pannullo ◽

Zhian N. Kamvar ◽

Thomas J.J. Miorini ◽

James R Steadman ◽

Sydney E Everhart

Keyword(s):

Genetic Diversity ◽

Population Structure ◽

Sclerotinia Sclerotiorum ◽

Population Genetic ◽

The United States ◽

Multilocus Genotype ◽

Population Genetic Study ◽

Population Genetic Analysis ◽

Moderate Amount ◽

Multilocus Genotypes

The clonal, necrotrophic plant pathogen Sclerotinia sclerotiorum is the causal agent of white mold on soybean, causing significant losses for Brazilian farmers each year. While assessments of population structure and clonal dynamics can be beneficial for determining effective management strategies, few studies have been performed. In this paper, we present a broad-scale population genetic analysis with 11 microsatellite loci of 94 isolates of S. sclerotiorum from soybean fields in nine Brazilian states (N=74) with Argentina (N=5) and the United States (N=15) as outgroups. Genotyping identified 87 multilocus genotypes with 81 represented by a single isolate. The pattern of genetic diversity observed suggested populations were not strongly differentiated because, despite the high genetic diversity, there were few private alleles/genotypes. No multilocus genotypes were identified in both South and North America, while one multilocus genotype was shared between Argentina and Brazil. Pairwise analysis of molecular variance between populations in Brazil revealed nine out of 15 pairs significantly different (P < 0.05). The population from the U.S. was most strongly differentiated across all measures of population differentiation. Overall, our results found evidence for gene flow across populations with a moderate amount of population structure within states in Brazil. We additionally found shared genotypes across populations in Brazil and Argentina, suggesting that sclerotia may be transferred across states either through seeds or shared equipment. This represents the first population genetic study to cover a wide area in Brazil.

Faculty Opinions recommendation of Inference of population structure using multilocus genotype data: linked loci and correlated allele frequencies.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.1015548.197423 ◽

2003 ◽

Cited By ~ 14

Author(s):

Daniel Weeks

Keyword(s):

Population Structure ◽

Allele Frequencies ◽

Multilocus Genotype ◽

Genotype Data

A Markov Chain Monte Carlo Approach for Joint Inference of Population Structure and Inbreeding Rates From Multilocus Genotype Data

Genetics ◽

10.1534/genetics.107.072371 ◽

2007 ◽

Vol 176 (3) ◽

pp. 1635-1651 ◽

Cited By ~ 361

Author(s):

Hong Gao ◽

Scott Williamson ◽

Carlos D. Bustamante

Keyword(s):

Monte Carlo ◽

Population Structure ◽

Markov Chain ◽

Markov Chain Monte Carlo ◽

Multilocus Genotype ◽

Genotype Data ◽

Monte Carlo Approach ◽

Joint Inference
