Large-Scale Cover Song Retrieval System Developed Using Machine Learning Approaches

Recent Progress in Machine Learning-based Prediction of Peptide Activity for Drug Discovery

Current Topics in Medicinal Chemistry ◽ 10.2174/1568026619666190122151634 ◽ 2019 ◽ Vol 19 (1) ◽ pp. 4-16 ◽ Cited By ~ 6

Author(s): Qihui Wu, Hanzhong Ke, Dongli Li, Qi Wang, Jiansong Fang, ...

Keyword(s): Machine Learning, Drug Discovery, Large Scale, Recent Progress, High Specificity, Learning Approaches, Anticancer Peptides, The Past, Traditional Approaches, Large Scale Screening

Over the past decades, peptides have received increasing attention as therapeutic candidates in drug discovery, especially antimicrobial peptides (AMPs), anticancer peptides (ACPs) and anti-inflammatory peptides (AIPs). Peptides are thought to be able to regulate various complex diseases that were previously intractable. In recent years, the critical problem of antimicrobial resistance has driven the pharmaceutical industry to look for new therapeutic agents. Compared with small organic drugs, peptide-based therapy exhibits high specificity and minimal toxicity. Thus, peptides are widely used in the design and discovery of new potent drugs. Currently, large-scale screening of peptide activity with traditional approaches is costly, time-consuming and labor-intensive. Hence, in silico methods, mainly machine learning approaches, have been introduced to predict peptide activity because of their accuracy and effectiveness. In this review, we document recent progress in machine learning-based prediction of peptide activity, which will be of great benefit to the discovery of potentially active AMPs, ACPs and AIPs.

Machine learning identifies an immunological pattern associated with multiple juvenile idiopathic arthritis subtypes

Annals of the Rheumatic Diseases ◽ 10.1136/annrheumdis-2018-214354 ◽ 2019 ◽ Vol 78 (5) ◽ pp. 617-628 ◽ Cited By ~ 5

Author(s): Erika Van Nieuwenhove, Vasiliki Lagou, Lien Van Eyck, James Dooley, Ulrich Bodenhofer, ...

Keyword(s): Machine Learning, Juvenile Idiopathic Arthritis, Large Scale, Inflammatory Diseases, Adaptive Immune System, Healthy Children, Learning Approaches, Data Set, Immune Signature, Systemic Jia

Objectives: Juvenile idiopathic arthritis (JIA) is the most common class of childhood rheumatic diseases, with distinct disease subsets that may have diverging pathophysiological origins. Both adaptive and innate immune processes have been proposed as primary drivers, which may account for the observed clinical heterogeneity, but few high-depth studies have been performed. Methods: Here we profiled the adaptive immune system of 85 patients with JIA and 43 age-matched controls with in-depth flow cytometry and machine learning approaches. Results: Immune profiling identified immunological changes in patients with JIA. This immune signature was shared across a broad spectrum of childhood inflammatory diseases. The immune signature was identified in clinically distinct subsets of JIA, but was accentuated in patients with systemic JIA and those patients with active disease. Despite the extensive overlap in the immunological spectrum exhibited by healthy children and patients with JIA, machine learning analysis of the data set proved capable of discriminating patients with JIA from healthy controls with ~90% accuracy. Conclusions: These results pave the way for large-scale immune phenotyping longitudinal studies of JIA. The ability to discriminate between patients with JIA and healthy individuals provides proof of principle for the use of machine learning to identify immune signatures that are predictive of treatment response group.

Machine Learning Based Taxonomy and Analysis of English Learners' Translation Errors

International Journal of Computer-Assisted Language Learning and Teaching ◽ 10.4018/ijcallt.2019070105 ◽ 2019 ◽ Vol 9 (3) ◽ pp. 68-83

Author(s): Ying Qin

Keyword(s): Machine Learning, English Learners, Large Scale, Learning Approaches, Efl Learners, Translation Error, Chinese Learners, Error Taxonomy, Skill Improvement, Translation Errors

This study extracts the comments from a large-scale corpus of Chinese EFL learners' translations to study the taxonomy of translation errors. Two unsupervised machine learning approaches are used to obtain computational evidence for the translation error taxonomy. After manual revision, ten types of English-to-Chinese (E2C) and eight types of Chinese-to-English (C2E) translation errors are finally confirmed. The hierarchical clustering results suggest that there are probably three categories of top-level errors. In addition, three supervised learning methods are applied to automatically recognize the types of errors, among which the highest performance reaches F1 = 0.85 on E2C and F1 = 0.90 on C2E translation. Further comparison with intuitive or theoretical studies on translation taxonomy reveals some phenomena that accompany the language skill improvement of Chinese learners. Analysis of translation problems based on machine learning provides objective insight into the students' translations.

Big Data’s Role in Health and Risk Messaging

Oxford Research Encyclopedia of Communication ◽ 10.1093/acrefore/9780190228613.013.359 ◽ 2017

Author(s): Bradford William Hesse

Keyword(s): Machine Learning, Big Data, Risk Communication, Large Scale, Protein Identification, Machine Learning Algorithms, National Committee, Learning Approaches, Road Map, Data Flows

The presence of large-scale data systems can be felt, consciously or not, in almost every facet of modern life, whether through the simple act of selecting travel options online, purchasing products from online retailers, or navigating through the streets of an unfamiliar neighborhood using global positioning system (GPS) mapping. These systems operate through the momentum of big data, a term introduced by data scientists to describe a data-rich environment enabled by a superconvergence of advanced computer-processing speeds and storage capacities; advanced connectivity between people and devices through the Internet; the ubiquity of smart, mobile devices and wireless sensors; and the creation of accelerated data flows among systems in the global economy. Some researchers have suggested that big data represents the so-called fourth paradigm in science, wherein the first paradigm was marked by the evolution of the experimental method, the second was brought about by the maturation of theory, the third was marked by an evolution of statistical methodology as enabled by computational technology, while the fourth extended the benefits of the first three, but also enabled the application of novel machine-learning approaches to an evidence stream that exists in high volume, high velocity, high variety, and differing levels of veracity. In public health and medicine, the emergence of big data capabilities has followed naturally from the expansion of data streams from genome sequencing, protein identification, environmental surveillance, and passive patient sensing. In 2001, the National Committee on Vital and Health Statistics published a road map for connecting these evidence streams to each other through a national health information infrastructure. Since then, the road map has spurred national investments in electronic health records (EHRs) and motivated the integration of public surveillance data into analytic platforms for health situational awareness. 
More recently, the boom in consumer-oriented mobile applications and wireless medical sensing devices has opened up the possibility for mining new data flows directly from altruistic patients. In the broader public communication sphere, the ability to mine the digital traces of conversation on social media presents an opportunity to apply advanced machine learning algorithms as a way of tracking the diffusion of risk communication messages. In addition to utilizing big data for improving the scientific knowledge base in risk communication, there will be a need for health communication scientists and practitioners to work as part of interdisciplinary teams to improve the interfaces to these data for professionals and the public. Too much data, presented in disorganized ways, can lead to what some have referred to as “data smog.” Much work will be needed for understanding how to turn big data into knowledge, and just as important, how to turn data-informed knowledge into action.

Quantum Chemistry Meets Machine Learning

CHIMIA International Journal for Chemistry ◽ 10.2533/chimia.2019.983 ◽ 2019 ◽ Vol 73 (12) ◽ pp. 983-989 ◽ Cited By ~ 3

Author(s): Alberto Fabrizio, Benjamin Meyer, Raimon Fabregat, Clemence Corminboeuf

Keyword(s): Machine Learning, Statistical Learning, Quantum Chemical, Large Scale, Organic Molecules, Chemical Properties, Learning Approaches, Energy Landscapes, Free Energy Landscapes, Large Scale Screening

In this account, we demonstrate how statistical learning approaches can be leveraged across a range of different quantum chemical areas to transform the scaling, nature, and complexity of the problems that we are tackling. Selected examples illustrate the power brought by kernel-based approaches in the large-scale screening of homogeneous catalysis, the prediction of fundamental quantum chemical properties and the free-energy landscapes of flexible organic molecules. While certainly non-exhaustive, these examples provide an intriguing glimpse into our own research efforts.

Efficient database pruning for large-scale cover song recognition

2013 IEEE International Conference on Acoustics, Speech and Signal Processing ◽ 10.1109/icassp.2013.6637741 ◽ 2013 ◽ Cited By ~ 5

Author(s): J. Osmalskyj, S. Pierard, M. Van Droogenbroeck, J.J. Embrechts

Keyword(s): Large Scale, Song Recognition, Scale Cover, Cover Song

Skill of large-scale seasonal drought impact forecasts

Natural Hazards and Earth System Science ◽ 10.5194/nhess-20-1595-2020 ◽ 2020 ◽ Vol 20 (6) ◽ pp. 1595-1608

Author(s): Samuel J. Sutanto, Melati van der Weert, Veit Blauhut, Henny A. J. Van Lanen

Keyword(s): Machine Learning, Logistic Regression, Large Scale, Standardized Precipitation Index, Early Warning Systems, Learning Approaches, Discriminative Ability, Drought Impact, Drought Impacts, Impact Data

Abstract. Forecasting of drought impacts is still lacking in drought early-warning systems (DEWSs), which presently do not go beyond hazard forecasting. Therefore, we developed drought impact functions using machine learning approaches (logistic regression and random forest) to predict drought impacts with lead times up to 7 months ahead. The observed and forecasted hydrometeorological drought hazards – such as the standardized precipitation index (SPI), standardized precipitation evaporation index (SPEI), and standardized runoff index (SRI) – were obtained from the EU-funded Enhancing Emergency Management and Response to Extreme Weather and Climate Events (ANYWHERE) DEWS. Reported drought impact data, taken from the European Drought Impact Report Inventory (EDII), were used to develop and validate drought impact functions. The skill of the drought impact functions in forecasting drought impacts was evaluated using the Brier skill score and relative operating characteristic metrics for five cases representing different spatial aggregation and lumping of impacted sectors. Results show that hydrological drought hazard represented by SRI has higher skill than meteorological drought represented by SPI and SPEI. For German regions, impact functions developed using random forests indicate a higher discriminative ability to forecast drought impacts than logistic regression. Moreover, skill is higher for cases with higher spatial resolution and less lumped impacted sectors (cases 4 and 5), with considerable skill up to 3–4 months ahead. The forecasting skill of drought impacts using machine learning greatly depends on the availability of impact data. This study demonstrates that the drought impact functions could not be developed for certain regions and impacted sectors, owing to the lack of reported impacts.
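
The Brier skill score named in the abstract above can be illustrated with a minimal Python sketch. The abstract does not say which reference forecast was used, so the climatological base rate is an assumption here, and the function names and sample numbers are hypothetical rather than taken from the study.

```python
def brier_score(probs, outcomes):
    """Mean squared difference between forecast probabilities and 0/1 outcomes."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


def brier_skill_score(probs, outcomes):
    """BSS = 1 - BS / BS_ref.

    The reference forecast is assumed to be climatology: the observed
    base rate issued as a constant probability for every case.
    """
    base_rate = sum(outcomes) / len(outcomes)
    bs_ref = brier_score([base_rate] * len(outcomes), outcomes)
    if bs_ref == 0:
        # All outcomes identical: the reference is already perfect.
        raise ValueError("reference Brier score is zero; skill is undefined")
    return 1.0 - brier_score(probs, outcomes) / bs_ref


# Hypothetical forecast probabilities of "impact reported" and what was observed.
forecast = [0.9, 0.8, 0.2, 0.1, 0.7, 0.3]
observed = [1, 1, 0, 0, 1, 0]
print(round(brier_skill_score(forecast, observed), 3))
```

A score of 1 corresponds to a perfect forecast, 0 to a forecast no better than the reference, and negative values to a forecast worse than the reference.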

The Intervalgram: An Audio Feature for Large-Scale Cover-Song Recognition

From Sounds to Music and Emotions - Lecture Notes in Computer Science ◽ 10.1007/978-3-642-41248-6_11 ◽ 2013 ◽ pp. 197-213 ◽ Cited By ~ 4

Author(s): Thomas C. Walters, David A. Ross, Richard F. Lyon

Keyword(s): Large Scale, Song Recognition, Scale Cover, Audio Feature, Cover Song

Informational and emotional elements in online support groups: a Bayesian approach to large-scale content analysis

Journal of the American Medical Informatics Association ◽ 10.1093/jamia/ocv190 ◽ 2016 ◽ Vol 23 (3) ◽ pp. 508-513 ◽ Cited By ~ 14

Author(s): Ulrike Deetjen, John A Powell

Keyword(s): Machine Learning, Support Groups, Emotional Support, Large Scale, Training Dataset, Online Support, Online Support Groups, Learning Approaches, Physical Conditions, Irritable Bowel

Objective: This research examines the extent to which informational and emotional elements are employed in online support forums for 14 purposively sampled chronic medical conditions and the factors that influence whether posts are of a more informational or emotional nature. Methods: Large-scale qualitative data were obtained from Dailystrength.org. Based on a hand-coded training dataset, all posts were classified into informational or emotional using a Bayesian classification algorithm to generalize the findings. Posts that could not be classified with a probability of at least 75% were excluded. Results: The overall tendency toward emotional posts differs by condition: mental health (depression, schizophrenia) and Alzheimer’s disease consist of more emotional posts, while informational posts relate more to nonterminal physical conditions (irritable bowel syndrome, diabetes, asthma). There is no gender difference across conditions, although prostate cancer forums are oriented toward informational support, whereas breast cancer forums rather feature emotional support. Across diseases, the best predictors for emotional content are lower age and a higher number of overall posts by the support group member. Discussion: The results are in line with previous empirical research and unify empirical findings from single/2-condition research. Limitations include the analytical restriction to predefined categories (informational, emotional) through the chosen machine-learning approach. Conclusion: Our findings provide an empirical foundation for building theory on informational versus emotional support across conditions, give insights for practitioners to better understand the role of online support groups for different patients, and show the usefulness of machine-learning approaches to analyze large-scale qualitative health data from online settings.
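
The exclusion rule in the Methods above (dropping posts that cannot be classified with a probability of at least 75%) can be sketched in Python as follows. This covers only the thresholding step, not the Bayesian classifier itself. The data layout, function name and sample probabilities are hypothetical and are not taken from the study.

```python
def filter_confident(posts, threshold=0.75):
    """Keep posts whose most probable class reaches the threshold.

    `posts` is a list of (post_id, {label: probability}) pairs. This
    layout is an illustrative assumption, not the study's data format.
    """
    kept = []
    for post_id, probs in posts:
        # Pick the label with the highest posterior probability.
        label, p = max(probs.items(), key=lambda kv: kv[1])
        if p >= threshold:
            kept.append((post_id, label, p))
    return kept


# Hypothetical classifier outputs for three posts.
posts = [
    ("a", {"informational": 0.92, "emotional": 0.08}),
    ("b", {"informational": 0.55, "emotional": 0.45}),
    ("c", {"informational": 0.20, "emotional": 0.80}),
]
confident = filter_confident(posts)
print(confident)
```

Post "b" is dropped because neither class reaches 0.75. Raising the threshold trades coverage for label reliability.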

A Comparison of Machine Learning Approaches to Improve Free Topography Data for Flood Modelling

Remote Sensing ◽ 10.3390/rs13020275 ◽ 2021 ◽ Vol 13 (2) ◽ pp. 275

Author(s): Michael Meadows, Matthew Wilson

Keyword(s): Neural Network, Machine Learning, Spatial Patterns, Large Scale, Multiple Scales, Flood Hazard, Training Data, Learning Approaches, Testing Dataset, Topography Data

Given the high financial and institutional cost of collecting and processing accurate topography data, many large-scale flood hazard assessments continue to rely instead on freely available global Digital Elevation Models, despite the significant vertical biases known to affect them. To predict (and thereby reduce) these biases, we apply a fully convolutional neural network (FCN), a form of artificial neural network originally developed for image segmentation which is capable of learning from multivariate spatial patterns at different scales. We assess its potential by training such a model on a wide variety of remote-sensed input data (primarily multi-spectral imagery), using high-resolution, LiDAR-derived Digital Terrain Models published by the New Zealand government as the reference topography data. In parallel, two more widely used machine learning models are also trained, in order to provide benchmarks against which the novel FCN may be assessed. We find that the FCN outperforms the other models (reducing root mean square error in the testing dataset by 71%), likely due to its ability to learn from spatial patterns at multiple scales, rather than only on a pixel-by-pixel basis. Significantly for flood hazard modelling applications, corrections were found to be especially effective along rivers and their floodplains. However, our results also suggest that models are likely to be biased towards the land cover and relief conditions most prevalent in their training data, with further work required to assess the importance of limiting training data inputs to those most representative of the intended application area(s).
