Permutationally invariant polynomial regression for energies and gradients, using reverse differentiation, achieves orders of magnitude speed-up with high precision compared to other machine learning methods

Author(s):  
Paul L. Houston ◽  
Chen Qu ◽  
Apurba Nandi ◽  
Riccardo Conte ◽  
Qi Yu ◽  
...  
2019 ◽  
Vol 20 (5) ◽  
pp. 540-550 ◽  
Author(s):  
Jiu-Xin Tan ◽  
Hao Lv ◽  
Fang Wang ◽  
Fu-Ying Dao ◽  
Wei Chen ◽  
...  

Enzymes are proteins that act as biological catalysts to speed up cellular biochemical processes. According to their main Enzyme Commission (EC) numbers, enzymes are divided into six categories: EC-1: oxidoreductase; EC-2: transferase; EC-3: hydrolase; EC-4: lyase; EC-5: isomerase; and EC-6: synthetase. Different enzymes have different biological functions and act on different substrates. Knowing which family an enzyme belongs to can therefore help infer its catalytic mechanism and provide information about its biological function. With the large number of protein sequences flowing into databanks in the post-genomic age, annotating the family of an enzyme is very important. Since experimental methods are costly, bioinformatics tools can be a great help in accurately classifying enzyme families. In this review, we summarize the application of machine learning methods to enzyme family prediction from different aspects. We hope that this review will provide insight and inspiration for research on enzyme family classification.


Author(s):  
П.С. Козырь ◽  
Р.Н. Яковлев

Currently, in the field of developing sensing systems for robotic devices, one of the urgent tasks is interpreting the data of tactile pressure and proximity sensors. As a rule, the solution to this problem is complicated both by the dependence of tactile sensor readings on the material of the object and by the design features of each individual device. In this study, existing works devoted to interpreting the readings of tactile sensor devices were analyzed. Based on this analysis, a machine learning model was proposed that estimates the pressure applied to the surface of a capacitive tactile pressure sensor. The architecture of the proposed model includes two key data analysis blocks: the first recognizes the material of the object in contact with the sensor, and the second directly estimates the magnitude of the pressure applied to the sensor. Several machine learning methods were considered as supporting models for processing and interpreting the signals of this device: linear regression, polynomial regression, decision tree regression, partial least squares regression, and a fully connected feedforward neural network. The supporting models were trained and the final solution was validated on the authors' own dataset of more than 3000 samples. According to the results, the solution based on the fully connected feedforward neural network estimated the applied pressure best. Its coefficient of determination and mean absolute error on the test set were 0.93 and 13.14 kPa, respectively.
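The two quality measures named in this abstract, the coefficient of determination and the mean absolute error, can be illustrated with a minimal sketch. The function names and the pressure values below are made up for illustration; they are not the authors' code or data.

```python
from typing import Sequence


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average absolute deviation between measured and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all measured values are equal")
    return 1.0 - ss_res / ss_tot


# Hypothetical pressures in kPa, invented for this example only.
measured = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 310.0, 390.0]

mae = mean_absolute_error(measured, predicted)
r2 = r_squared(measured, predicted)
print(f"MAE = {mae:.2f} kPa, R^2 = {r2:.3f}")
```

A mean absolute error is expressed in the units of the target (kPa here), while the coefficient of determination is dimensionless, which is why the abstract reports both.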


This study examines pricing factors in the short-term rental market. Airbnb, a platform for listing, finding, and renting accommodation around the world, was chosen as the object of the study. At the beginning of 2021, the company offered 7 million homes in more than 220 countries. Data science methods play a significant role in the company's success, and one of its key algorithms is the pricing algorithm. Using the "Price Recommendations" feature, a homeowner can see which dates are likely to be booked at the current price and which are not, which helps form a favorable offer. The system calculates the recommended price based on hundreds of parameters; some are easy to recognize, but less obvious factors can also affect demand. The paper proposes an algorithm for identifying implicit pricing factors in the short-term rental market using machine learning methods, which includes: 1) data mining and data preparation; 2) building and analysis of linear regression models; 3) building and analysis of nonlinear regression models. The study was based on Airbnb listings in Washington and New York and used scripts developed in Python. The following models were built and analyzed: simple linear regression, multiple linear regression, polynomial regression, decision trees, random forest, and boosting. The results showed that the most important factors are accommodates, cleaning_fee, room_type, and bedrooms. However, based on the model evaluation criteria, the models cannot be used for implementation: the linear models are of low quality, while the decision trees, random forest, and boosting models are overfitted. Still, the results can be used in business analysis.
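One common way to judge whether a regression model is overfitted is to compare its score on the training data with its score on held-out data. The sketch below does this for a simple linear regression of price on `accommodates`. The listings, the prices, and the helper names are invented for illustration and are not taken from the study.

```python
from typing import List, Sequence, Tuple


def fit_simple_linear(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Closed-form ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def r_squared(intercept: float, slope: float,
              x: Sequence[float], y: Sequence[float]) -> float:
    """Coefficient of determination of the fitted line on the given data."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Hypothetical listings: (accommodates, nightly price in USD).
train: List[Tuple[float, float]] = [(1, 50.0), (2, 80.0), (3, 110.0), (4, 140.0), (6, 200.0)]
test: List[Tuple[float, float]] = [(2, 85.0), (5, 165.0), (8, 250.0)]

train_x, train_y = zip(*train)
test_x, test_y = zip(*test)

intercept, slope = fit_simple_linear(train_x, train_y)
train_r2 = r_squared(intercept, slope, train_x, train_y)
test_r2 = r_squared(intercept, slope, test_x, test_y)

# A large positive gap (train score far above test score) suggests overfitting.
gap = train_r2 - test_r2
print(f"train R^2 = {train_r2:.3f}, test R^2 = {test_r2:.3f}, gap = {gap:.3f}")
```

In this toy case the gap is small, so the line generalizes well. A flexible model such as a deep decision tree would typically show a much larger gap on real listing data.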


2020 ◽  
Vol 163 ◽  
pp. 01009
Author(s):  
Mikhail Sarafanov ◽  
Eduard Kazakov ◽  
Yulia Borisova

The article presents the development of a model for calculating water levels at one gauging station from the levels at another. To link the levels at the two gauging stations, data on levels, temperature, and precipitation were used. Machine learning methods made it possible to predict water levels with an accuracy of about 6 cm, whereas traditional statistical models (linear regression, polynomial regression) have errors of 14-16 cm.
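The traditional baselines mentioned in this abstract, linear and polynomial regression, can be sketched as least-squares fits of one station's level against the other's. The abstract does not state the polynomial degree or give any data, so the degree of 2, the station levels, and all function names below are assumptions made for illustration.

```python
from typing import List, Sequence


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are left unmodified.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(m[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (m[row][n] - tail) / m[row][row]
    return x


def fit_polynomial(x: Sequence[float], y: Sequence[float], degree: int) -> List[float]:
    """Least-squares coefficients c[0] + c[1]*x + ... via the normal equations."""
    size = degree + 1
    normal = [[sum(xi ** (i + j) for xi in x) for j in range(size)] for i in range(size)]
    rhs = [sum(yi * xi ** i for xi, yi in zip(x, y)) for i in range(size)]
    return solve_linear_system(normal, rhs)


def predict(coeffs: Sequence[float], x_value: float) -> float:
    """Evaluate the fitted polynomial at one point."""
    return sum(c * x_value ** i for i, c in enumerate(coeffs))


def mean_abs_error(coeffs: Sequence[float], x: Sequence[float], y: Sequence[float]) -> float:
    """Mean absolute difference between fitted and observed levels."""
    return sum(abs(predict(coeffs, xi) - yi) for xi, yi in zip(x, y)) / len(x)


# Synthetic levels in metres at two hypothetical stations (not real observations).
upstream = [0.0, 1.0, 2.0, 3.0, 4.0]
downstream = [0.5 + 0.8 * u + 0.1 * u * u for u in upstream]

linear = fit_polynomial(upstream, downstream, 1)
quadratic = fit_polynomial(upstream, downstream, 2)
linear_mae = mean_abs_error(linear, upstream, downstream)
quadratic_mae = mean_abs_error(quadratic, upstream, downstream)
print(f"linear MAE = {linear_mae:.3f} m, quadratic MAE = {quadratic_mae:.3f} m")
```

Because the synthetic relation is exactly quadratic, the degree-2 fit recovers it and the linear fit leaves a residual error. Real gauge data would also involve temperature and precipitation, which this one-variable sketch omits.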


Cells ◽  
2021 ◽  
Vol 10 (11) ◽  
pp. 3169
Author(s):  
Ning Zhang ◽  
Yameng Wu ◽  
Yu Guo ◽  
Yu Sa ◽  
Qifeng Li ◽  
...  

In the field of glioma research, the broad availability of genetic and image information generated by computer technologies, together with the boom in biomedical publications, has led to the advent of the big-data era. Machine learning methods have been applied as possible approaches to speed up data mining. In this article, we review the present state and future directions of machine learning applications in glioma research within the context of workflows that integrate analysis for precision cancer care. Publicly available tools and algorithms for key machine learning technologies in literature mining for glioma clinical research are reviewed and compared. Further, existing machine learning solutions for glioma prediction and diagnostics, and their limitations such as overfitting and class imbalance, are critically analyzed.

