Robust classification of protein variation using structural modelling and large-scale data integration

2016 ◽  
Vol 44 (6) ◽  
pp. 2501-2513 ◽  
Author(s):  
Evan H. Baugh ◽  
Riley Simmons-Edler ◽  
Christian L. Müller ◽  
Rebecca F. Alford ◽  
Natalia Volfovsky ◽  
...  
Author(s):  
Balaje T. Thumati ◽  
Halasya Siva Subramania ◽  
Rajeev Shastri ◽  
Karthik Kalyana Kumar ◽  
Nicole Hessner ◽  
...  

2021 ◽  
Author(s):  
Mohammad Hassan Almaspoor ◽  
Ali Safaei ◽  
Afshin Salajegheh ◽  
Behrouz Minaei-Bidgoli

Abstract Classification is one of the most important and widely used tasks in machine learning; its purpose is to learn, from a set of training samples, a rule that assigns data to pre-existing categories. Employed successfully in many scientific and engineering areas, the Support Vector Machine (SVM) is among the most promising classification methods in machine learning. With the advent of big data, many machine learning methods have been challenged by big-data characteristics. The standard SVM was proposed for batch learning, in which all data are available at the same time, and it has high time complexity: increasing the number of training samples intensifies the demand for computational resources and memory. Hence, many attempts have been made to adapt the SVM to online learning conditions and to large-scale data. This paper focuses on the analysis, identification, and classification of existing methods for adapting the SVM to online conditions and large-scale data. These methods might be employed to classify big data, and they suggest research areas for future studies. Given its advantages, the SVM can be among the first options for adaptation to big data and for big-data classification. For this purpose, appropriate preprocessing techniques should be developed to convert data into a form suitable for learning. Existing frameworks for parallel and distributed processing should also be employed so that SVMs can be made scalable and properly online, and thus able to handle big data.
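The online-learning adaptation the abstract describes can be illustrated with a minimal sketch. One common route (not necessarily the specific methods surveyed in the paper) is stochastic-gradient training of a linear SVM, which updates the model one mini-batch at a time instead of requiring all data in memory; scikit-learn's `SGDClassifier` with hinge loss optimizes the linear SVM objective this way. The simulated data stream below is a hypothetical stand-in.

```python
# Sketch: incremental (online) linear SVM training on a data stream.
# SGDClassifier with loss="hinge" optimizes the linear SVM objective
# by stochastic gradient descent; partial_fit consumes one mini-batch
# at a time, so the full dataset never has to fit in memory at once.
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)

def make_batch(n=200):
    """Simulated stream: two Gaussian classes in 5 dimensions."""
    y = rng.integers(0, 2, size=n)
    X = rng.normal(loc=y[:, None] * 2.0, scale=1.0, size=(n, 5))
    return X, y

clf = SGDClassifier(loss="hinge", alpha=1e-4, random_state=0)
classes = np.array([0, 1])          # must be declared up front for partial_fit

for _ in range(50):                 # process 50 mini-batches as they "arrive"
    X, y = make_batch()
    clf.partial_fit(X, y, classes=classes)

X_test, y_test = make_batch(1000)
acc = clf.score(X_test, y_test)
print(f"held-out accuracy: {acc:.3f}")
```

The same `partial_fit` loop scales to arbitrarily long streams, which is exactly the property batch SVM solvers lack; kernelization and distributed variants are where the surveyed methods differ.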


Author(s):  
Denali Molitor ◽  
Deanna Needell

Abstract In today’s data-driven world, storing, processing and gleaning insights from large-scale data are major challenges. Data compression is often required in order to store large amounts of high-dimensional data, and thus, efficient inference methods for analyzing compressed data are necessary. Building on a recently designed simple framework for classification using binary data, we demonstrate that one can improve classification accuracy of this approach through iterative applications whose output serves as input to the next application. As a side consequence, we show that the original framework can be used as a data preprocessing step to improve the performance of other methods, such as support vector machines. For several simple settings, we showcase the ability to obtain theoretical guarantees for the accuracy of the iterative classification method. The simplicity of the underlying classification framework makes it amenable to theoretical analysis.
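As an illustrative sketch of the compressed-classification idea (not the authors' exact framework), the example below compresses high-dimensional data to one-bit measurements by taking signs of random projections, then classifies using only the binary data; a linear SVM on the binary features stands in for the paper's simple classifier, and all dimensions and sample counts are arbitrary choices.

```python
# Sketch: classification from one-bit compressed data. Each sample is
# reduced from d real values to m sign bits via a random projection,
# and a classifier is trained on the bits alone.
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)

n, d, m = 2000, 100, 30          # m one-bit measurements per sample (m << d)
y = rng.integers(0, 2, size=n)
X = rng.normal(loc=y[:, None] * 1.0, size=(n, d))   # two Gaussian classes

A = rng.normal(size=(d, m))      # random measurement matrix
B = np.sign(X @ A)               # binary (one-bit) compressed data

Xtr, Xte, ytr, yte = train_test_split(B, y, random_state=0)
acc = LinearSVC().fit(Xtr, ytr).score(Xte, yte)
print(f"accuracy using only binary data: {acc:.3f}")
```

Even with a 100-to-30 compression and all magnitude information discarded, the sign pattern retains enough geometry for accurate classification, which is the phenomenon the iterative framework builds on.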


Author(s):  
Jun Ding ◽  
Jose Lugo-Martinez ◽  
Ye Yuan ◽  
Darrell N. Kotton ◽  
Ziv Bar-Joseph

Abstract Several molecular datasets have been recently compiled to characterize the activity of SARS-CoV-2 within human cells. Here we extend computational methods to integrate several different types of sequence, functional and interaction data to reconstruct networks and pathways activated by the virus in host cells. We identify the key proteins in these networks and further intersect them with genes differentially expressed under conditions that are known to impact viral activity. Several of the top-ranked genes do not directly interact with virus proteins, though some were shown to impact other coronaviruses, highlighting the importance of large-scale data integration for understanding virus and host activity. Software and interactive visualization: https://github.com/phoenixding/sdremsc


Author(s):  
Anisa Anisa ◽  
Mesran Mesran

Data mining is the process of discovering patterns, or information about trends, in very large data sets to support future decision-making. Classification techniques derive such patterns from labelled records (a training set) with a class attribute; the C4.5 algorithm builds a decision tree by induction and keeps the resulting tree small. By applying it to employment data for graduates, drawn from alumni questionnaires, information about interests, talents and suitable work can be generated. Employment patterns are mined from the large-scale data and analysed with C4.5, which investigates the attributes that influence the outcome and derives interconnected rules that classify objects into different classes or categories, thereby revealing the patterns of work. The application used is Tanagra, data mining software for academic and research purposes that covers methods from data analysis through classification.
Keywords: analysis, Data Mining, C4.5 method, Tanagra, patterns of work
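The decision-tree step can be sketched as follows. Scikit-learn implements CART rather than C4.5, but setting `criterion="entropy"` gives the same information-gain flavour; the toy "graduate employment" table and its column names are hypothetical stand-ins for the alumni questionnaire data, which is not reproduced here.

```python
# Sketch: C4.5-style (entropy-based) decision tree on a toy graduate
# employment table. Categorical features are ordinally encoded, the
# tree is grown on the class attribute, and the induced rules printed.
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical training set (stand-in for the alumni questionnaire data).
df = pd.DataFrame({
    "major":    ["CS", "CS", "Math", "Math", "CS", "Math", "CS", "Math"],
    "gpa":      ["high", "low", "high", "low", "high", "high", "low", "low"],
    "interest": ["tech", "tech", "finance", "tech", "finance", "finance", "tech", "finance"],
    "employed": ["yes", "no", "yes", "no", "yes", "yes", "no", "no"],
})

features = ["major", "gpa", "interest"]
X = OrdinalEncoder().fit_transform(df[features])
y = df["employed"]

tree = DecisionTreeClassifier(criterion="entropy", random_state=0).fit(X, y)
print(export_text(tree, feature_names=features))   # the induced rules
```

On this toy table the tree collapses to a single split, which mirrors the paper's point: induction exposes the few attributes that actually drive the employment pattern.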


10.1186/gm186 ◽  
2010 ◽  
Vol 2 (9) ◽  
pp. 65 ◽  
Author(s):  
Kristian Ovaska ◽  
Marko Laakso ◽  
Saija Haapa-Paananen ◽  
Riku Louhimo ◽  
Ping Chen ◽  
...  
