Robust classification of protein variation using structural modelling and large-scale data integration

2016 ◽  
Vol 44 (6) ◽  
pp. 2501-2513 ◽  
Author(s):  
Evan H. Baugh ◽  
Riley Simmons-Edler ◽  
Christian L. Müller ◽  
Rebecca F. Alford ◽  
Natalia Volfovsky ◽  
...  
Author(s):  
Balaje T. Thumati ◽  
Halasya Siva Subramania ◽  
Rajeev Shastri ◽  
Karthik Kalyana Kumar ◽  
Nicole Hessner ◽  
...  

2021 ◽  
Author(s):  
Mohammad Hassan Almaspoor ◽  
Ali Safaei ◽  
Afshin Salajegheh ◽  
Behrouz Minaei-Bidgoli

Abstract Classification is one of the most important and widely used tasks in machine learning; its purpose is to learn, from a set of training samples, a rule that assigns data to pre-existing categories. Employed successfully in many scientific and engineering areas, the Support Vector Machine (SVM) is among the most promising classification methods in machine learning. With the advent of big data, many machine learning methods have been challenged by big-data characteristics. The standard SVM was proposed for batch learning, in which all data are available at the same time, and it has high time complexity: increasing the number of training samples intensifies the demand for computational resources and memory. Hence, many attempts have been made to adapt the SVM to online learning conditions and to large-scale data. This paper focuses on the analysis, identification, and classification of existing methods for adapting the SVM to online conditions and large-scale data. These methods might be employed to classify big data, and they suggest research areas for future studies. Given its advantages, the SVM can be among the first options for adaptation to big data and for big-data classification. For this purpose, appropriate preprocessing techniques should be developed to convert data into a form suitable for learning. Existing frameworks for parallel and distributed processing should also be employed so that SVMs can be made scalable and properly online, and thus able to handle big data.
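The online-learning adaptation the abstract describes can be illustrated with a minimal sketch. One common route (not necessarily the specific methods surveyed in the paper) is stochastic-gradient training of a linear SVM, which updates the model one mini-batch at a time instead of requiring all data in memory; scikit-learn's `SGDClassifier` with hinge loss optimizes the linear SVM objective this way. The simulated data stream below is a hypothetical stand-in.

```python
# Sketch: incremental (online) linear SVM training on a data stream.
# SGDClassifier with loss="hinge" optimizes the linear SVM objective
# by stochastic gradient descent; partial_fit consumes one mini-batch
# at a time, so the full dataset never has to fit in memory at once.
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)

def make_batch(n=200):
    """Simulated stream: two Gaussian classes in 5 dimensions."""
    y = rng.integers(0, 2, size=n)
    X = rng.normal(loc=y[:, None] * 2.0, scale=1.0, size=(n, 5))
    return X, y

clf = SGDClassifier(loss="hinge", alpha=1e-4, random_state=0)
classes = np.array([0, 1])          # must be declared up front for partial_fit

for _ in range(50):                 # process 50 mini-batches as they "arrive"
    X, y = make_batch()
    clf.partial_fit(X, y, classes=classes)

X_test, y_test = make_batch(1000)
acc = clf.score(X_test, y_test)
print(f"held-out accuracy: {acc:.3f}")
```

The same `partial_fit` loop scales to arbitrarily long streams, which is exactly the property batch SVM solvers lack; kernelization and distributed variants are where the surveyed methods differ.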


Author(s):  
Denali Molitor ◽  
Deanna Needell

Abstract In today’s data-driven world, storing, processing and gleaning insights from large-scale data are major challenges. Data compression is often required in order to store large amounts of high-dimensional data, and thus, efficient inference methods for analyzing compressed data are necessary. Building on a recently designed simple framework for classification using binary data, we demonstrate that one can improve classification accuracy of this approach through iterative applications whose output serves as input to the next application. As a side consequence, we show that the original framework can be used as a data preprocessing step to improve the performance of other methods, such as support vector machines. For several simple settings, we showcase the ability to obtain theoretical guarantees for the accuracy of the iterative classification method. The simplicity of the underlying classification framework makes it amenable to theoretical analysis.
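As an illustrative sketch of the compressed-classification idea (not the authors' exact framework), the example below compresses high-dimensional data to one-bit measurements by taking signs of random projections, then classifies using only the binary data; a linear SVM on the binary features stands in for the paper's simple classifier, and all dimensions and sample counts are arbitrary choices.

```python
# Sketch: classification from one-bit compressed data. Each sample is
# reduced from d real values to m sign bits via a random projection,
# and a classifier is trained on the bits alone.
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)

n, d, m = 2000, 100, 30          # m one-bit measurements per sample (m << d)
y = rng.integers(0, 2, size=n)
X = rng.normal(loc=y[:, None] * 1.0, size=(n, d))   # two Gaussian classes

A = rng.normal(size=(d, m))      # random measurement matrix
B = np.sign(X @ A)               # binary (one-bit) compressed data

Xtr, Xte, ytr, yte = train_test_split(B, y, random_state=0)
acc = LinearSVC().fit(Xtr, ytr).score(Xte, yte)
print(f"accuracy using only binary data: {acc:.3f}")
```

Even with a 100-to-30 compression and all magnitude information discarded, the sign pattern retains enough geometry for accurate classification, which is the phenomenon the iterative framework builds on.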


Author(s):  
Jun Ding ◽  
Jose Lugo-Martinez ◽  
Ye Yuan ◽  
Darrell N. Kotton ◽  
Ziv Bar-Joseph

Abstract Several molecular datasets have been recently compiled to characterize the activity of SARS-CoV-2 within human cells. Here we extend computational methods to integrate several different types of sequence, functional and interaction data to reconstruct networks and pathways activated by the virus in host cells. We identify the key proteins in these networks and further intersect them with genes differentially expressed under conditions that are known to impact viral activity. Several of the top-ranked genes do not directly interact with virus proteins, though some were shown to impact other coronaviruses, highlighting the importance of large-scale data integration for understanding virus and host activity. Software and interactive visualization: https://github.com/phoenixding/sdremsc


Author(s):  
Anisa Anisa ◽  
Mesran Mesran

Data mining is the process of discovering patterns, or information about trends, in very large data sets to support future decision-making. Classification techniques derive such patterns from labelled records (a training set) with a class attribute; the C4.5 algorithm builds a decision tree by induction and keeps the resulting tree small. By applying it to employment data for graduates, drawn from alumni questionnaires, information about interests, talents and suitable work can be generated. Employment patterns are mined from the large-scale data and analysed with C4.5, which investigates the attributes that influence the outcome and derives interconnected rules that classify objects into different classes or categories, thereby revealing the patterns of work. The application used is Tanagra, data mining software for academic and research purposes that covers methods from data analysis through classification.
Keywords: analysis, Data Mining, C4.5 method, Tanagra, patterns of work
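The decision-tree step can be sketched as follows. Scikit-learn implements CART rather than C4.5, but setting `criterion="entropy"` gives the same information-gain flavour; the toy "graduate employment" table and its column names are hypothetical stand-ins for the alumni questionnaire data, which is not reproduced here.

```python
# Sketch: C4.5-style (entropy-based) decision tree on a toy graduate
# employment table. Categorical features are ordinally encoded, the
# tree is grown on the class attribute, and the induced rules printed.
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical training set (stand-in for the alumni questionnaire data).
df = pd.DataFrame({
    "major":    ["CS", "CS", "Math", "Math", "CS", "Math", "CS", "Math"],
    "gpa":      ["high", "low", "high", "low", "high", "high", "low", "low"],
    "interest": ["tech", "tech", "finance", "tech", "finance", "finance", "tech", "finance"],
    "employed": ["yes", "no", "yes", "no", "yes", "yes", "no", "no"],
})

features = ["major", "gpa", "interest"]
X = OrdinalEncoder().fit_transform(df[features])
y = df["employed"]

tree = DecisionTreeClassifier(criterion="entropy", random_state=0).fit(X, y)
print(export_text(tree, feature_names=features))   # the induced rules
```

On this toy table the tree collapses to a single split, which mirrors the paper's point: induction exposes the few attributes that actually drive the employment pattern.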


10.1186/gm186 ◽  
2010 ◽  
Vol 2 (9) ◽  
pp. 65 ◽  
Author(s):  
Kristian Ovaska ◽  
Marko Laakso ◽  
Saija Haapa-Paananen ◽  
Riku Louhimo ◽  
Ping Chen ◽  
...  
