Accurate Prediction of Protein Functional Class from Sequence in the Mycobacterium tuberculosis and Escherichia coli Genomes Using Data Mining

Yeast ◽  
2000 ◽  
Vol 1 (4) ◽  
pp. 283-293
Author(s):  
Ross D. King ◽  
Andreas Karwath ◽  
Amanda Clare ◽  
Luc Dehaspe

The analysis of genomics data needs to become as automated as its generation. Here we present a novel data-mining approach to predicting protein functional class from sequence. This method is based on a combination of inductive logic programming clustering and rule learning. We demonstrate the effectiveness of this approach on the M. tuberculosis and E. coli genomes, and identify biologically interpretable rules which predict protein functional class from information only available from the sequence. These rules predict 65% of the ORFs with no assigned function in M. tuberculosis and 24% of those in E. coli, with an estimated accuracy of 60–80% (depending on the level of functional assignment). The rules are founded on a combination of detection of remote homology, convergent evolution and horizontal gene transfer. We identify rules that predict protein functional class even in the absence of detectable sequence or structural homology. These rules give insight into the evolutionary history of M. tuberculosis and E. coli.


2018 ◽  
Vol 8 (11) ◽  
pp. 2184 ◽  
Author(s):  
Sadok Rezig ◽  
Zied Achour ◽  
Nidhal Rezg

This work integrates a data mining approach for predictive sequential maintenance, using spare-parts information drawn from the maintenance history. In most practical problems, the failure of one part of a piece of equipment induces the subsequent failure of other parts of that equipment. For example, in the mining industry, as in many other industries, the maintenance of conventional equipment is frequently carried out in sequence. In addition, depending on the state of the equipment's parts, many parts can be consumed and replaced. Consequently, when a group of spare parts is consumed sequentially across various maintenance activities, it is possible to discover sequential maintenance activities. From maintenance data with predefined support or threshold values and spare parts information, this work determines the sequential patterns of maintenance activities. The proposed method predicts the occurrence of the next maintenance activity from information on the consumed spare parts. A real industrial case study is presented, and the experimental results shed new light on maintenance prediction using data mining.
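
As a rough illustration of the support-threshold idea described in this abstract, the sketch below counts how often one maintenance activity precedes another across equipment histories and keeps the ordered pairs that meet a minimum support. The activity names, the `histories` data, and the function name `frequent_ordered_pairs` are hypothetical and are not taken from the paper, which may well use a different sequential pattern mining algorithm.

```python
from collections import Counter
from itertools import combinations


def frequent_ordered_pairs(histories, min_support):
    """Return ordered activity pairs (a, b), with a occurring before b,
    whose support (fraction of histories containing the pair) is at
    least min_support."""
    counts = Counter()
    for history in histories:
        # combinations() preserves input order, so (a, b) means a came before b.
        # A set ensures each pair is counted at most once per history.
        counts.update(set(combinations(history, 2)))
    total = len(histories)
    return {
        pair: n / total
        for pair, n in counts.items()
        if n / total >= min_support
    }


# Hypothetical maintenance histories: one ordered list of activities per machine.
histories = [
    ["replace_belt", "replace_bearing", "replace_motor"],
    ["replace_belt", "replace_motor"],
    ["replace_bearing", "replace_belt"],
    ["replace_belt", "replace_bearing"],
]

patterns = frequent_ordered_pairs(histories, min_support=0.5)
```

With these made-up histories, only the pairs that appear in at least half of the machines survive the threshold. A frequent pair such as "belt, then motor" is the kind of pattern that could suggest the next likely maintenance activity once the first one has been observed.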


2004 ◽  
Vol 4 (4) ◽  
pp. 316-328 ◽  
Author(s):  
Carol J. Romanowski ◽  
Rakesh Nagi

In variant design, the proliferation of bills of materials makes it difficult for designers to find previous designs that would aid in completing a new design task. This research presents a novel, data mining approach to forming generic bills of materials (GBOMs), entities that represent the different variants in a product family and facilitate the search for similar designs and configuration of new variants. The technical difficulties include: (i) developing families or categories for products, assemblies, and component parts; (ii) generalizing purchased parts and quantifying their similarity; (iii) performing tree union; and (iv) establishing design constraints. These challenges are met through data mining methods such as text and tree mining, a new tree union procedure, and embodying the GBOM and design constraints in constrained XML. The paper concludes with a case study, using data from a manufacturer of nurse call devices, and identifies a new research direction for data mining motivated by the domains of engineering design and information.


Author(s):  
Yannick Girardeau ◽  
Anne-Sophie Jannot ◽  
Gilles Chatellier ◽  
Olivier Saint-Jean

Edulib ◽  
2018 ◽  
Vol 8 (2) ◽  
pp. 194
Author(s):  
Lilis Syarifah ◽  
Imas Sukaesih Sitanggang ◽  
Pudji Muljono

A thesis is a report of a student's research, completed as a graduation requirement for a Master's program. The choice of topic and advisors influences how the study is carried out. A well-chosen topic can improve the quality of an academic institution, but the large number of thesis documents in a repository makes it difficult to find information about advisors' expertise and about which topics have been studied frequently or rarely. Association rule mining can be used to extract information about related items. This study aims to analyze advising patterns in the Master's program in Agriculture, based on supervisors and their research topics, using a data mining approach applied to thesis metadata and summary text documents from the IPB repository. The data were collected from the repository on the Bogor Agricultural University website and processed using the R programming language. The results show that the most frequent supervisor association occurred at a support value of 0.00793, equivalent to 7 theses, and that the four most popular topics were botanical insecticide, global warming, upland rice, and land use change. The results could serve as a reference for suggesting future research topics or appropriate supervisors in agriculture.
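
To make the support value reported in this abstract more concrete, the sketch below computes support in the usual association-rule sense: the fraction of transactions that contain every item in an itemset. The study itself used R; this Python version, the `theses` data, and the labels in it are hypothetical and only illustrate the definition. Read this way, a support of 0.00793 for 7 theses would correspond to a collection of roughly 880 theses, although the abstract does not state the total.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    if not transactions:
        return 0.0
    wanted = set(itemset)
    hits = sum(1 for transaction in transactions if wanted <= set(transaction))
    return hits / len(transactions)


# Hypothetical theses, each described by its supervisors and topic.
theses = [
    {"supervisor_A", "supervisor_B", "upland_rice"},
    {"supervisor_A", "supervisor_B", "global_warming"},
    {"supervisor_A", "land_use_change"},
    {"supervisor_C", "botanical_insecticide"},
]

# Support of the supervisor pair A and B: they co-advise 2 of the 4 theses.
pair_support = support(theses, {"supervisor_A", "supervisor_B"})
```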

