Assessment of a Rule-Based Virtual Screening Technology (INDDEx) on a Benchmark Data Set

2012 ◽  
Vol 116 (23) ◽  
pp. 6732-6739 ◽  
Author(s):  
Christopher R. Reynolds ◽  
Ata C. Amini ◽  
Stephen H. Muggleton ◽  
Michael J. E. Sternberg
2010 ◽  
Vol 50 (12) ◽  
pp. 2079-2093 ◽  
Author(s):  
Vishwesh Venkatraman ◽  
Violeta I. Pérez-Nueno ◽  
Lazaros Mavridis ◽  
David W. Ritchie

2002 ◽  
Vol 137 (2) ◽  
pp. 111-126 ◽  
Author(s):  
Yoshinori Nakahara ◽  
Kenya Suyama ◽  
Jun Inagawa ◽  
Ryuji Nagaishi ◽  
Setsumi Kurosawa ◽  
...  

2016 ◽  
Vol 2016 ◽  
pp. 1-9 ◽  
Author(s):  
Abbas Akkasi ◽  
Ekrem Varoğlu ◽  
Nazife Dimililer

Named Entity Recognition (NER) from text constitutes the first step in many text mining applications. The most important preliminary step for NER systems based on machine learning is tokenization, in which raw text is segmented into tokens. This study proposes an enhanced rule-based tokenizer, ChemTok, which uses rules extracted mainly from the training data set. The main novelty of ChemTok is that the extracted rules are used to merge tokens split in earlier steps, producing longer and more discriminative tokens. ChemTok is compared to the tokenization methods used by ChemSpot and tmChem, with Support Vector Machines and Conditional Random Fields employed as the learning algorithms. The experimental results show that classifiers trained on the output of ChemTok outperform all classifiers trained on the output of the other two tokenizers, both in classification performance and in the number of incorrectly segmented entities.
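The merging idea can be illustrated with a minimal sketch: a first pass over-segments the text, then rules re-join adjacent tokens around "bridge" characters so chemical names survive as single tokens. The bridge set and both functions here are hypothetical simplifications; ChemTok's actual rules are mined from the training corpus and are far richer.

```python
import re

# Hypothetical bridge characters that, per rules mined from training data,
# should not split a chemical name.
BRIDGES = {"-", ","}

def aggressive_tokenize(text):
    # Over-segmenting first pass: every non-word character is its own token.
    return re.findall(r"\w+|\S", text)

def merge_chemical_tokens(tokens):
    # Greedily re-merge token triples whose middle element is a bridge
    # flanked by alphanumeric text, yielding longer, more discriminative tokens.
    out = list(tokens)
    changed = True
    while changed:
        changed = False
        for i in range(len(out) - 2):
            if (out[i + 1] in BRIDGES
                    and out[i][-1].isalnum() and out[i + 2][0].isalnum()):
                out[i:i + 3] = [out[i] + out[i + 1] + out[i + 2]]
                changed = True
                break
    return out

merged = merge_chemical_tokens(
    aggressive_tokenize("treated with 1,2-dichloroethane in water"))
```

Here "1,2-dichloroethane" is first split into five fragments and then reassembled into one token, which is exactly the kind of unit a downstream CRF or SVM can discriminate on.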


2018 ◽  
Vol 14 (4) ◽  
pp. 20-37 ◽  
Author(s):  
Yinglei Song ◽  
Yongzhong Li ◽  
Junfeng Qu

This article develops a new approach to supervised dimensionality reduction. The approach considers both the global and local structures of a labelled data set and maximizes a new objective that combines the effects of both; the objective can be approximately optimized by solving an eigenvalue problem. The approach is evaluated on several benchmark data sets and image databases, and its performance is compared with several existing approaches to dimensionality reduction. Testing results show that, on average, the new approach achieves more accurate dimensionality reduction than the existing approaches.
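The abstract does not give the objective's exact form, but the general recipe of "maximize a labelled-structure criterion via an eigenvalue problem" can be sketched with the classical Fisher criterion as a stand-in: between-class scatter captures global structure, within-class scatter captures local class-conditional structure, and the projection comes from a generalized eigenproblem.

```python
import numpy as np

def fisher_projection(X, y, d=1):
    # Within-class scatter Sw (local structure) and between-class scatter Sb
    # (global structure); the optimal projection W solves Sb w = lambda Sw w.
    classes = np.unique(y)
    mu = X.mean(axis=0)
    Sw = np.zeros((X.shape[1], X.shape[1]))
    Sb = np.zeros_like(Sw)
    for c in classes:
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        Sb += len(Xc) * np.outer(mc - mu, mc - mu)
    # Small ridge keeps Sw invertible; keep the top-d eigenvectors.
    vals, vecs = np.linalg.eig(
        np.linalg.solve(Sw + 1e-6 * np.eye(Sw.shape[0]), Sb))
    order = np.argsort(vals.real)[::-1]
    return vecs.real[:, order[:d]]

# Two well-separated toy classes in 3-D, reduced to one dimension.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 3)), rng.normal(5, 1, (50, 3))])
y = np.array([0] * 50 + [1] * 50)
z = X @ fisher_projection(X, y, d=1)
```

This is only an analogy for the eigenproblem machinery; the article's objective weighs global and local terms differently than plain Fisher discriminant analysis.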


2015 ◽  
Vol 13 (03) ◽  
pp. 1541007 ◽  
Author(s):  
Marcus C. K. Ng ◽  
Simon Fong ◽  
Shirley W. I. Siu

Protein–ligand docking is an essential step in the modern drug discovery process. The challenge is to accurately predict and efficiently optimize the position and orientation of ligands in the binding pocket of a target protein. In this paper, we present a new method called PSOVina, which combines the particle swarm optimization (PSO) algorithm with the efficient Broyden–Fletcher–Goldfarb–Shanno (BFGS) local search method adopted in AutoDock Vina to tackle the conformational search problem in docking. Using a diverse data set of 201 protein–ligand complexes from the PDBbind database and a full set of ligands and decoys for four representative targets from the Directory of Useful Decoys (DUD) virtual screening data set, we assessed the docking performance of PSOVina in comparison to the original Vina program. Our results showed that PSOVina achieves a remarkable execution time reduction of 51–60% without compromising the prediction accuracies in the docking and virtual screening experiments. This improvement in time efficiency makes PSOVina a better choice of docking tool in large-scale protein–ligand docking applications. Our work lays the foundation for the future development of swarm-based algorithms in molecular docking programs. PSOVina is freely available to non-commercial users at http://cbbio.cis.umac.mo .
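A minimal PSO sketch shows the search mechanism PSOVina builds on: each particle (a candidate ligand pose) tracks its personal best, and the swarm's global best steers everyone. Everything here is a toy illustration — PSOVina couples this with Vina's scoring function and BFGS refinement of promising poses, neither of which is reproduced below.

```python
import random

def pso_minimize(f, dim, bounds, n_particles=20, iters=100,
                 w=0.7, c1=1.5, c2=1.5):
    # Classic PSO: velocity blends inertia, attraction to the particle's own
    # best position, and attraction to the swarm's best position.
    lo, hi = bounds
    pos = [[random.uniform(lo, hi) for _ in range(dim)]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = random.random(), random.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

# Toy "scoring function": a sphere with its minimum at the origin.
random.seed(1)
best, score = pso_minimize(lambda x: sum(v * v for v in x),
                           dim=3, bounds=(-5, 5))
```

In the real docking setting, `dim` would cover the ligand's translation, orientation, and torsion angles, and `f` would be the Vina scoring function.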


2019 ◽  
Vol 47 (3) ◽  
pp. 154-170
Author(s):  
Janani Balakumar ◽  
S. Vijayarani Mohan

Purpose — Owing to the huge volume of documents available on the internet, text classification becomes a necessary task for handling these documents. To achieve optimal classification results, feature selection, an important stage, is used to curtail the dimensionality of text documents by choosing suitable features. The main purpose of this research work is to classify personal computer documents based on their content.
Design/methodology/approach — This paper proposes a new feature selection algorithm based on the artificial bee colony (ABCFS) to enhance text classification accuracy. The proposed ABCFS algorithm is evaluated on real and benchmark data sets and compared against existing feature selection approaches such as information gain and the χ2 statistic. To demonstrate the efficiency of the proposed algorithm, the support vector machine (SVM) and an improved SVM classifier are used in this paper.
Findings — The experiments were conducted on real and benchmark data sets. The real data set was collected from documents stored on a personal computer, and the benchmark data sets were taken from the Reuters and 20 Newsgroups corpora. The results demonstrate the performance of the proposed feature selection algorithm in improving text document classification accuracy.
Originality/value — This paper proposes the new ABCFS algorithm for feature selection, evaluates its efficiency, and improves the support vector machine. Here, ABCFS is used to select features from (unstructured) text documents; in existing work, ABC-based algorithms have been applied to feature selection for structured data, but not for text. The proposed algorithm classifies documents automatically based on their content.
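A bee-colony search over feature subsets can be sketched as follows. Binary masks are the "food sources"; employed bees refine each source, onlookers revisit good sources more often, and scouts abandon stagnant ones. The toy fitness, parameter values, and function names are all illustrative assumptions — ABCFS itself scores candidate feature sets with an SVM classifier on text data.

```python
import math
import random

def abc_feature_select(fitness, n_features, n_bees=8, iters=40, limit=6):
    # Simplified artificial-bee-colony search over binary masks
    # (1 = keep the feature).
    def neighbor(mask):
        m = mask[:]
        m[random.randrange(n_features)] ^= 1  # flip one feature in or out
        return m

    sources = [[random.randint(0, 1) for _ in range(n_features)]
               for _ in range(n_bees)]
    scores = [fitness(s) for s in sources]
    trials = [0] * n_bees
    best_i = max(range(n_bees), key=lambda i: scores[i])
    best, best_score = sources[best_i][:], scores[best_i]
    for _ in range(iters):
        # Employed bees visit every source; onlooker bees revisit sources
        # with probability proportional to exp(score).
        weights = [math.exp(s) for s in scores]
        visits = list(range(n_bees)) + random.choices(
            range(n_bees), weights, k=n_bees)
        for i in visits:
            cand = neighbor(sources[i])
            c = fitness(cand)
            if c > scores[i]:
                sources[i], scores[i], trials[i] = cand, c, 0
                if c > best_score:
                    best, best_score = cand[:], c
            else:
                trials[i] += 1
        # Scout bees abandon sources that stopped improving.
        for i in range(n_bees):
            if trials[i] > limit:
                sources[i] = [random.randint(0, 1) for _ in range(n_features)]
                scores[i], trials[i] = fitness(sources[i]), 0
    return best, best_score

# Toy fitness rewarding three "relevant" features and penalising the rest
# (a real system would instead score held-out classification accuracy).
RELEVANT = {0, 2, 4}
def toy_fitness(mask):
    return sum(mask[j] for j in RELEVANT) - 0.5 * sum(
        mask[j] for j in range(len(mask)) if j not in RELEVANT)

random.seed(2)
mask, score = abc_feature_select(toy_fitness, n_features=8)
```

Swapping `toy_fitness` for a cross-validated SVM score over a bag-of-words matrix turns this sketch into a wrapper-style text feature selector in the spirit of the paper.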


2019 ◽  
Vol 270 ◽  
pp. 04015
Author(s):  
Edy Anto Soentoro ◽  
Nina Pebriana

Reservoir operations, especially those regulating the outflow (release) volume, are crucial for fulfilling the purposes for which the reservoir was built. To obtain the best results, outflow (release) discharges need to be optimized to meet the objectives of the reservoir operation. A fuzzy rule-based model was used in this study because it can deal with uncertain constraints and objects without clear or well-defined boundaries. The objective of this study is to determine the maximum total release volume based on water availability (i.e., a monthly release equal to or greater than the monthly demand). The case study is the Darma reservoir. A fuzzy rule-based model was used to optimize the monthly release volume, and the result was compared with that of NLP and with the demand. The Sugeno fuzzy method was used to generate fuzzy rules from a given input-output data set consisting of demand, inflow, storage, and release. The results showed that the release from the Sugeno method follows the same basic pattern as the demand, and the release fulfills the demand. The overall result showed that the fuzzy rule-based model with the Sugeno method can be used for optimization based on the real-life experience of experts who are accustomed to working in the field.
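Sugeno (Takagi–Sugeno) inference can be sketched in a few lines: each rule fires with the AND (minimum) of its antecedent memberships and proposes a crisp consequent, and the output is the firing-strength-weighted average. The membership breakpoints and release values below are illustrative placeholders, not the calibrated Darma reservoir model, and inputs are assumed to be percentages of capacity.

```python
def low(x, a, b):
    # Left-shoulder membership: 1 below a, ramping down to 0 at b.
    return 1.0 if x <= a else max(0.0, (b - x) / (b - a))

def high(x, a, b):
    # Right-shoulder membership: 0 below a, ramping up to 1 at b.
    return 1.0 if x >= b else max(0.0, (x - a) / (b - a))

def sugeno_release(inflow, storage):
    # Zero-order Sugeno: rule strength = min of antecedent memberships;
    # crisp output = strength-weighted average of constant consequents.
    rules = [
        (min(low(inflow, 20, 60), low(storage, 20, 60)), 10.0),   # dry, low storage
        (min(low(inflow, 20, 60), high(storage, 40, 80)), 40.0),  # dry, full storage
        (min(high(inflow, 40, 80), low(storage, 20, 60)), 30.0),  # wet, low storage
        (min(high(inflow, 40, 80), high(storage, 40, 80)), 70.0), # wet, full storage
    ]
    num = sum(w * z for w, z in rules)
    den = sum(w for w, _ in rules)
    return num / den if den else 0.0
```

Because the consequents are constants fit from input-output data, this zero-order form interpolates smoothly between the expert rules — e.g. a mid-range month (inflow and storage both 50) blends all four rules rather than picking one.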


2008 ◽  
Vol 2 (4) ◽  
pp. 584-594 ◽  
Author(s):  
J. Geoffrey Chase ◽  
Aaron LeCompte ◽  
Geoffrey M. Shaw ◽  
Amy Blakemore ◽  
Jason Wong ◽  
...  

2015 ◽  
Vol 55 (2) ◽  
pp. 343-353 ◽  
Author(s):  
Martin Lindh ◽  
Fredrik Svensson ◽  
Wesley Schaal ◽  
Jin Zhang ◽  
Christian Sköld ◽  
...  