Collaborating All the Way to the Top

Innovations in Teaching & Learning Conference Proceedings ◽

10.13021/g8330k ◽

2014 ◽

Vol 6 ◽

Author(s):

Huzefa Rangwala

Keyword(s):

Data Mining ◽

Predictive Models ◽

Structure Prediction ◽

Predictive Performance ◽

General Purpose ◽

Sequence Information ◽

Base Line ◽

Specific Class ◽

Truth Values ◽

Final Project

The classes I teach have a predictive modeling component. Having participated as a student in blind protein structure prediction competitions (CASP, http://predictioncenter.org) and data mining competitions like KDD Cup, I have implemented this form of competition in my bioinformatics and data mining classes. This semester I extended the idea to a different class (Parallel Computing). Specifically, as part of an assignment (or final project), students have to train predictive models to distinguish a specific class of proteins called "solenoids" using the available protein sequence information. As part of this competition, the truth values are hidden from the students, who have to make a prediction (guess) and submit their results to the instructor. The instructor then evaluates the results against the truth values and ranks the students in the class by predictive performance. The concepts introduced in class allow the students to build baseline predictive models, but to improve performance, students have to research, think critically, and come up with innovative solutions. In my past two implementations of this project, I used an in-house evaluation script and asked participants to send me solutions via a simple web server. Both times, the assignment ran for a 4-week period. I have also used platforms like Kaggle to set up this competition. In the future, I would like to run the competition for the duration of the semester. Students would be taught a concept and would then apply it to improve their predictive models, engineering better solutions as new, advanced concepts are taught. I am also developing a model that allows students to carry out these projects collaboratively by enabling resources like wikis and other tools. As such, this session will be an introduction to the tools used and how they could be adapted to general-purpose classes.
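The evaluation and ranking step described above can be sketched as follows. This is a minimal illustration, not the instructor's in-house script: the function names, the choice of accuracy as the metric, and the sample submissions are all assumptions made for the example.

```python
def accuracy(truth, predicted):
    """Fraction of predictions that match the hidden truth values."""
    if len(truth) != len(predicted):
        raise ValueError("submission length does not match the truth values")
    correct = sum(1 for t, p in zip(truth, predicted) if t == p)
    return correct / len(truth)


def rank_submissions(truth, submissions):
    """Return (student, score) pairs sorted from best to worst score."""
    scores = {name: accuracy(truth, preds) for name, preds in submissions.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical data: 1 = solenoid, 0 = non-solenoid.
hidden_truth = [1, 0, 1, 1, 0, 0, 1, 0]
submissions = {
    "student_a": [1, 0, 1, 0, 0, 0, 1, 0],
    "student_b": [1, 1, 1, 1, 0, 1, 0, 0],
    "student_c": [1, 0, 1, 1, 0, 0, 1, 0],
}

leaderboard = rank_submissions(hidden_truth, submissions)
for position, (name, score) in enumerate(leaderboard, start=1):
    print(f"{position}. {name}: {score:.3f}")
```

Only the instructor's side holds `hidden_truth`; students see their position on the leaderboard, not the labels. For an imbalanced class such as solenoids, a metric other than accuracy may be more appropriate.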

The economics of selection of mail orders

Drs. Zahavi and Levin are the masterminds behind the development of AMOS, a customized predictive modeling system for the Franklin Mint in Philadelphia, and GainSmarts, a general purpose data mining system that is the two-time winner of the KDD-CUP competition for the best data mining tools (1997 and 1998) sponsored by the American Association for Artificial Intelligence.

Journal of Interactive Marketing ◽

10.1002/dir.1016.abs ◽

2001 ◽

Vol 15 (3) ◽

pp. 53

Author(s):

Nissan Levin ◽

Jacob Zahavi

Keyword(s):

Artificial Intelligence ◽

Data Mining ◽

Predictive Modeling ◽

American Association ◽

General Purpose ◽

Mining System ◽

Data Mining System ◽

Mining Tools ◽

Selection Of

Recognizing Job Apathy Patterns of Iraqi Higher Education Employees Using Data Mining Techniques

Journal of Southwest Jiaotong University ◽

10.35741/issn.0258-2724.54.4.30 ◽

2019 ◽

Vol 54 (4) ◽

Author(s):

Mustafa S. Abd ◽

Suhad Faisal Behadili

Keyword(s):

Higher Education ◽

Data Mining ◽

Psychiatric Patients ◽

Human Life ◽

A Priori ◽

Psychological Research ◽

Attribute Selection ◽

Specific Class ◽

Data Mining Techniques ◽

Using Data

Psychological research centers help indirectly contact professionals from the fields of human life, job environment, family life, and psychological infrastructure for psychiatric patients. This research aims to detect job apathy patterns from the behavior of employee groups in the University of Baghdad and the Iraqi Ministry of Higher Education and Scientific Research. The investigation presents an approach that uses data mining techniques to acquire new knowledge; it differs from statistical studies in that it supports the researchers' evolving needs. These techniques handle redundant or irrelevant attributes in order to discover interesting patterns. The central task is to identify several important and affective questions from a questionnaire; these questions are recommended by psychiatric researchers. Useless questions are pruned using the attribute selection method. Moreover, the information gained through these questions is measured with respect to a specific class, and the questions are ranked accordingly. Association rule mining with the Apriori algorithm is used to detect the most influential and interrelated questions in the questionnaire. Consequently, the decisive parameters that may lead to job apathy are determined.
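Ranking questions by the information they provide about a class can be sketched as below. The abstract does not name the exact measure, so this example assumes information gain (entropy reduction); the question names, answers, and class labels are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(values, labels):
    """Entropy of the class minus the entropy left after splitting on an attribute."""
    total = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(group) / total * entropy(group) for group in groups.values())
    return entropy(labels) - remainder


# Hypothetical questionnaire answers and the class each respondent belongs to.
answers = {
    "q1": ["yes", "yes", "no", "no"],
    "q2": ["yes", "no", "yes", "no"],
}
apathy = ["high", "high", "low", "low"]

# Rank questions from most to least informative about the class.
ranking = sorted(answers, key=lambda q: information_gain(answers[q], apathy), reverse=True)
print(ranking)
```

Here `q1` separates the two classes perfectly and `q2` carries no information about them, so a pruning step would keep `q1` and drop `q2`.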

Combining Regional Habitat Selection Models for Large-Scale Prediction: Circumpolar Habitat Selection of Southern Ocean Humpback Whales

Remote Sensing ◽

10.3390/rs13112074 ◽

2021 ◽

Vol 13 (11) ◽

pp. 2074

Author(s):

Ryan R. Reisinger ◽

Ari S. Friedlaender ◽

Alexandre N. Zerbini ◽

Daniel M. Palacios ◽

Virginia Andrews-Goff ◽

...

Keyword(s):

Habitat Selection ◽

Predictive Models ◽

Regional Variation ◽

Large Scale ◽

Predictive Performance ◽

Humpback Whale ◽

Machine Learning Algorithms ◽

Humpback Whales ◽

Environmental Covariates ◽

Animal Habitat

Machine learning algorithms are often used to model and predict animal habitat selection, that is, the relationships between animal occurrences and habitat characteristics. For broadly distributed species, habitat selection often varies among populations and regions; thus, it would seem preferable to fit region- or population-specific models of habitat selection for more accurate inference and prediction, rather than fitting large-scale models using pooled data. However, where the aim is to make range-wide predictions, including areas for which there are no existing data or models of habitat selection, how can regional models best be combined? We propose that ensemble approaches commonly used to combine different algorithms for a single region can be reframed, treating regional habitat selection models as the candidate models. By doing so, we can incorporate regional variation when fitting predictive models of animal habitat selection across large ranges. We test this approach using satellite telemetry data from 168 humpback whales across five geographic regions in the Southern Ocean. Using random forests, we fitted a large-scale model relating humpback whale locations, versus background locations, to 10 environmental covariates, and made a circumpolar prediction of humpback whale habitat selection. We also fitted five regional models, the predictions of which we used as input features for four ensemble approaches: an unweighted ensemble, an ensemble weighted by environmental similarity in each cell, stacked generalization, and a hybrid approach wherein the environmental covariates and regional predictions were used as input features in a new model. We tested the predictive performance of these approaches on an independent validation dataset of humpback whale sightings and whaling catches. These multiregional ensemble approaches resulted in models with higher predictive performance than the circumpolar naive model. These approaches can be used to incorporate regional variation in animal habitat selection when fitting range-wide predictive models using machine learning algorithms. This can yield more accurate predictions across regions or populations of animals that may show variation in habitat selection.
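The first two ensemble approaches named above can be sketched for a single grid cell as follows. This is an illustrative sketch only: the abstract does not give the similarity measure or weighting formula, so the simple normalized weighted mean, the function names, and the numbers are assumptions.

```python
def unweighted_ensemble(regional_predictions):
    """Plain average of the regional model predictions for one grid cell."""
    return sum(regional_predictions) / len(regional_predictions)


def weighted_ensemble(regional_predictions, similarities):
    """Average weighted by each region's environmental similarity to the cell."""
    total = sum(similarities)
    if total == 0:
        # No region resembles this cell; fall back to the plain average.
        return unweighted_ensemble(regional_predictions)
    return sum(p * w for p, w in zip(regional_predictions, similarities)) / total


# Hypothetical habitat-selection predictions from five regional models for one cell,
# and a hypothetical similarity score between the cell and each training region.
cell_predictions = [0.2, 0.4, 0.6, 0.8, 1.0]
cell_similarity = [0.0, 0.0, 1.0, 1.0, 2.0]

unweighted = unweighted_ensemble(cell_predictions)
weighted = weighted_ensemble(cell_predictions, cell_similarity)
print(f"unweighted: {unweighted:.2f}, weighted: {weighted:.2f}")
```

In this example the weighted estimate is pulled toward the predictions of the regions that most resemble the cell, which is the motivation for weighting by environmental similarity.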

K-Means Algorithm for Clustering Student Final Projects Based on Expertise (Algoritma K-Means Untuk Klasterisasi Tugas Akhir Mahasiswa Berdasarkan Keahlian)

Jurnal Sistim Informasi dan Teknologi ◽

10.37034/jsisfotek.v1i3.5 ◽

2019 ◽

pp. 24-29

Author(s):

Weri Sirait ◽

Sarjon Defit ◽

Gunadi Widi Nurcahyo

Keyword(s):

Higher Education ◽

Data Mining ◽

Private University ◽

Clustering Method ◽

Final Project ◽

Final Level ◽

Database Administrator ◽

Mapping System ◽

Education Service ◽

Service Institution

The School of Information and Computer Management (STMIK) Indonesia Padang is a private university under the auspices of the Higher Education Service Institution (LLDIKTI) Region X, producing graduates who are competent as system analysts and database administrators. To graduate, final-year undergraduate (S1) students must complete a final project or thesis. Final-year students at STMIK Indonesia Padang are often confused when choosing a final project topic, because they have not yet been able to match their potential to a topic. In this study, the researchers grouped final-year students using the K-means clustering data mining technique. The grouping uses course grade data mapped to the system analyst and database administrator fields. It produces two clusters: students who take a system analyst final project and students who take a database administrator final project. With this K-means clustering method, students have direction in choosing a final project topic. Of the 40 data samples used, 20 students were assigned to the system analyst final project topic and 20 to the database administrator topic.
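The two-cluster grouping described above can be sketched with a small K-means implementation. This is a minimal sketch, not the authors' procedure: the grade pairs, the choice of initial centroids, and the function name are hypothetical.

```python
def kmeans(points, centroids, iterations=100):
    """Basic K-means: assign points to the nearest centroid, then recompute centroids."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for point in points:
            # Squared Euclidean distance to every centroid.
            distances = [sum((a - b) ** 2 for a, b in zip(point, c)) for c in centroids]
            clusters[distances.index(min(distances))].append(point)
        new_centroids = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster)) if cluster else centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
        if new_centroids == centroids:
            break  # Assignments are stable.
        centroids = new_centroids
    return centroids, clusters


# Hypothetical (system analyst grade, database administrator grade) per student.
grades = [(85, 60), (90, 65), (80, 55), (60, 88), (55, 92), (65, 84)]

# Assumption: seed one centroid from each apparent group.
centroids, clusters = kmeans(grades, [grades[0], grades[3]])
print("system analyst cluster:", clusters[0])
print("database administrator cluster:", clusters[1])
```

Each student lands in the cluster whose centroid is closest to their grade profile, which is the basis for suggesting a final project topic.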

A method for RNA structure prediction shows evidence for structure in lncRNAs

10.1101/284869 ◽

2018 ◽

Author(s):

Riccardo Delli Ponti ◽

Alexandros Armaos ◽

Stefanie Marti ◽

Gian Gaetano Tartaglia

Keyword(s):

Secondary Structure ◽

Rna Structure ◽

Structure Prediction ◽

Sequence Information ◽

Time Warping ◽

Single Nucleotide ◽

Rna Molecules ◽

Rna Structure Prediction ◽

Dynamic Time ◽

Nucleotide Resolution

To compare the secondary structures of RNA molecules, we developed the CROSSalign method. CROSSalign combines the Computational Recognition Of Secondary Structure (CROSS) algorithm, which predicts RNA secondary structure at single-nucleotide resolution from sequence information, with the Dynamic Time Warping (DTW) method, which aligns profiles of different lengths. We applied CROSSalign to investigate the structural conservation of long non-coding RNAs such as XIST and HOTAIR, as well as ssRNA viruses including HIV. In a pool of sequences with the same secondary structure, CROSSalign accurately recognizes repeat A of XIST and domain D2 of HOTAIR and outperforms other methods based on covariance modelling. CROSSalign can be applied to perform pair-wise comparisons and is able to find homologues between thousands of matches, identifying the exact regions of similarity between profiles of different lengths. The algorithm is freely available at the webpage http://service.tartaglialab.com//new_submission/CROSSalign.
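How DTW compares profiles of different lengths can be sketched with the textbook dynamic-programming formulation below. This is not the CROSSalign implementation: the abstract does not describe which DTW variant or normalization it uses, and the profiles here are hypothetical stand-ins for per-nucleotide structure scores.

```python
def dtw_distance(profile_a, profile_b):
    """Classic DTW: minimum cumulative absolute difference over all monotonic alignments."""
    n, m = len(profile_a), len(profile_b)
    inf = float("inf")
    # cost[i][j] = best cost of aligning the first i values of a with the first j of b.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(profile_a[i - 1] - profile_b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # stretch profile_b
                cost[i][j - 1],      # stretch profile_a
                cost[i - 1][j - 1],  # advance both
            )
    return cost[n][m]


# Hypothetical profiles: the first is a stretched copy of the second.
print(dtw_distance([0, 1, 1, 0], [0, 1, 0]))
print(dtw_distance([0, 0], [1, 1]))
```

Because one position may be matched to several positions in the other profile, a stretched copy of a profile aligns at zero cost, which a position-by-position comparison could not achieve for sequences of unequal length.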

FingerprintContacts: Predicting Alternative Conformations of Proteins from Coevolution

10.1101/2020.04.13.037234 ◽

2020 ◽

Author(s):

Jiangyan Feng ◽

Diwakar Shukla

Keyword(s):

Ligand Binding ◽

Structure Prediction ◽

De Novo ◽

Three Dimensional ◽

Sequence Information ◽

Structural Constraints ◽

Complex Signals ◽

Residue Contacts ◽

Small Clusters ◽

Functional Mechanisms

Proteins are dynamic molecules which perform diverse molecular functions by adopting different three-dimensional structures. Recent progress in residue-residue contact prediction opens up new avenues for de novo protein structure prediction from sequence information. However, it is still difficult to predict more than one conformation from residue-residue contacts alone. This is due to the inability to deconvolve the complex signals of residue-residue contacts, i.e. spatial contacts relevant for protein folding, conformational diversity, and ligand binding. Here, we introduce a machine-learning-based method, called FingerprintContacts, for extending the capabilities of residue-residue contacts. This algorithm leverages two features of residue-residue contacts: (1) a single conformation outperforms the others in structure prediction when all the top-ranking residue-residue contacts are used as structural constraints, and (2) conformation-specific contacts rank lower and constitute a small fraction of residue-residue contacts. We demonstrate the capabilities of FingerprintContacts on eight ligand-binding proteins with varying conformational motions. Furthermore, FingerprintContacts identifies small clusters of residue-residue contacts which are preferentially located in the dynamically fluctuating regions. With the rapid growth in protein sequence information, we expect FingerprintContacts to be a powerful first step in the structural understanding of protein functional mechanisms.

Coupled 3-D CFD-DDPM Numerical Simulation of Turbulent Swirling Gas-Particle Flow Within Cyclone Suspension Preheater of Cement Kilns

Volume 1A, Symposia: Turbomachinery Flow Simulation and Optimization; Applications in CFD; Bio-Inspired and Bio-Medical Fluid Mechanics; CFD Verification and Validation; Development and Applications of Immersed Boundary Methods; DNS, LES and Hybrid RANS/LES Methods; Fluid Machinery; Fluid-Structure Interaction and Flow-Induced Noise in Industrial Applications; Flow Applications in Aerospace; Active Fluid Dynamics and Flow Control — Theory, Experiments and Implementation ◽

10.1115/fedsm2016-7596 ◽

2016 ◽

Author(s):

Eugen-Dan Cristea ◽

Pierangelo Conti

Keyword(s):

Numerical Simulation ◽

Collection Efficiency ◽

Three Dimensional ◽

Predictive Performance ◽

Reynolds Stress Model ◽

General Purpose ◽

Cement Kiln ◽

Discrete Phase Model ◽

Ansys Fluent ◽

Cement Kilns

The paper presents a three-dimensional (3-D), time-dependent Euler-Lagrange multiphase approach for high-fidelity numerical simulation of strongly swirling, turbulent, heavy dust-laden flows within large-sized cyclone separators, as components of the state-of-the-art suspension preheaters (SPH) of cement kilns. The case study evaluates the predictive performance of the coupled hybrid 3-D computational fluid dynamics–dense discrete phase model (CFD-DDPM) approach implemented in the commercial general-purpose code ANSYS-Fluent R16.2, when applied to industrial cyclone collectors used to separate particles from gaseous streams. The gas (flue gas) flow is addressed numerically by using traditional CFD methods to solve the finite volume unsteady Reynolds-averaged Navier-Stokes (FV-URANS) equations. The multiphase turbulence is modeled by using an option of the Reynolds stress model (RSM), namely the dispersed turbulence model. The motion of the discrete (granular) phase is captured by the DDPM methodology. The twin cyclones of the SPH top-most stage have been analyzed extensively, both for the overall pressure drop and global collection efficiency and for the very complex multiphase flow patterns established inside this equipment. The numerical simulation results have been verified and partially validated against an available set of typical industrial measurements collected during a heat and mass balance (H&MB) of the cement kiln.

Prediction of Structural and Functional Aspects of Protein

Advances in Secure Computing, Internet Services, and Applications - Advances in Information Security, Privacy, and Ethics ◽

10.4018/978-1-4666-4940-8.ch016 ◽

2014 ◽

pp. 317-333

Author(s):

Arun G. Ingale

Keyword(s):

Protein Structure ◽

Protein Structure Prediction ◽

Structure Prediction ◽

Tertiary Structure ◽

Protein Structures ◽

Three Dimensional ◽

Dimensional Structure ◽

Sequence Information ◽

Predict Protein Structure ◽

Basic Ideas

Predicting the structure of a protein from its primary amino acid sequence is computationally difficult. An investigation of the methods and algorithms used to predict protein structure and a thorough knowledge of the function and structure of proteins are critical for the advancement of biology and the life sciences, as well as for the development of better drugs, higher-yield crops, and even synthetic bio-fuels. To that end, this chapter sheds light on the methods used for protein structure prediction. It covers the applications of modeled protein structures and examines the relationship between pure sequence information and three-dimensional structure, which continues to be one of the greatest challenges in molecular biology. It presents a broad examination of the problems, methods, tools, servers, databases, and applications of protein structure prediction, giving insight into the future applications of modeled protein structures. In this chapter, current protein structure prediction methods are reviewed for a milieu on structure prediction, the prediction of structural fundamentals, tertiary structure prediction, and functional imminent. The basic ideas and advances of these directions are discussed in detail.

Proposal of Analytical Model for Business Problems Solving in Big Data Environment

Web Services ◽

10.4018/978-1-5225-7501-6.ch034 ◽

2019 ◽

pp. 618-638

Author(s):

Goran Klepac ◽

Kristi L. Berg

Keyword(s):

Data Mining ◽

Big Data ◽

Predictive Models ◽

Analytical Approach ◽

Fraud Detection ◽

Analytical Techniques ◽

Data Sources ◽

Business Decisions ◽

Mining Projects ◽

Structured Approach

This chapter proposes a new analytical approach that consolidates the traditional analytical approach to problems such as churn detection, fraud detection, predictive model building, and segmentation modeling with data sources and analytical techniques from the big data area. The solutions presented offer a structured approach for integrating these different concepts into one, which helps analysts as well as managers to use the potential of different areas in a systematic way. By using this concept, companies have the opportunity to introduce big data potential into everyday data mining projects. As the chapter shows, neglecting the potential of big data often leads to incomplete analytical results, which mean incomplete information for business decisions and can lead to bad business decisions. The chapter also provides suggestions on how to recognize useful data sources from the big data area and how to analyze them along with traditional data sources to obtain higher-quality information for business decisions.

Sports result prediction using data mining techniques in comparison with base line model

OPSEARCH ◽

10.1007/s12597-020-00470-9 ◽

2020 ◽

Author(s):

Praphula Kumar Jain ◽

Waris Quamer ◽

Rajendra Pamula

Keyword(s):

Data Mining ◽

Base Line ◽

Line Model ◽

Data Mining Techniques ◽

Using Data
