Introducing a Practical Educational Tool for Correlating Algorithm Time Complexity with Real Program Execution

2018 ◽  
Vol 3 (1) ◽  
pp. 1 ◽  
Author(s):  
Gisela Kurniawati ◽  
Oscar Karnalim

Algorithm time complexity is an important topic for programmers to learn; it determines whether an algorithm is practical to use in a real environment. However, learning this material is not a trivial task. Based on our informal observation of students’ tests, most students could not correlate a Big-Oh equation with real program execution. This paper proposes JCEL, an educational tool that supports learning algorithm time complexity. Using this tool, users can learn how to correlate a Big-Oh equation with real program execution by providing three components: a Java source code, a source code input set, and time complexity equations. According to our evaluation, students feel that JCEL is helpful for learning the correlation between Big-Oh equations and real program execution. Further, the use of Pearson correlation in JCEL shows promising results.

Author(s):  
Elvina Elvina ◽  
Oscar Karnalim

Based on an informal survey, learning algorithm time complexity in a theoretical manner can be rather difficult. Therefore, this research proposed Complexitor, an educational tool for learning algorithm time complexity in a practical manner. Students can learn how to determine an algorithm's time complexity through the actual execution of its implementation. They are only required to provide the algorithm implementation (i.e., source code written in a particular programming language) and test cases. Given this input, Complexitor generates an execution sequence from the test cases and determines the time complexity through Pearson correlation: the time complexity class with the highest correlation value toward the execution sequence is assigned as the result. Based on the evaluation, it can be concluded that this mechanism is quite effective for determining time complexity as long as the distribution of the given input set is balanced.
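The correlation-based matching mechanism described above can be sketched as follows. This is a minimal illustration, not Complexitor's actual code; the candidate set and function names are hypothetical, and measured step counts stand in for the generated execution sequence.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Candidate complexity classes (a hypothetical, reduced set).
CANDIDATES = {
    "O(n)": lambda n: n,
    "O(n log n)": lambda n: n * math.log(n),
    "O(n^2)": lambda n: n * n,
}

def classify(input_sizes, step_counts):
    """Assign the complexity class whose predicted growth correlates
    best with the measured execution-step counts."""
    return max(CANDIDATES,
               key=lambda name: pearson([CANDIDATES[name](n) for n in input_sizes],
                                        step_counts))

# Example: step counts of a quadratic algorithm (n*(n-1)/2 comparisons).
sizes = [10, 50, 100, 500, 1000]
print(classify(sizes, [n * (n - 1) // 2 for n in sizes]))
```

Because Pearson correlation measures linear association, the candidate growth function that is (up to scale and offset) closest to the true cost dominates, which is also why a balanced spread of input sizes matters.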


2017 ◽  
Author(s):  
Oscar Karnalim ◽  
Elvina

Since learning algorithm time complexity in a theoretical manner is rather difficult, an educational tool named Complexitor incorporates an empirical approach for teaching the material. Students can learn how to determine the time complexity of a given algorithm based on its actual execution; they are only required to provide the algorithm implementation and an input set. This paper extends the work on Complexitor by providing a stable interface and a qualitative evaluation. The interface is designed around Complexitor's input and output characteristics, whereas the evaluation takes the form of a survey of 20 undergraduate students. From the students' perspective, Complexitor's features may, to some extent, help them learn algorithm time complexity. Moreover, they also state that the tool fulfills standard application aspects. In other words, the tool is suitable for learning algorithm time complexity.


2021 ◽  
Vol 14 (11) ◽  
pp. 2445-2458
Author(s):  
Valerio Cetorelli ◽  
Paolo Atzeni ◽  
Valter Crescenzi ◽  
Franco Milicchio

We introduce landmark grammars, a new family of context-free grammars aimed at describing the HTML source code of pages published by large, templated websites and therefore at effectively tackling Web data extraction problems. Indeed, they address the inherent ambiguity of HTML, one of the main challenges of Web data extraction, which, despite over twenty years of research, has been largely neglected by the approaches presented in the literature. We then formalize the Smallest Extraction Problem (SEP), an optimization problem for finding the grammar of a family that best describes a set of pages and contextually extracts their data. Finally, we present an unsupervised learning algorithm to induce a landmark grammar from a set of pages sharing a common HTML template, together with an automatic Web data extraction system. Experiments on consolidated benchmarks show that the approach can substantially improve on the state of the art.


10.18060/59 ◽  
2004 ◽  
Vol 5 (1) ◽  
pp. 105-123 ◽  
Author(s):  
Mona Schatz

Portfolios are a valuable educational tool to aid in the integrative experience for graduate social work students. Forty-one graduate students were asked to evaluate their portfolio experience. A Pearson correlation shows that graduate students find the experience of developing a portfolio to be reflective of their second year MSW program (r=.511; p


Author(s):  
Stephan Struckmann ◽  
Mathias Ernst ◽  
Sarah Fischer ◽  
Nancy Mah ◽  
Georg Fuellen ◽  
...  

Abstract Motivation The difficulty of finding new drugs and bringing them to market has led to increased interest in finding new applications for known compounds. Biological samples from many disease contexts have been extensively profiled by transcriptomics, and, intuitively, this motivates the search for compounds with a reversing effect on the expression of characteristic disease genes. However, disease effects may be cell line-specific and also depend on other factors, such as genetics and environment. Transcription profile changes between healthy and diseased cells relate in complex ways to profile changes gathered from cell lines upon stimulation with a drug. Despite these differences, we expect some similarity in the gene regulatory networks at play in both situations. The challenge is to match transcriptomes for diseases and drugs alike, even though the exact molecular pathology/pharmacogenomics may not be known. Results We substitute the challenge of matching a drug effect to a disease effect with the challenge of matching a drug effect to the effect of the same drug at another concentration or in another cell line. This is well-defined, reproducible in vitro and in silico, and extendable with external data. Based on the Connectivity Map (CMap) dataset, we combined 26 different similarity scores with six different heuristics to reduce the number of genes in the model. Such gene filters may also utilize external knowledge, e.g., from biological networks. We found that no similarity score always outperforms all others for all drugs, but the Pearson correlation finds the same drug with the highest reliability. Results are improved by filtering for highly expressed genes and, to a lesser degree, for genes with large fold changes. A network-based reduction of contributing transcripts, here implemented by the FocusHeuristics, was also beneficial.
We found no drop in prediction accuracy when reducing the whole transcriptome to the set of 1000 landmark genes of CMap's successor project, the Library of Integrated Network-based Cellular Signatures. All source code to re-analyze and extend the CMap data, as well as the source code of the heuristics and filters and their evaluation, is available to propel the development of new methods for drug repurposing. Availability https://bitbucket.org/ibima/moldrugeffectsdb Contact [email protected] Supplementary information Supplementary data are available at Briefings in Bioinformatics online.
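The core idea of scoring drug-to-drug matches by Pearson correlation after a gene filter can be sketched as below. This is an illustrative toy, not the authors' pipeline: the filter shown (keep the most highly expressed genes) is one of the strategies the abstract reports as beneficial, and all data and names are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def filter_top_expressed(baseline, profile, k):
    """Gene filter: keep the k genes with the highest baseline expression."""
    keep = sorted(range(len(baseline)), key=lambda i: baseline[i], reverse=True)[:k]
    return [profile[i] for i in keep]

def best_match(query, library, baseline, k):
    """Return the library signature most similar to the query, scoring
    each candidate by Pearson correlation on the filtered gene set."""
    q = filter_top_expressed(baseline, query, k)
    return max(library,
               key=lambda name: pearson(q, filter_top_expressed(baseline,
                                                               library[name], k)))

# Toy fold-change signatures over five genes; the query should retrieve
# the signature of the same (hypothetical) drug at another concentration.
baseline = [5.0, 1.0, 4.0, 0.5, 3.0]
query = [2.0, 9.0, -1.0, 9.0, 0.5]
library = {
    "drugA_other_conc": [1.9, -5.0, -0.9, 3.0, 0.6],
    "drugB": [-2.0, 1.0, 1.0, -4.0, -0.5],
}
print(best_match(query, library, baseline, k=3))
```

The benchmark described in the abstract asks exactly this: given a drug's signature, does the score recover the same drug from the library despite a change of concentration or cell line?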


Cancers ◽  
2020 ◽  
Vol 12 (12) ◽  
pp. 3680
Author(s):  
Marco Cesati ◽  
Francesca Scatozza ◽  
Daniela D’Arcangelo ◽  
Gian Carlo Antonini-Cappellini ◽  
Stefania Rossi ◽  
...  

The identification of reliable and quantitative melanoma biomarkers may aid early diagnosis and directly affect melanoma mortality and morbidity. The aim of the present study was to identify effective biomarkers by investigating the expression of 27 cytokines/chemokines in melanoma compared to healthy controls, in both serum and tissue samples. Serum samples were from 232 patients recruited at the IDI-IRCCS hospital. Expression of the 27 cytokines/chemokines was quantified by xMAP technology and compared to control sera. RNA expression data for the same 27 molecules were obtained from 511 melanoma and healthy tissue samples from the GENT2 database. The statistical analysis involved a three-step approach: analysis of single molecules by the Mann–Whitney test; analysis of paired molecules by Pearson correlation; and profile analysis by the machine learning algorithm Support Vector Machine (SVM). Single-molecule analysis of serum expression identified IL-1b, IL-6, IP-10, PDGF-BB, and RANTES as differentially expressed in melanoma (p < 0.05). Expression of IL-8, GM-CSF, MCP-1, and TNF-α was significantly correlated with Breslow thickness. Eotaxin and MCP-1 were differentially expressed in male vs. female patients. Tissue expression analysis identified very effective marker/predictor genes, namely IL-1Ra, IL-7, MIP-1a, and MIP-1b, with individual AUC values of 0.88, 0.86, 0.93, and 0.87, respectively. SVM analysis of the tissue expression data identified the combination of these four molecules as the most effective signature to discriminate melanoma patients (AUC = 0.98). Validation, using the GEPIA2 database on an additional 1019 independent samples, fully confirmed these observations. The present study demonstrates, for the first time, that the IL-1Ra, IL-7, MIP-1a, and MIP-1b gene signature discriminates melanoma from control tissues with extremely high efficacy. We therefore propose this four-molecule combination as an effective melanoma marker.
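The per-marker AUC values reported above connect directly to the Mann–Whitney statistic used in the first analysis step: for a single marker, the ROC AUC equals the normalized Mann–Whitney U. A minimal sketch (toy values, not the study's data):

```python
def mann_whitney_u(xs, ys):
    """Mann-Whitney U statistic: the number of (x, y) pairs with x > y,
    counting ties as one half."""
    u = 0.0
    for x in xs:
        for y in ys:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u

def marker_auc(case_vals, control_vals):
    """ROC AUC of a single marker: the probability that a randomly chosen
    case sample scores higher than a randomly chosen control sample."""
    return mann_whitney_u(case_vals, control_vals) / (len(case_vals) * len(control_vals))

# Toy expression values for one marker in melanoma vs. control samples.
print(marker_auc([3.0, 4.0, 5.0], [1.0, 2.0, 3.0]))
```

An AUC near 0.5 means the marker carries no discriminative signal for that comparison, while values such as the 0.93 reported for MIP-1a indicate strong separation between the two groups.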


BMC Genomics ◽  
2019 ◽  
Vol 20 (S9) ◽  
Author(s):  
Yang-Ming Lin ◽  
Ching-Tai Chen ◽  
Jia-Ming Chang

Abstract Background Tandem mass spectrometry allows biologists to identify and quantify protein samples in the form of digested peptide sequences. When performing peptide identification, spectral library search is more sensitive than traditional database search but is limited to peptides that have been previously identified. An accurate tandem mass spectrum prediction tool is thus crucial for expanding the peptide space and increasing the coverage of spectral library search. Results We propose MS2CNN, a non-linear regression model based on deep convolutional neural networks, a deep learning algorithm. The features for our model are amino acid composition, predicted secondary structure, and physical-chemical features such as isoelectric point, aromaticity, helicity, hydrophobicity, and basicity. MS2CNN was trained with five-fold cross validation on a three-way data split of the large-scale human HCD MS2 dataset of Orbitrap LC-MS/MS downloaded from the National Institute of Standards and Technology. It was then evaluated on a publicly available independent test dataset of human HeLa cell lysate from LC-MS experiments. On average, our model shows better cosine similarity and Pearson correlation coefficient (0.690 and 0.632) than MS2PIP (0.647 and 0.601) and is comparable with pDeep (0.692 and 0.642). Notably, for the more complex MS2 spectra of 3+ peptides, MS2CNN is significantly better than both MS2PIP and pDeep. Conclusions We showed that MS2CNN outperforms MS2PIP for 2+ and 3+ peptides and pDeep for 3+ peptides. This implies that MS2CNN, the proposed convolutional neural network model, generates highly accurate MS2 spectra for LC-MS/MS experiments using Orbitrap machines, which can be of great help in protein and peptide identification. The results suggest that incorporating more data into the deep learning model may further improve performance.
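The two evaluation metrics used above can be computed as follows; note that the Pearson correlation is simply the cosine similarity of mean-centered vectors. The intensity vectors here are toy values, not real spectra.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two intensity vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def pearson(a, b):
    """Pearson correlation = cosine similarity after mean-centering."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return cosine_similarity([x - ma for x in a], [y - mb for y in b])

# Toy predicted vs. observed fragment-ion intensities for one spectrum.
predicted = [0.1, 0.8, 0.3, 0.0, 0.5]
observed = [0.2, 0.7, 0.4, 0.1, 0.6]
print(cosine_similarity(predicted, observed), pearson(predicted, observed))
```

Cosine similarity rewards matching the overall intensity pattern regardless of scale, while Pearson additionally discounts any constant intensity offset between the predicted and observed spectra.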


2021 ◽  
Author(s):  
Jian Song ◽  
Changbin Yu

Abstract Label-free mass spectrometry-based proteomics data inevitably suffer from missing values, which prevent downstream analyses that require a complete data matrix. Our motivation is to apply the state-of-the-art machine learning algorithm XGBoost to build an imputation method that improves imputation accuracy. In practice, however, XGBoost has many parameters that need to be tuned to deliver its potentially high performance. Although cross validation may find the best parameters, it is time-consuming. Alternatively, we empirically determined the parameters for two kinds of XGBoost base learners. To explore the robustness and performance of XGBoost-based imputation with predetermined parameters, we conducted tests on three benchmark datasets. For comparison, six common imputation methods were also evaluated in terms of normalized root mean squared error and Pearson correlation coefficient. The comparative results indicate that, under the empirical parameters, the XGBoost-based imputation method using the linear base learner is competitive with or outperforms its competitors, including random forest-based imputation, by achieving smaller imputation errors and better structure preservation on the three benchmark datasets.
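The error metric used for the comparison can be sketched as below, together with a trivial mean-imputation baseline. This uses one common NRMSE convention (normalizing by the standard deviation of the ground truth); the paper's exact definition may differ, and the data here are hypothetical.

```python
import math

def nrmse(true_vals, imputed_vals):
    """Root mean squared error over the originally-missing entries,
    normalized by the standard deviation of the true values."""
    n = len(true_vals)
    mse = sum((t, i) == (t, i) and (t - i) ** 2 for t, i in zip(true_vals, imputed_vals)) / n
    mean_t = sum(true_vals) / n
    var_t = sum((t - mean_t) ** 2 for t in true_vals) / n
    return math.sqrt(mse / var_t)

def mean_impute(column):
    """Baseline imputation: replace missing entries (None) with the column mean."""
    observed = [v for v in column if v is not None]
    m = sum(observed) / len(observed)
    return [m if v is None else v for v in column]

# Hypothetical benchmark step: mask values, impute, score against the truth.
truth = [1.0, 2.0, 3.0]
imputed = mean_impute([1.0, None, 3.0])
print(nrmse(truth, imputed))
```

A lower NRMSE means the imputed values sit closer to the masked ground truth; the Pearson correlation complements it by checking whether the imputation preserves the data's overall structure rather than just its magnitudes.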


2021 ◽  
Vol 2021 ◽  
pp. 1-9
Author(s):  
Babacar Gaye ◽  
Dezheng Zhang ◽  
Aziguli Wulamu

With the rapid development of the Internet and of big data analysis technology, data mining has played a positive role in both industry and academia. Classification is an important problem in data mining. This paper explores the background and theory of support vector machines (SVM) as a data mining classification algorithm and analyzes and summarizes the research status of various improved SVM methods. According to the scale and characteristics of the data, different solution spaces are selected, and the solution of the dual problem is transformed into the classification surface of the original space to improve the algorithm's speed. Research Process. Incorporating fuzzy membership into multi-kernel learning, we find that the time complexity of the primal problem is determined by the dimension, while the time complexity of the dual problem is determined by the number of samples; since dimension and sample count together constitute the scale of the data, different solution spaces can be chosen based on the data's scale and characteristics. The algorithm's speed can be improved by transforming the solution of the dual problem into the classification surface of the original space. Conclusion. By improving the computation rate of traditional machine learning algorithms, the fitting accuracy between predicted and actual values reaches 98%, enabling traditional machine learning algorithms to meet the requirements of the big data era and to be widely used in big data contexts.


2020 ◽  
Author(s):  
Robert Kanko ◽  
Gerda Strutzenberger ◽  
Marcus Brown ◽  
Scott Selbie ◽  
Kevin Deluzio

Spatiotemporal parameters can characterize the gait patterns of individuals, allowing assessment of their health status and detection of clinically meaningful changes in their gait. Video-based markerless motion capture is a user-friendly, inexpensive, and widely applicable technology that could reduce the barriers to measuring spatiotemporal gait parameters in clinical and more diverse settings. Two studies were performed to determine whether gait parameters measured using markerless motion capture demonstrate concurrent validity with those measured using marker-based motion capture and pressure-sensitive gait mats. For the first study, thirty healthy adults performed treadmill gait at self-selected speeds while marker-based motion capture and synchronized video data were recorded simultaneously. For the second study, twenty-five healthy adults performed overground gait at self-selected speeds while footfalls were recorded using a gait mat and synchronized video data were recorded simultaneously. Kinematic heel-strike and toe-off gait events were used to identify the same gait cycles between systems. Nine spatiotemporal gait parameters were measured by each system and directly compared between systems. Measurements were compared using Bland-Altman methods, mean differences, Pearson correlation coefficients, and intraclass correlation coefficients. The results indicate that markerless measurements of spatiotemporal gait parameters have good to excellent agreement with marker-based motion capture and gait mat systems, except for stance time and double limb support time relative to both systems and stride width relative to the gait mat. These findings indicate that markerless motion capture can adequately measure spatiotemporal gait parameters during treadmill and overground gait.
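The Bland-Altman comparison mentioned above reduces each parameter to a bias (the mean difference between systems) and 95% limits of agreement. A minimal sketch with hypothetical paired step-length measurements (not the study's data):

```python
import math

def bland_altman(system_a, system_b):
    """Bland-Altman agreement statistics for paired measurements:
    returns (bias, (lower_loa, upper_loa)), where the limits of
    agreement are bias +/- 1.96 * SD of the pairwise differences."""
    diffs = [a - b for a, b in zip(system_a, system_b)]
    n = len(diffs)
    bias = sum(diffs) / n
    sd = math.sqrt(sum((d - bias) ** 2 for d in diffs) / (n - 1))
    return bias, (bias - 1.96 * sd, bias + 1.96 * sd)

# Hypothetical step lengths (m) from markerless vs. marker-based capture.
markerless = [0.71, 0.68, 0.74, 0.70, 0.69]
marker_based = [0.70, 0.69, 0.73, 0.70, 0.70]
bias, (lower, upper) = bland_altman(markerless, marker_based)
print(bias, lower, upper)
```

A bias near zero with narrow limits of agreement indicates the two systems can be used interchangeably for that parameter; wide limits, as reported here for stance time and double limb support time, flag parameters where the systems disagree.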

