Women and men — Different but equal: On the impact of identifier style on source code reading

JAVASCRIPT ASYNCHRONOUS PROGRAMMING

HUE UNIVERSITY JOURNAL OF SCIENCE TECHNIQUES AND TECHNOLOGY ◽

10.26459/hueuni-jtt.v128i2b.5104 ◽

2019 ◽

Vol 128 (2B) ◽

pp. 5-16

Author(s):

Tran Thanh Luong ◽

Le My Canh

Keyword(s):

Design Patterns ◽

Source Code ◽

Object Oriented ◽

Exception Handling ◽

Code Smells ◽

Asynchronous Programming ◽

Event Driven ◽

Programming Patterns ◽

The Impact

JavaScript has become increasingly popular in recent years because of its rich feature set: it is dynamic, interpreted, and object-oriented, with first-class functions. Furthermore, JavaScript is designed around an event-driven, non-blocking I/O model that boosts overall application performance, especially in the case of Node.js. To take advantage of these characteristics, many design patterns that implement asynchronous programming in JavaScript have been proposed. However, choosing the right pattern and writing good asynchronous source code is a challenge, and mistakes easily lead to less robust applications and low-quality source code. Extending our previous work on exception handling code smells in JavaScript and in JavaScript asynchronous programming with promises, this research studies the impact of three JavaScript asynchronous programming patterns on the quality of source code and applications.


Genome scaffolding with PE-contaminated mate-pair libraries

10.1101/025650 ◽

2015 ◽

Cited By ~ 1

Author(s):

Kristoffer Sahlin ◽

Rayan Chikhi ◽

Lars Arvestad

Keyword(s):

Source Code ◽

Linear Programs ◽

Read Pair ◽

Insert Size ◽

Essential Step ◽

Genome Scaffolding ◽

Mate Pair ◽

A Genome ◽

The Impact ◽

Adapter Sequence

Scaffolding is often an essential step in a genome assembly process, in which contigs are ordered and oriented using read pairs from a combination of paired-end libraries and longer-range mate-pair libraries. Although a simple idea, scaffolding is unfortunately hard to get right in practice. One source of problems is so-called PE-contamination in mate-pair libraries, in which a non-negligible fraction of the read pairs get the wrong orientation and a much smaller insert size than expected. This contamination has been discussed in previous work on integrated scaffolders in end-to-end assemblers such as Allpaths-LG and MaSuRCA, but these methods rely on the orientation being observable, e.g., by finding the junction adapter sequence in the reads. This is not always the case, making the orientation and insert size of a read pair stochastic. Furthermore, work on modeling PE-contamination has so far been disregarded in stand-alone scaffolders, and the effect that PE-contamination has on scaffolding quality has not been examined before. We have addressed PE-contamination in an update of our scaffolder BESST. We formulate the problem as an Integer Linear Program (ILP) and use characteristics of the problem, such as contig lengths and insert size, to efficiently solve the ILP using a linear number (with respect to the number of contigs) of Linear Programs. Our results show significant improvement over both integrated and stand-alone scaffolders. The impact of modeling PE-contamination is quantified by comparison with the previous BESST model. We also show how other scaffolders are vulnerable to PE-contaminated libraries, resulting in an increased number of misassemblies, more conservative scaffolding, and inflated assembly sizes. The model is implemented in BESST. Source code and usage instructions are found at https://github.com/ksahlin/BESST. BESST can also be downloaded using PyPI.


Encoding Health Records into Pathway Representations for Deep Learning

10.3233/shti210800 ◽

2021 ◽

Author(s):

Marco Luca Sbodio ◽

Natasha Mulligan ◽

Stefanie Speichert ◽

Vanessa Lopez ◽

Joao Bettencourt-Silva

Keyword(s):

Neural Network ◽

Machine Learning ◽

Deep Learning ◽

Source Code ◽

Training Dataset ◽

Health Records ◽

Learning Tasks ◽

Patient Pathways ◽

Computational Resources ◽

The Impact

There is a growing trend in building deep learning patient representations from health records to obtain a comprehensive view of a patient’s data for machine learning tasks. This paper proposes a reproducible approach to generate patient pathways from health records and to transform them into a machine-processable image-like structure useful for deep learning tasks. Based on this approach, we generated over a million pathways from FAIR synthetic health records and used them to train a convolutional neural network (CNN). Our initial experiments show that the accuracy of the CNN on a prediction task is comparable to or better than other autoencoders trained on the same data, while requiring significantly fewer computational resources for training. We also assess the impact of the size of the training dataset on autoencoder performance. The source code for generating pathways from health records is provided as open source.


How different are different diff algorithms in Git?

Empirical Software Engineering ◽

10.1007/s10664-019-09772-z ◽

2019 ◽

Vol 25 (1) ◽

pp. 790-823

Author(s):

Yusuf Sulistyo Nugroho ◽

Hideaki Hata ◽

Kenichi Matsumoto

Keyword(s):

Source Code ◽

Version Control ◽

Automatic Identification ◽

Systematic Mapping ◽

Patch Application ◽

Basic Task ◽

Manual Analysis ◽

Version Control System ◽

Change Identification ◽

The Impact

Automatic identification of the differences between two versions of a file is a common and basic task in several applications of mining code repositories. Git, a version control system, has a diff utility, and users can select among diff algorithms ranging from the default Myers algorithm to the advanced Histogram algorithm. From our systematic mapping, we identified three popular applications of diff in recent studies. Regarding the impact on code churn metrics in 14 Java projects, we obtained different values in 1.7% to 8.2% of commits depending on the diff algorithm. Regarding bug-introducing change identification, we found that 6.0% and 13.3% of the identified bug-fix commits had different bug-introducing change results in 10 Java projects. For patch application, our manual analysis found that Histogram is more suitable than Myers for presenting code changes. Thus, we strongly recommend using the Histogram algorithm when mining Git repositories to consider differences in source code.
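As a rough illustration of how a churn comparison like the one described above might be set up, the following Python sketch builds a `git diff` command with the `--diff-algorithm` option and sums the added and deleted line counts from `--numstat` output. The function names and the churn definition (added plus deleted lines per revision pair) are assumptions made for this example, not the authors' tooling.

```python
import subprocess
from typing import List, Tuple

# Values accepted by git's --diff-algorithm option.
ALGORITHMS = ("myers", "minimal", "patience", "histogram")


def diff_command(old: str, new: str, algorithm: str = "histogram") -> List[str]:
    """Build a `git diff --numstat` command that uses the chosen algorithm."""
    if algorithm not in ALGORITHMS:
        raise ValueError(f"unknown diff algorithm: {algorithm}")
    return ["git", "diff", "--numstat", f"--diff-algorithm={algorithm}", old, new]


def parse_numstat(output: str) -> Tuple[int, int]:
    """Sum (added, deleted) line counts from `git diff --numstat` output.

    Hypothetical churn definition for illustration only.
    """
    added = deleted = 0
    for line in output.splitlines():
        parts = line.split("\t")
        if len(parts) < 3 or parts[0] == "-":
            continue  # skip blank lines and binary files ("-\t-\tpath")
        added += int(parts[0])
        deleted += int(parts[1])
    return added, deleted


def churn_between(old: str, new: str, algorithm: str) -> Tuple[int, int]:
    """Run git in the current repository and measure churn for one algorithm."""
    result = subprocess.run(
        diff_command(old, new, algorithm),
        capture_output=True, text=True, check=True,
    )
    return parse_numstat(result.stdout)
```

Calling `churn_between("HEAD~1", "HEAD", "myers")` and `churn_between("HEAD~1", "HEAD", "histogram")` inside a repository would show whether the two algorithms report different churn for the same commit.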


THE IMPACT OF SOURCE CODE NORMALIZATION ON MAIN CONTENT EXTRACTION

Proceedings of the 8th International Conference on Web Information Systems and Technologies ◽

10.5220/0003931906770682 ◽

2012 ◽

Keyword(s):

Source Code ◽

Content Extraction ◽

The Impact


Space profiling for parallel functional programs

Journal of Functional Programming ◽

10.1017/s0956796810000146 ◽

2010 ◽

Vol 20 (5-6) ◽

pp. 417-461 ◽

Cited By ~ 6

Author(s):

DANIEL SPOONHOWER ◽

GUY E. BLELLOCH ◽

ROBERT HARPER ◽

PHILLIP B. GIBBONS

Keyword(s):

Resource Use ◽

Source Code ◽

Semantic Space ◽

Runtime System ◽

Scheduling Policies ◽

Use Patterns ◽

Scheduling Policy ◽

Standard Ml ◽

Cost Semantics ◽

The Impact

We present a semantic space profiler for parallel functional programs. Building on previous work in sequential profiling, our tools help programmers to relate runtime resource use back to program source code. Unlike many profiling tools, our profiler is based on a cost semantics. This provides a means to reason about performance without requiring a detailed understanding of the compiler or runtime system. It also provides a specification for language implementers. This is critical in that it enables us to separate cleanly the performance of the application from that of the language implementation. Some aspects of the implementation can have significant effects on performance. Our cost semantics enables programmers to understand the impact of different scheduling policies while hiding many of the details of their implementations. We show applications where the choice of scheduling policy has asymptotic effects on space use. We explain these use patterns through a demonstration of our tools. We also validate our methodology by observing similar performance in our implementation of a parallel extension of Standard ML.


On the Impact of Bad Smell Agglomerations on Software Quality

10.5753/cbsoft_estendido.2019.7653 ◽

2019 ◽

Author(s):

Amanda Damasceno Santana ◽

Eduardo Figueiredo

Keyword(s):

Data Mining ◽

Correlation Analysis ◽

Software Quality ◽

Source Code ◽

System Quality ◽

Large Dataset ◽

External Behavior ◽

Code Quality ◽

The Impact ◽

Open Source Systems

When a system's evolution is not planned, developers can make decisions that degrade the system quality. To cope with this problem, refactoring can be applied to the source code, aiming to increase code quality without modifying the software's external behavior. To know when to refactor, the concept of bad smells can be used. Bad smells are snippets of source code that suggest the need for refactoring. However, bad smells do not always appear in isolation. The aim of this study is to understand the impact of bad smell agglomerations on software quality by evaluating a large dataset of open source systems. To achieve our goal, we plan to use data mining techniques complemented with correlation analysis of the dataset.


The Impact of Version Control Operations on the Quality Change of the Source Code

Computational Science and Its Applications – ICCSA 2014 - Lecture Notes in Computer Science ◽

10.1007/978-3-319-09156-3_26 ◽

2014 ◽

pp. 353-369 ◽

Cited By ~ 4

Author(s):

Csaba Faragó ◽

Péter Hegedűs ◽

Rudolf Ferenc

Keyword(s):

Source Code ◽

Version Control ◽

Quality Change ◽

The Impact


Megatrend and Intervention Impact Analyzer for Jobs: A Visualization Method for Labor Market Intelligence

Journal of Official Statistics ◽

10.2478/jos-2018-0047 ◽

2018 ◽

Vol 34 (4) ◽

pp. 961-979

Author(s):

Rain Opik ◽

Toomas Kirt ◽

Innar Liiv

Keyword(s):

Labor Market ◽

Source Code ◽

Heterogeneous Data ◽

External Information ◽

Market Intelligence ◽

Heterogeneous Data Sources ◽

Prototype Tool ◽

Intervention Impact ◽

The Impact

This article presents a visual method for representing the complex labor market internal structure from the perspective of similar occupations based on shared skills; and a prototype tool for interacting with the visualization, together with an extended description of graph construction and the necessary data processing for linking multiple heterogeneous data sources. Since the labor market is not an isolated phenomenon and is constantly impacted by external trends and interventions, the presented method is designed to enable adding extra layers of external information. For instance, what is the impact of a megatrend or an intervention on the labor market? Which parts of the labor market are the most vulnerable to an approaching megatrend or planned intervention? A case study analyzing the labor market together with the megatrend of job automation and computerization is presented. The source code of the prototype is released as open source for repeatability.


Finding Software License Violations Through Binary Code Clone Detection - A Retrospective

ACM SIGSOFT Software Engineering Notes ◽

10.1145/3468744.3468752 ◽

2021 ◽

Vol 46 (3) ◽

pp. 24-25

Author(s):

Armijn Hemel ◽

Karl Trygve Kalleberg ◽

Rob Vermaas ◽

Eelco Dolstra

Keyword(s):

Open Source ◽

Original Problem ◽

Binary Code ◽

Source Code ◽

Clone Detection ◽

Problem Statement ◽

Open Source Code ◽

Code Clone ◽

Software License ◽

The Impact

Ten years ago, we published the article Finding software license violations through binary code clone detection at the MSR 2011 conference. Our paper was motivated by the tendency of embedded hardware vendors to only release binary blobs of their firmware, often violating the licensing terms of open-source software present inside those blobs. The techniques presented in our paper were designed to accurately identify open-source code hidden inside binary blobs. Here, we give our perspectives on the impact of our work, both industrially and academically, and revisit the original problem statement to see what has happened in the field of open-source compliance in the intervening decade.
