Women and men — Different but equal: On the impact of identifier style on source code reading

Author(s):  
Zohreh Sharafi ◽  
Zephyrin Soh ◽  
Yann-Gael Gueheneuc ◽  
Giuliano Antoniol
Keyword(s):  
Author(s):  
Tran Thanh Luong ◽  
Le My Canh

JavaScript has become more and more popular in recent years because its wealthy features as being dynamic, interpreted and object-oriented with first-class functions. Furthermore, JavaScript is designed with event-driven and I/O non-blocking model that boosts the performance of overall application especially in the case of Node.js. To take advantage of these characteristics, many design patterns that implement asynchronous programming for JavaScript were proposed. However, choosing a right pattern and implementing a good asynchronous source code is a challenge and thus easily lead into less robust application and low quality source code. Extended from our previous works on exception handling code smells in JavaScript and exception handling code smells in JavaScript asynchronous programming with promise, this research aims at studying the impact of three JavaScript asynchronous programming patterns on quality of source code and application.


2015 ◽  
Author(s):  
Kristoffer Sahlin ◽  
Rayan Chikhi ◽  
Lars Arvestad

Scaffolding is often an essential step in a genome assembly process,in which contigs are ordered and oriented using read pairs from a combination of paired-ends libraries and longer-range mate-pair libraries. Although a simple idea, scaffolding is unfortunately hard to get right in practice. One source of problem is so-called PE-contamination in mate-pair libraries, in which a non-negligible fraction of the read pairs get the wrong orientation and a much smaller insert size than what is expected. This contamination has been discussed in previous work on integrated scaffolders in end-to-end assemblers such as Allpaths-LG and MaSuRCA but the methods relies on the fact that the orientation is observable, \emph{e.g.}, by finding the junction adapter sequence in the reads. This is not always the case, making orientation and insert size of a read pair stochastic. Furthermore, work on modeling PE-contamination has so far been disregarded in stand-alone scaffolders and the effect that PE-contamination has on scaffolding quality has not been examined before. We have addressed PE-contamination in an update of our scaffolder BESST. We formulate the problem as an Integer Linear Program (ILP) and use characteristics of the problem, such as contig lengths and insert size, to efficiently solve the ILP using a linear amount (with respect to the number of contigs) of Linear Programs. Our results show significant improvement over both integrated and standalone scaffolders. The impact of modeling PE-contamination is quantified by comparison with the previous BESST model. We also show how other scaffolders are vulnerable to PE-contaminated libraries, resulting in increased number of misassemblies, more conservative scaffolding, and inflated assembly sizes. The model is implemented in BESST. Source code and usage instructions are found at https://github.com/ksahlin/BESST. BESST can also be downloaded using PyPI.


2021 ◽  
Author(s):  
Marco Luca Sbodio ◽  
Natasha Mulligan ◽  
Stefanie Speichert ◽  
Vanessa Lopez ◽  
Joao Bettencourt-Silva

There is a growing trend in building deep learning patient representations from health records to obtain a comprehensive view of a patient’s data for machine learning tasks. This paper proposes a reproducible approach to generate patient pathways from health records and to transform them into a machine-processable image-like structure useful for deep learning tasks. Based on this approach, we generated over a million pathways from FAIR synthetic health records and used them to train a convolutional neural network. Our initial experiments show the accuracy of the CNN on a prediction task is comparable or better than other autoencoders trained on the same data, while requiring significantly less computational resources for training. We also assess the impact of the size of the training dataset on autoencoders performances. The source code for generating pathways from health records is provided as open source.


2019 ◽  
Vol 25 (1) ◽  
pp. 790-823
Author(s):  
Yusuf Sulistyo Nugroho ◽  
Hideaki Hata ◽  
Kenichi Matsumoto

Abstract Automatic identification of the differences between two versions of a file is a common and basic task in several applications of mining code repositories. Git, a version control system, has a diff utility and users can select algorithms of diff from the default algorithm Myers to the advanced Histogram algorithm. From our systematic mapping, we identified three popular applications of diff in recent studies. On the impact on code churn metrics in 14 Java projects, we obtained different values in 1.7% to 8.2% commits based on the different diff algorithms. Regarding bug-introducing change identification, we found 6.0% and 13.3% in the identified bug-fix commits had different results of bug-introducing changes from 10 Java projects. For patch application, we found that the Histogram is more suitable than Myers for providing the changes of code, from our manual analysis. Thus, we strongly recommend using the Histogram algorithm when mining Git repositories to consider differences in source code.


2010 ◽  
Vol 20 (5-6) ◽  
pp. 417-461 ◽  
Author(s):  
DANIEL SPOONHOWER ◽  
GUY E. BLELLOCH ◽  
ROBERT HARPER ◽  
PHILLIP B. GIBBONS

AbstractWe present a semantic space profiler for parallel functional programs. Building on previous work in sequential profiling, our tools help programmers to relate runtime resource use back to program source code. Unlike many profiling tools, our profiler is based on a cost semantics. This provides a means to reason about performance without requiring a detailed understanding of the compiler or runtime system. It also provides a specification for language implementers. This is critical in that it enables us to separate cleanly the performance of the application from that of the language implementation. Some aspects of the implementation can have significant effects on performance. Our cost semantics enables programmers to understand the impact of different scheduling policies while hiding many of the details of their implementations. We show applications where the choice of scheduling policy has asymptotic effects on space use. We explain these use patterns through a demonstration of our tools. We also validate our methodology by observing similar performance in our implementation of a parallel extension of Standard ML.


Author(s):  
Amanda Damasceno Santana ◽  
Eduardo Figueiredo

When a system evolution is not planned, developers can take decisions that degrade the system quality. To cope with this problem, refactoring can be applied to the source code aiming to increase code quality without modifying the software external behavior. To know when to refactor, the concept of bad smells can be used. Bad smells are snippets of source code that suggest the need of refactoring. However, bad smells does not always appear isolated. The aim of this study is to understand the impact of bad smell agglomerations on the software quality by evaluating a large dataset of open source systems. To achieve our goal, we plan to use data mining techniques complemented with correlation analysis of the dataset.


2018 ◽  
Vol 34 (4) ◽  
pp. 961-979
Author(s):  
Rain Opik ◽  
Toomas Kirt ◽  
Innar Liiv

Abstract This article presents a visual method for representing the complex labor market internal structure from the perspective of similar occupations based on shared skills; and a prototype tool for interacting with the visualization, together with an extended description of graph construction and the necessary data processing for linking multiple heterogeneous data sources. Since the labor market is not an isolated phenomenon and is constantly impacted by external trends and interventions, the presented method is designed to enable adding extra layers of external information. For instance, what is the impact of a megatrend or an intervention on the labor market? Which parts of the labor market are the most vulnerable to an approaching megatrend or planned intervention? A case study analyzing the labor market together with the megatrend of job automation and computerization is presented. The source code of the prototype is released as open source for repeatability.


2021 ◽  
Vol 46 (3) ◽  
pp. 24-25
Author(s):  
Armijn Hemel ◽  
Karl Trygve Kalleberg ◽  
Rob Vermaas ◽  
Eelco Dolstra

Ten years ago, we published the article Finding software license violations through binary code clone detection at the MSR 2011 conference. Our paper was motivated by the tendency of em- bedded hardware vendors to only release binary blobs of their rmware, often violating the licensing terms of open-source soft- ware present inside those blobs. The techniques presented in our paper were designed to accurately identify open-source code hid- den inside binary blobs. Here, we give our perspectives on the impact of our work, both industrially and academically, and re- visit the original problem statement to see what has happened in the eld of open-source compliance in the intervening decade.


Sign in / Sign up

Export Citation Format

Share Document