Understanding the Evolution of Code Smells by Observing Code Smell Clusters

A Review on Machine-Learning Based Code Smell Detection Techniques in Object-Oriented Software System(s)

Recent Advances in Electrical & Electronic Engineering (Formerly Recent Patents on Electrical & Electronic Engineering) ◽

10.2174/2352096513999200922125839 ◽

2020 ◽

Vol 13 ◽

Author(s):

Amandeep Kaur ◽

Sushma Jain ◽

Shivani Goel ◽

Gaurav Dhiman

Keyword(s):

Machine Learning ◽

Programming Languages ◽

Software Quality ◽

Empirical Studies ◽

Statistical Testing ◽

Machine Learning Techniques ◽

Support Vector ◽

Code Smells ◽

Detection Techniques ◽

Code Smell

Context: Code smells are symptoms that something may be wrong in a software system and that can complicate the maintenance of software quality. Many code smells are described in the literature, and identifying them is far from trivial. Several techniques have therefore been proposed to automate code smell detection in order to improve software quality. Objective: This paper presents an up-to-date review of simple and hybrid machine-learning-based code smell detection techniques and tools. Methods: We collected all the relevant research published in this field up to 2020. We extracted the data from those articles and classified them into two major categories. In addition, we compared the selected studies on several aspects: code smells, machine learning techniques, datasets, programming languages used by the datasets, dataset size, evaluation approach, and statistical testing. Results: The majority of empirical studies have proposed machine-learning-based code smell detection tools. Support vector machine and decision tree algorithms are frequently used by researchers. In addition, a major proportion of the research is conducted on open source software (OSS) such as Xerces, Gantt Project, and ArgoUML. Furthermore, researchers have paid more attention to the Feature Envy and Long Method code smells. Conclusion: We identified several areas of open research, such as the need for code smell detection techniques using hybrid approaches and the need for validation on industrial datasets.

Automatic Detection of Code Smells Based on Code Change History

Zbornik radova Fakulteta tehničkih nauka u Novom Sadu ◽

10.24867/11be03prokic ◽

2020 ◽

Vol 36 (01) ◽

pp. 43-46

Author(s):

Simona Prokić

Keyword(s):

Code Smells ◽

Code Smell

Low-quality code contains structures (code smells) that hinder the maintenance and further development of software. This paper presents a machine-learning-based model for the automatic detection of indicators of poorly designed code (code smells) based on code change history. The inputs to the model are the values of software code metrics, computed over n revisions of the observed code snippet. The output of the model is a label indicating whether or not the observed code snippet contains a code smell. A case study was carried out on the detection of classes with too many responsibilities (God Class). Steps for improving and further developing the architecture are proposed.
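
The input described in this abstract (metric values computed over n revisions of one code snippet) can be pictured as a single flattened feature vector. The following is only a minimal sketch under stated assumptions: the metric names, the padding rule for short histories, and the function name `history_features` are hypothetical and do not come from the paper.

```python
from typing import Dict, List

# Hypothetical metric names; the abstract does not say which metrics are used.
METRICS = ["loc", "wmc", "cbo"]


def history_features(revisions: List[Dict[str, float]], n: int) -> List[float]:
    """Flatten the metric values of the last n revisions into one vector.

    Histories shorter than n are left-padded by repeating the oldest
    available revision. That padding rule is an assumption of this sketch.
    """
    if not revisions:
        raise ValueError("at least one revision is required")
    window = revisions[-n:]
    padding = [window[0]] * (n - len(window))
    return [rev[m] for rev in padding + window for m in METRICS]


# Two known revisions of one class, oldest first (made-up numbers).
history = [
    {"loc": 100, "wmc": 10, "cbo": 3},
    {"loc": 150, "wmc": 14, "cbo": 5},
]
features = history_features(history, n=3)
```

A binary classifier would then map such a vector to the smelly or non-smelly label mentioned in the abstract.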

Automatic detection of Long Method and God Class code smells through neural source code embeddings

10.36227/techrxiv.17206010.v1 ◽

2021 ◽

Author(s):

Aleksandar Kovačević ◽

Jelena Slivka ◽

Dragan Vidaković ◽

Katarina-Glorija Grujić ◽

Nikola Luburić ◽

...

Keyword(s):

Machine Learning ◽

Large Scale ◽

Negative Impact ◽

Source Code ◽

Systematic Evaluation ◽

Small Scale ◽

Code Smells ◽

Code Metrics ◽

Code Smell ◽

F Measure

Code smells are structures in code that often have a negative impact on its quality. Manually detecting code smells is challenging, and researchers have proposed many automatic code smell detectors. Most of the studies propose detectors based on code metrics and heuristics. However, these studies have several limitations, including evaluating the detectors using small-scale case studies and an inconsistent experimental setting. Furthermore, heuristic-based detectors suffer from limitations that hinder their adoption in practice. Thus, researchers have recently started experimenting with machine learning (ML) based code smell detection.

This paper compares the performance of multiple ML-based code smell detection models against multiple traditionally employed metric-based heuristics for detection of God Class and Long Method code smells. We evaluate the effectiveness of different source code representations for machine learning: traditionally used code metrics and code embeddings (code2vec, code2seq, and CuBERT).

We perform our experiments on the large-scale, manually labeled MLCQ dataset. We treat the task as a binary classification problem: we classify the code samples as smelly or non-smelly and use the F1-measure of the minority (smell) class as a measure of performance. In our experiments, the ML classifier trained using CuBERT source code embeddings achieved the best performance for both God Class (F-measure of 0.53) and Long Method detection (F-measure of 0.75). With the help of a domain expert, we perform the error analysis to discuss the advantages of the CuBERT approach.

To the best of our knowledge, this study is the first to evaluate the effectiveness of pre-trained neural source code embeddings for code smell detection. A secondary contribution of our study is the systematic evaluation of the effectiveness of multiple heuristic-based approaches on the same large-scale, manually labeled MLCQ dataset.
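
The performance measure named in this abstract, the F1-measure of the minority (smell) class, can be illustrated with a short worked example. This is a minimal sketch with made-up labels, not data from the MLCQ dataset; the function name `f1_for_class` is hypothetical.

```python
from typing import Sequence


def f1_for_class(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """F1 = 2 * precision * recall / (precision + recall) for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no correct positive predictions, so F1 is zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up labels: 1 = smelly (minority class), 0 = non-smelly.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
smell_f1 = f1_for_class(y_true, y_pred, positive=1)
```

In this toy case the accuracy is 0.8 while the F1 of the smell class is 2/3, which shows why a minority-class measure is more informative than accuracy when smelly samples are rare.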

Revealing Developers’ Arguments on Validating the Incidence of Code Smells: A Focus Group Experience

10.5753/vem.2021.17214 ◽

2021 ◽

Author(s):

Luis Felipi Junionello ◽

Rafael de Mello ◽

Roberto Oliveira ◽

Leonardo Sousa ◽

Alexander López ◽

...

Keyword(s):

Focus Group ◽

Tacit Knowledge ◽

Automated Detection ◽

Group Session ◽

Focus Group Session ◽

Code Smells ◽

Identifying Code ◽

Code Smell ◽

Human Validation ◽

Group Experience

Identifying code smells is considered a subjective task. Unfortunately, current automated detection tools cannot deal with such subjectivity and therefore require human validation. Developers tend to follow different, albeit complementary, strategies when validating the identified smells. To find out which arguments developers use when validating the incidence of code smells, we conducted a focus group session with developers familiar with identifying code smells. We divided them into two groups, in which they had to argue about the incidence of a code smell, either accepting or rejecting its presence. Based on their arguments, we compiled a set of general heuristics that developers follow when validating smells. We then used these heuristics to compose validation items. We understand that the proposed set of validation items may support developers in reflecting on the incidence of code smells. However, further studies are needed to reach a more comprehensive and optimized set. The experience of this study reveals that conducting focus group sessions helps bring out the tacit knowledge of developers when validating code smells.

Design and Analysis of Improvised Genetic Algorithm with Particle Swarm Optimization for Code Smell Detection

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.a5328.119119 ◽

2019 ◽

Vol 9 (1) ◽

pp. 5327-5330

Keyword(s):

Genetic Algorithm ◽

Particle Swarm Optimization ◽

Software Development ◽

Software Maintenance ◽

Particle Swarm ◽

Code Smells ◽

Swarm Optimization ◽

Rule Based ◽

Code Smell ◽

Trial And Error Method

The development phase is a very important part of the Software Development Life Cycle. Software maintenance is difficult if code smells exist in the code. Code smells are symptoms of poor design in the code. They are identified by various tools using various approaches. Many code smell detection approaches are rule based, and rule-based approaches rely on trial and error. The Genetic Algorithm is a heuristic algorithm inspired by Darwin's theory of evolution. This paper presents a metric-based code smell detection approach that combines a Genetic Algorithm with particle swarm optimization based on Euclidean data distance. The Euclidean data distance gives the best proximity value between two points. Our approach is evaluated on three open source projects, JFreeChart v1.0.9, Log4J v1.2.1 and Xerces-J, for identifying eight types of code smells: Functional Decomposition, Feature Envy, Blob, Long Parameter List, Spaghetti Code, Data Class, Lazy Class, and Shotgun Surgery.

The Relationship Between Code Smells and Traceable Patterns — Are They Measuring the Same Thing?

International Journal of Software Engineering and Knowledge Engineering ◽

10.1142/s0218194017400095 ◽

2017 ◽

Vol 27 (09n10) ◽

pp. 1529-1547 ◽

Cited By ~ 1

Author(s):

Zadia Codabux ◽

Kazi Zakia Sultana ◽

Byron J. Williams

Keyword(s):

Web Applications ◽

Source Code ◽

Code Smells ◽

Class Level ◽

Micro Pattern ◽

Code Smell ◽

The Relationship ◽

Apache Tomcat ◽

Structural Code ◽

Micro Patterns

It is important to maintain software quality as a software system evolves. Managing code smells in source code contributes towards quality software. While metrics have been used to pinpoint code smells in source code, we present an empirical study on the correlation of code smells with class-level (micro pattern) and method-level (nano-pattern) traceable code patterns. This study explores the relationship between code smells and class-level and method-level structural code constructs. We extracted micro patterns at the class level and nano-patterns at the method level from three versions of Apache Tomcat, three versions of Apache CXF, and two J2EE web applications from Stanford SecuriBench, namely PersonalBlog and Roller, and then compared their distributions in classes and methods with code smells versus those without. We found that the Immutable and Sink micro patterns are more frequent in classes with code smells than in classes without code smells in the applications we analyzed. On the other hand, the LocalReader and LocalWriter nano-patterns are more frequent in methods with code smells than in methods without code smells. We conclude that code smells are correlated with both micro and nano-patterns.

On the Characterization, Detection and Impact of Batch Refactoring in Practice

10.5753/cbsoft_estendido.2020.14626 ◽

2020 ◽

Author(s):

Ana Carla Bibiano ◽

Alessandro Garcia

Keyword(s):

Impact Analysis ◽

Empirical Studies ◽

Code Smells ◽

Software Projects ◽

Conceptual Foundation ◽

Software Maintainability ◽

Proper Design ◽

Code Smell ◽

The Impact

Up to 60% of the refactorings in software projects consist of a set of interrelated transformations, the so-called batches (or composite refactorings), rather than single transformations applied in isolation. However, a systematic characterization of batch refactoring is missing, which hampers the elaboration of proper tooling support and empirical studies of how (batch) refactoring is applied in practice. This paper summarizes the research performed in the context of a Master's dissertation, which aimed at taming the aforementioned problems. To the best of our knowledge, our research is the first published work that provides a conceptual foundation, detection support and a large impact analysis of batch refactoring on code maintainability. To this end, we performed two complementary empirical studies and designed a first heuristic aimed at explicitly detecting batch refactorings. Our first study consisted of a literature review that synthesizes the otherwise scattered, partial conceptualization of batch refactoring mentioned in 29 studies with different purposes. We identified and defined seven batch characteristics, such as the scope and typology of batches, plus seven types of batch effect on software maintainability, including code smell removal. All batch characteristics and possible impacts were systematized in a conceptual framework, which assists, for instance, the proper design of batch refactoring studies and batch detection heuristics. We defined a new heuristic for batch detection, which made it possible to conduct a large study involving 4,607 batches discovered in 57 open and closed software projects. Amongst various findings, we reveal that most batches in practice occur entirely within one commit (93%) and affect multiple methods (90%). Surprisingly, batches mostly end up introducing (51%) or not removing (38%) code smells. These findings contradict previous investigations limited to the impact analysis of each transformation in isolation. Our findings also enabled us to reveal harmful or beneficial patterns of batches that respectively induce the introduction or removal of certain code smells. These patterns: (i) were not previously documented even in Fowler's refactoring catalog, and (ii) provide concrete guidance for researchers, tool designers, and practitioners.

Towards a systematic approach to manual annotation of code smells

10.36227/techrxiv.14159183.v1 ◽

2021 ◽

Author(s):

Nikola Luburić ◽

Simona Prokić ◽

Katarina-Glorija Grujić ◽

Jelena Slivka ◽

Aleksandar Kovačević ◽

...

Keyword(s):

Systematic Approach ◽

Manual Annotation ◽

Code Smells ◽

C Programming Language ◽

C Programming ◽

Software Engineers ◽

Code Smell ◽

Supporting Tool ◽

Almost All ◽

Primary Contribution

Code smells are structures in code that indicate the presence of maintainability issues. A significant problem with code smells is their ambiguity. They are challenging to define, and software engineers have a different understanding of what a code smell is and which code suffers from code smells.

A solution to this problem could be an AI digital assistant that understands code smells and can detect (and perhaps resolve) them. However, it is challenging to develop such an assistant as there are few usable datasets of code smells on which to train and evaluate it. Furthermore, the existing datasets suffer from issues that mostly arise from an unsystematic approach used for their construction.

Through this work, we address this issue by developing a procedure for the systematic manual annotation of code smells. We use this procedure to build a dataset of code smells. During this process, we refine the procedure and identify recommendations and pitfalls for its use. The primary contribution is the proposed annotation model and procedure and the annotators’ experience report. The dataset and supporting tool are secondary contributions of our study. Notably, our dataset includes open-source projects written in the C# programming language, while almost all manually annotated datasets contain projects written in Java.

Implementation and Analysis of a Refactoring Tool for Detecting Code Smells

INTERNATIONAL JOURNAL OF COMPUTERS & TECHNOLOGY ◽

10.24297/ijct.v6i1.4455 ◽

2013 ◽

Vol 6 (1) ◽

pp. 242-247 ◽

Cited By ~ 2

Author(s):

Amandeep Kaur ◽

Himanshi Raperia

Keyword(s):

Software Development ◽

Internal Structure ◽

Clustering Algorithm ◽

Code Smells ◽

Efficient Code ◽

Clustering Approach ◽

Code Smell ◽

Software Code

Software development has been an active field for decades. Writing code for software is not a difficult task, but writing efficient code is complicated. The goal of changing the code is to make its internal structure easier to understand and cheaper to modify, without changing its behavior or desired response. Repeated changes make software patchy. No software is free from smells, especially patchy software. A lot of work has been done on detecting smells and removing a few of them (refactoring) from code. In this paper, our main focus is on the SCSD (Software Code Smell Detector) tool that we developed, which uses a bit classification and clustering approach with the K-means clustering algorithm to detect code smells and can implement a completely different architecture if it discovers a smell.
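
The abstract names K-means clustering as the detection technique but gives no detail on features or setup. As a rough illustration only, the sketch below runs plain K-means (Lloyd's algorithm) on made-up class-level metrics; the metric choice (lines of code, number of methods), the data, the initial centroids, and the function name `kmeans` are all assumptions and do not describe how SCSD works.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def kmeans(points: List[Point], centroids: List[Point],
           iterations: int = 10) -> Tuple[List[int], List[List[float]]]:
    """Plain Lloyd's algorithm with caller-supplied initial centroids."""
    centers = [list(c) for c in centroids]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centers[k] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centers


# Hypothetical (lines of code, number of methods) pairs for five classes.
classes = [(20, 2), (25, 3), (30, 2), (400, 35), (450, 40)]
labels, centers = kmeans(classes, centroids=[classes[0], classes[3]])
```

With these inputs the two unusually large classes end up in their own cluster, which a detector could then flag for review. Real metrics would normally be scaled first so that one large-valued metric does not dominate the distance.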
