An efficient new multi-language clone detection approach from large source code

Author(s): Saif Ur Rehman, Kamran Khan, Simon Fong, Robert Biuk-Aghai
Author(s): Evan Moritz, Mario Linares-Vasquez, Denys Poshyvanyk, Mark Grechanik, Collin McMillan, ...
Author(s): Sara McCaslin, Kent Lawrence

Closed-form solutions, as opposed to numerically integrated solutions, can now be obtained for many problems in engineering. In the area of finite element analysis, researchers have demonstrated the efficiency of closed-form solutions compared to numerical integration for elements such as straight-sided triangular [1] and tetrahedral elements [2, 3]. With higher-order elements, however, the length of the resulting expressions is excessive. When these expressions are implemented in finite element applications as source code, very large source code files can be generated, leading to line-length and line-continuation limit issues with the compiler. This paper discusses a simple algorithm for the reduction of large source code files in which duplicate terms are replaced through the use of an adaptive dictionary. The importance of this algorithm lies in its ability to produce manageable source code files that can improve efficiency in the element generation step of higher-order finite element analysis. The algorithm is applied to Fortran files developed for the implementation of closed-form element stiffness and error estimator expressions for straight-sided tetrahedral finite elements through the fourth order. Reductions in individual source code file size by as much as 83% are demonstrated.
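The paper applies its adaptive-dictionary reduction to generated Fortran files; the abstract gives no implementation details. As a minimal illustration of the general idea (repeated subexpressions replaced by short dictionary names, with definitions emitted first), here is a hypothetical Python sketch; the function name, thresholds, and the choice of parenthesized groups as candidate terms are all assumptions, not the authors' algorithm:

```python
import re
from collections import Counter

def reduce_expression(expr, min_count=2, min_len=8):
    """Replace duplicate terms in a generated expression with short
    dictionary variables (t1, t2, ...), returning the definitions
    plus the reduced expression.  Hypothetical sketch only."""
    # Treat simple parenthesized groups as candidate duplicate terms.
    terms = re.findall(r"\([^()]*\)", expr)
    counts = Counter(terms)
    dictionary = {}
    for term, n in counts.items():
        # Only replace terms that recur and are long enough to pay off.
        if n >= min_count and len(term) >= min_len:
            name = f"t{len(dictionary) + 1}"
            dictionary[term] = name
            expr = expr.replace(term, name)
    defs = [f"{name} = {term}" for term, name in dictionary.items()]
    return defs, expr

defs, reduced = reduce_expression(
    "(x1 - x2 + y3)*(a + b) + (x1 - x2 + y3)*(c + d) + (x1 - x2 + y3)"
)
print(defs)     # ['t1 = (x1 - x2 + y3)']
print(reduced)  # t1*(a + b) + t1*(c + d) + t1
```

Emitting each shared term once and reusing a short name is what shrinks the generated file; a real implementation for closed-form stiffness expressions would also have to respect Fortran declaration syntax and evaluate dictionary entries in dependency order.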


2016, Vol. 2, e49
Author(s): Stefan Wagner, Asim Abdulkhaleq, Ivan Bogicevic, Jan-Peter Ostberg, Jasmin Ramadani

Background. Today, redundancy in source code, so-called "clones" caused by copy & paste, can be found reliably using clone detection tools. Redundancy can also arise independently, however, without copy & paste. At present, it is not clear how only functionally similar clones (FSCs) differ from clones created by copy & paste. Our aim is to understand and categorise the syntactical differences in FSCs that distinguish them from copy & paste clones in a way that helps clone detection research.

Methods. We conducted an experiment using known functionally similar programs in Java and C from coding contests. We analysed syntactic similarity with traditional detection tools and explored whether concolic clone detection can go beyond syntax. We ran all tools on 2,800 programs and manually categorised the differences in a random sample of 70 program pairs.

Results. We found no FSCs where complete files were syntactically similar. We could detect syntactic similarity in a part of the files in fewer than 16% of the program pairs. Concolic detection found one of the FSCs. The differences between program pairs fell into the categories algorithm, data structure, OO design, I/O and libraries. We selected 58 pairs for an openly accessible benchmark representing these categories.

Discussion. The majority of differences between functionally similar clones are beyond the capabilities of current clone detection approaches. Our benchmark can nevertheless help to drive further clone detection research.
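The abstract's core finding is that FSCs share little syntax, which is exactly what token-based (syntactic) detectors measure. To make that concrete, here is a hypothetical Python sketch of a type-2-style comparison: identifiers and literals are normalized to placeholder classes, then token sequences are compared. The normalization scheme and the factorial examples are illustrative assumptions, not the study's tooling:

```python
import difflib
import io
import keyword
import tokenize

def normalized_tokens(src):
    """Tokenize Python source, keeping keywords and operators but
    mapping identifiers and literals to placeholder classes, a common
    type-2 clone normalization (assumed here for illustration)."""
    out = []
    for tok in tokenize.generate_tokens(io.StringIO(src).readline):
        if tok.type == tokenize.NAME:
            out.append(tok.string if keyword.iskeyword(tok.string) else "ID")
        elif tok.type in (tokenize.NUMBER, tokenize.STRING):
            out.append("LIT")
        elif tok.type == tokenize.OP:
            out.append(tok.string)
    return out

def similarity(a, b):
    """Token-sequence similarity in [0, 1] after normalization."""
    return difflib.SequenceMatcher(
        None, normalized_tokens(a), normalized_tokens(b)).ratio()

iterative = """
def fact(n):
    r = 1
    for i in range(2, n + 1):
        r *= i
    return r
"""
# Copy & paste clone: same algorithm, renamed identifiers.
renamed = """
def product(m):
    acc = 1
    for k in range(2, m + 1):
        acc *= k
    return acc
"""
# Functionally similar clone: same function, different algorithm.
recursive = """
def fact(n):
    return 1 if n < 2 else n * fact(n - 1)
"""
print(similarity(iterative, renamed))    # 1.0: renaming is invisible
print(similarity(iterative, recursive))  # well below 1.0
```

The renamed copy scores a perfect match because normalization erases identifier choices, while the recursive variant scores much lower despite computing the same function. This mirrors why the study's traditional tools found syntactic similarity in under 16% of FSC pairs.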

