Detecting duplicate bug reports with software engineering domain knowledge

2016 ◽  
Vol 29 (3) ◽  
pp. e1821 ◽  
Author(s):  
Karan Aggarwal ◽  
Finbarr Timbers ◽  
Tanner Rutgers ◽  
Abram Hindle ◽  
Eleni Stroulia ◽  
...  
2016 ◽  
Author(s):  
Abram Hindle

Bug deduplication is a hot topic in software engineering information retrieval research, but it is often not deployed. Typically, to de-duplicate bug reports, developers rely upon the search capabilities of the bug report software they employ, such as Bugzilla, Jira, or GitHub Issues. These search capabilities range from simple SQL string search to IR-based word indexing methods employed by search engines. Yet too often these searches do very little to stop the creation of duplicate bug reports. Some bug trackers have more than 10% of their bug reports marked as duplicate. Perhaps these bug tracker search engines are not enough? In this paper we propose a method of attempting to prevent duplicate bug reports before they start: continuous querying. That is, as the bug reporter types their bug report, their text is used to query the bug database to find duplicate or related bug reports. This continuous querying alerts the reporter to duplicate bug reports while they report the bug, rather than requiring them to formulate queries to find the duplicate bug report. Thus this work ushers in a new way of evaluating bug report deduplication techniques, as well as a new kind of bug deduplication task. We show that simple IR measures hold some promise for addressing this problem, but also that further research is needed to refine this novel process, which can be integrated into modern bug report systems.
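The continuous-querying idea can be sketched as re-running a similarity search over the bug database each time the reporter finishes a word. This is a minimal illustration under stated assumptions, not the paper's implementation: it assumes a plain TF-IDF bag-of-words scored by cosine similarity as the "simple IR measure" (with no stemming, so "crash" and "crashes" are distinct terms), and `BugIndex`, `continuous_query`, and the sample reports are hypothetical names and data invented for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words (no stemming)."""
    return re.findall(r"[a-z0-9]+", text.lower())


class BugIndex:
    """Hypothetical in-memory TF-IDF index over existing bug reports."""

    def __init__(self, reports):
        # reports: mapping of bug id -> report text
        self.ids = list(reports)
        docs = [Counter(tokenize(reports[i])) for i in self.ids]
        n = len(docs)
        df = Counter(term for doc in docs for term in doc)
        # Smoothed IDF: rarer terms get larger weights, all weights stay positive.
        self.idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
        self.vectors = [self._weigh(doc) for doc in docs]
        self.norms = [math.sqrt(sum(w * w for w in v.values())) for v in self.vectors]

    def _weigh(self, counts):
        # Term frequency times IDF; terms never seen in the corpus are dropped.
        return {t: tf * self.idf[t] for t, tf in counts.items() if t in self.idf}

    def query(self, text, k=3):
        """Return up to k bug ids ranked by cosine similarity to `text`."""
        q = self._weigh(Counter(tokenize(text)))
        q_norm = math.sqrt(sum(w * w for w in q.values()))
        if q_norm == 0:
            return []
        scored = []
        for bug_id, vec, d_norm in zip(self.ids, self.vectors, self.norms):
            dot = sum(w * vec.get(t, 0.0) for t, w in q.items())
            if dot > 0:
                scored.append((dot / (q_norm * d_norm), bug_id))
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [bug_id for _, bug_id in scored[:k]]


def continuous_query(index, report_text, k=3):
    """Re-query after each word typed, yielding (text so far, suggestions)."""
    words = report_text.split()
    for i in range(1, len(words) + 1):
        typed = " ".join(words[:i])
        yield typed, index.query(typed, k)


if __name__ == "__main__":
    reports = {
        101: "Crash when opening large PDF attachment",
        102: "Login button unresponsive on mobile",
        103: "PDF viewer crashes on startup",
    }
    index = BugIndex(reports)
    for typed, hits in continuous_query(index, "App crashes opening PDF"):
        print(f"{typed!r} -> {hits}")
    # 'App' -> []
    # 'App crashes' -> [103]
    # 'App crashes opening' -> [103, 101]
    # 'App crashes opening PDF' -> [103, 101]
```

Re-querying on every word keeps the sketch simple; a real tracker integration would more plausibly debounce keystrokes and query a persistent index rather than rescoring every report in memory.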


Author(s):  
Qazi Mudassar Ilyas

The Semantic Web was proposed to make content machine-understandable by developing ontologies to capture domain knowledge and annotating content with this domain knowledge. Although the original idea of the Semantic Web was to make content on the World Wide Web machine-understandable, with recent advancements in and awareness of these technologies, researchers have applied ontologies in many interesting domains. Many phases of software engineering depend on the availability of knowledge, and the use of ontologies to capture and process this knowledge is a natural choice. This chapter discusses how ontologies can be used in various stages of the system development life cycle. Ontologies can support the requirements engineering phase by helping to identify and fix inconsistent, incomplete, and ambiguous requirements. They can also be used to model requirements and to assist in requirements management and validation. During the software design and development stages, ontologies can help software engineers find suitable components, manage API documentation, and obtain coding support. Ontologies can help in the system integration and evolution process by aligning various databases: ontologies capture knowledge about the database schemas, which are then aligned with concepts in the ontology. Ontologies can also be used in software maintenance by developing a bug tracking system based on ontological knowledge of software artifacts and of the roles of the developers involved in maintenance tasks.


Author(s):  
He Jiang ◽  
Najam Nazar ◽  
Jingxuan Zhang ◽  
Tao Zhang ◽  
Zhilei Ren

During software maintenance, bug reports are widely employed to improve a software project's quality. A developer often refers to bug reports stored in a repository for bug resolution. However, this reference process often requires a developer to peruse a substantial amount of textual information in bug reports, which is lengthy and tedious. Automatic summarization of bug reports is one way to overcome this problem. Effective supervised and unsupervised methods have both been proposed for the automatic summary generation of bug reports. However, existing methods disregard the significance of duplicate bug reports in summarizing bug reports. In this study, we propose a PageRank-based Summarization Technique (PRST), which utilizes the textual information contained in bug reports and additional information in associated duplicate bug reports. PRST uses three PageRank variants based on the Vector Space Model (VSM), Jaccard, and WordNet similarity metrics. These variants are used to calculate the textual similarity of the sentences between the master bug reports and their duplicates. PRST further trains a regression model and predicts the probability of sentences belonging to the summary. Finally, we combine the PageRank and regression model scores to rank the sentences and produce the summary for the master bug reports. In addition, we construct two corpora of bug reports and duplicates, i.e., MBRC and OSCAR. Empirical results suggest that PRST outperforms the state-of-the-art method BRC in terms of Precision, Recall, F-score, and Pyramid Precision. Meanwhile, PRST with WordNet achieves the best results compared with PRST with VSM and Jaccard.
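The PageRank step can be illustrated with the simplest of the three similarity metrics named above, Jaccard. This is an assumption-laden sketch, not the authors' implementation: sentences from the master report and its duplicates become nodes of one undirected graph weighted by word-set Jaccard similarity, weighted PageRank is computed by power iteration, and only the master report's sentences are ranked. The regression model is not reproduced; `regression_probs` stands in for its per-sentence probabilities, and the linear blend with weight `alpha` (after max-normalizing the PageRank scores) is a hypothetical choice, since the abstract does not say how the two scores are combined. All function names are hypothetical.

```python
import re


def words(sentence):
    """Lowercased set of alphanumeric words in a sentence."""
    return set(re.findall(r"[a-z0-9]+", sentence.lower()))


def jaccard(a, b):
    """|A & B| / |A | B|; defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def weighted_pagerank(weights, damping=0.85, max_iter=100, tol=1e-10):
    """Power-iteration PageRank on a weighted adjacency matrix (list of rows)."""
    n = len(weights)
    out_sums = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(max_iter):
        # Nodes with no edges spread their score uniformly so the total stays 1.
        dangling = sum(scores[j] for j in range(n) if out_sums[j] == 0)
        new = []
        for i in range(n):
            incoming = sum(
                scores[j] * weights[j][i] / out_sums[j]
                for j in range(n)
                if out_sums[j] > 0
            )
            new.append((1 - damping) / n + damping * (incoming + dangling / n))
        delta = sum(abs(a - b) for a, b in zip(new, scores))
        scores = new
        if delta < tol:
            break
    return scores


def rank_master_sentences(master, duplicates, regression_probs=None, alpha=0.5):
    """Return indices of `master` sentences, best summary candidates first.

    Master and duplicate sentences share one graph, so master sentences that
    are restated in duplicates gain PageRank mass.
    """
    sentences = master + duplicates
    sets = [words(s) for s in sentences]
    n = len(sentences)
    weights = [
        [jaccard(sets[i], sets[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    pr = weighted_pagerank(weights)[: len(master)]
    top = max(pr)
    pr = [p / top for p in pr]  # rescale to [0, 1] before blending
    if regression_probs is None:
        combined = pr
    else:
        # Hypothetical linear blend; the combination rule is an assumption.
        combined = [alpha * p + (1 - alpha) * r for p, r in zip(pr, regression_probs)]
    return sorted(range(len(master)), key=lambda i: combined[i], reverse=True)


if __name__ == "__main__":
    master = [
        "The app crashes when opening a PDF file",
        "I like the new icon colors",
        "Crash happens on every PDF file I open",
    ]
    duplicates = ["Opening any PDF file crashes the app"]
    print(rank_master_sentences(master, duplicates))  # [0, 2, 1]
```

Sentence 0 ranks first because the duplicate report restates it almost word for word; swapping `jaccard` for a VSM cosine or a WordNet-based measure would correspond to the other two variants the abstract mentions.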


2019 ◽  
Vol 113 ◽  
pp. 98-109 ◽  
Author(s):  
Neda Ebrahimi ◽  
Abdelaziz Trabelsi ◽  
Md. Shariful Islam ◽  
Abdelwahab Hamou-Lhadj ◽  
Kobra Khanmohammadi