Detecting duplicate bug reports with software engineering domain knowledge

Stopping duplicate bug reports before they start with Continuous Querying for bug reports

10.7287/peerj.preprints.2373 ◽ 2016 ◽ Cited By ~ 1

Author(s): Abram Hindle

Keyword(s): Information Retrieval ◽ Software Engineering ◽ Search Engines ◽ Indexing Methods ◽ Bug Reports ◽ Bug Report ◽ Engineering Information ◽ String Search ◽ The Creation ◽ Duplicate Bug Reports

Bug deduplication is a hot topic in software engineering information retrieval research, but it is often not deployed. Typically, to de-duplicate bug reports, developers rely upon the search capabilities of the bug report software they employ, such as Bugzilla, Jira, or GitHub Issues. These search capabilities range from simple SQL string search to IR-based word indexing methods employed by search engines. Yet too often these searches do very little to stop the creation of duplicate bug reports. Some bug trackers have more than 10% of their bug reports marked as duplicate. Perhaps these bug tracker search engines are not enough? In this paper we propose a method of attempting to prevent duplicate bug reports before they start: continuous querying. That is, as the bug reporter types in their bug report, their text is used to query the bug database to find duplicate or related bug reports. This continuous querying allows the reporter to be alerted to duplicate bug reports as they report the bug, rather than having to formulate queries to find the duplicate bug report. Thus this work ushers in a new way of evaluating bug report deduplication techniques, as well as a new kind of bug deduplication task. We show that simple IR measures show some promise for addressing this problem but also that further research is needed to refine this novel process, which can be integrated into modern bug report systems.
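The continuous querying idea in this abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the `BugIndex` class, the `continuous_query` helper, the TF-IDF cosine scoring, and the bug ids are all assumptions chosen to show a draft report being re-queried as it grows.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class BugIndex:
    """Tiny in-memory TF-IDF index over existing reports (hypothetical helper)."""

    def __init__(self, reports):
        # reports: dict mapping a made-up bug id to the report text
        counts = {bug_id: Counter(tokenize(text)) for bug_id, text in reports.items()}
        n = len(counts)
        df = Counter()
        for c in counts.values():
            df.update(c.keys())
        # Smoothed inverse document frequency for every indexed term
        self.idf = {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in df.items()}
        self.vectors = {bug_id: self._weigh(c) for bug_id, c in counts.items()}

    def _weigh(self, counts):
        # Terms that never occur in the index are dropped; the rest are L2-normalized
        vec = {t: c * self.idf[t] for t, c in counts.items() if t in self.idf}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        return {t: w / norm for t, w in vec.items()} if norm else {}

    def query(self, partial_text, top_k=3):
        """Return up to top_k (bug_id, cosine_score) pairs with a non-zero score."""
        q = self._weigh(Counter(tokenize(partial_text)))
        scored = []
        for bug_id, vec in self.vectors.items():
            score = sum(w * vec.get(t, 0.0) for t, w in q.items())
            if score > 0:
                scored.append((bug_id, score))
        return sorted(scored, key=lambda s: -s[1])[:top_k]


def continuous_query(index, draft, step=3, top_k=3):
    """Simulate typing: re-query after every `step` words of the draft."""
    words = draft.split()
    for end in range(step, len(words) + step, step):
        prefix = " ".join(words[:end])
        yield min(end, len(words)), index.query(prefix, top_k)


if __name__ == "__main__":
    existing = {
        "BUG-1": "Crash when saving file with unicode name",
        "BUG-2": "Login button unresponsive on mobile",
    }
    index = BugIndex(existing)
    for typed, candidates in continuous_query(index, "App crash while saving a file"):
        print(typed, candidates)
```

In a real tracker the in-memory index would presumably be replaced by the tracker's own search backend, and the query would be triggered by keystroke or pause events rather than a fixed word step.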

Fast Duplicate Bug Reports Detector Training using Sampling for Dimension Reduction: Using Instance-based Learning for Continous Query in Real-World

2020 11th International Conference on Information and Knowledge Technology (IKT) ◽ 10.1109/ikt51791.2020.9345611 ◽ 2020

Author(s): Behzad Soleimani Neysiani ◽ Saeed Doostali ◽ Seyed Morteza Babamir ◽ Zahra Aminoroaya

Keyword(s): Dimension Reduction ◽ Real World ◽ Bug Reports ◽ Instance Based Learning ◽ Duplicate Bug Reports

Ontology Augmented Software Engineering

Software Development Techniques for Constructive Information Systems Design ◽ 10.4018/978-1-4666-3679-8.ch023 ◽ 2013 ◽ pp. 406-413

Author(s): Qazi Mudassar Ilyas

Keyword(s): Software Engineering ◽ Semantic Web ◽ System Integration ◽ Software Maintenance ◽ Domain Knowledge ◽ System Development ◽ Tracking System ◽ Software Artifacts ◽ Natural Choice ◽ Development Life Cycle

The Semantic Web was proposed to make content machine-understandable by developing ontologies to capture domain knowledge and annotating content with this domain knowledge. Although the original idea of the Semantic Web was to make content on the World Wide Web machine-understandable, with recent advancements in and awareness of these technologies, researchers have applied ontologies in many interesting domains. Many phases in software engineering depend on the availability of knowledge, and the use of ontologies to capture and process this knowledge is a natural choice. This chapter discusses how ontologies can be used in various stages of the system development life cycle. Ontologies can support the requirements engineering phase by identifying and fixing inconsistent, incomplete, and ambiguous requirements. They can also be used to model the requirements and assist in requirements management and validation. During the software design and development stages, ontologies can help software engineers find suitable components, manage API documentation, and support coding. Ontologies can help in the system integration and evolution process by capturing knowledge about database schemas and aligning the various databases with concepts in the ontology. Ontologies can also be used in software maintenance by developing a bug tracking system based upon ontological knowledge of software artifacts and the roles of developers involved in the software maintenance task.

Detecting Duplicate Bug Reports with Convolutional Neural Networks

2018 25th Asia-Pacific Software Engineering Conference (APSEC) ◽ 10.1109/apsec.2018.00056 ◽ 2018 ◽ Cited By ~ 7

Author(s): Qi Xie ◽ Zhiyuan Wen ◽ Jieming Zhu ◽ Cuiyun Gao ◽ Zibin Zheng

Keyword(s): Neural Networks ◽ Convolutional Neural Networks ◽ Bug Reports ◽ Duplicate Bug Reports

DURFEX: A Feature Extraction Technique for Efficient Detection of Duplicate Bug Reports

2017 IEEE International Conference on Software Quality, Reliability and Security (QRS) ◽ 10.1109/qrs.2017.35 ◽ 2017 ◽ Cited By ~ 5

Author(s): Korosh Koochekian Sabor ◽ Abdelwahab Hamou-Lhadj ◽ Alf Larsson

Keyword(s): Feature Extraction ◽ Extraction Technique ◽ Bug Reports ◽ Efficient Detection ◽ Feature Extraction Technique ◽ Duplicate Bug Reports

PRST: A PageRank-Based Summarization Technique for Summarizing Bug Reports with Duplicates

International Journal of Software Engineering and Knowledge Engineering ◽ 10.1142/s0218194017500322 ◽ 2017 ◽ Vol 27 (06) ◽ pp. 869-896 ◽ Cited By ~ 14

Author(s): He Jiang ◽ Najam Nazar ◽ Jingxuan Zhang ◽ Tao Zhang ◽ Zhilei Ren

Keyword(s): Regression Model ◽ Software Maintenance ◽ State Of The Art ◽ Similarity Metrics ◽ Automatic Summarization ◽ Textual Information ◽ Additional Information ◽ Bug Reports ◽ Reference Process ◽ Duplicate Bug Reports

During software maintenance, bug reports are widely employed to improve the software project’s quality. A developer often refers to stored bug reports in a repository for bug resolution. However, this reference process often requires a developer to peruse a substantial amount of textual information in bug reports, which is lengthy and tedious. Automatic summarization of bug reports is one way to overcome this problem. Both supervised and unsupervised methods have been proposed for the automatic summary generation of bug reports. However, existing methods disregard the significance of duplicate bug reports in summarizing bug reports. In this study, we propose a PageRank-based Summarization Technique (PRST), which utilizes the textual information contained in bug reports and additional information in associated duplicate bug reports. PRST uses three variants of PageRank, based on Vector Space Model (VSM), Jaccard, and WordNet similarity metrics. These variants are utilized to calculate the textual similarity of the sentences between the master bug reports and their duplicates. PRST further trains a regression model and predicts the probability of sentences belonging to the summary. Finally, we combine the values of PageRank and regression model scores to rank the sentences and produce the summary for the master bug reports. In addition, we construct two corpora of bug reports and duplicates, i.e. MBRC and OSCAR. Empirical results suggest that PRST outperforms the state-of-the-art method BRC in terms of Precision, Recall, F-score, and Pyramid Precision. Meanwhile, PRST with WordNet achieves the best results against PRST with VSM and Jaccard.
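As a rough illustration of the PageRank-over-similarity idea mentioned in this abstract, the sketch below scores sentences of a master report using Jaccard word overlap with sentences from its duplicates. It is not the authors' PRST implementation: the graph construction (every sentence linked to every other, weighted by Jaccard similarity), the damping factor, and the function names `jaccard`, `pagerank`, and `rank_sentences` are assumptions, and the regression step is omitted entirely.

```python
import re


def jaccard(a, b):
    """Jaccard similarity between the word sets of two sentences."""
    sa = set(re.findall(r"[a-z0-9]+", a.lower()))
    sb = set(re.findall(r"[a-z0-9]+", b.lower()))
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def pagerank(weights, damping=0.85, iterations=50):
    """Power-iteration PageRank on a dense weighted adjacency matrix."""
    n = len(weights)
    if n == 0:
        return []
    rank = [1.0 / n] * n
    out_sum = [sum(row) for row in weights]
    for _ in range(iterations):
        new = [(1.0 - damping) / n] * n
        for i in range(n):
            if out_sum[i] == 0:
                # Dangling node: spread its rank evenly over all nodes
                for j in range(n):
                    new[j] += damping * rank[i] / n
            else:
                for j in range(n):
                    if weights[i][j]:
                        new[j] += damping * rank[i] * weights[i][j] / out_sum[i]
        rank = new
    return rank


def rank_sentences(master, duplicates):
    """Rank master-report sentences; duplicate sentences only lend support."""
    sentences = master + duplicates
    n = len(sentences)
    weights = [
        [0.0 if i == j else jaccard(sentences[i], sentences[j]) for j in range(n)]
        for i in range(n)
    ]
    scores = pagerank(weights)
    return sorted(zip(master, scores[: len(master)]), key=lambda p: -p[1])


if __name__ == "__main__":
    master = ["The editor crashes when saving a file", "Thanks for the great work"]
    duplicates = ["Saving a file crashes the editor", "Editor crash on file save"]
    for sentence, score in rank_sentences(master, duplicates):
        print(f"{score:.3f}  {sentence}")
```

Swapping `jaccard` for a vector-space or WordNet-based similarity function would give the other two variants the abstract names, under the same assumed graph construction.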

New Methodology for Contextual Features Usage in Duplicate Bug Reports Detection : Dimension Expansion based on Manhattan Distance Similarity of Topics

2019 5th International Conference on Web Research (ICWR) ◽ 10.1109/icwr.2019.8765296 ◽ 2019

Author(s): Behzad Soleimani Neysiani ◽ Seyed Morteza Babamir

Keyword(s): Manhattan Distance ◽ Contextual Features ◽ Bug Reports ◽ Dimension Expansion ◽ Duplicate Bug Reports

An HMM-based approach for automatic detection and classification of duplicate bug reports

Information and Software Technology ◽ 10.1016/j.infsof.2019.05.007 ◽ 2019 ◽ Vol 113 ◽ pp. 98-109 ◽ Cited By ~ 11

Author(s): Neda Ebrahimi ◽ Abdelaziz Trabelsi ◽ Md. Shariful Islam ◽ Abdelwahab Hamou-Lhadj ◽ Kobra Khanmohammadi

Keyword(s): Automatic Detection ◽ Bug Reports ◽ Duplicate Bug Reports
