Combining Text Mining and Data Mining for Bug Report Classification

Author(s):  
Yu Zhou ◽  
Yanxiang Tong ◽  
Ruihang Gu ◽  
Harald Gall
2016 ◽  
Vol 28 (3) ◽  
pp. 150-176 ◽  

Author(s):  
Mahwish Abid ◽  
Muhammad Usman ◽  
Muhammad Waleed Ashraf

<strong>As technology grows rapidly and computer systems are used far more widely than in the past, plagiarism is increasing day by day. Plagiarism is the wrongful appropriation of someone else’s work. Manual detection of plagiarism is difficult, so the process should be automated. Various tools can be used for plagiarism detection: some work on intrinsic plagiarism, while others work on extrinsic plagiarism. Data mining is a field that can help detect plagiarism and improve the efficiency of the process. Different data mining techniques can be used to detect plagiarism; text mining, clustering, bi-grams, tri-grams, and n-grams are among the techniques that can help in this process.</strong>


2018 ◽  
Vol 25 (4) ◽  
pp. 74
Author(s):  
Alfredo Silveira Araújo Neto ◽  
Marcos Negreiros

Rapid advances in technologies for capturing and storing data in digital format have allowed organizations to accumulate an extremely high volume of information, a large proportion of which is unstructured data in the form of text. However, retrieving useful information from these large repositories has proved very challenging. In this context, data mining is presented as a self-discovery process that acts on large databases and enables the extraction of knowledge from raw text documents. Among the many sources of textual documents are the electronic diaries of justice, which are intended to officially make public all the acts of the Judiciary. Although publication in digital form has brought improvements by removing imperfections associated with the printed format, applying data mining methods could make the analysis of its contents faster. To that end, this article establishes a tool capable of automatically grouping and categorizing digital procedural acts, based on an evaluation of text mining techniques applied to the task of determining groups. In addition, the strategy for defining group descriptors, which is usually based on the most frequent words in the documents, was evaluated and remodeled to use the most regularly identified concepts in the texts instead of words.


2016 ◽  
Vol 9 (1) ◽  
Author(s):  
Gabriela Jurca ◽  
Omar Addam ◽  
Alper Aksac ◽  
Shang Gao ◽  
Tansel Özyer ◽  
...  

Author(s):  
Juan I. Guerrero ◽  
Íñigo Monedero ◽  
Félix Biscarri ◽  
Jesús Biscarri ◽  
Rocío Millán ◽  
...  

The MIDAS project began in 2006 as a collaboration between Endesa, Sadiel, and the University of Seville. The objective of the MIDAS project is the detection of Non-Technical Losses (NTLs) in power utilities. NTLs represent the non-billed energy due to faults or illegal manipulations in clients’ facilities. Initially, the research lines studied the application of data mining techniques and neural networks. After several studies, the research was expanded to other fields: expert systems, text mining, statistical techniques, pattern recognition, etc. These techniques have provided an automated system for detecting NTLs in company databases. The system is in the test phase and is being applied to real cases in company databases.


Author(s):  
Manish Gupta ◽  
Jiawei Han

Sequential pattern mining methods have been found to be applicable in a large number of domains. Sequential data is omnipresent. Sequential pattern mining methods have been used to analyze this data and identify patterns. Such patterns have been used to implement efficient systems that can recommend based on previously observed patterns, help in making predictions, improve usability of systems, detect events, and in general help in making strategic product decisions. In this chapter, we discuss the applications of sequential data mining in a variety of domains like healthcare, education, Web usage mining, text mining, bioinformatics, telecommunications, intrusion detection, et cetera. We conclude with a summary of the work.


Author(s):  
Dan Zhu

With the advent of technology, information is available in abundance on the World Wide Web. To obtain appropriate and useful information, users must increasingly use techniques and automated tools to search, extract, filter, analyze, and evaluate desired information and resources. Data mining can be defined as the extraction of implicit, previously unknown, and potentially useful information from large databases. Text mining, on the other hand, is the process of extracting information from unstructured text. A standard text mining approach involves text categorization, text clustering, concept extraction, production of granular taxonomies, sentiment analysis, document summarization, and modeling (Fan et al., 2006). Furthermore, Web mining is the discovery and analysis of useful information using the World Wide Web (Berry, 2002; Mobasher, 2007). This broad definition encompasses “web content mining,” the automated search for resources and retrieval of information from millions of websites and online databases, as well as “web usage mining,” the discovery and analysis of users’ website navigation and online service access patterns. Companies are investing significant amounts of time and money in creating, developing, and enhancing individualized customer relationships, a process called customer relationship management or CRM. Based on a report by the Aberdeen Group, worldwide CRM spending reached close to $20 billion by 2006. Today, to improve customer relationships, most companies collect and refine massive amounts of data available through their customers. To increase the value of current information resources, data mining techniques can be rapidly implemented on existing software and hardware platforms, and integrated with new products and systems (Wang et al., 2008). If implemented on high-performance client/server or parallel processing computers, data mining tools can analyze enormous databases to answer customer-centric questions such as, “Which clients have the highest likelihood of responding to my next promotional mailing, and why?” This paper provides a basic introduction to data mining and other related technologies and their applications in CRM.


Author(s):  
Taşkın Dirsehan

The marketing concept has progressed through different phases of evolution. At present, customer relationship management is considered the latest era of marketing development. The main purpose of this approach is to build long-term, profitable relationships with customers, so companies need to know their customers better. This knowledge can be created through a deeper analysis of companies' data with data mining tools. Companies that are able to use data mining tools will gain strong competitive advantages for their strategic decisions. The hotel industry is selected in this study because it provides a warehouse of customer comments from which valuable knowledge can be obtained if text mining is used appropriately as a data mining tool. Thus, this study attempts to explain the stages of text mining using RapidMiner. As a result, different approaches based on customer satisfaction and dissatisfaction are discussed as ways to build competitive advantages.


Author(s):  
Hércules Antonio do Prado ◽  
José Palazzo Moreira de Oliveira ◽  
Edilson Ferneda ◽  
Leandro Krug Wives ◽  
Edilberto Magalhães Silva ◽  
...  

Information about the external environment and organizational processes is among the most worthwhile inputs for business intelligence (BI). Nowadays, companies have plenty of information in structured or textual form, either from external monitoring or from corporate systems. In recent years, the structured part of this information stock has been massively explored by means of data-mining (DM) techniques (Wang, 2003), generating models that enable analysts to gain insights into solutions for organizational problems. On the text-mining (TM) side, the pace of new application development has been slower. In an informal poll carried out in 2002 (KDnuggets), just 4% of knowledge-discovery-from-databases (KDD) practitioners were applying TM techniques. This fact is as intriguing as it is surprising if one considers that 80% of all information available in an organization comes in textual form (Tan, 1999).

