Automatic User Preferences Acquirement in Chinese Commercial Web Sites with NLP and DM Techniques

Author(s):  
Shilin Zhang ◽  
Hui Wang
Author(s):  
Anne Yun-An Chen ◽  
Dennis McLeod

To draw users’ attention and increase their satisfaction with online information search results, search-engine developers and vendors try to predict user preferences based on users’ behavior. Recommendations are provided by the search engines or online vendors to the users. Recommendation systems are implemented on commercial and nonprofit Web sites to predict user preferences. For commercial Web sites, accurate predictions may result in higher selling rates. The main functions of recommendation systems include analyzing user data and extracting useful information for further predictions. Recommendation systems are designed to allow users to locate preferable items quickly and to avoid possible information overload. Recommendation systems apply data-mining techniques to determine the similarity among thousands or even millions of data items. Collaborative-filtering techniques have been successful in enabling the prediction of user preferences in recommendation systems (Hill, Stead, Rosenstein, & Furnas, 1995; Shardanand & Maes, 1995). There are three major processes in recommendation systems: object data collection and representation, similarity decisions, and recommendation computations. Collaborative filtering aims at finding the relationships between new individual data and existing data in order to determine their similarity and provide recommendations. How to define similarity is an important issue: how similar should two objects be in order to finalize the preference prediction? Different collaborative-filtering techniques reach similarity decisions in different ways. For example, people who like and dislike movies in the same categories are considered to have similar behavior (Chee, Han, & Wang, 2001). The concept of the nearest-neighbor algorithm has been included in the implementation of recommendation systems (Resnick, Iacovou, Suchak, Bergstrom, & Riedl, 1994).
The designs of pioneer recommendation systems focus on entertainment fields (Dahlen, Konstan, Herlocker, Good, Borchers, & Riedl, 1998; Resnick et al., 1994; Shardanand & Maes, 1995; Hill et al., 1995). The challenge for conventional collaborative-filtering algorithms is scalability (Sarwar, Karypis, Konstan, & Riedl, 2000a). Conventional algorithms explore the relationships among system users in large data sets. User data are dynamic, which means the data vary within a short time period. Current users may change their behavior patterns, and new users may enter the system at any moment. Millions of user records, which are called neighbors, must be examined in real time in order to provide recommendations (Herlocker, Konstan, Borchers, & Riedl, 1999). Searching among millions of neighbors is a time-consuming process. To solve this, item-based collaborative-filtering algorithms were proposed to reduce computation, because the properties of items are relatively static (Sarwar, Karypis, Konstan, & Riedl, 2001). Suggest is a top-N recommendation engine implemented with item-based recommendation algorithms (Deshpande & Karypis, 2004; Karypis, 2000). In addition, the number of items is usually smaller than the number of users. In early 2004, Amazon Investor Relations (2004) stated that the Amazon.com apparel and accessories store provided about 150,000 items but had more than 1 million customer accounts that had ordered from this store. Amazon.com employs an item-based algorithm for collaborative-filtering-based recommendations (Linden, Smith, & York, 2003) to avoid the disadvantages of conventional collaborative-filtering algorithms.
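The item-based idea described above can be illustrated with a minimal Python sketch. It is not the implementation used by Suggest or Amazon.com; the toy ratings, the function names, and the choice of cosine similarity with a similarity-weighted average are all assumptions made for illustration. The point it shows is that item-item similarities can be computed ahead of time, so that a recommendation only needs a lookup over the items a user has already rated.

```python
from math import sqrt

# Hypothetical toy ratings (user -> {item: rating}); not data from the source.
ratings = {
    "alice": {"A": 5, "B": 3, "C": 4},
    "bob":   {"A": 4, "B": 2, "D": 5},
    "carol": {"B": 5, "C": 1, "D": 2},
}

def item_vectors(ratings):
    """Invert user -> item ratings into item -> {user: rating} vectors."""
    items = {}
    for user, prefs in ratings.items():
        for item, score in prefs.items():
            items.setdefault(item, {})[user] = score
    return items

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

def item_similarities(ratings):
    """Precompute item-item similarities; this step can run offline."""
    items = item_vectors(ratings)
    return {a: {b: cosine(items[a], items[b]) for b in items if b != a}
            for a in items}

def recommend(user, ratings, sims, n=2):
    """Rank unseen items by a similarity-weighted average of the user's ratings."""
    seen = ratings[user]
    scores = {}
    for candidate in sims:
        if candidate in seen:
            continue
        weight = sum(sims[candidate][item] for item in seen)
        if weight > 0:
            weighted = sum(sims[candidate][item] * r for item, r in seen.items())
            scores[candidate] = weighted / weight
    return sorted(scores, key=scores.get, reverse=True)[:n]

sims = item_similarities(ratings)
print(recommend("alice", ratings, sims))
```

Because the similarity table depends only on the items, it needs to be rebuilt far less often than a user-neighbor search would need to be rerun, which is the scalability argument made in the abstract.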


2015 ◽  
Vol 34 (4) ◽  
pp. 113-118 ◽  
Author(s):  
Erin E. Kerby ◽  
Kelli Trei

Purpose – This study aims to highlight practical considerations to be made when choosing an eBook package for an institution. Many academic libraries purchase eBooks bundled in packages, either as a time- or cost-saving measure or to build a new subject collection. Design/methodology/approach – The authors searched the Web sites of six major publishers for information on eBook packages, including subject coverage, digital rights management restrictions and usage allowances. The analysis also covers potential overlap between related subject collections and the ability to purchase titles individually. Findings – Usage allowances, digital rights management restrictions and purchasing options vary considerably from publisher to publisher. Title overlap between related subject packages was found for some publishers. In response to user preferences and needs, many publishers are loosening restrictions on their eBook content, which makes purchasing packages a more attractive option for libraries. Originality/value – The landscape of eBook publishing is rapidly changing, which can complicate purchasing decisions. The detailed comparison provided by this study can be used to assist collections developers in making purchasing decisions best suited to their library and avoiding pitfalls such as duplicate purchases.


2015 ◽  
Author(s):  
Susanne Mikki ◽  
Marta Zygmuntowska ◽  
Hemed Ali Al Ruwehy ◽  
Øyvind Liland Gjesdal

We investigate the digital presence of scholars at different academic Web sites. With new technologies, creating profiles and disseminating and exchanging ideas is easily done, and scholars are more likely to join the networks and impact their community. In our study we compare research profiles of employees at the University of Bergen at five different academic network sites. The sites are ResearchGate, Academia.edu, Google Scholar, ResearcherID and ORCID. CRIStin, the Current Research Information System in Norway (www.cristin.no), is used as a reference. CRIStin is a national database which contains quality-assured data on scientific publications, including supplementary author details such as age, gender, position and affiliation. All investigated sites have varying scopes (and degrees of control), but also common features which are worth investigating and comparing. Data is collected using Web scraping applications developed at the University of Bergen Library by searching for the researchers who are affiliated with the University of Bergen. This was achieved by analyzing the Document Object Model (DOM) of every academic site and then building up a set of selectors and expressions, so that the DOM could be traversed programmatically and indicators extracted. Author recognition is then done by comparing names given in the services with names in CRIStin. After extensive data cleansing and deduplication we were able to compare the different services. Our first goal is to determine the number of profiles and the degree of overlap. The overlap tells us whether scholars are willing to maintain their profiles at several services. Preference of platform with regard to faculty affiliation, position and age is another aspect of our investigation. Further, we analyze extracted indicators with regard to traditional bibliometric and “altmetric” measures.
Bibliometric measures are related to publications and citations, while “altmetric” indicators comprise different forms of Web activities such as followers, following, views and downloads. The indicators vary from service to service, and a correlation analysis tells us whether indicators are related to each other or not. We find that about 37% of researchers at the University of Bergen have at least one profile. They are reluctant to maintain several profiles, and overlap was therefore relatively small. Age is a poor predictor of Web site use, and women are underrepresented on the investigated platforms. The representation is highest at the Faculty of Psychology and the Faculty of Social Sciences (> 40%). Available indicators show high correlation within bibliometric indicators, but correlation is weak with social and activity indicators across platforms.
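The name-comparison and overlap steps described above could be sketched roughly as follows in Python. This is not the authors' scraping or matching code: the normalization rule (accent stripping, lowercasing, whitespace collapsing), the function names, the service labels, and the sample names are all hypothetical, and real author recognition would need to handle initials, name order, and homonyms.

```python
import unicodedata

def normalize(name):
    """Lowercase, strip combining accents, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(no_accents.lower().split())

def match_profiles(reference_names, service_names):
    """For each service, keep only the profile names found in the reference list."""
    ref = {normalize(n) for n in reference_names}
    return {service: ref & {normalize(n) for n in names}
            for service, names in service_names.items()}

def overlap_counts(matches):
    """Count on how many services each matched researcher has a profile."""
    counts = {}
    for names in matches.values():
        for name in names:
            counts[name] = counts.get(name, 0) + 1
    return counts

# Hypothetical reference list (standing in for the national database) and
# hypothetical scraped profile names; none of these come from the study.
reference = ["Ana Núñez", "Bo Li", "Cy Young"]
services = {
    "service_x": ["ana nunez", "Bo  Li", "Dee Dee"],
    "service_y": ["Ana Nunez"],
}

matches = match_profiles(reference, services)
counts = overlap_counts(matches)
share = len(counts) / len({normalize(n) for n in reference})
print(counts, round(share, 2))
```

Using sets for the normalized names also takes care of exact duplicates within a service; fuzzier deduplication would need an explicit rule that the source does not describe.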

