Study on Unknown Term Translation Mining from Google Snippets

Information · 2019 · Vol 10 (9) · pp. 267
Author(s): Bin Li, Jianmin Yao

Bilingual web pages are widely used to mine translations of unknown terms. This study focused on an effective solution for obtaining relevant web pages, extracting translations with correct lexical boundaries, and ranking the translation candidates. The research used co-occurrence information to obtain subject terms, then expanded the source query with the translations of those subject terms to collect effective bilingual search engine snippets. Valid candidates were then extracted from the small, noisy bilingual corpora using an improved frequency-change measurement that incorporates adjacency information. Finally, a method combining surface patterns, frequency–distance, and phonetic features was developed to select an appropriate translation. The experimental results revealed that the proposed method performs remarkably well for mining translations of unknown terms.
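A minimal sketch of the frequency–distance idea the abstract mentions (a hypothetical scoring function, not the authors' exact formula): candidates that co-occur often with the source term, and close to it, score higher.

```python
# Hypothetical frequency-distance ranking of translation candidates.
# A candidate scores higher the more snippets it shares with the source
# term and the closer (by character offset) it appears to that term.

def rank_candidates(snippets, source_term, candidates):
    scores = {}
    for cand in candidates:
        freq, total_dist = 0, 0
        for snip in snippets:
            if source_term in snip and cand in snip:
                freq += 1
                total_dist += abs(snip.index(source_term) - snip.index(cand))
        if freq:
            # Higher frequency and smaller average distance -> higher score.
            scores[cand] = freq / (1 + total_dist / freq)
    return sorted(scores, key=scores.get, reverse=True)
```

A real system would add the surface-pattern and phonetic features on top of this base score; the sketch only illustrates the frequency–distance component.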

2013 · Vol 25 · pp. 189-203
Author(s): Dominik Schlosser

This paper attempts to give an overview of the different representations of the pilgrimage to Mecca found in the ‘liminal space’ of the internet. For that purpose, it examines a handful of emblematic examples of how the hajj is presented and discussed in cyberspace. Special attention is paid to how far issues of religious authority are manifest on these websites: whether the content providers of web pages appoint themselves as authorities by scrutinizing established views of the fifth pillar of Islam, whether they upload already printed texts onto their sites in order to reiterate normative notions of the pilgrimage to Mecca, or whether they make use of search engine optimisation techniques, thus heightening the visibility of their online presence and increasing the possibility of becoming authoritative in shaping internet surfers’ perceptions of the hajj.


Author(s): Rizwan Ur Rahman, Rishu Verma, Himani Bansal, Deepak Singh Tomar

With the explosive expansion of information on the world wide web, search engines are becoming more significant in the day-to-day lives of humans. Even though a search engine generally returns a huge number of results for a given query, most search engine users view only the first few web pages in the result lists. Consequently, ranking position has become a major concern of internet service providers. This article addresses the vulnerabilities, spamming attacks, and countermeasures in blogging sites. The first part explores the spamming types and includes a detailed section on vulnerabilities. The next part presents an attack scenario of form spamming together with a defense approach. The aim of this article is thus to provide a review of the vulnerabilities and spamming threats associated with blogging websites, and of effective measures to counter them.
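One widely used defense against form spamming (a common technique, not necessarily the exact approach the article presents) is a honeypot field: an extra input hidden from humans by CSS, which bots that auto-fill every field reveal themselves by filling. A minimal check might look like this, where the field name `website` is an arbitrary illustration:

```python
# Honeypot check against form spamming: the hidden field is never
# visible to humans, so any non-empty value marks the submission as
# bot-generated.

def is_spam_submission(form_data, honeypot_field="website"):
    # Humans never see the honeypot field; bots tend to fill it anyway.
    return bool(form_data.get(honeypot_field, "").strip())
```

In practice this is combined with rate limiting and content filters, since sophisticated bots can learn to skip hidden fields.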


Author(s): Ricard Monclús-Guitart, Teresa Torres-Coronas, Araceli Rodríguez-Merayo, M. Arántzazu Vidal-Blasco, Mario Arias-Oliva

The European Credit Transfer System establishes a calculation based on the work students do, rather than on direct teaching hours as in the current credit system. These are known as ECTS credits, and they represent the amount of work a student needs to do to pass a subject. In short, ECTS credits quantify the work needed to learn a subject, including theory, practical classes, seminars and exams, as well as anything the student has done individually that can be evaluated. This is where a Wiki provides a new space for students, in which they can and should introduce information on matters related to the subject, as well as edit, correct, expand and improve the existing information. This information, a collection of hypertext web pages, would make it possible to create a computer application based on the collaborative work of the students, accessible to any student from any internet connection. At the same time, it can be assessed and therefore form part of the student’s final grade for the subject. The aim of this chapter is to show a methodology that enables a Wiki to be used for professional learning. First, the authors define what a Wiki is; second, they discuss the Wiki as a collaborative teaching instrument; and third, they deal with Wikis as a tool for educational assessment.


Author(s): Ravi P. Kumar, Ashutosh K. Singh, Anand Mohan

In this era of Web computing, cyber security is very important as more and more data moves onto the Web. Some of these data are confidential and important, and they face many threats. Some basic threats can be addressed by designing web sites properly using search engine optimization techniques. One such threat is the hanging page, which gives room for link spamming. This chapter addresses the issues caused by hanging pages in Web computing and has four main objectives: 1) compare and review the different types of link-structure-based ranking algorithms for ranking web pages, with PageRank used as the base algorithm throughout the chapter; 2) study hanging pages, explore their effects on Web security, and compare the existing methods of handling them; 3) study link spam and explore the contribution of hanging pages to it; and 4) study search engine optimization (SEO) / web site optimization (WSO) and explore the effect of hanging pages on SEO.
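A hanging (dangling) page is one with no out-links, which leaves the PageRank random-surfer model ill-defined unless its rank mass is handled explicitly. A minimal power-iteration sketch (one common handling strategy, not necessarily the chapter's) redistributes the mass held by hanging pages uniformly over all pages:

```python
# Minimal PageRank with hanging-page handling: pages with no out-links
# would otherwise leak rank mass, so their mass is spread uniformly.

def pagerank(links, d=0.85, iters=50):
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iters):
        # Total rank currently held by hanging pages (empty link lists).
        dangling = sum(rank[p] for p in pages if not links[p])
        new = {}
        for p in pages:
            incoming = sum(rank[q] / len(links[q])
                           for q in pages if p in links[q])
            # Teleport term + damped (incoming + uniform dangling share).
            new[p] = (1 - d) / n + d * (incoming + dangling / n)
        rank = new
    return rank
```

With this redistribution the ranks always sum to one, which is exactly the property link spammers exploit when they deliberately create hanging pages to soak up and redirect rank mass.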


Author(s): Oğuzhan Menemencioğlu, İlhami Muharrem Orak

The semantic web works on producing machine-readable data and aims to deal with large amounts of data. The most important tool for accessing data on the web is the search engine. Traditional search engines are insufficient in the face of the amount of data contained in existing web pages; semantic search engines extend traditional engines and overcome these difficulties. This paper summarizes the semantic web, the concepts and infrastructure of traditional and semantic search engines, and details semantic search approaches. A summary of the literature is provided, touching on the trends; in this respect, the types of applications and the areas they address are considered. Based on data for two different years, the trends on these points are analyzed and the impacts of the changes are discussed. The analysis shows that the evolution of the semantic web continues and that new applications and areas keep emerging. Multimedia retrieval is a new scope of semantic search; hence, multimedia retrieval approaches are discussed, and text and multimedia retrieval are analyzed within semantic search.


2016 · Vol 6 (2) · pp. 41-65
Author(s): Sheetal A. Takale, Prakash J. Kulkarni, Sahil K. Shah

Information available on the internet is huge, diverse and dynamic. Current search engines do the task of intelligently helping internet users: for a query, they provide a list of the best-matching or most relevant web pages. However, the information for a query is often spread across multiple pages returned by the search engine, which degrades the quality of the search results; the search engines are drowning in information but starving for knowledge. Here, the authors present query-focused extractive summarization of search engine results. They propose a two-level summarization process: identification of relevant theme clusters, and selection of the top-ranking sentences to form a summarized result for the user query. A new approach to semantic similarity computation using semantic roles and semantic meaning is proposed. Document clustering is achieved by applying the MDL (minimum description length) principle, and sentence clustering and ranking are done using SNMF. The experiments conducted demonstrate the effectiveness of the system in semantic text understanding, document clustering and summarization.
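The sentence-selection step can be illustrated with a toy query-focused extractor. This sketch ranks sentences by plain word overlap with the query rather than the paper's semantic-role similarity or SNMF clustering, so it only shows the overall shape of "score sentences against the query, keep the top k":

```python
# Toy query-focused extractive summarization: score each sentence by
# the number of query terms it contains, return the k best sentences.

def summarize(sentences, query, k=2):
    query_terms = set(query.lower().split())
    scored = sorted(sentences,
                    key=lambda s: len(query_terms & set(s.lower().split())),
                    reverse=True)
    return scored[:k]
```

Replacing the overlap score with a semantic similarity measure, and adding the theme-cluster stage before selection, would move this toy toward the system the abstract describes.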


2019 · Vol 16 (9) · pp. 3712-3716
Author(s): Kailash Kumar, Abdulaziz Al-Besher

This paper examines the overlap of the results retrieved by three major search engines, namely Google, Yahoo and Bing. A rigorous analysis of the overlap among these search engines was conducted on 100 random queries. The overlap of the first ten web page results, i.e., hundred results from each search engine, was taken into consideration, counting only non-sponsored results. Each search engine has its own update frequency and ranks results by its own relevance measures; moreover, sponsored search advertisers differ between search engines, and no single search engine can index all web pages. The overlap analysis of the results was carried out between October 1, 2018 and October 31, 2018 among these major search engines. A framework built in Java analyzes the overlap among them; it eliminates the common results and merges them into a unified list, and it uses a ranking algorithm to re-rank the search engine results and display them back to the user.
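The core of such an overlap analysis can be sketched in a few lines (a simplification of the paper's Java framework: first-seen order stands in for its re-ranking algorithm):

```python
# Overlap and merge of per-engine result lists, identified by URL.

def overlap(results_a, results_b):
    # Number of URLs the two engines' result lists share.
    return len(set(results_a) & set(results_b))

def merge_results(*result_lists):
    # Unified list with duplicates removed, keeping first-seen order.
    seen, merged = set(), []
    for results in result_lists:
        for url in results:
            if url not in seen:
                seen.add(url)
                merged.append(url)
    return merged
```

For example, if Google returns `["u1", "u2", "u3"]` and Bing returns `["u2", "u4"]`, the overlap is 1 and the merged list contains four distinct URLs; a re-ranking step would then reorder that unified list before display.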


2018 · Vol 6 (3) · pp. 67-78
Author(s): Tian Nie, Yi Ding, Chen Zhao, Youchao Lin, Takehito Utsuro

The background of this article is the issue of how to get an overview of the knowledge around a given query keyword. In particular, the authors focus on the concerns of those who search for web pages with a given query keyword. The web search information needs of a given query keyword are collected through search engine suggestions. Given a query keyword, the authors collect up to around 1,000 suggestions, many of which are redundant, and classify the redundant suggestions based on a topic model. However, one limitation of the topic-model-based classification of suggestions is that the granularity of the topics, i.e., of the clusters of suggestions, is too coarse. To overcome this coarse-grained classification, the article further applies the word embedding technique to the web pages used during the training of the topic model, in addition to the text of the whole Japanese version of Wikipedia. The authors then examine the word-embedding-based similarity between suggestions and classify the suggestions within a single topic into finer-grained subtopics based on that similarity. Evaluation results show that the proposed approach performs well in the task of subtopic classification of search engine suggestions.
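The finer-grained step can be sketched as grouping suggestions by cosine similarity of their embedding vectors. This is a toy greedy grouping with made-up two-dimensional vectors, not the authors' trained embeddings or clustering procedure:

```python
import math

# Cosine similarity between two embedding vectors.
def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    return dot / (norm_u * norm_v)

# Greedy single-pass grouping: attach each suggestion to the first
# subtopic whose seed vector is similar enough, else start a new one.
def group_by_similarity(embeddings, threshold=0.8):
    groups = []
    for name, vec in embeddings.items():
        for group in groups:
            if cosine(vec, embeddings[group[0]]) >= threshold:
                group.append(name)
                break
        else:
            groups.append([name])
    return groups
```

With real word embeddings, near-synonymous suggestions ("price", "cost") land in one subtopic while unrelated ones ("review") start their own.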


2016 · Vol 3 (2) · pp. 152-164
Author(s): Hamid Sharifi

In this research, we studied localized commercial texts of globalized companies in the context of intertextuality on three levels: lexical, thematic, and cultural. From among the many products of the three companies under study (Samsung, LG, and Sony), four smartphone models of each were selected (twelve in total). Their introductory web pages in both Persian and English were the sources of the data. Furthermore, we used an online analyzer tool (online-utility.org/text/analyzer.jsp) to analyze the data; the results were also corroborated with other software packages and applications. Against the backdrop of booming globalization, a better understanding of cross-cultural vocative communication proves helpful, and one of the most active areas is the study of flagship brands, where rivals try their best to localize their devices to the liking of potential customers. Descriptive and explanatory methods were brought into play to compare the English and Persian commercial texts. The research revealed the critical role intertextuality plays in the process of glocalization. Developing companies should note that they, too, could utilize this great potential in the context of web localization. The findings would therefore be of benefit to chief executive officers (CEOs), product developers and scholars interested in the subject.

