A Model for Text Summarization

2017 ◽  
Vol 13 (1) ◽  
pp. 67-85 ◽  
Author(s):  
Rasim M. Alguliyev ◽  
Ramiz M. Aliguliyev ◽  
Nijat R. Isazade ◽  
Asad Abdi ◽  
Norisma Idris

Text summarization is the process of creating a concise version of one or more documents while preserving their main content. In this paper, to cover all topics and reduce redundancy in summaries, a two-stage sentence selection method for text summarization is proposed. In the first stage, to discover all topics, the set of sentences is clustered using the k-means method. In the second stage, an optimal selection of sentences is proposed. From each cluster, the salient sentences are selected according to their contribution to the topic (cluster) and their proximity to other sentences in the cluster, to avoid redundancy in summaries, until the appointed summary length is reached. Sentence selection is modeled as an optimization problem. In this study, an adaptive differential evolution with a novel mutation strategy is employed to solve the optimization problem. Tested on the benchmark DUC2001 and DUC2002 data sets, the ROUGE values of the summaries obtained by the proposed approach demonstrated its validity compared with traditional methods of sentence selection and the top three performing systems for DUC2001 and DUC2002.
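
The two-stage idea in this abstract (cluster the sentences, then pick salient, non-redundant sentences from each cluster until a length limit is reached) can be illustrated with a minimal sketch. It is not the authors' method: the bag-of-words vectors, the cosine similarity, the greedy round-robin selection (used here in place of the adaptive differential evolution), and the names `summarize`, `max_words` and `max_similarity` are all assumptions made for illustration.

```python
import math
import random
from collections import Counter

def vectorize(sentence):
    # Hypothetical, very simple tokenizer: lowercase whitespace split.
    return Counter(sentence.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def kmeans(vectors, k, iterations=10, seed=0):
    # Stage 1: cluster sentences; centroids are summed term counts
    # (only their direction matters for cosine similarity).
    rng = random.Random(seed)
    centroids = rng.sample(vectors, k)
    assignment = [0] * len(vectors)
    for _ in range(iterations):
        assignment = [max(range(k), key=lambda c: cosine(v, centroids[c]))
                      for v in vectors]
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:
                total = Counter()
                for v in members:
                    total.update(v)
                centroids[c] = total
    return assignment, centroids

def summarize(sentences, k, max_words, max_similarity=0.7):
    vectors = [vectorize(s) for s in sentences]
    assignment, centroids = kmeans(vectors, k)
    # Rank each cluster's sentences by closeness to the cluster centroid.
    queues = []
    for c in range(k):
        members = [i for i, a in enumerate(assignment) if a == c]
        members.sort(key=lambda i: cosine(vectors[i], centroids[c]), reverse=True)
        queues.append(members)
    # Stage 2 (greedy stand-in): visit clusters in turn, skipping sentences
    # that are redundant with the summary so far or exceed the word budget.
    chosen, words, progress = [], 0, True
    while progress:
        progress = False
        for queue in queues:
            while queue:
                i = queue.pop(0)
                length = len(sentences[i].split())
                redundant = any(cosine(vectors[i], vectors[j]) > max_similarity
                                for j in chosen)
                if not redundant and words + length <= max_words:
                    chosen.append(i)
                    words += length
                    progress = True
                    break
    return [sentences[i] for i in sorted(chosen)]
```

A greedy pass like this only approximates the trade-off between coverage and redundancy; the abstract instead treats the selection as an optimization problem solved by differential evolution.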

Author(s):  
Christopher Jayakaran ◽  
Ragini Patel ◽  
Prashant Momaya ◽  
K. Roopesh ◽  
Umeshchandra Ananthanarayana ◽  
...  

Tolerance allocation and optimization is a critical step in the product design process. The inherent trade-off between design objectives and process capability poses challenges in achieving the right tolerances, both technically and in terms of effort. Traditional methods of tolerance allocation are mostly regressive and are constrained by the selection of manufacturing processes. A progressive approach to tolerance allocation that does not assume these processes helps in achieving optimality of the tolerances and in selecting the manufacturing processes that realize the design. The two-stage process suggested in this paper formulates an optimization problem that, in the first stage, allocates the tolerances based on the sensitivities of the tolerance values; the second stage then selects the manufacturing processes and optimizes further to adhere to the processes selected. The approach aims at achieving optimal allocation of tolerances and assignment of the manufacturing processes while keeping the optimization problem computationally simple, although iterative.


2018 ◽  
Vol 14 (2) ◽  
pp. 1-17 ◽  
Author(s):  
Zhiqiang Gao ◽  
Yixiao Sun ◽  
Xiaolong Cui ◽  
Yutao Wang ◽  
Yanyu Duan ◽  
...  

This article describes how k-means, the most widely used clustering method, is prone to falling into a local optimum. Notably, traditional clustering approaches are performed directly on private data and fail to cope with malicious attacks in massive data mining tasks against attackers' arbitrary background knowledge. This would result in violation of individuals' privacy, as well as leaks through system resources and clustering outputs. To address these issues, the authors propose an efficient privacy-preserving hybrid k-means under Spark. In the first stage, particle swarm optimization is executed in resilient distributed datasets to initiate the selection of clustering centroids in the k-means on Spark. In the second stage, k-means is executed on the condition that a privacy budget is set as ε/2t with Laplace noise added in each round of iterations. Extensive experimentation on public UCI data sets shows that, on the premise of guaranteeing the utility of private data and scalability, their approach outperforms the state-of-the-art varieties of k-means by utilizing swarm intelligence and rigorous paradigms of differential privacy.
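
The second stage described above, adding Laplace noise under a per-round budget of ε/2t, can be illustrated with a small sketch of one noisy centroid update. It rests on assumptions not stated in the abstract: that t is the number of rounds, that each round issues two queries per cluster (a count and a sum) with budget ε/(2t) each, and that the data are one-dimensional values clipped to [0, `bound`]. The function names are hypothetical, and the sketch uses plain Python rather than Spark.

```python
import random

def laplace(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def noisy_update(points, assignment, k, epsilon, rounds, rng, bound=1.0):
    """One k-means centroid update with Laplace noise on counts and sums."""
    eps_query = epsilon / (2 * rounds)  # assumed reading of "ε/2t"
    centroids = []
    for c in range(k):
        # Clip values so that one record changes the sum by at most `bound`.
        members = [min(max(p, 0.0), bound)
                   for p, a in zip(points, assignment) if a == c]
        noisy_count = len(members) + laplace(1.0 / eps_query, rng)
        noisy_sum = sum(members) + laplace(bound / eps_query, rng)
        # Guard against tiny or negative noisy counts, then clip the result.
        centre = noisy_sum / max(noisy_count, 1.0)
        centroids.append(min(max(centre, 0.0), bound))
    return centroids
```

A smaller ε gives a larger noise scale and therefore noisier centroids, which is the utility-versus-privacy trade-off the abstract refers to.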


2013 ◽  
Vol 12 (03) ◽  
pp. 361-393 ◽  
Author(s):  
RASIM M. ALGULIEV ◽  
RAMIZ M. ALIGULIYEV ◽  
NIJAT R. ISAZADE

We have presented an approach to automatic document summarization. In the proposed approach, text summarization is modeled as a quadratic integer-programming problem. This model generally attempts to optimize three properties, namely, (1) relevance: summary should contain informative textual units that are relevant to the user; (2) redundancy: summaries should not contain multiple textual units that convey the same information; and (3) length: summary is bounded in length. To solve the optimization problem we have created a novel differential evolution algorithm. Experimental results on DUC2005 and DUC2007 data sets showed that the proposed approach outperforms the other methods.
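
The three properties named in this abstract (relevance, redundancy and length) can be written as a small binary quadratic program. The sketch below is an assumed form of such an objective, not the authors' formulation: it rewards the relevance of each selected unit, penalizes pairwise similarity between selected units, and enforces the length bound as a hard constraint. Exhaustive search stands in for the differential evolution algorithm, so it is only practical for a handful of sentences; `best_summary` and `penalty` are hypothetical names.

```python
from itertools import product

def best_summary(relevance, similarity, lengths, max_length, penalty=1.0):
    """Search all 0/1 selection vectors x and return the best feasible one."""
    n = len(relevance)
    best_x, best_score = None, float("-inf")
    for x in product((0, 1), repeat=n):
        # Length: the summary must fit within the bound.
        if sum(l * xi for l, xi in zip(lengths, x)) > max_length:
            continue
        # Relevance: linear reward for each selected unit.
        score = sum(r * xi for r, xi in zip(relevance, x))
        # Redundancy: quadratic penalty for each selected pair.
        score -= penalty * sum(similarity[i][j] * x[i] * x[j]
                               for i in range(n) for j in range(i + 1, n))
        if score > best_score:
            best_x, best_score = x, score
    return best_x, best_score
```

With three equally long sentences where the first two are near-duplicates, the search prefers the first and third over the two most relevant ones, because the redundancy penalty outweighs the extra relevance.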


1995 ◽  
Vol 31 (2) ◽  
pp. 193-204 ◽  
Author(s):  
Koen Grijspeerdt ◽  
Peter Vanrolleghem ◽  
Willy Verstraete

A comparative study of several recently proposed one-dimensional sedimentation models has been made. This has been achieved by fitting these models to steady-state and dynamic concentration profiles obtained in a down-scaled secondary decanter. The models were evaluated with several a posteriori model selection criteria. Since the purpose of the modelling task is to do on-line simulations, the calculation time was used as one of the selection criteria. Finally, the practical identifiability of the models for the available data sets was also investigated. It could be concluded that the model of Takács et al. (1991) gave the most reliable results.


1997 ◽  
Vol 36 (5) ◽  
pp. 61-68 ◽  
Author(s):  
Hermann Eberl ◽  
Amar Khelil ◽  
Peter Wilderer

A numerical method for the identification of parameters of nonlinear higher-order differential equations is presented, which is based on the Levenberg-Marquardt algorithm. The estimation of the parameters can be performed by using several reference data sets simultaneously. This leads to a multicriteria optimization problem, which is treated using the Pareto optimality concept. In this paper, the emphasis is put on the presentation of the calibration method. As an example, the identification of the parameters of a nonlinear hydrological transport model for urban runoff is included, but the method can be applied to other problems as well.
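
For readers unfamiliar with the Levenberg-Marquardt algorithm mentioned above, the following is a minimal sketch of its damped Gauss-Newton iteration. It is deliberately much simpler than the setting of the paper: it fits the two parameters of an assumed model y = a·exp(−b·t) to a single data set, with no differential equation solver and no multicriteria (Pareto) treatment. The model, the function name `fit_exponential` and the damping schedule are illustrative assumptions.

```python
import math

def fit_exponential(t, y, a, b, iterations=200, damping=1e-3):
    """Fit y ≈ a * exp(-b * t) with Levenberg-Marquardt steps."""
    def sse(a, b):
        try:
            return sum((yi - a * math.exp(-b * ti)) ** 2 for ti, yi in zip(t, y))
        except OverflowError:
            return math.inf  # treat a diverging trial step as a failure

    error = sse(a, b)
    for _ in range(iterations):
        # Accumulate J^T J and J^T r for the residuals r = y - model.
        jaa = jab = jbb = ga = gb = 0.0
        for ti, yi in zip(t, y):
            e = math.exp(-b * ti)
            r = yi - a * e
            da, db = e, -a * ti * e  # partial derivatives of the model
            jaa += da * da
            jab += da * db
            jbb += db * db
            ga += da * r
            gb += db * r
        # Solve (J^T J + damping * diag(J^T J)) step = J^T r by Cramer's rule.
        m11, m22 = jaa * (1 + damping), jbb * (1 + damping)
        det = m11 * m22 - jab * jab
        if det == 0:
            break
        step_a = (ga * m22 - jab * gb) / det
        step_b = (m11 * gb - jab * ga) / det
        new_error = sse(a + step_a, b + step_b)
        if new_error < error:
            # Accept the step and trust the Gauss-Newton direction more.
            a, b, error = a + step_a, b + step_b, new_error
            damping /= 10
        else:
            # Reject the step and move towards smaller, gradient-like steps.
            damping *= 10
    return a, b
```

Fitting several reference data sets at once, as the abstract describes, would yield one error term per data set; how those terms are traded off is where the Pareto optimality concept enters, and that part is not shown here.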


2018 ◽  
Vol 21 (2) ◽  
pp. 117-124 ◽  
Author(s):  
Bakhtyar Sepehri ◽  
Nematollah Omidikia ◽  
Mohsen Kompany-Zareh ◽  
Raouf Ghavami

Aims & Scope: In this research, 8 variable selection approaches were used to investigate the effect of variable selection on the predictive power and stability of CoMFA models. Materials & Methods: Three data sets comprising 36 EPAC antagonists, 79 CD38 inhibitors and 57 ATAD2 bromodomain inhibitors were modelled by CoMFA. First, for all three data sets, CoMFA models with all CoMFA descriptors were created; then, by applying each variable selection method, a new CoMFA model was developed, so that 9 CoMFA models were built for each data set. The results show that noisy and uninformative variables affect CoMFA results. Based on the created models, applying 5 variable selection approaches, namely FFD, SRD-FFD, IVE-PLS, SRD-UVE-PLS and SPA-jackknife, increases the predictive power and stability of CoMFA models significantly. Result & Conclusion: Among them, SPA-jackknife removes most of the variables while FFD retains most of them. FFD and IVE-PLS are time-consuming, while SRD-FFD and SRD-UVE-PLS need only a few seconds to run. Also, applying FFD, SRD-FFD, IVE-PLS and SRD-UVE-PLS preserves the information in the CoMFA contour maps for both fields.


Author(s):  
Christian Luksch ◽  
Lukas Prost ◽  
Michael Wimmer

We present a real-time rendering technique for photometric polygonal lights. Our method uses a numerical integration technique based on a triangulation to calculate noise-free diffuse shading. We include a dynamic point in the triangulation that provides a continuous near-field illumination resembling the shape of the light emitter and its characteristics. We evaluate the accuracy of our approach with a diverse selection of photometric measurement data sets in a comprehensive benchmark framework. Furthermore, we provide an extension for specular reflection on surfaces with arbitrary roughness that facilitates the use of existing real-time shading techniques. Our technique is easy to integrate into real-time rendering systems and extends the range of possible applications with photometric area lights.


2021 ◽  
pp. 1-16
Author(s):  
Hajer Al-Faham

How does surveillance shape political science research in the United States? In comparative and international politics, there is a rich literature concerning the conduct of research amid conditions of conflict and state repression. As this literature locates “the field” in distant contexts “over there,” the United States continues to be saturated with various forms of state control. What this portends for American politics research has thus far been examined by a limited selection of scholars. Expanding on their insights, I situate “the field” in the United States and examine surveillance of American Muslims, an understudied case of racialized state control. Drawing on qualitative data from a case study of sixty-nine interviews with Arab and Black American Muslims, I argue that surveillance operated as a two-stage political mechanism that mapped onto research methodologically and substantively. In the first stage, surveillance reconfigured the researcher-researchee dynamic, hindered recruitment and access, and limited data collection. In the second stage, surveillance colored the self-perceptions, political attitudes, and civic engagement of respondents, thereby indicating a political socialization unfolding among Muslims. The implications of this study suggest that researchers can mitigate some, but not all, of the challenges presented by surveillance and concomitant forms of state control.


2021 ◽  
Vol 24 (2) ◽  
pp. 1-35
Author(s):  
Isabel Wagner ◽  
Iryna Yevseyeva

The ability to measure privacy accurately and consistently is key in the development of new privacy protections. However, recent studies have uncovered weaknesses in existing privacy metrics, as well as weaknesses caused by the use of only a single privacy metric. Metrics suites, or combinations of privacy metrics, are a promising mechanism to alleviate these weaknesses, if we can solve two open problems: which metrics should be combined and how. In this article, we tackle the first problem, i.e., the selection of metrics for strong metrics suites, by formulating it as a knapsack optimization problem with both single and multiple objectives. Because solving this problem exactly is difficult due to the large number of combinations and many qualities/objectives that need to be evaluated for each metrics suite, we apply 16 existing evolutionary and metaheuristic optimization algorithms. We solve the optimization problem for three privacy application domains: genomic privacy, graph privacy, and vehicular communications privacy. We find that the resulting metrics suites have better properties, i.e., higher monotonicity, diversity, evenness, and shared value range, than previously proposed metrics suites.
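
To make the knapsack formulation in this abstract concrete, the sketch below solves a tiny single-objective 0/1 knapsack exactly with dynamic programming. It is an illustration only: the abstract reports using 16 evolutionary and metaheuristic algorithms because the real problem is too large and has multiple objectives. Here each candidate metric is assumed to have an integer quality score (`values`) and an integer cost (`weights`), and `capacity` is an assumed budget; none of these quantities come from the paper.

```python
def knapsack(values, weights, capacity):
    """Return (best total value, sorted indices of the chosen items)."""
    n = len(values)
    # best[i][c]: best value using the first i items within capacity c.
    best = [[0] * (capacity + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for c in range(capacity + 1):
            best[i][c] = best[i - 1][c]  # skip item i-1
            if weights[i - 1] <= c:      # or take it, if it fits
                take = best[i - 1][c - weights[i - 1]] + values[i - 1]
                best[i][c] = max(best[i][c], take)
    # Walk back through the table to recover which items were taken
    # (integer values keep the inequality test exact).
    chosen, c = [], capacity
    for i in range(n, 0, -1):
        if best[i][c] != best[i - 1][c]:
            chosen.append(i - 1)
            c -= weights[i - 1]
    return best[n][capacity], sorted(chosen)
```

The table has (n + 1) × (capacity + 1) entries, so this exact approach is only feasible for small instances with a single additive objective, which is one reason heuristic search is attractive for the multi-objective case.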

