Privileged Information for Hierarchical Document Clustering: A Metric Learning Approach

Author(s):  
Ricardo Marcondes Marcacini ◽  
Marcos Aurelio Domingues ◽  
Eduardo R. Hruschka ◽  
Solange Oliveira Rezende
2014 ◽  
Vol 2014 ◽  
pp. 1-10 ◽  
Author(s):  
Guofeng Zou ◽  
Yuanyuan Zhang ◽  
Kejun Wang ◽  
Shuming Jiang ◽  
Huisong Wan ◽  
...  

To address the problem of matching elements across different data collections, an improved coupled metric learning approach is proposed. First, we improve the supervised locality preserving projection algorithm and add its within-class and between-class information to coupled metric learning, which yields a novel coupled metric learning method. We then extend this algorithm to nonlinear space, giving a kernel coupled metric learning method based on supervised locality preserving projection. In the kernel coupled metric learning approach, two elements from different collections are mapped by a kernel function into a unified high-dimensional feature space, and generalized metric learning is then performed in that space. Experiments on the Yale and CAS-PEAL-R1 face databases demonstrate that the proposed kernel coupled approach performs better in low-resolution and fuzzy face recognition and can reduce computing time; it is an effective metric method.


Author(s):  
Han-Jia Ye ◽  
De-Chuan Zhan ◽  
Xue-Min Si ◽  
Yuan Jiang

The Mahalanobis distance metric takes feature weights and correlations into account in the distance computation, which can improve the performance of many similarity/dissimilarity-based methods, such as kNN. Most existing distance metric learning methods obtain a metric from the raw features and side information but neglect their reliability. Noise or disturbances on instances change the relationships between them and thereby affect the learned metric. In this paper, we claim that considering the disturbance of instances may help a distance metric learning approach obtain a robust metric, and we propose the Distance metRIc learning Facilitated by disTurbances (DRIFT) approach. In DRIFT, the noise or disturbance of each instance is learned. Therefore, the distance between each pair of (noisy) instances can be better estimated, which facilitates side information utilization and metric learning. Experiments on prediction and visualization clearly indicate the effectiveness of the proposed approach.
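
As a rough illustration of the Mahalanobis distance mentioned in the abstract above, the sketch below computes sqrt((x - y)^T M (x - y)) for a given positive semidefinite matrix M. It is not the DRIFT method and does not learn anything; the function name `mahalanobis` and the three example matrices are assumptions chosen only to show how feature weights (diagonal entries) and correlations (off-diagonal entries) change the distance.

```python
import math


def mahalanobis(x, y, M):
    """Return sqrt((x - y)^T M (x - y)) for a positive semidefinite matrix M.

    x and y are equal-length lists of floats; M is a square list of lists.
    """
    diff = [a - b for a, b in zip(x, y)]
    # Matrix-vector product M @ diff, computed row by row.
    m_diff = [sum(m_ij * d_j for m_ij, d_j in zip(row, diff)) for row in M]
    quad = sum(d_i * v_i for d_i, v_i in zip(diff, m_diff))
    # Clamp tiny negative values caused by floating-point rounding.
    return math.sqrt(max(quad, 0.0))


# Illustrative (hypothetical) metrics, not learned from data.
identity = [[1.0, 0.0], [0.0, 1.0]]      # reduces to Euclidean distance
weighted = [[4.0, 0.0], [0.0, 1.0]]      # first feature weighted more heavily
correlated = [[1.0, 0.5], [0.5, 1.0]]    # off-diagonal term couples the features

x, y = [1.0, 2.0], [0.0, 0.0]
d_euclid = mahalanobis(x, y, identity)
d_weighted = mahalanobis(x, y, weighted)
d_corr = mahalanobis(x, y, correlated)
```

With the identity matrix the function reduces to the ordinary Euclidean distance; a metric learning method would instead estimate M from data and side information.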


IEEE Access ◽  
2018 ◽  
Vol 6 ◽  
pp. 60380-60395 ◽  
Author(s):  
Han Hu ◽  
Yong Luo ◽  
Yonggang Wen ◽  
Yew-Soon Ong ◽  
Xinwen Zhang

2019 ◽  
Vol 41 (5) ◽  
pp. 1257-1270 ◽  
Author(s):  
Han-Jia Ye ◽  
De-Chuan Zhan ◽  
Yuan Jiang ◽  
Zhi-Hua Zhou

2012 ◽  
Vol 97 ◽  
pp. 44-51 ◽  
Author(s):  
Xianye Ben ◽  
Weixiao Meng ◽  
Rui Yan ◽  
Kejun Wang

2019 ◽  
Vol 46 (2) ◽  
pp. 104-121 ◽  
Author(s):  
Koraljka Golub

Automatic subject indexing addresses problems of scale and sustainability and can at the same time be used to enrich existing metadata records, establish more connections across and between resources from various metadata and resource collections, and enhance the consistency of the metadata. In this work, automatic subject indexing focuses on assigning index terms or classes from established knowledge organization systems (KOSs) for subject indexing, such as thesauri, subject headings systems and classification systems. The following major approaches are discussed in terms of their similarities and differences, and their advantages and disadvantages for automatic assigned indexing from KOSs: “text categorization,” “document clustering,” and “document classification.” Text categorization is perhaps the most widespread machine-learning approach, with what seems to be generally good reported performance. Document clustering automatically both creates groups of related documents and extracts names of subjects depicting the group at hand. Document classification re-uses the intellectual effort invested into creating a KOS for subject indexing, and even simple string-matching algorithms have been reported to achieve good results, because one concept can be described using a number of different terms, including equivalent, related, narrower and broader terms. Finally, the applicability of automatic subject indexing to operative information systems and the challenges of evaluation are outlined, suggesting the need for more research.
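
The string-matching idea described in the abstract above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the approach evaluated in the article: the tiny `KOS` dictionary, its concept names and variant terms, and the function `assign_index_terms` are all hypothetical. Each concept is assigned when any of its listed terms occurs as a whole phrase in the text.

```python
import re

# Hypothetical miniature KOS: each concept maps to terms that may denote it
# (preferred label plus equivalent or narrower terms). Illustrative only.
KOS = {
    "Cluster analysis": ["cluster analysis", "clustering", "document clustering"],
    "Machine learning": ["machine learning", "statistical learning"],
    "Face recognition": ["face recognition", "facial recognition"],
}


def assign_index_terms(text, kos):
    """Return the concepts whose terms appear as whole phrases in text."""
    lowered = text.lower()
    assigned = []
    for concept, labels in kos.items():
        # \b anchors keep "clustering" from matching inside a longer word.
        if any(re.search(r"\b" + re.escape(label) + r"\b", lowered)
               for label in labels):
            assigned.append(concept)
    return assigned


terms = assign_index_terms(
    "A statistical learning approach to document clustering.", KOS
)
```

Because several terms map to one concept, a document that never uses the preferred label can still be indexed under it, which is the reason the abstract gives for the good reported results of simple matching.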

