Machine Learning Based Lithography Hotspot Detection with Sparse Feature Encoding and Hierarchical Pattern Classification

2014 ◽  
Vol 60 (1) ◽  
pp. 1179-1184 ◽  
Author(s):  
K. S. Luo ◽  
Z. Shi ◽  
Z. Geng
Author(s):  
Pedro J. García-Laencina ◽  
Juan Morales-Sánchez ◽  
Rafael Verdú-Monedero ◽  
Jorge Larrey-Ruiz ◽  
José-Luis Sancho-Gómez ◽  
...  

Many real-world classification scenarios suffer from a common drawback: missing or incomplete data. The ability to handle missing data has become a fundamental requirement for pattern classification, because the absence of values for relevant data attributes can seriously affect the accuracy of classification results. This chapter focuses on incomplete pattern classification. Research on this topic continues to grow, and many solutions based on machine learning are known to be useful and efficient. This chapter analyzes the most popular and appropriate machine-learning-based missing data techniques for pattern classification tasks, highlighting their advantages and disadvantages.


Sensors ◽  
2015 ◽  
Vol 15 (11) ◽  
pp. 28456-28471 ◽  
Author(s):  
Vinicius Pegorini ◽  
Leandro Zen Karam ◽  
Christiano Pitta ◽  
Rafael Cardoso ◽  
Jean da Silva ◽  
...  

2017 ◽  
Author(s):  
David Z. Pan ◽  
Yibo Lin ◽  
Xiaoqing Xu ◽  
Jiaojiao Ou

2019 ◽  
Vol 9 (1) ◽  
Author(s):  
Daniele Raimondi ◽  
Gabriele Orlando ◽  
Wim F. Vranken ◽  
Yves Moreau

Abstract: Machine learning (ML) is ubiquitous in bioinformatics, due to its versatility. One of the most crucial steps in training an ML model is selecting the feature encoding best suited to the problem at hand. Biophysical propensity scales are widely adopted in structural bioinformatics because they describe amino acid properties that are intuitively relevant for many structural and functional aspects of proteins, and they are thus commonly used as input features for ML methods. In this paper we reproduce three classical structural bioinformatics prediction tasks to investigate the main assumptions about the use of propensity scales as input features for ML methods. We investigate their usefulness with different randomization experiments and show that their effectiveness varies among the ML methods used and the tasks. We show that while linear methods are more dependent on the feature encoding, the specific biophysical meaning of the features is less relevant for non-linear methods. Moreover, we show that even among linear ML methods, the simpler one-hot encoding can surprisingly outperform the “biologically meaningful” scales. We also show that feature selection performed with non-linear ML methods may not be able to distinguish between randomized and “real” propensity scales by properly prioritizing the latter. Finally, we show that learning problem-specific embeddings could be a simple, assumption-free and optimal way to perform feature learning/engineering for structural bioinformatics tasks.
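
As a rough illustration of the encodings this abstract compares, the following minimal Python sketch encodes a short sequence with a one-hot scheme, with a propensity scale, and with a shuffled copy of that scale. The function names are hypothetical, the Kyte-Doolittle hydropathy index is used only as an example scale, and the shuffling procedure is an assumption: the abstract does not state which scales or randomization scheme the authors used.

```python
import random

# The 20 standard amino acids as one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy index, chosen here only as an example
# propensity scale (an assumption; the abstract names no specific scale).
KYTE_DOOLITTLE = {
    "A": 1.8, "C": 2.5, "D": -3.5, "E": -3.5, "F": 2.8,
    "G": -0.4, "H": -3.2, "I": 4.5, "K": -3.9, "L": 3.8,
    "M": 1.9, "N": -3.5, "P": -1.6, "Q": -3.5, "R": -4.5,
    "S": -0.8, "T": -0.7, "V": 4.2, "W": -0.9, "Y": -1.3,
}


def one_hot_encode(sequence):
    """Return one 20-dimensional indicator vector per residue."""
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    encoded = []
    for residue in sequence:
        vector = [0] * len(AMINO_ACIDS)
        vector[index[residue]] = 1
        encoded.append(vector)
    return encoded


def scale_encode(sequence, scale):
    """Return one scalar per residue, looked up in a propensity scale."""
    return [scale[residue] for residue in sequence]


def randomize_scale(scale, seed=0):
    """Shuffle the scale values among residues (hypothetical scheme).

    The set of values is preserved, but their biophysical meaning is lost.
    """
    rng = random.Random(seed)
    values = list(scale.values())
    rng.shuffle(values)
    return dict(zip(scale.keys(), values))


if __name__ == "__main__":
    seq = "ACDK"
    print(one_hot_encode(seq)[0])
    print(scale_encode(seq, KYTE_DOOLITTLE))
    print(scale_encode(seq, randomize_scale(KYTE_DOOLITTLE, seed=1)))
```

Feeding each of the three encodings to the same classifier and comparing the scores would be one way to set up the kind of comparison the abstract describes.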


PLoS ONE ◽  
2020 ◽  
Vol 15 (4) ◽  
pp. e0232087
Author(s):  
Chi-Hua Tung ◽  
Ching-Hsuan Chien ◽  
Chi-Wei Chen ◽  
Lan-Ying Huang ◽  
Yu-Nan Liu ◽  
...  

Author(s):  
Jea Woo Park ◽  
Andres Torres ◽  
Xiaoyu Song
