Semantic-Based Representation Binary Clone Detection for Cross-Architectures in the Internet of Things

2019 ◽  
Vol 9 (16) ◽  
pp. 3283 ◽  
Author(s):  
Zhenhao Luo ◽  
Baosheng Wang ◽  
Yong Tang ◽  
Wei Xie

Code reuse is widespread in software development, including in Internet of Things (IoT) devices. However, code reuse introduces many problems, e.g., software plagiarism and known vulnerabilities. Solving these problems requires extensive manual reverse analysis. Fortunately, binary clone detection can help analysts reduce manual work by matching reused code against known parts. However, many binary clone detection methods are not robust to various compiler optimization options and different architectures. While some clone detection methods can be applied across different architectures, they rely on manual features based on human prior knowledge to generate feature vectors for assembly functions and fail to consider the internal associations between features from a semantic perspective. To address this problem, we propose and implement a prototype, GeneDiff, a semantic-based representation binary clone detection approach for cross-architectures. GeneDiff utilizes a representation model based on natural language processing (NLP) to generate a high-dimensional numeric vector for each function based on the Valgrind intermediate representation (VEX). This is the first work that translates assembly instructions into an intermediate representation and uses a semantic representation model to implement clone detection for cross-architectures. GeneDiff is robust to various compiler optimization options and different architectures. Compared to approaches using symbolic execution, GeneDiff is significantly more efficient and accurate. The area under the curve (AUC) of the receiver operating characteristic (ROC) of GeneDiff reaches 92.35%, which is considerably higher than that of the approaches that use symbolic execution. Extensive experiments indicate that GeneDiff can detect similarity with high accuracy even when the code has been compiled with different optimization options and targeted to different architectures. We also use real-world IoT firmware across different architectures as targets, demonstrating the practicality of GeneDiff for detecting known vulnerabilities.
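
The abstract above describes representing each function as a numeric vector and then detecting clones by similarity. A minimal sketch of that last step follows. It assumes cosine similarity as the comparison metric, which the abstract does not specify, and the function names and three-dimensional vectors are hypothetical placeholders rather than GeneDiff output.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length numeric vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # a zero vector carries no direction to compare
    return dot / (norm_u * norm_v)


def rank_clones(query_vec, repository):
    """Return (name, score) pairs sorted from most to least similar."""
    scored = [
        (name, cosine_similarity(query_vec, vec))
        for name, vec in repository.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical function vectors; a real model would emit far more dimensions.
repository = {
    "memcpy_arm": [0.9, 0.1, 0.3],
    "strlen_mips": [0.1, 0.8, 0.2],
}
query = [0.8, 0.2, 0.3]  # hypothetical vector for an unknown function
ranking = rank_clones(query, repository)
```

Here `ranking` lists the repository functions with the closest candidate first. A real pipeline would typically also apply a similarity threshold before reporting a match.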

2017 ◽  
Vol 1 (1) ◽  
pp. 61 ◽  
Author(s):  
Ricardo Mairal-Usón ◽  
Francisco Cortés-Rodríguez

Within the framework of FUNK Lab, a virtual laboratory for natural language processing inspired by a functionally oriented linguistic theory, Role and Reference Grammar (RRG), a number of computational resources have been built that deal with different aspects of language and have applications in different scientific domains, e.g., terminology, lexicography, sentiment analysis, document classification, text analysis, and data mining. One of these resources is ARTEMIS (Automatically Representing TExt Meaning via an Interlingua-Based System), which departs from the pioneering work of Periñán-Pascual (2013) and Periñán-Pascual & Arcas (2014). This computational tool is a proof-of-concept prototype that allows the automatic generation of a conceptual logical structure (CLS) (cf. Mairal-Usón, Periñán-Pascual and Pérez 2012; Van Valin and Mairal-Usón 2014), that is, a fully specified semantic representation of an input text on the basis of a reduced sample of sentences. The primary aim of this paper is to develop the syntactic rules that form part of the computational grammar for the representation of simple clauses in English. More specifically, this work focuses on the format of those syntactic rules that account for the upper levels of the RRG Layered Structure of the Clause (LSC), that is, the core (and the level-1 construction associated with it), the clause and the sentence (Van Valin 2005). In essence, this analysis, together with that in Cortés-Rodríguez and Mairal-Usón (2016), offers an almost complete description of the computational grammar behind the LSC for simple clauses.


2021 ◽  
Vol 5 (1) ◽  
pp. 28-39
Author(s):  
Minami Yoda ◽  
Shuji Sakuraba ◽  
Yuichi Sei ◽  
Yasuyuki Tahara ◽  
Akihiko Ohsuga

Internet of Things (IoT) for smart homes enhances convenience; however, it also introduces the risk of private data leakage. The OWASP IoT Top 10 for 2018 lists "Weak, easy to predict, or embedded passwords" as the first vulnerability. This problem poses a risk because a user cannot fix, change, or detect a password embedded in firmware; only the developer of the firmware can control an update. In this study, we propose a lightweight static-analysis method, consisting of Socket Search and String Search, to detect hardcoded usernames and passwords in IoT devices and thereby protect against this first vulnerability. Hardcoded login information can be obtained where the user input is compared using strcmp or strncmp. Previous studies analyzed the symbols of strcmp or strncmp to detect hardcoded login information. However, those studies required a lot of time because they used complicated algorithms such as symbolic execution. To develop a lightweight algorithm, we focus on network functions, such as the socket symbol in firmware, because an IoT device is compromised when someone breaks into it via the Internet. We propose two methods to detect hardcoded login information: string search and socket search. In string search, the algorithm finds a function that uses the strcmp or strncmp symbol. In socket search, the algorithm finds a function that is referenced by the socket symbol. In our experiment, we measured the ability of the proposed method by searching six real-world firmware images that contain a backdoor. We ran three methods, string search, socket search, and whole search, to evaluate the two proposed methods. All three methods found login information in five of the six firmware images, as well as one unexpected password. Our method reduces the analysis time: the whole search generally takes 38 min to complete, whereas our methods finish the search in 4-6 min.
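
The two searches described above can be illustrated as filters over a call graph. The sketch below is a simplified reading, not the authors' implementation: it assumes the firmware has already been disassembled into a hypothetical mapping from function names to the symbols each function references, and it treats socket search as selecting the functions that reference `socket`. The helper name `functions_referencing` and the sample graph are invented for illustration.

```python
def functions_referencing(call_graph, symbols):
    """Return the sorted names of functions that reference any given symbol."""
    wanted = set(symbols)
    return sorted(
        name for name, callees in call_graph.items() if wanted & set(callees)
    )


# Hypothetical call graph: function name -> symbols it references.
call_graph = {
    "main": ["socket", "bind", "handle_login"],
    "handle_login": ["recv", "strcmp"],
    "log_event": ["fopen", "fprintf"],
}

# String search: functions that use strcmp or strncmp.
string_hits = functions_referencing(call_graph, {"strcmp", "strncmp"})

# Socket search (our assumed reading): functions tied to the socket symbol.
socket_hits = functions_referencing(call_graph, {"socket"})
```

Restricting later analysis to the functions in `string_hits` or `socket_hits` is what would keep the search small compared with examining every function in the firmware.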


2020 ◽  
Vol 50 (8) ◽  
pp. 2339-2351 ◽  
Author(s):  
Tianshi Wang ◽  
Li Liu ◽  
Naiwen Liu ◽  
Huaxiang Zhang ◽  
Long Zhang ◽  
...  

2018 ◽  
Vol 24 (6) ◽  
pp. 861-886 ◽  
Author(s):  
ABDULGABBAR SAIF ◽  
UMMI ZAKIAH ZAINODIN ◽  
NAZLIA OMAR ◽  
ABDULLAH SAEED GHAREB

Semantic measures are used in handling different issues in several research areas, such as artificial intelligence, natural language processing, knowledge engineering, bioinformatics, and information retrieval. Hierarchical feature-based semantic measures have been proposed to estimate the semantic similarity between two concepts/words depending on the features extracted from a semantic taxonomy (hierarchy) of a given lexical source. The central issue in these measures is the constant weighting assumption that all elements in the semantic representation of the concept possess the same relevance. In this paper, a new weighting-based semantic similarity measure is proposed to address the issues in hierarchical feature-based measures. Four mechanisms are introduced to weigh the degree of relevance of features in the semantic representation of a concept by using topological parameters (edge, depth, descendants, and density) in a semantic taxonomy. With the semantic taxonomy of WordNet, the proposed semantic measure is evaluated for word semantic similarity in four gold-standard datasets. Experimental results show that the proposed measure outperforms hierarchical feature-based semantic measures in all the datasets. Comparison results also imply that the proposed measure is more effective than information-content measures in measuring semantic similarity.
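
The contrast between constant and non-constant feature weighting can be made concrete with a small sketch. It is not the measure proposed in the paper, whose formula the abstract does not give. It assumes a generic weighted-overlap ratio, takes a concept's features to be the concept plus its ancestors, and uses taxonomy depth as the weight. The four-node taxonomy is hypothetical.

```python
def weighted_similarity(features_a, features_b, weight):
    """Weight of the shared features divided by weight of all features."""
    common = features_a & features_b
    union = features_a | features_b
    total = sum(weight[f] for f in union)
    if total == 0:
        return 0.0
    return sum(weight[f] for f in common) / total


# Hypothetical taxonomy: entity -> animal -> {dog, cat}.
# Each concept's features are the concept itself plus its ancestors.
features = {
    "dog": {"entity", "animal", "dog"},
    "cat": {"entity", "animal", "cat"},
}

# Constant weighting: every feature is equally relevant.
uniform = {"entity": 1, "animal": 1, "dog": 1, "cat": 1}

# Depth-based weighting (an assumed scheme): deeper nodes weigh more.
by_depth = {"entity": 1, "animal": 2, "dog": 3, "cat": 3}

sim_uniform = weighted_similarity(features["dog"], features["cat"], uniform)
sim_depth = weighted_similarity(features["dog"], features["cat"], by_depth)
```

With uniform weights the two concepts share 2 of 4 features. With depth weights the specific, unshared leaf nodes count for more, so the score drops to 3/9.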


Author(s):  
Yilin Yan ◽  
Jonathan Chen ◽  
Mei-Ling Shyu

Stance detection is an important research direction that attempts to automatically determine the attitude (positive, negative, or neutral) of the author of a text (such as a tweet) towards a target. A number of frameworks have been proposed using deep learning techniques that show promising results in application domains such as automatic speech recognition and computer vision, as well as natural language processing (NLP). This article presents a novel deep learning-based fast stance detection framework for bipolar affinities on Twitter. Millions of tweets regarding Clinton and Trump were produced per day on Twitter during the 2016 United States presidential election campaign, and thus the campaign is used as a test use case because of its significant and unique counter-factual properties. In addition, stance detection can be utilized to infer the political tendency of the general public. Experimental results show that the proposed framework achieves high accuracy compared to several existing stance detection methods.


Sensors ◽  
2019 ◽  
Vol 19 (20) ◽  
pp. 4536 ◽  
Author(s):  
Yan Zhong ◽  
Simon Fong ◽  
Shimin Hu ◽  
Raymond Wong ◽  
Weiwei Lin

The Internet of Things (IoT) and sensors are becoming increasingly popular, especially in monitoring large and ambient environments. Applications that embrace IoT and sensors often require mining the data feeds that are collected at frequent intervals for intelligence. Although such sensor data are massive, most of the data contents are identical and repetitive; for example, human traffic in a park at night. Most of the traditional classification algorithms were originally formulated decades ago, and they were not designed to handle such sensor data effectively. Hence, the performance of the learned model is often poor because of the small granularity in classification and the sporadic patterns in the data. To improve the quality of data mining from IoT data, a new pre-processing methodology based on subspace similarity detection is proposed. Our method can be well integrated with traditional data mining algorithms and anomaly detection methods. The pre-processing method is flexible enough to handle similar kinds of sensor data that are sporadic in nature and that exist in many ambient sensing applications. The proposed methodology is evaluated through extensive experiments with a collection of classical data mining models. The proposed method is shown to improve the precision rate.


2018 ◽  
Vol 13 (8) ◽  
pp. 2031-2046 ◽  
Author(s):  
Po-Yen Lee ◽  
Chia-Mu Yu ◽  
Tooska Dargahi ◽  
Mauro Conti ◽  
Giuseppe Bianchi
