The use of regular expressions for decompiling static data
Gusenko Mikhail Yur'evich

PhD in Technical Science

Associate Professor, Department of Applied and Business Informatics, Moscow Technological University

119454, Russia, Moscow, pr-t Vernadskogo, 78, of. 418

mikegus@yandex.ru
Abstract. The subject of the study is the decompilation of programs into source code in high-level languages. The author locates decompilation within the program transformation cycle, which comprises canonization, compilation, optimization, and decompilation. The object of the study is the compiled equivalent of a static data description written in a high-level programming language. In the general case this equivalent is a nontrivial mapping of the language's syntactic constructions onto byte sequences that are located in executable program modules and constructed with the optimization techniques applicable to the target microprocessor architecture. The paper treats static data decompilation from two viewpoints: as reconstruction of the program's parse tree, which is recovered during analysis of the executable code, and as analysis of a binary sequence in the memory of a von Neumann machine by a regular expression that the decompiler creates from the supposed description of the data. Regular expressions are traditionally used to analyze character sequences. The article presents another application of this tool: proving the hypothesis that a given byte array of the executable module is the compiled equivalent of static data. The author proposes a corresponding syntax for the regular expression language. The article shows that the proposed method can also be used to verify the quality of decompiled code.
Keywords: object code, executable module, compilation, programming language, regular expression, decompilation, reverse translation, translation, program parse tree, data definition
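
The idea in the abstract, a regular expression built from a supposed data description and applied to the bytes of an executable module, can be illustrated with a minimal sketch. It does not reproduce the regular expression syntax proposed in the article, which the abstract does not show; it uses Python's standard `re` module on `bytes` as a stand-in. The declaration, the layout assumptions (little-endian, no padding, zero-filled unused array bytes), and the function names `build_pattern` and `find_static_data` are hypothetical and chosen only for illustration.

```python
import re
import struct
from typing import Optional

# Hypothetical declaration whose compiled image we look for (not from the article):
#     struct { int32_t count; char name[8]; } rec = { 3, "abc" };
# Assumed layout: little-endian, no padding, unused array bytes are zero.

def build_pattern(count: int, name: bytes, name_size: int) -> "re.Pattern[bytes]":
    """Build a bytes regex from the supposed description of the data."""
    padding = name_size - len(name)
    if padding < 1:
        raise ValueError("name must leave room for the terminating NUL")
    count_bytes = struct.pack("<i", count)  # 4-byte little-endian integer
    return re.compile(
        re.escape(count_bytes)           # the integer field, matched literally
        + re.escape(name)                # the initialized part of the array
        + b"\\x00{%d}" % padding,        # NUL terminator plus zero fill
        re.DOTALL,
    )

def find_static_data(image: bytes, pattern: "re.Pattern[bytes]") -> Optional[int]:
    """Return the offset where the hypothesis holds, or None if it is rejected."""
    match = pattern.search(image)
    return match.start() if match else None

# A toy "executable image": two unrelated bytes, the data, one trailing byte.
image = b"\xff\x10" + struct.pack("<i", 3) + b"abc" + b"\x00" * 5 + b"\x90"
print(find_static_data(image, build_pattern(3, b"abc", 8)))  # → 2
print(find_static_data(image, build_pattern(4, b"abc", 8)))  # → None
```

A match supports the hypothesis that the byte array is the compiled form of the supposed declaration; a failed match rejects it for this layout only, since a real compiler may insert padding or choose a different byte order.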
DOI: 10.7256/2454-0714.2017.2.22608
Article was received: 07-04-2017

Publish date: 06-05-2017

This article is written in Russian. The full text is available in Russian.
