Gusenko M.Yu. The use of regular expressions for decompiling static data // Software systems and computational methods. – 2017. – № 2. – P. 1-13.
Abstract: The subject of the study is the process of decompiling programs into source code in high-level languages. The author shows the place of decompilation in the program transformation cycle, which comprises canonization, compilation, optimization, and decompilation. The object of the study is the compiled equivalent of a static data description written in a high-level programming language. In the general case this is a nontrivial mapping of the language's syntactic constructions onto byte sequences that are located in executable program modules and constructed with the optimization techniques available for a given microprocessor architecture. The paper treats static data decompilation in two ways: as reconstruction of the program's parse tree, recovered during analysis of its executable code, and as analysis of a binary sequence in the memory of a von Neumann machine by a regular expression that the decompiler creates from the supposed description of the data. Regular expressions are traditionally used to analyze character sequences. The article presents another application of this tool: testing the hypothesis that a given byte array of the executable module is the equivalent of compiled static data. The author proposes a variant of the corresponding regular expression syntax. The article shows that the proposed method can also be used to verify the quality of decompiled code.
Keywords: object code, executable module, compilation, programming language, regular expression, decompilation, reverse translation, translation, program parse tree, data definition
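The idea of checking a byte array against a regular expression derived from a supposed data description can be illustrated with a minimal sketch. It uses Python's standard `re` syntax over bytes, not the regular expression syntax proposed in the article. The record layout (a 4-byte integer followed by `char name[8]`, no padding) and the helper names `char_array` and `looks_like_table` are hypothetical, chosen only for illustration.

```python
import re
import struct

# Assumed (hypothetical) data description being tested:
#   struct rec { uint32_t count; char name[8]; } table[N];
# compiled with no padding between fields.

ANY_BYTE = rb"[\x00-\xff]"


def char_array(size: int) -> bytes:
    """Regex for char[size]: 1..size-1 printable ASCII bytes, NUL-padded to size."""
    alternatives = [
        rb"[\x20-\x7e]{%d}\x00{%d}" % (k, size - k) for k in range(1, size)
    ]
    return b"(?:" + b"|".join(alternatives) + b")"


# One record: any 4 bytes for the integer, then the constrained string field.
RECORD = ANY_BYTE + b"{4}" + char_array(8)


def looks_like_table(blob: bytes, count: int) -> bool:
    """True if blob is exactly `count` records of the assumed layout."""
    pattern = b"(?:" + RECORD + b"){%d}" % count
    return re.fullmatch(pattern, blob) is not None


# Build sample data the way a little-endian compiler might lay it out.
good = struct.pack("<I8s", 3, b"abc") + struct.pack("<I8s", 70000, b"regexp")
bad = good[:4] + b"\x01" + good[5:]  # non-printable byte inside the name

print(looks_like_table(good, 2))  # True
print(looks_like_table(bad, 2))   # False
```

A match only shows that the bytes are consistent with the supposed description; the integer field accepts any value here, so a stricter hypothesis would narrow that part of the pattern.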