US7383583B2 - Static and run-time anti-disassembly and anti-debugging - Google Patents

Static and run-time anti-disassembly and anti-debugging Download PDF

Info

Publication number
US7383583B2
US7383583B2 US10/795,058 US79505804A US7383583B2 US 7383583 B2 US7383583 B2 US 7383583B2 US 79505804 A US79505804 A US 79505804A US 7383583 B2 US7383583 B2 US 7383583B2
Authority
US
United States
Prior art keywords
code
instruction
snippet
bypass
location
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related, expires
Application number
US10/795,058
Other versions
US20050198526A1 (en
Inventor
Michael David Marr
Brandon Baker
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Microsoft Technology Licensing LLC
Original Assignee
Microsoft Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Microsoft Corp filed Critical Microsoft Corp
Priority to US10/795,058 priority Critical patent/US7383583B2/en
Assigned to MICROSOFT CORPORATION reassignment MICROSOFT CORPORATION ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: BAKER, BRANDON, MARR, MICHAEL DAVID
Publication of US20050198526A1 publication Critical patent/US20050198526A1/en
Application granted granted Critical
Publication of US7383583B2 publication Critical patent/US7383583B2/en
Assigned to MICROSOFT TECHNOLOGY LICENSING, LLC reassignment MICROSOFT TECHNOLOGY LICENSING, LLC ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: MICROSOFT CORPORATION
Expired - Fee Related legal-status Critical Current
Adjusted expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/10Protecting distributed programs or content, e.g. vending or licensing of copyrighted material ; Digital rights management [DRM]
    • G06F21/12Protecting executable software
    • G06F21/14Protecting executable software against software analysis or reverse engineering, e.g. by obfuscation

Definitions

  • the present invention relates generally to the field of computing, and, more particularly, to preventing examination or manipulation of code by implementing instruction-level obfuscation techniques.
  • object code controls the actions of the computer systems on which it is run.
  • Such code may be made public or otherwise made accessible by its authors, for example by publishing the original source code that was compiled to create the object code.
  • the object code will be accessible for analysis by others.
  • the original authors may also choose to make the object code more accessible by other programmers by including “debug symbols” which are data files which help to describe the structure of the object code so that users of the object code can debug their own programs.
  • Maintaining the security of content may be useful in many contexts.
  • One such context is where the content is private information, for example, financial information about a computer user, user passwords, or other personal information.
  • Another context in which maintaining the security of content is useful is when the secure content is content which is to be used only in a limited way. For example, copyrighted content may be protected by requiring a license for use.
  • License provisions may be enforced for secure content by a digital rights management (DRM) application.
  • DRM digital rights management
  • content is stored only in an encrypted form, and then decrypted for use according to an electronic license which defines the specific user's rights to use the content.
  • Some or all of the code and associated decryption data which decrypts the encrypted content is therefore protected, because if this code which accesses the copyrighted context is compromised, it may be possible to access the copyright outside of the bounds of the electronic license, for unlicensed uses.
  • the black box may include hardware elements along with software elements.
  • an adversary may use several means in order to attempt to compromise the security of the operations of the black box. For example, an adversary may attempt to trace the code running in the black box. One way in which an adversary may attempt to do this is by using a debugger to track the progress of the code. Another way an adversary may attempt to compromise the security of the black box is by making modifications to the code in order to provide access to the secure content to the adversary.
  • Static analysis is analysis of the code when it is not executing.
  • certain of these tools for discouraging static analysis allow sections of binaries to be rendered unreadable at or after compile time.
  • some tools encrypt sections of the binary.
  • the binary is unreadable.
  • the code is to be run, the code is returned to its original state, e.g. by decryption where the modification was encryption. Therefore, an adversary will not be able to perform a static analysis on the binary, but the code will function properly when it is decrypted for execution.
  • the present invention prevents analysis by static and dynamic disassembly techniques by performing instruction-level code obfuscation.
  • the sensitive code is modified by obfuscation of portions of the code at the instruction level. These code obfuscations are corrected shortly before the execution point of the code.
  • the code obfuscations include bogus data injection into the instruction stream and run-time modifications which occur just before the execution point of the modified instructions.
  • Data injection into the instruction stream can fool both static and run-time disassemblers into treating bogus data and parts of other instructions as instruction opcodes.
  • This misinterpretation of the instruction data can be referred to as “instruction misalignment” because the disassembler can no longer effectively determine instruction boundaries. This causes a cascade of misinterpretation by the disassembler which lasts as long as the disassembler remains misaligned.
  • Such data injection in one embodiment, includes a bypass and bogus data which triggers misalignment of a disassembler.
  • Run-time modification code snippets inserted into code change an instruction before it is executed. This is achieved, e.g., by performing an arithmetic operation on the bytes of the instruction, writing bytes over parts of the instruction, or copying bytes from a location in memory over the instructions.
  • the modification makes the instruction valid.
  • the code is modified again to return it to an invalid state.
  • FIG. 1 is a block diagram of an example computing environment in which aspects of the invention may be implemented
  • FIG. 2 is a block diagram of an instruction stream in which aspects of the invention may be implemented
  • FIG. 3 is a block diagram of an instruction stream according to an example illustrating one embodiment of the invention.
  • FIG. 4 is a flow diagram of a method according to one embodiment for protecting code from disassembly by inserting bogus data and associated bypass code
  • FIG. 5 is a diagram of a method according to one embodiment for protecting code from disassembly performing run-time modifications.
  • Debuggers and static and dynamic disassemblers analyze code and allow a user to examine its contents.
  • code obfuscation is performed.
  • the present invention provides mechanisms for inserting bogus data and runtime modifications into sensitive code. Such obfuscation hinders or prevents examination of the contents of code by such static and dynamic disassemblers.
  • code misalignments and run-time modifications presented herein relate primarily to assembly language code.
  • the techniques described may be applicable to code in other programming languages, and it is contemplated that the application of the mechanisms described and claimed is not limited to assembly language code. That is, any system that infers non-executable source code from executable object code and related data files (for example, debugging symbol files) might be affected by the invention herein.
  • FIG. 1 shows an exemplary computing environment in which aspects of the invention may be implemented.
  • the computing system environment 100 is only one example of a suitable computing environment and is not intended to suggest any limitation as to the scope of use or functionality of the invention. Neither should the computing environment 100 be interpreted as having any dependency or requirement relating to any one or combination of components illustrated in the exemplary operating environment 100 .
  • the invention is operational with numerous other general purpose or special purpose computing system environments or configurations.
  • Examples of well known computing systems, environments, and/or configurations that may be suitable for use with the invention include, but are not limited to, personal computers, server computers, hand-held or laptop devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, embedded systems, distributed computing environments that include any of the above systems or devices, and the like.
  • the invention may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer.
  • program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types.
  • the invention may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network or other data transmission medium.
  • program modules and other data may be located in both local and remote computer storage media including memory storage devices.
  • an exemplary system for implementing the invention includes a general purpose computing device in the form of a computer 110 .
  • Components of computer 110 may include, but are not limited to, a processing unit 120 , a system memory 130 , and a system bus 121 that couples various system components including the system memory to the processing unit 120 .
  • the processing unit 120 may represent multiple logical processing units such as those supported on a multi-threaded processor.
  • the system bus 121 may be any of several types of bus structures including a memory bus or memory controller, a peripheral bus, and a local bus using any of a variety of bus architectures.
  • ISA Industry Standard Architecture
  • MCA Micro Channel Architecture
  • EISA Enhanced ISA
  • VESA Video Electronics Standards Association
  • PCI Peripheral Component Interconnect
  • the system bus 121 may also be implemented as a point-to-point connection, switching fabric, or the like, among the communicating devices.
  • Computer 110 typically includes a variety of computer readable media.
  • Computer readable media can be any available media that can be accessed by computer 110 and includes both volatile and nonvolatile media, removable and non-removable media.
  • Computer readable media may comprise computer storage media and communication media.
  • Computer storage media includes both volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data.
  • Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CDROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can accessed by computer 110 .
  • Communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media.
  • modulated data signal means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal.
  • communication media includes wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared and other wireless media. Combinations of any of the above should also be included within the scope of computer readable media.
  • the system memory 130 includes computer storage media in the form of volatile and/or nonvolatile memory such as read only memory (ROM) 131 and random access memory (RAM) 132 .
  • ROM read only memory
  • RAM random access memory
  • BIOS basic input/output system
  • RAM 132 typically contains data and/or program modules that are immediately accessible to and/or presently being operated on by processing unit 120 .
  • FIG. 1 illustrates operating system 134 , application programs 135 , other program modules 136 , and program data 137 .
  • the computer 110 may also include other removable/non-removable, volatile/nonvolatile computer storage media.
  • FIG. 1 illustrates a hard disk drive 140 that reads from or writes to non-removable, nonvolatile magnetic media, a magnetic disk drive 151 that reads from or writes to a removable, nonvolatile magnetic disk 152 , and an optical disk drive 155 that reads from or writes to a removable, nonvolatile optical disk 156 , such as a CD ROM or other optical media.
  • removable/non-removable, volatile/nonvolatile computer storage media that can be used in the exemplary operating environment include, but are not limited to, magnetic tape cassettes, flash memory cards, digital versatile disks, digital video tape, solid state RAM, solid state ROM, and the like.
  • the hard disk drive 141 is typically connected to the system bus 121 through a non-removable memory interface such as interface 140
  • magnetic disk drive 151 and optical disk drive 155 are typically connected to the system bus 121 by a removable memory interface, such as interface 150 .
  • hard disk drive 141 is illustrated as storing operating system 144 , application programs 145 , other program modules 146 , and program data 147 . Note that these components can either be the same as or different from operating system 134 , application programs 135 , other program modules 136 , and program data 137 . Operating system 144 , application programs 145 , other program modules 146 , and program data 147 are given different numbers here to illustrate that, at a minimum, they are different copies.
  • a user may enter commands and information into the computer 20 through input devices such as a keyboard 162 and pointing device 161 , commonly referred to as a mouse, trackball or touch pad.
  • Other input devices may include a microphone, joystick, game pad, satellite dish, scanner, or the like.
  • These and other input devices are often connected to the processing unit 120 through a user input interface 160 that is coupled to the system bus, but may be connected by other interface and bus structures, such as a parallel port, game port or a universal serial bus (USB).
  • a monitor 191 or other type of display device is also connected to the system bus 121 via an interface, such as a video interface 190 .
  • computers may also include other peripheral output devices such as speakers 197 and printer 196 , which may be connected through an output peripheral interface 195 .
  • the computer 110 may operate in a networked environment using logical connections to one or more remote computers, such as a remote computer 180 .
  • the remote computer 180 may be a personal computer, a server, a router, a network PC, a peer device or other common network node, and typically includes many or all of the elements described above relative to the computer 110 , although only a memory storage device 181 has been illustrated in FIG. 1 .
  • the logical connections depicted in FIG. 1 include a local area network (LAN) 171 and a wide area network (WAN) 173 , but may also include other networks.
  • LAN local area network
  • WAN wide area network
  • Such networking environments are commonplace in offices, enterprise-wide computer networks, intranets and the Internet.
  • the computer 110 When used in a LAN networking environment, the computer 110 is connected to the LAN 171 through a network interface or adapter 170 .
  • the computer 110 When used in a WAN networking environment, the computer 110 typically includes a modem 172 or other means for establishing communications over the WAN 173 , such as the Internet.
  • the modem 172 which may be internal or external, may be connected to the system bus 121 via the user input interface 160 , or other appropriate mechanism.
  • program modules depicted relative to the computer 110 may be stored in the remote memory storage device.
  • FIG. 1 illustrates remote application programs 185 as residing on memory device 181 . It will be appreciated that the network connections shown are exemplary and other means of establishing a communications link between the computers may be used.
  • FIG. 2 is a representation of an instruction stream 200 containing bytes 210 .
  • variable length instructions are possible, given a stream of bytes in an object code instruction stream 200 without an indication of an initial byte for an instruction, it can not be determined how many bytes or which bytes are used in one instruction in the object code until the execution begins on the code and determine an end byte for the previous instruction.
  • An instruction may consist of an initial number of bytes which describe the type of instruction (an opcode), followed optionally by bytes which are arguments to the instruction (an operand).
  • byte 210 a may be the opcode for an instruction with byte 210 b as its operand
  • byte 210 b may be the opcode of an instruction, and that instruction may continue in succeeding bytes in the instruction stream 200 .
  • a program counter is used to control the flow of execution through the instruction stream 200 .
  • the program counter is a register which stores the address of the next instruction to be executed.
  • a jump relative instruction jumps a specified number of bytes forward in the instruction stream.
  • the jump relative instruction adds the number of bytes specified by its operand to the program counter which causes execution to jump ahead by the specified number of bytes.
  • misalignment As an example of how such misalignments may be caused, a misalignment will be described which prevents a disassembler from determining that the instruction stream contains the instruction:
  • FIG. 3 is a representation of an instruction stream 300 containing bytes 310 , certain of which have been injected to induce a misalignment around the first instruction (call dword ptr . . . ) discussed above using the fact that the second instruction (mov dword ptr . . . ) discussed above is a seven-byte instruction.
  • the first instruction is represented in bytes 310 g through 310 l . In order to cause a misalignment, however, only a fragment of the second instruction has been added.
  • Disassemblers operate under the assumption that the instruction stream 300 contains a sequence of complete instructions. When a disassembler reaches the instruction in bytes 310 a and 310 b , it will decode the “jmp 04” instruction represented by the “EB 04” object code. However, the disassembler will continue interpreting bytes continuing with byte 310 c , in order provide a complete picture of all instructions in the instruction stream 300 . Thus, it would attempt to interpret bytes starting with 310 c into an instruction.
  • bogus incomplete instruction data is inserted into the code
  • a bypass is inserted which allows correct execution to bypass the bogus incomplete or invalid instruction data.
  • the bypass may, as described, be a bypass which will be taken during execution, followed sequentially with bogus code. Some disassemblers will always assume that a conditional jump is not taken, these disassemblers will be misaligned by conditional jumps contained in such bypasses.
  • both styles of bypass code that is which assume conditional jumps taken vs. not taken, can be included sequentially in the instruction stream with appropriate bogus data. This will guarantee that a misalignment occurs at that site in the instruction stream no matter which type of disassembler is analyzing the code.
  • bogus instruction data consists of random bytes. Upon examination of these random bytes, the disassembler will attempt to interpret the bogus instruction data, which will likely cause a misalignment. Additionally, random bytes may yield a random instruction, which will confuse the user of the disassembler.
  • bogus instruction data is chosen to be instruction opcodes whose instructions are long but are not specified completely. Since such instruction opcodes signal instructions that are not completely specified, real instructions that follow are interpreted as being part of the bogus instruction—and misalignment occurs. For example, the first four bytes of the instruction “89 84 8A 00 16 00 00” is inserted into the instruction stream; that is, the seven byte instruction is inserted with the final three bytes omitted. Three bytes from the following real executable instruction “FF 15 28 10 00 01” are interpreted as the omitted three bytes.
  • the resulting instruction appears to be “89 84 8A 00 FF 15 28” which will be interpreted as “mov dword ptr [edx+ecx*4+2815FF00h], eax”. Since the last three bytes of the instruction “89 84 8A 00 16 00 00” are not specified, the first three bytes of the instruction “FF 15 28 10 00 01” became part of the value “2815FF00h” which is added in the instruction. Because of this, bytes of the real instruction will be interpreted as part of an “argument” to the bogus data's partial instruction.
  • Bypasses may be accomplished by any means which causes the processor to transition execution to a new location.
  • Direct bypasses can be accomplished with jump instructions, as described above.
  • More indirect transitions include a combination of instructions. For example, in the X86 instruction set, the following possible indirect transitions are possible bypasses:
  • misalignments are generated automatically. Elements of the misalignments may be chosen so that a variety of bypasses and bogus data is used. Such a variety will insure that an adversary who has figured out one of the misalignments or a disassembler which is not susceptible to that misalignment may still not be able to correctly identify or interpret another misalignment. The following may be randomly selected for each insertion of a misalignment:
  • Instruction fragment The instruction fragment used for the bogus data may be selected from among a pool of possible instruction fragments. Such a pool may consist of a number of instruction fragments which are not completely specified. In the case where some bogus data may be random, such random bogus data may be randomly generated for each instruction fragment, or for some proportion of the instruction fragments required (with others, for example, selected from a pool as described above.)
  • Fragment length The length of the bogus data may also be random. In the case where instruction fragments are selected from a predetermined pool, additional bytes or fragments may be added randomly to increase the length. In the case where random data is used for the bogus data, different fragment lengths may be used for the bogus data.
  • the bypass to be used may be randomly selected from among these.
  • the method used to implement the selected transition method may be selected from among the possibilities.
  • the method may be one from among the several ways to set conditional codes for conditional bypass, or one jump from among several ways to perform a direct jump.
  • Registers and addressing modes used by each instruction Different registers may be used by bypass instructions, and to the extent that the register used is not determined by the instruction used, the register used may be selected randomly from a pool of possibilities. Additionally, the addressing mode, for example using an 8-bit vs. 32-bit relative offset in a jump, may be randomly selected to the extent that the addressing mode is not already dictated by the required bypass or instruction used in the bypass.
  • Bypass chain length The number of misalignments (bypass and bogus data) may also be randomly selected to create a chain of bypass/bogus data pairs. For example, a misalignment which assumes disassembly relies on a branch not taken can be followed by a misalignment which assumes disassembly relies on a branch being taken in order to cause both kinds of disassemblers to become misaligned. Generating chains which contain a random number of misalignments may be robust against future disassemblers which might be designed to be resistant to single or paired misalignments.
  • Some adversaries may perform entropy analysis on the statistical distribution of different registers, instructions, and addresses in the code in order to try to infer whether anti-disassembly code has been added.
  • the distribution and selection of misalignment code may be chosen to so that the resulting code has the same entropic profile.
  • selection of misalignment code is weighted according to frequency counts of the code in which anti-disassembly segments will be inserted, and address fragments are chosen from likely and/or frequently occurring addresses of the target code.
  • FIG. 4 is a diagram of a method according to one embodiment for protecting code from disassembly by inducing misalignments in disassemblers.
  • a bypass code snippet is inserted into a first location in the code.
  • the bypass code upon execution, causes execution to transition to a second location in said code.
  • This second location may be elsewhere in code, where bogus data is to follow the bypass code immediately.
  • the second location may also be immediately after the bypass code, where the bypass is not taken during normal execution, but may be taken by a static disassembler which assumes, e.g., that jumps are taken, when disassembling code.
  • a bogus instruction snippet is inserted into a third location in the code.
  • the execution of the bypass code snippet will, as previously mentioned, cause a transition to the second location, not to the third location. However, a disassembler may be fooled into assuming execution will transition to the third location. The bogus data may then cause a misalignment.
  • Misalignments according to this method may also be used in combination with run-time instruction modifications, described below.
  • Run-time instruction modifications are instructions which make modifications to bytes which make up the instruction stream. Such modifications change either the instructions which are being requested or in cases where the instruction is not completely specified, may change the data being used with the instruction. These modifications are made to other instructions involved in critical calculations of the executing program. In one embodiment, the run-time modification instructions change the instruction which directly follows in the instruction stream.
  • Static disassemblers generally do not track processor or memory state, and so can not infer run-time modifications to the instruction stream.
  • Some dynamic disassemblers have integrated debuggers which allow the user to step through instruction execution and can correct misalignments once the proper execution path has been determined.
  • dynamic disassembly is really static disassembly at a particular point in time; that is, analysis can only be performed on the code when the debugger is halted and asked to reanalyze the instruction stream. Because run-time modifications of the instruction stream are rare, particularly so close in the instruction stream, disassemblers often simply ignore changes to the instruction stream as uninteresting to display to the user.
  • debuggers often don't reanalyze parts of their instruction stream, particularly that which has already been executed.
  • dynamic disassemblers often fail to update instructions that have been modified at run-time even when the instructions are manually single-stepped through by the user.
  • run-time instruction modifications consist of two parts.
  • the first part contains one or more instructions which modify an invalid instruction into something valid.
  • the second part contains one or more instructions which modify the valid instruction to something invalid again.
  • the invalid instruction in one embodiment, has a different instruction length than the valid instruction. This causes misaligniments, as described above.
  • the invalid instruction is the same length as the valid instruction, but represents a different operation.
  • an “add” may be replaced with a “subtract” or the constant “4” operand of an “add eax, 4” instruction may become “add eax, 8”.
  • the code would appear to be performing a logically different computation without necessarily misaligning subsequent disassembly of the instruction stream. This subtle misinterpretation caused by obfuscation of the instruction sequence may require the adversary to spend even more time and effort in an attempt to reverse engineer the code.
  • a second operation which returns the valid instruction back to invalid depends on the particular method by which the instruction is made valid from its invalid state. That is, if a run-time modification snippet is used which can be executed multiple times without making the valid instruction invalid, then it is not necessary (although it may be desirable) to return the target instruction to its original invalid state. For example, unconditionally overwriting the correct bytes into the instruction stream to make the invalid instruction valid could be executed on the valid instruction without turning it invalid (i.e. it would simply overwrite the correct bytes with the same correct bytes). However, if the run-time modification involved XOR'ing the invalid instruction with a constant to make it valid, a subsequent XOR would make the instruction invalid—in this case, by returning it to its original state (such is the nature of XOR).
  • An arithmetic operation may be performed on one or more bytes of the instruction. New information may be written over one or more bytes of an instruction. Bytes may be copied from a location in memory over the instruction.
  • These methods may be used on the instruction opcodes themselves, to change the instruction operation, registers, or addressing mode used. These methods may also be used on an instruction operand, such as a constant or target address, for example, by changing the target address specified in a jump instruction to a different target address.
  • code in instruction stream 300 should be interpreted as follows:
  • a code snippet would be inserted in the instruction stream which will change the E9 to an EB at some point shortly before the instruction is run, and. In some situations, a code snippet would also be inserted which would change it back to an invalid instruction afterwards. In addition to hiding the actual instruction, the effect of this is also to cause a misalignment.
  • the “04” of the “EB 04” instruction above could have started as an “09” and been changed at run-time to the valid “04” shortly before executing. This would have the effect of foiling a static disassembler that might properly interpreted the bypass jump as taken.
  • run-time modification is particularly powerful when combined with the misalignment technique. Instructions which cause a “jump over” transition for the bypass of the misalignment can be toggled to innocuous looking instructions until right before they are executed.
  • run-time modifications can also be made to fake jump-over instructions. That is, an instruction which appears to jump over other code can be changed to not jump-over or to jump a different number of bytes. This is useful against disassemblers which assume branches are taken, for example. Therefore, all combinations of misalignment and run-time modifications can be used against assumptions made by the disassembler. Certain disassemblers always assume that a jump is taken when one is encountered during disassembly and other disassemblers always assume that a jump is not taken. When techniques are combined to create a situation in which some jumps will be taken when code is executing and some jumps will not be taken, it will be difficult to determine which method applies and it will be difficult to analyze the code using existing analysis tools.
  • FIG. 5 is a diagram of a method according to one embodiment for protecting code from disassembly performing run-time modifications.
  • an instruction from the code is changed from a first state to a second state.
  • This change as described, may change the opcode or the operand of the instruction, may cause an instruction of a first length to be changed to have an opcode indicating it will be a different length, or otherwise will confuse static and dynamic disassembly.
  • a run-time modification code snippet is inserted into the code, so that when the code is executed, the run-time modification code snippet will modify said instruction from the second (invalid) state to the first (valid) state before execution of the instruction.
  • another run-time modification code snippet may be inserted which changes the state of the instruction back to an invalid state after execution.

Abstract

In order to prevent analysis by static and dynamic disassembly techniques, instruction level code obfuscation is performed to induce misalignment and mistaken analysis by disassemblers. Misalignment is induced by including a bypass which leads, during execution, to a legitimate location. During analysis, however, bogus data may be analyzed by the disassembler due to the bypass. Run-time modifications may also be included in code. Code is changed to an invalid state, and instructions inserted into the code which will return the code to a valid state during execution. During analysis, these invalid states may be analyzed by the disassembler as invalid instructions. Induced misalignments and run-time modifications can be chained together to produce sequences of code that will always produce invalid disassembly output from common disassemblers.

Description

FIELD OF THE INVENTION
The present invention relates generally to the field of computing, and, more particularly, to preventing examination or manipulation of code by implementing instruction-level obfuscation techniques.
BACKGROUND OF THE INVENTION
Generally, computer applications run by executing object code. The object code controls the actions of the computer systems on which it is run. Such code may be made public or otherwise made accessible by its authors, for example by publishing the original source code that was compiled to create the object code. When software is sold publicly, the object code will be accessible for analysis by others. The original authors may also choose to make the object code more accessible by other programmers by including “debug symbols” which are data files which help to describe the structure of the object code so that users of the object code can debug their own programs. However, for some uses, it is advisable to protect code, including object code, from examination by possible adversaries. For example, where the code represents the best available implementation of a particular algorithm, the code itself may represent a trade secret. In another example, where code is used to secure content, it may be useful to protect the code in order to ensure the security of the content from an adversary.
Maintaining the security of content may be useful in many contexts. One such context is where the content is private information, for example, financial information about a computer user, user passwords, or other personal information. Another context in which maintaining the security of content is useful is when the secure content is content which is to be used only in a limited way. For example, copyrighted content may be protected by requiring a license for use.
License provisions may be enforced for secure content by a digital rights management (DRM) application. In some DRM applications, content is stored only in an encrypted form, and then decrypted for use according to an electronic license which defines the specific user's rights to use the content. Some or all of the code and associated decryption data which decrypts the encrypted content is therefore protected, because if this code which accesses the copyrighted context is compromised, it may be possible to access the copyright outside of the bounds of the electronic license, for unlicensed uses.
Code which protects sensitive information or performs other sensitive operations is, in some contexts, referred to as a “black box.” The black box may include hardware elements along with software elements. When a black box (or other sensitive code) is being used on a computer system, an adversary may use several means in order to attempt to compromise the security of the operations of the black box. For example, an adversary may attempt to trace the code running in the black box. One way in which an adversary may attempt to do this is by using a debugger to track the progress of the code. Another way an adversary may attempt to compromise the security of the black box is by making modifications to the code in order to provide access to the secure content to the adversary.
There are numerous tools for discouraging static analysis of binary images of code. Static analysis is analysis of the code when it is not executing. For example, certain of these tools for discouraging static analysis allow sections of binaries to be rendered unreadable at or after compile time. For example, some tools encrypt sections of the binary. Thus, before run-time, the binary is unreadable. When the code is to be run, the code is returned to its original state, e.g. by decryption where the modification was encryption. Therefore, an adversary will not be able to perform a static analysis on the binary, but the code will function properly when it is decrypted for execution.
Increasingly, however, dynamic analysis tools and dynamic analyzers integrated with debuggers are being created and used. Such tools allow an adversary to examine code as it is executing. Because the code, as it is executing, has been returned to its unmodified state (e.g. by decryption) it can be analyzed by an adversary using techniques that exist for static analysis. Therefore, static analysis prevention tools such as encryption of binaries are not effective protection for sensitive code, which must be protected from both static and dynamic disassembly techniques.
In view of the foregoing, there is a need for a system that overcomes the drawbacks of the prior art.
SUMMARY OF THE INVENTION
The present invention prevents analysis by static and dynamic disassembly techniques by performing instruction-level code obfuscation. Thus, the sensitive code is modified by obfuscation of portions of the code at the instruction level. These code obfuscations are corrected shortly before the execution point of the code.
The code obfuscations include bogus data injection into the instruction stream and run-time modifications which occur just before the execution point of the modified instructions.
Data injection into the instruction stream can fool both static and run-time disassemblers into treating bogus data and parts of other instructions as instruction opcodes. This misinterpretation of the instruction data can be referred to as “instruction misalignment” because the disassembler can no longer effectively determine instruction boundaries. This causes a cascade of misinterpretation by the disassembler which lasts as long as the disassembler remains misaligned. Such data injection, in one embodiment, includes a bypass and bogus data which triggers misalignment of a disassembler.
Run-time modification code snippets inserted into code change an instruction before it is executed. This is achieved, e.g., by performing an arithmetic operation on the bytes of the instruction, writing bytes over parts of the instruction, or copying bytes from a location in memory over the instructions. The modification makes the instruction valid. In one embodiment, after execution, the code is modified again to return it to an invalid state.
Other features of the invention are described below.
BRIEF DESCRIPTION OF THE DRAWINGS
The foregoing summary, as well as the following detailed description of preferred embodiments, is better understood when read in conjunction with the appended drawings. For the purpose of illustrating the invention, there is shown in the drawings example constructions of the invention; however, the invention is not limited to the specific methods and instrumentalities disclosed. In the drawings:
FIG. 1 is a block diagram of an example computing environment in which aspects of the invention may be implemented;
FIG. 2 is a block diagram of an instruction stream in which aspects of the invention may be implemented;
FIG. 3 is a block diagram of an instruction stream according to an example illustrating one embodiment of the invention;
FIG. 4 is a flow diagram of a method according to one embodiment for protecting code from disassembly by inserting bogus data and associated bypass code; and
FIG. 5 is a diagram of a method according to one embodiment for protecting code from disassembly performing run-time modifications.
DETAILED DESCRIPTION OF ILLUSTRATIVE EMBODIMENTS
Overview
Debuggers and static and dynamic disassemblers analyze code and allow a user to examine its contents. In order to protect sensitive code from such disassembly and debugging by an adversary, code obfuscation is performed. The present invention provides mechanisms for inserting bogus data and runtime modifications into sensitive code. Such obfuscation hinders or prevents examination of the contents of code by such static and dynamic disassemblers.
The description of code misalignments and run-time modifications presented herein relate primarily to assembly language code. However, the techniques described may be applicable to code in other programming languages, and it is contemplated that the application of the mechanisms described and claimed is not limited to assembly language code. That is, any system that infers non-executable source code from executable object code and related data files (for example, debugging symbol files) might be affected by the invention herein.
Exemplary Computing Arrangement
FIG. 1 shows an exemplary computing environment in which aspects of the invention may be implemented. The computing system environment 100 is only one example of a suitable computing environment and is not intended to suggest any limitation as to the scope of use or functionality of the invention. Neither should the computing environment 100 be interpreted as having any dependency or requirement relating to any one or combination of components illustrated in the exemplary operating environment 100.
The invention is operational with numerous other general purpose or special purpose computing system environments or configurations. Examples of well known computing systems, environments, and/or configurations that may be suitable for use with the invention include, but are not limited to, personal computers, server computers, hand-held or laptop devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, embedded systems, distributed computing environments that include any of the above systems or devices, and the like.
The invention may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The invention may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network or other data transmission medium. In a distributed computing environment, program modules and other data may be located in both local and remote computer storage media including memory storage devices.
With reference to FIG. 1, an exemplary system for implementing the invention includes a general purpose computing device in the form of a computer 110. Components of computer 110 may include, but are not limited to, a processing unit 120, a system memory 130, and a system bus 121 that couples various system components including the system memory to the processing unit 120. The processing unit 120 may represent multiple logical processing units such as those supported on a multi-threaded processor. The system bus 121 may be any of several types of bus structures including a memory bus or memory controller, a peripheral bus, and a local bus using any of a variety of bus architectures. By way of example, and not limitation, such architectures include Industry Standard Architecture (ISA) bus, Micro Channel Architecture (MCA) bus, Enhanced ISA (EISA) bus, Video Electronics Standards Association (VESA) local bus, and Peripheral Component Interconnect (PCI) bus (also known as Mezzanine bus). The system bus 121 may also be implemented as a point-to-point connection, switching fabric, or the like, among the communicating devices.
Computer 110 typically includes a variety of computer readable media. Computer readable media can be any available media that can be accessed by computer 110 and includes both volatile and nonvolatile media, removable and non-removable media. By way of example, and not limitation, computer readable media may comprise computer storage media and communication media. Computer storage media includes both volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CDROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can accessed by computer 110. Communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media. The term “modulated data signal” means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media includes wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared and other wireless media. Combinations of any of the above should also be included within the scope of computer readable media.
The system memory 130 includes computer storage media in the form of volatile and/or nonvolatile memory such as read only memory (ROM) 131 and random access memory (RAM) 132. A basic input/output system 133 (BIOS), containing the basic routines that help to transfer information between elements within computer 110, such as during start-up, is typically stored in ROM 131. RAM 132 typically contains data and/or program modules that are immediately accessible to and/or presently being operated on by processing unit 120. By way of example, and not limitation, FIG. 1 illustrates operating system 134, application programs 135, other program modules 136, and program data 137.
The computer 110 may also include other removable/non-removable, volatile/nonvolatile computer storage media. By way of example only, FIG. 1 illustrates a hard disk drive 140 that reads from or writes to non-removable, nonvolatile magnetic media, a magnetic disk drive 151 that reads from or writes to a removable, nonvolatile magnetic disk 152, and an optical disk drive 155 that reads from or writes to a removable, nonvolatile optical disk 156, such as a CD ROM or other optical media. Other removable/non-removable, volatile/nonvolatile computer storage media that can be used in the exemplary operating environment include, but are not limited to, magnetic tape cassettes, flash memory cards, digital versatile disks, digital video tape, solid state RAM, solid state ROM, and the like. The hard disk drive 141 is typically connected to the system bus 121 through a non-removable memory interface such as interface 140, and magnetic disk drive 151 and optical disk drive 155 are typically connected to the system bus 121 by a removable memory interface, such as interface 150.
The drives and their associated computer storage media discussed above and illustrated in FIG. 1, provide storage of computer readable instructions, data structures, program modules and other data for the computer 110. In FIG. 1, for example, hard disk drive 141 is illustrated as storing operating system 144, application programs 145, other program modules 146, and program data 147. Note that these components can either be the same as or different from operating system 134, application programs 135, other program modules 136, and program data 137. Operating system 144, application programs 145, other program modules 146, and program data 147 are given different numbers here to illustrate that, at a minimum, they are different copies. A user may enter commands and information into the computer 20 through input devices such as a keyboard 162 and pointing device 161, commonly referred to as a mouse, trackball or touch pad. Other input devices (not shown) may include a microphone, joystick, game pad, satellite dish, scanner, or the like. These and other input devices are often connected to the processing unit 120 through a user input interface 160 that is coupled to the system bus, but may be connected by other interface and bus structures, such as a parallel port, game port or a universal serial bus (USB). A monitor 191 or other type of display device is also connected to the system bus 121 via an interface, such as a video interface 190. In addition to the monitor, computers may also include other peripheral output devices such as speakers 197 and printer 196, which may be connected through an output peripheral interface 195.
The computer 110 may operate in a networked environment using logical connections to one or more remote computers, such as a remote computer 180. The remote computer 180 may be a personal computer, a server, a router, a network PC, a peer device or other common network node, and typically includes many or all of the elements described above relative to the computer 110, although only a memory storage device 181 has been illustrated in FIG. 1. The logical connections depicted in FIG. 1 include a local area network (LAN) 171 and a wide area network (WAN) 173, but may also include other networks. Such networking environments are commonplace in offices, enterprise-wide computer networks, intranets and the Internet.
When used in a LAN networking environment, the computer 110 is connected to the LAN 171 through a network interface or adapter 170. When used in a WAN networking environment, the computer 110 typically includes a modem 172 or other means for establishing communications over the WAN 173, such as the Internet. The modem 172, which may be internal or external, may be connected to the system bus 121 via the user input interface 160, or other appropriate mechanism. In a networked environment, program modules depicted relative to the computer 110, or portions thereof, may be stored in the remote memory storage device. By way of example, and not limitation, FIG. 1 illustrates remote application programs 185 as residing on memory device 181. It will be appreciated that the network connections shown are exemplary and other means of establishing a communications link between the computers may be used.
Inducing Misalignments
On computer platforms with variable length instructions, such as X86 platforms, in order to allow a user to examine code, a disassembler must determine which bytes of the code are part of which instructions. Thus, the disassembler must rely on sequentially decoding instruction streams from a known valid starting location, like the address of a module “entry point” described in an export table of the portable executable (PE) header. An instruction stream is a stream of bytes which is loaded into memory by the operating system and decoded and executed by the processing unit. FIG. 2 is a representation of an instruction stream 200 containing bytes 210.
Where variable length instructions are possible, given a stream of bytes in an object code instruction stream 200 without an indication of an initial byte for an instruction, it can not be determined how many bytes or which bytes are used in one instruction in the object code until the execution begins on the code and determine an end byte for the previous instruction. An instruction may consist of an initial number of bytes which describe the type of instruction (an opcode), followed optionally by bytes which are arguments to the instruction (an operand). While byte 210 a may be the opcode for an instruction with byte 210 b as its operand, if the previous instruction ends with byte 210 a, then byte 210 b may be the opcode of an instruction, and that instruction may continue in succeeding bytes in the instruction stream 200. A program counter is used to control the flow of execution through the instruction stream 200. The program counter is a register which stores the address of the next instruction to be executed.
One possible instruction in most versions of assembly language is a jump relative instruction. A jump relative instruction jumps a specified number of bytes forward in the instruction stream. The jump relative instruction adds the number of bytes specified by its operand to the program counter which causes execution to jump ahead by the specified number of bytes.
Thus, if the byte value “EB” is the opcode for the jump relative command, the two bytes “EB 04” stored in bytes 210 a and 210 b will perform a relative jump over the next four bytes. The program counter will then point to byte 210 g, where execution will continue. Other jumps are also possible, including jumps to a specific location in the instruction stream (“absolute” jump), jumps which are conditional, etc. Jumps and other instructions which will cause execution to transition to another location in the instruction stream can be used to create a misalignment which will foil disassembly analysis.
As an example of how such misalignments may be caused, a misalignment will be described which prevents a disassembler from determining that the instruction stream contains the instruction:
    • FF 15 28 10 00 01
      which in certain execution environments represents:
    • call dword ptr [_imp_GetProcessHeap@0 (1001028)]
      In order to create the misalignment, a portion of another instruction will be used. This instruction is a seven-byte instruction:
    • 89 84 8A 00 16 00 00
      which represents
    • mov dword ptr [edx+ecx*4+1600h], eax
FIG. 3 is a representation of an instruction stream 300 containing bytes 310, certain of which have been injected to induce a misalignment around the first instruction (call dword ptr . . . ) discussed above using the fact that the second instruction (mov dword ptr . . . ) discussed above is a seven-byte instruction. The first instruction is represented in bytes 310 g through 310 l. In order to cause a misalignment, however, only a fragment of the second instruction has been added.
The result of adding these bytes is a misalignment which will confuse a disassembler. Disassemblers operate under the assumption that the instruction stream 300 contains a sequence of complete instructions. When a disassembler reaches the instruction in bytes 310 a and 310 b, it will decode the “jmp 04” instruction represented by the “EB 04” object code. However, the disassembler will continue interpreting bytes continuing with byte 310 c, in order provide a complete picture of all instructions in the instruction stream 300. Thus, it would attempt to interpret bytes starting with 310 c into an instruction.
As a result, instead of interpreting the code in instruction stream 300 as follows:
EB 04 ; jump to Calladdr
89 84 8A 00 ; incomplete instruction
Calladdr:
FF 15 28 10 00 01 ; call instruction

a disassembler will interpret the code as follows:
EB 04 ; jump to Calladdr
89 84 8A 00 FF 15 28 ; bogus mov instruction
10 00 01 ; bogus add ??? instruction
Generally, in order to perform a misalignment, (1) bogus incomplete instruction data is inserted into the code, and (2) a bypass is inserted which allows correct execution to bypass the bogus incomplete or invalid instruction data.
The bypass may, as described, be a bypass which will be taken during execution, followed sequentially with bogus code. Some disassemblers will always assume that a conditional jump is not taken, these disassemblers will be misaligned by conditional jumps contained in such bypasses.
However, some disassemblers will always assume that a conditional jump is taken. Thus, some bypasses allow execution to bypass the bogus code by creating a situation in which the jump or other bypass leads to the bogus code. During execution, the bypass will not be taken. However, a disassembler will take the bypass, which will lead the disassembler to the bogus data.
To fool both kinds of disassemblers, both styles of bypass code, that is which assume conditional jumps taken vs. not taken, can be included sequentially in the instruction stream with appropriate bogus data. This will guarantee that a misalignment occurs at that site in the instruction stream no matter which type of disassembler is analyzing the code.
Bogus Data
In one embodiment, bogus instruction data consists of random bytes. Upon examination of these random bytes, the disassembler will attempt to interpret the bogus instruction data, which will likely cause a misalignment. Additionally, random bytes may yield a random instruction, which will confuse the user of the disassembler.
According to another embodiment, bogus instruction data is chosen to be instruction opcodes whose instructions are long but are not specified completely. Since such instruction opcodes signal instructions that are not completely specified, real instructions that follow are interpreted as being part of the bogus instruction—and misalignment occurs. For example, the first four bytes of the instruction “89 84 8A 00 16 00 00” is inserted into the instruction stream; that is, the seven byte instruction is inserted with the final three bytes omitted. Three bytes from the following real executable instruction “FF 15 28 10 00 01” are interpreted as the omitted three bytes. The resulting instruction appears to be “89 84 8A 00 FF 15 28” which will be interpreted as “mov dword ptr [edx+ecx*4+2815FF00h], eax”. Since the last three bytes of the instruction “89 84 8A 00 16 00 00” are not specified, the first three bytes of the instruction “FF 15 28 10 00 01” became part of the value “2815FF00h” which is added in the instruction. Because of this, bytes of the real instruction will be interpreted as part of an “argument” to the bogus data's partial instruction.
Bypasses
Bypasses may be accomplished by any means which causes the processor to transition execution to a new location. Direct bypasses can be accomplished with jump instructions, as described above. More indirect transitions include a combination of instructions. For example, in the X86 instruction set, the following possible indirect transitions are possible bypasses:
    • “push” a return address on the stack and “ret”—this will cause the processor to return to the pushed address. Such a push and return combination may also include an instruction which, before returning, modifies the pushed address (e.g. using “add”, “mov”, etc.);
    • Perform a “call” to a nearby address (e.g. to the next instruction). This implicitly causes a return address to get pushed on to the stack. After the “call”, modify the return address. The return from the call will therefore be to a different location than the expected return;
    • Cause a conditional flag to get set by the processor (for example, using cmp, test, stc, or clc) and then perform a conditional branch with the intention of always or never branching.
Use of a combination of bypass transitions reduces the chances that the disassembler will properly interpret all of the bypasses, and thus increases the chances that the disassembler will encounter the bogus instruction data and become misaligned.
Random Insertion of Misalignments
In one embodiment, misalignments (bypasses and bogus data) are generated automatically. Elements of the misalignments may be chosen so that a variety of bypasses and bogus data is used. Such a variety will insure that an adversary who has figured out one of the misalignments or a disassembler which is not susceptible to that misalignment may still not be able to correctly identify or interpret another misalignment. The following may be randomly selected for each insertion of a misalignment:
Instruction fragment The instruction fragment used for the bogus data may be selected from among a pool of possible instruction fragments. Such a pool may consist of a number of instruction fragments which are not completely specified. In the case where some bogus data may be random, such random bogus data may be randomly generated for each instruction fragment, or for some proportion of the instruction fragments required (with others, for example, selected from a pool as described above.)
Fragment length The length of the bogus data may also be random. In the case where instruction fragments are selected from a predetermined pool, additional bytes or fragments may be added randomly to increase the length. In the case where random data is used for the bogus data, different fragment lengths may be used for the bogus data.
Transition method and instructions used for the transition method As described above, there are different methods for performing bypasses—the bypass to be used may be randomly selected from among these. Additionally, the method used to implement the selected transition method may be selected from among the possibilities. For example, the method may be one from among the several ways to set conditional codes for conditional bypass, or one jump from among several ways to perform a direct jump.
Registers and addressing modes used by each instruction Different registers may be used by bypass instructions, and to the extent that the register used is not determined by the instruction used, the register used may be selected randomly from a pool of possibilities. Additionally, the addressing mode, for example using an 8-bit vs. 32-bit relative offset in a jump, may be randomly selected to the extent that the addressing mode is not already dictated by the required bypass or instruction used in the bypass.
Bypass chain length The number of misalignments (bypass and bogus data) may also be randomly selected to create a chain of bypass/bogus data pairs. For example, a misalignment which assumes disassembly relies on a branch not taken can be followed by a misalignment which assumes disassembly relies on a branch being taken in order to cause both kinds of disassemblers to become misaligned. Generating chains which contain a random number of misalignments may be robust against future disassemblers which might be designed to be resistant to single or paired misalignments.
Some adversaries may perform entropy analysis on the statistical distribution of different registers, instructions, and addresses in the code in order to try to infer whether anti-disassembly code has been added. To preserve entropic distribution and thus hide the insertions, the distribution and selection of misalignment code may be chosen to so that the resulting code has the same entropic profile. Thus, in one embodiment, selection of misalignment code is weighted according to frequency counts of the code in which anti-disassembly segments will be inserted, and address fragments are chosen from likely and/or frequently occurring addresses of the target code.
FIG. 4 is a diagram of a method according to one embodiment for protecting code from disassembly by inducing misalignments in disassemblers. In step 400, a bypass code snippet is inserted into a first location in the code. The bypass code, upon execution, causes execution to transition to a second location in said code. This second location may be elsewhere in code, where bogus data is to follow the bypass code immediately. The second location may also be immediately after the bypass code, where the bypass is not taken during normal execution, but may be taken by a static disassembler which assumes, e.g., that jumps are taken, when disassembling code.
In step 410, a bogus instruction snippet is inserted into a third location in the code. The execution of the bypass code snippet will, as previously mentioned, cause a transition to the second location, not to the third location. However, a disassembler may be fooled into assuming execution will transition to the third location. The bogus data may then cause a misalignment.
Misalignments according to this method may also be used in combination with run-time instruction modifications, described below.
Run-Time Instruction Modification
Run-time instruction modifications are instructions which make modifications to bytes which make up the instruction stream. Such modifications change either the instructions which are being requested or in cases where the instruction is not completely specified, may change the data being used with the instruction. These modifications are made to other instructions involved in critical calculations of the executing program. In one embodiment, the run-time modification instructions change the instruction which directly follows in the instruction stream.
Static disassemblers generally do not track processor or memory state, and so can not infer run-time modifications to the instruction stream. Some dynamic disassemblers have integrated debuggers which allow the user to step through instruction execution and can correct misalignments once the proper execution path has been determined. However, even dynamic disassembly is really static disassembly at a particular point in time; that is, analysis can only be performed on the code when the debugger is halted and asked to reanalyze the instruction stream. Because run-time modifications of the instruction stream are rare, particularly so close in the instruction stream, disassemblers often simply ignore changes to the instruction stream as uninteresting to display to the user. That is, debuggers often don't reanalyze parts of their instruction stream, particularly that which has already been executed. Thus, even dynamic disassemblers often fail to update instructions that have been modified at run-time even when the instructions are manually single-stepped through by the user.
Although in principle, some dynamic disassemblers allow an adversary to step through the real-time instruction modifications and fix the instruction sequences, this would be unbearably time consuming to do for every instruction. Without a convenient mechanism for automatically disassembling the code, the task of inferring the instruction stream becomes a labor intensive, error prone chore for the adversary. The individualized (i.e. performed on individual instructions) and last minute (i.e. preceding the modified instruction immediately or only by only a couple instructions) nature of the run-time instruction modifications enable this technique for foiling adversaries.
In one embodiment, run-time instruction modifications consist of two parts. The first part contains one or more instructions which modify an invalid instruction into something valid. The second part contains one or more instructions which modify the valid instruction to something invalid again. The invalid instruction, in one embodiment, has a different instruction length than the valid instruction. This causes misaligniments, as described above.
In another embodiment, the invalid instruction is the same length as the valid instruction, but represents a different operation. For example, an “add” may be replaced with a “subtract” or the constant “4” operand of an “add eax, 4” instruction may become “add eax, 8”. The code would appear to be performing a logically different computation without necessarily misaligning subsequent disassembly of the instruction stream. This subtle misinterpretation caused by obfuscation of the instruction sequence may require the adversary to spend even more time and effort in an attempt to reverse engineer the code.
Since the processor will be executing the run-time modification instructions every time the protected code executes, whether or not a second operation which returns the valid instruction back to invalid is necessary depends on the particular method by which the instruction is made valid from its invalid state. That is, if a run-time modification snippet is used which can be executed multiple times without making the valid instruction invalid, then it is not necessary (although it may be desirable) to return the target instruction to its original invalid state. For example, unconditionally overwriting the correct bytes into the instruction stream to make the invalid instruction valid could be executed on the valid instruction without turning it invalid (i.e. it would simply overwrite the correct bytes with the same correct bytes). However, if the run-time modification involved XOR'ing the invalid instruction with a constant to make it valid, a subsequent XOR would make the instruction invalid—in this case, by returning it to its original state (such is the nature of XOR).
In order to modify an instruction at run-time, either to make it valid or to make it invalid, a number of techniques may be used. An arithmetic operation may be performed on one or more bytes of the instruction. New information may be written over one or more bytes of an instruction. Bytes may be copied from a location in memory over the instruction. These methods may be used on the instruction opcodes themselves, to change the instruction operation, registers, or addressing mode used. These methods may also be used on an instruction operand, such as a constant or target address, for example, by changing the target address specified in a jump instruction to a different target address.
As an example, the code in instruction stream 300, as previously discussed, should be interpreted as follows:
EB 04 ; jump to Calladdr
89 84 8A 00 ; incomplete instruction
Calladdr:
FF 15 28 10 00 01 ; call instruction
However, if the first byte is changed to E9 rather than EB, the opcode will be interpreted as a five-byte jump relative instruction rather than a two-byte jump relative instruction. Hence, a disassembler will interpret this as:
E9 04 89 84 8A ; jmp 8A848904h
00 FF 15 28 00 01 . . . ; bogus instruction
In this example, a code snippet would be inserted in the instruction stream which will change the E9 to an EB at some point shortly before the instruction is run, and. In some situations, a code snippet would also be inserted which would change it back to an invalid instruction afterwards. In addition to hiding the actual instruction, the effect of this is also to cause a misalignment. As an example of modifying an operand, the “04” of the “EB 04” instruction above could have started as an “09” and been changed at run-time to the valid “04” shortly before executing. This would have the effect of foiling a static disassembler that might properly interpreted the bypass jump as taken.
Code Obfuscation Using Both Misalignments and Run-Time Modifications
As alluded to in the above examples, run-time modification is particularly powerful when combined with the misalignment technique. Instructions which cause a “jump over” transition for the bypass of the misalignment can be toggled to innocuous looking instructions until right before they are executed.
Additionally, run-time modifications can also be made to fake jump-over instructions. That is, an instruction which appears to jump over other code can be changed to not jump-over or to jump a different number of bytes. This is useful against disassemblers which assume branches are taken, for example. Therefore, all combinations of misalignment and run-time modifications can be used against assumptions made by the disassembler. Certain disassemblers always assume that a jump is taken when one is encountered during disassembly and other disassemblers always assume that a jump is not taken. When techniques are combined to create a situation in which some jumps will be taken when code is executing and some jumps will not be taken, it will be difficult to determine which method applies and it will be difficult to analyze the code using existing analysis tools.
FIG. 5 is a diagram of a method according to one embodiment for protecting code from disassembly performing run-time modifications. In step 500, an instruction from the code is changed from a first state to a second state. This change, as described, may change the opcode or the operand of the instruction, may cause an instruction of a first length to be changed to have an opcode indicating it will be a different length, or otherwise will confuse static and dynamic disassembly.
In order for legitimate execution of the code to not be hindered, in step 510, a run-time modification code snippet is inserted into the code, so that when the code is executed, the run-time modification code snippet will modify said instruction from the second (invalid) state to the first (valid) state before execution of the instruction. In one embodiment, another run-time modification code snippet may be inserted which changes the state of the instruction back to an invalid state after execution.
It is noted that the foregoing examples have been provided merely for the purpose of explanation and are in no way to be construed as limiting of the present invention. While the invention has been described with reference to various embodiments, it is understood that the words which have been used herein are words of description and illustration, rather than words of limitations. Further, although the invention has been described herein with reference to particular means, materials and embodiments, the invention is not intended to be limited to the particulars disclosed herein; rather, the invention extends to all functionally equivalent structures, methods and uses, such as are within the scope of the appended claims. Those skilled in the art, having the benefit of the teachings of this specification, may effect numerous modifications thereto and changes may be made without departing from the scope and spirit of the invention in its aspects.

Claims (22)

1. A method for protecting code from disassembly, said method comprising:
inserting a first bypass code snippet into a first location in said code, where said first bypass code snippet, upon execution, causes execution to transition to a second location in said code;
inserting a bogus instruction snippet into a third location in said code, such that execution of said code will, as a result of said bypass, not reach said third location,
wherein the selection of the first bypass code snippet and the bogus instruction snippet preserves an entropic distribution.
2. The method of claim 1 where said first bypass code snippet causes said third location to be bypassed during execution of said code.
3. The method of claim 1, where said first bypass code snippet is followed sequentially by said second location in code, and where said third location is a location referred to by said bypass code snippet.
4. The method of claim 1, where said first bypass code snippet comprises at least one selected from among the following: a jump instruction; a return instruction, and a call instruction.
5. The method of claim 1, where said bogus instruction snippet comprises an instruction fragment comprising one or more opcode bytes describing an instruction of a certain number of bytes, and where said instruction fragment is of a length not equal to said certain number of bytes.
6. The method of claim 1, where said bogus instruction snippet comprises randomly-selected bytes.
7. A method for protecting code for disassembly, comprising at least two misalignment insertions, each of said misalignment insertions comprising the insertion of a first bypass code snippet and a bogus instruction snippet according to the method of claim 1, wherein the insertion of the first bypass code snippet and the bogus instruction snippet preserves an entropic distribution.
8. The method of claim 7, where the selection of said first bypass code snippets and said bogus instruction snippets is performed in order to include a variety among said first bypass code snippets and said bogus instruction snippets.
9. The method of claim 8, where said entropic distribution is preserved for at least one of the following: a frequency of instruction types used, frequency counts of address references, and frequency counts of register references.
10. The method of claim 1, further comprising:
modifying a first instruction of said code from a first state to a second state;
inserting a run-time modification code snippet such that when said code is executed, said run-time modification code snippet will modify said first instruction from said second state to said first state before execution of said first instruction.
11. The method of claim 1, further comprising:
inserting a second bypass code snippet into a fourth location in said code, where said second bypass code snippet, upon execution, causes execution to transition to said first location.
12. The method of claim 11, where one of said first bypass code snippet and said second bypass code snippet causes a non-sequential transition upon execution, and where the other of said first bypass code snippet and said second bypass code snippet causes a sequential transition upon execution.
13. A computer readable storage medium having a plurality of computer-executable instructions stored thereon, said instructions for performing the following:
inserting a first bypass code snippet into a first location in said code, where said first bypass code snippet, upon execution, causes execution to transition to a second location in said code;
inserting a bogus instruction snippet into a third location in said code, such that execution of said code will, as a result of said bypass, not reach said third location,
wherein the selection of the first bypass code snippet and the bogus instruction snippet preserves an entropic distribution.
14. A system for protecting code from disassembly, said system comprising:
a computing processor adapted to receive and execute computer-readable instructions;
computing memory, said computing memory having instructions stored therein for performing the following when executed by said computing processor:
inserting a first bypass code snippet into a first location in said code, said first bypass code snippet, upon execution, causing execution to transition to a second location in said code; and
inserting a bogus instruction snippet into a third location in said code, such that execution of said code will, as a result of said bypass, not reach said third location,
wherein the selection of the first bypass code snippet and the bogus instruction snippet preserves an entropic distribution.
15. The system of claim 14 where said first bypass code snippet causes said third location to be bypassed during execution of said code.
16. The system of claim 14, where said first bypass code snippet is followed sequentially by said second location in code, and where said third location is a location referred to by said bypass code snippet.
17. The system of claim 14, where said first bypass code snippet comprises at least one selected from among the following: a jump instruction; a return instruction, and a call instruction.
18. The system of claim 14, where said bogus instruction snippet comprises an instruction fragment comprising one or more opcode bytes describing an instruction of a certain number of bytes, and where said instruction fragment is of a length not equal to said certain number of bytes.
19. The system of claim 14, where said bogus instruction snippet comprises randomly-selected bytes.
20. The system of claim 14, further comprising instructions for performing the following:
modifying a first instruction of said code from a first state to a second state; and
inserting a run-time modification code snippet such that when said code is executed, said run-time modification code snippet will modify said first instruction from said second state to said first state before execution of said first instruction.
21. The system of claim 14, further comprising instructions for performing the following:
inserting a second bypass code snippet into a fourth location in said code, where said second bypass code snippet, upon execution, causes execution to transition to said first location.
22. The system of claim 21, where one of said first bypass code snippet and said second bypass code snippet causes a non-sequential transition upon execution, and where the other of said first bypass code snippet and said second bypass code snippet causes a sequential transition upon execution.
US10/795,058 2004-03-05 2004-03-05 Static and run-time anti-disassembly and anti-debugging Expired - Fee Related US7383583B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US10/795,058 US7383583B2 (en) 2004-03-05 2004-03-05 Static and run-time anti-disassembly and anti-debugging

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US10/795,058 US7383583B2 (en) 2004-03-05 2004-03-05 Static and run-time anti-disassembly and anti-debugging

Publications (2)

Publication Number Publication Date
US20050198526A1 US20050198526A1 (en) 2005-09-08
US7383583B2 true US7383583B2 (en) 2008-06-03

Family

ID=34912419

Family Applications (1)

Application Number Title Priority Date Filing Date
US10/795,058 Expired - Fee Related US7383583B2 (en) 2004-03-05 2004-03-05 Static and run-time anti-disassembly and anti-debugging

Country Status (1)

Country Link
US (1) US7383583B2 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050198516A1 (en) * 2004-03-05 2005-09-08 Marr Michael D. Intentional cascade failure
US10664254B2 (en) * 2017-10-23 2020-05-26 Blackberry Limited Analyzing binary software components utilizing multiple instruction sets
US11409635B2 (en) 2019-08-23 2022-08-09 Raytheon Company Hacker-resistant anti-debug system

Families Citing this family (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100509650B1 (en) * 2003-03-14 2005-08-23 주식회사 안철수연구소 Method to detect malicious scripts using code insertion technique
US7975257B2 (en) * 2006-06-13 2011-07-05 Microsoft Corporation Iterative static and dynamic software analysis
FR2903508B1 (en) * 2006-07-10 2008-10-17 Sagem Defense Securite PROTECTION OF A PROGRAM INTERPRETED BY A VIRTUAL MACHINE
US8356356B2 (en) * 2007-01-30 2013-01-15 Microsoft Corporation Anti-debugger comprising spatially and temporally separate detection and response portions
US7664937B2 (en) * 2007-03-01 2010-02-16 Microsoft Corporation Self-checking code for tamper-resistance based on code overlapping
US8869109B2 (en) * 2008-03-17 2014-10-21 Microsoft Corporation Disassembling an executable binary
US8800048B2 (en) * 2008-05-20 2014-08-05 Microsoft Corporation Software protection through interdependent parameter cloud constrained software execution
US8782613B2 (en) * 2008-08-12 2014-07-15 Hewlett-Packard Development Company, L.P. Optimizing applications using source code patterns and performance analysis
US9111072B1 (en) * 2011-08-23 2015-08-18 Tectonic Labs, LLC Anti-reverse engineering unified process
US20150294114A1 (en) * 2012-09-28 2015-10-15 Hewlett-Packard Development Company, L.P. Application randomization
US10599852B2 (en) 2014-08-15 2020-03-24 Securisea, Inc. High performance software vulnerabilities detection system and methods
US9824214B2 (en) 2014-08-15 2017-11-21 Securisea, Inc. High performance software vulnerabilities detection system and methods
US9454659B1 (en) 2014-08-15 2016-09-27 Securisea, Inc. Software vulnerabilities detection system and methods
US9785777B2 (en) * 2014-12-19 2017-10-10 International Business Machines Corporation Static analysis based on abstract program representations
US10210323B2 (en) * 2016-05-06 2019-02-19 The Boeing Company Information assurance system for secure program execution
EP3264307A1 (en) * 2016-06-29 2018-01-03 Nagravision SA On demand code decryption
CN107728510A (en) * 2017-09-12 2018-02-23 山东超越数控电子有限公司 A kind of anti-disassembling mechanism suitable for portable set
CN110516438B (en) * 2018-05-21 2023-11-07 深信服科技股份有限公司 Method, system and related components for disassembling executable file
EP3722980A1 (en) * 2019-04-10 2020-10-14 SafeNet, Inc. Method to secure a software code

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5764374A (en) * 1996-02-05 1998-06-09 Hewlett-Packard Company System and method for lossless image compression having improved sequential determination of golomb parameter
US6006328A (en) * 1995-07-14 1999-12-21 Christopher N. Drake Computer software authentication, protection, and security system
US20020016918A1 (en) * 2000-05-12 2002-02-07 David Tucker Information security method and system
US6480959B1 (en) * 1997-12-05 2002-11-12 Jamama, Llc Software system and associated methods for controlling the use of computer programs
US6594761B1 (en) * 1999-06-09 2003-07-15 Cloakware Corporation Tamper resistant software encoding
US6668325B1 (en) * 1997-06-09 2003-12-23 Intertrust Technologies Obfuscation techniques for enhancing software security
US6829710B1 (en) * 2000-03-14 2004-12-07 Microsoft Corporation Technique for producing, through watermarking, highly tamper-resistant executable code and resulting “watermarked” code so formed
US20050183072A1 (en) * 1999-07-29 2005-08-18 Intertrust Technologies Corporation Software self-defense systems and methods
US6981146B1 (en) * 1999-05-17 2005-12-27 Invicta Networks, Inc. Method of communications and communication network intrusion protection methods and intrusion attempt detection system
US20060053307A1 (en) * 2000-06-21 2006-03-09 Aladdin Knowledge Systems, Ltd. System for obfuscating computer code upon disassembly

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6006328A (en) * 1995-07-14 1999-12-21 Christopher N. Drake Computer software authentication, protection, and security system
US5764374A (en) * 1996-02-05 1998-06-09 Hewlett-Packard Company System and method for lossless image compression having improved sequential determination of golomb parameter
US6668325B1 (en) * 1997-06-09 2003-12-23 Intertrust Technologies Obfuscation techniques for enhancing software security
US6480959B1 (en) * 1997-12-05 2002-11-12 Jamama, Llc Software system and associated methods for controlling the use of computer programs
US6981146B1 (en) * 1999-05-17 2005-12-27 Invicta Networks, Inc. Method of communications and communication network intrusion protection methods and intrusion attempt detection system
US6594761B1 (en) * 1999-06-09 2003-07-15 Cloakware Corporation Tamper resistant software encoding
US20050183072A1 (en) * 1999-07-29 2005-08-18 Intertrust Technologies Corporation Software self-defense systems and methods
US6829710B1 (en) * 2000-03-14 2004-12-07 Microsoft Corporation Technique for producing, through watermarking, highly tamper-resistant executable code and resulting “watermarked” code so formed
US20020016918A1 (en) * 2000-05-12 2002-02-07 David Tucker Information security method and system
US20060053307A1 (en) * 2000-06-21 2006-03-09 Aladdin Knowledge Systems, Ltd. System for obfuscating computer code upon disassembly

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
Aucsmith, D., "Tamper Resistant Software: An Implementation", Proceedings 1<SUP>st </SUP>International Information Hiding Workshop (IHW), Cambridge, U.K. 1996, Springer LNCS 1174, 1997, 317-333.
Collberg, C. et al., "A Taxonomy of Obfuscating Transfromations", Technical Report No. 148, Department of Computer Science, University of Auckland, New Zealand, Jul. 1997, 36 pages.
Horne, B. et al., "Dynamic Self-Checking Techniques for Improved Tamper Resistance", Proceedings 1<SUP>st </SUP>ACM Workshop on Digital Rights Management(DRM2001), Springer LNCS 2320, 2002, 141-159.
Linn, C. et al., "Enhancing Software Tamper-Resistance via Stealthy Address Computations", Work in Progress, 19<SUP>th </SUP>Annual Computer Society Applications Conference(ACSAC2003), Dec. 2003, 1-3.
Wang, C. et al., "Software Tamper Resistance: Obstructing Static Analysis of Programs", Technical Report CS-2000-12, Department of Computer Science, University of Virginia, Dec. 2000, 18 pages.

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050198516A1 (en) * 2004-03-05 2005-09-08 Marr Michael D. Intentional cascade failure
US7444677B2 (en) * 2004-03-05 2008-10-28 Microsoft Corporation Intentional cascade failure
US10664254B2 (en) * 2017-10-23 2020-05-26 Blackberry Limited Analyzing binary software components utilizing multiple instruction sets
US11409635B2 (en) 2019-08-23 2022-08-09 Raytheon Company Hacker-resistant anti-debug system

Also Published As

Publication number Publication date
US20050198526A1 (en) 2005-09-08

Similar Documents

Publication Publication Date Title
US7383583B2 (en) Static and run-time anti-disassembly and anti-debugging
De Clercq et al. SOFIA: Software and control flow integrity architecture
US10255414B2 (en) Software self-defense systems and methods
US7584364B2 (en) Overlapped code obfuscation
US7254586B2 (en) Secure and opaque type library providing secure data protection of variables
US7287166B1 (en) Guards for application in software tamperproofing
KR101719635B1 (en) A system and method for aggressive self-modification in dynamic function call systems
EP1410150B1 (en) Protecting software applications against software piracy
US7757097B2 (en) Method and system for tamperproofing software
US6480959B1 (en) Software system and associated methods for controlling the use of computer programs
US7865961B2 (en) Computer system, central unit, and program execution method
Lin et al. Control-flow carrying code
Yang et al. Arm pointer authentication based forward-edge and backward-edge control flow integrity for kernels
Lehniger et al. Combination of ROP Defense Mechanisms for Better Safety and Security in Embedded Systems
Wang et al. An efficient control-flow based obfuscator for micropython bytecode
Togan et al. Virtual machine for encrypted code execution
Wu et al. Efficient and automatic instrumentation for packed binaries
Kanzaki et al. A software protection method based on time-sensitive code and self-modification mechanism
Iandoli PROLEPSIS: Binary Instrumentation Tool for Control-Flow Integrity in ARM and RISC-V
Aelterman Exploitation of synergies between software protections
Miljak An experimental study on which anti-reverse engineering technique are the most effective to protect your software from reversers
Nürnberger Mitigating the imposition of malicious behaviour on code
Ghosh Software Protection via Composable Process-level Virtual Machines
Singh Fundamental of reverse engineering
Liang Software code protection through software obfuscation

Legal Events

Date Code Title Description
AS Assignment

Owner name: MICROSOFT CORPORATION, WASHINGTON

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:MARR, MICHAEL DAVID;BAKER, BRANDON;REEL/FRAME:015055/0812

Effective date: 20040305

STCF Information on status: patent grant

Free format text: PATENTED CASE

FPAY Fee payment

Year of fee payment: 4

AS Assignment

Owner name: MICROSOFT TECHNOLOGY LICENSING, LLC, WASHINGTON

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:MICROSOFT CORPORATION;REEL/FRAME:034541/0477

Effective date: 20141014

FPAY Fee payment

Year of fee payment: 8

FEPP Fee payment procedure

Free format text: MAINTENANCE FEE REMINDER MAILED (ORIGINAL EVENT CODE: REM.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

LAPS Lapse for failure to pay maintenance fees

Free format text: PATENT EXPIRED FOR FAILURE TO PAY MAINTENANCE FEES (ORIGINAL EVENT CODE: EXP.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

STCH Information on status: patent discontinuation

Free format text: PATENT EXPIRED DUE TO NONPAYMENT OF MAINTENANCE FEES UNDER 37 CFR 1.362

FP Lapsed due to failure to pay maintenance fee

Effective date: 20200603