backtrace.py version 0.3

backtrace.py version 0.3 has been pushed out to its repo, with a couple of notable features added. The previous version only tracked uses of the MOV instruction. That was kind of useful, I guess; at least it was fun to code. The current version tracks whenever a register (for example ECX) or one of its sub-registers (CX) is manipulated, where the old version relied on plain string comparisons. For example, if we back trace upward from the last instruction in the listing below (0x10004EFC), we see AL referenced, then EAX, then byte_1003B03C, then DL, and so on.

.text:10004E99                 mov     byte_1003B03C, al
.text:10004E9E                 movsx   ecx, byte_1003B03C
.text:10004EA5                 imul    ecx, 0A2h
.text:10004EAB                 mov     byte_1003B03C, cl
.text:10004EB1                 movsx   edx, byte_1003B03C
.text:10004EB8                 xor     edx, 0A4h
.text:10004EBE                 mov     byte_1003B03C, dl
.text:10004EC4                 movsx   eax, byte_1003B03C
.text:10004ECB                 cdq
.text:10004ECC                 mov     ecx, 0C8h
.text:10004ED1                 idiv    ecx
.text:10004ED3                 mov     byte_1003B03C, al
.text:10004ED8                 xor     eax, eax
.text:10004EDA                 jmp     short loc_10004F01
.text:10004EDC ; ---------------------------------------------------------------------------
.text:10004EDC                 movsx   edx, byte_1003B03C
.text:10004EE3                 or      edx, 0D2h
.text:10004EE9                 mov     byte_1003B03C, dl
.text:10004EEF                 movsx   eax, byte_1003B03C
.text:10004EF6                 imul    eax, 0C1h
.text:10004EFC                 mov     byte_1003B03C, al

The old version did not know that AL is the low byte of EAX, because it compared operand strings. The new version does a simple check of the register name and its purpose. Note: there will be some issues if AH is moved into AL, or with other similar operations; I didn't code that logic in. If we back trace the code above, we get the following output.

Python>s.backtrace(here(),1)
 0x10004efc mov     byte_1003B03C, al
 0x10004ef6 imul    eax, 0C1h
 0x10004eef movsx   eax, byte_1003B03C
 0x10004ee9 mov     byte_1003B03C, dl
 0x10004ee3 or      edx, 0D2h
 0x10004edc movsx   edx, byte_1003B03C
 0x10004ed3 mov     byte_1003B03C, al
 0x10004ec4 movsx   eax, byte_1003B03C
 0x10004ebe mov     byte_1003B03C, dl
 0x10004eb8 xor     edx, 0A4h
 0x10004eb1 movsx   edx, byte_1003B03C
 0x10004eab mov     byte_1003B03C, cl
 0x10004ea5 imul    ecx, 0A2h
 0x10004e9e movsx   ecx, byte_1003B03C
 0x10004e99 mov     byte_1003B03C, al
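
The register check described above could look something like the following. This is a minimal sketch and not the code from backtrace.py: the table REG_FAMILIES and the helpers normalize and same_location are hypothetical names, and the mapping assumes 32-bit x86 registers only.

```python
# Hypothetical sketch, not taken from backtrace.py.
# Each 32-bit register is listed with the sub-registers it contains.
REG_FAMILIES = {
    "eax": ("eax", "ax", "ah", "al"),
    "ebx": ("ebx", "bx", "bh", "bl"),
    "ecx": ("ecx", "cx", "ch", "cl"),
    "edx": ("edx", "dx", "dh", "dl"),
    "esi": ("esi", "si"),
    "edi": ("edi", "di"),
    "ebp": ("ebp", "bp"),
    "esp": ("esp", "sp"),
}

# Reverse lookup: sub-register name -> containing 32-bit register.
PARENT = {name: parent
          for parent, names in REG_FAMILIES.items()
          for name in names}


def normalize(operand):
    """Return the containing 32-bit register for a register name.

    Anything that is not a known register (a memory operand such as
    byte_1003B03C, an immediate) is returned unchanged.
    """
    op = operand.strip()
    return PARENT.get(op.lower(), op)


def same_location(a, b):
    """True when two operands refer to the same register family or the
    same non-register operand string."""
    return normalize(a) == normalize(b)


print(same_location("al", "EAX"))   # AL lives inside EAX
print(same_location("dl", "eax"))   # different families
print(same_location("ah", "al"))    # both map to EAX
```

Because AH and AL both collapse to EAX here, a check like this cannot tell them apart, which is the same kind of limitation as the AH-into-AL case mentioned above.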

The code also tracks how some general-purpose instructions manipulate different registers. Most of them are simple, because x86 instructions (in the Intel syntax shown here) follow a destination, source operand order. Not all of them do, though. I spent a good amount of time wondering which values to back trace when following instructions such as DIV. Is EAX or the DIV operand more important to back trace? I went with the operand, but in the future I plan on creating a split back trace that tracks both EAX and the operand passed to DIV. Odds are there are still more general-purpose instructions I need to check for. XADD is a pretty cool instruction: the shortest Fibonacci can be written using XADD.
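
To make the destination/source rule and the DIV decision concrete, here is a rough sketch of a backward walk over an already-parsed listing. It is an illustration under stated assumptions, not how backtrace.py is implemented: the names step and backtrace, the (address, mnemonic, operands) tuple format, and the small instruction sets are all made up for this example, and it does not use the IDA API.

```python
# Hypothetical sketch, not taken from backtrace.py.
PARENT = {
    "eax": "eax", "ax": "eax", "ah": "eax", "al": "eax",
    "ebx": "ebx", "bx": "ebx", "bh": "ebx", "bl": "ebx",
    "ecx": "ecx", "cx": "ecx", "ch": "ecx", "cl": "ecx",
    "edx": "edx", "dx": "edx", "dh": "edx", "dl": "edx",
}


def norm(op):
    """Map a sub-register to its 32-bit register; leave other operands alone."""
    return PARENT.get(op.lower(), op)


MOVE_LIKE = {"mov", "movsx", "movzx"}  # destination is overwritten by source
DIVIDES = {"div", "idiv"}              # implicit EAX/EDX, one explicit operand


def step(mnem, ops, tracked):
    """Process one instruction while walking backwards.

    Returns (hit, new_tracked); hit is True when the instruction writes
    to the location currently being tracked.
    """
    tracked = norm(tracked)
    if mnem in DIVIDES:
        # The results land in EAX/EDX. Follow the explicit operand,
        # mirroring the "went with the operand" choice in the post.
        if tracked in ("eax", "edx"):
            return True, norm(ops[0])
        return False, tracked
    if not ops or norm(ops[0]) != tracked:
        return False, tracked
    if mnem in MOVE_LIKE:
        return True, norm(ops[1])  # the value came from the source operand
    # Everything else is treated as an in-place update of the destination
    # (this is wrong for forms such as one-operand IMUL).
    return True, tracked


def backtrace(listing, tracked):
    """listing holds (address, mnemonic, operands) tuples in address order."""
    hits = []
    for addr, mnem, ops in reversed(listing):
        hit, tracked = step(mnem, ops, tracked)
        if hit:
            hits.append(addr)
    return hits


# The last six instructions of the listing above.
listing = [
    (0x10004EDC, "movsx", ["edx", "byte_1003B03C"]),
    (0x10004EE3, "or",    ["edx", "0D2h"]),
    (0x10004EE9, "mov",   ["byte_1003B03C", "dl"]),
    (0x10004EEF, "movsx", ["eax", "byte_1003B03C"]),
    (0x10004EF6, "imul",  ["eax", "0C1h"]),
    (0x10004EFC, "mov",   ["byte_1003B03C", "al"]),
]
hits = backtrace(listing, "byte_1003B03C")
print([hex(a) for a in hits])

# A small made-up DIV case: AL is traced to the IDIV operand (ECX).
div_listing = [
    (1, "mov",  ["ecx", "0C8h"]),
    (2, "idiv", ["ecx"]),
    (3, "mov",  ["byte_1003B03C", "al"]),
]
div_hits = backtrace(div_listing, "byte_1003B03C")
```

On the six-instruction block, this walk visits the addresses in the same order as the first six lines of the output shown earlier. A split back trace would have to return two locations from the DIV branch (EAX and the operand) and follow both.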

This version was written so I could crack an obfuscation technique that I have seen lately. Using backtrace.py and the last line of the dead code blocks, I'm able to identify most of the junk code and variables. I'm sure there are flaws (like not tracing PUSH or POP; that's for a future release), but so far it is working well for me. I hope the code is of use to others. If you have any recommendations, thoughts, etc., please shoot me an email (line 20 of the source code) or ping me on Twitter.
