Hooked on Mnemonics Worked for Me

Upatre: Sample Set Analysis

Hi. I recently wrapped up an analysis of Upatre. My original intention was to write a generic C2 decoder/extractor for the executables. After analyzing ten samples I realized it was not feasible due to the different encoding algorithms and obfuscation, and I became interested in how Upatre obfuscates/encodes its executables. My analysis is based on 94 unique Upatre samples. My sample set might be a little skewed because I grabbed the first 100 files sorted by file size with a type of zip. Having the files in their original zip files allowed me to keep their original file names. I would like to thank VirusTotal for access to the samples and Glenn Edwards for feedback.

Google Docs (HTML)
Git Repo (PDF)

PE Skeletons

I know this topic has been beaten to death but I thought I'd share a technique for detecting single-byte XOR-encoded executables in file streams. Recently while looking at a file stream I instantly knew there was an XOR-encoded executable file in it. This got me thinking: why can I spot this, and can I script it up? Since executable files have a defined structure, they have a standard skeleton to them. It's not always easy to see the skeleton if we just look at the hex bytes.

We can add some color via the following Python code. 

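import matplotlib.pyplot as plt
import numpy as np
import sys

def main():
        # read the first 512 bytes of the file
        data = open(sys.argv[1], 'rb').read()[:512]
        dlist = bytearray(data)
        print(len(dlist))
        # lay the bytes out as 32 rows of 16, like a hex dump
        plotters = np.array(dlist)
        plotters.shape = (32,16)
        plt.axis([0,16,0, 32])
        # the plotting calls were missing here; pcolor is an assumption
        plt.pcolor(plotters)
        plt.show()

if __name__ == '__main__':
        main()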

The code reads the first 512 bytes of a file, puts each byte into a bytearray and then plots the color in the same structure as the hex dump. If we were to pass an executable file to this script we would get the following pretty picture. 
The byte in the bottom left-hand corner is 0x4d 'M', the second is 0x5a 'Z', and so on. If we were to XOR the executable with 0x88 we would get the following image.
XOR 0x88 Key
If we think of the red in the first image and the blue in the second image as negative space, we can see the PE skeleton. Okay, that was cute, now let's see if we can detect this in a file stream. The Portable Executable format has a standard structure: the file starts with 'MZ'; at offset 0x3C, four bytes hold the offset of the PE header (lfanew); and "PE" should be found at that offset. This is a highly dumbed-down version. Check out PE101 by Ange Albertini for an awesome introduction if my definition is unclear. Since these are standard steps, all we have to do is check for the same structure but with XORed data.

import sys
import struct

# read file into a bytearray
byte = bytearray(open(sys.argv[1], 'rb').read())

# for each byte in the file stream, excluding the last 256 bytes
for i in range(0, len(byte) - 256):
        # KEY ^ VALUE ^ KEY = VALUE; Simple way to get the key
        key = byte[i] ^ ord('M')
        # skip non-XOR encoded MZ
        if key == 0:
                continue
        # byte[i] decodes to 'M' by construction; verify the next byte is 'Z'
        if byte[i+1] ^ key != ord('Z'):
                continue
        # read four bytes into temp, offset to PE aka lfanew
        temp = byte[(i + 0x3c) : (i + 0x3c + 4)]
        # decode values with key
        lfanew = bytearray(x ^ key for x in temp)
        # convert from bytearray to int value
        pe_offset  = struct.unpack( '<i', bytes(lfanew))[0]
        # lfanew is relative to the start of the executable at offset i
        pe_addr = i + pe_offset
        # verify results are not negative or read is bigger than file
        if pe_offset < 0 or pe_addr + 1 >= len(byte):
                continue
        # verify the two decoded bytes are 'P' & 'E'
        if byte[pe_addr] ^ key == ord('P') and byte[pe_addr + 1] ^ key == ord('E'):
                print("Encoded PE Found, Key %x, Offset %x" % (key, i))

Speed, false-positive testing, etc. are all probably areas of improvement for the code.

If we were to run this on the executable XORed with 0x88, we would be presented with the following output:

Encoded PE Found, Key 88, Offset 0
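For anyone who wants to try the check without a real sample, here is a minimal sketch that builds a synthetic header and XORs it with 0x88. The helper names (build_minimal_header, xor_buffer, has_pe_skeleton) and the 0x80 PE offset are made up for illustration; the buffer only contains the three fields the check reads and is not a valid executable.

```python
import struct

def build_minimal_header(pe_offset=0x80):
    """Return a synthetic buffer holding only MZ, lfanew and the PE signature."""
    buf = bytearray(pe_offset + 4)
    buf[0:2] = b"MZ"                                # DOS magic
    buf[0x3C:0x40] = struct.pack("<I", pe_offset)   # lfanew, little endian
    buf[pe_offset:pe_offset + 4] = b"PE\x00\x00"    # PE signature
    return buf

def xor_buffer(buf, key):
    """XOR every byte with a single-byte key."""
    return bytearray(b ^ key for b in buf)

def has_pe_skeleton(buf, key=0):
    """Check MZ -> lfanew -> PE on a buffer XORed with a single-byte key."""
    if len(buf) < 0x40:
        return False
    if buf[0] ^ key != ord("M") or buf[1] ^ key != ord("Z"):
        return False
    raw = bytes(bytearray(b ^ key for b in buf[0x3C:0x40]))
    pe_offset = struct.unpack("<I", raw)[0]
    if pe_offset + 1 >= len(buf):
        return False
    return (buf[pe_offset] ^ key == ord("P") and
            buf[pe_offset + 1] ^ key == ord("E"))

plain = build_minimal_header()
encoded = xor_buffer(plain, 0x88)
```

Writing encoded to disk and passing it to the scanner is a quick way to sanity check changes to the script.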

Kind of a cool technique to use the Portable Executable structure to find XOR exes. It only works on executables encoded with a single-byte key. It could be modified for 2- or 4-byte keys. Not sure about anything higher; a brute force approach would probably be better for key sizes larger than 4 bytes. The skeleton is still visible when an executable is XORed with a key five bytes in size. Using gray tones can help show the skeleton because the contrast is dulled.
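A rough sketch of what the 2-byte modification might look like. It assumes the key repeats every two bytes starting at the 'M' of the embedded executable, and it recovers both key bytes from the known 'MZ' plaintext. The function name find_xor2_pe and the sample data are hypothetical, and I have not run this against real samples.

```python
import struct

def find_xor2_pe(data):
    """Return (offset, (k0, k1)) for PE headers XORed with a repeating 2-byte key."""
    data = bytearray(data)
    hits = []
    for i in range(len(data) - 0x40):
        # known plaintext 'MZ' gives both key bytes
        key = (data[i] ^ ord("M"), data[i + 1] ^ ord("Z"))
        if key == (0, 0):
            continue  # plain, non-encoded MZ

        def dec(pos):
            # key byte depends on the distance from the start of the executable
            return data[pos] ^ key[(pos - i) % 2]

        raw = bytes(bytearray(dec(i + 0x3C + n) for n in range(4)))
        pe_offset = struct.unpack("<I", raw)[0]
        if i + pe_offset + 1 >= len(data):
            continue
        if dec(i + pe_offset) == ord("P") and dec(i + pe_offset + 1) == ord("E"):
            hits.append((i, key))
    return hits

# synthetic test data: three junk bytes followed by an encoded minimal header
header = bytearray(0x84)
header[0:2] = b"MZ"
header[0x3C:0x40] = struct.pack("<I", 0x80)
header[0x80:0x84] = b"PE\x00\x00"
sample_key = (0x13, 0x37)
encoded = bytearray(b ^ sample_key[n % 2] for n, b in enumerate(header))
stream = bytearray(b"\x00\x01\x02") + encoded
hits = find_xor2_pe(stream)
```

A single-byte key shows up here as a pair with two identical values, so the same loop covers both cases at the cost of more candidate keys to rule out.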

Useful Links

Side Note:
I have to admit I'm a huge fan of using bytearrays now. I wish I had learned of them sooner. They are very useful for writing decoders. They remove a lot of the foreplay of checking the computed size (value & 0xFF), using ord() and using chr().
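As a small illustration of the point, here is a sketch of an in-place XOR decoder written around a bytearray. The function name xor_decode is made up. With a plain Python 2 string the same loop would need something like ''.join(chr(ord(c) ^ key) for c in data).

```python
def xor_decode(data, key):
    """XOR every byte of data with a single-byte key and return the result."""
    buf = bytearray(data)        # mutable, and each item is already an int
    for i in range(len(buf)):
        buf[i] ^= key            # no ord() or chr() round trip needed
    return bytes(buf)

encoded = xor_decode(b"MZ\x90\x00", 0x88)
decoded = xor_decode(encoded, 0x88)
```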

xxxswf.py updates

A new version of xxxswf.py has been pushed to its repo. The current build handles the extracting and decompressing of LZMA compressed SWFs. In order to decompress ZWS SWFs, pylzma will need to be installed.

__@____:~/projects/swfs/ZWS$ python xxxswf.py -x c026ebfa3a191d4f27ee72f34fa0d97656113be368369f605e7845a30bc19f6a 

[SUMMARY] Potentially 1 SWF(s) in MD5 d41d8cd98f00b204e9800998ecf8427e:c026ebfa3a191d4f27ee72f34fa0d97656113be368369f605e7845a30bc19f6a
 [ADDR] SWF 1 at 0x4008 - ZWS Header
  [FILE] Carved SWF MD5: 14c29705d5239690ce9d359dccd41fa7.swf

At least ninety percent of the code has been rewritten. A lot of bugs were fixed. The current build is 1.9.1. I still want to add more features and test newly added functionality, but due to the increased use of malicious ZWS I decided to push this out. I have tested it against a couple of hundred malicious files and I have found no errors when running xxxswf.py from the command line. All of the traditional features when invoked from the command line have been tested and are stable. New features include being able to create an xxxswf instance and call functions for prepping or converting the file stream and scanning the extracted SWF(s). If we wanted to decompress a Microsoft Zip+XML file and then extract embedded SWF(s), this would be a prepping function. I need to write more functions to figure out a good workflow. Once I get this sorted out I'll push out the 2.0 version. If you find any errors please send me an email, leave a comment or ping me on Twitter.
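To give a feel for the carving step shown in the [ADDR] line of the output, here is a stripped-down sketch of scanning a stream for the three SWF signatures (FWS, CWS, ZWS). This is not how xxxswf.py is implemented. The function name find_swf_headers and the version cap used as a sanity check are assumptions made for the example.

```python
SWF_SIGNATURES = {b"FWS": "uncompressed", b"CWS": "zlib", b"ZWS": "LZMA"}

def find_swf_headers(stream, max_version=40):
    """Return sorted (offset, signature, compression) tuples for candidate SWF headers."""
    stream = bytearray(stream)
    hits = []
    for sig, kind in SWF_SIGNATURES.items():
        pos = stream.find(sig)
        while pos != -1:
            # the byte after the signature is the SWF version;
            # the max_version cap is an arbitrary sanity check
            if pos + 3 < len(stream) and 0 < stream[pos + 3] <= max_version:
                hits.append((pos, sig.decode("ascii"), kind))
            pos = stream.find(sig, pos + 1)
    return sorted(hits)

# synthetic stream with a ZWS header at 0x4008, as in the run shown earlier
stream = b"\x00" * 0x4008 + b"ZWS\x0d" + b"\x00" * 16
headers = find_swf_headers(stream)
```

A signature match alone produces plenty of false positives in real streams, which is why the hits are only candidates until the data after the header actually decompresses.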

injdmp - source code release

I keep seeing people ask about process injection detection on Twitter, Stack Overflow, etc. If anyone is interested, I released the source code to injdmp.

Longer than usual disclaimer:
  • This project was for learning C. The code sucks but the concepts are there, even if they are basic.
  • I'd recommend updating the repo often. Odds are 0xdabbad00 will keep pointing out my mistakes. His current count is 2.
  • Any type of detection from User Space is fundamentally flawed. 
  • Volatility did all this stuff years ago, use it.  
  • Code was tested on Windows XP; very minimal testing was done on Windows 7.
With that being said, it's fun to think of ways to detect process injection. If anyone can recommend any good articles on how anti-cheating engines detect process injection or other detection techniques, please shoot me an email (line 3 in the source code) or ping me on Twitter.

reiat.py - 0.5 version

An updated version of reiat.py has been pushed to its repo. For more information about reiat.py please see the following link. A couple of new features have been added. The first feature is a simple window/viewer. An example can be seen above. The address on the left is where GetProcAddress is originally called. The orange strings are the API names (lpProcName) passed to GetProcAddress. The third column is the address of the last reference to the returned value and the fourth column is the type. The type can be one of four values. If the address is saved at a dword address, as in mov ds:NSSBase64_DecodeBuffer, eax, it will have a type of xref. If the address of an API is called, as in call edi ; MiniDumpWriteDump, it will have a type of call. If the address is saved to an array or some other register + offset, the type will be the register values. An example of this type can be seen below.

"Type" is probably not the best term/description... The last type will be None if the trace of the variables failed. Another feature is that all of the data is stored in a list of tuples called log.
The format is the same as the output window (address, string, address, my_type). A couple of bug fixes were also added. An interesting bug was relying on FUNCATTR_END for testing the boundary of the end of a function. This approach is flawed when dealing with obfuscated code that jumps around. Calling funcAddress = list(FuncItems(address)) and then checking if an address is in the list is a more accurate approach. Code changes.
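The difference between the two checks can be shown without IDA. The sketch below is a toy model: the addresses are invented, and it assumes the obfuscated function has a block that lives outside the start/end range of its main body. The names in_bounds and in_items are hypothetical and are not taken from reiat.py.

```python
def in_bounds(start, end, ea):
    """Old approach: treat everything between start and end as the function."""
    return start <= ea < end

def in_items(items, ea):
    """New approach: membership in the set of addresses the function owns."""
    return ea in items

# invented layout: main body at 0x1000-0x1010 plus a block jumped to at 0x2000
func_start, func_end = 0x1000, 0x1010
func_items = set([0x1000, 0x1003, 0x1008, 0x100E, 0x2000, 0x2005])

far_block = 0x2000     # belongs to the function but sits past its end
mid_byte = 0x1005      # inside the range but not the start of an item
```

In this model the bounds check misses far_block and accepts mid_byte, while the membership check gets both right.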

I'd still like to add a couple more features. Ashutosh Mehra mentioned some issues around the use of EncodePointer. Simple scenarios are not that hard, but anytime more functions or API calls are added to the flow tracing logic things get complicated quickly. Also, it would be cool to solve this problem. I tried some approaches of adding sections and patching the IDB but I was unsuccessful. If you have any ideas or comments, or find bugs, please send me an email or ping me on Twitter.

Most of the window/viewer code came from the post Extending IDA with Custom Viewers. There are a lot of great posts on the MindShaRE blog. I'd highly recommend reading through them if you haven't already.

backtrace.py version 0.3

backtrace.py version 0.3 has been pushed out to its repo. A couple of notable features have been added. The previous version only tracked the use of the MOV instruction. This is kind of useful... I guess... well, at least it was fun to code. The current version tracks whenever a register (ECX) or its sub-register (CX) is manipulated. The old version relied on string comparisons. For example, if we back trace from the highlighted code up we would see al is referenced, then EAX, then byte_1003B03C, then dl, etc.

.text:10004E99                 mov     byte_1003B03C, al
.text:10004E9E                 movsx   ecx, byte_1003B03C
.text:10004EA5                 imul    ecx, 0A2h
.text:10004EAB                 mov     byte_1003B03C, cl
.text:10004EB1                 movsx   edx, byte_1003B03C
.text:10004EB8                 xor     edx, 0A4h
.text:10004EBE                 mov     byte_1003B03C, dl
.text:10004EC4                 movsx   eax, byte_1003B03C
.text:10004ECB                 cdq
.text:10004ECC                 mov     ecx, 0C8h
.text:10004ED1                 idiv    ecx
.text:10004ED3                 mov     byte_1003B03C, al
.text:10004ED8                 xor     eax, eax
.text:10004EDA                 jmp     short loc_10004F01
.text:10004EDC ; ---------------------------------------------------------------------------
.text:10004EDC                 movsx   edx, byte_1003B03C
.text:10004EE3                 or      edx, 0D2h
.text:10004EE9                 mov     byte_1003B03C, dl
.text:10004EEF                 movsx   eax, byte_1003B03C
.text:10004EF6                 imul    eax, 0C1h
.text:10004EFC                 mov     byte_1003B03C, al

The old version did not know that AL is the low byte of EAX due to the use of string comparison. The new version does a simple check of the register name and its purpose. Note: there will be some issues if AH is moved into AL or other similar operations. I didn't code that logic in. If we were to back trace the code above we would have the following output.

 0x10004efc mov     byte_1003B03C, al
 0x10004ef6 imul    eax, 0C1h
 0x10004eef movsx   eax, byte_1003B03C
 0x10004ee9 mov     byte_1003B03C, dl
 0x10004ee3 or      edx, 0D2h
 0x10004edc movsx   edx, byte_1003B03C
 0x10004ed3 mov     byte_1003B03C, al
 0x10004ec4 movsx   eax, byte_1003B03C
 0x10004ebe mov     byte_1003B03C, dl
 0x10004eb8 xor     edx, 0A4h
 0x10004eb1 movsx   edx, byte_1003B03C
 0x10004eab mov     byte_1003B03C, cl
 0x10004ea5 imul    ecx, 0A2h
 0x10004e9e movsx   ecx, byte_1003B03C
 0x10004e99 mov     byte_1003B03C, al
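The register check described above could be sketched roughly as follows. This is not the code from backtrace.py. The table and the function name same_register are assumptions for illustration, and like the script it treats AH and AL as the same register, so the AH-into-AL caveat still applies.

```python
# each 32-bit register and the sub-registers that alias part of it
REGISTER_FAMILIES = {
    "eax": ("eax", "ax", "ah", "al"),
    "ebx": ("ebx", "bx", "bh", "bl"),
    "ecx": ("ecx", "cx", "ch", "cl"),
    "edx": ("edx", "dx", "dh", "dl"),
    "esi": ("esi", "si"),
    "edi": ("edi", "di"),
    "ebp": ("ebp", "bp"),
    "esp": ("esp", "sp"),
}

# reverse lookup: sub-register name -> 32-bit parent
PARENT = dict((sub, parent)
              for parent, subs in REGISTER_FAMILIES.items()
              for sub in subs)

def same_register(a, b):
    """True if both operand strings name the same register or a part of it."""
    parent_a = PARENT.get(a.lower())
    parent_b = PARENT.get(b.lower())
    return parent_a is not None and parent_a == parent_b
```

With this, a trace that is following al would keep going when it reaches imul eax, 0C1h, which a plain string comparison of "al" and "eax" would miss.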

The code also tracks how some general purpose instructions manipulate different registers. Most of them are simple due to the x86 convention of an instruction followed by destination, source. Not all of them are though. I spent a good amount of time wondering what variables to back trace when following instructions such as DIV. Is EAX or the DIV operand more important to back trace? I went with the operand, but in the future I plan on creating a split back trace that will track both EAX and the operand passed to DIV. Odds are there are still more general purpose instructions I need to check for. XADD is a pretty cool instruction. The shortest Fibonacci can be written using XADD.
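XADD is a good example of an instruction that does not fit the simple destination, source model, because it writes to both operands: the destination receives the sum and the source receives the old destination. Here is a small simulation of that behavior and of the Fibonacci trick. The function names xadd and fib_xadd are made up for the example.

```python
def xadd(dest, src, bits=32):
    """Simulate XADD dest, src: returns (new_dest, new_src)."""
    mask = (1 << bits) - 1
    return (dest + src) & mask, dest   # dest = dest + src, src = old dest

def fib_xadd(count):
    """Generate Fibonacci numbers by repeating 'xadd eax, edx'."""
    eax, edx = 0, 1
    out = []
    for _ in range(count):
        eax, edx = xadd(eax, edx)
        out.append(eax)
    return out

sequence = fib_xadd(8)   # [1, 1, 2, 3, 5, 8, 13, 21]
```

For a back tracer this means an XADD has to be treated as a write to both registers, not just the first operand.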

This version was written in order for me to crack an obfuscation technique that I have seen lately. Using backtrace.py and the last line of the dead code blocks I'm able to identify most of the junk code and variables. I'm sure there are flaws (like not tracing pushes or pops... future release) but so far it is working well for me. I hope the code is of use to others. If you have any recommendations, thoughts, etc. please shoot me an email (line 20 of the source code) or ping me on Twitter.

Identifying Unreachable Basic Blocks

The below IDAPython code can be used to identify unreachable code in a function.

import idaapi
from idc import SetColor, NextAddr, CIC_ITEM

def unreachable(address, colorme = True, verbose = True):
    'identifies basic blocks that do not have an entry point and are not the entry point of a function'
    child = set([])
    curr = 0
    orph = []
    f = idaapi.FlowChart(idaapi.get_func(address))
    for block in f:
        # save every exit point (successor block id) into the set
        for succ_block in block.succs():
            child.add(succ_block.id)
        # loop body was missing in the source; predecessors are not needed here
        for pred_block in block.preds():
            pass
    for block in f:
        # block 0 is the function entry point and never has an entry
        if block.id not in child and  block.id != 0:
            if verbose:
                print("Unreachable Code - %x - %x" % (block.startEA, block.endEA))
            if colorme:
                curr = block.startEA
                while curr != block.endEA:
                    SetColor(curr, CIC_ITEM, 0x80ffff)
                    curr = NextAddr(curr)
            orph.append((block.startEA, block.endEA))
    return orph

Some obfuscation techniques use unreachable code and dead code variables to clutter the output of a disassembler.  The goal of this is to either hinder or slow down the analysis of the code. In the grey code below we can see a basic block of unreachable code.

The basic block has no entry point to it. Typically the only time we will see a basic block that doesn't have an entry will be at the entry point of the function. Using IDA's FlowChart function we can get all the block ids and their exit point(s). If all the exit points are saved into a set, we will also have a set of corresponding entry points for other basic blocks. The last block, which is usually the function epilogue, will not have an exit point (from a function chunk association [1] view). Using the output from Elias Bachaalany's ex_gdl_qflow_chart.py (which my code is just a hack of) we can see the block ids and their corresponding exit points.

1002a5d0 - 1002a626 [0]:
  1002a626 - 1002a62b [1]:
  1002a63f - 1002a671 [3]:
1002a626 - 1002a62b [1]:
  1002a684 - 1002a68a [5]:
1002a62b - 1002a63f [2]:
  1002a63f - 1002a671 [3]:
1002a63f - 1002a671 [3]:
  1002a684 - 1002a68a [5]:
1002a671 - 1002a684 [4]:
  1002a684 - 1002a68a [5]:
1002a684 - 1002a68a [5]:  

Notice the last block in the output does not contain an exit point. Since we have all the entry points, we now just need to iterate through all the block ids and see if each id is in the exit point set. All the block ids (except for the function entry point) should be an exit point from another block id. Or simply, all exit points are another block's entry point. If the block id is not the function entry point and is not an entry point from another basic block, we know that block is unreachable code. Let's see an example run.
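The same set logic can be walked through by hand outside of IDA, using the block ids and exit points from the listing above. The dictionary is transcribed from that output; the function name unreachable_ids is made up for the example.

```python
# block id -> ids of its exit points, transcribed from the listing
succs = {
    0: [1, 3],
    1: [5],
    2: [3],
    3: [5],
    4: [5],
    5: [],
}

def unreachable_ids(succs, entry=0):
    """Return ids of blocks that are not the entry and are nobody's exit point."""
    targets = set()
    for exits in succs.values():
        targets.update(exits)
    return sorted(b for b in succs if b != entry and b not in targets)

orphans = unreachable_ids(succs)   # [2, 4]
```

Blocks 2 and 4 never appear as an exit point of another block, so those are the two blocks that would be flagged for this function.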

The function needs only one argument, which is the address. The second argument can be True or False for colors, and the third True or False for verbose. unreachable(here(), False, False) would only return a list of tuples (start address, end address) for the unreachable basic blocks. The code can be easily modified to loop through every function to print out all unreachable code in an executable. Or it could be used to highlight functions via a hot-key similar to deroko's color eip redirection script. Just a quick post. Cheers.

[1] "The IDA SDK's notion of basic block membership within a function is based upon function chunk association and not control flow." - via Rolf on RE Reddit - source