Hooked on Mnemonics Worked for Me

Upatre, rtrace and XP EOL

A couple of days while reading my RSS feeds I noticed an article entitled Daily analysis note: "Upatre" is back to SSL? on the Malware Must Die! blog. During their analysis they mentioned that they didn't solve the obfuscation. I had recently wrapped up an analysis of different obfuscation techniques that Upatre uses. So I thought I'd take a crack at it. After glancing at the sample for a couple of seconds I thought to myself it was a variant of the obfuscated dword XOR sample set. I did a quick educated guess hit F9 in ollydbg and nothing happened. Which is strange because typically Upatre will write the sample to the %TEMP% directory and then try to download it's payload. Instead of the expected activity, I was looking at ExitProcess. At this point my interest was peeked and I wanted to find the anti-debugging...but I didn't care enough to read the assembly code. The code looked somewhat obfuscated.

Rather than reversing the obfuscation I decided to take a different route. Ollydbg has a useful feature called Run Trace or rtrace for short. It's great for tracking down anti-debugging or logging changes to registers. It's not a feature I would recommend for tracing over large amounts of code. Creating a PinTool would be a better choice for this type of task but rtrace is good for small jobs. Here are three excellent reads on using Pintool for similar tasks 1, 2 and 3.

rtrace log
Rtrace logs the address, thread, instructions, and the register changes. This is very useful for if you want to review how specific registers are changed.  The rtrace output can be saved to a file in Ollydbg by selecting View, Run Trace, right-click, and Log to file. Since rtrace logs can be quite large it's best to only trace code that is worth logging. Breakpoints can be used for creating the boundaries of the trace data. Rather than working with rtrace log as a text file I decided to create an ordered dictionary and save it to a JSON using Python. Once the rtrace log is converted to a JSON I can populate the IDB with interesting values. Some values that I thought were interesting were the count of how many times an address got called, the amount of unique register values for an instruction. For example if the following instruction sub     ecx, 0D733EFF3h is called X number of times; how many different values of ECX were there? If the set is 1 but the instruction was called 6,000 different times we know it's a static value and not give it another thought. Here is the Python code that I hacked together. 

import sys
import json
import collections

def run():
    'this function creates an ordered dict saved to a json from an rtrace log'
    rtrace = collections.OrderedDict()
    # rtrace log passed as argument
    with open(sys.argv[1]) as f:
        for line in f:
            addr, reg_values = parse_line(line.rstrip())
            if addr:
                    addr = int(addr,16)
                if rtrace.has_key(addr):
                    if len(reg_values) == 0:
                    for val in reg_values:
                    rtrace[addr] = []
                    if len(reg_values) == 0:
                    for val in reg_values:
def save_off(rtrace):
    'save the data to a file named rtrace.json'
    with open("rtrace.json", 'w') as f:
        json.dump(rtrace, f)

def parse_line(line):
    'parses the rtrace log' 
    temp = line.split('\t')
        return (temp[0], list(temp[3:]))
        return None, None                 


def show_data():
    'prints all the modified registers for an address in a rtrace log'
        print rtrace_data[str(here())]
        print "Error: never called"

def show_next():
    'prints next called address. Kind of buggy on ret'
        temp = rtrace_data.items()[rtrace_data.keys().index(str(here())) + 1]
        print hex(int(temp[0]))
        print "Error: Could not be found"
def populate():
    'populate the IDB with counts and set values'
    import idautils
    import idaapi
    global rtrace_data
    rtrace_data = get_json()
    idaapi.add_hotkey("Shift-A", show_data)
    idaapi.add_hotkey("Shift-S", show_next)
    for func in idautils.Functions():
        flags = GetFunctionFlags(func)
        if flags & FUNC_LIB or flags & FUNC_THUNK:
        dism_addr = list(FuncItems(func))
        for line in dism_addr:
            com = format_data(rtrace_data, line)
            if com:
                MakeComm(line, com)

def format_data(rtrace_data, line):
        instr_data = rtrace_data[str(line)]
        return None
    count = len(instr_data)
    # Empty lists are not hashable for sets. 
        data_count = len(set(instr_data))
        data_count = None
    if data_count:
        comment = "C: 0x%x, S: 0x%x" % (count, data_count)
        comment = "C: 0x%x" % (count)
    return comment
def get_json():
        f = open("rtrace.json")
        js = json.load(f, object_pairs_hook=collections.OrderedDict)
    except e:
        print "ERROR: %s" % e
    return js

if __name__ == '__main__':
# the sys.arv[1] exception is a hack to guess how the script is being
# called. If there is an exception it is being run in IDA.

To execute the code above pass the output log file to the script, copy the rtrace.json to the working directory of the script and the IDB and then execute the script in IDA. This will give an output as seen below. The C is for Count and S is for Set. Glancing over other functions it is easy to see which instructions are responsible for looping and decoding data.
Since the executable only had 17 functions it was easy to identify instructions that were not called and then investigate why they weren't. Notice the fourth and fifth block was not called. This is due to the returned values of what is called at EBX.

Since rtrace logs all modified register values it is easy to access those values. Included in the script is the ability to print all those values. This can be done by selecting the address and pressing Shift-A. To print the next address called, select the address and the press Shift-S. The arrow is the address of the selected line.

Once I followed the address in the output window it lead me to what called ExitProcess. Now all I needed to do was investigate the calls of EBX in the first block of the previous to see what is being checked and patch the ZF flag results to continue execution. What makes this sample interesting (and actually worth reading the code) is Upatre is using an undocumented technique to determine if it is running on a Windows NT 6.0 or higher. I'm unaware if this techniques works on Windows Vista. I have only tested on Windows XP SP3 (NT 5.1) and Windows 7 ( NT 6.1).

The malware calls RtlAcquirePebLock and NtCurrentTeb twice. On Windows XP when RtlAcquirePebLock is called the first time ECX, EDX and EAX is over written. ECX will be the return address of RtlAcquirePebLock, EDX will be the address of PEB FastPebLoclk which is a pointer to _RTL_CRITICAL_SECTION and EAX will be zero. On the second call only EAX will be over written. On Windows 7 when RtlAcquirePebLock is called EAX will become zero and ECX will be equal to the Thread Information Block (TEB) ClientId. On the second call to RtlAcquirePebLock EAX will be zero, EDX will be the TEB CliendId but ECX will be equal the the TEB. On the second call to RtlAcquirePebLock  if  ECX is equal to TEB or the return of NtCurrentTeb the sample is running on NT 6.0 or higher.

Below is the code rewritten in C++

#include <windows.h>
#include <stdio.h>
#include <iostream>

int getECX() {
 int value = 0;
 //Moves edx into 'value'
 _asm mov value, ecx;
 return value;

int getEAX() {
 int value = 0;
 //Moves edx into 'value'
 _asm mov value, eax;
 return value;

int getEDX() {
 int value = 0;
 //Moves edx into 'value'
 _asm mov value, edx;
 return value;
int main()
   HMODULE hModule = GetModuleHandleA(("ntdll.dll"));
   FARPROC pRtlAcquirePebLock = GetProcAddress(hModule, "RtlAcquirePebLock");

   HMODULE h2Module = GetModuleHandleA(("ntdll.dll"));
   FARPROC ntcur = GetProcAddress(h2Module, "NtCurrentTeb");

   // ecx, eax & edx are modifed when RtlAcquirePebLockis called
   int ecxValue = getECX();
   int eaxValue = getEAX();
   int edxValue = getEDX();

   // call current teb
   eaxValue = getEAX();

   if (eaxValue == ecxValue)
    std::cout << "EAX & ECX Match - second stage" << std::endl;
   if (eaxValue == edxValue)
    std::cout << "EAX & EDX Match - second stage" << std::endl;

   // lock again 
   ecxValue = getECX();
   eaxValue = getEAX();
   edxValue = getEDX();

   // current teb 
   eaxValue = getEAX();

   if (eaxValue == ecxValue)
   std::cout << "EAX & ECX Match - second stage aka > XP" << std::endl;

The changes in the return values are caused by the difference in how RtlAcquirePebLock calls RtlEnterCriticalSection. In Windows XP RtlEnterCriticalSection is called by being passed a pointer  from PEB FastPebLockRoutine. Since the PEB is writable from user mode the FastPebLockRoutine, it can be over written to cause a heap overflow. See Refs 1 and 2. Below we can see the difference between XP and Win7 for RtlAcquirePebLock.
_RtlAcquirePebLock XP
_RtlAcquirePebLock Win7
Pretty interesting technique for avoiding Windows XP. Thanks to Hexacorn for help and feedback. If  I wanted to continue executing the malware I would either need to patch the return of the second call to RtlAcquirePebLock or patch the comparison. Please feel free to send me any feedback, criticism or notes via email (in the python code), hit me up on Twitter,  or leave a comment. Cheers.

Hash - 891F33FDD94481E84278736CEB891D1036564C03

[1] http://net-ninja.net/article/2011/Sep/03/heap-overflows-for-humans-102/
[2] http://www.exploit-db.com/papers/13178/

Upatre: Sample Set Analysis

Hi. I recently wrapped up an analysis of Upatre. My original intention was to write a generic C2 decoder/extractor for the executables. After analyzing ten samples I realized it was not feasible due to the different encoding algorithms and obfuscation. After analyzing the ten samples I became interested in how Upatre obfuscates/encodes it's executables. My analysis is based off of 94 unique Upatre samples. My sample set might be little skewed because I grabbed the first 100 files sorted by file size with a type zip. Having the files in their original zip file allowed me to have their original file names. I would like to thank VirusTotal for access to the samples and Glenn Edwards for feedback.

Google Docs (HTML)
Git Repo (PDF)

PE Skeletons

I know this topic has been beaten to death but I thought I'd share a technique for detecting single byte XOR executables in file streams. Recently while looking at a file stream I instantly knew there was an encoded XOR executable file in it. This got me thinking, why can I spot this and can I script it up? Since executable files are a defined structure they have a standard skeleton to them. It's not always easy to see the skeleton if we just look at the hex bytes.

We can add some color via the following Python code. 

import matplotlib.pyplot as plt
import numpy as np
import sys

def main():
        data = open(sys.argv[1], 'rb').read()[:512]
        dlist = bytearray(data)
        print len(dlist)
        plotters = np.array(dlist)
        plotters.shape = (32,16)
        plt.axis([0,16,0, 32])

if __name__ == '__main__':

The code reads the first 512 bytes of a file, puts each byte into a bytearray and then plots the color in the same structure as the hex dump. If we were to pass an executable file to this script we would get the following pretty picture. 
The bottom left hand corner byte is 0x4d 'M' the second is 0x5a 'Z' and so on. If we were to XOR the executable with  0x88 we would get the following image.
XOR 0x88 Key
If we were to think about the red in the first image and blue in the second image as negative space we would see the PE skeleton. Okay, that was cute, now let's see if we can detect this in a file stream. Since the Portable Executable have a standard structure. The beginning starts with 'MZ', jump 0x3C bytes, read four bytes to get the address of the PE, then check if "PE" is at the read offset. This is a highly dumb down version. Check out PE101 by Ange Albertini for an awesome introduction if my definition is unclear. Since these are standard steps all we have to do is check for the same structure but with XORed data.

import sys
import struct

# read file into a bytearray
byte = bytearray(open(sys.argv[1], 'rb').read())

# for each byte in the file stream, excluding the last 256 bytes
for i in range(0, len(byte) - 256):
        # KEY ^ VALUE ^ KEY = VALUE; Simple way to get the key 
        key = byte[i] ^ ord('M')
        # verify the two bytes contain 'M' & 'Z'
        if chr(byte[i] ^ key) == 'M' and  chr(byte[i+1] ^ key) == 'Z':
                # skip non-XOR encoded MZ
                if key == 0:
                # read four bytes into temp, offset to PE aka lfanew
                temp = byte[(i + 0x3c) : (i + 0x3c + 4)]
                # decode values with key 
                lfanew = []
                for x in temp:
                        lfanew.append( x ^ key)
                # convert from bytearray to int value, probably a better way to do this
                pe_offset  = struct.unpack( '<i', str(bytearray(lfanew)))[0]
                # verify results are not negative or read is bigger than file 
                if pe_offset < 0 or pe_offset > len(byte):
                # verify the two decoded bytes are 'P' & 'E'
                if byte[pe_offset] ^ key == ord('P') and byte[pe_offset + 1] ^ key == ord('E'):
                        print "Encoded PE Found, Key %x, Offset %x" % (key, i)
Speed, false postives testing, etc are all probably areas of improvement for the code.

If we were to run this on the executable XORed with 0x88 we would be present with the following output Encoded PE Found, Key 88, Offset 0

Kind of a cool technique to use the Portable Executable structure to find XOR exes. It only works on single byte executables. Could be modified for 2 or 4 bytes. Not sure about anything higher. A brute force approach would probably be better for key byte size of anything higher than 4. The skeleton is prevalent when an executable is XORed with a key of five bytes in size. Using gray tones can help show the skeleton because the contrast is dulled.

Useful Links

Side Note:
I have to admit I'm a huge fan of using ByteArrays now. I wish I could have of learned of them sooner. They are very useful for writing decoders. It remove a lot of the four play of checking the computed size ( value & 0xFF), using ord() and using chr().

xxxswf.py updates

A new version of xxxswf.py has been pushed to it's repo. The current build handles the extracting and decompressing of LZMA compressed SWFs. In order to decompress ZWS SWFs pylzma will need to be installed.

__@____:~/projects/swfs/ZWS$ python xxxswf.py -x c026ebfa3a191d4f27ee72f34fa0d97656113be368369f605e7845a30bc19f6a 

[SUMMARY] Potentially 1 SWF(s) in MD5 d41d8cd98f00b204e9800998ecf8427e:c026ebfa3a191d4f27ee72f34fa0d97656113be368369f605e7845a30bc19f6a
 [ADDR] SWF 1 at 0x4008 - ZWS Header
  [FILE] Carved SWF MD5: 14c29705d5239690ce9d359dccd41fa7.swf

At least ninety percent of the code has been rewritten. A lot of bugs were fixed. The current build is 1.9.1. I'm still wanting to add more features and test newly added functionality but due to the increase use of malicious ZWS I decided to push this out. I have tested it against a couple of hundred malicious files and I have found no errors when running xxxswf.py from the command line. All of the traditional features when invoked from the command line have been tested and our stable. New features include being able to create an xxxswf instance and calling functions for prepping or converting the file stream and scanning the extracted SWF(s). If we wanted to decompress a Microsoft Zip+XML file and then extract embedded SWF(s), this would be a prepping function. I need to write more functions to figure out a good work flow. Once I get this sorted out I'll push out the 2.0 version. If you find any errors please send me an email, leave a comment or ping me on twitter.

injdmp - source code release

 I keep seeing people ask about process injection detection on Twitter, Stackflow, etc. If anyone is interested I released the source code to injdmp.

Longer than usual disclaimer:
  • This project was for learning C. The code sucks but the concepts are there, even if they are basic.
  • I'd recommend updating the repo often. Odds are 0xdabbad00 will keep pointing out my mistakes. His current count is 2.
  • Any type of detection from User Space is fundamentally flawed. 
  • Volatility did all this stuff years ago, use it.  
  • Code was tested on Windows XP, very minimal was done on Windows 7.
 With that being said it's fun to think of ways to detect process injection. If anyone can recommend any good articles on how anti-cheating engines detect process injection or other detection techniques please shoot me an email ( line 3 in the source code) or ping me on Twitter.

reiat.py - 0.5 version

An updated version of reiat.py has been pushed to it's repo. For more information about reiat.py please see the following link. A couple of new features have been added. The first feature is a simple window/viewer. An example can be seen above. The address on the left is where GetProcAddress is originally called. The orange strings are the API names( lpProcName) passed to GetProcAddress. The third column is the addresses of the last reference to the returned values and the fourth row is the type. The type can be one of four values. If the address is saved at a dword address mov  ds:NSSBase64_DecodeBuffer, eax it will have an xref type. If the address of an API is called  call    edi   ;  MiniDumpWriteDump it will have a type of call. If the address is saved to an array or some other register + offset the type will be the register values. An example can be seen below of this type.

"Type" is probably not the best term/description... The last type will be None if the trace of the variables failed. Another feature is all of the data is stored in a list of tuples called log.
The format is the same as the output window (address, string, address, my_type). A couple of bug fixes were also added. An interesting bug was to relying on FUNCATTR_END for testing boundaries of the end of a function. This approach is flawed when dealing with obfuscated code that jumps around. Calling funcAddress = list(FuncItems(address)) and then checking if an address is in the list is a more accurate approach. Code changes.

I'd still like to add a couple of more features. Ashutosh Mehra mentioned some issues around the use of EncodePointer. Simple scenarios are not that hard but anytime more functions or APIs calls are added to flow tracing logic things get complicated quickly. Also, it would be cool to solve this problem. I tried some approaches of adding sections and patching the IDB but I was unsuccessful. If you have an ideas, comments or find bugs please send me an email or ping me on Twitter.

Most of window/viewer code came from the post Extending IDA with Custom Viewers. There are a lot of great post on the MindShaRE blog. I'd highly recommend reading through them if you haven't already.

backtrace.py version 0.3

backtrace.py version 0.3 has been pushed out to it's repo. A couple of notable features have been added. The previous version only tracked the use of the MOV instruction. This is kind of useful..I guess..well at least it was fun to code.  The current version tracks whenever a register(ECX) or it's sub-register (CX) are manipulated. The old version relied on string comparisons. For example if we back trace from the highlighted code up we would see al is referenced then EAX, then byte_1003B03C, then dl, etc..

.text:10004E99                 mov     byte_1003B03C, al
.text:10004E9E                 movsx   ecx, byte_1003B03C
.text:10004EA5                 imul    ecx, 0A2h
.text:10004EAB                 mov     byte_1003B03C, cl
.text:10004EB1                 movsx   edx, byte_1003B03C
.text:10004EB8                 xor     edx, 0A4h
.text:10004EBE                 mov     byte_1003B03C, dl
.text:10004EC4                 movsx   eax, byte_1003B03C
.text:10004ECB                 cdq
.text:10004ECC                 mov     ecx, 0C8h
.text:10004ED1                 idiv    ecx
.text:10004ED3                 mov     byte_1003B03C, al
.text:10004ED8                 xor     eax, eax
.text:10004EDA                 jmp     short loc_10004F01
.text:10004EDC ; ---------------------------------------------------------------------------
.text:10004EDC                 movsx   edx, byte_1003B03C
.text:10004EE3                 or      edx, 0D2h
.text:10004EE9                 mov     byte_1003B03C, dl
.text:10004EEF                 movsx   eax, byte_1003B03C
.text:10004EF6                 imul    eax, 0C1h
.text:10004EFC                 mov     byte_1003B03C, al

The old version did not know that AL is the lower address of EAX due to the use of string comparison. The new version does a simple check of the register name and it's purpose. Note: there will be some issues if AH is moved into AL or other similar operations. I didn't code that logic in. If we were to back trace the code above we would have the following output.

 0x10004efc mov     byte_1003B03C, al
 0x10004ef6 imul    eax, 0C1h
 0x10004eef movsx   eax, byte_1003B03C
 0x10004ee9 mov     byte_1003B03C, dl
 0x10004ee3 or      edx, 0D2h
 0x10004edc movsx   edx, byte_1003B03C
 0x10004ed3 mov     byte_1003B03C, al
 0x10004ec4 movsx   eax, byte_1003B03C
 0x10004ebe mov     byte_1003B03C, dl
 0x10004eb8 xor     edx, 0A4h
 0x10004eb1 movsx   edx, byte_1003B03C
 0x10004eab mov     byte_1003B03C, cl
 0x10004ea5 imul    ecx, 0A2h
 0x10004e9e movsx   ecx, byte_1003B03C
 0x10004e99 mov     byte_1003B03C, al

The code also tracks how some general purpose instructions manipulate different registers. Most of them are simple due to the x86 standard of instruction destination source format. Not all of them are though. I spent a good amount of time wondering what variables to back trace when following instructions such as DIV. Is EAX or the DIV operand more important back trace? I went with the operand but in the future I plan on creating back split trace that will track EAX and the operand passed to DIV.  Odds are there are still more general purpose instructions I need to check for. XADD is a pretty cool instruction. The shortest Fibonacci can be written using XADD. 

This version was written in order for me to crack an obfuscation technique that I have seen lately. Using backtrace.py and the last line of the dead code blocks I'm able to identify most of the junk code and variables. I'm sure there are flaws (like not tracing push or pops...future release) but so far it is working well for me. I hope the code is of use to others. If you have any recommendations, thoughts, etc please shoot me an email (line 20 of the source code) or ping me on twitter.