Analyzing an Office Maldoc with a VBA Emulator

Today we were informed of another maldoc sample. After a quick look, we were convinced that this sample would be a good candidate for Philippe Lagadec’s VBA emulator ViperMonkey.

The maldoc in a nutshell: when the spreadsheet is opened, the VBA code builds a long JScript script and then executes it. This script contains base64 code for an executable (ransomware Petya GoldenEye version), which is written to disk and executed. The building of the script is done with heavily obfuscated VBA code, so we thought it would be a good idea to try ViperMonkey. ViperMonkey is a free, open-source VBA emulator engine written in Python. You can use it to emulate VBA code on different platforms without MS Office.

Taking a look with oledump.py at this sample (md5 b231884cf0e4f33d84912e7a452d3a10), we see it contains a large VBA macro stream:

20161207-140153

 

Here is the end of the VBA code:

20161207-140222

Let’s analyze this with ViperMonkey:

vmonkey.py sample.vir

Since there are a lot of VBA statements, it will take ViperMonkey some time (a couple of minutes) to parse this:

20161207-134559

In the end we get this result:

20161207-135220

ViperMonkey doesn’t identify any suspicious actions, but we see that the ActiveX object to be created is “MSScriptControl.ScriptControl”. This string was obfuscated with Chr concatenations, and ViperMonkey was able to parse it. To parse all obfuscated expressions like this, we provide option -e to ViperMonkey:

vmonkey.py -e sample.vir

20161207-140124
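To illustrate what resolving a Chr concatenation involves, here is a minimal Python sketch. It is not ViperMonkey's implementation. The function name `decode_chr_concat` and the example expression are made up for illustration, and the sketch only handles numeric `Chr(...)` calls, ignoring any literal strings in the expression.

```python
import re

def decode_chr_concat(expression):
    """Join the characters of every numeric Chr(...) call found in a VBA expression."""
    # Matches Chr(77), ChrW(77), Chr$(77), ... and captures the decimal code.
    codes = re.findall(r"Chr[W$]*\(\s*(\d+)\s*\)", expression, flags=re.IGNORECASE)
    return "".join(chr(int(code)) for code in codes)

# Hypothetical obfuscated expression, not taken from the sample.
example = "Chr(77) & Chr(83) & Chr(83) & Chr(99)"
print(decode_chr_concat(example))  # MSSc
```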

 

With this information, we can understand what subroutine Workbook_Open does: it executes a JScript script stored in variable LQ3.

How do we get the value of LQ3? We can set ViperMonkey’s log level to debug, and log the emulation of all statements. This will produce a lot of output, so it’s better to redirect this to a file.

vmonkey.py -l debug sample.vir > output.log 2> debug.log

Searching for the last occurrence of string “setting LQ3” in debug.log, we find the JScript script:

20161207-141806
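Because the debug log is large, the search for the last occurrence can be scripted. The sketch below assumes only that the relevant log lines contain the text “setting LQ3”. The helper name `last_line_containing` is hypothetical and the rest of the log format is not assumed.

```python
def last_line_containing(lines, needle):
    """Return the last line that contains needle, or None if there is none."""
    match = None
    for line in lines:
        if needle in line:
            match = line.rstrip("\n")
    return match

# Possible usage against the log produced above:
# with open("debug.log", errors="replace") as handle:
#     print(last_line_containing(handle, "setting LQ3"))
```

Iterating over the file handle keeps memory use low, since only the most recent match is kept.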

This script decodes a BASE64 payload, writes it to disk and then executes it: it’s a new variant of Petya ransomware, GoldenEye.

 

PDF URIs

I was handed an interesting PDF document. It doesn’t contain malicious code, yet it generates network traffic. Let me explain how this is achieved.

Creating a PDF that makes an HTTP(S) connection to a website is easy. There’s no need to use an exploit, not even JavaScript. You just have to use a URI object:

20161128-103231

On its own, this object will do nothing. An action is needed to have this URI requested. If you want this URI to be requested when the PDF document is opened, you could add an /OpenAction:

20161128-103504

Adobe Reader will not let this connection happen silently. The user will be prompted before the TCP connection (to subdomain.nviso.be in our example) is established:

screen-shot-2016-11-28-at-10-44-35

But even before the user clicks one of the buttons, Adobe Reader will do a DNS request for this domain (nviso.be):

screen-shot-2016-11-28-at-10-43-43

If the domain does not resolve to an IP address, Adobe Reader will do another DNS request for the subdomain (subdomain.nxdomains.be in this example, where nxdomains.be does not resolve to an IP address):

screen-shot-2016-11-28-at-10-46-50

In this case, the warning presented to the user is slightly different:

screen-shot-2016-11-28-at-10-47-05

This type of PDF document can be used to track users: when the document is opened, a DNS request is performed. If the request is for an FQDN unique to the PDF document, then such a DNS request logged by the DNS server is a sure indicator that the PDF document has been opened. Note that this DNS request will have a source IP address from a DNS server, not from the user’s machine.

If the user allows a connection to be made, then a TCP connection will be established between the user’s machine and the web server.

In a corporate environment with HTTP(S) proxies, the DNS requests can be prevented from going to the Internet.
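When triaging such a document, it can help to list the URIs it contains. The following is a rough sketch, assuming the URI is stored as a literal string in an uncompressed object. It will miss URIs inside compressed object streams or written as hex strings. The name `find_uris` and the sample object are illustrative, and the https scheme in the sample is an assumption.

```python
import re

# A /URI key followed by a literal string in parentheses.
URI_PATTERN = re.compile(rb"/URI\s*\(([^)]*)\)")

def find_uris(pdf_bytes):
    """Return the literal-string URIs found in raw (uncompressed) PDF bytes."""
    return [match.decode("latin-1") for match in URI_PATTERN.findall(pdf_bytes)]

# Hand-written action dictionary in the style of the example above.
sample = b"<< /Type /Action /S /URI /URI (https://subdomain.nviso.be/) >>"
print(find_uris(sample))
```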

Malicious Document Targets Belgian Users

In this blog post I want to show how a malicious document (maldoc) behaves and how it can be analyzed with free tools.

A couple of weeks ago many users in Belgium received an e-mail, supposedly from a courier company, informing them that a package was waiting for them (article in Dutch).

This is an example of the e-mail:

20161114-142948

This e-mail contains a link to a Word document:

20161114-142226

The Word document contains VBA macro code to download and execute malware (downloader behavior). But MS Word contains protection features that prevent the code from running when the document is opened in Word.

First of all, since the Word document was downloaded from the Internet, it will be marked as such, and MS Word will open the document in Protected View:

20161114-143404

The user is social-engineered into clicking the Enable Editing button. Because the Word document contains VBA macros, another protection kicks in:

20161114-143421

By default, MS Word disables macros for documents from untrusted sources. Only after the user clicks on the Enable Content button will the VBA macros run.

The user is presented with an empty document, but meanwhile malware was downloaded and executed invisibly to the user:

20161114-143442

The VBA macro code can be extracted with a free open-source tool: oledump.py.

20161114-153022

When looking at the VBA code (streams 8 and 9), we find subroutine Document_Open in stream 9:

20161114-153526

This subroutine is automatically executed when Word opens the document. Subroutine Document_Open contains a call to subroutine TvoFLxE in Module1:

20161114-155109

Subroutine TvoFLxE removes the content of the document (this causes the document to become blank, see screenshot), saves the document and calls function HuEJcCj.

20161114-155123

In this function we see a call to CreateObject. This is always interesting and requires further analysis. CreateObject takes a string as argument: the name of the object to be created. In this code, the string is returned by function JFZzIeCKcjgPWI which takes 2 arguments: 2 strings that look like gibberish. We see this often in maldocs (malicious documents): strings are obfuscated, i.e. made unreadable. Function JFZzIeCKcjgPWI is a string decoding function, taking strings “MWqSBYcnRrviVpGRtY.ASJhGneqYlVl” and “FYqRnVNvJB1GqMA” and converting them to a meaningful string.

In this maldoc, the string obfuscation method is rather simple. Function JFZzIeCKcjgPWI removes all characters found in string “FYqRnVNvJB1GqMA” from string “MWqSBYcnRrviVpGRtY.ASJhGneqYlVl”. What is left is string “WScript.Shell”. This Shell object can be used to make Windows execute commands. So we need to deobfuscate the other strings in the code as well.
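The decoding step is easy to reproduce outside of Word. Here is a small Python sketch of the same idea, using the two strings quoted above. The function name `remove_chars` is ours and is not the name used in the maldoc.

```python
def remove_chars(obfuscated, junk):
    """Drop every character of obfuscated that also occurs in junk."""
    junk_set = set(junk)
    return "".join(ch for ch in obfuscated if ch not in junk_set)

decoded = remove_chars("MWqSBYcnRrviVpGRtY.ASJhGneqYlVl", "FYqRnVNvJB1GqMA")
print(decoded)  # WScript.Shell
```

The same helper can be applied to the other obfuscated string pairs found in the VBA code.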

20161114-155207

When we deobfuscate these strings, we get this PowerShell command:

20161114-162354

This PowerShell command downloads an executable (malware) to disk and executes it. The downloaded malware seems to be ransomware; we’ll write another blog post if it has interesting features.

To protect yourself from this kind of attack, never activate the document (Enable Editing and Enable Content). Anti-virus can also protect you by 1) detecting the maldoc and 2) detecting the executable written to disk. When you don’t trust a document, you can always upload it to VirusTotal.

 

Testimonial of Stefaan Truijen

Hi, I’m Stefaan Truijen and in 2014-2015 I did my master thesis at the department of computer science at KULeuven. I assessed the susceptibility of modern web browsers to RAM scrapers in collaboration with NVISO. Security had always been one of my passions, so I was excited to get started.

Writing a thesis is an intensive process. Happily, I was able to rely on both Arne (NVISO) and Raoul (KULeuven) throughout the entire year for advice/brainstorming.

First, I needed to get an overview of prior research on memory scraping. Arne supplied me with a couple of initial research documents and references, and I reviewed any new material I found with Arne and Raoul almost weekly.

After some preliminary tests, I had to determine how I would continue and I wanted to contribute at least a little bit to fighting memory scrapers. I was able to bounce a few ideas off Arne and Raoul. In the end we decided that, since I was unable to find any prior research that had already assessed the size of the problem – i.e. memory scraping web browsers – measuring the degree of susceptibility of each of the three most commonly used web browsers (Chrome, Firefox, IE) was the most interesting angle.

In order to get a sufficient amount of data to form a solid conclusion, I ran thousands of experiments. Of course, running thousands of experiments manually is not very efficient and it affects reproducibility of the results. Therefore I learned how to work with new tools. Most relevant were Selenium’s automated testing framework for web browsers and the Windows API. Whenever I had questions, Arne and Raoul gladly answered them.

Now that the dust has settled, I can say that I have acquired a deeper understanding of low level security, more specifically memory scraping, and the consequences of relatively relaxed memory and API access policies that I did not have before. I am very satisfied with the result of my thesis and NVISO played an important role in realizing it!

Testimonial of Nick Van Haver

Hi, I’m Nick Van Haver and I want to reflect briefly on my master thesis, which I wrote in cooperation with NVISO and Ghent University. NVISO helped me in many ways while providing me with a lot of freedom to choose the course of my thesis. They showed me a lot of trust and respect, which I truly appreciate.

The topic of my thesis research was “The Detection of Client-side Vulnerabilities in Web Applications through the Browser”. This topic is deeply rooted in the field of web application security, and thus led me far beyond its basics. At first I had quite some experience with the development of web applications, but far less with their security aspects.

When looking into a new field or topic, it is hard to find the right sources and high quality references. The right resources can turn a week’s worth of work into a single day. NVISO provided me with these resources and handed me tools, enabling me to educate myself in the web application security field and to make the most out of my thesis. Thanks to NVISO, I had contact with some of the big names in the industry such as Google, Minded Security, Portswigger and many others. Furthermore they assisted me with their expertise in security during meetings.

In the end, my research resulted in a fairly high score of 16 out of 20. Because of these grades I graduated magna cum laude as a Civil Engineer in Computer Sciences. At the beginning of my thesis my knowledge on web application security was rather limited. Now I feel accomplished in this field of security and I now know where to find the most correct information when dealing with web application vulnerabilities. I now also feel more confident when contacting external parties.

I can highly recommend working with NVISO. Choosing to work together with them for your master thesis ensures you that the topic will be both challenging and interesting. You will receive the support and resources you need to achieve your goal. It really is a worthwhile experience! Once the results of my thesis are public, they will be shared with the community!

Cyber Security Challenge Belgium 2015 – Solving the NVISO Lottery challenge

This is the fourth and final blog post in the Cyber Security Challenge Belgium 2015 (CSCBE) solutions series. This time, we’re taking a look at one of the more programming oriented challenges: The NVISO Lottery.

The NVISO Lottery

The students were given the following info:

Come and throw away your money at the NViso Lottery!

They also received the IP address for the NVISO Lottery service.

Gathering information

Once again we take out our trusty pocket knife named netcat.
We have to guess the correct number from a set of 1000 possibilities. If we guess the right number, we get $75, but each guess costs us $10. If we want to win the prize, we have to earn $1337. This means we have to guess correctly at least 20 times without making too many mistakes. Let’s try!
We weren’t able to guess the correct number. We do get an ID, which we can use to get feedback from the NVISO casino. The ID looks completely random, but the last character (=) is a typical tell-tale for Base64 encoding. The equals sign is used as padding when the number of bytes to encode is not divisible by 3. Decoding this using the Base64 algorithm gives the following:
The decoded string doesn’t give us the answer to the random number, but the content does appear to be structured and further decoding may be necessary.
As was explained at the beginning of this write-up, this challenge is programming oriented. If you’ve worked with the Python language a lot, you may already recognize the decoded string as being a specific Python file format.
In Python, you can use the Pickle module to serialize data objects. Serializing (or marshaling) objects is the process of converting arbitrary data to a byte stream. This byte stream can then be transported over a network, or stored in a file.
Serializing is a reversible process. That means we can deserialize (or ‘unpickle’) the data we got from the Base64 decode:
This is very promising. The unpickled value consists of a nested list with three random numbers.
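The two decoding steps can be sketched as follows. The real ticket ID from the challenge is not reproduced here, so the sketch builds a hypothetical one from a made-up state tuple. The name `decode_ticket` and the numbers are assumptions for illustration only. Keep in mind that unpickling data from an untrusted source can execute arbitrary code, so this belongs in a throwaway environment.

```python
import base64
import pickle

def decode_ticket(ticket_id):
    """Base64-decode a ticket ID and unpickle the result."""
    return pickle.loads(base64.b64decode(ticket_id))

# Base64 pads with '=' when the input length is not a multiple of 3 bytes.
print(base64.b64encode(b"abc"))  # b'YWJj'
print(base64.b64encode(b"ab"))   # b'YWI='

# Hypothetical state, shaped like "a nested list with three numbers".
made_up_state = (1, (11, 22, 33), None)
ticket_id = base64.b64encode(pickle.dumps(made_up_state, protocol=0)).decode()
print(decode_ticket(ticket_id))
```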

Random number generators

Let’s take a look at how random numbers are usually generated. Algorithms cannot generate truly random numbers. An algorithm will always perform the exact same steps given the same input. Many software implementations therefore rely on Pseudo-Random Number Generators (PRNGs). These algorithms do not generate true random numbers, but they do share many properties with true random numbers. For example, a good PRNG will make it extremely difficult to determine the next random number based on the random number that was just received.
An example of a PRNG is a Linear Congruential Generator (LCG). The simplest LCG needs three numbers to calculate a random number sequence. These three numbers are called the seed of the LCG. Given these numbers (a, c, m), the LCG will calculate the sequence as follows:
The next number in the sequence is calculated by multiplying the current number by a, adding c to the result and taking the remainder of division by m.
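As a minimal sketch of this recurrence, the generator below computes x_next = (a * x + c) mod m. The parameter values and the starting value `x0` are small, made-up numbers chosen so the arithmetic is easy to check by hand. They are not taken from the challenge.

```python
from itertools import islice

def lcg(a, c, m, x0):
    """Yield the LCG sequence x_next = (a * x + c) % m, starting after x0."""
    x = x0
    while True:
        x = (a * x + c) % m
        yield x

# Toy parameters, far too small for real use.
print(list(islice(lcg(a=5, c=3, m=16, x0=7), 5)))  # [6, 1, 8, 11, 10]
```

Anyone who knows a, c, m and the current value can compute every following number, which is exactly why leaking the generator state is a problem.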
From a programming point of view, PRNGs are very useful as they can be reverted to a certain state. If the application suddenly crashes based on a specific random input, it would be very hard to debug the application if the same random input cannot be generated. For security critical implementations, of course, a PRNG should not be used.
Since we have to guess a random number, it may be a good guess to say that the decoded value is the seed for a PRNG.

Exploiting the vulnerability

Python allows the programmer to set the state of the random number generator. To confirm we’re on the right track, let’s print out the current state of the default random number generator:
Unfortunately, this seed appears to be a lot bigger than the seed we recovered from the lottery service. Python’s random module actually uses the Mersenne Twister algorithm, which is not an LCG.
But there is good news: the output of the getstate() command is very similar to our decoded value. Python has a few other random libraries: random.SystemRandom() and random.WichmannHill(). According to the documentation, SystemRandom() doesn’t have a getstate() method. WichmannHill() does:
This is exactly what we were looking for. By using setstate() with our decoded lottery ID, we should be able to predict the number that will be generated:
Great! That was the solution we were looking for. Because we get the ticket ID before we have to enter our guess, we can predict the value that the server will expect and get our prize!
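random.WichmannHill only exists in Python 2, so the sketch below reimplements the generator in a few lines to show why a leaked state is fatal. The class is our own stand-in and not the standard library code. The state layout (version, three-number seed, None) and the way a float is mapped to a number between 1 and 1000 are assumptions; the real server may map its numbers differently.

```python
class WichmannHill:
    """Small stand-in for the Wichmann-Hill generator (AS 183)."""

    def __init__(self, seed=(1, 2, 3)):
        self._seed = tuple(seed)

    def getstate(self):
        # Assumed layout: (version, (x, y, z), gauss_next)
        return (1, self._seed, None)

    def setstate(self, state):
        _version, seed, _gauss_next = state
        self._seed = tuple(seed)

    def random(self):
        x, y, z = self._seed
        x = (171 * x) % 30269
        y = (172 * y) % 30307
        z = (170 * z) % 30323
        self._seed = (x, y, z)
        return (x / 30269.0 + y / 30307.0 + z / 30323.0) % 1.0

    def randint(self, low, high):
        # Assumed mapping from a float in [0, 1) to an integer in [low, high].
        return low + int(self.random() * (high - low + 1))

def predict_next(state, low=1, high=1000):
    """Load a leaked state into a fresh generator and return its next number."""
    clone = WichmannHill()
    clone.setstate(state)
    return clone.randint(low, high)

# Simulated lottery: the state leaks (as the ticket ID) before the draw.
server = WichmannHill(seed=(4242, 1337, 2015))
leaked_state = server.getstate()
print(predict_next(leaked_state) == server.randint(1, 1000))  # True
```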
We could do this manually since there’s no timeout for our answer, but we can just as easily create a Python script that does this for us:
We got the flag, which is “I’m_going_to_be_a_professional_gambler!”

Statistics

We had many different connections to the server, so a lot of teams tried to solve the challenge. Most teams told us they managed to decipher the Base64 encoding, and some teams also found the Python pickle format. In the end, only four teams were able to completely solve this challenge: HacknamStyle Jr, ISW, Turla Tech Support and Vrije Universiteit Leuven. All of these teams made it to the finals.

Final thoughts

This challenge was partly aimed at testing the students’ programming skills. Although Python is a very popular programming language, some students may have never used it before, making this challenge a little bit harder. Even so, a security researcher will often encounter unknown file formats or protocols, and finding out what the data means or how to use it may be critical to a successful security audit or forensics investigation. Being able to automate custom tasks can often save lots of time or solve problems that would be impossible to do manually. Having some experience with any programming language is an invaluable tool in every security expert’s toolkit!

Cyber Security Challenge Belgium 2015 – Solving the One Way challenge

This is the third blog post in the Cyber Security Challenge Belgium 2015 (CSCBE) solutions series. This time, we’re taking on a very technical challenge: One Way.

Data Extraction

The challenge

The following challenge description was given to the students:

We want our employees to be able to send us confidential information which only we can decrypt. Since we don’t believe in PKI (we have our reasons!), we made our own crypto system (homemade is always better, right!). To prevent tampering, we took some precautions: A salt is added to each request and the IV is chosen at random for every connection. Take a look at the given clientFramework.py file for more info on how to use our crypto system.

The accompanying clientFramework.py file contains some helper methods so that the students could focus on the actual encryption logic instead of fighting with Python to be able to correctly communicate with the server.

The details

The python file contains some information about the server, from which the following information can be deduced:

  • The Initialization Vector (IV) is chosen at random for every session
  • The IV is updated after every encryption request according to a known algorithm
  • The server encrypts the given plain text as follows: encryption = encrypt(plain text + FLAG, IV)
  • The encryption protocol is AES in CBC mode with blocks of 16 characters
  • The FLAG consists of 8 lowercase ASCII characters
  • The used IV is returned together with the encrypted string

The IV is randomly chosen at the start of the session, but the client can request multiple encryption operations during each session. After each encryption, the IV is updated according to a known function. That means that we can calculate the IV that will be used for the next iteration. This will prove to be very important in what follows.

Encryption 101

Let’s take a look at how the Cipher Block Chaining (CBC) algorithm works, which is what the challenge is using.
The following image shows the working of CBC:
Image taken from Wikipedia
The plaintext is split up into blocks of 16 bytes each and each block is encrypted separately. In order to counter certain attacks which are possible against the Electronic CodeBook algorithm (ECB), each plaintext block is first XORed with the ciphertext of the previous block before it is encrypted. Because the first block doesn’t have a previous block which it can use to XOR with, an IV is used. The IV should always be random and unpredictable.
After the plaintext has been encrypted, the IV has served its purpose and it no longer has to be secret. In this challenge, the IV is returned to the client together with the encrypted text.

Finding the flaw

You may have already noticed a small but very important mismatch between how CBC should be implemented, and how the challenge server implements CBC: the IV should always be random and unpredictable. The server’s IV is completely random and unpredictable, but only for the first encryption request. For every subsequent request, the IV can be calculated from the original IV, which creates a serious security flaw.
Take another look at the CBC diagram. By knowing which IV will be used to XOR with the plaintext, we can prevent the IV from having effect. If we XOR the plaintext with the predicted IV before sending it to the server, the server will apply the XOR again which undoes our original XOR:
$\text{plaintext} \oplus \text{IV} \oplus \text{IV} = \text{plaintext} \oplus (\text{IV} \oplus \text{IV}) = \text{plaintext} \oplus 0 = \text{plaintext}$
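The cancellation is easy to check in a few lines of Python. This is only a sketch of the XOR step and involves no encryption. The helper name `xor_bytes` and the example block are ours.

```python
import os

def xor_bytes(a, b):
    """XOR two equally long byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

iv = os.urandom(16)                        # stands in for the predicted IV
plaintext = b"aaaaaaaaaaaaaaab"            # one 16-byte block
sent = xor_bytes(plaintext, iv)            # what we send to the server
print(xor_bytes(sent, iv) == plaintext)    # the server's XOR undoes ours: True
```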
The second flaw is that the flag is appended to the given plaintext. Since we have full control over the plaintext, we can decide at which position in the plaintext the flag will be, and hence we can control where it will end up in the encrypted string.

Exploiting the flaw

If we have complete control over which plaintext is entered into the first encrypted block, we can get the encrypted value of any given plaintext. This means we can create a rainbow table for every possible plaintext consisting of 16 bytes:
aaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaab
aaaaaaaaaaaaaaac
//…
zzzzzzzzzzzzzzzx
zzzzzzzzzzzzzzzy
zzzzzzzzzzzzzzzz
Remember that we have to XOR the plaintext string with the predicted IV before sending it to the server.
Before encrypting the plaintext, the server appends the flag to our input. If we only send 15 characters to the server, the server will encrypt aaaaaaaaaaaaaaaX where X is the first character of the flag. 
We can now look up the encrypted value of aaaaaaaaaaaaaaaX in our rainbow table. This will match aaaaaaaaaaaaaaas, and we now know that the first character of the flag is an ‘s’.
To get the second character, we need to create a rainbow table based on the aaaaaaaaaaaaaas prefix (which has 14 a’s). When the table is complete, we can ask the server to encrypt 14 a’s (“aaaaaaaaaaaaaa”). Because the server appends the flag, the first encrypted block will contain the second character of the flag in the last position and we can look it up in our rainbow table. The encrypted string will match aaaaaaaaaaaaaasa, so ‘a’ is the next character of the flag. We can keep doing this for every character:
aaaaaaaaaaaaaaas
aaaaaaaaaaaaaasa
aaaaaaaaaaaaasal
aaaaaaaaaaaasalt
aaaaaaaaaaasaltm
aaaaaaaaaasaltmi
aaaaaaaaasaltmin
aaaaaaaasaltmine
aaaaaaasaltmine0
aaaaaasaltmine00
aaaaasaltmine000
aaaasaltmine0000
aaasaltmine00000
aasaltmine000000
asaltmine0000000
saltmine00000000
After a few iterations, the padding zeros start showing up in the solution. These extra zeros after the flag are just padding that was added by the server in order to have a complete block to encrypt. When we’ve removed all the prefixed a’s, we end up with the flag, which is saltmine.
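The whole attack can be simulated locally. The sketch below is not the challenge server. `toy_block_encrypt` is a keyed hash standing in for AES, `next_iv` is a made-up update rule (increment by one), and padding with ‘0’ characters is an assumption based on the output above. One detail the sketch makes explicit: the flag bytes the server appends are XORed with the IV as well, so each guess folds in the IV of the target request and cancels the predicted IV of its own request.

```python
import hashlib
import hmac
import os
import string

BLOCK = 16
FLAG = b"saltmine"  # known here only because we simulate the server

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def toy_block_encrypt(key, block):
    """Stand-in for AES: any deterministic keyed function works for this demo."""
    return hmac.new(key, block, hashlib.sha256).digest()[:BLOCK]

def next_iv(iv):
    """Hypothetical 'known algorithm' for updating the IV: add one."""
    return ((int.from_bytes(iv, "big") + 1) % (1 << 128)).to_bytes(BLOCK, "big")

class Server:
    def __init__(self):
        self.key = os.urandom(16)
        self.iv = os.urandom(BLOCK)

    def encrypt(self, plaintext):
        """CBC-encrypt plaintext + FLAG; return the IV used and the ciphertext."""
        data = plaintext + FLAG
        data += b"0" * (-len(data) % BLOCK)  # assumed padding scheme
        iv_used, prev, out = self.iv, self.iv, b""
        for i in range(0, len(data), BLOCK):
            prev = toy_block_encrypt(self.key, xor(data[i:i + BLOCK], prev))
            out += prev
        self.iv = next_iv(self.iv)
        return iv_used, out

def recover_flag(server, flag_len=8, alphabet=string.ascii_lowercase.encode()):
    known = b""
    for _ in range(flag_len):
        filler = b"a" * (BLOCK - 1 - len(known))
        # Target request: first block is filler + known flag bytes + one unknown byte.
        iv_target, ciphertext = server.encrypt(filler)
        target = ciphertext[:BLOCK]
        predicted = next_iv(iv_target)
        for candidate in alphabet:
            guess = filler + known + bytes([candidate])
            # Reproduce the cipher input of the target block (guess XOR iv_target)
            # and cancel the IV the server will apply to this request.
            payload = xor(xor(guess, iv_target), predicted)
            iv_used, ciphertext = server.encrypt(payload)
            predicted = next_iv(iv_used)
            if ciphertext[:BLOCK] == target:
                known += bytes([candidate])
                break
    return known

print(recover_flag(Server()))
```

Only the last character of each block is varied, so recovering the 8-character flag takes at most 8 x 26 guesses plus 8 target requests.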

Padding attack

The attack we used above is a chosen-plaintext attack that recovers the flag one byte at a time. It is related to, but not strictly the same as, a padding oracle attack, because the server never tells us whether the padding is valid. This attack is possible because of two distinct vulnerabilities in the server algorithm: we can predict the IV, and we can modify the padding in front of the flag. By combining these two flaws, we are able to get the flag, which would have been impossible if either of them were missing.
In October 2014, the POODLE attack was disclosed, which uses a padding oracle attack against SSL 3.0.

Statistics

Nine of the participating teams were able to solve this challenge. Eight of these teams were able to secure a place in the CSCBE finals. There were a lot of random guesses for the solution of this challenge. Some even came close (“saltflag” or “salted00”) but luckily, only the teams who actually solved the challenge were able to get the points.

Final thoughts

A strong cryptographic algorithm is only effective when it is used correctly. The challenge demonstrated that small flaws can have disastrous effects. Although cryptography can be very daunting at first, it certainly pays off to invest some time in understanding how different algorithms work and how they should be used. Even if you don’t fully understand the internal workings of the AES encryption method, you may still be able to find flaws in the way it is used and thereby be able to break the encryption.