Monthly Archives: February 2017

Hunting with YARA rules and ClamAV

Did you know the open-source anti-virus ClamAV supports YARA rules? What benefits can this bring to us? One of the important features ClamAV has is the file decomposition capability. Say that the file you want to analyze resides in an archive, or is a packed executable, then ClamAV will unarchive/unpack the file, and run the YARA engine on it.

Let’s start with a simple YARA rule to detect the string “!This program cannot be run in DOS mode”:


When we scan the notepad.exe PE file with this YARA rule, the rule (test1) triggers.

We can do the same with clamscan:


With option -d (database), we bypass ClamAV’s signature database defined in clamd.conf and instruct clamscan to use the YARA rule test1.yara.

As shown in the example above, using clamscan on the PE file notepad.exe also triggers the previously created YARA rule test1.yara: YARA.test1.UNOFFICIAL.

In this example we decided to use just one YARA rule for simplicity, but of course you can use several YARA rules together with ClamAV’s signature database. Just put your YARA rules (extension .yara or .yar) in the database folder.

As mentioned in the introduction, ClamAV can also look inside ZIP files and apply the YARA rules on all files found in archives:


This is something the standard YARA tool can not:


ClamAV’s YARA rules support does however have some limitations. You can not use modules (like the PE file module), or use YARA rule sets that contain external variables, tags, private rules, global rules, …Every rule must also have strings to search for (at least 2 bytes long). Rules with a condition and without strings are not supported.

Let us take a look at a rule to detect if a file is a PE file (see appendix for the details of the rule):


We get a warning from ClamAV: “yara rule contains no supported string”.

As ClamAV does not support rules without string: section. We must add a string to search for, even if the rule logic itself does not need it. Since a PE file contains string MZ, let’s search for that:


This time the rule triggers.

Now, a tricky case: how do we design a rule when we have no single string to search for? The ClamAV developers offer a work-around for such cases: search for any string, and add a condition checking for the presence OR absence of the string. Like this:


We search for string $a = “any string will do”, and we add condition ($a or not $a). It’s a bit of a hack, but it works.

ClamAV’s file decomposition features bring a lot to the table when it comes to YARA scanning, but in some cases it can be a bit too much. For example, ClamAV decompresses the VBA macro streams in Office documents for scanning. This means that we can use YARA rules to scan VBA source code. A simple rule searching for words AutoOpen and Declare would trigger on all Word documents with macros that run automatically and use the Windows API. Which is very nice to detect potential maldocs. However, ClamAV will apply this YARA rule to all files and decomposed/contained files. So if we feed ClamAV all kind of files (not only MS Office files), then the rule could also trigger (for example) on text files or e-mails that contain words AutoOpen and Declare.

If we could limit the scope of selected YARA rules to certain file types, this would help. Currently ClamAV supports signatures that are only applied to given file types (PE files, OLE files, …), unfortunately this is not supported for YARA files.

ClamAV is an interesting engine to run our YARA rules instead of the standard YARA engine. It has some limitations however, that can also generate false positives if we are not careful with the rules we use or design.

Deconstructing the YARA rule

Our example rule to detect a PE file contains just a condition:

uint16(0) = 0x5A4D and uint32(uint32(0x3C)) == 0x00004550

This rule does not use string searches. It checks a couple of values to determine if a file is a PE file. The checks it performs are:

  • see if the file starts with a MZ header, and;
  • contains a PE header.

First check: the first 2 bytes of the file are equal to MZ. uint16(0) = 0x5A4D.

Second check: the field (32-bit integer) at position 0x3C contains a pointer to a PE header. A PE header starts with bytes PE followed by 2 NULL bytes. uint32(uint32(0x3C)) == 0x00004550.

Functions uint16 and uint32 are little-endian, so we have to write the bytes in reverse order: MZ = 0x4D5A -> 0x5A4D

Maldoc: It’s not all VBA these days

Since late 2014 we witness a resurgence of campaigns spamming malicious Office documents with VBA macros. Sometimes however, we also see malicious Office documents exploiting relatively recent vulnerabilities.

In this blog post we look at a malicious MS Office document that uses an exploit instead of VBA.

The sample we received is 65495b359097c8fdce7fe30513b7c637. It exploits vulnerability CVE-2015-2545 which allows remote attackers to execute arbitrary code via a crafted EPS image, aka “Microsoft Office Malformed EPS File Vulnerability”. In this blog post we want to focus on extracting the payload.

A more detailed explanation on the exploit itself can be found here (pdf).


The sample we received is a .docx file, with we can confirm it doesn’t contain VBA code:


With (remember that the MS Office 2007+ file format uses ZIP as a container) we can see what’s inside the document:

Screen Shot 2017-02-06 at 11.03.02.png

Looking at the extensions, we see that most files are XML files. There’s also a .gif and .eps file. The .eps file is unusual. Let’s check the start of each file to see if the extensions can be trusted:


This confirms the extensions we see: 12 XML files, a GIF file and an EPS files. As we know are exploits for EPS, we took a look at this file first:


The file contains a large amount of lines like above, so let’s get some stats first:

Screen Shot 2017-02-06 at 11.52.35.png gives us all kinds of statistics about the content of the file. First of all, it’s a large file for MS Office documents (10MB). And it’s a pure text file (it only contains printable characters (96%) and whitespace (4%)).

There is a large amount of bytes that are hexadecimal characters (66%) and BASE64 characters (93%). Since the hexadecimal character set is a subset of the BASE64 character set, we need more info to determine if the file contains hexadecimal strings or BASE64 strings. But it very likely contains some, as there are parts of the file (buckets) that contain only hexadecimal/BASE64 characters (10240 100%). is a tool to search for BASE64 strings (and other encodings like hexadecimal). We will use it to search for a payload in the EPS file. Since the file is large, we can expect to have a lot of hits. So let’s set a minimum sequence length of 1000:


There are 4 large sequences of BASE64 characters in the document. But as the start of each sequence (field Encoded) contains only hexadecimal characters, it’s necessary to check for hexadecimal encoding too:


With this output, it’s clear that the EPS file contains 4 large hexadecimal strings.

Near the start of the decoded string 2 and 4, we can see characters MZ: this could indicate a PE file. So let’s check:

Screen Shot 2017-02-06 at 12.19.55.png

This certainly looks like a PE file. Let’s pipe it through (we need to skip the first 8 bytes: UUUUffff):


This tells us that it is definitively a PE file. With more details from pecheck’s output, we can say it’s a 64-bit DLL. It has a small overlay:


Since this overlay is actually just 8 bytes (UUUUffff), it’s not an overlay, but a “sentinel” like at the start of the hexadecimal sequence. So let’s remove this:


We did submit this DLL to VirusTotal: 30ec672cfcde4ea6fd3b5b14d6201c43.

It has some interesting strings:


Like the string of the PDB file: GetDownLoader. And a PowerShell command to download and execute an .exe (the URL is readable).

Also notice that the string “This program can not be run in DOS mode.” appears twice. This is a strong indication that this DLL contains another PE file.

Let’s search for it. By using the –cut operator to search for another instance of string MZ, we can cut-out the embedded PE file:


We also submitted this file to VirusTotal: 2938d6eda6cd941e59df3dd54bf8dad8. It is a 32-bit EXE file.

The hexadecimal string with Id 4 found with base64dump also contains a PE file. It is a 32-bit DLL (ce95faf23621a0a705b796c19d9fec44), containing the same 32-bit EXE as the 64-bit DLL:  2938d6eda6cd941e59df3dd54bf8dad8.


With a steady flow of VBA maldocs for over more than 2 years, one would almost forget that Office maldocs with exploits are found in-the-wild too. If you just look for VBA macros in documents you receive, you will miss these exploits.

In this sample, detecting a payload was not too difficult: we found an unusual file (large .eps file) with long hexadecimal strings that decode to PE-files. It’s not always that easy, especially if we are dealing with binary MS Office files (like .doc).

In this post we focus on a static analysis method to extract the payload. When performing analysis on this file yourself, be aware that this maldoc also contains shellcode (strings 1 and 3 found by base64dump) and an exploit to break out of the sandbox (protected view).

Working with GFI Cloud anti-virus quarantine files

We were recently requested to analyse a sample that was quarantined by GFI Cloud anti-virus. Based on our previous experiences with various anti-virus products we wanted to obtain the sample directly from the quarantine rather than restoring it first. Anti-virus products use quarantine files to safely store files that were detected as being malicious and thus are deleted (or cleaned). Usually, the content of the original (malicious) files is encoded before these are stored in a quarantine file.

These quarantine files are in the first place useful to restore files that were falsely detected as being malicious. From an analyst point of view, these quarantine files are particularly handy to determine if the file is indeed malicious or if it was erroneously quarantined.

When analysing a file that was detected and quarantined by anti-virus, we have found it to be preferable to try to extract the file directly from the quarantine file rather than through the anti-virus management console for three main reasons:

  • Restoring the quarantine file via the anti-virus management console could expose us to the risk of inadvertently opening the potentially malicious file;
  • Some anti-virus products will no longer protect us against a file restored from quarantine, therefor it is best only to restore false positives;
  • The restoring operation through the anti-virus software could also destroy metadata that is created on the quarantined file.

Additionally, malware analysts are typically not the people who would also administer the anti-virus solution. Grabbing these files directly from the quarantine allows the authorised administrators to safely provide potential malicious files to the malware analysts.

GFI  Cloud anti-virus quarantine files are stored inside the following folders:

C:\ProgramData\GFI Software\AntiMalware\Quarantine
C:\Users\All Users\GFI Software\AntiMalware\Quarantine

For each quarantined file, 2 files are created in the with the following structure:


The first file is an XML file containing metadata, such as the MD5 hash of the quarantined file and the original name and location of the file:


The second file contains the encoded, quarantined file (this file is referenced in the XML file):


The encoding used in this quarantine file is simple: each byte is XORed with value 0x33:


When a quarantined file is restored via the GFI management console, the 2 corresponding quarantine files .xml and _ENC2 are deleted and the original file is restored.

Concluding, when you are asked to analyse a sample that has been quarantined by an anti-virus product, we recommend to use the quarantine files directly for analysis, rather than restoring the quarantined file through the anti-virus management console. Using the metadata file you can easily grab the MD5 hash of the sample, and look it up on scanning services like VirusTotal. If the file can not be found there, then decode the _ENC2 file and start analysing it in a malware lab.