
Hunting with YARA rules and ClamAV

Did you know the open-source anti-virus ClamAV supports YARA rules? What benefits can this bring to us? One of the important features ClamAV has is its file decomposition capability. If the file you want to analyze resides in an archive or is a packed executable, ClamAV will unarchive or unpack it and run the YARA engine on the result.

Let’s start with a simple YARA rule to detect the string “!This program cannot be run in DOS mode”:


When we scan the notepad.exe PE file with this YARA rule, the rule (test1) triggers.

We can do the same with clamscan:


With option -d (database), we bypass ClamAV’s signature database defined in clamd.conf and instruct clamscan to use the YARA rule test1.yara.

As shown in the example above, using clamscan on the PE file notepad.exe also triggers the previously created YARA rule test1.yara: YARA.test1.UNOFFICIAL.

In this example we decided to use just one YARA rule for simplicity, but of course you can use several YARA rules together with ClamAV’s signature database. Just put your YARA rules (extension .yara or .yar) in the database folder.

As mentioned in the introduction, ClamAV can also look inside ZIP files and apply the YARA rules on all files found in archives:


This is something the standard YARA tool cannot do:
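To make the difference concrete, here is a minimal Python sketch of a flat scan versus a scan that first unpacks a ZIP archive. It is only an illustration of the idea, not how ClamAV or YARA are implemented; the function names and the in-memory demo archive are our own assumptions.

```python
import io
import zipfile

NEEDLE = b"!This program cannot be run in DOS mode"


def scan_bytes(data: bytes, needle: bytes = NEEDLE) -> bool:
    """Flat scan: look for the needle in the raw bytes only."""
    return needle in data


def scan_with_unzip(data: bytes, needle: bytes = NEEDLE) -> list[str]:
    """Scan the raw bytes and, if the data is a ZIP archive, every member."""
    hits = []
    if scan_bytes(data, needle):
        hits.append("<raw>")
    buffer = io.BytesIO(data)
    if zipfile.is_zipfile(buffer):
        with zipfile.ZipFile(buffer) as archive:
            for name in archive.namelist():
                if scan_bytes(archive.read(name), needle):
                    hits.append(name)
    return hits


def make_demo_zip() -> bytes:
    """Build an in-memory ZIP holding a fake PE stub (hypothetical test data)."""
    out = io.BytesIO()
    with zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("notepad.exe", b"MZ" + b"\x00" * 76 + NEEDLE + b".\r\n")
    return out.getvalue()
```

Because the member is stored compressed, the flat scan does not see the string in the archive's raw bytes, while the unpacking scan reports the member that contains it.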


ClamAV’s YARA rules support does however have some limitations. You cannot use modules (like the PE file module), or use YARA rule sets that contain external variables, tags, private rules, global rules, … Every rule must also have strings to search for (at least 2 bytes long). Rules with a condition and without strings are not supported.

Let us take a look at a rule to detect if a file is a PE file (see appendix for the details of the rule):


We get a warning from ClamAV: “yara rule contains no supported string”.

ClamAV does not support rules without a strings: section, so we must add a string to search for, even if the rule logic itself does not need it. Since a PE file contains the string MZ, let’s search for that:


This time the rule triggers.

Now, a tricky case: how do we design a rule when we have no single string to search for? The ClamAV developers offer a work-around for such cases: search for any string, and add a condition checking for the presence OR absence of the string. Like this:


We search for string $a = “any string will do”, and we add condition ($a or not $a). It’s a bit of a hack, but it works.

ClamAV’s file decomposition features bring a lot to the table when it comes to YARA scanning, but in some cases it can be a bit too much. For example, ClamAV decompresses the VBA macro streams in Office documents for scanning. This means that we can use YARA rules to scan VBA source code. A simple rule searching for the words AutoOpen and Declare would trigger on all Word documents with macros that run automatically and use the Windows API, which is very useful for detecting potential maldocs. However, ClamAV will apply this YARA rule to all files and to all decomposed/contained files. So if we feed ClamAV all kinds of files (not only MS Office files), then the rule could also trigger (for example) on text files or e-mails that contain the words AutoOpen and Declare.

It would help if we could limit the scope of selected YARA rules to certain file types. ClamAV currently supports signatures that are only applied to given file types (PE files, OLE files, …), but unfortunately this is not supported for YARA rules.

ClamAV is an interesting alternative to the standard YARA engine for running our YARA rules. It has some limitations, however, and it can generate false positives if we are not careful with the rules we use or design.

Deconstructing the YARA rule

Our example rule to detect a PE file contains just a condition:

uint16(0) == 0x5A4D and uint32(uint32(0x3C)) == 0x00004550

This rule does not use string searches. It checks a couple of values to determine if a file is a PE file. The checks it performs are:

  • the file starts with an MZ header, and;
  • the file contains a PE header.

First check: the first 2 bytes of the file are equal to MZ. uint16(0) == 0x5A4D.

Second check: the field (32-bit integer) at position 0x3C contains a pointer to a PE header. A PE header starts with bytes PE followed by 2 NULL bytes. uint32(uint32(0x3C)) == 0x00004550.

Functions uint16 and uint32 are little-endian, so we have to write the bytes in reverse order: MZ = 0x4D5A -> 0x5A4D
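The same two checks can be sketched in Python with the struct module. This is only an illustration of the condition's logic under our own naming (is_pe and make_stub are hypothetical helpers), with bounds checks added so that short files do not raise an exception.

```python
import struct


def is_pe(data: bytes) -> bool:
    """Mirror of: uint16(0) == 0x5A4D and uint32(uint32(0x3C)) == 0x00004550."""
    if len(data) < 0x40:
        return False
    # "<H" reads a little-endian 16-bit integer: bytes MZ (4D 5A) become 0x5A4D.
    (mz,) = struct.unpack_from("<H", data, 0)
    if mz != 0x5A4D:
        return False
    # The 32-bit field at offset 0x3C holds the file offset of the PE header.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if pe_offset + 4 > len(data):
        return False
    # Bytes PE\0\0 (50 45 00 00) read as little-endian give 0x00004550.
    (signature,) = struct.unpack_from("<I", data, pe_offset)
    return signature == 0x00004550


def make_stub(pe_offset: int = 0x80) -> bytes:
    """Build a minimal fake header for testing (not a valid executable)."""
    dos_header = b"MZ" + b"\x00" * 0x3A + struct.pack("<I", pe_offset)
    padding = b"\x00" * (pe_offset - len(dos_header))
    return dos_header + padding + b"PE\x00\x00"
```

Reading the two bytes MZ with a little-endian format yields 0x5A4D, which is why the constant in the rule appears byte-swapped.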

Working with GFI Cloud anti-virus quarantine files

We were recently requested to analyse a sample that was quarantined by GFI Cloud anti-virus. Based on our previous experiences with various anti-virus products we wanted to obtain the sample directly from the quarantine rather than restoring it first. Anti-virus products use quarantine files to safely store files that were detected as being malicious and thus are deleted (or cleaned). Usually, the content of the original (malicious) files is encoded before these are stored in a quarantine file.

These quarantine files are in the first place useful to restore files that were falsely detected as being malicious. From an analyst point of view, these quarantine files are particularly handy to determine if the file is indeed malicious or if it was erroneously quarantined.

When analysing a file that was detected and quarantined by anti-virus, we have found it to be preferable to try to extract the file directly from the quarantine file rather than through the anti-virus management console for three main reasons:

  • Restoring the quarantine file via the anti-virus management console could expose us to the risk of inadvertently opening the potentially malicious file;
  • Some anti-virus products will no longer protect us against a file restored from quarantine, therefore it is best only to restore false positives;
  • The restoring operation through the anti-virus software could also destroy metadata that is created on the quarantined file.

Additionally, malware analysts are typically not the people who administer the anti-virus solution. Grabbing these files directly from the quarantine allows the authorised administrators to safely provide potentially malicious files to the malware analysts.

GFI Cloud anti-virus quarantine files are stored inside the following folders:

C:\ProgramData\GFI Software\AntiMalware\Quarantine
C:\Users\All Users\GFI Software\AntiMalware\Quarantine

For each quarantined file, 2 files are created in these folders with the following structure:


The first file is an XML file containing metadata, such as the MD5 hash of the quarantined file and the original name and location of the file:


The second file contains the encoded, quarantined file (this file is referenced in the XML file):


The encoding used in this quarantine file is simple: each byte is XORed with value 0x33:
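Decoding such a file takes only a few lines of Python. The sketch below assumes, as described above, that every byte of the _ENC2 file is XORed with 0x33 and that there is no extra header to skip; the function names and file paths are placeholders of our own.

```python
import hashlib

XOR_KEY = 0x33

# Translation table mapping every byte value b to b ^ 0x33.
_XOR_TABLE = bytes(b ^ XOR_KEY for b in range(256))


def decode_quarantine(encoded: bytes) -> bytes:
    """XOR every byte with 0x33. The same call also re-encodes."""
    return encoded.translate(_XOR_TABLE)


def decode_file(src_path: str, dst_path: str) -> str:
    """Decode src_path into dst_path and return the MD5 of the decoded data."""
    with open(src_path, "rb") as src:
        decoded = decode_quarantine(src.read())
    with open(dst_path, "wb") as dst:
        dst.write(decoded)
    return hashlib.md5(decoded).hexdigest()
```

The MD5 returned by decode_file can be compared with the hash recorded in the XML metadata file to confirm that the decoding is correct. Only write the decoded sample to disk inside a malware lab.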


When a quarantined file is restored via the GFI management console, the 2 corresponding quarantine files .xml and _ENC2 are deleted and the original file is restored.

In conclusion, when you are asked to analyse a sample that has been quarantined by an anti-virus product, we recommend using the quarantine files directly for analysis, rather than restoring the quarantined file through the anti-virus management console. Using the metadata file you can easily grab the MD5 hash of the sample, and look it up on scanning services like VirusTotal. If the file cannot be found there, then decode the _ENC2 file and start analysing it in a malware lab.

Detecting py2exe Executables: YARA Rule

Following the release of the tool to decompile EXE files generated with py2exe, we release a YARA rule to detect such EXE files.

Imagine you receive an executable for analysis. If you go for static analysis, it’s useful to know how the executable was produced. For example, if it was “converted” from Python to EXE, decompiling it with a tool like the Hex-Rays decompiler will not help you. Python converters like PyInstaller and py2exe don’t actually convert the Python code to machine instructions to create the executable; rather, they generate an executable that contains Python bytecode and deploys a Python runtime environment to execute this bytecode. As such, you need to extract and decompile the bytecode to know what the executable does.

How do you know the executable was produced with py2exe? A good indicator is the presence of a resource named PYTHONSCRIPT. Using YARA rules it is possible to automate this detection: for this purpose we created YARA rule py2exe.
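As a rough illustration of this indicator outside YARA, the Python sketch below looks for the resource name in a file's bytes. It is not the py2exe rule itself: it is a simplified heuristic of our own that relies on PE resource names being stored as UTF-16 strings, and it does not parse the resource directory.

```python
# Resource names in a PE resource directory are stored as UTF-16 strings.
PY2EXE_RESOURCE = "PYTHONSCRIPT".encode("utf-16-le")


def looks_like_py2exe(data: bytes) -> bool:
    """Heuristic: MZ header plus the resource name PYTHONSCRIPT in UTF-16LE."""
    return data[:2] == b"MZ" and PY2EXE_RESOURCE in data
```

A match only suggests how the executable was built; a proper check would walk the resource directory to confirm that the string really is a resource name.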


The idea is that you build a set of YARA rules to classify executables (another good rule to include in this set is a rule to detect PyInstaller generated executables). Then you let these rules run on your executable, and hopefully some rules will trigger and help you identify the type of executable you’re dealing with.

This rule is not an indicator of malware; it just identifies that the executable was generated with py2exe.

Analyzing an Office Maldoc with a VBA Emulator

Today we were informed of another maldoc sample. After a quick look, we were convinced that this sample would be a good candidate for Philippe Lagadec’s VBA emulator ViperMonkey.

The maldoc in a nutshell: when the spreadsheet is opened, the VBA code builds a long JScript script and then executes it. This script contains base64 code for an executable (ransomware Petya GoldenEye version), which is written to disk and executed. The building of the script is done with heavily obfuscated VBA code, so we thought it would be a good idea to try ViperMonkey. ViperMonkey is a free, open-source VBA emulator engine written in Python. You can use it to emulate VBA code on different platforms without MS Office.

Taking a look at this sample (md5 b231884cf0e4f33d84912e7a452d3a10), we see it contains a large VBA macro stream:



Here is the end of the VBA code:


Let’s analyze this with ViperMonkey: sample.vir

Since there are a lot of VBA statements, it will take ViperMonkey some time (a couple of minutes) to parse this:


In the end we get this result:


ViperMonkey doesn’t identify any suspicious actions, but we see that the ActiveX object to be created is “MSScriptControl.ScriptControl”. This string was obfuscated with Chr concatenations, and ViperMonkey was able to parse it. To parse all obfuscated expressions like this, we provide option -e to ViperMonkey: -e sample.vir
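To show what this kind of Chr obfuscation looks like once resolved, here is a small Python sketch that evaluates a concatenation of Chr calls and string literals. It is far simpler than what ViperMonkey does: it ignores the operators between terms and simply joins the terms in order. The example expression in the usage note is made up, not taken from the sample.

```python
import re

# One term of a VBA concatenation: Chr(n), Chr$(n), ChrW(n) or a "literal".
_TERM = re.compile(r'ChrW?\$?\(\s*(\d+)\s*\)|"((?:[^"]|"")*)"', re.IGNORECASE)


def eval_chr_concat(expression: str) -> str:
    """Join all Chr(...) calls and string literals found in the expression."""
    parts = []
    for match in _TERM.finditer(expression):
        number, literal = match.groups()
        if number is not None:
            # Chr(77) is the character with code 77, i.e. "M".
            parts.append(chr(int(number)))
        else:
            # VBA escapes a double quote inside a literal by doubling it.
            parts.append(literal.replace('""', '"'))
    return "".join(parts)
```

For example, eval_chr_concat('Chr(77) & Chr(83) & "Script" & Chr(67) & "ontrol"') returns "MSScriptControl".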



With this information, we can understand what subroutine Workbook_Open does: it executes a JScript script stored in variable LQ3.

How do we get the value of LQ3? We can set ViperMonkey’s log level to debug, and log the emulation of all statements. This will produce a lot of output, so it’s better to redirect this to file. -l debug sample.vir > output.log 2> debug.log

Searching for the last occurrence of string “setting LQ3” in debug.log, we find the JScript script:


This script decodes a BASE64 payload, writes it to disk and then executes it: it’s a new variant of Petya ransomware, GoldenEye.
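Once the script text has been recovered, the embedded executable can be carved out for analysis. The sketch below is one possible approach, assuming the payload appears in the script as a single uninterrupted base64 string; the 200-character threshold and the function name are arbitrary choices of ours.

```python
import base64
import binascii
import re

# Runs of at least 200 base64 characters, optionally followed by padding.
_B64_RUN = re.compile(r"[A-Za-z0-9+/]{200,}={0,2}")


def extract_pe_payloads(script: str) -> list[bytes]:
    """Return decoded base64 runs from the script that start with MZ."""
    payloads = []
    for match in _B64_RUN.finditer(script):
        try:
            decoded = base64.b64decode(match.group(0), validate=True)
        except (binascii.Error, ValueError):
            # Not valid base64 (wrong length or padding): skip this run.
            continue
        if decoded[:2] == b"MZ":
            payloads.append(decoded)
    return payloads
```

If the script splits the payload over several concatenated strings, these would have to be joined first.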


Malicious Document Targets Belgian Users

In this blog post I want to show how a malicious document (maldoc) behaves and how it can be analyzed with free tools.

A couple of weeks ago many users in Belgium received an e-mail, supposedly from a courier company, informing them that a package was waiting for them (article in Dutch).

This is an example of the e-mail:


This e-mail contains a link to a Word document:


The Word document contains VBA macro code to download and execute malware (downloader behavior). But MS Word contains protection features that prevent the code from running when the document is opened in Word.

First of all, since the Word document was downloaded from the Internet, it will be marked as such, and MS Word will open the document in Protected View:


The user is social-engineered into clicking the Enable Editing button. Because the Word document contains VBA macros, another protection kicks in:


By default, MS Word disables macros for documents from untrusted sources. The VBA macros will only run after the user clicks the Enable Content button.

The user is presented with an empty document, but meanwhile malware was downloaded and executed invisibly to the user:


The VBA macro code can be extracted with a free open-source tool:


When looking at the VBA code (streams 8 and 9), we find subroutine Document_Open in stream 9:


This subroutine is automatically executed when Word opens the document. Subroutine Document_Open contains a call to subroutine TvoFLxE in Module1:


Subroutine TvoFLxE removes the content of the document (this causes the document to become blank, see screenshot), saves the document and calls function HuEJcCj.


In this function we see a call to CreateObject. This is always interesting and requires further analysis. CreateObject takes a string as argument: the name of the object to be created. In this code, the string is returned by function JFZzIeCKcjgPWI, which takes 2 arguments: 2 strings that look like gibberish. We see this often in maldocs (malicious documents): strings are obfuscated, i.e. made unreadable. Function JFZzIeCKcjgPWI is a string decoding function, taking strings “MWqSBYcnRrviVpGRtY.ASJhGneqYlVl” and “FYqRnVNvJB1GqMA” and converting them to a meaningful string.

In this maldoc, the string obfuscation method is rather simple. Function JFZzIeCKcjgPWI removes all characters found in string “FYqRnVNvJB1GqMA” from string “MWqSBYcnRrviVpGRtY.ASJhGneqYlVl”. What is left is string “WScript.Shell”. This Shell object can be used to make Windows execute commands, so we need to deobfuscate the other strings too.
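The decoding logic is easy to reproduce in Python. This is our own re-implementation of the behaviour described above, not the VBA code from the maldoc; remove_chars is a hypothetical name.

```python
def remove_chars(obfuscated: str, junk: str) -> str:
    """Drop every character of obfuscated that also occurs in junk."""
    junk_set = set(junk)
    return "".join(ch for ch in obfuscated if ch not in junk_set)


# The two strings passed to the decoding function in the maldoc.
decoded = remove_chars("MWqSBYcnRrviVpGRtY.ASJhGneqYlVl", "FYqRnVNvJB1GqMA")
print(decoded)  # prints WScript.Shell
```

The comparison is case-sensitive, which is why the lowercase r and l survive while the uppercase R is removed.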


When we deobfuscate these strings, we get this PowerShell command:


This PowerShell command downloads an executable (malware) to disk and executes it. The downloaded malware seems to be ransomware; we’ll write another blog post if it has interesting features.

To protect yourself from this kind of attack, never activate the document (Enable Editing and Enable Content). Anti-virus can also protect you by 1) detecting the maldoc and 2) detecting the executable written to disk. When you don’t trust a document, you can always upload it to VirusTotal.