Author Archives: didiernviso

Hunting with YARA rules and ClamAV

Did you know the open-source anti-virus ClamAV supports YARA rules? What benefits can this bring to us? One of the important features ClamAV has is its file decomposition capability. Say the file you want to analyze resides in an archive, or is a packed executable: ClamAV will then unarchive/unpack the file and run the YARA engine on it.

Let’s start with a simple YARA rule to detect the string “!This program cannot be run in DOS mode”:

20170213-155846

When we scan the notepad.exe PE file with this YARA rule, the rule (test1) triggers.

We can do the same with clamscan:

20170213-155926

With option -d (database), we bypass ClamAV’s signature database defined in clamd.conf and instruct clamscan to use the YARA rule test1.yara.

As shown in the example above, using clamscan on the PE file notepad.exe also triggers the previously created YARA rule test1.yara: YARA.test1.UNOFFICIAL.

In this example we decided to use just one YARA rule for simplicity, but of course you can use several YARA rules together with ClamAV’s signature database. Just put your YARA rules (extension .yara or .yar) in the database folder.

As mentioned in the introduction, ClamAV can also look inside ZIP files and apply the YARA rules on all files found in archives:

20170213-155946

This is something the standard YARA tool cannot do:

20170213-160050

ClamAV’s YARA rules support does however have some limitations. You cannot use modules (like the PE file module), or use YARA rule sets that contain external variables, tags, private rules, global rules, and so on. Every rule must also have strings to search for (at least 2 bytes long); rules with a condition but without strings are not supported.

Let us take a look at a rule to detect if a file is a PE file (see appendix for the details of the rule):

20170213-160204

We get a warning from ClamAV: “yara rule contains no supported string”.

As ClamAV does not support rules without a strings: section, we must add a string to search for, even if the rule logic itself does not need it. Since a PE file contains the string MZ, let’s search for that:

20170213-160236

This time the rule triggers.

Now, a tricky case: how do we design a rule when we have no single string to search for? The ClamAV developers offer a work-around for such cases: search for any string, and add a condition checking for the presence OR absence of the string. Like this:

20170213-160310

We search for string $a = “any string will do”, and we add condition ($a or not $a). It’s a bit of a hack, but it works.

ClamAV’s file decomposition features bring a lot to the table when it comes to YARA scanning, but in some cases it can be a bit too much. For example, ClamAV decompresses the VBA macro streams in Office documents for scanning. This means that we can use YARA rules to scan VBA source code. A simple rule searching for the words AutoOpen and Declare would trigger on all Word documents with macros that run automatically and use the Windows API, which is very useful for detecting potential maldocs. However, ClamAV will apply this YARA rule to all files and decomposed/contained files. So if we feed ClamAV all kinds of files (not only MS Office files), then the rule could also trigger on, for example, text files or e-mails that contain the words AutoOpen and Declare.

If we could limit the scope of selected YARA rules to certain file types, this would help. ClamAV currently supports signatures that are only applied to given file types (PE files, OLE files, …), but unfortunately this is not supported for YARA rules.

ClamAV is an interesting alternative to the standard YARA engine for running our YARA rules. It does have some limitations, however, and it can generate false positives if we are not careful with the rules we use or design.

Deconstructing the YARA rule

Our example rule to detect a PE file contains just a condition:

uint16(0) == 0x5A4D and uint32(uint32(0x3C)) == 0x00004550

This rule does not use string searches. It checks a couple of values to determine if a file is a PE file. The checks it performs are:

  • see if the file starts with a MZ header, and;
  • contains a PE header.

First check: the first 2 bytes of the file are equal to MZ. uint16(0) == 0x5A4D.

Second check: the field (32-bit integer) at position 0x3C contains a pointer to a PE header. A PE header starts with bytes PE followed by 2 NULL bytes. uint32(uint32(0x3C)) == 0x00004550.

Functions uint16 and uint32 are little-endian, so we have to write the bytes in reverse order: MZ = 0x4D5A -> 0x5A4D
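To make the two checks concrete, here is a minimal Python sketch of the same logic. It is not part of YARA or ClamAV; the function name looks_like_pe is an illustrative choice, and the "<H" and "<I" format codes stand in for YARA's little-endian uint16 and uint32.

```python
import struct


def looks_like_pe(data: bytes) -> bool:
    """Mirror the YARA condition:
    uint16(0) == 0x5A4D and uint32(uint32(0x3C)) == 0x00004550
    """
    # The DOS header must be long enough to hold the field at 0x3C.
    if len(data) < 0x40:
        return False
    # First check: the bytes "MZ" read as a little-endian 16-bit integer.
    (mz,) = struct.unpack_from("<H", data, 0)
    if mz != 0x5A4D:
        return False
    # Second check: the 32-bit field at 0x3C points to the PE header.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if pe_offset + 4 > len(data):
        return False
    # "PE\0\0" read as a little-endian 32-bit integer is 0x00004550.
    (signature,) = struct.unpack_from("<I", data, pe_offset)
    return signature == 0x00004550
```

Reading b"MZ" (bytes 0x4D 0x5A) with "<H" yields 0x5A4D, which is the byte reversal described above.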

Maldoc: It’s not all VBA these days

Since late 2014 we have witnessed a resurgence of campaigns spamming malicious Office documents with VBA macros. Sometimes, however, we also see malicious Office documents exploiting relatively recent vulnerabilities.

In this blog post we look at a malicious MS Office document that uses an exploit instead of VBA.

The sample we received is 65495b359097c8fdce7fe30513b7c637. It exploits vulnerability CVE-2015-2545 which allows remote attackers to execute arbitrary code via a crafted EPS image, aka “Microsoft Office Malformed EPS File Vulnerability”. In this blog post we want to focus on extracting the payload.

A more detailed explanation on the exploit itself can be found here (pdf).

Analysis

The sample we received is a .docx file. With oledump.py we can confirm it doesn’t contain VBA code:

screen-shot-2017-02-06-at-10-51-58

With zipdump.py (remember that the MS Office 2007+ file format uses ZIP as a container) we can see what’s inside the document:

Screen Shot 2017-02-06 at 11.03.02.png

Looking at the extensions, we see that most files are XML files. There’s also a .gif and .eps file. The .eps file is unusual. Let’s check the start of each file to see if the extensions can be trusted:

screen-shot-2017-02-06-at-11-09-46
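As a rough illustration of this check (not the zipdump.py implementation), the sketch below uses Python's standard zipfile module to read the first bytes of every member of a ZIP container. The function name member_heads and the default of 8 bytes are assumptions made for the example.

```python
import io
import zipfile


def member_heads(zip_bytes: bytes, count: int = 8) -> dict:
    """Return the first `count` bytes of every member of a ZIP container."""
    heads = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for info in archive.infolist():
            # Reading only the head avoids loading large members in full.
            with archive.open(info) as member:
                heads[info.filename] = member.read(count)
    return heads
```

Comparing each head with the expected magic for its extension (for example <?xml for XML, GIF8 for GIF, %!PS for EPS) shows whether an extension can be trusted.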

This confirms the extensions we see: 12 XML files, a GIF file and an EPS file. As we know there are exploits for EPS, we took a look at this file first:

screen-shot-2017-02-06-at-11-58-26

The file contains a large number of lines like the ones above, so let’s get some stats first:

Screen Shot 2017-02-06 at 11.52.35.png

byte-stats.py gives us all kinds of statistics about the content of the file. First of all, it’s a large file for MS Office documents (10MB). And it’s a pure text file (it only contains printable characters (96%) and whitespace (4%)).

A large proportion of the bytes are hexadecimal characters (66%) and BASE64 characters (93%). Since the hexadecimal character set is a subset of the BASE64 character set, we need more info to determine if the file contains hexadecimal strings or BASE64 strings. But it very likely contains some, as there are parts of the file (buckets) that contain only hexadecimal/BASE64 characters (10240 100%).
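The overlap between the two character sets can be illustrated with a small Python sketch. This is not how byte-stats.py is implemented; the names HEX_CHARS, BASE64_CHARS and charset_share are invented for the example, and counting the padding character = as a BASE64 character is an assumption.

```python
import string

# Byte values (integers) belonging to each character set.
HEX_CHARS = set(string.hexdigits.encode())
BASE64_CHARS = set((string.ascii_letters + string.digits + "+/=").encode())


def charset_share(data: bytes, charset: set) -> float:
    """Percentage of bytes in `data` that belong to `charset`."""
    if not data:
        return 0.0
    hits = sum(1 for byte in data if byte in charset)
    return 100.0 * hits / len(data)
```

Because HEX_CHARS is a subset of BASE64_CHARS, every hexadecimal byte is also counted as a BASE64 byte, which is why the percentages alone cannot tell the two encodings apart.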

base64dump.py is a tool to search for BASE64 strings (and other encodings like hexadecimal). We will use it to search for a payload in the EPS file. Since the file is large, we can expect to have a lot of hits. So let’s set a minimum sequence length of 1000:

screen-shot-2017-02-06-at-12-11-31

There are 4 large sequences of BASE64 characters in the document. But as the start of each sequence (field Encoded) contains only hexadecimal characters, it’s necessary to check for hexadecimal encoding too:

screen-shot-2017-02-06-at-12-14-46

With this output, it’s clear that the EPS file contains 4 large hexadecimal strings.
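The search for long hexadecimal sequences can be approximated in a few lines of Python. This is a sketch under simple assumptions, not the base64dump.py implementation: the function name long_hex_runs is hypothetical, and a run with an odd number of characters is simply truncated by one character before decoding.

```python
import binascii
import re


def long_hex_runs(text: bytes, min_length: int = 1000):
    """Yield (offset, decoded_bytes) for each long run of hex characters."""
    pattern = re.compile(rb"[0-9A-Fa-f]{%d,}" % min_length)
    for match in pattern.finditer(text):
        run = match.group()
        # unhexlify needs an even number of hex digits.
        if len(run) % 2:
            run = run[:-1]
        yield match.start(), binascii.unhexlify(run)
```

A decoded run that contains MZ near its start is then a candidate for an embedded PE file.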

Near the start of decoded strings 2 and 4, we can see the characters MZ: this could indicate a PE file. So let’s check:

Screen Shot 2017-02-06 at 12.19.55.png

This certainly looks like a PE file. Let’s pipe it through pecheck.py (we need to skip the first 8 bytes: UUUUffff):

screen-shot-2017-02-06-at-12-32-10

This tells us that it is definitely a PE file. With more details from pecheck’s output, we can say it’s a 64-bit DLL. It has a small overlay:

screen-shot-2017-02-06-at-12-36-22

Since this overlay is actually just 8 bytes (UUUUffff), it’s not an overlay, but a “sentinel” like at the start of the hexadecimal sequence. So let’s remove this:

screen-shot-2017-02-06-at-13-43-24
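Removing the sentinel at both ends amounts to two slices. The sketch below assumes the sentinel is the 8 ASCII bytes UUUUffff, as displayed in the output; the function name strip_sentinels is illustrative.

```python
SENTINEL = b"UUUUffff"  # assumption: the sentinel as 8 ASCII bytes


def strip_sentinels(blob: bytes, sentinel: bytes = SENTINEL) -> bytes:
    """Remove the sentinel from the start and the end of a decoded blob."""
    if blob.startswith(sentinel):
        blob = blob[len(sentinel):]
    if blob.endswith(sentinel):
        blob = blob[:-len(sentinel)]
    return blob
```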

We submitted this DLL to VirusTotal: 30ec672cfcde4ea6fd3b5b14d6201c43.

It has some interesting strings:

screen-shot-2017-02-06-at-13-18-04

Examples are the string of the PDB file (GetDownLoader) and a PowerShell command to download and execute an .exe (the URL is readable).

Also notice that the string “This program cannot be run in DOS mode.” appears twice. This is a strong indication that this DLL contains another PE file.

Let’s search for it. By using the --cut option to search for another instance of the string MZ, we can cut out the embedded PE file:

screen-shot-2017-02-06-at-13-56-17
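The same idea can be sketched in Python: look for further occurrences of MZ after offset 0 and keep only those that are followed by a valid PE signature. This is an illustration, not the cut operation of the tools used above; the function names is_pe_at and embedded_pe_offsets are hypothetical.

```python
import struct


def is_pe_at(data: bytes, offset: int) -> bool:
    """Check for an MZ header at `offset` whose 0x3C field points to 'PE\\0\\0'."""
    if offset + 0x40 > len(data) or data[offset:offset + 2] != b"MZ":
        return False
    (e_lfanew,) = struct.unpack_from("<I", data, offset + 0x3C)
    pe_pos = offset + e_lfanew
    return pe_pos + 4 <= len(data) and data[pe_pos:pe_pos + 4] == b"PE\x00\x00"


def embedded_pe_offsets(data: bytes) -> list:
    """Offsets (after 0) where another PE file appears to start."""
    offsets = []
    pos = data.find(b"MZ", 1)
    while pos != -1:
        if is_pe_at(data, pos):
            offsets.append(pos)
        pos = data.find(b"MZ", pos + 1)
    return offsets
```

Slicing the data from a returned offset to the end gives a candidate embedded PE file, possibly followed by trailing bytes from the outer file.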

We also submitted this file to VirusTotal: 2938d6eda6cd941e59df3dd54bf8dad8. It is a 32-bit EXE file.

The hexadecimal string with Id 4 found with base64dump also contains a PE file. It is a 32-bit DLL (ce95faf23621a0a705b796c19d9fec44), containing the same 32-bit EXE as the 64-bit DLL: 2938d6eda6cd941e59df3dd54bf8dad8.

Conclusion

With a steady flow of VBA maldocs for more than 2 years, one would almost forget that Office maldocs with exploits are found in-the-wild too. If you just look for VBA macros in documents you receive, you will miss these exploits.

In this sample, detecting a payload was not too difficult: we found an unusual file (a large .eps file) with long hexadecimal strings that decode to PE files. It’s not always that easy, especially if we are dealing with binary MS Office files (like .doc).

In this post we focus on a static analysis method to extract the payload. When performing analysis on this file yourself, be aware that this maldoc also contains shellcode (strings 1 and 3 found by base64dump) and an exploit to break out of the sandbox (protected view).

Working with GFI Cloud anti-virus quarantine files

We were recently requested to analyse a sample that was quarantined by GFI Cloud anti-virus. Based on our previous experiences with various anti-virus products we wanted to obtain the sample directly from the quarantine rather than restoring it first. Anti-virus products use quarantine files to safely store files that were detected as being malicious and thus are deleted (or cleaned). Usually, the content of the original (malicious) files is encoded before these are stored in a quarantine file.

These quarantine files are primarily useful to restore files that were falsely detected as being malicious. From an analyst’s point of view, these quarantine files are particularly handy to determine if the file is indeed malicious or if it was erroneously quarantined.

When analysing a file that was detected and quarantined by anti-virus, we have found it to be preferable to try to extract the file directly from the quarantine file rather than through the anti-virus management console for three main reasons:

  • Restoring the quarantine file via the anti-virus management console could expose us to the risk of inadvertently opening the potentially malicious file;
  • Some anti-virus products will no longer protect us against a file restored from quarantine; it is therefore best to restore only false positives;
  • The restoring operation through the anti-virus software could also destroy metadata that is created on the quarantined file.

Additionally, malware analysts are typically not the people who administer the anti-virus solution. Grabbing these files directly from the quarantine allows the authorised administrators to safely provide potentially malicious files to the malware analysts.

GFI Cloud anti-virus quarantine files are stored inside the following folders:

C:\ProgramData\GFI Software\AntiMalware\Quarantine
C:\Users\All Users\GFI Software\AntiMalware\Quarantine

For each quarantined file, 2 files are created in the quarantine folder, with names of the following form:

QR{63D882D7-FE51-4FF5-9491-0123456789AB}53430.xml
{93F3EB8C-482C-4B27-8A78-0123456789AB}_ENC2

The first file is an XML file containing metadata, such as the MD5 hash of the quarantined file and the original name and location of the file:

screen-shot-2017-01-30-at-15-22-10

The second file contains the encoded, quarantined file (this file is referenced in the XML file):

screen-shot-2017-01-30-at-14-49-06

The encoding used in this quarantine file is simple: each byte is XORed with value 0x33:

screen-shot-2017-01-30-at-14-49-27
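Decoding such a file takes only a few lines of Python. The sketch below assumes, as described above, that every byte of the _ENC2 file is XORed with 0x33 and that nothing else is done to the content; the function names decode_enc2 and md5_hex are illustrative.

```python
import hashlib


def decode_enc2(encoded: bytes, key: int = 0x33) -> bytes:
    """XOR every byte with the single-byte key (XOR is its own inverse)."""
    return bytes(byte ^ key for byte in encoded)


def md5_hex(data: bytes) -> str:
    """MD5 of the decoded content, for comparison with the XML metadata."""
    return hashlib.md5(data).hexdigest()
```

If the MD5 of the decoded content matches the hash stored in the XML metadata file, the decoding is very likely correct. A quarantined PE file is easy to recognise even before decoding: its first two bytes are 0x7E 0x69, which is MZ XORed with 0x33.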

When a quarantined file is restored via the GFI management console, the 2 corresponding quarantine files .xml and _ENC2 are deleted and the original file is restored.

In conclusion, when you are asked to analyse a sample that has been quarantined by an anti-virus product, we recommend using the quarantine files directly for analysis, rather than restoring the quarantined file through the anti-virus management console. Using the metadata file you can easily grab the MD5 hash of the sample, and look it up on scanning services like VirusTotal. If the file cannot be found there, then decode the _ENC2 file and start analysing it in a malware lab.

Detecting py2exe Executables: YARA Rule

Following the release of the tool to decompile EXE files generated with py2exe, we release a YARA rule to detect such EXE files.

Imagine you receive an executable for analysis. If you go for static analysis, it’s useful to know how the executable was produced. For example, if it was “converted” from Python to EXE, decompiling it with a tool like Hex-Rays decompiler will not help you. Python converters like PyInstaller and py2exe don’t actually convert the Python code to machine instructions to create the executable, rather they generate an executable that contains Python bytecode and deploy a Python runtime environment to execute this bytecode. As such, you need to extract and decompile the bytecode to know what the executable does.

How do you know the executable was produced with py2exe? A good indicator is the presence of a resource named PYTHONSCRIPT. Using YARA rules it is possible to automate this detection: for this purpose we created YARA rule py2exe.

20170109-104608
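For readers without YARA at hand, the indicator can be approximated in Python. This is a rough heuristic and not the py2exe YARA rule itself: it relies on the fact that PE resource names are stored as UTF-16LE strings, and searches the whole file for PYTHONSCRIPT in that encoding without parsing the resource directory. The function name maybe_py2exe is hypothetical.

```python
RESOURCE_NAME = "PYTHONSCRIPT".encode("utf-16-le")


def maybe_py2exe(data: bytes) -> bool:
    """Heuristic: MZ header plus the UTF-16LE string PYTHONSCRIPT anywhere."""
    return data[:2] == b"MZ" and RESOURCE_NAME in data
```

Because it does not verify that the string really is a resource name, this check can produce false positives that a stricter rule would avoid.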

The idea is that you build a set of YARA rules to classify executables (another good rule to include in this set is a rule to detect PyInstaller generated executables). Then you let these rules run on your executable, and hopefully some rules will trigger and help you identify the type of executable you’re dealing with.

This rule is not an indicator of malware; it just identifies that the executable was generated with py2exe.

Decompiling py2exe Executables

We had to decompile an executable (.exe) generated with py2exe for Python 3.

py2exe takes a Python program and generates a Windows executable. This .exe file contains the Python bytecode of the program, a Python interpreter and all the necessary modules. The bytecode is stored as a resource inside the .exe file.

unpy2exe will extract the Python bytecode as a pyc file from the .exe file, which can then be decompiled with uncompyle6. Unfortunately, unpy2exe does not support files generated with py2exe for Python 3.

We release our program decompile-py2exe to handle py2exe Python 3 executables. It is simple to use:

20170102-110208

decompile-py2exe takes an executable as argument, extracts the Python bytecode and decompiles it with uncompyle6, all in one step. The executable can also be passed via stdin or inside a (password protected) ZIP file. Be sure to use Python 3 to run decompile-py2exe.

 

PDF Analysis: Back To Basics

When you receive a suspicious PDF these days, it could be just a scam without malicious code. Let’s see how to analyze such samples with PDF Tools.

As always, we first take a look with pdfid:

20161228-111628

There’s nothing special to see, but we have to check the content of the object streams (/ObjStm):

20161228-111805

Still nothing special to see. This could be a malicious PDF document with a pure binary exploit (e.g. without using JavaScript), but nowadays, it’s more likely that we received a PDF containing links to a malicious website, like a phishing website.

To check for URLs, use option search (-s) to search for the string uri (the search option is not case sensitive):

20161228-111841

And indeed we find objects with URIs. These are links tied to a rectangle, thus a zone that must be clicked by the user to “activate” the URL: Adobe Reader will display a warning, and after user acceptance, the default browser will be launched to visit the given URL.
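The search can be approximated with a short Python sketch. It is not a PDF parser and does not reproduce pdf-parser: it assumes URIs are written as literal strings in parentheses without escaped parentheses, and that compressed streams use plain zlib (FlateDecode) without further filters. The function name find_uris is hypothetical.

```python
import re
import zlib

URI_PATTERN = re.compile(rb"/URI\s*\((.*?)\)", re.IGNORECASE | re.DOTALL)
STREAM_PATTERN = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)


def find_uris(pdf_bytes: bytes) -> list:
    """Collect /URI (...) values from the raw file and from inflated streams."""
    chunks = [pdf_bytes]
    for match in STREAM_PATTERN.finditer(pdf_bytes):
        try:
            # decompressobj tolerates the end-of-line bytes before 'endstream'.
            chunks.append(zlib.decompressobj().decompress(match.group(1)))
        except zlib.error:
            continue  # not a zlib stream (image data, other filters, ...)
    uris = []
    for chunk in chunks:
        for match in URI_PATTERN.finditer(chunk):
            uri = match.group(1).decode("latin-1")
            if uri not in uris:
                uris.append(uri)
    return uris
```

Inflating the streams matters here because objects stored inside object streams (/ObjStm) are compressed and would not be found by searching the raw file alone.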

pdf-parser also has an option to select key-value pairs from dictionaries of PDF objects: option -k. This is useful to generate a quick overview. This option is case sensitive, and the full keyname must be provided:

20161228-111902

When we open the PDF document with Adobe Reader, we get visual confirmation that it is a phishing PDF:

20161213-174821.png

And this is the phishing website:

20161213-175330.png

Conclusion: if pdfid reports nothing suspicious, before looking for binary exploits (for example with pdf-parser’s YARA support), search first for URIs with pdf-parser.