Author Archives: Daan Raman

My Internship Experience at NVISO – by Thibaut Flochon

Hi! I’m Thibaut, a bachelor’s student in Information Technology at Hénallux. As a final-year student, I had the opportunity to do a four-month internship at NVISO. Let me share this experience with you!

Why NVISO?

The year before my internship, I took part in the Cyber Security Challenge Belgium 2017 with some friends. Playing with Wireshark captures, trying to brute-force passwords, analysing files: these activities really made me want to know more about the cyber security field. That’s why the following year, when the time came to do my internship, I immediately thought about cyber security and told myself: “Why not do my internship at the company organising the annual competition?” And here I was, an intern at NVISO!


Quick! Let’s all look professional for the photo! 😀 (I’m sitting on the right side of the sofa, joined by three of my internship colleagues at our office.)

What did I do?

The goal of my internship was to create a cyber attack scenario within NVISO’s training infrastructure. Starting with basically no knowledge of post-compromise attacks against a Windows domain, a great learning journey was waiting for me.

At first, I had a meeting with my internship mentor, Wouter, to talk a bit about the “what” and “why” of my project. I could then start to gather some information on the “how”, in order to figure out the possibilities and tools I could use to create that cyber attack scenario.

Once I had found enough information and resources for my project, Wouter kindly guided me to the right employee to ask questions to, so that I could sort out what I had found, keep what was necessary, and be sure to create something useful for the company. Following that meeting, lunch time! After grabbing a sandwich with other employees and interns, we all ate together in the cafeteria, as is the NVISO team’s habit.

As the weeks passed and through regular follow-ups and meetings, I was glad to see my project really move forward; the cyber attack scenario was almost finished.

Time flies! A few weeks later, it was already the end of my internship. I gave a PowerPoint presentation covering the finished scenario and the results of my internship!

Conclusion

I had a great time during this internship; I learnt a lot about attacks against Windows domains, I improved my soft skills by explaining technical issues in an understandable way, and I laughed a lot with my colleagues! Until next time!

Filtering out top 1 million domains from corporate network traffic

During network traffic analysis and malware investigations, we often use IP and domain reputation lists to quickly filter out traffic we can expect to be benign. This typically includes filtering out traffic related to the top X most popular websites worldwide.

For some detection mechanisms, this technique of filtering out popular traffic is not recommended – for example, for the detection of phishing e-mail addresses (which are often hosted on respected & popular e-mail hosting services such as Gmail). However, for the detection of certain network attacks, it can be very effective at filtering out unnecessary noise (for example, the detection of custom Command & Control domains that impersonate legitimate services).

In the past, the Alexa top 1 million was often used for this type of filtering; however, this free service was recently retired. A more recent and up-to-date alternative is the Majestic Million.

We ran some tests against a few million DNS requests in a corporate network environment, and measured the impact of filtering out the most popular domain names.


Progressively filtering out the Top 1000 most popular domains from network traffic. The Y-axis shows the % of traffic that matches the Top list for a given size.

When progressively filtering out more domains from the top 1000 most popular domains, we notice a steep increase in matched traffic, especially for the top 500 domains. By filtering out just the top 100 most popular domains (including Facebook, Google, Instagram, Twitter, etc.), we already filter out 12% of our traffic. For the top 1000, we arrive at 24%.
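As an illustration of how such a measurement could be scripted, here is a minimal Python sketch. The DNS log format (one queried domain per line in dns_queries.txt) is a placeholder of our own, and we assume the Majestic Million CSV exposes a “Domain” column; adapt both to your own environment.

import csv

# Load the Majestic Million list (https://majestic.com/reports/majestic-million).
# Assumption: the CSV has a "Domain" column, ordered by popularity.
with open("majestic_million.csv", newline="") as f:
    top_domains = [row["Domain"].lower() for row in csv.DictReader(f)]

# Placeholder input: one queried domain per line, extracted from DNS logs.
with open("dns_queries.txt") as f:
    queries = [line.strip().lower().rstrip(".") for line in f if line.strip()]

def matches(query, allowed):
    """True if the query equals a domain in the allowed set, or is a subdomain of one."""
    parts = query.split(".")
    # Check the name itself and every parent:
    # a.b.example.com -> b.example.com -> example.com -> com
    return any(".".join(parts[i:]) in allowed for i in range(len(parts)))

for size in (100, 500, 1000, 10000, 100000, 500000, 1000000):
    allowed = set(top_domains[:size])
    hits = sum(1 for q in queries if matches(q, allowed))
    print(f"Top {size:>9}: {hits / len(queries):6.1%} of queries filtered out")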

We can also do the same exercise for the top 1 million of most popular domains, resulting in the graph below.


Progressively filtering out the Top 1.000.000 most popular domains from network traffic. The Y-axis shows the % of traffic that matches the Top list for a given size.

When progressively filtering out all top 1 million domains from DNS traffic, we notice a progressive increase for the first 500.000 domains (with a noticeable jump around that mark in this particular case, caused by a few popular region-specific domains). Between the top 500.000 and the top 1.000.000, only an additional 11% of traffic is filtered out.

Based on the above charts, we see that the “sweet spot” in this case lies around 100.000, where we still filter out a significant amount of traffic (49%) while lowering the risk of filtering out too many domains that could be useful for our analysis. Finding the appropriate threshold greatly depends on the type of analysis you are doing, as well as the particular environment you are working in; however, we have seen similarly shaped charts for other corporate environments too, with no significant differences in the observed trends.

As you can see, this is a highly effective strategy for filtering out potentially less interesting traffic during investigations. Do note that there is a disclaimer to be made: several of the top 1 million domains are expired and can thus be re-registered by adversaries and subsequently used for Command & Control! As always in cyber security, there is no silver bullet, but we hope you enjoyed our analysis :-).

About the author
Daan Raman is in charge of NVISO Labs, the research arm of NVISO. Together with the team, he drives initiatives around innovation to ensure we stay on top of our game; innovating the things we do, the technology we use and the way we work form an essential part of this. Daan doesn’t like to write about himself in third-person. You can contact him at draman@nviso.be or find Daan online on Twitter and LinkedIn.

Going beyond Wireshark: experiments in visualising network traffic

Introduction
At NVISO Labs, we are constantly trying to find better ways of understanding the data our analysts are looking at. This ranges from our SOC analysts looking at millions of collected data points per day all the way to the malware analyst tearing apart a malware sample and trying to make sense of its behaviour.

In this context, our work often involves investigating raw network traffic logs. Analysing these often takes a lot of time: a 1 MB network traffic capture (PCAP) can easily contain several hundred packets (and most captures are much larger!). Going through these network events in traditional tools such as Wireshark is extremely valuable; however, these tools are not always the best way to quickly understand, at a higher level, what is actually going on.

In this blog post we want to perform a series of experiments to try and improve our understanding of captured network traffic as intuitively as possible, by exclusively using interactive visualisations.


A screen we are all familiar with – our beloved Wireshark! Unmatched capabilities to analyse even the most exotic protocols, but scrolling & filtering through events can be daunting if we want to quickly understand what is actually happening inside the PCAP. The screenshot shows a 15 KB sample containing 112 packets.

For this blog post, we will use this simple 112-packet PCAP to experiment with novel ways of visualising and understanding our network data. Let’s go!

Experiment 1 – Visualising network traffic using graph nodes
As a first step, we simply represent all IP packets in our PCAP as unconnected graph nodes. Each dot in the visualisation represents the source of a packet. A packet being sent from source A to destination B is visualised as the dot visually traveling from A to B. This simple principle is highlighted below. For our experiments, the time dimension is normalised: packets traveling from A to B are visualised in the order they took place, but we don’t distinguish the duration between packets for now.


IP traffic illustrated as interactive nodes.
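The visualisation layer itself is not something we are releasing in this post, but the event stream that drives such an animation is easy to extract. Below is a minimal sketch using scapy; the capture file name is a placeholder.

from scapy.all import rdpcap, IP

packets = rdpcap("capture.pcap")

# Build the ordered list of (source, destination) pairs that drives the
# animation: one "traveling dot" per IP packet, in capture order.
events = [(pkt[IP].src, pkt[IP].dst) for pkt in packets if IP in pkt]

# The unique endpoints become the graph nodes.
nodes = sorted({ip for pair in events for ip in pair})

print(f"{len(nodes)} nodes, {len(events)} packet events")
for src, dst in events[:5]:
    print(f"{src} -> {dst}")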

This visualisation already allows us to quickly see and understand a few things:

  • We quickly see which IP addresses are most actively communicating with each other (172.29.0.111 and 172.29.0.255)
  • It’s quickly visible which hosts account for the “bulk” of the traffic
  • We see how interactions between systems change as time moves on.

A few shortcomings of this first experiment include:

  • We have no clue on the actual content of the network communication (is this DNS? HTTP? Something else?)
  • IP addresses are made to be processed by a computer, not by a human; adding additional context to make them easier to classify by a human analyst would definitely help.

Experiment 2 – Adding context to IP address information
By using basic information to enrich the nodes in our graph, we can aggregate all LAN traffic into a single node. This lets us quickly see which external systems our LAN hosts are communicating with:


Tagging the LAN segment clearly shows which packets leave the network.
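In code, this enrichment step can be as simple as collapsing all private (RFC 1918) addresses into one node; a small sketch, reusing the (src, dst) events from the previous sketch:

import ipaddress

def node_label(ip):
    """Collapse all private (RFC 1918) addresses into a single "LAN" node."""
    return "LAN" if ipaddress.ip_address(ip).is_private else ip

# Example events as extracted in the previous sketch.
events = [("172.29.0.111", "8.8.8.8"), ("172.29.0.111", "172.29.0.255")]
tagged = [(node_label(src), node_label(dst)) for src, dst in events]
print(tagged)  # [('LAN', '8.8.8.8'), ('LAN', 'LAN')]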

By doing this, we have improved our understanding of the data in a few ways:

  • We can very quickly see which traffic is leaving our LAN, and which external (internet facing) systems are involved in the communication.
  • All the internal LAN traffic is now represented as a single node in our graph; this can be interesting in case our analyst wants to quickly check which network segments are involved in the communication.

However, we still face a few shortcomings:

  • We still don’t really have a clue on the actual content of the network communication (is this DNS? HTTP? Something else?)
  • We don’t know much about the external systems that are being contacted.


Experiment 3 – Isolating specific types of traffic
By applying simple visual filters to our simulation (just like we do in tools like Wireshark), we can make a selection of the packets we want to investigate. The benefit of doing this is that we can easily focus on the type of traffic we want to investigate without being burdened by things we don’t care about at that point in the analysis.

In the example below, we have isolated DNS traffic; we quickly see that the PCAP contains communication between hosts in our LAN (remember, the LAN dot now represents traffic from multiple hosts!) and two different DNS servers.


When isolating DNS traffic in our graph, we clearly see communication with a non-corporate DNS server.

Once we notice that the rogue DNS server is being contacted by a LAN host, we can change our visualisation to see which domain name is being queried through which server. We also conveniently attached the context tag “Suspicious DNS server” to host 68.87.71.230 (the result of our previous experiment). The result is illustrated below. It also shows that we are not limited to showing relations between IP addresses; we can, for example, illustrate the link between DNS servers and the domain names they query.


We clearly see the suspicious DNS server making requests to two different domain names. For each request made by the suspicious DNS server, we see an interaction from a host in the LAN.
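The underlying selection logic is straightforward. As a sketch, the scapy snippet below isolates DNS queries and maps each DNS server to the names queried through it; the capture file name is a placeholder, and the suspicious-server tag simply reuses the IP address from this example.

from collections import defaultdict
from scapy.all import rdpcap, DNS, DNSQR, IP

packets = rdpcap("capture.pcap")

# Map each DNS server to the set of names queried through it.
queries_per_server = defaultdict(set)
for pkt in packets:
    if IP in pkt and pkt.haslayer(DNSQR) and pkt[DNS].qr == 0:  # qr == 0: query
        server = pkt[IP].dst
        qname = pkt[DNSQR].qname.decode(errors="replace").rstrip(".")
        queries_per_server[server].add(qname)

for server, names in sorted(queries_per_server.items()):
    tag = "  <-- Suspicious DNS server" if server == "68.87.71.230" else ""
    print(f"{server}: {sorted(names)}{tag}")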

Even with larger network captures, we can use this technique to quickly visualise connectivity to suspicious systems. In the example below, we can quickly see that the bulk of all DNS traffic is being sent to the trusted corporate DNS server, whereas a single host is interacting with the suspicious DNS server we identified before.


So what’s next?
Nothing keeps us from entirely changing the relations between nodes. Typically, we are used to visualising packets as going from IP address A to IP address B; however, more exotic visualisations are possible (think about the relations between user agents and domains, query lengths and DNS requests, etc.). In addition, there is plenty of opportunity to add more context to our graph nodes (link an IP address to a geographical location, correlate domain names with Indicators of Compromise, use whitelists and blacklists to more quickly distinguish baseline traffic from other traffic, etc.). These are topics we want to explore further.
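As an example of such enrichment, the sketch below tags a node with its country using the geoip2 package; this assumes you have downloaded MaxMind’s free GeoLite2 database, which is not covered in this post.

import geoip2.database
import geoip2.errors

# Assumption: the free GeoLite2 country database has been downloaded from MaxMind.
reader = geoip2.database.Reader("GeoLite2-Country.mmdb")

def country_tag(ip):
    """Return an ISO country code to enrich a graph node, or None if unknown (e.g., LAN)."""
    try:
        return reader.country(ip).country.iso_code
    except geoip2.errors.AddressNotFoundError:
        return None

print(country_tag("8.8.8.8"))  # e.g., "US"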

Going forward, we plan on continuing to share our experiments and insights with the community around this topic! Depending on where this takes us, we plan on releasing part of a more complete toolset to the community, too.


Making our lab rats happy, 1 piece of cheese at a time!

Squeesh out! 🐀


Using binsnitch.py to detect files touched by malware

Yesterday, we released binsnitch.py – a tool you can use to detect unwanted changes to the file system. The tool and documentation are available here: https://github.com/NVISO-BE/binsnitch.

Binsnitch can be used to detect silent (unwanted) changes to files on your system. It will scan a given directory recursively for files and keep track of any changes it detects, based on the SHA256 hash of the file. You have the option to either track executable files (based on a static list available in the source code), or all files.
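The core hash-and-compare loop is simple; the heavily simplified sketch below illustrates the idea (this is not the actual binsnitch.py code – refer to the GitHub repository for that):

import hashlib
import json
import os

DB_FILE = "db.json"

def sha256_of(path):
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def scan(root):
    """Recursively hash files under root and report changes since the last scan."""
    db = json.load(open(DB_FILE)) if os.path.exists(DB_FILE) else {}
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                digest = sha256_of(path)
            except OSError:
                continue  # unreadable file (locked, permissions, ...)
            if path in db and db[path] != digest:
                print(f"ALERT: {path} was modified (new hash: {digest})")
            db[path] = digest
    with open(DB_FILE, "w") as f:
        json.dump(db, f, indent=2)

scan("C:/")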

Binsnitch.py can be used for a variety of use cases, including:

  • Use binsnitch.py to create a baseline of trusted files for a workstation (golden image) and use it again later on to automatically generate a list of all modifications made to that system (for example caused by rogue executables installed by users, or dropped malware files). The baseline could also be used for other detection purposes later on (e.g., in a whitelist);
  • Use binsnitch.py to automatically generate hashes of executables (or all files if you are feeling adventurous) in a certain directory (and its subdirectories);
  • Use binsnitch.py during live malware analysis to carefully track which files are touched by malware (this is the topic of this blog post).

In this blog post, we will use binsnitch.py during the analysis of a malware sample (VirusTotal link: https://virustotal.com/en/file/adb63fa734946d7a7bb7d61c88c133b58a6390a1e1cb045358bfea04f1639d3a/analysis/).

A summary of the options available at the time of writing in binsnitch.py:

usage: binsnitch.py [-h] [-v] [-s] [-a] [-n] [-b] [-w] dir

positional arguments:
  dir               the directory to monitor

optional arguments:
  -h, --help        show this help message and exit
  -v, --verbose     increase output verbosity
  -s, --singlepass  do a single pass over all files
  -a, --all         keep track of all files, not only executables
  -n, --new         alert on new files too, not only on modified files
  -b, --baseline    do not generate alerts (useful to create baseline)
  -w, --wipe        start with a clean db.json and alerts.log file

We are going to use binsnitch.py to detect which files are created or modified by the sample. We start our analysis by creating a “baseline” of all the executable files in the system. We will then execute the malware and run binsnitch.py again to detect changes to disk.

Creating the baseline


Command to create the baseline of our entire system.

We only need a single pass of the file system to generate the clean baseline of our system (using the “-s” option). In addition, we are not interested in generating any alerts yet (again: we are merely generating a baseline here!), hence the “-b” option (baseline). Finally, we run with the “-w” argument to start with a clean database file.
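Since the original screenshot is not reproduced here: based on the flags described above, the baseline command looks like this (the scanned directory is “C:/” in our case):

python binsnitch.py -s -b -w C:/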

After launching the command, binsnitch.py will start hashing all the executable files it discovers, and write the results to a folder called binsnitch_data. This can take a while, especially if you scan an entire drive (“C:/” in this case).


Baseline creation in progress … time to fetch some cheese in the meantime! 🐀 🧀

After the command has completed, we check the alerts file in “binsnitch_data/alerts.log”. As we ran with the “-b” flag to generate a baseline, we don’t expect to see alerts:


Baseline successfully created! No alerts in the file, as we expected.

Looks good! The baseline was created in 7 minutes.

We are now ready to launch our malware and let it do its thing (of course, we do this step in a fully isolated sandbox environment).

Running the malware sample and analysing changes

Next, we run the malware sample. After that, we can run binsnitch.py again to check which executable files have been created (or modified):


Scanning our system again to detect changes to disk performed by the sample.

We again use the “-s” flag to do a single pass of all executable files on the “C:/” drive. In addition, we also provide the “-n” flag: this ensures we are not only alerted on modified executable files, but also on new files that might have been created since the creation of the baseline. Don’t run using the “-w” flag this time, as this would wipe the baseline results. Optionally, you could also add the “-a” flag, which would track ALL files (not only executable files). If you do so, make sure your baseline is also created using the “-a” flag (otherwise, you will be facing a ton of alerts in the next step!).
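Again, in place of the screenshot: based on the flags described above, the rescan command looks like this (note: “-n” added, “-w” dropped):

python binsnitch.py -s -n C:/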

Running the command above will again take a few minutes (in our example, it took 2 minutes to rescan the entire “C:/” drive for changes). The resulting alerts file (“binsnitch_data/alerts.log”) looks as follows:


Bingo! We can now clearly spot suspicious behaviour by observing the alerts.log file. 🔥

A few observations based on the above:

  • The malware file itself was detected in “C:/malware”. This is normal of course, since the malware file was not present in our baseline; we had to copy it there in order to run it;
  • A bunch of new files are detected in the “C:/Program Files (x86)/” folder;
  • More suspicious though are the new executable files created in “C:/Users/admin/AppData/Local/Temp” and the startup folder.

The SHA256 hash of the newly created startup item is readily available in the alerts.log file: 8b030f151c855e24748a08c234cfd518d2bae6ac6075b544d775f93c4c0af2f3

Doing a quick VirusTotal search for this hash results in a clear “hit” confirming our suspicion that this sample is malicious (see below). The filename on VirusTotal also matches the filename of the executable created in the C:/Users/admin/AppData/Local/Temp folder (“A Bastard’s Tale.exe”).


VirusTotal confirms that the dropped file is malicious.
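This kind of lookup is also easy to automate. The sketch below uses the current VirusTotal REST API (v3); the API key is a placeholder, and an API key is required.

import requests

API_KEY = "YOUR_VT_API_KEY"  # placeholder: requires a (free) VirusTotal API key
SHA256 = "8b030f151c855e24748a08c234cfd518d2bae6ac6075b544d775f93c4c0af2f3"

resp = requests.get(
    f"https://www.virustotal.com/api/v3/files/{SHA256}",
    headers={"x-apikey": API_KEY},
)
resp.raise_for_status()
stats = resp.json()["data"]["attributes"]["last_analysis_stats"]
print(f"{stats['malicious']} engines flag this file as malicious")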

You can also dive deeper into the details of the scan by opening “binsnitch_data/data.json” (warning, this file can grow huge over time, especially when using the “-a” option!):


Details on the scanned files. In case a file is modified over time, the different hashes per file will be tracked here, too.

From here on, you would continue your investigation into the behaviour of the sample (network, services, memory, etc.) but this is outside the scope of this blog post.

We hope you find binsnitch.py useful during your own investigations; let us know on GitHub if you have any suggestions for improvements, or if you want to contribute yourself!

Squeak out! 🐁

Daan