Tag Archives: majestic million

Filtering out top 1 million domains from corporate network traffic

During network traffic analysis and malware investigations, we often use IP and domain reputation lists to quickly filter out traffic we can expect to be benign. This typically includes filtering out traffic related to the top X most popular websites world-wide.

For some detection mechanisms, this technique of filtering out popular traffic is not recommended – for example, for the detection of phishing e-mail addresses (which are often hosted on respected & popular e-mail hosting services such as Gmail). However for the detection of certain network attacks, this can be very effective at filtering out unnecessary noise (for example, the detection of custom Command & Control domains that impersonate legitimate services).

For this type of filtering, the Alexa top 1 million was often used in the past, however this free service was recently retired. A more recent & up to date alternative is the Majestic Million.

We ran some tests against a few million DNS requests in a corporate network environment, and measured the impact of filtering out the most popular domain names.

Top 1000.PNG

Progressively filtering out the Top 1000 most popular domains from network traffic. The Y-axis shows the % of traffic that matches the Top list for a given size.

When progressively filtering out more domains from the top 1000 of most popular domains, we notice a steep increase of matched traffic, especially for the top 500 domains. By just filtering out the top 100 most popular domains (including Facebook, Google, Instagram, Twitter, etc.) we already filter out 12% of our traffic. For the top 1000, we arrive at 24%.

We can also do the same exercise for the top 1 million of most popular domains, resulting in the graph below.

Top 1000000.PNG

Progressively filtering out the Top 1.000.000 most popular domains from network traffic. The Y-axis shows the % of traffic that matches the Top list for a given size.

When progressively filtering out all top 1 million domains from DNS traffic, we notice a progressive increase for the top 500.000 (with a noticeable jump around that mark for this particular case, caused by a few popular region-specific domains). Between the top 500.000 and the top 1.000.000, only an additional 11% of traffic is filtered out.

Based on the above charts, we see that the “sweet spot” in this case lies around 100.000, where we still filter out a significant amount of traffic (49%) while lowering the risk of filtering out too many domains which could be useful for our analysis. Finding this appropriate threshold greatly depends on the type of analysis you are doing, as well as the particular environment you are working in; however, we have seen similarly shaped charts for other corporate environments, too, with non-significant differences in terms of the observed trends.

As you can see, this is a highly effective strategy for filtering out possible less interesting traffic for investigations. Do note that there’s a disclaimer to be made: several of the 1Million top domains are expired and can thus be re-registered by adversaries and subsequently used for Command & Control! As always in cyber security, there’s no silver bullet but we hope you enjoyed our analysis :-).

About the author
Daan Raman is in charge of NVISO Labs, the research arm of NVISO. Together with the team, he drives initiatives around innovation to ensure we stay on top of our game; innovating the things we do, the technology we use and the way we work form an essential part of this. Daan doesn’t like to write about himself in third-person. You can contact him at draman@nviso.be or find Daan online on Twitter and LinkedIn.