Hunting for Emotet with URLhaus

Emotet started out as a banking trojan around 2014, but has evolved to (primarily) deliver ransomware via malicious emails (malspam). Infections occur either through malicious scripts on web pages or through MS Office files with macros enabled, arriving as email attachments or downloads. It often reaches victims by way of a legitimate-looking email, and the subject line and URL pattern usually contain words like ‘invoice’, ‘payment’, or similar.

Emotet disappeared for many months in mid-2019, but in late September/early October 2019, new command and control (C&C) servers were brought online and new malspam started going out to potential victims.

You may want to get started tracking Emotet, or look at how it operates in order to identify it in your environment. You can do so manually, but tracking activity programmatically is probably more efficient.

There are many open-source resources for searching for threats like Emotet. One great resource we’re going to look at is URLhaus.

You can browse URLhaus to find URLs that have been submitted. It can be searched by tag, status, or the username that reported a URL.

The feeds page has search fields for ASN, Country and TLD information.

It’s also possible to download a comma-separated list of data from here.

We’ll look at working with that CSV file by downloading and analyzing it with Python.

The only import we need is requests:

import requests

Define the filename for the data you’ll be saving:

urlhaus_data = 'urlhaus_data.txt'

Write a function to handle the download:

def download_urlhaus_list():
    res = requests.get('https://urlhaus.abuse.ch/downloads/csv/')
    # 'wb' overwrites any previous copy instead of appending a duplicate
    with open(urlhaus_data, 'wb') as savefile:
        for chunk in res.iter_content(100000):
            savefile.write(chunk)

And call the function:

download_urlhaus_list()

When you run it, it will download the contents of https://urlhaus.abuse.ch/downloads/csv/ and save them as urlhaus_data.txt, exactly as they appear on the website.

Next, let’s modify the script by adding a function to analyze the downloaded file.

The function will be called analyze_data(), and it takes the filename we stored in urlhaus_data as its input.

Each line of the data looks like:

212922","2019-07-01 05:17:09","http://35.245.198.20/F/3058740","online","malware_download","exe,Formbook","https://urlhaus.abuse.ch/url/212922/","abuse_ch"

We use strip() to remove the trailing newline and any surrounding whitespace from each line.

We want to take the items between the commas and put them in a list so we can access any of the fields by index. We aren’t interested in the surrounding quote marks, so rather than splitting on the comma alone, we’ll split on "," with:

item = item.split('\",\"')

The backslashes escape the " marks. Since the string is delimited by single quotes, they aren’t strictly required here (splitting on '","' gives the same result), but they make it explicit that we’re splitting on the quote-comma-quote sequence between fields.
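For example, both forms produce the same result:

print('a","b'.split('\",\"'))
print('a","b'.split('","'))

Output (both): ['a', 'b']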

The top of the file contains header and comment lines that we want to ignore, so we wrap the parsing in a try/except block and skip any line that doesn’t match the expected format.

After making each line into a list of items, it looks like this:

['"212922', '2019-07-01 05:17:09', 'http://35.245.198.20/F/3058740', 'online', 'malware_download', 'exe,Formbook', 'https://urlhaus.abuse.ch/url/212922/', 'abuse_ch"']

Inside the try block, the items can all be assigned to variables, like this:

id, dateadded, url, url_status, threat, tags, urlhaus_link = item[0], item[1], item[2], item[3], item[4], item[5], item[6]

And then we can identify only the items tagged as emotet, using:

if 'emotet' in tags:

And we can print only the domain by splitting the url variable on the forward slash with:

print(url.split('/')[2])
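To see why index 2 is the host, here’s a quick sketch of what the split produces for the sample URL above (the urlparse alternative is just a standard-library option, not part of the script):

url = 'http://35.245.198.20/F/3058740'
print(url.split('/'))       # ['http:', '', '35.245.198.20', 'F', '3058740']
print(url.split('/')[2])    # 35.245.198.20, the host

# urllib.parse from the standard library extracts the same host more robustly
from urllib.parse import urlparse
print(urlparse(url).netloc)  # 35.245.198.20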

Here’s the complete analyze_data() function:

def analyze_data(urlhaus_data):
    with open(urlhaus_data, 'r') as data:
        for item in data:
            # Strip the trailing newline, then split the line on ","
            item = item.strip()
            item = item.split('\",\"')
            try:
                id, dateadded, url, url_status, threat, tags, urlhaus_link = item[0], item[1], item[2], item[3], item[4], item[5], item[6]
                if 'emotet' in tags:
                    # Index 2 of the URL split on '/' is the host
                    print(url.split('/')[2])
            except IndexError:
                # Header and comment lines don't split into enough fields
                pass

And we call it with:

analyze_data(urlhaus_data)

Some things to note:

This is very basic. There is so much more that can be done, such as:

  • Store all the domains in a list and print out only the unique ones, in case there are duplicates (see the sketch after this list). For example, if you have a list where some of the domains are repeated, like:

    domains = ['domain1.com','domain1.com','domain1.com','domain2.com']

    it can easily be de-duplicated by using the set() command, like so:

    print(set(domains))

    Output: {'domain2.com', 'domain1.com'}
  • Look up each domain using a third party API or look up the A records and put everything on a map
  • Analyze the URL structure to find patterns (I’ll be writing about this soon)
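As a rough sketch of the first two ideas, here’s one way analyze_data() could be adapted to collect unique Emotet hosts and then look up their A records with the standard library. The names collect_emotet_domains() and resolve_domains() are just illustrative, not part of the script above:

import socket

def collect_emotet_domains(urlhaus_data):
    # Same parsing as analyze_data(), but collect each host once instead of printing it
    domains = set()
    with open(urlhaus_data, 'r') as data:
        for item in data:
            item = item.strip().split('","')
            try:
                url, tags = item[2], item[5]
                if 'emotet' in tags:
                    domains.add(url.split('/')[2])
            except IndexError:
                pass
    return domains

def resolve_domains(domains):
    # Look up the A record for each host; bare IP addresses resolve to themselves
    for domain in sorted(domains):
        try:
            print(domain, socket.gethostbyname(domain))
        except socket.gaierror:
            print(domain, 'did not resolve')

resolve_domains(collect_emotet_domains('urlhaus_data.txt'))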

If you want to use the methods written about here, it’s probably best to download the file using the download_urlhaus_list() function the first time you run the script. After downloading, comment it out and run only the analyze_data() function so you can try different techniques within it. This stops re-downloading the CSV over and over again (it’s not polite or necessary to download the data again every time you run a script – and some sources will block you if they notice you scraping their content too often). However, if you want an ongoing import of new data, you could download the file once a day, importing only the new entries since the last time.
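As a rough sketch of that incremental approach: each new submission gets a higher id in the CSV, so you could remember the highest ID processed so far in a small state file and only handle larger IDs on the next run. The helper name import_new_entries() and the state file last_id.txt are just for illustration:

def import_new_entries(urlhaus_data, state_file='last_id.txt'):
    # Read the highest ID processed on a previous run (0 if this is the first run)
    try:
        with open(state_file, 'r') as f:
            last_id = int(f.read().strip())
    except (FileNotFoundError, ValueError):
        last_id = 0

    newest = last_id
    with open(urlhaus_data, 'r') as data:
        for item in data:
            item = item.strip().split('","')
            try:
                entry_id = int(item[0].lstrip('"'))
            except (IndexError, ValueError):
                continue  # header/comment lines don't have a numeric ID
            if entry_id > last_id:
                newest = max(newest, entry_id)
                # process only this new entry here, e.g. check its tags

    # Remember where we got to for the next run
    with open(state_file, 'w') as f:
        f.write(str(newest))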

It’s important to mention that sites like URLhaus rely on community submissions. If you come across malicious URLs during your own hunting, it’s a good idea to submit them.

I’ll be writing about how to submit via a Python script soon. URLhaus has some instructions on submitting via the web interface or by using a POST request here.


The complete script:

import requests

urlhaus_data = 'urlhaus_data.txt'


def download_urlhaus_list():
    res = requests.get('https://urlhaus.abuse.ch/downloads/csv/')
    # 'wb' overwrites any previous copy instead of appending a duplicate
    with open(urlhaus_data, 'wb') as savefile:
        for chunk in res.iter_content(100000):
            savefile.write(chunk)

def analyze_data(urlhaus_data):
    with open(urlhaus_data, 'r') as data:
        for item in data:
            # Strip the trailing newline, then split the line on ","
            item = item.strip()
            item = item.split('\",\"')
            try:
                id, dateadded, url, url_status, threat, tags, urlhaus_link = item[0], item[1], item[2], item[3], item[4], item[5], item[6]
                if 'emotet' in tags:
                    # Index 2 of the URL split on '/' is the host
                    print(url.split('/')[2])
            except IndexError:
                # Header and comment lines don't split into enough fields
                pass

download_urlhaus_list()
analyze_data(urlhaus_data)