Putting data on a map

When threat hunting, putting information on a map can be useful at times. In most cases, maps are eye candy that provide little valuable information. However, there are times when a map provides context in support of other information.

You may have a list of related IP addresses or domains and want to see if they’re in the same physical area to determine if only a certain region is being targeted, or maybe you just want to know where the majority of attacks are coming from when analyzing a log file.

Let’s work with the following scenario: You’ve downloaded, from somewhere like URLhaus, a list of domains that have been distributing Emotet malspam, and you want to see where they are geographically located.

Your data is 38,267 unique domain names and IP addresses, saved in ‘emotet_unique.txt’, and it looks like this:

000359[.]xyz
01asdfceas1234[.]com
021shanghaitan[.]com
024dna[.]cn

So it’s one domain per line and each domain is defanged: the ‘.’ is surrounded by brackets. This is in case your domain list is shared in some program that automatically sees it as a domain and makes it clickable. It’s generally considered bad form to share clickable IOCs.
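As a small illustration of the defanging convention, here is a minimal sketch that converts between the two forms. The helper names defang() and refang() are hypothetical and not part of the script, and the sketch only handles the ‘[.]’ style shown above.

```python
def refang(indicator):
    """Return the indicator with '[.]' turned back into '.'."""
    return indicator.strip().replace('[.]', '.')


def defang(indicator):
    """Return the indicator with every '.' wrapped in brackets."""
    # Refang first so already-defanged input isn't bracketed twice.
    return refang(indicator).replace('.', '[.]')


if __name__ == '__main__':
    print(refang('024dna[.]cn'))
    print(defang('024dna.cn'))
```

The script itself only needs the refang direction, which it does inline with replace() when reading the file.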

For this script, you’ll need a couple of built-in modules and the following external modules:

1: folium: This is a mapping tool. It will allow you to create a local html file to display the map.

pip install folium

2: pandas: A data analysis library. It’ll be used for organizing and formatting the data.

pip install pandas

3: dnspython: Performs resolution of the domain to find the A record.

pip install dnspython

If pip isn’t an option, you may need to download the tar.gz file, extract it and install with python (or python3) setup.py install

Note: I had previously suggested that we use pygeoip, but a colleague of mine, @k3v_b0t, modified the script to use the newer form of the maxmind database. I’ve tested and modified the instructions with his update:

4: geoip2: Look up the IP location.

pip install geoip2

You will also need a copy of the GeoLite2-City.mmdb database to get the latitude and longitude. Download that from MaxMind. Unzip the database and store the ‘GeoLite2-City.mmdb’ file in the same directory as the python script, or adjust the path passed to the Reader in the code below.

First, we’ll import modules:

import sys, os
import geoip2.database  # Change suggested by https://twitter.com/k3v_b0t
import dns.resolver
import folium
from folium import plugins
import pandas as pd

We’ll define our input file, ‘emotet_unique.txt’, then create a GeoIP reader object called ‘mapdata’. This uses the ‘GeoLite2-City.mmdb’ file to look up the location of each IP address. The map itself is written to ‘map.html’ at the end of this script:

data = 'emotet_unique.txt'
mapdata = geoip2.database.Reader('/path/to/GeoLite2-City.mmdb') # Change suggested by https://twitter.com/k3v_b0t.

Next, we have some functions.

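The first function uses the dns.resolver module to find an A record of a domain. If a domain has several A records, the last one returned is used. (In dnspython 2.x, dns.resolver.query() is deprecated in favor of dns.resolver.resolve(), which takes the same arguments here.)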

def get_a_records(domain):
    _ip = dns.resolver.query(domain,'a')
    for a in _ip:
        ip = str(a)
    return(ip)

This next function will use the mapdata object to find the latitude and longitude. There are other fields returned from mapdata, but we’re just using longitude and latitude for now:

def get_lon_lat(ip):
    data = mapdata.city(ip) # Change suggested by https://twitter.com/k3v_b0t
    lon = data.location.longitude # Change suggested by https://twitter.com/k3v_b0t
    lat = data.location.latitude # Change suggested by https://twitter.com/k3v_b0t
    return(lat,lon)

This function checks whether the ‘domain’ is actually an IPv4 address:

def valid_ip(address):
    try:
        host_bytes = address.split('.')
        valid = [int(b) for b in host_bytes]
        valid = [b for b in valid if b >= 0 and b<=255]
        return len(host_bytes) == 4 and len(valid) == 4
    except:
        return False
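Python 3’s standard library also includes an ipaddress module that can do this kind of check. The following is a minimal sketch, assuming Python 3 only (the function above also runs on Python 2). The name is_ip_address() is hypothetical, and unlike valid_ip() it accepts IPv6 addresses as well.

```python
import ipaddress


def is_ip_address(text):
    """Return True if text parses as an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(text.strip())
        return True
    except ValueError:
        # Raised for anything that isn't a well-formed address,
        # including ordinary domain names.
        return False
```

Whether accepting IPv6 is desirable depends on what the rest of the script does with the value, so treat this as an alternative to consider rather than a drop-in replacement.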

This is the main part of the program, where we open the ‘emotet_unique.txt’ file and process it one line at a time. Each line is stripped and refanged, then checked to see whether it’s a domain or an IP address. If it’s an IP, it goes straight to the get_lon_lat() function to get the lat and lon.

If it’s a domain, it goes through the get_a_records() function to get the IP address, which is then passed to get_lon_lat() to get the lat and lon. Anything that fails to resolve or has no location is silently skipped.

It adds the domain or IP to the ‘domains’ list, the latitudes to the lats list, and the longitudes to the lons list:

domains,lats,lons = [],[],[]
name_and_ip = []
with open(data,'r') as d:
    for domain in d:
        domain = domain.strip()
        domain = domain.replace('[.]','.')
        ipcheck = valid_ip(domain)
        if ipcheck == True:
            try:
                ip = domain  # the line is already an IP address, so no DNS lookup is needed
                lat,lon = get_lon_lat(ip)
                print(domain,ip,lat,lon)
                domains.append(domain)
                lats.append(lat)
                lons.append(lon)
            except:
                pass
        else:
            try:
                ip = get_a_records(domain)
                lat,lon = get_lon_lat(ip)
                print(domain,ip,lat,lon)
                domains.append(domain)
                lats.append(lat)
                lons.append(lon)
            except:
                pass

This next function isn’t strictly necessary, but it’s handy if you have some way to determine a ‘status’ for each domain. I’ve used it when running domains through a third-party tool that tells me whether a domain is blocked, not blocked or whitelisted. I might want to color the marker on the map for that domain green for good, white for neutral, or red for malicious.

You’ll see when making the map, I manually specify what I want the color to be. I’ve just left this functionality in the script because it’s useful at times:

def color(status):
    if status == str(1):
        outline = 'black'
        fillcolor='green'
    if status == str(0):
        outline = 'black'
        fillcolor='white'
    if status == str(-1):
        outline = 'black'
        fillcolor='red'
    return outline,fillcolor
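One thing to be aware of: color() as written raises an error if the status is anything other than ‘1’, ‘0’ or ‘-1’, because outline and fillcolor never get assigned. A dictionary lookup with a fallback avoids that. This is only a sketch; the names STATUS_COLORS and color_for() are hypothetical, and using the neutral colors as the default is an assumption.

```python
# Map each status string to an (outline, fill) pair.
STATUS_COLORS = {
    '1': ('black', 'green'),   # good
    '0': ('black', 'white'),   # neutral
    '-1': ('black', 'red'),    # malicious
}


def color_for(status, default=('black', 'white')):
    """Look up the marker colors for a status, falling back to a default."""
    # str() lets callers pass either 1 or '1'.
    return STATUS_COLORS.get(str(status), default)
```

Adding a new status then only means adding one line to the dictionary.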

Now we’re going to create a Pandas dataframe that combines the domains, latitudes and longitudes into something like a spreadsheet. You’ll have a data structure which looks like this:

domain, lat, lon
anotherdomain, lat, lon
etc…

df = pd.DataFrame({'domain':domains,'lat':lats, 'lon':lons})
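The GeoLite2 database doesn’t hold coordinates for every address, so a lookup can succeed but still come back with a missing latitude or longitude, and a marker can’t be placed without both. The sketch below shows the dataframe’s shape and one way to drop incomplete rows before mapping. The sample values are made up for illustration and stand in for the lists built by the loop.

```python
import pandas as pd

# Made-up sample values, standing in for the lists built by the loop above.
domains = ['example.com', 'example.net', '192.0.2.10']
lats = [37.751, None, 51.5]
lons = [-97.822, None, -0.12]

df = pd.DataFrame({'domain': domains, 'lat': lats, 'lon': lons})

# Keep only the rows that have both coordinates.
df = df.dropna(subset=['lat', 'lon'])
print(df)
```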

Now we switch to using folium to create the map. We create a map centered on the average latitude and longitude, then a feature group named ‘Emotet’. For each row in the dataframe, we pick an outline and fill color based on the status (the ‘-1’ is where we’re saying we want the color to be red, as specified in the color() function) and add a marker at that lat/lon, labeled with the domain or IP. Finally, the feature group is added to the map and the map is saved as ‘map.html’.

map_osm=folium.Map(location=[df['lat'].mean(),df['lon'].mean()],zoom_start=2)

fg=folium.FeatureGroup(name='Emotet')
for lat,lon,domain in zip(df['lat'],df['lon'],df['domain']):
    outline,fill = color('-1') # Change that -1 to 1 or 0 to change to the colors in the color() function
    fg.add_child(folium.Marker(location=[lat,lon],popup=(folium.Popup(domain)),icon=folium.Icon(color=fill,icon_color='green')))
map_osm.add_child(fg)  # add the feature group once, after all markers are in it

map_osm.save('map.html')

We end up with an interactive map, saved as ‘map.html’, that you can open in a browser.

We can change the icons and create other kinds of maps, like heatmaps with Folium, but for now this is a good start.

When working with lists of domains, it can take a while to process all of them through DNS resolution and the geoip2 module. I ended up running only 1000 of the unique domains just to speed up the process and get a basic idea of what the map might look like.
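If the DNS lookups turn out to be the slow part, one option is to run them concurrently. The sketch below is an assumption about how that could be done, not part of the original script: it targets Python 3, swaps in socket.gethostbyname() from the standard library instead of dnspython, and the names lookup() and resolve_all() and the worker count of 20 are all hypothetical.

```python
import socket
from concurrent.futures import ThreadPoolExecutor


def lookup(domain):
    """Resolve one domain to an IPv4 address, or None if it fails."""
    try:
        return socket.gethostbyname(domain)
    except (socket.error, UnicodeError):
        return None


def resolve_all(domains, resolver=lookup, workers=20):
    """Resolve many domains concurrently; returns {domain: ip or None}."""
    domains = list(domains)  # allow any iterable, including a generator
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map() keeps results in the same order as the input.
        return dict(zip(domains, pool.map(resolver, domains)))
```

Calling resolve_all() on the refanged domain list would give a dictionary of IPs to feed into get_lon_lat(). The resolver argument is there so a different lookup function, such as get_a_records() wrapped in its own error handling, can be passed in instead.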

The full script is as follows:

# Created by Josh Pyorre (https://twitter.com/joshpyorre)
# Works with Python 2 or 3

import geoip2.database # Change suggested by https://twitter.com/k3v_b0t
import sys, os
import dns.resolver
import folium
from folium import plugins
import pandas as pd

data = 'file.txt'   # change this to your file name
mapdata = geoip2.database.Reader('/path/to/GeoLite2-City.mmdb') # Change suggested by https://twitter.com/k3v_b0t

def get_a_records(domain):
    _ip = dns.resolver.query(domain,'a')
    for a in _ip:
        ip = str(a)
    return(ip)

def get_lon_lat(ip):
    data = mapdata.city(ip) # Change suggested by https://twitter.com/k3v_b0t
    lon = data.location.longitude # Change suggested by https://twitter.com/k3v_b0t
    lat = data.location.latitude # Change suggested by https://twitter.com/k3v_b0t
    return(lat,lon)

def valid_ip(address):
    try:
        host_bytes = address.split('.')
        valid = [int(b) for b in host_bytes]
        valid = [b for b in valid if b >= 0 and b<=255]
        return len(host_bytes) == 4 and len(valid) == 4
    except:
        return False

domains,lats,lons = [],[],[]
name_and_ip = []
with open(data,'r') as d:
    for domain in d:
        domain = domain.strip()
        domain = domain.replace('[.]','.')
        ipcheck = valid_ip(domain)
        if ipcheck == True:
            try:
                ip = domain  # the line is already an IP address, so no DNS lookup is needed
                lat,lon = get_lon_lat(ip)
                domains.append(domain)
                lats.append(lat)
                lons.append(lon)
            except:
                pass
        else:
            try:
                ip = get_a_records(domain)
                lat,lon = get_lon_lat(ip)
                domains.append(domain)
                lats.append(lat)
                lons.append(lon)
            except:
                pass

def color(status):
    if status == str(1):
        outline = 'black'
        fillcolor='green'
    if status == str(0):
        outline = 'black'
        fillcolor='white'
    if status == str(-1):
        outline = 'black'
        fillcolor='red'
    return outline,fillcolor

df = pd.DataFrame({'domain':domains,'lat':lats, 'lon':lons})
map_osm=folium.Map(location=[df['lat'].mean(),df['lon'].mean()],zoom_start=2)

fg=folium.FeatureGroup(name='Emotet')
for lat,lon,domain in zip(df['lat'],df['lon'],df['domain']):
    outline,fill = color('-1') # Change that -1 to 1 or 0 to change to the colors in the color() function
    fg.add_child(folium.Marker(location=[lat,lon],popup=(folium.Popup(domain)),icon=folium.Icon(color=fill,icon_color='green')))
map_osm.add_child(fg)  # add the feature group once, after all markers are in it

map_osm.save('map.html')