Collecting Malware Samples from Malware Bazaar

I’m always looking for IOCs (Indicators of Compromise – domains, IP addresses, and more) in my work. This means I hunt for and download a lot of malware to analyze. I use various private and public directories to acquire malware. Some of the public ones are VirusTotal, Malware Bazaar, and VX Underground.

Downloading malware artifacts can be tedious, so I find it better to automate the process, either on a system that takes care of the downloads for me or in a semi-manual process. In this post, I’m going to explain how I download specific malware samples from Malware Bazaar in a semi-interactive manner.

Malware Bazaar is a project from abuse.ch where the community uploads malware samples found in the wild. The creator and maintainer of the site also provides other services, such as URLhaus, which I’ve written about before. They provide an amazing community service for researchers and threat hunters. If you have the means, please consider donating to them.

Visiting the site, you’ll see a nicely formatted table of recent uploads along with a summary of current trends.

Clicking on an entry shows information on the sample, along with a link to download it.

To start downloading select files automatically, we can use the API.

The API requires the following:

  • API key: get one at https://bazaar.abuse.ch/api/
  • query: get_file
  • sha256_hash: the SHA-256 hash of the sample, for example cba0ff05701110c61714517e11a1a86ec5512f07b1a8c463b56cc182c0618aeb (this is listed on each malware sample’s page on the bazaar)
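
The three requirements above map onto one request header and two form fields. Here is a minimal sketch of how they could be assembled. The helper name `build_get_file_request` is hypothetical, and the 64-hex-character check is my own addition rather than something the API asks for; the header and field names are the ones used in the script later in this post.

```python
import string


def build_get_file_request(api_key, sha256_hash):
    """Return (headers, form_data) for a get_file query.

    Hypothetical helper for illustration; it only builds dictionaries
    and does not contact Malware Bazaar.
    """
    sha256_hash = sha256_hash.strip().lower()
    # A SHA-256 digest is always 64 hexadecimal characters.
    if len(sha256_hash) != 64 or any(c not in string.hexdigits for c in sha256_hash):
        raise ValueError("not a SHA-256 hash: {!r}".format(sha256_hash))
    headers = {'API-KEY': api_key}
    data = {'query': 'get_file', 'sha256_hash': sha256_hash}
    return headers, data
```

The two dictionaries can then be passed to `requests.post` as `headers=` and `data=`.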

First, get an API key. Then we need to acquire the daily CSV file. There are other ways to get new data as it’s released, but to help the creator of abuse.ch with their infrastructure costs, it’s a good idea not to download more than you need.

Note: I wrote this using Python 3.

This Python function downloads the CSV:

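import requests

malware_bazaar_file = 'malware_bazaar.csv'

def download_malware_bazaar():
    res = requests.get('https://bazaar.abuse.ch/export/csv/recent/', timeout=30)
    res.raise_for_status()  # stop on an HTTP error instead of saving an error page
    # 'wb' overwrites the previous copy; 'ab' would append and duplicate rows on every run
    with open(malware_bazaar_file, 'wb') as savefile:
        for chunk in res.iter_content(100000):
            savefile.write(chunk)
    print('Successfully downloaded {}'.format(malware_bazaar_file))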
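
To avoid fetching the CSV more often than needed, you could check the age of the local copy before downloading. This is only a sketch: the helper name `csv_is_stale` and the 24-hour default are my assumptions, not anything Malware Bazaar prescribes.

```python
import os
import time


def csv_is_stale(path, max_age_seconds=24 * 60 * 60):
    """Return True if path is missing or was last modified too long ago."""
    try:
        age = time.time() - os.path.getmtime(path)
    except OSError:
        return True  # no local copy yet, so a download is needed
    return age > max_age_seconds
```

With this in place, the download can be guarded with `if csv_is_stale(malware_bazaar_file): download_malware_bazaar()`.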

Once downloaded, we can parse through the file, looking for items of interest:

import time

def analyze_data(malware_bazaar_file):
    matched_threats = []
    with open(malware_bazaar_file, 'r') as data:
        for item in data:
            item = item.strip()
            # Skip blank lines and comment lines, which start with '#'
            if not item or item.startswith("#"):
                continue
            # Parse malware bazaar file: strip the quotes, then split into columns
            item = item.replace("\"", "")
            item = item.split(',')
            first_seen_utc = item[0]
            sha256 = item[1].strip()
            filename = item[5].strip()
            zip_filename = filename + '.zip'
            file_type_guess = item[6]
            mime_type = item[7]
            signature = item[8]
            clamav = item[9]
            vtpercent = item[10]
            # Convert the timestamp to ISO 8601 format
            _created = time.strptime(first_seen_utc, '%Y-%m-%d %H:%M:%S')
            created = time.strftime("%Y-%m-%dT%H:%M:%S", _created)
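
Splitting on commas by hand breaks when a quoted field, such as a file name, contains a comma. As an alternative sketch, Python’s built-in csv module can do the splitting. The function name `iter_rows` is hypothetical, and the column positions are the same ones assumed in the parsing code above.

```python
import csv


def iter_rows(lines):
    """Yield one dict per sample from an iterable of CSV lines."""
    # Drop blank lines and '#' comment lines before handing the rest to csv.
    wanted = (ln for ln in lines if ln.strip() and not ln.lstrip().startswith('#'))
    # skipinitialspace lets quoted fields follow a ", " separator.
    for row in csv.reader(wanted, skipinitialspace=True):
        if len(row) < 11:
            continue  # malformed or truncated line
        yield {
            'first_seen_utc': row[0],
            'sha256': row[1],
            'filename': row[5],
            'file_type_guess': row[6],
            'mime_type': row[7],
            'signature': row[8],
            'clamav': row[9],
            'vtpercent': row[10],
        }
```

Because an open file is an iterable of lines, it can be passed in directly: `for row in iter_rows(open(malware_bazaar_file)): ...`.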

And then we can download the files of interest:

headers = { 'API-KEY': apikey }
data = { 'query': 'get_file', 'sha256_hash': sha256 }
response = requests.post('https://mb-api.abuse.ch/api/v1/', data=data, timeout=15, headers=headers, allow_redirects=True)
# Write the response body to disk; the with block closes the file for us
with open(zip_filename, 'wb') as zip_file:
    zip_file.write(response.content)
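
The snippet above writes whatever the server returns. A small sketch of a sanity check before saving follows. It assumes a failed request returns something other than a zip archive (an error message, for example), and the helper name `looks_like_zip` is hypothetical.

```python
import io
import zipfile


def looks_like_zip(content):
    """Return True if the bytes in content form a readable zip archive."""
    return zipfile.is_zipfile(io.BytesIO(content))
```

Calling `looks_like_zip(response.content)` before opening the output file lets you skip or log a failed download instead of saving a file that will not unzip.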

We can then automatically unzip the files, which are password protected with the password ‘infected’:

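import pyzipper

def unzip_file(filename):
    with pyzipper.AESZipFile(filename) as zf:
        zf.pwd = b'infected'
        zf.extractall("./malware")
        # str.strip('.zip') removes characters, not a suffix, so slice the extension off instead
        name = filename[:-4] if filename.endswith('.zip') else filename
        print("{} downloaded and unzipped.".format(name))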

import os

try:
    unzip_file(zip_filename)
    os.remove(zip_filename)
except Exception:  # keep the zip for manual inspection, but still allow Ctrl+C
    print("{} failed to unzip. Zip file saved".format(zip_filename))
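
After extraction, it can be worth confirming that a file really is the sample you asked for by comparing its SHA-256 with the hash from the CSV. A minimal sketch, with a hypothetical helper name `sha256_of_file`; the name of the extracted file inside ./malware is something you would need to check yourself.

```python
import hashlib


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the hex SHA-256 digest of the file at path."""
    digest = hashlib.sha256()
    with open(path, 'rb') as fh:
        # Read in chunks so large samples do not have to fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()
```

A sample is intact when `sha256_of_file(extracted_path) == sha256`, where `extracted_path` is the path of the unzipped file.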

Let’s view the entire script, which I’ve modified to let you enter a specific malware name, as specified in column “I”, labeled “signature”, of the downloaded CSV.

The script searches for the malware you’re interested in, reports how many matches it found, and then asks whether you want to download them. It downloads the zip files into the current directory and then unzips them into a folder called ‘malware’. If you don’t download the script from my GitHub, make sure to create the malware folder in the same directory where you run the script.

There is one import you’ll probably have to install (in addition to requests, if you don’t already have it):

pip3 install pyzipper

The script:

import requests,time,pyzipper,os,argparse
from datetime import datetime, timedelta


apikey = 'YOUR-API-KEY'  # replace with your own key; get one here: https://bazaar.abuse.ch/api/
malware_bazaar_file = 'malware_bazaar.csv'

parser = argparse.ArgumentParser(description="Download malware samples from https://bazaar.abuse.ch/")
parser.add_argument('-d','--download',action="store_true",help="Download a new copy of the CSV to work with")
parser.add_argument('-t','--threat',help="Download samples associated with the threat name you enter")
args = parser.parse_args()

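def download_malware_bazaar():
    res = requests.get('https://bazaar.abuse.ch/export/csv/recent/', timeout=30)
    res.raise_for_status()  # stop on an HTTP error instead of saving an error page
    # 'wb' overwrites the previous copy; 'ab' would append and duplicate rows on every run
    with open(malware_bazaar_file, 'wb') as savefile:
        for chunk in res.iter_content(100000):
            savefile.write(chunk)
    print('Successfully downloaded {}'.format(malware_bazaar_file))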

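def unzip_file(filename):
    with pyzipper.AESZipFile(filename) as zf:
        zf.pwd = b'infected'
        zf.extractall("./malware")
        # str.strip('.zip') removes characters, not a suffix, so slice the extension off instead
        name = filename[:-4] if filename.endswith('.zip') else filename
        print("{} downloaded and unzipped.".format(name))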


def analyze_data(threat):
    threatcount = 0
    matched_threats = []
    with open (malware_bazaar_file,'r') as data:
        for item in data:
            item = item.strip()
            # Skip blank lines and comment lines, which start with '#'
            if not item or item.startswith("#"):
                pass
            else:
                # Parse malware bazaar file
                item = item.replace("\"","")
                item = item.split(',')
                first_seen_utc = item[0]
                sha256 = item[1].strip()
                filename = item[5].strip()
                zip_filename = filename +'.zip'
                file_type_guess = item[6]
                mime_type = item[7]
                signature = item[8]
                clamav = item[9]
                vtpercent = item[10]
                _created = (time.strptime(first_seen_utc,'%Y-%m-%d %H:%M:%S'))
                created =(time.strftime("%Y-%m-%dT%H:%M:%S", _created))

                if threat.lower() in signature.lower():
                    threatcount += 1
                    matched_threats.append("{},{},{}".format(created,sha256,zip_filename))

        print("-"* 10)
        answer = input("{} entries match {}.\nDo you want to download all these samples? - [y/n <enter>]\n".format(threatcount,threat)).lower()

        if answer == 'y': # Download file
            for item in matched_threats:
                created,sha256,zip_filename = item.split(',')
                headers = { 'API-KEY': apikey }
                data = { 'query': 'get_file','sha256_hash':sha256, }
                response = requests.post('https://mb-api.abuse.ch/api/v1/', data=data, timeout=15, headers=headers, allow_redirects=True)
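                # Write the response body to disk; the with block closes the file for us
                with open(zip_filename,'wb') as zip_file:
                    zip_file.write(response.content)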

                # Unzip file
                try:
                    unzip_file(zip_filename)
                    os.remove(zip_filename)
                except Exception:
                    print("{} failed to unzip. Zip file saved".format(zip_filename))
        else:
            print("Exiting since you didn't press 'y'")

if args.download:
    download_malware_bazaar() # use to download a fresh copy of the CSV from Malware Bazaar - do just once a day

if args.threat:
    analyze_data(args.threat)   # use to download files