MAWILab

Documentation






What is MAWILab?

MAWILab labels

MAWILab annotates traffic anomalies in the MAWI archive with four different labels: anomalous, suspicious, notice, and benign.

Anomaly classification

For a better understanding of identified anomalies, MAWILab also employs two distinct anomaly classification techniques:

Simple heuristic

The heuristic inspects the port numbers, TCP flags, and ICMP codes of the anomalous traffic and assigns a code to each anomaly. A code lower than 500 means that the anomalous traffic uses well-known suspicious ports or contains an abnormally high number of packets with the SYN, RST, or FIN flag set.

A code between 500 and 900 means that the anomaly is seen on well-known ports. A code higher than 900 means that the anomaly is seen on unknown ports.
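The three code ranges can be used to bucket anomalies, for instance the values of the heuristic column in the CSV files. The Python sketch below shows one way to do it. The function names and category strings are made up for illustration, and the text does not say which range a code of exactly 900 belongs to, so treating it as an unknown-port code is an assumption.

```python
from collections import Counter


def heuristic_range(code: int) -> str:
    """Bucket a MAWILab heuristic code into one of the three documented ranges.

    Hypothetical helper. Assumption: a code of exactly 900 is treated as
    "unknown", because the documentation leaves that boundary open.
    """
    if code < 500:
        # Well-known suspicious ports, or many SYN/RST/FIN packets.
        return "suspicious"
    if code < 900:
        # Anomaly seen on well-known ports.
        return "well-known"
    # Anomaly seen on unknown ports.
    return "unknown"


def summarize(codes):
    """Count how many anomalies fall into each range."""
    return Counter(heuristic_range(c) for c in codes)


if __name__ == "__main__":
    # Arbitrary example codes.
    print(summarize([20, 501, 910, 950]))
```

Only the range boundaries come from the documentation; the meaning of individual codes within a range is not covered here.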

Anomaly taxonomy

Mazel et al. (TRAC 2014) presented a taxonomy that reveals the nature of backbone traffic anomalies. MAWILab takes advantage of this taxonomy to provide more insights into the identified anomalies. The taxonomy consists of more than one hundred labels and corresponding signatures to classify events identified in backbone traffic. The details of labels and signatures are available at http://www.fukuda-lab.org/mawilab/classification/ .

Since MAWILab v1.1, the plots depicting the byte and packet breakdown on the data set webpages (e.g., http://www.fukuda-lab.org/mawilab/v1.1/index.html) are also based on this taxonomy. Each class in the plots gathers the taxonomy labels that share a certain prefix.
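Grouping taxonomy labels into plot classes amounts to prefix matching. The Python sketch below illustrates the idea only: the prefixes and class names in PREFIX_TO_CLASS are hypothetical placeholders, not the actual MAWILab table, which should be taken from the classification page. Preferring the longest matching prefix is a design choice of this sketch, so that a more specific prefix wins over a shorter one.

```python
# Hypothetical prefix -> class table; replace it with the real prefixes
# listed on the MAWILab classification page.
PREFIX_TO_CLASS = {
    "scan": "Scan",
    "scanport": "Port scan",
    "dos": "Denial of service",
}


def plot_class(label: str, table: dict = PREFIX_TO_CLASS, default: str = "Other") -> str:
    """Return the class of the longest prefix of `label` found in `table`."""
    for prefix in sorted(table, key=len, reverse=True):
        if label.startswith(prefix):
            return table[prefix]
    return default


if __name__ == "__main__":
    # Made-up labels, chosen to exercise the hypothetical table.
    for label in ("scanportSYN", "scanICMP", "http"):
        print(label, "->", plot_class(label))
```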

File formats

XML Schema (admd)

For each traffic trace of the MAWI archive, the traffic annotation is provided in the form of an admd file. admd is a metadata format, with associated tools, for the analysis of pcap data. More information on this format is available at http://admd.sourceforge.net/

The XML files have the following structure:

<admd:annotation>
  <algorithm>
      "MAWILab logging information"
  </algorithm>

  <analysis>
      "Analyst description" 
  </analysis>

  <dataset>
      "Link to the analyzed dataset"
  </dataset>

  <anomaly type="T" value="Dn,Da,C0,V,C1"> (see explanation below)
      <description>
          "Structure of the community reporting the anomaly (in dot language)"
      </description>

      <slice>
          <filter "Traffic features describing the anomaly:
                   destination IP
                   and/or source IP
                   and/or destination port
                   and/or source port">
      </slice>
      <from "timestamp of the start of the anomaly">
      <to "timestamp of the end of the anomaly">
  </anomaly>
</admd:annotation>

The type and value attributes of the anomaly tag provide more details about the reported traffic.
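To make the filter semantics concrete, the following Python sketch tests whether a packet belongs to an anomaly. It rests on three assumptions not spelled out in this documentation: a feature missing from a filter acts as a wildcard (as the None fields in the listAno.py output later in this document suggest), a packet belongs to the anomaly if it matches at least one of its filters, and it must fall between the from and to timestamps. Filter, in_anomaly, and ANOMALY_1 are hypothetical names, not part of the admd tools; the filters and time bounds are those of Anomaly 1 in the listAno.py output.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Filter:
    """One <filter> of an anomaly; a field left as None matches any value (assumption)."""
    src_ip: Optional[str] = None
    src_port: Optional[int] = None
    dst_ip: Optional[str] = None
    dst_port: Optional[int] = None

    def matches(self, src_ip: str, src_port: int, dst_ip: str, dst_port: int) -> bool:
        wanted = (self.src_ip, self.src_port, self.dst_ip, self.dst_port)
        seen = (src_ip, src_port, dst_ip, dst_port)
        return all(w is None or w == s for w, s in zip(wanted, seen))


def in_anomaly(filters: Sequence[Filter], start: int, end: int, ts: int,
               src_ip: str, src_port: int, dst_ip: str, dst_port: int) -> bool:
    """True if the packet time is in [start, end] and it matches at least one filter."""
    if not start <= ts <= end:
        return False
    return any(f.matches(src_ip, src_port, dst_ip, dst_port) for f in filters)


# Anomaly 1 from the listAno.py example output.
ANOMALY_1 = [
    Filter(src_ip="200.35.112.72", dst_ip="212.35.181.89"),
    Filter(dst_ip="212.35.181.89", dst_port=80),
    Filter(dst_ip="212.35.181.89"),
]

if __name__ == "__main__":
    # Made-up packet sent to port 80 of the targeted host.
    print(in_anomaly(ANOMALY_1, 1063429201, 1063430101,
                     1063429500, "192.0.2.1", 40000, "212.35.181.89", 80))
```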

CSV format

Since MAWILab v1.1, anomalies are also reported in CSV format. Each line of the CSV files consists of a 4-tuple describing the traffic characteristics (source IP, source port, destination IP, and destination port, similar to the filters in the admd format) and additional information such as the heuristic and taxonomy classification results. The exact order of the fields is given by the header of the CSV files:

anomalyID, srcIP, srcPort, dstIP, dstPort, taxonomy, heuristic, distance, nbDetectors, label
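A minimal Python 3 sketch for reading these files with the standard csv module and keeping only the anomalies with a given label. Two assumptions are not stated in this documentation: columns are addressed by the header names shown here (skipinitialspace tolerates the blanks after the commas), and an empty IP or port field means the feature is not part of the 4-tuple, so it is returned as None. The two data rows in SAMPLE are invented for illustration.

```python
import csv
import io

TUPLE_FIELDS = ("srcIP", "srcPort", "dstIP", "dstPort")


def read_anomalies(lines, keep_labels=("anomalous",)):
    """Yield one dict per CSV row whose `label` column is in keep_labels.

    `lines` is any iterable of text lines, e.g. an open file.
    skipinitialspace=True also strips the blanks after commas in the header.
    Empty 4-tuple fields are assumed to mean "any" and become None.
    """
    for row in csv.DictReader(lines, skipinitialspace=True):
        if row["label"] not in keep_labels:
            continue
        for key in TUPLE_FIELDS:
            if row[key] == "":
                row[key] = None
        yield row


# Header copied from the documentation; both data rows are invented.
SAMPLE = """anomalyID, srcIP, srcPort, dstIP, dstPort, taxonomy, heuristic, distance, nbDetectors, label
0,192.0.2.10,,,,unk,910,0.5,3,anomalous
1,192.0.2.20,,198.51.100.20,119,unk,920,0.1,1,notice
"""

if __name__ == "__main__":
    for row in read_anomalies(io.StringIO(SAMPLE)):
        print(row["anomalyID"], row["srcIP"], "->", row["dstIP"], row["label"])
```

For a real file, pass an open file object (opened with newline="") instead of io.StringIO(SAMPLE).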

How to read admd files?

C/C++

MAWILab is a collection of XML files describing the anomalies found in each trace of the MAWI archive. These XML files follow the admd XML Schema (admd.xsd), so they are compatible with the C/C++ tools available on the admd website: http://admd.sourceforge.net/.

Python

The easiest way to manipulate the XML files from MAWILab is to use the Python API available at http://www.fukuda-lab.org/mawilab/tools/admd.py. This API allows one to load an XML file (in the admd format) into a Python object or to export such an object to an XML file.

The script http://www.fukuda-lab.org/mawilab/tools/listAno.py is an example using this API. It reads an XML file and outputs the timestamps, IP addresses, and port numbers corresponding to each anomaly identified in the file. The script should be placed in the same directory as the API and is executed as follows:

% wget http://www.fukuda-lab.org/mawilab/v1.0/2003/09/13/200309131400_anomalous_suspicious.xml
% python2 listAno.py 200309131400_anomalous_suspicious.xml
200309131400_anomalous_suspicious.xml
Anomaly 0, from 1063429217 to 1063430101
   145.154.79.170:None --> None:None
Anomaly 1, from 1063429201 to 1063430101
   200.35.112.72:None --> 212.35.181.89:None
   None:None --> 212.35.181.89:80
   None:None --> 212.35.181.89:None
Anomaly 2, from 1063429201 to 1063430101
   194.63.172.173:None --> 212.7.225.28:119
Anomaly 3, from 1063429411 to 1063430063
   200.82.177.214:None --> None:None
...

This example, along with the admd XML Schema (admd.xsd), is a good starting point for experimenting with MAWILab. Note that this library is not compatible with Python 3.0 and later.
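Because admd.py targets Python 2, here is a Python 3 sketch that relies only on the standard xml.etree.ElementTree module to produce a listAno.py-like listing. It is not a port of admd.py or listAno.py. The attribute names it reads (src_ip, src_port, dst_ip, dst_port on filter, and sec on from and to) are assumptions to check against admd.xsd or a downloaded file, the namespace URI in SAMPLE is a placeholder, and the script name in the usage comment is hypothetical. Tags are compared by local name, so the admd: prefix on the root element does not matter.

```python
import io
import sys
import xml.etree.ElementTree as ET

# Assumed attribute names; verify them against admd.xsd or a real MAWILab file.
FILTER_FIELDS = ("src_ip", "src_port", "dst_ip", "dst_port")


def local_name(tag):
    """Drop an ElementTree '{namespace}' prefix: '{uri}anomaly' -> 'anomaly'."""
    return tag.rsplit("}", 1)[-1]


def parse_anomalies(source):
    """Return one dict per <anomaly> element (source: filename or file object)."""
    root = ET.parse(source).getroot()
    anomalies = []
    for elem in root.iter():
        if local_name(elem.tag) != "anomaly":
            continue
        record = {"type": elem.get("type"), "start": None, "end": None, "filters": []}
        for child in elem.iter():
            name = local_name(child.tag)
            if name == "filter":
                # Missing attributes come back as None, i.e. "any value".
                record["filters"].append(tuple(child.get(f) for f in FILTER_FIELDS))
            elif name == "from":
                record["start"] = child.get("sec")
            elif name == "to":
                record["end"] = child.get("sec")
        anomalies.append(record)
    return anomalies


def format_anomalies(anomalies):
    """Render the records in a listAno.py-like text layout."""
    lines = []
    for i, ano in enumerate(anomalies):
        lines.append(f"Anomaly {i}, from {ano['start']} to {ano['end']}")
        for src_ip, src_port, dst_ip, dst_port in ano["filters"]:
            lines.append(f"   {src_ip}:{src_port} --> {dst_ip}:{dst_port}")
    return "\n".join(lines)


# Hand-written stand-in for a MAWILab file: addresses, port, and times come from
# Anomaly 2 of the listAno.py output; the type value and namespace URI are made up.
SAMPLE = """<admd:annotation xmlns:admd="http://example.org/admd">
  <anomaly type="anomalous">
    <slice>
      <filter src_ip="194.63.172.173" dst_ip="212.7.225.28" dst_port="119"/>
    </slice>
    <from sec="1063429201"/>
    <to sec="1063430101"/>
  </anomaly>
</admd:annotation>"""

if __name__ == "__main__":
    # Usage: python3 list_anomalies.py FILE.xml  (falls back to SAMPLE)
    source = sys.argv[1] if len(sys.argv) > 1 else io.StringIO(SAMPLE)
    print(format_anomalies(parse_anomalies(source)))
```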

Under the hood

The annotations collected in MAWILab result from the combination of several anomaly detectors. The detectors are combined with a method based on singular value decomposition and graph theory. The detailed description of this combination method is available in the following publication: "MAWILab: Combining diverse anomaly detectors for automated anomaly labeling and performance benchmarking", R. Fontugne, P. Borgnat, P. Abry, K. Fukuda, in CoNEXT 2010.