How to use MAWILab?
MAWILab locates anomalies in the MAWI archive with a simple traffic taxonomy that consists of four different labels: anomalous, suspicious, notice, and benign.
- The label anomalous is assigned to all abnormal traffic; any efficient anomaly detector should identify this traffic.
- The label suspicious is assigned to all traffic that is probably anomalous but not clearly identified by our method.
- The label notice is assigned to all traffic that is not identified as anomalous by our method but that has been reported by at least one anomaly detector. This traffic should not be identified by any anomaly detector, but we do not label it as benign so that all the alarms reported by the combined detectors can be traced.
- All other traffic is labeled benign because none of the anomaly detectors identified it.
For each traffic trace of the MAWI archive the traffic annotation is provided in the form of an admd file. admd is a meta-data format and associated tools for the analysis of pcap data. More information on this format is available on this website: http://admd.sourceforge.net/
Here is a brief explanation of the structure of the xml files:
<admd:annotation>
<algorithm>
"MAWILab logging information"
</algorithm>
<analysis>
"Analyst description"
</analysis>
<dataset>
"Link to the analyzed dataset"
</dataset>
<anomaly type="T" value="Dn,Da,C,V"> (see explanation below)
<description>
"Structure of the community reporting the anomaly (in dot language)"
</description>
<slice>
<filter "Traffic features describing the anomaly:
destination IP
and/or source IP
and/or destination port
and/or source port">
</slice>
<from "timestamp of the start of the anomaly">
<to "timestamp of the end of the anomaly">
</anomaly>
</admd:annotation>
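As a rough illustration of how the skeleton above might be read programmatically, the following Python sketch extracts the type and value attributes of every anomaly element. The sample document, its namespace URI, and its attribute values are hypothetical and only mimic the structure shown above; the helper names are likewise made up for this example, and real admd files may carry more detail.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily shortened annotation used only for illustration.
# The namespace URI and the attribute values are assumptions, not real data.
SAMPLE = """<admd:annotation xmlns:admd="http://example.org/admd">
  <anomaly type="anomalous" value="1.5,0.2,10,100000000000"/>
  <anomaly type="notice" value="0.3,2.1,901,000000000001"/>
</admd:annotation>"""


def local_name(tag):
    """Strip a '{namespace}' prefix so lookups do not depend on the URI."""
    return tag.rsplit("}", 1)[-1]


def read_anomalies(xml_text):
    """Return (type, value) pairs for every anomaly element in the document."""
    root = ET.fromstring(xml_text)
    return [
        (elem.get("type"), elem.get("value"))
        for elem in root.iter()
        if local_name(elem.tag) == "anomaly"
    ]


for label, value in read_anomalies(SAMPLE):
    print(label, value)
```

For a file on disk, ET.parse(path).getroot() could replace ET.fromstring in the same way.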
The type and value of the anomaly tag provide more details about the reported traffic:
- "T" is the label assigned to the anomaly. It can be either anomalous, suspicious, or notice.
- "Dn,Da" are the distances of the anomaly to reference points in the reduced SCANN space. Dn is the distance to the reference point representing normal traffic, while Da is the distance to the reference point standing for anomalous traffic.
- "C" is the category assigned to the anomaly using a heuristic based on port numbers, TCP flags, and ICMP codes. If the value is lower than 500, the anomaly is a well-known attack:
- 1:Sasser
- 2:Netbios
- 3:RPC
- 4:SMB
- 10:SYN scan
- 11:RST scan
- 12:FIN scan
- 20:Ping flood
- 51:Scan FTP
- 52:Scan SSH
- 53:Scan HTTP
- 54:Scan HTTPS
- else:Other
- 501:FTP
- 502:SSH
- 503:HTTP
- 504:HTTPS
- else:Other
- 901:Unknown
- "V" shows which detector, with which parameters, found the anomaly. It is a vector of binary values: 0 means the detector did not report the traffic, whereas 1 means the detector reported an alarm for the anomaly. There are four detectors (Hough, Gamma, KL, PCA), each using three different parameter tunings (sensitive, optimal, conservative).
The first value of the vector corresponds to sensitive Hough, the second to optimal Hough, the third to conservative Hough, the fourth to sensitive Gamma, and so on. The full order is Hough(sensitive,optimal,conservative), Gamma(sensitive,optimal,conservative), KL(sensitive,optimal,conservative), PCA(sensitive,optimal,conservative).
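The value attribute described above can be decoded along the following lines. This is a minimal sketch, not part of the MAWILab tools: it assumes the value is the comma-separated string "Dn,Da,C,V" and that V holds exactly twelve 0/1 flags (written either as one run of digits or with separators). The function and class names are hypothetical.

```python
from typing import NamedTuple

DETECTORS = ("Hough", "Gamma", "KL", "PCA")
TUNINGS = ("sensitive", "optimal", "conservative")

# Category codes as listed above; any other code is reported as "Other".
CATEGORIES = {
    1: "Sasser", 2: "Netbios", 3: "RPC", 4: "SMB",
    10: "SYN scan", 11: "RST scan", 12: "FIN scan",
    20: "Ping flood",
    51: "Scan FTP", 52: "Scan SSH", 53: "Scan HTTP", 54: "Scan HTTPS",
    501: "FTP", 502: "SSH", 503: "HTTP", 504: "HTTPS",
    901: "Unknown",
}


class AnomalyValue(NamedTuple):
    dn: float                 # distance to the "normal" reference point
    da: float                 # distance to the "anomalous" reference point
    category: int
    category_name: str
    well_known_attack: bool   # True when the category code is below 500
    reporters: list           # (detector, tuning) pairs that raised an alarm


def decode_vector(bits):
    """Map the twelve binary flags to the (detector, tuning) pairs set to 1."""
    flags = [c for c in bits if c in "01"]  # assumption: ignore any separators
    if len(flags) != len(DETECTORS) * len(TUNINGS):
        raise ValueError(f"expected 12 binary flags, got {len(flags)}")
    # Order: Hough(s,o,c), Gamma(s,o,c), KL(s,o,c), PCA(s,o,c).
    names = [(d, t) for d in DETECTORS for t in TUNINGS]
    return [name for name, flag in zip(names, flags) if flag == "1"]


def decode_value(value):
    """Split a 'Dn,Da,C,V' string into its documented components."""
    dn, da, category, vector = value.split(",", 3)
    code = int(category)
    return AnomalyValue(
        dn=float(dn),
        da=float(da),
        category=code,
        category_name=CATEGORIES.get(code, "Other"),
        well_known_attack=code < 500,
        reporters=decode_vector(vector),
    )


print(decode_value("1.5,0.2,10,100000000001"))
```

Splitting with a maximum of three cuts keeps everything after the category code together, so the sketch tolerates a V written with its own separators.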
What is behind MAWILab?
Anomaly detector combination and graph based similarity estimation.
MAWILab results from the method proposed in "MAWILab: Combining diverse anomaly detectors for automated anomaly labeling and performance benchmarking", R.Fontugne, P.Borgnat, P.Abry, K.Fukuda, in CoNEXT 2010.