University of Applied Sciences Fulda - Network and Data Security Group - NDSec-1 Dataset

The NDSec-1 dataset

Abstract


NDSec-1 is a new network intrusion detection dataset collected at the University of Applied Sciences Fulda (Germany) in 2016. It emerged from our research in the IntErA project, motivated by the fact that very few public intrusion detection datasets exist with which to validate one's own results or to benchmark other recent solutions. One of the dominant reasons given in the literature is privacy concerns, which inhibit the publication of such datasets. Another issue is the network environment in which an intrusion detector has to perform, which can range from a traditional office environment to a manufacturing facility. The legitimate traffic a detector has to face is therefore highly dependent on the sector in which an organization or enterprise operates. However, many classic attack types, such as denial-of-service or brute-force attacks, have distinct characteristics that are largely invariant to the network structure, even across different versions of an attack instance. We therefore made a practical attempt to design a generic dataset focusing on attack traffic. NDSec-1 covers a set of classic and novel attack vectors encapsulated in simple but realistic scenarios that can easily be adapted to most network environments. Consequently, the dataset contains only a small proportion of legitimate traffic, and NDSec-1 can be used to salt other shared or private network traces. NDSec-1 was by no means designed to replace or retire any other intrusion detection dataset. Rather, it should be considered a supplement that provides an additional source of recorded attack vectors. We provide raw network traces including payload, along with syslog and Windows event logs. Additionally, a detailed ground truth is supplied, which relies on bidirectional flows captured using YAF [1]. In what follows, we supply the traces and outline useful information for reading and mapping the ground truth.
Note that we are grateful for any remarks and comments that help us improve our ongoing activities.

Trace files


To provide realistic attack footprints, we developed several scenarios and captured them separately. Consequently, the dataset consists of multiple trace files, divided into four groups: BYOD, watering hole, botnet and others. Each group consists of a packet trace file (network traffic), a log file (all log events captured during the scenario on all machines involved) and the ground truth (annotation of traffic as good or bad). To download the desired files, please follow the links below.

BYOD
byod.pcapng
log_byod.csv
gt_byod.csv

Watering hole
wateringhole.pcapng
log_wateringhole.csv
gt_wateringhole.csv

Botnet
botnet.pcapng
log_botnet.csv
gt_botnet.csv

Others
others.pcapng
log_others.csv
gt_others.csv
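
The file names above follow a regular pattern per group. As a minimal sketch, the mapping from a group to its three files can be written as below. The helper name scenario_files and the ScenarioFiles type are hypothetical and only serve as illustration; the file names themselves are taken from the list above.

```python
from typing import Dict, NamedTuple


class ScenarioFiles(NamedTuple):
    """The three files that make up one scenario group."""
    trace: str         # packet trace (network traffic)
    log: str           # log events captured during the scenario
    ground_truth: str  # annotated flows


# Group identifiers as they appear in the file names listed above.
GROUPS = ("byod", "wateringhole", "botnet", "others")


def scenario_files(group: str) -> ScenarioFiles:
    """Return the file names of one scenario group (hypothetical helper)."""
    if group not in GROUPS:
        raise ValueError(f"unknown scenario group: {group!r}")
    return ScenarioFiles(
        trace=f"{group}.pcapng",
        log=f"log_{group}.csv",
        ground_truth=f"gt_{group}.csv",
    )


# Lookup table covering every group of the dataset.
ALL_FILES: Dict[str, ScenarioFiles] = {g: scenario_files(g) for g in GROUPS}
```

For example, scenario_files("botnet").ground_truth yields the name of the botnet ground truth file.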

Ground truth details

The ground truth uses bidirectional flows (biflows), as specified in RFC 5103 [2], as its base entities. We used the IPFIX exporter YAF (version 2.8.4) with default timeout parameters (30 min active and 5 min idle timeout). The ground truth files also contain ARP packets, which YAF did not support. Each of these packets was treated as a unidirectional flow and appended to YAF's output. All ARP pseudo flows are indicated by the value '-1' in the protocol feature. The resulting files were annotated with four additional columns (label, category_1, category_2, comment) that give details about the ground truth. The following table describes the meaning of those columns.

label: Indicates whether the biflow is considered normal or malicious.
category_1: Specifies the category of an attack (e.g. DoS, Bruteforce, Probe, ...). In case of legitimate traffic, category_1 has the value 'NORMAL'.
category_2: Contains either information regarding the service of the flow in case of legitimate traffic, or details about the attack (e.g. FTP-Bruteforce, UDP-Flood, SQL-Injection, ...).
comment: Additional attribute that holds information and remarks we appended to improve context and readability.
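
As an illustration of how the annotation columns might be read, the following minimal sketch counts ARP pseudo flows and tallies the values of category_1. It rests on several assumptions that should be checked against the actual files: that they are comma-separated, that they have a header row, and that the protocol feature is stored in a column literally named protocol. The function name summarize_ground_truth and all sample rows, including the label values, are hypothetical and not taken from the dataset.

```python
import csv
import io
from collections import Counter

# ARP pseudo flows carry the value '-1' in the protocol feature (see above).
ARP_PROTOCOL = "-1"


def summarize_ground_truth(lines, delimiter=","):
    """Count ARP pseudo flows and flows per category_1 (hypothetical helper).

    `lines` is any iterable of text lines, e.g. an open file object.
    The column names 'protocol' and 'category_1' are assumptions.
    """
    reader = csv.DictReader(lines, delimiter=delimiter)
    arp_pseudo_flows = 0
    categories = Counter()
    for row in reader:
        if row["protocol"].strip() == ARP_PROTOCOL:
            arp_pseudo_flows += 1
        categories[row["category_1"].strip()] += 1
    return {"arp_pseudo_flows": arp_pseudo_flows, "category_1": categories}


# Made-up rows for demonstration only; real files contain further flow
# features, and the encoding of 'label' used here is a guess.
SAMPLE = """protocol,label,category_1,category_2,comment
6,normal,NORMAL,HTTP,
6,malicious,Bruteforce,FTP-Bruteforce,hypothetical row
-1,normal,NORMAL,ARP,hypothetical row
"""

summary = summarize_ground_truth(io.StringIO(SAMPLE))
```

To apply the same function to one of the ground truth files, pass an open file object instead of the in-memory sample, for instance open("gt_byod.csv", newline="").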

Usage

If you intend to use the dataset in any publication, please cite one of the following works:

Also note that we cannot take responsibility for any potential harm caused by executing this dataset!

[1] https://tools.netsa.cert.org/yaf/
[2] https://tools.ietf.org/html/rfc5103.html

© Network and Data Security group headed by Prof. Dr. U. Bühler
University of Applied Sciences Fulda
Department of Computer Science
Leipziger Straße 123
36037 Fulda (Germany)