OPCUADataset

The OPC UA Dataset Description

The OPC UA CSV source file can be downloaded here. It is also available on IEEE DataPort.

The dataset of OPC UA traffic was generated by setting up and running a laboratory CPPS testbed. This CPPS uses the OPC UA standard for horizontal and vertical communications. The testbed consists of seven nodes in the network, as represented in the next Figure.

Each network node consists of a Raspberry Pi device running the Python FreeOpcUa implementation. In this configuration, there are two production units, each containing three devices, and one node representing a Manufacturing Execution System (MES). Each device implements both an OPC UA server and an OPC UA client: the server publishes sensor-reading updates to an OPC UA variable, and the client subscribes to all OPC UA variables from all other devices in the same production unit. The MES, in contrast, implements only the OPC UA client, which subscribes to all OPC UA variables from all devices in both production units. An attack node is also connected to this network, since it is assumed that the attacker has already gained access to the CPPS network.
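The subscription pattern described above can be sketched as a set of (subscriber, publisher) pairs. This is only an illustration of the topology: the node names are hypothetical (the description gives only the counts), and no OPC UA library is involved.

```python
from itertools import permutations

# Hypothetical node names; the description only gives the counts
# (two production units of three devices each, plus one MES).
UNITS = {
    "unit1": ["u1_dev1", "u1_dev2", "u1_dev3"],
    "unit2": ["u2_dev1", "u2_dev2", "u2_dev3"],
}
MES = "mes"


def build_subscriptions(units, mes):
    """Return the set of (subscriber, publisher) pairs in the testbed."""
    subs = set()
    for devices in units.values():
        # Each device's client subscribes to every other device in its own unit.
        subs.update(permutations(devices, 2))
        # The MES is client-only and subscribes to every device in every unit.
        subs.update((mes, dev) for dev in devices)
    return subs


subscriptions = build_subscriptions(UNITS, MES)
print(len(subscriptions))  # number of client-to-server subscription links
```

Under these assumptions, devices never subscribe across production units, and the MES never appears as a publisher because it runs no server.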

After setting up the CPPS testbed, a Python script built on Tshark was used to capture OPC UA packets and export this traffic to a dataset in CSV format. This traffic includes both normal and anomalous behaviour. Anomalous behaviour is produced by the malicious node, which injects attacks into the CPPS network, targeting one or more device nodes and the MES. The attacks selected for the malicious activities are denial of service (DoS), impersonation, and man-in-the-middle (MITM).

To perform the attacks mentioned, a Python script is used, which relies on the Scapy module for packet sniffing, injection and modification. For the dataset generation, another Python script, based on Tshark (in this case through Pyshark), was used to capture only OPC UA packets and export this traffic to a dataset in CSV format. More precisely, the OPC UA packets are converted to bidirectional communication flows, which are characterized by the following 32 features:
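The conversion from packets to bidirectional flows mentioned above can be sketched as follows. This is a minimal sketch, not the author's script: the packet dictionaries, their field names (`ts`, `src_ip`, `src_port`, `dst_ip`, `dst_port`, `length`) and the handful of computed values are assumptions for illustration only, and they are not the dataset's 32 features. The key idea is that both directions of a conversation map to the same flow key.

```python
def flow_key(src_ip, src_port, dst_ip, dst_port):
    """Direction-independent key: A->B and B->A map to the same flow."""
    a, b = (src_ip, src_port), (dst_ip, dst_port)
    return (a, b) if a <= b else (b, a)


def aggregate_flows(packets):
    """Group packet dicts into bidirectional flows with a few simple counters."""
    flows = {}
    for p in sorted(packets, key=lambda pkt: pkt["ts"]):
        src = (p["src_ip"], p["src_port"])
        key = flow_key(p["src_ip"], p["src_port"], p["dst_ip"], p["dst_port"])
        # The endpoint that sends the first packet is treated as the initiator.
        f = flows.setdefault(key, {
            "initiator": src, "first_ts": p["ts"], "last_ts": p["ts"],
            "fwd_pkts": 0, "bwd_pkts": 0, "total_bytes": 0,
        })
        direction = "fwd_pkts" if src == f["initiator"] else "bwd_pkts"
        f[direction] += 1
        f["total_bytes"] += p["length"]
        f["last_ts"] = p["ts"]
    for f in flows.values():
        f["duration"] = f["last_ts"] - f["first_ts"]
    return flows


# Made-up packets between a client and a server on port 4840,
# the default OPC UA TCP port.
packets = [
    {"ts": 0.0, "src_ip": "10.0.0.2", "src_port": 50000,
     "dst_ip": "10.0.0.9", "dst_port": 4840, "length": 100},
    {"ts": 0.5, "src_ip": "10.0.0.9", "src_port": 4840,
     "dst_ip": "10.0.0.2", "dst_port": 50000, "length": 60},
    {"ts": 1.5, "src_ip": "10.0.0.2", "src_port": 50000,
     "dst_ip": "10.0.0.9", "dst_port": 4840, "length": 40},
]
flows = aggregate_flows(packets)
for key, stats in flows.items():
    print(key, stats)
```

In a real capture, the packet dictionaries would be filled from the fields exposed by the capture tool rather than written by hand.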

The generated dataset has 33,567 normal instances, 74,013 DoS attack instances, 50 impersonation attack instances, and 7 MITM attack instances. This gives a total of 107,634 instances. All attacks were grouped into one class (anomaly, labelled 1), and the remaining instances belong to the normal class (labelled 0).
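The grouping of all attack types into a single anomaly class can be illustrated with a short sketch. The label strings and the toy rows below are hypothetical and do not reflect the actual column names or values in the CSV file.

```python
from collections import Counter

# Hypothetical label strings for the three attack types.
ATTACK_TYPES = {"DoS", "Impersonation", "MITM"}


def to_binary(kind):
    """Map an instance type to the binary class: 1 = anomaly, 0 = normal."""
    return 1 if kind in ATTACK_TYPES else 0


# Toy instance types, not real dataset rows.
kinds = ["Normal", "DoS", "DoS", "Impersonation", "Normal", "MITM"]
labels = [to_binary(k) for k in kinds]
class_counts = Counter(labels)
print(class_counts)
```

Because the DoS instances far outnumber the other attack types, a binary label of this kind mostly reflects DoS traffic on the anomaly side, which is worth keeping in mind when evaluating a classifier.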

Free use of the OPC UA dataset for academic research purposes is hereby granted. However, use for commercial purposes must be agreed with the author.

For more information, please contact the author: Rui Pinto (email).