One of the open challenges of past and recent systems is to identify errors before they escalate into failures. To this end, most error detectors and enterprise Intrusion Detection Systems (IDSs) adopt signature-based detection algorithms, which look for predefined patterns (or "signatures") in the monitored data in order to detect an error or an ongoing attack. Data is usually seen as a flow of data points, each representing an observation of the values of the monitored indicators at a given time. Signature-based approaches usually achieve high detection capabilities and low false positive rates when dealing with known errors or attacks, but they cannot effectively adapt their behaviour when systems evolve or when their configuration is modified. As an additional consequence, signature-based approaches are not meant to detect zero-day attacks, which are novel attacks that cannot be matched to any known signature. Moreover, when a zero-day attack that exploits newly added or undiscovered system vulnerabilities is identified, its signature needs to be derived and added as a new rule to the IDS.
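As a minimal sketch of the signature-matching idea described above, the following Python snippet checks log records against a small rule set. The signature names, the regular expressions and the sample records are hypothetical and chosen only for illustration; they are not taken from any real IDS rule base.

```python
import re
from typing import List

# Hypothetical signatures: names and patterns are illustrative assumptions.
SIGNATURES = {
    "sql_injection": re.compile(r"(?i)union\s+select"),
    "path_traversal": re.compile(r"\.\./"),
}


def match_signatures(record: str) -> List[str]:
    """Return the names of all signatures whose pattern occurs in the record."""
    return [name for name, pattern in SIGNATURES.items() if pattern.search(record)]


# A record that matches a known pattern, and one that matches none of them.
known = "GET /items?id=1 UNION SELECT password FROM users"
novel = "GET /items?id=1;exec(0xdeadbeef)"

print(match_signatures(known))
print(match_signatures(novel))
```

The second record matches no rule, so it passes undetected until a new signature is written for it, which is the limitation with respect to zero-day attacks noted above.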
To deal with unknowns, research moved to techniques suited to detecting unseen, novel attacks. Anomaly detectors are based on the assumption that an attack generates observable deviations from an expected (normal) behaviour. Briefly, they aim at finding patterns in data that do not conform to the expected behaviour of a system: such patterns are known as anomalies. Once an expected behaviour is defined, anomaly detectors target deviations from such expectations, protecting against known attacks, zero-day attacks and emerging threats. To this end, most anomaly detection algorithms are unsupervised: they do not require labels in training data, which makes them suitable for detecting, among others, unknown errors or zero-day attacks.
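To make the notion of "deviation from expected behaviour" concrete, here is a minimal sketch of an unsupervised, neighbour-based anomaly scorer in plain Python. It is only one illustrative way to score data points and is not meant to represent the algorithms shipped with RELOAD; the function names, the sample data and the thresholding rule (a multiple of the median score) are assumptions made for this example.

```python
import math
import statistics
from typing import List, Sequence


def knn_scores(points: Sequence[Sequence[float]], k: int = 2) -> List[float]:
    """Score each point by the mean distance to its k nearest neighbours."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores


def flag_anomalies(scores: Sequence[float], factor: float = 3.0) -> List[bool]:
    """Flag scores above `factor` times the median score (assumed rule)."""
    threshold = factor * statistics.median(scores)
    return [s > threshold for s in scores]


# Five observations close to each other and one far away; no labels are used.
points = [(1.0, 1.0), (1.1, 0.9), (0.9, 1.1), (1.0, 1.2), (1.2, 1.0), (8.0, 8.0)]
flags = flag_anomalies(knn_scores(points))
print(flags)
```

Only the isolated point is flagged, without the detector ever being told what an attack looks like. In practice the choice of `k` and of the threshold strongly affects the trade-off between missed anomalies and false alarms.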
The primary learning objectives of the tutorial are to demonstrate the capability of unsupervised learning algorithms to detect cyber-attacks, and in particular zero-day attacks, and to instruct attendees on how to carry out a well-crafted evaluation campaign.
After presenting the current threat landscape as described in technical reports by agencies such as ENISA, we will introduce anomaly detection, which is acknowledged as the most reliable answer to the detection of unknown errors or attacks. Participants will learn about and use unsupervised algorithms that are particularly suited for anomaly detection, their main families, and the differences in the way they decide whether a data point is anomalous or normal. Participants will then take part in a hands-on session using the RELOAD tool, which allows executing unsupervised anomaly detection algorithms and observing the metric scores they achieve on different datasets. This hands-on session, which can be conducted individually or in groups, will feed the final session, which will present the final takeaways of the tutorial, based on both participants' activities and the organizers' experience in the domain.
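As an illustration of the kind of metric scores that can be observed when comparing a detector's output with the labels of a dataset, the sketch below computes a few common confusion-matrix metrics. Which metrics RELOAD actually reports is not stated here, so the selection (precision, recall, false positive rate, F1 and Matthews correlation coefficient) and the toy labels are assumptions.

```python
import math
from typing import Dict, Sequence


def detection_metrics(y_true: Sequence[bool], y_pred: Sequence[bool]) -> Dict[str, float]:
    """Compute confusion-matrix metrics; True means 'anomalous'."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    tn = sum(1 for t, p in zip(y_true, y_pred) if not t and not p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "fpr": fpr, "f1": f1, "mcc": mcc}


# Toy example: 6 normal and 4 anomalous data points, with one false alarm
# and one missed anomaly in the predictions.
y_true = [False] * 6 + [True] * 4
y_pred = [False] * 5 + [True] * 4 + [False]
metrics = detection_metrics(y_true, y_pred)
print(metrics)
```

Labels are used here only to evaluate the detector after the fact; the unsupervised algorithms themselves do not need them for training.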
The RELOAD tutorial targets anyone who is interested in the application of unsupervised ML algorithms for intrusion detection, with PhD students and young researchers as the primary target audience. We therefore expect a considerable number of conference attendees to be interested in the topics of this tutorial, which targets beginners and includes some content for intermediate-level attendees. The tool used in the hands-on session will allow PhD students, researchers and practitioners who are starting to explore the discipline to get a first quantitative estimate of the attack detection capabilities of algorithms, while hiding implementation details that may be difficult to control at an early stage.
The tutorial will be composed of the following blocks.
- B1. Digression on the Current Threat Landscape (10% of tutorial time). Starting from public reports, e.g., by ENISA, we will describe the current state of cyber-attacks.
- B2. Anomaly-Based Intrusion Detection (15% of tutorial time). This part introduces key terms and components that will be used in the rest of the tutorial, along with the role of anomaly detection in detecting intrusions.
- B3. Unsupervised Algorithms and their Characteristics (10% of tutorial time). We will introduce some of the most common algorithms to be used for unsupervised anomaly detection.
- B4. Presentation of the RELOAD Tool (15% of tutorial time). This part will let the audience understand what the RELOAD tool offers and how to use it for executing unsupervised algorithms.
- B5. Hands-On Session (40% of tutorial time). Attendees can use the tool to perform intrusion detection on public attack datasets, which will be downloaded in advance by the organizers and shared together with the slides.
- B6. Wrap-up and Final Discussion (10% of tutorial time). Results obtained during the hands-on session will be discussed together with the audience, leading to the final discussion. We will prepare additional material to enrich the discussion, expanding on existing studies.
- Tommaso Zoppi, University of Florence, Italy
- Andrea Ceccarelli, University of Florence, Italy
- Andrea Bondavalli, University of Florence, Italy