An anomaly-based intrusion detection
system is an intrusion detection system for detecting both network
and computer intrusions and misuse by monitoring system activity and
classifying it as either normal or anomalous. This
contrasts with signature-based systems, which can only detect attacks for
which a signature has previously been created. To positively identify
attack traffic, the system must be taught to recognize normal system activity.
Most anomaly detection systems operate in two phases: a
training phase, in which a profile of normal behavior is built, and a testing phase,
in which current traffic is compared with the profile created in the training
phase.
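As a rough illustration of the two phases, the sketch below builds a very simple statistical profile (the per-feature mean and standard deviation) from traffic assumed to be normal, then flags a new observation when any feature lies more than a chosen number of standard deviations from that profile. The feature names, the sample values, and the threshold of 3.0 are hypothetical choices for illustration; real systems typically use much richer profiles.

```python
from statistics import mean, pstdev


def train_profile(normal_samples):
    """Training phase: summarize normal behavior as (mean, std dev) per feature."""
    columns = list(zip(*normal_samples))
    return [(mean(col), pstdev(col)) for col in columns]


def is_anomalous(sample, profile, threshold=3.0):
    """Testing phase: flag the sample if any feature deviates from the profile
    by more than `threshold` standard deviations (hypothetical threshold)."""
    for value, (mu, sigma) in zip(sample, profile):
        if sigma == 0:
            # Feature never varied during training: any change counts as a deviation.
            if value != mu:
                return True
            continue
        if abs(value - mu) / sigma > threshold:
            return True
    return False


# Hypothetical features per time window: (packets per second, mean packet size in bytes)
normal_traffic = [(100, 500), (110, 520), (95, 480), (105, 510), (98, 495)]
profile = train_profile(normal_traffic)
print(is_anomalous((102, 505), profile))  # False
print(is_anomalous((5000, 60), profile))  # True
```

Lowering the threshold catches smaller deviations at the cost of more false alarms; raising it does the reverse.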
Machine learning for anomaly detection
In data mining, anomaly detection refers to the
identification of items or events that do not conform to an expected pattern or
to other items present in a dataset. Typically, these anomalous items
correspond to problems such as structural
defects, errors, or fraud. Using machine learning for anomaly detection can
increase the speed of detection.
Intrusions are activities that can damage
information systems, and intrusion detection has been gaining broad
attention. Anomaly detection can be key to detecting intrusions, because
perturbations of normal behavior can indicate the presence
of intentionally or unintentionally induced attacks, defects, faults, and so on. Machine learning
algorithms can learn from data and make predictions based on
that data. Machine learning techniques for anomaly detection
provide a promising alternative for detecting and classifying anomalies
based on an initially large set of features.
Supervised Machine Learning for Anomaly Detection
This method requires a labeled training set that
contains both normal and anomalous samples for constructing the predictive
model. In theory, supervised methods are believed to provide a better
detection rate than unsupervised methods. The most common supervised algorithms
are supervised neural networks, parameterization of the training model, support
vector machines, k-nearest neighbors, Bayesian networks, and decision
trees. K-nearest neighbors (k-NN) is one of the most conventional nonparametric
techniques used in supervised learning for anomaly detection. It
calculates the distances between an unlabeled input vector and the labeled
training vectors, and then assigns the unlabeled point to the majority class of its k nearest
neighbors. The Bayesian network is another popular model; it encodes
probabilistic relationships among variables of interest. This technique is
generally used for anomaly detection in combination with statistical schemes.
Bayesian networks have several advantages, including the capability
of encoding interdependencies between variables and of predicting events, along
with the ability to incorporate both prior knowledge and data.
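A minimal k-NN sketch in plain Python, assuming each training connection is summarized by a small numeric feature vector labeled "normal" or "anomalous". The features, samples, the choice k = 3, and the function name `knn_classify` are hypothetical; the classifier simply takes a majority vote among the k training points closest to the query in Euclidean distance.

```python
import math
from collections import Counter


def knn_classify(query, training_set, k=3):
    """Assign `query` to the majority label among its k nearest labeled neighbors."""
    # Sort labeled training points by Euclidean distance to the query vector.
    neighbors = sorted(training_set, key=lambda item: math.dist(query, item[0]))[:k]
    # Majority vote over the labels of the k closest points.
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical labeled connections: (duration in seconds, failed logins) -> label
training_set = [
    ((1.0, 0), "normal"),
    ((2.0, 0), "normal"),
    ((1.5, 1), "normal"),
    ((0.2, 8), "anomalous"),
    ((0.1, 9), "anomalous"),
    ((0.3, 7), "anomalous"),
]
print(knn_classify((1.2, 0), training_set))   # normal
print(knn_classify((0.2, 10), training_set))  # anomalous
```

Because k-NN relies on raw distances, features on very different scales are usually normalized first; this sketch skips that step to stay short.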
Unsupervised Machine Learning for Anomaly Detection
These techniques do not require labeled training data. They
are based on two basic assumptions. First, they presume that most
network connections are normal traffic and only a small percentage is
abnormal. Second, they assume that malicious traffic is statistically
different from normal traffic. Based on these two assumptions, groups of
similar instances that appear frequently are assumed to be normal traffic, and
groups that are infrequent are considered malicious. The most
common unsupervised algorithms are self-organizing maps (SOM), K-means,
C-means, the expectation-maximization (EM) meta-algorithm, adaptive resonance
theory (ART), and one-class support vector machines. The SOM is a particularly
popular technique; its main objective is to reduce the dimensionality of data
for visualization.
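The two unsupervised assumptions can be sketched with K-means, one of the algorithms listed above: cluster unlabeled observations, then treat any cluster holding less than a chosen fraction of the data as malicious. This is a minimal illustration under stated assumptions, not a production detector; the data points, k = 2, the 25% cutoff, and the helper names `kmeans` and `flag_small_clusters` are hypothetical, and the centroids are simply initialized from the first k points.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain K-means (Lloyd's algorithm), using the first k points as initial centroids."""
    centroids = [list(p) for p in points[:k]]
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members (keep it if empty).
        for i, members in enumerate(clusters):
            if members:
                centroids[i] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids, clusters


def flag_small_clusters(clusters, min_fraction=0.25):
    """Apply the unsupervised assumptions: infrequent (small) clusters are malicious."""
    total = sum(len(c) for c in clusters)
    return [len(c) / total < min_fraction for c in clusters]


# Hypothetical unlabeled observations: mostly normal traffic plus two outliers.
normal = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.2),
          (0.9, 0.8), (1.0, 1.3), (1.3, 1.0), (0.7, 0.9)]
outliers = [(9.0, 9.0), (9.2, 8.8)]
points = normal + outliers

centroids, clusters = kmeans(points, k=2)
flags = flag_small_clusters(clusters)
for members, is_malicious in zip(clusters, flags):
    print(len(members), "points ->", "malicious" if is_malicious else "normal")
# 8 points -> normal
# 2 points -> malicious
```

Practical K-means implementations usually use better initialization, such as k-means++ or several random restarts, because the result depends on the starting centroids.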
Machine learning techniques are now receiving
considerable attention among anomaly detection researchers as a way to address the
weaknesses of knowledge-based detection techniques.
Anomaly detection can effectively help in catching
fraud and discovering unusual activity in large and complex big
data sets. This is useful in areas such as banking security,
the natural sciences, medicine, and marketing, which are prone to malicious
activities. With machine learning, an organization can intensify its search and
increase the effectiveness of its digital business initiatives.