Anomaly Detection for Time Series Data with Deep Learning: in essence, the task is to classify behavior as normal or anomalous, and anomalies are detected by modeling what normal behavior looks like.

Introduction:

A sample network anomaly detection project

Suppose we wanted to detect network anomalies with the understanding that an anomaly might point to hardware failure, application failure, or an intrusion.

What our model will show us

The RNN will train on a numeric representation of network activity logs: feature vectors that translate the raw mix of text and numerical data in the logs into a form a neural network can consume.

By feeding the RNN a large volume of network activity logs, with each log line treated as a time step, the neural net will learn what normal, expected network activity looks like. When this trained network is fed new activity from the network, it will be able to classify that activity as normal and expected, or anomalous.
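
To make the idea of a feature vector concrete, here is a minimal sketch of how one parsed log line might become the numeric input for a single time step. The field names and encodings are hypothetical; in practice the DataVec tooling introduced below handles much of this work.

import java.util.Arrays;
import java.util.List;

// Hypothetical example: convert one parsed log line into the numeric
// feature vector the RNN sees at a single time step.
public class LogVectorizer {

    private static final List<String> PROTOCOLS = Arrays.asList("tcp", "udp", "icmp");

    public static double[] toFeatureVector(String protocol, long bytesSent,
                                           long bytesReceived, long durationMs) {
        double[] v = new double[PROTOCOLS.size() + 3];
        int idx = PROTOCOLS.indexOf(protocol.toLowerCase());
        if (idx >= 0) {
            v[idx] = 1.0;                  // one-hot encoding of the protocol
        }
        v[3] = Math.log1p(bytesSent);      // log-scale the heavy-tailed byte counts
        v[4] = Math.log1p(bytesReceived);
        v[5] = durationMs / 1000.0;        // duration in seconds
        return v;
    }
}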

Training a neural net to recognize expected behavior has an advantage: it is rare to have a large volume of abnormal data, and certainly not enough to accurately classify all abnormal behavior. So we train our network on the normal data we have, so that it alerts us to non-normal activity in the future. The opposite approach, training on attack data itself, only works in the rare cases where we have enough of it.

As an aside, the trained network does not necessarily note that certain activities happen at certain times (it does not know that a particular day is Sunday), but it does notice those more obvious temporal patterns we would be aware of, along with other connections between events that might not be apparent.

We’ll outline how to approach this problem using Deeplearning4j, a widely used open-source library for deep learning on the JVM. Deeplearning4j comes with a variety of tools that are useful throughout the model-development process. DataVec is a collection of tools to assist with the extract-transform-load (ETL) tasks used to prepare data for model training. Just as Sqoop helps load data into Hadoop, DataVec helps load data into neural nets by cleaning, preprocessing, normalizing, and standardizing it. It’s similar to Trifacta’s Wrangler but focused a bit more on binary data.

Getting started

The first stage includes typical big data tasks and ETL: we need to gather, move, store, prepare, normalize, and vectorize the logs. The size of the time steps must be decided. Data transformation may require significant effort, since JSON logs, text logs, and logs with inconsistent labeling patterns all have to be read and converted into a numeric array. DataVec can help transform and normalize that data. As is the norm when developing machine learning models, the data must be split into a training set and a test (or evaluation) set.
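
As a minimal sketch of this stage, the snippet below uses DataVec to load already-vectorized log lines from a CSV file and standardize the features. The file name, column layout, and batch size are assumptions for illustration; for true sequence input DataVec also provides sequence record readers, and the train/test split is omitted here.

import org.datavec.api.records.reader.RecordReader;
import org.datavec.api.records.reader.impl.csv.CSVRecordReader;
import org.datavec.api.split.FileSplit;
import org.deeplearning4j.datasets.datavec.RecordReaderDataSetIterator;
import org.nd4j.linalg.dataset.api.iterator.DataSetIterator;
import org.nd4j.linalg.dataset.api.preprocessor.DataNormalization;
import org.nd4j.linalg.dataset.api.preprocessor.NormalizerStandardize;

import java.io.File;

// Hypothetical layout: one vectorized log line per row, with the label
// (0 = normal, 1 = anomalous) in column 6.
RecordReader reader = new CSVRecordReader();
reader.initialize(new FileSplit(new File("train-logs.csv")));

int batchSize = 128;
int labelIndex = 6;
int numClasses = 2;
DataSetIterator trainIter =
        new RecordReaderDataSetIterator(reader, batchSize, labelIndex, numClasses);

// Standardize features to zero mean and unit variance, using statistics
// collected from the training split only.
DataNormalization normalizer = new NormalizerStandardize();
normalizer.fit(trainIter);
trainIter.reset();
trainIter.setPreProcessor(normalizer);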

Training the network

The net’s initial training will run on the training split of the input data.

For the first training runs, you may need to adjust some hyperparameters (the settings that control the model’s configuration and how it trains) so that the model actually learns from the data, and does so in a reasonable amount of time. We discuss a few hyperparameters below. As the model trains, you should look for a steady decrease in error.
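
A minimal sketch of such a run, assuming the trainIter iterator from the ETL stage and the net object built in the sample code later in this article; the epoch count is an assumption to tune. Attaching a ScoreIterationListener prints the error score as training proceeds, which is how you watch for that steady decrease.

import org.deeplearning4j.optimize.listeners.ScoreIterationListener;

net.setListeners(new ScoreIterationListener(10));  // print the score every 10 iterations

int nEpochs = 30;                                  // assumption; tune for your data
for (int epoch = 0; epoch < nEpochs; epoch++) {
    net.fit(trainIter);    // one full pass over the training data
    trainIter.reset();
}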

There is a risk that a neural network model will “overfit” the data. A model that has been trained to the point of overfitting the dataset will get good scores on the training data, but will not make accurate decisions about data it has never seen before; in machine-learning parlance, it doesn’t “generalize.” Deeplearning4J provides regularization tools and “early stopping” that help prevent overfitting while training.
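
A minimal sketch of early stopping in Deeplearning4J, assuming the trainIter from the ETL stage, a testIter built the same way over the evaluation split, and the conf object from the sample code later in this article. Training halts after 30 epochs, or earlier if the score on the test set fails to improve for 5 consecutive epochs, and the best model seen so far is kept; the directory name is hypothetical.

import org.deeplearning4j.earlystopping.EarlyStoppingConfiguration;
import org.deeplearning4j.earlystopping.EarlyStoppingResult;
import org.deeplearning4j.earlystopping.saver.LocalFileModelSaver;
import org.deeplearning4j.earlystopping.scorecalc.DataSetLossCalculator;
import org.deeplearning4j.earlystopping.termination.MaxEpochsTerminationCondition;
import org.deeplearning4j.earlystopping.termination.ScoreImprovementEpochTerminationCondition;
import org.deeplearning4j.earlystopping.trainer.EarlyStoppingTrainer;
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;

EarlyStoppingConfiguration<MultiLayerNetwork> esConf =
        new EarlyStoppingConfiguration.Builder<MultiLayerNetwork>()
                .epochTerminationConditions(
                        new MaxEpochsTerminationCondition(30),
                        new ScoreImprovementEpochTerminationCondition(5))
                .scoreCalculator(new DataSetLossCalculator(testIter, true))
                .evaluateEveryNEpochs(1)
                .modelSaver(new LocalFileModelSaver("models/"))  // hypothetical directory
                .build();

EarlyStoppingTrainer trainer = new EarlyStoppingTrainer(esConf, conf, trainIter);
EarlyStoppingResult<MultiLayerNetwork> result = trainer.fit();
MultiLayerNetwork bestModel = result.getBestModel();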

Training the neural net is the step that will take the most time and hardware. Running training on GPUs will lead to a significant decrease in training time, especially for image recognition, but additional hardware comes with additional cost, so it’s important that your deep-learning framework use hardware as efficiently as possible. Cloud services such as Azure and Amazon provide access to GPU-based instances, and neural nets can be trained on heterogeneous clusters with scalable commodity servers as well as purpose-built machines.

Productionizing the model

Deeplearning4J provides a ModelSerializer class to save a trained model. A trained model can be saved and either used as-is (i.e., deployed to production) or updated later with further training.
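
A minimal sketch of saving and restoring a trained net, assuming the net object from the sample code below; the file name is an assumption. Passing true preserves the updater state, so training can be resumed later.

import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;
import org.deeplearning4j.util.ModelSerializer;

import java.io.File;

File modelFile = new File("network-anomaly-model.zip");   // hypothetical path

// Save the trained net (true = also save the updater state for further training).
ModelSerializer.writeModel(net, modelFile, true);

// Restore it, e.g., inside the production scoring service.
MultiLayerNetwork restored = ModelSerializer.restoreMultiLayerNetwork(modelFile);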

When performing network anomaly detection in production, log files need to be serialized into the same format the model trained on. Based on the output of the neural network, you get reports on whether the current activity falls within the range of normal, expected network behavior.
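
A minimal sketch of that scoring step, assuming the restored model from above and a hypothetical vectorize() helper that applies exactly the same feature extraction and normalization used during training; the 0.5 threshold is likewise an assumption to tune.

import org.nd4j.linalg.api.ndarray.INDArray;

// vectorize() is hypothetical: it must produce the same feature layout and
// normalization the model was trained on.
INDArray features = vectorize(incomingLogLines);

// For the network in the sample code below, the output is the softmax
// probability of each class at each time step, shaped
// [miniBatch, numClasses, timeSteps].
INDArray scores = restored.output(features);

// Probability of the "anomalous" class (index 1) at the final time step.
long lastStep = scores.size(2) - 1;
boolean anomalous = scores.getDouble(0, 1, (int) lastStep) > 0.5;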

Sample code

The configuration of a recurrent neural network might look something like this:

MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
        .seed(123)
        .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
        .iterations(1)
        .weightInit(WeightInit.XAVIER)
        .updater(Updater.NESTEROVS).momentum(0.9)
        .learningRate(0.005)
        .gradientNormalization(GradientNormalization.ClipElementWiseAbsoluteValue)
        .gradientNormalizationThreshold(0.5)
        .list()
        .layer(0, new GravesLSTM.Builder().activation("tanh").nIn(1).nOut(10).build())
        .layer(1, new RnnOutputLayer.Builder(LossFunctions.LossFunction.MCXENT)
                .activation("softmax").nIn(10).nOut(numLabelClasses).build())
        .pretrain(false).backprop(true).build();

MultiLayerNetwork net = new MultiLayerNetwork(conf);
net.init();

Let’s describe a few important lines of this code:

  • .seed(123)

sets a random seed used to initialize the neural net’s weights, in order to obtain reproducible results. Coefficients are normally initialized randomly, so to obtain consistent results while adjusting other hyperparameters we fix the seed, letting us reuse the same initial weights over and over as we tune and test.

  • .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT).iterations(1)

selects the optimization algorithm (in this case, stochastic gradient descent) that determines how to modify the weights to improve the error score. You probably won’t have to modify this.

  • .learningRate(0.005)

When using stochastic gradient descent, the error gradient (that is, the relation of a change in coefficients to a change in the net’s error) is calculated, and the weights are moved along this gradient in an attempt to move the error towards a minimum. SGD gives us the direction of less error, and the learning rate determines how big a step is taken in that direction. If the learning rate is too high, you may overshoot the error minimum; if it is too low, training will take forever. This is a hyperparameter you may need to adjust.
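
Concretely, the basic SGD update for each weight w is w_new = w_old − η · ∂E/∂w, where E is the network’s error and η is the learning rate (0.005 in the sample configuration above). The Nesterovs updater in that configuration layers momentum on top of this basic rule.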

Getting Help

There is an active community of Deeplearning4J users who can be found on several support channels on Gitter.

About the author

Tom Hanlon is currently at Skymind.IO, where he is developing a training program for Deeplearning4J. The consistent thread in Tom’s career has been data, from MySQL to Hadoop and now neural networks.

 

Source: https://www.infoq.com/articles/deep-learning-time-series-anomaly-detection

This article is reposted from the 张昺华-sky blog on cnblogs (original link: http://www.cnblogs.com/bonelee/p/6432083.html); please contact the original author if you wish to republish it.


