Time Series Prediction with TensorFlow: The TFTS Library

TensorFlow 1.3 introduced the TensorFlow Time Series module (TFTS for short), a set of APIs designed specifically for time series prediction problems. It provides three prediction models: AR, anomaly mixture AR, and LSTM.

# Project repository
https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/timeseries
# Example code
https://github.com/hzy46/TensorFlow-Time-Series-Examples

1. Reading in time series data

The TFTS library provides two readers: NumpyReader, which reads data from a numpy array, and CSVReader, which reads data from a CSV file. CSVReader is not used in the examples below; a minimal sketch of it follows.
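For reference, reading a series from a CSV file follows the same pattern as NumpyReader. A minimal sketch, assuming a headerless CSV whose rows are `time,value` (the path here is only a placeholder):

reader = tf.contrib.timeseries.CSVReader('./data/period_trend.csv')  # placeholder path
# By default the two columns are interpreted as (times, values); a
# column_names argument can describe multivariate files instead.

The resulting reader can then be passed to the same input functions shown below.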

1. Reading time series data from a numpy array

#!/anaconda3/bin/python
# -*- coding: utf-8 -*-

from __future__ import print_function
import numpy as np
import matplotlib
import os
matplotlib.use('agg')
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow.contrib.timeseries.python.timeseries import NumpyReader


print(os.getcwd())
x = np.array(range(1000))
noise = np.random.uniform(-0.2, 0.2, 1000)
# Note the "200.": x is an integer array, so under Python 2 a plain x/200
# would be integer division (the AR script later writes it the same way).
y = np.sin(np.pi * x / 100) + x / 200. + noise
plt.plot(x, y)
plt.savefig('gg.jpg')

This is how TFTS reads in x and y:

data = {
    tf.contrib.timeseries.TrainEvalFeatures.TIMES: x,
    tf.contrib.timeseries.TrainEvalFeatures.VALUES: y,
}
reader = NumpyReader(data)

First, x and y are packed into a Python dict (the variable data). The key tf.contrib.timeseries.TrainEvalFeatures.TIMES is really just the string "times", and tf.contrib.timeseries.TrainEvalFeatures.VALUES is the string "values", so the definition above could equally be written as data = {'times': x, 'values': y}. The more verbose form is used to stay consistent with the library source code.
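This is easy to check directly, since both constants are plain strings:

print(tf.contrib.timeseries.TrainEvalFeatures.TIMES)   # prints: times
print(tf.contrib.timeseries.TrainEvalFeatures.VALUES)  # prints: values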

The resulting reader has a read_full() method, whose return value is a tensor holding the whole time series:

with tf.Session() as sess:
    full_data = reader.read_full()
    # read_full() builds a reading queue; the queue must be started with
    # tf.train.start_queue_runners before any values can be fetched.
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)
    print(sess.run(full_data))
    coord.request_stop()
    coord.join(threads)

Note that you cannot read all the data with a bare sess.run(reader.read_full()): read_full() creates a reading queue, and at that point the queue threads have not yet been started, so the run call would block forever.
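In other words, a naive version like the following would simply hang (a sketch of the failure mode just described):

with tf.Session() as sess:
    # Blocks forever: read_full() depends on a reading queue, and no
    # queue-runner threads have been started to feed it.
    print(sess.run(reader.read_full()))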

Training normally does not run over the whole dataset at once; instead, data is consumed in batches. Batch data is built from the reader as follows:

train_input_fn = tf.contrib.timeseries.RandomWindowInputFn(reader, batch_size=2, window_size=10)

tf.contrib.timeseries.RandomWindowInputFn randomly selects windows of length window_size from the reader's data and packs them into batches of size batch_size. That is, one batch holds batch_size sequences, each of length window_size.

Printing the data inside one batch:

with tf.Session() as sess:
    batch_data = train_input_fn.create_batch()
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)
    # create_batch() returns a tuple; its first element is the batch dict.
    one_batch = sess.run(batch_data[0])
    coord.request_stop()
    coord.join(threads)
print(one_batch)
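With batch_size=2 and window_size=10, the printed dict should hold a 'times' array of shape (2, 10) and a 'values' array of shape (2, 10, 1); NumpyReader gives univariate values a trailing feature dimension.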

2. The AR model

The autoregressive (AR) model is one of the basic statistical approaches to time series modeling. The example program below trains an AR model, evaluates it, and uses it for prediction. The model defined here accepts 30 observed values as input at a time and outputs a prediction sequence of length 10; the whole training set is a sequence of length 1000. The first 30 numbers are fed into the model as the "initial observation window", from which the predictions for the following 10 steps are computed; then another 30 numbers are taken as input, 10 of which are the previous step's predictions; the newly obtained predictions in turn become part of the next input, and so on.

This eventually yields 970 predicted values (the first 30 steps have no predictions), which are recorded in evaluation['mean']. The other keys include evaluation['loss'], the total loss, and evaluation['times'], the time steps corresponding to evaluation['mean'].

evaluation['start_tuple'] is used in the subsequent prediction step; it amounts to the model output for the last 30 steps together with the corresponding time steps. Taking it as the starting point, values beyond step 1000 can be predicted.

#!/anaconda3/bin/python
# -*- coding: utf-8 -*-

from __future__ import print_function
import numpy as np
import matplotlib
matplotlib.use('agg')
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow.contrib.timeseries.python.timeseries import NumpyReader


def main(_):
    x = np.array(range(1000))
    noise = np.random.uniform(-0.2, 0.2, 1000)
    y = np.sin(np.pi * x / 100) + x / 200. + noise
    plt.plot(x, y)
    plt.savefig('timeseries_y.jpg')

    data = {
        tf.contrib.timeseries.TrainEvalFeatures.TIMES: x,
        tf.contrib.timeseries.TrainEvalFeatures.VALUES: y,
    }

    reader = NumpyReader(data)

    train_input_fn = tf.contrib.timeseries.RandomWindowInputFn(
        reader, batch_size=16, window_size=40)

    # periodicities: the regular period of the series; the np.sin term
    # sin(pi*x/100) has period 200
    # input_window_size: number of values the model reads at each step
    # output_window_size: number of values the model outputs at each step;
    # input_window_size + output_window_size = window_size, so of each
    # 40-value window the first 30 are the model input and the last 10
    # are the corresponding target outputs
    # loss: which loss to use; the two choices are NORMAL_LIKELIHOOD_LOSS
    # and SQUARED_LOSS
    # num_features: dimensionality of the observation at each time point
    ar = tf.contrib.timeseries.ARRegressor(
        periodicities=200, input_window_size=30, output_window_size=10,
        num_features=1,
        loss=tf.contrib.timeseries.ARModel.NORMAL_LIKELIHOOD_LOSS)

    ar.train(input_fn=train_input_fn, steps=6000)

    # "Evaluation" in TFTS means running the trained model over the original
    # training set, which shows how well the model fits it.
    evaluation_input_fn = tf.contrib.timeseries.WholeDatasetInputFn(reader)
    # keys of evaluation: ['covariance', 'loss', 'mean', 'observed', 'start_tuple', 'times', 'global_step']
    evaluation = ar.evaluate(input_fn=evaluation_input_fn, steps=1)

    # Predict the next 250 steps beyond the training data.
    (predictions,) = tuple(ar.predict(
        input_fn=tf.contrib.timeseries.predict_continuation_input_fn(
            evaluation, steps=250)))

    plt.figure(figsize=(15, 5))
    plt.plot(data['times'].reshape(-1), data['values'].reshape(-1), label='origin')
    plt.plot(evaluation['times'].reshape(-1), evaluation['mean'].reshape(-1), label='evaluation')
    plt.plot(predictions['times'].reshape(-1), predictions['mean'].reshape(-1), label='prediction')
    plt.xlabel('time_step')
    plt.ylabel('values')
    plt.legend(loc=4)
    plt.savefig('predict_result.jpg')


if __name__ == '__main__':
    tf.logging.set_verbosity(tf.logging.INFO)
    tf.app.run()

3. The LSTM model (univariate and multivariate)

The script below covers the univariate case; a sketch of the multivariate variant follows at the end of this section.

#!/anaconda3/bin/python
# -*- coding: utf-8 -*-

# Copyright 2017 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

from os import path

import numpy as np
import tensorflow as tf

from tensorflow.contrib.timeseries.python.timeseries import estimators as ts_estimators
from tensorflow.contrib.timeseries.python.timeseries import model as ts_model
from tensorflow.contrib.timeseries.python.timeseries import NumpyReader

import matplotlib
matplotlib.use("agg")
import matplotlib.pyplot as plt


class _LSTMModel(ts_model.SequentialTimeSeriesModel):
  """A time series model-building example using an RNNCell."""

  def __init__(self, num_units, num_features, dtype=tf.float32):
    """Initialize/configure the model object.
    Note that we do not start graph building here. Rather, this object is a
    configurable factory for TensorFlow graphs which are run by an Estimator.
    Args:
      num_units: The number of units in the model's LSTMCell.
      num_features: The dimensionality of the time series (features per
        timestep).
      dtype: The floating point data type to use.
    """
    super(_LSTMModel, self).__init__(
        # Pre-register the metrics we'll be outputting (just a mean here).
        train_output_names=["mean"],
        predict_output_names=["mean"],
        num_features=num_features,
        dtype=dtype)
    self._num_units = num_units
    # Filled in by initialize_graph()
    self._lstm_cell = None
    self._lstm_cell_run = None
    self._predict_from_lstm_output = None

  def initialize_graph(self, input_statistics):
    """Save templates for components, which can then be used repeatedly.
    This method is called every time a new graph is created. It's safe to start
    adding ops to the current default graph here, but the graph should be
    constructed from scratch.
    Args:
      input_statistics: A math_utils.InputStatistics object.
    """
    super(_LSTMModel, self).initialize_graph(input_statistics=input_statistics)
    self._lstm_cell = tf.nn.rnn_cell.LSTMCell(num_units=self._num_units)
    # Create templates so we don't have to worry about variable reuse.
    self._lstm_cell_run = tf.make_template(
        name_="lstm_cell",
        func_=self._lstm_cell,
        create_scope_now_=True)
    # Transforms LSTM output into mean predictions.
    self._predict_from_lstm_output = tf.make_template(
        name_="predict_from_lstm_output",
        func_=lambda inputs: tf.layers.dense(inputs=inputs, units=self.num_features),
        create_scope_now_=True)

  def get_start_state(self):
    """Return initial state for the time series model."""
    return (
        # Keeps track of the time associated with this state for error checking.
        tf.zeros([], dtype=tf.int64),
        # The previous observation or prediction.
        tf.zeros([self.num_features], dtype=self.dtype),
        # The state of the RNNCell (batch dimension removed since this parent
        # class will broadcast).
        [tf.squeeze(state_element, axis=0)
         for state_element
         in self._lstm_cell.zero_state(batch_size=1, dtype=self.dtype)])

  def _transform(self, data):
    """Normalize data based on input statistics to encourage stable training."""
    mean, variance = self._input_statistics.overall_feature_moments
    return (data - mean) / variance

  def _de_transform(self, data):
    """Transform data back to the input scale."""
    mean, variance = self._input_statistics.overall_feature_moments
    return data * variance + mean

  def _filtering_step(self, current_times, current_values, state, predictions):
    """Update model state based on observations.
    Note that we don't do much here aside from computing a loss. In this case
    it's easier to update the RNN state in _prediction_step, since that covers
    running the RNN both on observations (from this method) and our own
    predictions. This distinction can be important for probabilistic models,
    where repeatedly predicting without filtering should lead to low-confidence
    predictions.
    Args:
      current_times: A [batch size] integer Tensor.
      current_values: A [batch size, self.num_features] floating point Tensor
        with new observations.
      state: The model's state tuple.
      predictions: The output of the previous `_prediction_step`.
    Returns:
      A tuple of new state and a predictions dictionary updated to include a
      loss (note that we could also return other measures of goodness of fit,
      although only "loss" will be optimized).
    """
    state_from_time, prediction, lstm_state = state
    with tf.control_dependencies(
            [tf.assert_equal(current_times, state_from_time)]):
      transformed_values = self._transform(current_values)
      # Use mean squared error across features for the loss.
      predictions["loss"] = tf.reduce_mean(
          (prediction - transformed_values) ** 2, axis=-1)
      # Keep track of the new observation in model state. It won't be run
      # through the LSTM until the next _imputation_step.
      new_state_tuple = (current_times, transformed_values, lstm_state)
    return (new_state_tuple, predictions)

  def _prediction_step(self, current_times, state):
    """Advance the RNN state using a previous observation or prediction."""
    _, previous_observation_or_prediction, lstm_state = state
    lstm_output, new_lstm_state = self._lstm_cell_run(
        inputs=previous_observation_or_prediction, state=lstm_state)
    next_prediction = self._predict_from_lstm_output(lstm_output)
    new_state_tuple = (current_times, next_prediction, new_lstm_state)
    return new_state_tuple, {"mean": self._de_transform(next_prediction)}

  def _imputation_step(self, current_times, state):
    """Advance model state across a gap."""
    # Does not do anything special if we're jumping across a gap. More advanced
    # models, especially probabilistic ones, would want a special case that
    # depends on the gap size.
    return state

  def _exogenous_input_step(
          self, current_times, current_exogenous_regressors, state):
    """Update model state based on exogenous regressors."""
    raise NotImplementedError(
        "Exogenous inputs are not implemented for this example.")


if __name__ == '__main__':
  tf.logging.set_verbosity(tf.logging.INFO)
  # The functional relationship between y and x is complex, which makes a
  # model like an LSTM better suited to discovering the underlying pattern.
  x = np.array(range(1000))
  noise = np.random.uniform(-0.2, 0.2, 1000)
  y = np.sin(np.pi * x / 50) + np.cos(np.pi * x / 50) + np.sin(np.pi * x / 25) + noise

  data = {
      tf.contrib.timeseries.TrainEvalFeatures.TIMES: x,
      tf.contrib.timeseries.TrainEvalFeatures.VALUES: y,
  }

  reader = NumpyReader(data)

  # 4 randomly selected windows per batch, each of length 100
  train_input_fn = tf.contrib.timeseries.RandomWindowInputFn(
      reader, batch_size=4, window_size=100)

  # num_features=1: the observation at each time point is a single scalar
  # num_units=128: use an LSTM whose hidden state has size 128
  estimator = ts_estimators.TimeSeriesRegressor(
      model=_LSTMModel(num_features=1, num_units=128),
      optimizer=tf.train.AdamOptimizer(0.001))

  estimator.train(input_fn=train_input_fn, steps=2000)
  evaluation_input_fn = tf.contrib.timeseries.WholeDatasetInputFn(reader)
  evaluation = estimator.evaluate(input_fn=evaluation_input_fn, steps=1)
  # Predict starting after the evaluation
  (predictions,) = tuple(estimator.predict(
      input_fn=tf.contrib.timeseries.predict_continuation_input_fn(
          evaluation, steps=200)))

  observed_times = evaluation["times"][0]
  observed = evaluation["observed"][0, :, :]
  evaluated_times = evaluation["times"][0]
  evaluated = evaluation["mean"][0]
  predicted_times = predictions['times']
  predicted = predictions["mean"]

  plt.figure(figsize=(15, 5))
  plt.axvline(999, linestyle="dotted", linewidth=4, color='r')
  observed_lines = plt.plot(observed_times, observed, label="observation", color="k")
  evaluated_lines = plt.plot(evaluated_times, evaluated, label="evaluation", color="g")
  predicted_lines = plt.plot(predicted_times, predicted, label="prediction", color="r")
  plt.legend(handles=[observed_lines[0], evaluated_lines[0], predicted_lines[0]],
             loc="upper left")
  plt.savefig('predict_result.jpg')
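
That covers the univariate case. For the multivariate case promised in the heading, the changes follow the train_lstm_multivariate.py example in the repository linked at the top: read several value columns and raise num_features to match. A sketch, assuming a headerless CSV with one time column and five value columns (the path is the repository's data file, used here as a placeholder):

csv_file_name = path.join("./data/multivariate_periods.csv")
reader = tf.contrib.timeseries.CSVReader(
    csv_file_name,
    # One "times" column followed by five "values" columns.
    column_names=((tf.contrib.timeseries.TrainEvalFeatures.TIMES,)
                  + (tf.contrib.timeseries.TrainEvalFeatures.VALUES,) * 5))
train_input_fn = tf.contrib.timeseries.RandomWindowInputFn(
    reader, batch_size=4, window_size=32)

# The model itself only needs num_features raised to 5; training,
# evaluation, and prediction then proceed exactly as above.
estimator = ts_estimators.TimeSeriesRegressor(
    model=_LSTMModel(num_features=5, num_units=128),
    optimizer=tf.train.AdamOptimizer(0.001))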

 
