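I. Estimator overview

An Estimator is TensorFlow's high-level representation of a complete model. TensorFlow provides a programming stack made up of several API layers, and Estimators sit at the high-level end of that stack.

An Estimator encapsulates four operations: training, evaluation, prediction, and export for serving.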

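II. Datasets

The tf.data module is used to build an input pipeline that feeds data into the model. An input function built with tf.data returns a Dataset object, and each element of the Dataset is a (feature_dict, labels) pair.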

https://blog.csdn.net/woniu201411/article/details/89249689
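To make the (feature_dict, labels) structure concrete, here is a plain-Python sketch of batching. It is not tf.data itself; the function name batch_pairs and the Iris-like column names are assumptions made for illustration only.

```python
from typing import Dict, Iterator, List, Tuple


def batch_pairs(
    features: Dict[str, List[float]],
    labels: List[int],
    batch_size: int,
) -> Iterator[Tuple[Dict[str, List[float]], List[int]]]:
    """Yield (feature_dict, labels) batches, mimicking the shape of a Dataset element."""
    for start in range(0, len(labels), batch_size):
        end = start + batch_size
        # Slice every feature column with the same window so rows stay aligned.
        feature_batch = {name: values[start:end] for name, values in features.items()}
        yield feature_batch, labels[start:end]


# Hypothetical toy data: two feature columns, three examples.
toy_features = {"SepalLength": [5.1, 6.4, 5.9], "SepalWidth": [3.3, 2.8, 3.0]}
toy_labels = [0, 2, 1]
batches = list(batch_pairs(toy_features, toy_labels, batch_size=2))
```

Each batch keeps the feature dictionary keyed by column name, which is what lets feature columns later look up their inputs by key.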

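III. Defining feature columns

Feature columns act as the intermediary between raw data and the Estimator. To create feature columns, call the functions in the tf.feature_column module.

1. Numeric columns

tf.feature_column.numeric_column specifies a numeric value with the default data type (tf.float32) as model input.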

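2. Bucketized columns

tf.feature_column.bucketized_column splits a numeric column into categories according to value ranges. This adds non-linear features to the model and increases its expressive power.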
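The bucketing idea can be sketched in plain Python. This is only an illustration of the mapping, not TensorFlow's implementation; the function name bucketize and the year boundaries are assumptions. Buckets here include their left boundary and exclude their right boundary.

```python
from bisect import bisect_right
from typing import List, Sequence


def bucketize(value: float, boundaries: Sequence[float]) -> List[int]:
    """Return a one-hot vector marking which bucket `value` falls into.

    With N sorted boundaries there are N + 1 buckets:
    (-inf, b0), [b0, b1), ..., [b_{N-1}, +inf).
    """
    index = bisect_right(boundaries, value)
    one_hot = [0] * (len(boundaries) + 1)
    one_hot[index] = 1
    return one_hot


# Hypothetical boundaries for a "year" feature.
year_boundaries = [1960, 1980, 2000]
```

Because each bucket gets its own position in the vector, a linear model can learn a separate weight per range instead of a single weight for the raw number.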

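3. Categorical identity columns

tf.feature_column.categorical_column_with_identity can be viewed as a special case of a bucketized column in which each bucket represents a single unique integer. The model can then learn a separate weight for each category.

4. Categorical vocabulary columns

tf.feature_column.categorical_column_with_vocabulary_list maps each string to an integer based on an explicit vocabulary list, so that the string can be represented as a one-hot vector.

tf.feature_column.categorical_column_with_vocabulary_file does the same, but reads the vocabulary from a file.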
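A minimal plain-Python sketch of the vocabulary mapping follows. The function name vocabulary_one_hot and the example vocabulary are hypothetical, and returning an all-zero vector for out-of-vocabulary strings is a choice made for this sketch.

```python
from typing import List, Sequence


def vocabulary_one_hot(token: str, vocabulary: Sequence[str]) -> List[int]:
    """Map a string to its vocabulary index, then to a one-hot vector.

    Out-of-vocabulary tokens yield an all-zero vector in this sketch.
    """
    one_hot = [0] * len(vocabulary)
    if token in vocabulary:
        one_hot[vocabulary.index(token)] = 1
    return one_hot


# Hypothetical vocabulary for a product-category feature.
product_vocabulary = ["kitchenware", "electronics", "sports"]
```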

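5. Hashed columns

tf.feature_column.categorical_column_with_hash_bucket is intended for features with a very large number of categories. The model computes a hash of the input and then uses the modulo operator to place it into one of hash_bucket_size categories.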
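The hash-then-modulo step can be sketched as below. This is an assumption-laden illustration: it uses MD5 from the standard library, whereas TensorFlow uses its own hash function, so the bucket ids will not match TensorFlow's. The function name hash_bucket is hypothetical.

```python
import hashlib


def hash_bucket(value: str, hash_bucket_size: int) -> int:
    """Place a string into one of `hash_bucket_size` buckets via hash modulo size."""
    # A stable digest is used so the same string always lands in the same bucket.
    digest = hashlib.md5(value.encode("utf-8")).hexdigest()
    return int(digest, 16) % hash_bucket_size
```

Different strings can collide in the same bucket; that is the trade-off accepted in exchange for a fixed, small number of categories.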

6. Crossed columns

tf.feature_column.crossed_column combines arbitrary categorical columns into a single feature, but builds only the number of categories requested by the hash_bucket_size argument.
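As a rough sketch of a feature cross, the combination of several categorical values can be joined into one key and hashed into a fixed number of buckets. The function name cross_bucket, the separator, and the use of SHA-256 are assumptions for illustration; TensorFlow's own hashing differs.

```python
import hashlib
from typing import Sequence


def cross_bucket(values: Sequence[str], hash_bucket_size: int) -> int:
    """Hash a combination of categorical values into one of `hash_bucket_size` buckets."""
    # Join with a separator so ("ab", "c") and ("a", "bc") produce different keys.
    key = "\x1f".join(values)
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % hash_bucket_size


# Hypothetical cross of a latitude bucket with a longitude bucket.
cell_id = cross_bucket(["lat_bucket_3", "lon_bucket_7"], hash_bucket_size=1000)
```

The number of possible combinations may be far larger than hash_bucket_size, so the cross trades some collisions for a bounded feature size.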

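7. Indicator and embedding columns

Indicator columns (tf.feature_column.indicator_column) and embedding columns (tf.feature_column.embedding_column) do not operate on raw features directly; they take a categorical column as input.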
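The difference between the two can be sketched in plain Python: an indicator column turns a category id into a one-hot vector, while an embedding column looks the id up in a dense table. The function names and the table values below are hypothetical; in a real model the embedding table is learned during training.

```python
from typing import List


def indicator(category_id: int, num_categories: int) -> List[int]:
    """One-hot encode a category id (what an indicator column produces)."""
    one_hot = [0] * num_categories
    one_hot[category_id] = 1
    return one_hot


def embedding_lookup(category_id: int, table: List[List[float]]) -> List[float]:
    """Return the dense vector for a category id (what an embedding column produces)."""
    return table[category_id]


# Hypothetical table: 4 categories embedded in 2 dimensions.
embedding_table = [[0.1, -0.3], [0.7, 0.2], [-0.5, 0.9], [0.0, 0.4]]
```

The indicator vector grows with the number of categories, whereas the embedding vector has a fixed, usually much smaller, dimension.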

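IV. Creating a model with an Estimator

A premade Estimator is a subclass of the tf.estimator.Estimator base class, whereas a custom Estimator is an instance of tf.estimator.Estimator. The practical difference is that a premade Estimator already has a model function, while a custom Estimator requires you to write the model function yourself.

1. Premade Estimators

TensorFlow provides several premade classifier Estimators (an Estimator represents a complete model), including:

tf.estimator.DNNClassifier: a deep model for multi-class classification

tf.estimator.LinearClassifier: a classifier based on a linear model

tf.estimator.DNNLinearCombinedClassifier: a wide and deep model

2. Custom Estimators

Define a model function. The model function has the following signature:

def my_model_fn(features, labels, mode, params):

features and labels are the batches of features and labels returned from the input function.

mode indicates whether the caller is requesting training, prediction, or evaluation; it is a tf.estimator.ModeKeys value.

params is passed by the caller to the Estimator constructor, which in turn passes it on to model_fn. For example: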

classifier = tf.estimator.Estimator(
    model_fn=my_model_fn,
    params={
        'feature_columns': my_feature_columns,
        # Two hidden layers of 10 nodes each.
        'hidden_units': [10, 10],
        # The model must choose between 3 classes.
        'n_classes': 3,
    })

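Input layer: converts the feature dictionary and feature_columns into the model's input (for example with tf.feature_column.input_layer).

Hidden layers: tf.layers provides all types of hidden layers, including convolutional, pooling, and dropout layers.

Output layer: tf.layers.dense defines the output layer. tf.nn.softmax converts the scores (logits) into probabilities.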
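For reference, the conversion from scores to probabilities can be written out in plain Python. This is a minimal sketch of the softmax formula, not the tf.nn.softmax implementation; subtracting the maximum logit is a standard trick to avoid overflow.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(logits)  # Shifting by the max keeps exp() from overflowing.
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


probabilities = softmax([2.0, 1.0, 0.1])
```

The largest logit always receives the largest probability, so taking the argmax of the logits and of the probabilities gives the same predicted class.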

V. Training, evaluation, and prediction

Estimator method	Estimator mode
train()	ModeKeys.TRAIN
evaluate()	ModeKeys.EVAL
predict()	ModeKeys.PREDICT
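The table implies that a model function must branch on mode and return different results in each case. The sketch below only models that branching in plain Python: Mode is a stand-in for tf.estimator.ModeKeys, and required_spec_fields is a hypothetical helper listing which EstimatorSpec fields the model function is expected to fill in for each mode.

```python
from enum import Enum, auto
from typing import Tuple


class Mode(Enum):
    """Stand-in for tf.estimator.ModeKeys in this sketch."""
    TRAIN = auto()
    EVAL = auto()
    PREDICT = auto()


def required_spec_fields(mode: Mode) -> Tuple[str, ...]:
    """List the EstimatorSpec fields a model function must supply for `mode`."""
    if mode is Mode.PREDICT:
        return ("predictions",)   # No labels are available, so no loss.
    if mode is Mode.EVAL:
        return ("loss",)          # Metrics (eval_metric_ops) are optional.
    return ("loss", "train_op")   # Training needs the op that updates weights.
```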

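1. Model training

classifier.train(
    input_fn=lambda: iris_data.train_input_fn(train_x, train_y, args.batch_size),
    max_steps=args.train_steps)

The Estimator calls the model function with mode set to ModeKeys.TRAIN.

input_fn: supplies the input data. Wrapping the input_fn call in a lambda captures the arguments while still providing an input function that takes no arguments, which is what the Estimator expects.

max_steps: the maximum total number of steps to train the model.

In my_model_fn, define the loss function and the method used to minimize it: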

    # Calculate loss (for both TRAIN and EVAL modes).
    # For multi-class classification, softmax cross-entropy is used as the loss.
    loss = tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits)

    # Configure the training op (for TRAIN mode).
    # Minimize the loss with stochastic gradient descent, learning rate 0.001.
    if mode == tf.estimator.ModeKeys.TRAIN:
        optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.001)
        train_op = optimizer.minimize(
            loss=loss,
            global_step=tf.train.get_global_step())
        return tf.estimator.EstimatorSpec(mode=mode, loss=loss, train_op=train_op)
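To show what the loss and the optimizer compute, here is a plain-Python sketch of mean sparse softmax cross-entropy and of a single gradient-descent update. It is an illustration of the math only, not TensorFlow's implementation; the function names are chosen for this sketch and the gradients passed to sgd_step are assumed to be given.

```python
import math
from typing import List, Sequence


def sparse_softmax_cross_entropy(
    labels: Sequence[int], logits: Sequence[Sequence[float]]
) -> float:
    """Mean of -log(softmax(logits)[label]) over the batch."""
    total = 0.0
    for label, row in zip(labels, logits):
        shift = max(row)  # Stabilize the log-sum-exp.
        log_sum_exp = shift + math.log(sum(math.exp(x - shift) for x in row))
        total += log_sum_exp - row[label]
    return total / len(labels)


def sgd_step(
    weights: Sequence[float], gradients: Sequence[float], learning_rate: float = 0.001
) -> List[float]:
    """Apply one gradient-descent update: w <- w - learning_rate * gradient."""
    return [w - learning_rate * g for w, g in zip(weights, gradients)]
```

With uniform logits over three classes the loss equals log(3), the value expected from a model that has not learned anything yet; it decreases as the logit of the correct class grows.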

2. Model evaluation

# Evaluate the model.
eval_result = classifier.evaluate(
    input_fn=lambda: iris_data.eval_input_fn(test_x, test_y, args.batch_size))

print('\nTest set accuracy: {accuracy:0.3f}\n'.format(**eval_result))

The Estimator calls the model function with mode set to ModeKeys.EVAL. The model function must return a tf.estimator.EstimatorSpec containing the model's loss and, optionally, one or more metrics.

Use tf.metrics to compute common metrics:

    # Add evaluation metrics (for EVAL mode): accuracy.
    eval_metric_ops = {
        "accuracy": tf.metrics.accuracy(labels=labels,
                                        predictions=predictions["classes"])}
    return tf.estimator.EstimatorSpec(mode=mode, loss=loss, eval_metric_ops=eval_metric_ops)
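An accuracy metric evaluated over a dataset has to accumulate counts across batches rather than average per-batch accuracies. The class below is a plain-Python sketch of that running computation; StreamingAccuracy is a hypothetical name and this is not the tf.metrics.accuracy implementation.

```python
from typing import Sequence


class StreamingAccuracy:
    """Accumulate correct/total counts across batches."""

    def __init__(self) -> None:
        self.correct = 0
        self.total = 0

    def update(self, labels: Sequence[int], predictions: Sequence[int]) -> None:
        """Fold one batch of labels and predicted class ids into the running counts."""
        self.correct += sum(1 for y, p in zip(labels, predictions) if y == p)
        self.total += len(labels)

    def result(self) -> float:
        """Return the accuracy over everything seen so far."""
        return self.correct / self.total if self.total else 0.0


metric = StreamingAccuracy()
metric.update(labels=[0, 1], predictions=[0, 0])  # 1 of 2 correct
metric.update(labels=[2, 2], predictions=[2, 2])  # 2 of 2 correct
```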

3. Model prediction

predictions = classifier.predict(
    input_fn=lambda: iris_data.eval_input_fn(predict_x, labels=None, batch_size=args.batch_size))

When the Estimator's predict method is called, model_fn receives mode=ModeKeys.PREDICT, and the model function must return a tf.estimator.EstimatorSpec containing the predictions.

    predictions = {
        # Generate predictions (for PREDICT and EVAL modes).
        "classes": tf.argmax(input=logits, axis=1),
        # Add `softmax_tensor` to the graph. It is used for PREDICT and by the
        # `logging_hook`.
        "probabilities": tf.nn.softmax(logits, name="softmax_tensor"),
        # Generate image feature vector.
        "feature": dense
    }

    if mode == tf.estimator.ModeKeys.PREDICT:
        return tf.estimator.EstimatorSpec(mode=mode, predictions=predictions)

predictions holds three key-value pairs:

classes: the id of the class the model considers most likely for the example;

probabilities: the probability that the example belongs to each class;

feature: the example's feature vector (the output of the second-to-last layer, dense).
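The predict method yields one dictionary per example, keyed as in the predictions dictionary above. The sketch below shows one way such a dictionary might be turned into a readable result; describe_prediction, the class names, and the sample values are all hypothetical and stand in for real model output.

```python
from typing import Any, Dict, Sequence


def describe_prediction(pred: Dict[str, Any], class_names: Sequence[str]) -> str:
    """Format the most likely class and its probability for one example."""
    class_id = int(pred["classes"])
    probability = pred["probabilities"][class_id]
    return "{} ({:.1f}%)".format(class_names[class_id], 100 * probability)


# Hypothetical single prediction, shaped like one item yielded by predict().
species = ["Setosa", "Versicolor", "Virginica"]
sample = {"classes": 1, "probabilities": [0.02, 0.95, 0.03]}
summary = describe_prediction(sample, species)
```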

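VI. Saving and restoring models

An Estimator automatically writes two kinds of model information to disk: checkpoints, which are versions of the model created during training; and event files, which contain the information TensorBoard uses to create visualizations. The save path is set through the model_dir argument of the Estimator constructor.

Saving a model:

The first call to train adds checkpoints and event files to the model_dir directory.

By default, an Estimator saves checkpoints to model_dir on the following schedule: it writes a checkpoint every 10 minutes (600 seconds); it writes a checkpoint when the train method starts (first iteration) and when it completes (last iteration); it retains only the 5 most recently written checkpoints in the directory.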
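The retention rule (keep only the most recent N checkpoints) can be sketched with a bounded queue. This is a plain-Python illustration of the policy, not the Estimator's code; retained_checkpoints is a hypothetical helper and the step numbers are made up.

```python
from collections import deque
from typing import List, Sequence


def retained_checkpoints(saved_steps: Sequence[int], keep_checkpoint_max: int = 5) -> List[str]:
    """Return the checkpoint names still on disk after saving at each step in order."""
    # A deque with maxlen drops the oldest entry whenever a new one is appended.
    kept = deque(maxlen=keep_checkpoint_max)
    for step in saved_steps:
        kept.append("model.ckpt-{}".format(step))
    return list(kept)


# Hypothetical run that wrote seven checkpoints.
on_disk = retained_checkpoints([1, 200, 400, 600, 800, 1000, 1200])
```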

The default checkpointing schedule can be changed with tf.estimator.RunConfig:

my_checkpointing_config = tf.estimator.RunConfig(
    save_checkpoints_secs = 20*60,  # Save checkpoints every 20 minutes.
    keep_checkpoint_max = 10,       # Retain the 10 most recent checkpoints.
)

classifier = tf.estimator.DNNClassifier(
    feature_columns=my_feature_columns,
    hidden_units=[10, 10],
    n_classes=3,
    model_dir='models/iris',
    config=my_checkpointing_config)

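Restoring a model:

The first time an Estimator's train method is called, TensorFlow saves the first checkpoint to model_dir. On each subsequent call to the Estimator's train, evaluate, or predict method, the following happens: the Estimator runs model_fn to build the model graph; the Estimator then initializes the new model's weights from the data stored in the most recently written checkpoint.

Restoring a model's state from a checkpoint only works if the model and the checkpoint are compatible. For example, suppose you train a DNNClassifier Estimator with 2 hidden layers of 10 nodes each, then change the number of neurons in each hidden layer from 10 to 20 and try to retrain the model. Because the state in the checkpoint is incompatible with the model, retraining fails with an error like this: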

does not match the shape stored in checkpoint.
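The incompatibility comes down to variable shapes that no longer agree. The sketch below compares two shape dictionaries in plain Python; find_shape_mismatches is a hypothetical helper, and the variable names and the 4-feature, 3-class layout are assumptions chosen to mirror the 10-to-20 example, not names read from a real checkpoint.

```python
from typing import Dict, List, Tuple

Shape = Tuple[int, ...]


def find_shape_mismatches(
    checkpoint_shapes: Dict[str, Shape], model_shapes: Dict[str, Shape]
) -> List[str]:
    """Return names of variables whose model shape differs from the checkpoint shape."""
    return sorted(
        name
        for name, shape in model_shapes.items()
        if name in checkpoint_shapes and checkpoint_shapes[name] != shape
    )


# Hypothetical weight shapes: hidden layers of 10 nodes (trained) vs 20 nodes (new model).
trained = {"hidden_0/kernel": (4, 10), "hidden_1/kernel": (10, 10), "logits/kernel": (10, 3)}
widened = {"hidden_0/kernel": (4, 20), "hidden_1/kernel": (20, 20), "logits/kernel": (20, 3)}
mismatches = find_shape_mismatches(trained, widened)
```

A common way to avoid the error is to point the changed model at a fresh model_dir, so that it does not try to restore from the old checkpoints.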

