wide and deep: a TensorFlow implementation

github: https://github.com/chenlongzhen/wide-and-deep

wide and deep

wide and deep code structure:

part	function	code location
reader	reads the data	reader.py
processor	data embedding, one-hot encoding, etc.	train.py
build graph	wide and deep	deep.py
freeze and load graph	freeze the model / load the model	freeze.py / loadGraph.py

git: https://git.corpautohome.com/gp_mb_ad_algo_test/wandd/tree/master/wd_lab

|-- model_dir
|-- src
|   `-- train.py
`-- util
    |-- __init__.py
    |-- deep.py
    |-- feature_processing.py
    |-- freeze.py
    |-- loadGraph.py
    |-- outGraph.pb
    |-- reader.py

reader

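Reads the local training and test data and feeds it into the placeholders. pandas is used to build an iterator that reads the data in chunks, one batch per read. Missing values are filled in at the same time: numeric columns get "0" and string columns get "missing".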

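Main steps:
- shuffle the files in the data folder
- read the shuffled files in a loop as an iterator
- fill in missing values
- take one batch from the iterator on each call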
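The four steps above can be sketched with the standard library alone. This is a minimal sketch, not the code in reader.py. The post reads chunks with pandas; this version uses the `csv` module instead so it runs without extra dependencies. The names `batch_reader` and `fill_missing`, the `open_fn` hook and the CSV-with-header input format are all assumptions.

```python
import csv
import io
import itertools
import random


def fill_missing(row, numeric_cols):
    """Fill empty cells: "0" for numeric columns, "missing" for string columns."""
    return {
        key: value if value not in ("", None) else ("0" if key in numeric_cols else "missing")
        for key, value in row.items()
    }


def batch_reader(paths, batch_size, numeric_cols, open_fn=open, seed=None):
    """Yield lists of `batch_size` row dicts forever.

    Assumes every file is a CSV with a header row and at least one data row
    (if every file is empty, this generator loops forever).
    """
    rng = random.Random(seed)
    paths = list(paths)
    rng.shuffle(paths)                       # step 1: shuffle the file order once

    def rows():
        for path in itertools.cycle(paths):  # step 2: cycle through the files
            with open_fn(path) as f:
                for row in csv.DictReader(f):
                    yield fill_missing(row, numeric_cols)  # step 3: fill gaps

    stream = rows()
    while True:
        yield list(itertools.islice(stream, batch_size))  # step 4: one batch per call


if __name__ == "__main__":
    # hypothetical in-memory stand-ins for files on disk
    data = {"a.csv": "pv,city\n1,bj\n,sh\n", "b.csv": "pv,city\n3,\n"}
    reader = batch_reader(data, 2, {"pv"}, open_fn=lambda p: io.StringIO(data[p]), seed=0)
    print(next(reader))
```

The shuffle here only reorders files, not rows within a file.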

processor

Covers embedding, one-hot encoding, wide processing, etc.

embedding

Initialize the embeddings: func multiEmbedding
Hash or string2number the input data, then look it up: func get_emb
Horizontally concatenate the embedding matrices extracted for each feature: func get_emb
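Here is a dependency-free sketch of the embedding steps. Each feature gets one table. The raw value is hashed to a row id, that row is looked up, and the rows are concatenated horizontally. The names `stable_hash`, `init_embeddings` and `embed_row`, the MD5-based hash and the 0.1 init scale are illustrative assumptions. They are not the post's `multiEmbedding`/`get_emb`. In TensorFlow 1.x the same steps map roughly onto `tf.string_to_hash_bucket_fast`, `tf.nn.embedding_lookup` and `tf.concat`.

```python
import hashlib
import random


def stable_hash(value, buckets):
    """Map a raw feature value to a row id in [0, buckets), deterministic across runs."""
    digest = hashlib.md5(str(value).encode("utf-8")).hexdigest()
    return int(digest, 16) % buckets


def init_embeddings(feature_buckets, dim, seed=0):
    """One [buckets x dim] table per feature, filled with small Gaussian values."""
    rng = random.Random(seed)
    return {
        name: [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(buckets)]
        for name, buckets in feature_buckets.items()
    }


def embed_row(row, tables):
    """Hash each feature, look up its row, and concatenate the vectors horizontally."""
    out = []
    for name, table in tables.items():
        out.extend(table[stable_hash(row[name], len(table))])
    return out


if __name__ == "__main__":
    tables = init_embeddings({"city": 10, "device": 5}, dim=4)
    vec = embed_row({"city": "beijing", "device": "ios"}, tables)
    print(len(vec))  # 2 features x 4 dims
```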

onehot

Hash or string2number the input data: func get_onehot
Horizontally concatenate the one-hot matrices extracted for each feature: func get_onehot
concat
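The one-hot path can be sketched the same way. The hypothetical `onehot_row` below stands in for `get_onehot`. Each feature's one-hot block follows the previous one, so a feature's hot position ends up shifted by the sizes of the blocks before it. The MD5 hash is again an assumption. In TensorFlow this corresponds roughly to `tf.one_hot` followed by `tf.concat`.

```python
import hashlib


def stable_hash(value, buckets):
    """Map a raw feature value to an index in [0, buckets)."""
    digest = hashlib.md5(str(value).encode("utf-8")).hexdigest()
    return int(digest, 16) % buckets


def one_hot(index, depth):
    """A length-`depth` vector with a single 1 at `index`."""
    vec = [0] * depth
    vec[index] = 1
    return vec


def onehot_row(row, feature_buckets):
    """One-hot encode each feature and concatenate the blocks in dict order."""
    out = []
    for name, depth in feature_buckets.items():
        out.extend(one_hot(stable_hash(row[name], depth), depth))
    return out
```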

numeric

string2number: func get_numeric
concat
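For numeric columns, string2number amounts to parsing each value as a float before concatenating. Below is a minimal sketch. It assumes empty strings map to 0.0, matching the reader's fill value. `get_numeric_row` is a hypothetical name, not the post's `get_numeric`. Inside the graph, TensorFlow 1.x offers `tf.string_to_number` for the same conversion.

```python
def get_numeric_row(row, numeric_cols, fill=0.0):
    """Parse the listed columns as floats (empty -> fill) and concatenate them."""
    values = []
    for col in numeric_cols:
        raw = row.get(col)
        values.append(float(raw) if raw not in (None, "") else fill)
    return values
```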

wide processing

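Split the data according to a fixed format, then hash and cross each feature to build a sparse matrix. A dense matrix is not used because it would be too large to fit in memory. During training, a sparse embedding lookup emulates the $wx+b$ form. This works around the fact that `tf.layers.dense` does not accept sparse input.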

split by "##"
hash the split values
convert to sparse with tf.SparseTensor
cross pv and clks
concatenate the sparse features with tf.sparse_concat
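The steps above can be sketched in plain Python. The sketch produces the `(indices, values, dense_shape)` triple that `tf.SparseTensor` takes. It rests on assumptions: the `_X_` crossing scheme in `cross`, the MD5 hash and all function names are hypothetical, since the post does not show how pv and clks are crossed.

```python
import hashlib
import itertools


def stable_hash(value, buckets):
    """Map a token to an id in [0, buckets)."""
    digest = hashlib.md5(str(value).encode("utf-8")).hexdigest()
    return int(digest, 16) % buckets


def split_tokens(raw, sep="##"):
    """Split a multi-valued cell and drop empty pieces."""
    return [t for t in raw.split(sep) if t]


def cross(a, b, sep="##"):
    """Hypothetical crossing: pair every token of a with every token of b."""
    pairs = itertools.product(split_tokens(a, sep), split_tokens(b, sep))
    return sep.join(f"{x}_X_{y}" for x, y in pairs)


def to_sparse(column_values, buckets, sep="##"):
    """Return (indices, values, dense_shape) in the row-major COO layout of tf.SparseTensor."""
    indices, values = [], []
    for row, raw in enumerate(column_values):
        for col, token in enumerate(split_tokens(raw, sep)):
            indices.append([row, col])
            values.append(stable_hash(token, buckets))
    width = max((col for _, col in indices), default=-1) + 1
    return indices, values, [len(column_values), width]
```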

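The problem encountered during concat is that, after hashing, every feature's values lie in [0, bucket). Once concatenated, however, the features must occupy [0, bucket1+bucket2+bucket3), so each feature's hash values have to be shifted:

feature1 hash: hash value + 0
feature2 hash: hash value + feature1 bucket
feature3 hash: hash value + feature1 bucket + feature2 bucket
...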
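The offset rule is a running sum of bucket sizes. In this minimal sketch, `offset_ids` is a hypothetical helper. It takes each feature's local hash ids and returns globally unique ids plus the total width, which is the sum of all buckets.

```python
def offset_ids(feature_ids, buckets):
    """Shift each feature's local ids in [0, bucket) into its own global range.

    feature_ids: one list of local hash ids per feature, in the same order as buckets.
    Returns (global ids per feature, total width).
    """
    shifted, offset = [], 0
    for ids, bucket in zip(feature_ids, buckets):
        if any(not 0 <= i < bucket for i in ids):
            raise ValueError("id outside [0, bucket)")
        shifted.append([i + offset for i in ids])
        offset += bucket  # the next feature starts right after this bucket range
    return shifted, offset
```

For example, with buckets `[5, 2, 7]` the three features occupy `[0, 5)`, `[5, 7)` and `[7, 14)`.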

build graph

deep

deep.py

dense(1024)
relu
dropout(0.9)

dense(512)
relu
dropout(0.9)

dense(256)
relu
dropout(0.9)
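The layer stack above (dense, ReLU and dropout, three times) can be sketched without TensorFlow to make the shapes concrete. Several assumptions apply. 0.9 is read as the keep probability, following the TF 1.x `tf.nn.dropout` convention. Dropout takes the usual inverted form, which rescales kept units by 1/keep_prob. The initialization scale and helper names are hypothetical. The post does not show an output layer after the 256-unit layer, so none is shown here either.

```python
import random


def init_layer(in_dim, out_dim, rng):
    """Hypothetical init: small Gaussian weights, zero bias."""
    weights = [[rng.gauss(0.0, 0.01) for _ in range(out_dim)] for _ in range(in_dim)]
    return weights, [0.0] * out_dim


def dense(x, weights, bias):
    """x (in_dim) times weights (in_dim x out_dim) plus bias."""
    out = list(bias)
    for xi, row in zip(x, weights):
        if xi:  # skip zero inputs, which are common after ReLU
            for j, w in enumerate(row):
                out[j] += xi * w
    return out


def relu(v):
    return [max(0.0, a) for a in v]


def dropout(v, keep_prob, rng, training=True):
    """Inverted dropout: keep each unit with probability keep_prob, scale by 1/keep_prob."""
    if not training:
        return list(v)
    return [a / keep_prob if rng.random() < keep_prob else 0.0 for a in v]


def build_deep_tower(in_dim, hidden=(1024, 512, 256), seed=0):
    rng = random.Random(seed)
    layers, prev = [], in_dim
    for units in hidden:
        layers.append(init_layer(prev, units, rng))
        prev = units
    return layers


def deep_forward(x, layers, keep_prob=0.9, training=True, seed=0):
    rng = random.Random(seed)
    for weights, bias in layers:
        x = dropout(relu(dense(x, weights, bias)), keep_prob, rng, training)
    return x
```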

wide

`tf.nn.embedding_lookup_sparse` is used to emulate the $wx+b$ form. This works around the fact that `tf.layers.dense` does not accept sparse input.

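    import tensorflow as tf  # TensorFlow 1.x API

    # wide_input: tf.SparseTensor of int64 hashed feature ids in [0, w_number)
    # w_number:   total number of hashed wide features (sum of all buckets)
    with tf.variable_scope('wide_model', values=(wide_input,)) as wide_scope:

        # one scalar weight per feature id: the "w" in wx+b
        embeddings = tf.Variable(tf.truncated_normal(
            (w_number,),
            mean=0.0,
            stddev=1.0,
            dtype=tf.float32,
            seed=None,
            name="wide_init_w"
        ))
        bias = tf.Variable(tf.random_normal((1,)))

        # combiner="sum" adds up the weights of each example's active ids,
        # i.e. w·x for a 0/1 x, giving one wide logit per example
        wide_logits = tf.nn.embedding_lookup_sparse(
            embeddings, wide_input, None, combiner="sum") + bias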
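Why does the sparse lookup reproduce $wx+b$? When x is a 0/1 (or count) vector, $w \cdot x$ is simply the sum of the weights at the non-zero positions of x. That per-example sum is what `combiner="sum"` computes. The sketch below checks this numerically against an explicit dense product. Both functions are hypothetical and only illustrate the arithmetic.

```python
import math


def wide_logits_sparse(weights, ids_per_example, bias):
    """Sum the weights of each example's active ids, plus bias."""
    return [sum(weights[i] for i in ids) + bias for ids in ids_per_example]


def wide_logits_dense(weights, ids_per_example, bias):
    """Reference: materialize x as a dense count vector and compute w.x + b."""
    out = []
    for ids in ids_per_example:
        x = [0.0] * len(weights)
        for i in ids:
            x[i] += 1.0
        out.append(sum(w * xi for w, xi in zip(weights, x)) + bias)
    return out


if __name__ == "__main__":
    w = [0.5, -1.0, 2.0, 0.25]
    ids = [[0, 2], [1], [3, 3]]
    sparse = wide_logits_sparse(w, ids, 0.1)
    dense = wide_logits_dense(w, ids, 0.1)
    print(all(math.isclose(a, b) for a, b in zip(sparse, dense)))
```

The dense version needs a vector as long as all buckets combined for every example. The sparse version touches only the active ids, which is why the post avoids dense matrices.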

freezing and load graph

Freezing the graph bakes the model structure and its weights into a single file, mainly to make cross-device deployment easier.

freezing

freeze.py

#!/usr/bin/env python
# Freeze a trained checkpoint into a single .pb with the TensorFlow 1.x freeze_graph tool
from tensorflow.python.tools import freeze_graph


prefix = "/data/new/wandd/wd_lab/model_test/"
input_graph_path = prefix + "widendeep.pbtxt"  # graph definition in text (.pbtxt) format
input_saver_def_path = ""
input_binary = False                            # False because the input graph is .pbtxt, not binary
output_node_names = "prediction"                # name of the output op
restore_op_name = "save"
filename_tensor_name = "save/Const:0"           # "Const:0" is a fixed format
clear_devices = True                            # whether to clear device placement info
input_meta_graph = prefix + "my-model.meta"     # model .meta file (not passed to freeze_graph below)
checkpoint_path = prefix + "my-model-300"       # checkpoint to read the weights from
output_graph_filename = "./outGraph.pb"         # output .pb
freeze_graph.freeze_graph(
    input_graph_path, input_saver_def_path, input_binary, checkpoint_path,
    output_node_names, restore_op_name, filename_tensor_name,
    output_graph_filename, clear_devices, "")   # "" = no initializer nodes

load graph

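Load the frozen graph and specify the input and output ops; it can then be used for prediction or training. Freezing turns variables into constants, so only ops added on top of the loaded graph can be trained.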

loadGraph.py
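Here is a minimal loading sketch. It assumes a TensorFlow version that provides `tf.compat.v1`, and that the frozen file is the `./outGraph.pb` written by freeze.py. `load_frozen_graph` and `tensor_name` are hypothetical helpers, not the contents of loadGraph.py. The post does not give the input placeholder names. `tf.import_graph_def` prefixes every op with its `name` argument (default `"import"`), so tensor lookups must include that prefix.

```python
def tensor_name(op_name, prefix="prefix", output_index=0):
    """Name of an op's output tensor after import_graph_def(name=prefix)."""
    if prefix:
        return f"{prefix}/{op_name}:{output_index}"
    return f"{op_name}:{output_index}"


def load_frozen_graph(pb_path, prefix="prefix"):
    """Read a frozen GraphDef (e.g. ./outGraph.pb) into a fresh tf.Graph."""
    import tensorflow.compat.v1 as tf  # lazy import; TF 1.x-style API

    with tf.gfile.GFile(pb_path, "rb") as f:
        graph_def = tf.GraphDef()
        graph_def.ParseFromString(f.read())
    graph = tf.Graph()
    with graph.as_default():
        tf.import_graph_def(graph_def, name=prefix)
    return graph
```

Typical use: call `graph = load_frozen_graph("./outGraph.pb")` and fetch the output with `graph.get_tensor_by_name(tensor_name("prediction"))`. Then run it in a `tf.Session(graph=graph)` with a `feed_dict` keyed by your own input placeholder tensors.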

Reference:

https://blog.metaflow.fr/tensorflow-how-to-freeze-a-model-and-serve-it-with-a-python-api-d4f3596b3adc

TODO

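Open issues:
- records are not shuffled when reading data (only the file order is)
- data reading is single-threaded, which is inefficient
- the processing in train.py could be split out into its own module
- when continuing training from a previously trained model, TensorBoard misbehaves: it is unclear how to recover the previous run's step and write it to the summary
- parsing has not been written yet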

wide and deep multigpu

BUG: the embedding gradients are not being updated; still under investigation.
