Deep-Learning-Based Recommendation (2): FNN, Initialized by FM


Preface

Paper: https://arxiv.org/pdf/1601.02376.pdf
Official code for the paper (Theano): https://github.com/wnzhang/deep-ctr
Reference code (no FM initialization): https://github.com/Sherryuu/CTR-of-deep-learning
Refactored code: https://github.com/wyl6/Recommender-Systems-Samples/tree/master/RecSys%20And%20Deep%20Learning/DNN/fnn

FNN = FM+MLP

What exactly does FM initialize?

FNN first uses FM to initialize the embedding layer over the input, then stacks an MLP on top for CTR prediction. How exactly is this done? Start with a figure from the paper:
[Figure: FNN architecture diagram, from the paper]
Looking at the figure alone is somewhat confusing, and the formula for the output z only adds to the confusion:
$z_i = (w_i, v_i^1, v_i^2, \dots, v_i^K)$
Here w_i is the first-order coefficient of the i-th field obtained from FM initialization, v_i is its latent vector, and K is the dimension of v_i. If it really were z being initialized, then as the Dense Real Layer in the figure suggests, each field would get just one w_i and one v_i. That can't be right: in FM, every feature of every field has its own w_i and v_i. So what is going on? In fact, what FM initializes is the weight matrix W between the input feature vector x and the dense layer:
$z_i = W_0^i \cdot x[\mathrm{start}_i : \mathrm{end}_i]$
Let me draw a picture. Suppose a sample has 3 fields of dimensions N1, N2 and N3. FM initialization then gives us N = N1 + N2 + N3 latent vectors and first-order coefficients w, from which we assemble the weight matrix W0:
[Figure: W0 assembled by stacking each feature's (w, v) as one row, giving an N × (K+1) matrix]
However, the authors do not compute z by multiplying x against the full weight matrix at once. That would yield a single (K+1)-dimensional result, equivalent to summing the (K+1)-dimensional rows of every non-zero feature in the sample; compressing the input that hard inevitably loses information. Instead, each field is multiplied against its own slice of the weight matrix to get one (K+1)-dimensional result per field, and the per-field results are concatenated:
$z = (w_0, z_1, z_2, \dots, z_n)$
With this initialization, since each field of a sample has exactly one non-zero value, the z of the i-th field is exactly the w and v of that non-zero feature:
$z_i = W_0^i \cdot x[\mathrm{start}_i : \mathrm{end}_i] = (w_i, v_i^1, v_i^2, \dots, v_i^K)$
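
To make this concrete, here is a toy NumPy check (my own illustration, not the paper's or the repo's code): build W0 by concatenating w and v, slice it at a field boundary, and verify that projecting a one-hot field through its slice returns exactly that feature's (w, v) row.

import numpy as np

field_sizes = [3, 2, 4]                  # N1, N2, N3
N, K = sum(field_sizes), 2
w = np.random.randn(N, 1)                # one first-order coefficient per feature
v = np.random.randn(N, K)                # one latent vector per feature
W0 = np.concatenate([w, v], axis=1)      # shape [N, K+1]

offsets = np.cumsum([0] + field_sizes)   # field boundaries inside x
x0 = np.eye(field_sizes[0])[1]           # field 0 one-hot; feature 1 is hot
z0 = x0 @ W0[offsets[0]:offsets[1]]      # project field 0 through its slice
assert np.allclose(z0, W0[1])            # z0 is exactly (w_1, v_1^1, v_1^2)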

The FNN pipeline

Once you see that what FM initializes is the weight matrix W0, the FNN pipeline becomes clear. Reading from the output back to the input in one pass:
[Figure: the full FNN pipeline, from the sparse per-field input through the FM-initialized dense real layer and the hidden layers to the CTR output]
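
Before moving to the real code, here is a minimal NumPy sketch of that forward pass, with made-up field sizes and layer widths (the global bias w_0 is omitted for brevity):

import numpy as np

def relu(a):
    return np.maximum(a, 0.0)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

field_sizes = [4, 3, 5]                      # three toy fields
K = 2                                        # latent dimension
rng = np.random.default_rng(0)
# FM-initialized per-field embedding matrices, each of shape [N_i, K+1]
embeds = [rng.normal(size=(n, K + 1)) for n in field_sizes]
# one one-hot vector per field
xs = [np.eye(n)[rng.integers(n)] for n in field_sizes]
# dense real layer: project each field, then concatenate
z = np.concatenate([x @ e for x, e in zip(xs, embeds)])  # length 3*(K+1)
# MLP head (widths made up for the sketch)
W1 = rng.normal(size=(z.size, 8)); b1 = np.zeros(8)
W2 = rng.normal(size=(8, 1));      b2 = np.zeros(1)
y_hat = sigmoid(relu(z @ W1 + b1) @ W2 + b2)
print('CTR estimate:', y_hat.item())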

Code walkthrough

Data format

The data contains 22 fields; the number of enumerable values each field can take is:

FIELD_SIZES = [1037, 151, 59, 1603, 4, 333, 77890, 1857, 9, 8, 4, 7, 22, 3, 92, 56, 4, 920, 38176, 240, 2697, 4]
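
Summing these gives the width of the flattened one-hot input; this is the input_dim that the FM configuration below consumes:

input_dim = sum(FIELD_SIZES)   # = 125176 one-hot features in total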

A sample x is accordingly split by field:
[Figure: a sample x partitioned into per-field one-hot segments]
Since this simulates CTR prediction, the label y is binary: y ∈ {0, 1}.

Saving and loading parameters

The reference code contains only the MLP part and no FM initialization; I added that part myself.

For saving parameters, the common approach is tf.train.Saver: save all of the model's variables and values and later restore some or all of them, or save only selected variables and restore just the ones you want. To keep the interface consistent with the original code, however, we do not use tf.train.Saver. Instead we fetch the variable values directly, build a dict, and save it to disk:

def dump(self, model_path):
        # Fetch the current value of every variable and collect them into
        # a dict keyed by variable name (pkl is the pickle module).
        var_map = {}
        for name, var in self.vars.items():
            print('----------------', name, var)
            var_map[name] = self.sess.run(var)
        pkl.dump(var_map, open(model_path, 'wb'))
        print('model dumped at', model_path)
        # Reload immediately as a sanity check that the dump round-trips.
        load_var_map = pkl.load(open(model_path, 'rb'))
        print('load_var_map[w]', load_var_map['w'])

pkl.dump can serialize many kinds of Python objects, and pkl.load restores them. Here is the loading side:

            # Declare w, v and b with the 'fm' init method so that
            # init_var_map loads their values from the dumped pickle
            # (for 'fm' the declared shape is ignored; the loaded
            # array's own shape is used).
            feature_size = sum(field_sizes)
            init_vars.append(('w', [feature_size, 1], 'fm', dtype))
            init_vars.append(('v', [feature_size, embed_size], 'fm', dtype))
            init_vars.append(('b', [1, ], 'fm', dtype))

            self.vars = utils.init_var_map(init_vars, init_path)
            # Concatenate w ([N, 1]) and v ([N, K]) column-wise into W0 ([N, K+1]).
            init_w0 = tf.concat([self.vars['w'], self.vars['v']], 1)
            # Slice W0 row-wise at the field boundaries so that each field
            # gets its own [N_i, K+1] embedding matrix.
            lower, upper = 0, field_sizes[0]
            for i in range(num_inputs):
                if i != 0:
                    lower, upper = upper, upper + field_sizes[i]
                self.vars['embed_%d' % i] = init_w0[lower:upper]
            w0 = [self.vars['embed_%d' % i] for i in range(num_inputs)]
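
These per-field slices then stand in for the embed_%d variables that the Xavier path creates directly (see the snippet under "Results" below), which is exactly what makes the two initializations drop-in replacements for each other.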

The init_var_map function used above is as follows:

import pickle as pkl
import numpy as np
import tensorflow as tf

# STDDEV, MINVAL and MAXVAL are module-level constants defined in utils.py.
def init_var_map(init_vars, init_path=None):
    if init_path is not None:
        load_var_map = pkl.load(open(init_path, 'rb'))
        print('load variable map from', init_path, load_var_map.keys())
    var_map = {}
    for var_name, var_shape, init_method, dtype in init_vars:
        if init_method == 'zero':
            var_map[var_name] = tf.Variable(tf.zeros(var_shape, dtype=dtype), name=var_name, dtype=dtype)
        elif init_method == 'one':
            var_map[var_name] = tf.Variable(tf.ones(var_shape, dtype=dtype), name=var_name, dtype=dtype)
        elif init_method == 'normal':
            var_map[var_name] = tf.Variable(tf.random_normal(var_shape, mean=0.0, stddev=STDDEV, dtype=dtype),
                                            name=var_name, dtype=dtype)
        elif init_method == 'tnormal':
            var_map[var_name] = tf.Variable(tf.truncated_normal(var_shape, mean=0.0, stddev=STDDEV, dtype=dtype),
                                            name=var_name, dtype=dtype)
        elif init_method == 'uniform':
            var_map[var_name] = tf.Variable(tf.random_uniform(var_shape, minval=MINVAL, maxval=MAXVAL, dtype=dtype),
                                            name=var_name, dtype=dtype)
        elif init_method == 'xavier':
            maxval = np.sqrt(6. / np.sum(var_shape))
            minval = -maxval
            value = tf.random_uniform(var_shape, minval=minval, maxval=maxval, dtype=dtype)
            var_map[var_name] = tf.Variable(value, name=var_name, dtype=dtype)
        elif isinstance(init_method, int) or isinstance(init_method, float):
            var_map[var_name] = tf.Variable(tf.ones(var_shape, dtype=dtype) * init_method, name=var_name, dtype=dtype)
        elif init_method == 'fm':
            # Load the value from the pickled FM parameters; the declared
            # var_shape is ignored and the loaded array's shape is used.
            var_map[var_name] = tf.Variable(load_var_map[var_name], name=var_name, dtype=dtype)
        else:
            print('BadParam: init method', init_method)
    return var_map
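
The only contract between dump() and the 'fm' branch above is the pickle file itself: a dict mapping each variable name to a NumPy array. A toy file like this (sizes made up for illustration) would satisfy it:

import numpy as np
import pickle as pkl

N, K = 100, 8    # toy feature count and latent dimension
fake_fm = {
    'w': np.random.randn(N, 1).astype(np.float32),   # first-order weights
    'v': np.random.randn(N, K).astype(np.float32),   # latent vectors
    'b': np.zeros((1,), dtype=np.float32),           # global bias
}
pkl.dump(fake_fm, open('fm_toy.pkl', 'wb'))          # pass as init_path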

How the model is used

The workflow is as follows: first set algo = 'fm' to train FM, obtain the first-order weights w and latent vectors v, and save them; then set algo = 'fnn' and run the CTR prediction.

# algo = 'fm'
algo = 'fnn'
if algo in {'fnn','anfm','amlp','ccpm','pnn1','pnn2'}:
    # These models take per-field inputs: split each sample by field
    # and drop fields that are empty in this dataset.
    train_data = utils.split_data(train_data)
    test_data = utils.split_data(test_data)
    tmp = []
    for x in field_sizes:
        if x > 0:
            tmp.append(x)
    field_sizes = tmp
    print('remove empty fields', field_sizes)

if algo == 'fm':
    fm_params = {
        'input_dim': input_dim,
        'factor_order': 128,        # K, the latent dimension
        'opt_algo': 'gd',
        'learning_rate': 0.1,
        'l2_w': 0,
        'l2_v': 0,
    }
    print(fm_params)
    model = FM(**fm_params)
elif algo == 'fnn':
    fnn_params = {
        'field_sizes': field_sizes,
        'embed_size': 129,          # K + 1: each field embedding stacks w (1) and v (128)
        'layer_sizes': [500, 1],
        'layer_acts': ['relu', None],
        'drop_out': [0, 0],
        'opt_algo': 'gd',
        'learning_rate': 0.1,
        'embed_l2': 0,
        'layer_l2': [0, 0],
        'random_seed': 0,
        'init_path': pkl_path,      # the FM weights dumped in stage 1
    }
    print(fnn_params)
    model = FNN(**fnn_params)
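
Putting the two stages together, the overall procedure looks roughly like the sketch below; train stands in for the repo's actual training loop and the pkl_path default is invented here, so treat this as an outline rather than the repo's exact API:

def run_two_stage(train, pkl_path='fm_weights.pkl'):
    # Stage 1: train FM, then persist w, v and b with dump().
    fm = FM(**fm_params)
    train(fm)
    fm.dump(pkl_path)
    # Stage 2: FNN hands init_path=pkl_path to init_var_map, whose 'fm'
    # branch loads w, v and b, so the embeddings start from W0.
    fnn = FNN(**fnn_params)
    train(fnn)
    return fnn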

Results

When FNN uses Xavier initialization for the embeddings instead:

        for i in range(num_inputs):
            init_vars.append(('embed_%d' % i, [field_sizes[i], embed_size], 'xavier', dtype))

the results over 10 training rounds are:

[0]     training...
[0]     evaluating...
[0]     loss (with l2 norm):0.358097    train-auc: 0.610657     eval-auc: 0.661392
[1]     training...
[1]     evaluating...
[1]     loss (with l2 norm):0.350506    train-auc: 0.624879     eval-auc: 0.679986
[2]     training...
[2]     evaluating...
[2]     loss (with l2 norm):0.348581    train-auc: 0.631834     eval-auc: 0.688470
[3]     training...
[3]     evaluating...
[3]     loss (with l2 norm):0.347268    train-auc: 0.637031     eval-auc: 0.694607
[4]     training...
[4]     evaluating...
[4]     loss (with l2 norm):0.346279    train-auc: 0.641287     eval-auc: 0.699670
[5]     training...
[5]     evaluating...
[5]     loss (with l2 norm):0.345490    train-auc: 0.644798     eval-auc: 0.703892
[6]     training...
[6]     evaluating...
[6]     loss (with l2 norm):0.344828    train-auc: 0.647727     eval-auc: 0.707407
[7]     training...
[7]     evaluating...
[7]     loss (with l2 norm):0.344262    train-auc: 0.650155     eval-auc: 0.710297
[8]     training...
[8]     evaluating...
[8]     loss (with l2 norm):0.343769    train-auc: 0.652261     eval-auc: 0.712707
[9]     training...
[9]     evaluating...
[9]     loss (with l2 norm):0.343332    train-auc: 0.654116     eval-auc: 0.714787

When initialized from an FM trained for 50 rounds, the FNN results are:

[0]     training...
[0]     evaluating...
[0]     loss (with l2 norm):0.361066    train-auc: 0.607293     eval-auc: 0.642668
[1]     training...
[1]     evaluating...
[1]     loss (with l2 norm):0.353281    train-auc: 0.634517     eval-auc: 0.679833
[2]     training...
[2]     evaluating...
[2]     loss (with l2 norm):0.350498    train-auc: 0.640884     eval-auc: 0.688085
[3]     training...
[3]     evaluating...
[3]     loss (with l2 norm):0.347988    train-auc: 0.648423     eval-auc: 0.696806
[4]     training...
[4]     evaluating...
[4]     loss (with l2 norm):0.345739    train-auc: 0.657166     eval-auc: 0.706803
[5]     training...
[5]     evaluating...
[5]     loss (with l2 norm):0.343678    train-auc: 0.665929     eval-auc: 0.716429
[6]     training...
[6]     evaluating...
[6]     loss (with l2 norm):0.341738    train-auc: 0.674693     eval-auc: 0.725318
[7]     training...
[7]     evaluating...
[7]     loss (with l2 norm):0.339869    train-auc: 0.682893     eval-auc: 0.733139
[8]     training...
[8]     evaluating...
[8]     loss (with l2 norm):0.338055    train-auc: 0.690134     eval-auc: 0.739590
[9]     training...
[9]     evaluating...
[9]     loss (with l2 norm):0.336269    train-auc: 0.696557     eval-auc: 0.744801

The AUC clearly improves with FM initialization: after 10 rounds, eval AUC rises from 0.7148 (Xavier) to 0.7448.


References

W. Zhang, T. Du, J. Wang. Deep Learning over Multi-field Categorical Data: A Case Study on User Response Prediction. ECIR 2016. https://arxiv.org/pdf/1601.02376.pdf
