日萌社

Artificial Intelligence (AI): Keras, PyTorch, MXNet, TensorFlow, PaddlePaddle deep learning in practice (updated irregularly)

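2.3 Multi-recall strategies

Learning objectives

Objectives
- Understand how a multi-recall strategy is designed
Application
- Implement multi-recall strategies for a user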

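2.3.1 Recall module

Purpose of the recall module: select posts for the current user so that as many of the user's preferences as possible are covered.
Components of the recall module: an ETL module, a recall pool (containing the various recall strategies), a second-level cache for the recall pool, and a rule filter service.
ETL compute solution: AWS elastic cloud computing services process the massive data volume, so that compute capacity can be obtained and configured at minimal cost.
Recall strategy compute solution: computation and queries run inside the Neo4j graph database.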

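2.3.2 Overview of the recall strategies

Public recall strategies
- Hot recall: put the n posts with the highest current interaction score into the recall pool
- Time recall: put the first n posts from a given time window (usually the most recently published) into the recall pool
- Speed recall: put the n posts with the most interactions per unit time into the recall pool
- Acceleration recall: put the n posts whose speed is changing (rising) fastest per unit time into the recall pool
- Collaborative filtering recall: take the n posts most similar to the user's latent vector
- Celebrity second-degree relationship recall, definition: take the n posts most similar to the user's latent vector
- Random recall: pick a number of posts at random
Personalized recall strategies
- n-degree relationship recall: the graph supports edge-based queries, so n-degree relationships can be used for personalized recall. The simplest n-degree relationship recall is user-based collaborative filtering.
- Interest recall: recall related posts for users who share the same interests. Here the celebrities a user follows are treated as that user's points of interest.
Custom recall strategies
- Recall strategies defined by product operations staff for product campaigns and similar events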

2.3.3 Multi-recall code implementation

2.3.3.1 User recommendation request

get_recomm

Get the recommendation results for a user

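import json
from django.http import HttpResponse
from rest_framework.decorators import api_view

@api_view(['GET', 'POST'])
def get_recomm(request):
    IP = request.META.get("HTTP_X_REAL_IP")
    # Assumption: the client sends uid as a query or body parameter;
    # _get_recomm needs both IP and uid
    uid = request.query_params.get("uid") or request.data.get("uid")
    result = r_api._get_recomm(str(IP), uid)
    return HttpResponse(json.dumps(result, ensure_ascii=False))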

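The view recommends through functions from the package imported with from recomm import api as r_api. Our recommendation logic lives mainly in the recomm module; the name is our own choice, and it serves as the recommendation module. Create a recomm module and add an api.py file to it.

The main recommendation logic of _get_recomm is as follows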

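def _get_recomm(IP, uid):
    """Full recommendation pipeline."""
    # 1. Recall
    # (1) Fetch the recall data from each strategy
    # Hot recall
    hot_data = _get_hot()
    # Most recently published posts
    last_data = get_last()
    # Posts whose heat grew fastest per unit time
    v_data = get_v()
    # User-based collaborative filtering recall
    cf_data = get_r(uid)
    # Random recall
    random_data = get_random()
    all_data = [hot_data, last_data, v_data, cf_data, random_data]
    # Run the pyramid rule calculation and write the result
    # (2) Recall pyramid calculation with no weights applied to the strategies
    j_data = pyramid_array(all_data)
    # Write the data into the pyramid and get back what goes to the rule filter
    r_data = j_data_write(uid, j_data)
    # (3) Push the data to the rule filter for internal de-duplication
    # f_data = rfilter(r_data)

    # 2. Ranking
    # Combine features by uid and pid
    # Look up the features via the data indexes from the rule filter
    feature = get_feature_from_neo(r_data)
    # Feature preprocessing
    feature = fea_process(feature)
    # Load the model and rank the results
    rank_data = model_use(feature)
    return v_get_cache(IP)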

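Main flow: fetch the results of several recall strategies, run the pyramid rule calculation and write the result, de-duplicate in the rule filter, preprocess features, rank with the model, then write to the cache and serve the recommendations.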
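The pyramid calculation itself is not shown in this section. As a rough illustration of what an unweighted merge of several recall lists could look like, the sketch below interleaves the lists round-robin and drops duplicate post IDs. merge_recalls is a hypothetical helper written for this example; it is not the pyramid_array function called above, and the sample lists are made up.

```python
from itertools import zip_longest

# Sentinel used to pad shorter lists (assumption: never a real pid)
_MISSING = object()


def merge_recalls(recall_lists):
    """Interleave several recall lists round-robin, dropping duplicate pids.

    Hypothetical stand-in for illustration only; it is not pyramid_array.
    """
    seen = set()
    merged = []
    # zip_longest pads shorter lists so every list is fully consumed
    for group in zip_longest(*recall_lists, fillvalue=_MISSING):
        for pid in group:
            if pid is _MISSING or pid in seen:
                continue
            seen.add(pid)
            merged.append(pid)
    return merged


hot_data = [1, 2, 3]
last_data = [3, 4]
random_data = [2, 5, 6, 7]
print(merge_recalls([hot_data, last_data, random_data]))  # [1, 3, 2, 4, 5, 6, 7]
```

Round-robin interleaving gives every strategy the same chance of placing its top posts early, which matches the idea of applying no weights to the strategies.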

For now we focus on how the recall data is fetched; the later stages are covered afterwards.

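Recall connection settings

# neo4j.v1 is the 1.x driver import path; newer driver releases use
# "from neo4j import GraphDatabase"
from neo4j.v1 import GraphDatabase

NEO4J_CONFIG = {
    "uri": "bolt://192.168.19.137:7687",
    "auth": ("neo4j", "itcast"),
    "encrypted": False,
}

# One shared driver for all recall queries
_driver = GraphDatabase.driver(**NEO4J_CONFIG)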

2.3.3.2 Public recall

1. Hot recall implementation

Hot recall Cypher query: 'match(a:SuperfansPost) set a.hot_score = 2*a.commented_num+a.liked_num+3*a.forwarded_num return a.pid order by a.hot_score desc limit 100'

Explanation: match all posts, compute each post's heat score (formula: 2 * comments + likes + 3 * forwards), sort by heat score in descending order, and return the IDs of the top 100 posts.

def _get_hot():
    """Hot recall: pids of the 100 posts with the highest heat score."""
    with _driver.session() as session:
        cypher = "match(a:SuperfansPost) set a.hot_score = 2*a.commented_num+a.liked_num+3*a.forwarded_num return a.pid order by a.hot_score desc limit 100"
        record = session.run(cypher)
        # Keep only the first column (a.pid) of each record
        result = [x[0] for x in record]
    return result
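To make the heat-score formula concrete, here is a minimal pure-Python sketch of the same calculation and ranking that the Cypher query performs inside Neo4j. The sample posts are made up for illustration, and hot_score and top_hot are names chosen for this example only.

```python
def hot_score(post):
    """Heat score: 2 * comments + likes + 3 * forwards."""
    return 2 * post["commented_num"] + post["liked_num"] + 3 * post["forwarded_num"]


def top_hot(posts, n=100):
    """Return the pids of the n posts with the highest heat score."""
    ranked = sorted(posts, key=hot_score, reverse=True)
    return [post["pid"] for post in ranked[:n]]


# Made-up sample data
posts = [
    {"pid": 1, "commented_num": 5, "liked_num": 10, "forwarded_num": 0},  # score 20
    {"pid": 2, "commented_num": 0, "liked_num": 3, "forwarded_num": 4},   # score 15
    {"pid": 3, "commented_num": 8, "liked_num": 1, "forwarded_num": 2},   # score 23
]
print(top_hot(posts, n=2))  # [3, 1]
```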

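2. Time recall implementation

Time recall query: "match(a:SuperfansPost) return a.pid order by a.publish_time desc limit 100"
- Explanation: match all posts, sort by publish time in descending order, and take the 100 most recently published posts

def get_last():
    """Time recall: pids of the 100 most recently published posts."""
    with _driver.session() as session:
        cypher = "match(a:SuperfansPost)  return a.pid order by a.publish_time desc limit 100"
        record = session.run(cypher)
        # Keep only the first column (a.pid) of each record
        result = [x[0] for x in record]
    return result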
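The hot and time recall functions differ only in their query string, so the session handling can be factored out. The sketch below is one possible way to do that. run_pid_query and get_latest are hypothetical names, the driver is passed in explicitly so the helper is easy to test, and the $limit placeholder assumes a Neo4j version that accepts $-style query parameters.

```python
def run_pid_query(driver, cypher, **params):
    """Run a Cypher query and return the first column of every record."""
    with driver.session() as session:
        records = session.run(cypher, **params)
        # Consume the result inside the with block, before the session closes
        return [record[0] for record in records]


def get_latest(driver, limit=100):
    """Time recall expressed with the helper (hypothetical variant of get_last)."""
    cypher = (
        "match(a:SuperfansPost) return a.pid "
        "order by a.publish_time desc limit $limit"
    )
    return run_pid_query(driver, cypher, limit=limit)
```

With the driver created in the connection settings, a call would look like get_latest(_driver, limit=100). Passing values as query parameters instead of formatting them into the string also keeps user-supplied values out of the query text.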

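3. Speed recall implementation

Speed recall query: "MATCH (a:SuperfansPost_A) MATCH(b:SuperfansPost) where(a.pid = b.pid) SET b.v=b.hot_score-a.hot_score return b.pid order by b.v desc limit 100"
- Explanation: SuperfansPost_A holds the post information from an earlier point in time. For each post, the query computes the heat-score difference between the current node and the earlier snapshot (current minus past), sorts by that difference in descending order, and takes the top 100.

def get_v():
    """Speed recall: pids of the 100 posts whose heat score grew the most."""
    with _driver.session() as session:
        # b.v = current heat score minus the earlier snapshot's heat score
        cypher = "MATCH (a:SuperfansPost_A) MATCH(b:SuperfansPost) where(a.pid = b.pid) SET b.v=b.hot_score-a.hot_score return b.pid order by b.v desc limit 100"
        record = session.run(cypher)
        # Keep only the first column (b.pid) of each record
        result = [x[0] for x in record]
    return result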
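The strategy list also names acceleration recall, which the code in this section does not implement. As a minimal sketch of the idea, assuming heat scores are available as {pid: hot_score} snapshots taken at equal intervals, speed is the difference between two consecutive snapshots and acceleration is the difference between two consecutive speeds. The snapshot data and the helper names diff and top_n are made up for this example.

```python
def diff(past, current):
    """Per-pid change between two snapshots, for pids present in both."""
    return {pid: current[pid] - past[pid] for pid in current if pid in past}


def top_n(scores, n):
    """Return the n pids with the largest value, highest first."""
    return sorted(scores, key=scores.get, reverse=True)[:n]


# Made-up heat scores at three equally spaced points in time
t0 = {1: 10, 2: 50, 3: 5}
t1 = {1: 30, 2: 55, 3: 6}
t2 = {1: 55, 2: 75, 3: 30}

v_prev = diff(t0, t1)      # speed over the first interval:  {1: 20, 2: 5, 3: 1}
v_now = diff(t1, t2)       # speed over the second interval: {1: 25, 2: 20, 3: 24}
accel = diff(v_prev, v_now)  # change in speed:              {1: 5, 2: 15, 3: 23}

print(top_n(v_now, 2))  # speed recall:        [1, 3]
print(top_n(accel, 2))  # acceleration recall: [3, 2]
```

Post 1 is still the fastest grower, but posts 3 and 2 are speeding up the most, so the two strategies surface different posts.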

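2.3.3.3 Personalized recall

1. n-degree relationship recall implementation (that is, UserCF, user-based collaborative filtering)

A personalized recall strategy must be given a uid. The n-degree relationship query starts from that uid to recall the matching data.

n-degree relationship recall query: "match(a{uid:%d})-[r]-(b:SuperfansPost)-[r2]-(c:SuperfansUser)-[r3]-(d:SuperfansPost) return d.pid limit 100"
- Explanation: find the posts b that the user uid has interacted with, then the users c who have a relationship edge with those posts, and finally recommend up to 100 of the posts d that those users have interacted with

def get_r(uid):
    """Second-degree relationship recall of posts, based on the user."""
    with _driver.session() as session:
        # int() ensures that only an integer is formatted into the query
        cypher = "match(a{uid:%d})-[r]-(b:SuperfansPost)-[r2]-(c:SuperfansUser)-[r3]-(d:SuperfansPost) return d.pid limit 100" % int(uid)
        record = session.run(cypher)
        # Keep only the first column (d.pid) of each record
        result = [x[0] for x in record]
    return result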
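The path user -> post -> other user -> post can be easier to follow outside the graph database. The sketch below walks the same two hops over a plain dictionary of made-up interaction data. second_degree_posts is a hypothetical name. Unlike the query above, this sketch also drops duplicates and posts the user has already interacted with, which is a design choice made here rather than something the query specifies.

```python
def second_degree_posts(uid, user_posts, limit=100):
    """Posts reached via user -> post -> other user -> post.

    user_posts maps each uid to the set of pids that user interacted with.
    """
    own = user_posts.get(uid, set())
    # Users who share at least one post with uid
    neighbours = [
        other for other, pids in user_posts.items()
        if other != uid and own & pids
    ]
    recalled = []
    for other in neighbours:
        for pid in sorted(user_posts[other]):
            # Skip posts uid already interacted with, and duplicates
            if pid not in own and pid not in recalled:
                recalled.append(pid)
    return recalled[:limit]


# Made-up interaction data
user_posts = {
    1: {100, 101},
    2: {101, 102, 103},  # shares post 101 with user 1
    3: {104},            # shares nothing with user 1
    4: {100, 103, 105},  # shares post 100 with user 1
}
print(second_degree_posts(1, user_posts))  # [102, 103, 105]
```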

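2.3.3.5 Random recall

Random recall query: "match(a:SuperfansPost) return a.pid limit 1000"
- Explanation: match posts and return up to 1000 pids; the function then samples 20 of them at random

import random

def get_random():
    """Operations-defined custom recall; here it is a random recall."""
    with _driver.session() as session:
        cypher = "match(a:SuperfansPost) return a.pid limit 1000"
        record = session.run(cypher)
        pid_list = [x[0] for x in record]
    # random.sample raises ValueError if asked for more items than exist
    result = random.sample(pid_list, min(20, len(pid_list)))
    return result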

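2.3.3.4 Adjusting the recall strategy settings

Adjusting the recall strategies: the recall strategies are adjusted and optimized continuously, mainly based on the share each strategy contributes to the successfully recommended posts. An adjustment is usually made once a week, and the strategy with the smallest share is the one that gets revised.
Extension: recall can also include many strategies built on data mining and machine learning. More potentially relevant user interests (for example, film and TV fans) can be defined from the properties of the application itself, and the user's signature, comments, and published content can then be modeled on that basis to extend the recall strategies.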
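The weekly review described above can be sketched as follows, assuming we have logged which pids each strategy recalled and which recommended pids were successful. The data, the strategy names, and the strategy_shares helper are all made up for illustration. How "successful" is defined is left open here, as it is in the text.

```python
def strategy_shares(successful_pids, recall_sources):
    """Share of successful recommendations attributable to each strategy.

    recall_sources maps a strategy name to the list of pids it recalled.
    A pid recalled by several strategies counts once for each of them.
    """
    successful = set(successful_pids)
    hits = {name: len(set(pids) & successful) for name, pids in recall_sources.items()}
    total = sum(hits.values())
    return {name: (count / total if total else 0.0) for name, count in hits.items()}


# Made-up data for one weekly review
recall_sources = {
    "hot": [1, 2, 3, 4],
    "time": [4, 5, 6],
    "random": [7, 8],
}
successful = [1, 2, 4, 5]

shares = strategy_shares(successful, recall_sources)
weakest = min(shares, key=shares.get)
print(shares)   # {'hot': 0.6, 'time': 0.4, 'random': 0.0}
print(weakest)  # random
```

The strategy with the smallest share (here the random recall) is the candidate for revision in that cycle.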

2.3.4 Summary

The workflow of the recall module
The definitions and code implementations of the multiple recall strategies
