[Quant] Learn Python Machine Learning and Quantitative Trading in 4 Days - Notes 3 (p16~p20)

Original

2020-07-02 15:02


Platform: https://www.ricequant.com/quant/#?tag=algorithm-ol&id=1339040
API 1: https://www.ricequant.com/doc/rqdata-institutional#research-API-get_fundamentals
API 2: https://www.ricequant.com/doc/api/python/chn#wizard-stock

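p16 Case study: introduction to stock selection with the market-cap factor (multi-factor)

Read more books, and read more "reports" (securities research reports).

p17 Case study: demo of stock selection with the market-cap factor (multi-factor)

Video: https://www.bilibili.com/video/av55456917?p=17
Code: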

def init(context):
    # Constituents of the CSI 300 index
    context.hs300 = index_components("000300.XSHG")

# before_trading is called once per day, before the strategy starts trading
def before_trading(context):

    # Query market cap for the CSI 300 constituents, sorted ascending,
    # and keep the 20 smallest
    q = query(
        fundamentals.eod_derivative_indicator.market_cap
    ).order_by(
        fundamentals.eod_derivative_indicator.market_cap
    ).filter(
        fundamentals.stockcode.in_(context.hs300)
    ).limit(20)

    funds = get_fundamentals(q)

    # Get the codes of the 20 selected stocks
    context.stock_list = funds.T.index

def handle_bar(context, bar_dict):
    # Sell
    # Look up the current holdings in positions
    for stock in context.portfolio.positions.keys():
        if stock not in context.stock_list:
            order_target_percent(stock, 0)

    # Buy
    for stock in context.stock_list:
        order_target_percent(stock, 1.0/20)
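
The selection and rebalancing logic can also be tried outside the platform. The following is only a minimal sketch with made-up market caps; `pick_smallest` and `rebalance_orders` are hypothetical helpers written for illustration, not RiceQuant API calls.

```python
def pick_smallest(market_caps, n):
    """Return the n stock codes with the smallest market cap."""
    ranked = sorted(market_caps, key=market_caps.get)
    return ranked[:n]


def rebalance_orders(held, targets):
    """Map each stock to its target portfolio weight.

    Held stocks that dropped out of the target list get weight 0 (sell);
    every target stock gets an equal share.
    """
    orders = {stock: 0.0 for stock in held if stock not in targets}
    for stock in targets:
        orders[stock] = 1.0 / len(targets)
    return orders


# Made-up market caps, for illustration only
caps = {"A": 50.0, "B": 10.0, "C": 30.0, "D": 20.0}
targets = pick_smallest(caps, 2)
orders = rebalance_orders(held=["A", "B"], targets=targets)
print(targets)
print(orders)
```

Here "A" is held but no longer among the two smallest, so it is assigned a weight of 0, which mirrors the sell loop in handle_bar.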

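Result:

p18 Multi-factor strategy workflow, composition of factor data, introduction to winsorization

Video: https://www.bilibili.com/video/av55456917?p=18

Notebook location: https://www.ricequant.com/research/user/user_358930/tree?
(I thought this feature had been removed, but I found it by entering the URL directly.)

(I dabbled in this two years ago, but my foundations were too weak and I gave up. In the blink of an eye my master's degree is almost over, and I am picking it up again. This is definitely the direction I want to work in; I am giving myself a deadline to see whether I can achieve something within five years.)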

1. Research platform API

get_price("000001.XSHE", start_date="2017-01-01", end_date="2017-01-06")  # a single stock
get_price(["000001.XSHE", "000005.XSHE", "000002.XSHE"], start_date="2017-01-01", end_date="2017-01-06", fields="close")  # multiple stocks
get_trading_dates(start_date="2017-01-01", end_date="2018-01-01")  # get the trading dates
fund = get_fundamentals(q, entry_date='2017-01-03')  # fundamentals for a query q (built below)

# Get financial data
q = query(fundamentals.income_statement.revenue,
         fundamentals.income_statement.cost_of_goods_sold
         ).filter(fundamentals.stockcode.in_(['000001.XSHE', '000002.XSHE']))
fund = get_fundamentals(q, entry_date='2017-01-03')
fund

Result:

<class 'pandas.core.panel.Panel'>
Dimensions: 2 (items) x 1 (major_axis) x 2 (minor_axis)
Items axis: revenue to cost_of_goods_sold
Major_axis axis: 2017-01-03 00:00:00 to 2017-01-03 00:00:00
Minor_axis axis: 000001.XSHE to 000002.XSHE

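2. Composition of the data

Panel data: a three-dimensional structure made up of cross-sectional data and time-series data, as in the output above.
Cross-sectional data: values of the same indicator for different units (here, stocks) at the same point in time.
Time-series data: data collected at different points in time, reflecting how something changes over time.
Converting panel data to cross-sectional data:
fund[:, '2017-01-03', :]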
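
To make the three terms concrete, here is a small sketch on made-up numbers. `Panel` was removed from pandas in version 0.25, so this example assumes a plain NumPy 3-D array with axes (fields, dates, stocks) as a stand-in; the values and the axis layout are illustrative, not what the platform returns.

```python
import numpy as np

fields = ["revenue", "cost_of_goods_sold"]
dates = ["2017-01-03", "2017-01-04", "2017-01-05"]
stocks = ["000001.XSHE", "000002.XSHE"]

# Made-up numbers with shape (fields, dates, stocks)
panel = np.arange(12, dtype=float).reshape(len(fields), len(dates), len(stocks))

# Cross-section: fix one date, keep every field and every stock
day = dates.index("2017-01-03")
cross_section = panel[:, day, :]  # shape (2 fields, 2 stocks)

# Time series: fix one field and one stock, keep every date
series = panel[fields.index("revenue"), :, stocks.index("000001.XSHE")]

print(cross_section.shape, series.shape)
```

The slice `panel[:, day, :]` plays the same role as `fund[:, '2017-01-03', :]`, only with an integer position instead of a date label.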

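3. Processing multi-factor data

Multi-factor analysis works on cross-sectional data.
Processing factor data (winsorization)
– Outliers are not removed; they are pulled back to within the normal range.
Three methods:
– Quantile winsorization: median, quartiles, percentiles
– Median absolute deviation (MAD) winsorization
– Normal-distribution (3-sigma) winsorization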

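p19 Case study: quantile winsorization and 3x median (MAD) winsorization

Video: https://www.bilibili.com/video/av55456917?p=19
Quartiles: sort the values in ascending order; the quartiles are the values at the 25%, 50%, and 75% positions.

Values outside the chosen quantile range are replaced with the quantile cut points.

1. Quantile winsorization
Code: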

from scipy.stats.mstats import winsorize

# Winsorize pe_ratio
fund = get_fundamentals(query(fundamentals.eod_derivative_indicator.pe_ratio), entry_date='20170103')[:, '20170103', :]

fund['pe_ratio_winsorize'] = winsorize(fund['pe_ratio'], limits=0.025)

fund['pe_ratio'][:500].plot()
fund['pe_ratio_winsorize'][:500].plot()

Result:

Hand-written quantile implementation:

2. Median absolute deviation (MAD) winsorization

3x MAD winsorization (commonly used)
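
The original hand-written version is not preserved here. As a stand-in, the following is a minimal sketch of what a quantile clip could look like with NumPy; `quantile_winsorize` and its `lower`/`upper` parameters are hypothetical names chosen for illustration, and the result is not guaranteed to match `scipy.stats.mstats.winsorize` exactly.

```python
import numpy as np


def quantile_winsorize(factor, lower=0.025, upper=0.975):
    """Clip values outside the [lower, upper] quantile range to the cut points."""
    values = np.asarray(factor, dtype=float)
    # nanquantile ignores missing values when locating the cut points
    down, up = np.nanquantile(values, [lower, upper])
    return np.clip(values, down, up)


# Made-up data with one extreme value
data = np.array([1.0, 2.0, 3.0, 4.0, 100.0])
clipped = quantile_winsorize(data, lower=0.25, upper=0.75)
print(clipped)
```

With the quartiles as cut points, 1 is raised to 2 and 100 is pulled down to 4, while the values in between are left alone.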

import numpy as np

def mad(factor):
    '''Median absolute deviation (MAD) winsorization'''
    # 1. Find the median
    me = np.median(factor)

    # 2. Absolute deviation of each factor value from the median: |x - median|
    # 3. Median of those absolute deviations: MAD = median(|x - median|)
    mad_value = np.median(abs(factor - me))

    # 4. Compute MAD_e = 1.4826 * MAD, then choose the multiplier n
    # n = 3 gives 3x MAD winsorization
    # Upper and lower bounds
    up = me + (3 * 1.4826 * mad_value)
    down = me - (3 * 1.4826 * mad_value)

    # Clip values to the bounds
    factor = np.where(factor > up, up, factor)
    factor = np.where(factor < down, down, factor)

    return factor

# Winsorize pe_ratio
fund['pe_ratio_3md'] = mad(fund['pe_ratio'])
fund['pe_ratio'][:500].plot()
fund['pe_ratio_3md'][:500].plot()
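
The constant 1.4826 in step 4 is the factor that turns MAD into an estimate of the standard deviation when the data are normally distributed, so `3 * 1.4826 * MAD` plays the role of "3 sigma". A quick check with the standard library (this snippet is an illustration, not part of the course code):

```python
from statistics import NormalDist

# For a standard normal variable, the MAD equals the 75th-percentile z-score
mad_of_standard_normal = NormalDist().inv_cdf(0.75)

# Rescaling by the reciprocal makes MAD comparable to one standard deviation
scale = 1.0 / mad_of_standard_normal
print(round(scale, 4))
```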

Result:

p20 Case study: 3-sigma winsorization

Video: https://www.bilibili.com/video/av55456917?p=20
(The instructor says this method is not commonly used.)

Code:

# 3-sigma winsorization
def threesigma(factor):
    # Compute the mean and standard deviation
    mean = factor.mean()
    std = factor.std()

    # Compute the upper and lower bounds
    up = mean + 3*std
    down = mean - 3*std

    # Replace the extreme values
    factor = np.where(factor>up, up, factor)
    factor = np.where(factor<down, down, factor)

    return factor

# Winsorize pe_ratio
fund['pe_ratio_3sigma'] = threesigma(fund['pe_ratio'])
fund['pe_ratio'][:500].plot()
fund['pe_ratio_3sigma'][:500].plot()

Result:

Winsorization: MAD winsorization is the recommended method, followed by quantile winsorization.
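
One way to see why the median-based bound is often preferred over 3-sigma: the mean and standard deviation are themselves distorted by the outliers they are supposed to catch. The sketch below uses made-up factor values; `ddof=1` is an assumption chosen to match the default of pandas' `Series.std()`.

```python
import numpy as np

# Made-up factor values with one extreme outlier
factor = np.array([10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0, 18.0, 1000.0])

# 3-sigma upper bound: mean and std are both dragged up by the outlier
sigma_up = factor.mean() + 3 * factor.std(ddof=1)

# 3x MAD upper bound: median-based, so the outlier barely moves it
median = np.median(factor)
mad_value = np.median(np.abs(factor - median))
mad_up = median + 3 * 1.4826 * mad_value

print(sigma_up > 1000.0)  # True: the outlier survives 3-sigma clipping
print(mad_up < 1000.0)    # True: the outlier is pulled back by MAD clipping
```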
