Predicting A-Share Industry Sector Movements

1. Preface

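I added Mi Ge (米哥) as a friend on WeChat and saw him recommend a kesci competition in his Moments feed: predicting the movement of A-share industry sectors. I trade stocks myself, Mi Ge strikes me as reliable, and the competition data is provided by tushare, so I entered and used a few fairly common algorithms.
Competition URL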

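The competition explicitly calls for using news data, i.e. NLP. However, the news source here is Xinwen Lianbo (新闻联播, the CCTV evening news broadcast), while the prediction horizons are 1, 2, and 3 days, so I felt the two were probably not closely related (though perhaps I simply don't follow the news enough). I entered just for fun, skipped the news data, and used only traded prices with some simple indicator analysis. The objective score turned out well; the subjective score depends on kesci's experts reviewing the code, which is out of my hands.
My objective score should be in the top three (probably second).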

2. Indicator selection

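For the indicators, I picked a few common ones; all of them can be looked up with a quick search (e.g. on Baidu).
BOLL: upper band value, middle band value, lower band value, and whether the price crosses the lower band (1 for crossing the lower band from below, 0 otherwise)
KDJ: KDJ values, K/D golden cross (1 for a golden cross, 0 otherwise), whether K and J form a golden cross, whether J is oversold
MACD: MACD values
CCI: 6-day CCI value, 10-day CCI value, whether 6-day CCI is oversold, whether 10-day CCI is oversold, whether 6-day CCI golden-crosses 10-day CCI
WILLR: 6-day WILLR value, 10-day WILLR value, whether 6-day WILLR is oversold, whether 10-day WILLR is oversold, whether 6-day WILLR golden-crosses 10-day WILLR
RSI: 6-day RSI value, 10-day RSI value, whether 6-day RSI is oversold, whether 10-day RSI is oversold, whether 6-day RSI golden-crosses 10-day RSI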
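
As a rough illustration of how features like these can be built, the sketch below computes Bollinger Bands and a 0/1 "crossed the lower band from below" flag in plain Python. It is not the competition code: the function names are hypothetical, and the window length and band width are parameters chosen for the demo (a 20-day window with 2 standard deviations is the conventional default).

```python
from statistics import mean, pstdev


def bollinger_bands(closes, window=20, num_std=2.0):
    """Return (upper, middle, lower) lists; entries are None until the window fills."""
    upper, middle, lower = [], [], []
    for i in range(len(closes)):
        if i + 1 < window:
            upper.append(None)
            middle.append(None)
            lower.append(None)
            continue
        chunk = closes[i + 1 - window:i + 1]
        mid = mean(chunk)                  # middle band: simple moving average
        width = num_std * pstdev(chunk)    # band width: multiple of the standard deviation
        upper.append(mid + width)
        middle.append(mid)
        lower.append(mid - width)
    return upper, middle, lower


def cross_up_flags(series, reference):
    """1 where series moves from below reference to at-or-above it, else 0."""
    flags = [0] * len(series)
    for i in range(1, len(series)):
        if None in (series[i - 1], series[i], reference[i - 1], reference[i]):
            continue  # not enough history yet
        if series[i - 1] < reference[i - 1] and series[i] >= reference[i]:
            flags[i] = 1
    return flags


# Tiny made-up price series; a short window and narrow band keep the demo small.
closes = [10.0, 10.2, 10.1, 10.0, 8.0, 9.9, 10.1]
upper, middle, lower = bollinger_bands(closes, window=4, num_std=1.0)
boll_cross = cross_up_flags(closes, lower)  # -> [0, 0, 0, 0, 0, 1, 0]
```

The same `cross_up_flags` helper could serve for the golden-cross flags (for example K against D, or the 6-day series against the 10-day series), since a golden cross is also an upward crossing of one line over another.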

3. Algorithm selection

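The algorithms are also very common ones (logistic regression, SVM, random forest). Each of the three learns from historical data and then makes a prediction. If at least two of them predict a rise, the result is a rise; if at least two predict a fall, the result is a fall.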

from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

# All three models are trained on the same feature columns
FEATURES = ["boll_upper", "boll_middle", "boll_lower", "k", "d", "j", "macd", "macd_signal", "macd_hist",
            "cci6", "rsi6", "willr6", "cci10", "rsi10", "willr10",
            'boll_cross', 'kd_cross', 'kj_cross', 'cci_cross', 'rsi_cross', 'willr_cross',
            'over_cci6', 'over_cci10', 'over_willr6', 'over_willr10', 'over_j']

# Each entry pairs a classifier with the list of features it uses
algorithms = [
    [RandomForestClassifier(random_state=1, n_estimators=100, min_samples_split=4, min_samples_leaf=2), FEATURES],
    [LogisticRegression(random_state=1, solver='liblinear'), FEATURES],
    [SVC(C=1.0, kernel='linear', probability=True), FEATURES],
]
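
The two-out-of-three rule described above amounts to a majority vote. A minimal sketch, assuming each model's output has already been reduced to a list of 0/1 labels (1 = up, 0 = down); the function name `majority_vote` and the sample predictions are made up for illustration:

```python
def majority_vote(predictions_per_model):
    """Combine 0/1 predictions from an odd number of models, sample by sample."""
    n_models = len(predictions_per_model)
    if n_models % 2 == 0:
        raise ValueError("use an odd number of models to avoid ties")
    voted = []
    for sample_votes in zip(*predictions_per_model):
        ups = sum(sample_votes)  # number of models predicting "up"
        voted.append(1 if ups * 2 > n_models else 0)
    return voted


# Made-up predictions for four samples from the three models
rf_pred = [1, 0, 1, 0]
lr_pred = [1, 1, 0, 0]
svm_pred = [0, 1, 1, 0]
final = majority_vote([rf_pred, lr_pred, svm_pred])  # -> [1, 1, 1, 0]
```

With three binary classifiers there is always a majority, so every sample gets a definite up or down label.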

4. An important Python method

Because the machine learning step is time-consuming, I used parallel processing (pool.map). The code below is a small demo of pool.map.

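from multiprocessing.pool import Pool


def task(i):
    # Return the five consecutive integers starting at i
    return list(range(i, i + 5))


def pool_method():
    # map() blocks until every task has finished, so all results are
    # available before the context manager shuts the pool down
    with Pool() as pool:
        result_list = pool.map(task, [1, 2, 3, 4, 5])
    print(result_list)


if __name__ == '__main__':
    # The guard is required on platforms that spawn worker processes (e.g. Windows)
    pool_method()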

5. Full code

Because the competition ran in a notebook, I can only give a link to the code.
Code link

6. Input data format

The data can be downloaded from tushare. Below is the download code; register for a key (token) before downloading.

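import logging
import os

import pandas as pd
import tushare as ts

logger = logging.getLogger(__name__)

# Note: __get_date_range, __get_today_str, down_load_daily_sw_data,
# BASIC_INFO_START_DATE and SW_DATA_STORE_FOLDER are not shown in this excerpt.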


def download_and_save_sw_job(pro):
    """
    Download the Shenwan (SW) industry index daily data and save one CSV per date.
    :param pro: tushare pro API client
    :return: None
    """
    # Fetch the daily Shenwan index data, one date at a time
    logger.info('Downloading Shenwan data')
    for str_datetime in __get_date_range(BASIC_INFO_START_DATE, __get_today_str()):
        logger.info(str_datetime)
        basic_data_dataframe = down_load_daily_sw_data(pro, str_datetime)
        if basic_data_dataframe is None:
            logger.info('Failed to fetch SW data for {0} (pro.sw_daily)'.format(str_datetime))
            continue
        # Keep only the listed Shenwan industry indices
        sw_codes = ['801010.SI', '801020.SI', '801030.SI', '801040.SI', '801050.SI', '801080.SI',
                    '801110.SI', '801120.SI', '801130.SI', '801140.SI', '801150.SI', '801160.SI',
                    '801170.SI', '801180.SI', '801200.SI', '801210.SI', '801230.SI', '801250.SI',
                    '801260.SI', '801270.SI', '801280.SI', '801300.SI', '801710.SI', '801720.SI',
                    '801730.SI', '801740.SI', '801750.SI', '801760.SI', '801770.SI', '801780.SI',
                    '801790.SI', '801880.SI', '801890.SI', '802600.SI']
        basic_data_dataframe = basic_data_dataframe[basic_data_dataframe['ts_code'].isin(sw_codes)]
        if os.path.exists(os.path.join(SW_DATA_STORE_FOLDER, 'sw_{0}.csv'.format(str_datetime))):
            os.remove(os.path.join(SW_DATA_STORE_FOLDER, 'sw_{0}.csv'.format(str_datetime)))
        basic_data_dataframe.to_csv(os.path.join(SW_DATA_STORE_FOLDER, 'sw_{0}.csv'.format(str_datetime)))
    combine_dataframe()


def combine_dataframe():
    if os.path.exists(os.path.join(SW_DATA_STORE_FOLDER, 'TRAINSET_STOCK.csv')):
        os.remove(os.path.join(SW_DATA_STORE_FOLDER, 'TRAINSET_STOCK.csv'))
    file_list = os.listdir(SW_DATA_STORE_FOLDER)
    # DataFrame.append was removed in pandas 2.0; pd.concat produces the same combined frame
    frames = [pd.read_csv(os.path.join(SW_DATA_STORE_FOLDER, name)) for name in file_list]
    base_data_frame = pd.concat(frames, ignore_index=True)
    base_data_frame.to_csv(os.path.join(SW_DATA_STORE_FOLDER, 'TRAINSET_STOCK.csv'))


if __name__ == '__main__':
    ts.set_token('***')  # replace with your own tushare token
    pro = ts.pro_api()
    # Download the Shenwan data; this also calls combine_dataframe() at the end
    download_and_save_sw_job(pro)

The first rows of the resulting data look like this:

,Unnamed: 0,ts_code,trade_date,name,open,low,high,close,change,pct_change,vol,amount,pe,pb
0,4,801010.SI,20170405,农林牧渔,3228.07,3227.19,3271.9,3271.9,49.6,1.54,83229.0,997867.0,29.15,3.98
1,13,801020.SI,20170405,采掘,3499.94,3499.94,3549.68,3549.68,64.05,1.84,130225.0,1082993.0,57.27,1.8
2,18,801030.SI,20170405,化工,3339.79,3339.17,3394.7,3394.7,65.47,1.97,327918.0,4424141.0,42.27,2.95
3,25,801040.SI,20170405,钢铁,2792.78,2792.76,2822.85,2822.5,80.8,2.95,126113.0,663732.0,54.97,1.71
4,27,801050.SI,20170405,有色金属,3774.51,3774.51,3872.85,3872.85,106.59,2.83,285608.0,3330624.0,88.06,3.21
5,37,801080.SI,20170405,电子,3224.77,3224.62,3286.47,3286.47,70.31,2.19,227326.0,3245112.0,64.27,3.77
6,48,801110.SI,20170405,家用电器,5743.21,5718.12,5780.01,5764.28,17.12,0.3,64192.0,1136937.0,19.9,3.36
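
The two leading columns (an unnamed index and `Unnamed: 0`) appear to be leftovers from saving the DataFrame index to CSV twice, once per daily file and once in the combined file. A minimal sketch for reading rows in this format while dropping them, using only the standard library; `load_rows` and the column groupings are hypothetical helpers, and the embedded sample simply repeats two of the rows above:

```python
import csv
import io

SAMPLE = """,Unnamed: 0,ts_code,trade_date,name,open,low,high,close,change,pct_change,vol,amount,pe,pb
0,4,801010.SI,20170405,农林牧渔,3228.07,3227.19,3271.9,3271.9,49.6,1.54,83229.0,997867.0,29.15,3.98
1,13,801020.SI,20170405,采掘,3499.94,3499.94,3549.68,3549.68,64.05,1.84,130225.0,1082993.0,57.27,1.8
"""

INDEX_COLUMNS = {"", "Unnamed: 0"}                # saved index columns, not real data
TEXT_COLUMNS = {"ts_code", "trade_date", "name"}  # kept as strings


def load_rows(handle):
    """Parse the CSV into dicts, dropping index columns and converting numbers to float."""
    rows = []
    for raw in csv.DictReader(handle):
        row = {}
        for key, value in raw.items():
            if key in INDEX_COLUMNS:
                continue
            row[key] = value if key in TEXT_COLUMNS else float(value)
        rows.append(row)
    return rows


rows = load_rows(io.StringIO(SAMPLE))
```

When working with pandas instead, passing `index=False` to `to_csv` when writing the files would avoid creating these extra columns in the first place.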