Without further ado, suppose your data looks roughly like this:
In [25]: df[:1]
Out[25]:
                               timestamp    open    high     low   close  \
timestamp
2019-01-01 08:00:00  2019-01-01 08:00:00  3703.5  3703.5  3703.0  3703.0

                       volume  e_open  e_high  e_low  e_close  e_volume  \
timestamp
2019-01-01 08:00:00  524924.0  133.35  133.35  133.3   133.35   77507.0

                          rate  change change_str  cum_change cum_change_str
timestamp
2019-01-01 08:00:00  27.769029    -0.0     -0.00%        -0.0         -0.00%
In [26]: df.columns
Out[26]:
Index(['timestamp', 'open', 'high', 'low', 'close', 'volume', 'e_open',
       'e_high', 'e_low', 'e_close', 'e_volume', 'rate', 'change',
       'change_str', 'cum_change', 'cum_change_str'],
      dtype='object')
Before loading the data into Elasticsearch, create the index first, mainly to set the format of the timestamp field:
PUT btc_eth_data
{
  "mappings": {
    "doc": {
      "properties": {
        "timestamp": {
          "type": "date",
          "format": "yyyy-MM-dd HH:mm:ss"
        }
      }
    }
  }
}
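If you would rather create the index from Python than from the console, a minimal sketch could look like the one below. The helper names `build_index_body` and `ensure_index` are made up for illustration. The sketch assumes an `elasticsearch` client whose version matches a cluster that still accepts the `doc` mapping type (as the request above implies) and whose `indices.create` takes a `body` argument. Newer clients and clusters may need a different shape.

```python
def build_index_body(doc_type="doc"):
    """Return the same mapping as the PUT request above, as a Python dict."""
    return {
        "mappings": {
            doc_type: {
                "properties": {
                    "timestamp": {
                        "type": "date",
                        "format": "yyyy-MM-dd HH:mm:ss",
                    }
                }
            }
        }
    }


def ensure_index(es, index="btc_eth_data"):
    """Create the index only if it does not exist yet.

    `es` is assumed to be an elasticsearch.Elasticsearch instance.
    Returns True when the index was created, False when it was already there.
    """
    if es.indices.exists(index=index):
        return False
    es.indices.create(index=index, body=build_index_body())
    return True
```

With a running cluster this would be called as `ensure_index(Elasticsearch())`. Checking for existence first means the script can be rerun without failing on an index that is already there.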
Here is the code:
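import json

import pandas as pd
from elasticsearch import Elasticsearch

es = Elasticsearch()

# Suppose you have a pile of data, loaded into a DataFrame and processed as needed
df = pd.read_csv("/Users/lex/Code/bitmex_arbitrage/data.csv")
#
# ... data processing ...
# Then prepare it for loading into Elasticsearch
df_as_json = df.to_json(orient='records', lines=True)

DOCS_PER_REQUEST = 1000  # each bulk request carries 1000 documents
bulk_data = []
for json_document in df_as_json.split('\n'):
    if not json_document.strip():
        continue  # skip the empty string left by a trailing newline
    # every document takes two entries: the action line, then the document itself
    bulk_data.append({"index": {
        '_index': "btc_eth_data",
        '_type': "doc",
    }})
    bulk_data.append(json.loads(json_document))
    if len(bulk_data) >= 2 * DOCS_PER_REQUEST:
        es.bulk(body=bulk_data)
        bulk_data = []
# send whatever is left over; an empty bulk request would be rejected
if bulk_data:
    es.bulk(body=bulk_data)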
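As an alternative to batching by hand, the client library ships `elasticsearch.helpers.bulk`, which takes an iterable of actions and splits it into chunks itself. The sketch below is one way to use it, not the method used above. The names `iter_actions` and `bulk_load` are made up for illustration, and the `_type` field again assumes a cluster that still uses mapping types.

```python
import json


def iter_actions(json_lines, index="btc_eth_data", doc_type="doc"):
    """Yield one bulk action per non-empty line of line-delimited JSON."""
    for line in json_lines.splitlines():
        if not line.strip():
            continue  # ignore blank lines, e.g. after a trailing newline
        yield {
            "_index": index,
            "_type": doc_type,  # assumption: cluster still accepts mapping types
            "_source": json.loads(line),
        }


def bulk_load(es, df, chunk_size=1000):
    """Send a DataFrame to Elasticsearch in chunks of `chunk_size` documents.

    `es` is assumed to be an elasticsearch.Elasticsearch instance and
    `df` a pandas DataFrame.
    """
    # Imported here so that iter_actions stays usable without the client installed.
    from elasticsearch import helpers

    json_lines = df.to_json(orient="records", lines=True)
    return helpers.bulk(es, iter_actions(json_lines), chunk_size=chunk_size)
```

Because `iter_actions` is a generator, the documents are parsed lazily as each chunk is built, and there is no leftover batch to flush at the end.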