Dataset overview
glove.5B.50d.json
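Word-to-vector lookup table
Training set test.json and validation set val.json
- The validation set is split into two parts (***ratio???***) so that testing can be done: sample a pair of input and standard output files from the validation set.
- Format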
file_name: JSON file storing the data in the following format
    {
        "P155": # relation id
        [
            {
                "token": ["Hot", "Dance", "Club", …], # sentence
                "h": ["song for a future generation", "Q7561099", [[16, 17, …]]], # head entity [word, id, location]
                "t": ["whammy kiss", "Q7990594", [[11, 12]]], # tail entity [word, id, location]
            },
            …
        ],
        "P177":
        [
            …
        ]
        …
    }
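The layout above can be walked with plain `json` and two nested loops. The following is a minimal sketch, not the loader's actual code: the sample sentence, its positions, and the `Q0` id are made up for illustration, and `iter_instances` is a hypothetical helper name.

```python
import json

# Tiny in-memory sample in the format described above. The tokens, positions
# and the "Q0" id are invented for illustration, not taken from the dataset.
sample_text = """
{
  "P155": [
    {
      "token": ["the", "song", "whammy", "kiss", "follows", "it"],
      "h": ["whammy kiss", "Q7990594", [[2, 3]]],
      "t": ["it", "Q0", [[5]]]
    }
  ]
}
"""

def iter_instances(data):
    """Yield (relation_id, tokens, head_positions, tail_positions) tuples."""
    for relation_id, instances in data.items():
        for ins in instances:
            # Each entity is [word, id, location]; location is a list of
            # mentions, each mention being a list of token indices.
            _head_word, _head_id, head_pos = ins["h"]
            _tail_word, _tail_id, tail_pos = ins["t"]
            # Assumption: only the first mention of each entity is used.
            yield relation_id, ins["token"], head_pos[0], tail_pos[0]

data = json.loads(sample_text)
rows = list(iter_instances(data))
```

For a real file, `json.loads(sample_text)` would be replaced by `json.load` on the opened file_name.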
word_vec_file_name: JSON file storing word vectors in the following format
    [
        {'word': 'the', 'vec': [0.418, 0.24968, …]},
        {'word': ',', 'vec': [0.013441, 0.23682, …]},
        …
    ]
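A list of `{'word', 'vec'}` records like this is usually turned into a word-to-index map plus a list of vectors. The sketch below shows one way to do that. The vectors are cut to two dimensions for brevity, and the `[UNK]` and `[PAD]` entries with zero vectors are assumptions of this example, not names confirmed by the notes.

```python
import json

# Two records in the word-vector format above, truncated to 2 dimensions.
sample_vec_text = (
    '[{"word": "the", "vec": [0.418, 0.24968]},'
    ' {"word": ",", "vec": [0.013441, 0.23682]}]'
)

def build_vocab(word_vec_list, case_sensitive=False):
    """Return (word2id, vectors) built from a list of word/vec records."""
    word2id = {}
    vectors = []
    for entry in word_vec_list:
        word = entry["word"] if case_sensitive else entry["word"].lower()
        if word in word2id:
            continue  # keep the first vector when lowercasing causes a clash
        word2id[word] = len(vectors)
        vectors.append(entry["vec"])
    dim = len(vectors[0])
    # Assumption: two extra zero rows for unknown words and for padding.
    for special in ("[UNK]", "[PAD]"):
        word2id[special] = len(vectors)
        vectors.append([0.0] * dim)
    return word2id, vectors

word2id, vectors = build_vocab(json.loads(sample_vec_text))
```

Row `word2id[w]` of `vectors` is then the embedding of word `w`, so the list can be handed to an embedding layer as its initial weight matrix.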
max_length: The length that all sentences need to be extended to.
case_sensitive: Whether the data processing is case-sensitive; defaults to False.
reprocess: Whether to redo the pre-processing even if pre-processed files already exist; defaults to False.
cuda: Whether to use CUDA; defaults to True.
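To make the effect of max_length and case_sensitive concrete, here is a minimal sketch of encoding one sentence into a fixed-length list of word ids. `encode_sentence`, the `[UNK]` and `[PAD]` entries, and the truncation of over-long sentences are all assumptions of this example; the notes only state that sentences are extended to max_length.

```python
def encode_sentence(tokens, word2id, max_length, case_sensitive=False):
    """Map tokens to ids and force the result to exactly max_length entries."""
    if not case_sensitive:
        tokens = [t.lower() for t in tokens]
    unk, pad = word2id["[UNK]"], word2id["[PAD]"]
    # Assumption: sentences longer than max_length are truncated.
    ids = [word2id.get(t, unk) for t in tokens[:max_length]]
    # Shorter sentences are extended with the padding id.
    ids += [pad] * (max_length - len(ids))
    return ids

# Hypothetical toy vocabulary; "club" is deliberately missing.
word2id = {"hot": 0, "dance": 1, "[UNK]": 2, "[PAD]": 3}

padded = encode_sentence(["Hot", "Dance", "Club"], word2id, max_length=5)
truncated = encode_sentence(["Hot", "Dance", "Club"], word2id, max_length=2)
strict = encode_sentence(["Hot"], word2id, max_length=2, case_sensitive=True)
```

With case_sensitive left at False, "Hot" is lowercased and found in the vocabulary; with case_sensitive=True it is looked up as written and falls back to the unknown-word id.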