Question:
I load some machine learning data from a CSV file. The first 2 columns are observations and the remaining columns are features.
Currently, I do the following:
data = pandas.read_csv('mydata.csv')
which gives something like:
data = pandas.DataFrame(np.random.rand(10,5), columns = list('abcde'))
I'd like to slice this dataframe into two dataframes: one containing the columns a and b, and one containing the columns c, d and e.
It is not possible to write something like:
observations = data[:'c']
features = data['c':]
I'm not sure what the best method is. Do I need a pd.Panel?
By the way, I find dataframe indexing pretty inconsistent: data['a'] is permitted, but data[0] is not. On the other hand, data['a':] is not permitted but data[0:] is. Is there a practical reason for this? This is really confusing if columns are indexed by int, given that data[0] != data[0:1].
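To make that last point concrete, here is a minimal sketch. It assumes a hypothetical frame `df` whose columns are labelled with the integers 0, 1, 2 (this is not the CSV data above). With plain `[]`, a scalar key is looked up among the column labels, while a slice is applied to the rows:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with integer column labels (an assumption for illustration)
df = pd.DataFrame(np.arange(12).reshape(4, 3), columns=[0, 1, 2])

col = df[0]      # scalar key: the column labelled 0, returned as a Series
rows = df[0:1]   # slice key: the first row, returned as a one-row DataFrame

print(type(col).__name__, col.shape)
print(type(rows).__name__, rows.shape)
```

So the two expressions differ in both type and shape, which is where the confusion comes from.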
Solution:
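One possible approach, offered as a sketch rather than the definitive answer: pandas provides the `.loc` (label-based) and `.iloc` (position-based) indexers, and both accept a column slice as the second argument. The frame below reuses the random example from the question. The names `observations_pos` and `features_pos` are introduced here only for illustration.

```python
import numpy as np
import pandas as pd

# Same shape as the example in the question
data = pd.DataFrame(np.random.rand(10, 5), columns=list('abcde'))

# Label-based slicing: with .loc, both endpoints of the slice are included
observations = data.loc[:, 'a':'b']
features = data.loc[:, 'c':]

# Position-based slicing: .iloc uses ordinary half-open Python slices
observations_pos = data.iloc[:, :2]
features_pos = data.iloc[:, 2:]
```

Both variants give a two-column frame (a, b) and a three-column frame (c, d, e). A pd.Panel should not be needed for this: it was meant for three-dimensional data, and it has since been removed from pandas.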
Reference 1: https://en.stackoom.com/question/ikgT
Reference 2: https://stackoom.com/question/ikgT