Split a large dataframe into a list of data frames based on common value in column

Question:

I have a data frame with 10 columns that records the actions of "users". One of the columns (column 10) contains an ID that identifies the user and is not unique. The data frame has about 750,000 rows. I am trying to extract individual data frames (that is, a list or vector of data frames) split by the column containing the "user" identifier, to isolate the actions of a single actor.

ID | Data1 | Data2 | ... | UserID
1  | aaa   | bbb   | ... | u_001
2  | aab   | bb2   | ... | u_001
3  | aac   | bb3   | ... | u_001
4  | aad   | bb4   | ... | u_002

resulting in

list(
ID | Data1 | Data2 | ... | UserID
1  | aaa   | bbb   | ... | u_001
2  | aab   | bb2   | ... | u_001
3  | aac   | bb3   | ... | u_001
,
4  | aad   | bb4   | ... | u_002
...)

The following works very well for me on a small sample (1000 rows):

paths = by(smallsampleMat, smallsampleMat[,"userID"], function(x) x)

and then accessing the element I want with paths[1], for instance.
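As a side note on element access: in R, single brackets on a list-like object generally return a one-element container, while double brackets return the element itself. The sketch below is only an illustration on made-up data; the data frame `df` and its values are assumptions, with column names borrowed from the example table.

```r
# Hypothetical toy data; column names mirror the example table above.
df <- data.frame(
  ID     = 1:4,
  Data1  = c("aaa", "aab", "aac", "aad"),
  UserID = c("u_001", "u_001", "u_001", "u_002"),
  stringsAsFactors = FALSE
)

# Same call shape as in the question, applied to the toy data.
paths <- by(df, df[, "UserID"], function(x) x)

# Double brackets extract the data frame stored for the first user.
first_user <- paths[[1]]
```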

When I apply this to the original large data frame, or even to a matrix representation of it, it chokes my machine (4 GB RAM, Mac OS X 10.6, R 2.15) and never completes. (I know that a newer R version exists, but I believe this is not the main problem.)

split seems to be more performant and completes after a long time, but with my limited R knowledge I do not know how to piece the resulting list of vectors into a vector of matrices.

path = split(smallsampleMat, smallsampleMat[,10]) 
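One possible explanation for the "list of vectors" is that split() treats a matrix as a plain vector, whereas on a data frame it returns a list of data frames. The following is a minimal sketch on made-up data, not a tested solution for the 750,000-row case; the names `df`, `by_user`, `row_groups` and `mats_by_user` are assumptions introduced for illustration.

```r
# Toy data standing in for the real table (values are made up).
df <- data.frame(
  ID     = 1:4,
  Data1  = c("aaa", "aab", "aac", "aad"),
  Data2  = c("bbb", "bb2", "bb3", "bb4"),
  UserID = c("u_001", "u_001", "u_001", "u_002"),
  stringsAsFactors = FALSE
)

# On a data frame, split() returns a named list with one data frame
# per distinct UserID.
by_user <- split(df, df$UserID)

# On a matrix, split() works element-wise and yields vectors. Splitting the
# row indices instead, then subsetting with drop = FALSE, keeps each piece
# a matrix even when a user has only one row.
m <- as.matrix(df)
row_groups <- split(seq_len(nrow(m)), m[, "UserID"])
mats_by_user <- lapply(row_groups, function(i) m[i, , drop = FALSE])
```

With this layout, by_user[["u_001"]] is the data frame for a single user and mats_by_user[["u_001"]] is the corresponding matrix. Whether this is fast enough on the full data set is not something the sketch establishes.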

I have also considered using big.matrix etc., but without much success in speeding up the process.


Solution:

Reference: https://stackoom.com/en/question/1FjjP