Split a large data frame into a list of data frames based on a common value in a column

Question:

I have a data frame with 10 columns that records the actions of "users". One of the columns (column 10) contains a non-unique ID identifying the user. The data frame has about 750,000 rows. I am trying to extract individual data frames (that is, a list or vector of data frames) split by the column containing the "user" identifier, to isolate the actions of a single actor.

ID | Data1 | Data2 | ... | UserID
1  | aaa   | bbb   | ... | u_001
2  | aab   | bb2   | ... | u_001
3  | aac   | bb3   | ... | u_001
4  | aad   | bb4   | ... | u_002

resulting in

list(
ID | Data1 | Data2 | ... | UserID
1  | aaa   | bbb   | ... | u_001
2  | aab   | bb2   | ... | u_001
3  | aac   | bb3   | ... | u_001
,
4  | aad   | bb4   | ... | u_002
...)

The following works very well for me on a small sample (1000 rows):

paths = by(smallsampleMat, smallsampleMat[,"userID"], function(x) x)

and then accessing the element I want with paths[1], for instance.

When I apply this to the original large data frame, or even to a matrix representation, it chokes my machine (4 GB RAM, Mac OS X 10.6, R 2.15) and never completes. (I know that a newer R version exists, but I believe this is not the main problem.)

It seems that split performs better and completes after a long time, but with my limited R knowledge I do not know how to piece the resulting list of vectors into a vector of matrices.

path = split(smallsampleMat, smallsampleMat[,10]) 

I have also considered using big.matrix etc., but without much success in speeding up the process.


Solution:
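
The page records no answer here, so the following is only a minimal sketch of one possible approach, not the accepted solution. It assumes the data is held as a data frame rather than a matrix, and it uses a small hypothetical sample (df, with the column name UserID taken from the table in the question). Calling split on a data frame returns a list of data frames, whereas calling it on a matrix treats the matrix as a plain vector, which is probably why the attempt above produced a list of vectors.

```r
# Hypothetical sample data mirroring the layout shown in the question.
df <- data.frame(
  ID     = 1:4,
  Data1  = c("aaa", "aab", "aac", "aad"),
  Data2  = c("bbb", "bb2", "bb3", "bb4"),
  UserID = c("u_001", "u_001", "u_001", "u_002"),
  stringsAsFactors = FALSE
)

# split() dispatches to split.data.frame here, so every list element
# is itself a data frame holding the rows of one user.
paths <- split(df, df$UserID)

# The list elements are named after the grouping values.
names(paths)
paths[["u_001"]]

# If the data must stay a matrix, split the row indices instead and
# subset with drop = FALSE so each piece keeps its two dimensions.
m <- as.matrix(df)
row_groups <- split(seq_len(nrow(m)), m[, "UserID"])
mats <- lapply(row_groups, function(i) m[i, , drop = FALSE])
```

Note that paths[1] returns a list of length one, while paths[[1]] or paths[["u_001"]] returns the data frame itself. Whether this is fast enough on 750,000 rows with 4 GB of RAM depends on the number of distinct users and has not been measured here.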

Reference: https://stackoom.com/en/question/1FjjP