Progress indicator during pandas operations

Question:

I regularly run pandas operations on data frames of more than 15 million rows or so, and I'd love to have a progress indicator for particular operations.

Does a text-based progress indicator for pandas split-apply-combine operations exist?

For example, in something like:

df_users.groupby(['userID', 'requestDate']).apply(feature_rollup)

where feature_rollup is a somewhat involved function that takes many DF columns and creates new user columns through various methods. These operations can take a while for large data frames, so I'd like to know whether it is possible to have text-based output in an IPython notebook that updates me on the progress.

So far, I've tried canonical loop progress indicators for Python, but they don't interact with pandas in any meaningful way.

I'm hoping there's something I've overlooked in the pandas library/documentation that allows one to know the progress of a split-apply-combine. A simple implementation might look at the total number of data frame subsets on which the apply function is working and report progress as the completed fraction of those subsets.
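
The simple implementation described above could be sketched roughly as follows. This is only an illustration under stated assumptions: apply_with_progress, its every and out parameters, the tiny sample frame, and the stand-in feature_rollup are all hypothetical names made up for this example, and the sketch iterates over the groups explicitly instead of calling apply so that each group is counted exactly once.

```python
import sys

import pandas as pd


def apply_with_progress(grouped, func, every=1, out=sys.stdout):
    """Call func on each group and report the completed fraction of groups.

    Hypothetical helper: assumes func returns a pandas Series per group.
    """
    total = grouped.ngroups  # total number of subsets to process
    results = {}
    for done, (key, group) in enumerate(grouped, start=1):
        results[key] = func(group)
        # Rewrite the same line every `every` groups, and on the last group.
        if done % every == 0 or done == total:
            out.write(f"\r{done}/{total} groups ({done / total:.0%})")
            out.flush()
    out.write("\n")
    # Combine step: one row per group, indexed by the group key(s).
    return pd.concat(results, axis=1).T


# Hypothetical sample data standing in for the real df_users.
df_users = pd.DataFrame({
    "userID": [1, 1, 2, 2, 3],
    "requestDate": ["d1", "d1", "d1", "d2", "d1"],
    "value": [1.0, 2.0, 3.0, 4.0, 5.0],
})


def feature_rollup(group):
    # Stand-in for the real, more involved rollup function.
    return pd.Series({"total": group["value"].sum(), "n": len(group)})


result = apply_with_progress(
    df_users.groupby(["userID", "requestDate"]), feature_rollup
)
```

With 15 million rows and many groups, a larger value of every (say, every 1000 groups) would keep the output overhead small. Explicit iteration is likely slower than a plain apply, so this trades some speed for visibility.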

Is this perhaps something that needs to be added to the library?


Solution:
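
One option that may fit, offered as a hedged sketch and not as something built into pandas itself: the third-party tqdm package provides a pandas integration. Calling tqdm.pandas() registers a progress_apply method on pandas objects, including group-by objects, which behaves like apply but prints a text progress bar. The sample frame and the stand-in feature_rollup below are hypothetical, and the example assumes tqdm is installed.

```python
import pandas as pd
from tqdm import tqdm

# Register progress_apply on DataFrame, Series, and GroupBy objects.
tqdm.pandas(desc="feature_rollup")

# Hypothetical sample data standing in for the real df_users.
df_users = pd.DataFrame({
    "userID": [1, 1, 2, 2, 3],
    "requestDate": ["d1", "d1", "d1", "d2", "d1"],
    "value": [1.0, 2.0, 3.0, 4.0, 5.0],
})


def feature_rollup(group):
    # Stand-in for the real, more involved rollup function.
    return pd.Series({"total": group["value"].sum()})


# Same call as in the question, with apply replaced by progress_apply.
rolled = df_users.groupby(["userID", "requestDate"]).progress_apply(feature_rollup)
```

In a notebook, importing tqdm from tqdm.notebook instead should give a widget-style bar, assuming the notebook widget dependencies are available.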

Reference 1: https://en.stackoom.com/question/1G3Yk
Reference 2: https://stackoom.com/question/1G3Yk