Moving from Pandas to Spark? These 8 Questions and Answers Cover All Your Doubts

*This article was originally published on Medium and is translated and shared by InfoQ China with the original author's permission.*

*Original link: [https://towardsdatascience.com/moving-from-pandas-to-spark-7b0b7d956adb](https://towardsdatascience.com/moving-from-pandas-to-spark-7b0b7d956adb)*

> As your datasets grow larger, moving to Spark can boost speed and save you time.

Most data science workflows start with Pandas.

Pandas is a great library: you can use it to perform all kinds of transformations, and it handles many data formats such as CSV and JSON. I love Pandas; I even made a [podcast](https://podcasts.apple.com/us/podcast/17-why-pandas-is-the-new-excel/id1453716761?i=1000454831790) about it called "Why Pandas is the New Excel".

I still think Pandas is a great library to have in a data scientist's arsenal. But sooner or later you will need to process a dataset so large that Pandas runs out of memory, and that is exactly where Spark comes in.

![Spark is great for large datasets](https://static001.infoq.cn/resource/image/07/ce/076cb95057d0a6985f137943d0008dce.png)

*Spark is great for large datasets ❤️*

This blog post covers, in Q&A form, some questions you are likely to run into, the same ones I had when I started.

#### Question 1: What is Spark?

Spark is a framework for processing massive datasets. It handles big data files in a distributed fashion: several workers each take on a chunk of your large dataset, and all of them are orchestrated by a driver node.
The framework's distributed nature means it can scale to terabytes of data; you are no longer bound by the memory of a single machine. The Spark ecosystem is quite mature by now: you don't need to worry about worker orchestration, it works out of the box, and it is fast.

![The Spark ecosystem](https://static001.geekbang.org/resource/image/52/87/52ee476525d289041b75ba5043cefb87.png)

*The Spark ecosystem [[reference](https://spark.apache.org/)]*

#### Question 2: When should I move off Pandas and seriously consider Spark?

It depends on how much memory your machine has. I would say a dataset larger than 10 GB is already big for Pandas, and at that point Spark becomes a good choice.

Suppose your dataset has 10 columns and each cell holds about 100 characters, i.e. roughly 100 bytes, mostly ASCII and thus encodable at 1 byte per character. At around 10 million rows, you should start thinking about Spark.
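That back-of-the-envelope estimate is easy to check with a few lines of Python. The 10 GB threshold is the author's rule of thumb, not a hard limit, and the function name here is just for illustration:

```python
def estimated_size_bytes(n_rows, n_cols=10, bytes_per_cell=100):
    """Rough raw size: each cell holds ~100 mostly-ASCII characters,
    at roughly 1 byte per character."""
    return n_rows * n_cols * bytes_per_cell

# 10 million rows of 10 such columns is about 10 GB,
# which is where Pandas starts to struggle on a typical machine.
print(estimated_size_bytes(10_000_000) / 1e9, "GB")
```

Keep in mind that the in-memory footprint of a Pandas dataframe is often a few times larger than the raw byte count, so this estimate is conservative.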
#### Question 3: Is Spark better than Pandas at everything?

Not at all! For beginners, Pandas is definitely easier to learn. Spark is harder to pick up, but with the latest APIs you can work with big data through dataframes that are almost as easy to use as Pandas dataframes.

Also, until recently, Spark's support for visualization was poor: you could only visualize a subset of your data. That has started to change now that Databricks has announced native support for visualization in Spark (I am still waiting to see the results).

Until that support matures, though, Spark will not fully replace Pandas, at least in the visualization space. You can always convert a Spark dataframe to Pandas with `df.toPandas()` and then run your visualization or Pandas code on it.
#### Question 4: Setting up Spark is hard. What should I do?

You can interact with Spark in Python via PySpark, or in Scala (or R, or SQL). I wrote a [blog post](https://towardsdatascience.com/how-to-get-started-with-pyspark-1adc142456ec) on getting started with PySpark locally or on a custom server, and the comments were full of people saying how hard it is to set up. My advice: just try a managed cloud solution for running Spark.

I recommend two ways to get started:

1. **Databricks**: a fully managed service that manages Spark clusters for you in AWS, Azure, or GCP. It comes with notebooks that work much like Jupyter notebooks.
2. **Amazon EMR with Zeppelin notebooks**: AWS's semi-managed service. You host a Spark EMR endpoint and then run Zeppelin notebooks against it. Other cloud vendors offer similar services, which I won't go into here.

![Databricks](https://static001.geekbang.org/resource/image/04/4b/049c8eaca60e7217b3eea3302e03e84b.png)

*Databricks is a popular way to host Spark clusters*
#### Question 5: Which is better, Databricks or EMR?

After spending a few hours trying to understand the pros and cons of each, here is what I found:

1. EMR is managed entirely by Amazon, and you never have to leave the AWS ecosystem.
2. If you have DevOps expertise, or DevOps folks to help you, EMR may be the cheaper option: you need to know how to spin instances up and shut them down when you are done. That said, EMR can be unstable, and you may spend hours debugging it. Spark on Databricks is much more stable.
3. Scheduling jobs is easy with Databricks: you can very easily schedule a notebook to run at specific times of the day or week. They also provide an interface to metrics in the Ganglia UI.
4. For Spark jobs, Databricks can cost 30-40% more than EMR, but given the flexibility, the stability, and the strong customer support, I think it is worth it. For running notebooks interactively in Spark, Databricks charges 6 to 7 times more, so watch out for that. Given that you can have instances shut down after 30/60/120 minutes of inactivity to save costs, I still find that Databricks can work out cheaper overall.

With all that in mind, if you are starting your first Spark project, I would recommend Databricks; but if you have solid DevOps expertise, you can try EMR or run Spark on your own machines. If you don't mind your work being shared publicly, you can try the free Databricks Community Edition, or take a 14-day trial of their enterprise edition.

#### Question 6: How does PySpark compare with Pandas?

I think this topic deserves an article of its own. This [talk](https://www.youtube.com/watch?v=XrpSRCwISdk) by Spark contributor Andrew Ray should answer some of your questions.
The main similarities are:

1. Spark dataframes are very similar to Pandas dataframes.
2. PySpark's groupby, aggregations, selection, and other transformations are all very similar to Pandas'. PySpark is slightly harder and has a bit of a learning curve compared with Pandas, but it feels much the same to use.

The main differences are:

1. Spark lets you query dataframes with SQL, which I think is great. Sometimes it is easier to write a piece of logic in SQL than to remember the exact Pandas/PySpark API, and you can switch back and forth between the two approaches.
2. **Spark dataframes are immutable.** No slicing, no overwriting data, and so on.
3. **Spark is lazily evaluated.** It builds a graph of all your transformations and only evaluates them when you supply an action such as collect, show, or take. Transformations can be wide (they look at the whole dataset across all nodes, e.g. orderBy or groupBy) or narrow (they look at individual rows within each node, e.g. contains or filter). Running several wide transformations is slower than running narrow ones. Compared with Pandas, you need to pay much closer attention to the wide transformations you are using!
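The lazy-evaluation behavior and the narrow/wide distinction from point 3 can be seen in a small sketch (assuming a local PySpark session; the data is made up):

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.master("local[*]").getOrCreate()

df = spark.createDataFrame(
    [("a", 1), ("b", 2), ("a", 3)], ["key", "value"]
)

# Narrow transformation: each partition is filtered independently,
# no data moves between nodes. Nothing has executed yet.
filtered = df.filter(F.col("value") > 1)

# Wide transformation: groupBy requires a shuffle across partitions.
# Still nothing has executed; Spark is only building the graph.
grouped = filtered.groupBy("key").sum("value")

# Only an action (collect/show/take) triggers the actual computation.
rows = grouped.collect()
print({r["key"]: r["sum(value)"] for r in rows})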
![Narrow vs. wide transformations](https://static001.geekbang.org/resource/image/ea/13/ea57d5fbc1873c983d5097e771d1f113.png)

*Narrow vs. wide transformations in Spark. Wide transformations are slower.*

#### Question 7: Does Spark have any other advantages?

Spark offers not only dataframes (a higher-level abstraction over RDDs), but also excellent APIs for streaming data and for distributed machine learning via MLlib. So if you want to transform streaming data, or do machine learning on huge datasets, Spark serves you well.

#### Question 8: Are there example data pipeline architectures that use Spark?

Yes. Here is an ETL pipeline in which raw data is taken from a data lake (S3), transformed in Spark, loaded back into S3, and then loaded into a data warehouse such as Snowflake or Redshift, which in turn feeds BI tools such as Tableau or Looker.

![Example ETL pipeline](https://static001.geekbang.org/resource/image/68/y9/689595ef251cacdcf70f606405a47yy9.png)

*An example ETL pipeline for big-data processing for BI tools*
![Example SageMaker pipeline](https://static001.geekbang.org/resource/image/e6/b6/e636ee8799889107bb47226631c75fb6.png)

*An example pipeline for machine learning in Amazon SageMaker*

You can also first collect data from different sources into your warehouse, transform these large datasets with Spark, load them into S3 as Parquet files, and then read them from SageMaker (if you prefer SageMaker to Spark's MLlib).

Another advantage of SageMaker is that it lets you easily deploy models and trigger them via Lambda functions, which in turn connect to the outside world through REST endpoints in API Gateway.

I wrote a [blog post](https://towardsdatascience.com/what-makes-aws-sagemaker-great-for-machine-learning-c8a42c208aa3) about this architecture. Also, Jules Damji's book [Learning Spark](https://www.amazon.com/dp/1492050040/) is great for getting to know Spark.
That wraps it up. We covered some similarities and differences between Spark and Pandas, the best ways to get started with Spark, and some common architectures that take advantage of it.

If you have any questions or comments, reach out to me on [LinkedIn](https://www.linkedin.com/in/sanketgupta107/) ([https://www.linkedin.com/in/sanketgupta107/](https://www.linkedin.com/in/sanketgupta107/))!

**Resources:**

1. Jules Damji's [talk](https://databricks.com/session/a-tale-of-three-apache-spark-apis-rdds-dataframes-and-datasets) on how Spark works under the hood is really great.
2. Jules Damji's book [Learning Spark](https://www.amazon.com/dp/1492050040/).
3. Andrew Ray's [talk](https://www.youtube.com/watch?v=XrpSRCwISdk) comparing Pandas and PySpark syntax.