Python太慢了嗎？

原創

2021-01-14 08:53

{"type":"doc","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"雖然Python比許多編譯語言都慢，但它易於使用，而且功能多樣。對於許多人來說，語言的實用性要勝過速度。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"我是一名Python工程師，因此你可能會認爲我帶有偏見。但是我想澄清一些對Python的批評，並反思一下在使用Python進行數據工程、數據科學及分析等日常工作中，對速度的擔憂是否有必要。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Python太慢了嗎?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"我認爲，這類問題應該基於特定的上下文或用例來說。與C之類的編譯語言相比，Python處理數值的速度慢嗎？是的，它是慢的。這一事實人們很多年前就已經知道了，這就是爲什麼會存在在速度方面起着至關重要作用的Python庫了，比如numpy，它的底層使用的是C。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"但是對於所有用例來說，Python難道都比其他（更難學習和使用的）語言慢得多嗎？如果你查看那些爲解決特定問題而優化了的Python庫的性能基準測試，就會發現與編譯語言相比，它們的表現是相當不錯的。例如，看看FastAPI的性能基準測試——顯然，作爲編譯語言的Go比Python快得多。不過，FastAPI在構建REST API方面還是勝過了一些Go庫："}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/b5\/b500337ee7556598c7b0ff065a6dd635.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"Web框架基準測試——圖片由作者提供"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"旁註：上面的列表中不包含具有更高性能的C++和Java Web框架。"}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"同樣地，在數據密集型神經成像管道中，對Dask（用Python編寫）和Spark（用Scala編寫）進行對比時，作者得出瞭如下結論："}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"總體而言，我們的結果表明，兩個引擎之間的性能沒有實質性的差異。"}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"我們應該捫心自問的問題是我們真正需要的是怎樣的速度。如果你每天只需觸發一次ETL作業，則可以不必關心它是需要20秒還是200秒。然後，你可能更希望代碼易於理解、打包和維護，特別是考慮到，與昂貴的引擎耗時相比，計算資源變得越來越便宜了。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"代碼速度 vs 實用性"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"從實用的角度來看，在爲日常工作而選擇編程語言時，我們需要回答幾個不同的問題。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"用這種語言你能可靠地解決多個業務問題嗎?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"如果你只關心速度，那就不要用Python了。對於各種用例，都有更快的替代方案。Python的最主要的優點在於它的可讀性、易用性，以及可以用它解決各種問題。Python可以作爲一種“粘合劑”，將各種不同的系統、服務和用例連接在一起。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"你能找到足夠多的懂這門語言的員工嗎？"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"由於Python非常易於學習和使用，因此Python的用戶數一直在不斷增長。以前用Excel處理數值的業務用戶，現在可以非常快速地學會用Pandas編碼，從而學會在不依賴IT資源的情況下自給自足。同時，這也卸下了IT和分析部門的負擔。同時還縮短了價值實現的時間。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"如今，找到懂Python並能用這種語言維護Spark數據處理應用程序的數據工程師，比找到做同樣事情的Java或Scala工程師要容易得多。許多組織僅僅因爲找到會“講”這種語言的員工的機會更高些，而逐漸在許多用例上都轉向使用Python來處理了。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"相比之下，我知道有些公司迫切需要Java或C#開發人員來維護現有的應用程序，但是這些語言很難（需要數年的時間才能掌握），而且對於新手程序員來說似乎沒有吸引力，因爲他們可以利用更簡單的語言（比如Go或Python）來獲得更多的收入。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"不同領域專家之間的協同效應"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"如果你的公司使用Python，那麼業務用戶、數據分析師、數據科學家、數據工程師、後端和Web開發人員、DevOps工程師甚至系統管理員都很有可能會使用相同的語言。這可以在項目中產生協同效應，使來自不同領域的人們可以一起工作，並利用相同的工具。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","text":"數據處理中真正的瓶頸是什麼？"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"根據我自己的工作，我通常遇到的瓶頸其實不是語言本身，而是外部資源。更具體地，我們來看幾個例子。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"寫入到關係型數據庫"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"當以ETL的方式進行數據處理時，我們最終需要將這些數據加載到某個集中的地方。雖然我們可以利用Python中的多線程來更快地（通過使用更多的線程）將數據寫入到某些關係型數據庫中，但是並行寫入數的增加可能會使該數據庫的CPU容量達到最大值。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"實際上，當我使用多線程來加快對AWS上RDS Aurora數據庫的寫入時，這種情況就發生過一次。隨後我注意到writer節點的CPU利用率上升得非常高，以至於我不得不通過使用更少的線程來故意降低代碼的速度，以確保不會破壞數據庫實例。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"這意味着Python具有並行化和加速許多操作的機制，但是關係型數據庫（受CPU內核數量的限制）有其侷限性，僅通過使用更快的編程語言是不可能解決這個問題的。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"調用外部API"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"另一個語言本身不是瓶頸的例子是使用外部REST API（你可能希望從中提取數據以滿足數據分析的需求）。雖然我們可以利用並行來加快數據提取的速度，但這可能是徒勞的，因爲許多外部API限制了我們在特定時間段內可以發出的請求數。因此，你可能經常會發現自己需要故意降低腳本的運行速度，以確保不超過API的請求限制："}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"plain"},"content":[{"type":"text","text":"time.sleep(10)\n"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"處理大數據"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"根據我處理大量數據集的經驗，無論使用哪種語言，都無法將真正的“大數據”加載到筆記本電腦的內存中。對於此類用例，你可能需要利用分佈式處理框架，如Dask、Spark、Ray等。使用單個服務器實例或筆記本電腦時，可以處理的數據量是有限的。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"如果你想把實際的數據處理工作轉移到一組計算節點上，甚至可能想利用GPU實例來進一步加快計算速度，那麼Python恰好擁有一個龐大的框架生態系統，可以簡化這項任務："}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"你想利用GPU來加快數據科學的計算速度嗎？使用Pytorch、Tensorflow、Ray或Rapids（甚至是使用SQL ——BlazingSQL）"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"你想加快處理大數據的Python代碼的速度嗎？使用Spark（或Databricks）、Dask或Prefect（在底層抽象化了Dask）"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"你想加快數據分析的處理速度嗎？使用fast專用於內存的列式數據庫，僅通過使用SQL查詢即可確保高速處理。"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"如果你需要對計算節點集羣上進行的數據處理進行編排和監控的話，有幾個Python編寫的工作流管理平臺可以使用，它們可以加快數據管道的開發和維護，比如Apache Airflow、Prefect或Dagster。如果你想了解更多相關知識，請查看我之前的文章。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"順便說一句，我可以想象一些抱怨Python的人並沒有充分利用它，或者可能沒有使用恰當的數據結構來解決手頭的問題。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"總而言之，如果你需要快速處理大量的數據，可能需要更多的計算資源，而不是更快的編程語言，而且有些Python庫可以方便地將工作分發到數百個節點上。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"結論"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在本文中，我們討論了Python是否是當前數據處理領域的真正瓶頸。雖然Python比許多編譯語言都慢，但它易於使用，而且功能多樣。我們注意到，對於許多人來說，語言的實用性要勝過速度。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"最後，我們討論了，至少在數據工程中，語言本身可能不是瓶頸，而是外部系統的限制，以及無論選擇哪種編程語言，都無法在單個機器上處理的大量數據。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"參考："}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"[1] TechEmpower："},{"type":"link","attrs":{"href":"https:\/\/www.techempower.com\/benchmarks\/#section=data-r19&hw=ph&test=query&l=ziix33-1r&f=0-0-0-0-0-1ekg-0-0-4fti4g-0-0","title":"","type":null},"content":[{"type":"text","text":"Web框架基準測試"}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"[2]"},{"type":"link","attrs":{"href":"https:\/\/arxiv.org\/abs\/1907.13030","title":"","type":null},"content":[{"type":"text","text":"“數據密集型神經成像管道的Dask和Apache Spark的性能比較”"}]},{"type":"text","text":"——"},{"type":"link","attrs":{"href":"https:\/\/arxiv.org\/search\/cs?searchtype=author&query=Dugr%C3%A9%2C+M","title":"","type":null},"content":[{"type":"text","text":"Mathieu Dugré"}]},{"type":"text","text":","},{"type":"link","attrs":{"href":"https:\/\/arxiv.org\/search\/cs?searchtype=author&query=Hayot-Sasson%2C+V","title":"","type":null},"content":[{"type":"text","text":"Valérie Hayot-Sasson"}]},{"type":"text","text":","},{"type":"link","attrs":{"href":"https:\/\/arxiv.org\/search\/cs?searchtype=author&query=Glatard%2C+T","title":"","type":null},"content":[{"type":"text","text":"Tristan Glatard"}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"原文鏈接："}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"link","attrs":{"href":"https:\/\/towardsdatascience.com\/is-python-really-a-bottleneck-786d063e2921?gi=3b1490fa23d3","title":"","type":null},"content":[{"type":"text","text":"https:\/\/towardsdatascience.com\/is-python-really-a-bottleneck-786d063e2921?gi=3b1490fa23d3"}]}]}]}

發表評論

所有評論

還沒有人評論，想成為第一個評論的人麼? 請在上方評論欄輸入並且點擊發布.

相關文章

從前端到全棧 -- 最全面向對象總結

{"type":"doc","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragr

程序员海军

2021-12-21 10:54:01

跨語言的多模態、多任務檢索模型MURAL解讀

{"type":"doc","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"typ

2021-12-21 10:54:01

鴻蒙智聯設備開發，這五大法寶你應該擁有｜HDC2021技術分論壇

{"type":"doc","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"typ

HarmonyOS开发者社区

2021-12-20 11:23:56

【HZERO微服務平臺3】源碼分析之oauth服務token生成、校驗、獲取信息、傳遞

{"type":"doc","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"headin

2021-12-20 11:08:55

程序員如何建立第二大腦

{"type":"doc","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"typ

2021-12-20 10:43:54

「如何從零到一實現一個玩具瀏覽器🌏」

{"type":"doc","content":[{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null

2021-12-18 13:28:55

實用機器學習筆記一：概述

{"type":"doc","content":[{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","text":"前言：","attr

2021-12-17 17:58:58

聊聊 Kafka：Producer 源碼解析

{"type":"doc","content":[{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"一、前言","att

老周聊架构

2021-12-17 17:58:58

閃馬智能：Serverless 如何賦能大前端？| GMTC 2021

{"type":"doc","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"typ

闪马智能吴佳浩

2021-12-17 17:58:58

進程崩潰/應用卡死，故障頻頻怎麼辦？｜HDC2021技術分論壇

{"type":"doc","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"typ

HarmonyOS开发者社区

2021-12-17 17:58:58

盤點分佈式軟總線數據傳輸技術中的黑科技｜HDC2021技術分論壇

{"type":"doc","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"typ

HarmonyOS开发者社区

2021-12-16 17:03:57

【教程直播第4期】揭祕數據遷移之 OceanBase CDC & OMS 社區版能力

{"type":"doc","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"typ

OceanBase 数据库

2021-12-16 15:03:59

Python的while循環

1.while循環的格式 while 條件: 條件滿足時，做的事情1 條件滿足時，做的事情2 條件滿足時，做的事情3 ...(省略)... demo

2023-10-10 11:37:31

python初識第二天

認識現實世界與虛擬世界的橋樑感受python帶來的魔力數據類型 Python裏，最常用的數據類型有三種——字符串(str)、整數(int)和浮點數(float) 字符串，字符串英文string，簡寫str 字符串的識別方式非常簡單—

2023-02-01 22:01:30

Python 的十大特性

{"type":"doc","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"typ

Rupam Choudhary

2021-12-16 16:04:03

24小時熱門文章

最新文章

Python太慢了嗎？

最新評論文章