15 Common Coding Mistakes Data Scientists Make

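{"type":"doc","content":[{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In this article, the author describes several mistakes data scientists commonly make when writing code, and offers his view of each problem along with a corresponding fix. We hope the ideas here prove useful to readers."}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Writing Python code for a data science project and getting it to run the way you expect may not be hard. Making that code highly readable for other people (including your future self), reproducible, and efficient at runtime is another matter. We can address this by cutting down on common bad practices in development."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Over my career in data science, I have gradually come to realize that applying software engineering best practices lets you deliver higher-quality projects. A high-quality project means very few bugs, accurate results that can be reproduced, and code that executes efficiently. This article will not walk you through those best practices in exhaustive detail. Instead, I have summarized the problems I see most often in development (which are also mistakes I used to make myself) and, for each one, given a targeted fix and related learning material."}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"1. 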
Not setting up an isolated development environment"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In one sense this is not a coding problem, but I still maintain that an isolated environment is what keeps code running reliably. I believe every project should get its own dedicated environment, because that is what makes the code reproducible. The project may later run on your machine, on a colleague's machine, or even be deployed to production."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"If you are not sure what dependency management is, start by reading up on "},{"type":"link","attrs":{"href":"https:\/\/docs.conda.io\/projects\/conda\/en\/latest\/user-guide\/tasks\/manage-environments.html","title":null,"type":null},"content":[{"type":"text","text":"Anaconda Virtual Environment"}]},{"type":"text","text":" and "},{"type":"link","attrs":{"href":"https:\/\/realpython.com\/pipenv-guide\/","title":null,"type":null},"content":[{"type":"text","text":"Pipenv"}]},{"type":"text","text":". I use Anaconda most often myself; "},{"type":"link","attrs":{"href":"https:\/\/towardsdatascience.com\/a-guide-to-conda-environments-bc6180fc533","title":null,"type":null},"content":[{"type":"text","text":"this guide"}]},{"type":"text","text":" is a good introductory tutorial. If you want to go further or work in a more production-oriented way, consider Docker."}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"2. 
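Overusing Jupyter Notebooks"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Notebooks are great for teaching and for early-stage exploration, and they let you knock out small, fiddly tasks quickly. Even so, a notebook is not a good IDE. A craftsman needs sharp tools: a good IDE is a data scientist's real weapon, and the right tool can greatly improve your productivity. Plenty of experts have pointed out the shortcomings of notebooks. Joel Grus once gave a "},{"type":"link","attrs":{"href":"https:\/\/www.youtube.com\/watch?v=7jiPeIFXb6U","title":null,"type":null},"content":[{"type":"text","text":"talk"}]},{"type":"text","text":" on the subject that is very funny, and I recommend it."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Notebooks suit early experimentation, and they make it very easy to show your findings to others, which is a real strength. For long-running, collaborative, deployable projects, however, they are very error-prone. At that point you are better off with a proper IDE such as VS Code, PyCharm, or Spyder. I still use notebooks now and then when a project will take no more than a day, and that is about the only use case I can think of for them."}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"3. 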
Disorganized project structure"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"I have seen many people keep all of a project's code and related files in a single directory, which is a very unprofessional practice."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Look at the figure below and imagine you are taking over a project: which structure would you rather inherit? The structure on the right will drive you, and anyone else who picks up the project, up the wall, because you will spend several times longer just working out how the code fits together. The structure on the left is clearly far more sensible. So how should a project be laid out? One tool I recommend is "},{"type":"link","attrs":{"href":"https:\/\/drivendata.github.io\/cookiecutter-data-science\/","title":null,"type":null},"content":[{"type":"text","text":"Cookiecutter"}]},{"type":"text","text":", an excellent open-source project that promotes a standardized structure for data science projects and is well worth learning from."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/bd\/0d\/bd3f998b32c762f7874718499e83a40d.png","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"4. 
Using absolute paths instead of relative paths"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Have you ever seen a comment like “please fix your file paths” in someone's open-source project? A comment like that usually points to poor code design. Fixing the problem generally takes two steps:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Share your project structure with others (see item 3 above)."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Set your IDE's root\/working directory to the project root, which is usually the outermost directory of the project."}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The second step is not always trivial, but it is worth the effort, because then other people can run your code without modifying it. Here is an example for reference."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"python"},"content":[{"type":"text","text":"import pandas as pd\nimport numpy as np\nimport os\n#### BAD WAY ####\n# please change it to your file path\n# (raw strings keep the Windows backslashes literal)\nexcel_path1 = r\"C:\\Users\\gerold\\Desktop\\CEU\\trim1\\DataEng1\\Team_asgn\\CrimeOneYearofData_2006.xlsx\"\nexcel_path2 = r\"C:\\Users\\gerold\\Desktop\\CEU\\trim1\\DataEng1\\Team_asgn\\CrimeOneYearofData_2007.xlsx\"\n# read in excel\nmydf = pd.read_excel(excel_path1)\nmyd2 = pd.read_excel(excel_path2)\n#### END BAD WAY ####\n#### GOOD WAY ####\n# first put your 2 excels into the data folder\n# set the working directory in your IDE to the root (Team_asgn)\nDATA_DIR = \"data\"  # name magic constants (ideally at the top of the script)\n# fix gruesome var names\ncrime06_filename = \"CrimeOneYearofData_2006.xlsx\"\ncrime07_filename = \"CrimeOneYearofData_2007.xlsx\"\ncrime06_df = pd.read_excel(os.path.join(DATA_DIR, crime06_filename))\ncrime07_df = pd.read_excel(os.path.join(DATA_DIR, crime07_filename))\n#### END GOOD WAY ####"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"5. Using “magic numbers”"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"A magic number is a number that appears in code without any context. Heavy use of magic numbers can lead to problems that are hard to trace."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In the BAD WAY part of the example below, the multiplication uses a bare number that has not been assigned to a variable, and nothing explains what it means. If you have to change it later, you are in an awkward position, because you no longer know what the number stands for. The Python convention for constants like this is to name them in uppercase. You are free not to use uppercase, but distinguishing constants from regular variables is a good habit."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"python"},"content":[{"type":"text","text":"# assign revenues in $ to marketing campaigns\ncamp1_revenue = 50000\ncamp2_revenue = 100000\n#### BAD WAY ####\n# calc which performed better\ncamps_revenue_diff = (camp2_revenue * 0.65) - camp1_revenue\n#### END BAD WAY ####\n#### GOOD WAY ####\nCAMP2_NORMALIZER = 0.65  # we need to normalize because the campaign ran in peak season\n# calc which performed better\ncamps_revenue_diff = (camp2_revenue * CAMP2_NORMALIZER) - camp1_revenue\n#### END GOOD WAY ####"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"6. 
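Ignoring warnings"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Many of us probably share this habit: ignoring the warnings our code emits while it runs. We are happy enough that the code runs and produces the expected output, so why bother with warnings? It is true that a warning is not an error, but it can signal a latent problem or bug. The code may finish successfully, yet the fact that it raised a warning means it is not behaving quite as intended."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In data analysis, the warnings I run into most often are pandas' SettingWithCopyWarning and DeprecationWarning. A tutorial video from "},{"type":"link","attrs":{"href":"https:\/\/www.youtube.com\/watch?v=4R4WsDJ-KVc","title":null,"type":null},"content":[{"type":"text","text":"DataSchool"}]},{"type":"text","text":" explains concisely what triggers SettingWithCopyWarning. A DeprecationWarning means that pandas has deprecated a method, so your code is at risk of breaking when you move to a later version. There are other kinds of warnings as well. In my experience, most of them arise because a utility is being called in a way it was not designed for. Reading the source code of the function you are calling therefore always helps, and doing so lets you avoid most unexpected warnings."}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"7. 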
Not using type annotations"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"This is a practice I picked up only recently, and I have already felt its benefits. Put simply, type annotations (or type hints) specify the data type of a variable. Your IDE's built-in code hints can generally help you annotate variables. Annotated code is easier to read, both for you and for others."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"To illustrate, here is a snippet taken from Daniel Starner's post on "},{"type":"link","attrs":{"href":"https:\/\/dev.to\/dstarner\/using-pythons-type-annotations-4cfe","title":null,"type":null},"content":[{"type":"text","text":"dev.to"}]},{"type":"text","text":". As the code shows, without type hints mystery_combine() accepts either integers or strings as input and returns an integer or a string accordingly. To a developer, what the function is meant to do is ambiguous. With type annotations, the function's intent is stated clearly, which avoids misunderstanding and makes life easier for other developers and for your future self."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"python"},"content":[{"type":"text","text":"# code taken from https:\/\/dev.to\/dstarner\/using-pythons-type-annotations-4cfe\n# Our original function\ndef mystery_combine(a, b, times):\n    return (a + b) * times\nprint(mystery_combine(2, 3, 4))\n# 20\nprint(mystery_combine('Hello ', 'World! ', 4))\n# Hello World! Hello World! Hello World! 
Hello World!\n# show your intent explicitly by indicating the types of your arguments and return value\ndef mystery_combine(a: str, b: str, times: int) -> str:\n    return (a + b) * times"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In addition, type annotations let you check code for errors statically, without running it. The figure below shows a call whose arguments do not match the function's annotated types, and the error the static checker reports for it. Static checking is a very useful way to vet code before you run the project."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/f2\/2f\/f2b715b3ce4f432604bbb29524457b2f.png","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"8. Not using list comprehensions"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"List comprehensions are one of Python's most powerful features. They can make a for loop easier to read and more Pythonic, and they often run faster as well."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The example below reads the CSV files in a directory. You might say that the version without a list comprehension looks elegant enough, and that there is nothing wrong with it. But if the directory also contains files in other formats, such as JSON, the list comprehension becomes noticeably more convenient and readable, and the code is easier to maintain."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"python"},"content":[{"type":"text","text":"import pandas as pd\nimport os\nDATA_PATH = \"data\"\nfilename_list = os.listdir(DATA_PATH)\n#### BAD WAY ####\n# read in bunch of csv-s from a dir\ncsv_list = []\nfor filename in filename_list:\n    csv_list.append(pd.read_csv(os.path.join(DATA_PATH, filename)))\n#### END BAD WAY ####\n#### GOOD WAY 
####\ncsv_list = [pd.read_csv(os.path.join(DATA_PATH, filename)) for filename in filename_list]\n# what if not only .csv-s are present? easy to tackle with a list comprehension\ncsv_list = [\n    pd.read_csv(os.path.join(DATA_PATH, filename))\n    for filename in filename_list\n    if filename.endswith(\".csv\")\n]\n#### END GOOD WAY ####"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"9. Unreadable pandas code"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Method chaining is a great feature of pandas, but if you insist on putting everything on one line, readability suffers. There is a trick that lets you break the expression up: as the code below shows, wrap the whole expression in parentheses, and then each part of the chain can go on its own line. The result looks much cleaner."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"python"},"content":[{"type":"text","text":"# let's aggregate clicks and time spent to their mean over a quarter (Q)\n# df is assumed to be an existing DataFrame with id, yearmonth, clicks and time_spent columns\nvar_list = [\"clicks\", \"time_spent\"]\nvar_list_Q = [varname + \"_Q\" for varname in var_list]\n#### BAD WAY ####\ndf_Q = df.groupby(\"id\").rolling(window=3, min_periods=1, on=\"yearmonth\")[var_list].mean().reset_index().rename(columns=dict(zip(var_list, var_list_Q)))\n#### END BAD WAY ####\n#### GOOD WAY ####\ndf_Q = (\n    df\n    .groupby(\"id\")\n    .rolling(window=3, min_periods=1, on=\"yearmonth\")[var_list]\n    .mean()\n    .reset_index()\n    .rename(columns=dict(zip(var_list, var_list_Q)))\n)\n#### END GOOD WAY ####"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"10. 
Avoiding Python's date tools"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Working with dates in Python is admittedly not very friendly: the syntax is odd and hard to understand and remember. I often see people handle dates as if they were plain numbers, which is not elegant. It frequently gets the code to run, but it is very error-prone and hard to maintain."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Take the example below, which lists every month between two dates in %Y%m format. Implemented with datetime, the code becomes more readable and maintainable. To be honest, even now I still rely on Google whenever I deal with dates; that is perfectly normal, and you get used to it."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"python"},"content":[{"type":"text","text":"import datetime\nfrom dateutil.relativedelta import relativedelta\n# task: get months between two dates in YM format\n#### BAD WAY ####\nstart_num = 201910\nend_num = 202012\nres_list = []\niter_num = start_num\nwhile iter_num < end_num:\n    if abs(iter_num) % 100 > 12:\n        iter_num += 88\n        res_list.append(iter_num)\n        iter_num += 1\n    else:\n        res_list.append(iter_num)\n        iter_num += 1\nres_list.append(iter_num)\n#### END BAD WAY ####\n#### GOOD WAY ####\n# initialize datetimes\nstart_datetime = datetime.datetime(2019, 10, 1)\nend_datetime = datetime.datetime(2020, 12, 1)\n# find months between end and start date\nr = relativedelta(end_datetime, start_datetime)\nmonths_between = r.months + (12 * r.years)\n# offset 0 keeps the start month, so the result covers the same months as the loop above\nmyres = [\n    start_datetime + relativedelta(months=offset)\n    for offset in range(months_between + 1)\n]\n# format dates\nmyres = [element.strftime(\"%Y%m\") for element in myres]\n#### END GOOD WAY ####"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"11. 
Poor variable names"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Naming variables with non-descriptive identifiers such as i, j, k, or df makes code less readable, especially when the logic inside a loop is complex. Short, terse names tend to confuse the people working on a project, as most of us know from experience. Do not be afraid of longer variable names, and do not be stingy with underscores (_). For more, I recommend this high-quality post on "},{"type":"link","attrs":{"href":"https:\/\/towardsdatascience.com\/data-scientists-your-variable-names-are-awful-heres-how-to-fix-them-89053d2855be","title":null,"type":null},"content":[{"type":"text","text":"variable naming"}]},{"type":"text","text":", which you should find useful."}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"12. Not modularizing your code"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Modularizing means breaking long, complex code into simple modules that each perform a small, specific task. Do not write one long script for the whole project. Defining a large number of classes or functions at the top of the entry-point file is also discouraged, because it makes the code hard to read and maintain. Instead, create modules (packages) organized by what the code does. For details, see "},{"type":"link","attrs":{"href":"https:\/\/realpython.com\/courses\/python-modules-packages\/","title":null,"type":null},"content":[{"type":"text","text":"Python Modules and Packages"}]},{"type":"text","text":"."}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"13. 
Not following PEP conventions"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"When I first started writing Python for projects, my code was ugly and hard to read. I even tried to devise design rules of my own so that it would look less awful. Coming up with those rules took a lot of time, yet I never stuck to them consistently, and in hindsight, given how little Python experience I had, many of them were not very sound. Eventually I discovered the "},{"type":"link","attrs":{"href":"https:\/\/www.python.org\/dev\/peps\/","title":null,"type":null},"content":[{"type":"text","text":"PEP"}]},{"type":"text","text":" conventions, most notably PEP 8, Python's official style guide. I like the PEP conventions because they standardize my code, which makes collaboration easier. I do deviate from them in a few special cases, but the vast majority of the time I follow them."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Almost every IDE supports linter extensions. The figure below shows how a linter works, flagging problems in the code. If a message is still not clear enough, you can look up the specific PEP rule it refers to, shown in parentheses. For a list of available linters, see the resources on "},{"type":"link","attrs":{"href":"https:\/\/realpython.com\/python-code-quality\/#linters","title":null,"type":null},"content":[{"type":"text","text":"realpython.com"}]},{"type":"text","text":"."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/2d\/bc\/2d65f64fb7945c9d152f77fc6a7690bc.png","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"14. 
Never using a coding assistant"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"If you want to write code much faster, start using a coding assistant. These tools autocomplete code, add docstrings, and suggest improvements to your code. My favorite is Pylance, developed by Microsoft, which works in VS Code. Kite is another popular coding assistant that also works well and is supported by many editors."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"You can watch a short video of a coding assistant in action "},{"type":"link","attrs":{"href":"https:\/\/thumbs.gfycat.com\/BaggyNiceLemur-mobile.mp4","title":null,"type":null},"content":[{"type":"text","text":"here"}]},{"type":"text","text":"."}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"15. Lack of security awareness"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Pushing sensitive information (passwords, keys) to a public GitHub repository is a widespread security problem. To get a sense of how serious it is, see the article on "},{"type":"link","attrs":{"href":"https:\/\/qz.com\/674520\/companies-are-sharing-their-secret-access-codes-on-github-and-they-may-not-even-know-it\/","title":null,"type":null},"content":[{"type":"text","text":"qz"}]},{"type":"text","text":". There are bots crawling the internet just waiting for you to make this mistake. In my experience, security is almost never covered in data science courses, so you have to fill that gap yourself. I suggest starting with operating system environment variables. This article on "},{"type":"link","attrs":{"href":"https:\/\/dev.to\/biplov\/handling-passwords-and-secret-keys-using-environment-variables-2ei0","title":null,"type":null},"content":[{"type":"text","text":"dev.to"}]},{"type":"text","text":" is a good place to start, and I strongly recommend reading it."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"About the author"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Gerold 
Csendes is a data scientist and machine learning engineer, currently working at EPAM."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Original article"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"link","attrs":{"href":"https:\/\/www.kdnuggets.com\/2021\/03\/15-common-mistakes-python.html","title":null,"type":null},"content":[{"type":"text","text":"15 common mistakes data scientists make in Python (and how to fix them)"}]}]}]}