Advanced Pandas Tutorial: Merging DataFrames

{"type":"doc","content":[{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","text":"Introduction","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Pandas provides powerful tools for combining Series and DataFrame objects, which makes data analysis much more convenient. This article explains in detail how to combine Series and DataFrames with Pandas.","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","text":"Using concat","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"concat is the most commonly used way to combine DataFrames. Its signature is:","attrs":{}}]},{"type":"codeblock","attrs":{"lang":null},"content":[{"type":"text","text":"pd.concat(objs, axis=0, join='outer', ignore_index=False, keys=None,\n levels=None, names=None, verify_integrity=False, copy=True)\n","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The parameters used most often are:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"objs: a sequence or mapping of Series or DataFrame objects.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"axis: the axis to concatenate along (0 for rows, 1 for columns).","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","content":[{"type":"text","text":"join","attrs":{}}],"attrs":{}},{"type":"text","text":" : {'inner', 'outer'}, how to handle the indexes on the other axis: outer takes their union, inner takes their intersection.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"ignore_index: if True, discard the original index values and label the result 0, 1, …, 
n-1 instead.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"copy: whether to copy the data.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"keys: labels used to build the outermost level of a hierarchical index.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Let's define a few DataFrames and see how concat joins them (the examples assume import pandas as pd and import numpy as np):","attrs":{}}]},{"type":"codeblock","attrs":{"lang":null},"content":[{"type":"text","text":"In [1]: df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],\n ...: 'B': ['B0', 'B1', 'B2', 'B3'],\n ...: 'C': ['C0', 'C1', 'C2', 'C3'],\n ...: 'D': ['D0', 'D1', 'D2', 'D3']},\n ...: index=[0, 1, 2, 3])\n ...: \n\nIn [2]: df2 = pd.DataFrame({'A': ['A4', 'A5', 'A6', 'A7'],\n ...: 'B': ['B4', 'B5', 'B6', 'B7'],\n ...: 'C': ['C4', 'C5', 'C6', 'C7'],\n ...: 'D': ['D4', 'D5', 'D6', 'D7']},\n ...: index=[4, 5, 6, 7])\n ...: \n\nIn [3]: df3 = pd.DataFrame({'A': ['A8', 'A9', 'A10', 'A11'],\n ...: 'B': ['B8', 'B9', 'B10', 'B11'],\n ...: 'C': ['C8', 'C9', 'C10', 'C11'],\n ...: 'D': ['D8', 'D9', 'D10', 'D11']},\n ...: index=[8, 9, 10, 11])\n ...: \n\nIn [4]: frames = [df1, df2, df3]\n\nIn [5]: result = 
pd.concat(frames)\n","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"df1, df2 and df3 share the same column names but have different indexes. Collecting them in the list frames and passing that list to concat stacks them into a single DataFrame.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/d3/d36c05f261823a754d20e42ef0b3a243.png","alt":"","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"An example with a hierarchical index:","attrs":{}}]},{"type":"codeblock","attrs":{"lang":null},"content":[{"type":"text","text":"In [6]: result = pd.concat(frames, keys=['x', 'y', 'z'])\n","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/5a/5aac5b9aae6cf0e9bfb6481f5216bb0d.png","alt":"","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"keys assigns one label to each frame in frames; the labels become the outer level of the result's index.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"You can then retrieve a particular frame by selecting its outer key:","attrs":{}}]},{"type":"codeblock","attrs":{"lang":null},"content":[{"type":"text","text":"In [7]: result.loc['y']\nOut[7]: \n A B C D\n4 A4 B4 C4 D4\n5 A5 B5 C5 D5\n6 A6 B6 C6 D6\n7 A7 B7 C7 
D7\n","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The examples above use the default axis=0, which concatenates along the rows. To concatenate along the columns instead, pass axis=1:","attrs":{}}]},{"type":"codeblock","attrs":{"lang":null},"content":[{"type":"text","text":"In [8]: df4 = pd.DataFrame({'B': ['B2', 'B3', 'B6', 'B7'],\n ...: 'D': ['D2', 'D3', 'D6', 'D7'],\n ...: 'F': ['F2', 'F3', 'F6', 'F7']},\n ...: index=[2, 3, 6, 7])\n ...: \n\nIn [9]: result = pd.concat([df1, df4], axis=1, sort=False)\n","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/6a/6a497f1f0fcfa5634b0a0ef9172698ed.png","alt":"","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"With the default ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"join='outer'","attrs":{}}],"attrs":{}},{"type":"text","text":", positions whose index label is missing from one of the frames are filled with NaN.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Now the same concatenation with join='inner':","attrs":{}}]},{"type":"codeblock","attrs":{"lang":null},"content":[{"type":"text","text":"In [10]: result = pd.concat([df1, df4], axis=1, 
join='inner')\n","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/8b/8beede189acfb0eae039afa932406654.png","alt":"","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"join='inner' keeps only the rows whose index label appears in both frames.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"If you only want to keep the rows that belong to the index of the original frame after concatenating, use reindex:","attrs":{}}]},{"type":"codeblock","attrs":{"lang":null},"content":[{"type":"text","text":"In [11]: result = pd.concat([df1, df4], axis=1).reindex(df1.index)\n","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Or, equivalently, reindex before concatenating:","attrs":{}}]},{"type":"codeblock","attrs":{"lang":null},"content":[{"type":"text","text":"In [12]: pd.concat([df1, df4.reindex(df1.index)], axis=1)\nOut[12]: \n A B C D B D F\n0 A0 B0 C0 D0 NaN NaN NaN\n1 A1 B1 C1 D1 NaN NaN NaN\n2 A2 B2 C2 D2 B2 D2 F2\n3 A3 B3 C3 D3 B3 D3 
F3\n","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The result:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/a8/a84706c720355c3d3fff00b4b4ec3717.png","alt":"","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"concat can also combine a DataFrame with a Series; the Series becomes a new column named after it:","attrs":{}}]},{"type":"codeblock","attrs":{"lang":null},"content":[{"type":"text","text":"In [18]: s1 = pd.Series(['X0', 'X1', 'X2', 'X3'], name='X')\n\nIn [19]: result = pd.concat([df1, s1], axis=1)\n","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/43/4317cda167e32a35a812217952e9b9cf.png","alt":"","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"When concatenating several Series, keys can be used to set the resulting column names:","attrs":{}}]},{"type":"codeblock","attrs":{"lang":null},"content":[{"type":"text","text":"In [23]: s3 = pd.Series([0, 1, 2, 3], name='foo')\n\nIn [24]: s4 = pd.Series([0, 1, 2, 3])\n\nIn [25]: s5 = pd.Series([0, 1, 4, 5])\n","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":null},"content":[{"type":"text","text":"In [27]: pd.concat([s3, s4, s5], axis=1, keys=['red', 'blue', 'yellow'])\nOut[27]: \n red blue yellow\n0 0 0 0\n1 1 1 1\n2 2 2 
4\n3 3 3 5\n","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","text":"Using append","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"append can be seen as a simplified version of concat that concatenates along ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"axis=0","attrs":{}}],"attrs":{}},{"type":"text","text":" (DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so on current versions write pd.concat([df1, df2]) instead):","attrs":{}}]},{"type":"codeblock","attrs":{"lang":null},"content":[{"type":"text","text":"In [13]: result = df1.append(df2)\n","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/0b/0b4ef483ce6d64603bbd95b13035ebe0.png","alt":"","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"If the two DataFrames being appended have different columns, the missing cells are filled with NaN:","attrs":{}}]},{"type":"codeblock","attrs":{"lang":null},"content":[{"type":"text","text":"In [14]: result = df1.append(df4, sort=False)\n","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/2d/2d6c248fe8dac30078a85ec9519d4f55.png","alt":"","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Setting ignore_index=True discards the original indexes and assigns a new one:","attrs":{}}]},{"type":"codeblock","attrs":{"lang":null},"content":[{"type":"text","text":"In [17]: result = df1.append(df4, ignore_index=True, 
sort=False)\n","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/00/00eb9a975303f2233500579e1a5f2e00.png","alt":"","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Appending a Series to a DataFrame as a new row:","attrs":{}}]},{"type":"codeblock","attrs":{"lang":null},"content":[{"type":"text","text":"In [35]: s2 = pd.Series(['X0', 'X1', 'X2', 'X3'], index=['A', 'B', 'C', 'D'])\n\nIn [36]: result = df1.append(s2, ignore_index=True)\n","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/79/79cf18a1bd2554ff6c8c553ab122e61b.png","alt":"","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","text":"Using merge","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The closest analogue of a DataFrame is a database table, and merge combines DataFrames with database-style join operations.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The signature of merge is:","attrs":{}}]},{"type":"codeblock","attrs":{"lang":null},"content":[{"type":"text","text":"pd.merge(left, right, how='inner', on=None, left_on=None, right_on=None,\n left_index=False, right_index=False, sort=False,\n suffixes=('_x', '_y'), copy=True, indicator=False,\n 
validate=None)\n","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"left, right: the two DataFrame or Series objects to merge.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"on: the column or index level names to join on; they must be present in both objects.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"left_on: the columns or index levels of the left object to use as join keys.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","content":[{"type":"text","text":"right_on","attrs":{}}],"attrs":{}},{"type":"text","text":": the columns or index levels of the right object to use as join keys.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","content":[{"type":"text","text":"left_index","attrs":{}}],"attrs":{}},{"type":"text","text":": if True, use the index of the left object as its join key.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","content":[{"type":"text","text":"right_index","attrs":{}}],"attrs":{}},{"type":"text","text":": if True, use the index of the right object as its join key.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"how: the type of join, one of ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"'left'","attrs":{}}],"attrs":{}},{"type":"text","text":", ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"'right'","attrs":{}}],"attrs":{}},{"type":"text","text":", ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"'outer'","attrs":{}}],"attrs":{}},{"type":"text","text":", ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"'inner'","attrs":{}}],"attrs":{}},{"type":"text","text":". 
The default is ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"inner","attrs":{}}],"attrs":{}},{"type":"text","text":".","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","content":[{"type":"text","text":"sort","attrs":{}}],"attrs":{}},{"type":"text","text":": whether to sort the result by the join keys.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","content":[{"type":"text","text":"suffixes","attrs":{}}],"attrs":{}},{"type":"text","text":": the suffixes appended to overlapping column names that are not join keys.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","content":[{"type":"text","text":"copy","attrs":{}}],"attrs":{}},{"type":"text","text":": whether to copy the data.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"First, a simple merge example:","attrs":{}}]},{"type":"codeblock","attrs":{"lang":null},"content":[{"type":"text","text":"In [39]: left = pd.DataFrame({'key': ['K0', 'K1', 'K2', 'K3'],\n ....: 'A': ['A0', 'A1', 'A2', 'A3'],\n ....: 'B': ['B0', 'B1', 'B2', 'B3']})\n ....: \n\nIn [40]: right = pd.DataFrame({'key': ['K0', 'K1', 'K2', 'K3'],\n ....: 'C': ['C0', 'C1', 'C2', 'C3'],\n ....: 'D': ['D0', 'D1', 'D2', 'D3']})\n ....: \n\nIn [41]: result = pd.merge(left, right, 
on='key')\n","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/b6/b60d74f8499ac4daeaccf0646f979bd3.png","alt":"","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The two DataFrames above are joined on the key column.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Next, an example that joins on multiple keys:","attrs":{}}]},{"type":"codeblock","attrs":{"lang":null},"content":[{"type":"text","text":"In [42]: left = pd.DataFrame({'key1': ['K0', 'K0', 'K1', 'K2'],\n ....: 'key2': ['K0', 'K1', 'K0', 'K1'],\n ....: 'A': ['A0', 'A1', 'A2', 'A3'],\n ....: 'B': ['B0', 'B1', 'B2', 'B3']})\n ....: \n\nIn [43]: right = pd.DataFrame({'key1': ['K0', 'K1', 'K1', 'K2'],\n ....: 'key2': ['K0', 'K0', 'K0', 'K0'],\n ....: 'C': ['C0', 'C1', 'C2', 'C3'],\n ....: 'D': ['D0', 'D1', 'D2', 'D3']})\n ....: \n\nIn [44]: result = pd.merge(left, right, on=['key1', 'key2'])\n","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/5b/5b0cb42e4713c4d4903a6877dc3db6ac.png","alt":"","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The how parameter selects the merge method. As in a database, you can ask for an inner join, an outer join and so on:","attrs":{}}]},{"type":"embedcomp","attrs":{"type":"table","data":{"content":"
Merge method | SQL equivalent
left | LEFT OUTER JOIN
right | RIGHT OUTER JOIN
outer | FULL OUTER JOIN
inner | INNER JOIN
"}}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":null},"content":[{"type":"text","text":"In [45]: result = pd.merge(left, right, how='left', on=['key1', 'key2'])\n","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/64/6423b9ef159f062b378f7f8a44ab6d5f.png","alt":"","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Passing indicator=True adds a _merge column that records whether each row's key was found only in the left frame, only in the right frame, or in both:","attrs":{}}]},{"type":"codeblock","attrs":{"lang":null},"content":[{"type":"text","text":"In [60]: df1 = pd.DataFrame({'col1': [0, 1], 'col_left': ['a', 'b']})\n\nIn [61]: df2 = pd.DataFrame({'col1': [1, 2, 2], 'col_right': [2, 2, 2]})\n\nIn [62]: pd.merge(df1, df2, on='col1', how='outer', indicator=True)\nOut[62]: \n col1 col_left col_right _merge\n0 0 a NaN left_only\n1 1 b 2.0 both\n2 2 NaN 2.0 right_only\n3 2 NaN 2.0 right_only\n","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"If you pass a string to indicator, it is used as the name of the indicator column:","attrs":{}}]},{"type":"codeblock","attrs":{"lang":null},"content":[{"type":"text","text":"In [63]: pd.merge(df1, df2, on='col1', how='outer', indicator='indicator_column')\nOut[63]: \n col1 col_left col_right indicator_column\n0 0 a NaN left_only\n1 1 b 2.0 both\n2 2 NaN 2.0 right_only\n3 2 NaN 2.0 right_only\n","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Merging two DataFrames that both have a MultiIndex, by resetting the indexes, merging on the shared level, and setting the index again:","attrs":{}}]},{"type":"codeblock","attrs":{"lang":null},"content":[{"type":"text","text":"In [112]: leftindex = 
pd.MultiIndex.from_tuples([('K0', 'X0'), ('K0', 'X1'),\n .....: ('K1', 'X2')],\n .....: names=['key', 'X'])\n .....: \n\nIn [113]: left = pd.DataFrame({'A': ['A0', 'A1', 'A2'],\n .....: 'B': ['B0', 'B1', 'B2']},\n .....: index=leftindex)\n .....: \n\nIn [114]: rightindex = pd.MultiIndex.from_tuples([('K0', 'Y0'), ('K1', 'Y1'),\n .....: ('K2', 'Y2'), ('K2', 'Y3')],\n .....: names=['key', 'Y'])\n .....: \n\nIn [115]: right = pd.DataFrame({'C': ['C0', 'C1', 'C2', 'C3'],\n .....: 'D': ['D0', 'D1', 'D2', 'D3']},\n .....: index=rightindex)\n .....: \n\nIn [116]: result = pd.merge(left.reset_index(), right.reset_index(),\n .....: on=['key'], how='inner').set_index(['key', 'X', 'Y'])\n","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/0b/0bb8baaee11e4c170d5c39a376f85f55.png","alt":"","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The names passed to on can mix index levels and columns. Here key1 is the name of the index and key2 is an ordinary column:","attrs":{}}]},{"type":"codeblock","attrs":{"lang":null},"content":[{"type":"text","text":"In [117]: left_index = pd.Index(['K0', 'K0', 'K1', 'K2'], name='key1')\n\nIn [118]: left = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],\n .....: 'B': ['B0', 'B1', 'B2', 'B3'],\n .....: 'key2': ['K0', 'K1', 'K0', 'K1']},\n .....: index=left_index)\n .....: \n\nIn [119]: right_index = pd.Index(['K0', 'K1', 'K2', 'K2'], name='key1')\n\nIn [120]: right = pd.DataFrame({'C': ['C0', 'C1', 'C2', 'C3'],\n .....: 'D': ['D0', 'D1', 'D2', 'D3'],\n .....: 'key2': ['K0', 'K0', 'K0', 'K1']},\n .....: index=right_index)\n .....: \n\nIn [121]: result = left.merge(right, on=['key1', 
'key2'])\n","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/b8/b8318a8a89c73227d75fd16880c60c73.png","alt":"","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","text":"Using join","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"join combines the columns of two DataFrames that may have different indexes into a single DataFrame. It can be seen as a shorthand for merging on the index.","attrs":{}}]},{"type":"codeblock","attrs":{"lang":null},"content":[{"type":"text","text":"In [84]: left = pd.DataFrame({'A': ['A0', 'A1', 'A2'],\n ....: 'B': ['B0', 'B1', 'B2']},\n ....: index=['K0', 'K1', 'K2'])\n ....: \n\nIn [85]: right = pd.DataFrame({'C': ['C0', 'C2', 'C3'],\n ....: 'D': ['D0', 'D2', 'D3']},\n ....: index=['K0', 'K2', 'K3'])\n ....: \n\nIn [86]: result = left.join(right)\n","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/9c/9c21f3649c7d6ac86b83585fffe82308.png","alt":"","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The how parameter selects the type of join:","attrs":{}}]},{"type":"codeblock","attrs":{"lang":null},"content":[{"type":"text","text":"In [87]: result = left.join(right, 
how='outer')\n","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/7f/7f8885fade2ddc74c5d9b21b49660393.png","alt":"","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"By default, join matches rows on the index and performs a left join.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"You can also join a column of the left frame against the index of the right frame:","attrs":{}}]},{"type":"codeblock","attrs":{"lang":null},"content":[{"type":"text","text":"In [91]: left = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],\n ....: 'B': ['B0', 'B1', 'B2', 'B3'],\n ....: 'key': ['K0', 'K1', 'K0', 'K1']})\n ....: \n\nIn [92]: right = pd.DataFrame({'C': ['C0', 'C1'],\n ....: 'D': ['D0', 'D1']},\n ....: index=['K0', 'K1'])\n ....: \n\nIn [93]: result = left.join(right, on='key')\n","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/f7/f72eb99201364df645db696f712418a5.png","alt":"","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Joining a DataFrame with a single-level index to one with a MultiIndex:","attrs":{}}]},{"type":"codeblock","attrs":{"lang":null},"content":[{"type":"text","text":"In [100]: left = pd.DataFrame({'A': ['A0', 'A1', 'A2'],\n .....: 'B': ['B0', 'B1', 'B2']},\n .....: index=pd.Index(['K0', 'K1', 'K2'], name='key'))\n .....: \n\nIn [101]: index = 
pd.MultiIndex.from_tuples([('K0', 'Y0'), ('K1', 'Y1'),\n .....: ('K2', 'Y2'), ('K2', 'Y3')],\n .....: names=['key', 'Y'])\n .....: \n\nIn [102]: right = pd.DataFrame({'C': ['C0', 'C1', 'C2', 'C3'],\n .....: 'D': ['D0', 'D1', 'D2', 'D3']},\n .....: index=index)\n .....: \n\nIn [103]: result = left.join(right, how='inner')\n","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/48/48b4f7f335ec97736e936abe2650e504.png","alt":"","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"When the two frames share a column name that is not a join key, merge appends a suffix to each copy (_x and _y by default):","attrs":{}}]},{"type":"codeblock","attrs":{"lang":null},"content":[{"type":"text","text":"In [122]: left = pd.DataFrame({'k': ['K0', 'K1', 'K2'], 'v': [1, 2, 3]})\n\nIn [123]: right = pd.DataFrame({'k': ['K0', 'K0', 'K3'], 'v': [4, 5, 6]})\n\nIn [124]: result = pd.merge(left, right, on='k')\n","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/03/03b9431b14294cfa2caeb06a93e9d2c2.png","alt":"","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The suffixes parameter lets you choose your own suffixes for the duplicated column names:","attrs":{}}]},{"type":"codeblock","attrs":{"lang":null},"content":[{"type":"text","text":"In [125]: result = pd.merge(left, right, on='k', suffixes=('_l', 
'_r'))\n","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/8e/8e7b33ef22bbfa3ab6a16ff544b5baa9.png","alt":"","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","text":"Patching and overwriting data","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Sometimes we need to fill the missing values of df1 with values from df2. combine_first does this and returns a new DataFrame:","attrs":{}}]},{"type":"codeblock","attrs":{"lang":null},"content":[{"type":"text","text":"In [131]: df1 = pd.DataFrame([[np.nan, 3., 5.], [-4.6, np.nan, np.nan],\n .....: [np.nan, 7., np.nan]])\n .....: \n\nIn [132]: df2 = pd.DataFrame([[-42.6, np.nan, -8.2], [-5., 1.6, 4]],\n .....: index=[1, 2])\n .....: \n","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":null},"content":[{"type":"text","text":"In [133]: result = df1.combine_first(df2)\n","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/34/3472b7cdb7b79b45503eece8c035c6c4.png","alt":"","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Alternatively, update modifies df1 in place, overwriting its values with the non-NaN values of df2:","attrs":{}}]},{"type":"codeblock","attrs":{"lang":null},"content":[{"type":"text","text":"In [134]: 
df1.update(df2)\n","attrs":{}}]},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"italic","attrs":{}}],"text":"This article is also published at ","attrs":{}},{"type":"link","attrs":{"href":"http://www.flydean.com/04-python-pandas-merge/","title":null,"type":null},"content":[{"type":"text","text":"http://www.flydean.com/04-python-pandas-merge/","attrs":{}}],"marks":[{"type":"italic"}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"italic","attrs":{}}],"text":"The plainest explanations, the most substantial content, the most concise tutorials, and plenty of tricks you may not know yet are waiting for you to discover!","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"italic","attrs":{}}],"text":"You are welcome to follow my WeChat official account 「程序那些事」: it understands technology, and understands you even better!","attrs":{}}]}],"attrs":{}}]}
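To make the difference between combine_first and update concrete, here is a minimal sketch that reuses the df1 and df2 values from the last section. The names combined and updated are introduced here purely for illustration, and update is applied to a copy only so that both results can be compared side by side.

```python
import numpy as np
import pandas as pd

# Same values as df1 and df2 in the article's last section.
df1 = pd.DataFrame([[np.nan, 3.0, 5.0],
                    [-4.6, np.nan, np.nan],
                    [np.nan, 7.0, np.nan]])
df2 = pd.DataFrame([[-42.6, np.nan, -8.2],
                    [-5.0, 1.6, 4.0]], index=[1, 2])

# combine_first returns a new frame: values already present in df1 are
# kept, and df2 is consulted only where df1 holds NaN.
combined = df1.combine_first(df2)

# update works in place and returns None, so it is applied to a copy.
# Every non-NaN cell of df2 overwrites the aligned cell of the copy.
updated = df1.copy()
updated.update(df2)

print(combined)
print(updated)
```

In the cell at row 1, column 0, combined keeps df1's -4.6 while updated takes df2's -42.6. That is the practical distinction: combine_first only patches holes, whereas update lets the other frame win wherever it has data.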