pandas and table processing

Query and write operations

pandas offers query capabilities comparable to SQL, with a simple syntax:

print(tips[['total_bill', 'tip', 'smoker', 'time']])
# Show the 'total_bill', 'tip', 'smoker' and 'time' columns, like SELECT in SQL

print(tips[tips['time'] == 'Dinner'])
# Show the rows whose time column equals 'Dinner', like WHERE in SQL

print(tips[(tips['size'] >= 5) | (tips['total_bill'] > 45)])
print(tips[(tips['time'] == 'Dinner') & (tips['tip'] > 5.00)])
# | works like OR in SQL, & works like AND
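To make the SQL analogy concrete, here is a minimal sketch that runs on its own. The small `tips` table below is a hand-made stand-in with invented values, not the real dataset, and the variable names `big_dinner_tips` and `large_or_pricey` are only illustrative.

```python
import pandas as pd

# Hypothetical stand-in for the tips table; the values are made up for illustration
tips = pd.DataFrame({
    'total_bill': [16.99, 48.27, 21.01, 50.81],
    'tip': [1.01, 6.73, 3.50, 10.00],
    'smoker': ['No', 'No', 'Yes', 'Yes'],
    'time': ['Dinner', 'Dinner', 'Lunch', 'Dinner'],
    'size': [2, 4, 6, 3],
})

# SELECT total_bill, tip FROM tips WHERE time = 'Dinner' AND tip > 5.00
big_dinner_tips = tips.loc[
    (tips['time'] == 'Dinner') & (tips['tip'] > 5.00),
    ['total_bill', 'tip'],
]

# SELECT * FROM tips WHERE size >= 5 OR total_bill > 45
large_or_pricey = tips[(tips['size'] >= 5) | (tips['total_bill'] > 45)]

print(big_dinner_tips)
print(large_or_pricey)
```

The parentheses around each comparison are required, because `&` and `|` bind more tightly than `==` and `>` in Python.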

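# Lookup by index position and by label
df.iloc[i:j, k:p]  # iloc works on integer positions: rows i to j-1 and columns k to p-1 (the end of each slice is excluded)
df.loc['20130102':'20130104', ['A', 'B']]  # loc works on labels: rows '20130102' through '20130104' (both ends included), columns 'A' and 'B'
df.at[dates[0], 'A']  # return the single value at a given row label and column label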
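The difference between position-based and label-based slicing is easy to get wrong, so the sketch below checks it on a small frame. The frame contents (`np.arange(24)`) and the names `by_position`, `by_label` and `single` are assumptions made for this example.

```python
import numpy as np
import pandas as pd

# Six consecutive days as row labels, four columns A-D (illustrative data)
dates = pd.date_range('20130101', periods=6)
df = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates, columns=list('ABCD'))

# iloc slices by position and excludes the end: rows 1-2, columns 0-1
by_position = df.iloc[1:3, 0:2]

# loc slices by label and includes both ends: three days, columns A and B
by_label = df.loc['20130102':'20130104', ['A', 'B']]

# at returns one scalar for a row label and a column label
single = df.at[dates[0], 'A']

print(by_position.shape, by_label.shape, single)
```

Note that `by_position` covers two rows while `by_label` covers three, even though both slices start at the second row.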

# Applying a function with map
df['Oid'] = df['Name'].map(lambda x: int(x.split(' - ')[0]))
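As an alternative to `map` with a lambda, the same split can be done with the vectorized string methods. This is a sketch, assuming every `Name` value has the form `'<number> - <number>'`; the sample rows and the second column name `Did` are chosen for illustration.

```python
import pandas as pd

# Illustrative data: each Name is '<Oid> - <Did>'
df = pd.DataFrame({'Name': ['12 - 7', '3 - 45']})

# expand=True returns one column per piece of the split
parts = df['Name'].str.split(' - ', expand=True)
df['Oid'] = parts[0].astype(int)
df['Did'] = parts[1].astype(int)

print(df)
```

This splits each string once and fills both columns, whereas two separate `map` calls split every string twice.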

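# Delete a column
del df['smoker']
# Add a column
df['smoker'] = np.nan
# Delete rows
df = df.drop(range(1, 100), axis=0)  # drops the 99 rows labelled 1 to 99
# Add rows (DataFrame.append was removed in pandas 2.0, so use pd.concat)
df = pd.concat([df, pd.DataFrame(
    index=range(100, 200), columns=df.columns)], ignore_index=True)  # adds 100 empty rows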
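The row operations can be tried on a tiny frame as follows. This is a sketch with invented values; it appends rows that carry data (the hypothetical `new_rows` frame) using `pd.concat`, since `DataFrame.append` is no longer available in pandas 2.0 and later.

```python
import pandas as pd

# Illustrative three-row table
df = pd.DataFrame({'total_bill': [10.0, 20.0, 30.0],
                   'smoker': ['No', 'Yes', 'No']})

# Remove the row whose index label is 1
df = df.drop(index=[1])

# Append two new rows; ignore_index=True renumbers the result 0..n-1
new_rows = pd.DataFrame({'total_bill': [40.0, 50.0],
                         'smoker': ['Yes', 'No']})
df = pd.concat([df, new_rows], ignore_index=True)

print(df)
```

`drop` works on index labels, not positions, so after rows have been removed or reordered the label 1 may no longer be the second row.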

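The following function uses pandas to expand a one-dimensional (long-format) relation table into a two-dimensional (wide-format) one: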

def one2two(filepath, col_value):
    '''
    The relation table has an Oid field and a Did field; each (Oid, Did) pair maps to
    one number in the col_value column. This function turns it into a two-dimensional
    table with Oid values as columns and Did values as rows.
    '''
    df = pd.read_csv(filepath)
    newdf = pd.DataFrame(columns=df['Oid'].unique(), index=df['Did'].unique())
    remaining = len(newdf.index)
    for i in newdf.index:  # i is a Did value
        for c in newdf.columns:  # c is an Oid value
            # Look up the value for this (Oid, Did) pair
            value = df[(df.Oid == c) & (df.Did == i)]
            if not value.empty:
                newdf.loc[i, c] = value[col_value].iloc[0]
        remaining = remaining - 1
        print('%d rows remaining.' % remaining)
    print('Ready to write.')
    newdf.to_csv(col_value + '.csv')
    print('Finished writing, %s.csv was generated.' % col_value)
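The nested loop above filters the whole table once per cell, which gets slow on large inputs. pandas has a built-in reshaping method, `DataFrame.pivot`, that produces the same kind of Did-by-Oid table. The sketch below is an alternative, not the function above; `one2two_pivot`, `long_df` and the sample values are made up for illustration, and it takes a DataFrame instead of a file path.

```python
import pandas as pd

def one2two_pivot(df, col_value):
    # Rows come from Did, columns from Oid, cell values from col_value
    return df.pivot(index='Did', columns='Oid', values=col_value)

# Illustrative long-format table: one row per (Oid, Did) pair
long_df = pd.DataFrame({
    'Oid': [1, 1, 2, 2],
    'Did': [10, 20, 10, 20],
    'Total_length': [5.0, 6.0, 7.0, 8.0],
})

wide = one2two_pivot(long_df, 'Total_length')
print(wide)
```

`pivot` raises a `ValueError` if an (Oid, Did) pair occurs more than once; in that case `pivot_table` with an explicit `aggfunc` is the usual choice.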

Beyond querying, pandas also copes well with large files. The following function extracts a few fields from a large file and saves them:

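def save(pathfile, outPath):
    reader = pd.read_csv(pathfile, iterator=True)  # iterator=True lets pandas read the file piece by piece
    loop = True
    chunkSize = 1000000
    chunks = []
    while loop:
        try:
            # Read the file in blocks of chunkSize rows
            df = reader.get_chunk(chunkSize)
            # Keep only the needed columns so each stored block is smaller
            chunks.append(df[['Name', 'Total_length', 'Total_time']])
        except StopIteration:
            loop = False
            print('Iteration is stopped.')

    # Join the blocks. The original version wrapped this call in try..finally because
    # it kept raising memory errors. That does affect the data: finally does not catch
    # the error, so if concat fails df still holds only the last block read and the
    # output file contains just that block. Trimming the columns per block above
    # should lower the memory needed here.
    df = pd.concat(chunks, ignore_index=True)
    # Put the part of Name before ' - ' into Oid and the part after it into Did
    df['Oid'] = df['Name'].map(lambda x: int(x.split(' - ')[0]))
    df['Did'] = df['Name'].map(lambda x: int(x.split(' - ')[1]))
    del df['Name']
    df.to_csv(outPath)
    print('Finished.')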
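If memory errors persist even when the chunks are small, one option is to avoid holding the whole file in memory at all: transform each chunk and append it to the output file straight away. The sketch below is one way to do that, assuming the same `Name`, `Total_length` and `Total_time` columns. The name `save_streaming` is hypothetical, and unlike the function above it writes no index column.

```python
import pandas as pd

def save_streaming(pathfile, outPath, chunkSize=1000000):
    first = True
    # usecols skips unneeded columns at parse time; chunksize yields DataFrames lazily
    for chunk in pd.read_csv(pathfile,
                             usecols=['Name', 'Total_length', 'Total_time'],
                             chunksize=chunkSize):
        # Name looks like '<Oid> - <Did>'; split once and convert both parts
        parts = chunk['Name'].str.split(' - ', expand=True)
        chunk['Oid'] = parts[0].astype(int)
        chunk['Did'] = parts[1].astype(int)
        chunk = chunk.drop(columns='Name')
        # The first chunk creates the file with a header, later chunks append rows
        chunk.to_csv(outPath, mode='w' if first else 'a', header=first, index=False)
        first = False
```

With this layout, peak memory depends on `chunkSize` and no longer on the size of the input file.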
