python_pandas DAY_19(2): Data I/O

Original

2020-06-16 14:32

What this covers
Data operations in pandas
Key points
1. Reading CSV data

import pandas as pd
import numpy as np

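print(pd.read_csv("data1.csv"))  # read the CSV file with default settings
       a b c d
0      1 2 3 4
1  11 12 13 14
# The columns are not split: the default separator is a comma, so each line is read as a single field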



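print(pd.read_csv("data1.csv", header=None, names=["one", "two", "three", "four"]))
# Supply the column names explicitly so that the first line is treated as data
           one  two  three  four
0      a b c d  NaN    NaN   NaN
1      1 2 3 4  NaN    NaN   NaN
2  11 12 13 14  NaN    NaN   NaN
# Every value still lands in column "one", because the file is separated by spaces, not commas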



print(pd.read_csv("data1.csv", sep=r"\s+", header=None,
                  names=["one", "two", "three", "four"]))
# sep accepts a regular expression; \s+ matches one or more whitespace characters
  one two three four
0   a   b     c    d
1   1   2     3    4
2  11  12    13   14
# Now the values fall into the right columns
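The whitespace-separator call can be tried without a file on disk. The sketch below is only an illustration: the string `raw` is an assumption about what data1.csv contains, reconstructed from the printed output, and `io.StringIO` stands in for the file.

```python
import io
import pandas as pd

# Assumed contents of data1.csv, reconstructed from the printed output.
raw = "a b c d\n1 2 3 4\n11 12 13 14\n"

# The raw string r"\s+" treats any run of whitespace as one delimiter.
df = pd.read_csv(io.StringIO(raw), sep=r"\s+", header=None,
                 names=["one", "two", "three", "four"])

print(df.shape)
print(df["one"].tolist())
```

Writing the pattern as a raw string keeps Python from interpreting `\s` as a string escape sequence.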



print(pd.read_csv("data1.csv", sep=r"\s+", header=None,
                  names=["one", "two", "three", "four"], index_col="one"))
# Building on the above, index_col makes the named column the row index
    two three four
one               
a     b     c    d
1     2     3    4
11   12    13   14
# The values of "one" now label the rows
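Once a column has become the row index, rows can be looked up by its values. A minimal sketch, again assuming the contents of data1.csv shown in the printed output and using an in-memory string in place of the file:

```python
import io
import pandas as pd

# Assumed contents of data1.csv, reconstructed from the printed output.
raw = "a b c d\n1 2 3 4\n11 12 13 14\n"

df = pd.read_csv(io.StringIO(raw), sep=r"\s+", header=None,
                 names=["one", "two", "three", "four"], index_col="one")

# The index holds the former "one" column; because that column mixes
# letters and digits, its labels are strings such as "11".
print(df.index.tolist())
print(df.loc["11", "four"])
```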




print(pd.read_csv("data1.csv", sep=r"\s+", header=None,
                  names=["one", "two", "three", "four"], na_values=["d", 2, 11]))
# na_values lists values that should be read as missing, in any column
   one  two three  four
0    a    b     c   NaN
1    1  NaN     3   4.0
2  NaN   12    13  14.0
# Every cell holding d, 2 or 11 becomes NaN




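print(pd.read_csv("data1.csv", sep=r"\s+", header=None,
                  names=["one", "two", "three", "four"],
                  na_values={"one": ["d", 2], "two": ["d", 11], "four": [2, 11]}))
# A dict restricts each list of missing-value markers to a single column
   one two three four
0   a   b     c    d
1   1   2     3    4
2  11  12    13   14
# Column "one" contains neither d nor 2, so nothing in it becomes NaN; the same holds for "two" and "four"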
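The difference between the list form and the dict form of na_values can be checked by counting the missing cells each one produces. This is a sketch under the same assumption about the contents of data1.csv; the dict `{"four": ["d"]}` is an illustrative choice that does match a value in its column, not one taken from the examples in this post.

```python
import io
import pandas as pd

# Assumed contents of data1.csv, reconstructed from the printed output.
raw = "a b c d\n1 2 3 4\n11 12 13 14\n"
names = ["one", "two", "three", "four"]

# List form: the markers apply to every column.
everywhere = pd.read_csv(io.StringIO(raw), sep=r"\s+", header=None,
                         names=names, na_values=["d", 2, 11])

# Dict form: each list of markers applies only to the named column.
per_column = pd.read_csv(io.StringIO(raw), sep=r"\s+", header=None,
                         names=names, na_values={"four": ["d"]})

# Total number of missing cells in each result.
print(everywhere.isna().sum().sum())
print(per_column.isna().sum().sum())
```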



print(pd.read_csv("data1.csv", sep=r"\s+", header=None,
                  names=["one", "two", "three", "four"], nrows=2))
# nrows=2 reads only the first two rows
  one two three four
0   a   b     c    d
1   1   2     3    4

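2. Handling very large files
A file may hold anywhere from thousands to hundreds of millions of rows. In that case, pass chunksize to read_csv so that the data is read a fixed number of rows at a time.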

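import pandas as pd
s = pd.read_csv("data2.csv", chunksize=1000)
print(s)  # prints the reader object, not the data
# s is an iterable that yields DataFrames of up to 1000 rows each, so it works with a Python for loop
# To count how often each value occurs in a given column, combine it with a for loop
<pandas.io.parsers.TextFileReader object at 0x000001126C1B6978>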




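import pandas as pd

s = pd.read_csv("data2.csv", chunksize=1000)
result = pd.Series(dtype="float64")
for chunk in s:
    result = result.add(chunk["key"].value_counts(), fill_value=0)
print(result[:10])
# This counts how often each value occurs in the column labelled key, accumulating over every 1000-row chunk of the file, and prints the first 10 entries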
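Since data2.csv is not shown, the chunked count can be tried on synthetic data. In this sketch the 2500-row string `raw` is a hypothetical stand-in for the file; only the column name key comes from the code in this post.

```python
import io
import pandas as pd

# Hypothetical stand-in for data2.csv: one "key" column with 2500 rows.
raw = "key\n" + "\n".join(["a", "b", "a", "c", "a"] * 500) + "\n"

reader = pd.read_csv(io.StringIO(raw), chunksize=1000)

result = pd.Series(dtype="float64")
for chunk in reader:
    # Each chunk is a DataFrame of at most 1000 rows; add its counts
    # to the running total, treating values not yet seen as 0.
    result = result.add(chunk["key"].value_counts(), fill_value=0)

# Sort so the most frequent values come first.
result = result.sort_values(ascending=False).astype(int)
print(result)
```

Sorting after the loop matters because the accumulated Series is ordered by label, not by count, so slicing it with `[:10]` does not by itself give the ten most frequent values.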

3. Writing files to disk

import pandas as pd

s = pd.read_csv("data1.csv", sep=r"\s+", header=None)
print(s)
    0   1   2   3
0   a   b   c   d
1   1   2   3   4
2  11  12  13  14
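s.to_csv("data1-1.csv")
# data1-1.csv now appears in the file panel at the side of the editor; opening it shows
,0,1,2,3
0,a,b,c,d
1,1,2,3,4
2,11,12,13,14
# Now open the source file
a b c d
1 2 3 4
11 12 13 14
# Comparing the two, the row index and column labels were written as well, so the contents differ
s.to_csv("data1-1.csv", index=False, header=False)
# The index and header are now omitted, so the values match data1.csv, although they are separated by commas rather than spaces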


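s.to_csv("data1-1.csv", index=False, header=False, columns=[1, 3])
# The index is still omitted, and only the columns labelled 1 and 3 are written
b,d
2,4
12,14
s.to_csv("data1-1.csv", index=False, header=False, columns=[1, 3], sep=",")
# sep sets the field separator; "," is already the default, so this call writes the same file as the previous one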
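To reproduce the space-separated layout of the source file, sep can be set to a single space. A minimal sketch, assuming the contents of data1.csv shown earlier and writing to an in-memory buffer instead of data1-1.csv:

```python
import io
import pandas as pd

# Assumed contents of data1.csv, reconstructed from the printed output.
raw = "a b c d\n1 2 3 4\n11 12 13 14\n"
s = pd.read_csv(io.StringIO(raw), sep=r"\s+", header=None)

# Write to an in-memory buffer rather than a file on disk.
buffer = io.StringIO()
s.to_csv(buffer, index=False, header=False, sep=" ")

print(buffer.getvalue())
```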

4. Working with binary files
These are handled with the pickle package, which is not covered here.
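For orientation only, here is a minimal sketch of one way to do this: pandas wraps pickle in DataFrame.to_pickle and pd.read_pickle. The DataFrame and the file name data1.pkl are made up for illustration, and this is not necessarily the approach the author had in mind.

```python
import os
import tempfile
import pandas as pd

# Illustrative data; not taken from data1.csv.
df = pd.DataFrame({"one": [1, 11], "two": [2, 12]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data1.pkl")  # hypothetical file name
    df.to_pickle(path)                     # serialize to a binary file
    restored = pd.read_pickle(path)        # load it back

# True when values, dtypes and labels all survived the round trip.
print(restored.equals(df))
```

Pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.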
