Reading Python Cookbook: 6. Data Encoding and Processing

Original

2020-05-22 01:03

6.1 Reading and Writing CSV Data

For example, write a file named test.csv and then read it back.

>>> import csv
>>> headers = ['class','name','sex','height','year']
>>> rows = [[1,'xiaoming','male',168,23],[1,'xiaohong','female',162,22],[2,'xiaozhang','female',163,21],[2,'xiaoli','male',158,21]]
>>> with open('test.csv', 'w', newline='') as f:   # newline='' is what the csv docs recommend, to avoid blank lines on Windows
...     f_csv = csv.writer(f)
...     f_csv.writerow(headers)
...     f_csv.writerows(rows)
... 
28

After writing, open the file and check its contents.

# Read
>>> with open('test.csv') as f:
...     f_csv = csv.reader(f)
...     headers = next(f_csv)   # consume the header row so the loop below skips it
...     for row in f_csv:
...         print("row = ", row)
... 
row =  ['1', 'xiaoming', 'male', '168', '23']
row =  ['1', 'xiaohong', 'female', '162', '22']
row =  ['2', 'xiaozhang', 'female', '163', '21']
row =  ['2', 'xiaoli', 'male', '158', '21']

# This is the preferred approach
>>> with open('test.csv') as f:
...     f_csv = csv.reader(f)    # the header row is read as well
...     for row in f_csv:
...         print("row = ", row)
... 
row =  ['class', 'name', 'sex', 'height', 'year']
row =  ['1', 'xiaoming', 'male', '168', '23']
row =  ['1', 'xiaohong', 'female', '162', '22']
row =  ['2', 'xiaozhang', 'female', '163', '21']
row =  ['2', 'xiaoli', 'male', '158', '21']

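The approach below is also common, but be careful: you have to handle the details yourself. For example, if a field is wrapped in quotes, you must strip the quotes yourself. If a quoted field happens to contain a comma, the resulting row will have the wrong number of fields and the code will break, because the raw line is simply split on commas.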

>>> with open('test.csv') as f:
...     for line in f:
...         row = line.split(',')
...         print(row)
... 
['class', 'name', 'sex', 'height', 'year\n']
['1', 'xiaoming', 'male', '168', '23\n']
['1', 'xiaohong', 'female', '162', '22\n']
['2', 'xiaozhang', 'female', '163', '21\n']
['2', 'xiaoli', 'male', '158', '21\n']
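To make the quoted-comma problem concrete, here is a minimal sketch. The sample text and the `address` column are my own hypothetical data, not part of test.csv. It compares splitting by hand with `csv.reader` on a row whose quoted field contains a comma.

```python
import csv
import io

# Hypothetical sample: the second field of the data row is quoted and contains a comma.
text = 'name,address,year\nxiaoming,"Haidian, Beijing",23\n'

# Naive approach: split every line on commas.
naive_rows = [line.rstrip('\n').split(',') for line in io.StringIO(text)]

# csv module: understands the quoting rules and strips the quotes.
csv_rows = list(csv.reader(io.StringIO(text)))

print(naive_rows[1])  # ['xiaoming', '"Haidian', ' Beijing"', '23']
print(csv_rows[1])    # ['xiaoming', 'Haidian, Beijing', '23']
```

The hand-split row has four fields instead of three and still carries the quote characters, which is exactly the kind of size mismatch described above.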

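Accessing fields by index, as in the examples so far, is inconvenient. Below, the data is read as a sequence of dictionaries so that each field can be accessed by its header name: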

# Rows print as OrderedDict on Python 3.6 and 3.7; Python 3.8+ returns plain dict
>>> with open('test.csv') as f:
...     f_csv = csv.DictReader(f)
...     for row in f_csv:
...         print(row)
... 
OrderedDict([('class', '1'), ('name', 'xiaoming'), ('sex', 'male'), ('height', '168'), ('year', '23')])
OrderedDict([('class', '1'), ('name', 'xiaohong'), ('sex', 'female'), ('height', '162'), ('year', '22')])
OrderedDict([('class', '2'), ('name', 'xiaozhang'), ('sex', 'female'), ('height', '163'), ('year', '21')])
OrderedDict([('class', '2'), ('name', 'xiaoli'), ('sex', 'male'), ('height', '158'), ('year', '21')])
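Note that every value in the output above is a string: the csv module does not try to interpret the data. A minimal sketch of converting selected columns by hand follows. The `converters` mapping is a hypothetical helper of mine, and the text is a shortened in-memory copy of test.csv so the snippet runs on its own.

```python
import csv
import io

# Shortened copy of test.csv, kept in memory so the sketch is self-contained.
text = (
    "class,name,sex,height,year\n"
    "1,xiaoming,male,168,23\n"
    "2,xiaoli,male,158,21\n"
)

# Hypothetical mapping from column name to converter; unlisted columns stay strings.
converters = {'class': int, 'height': int, 'year': int}

students = []
for row in csv.DictReader(io.StringIO(text)):
    typed = {key: converters.get(key, str)(value) for key, value in row.items()}
    students.append(typed)

print(students[0]['height'] + 1)  # 169
```

Real-world files often contain missing or malformed values, so a conversion like this would usually need error handling as well.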

6.2 Printing JSON: Formatted and Sorted Output

>>> from pprint import pprint
>>> import json
>>> a = {"a":"1","c":2,"b":4}
# With the indent argument, the book says the output looks the same as pprint's, but in my test pprint's output was different
>>> print(json.dumps(a,indent=4))
{
    "a": "1",
    "c": 2,
    "b": 4
}
>>> pprint(a)
{'a': '1', 'b': 4, 'c': 2}
# Sort by key with sort_keys
>>> print(json.dumps(a,sort_keys=True))
{"a": "1", "b": 4, "c": 2}
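A likely explanation for the difference noted above: `pprint` prints the Python repr (single quotes), sorts dictionary keys by default, and keeps a structure on one line when it fits within the width limit, while `json.dumps(..., indent=4)` always breaks lines. The sketch below combines `indent` with `sort_keys`, round-trips through `json.loads`, and forces `pprint` onto several lines with a narrow `width`. The `width=1` value is only my choice for illustration.

```python
import json
from pprint import pformat

data = {"a": "1", "c": 2, "b": 4}

# indent and sort_keys can be combined for stable, readable output.
text = json.dumps(data, indent=4, sort_keys=True)
print(text)

# json.loads turns the JSON string back into Python objects.
restored = json.loads(text)
print(restored == data)  # True

# A very small width forces pprint to put each key on its own line.
narrow = pformat(data, width=1)
print(narrow)
```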

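6.3 Parsing XML

Use xml.etree.ElementTree to parse simple XML. For more advanced applications, consider lxml. lxml uses the same programming interface as ElementTree, is fully compliant with the XML standards, runs very fast, and also provides features such as validation, XSLT, and XPath. To switch, simply change the import statement from xml.etree.ElementTree import parse to from lxml.etree import parse.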
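As a minimal sketch of the ElementTree interface, the snippet below parses a small document and pulls out attributes and child text. The XML content, including the `students` and `student` element names, is hypothetical sample data modeled on the rows of test.csv.

```python
import io
# With lxml installed, this import could instead be: from lxml.etree import parse
from xml.etree.ElementTree import parse

# Hypothetical document mirroring two rows of test.csv.
xml_bytes = b"""<?xml version="1.0"?>
<students>
    <student class="1"><name>xiaoming</name><height>168</height></student>
    <student class="2"><name>xiaoli</name><height>158</height></student>
</students>
"""

# parse() accepts a filename or a file-like object.
doc = parse(io.BytesIO(xml_bytes))

records = []
for student in doc.iterfind('student'):
    records.append((
        student.get('class'),              # attribute value, always a string
        student.findtext('name'),          # text of the <name> child
        int(student.findtext('height')),   # convert text to int by hand
    ))

print(records)  # [('1', 'xiaoming', 168), ('2', 'xiaoli', 158)]
```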

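6.4 Summarizing Data and Performing Statistics

For any data analysis problem involving statistics, time series, or other related techniques, you should look at the Pandas library.

Pandas is a large library. It is especially suited to analyzing large datasets, grouping data, performing statistical analysis, and similar tasks.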

>>> import pandas
>>> rats = pandas.read_csv('test.csv')
>>> rats
   class       name     sex  height  year
0      1   xiaoming    male     168    23
1      1   xiaohong  female     162    22
2      2  xiaozhang  female     163    21
3      2     xiaoli    male     158    21
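To give a feel for the grouping and statistics mentioned above, here is a small sketch using the same data. It assumes pandas is installed and loads the rows from an in-memory string instead of test.csv so it is self-contained. Which columns to aggregate is my own choice for illustration.

```python
import io

import pandas

# Same rows as test.csv, kept in memory so the sketch runs on its own.
text = (
    "class,name,sex,height,year\n"
    "1,xiaoming,male,168,23\n"
    "1,xiaohong,female,162,22\n"
    "2,xiaozhang,female,163,21\n"
    "2,xiaoli,male,158,21\n"
)
students = pandas.read_csv(io.StringIO(text))

# Average height per class.
mean_height = students.groupby('class')['height'].mean()

# How many rows there are for each value of the sex column.
sex_counts = students['sex'].value_counts()

print(mean_height)
print(sex_counts)
```

Unlike the csv module, `read_csv` infers numeric column types, so `height` can be averaged without manual conversion.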
