文章目錄

1. SQLite 與 Pandas 異同點比較

1.1 數據導入

1.1.1 SQLIte

SQLite 需要首先導入數據庫文件並使用 select 語句選取記錄：

使用Jupyter Notebook 運行 SQL 語句需安裝 ipython-sql
%sql 以及 %%sql 爲在 Notebook 中運行 SQL 語句，在 SQLite 命令行或 SQLite Stiduo 中不需要 %sql 或 %%sql

%load_ext sql
%sql sqlite:///DataBase/weather_stations.db

'Connected: @DataBase/weather_stations.db'

%%sql
select * from station_data
limit 0,3

 * sqlite:///DataBase/weather_stations.db
Done.

station_number	report_code	year	month	day	dew_point	station_pressure	visibility	wind_speed	temperature	snow_depth	fog	rain	hail	thunder	tornado
143080	34DDA7	2002	12	21	33.8	987.4	3.4	0.2	36	None	1	1	1	1	1
766440	39537B	1998	10	1	72.7	1014.6	5.9	6.7	83.3	None	0	0	0	0	0
176010	C3C6D5	2001	5	18	55.7	None	7.3	4.3	69.1	None	0	0	0	0	0

1.1.2 Pandas

在 Python 中我們可以利用 sqlite3 模塊來讀取數據庫並轉換爲 Pandas 的 DataFrame 格式：

import sqlite3
import pandas as pd
con = sqlite3.connect('./DataBase/weather_stations.db')
cursor = con.execute('select * from station_data')
rows = cursor.fetchall()
df = pd.DataFrame(rows, columns=[x[0] for x in cursor.description])

df.head(3)

	station_number	report_code	year	month	day	dew_point	station_pressure	visibility	wind_speed	temperature	snow_depth	fog	rain	hail	thunder	tornado
0	143080	34DDA7	2002	12	21	33.8	987.4	3.4	0.2	36.0	NaN	1	1	1	1	1
1	766440	39537B	1998	10	1	72.7	1014.6	5.9	6.7	83.3	NaN	0	0	0	0	0
2	176010	C3C6D5	2001	5	18	55.7	NaN	7.3	4.3	69.1	NaN	0	0	0	0	0

1.2 選取數據

1.2.1 SQLite

在 SQLite 中，我們需要通過 select 語句來篩選數據，並可以使用 alias、內置函數和文字拼接等對數據進行處理：

%%sql
select
station_number ||'_'|| report_code as number, -- 將兩個拼接
round(temperature*9/5+32, 2) as Fahrenheit -- 將攝氏度轉換爲華氏度
from station_data
limit 0,3

 * sqlite:///DataBase/weather_stations.db
Done.

number	Fahrenheit
143080_34DDA7	96.0
766440_39537B	181.94
176010_C3C6D5	156.38

1.2.2 Pandas

Pandas 中似乎沒有直接作用於字符的拼接函數，因此需要多一步轉換的過程才能得到與 SQL 中 concat 相同的效果：

pd.concat( [pd.Series([str(num)+'_'+code for (num,code) in 
                       zip(list(df['station_number']), 
                           list(df['report_code']))], name='number'), 
            pd.Series(round(df['temperature']*9/5+32, 2), name='Fahrenheit')],
          axis=1).head(3)

	number	Fahrenheit
0	143080_34DDA7	96.80
1	766440_39537B	181.94
2	176010_C3C6D5	156.38

1.3 篩選數據

1.3.1 SQLite

SQLite 中使用 where 對數據進行篩選，如篩選 2005 年至 2010 年的數據：

%%sql
select * from station_data
where year>=2005 and year<=2010
limit 0,3

 * sqlite:///DataBase/weather_stations.db
Done.

station_number	report_code	year	month	day	dew_point	station_pressure	visibility	wind_speed	temperature	snow_depth
125600	145150	2007	10	14	33	None	6.9	2.5	39.7	None
598550	C5C66E	2006	10	15	72.9	None	14.2	1.7	82	None
941830	229317	2007	4	19	66.5	994.9	None	4	76.3	None

篩選月份爲 3、6、9、12 的數據：

%%sql 
select * from station_data 
where Month in (3,6,9,12)
limit 0,3;

 * sqlite:///DataBase/weather_stations.db
Done.

station_number	report_code	year	month	day	dew_point	station_pressure	visibility	wind_speed	temperature	precipitation	snow_depth	fog	rain	hail	thunder	tornado
143080	34DDA7	2002	12	21	33.8	987.4	3.4	0.2	36	0	None	1	1	1	1	1
821930	1F8A7B	1953	6	18	72.8	1007.1	12.4	3.6	81.3	0	None	0	0	0	0	0
478070	D028D8	1981	6	27	73.4	None	7.9	3	77	1.93	None	0	0	0	0	0

1.3.2 Pandas

篩選 2005 年至 2010 年的數據：

df[(df['year']>=2005) & (df['year']<=2010)].head(3)

	station_number	report_code	year	month	day	dew_point	station_pressure	visibility	wind_speed	temperature	snow_depth
3	125600	145150	2007	10	14	33.0	NaN	6.9	2.5	39.7	NaN
9	598550	C5C66E	2006	10	15	72.9	NaN	14.2	1.7	82.0	NaN
18	941830	229317	2007	4	19	66.5	994.9	NaN	4.0	76.3	NaN

篩選月份爲 3、6、9、12 的數據：

df[ [ [data in [3,6,9,12]] for data in df['month'].values ] ].head(3)

	station_number	report_code	year	month	day	dew_point	station_pressure	visibility	wind_speed	temperature	precipitation	snow_depth	fog	rain	hail	thunder	tornado
0	143080	34DDA7	2002	12	21	33.8	987.4	3.4	0.2	36.0	0.00	NaN	1	1	1	1	1
5	821930	1F8A7B	1953	6	18	72.8	1007.1	12.4	3.6	81.3	0.00	NaN	0	0	0	0	0
6	478070	D028D8	1981	6	27	73.4	NaN	7.9	3.0	77.0	1.93	NaN	0	0	0	0	0

1.4 數據聚合與分組

1.4.1 SQLite

在 SQLite 中，我們使用 group by 和內置聚合函數實現聚合分組操作，如統計每個月龍捲風的記錄次數：

%%sql
select year, month,
count(*) as record_count
from station_data
where tornado == 1
group by year, month
order by year, month
limit 0,5;

 * sqlite:///DataBase/weather_stations.db
Done.

year	month	record_count
1937	7	3
1941	8	3
1942	10	3
1943	1	3
1943	4	3

1.4.2 Pandas

同樣在 Pandas 中，也有 groupby 函數實現分組操作：

df[df['tornado']==1].groupby(['year','month'])['tornado'].count().head(5)

year  month
1937  7        3
1941  8        3
1942  10       3
1943  1        3
      4        3
Name: tornado, dtype: int64

2. 總結

從數據導入、選取、篩選、聚合與分組可以看出， SQLite 具有結構化的特點，
容易操作且易上手，代碼一目瞭然。反觀 Pandas ，則繼承了 Pythonic 的特點。
雖然部分功能需要自己寫循環，但由於列表推倒式的存在，使得這代碼依然可以保持短小精悍。
總體而言，SQLite 的功能實現更加方便快捷，而 Pandas 則擁有更高的自由度，
但需要對 Python 比較熟悉，兩者各有千秋！

SQLite | SQLite 與 Pandas 比較篇之一

文章目錄

1. SQLite 與 Pandas 異同點比較

1.1 數據導入

1.1.1 SQLIte

1.1.2 Pandas

1.2 選取數據

1.2.1 SQLite

1.2.2 Pandas

1.3 篩選數據

1.3.1 SQLite

1.3.2 Pandas

1.4 數據聚合與分組

1.4.1 SQLite

1.4.2 Pandas

2. 總結

使用neovim打造go ide(支持代碼跳轉, 代碼補全, 實時語法檢查)

挑戰程序設計競賽 2.3章習題 poj 3046 Ant Counting

Shell/Python中的用戶名獲取

機器學習 | 目錄（持續更新）

無監督學習 | GMM 高斯混合聚類原理及Sklearn實現

無監督學習 | KMeans與KMeans++原理

無監督學習 | DBSCAN 原理及Sklearn實現

SQLite | SQLite 與 Pandas 比較篇之一

https://yachay.unat.edu.pe/blog/index.php?comment_area=format_blog&comment_component=blog&comment_co

linux以太網驅動總結