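Goal:

Use BigQuery, a web-based cloud service from Google that lets you run SQL against very large datasets. The advantage is that the analysis runs on cloud resources, so the local machine carries very little of the computational load.

1. BigQuery command basics (project -> dataset -> table)

1.1 First, import the BigQuery package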

from google.cloud import bigquery

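If you want to skip the setup, start a Notebook on Kaggle and run the command there; otherwise install the package locally with pip (it is published as google-cloud-bigquery).

1.2 To work with the database, first create a Client object.

This object provides the interface through which a programmer retrieves information from BigQuery datasets.

Create it with bigquery.Client().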

# Create a "Client" object
client = bigquery.Client()

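1.2 Next, we look at data published on Hacker News to show how this client works. (Hacker News mainly carries news about computer science and network security.)

In BigQuery, every dataset (which holds a number of tables) belongs to a project.

In this example, the hacker_news dataset belongs to the bigquery-public-data project.

First, create a reference to the dataset, dataset_ref, using the client's dataset(dataset name, project=project name) method. Then pass that reference to the client's get_dataset() method to fetch the dataset itself, dataset.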

# Construct a reference to the "hacker_news" dataset
dataset_ref = client.dataset("hacker_news", project="bigquery-public-data")

# API request - fetch the dataset
dataset = client.get_dataset(dataset_ref)

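Every dataset is essentially a collection of tables. You can think of a dataset as a spreadsheet file that contains many sheets.

We can call the client's

client.list_tables(dataset)

method to get all the tables in the dataset and wrap the result in list(). A for loop then prints the table_id attribute of each element, which shows the name of every table.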

# List all the tables in the "hacker_news" dataset
tables = list(client.list_tables(dataset))

# Print names of all tables in the dataset (there are four!)
for table in tables:  
    print(table.table_id)

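Note: printing the elements of tables directly does not show the names. Each element is a table object (a TableListItem), not a string, so the table name has to be read from its .table_id attribute.

for table in tables:
    print(table)

Output: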
<google.cloud.bigquery.table.TableListItem object at 0x7fea1ec984a8>
<google.cloud.bigquery.table.TableListItem object at 0x7fea1ec98cf8>
<google.cloud.bigquery.table.TableListItem object at 0x7fea1ec98f98>
<google.cloud.bigquery.table.TableListItem object at 0x7fea1ec987f0>

for table in tables:
    print(table.table_id)

Output:
comments
full
full_201510
stories
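
The difference between the two loops above comes down to what each element is: an object whose default printout is its class name and memory address, with the table name stored in the table_id attribute. The following is a minimal sketch of that idea. FakeTableListItem and table_names are hypothetical names used only for illustration; the stand-in class mimics nothing but the table_id attribute, so the snippet runs without a BigQuery connection.

```python
class FakeTableListItem:
    """Hypothetical stand-in for the items returned by client.list_tables().

    Only the table_id attribute is mimicked; nothing else about the real
    class is assumed.
    """

    def __init__(self, table_id):
        self.table_id = table_id


def table_names(tables):
    """Collect the table_id of every item into a plain list of strings."""
    return [table.table_id for table in tables]


# Names taken from the output shown above.
tables = [FakeTableListItem(name)
          for name in ("comments", "full", "full_201510", "stories")]

names = table_names(tables)
print(names)  # ['comments', 'full', 'full_201510', 'stories']
```

With a real client, the same helper could be called as table_names(client.list_tables(dataset)).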

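1.3 Now fetch a specific table from the dataset

This works the same way as fetching the dataset in 1.2: first ask dataset_ref to create a reference to the table, then have the client use that reference to fetch the table.

The table we want here is named full.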

# Construct a reference to the "full" table
table_ref = dataset_ref.table("full")
# API request - fetch the table
table = client.get_table(table_ref)
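
The steps so far walk the hierarchy named in the section title: project, then dataset, then table. The same location is commonly written as one dotted path, project.dataset.table, which is the form used when naming a table in a SQL query. The following is a minimal sketch in plain Python; TablePath and split_table_path are hypothetical helpers for illustration and are not part of the BigQuery client library.

```python
from typing import NamedTuple


class TablePath(NamedTuple):
    """The three levels of a BigQuery table location."""
    project: str
    dataset: str
    table: str


def split_table_path(path: str) -> TablePath:
    """Split a dotted 'project.dataset.table' string into its three parts.

    Hypothetical helper for illustration only; it assumes none of the
    three names contains a dot.
    """
    parts = path.split(".")
    if len(parts) != 3:
        raise ValueError(f"expected project.dataset.table, got {path!r}")
    return TablePath(*parts)


full = split_table_path("bigquery-public-data.hacker_news.full")
print(full.project)  # bigquery-public-data
print(full.dataset)  # hacker_news
print(full.table)    # full
```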

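2. Table schema

2.1 The structure of a table is called its schema

Knowing the schema makes it easier to pull out the data we are interested in.

The term is worth remembering because it is also the name used in the API.

Every table object has a schema attribute, which describes each column (feature) of the table.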

# Print information on all the columns in the "full" table in the "hacker_news" dataset
table.schema


Output:
[SchemaField('title', 'STRING', 'NULLABLE', 'Story title', ()),
 SchemaField('url', 'STRING', 'NULLABLE', 'Story url', ()),
 SchemaField('text', 'STRING', 'NULLABLE', 'Story or comment text', ()),
 SchemaField('dead', 'BOOLEAN', 'NULLABLE', 'Is dead?', ()),
 SchemaField('by', 'STRING', 'NULLABLE', "The username of the item's author.", ()),
 SchemaField('score', 'INTEGER', 'NULLABLE', 'Story score', ()),
 SchemaField('time', 'INTEGER', 'NULLABLE', 'Unix time', ()),
 SchemaField('timestamp', 'TIMESTAMP', 'NULLABLE', 'Timestamp for the unix time', ()),
 SchemaField('type', 'STRING', 'NULLABLE', 'Type of details (comment, comment_ranking, poll, story, job, pollopt)', ()),
 SchemaField('id', 'INTEGER', 'NULLABLE', "The item's unique id.", ()),
 SchemaField('parent', 'INTEGER', 'NULLABLE', 'Parent comment ID', ()),
 SchemaField('descendants', 'INTEGER', 'NULLABLE', 'Number of story or poll descendants', ()),
 SchemaField('ranking', 'INTEGER', 'NULLABLE', 'Comment ranking', ()),
 SchemaField('deleted', 'BOOLEAN', 'NULLABLE', 'Is deleted?', ())]

So each entry that comes back has four parts:

column name	field type	mode	description

For example, the entry

 SchemaField('parent', 'INTEGER', 'NULLABLE', 'Parent comment ID', ()),

means:

column name	field type	mode	description
parent	INTEGER	NULLABLE (may be NULL)	Parent comment ID
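
To make a long schema easier to read, the four parts of each entry can be pulled out into rows. The following is a minimal sketch, assuming each schema entry exposes name, field_type, mode and description attributes, as SchemaField does. The namedtuple Field and the helper schema_rows are hypothetical stand-ins so the snippet runs without a BigQuery connection.

```python
from collections import namedtuple

# Hypothetical stand-in for a schema entry; only four attributes are mimicked.
Field = namedtuple("Field", ["name", "field_type", "mode", "description"])


def schema_rows(schema):
    """Return one (name, type, mode, description) tuple per column."""
    return [(f.name, f.field_type, f.mode, f.description) for f in schema]


# Two entries copied from the schema output above.
schema = [
    Field("parent", "INTEGER", "NULLABLE", "Parent comment ID"),
    Field("deleted", "BOOLEAN", "NULLABLE", "Is deleted?"),
]

# Print the rows in aligned columns.
for name, field_type, mode, description in schema_rows(schema):
    print(f"{name:<10}{field_type:<10}{mode:<10}{description}")
```

With a real table object, the call would be schema_rows(table.schema).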

2.2 We can use the client's

client.list_rows(table object, max_results=maximum number of rows).to_dataframe()

call to show a short preview of the table in the notebook.

client.list_rows(table,max_results=3).to_dataframe()

title	url	text	dead	by	score	time	timestamp	type	id	parent	descendants	ranking	deleted
0	None	None	Easier said than done, power can be cut in an ...	None	vectorpush	None	1318202785	2011-10-09 23:26:25+00:00	comment	3091617	3091595	None	None	None
1	None	None	So that's the alternative to scholarly journal...	None	daniel-cussen	None	1335415184	2012-04-26 04:39:44+00:00	comment	3892462	3892310	None	None	None
2	None	None	That's not censorship, and as such is not...	None	isleyaardvark	None	1486658591	2017-02-09 16:43:11+00:00	comment	13608258	13607977 ... (row truncated)

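This method can also restrict the columns in addition to the rows. For example, the first two columns of the first three rows:

client.list_rows(table, selected_fields=table.schema[:2], max_results=3).to_dataframe()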

	title	url
0	None	None
1	None	None
2	None	None
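
Slicing table.schema picks columns by position. When the position is not known, the entries can be filtered by name first and the result passed as selected_fields. The following is a minimal sketch, assuming each schema entry has a name attribute; fields_named and the Field stand-in are hypothetical and exist only so the snippet runs without a BigQuery connection.

```python
from collections import namedtuple

# Hypothetical stand-in for a schema entry; only the name attribute is used.
Field = namedtuple("Field", ["name", "field_type"])


def fields_named(schema, wanted):
    """Return the schema entries whose name is in wanted, in schema order."""
    wanted = set(wanted)
    return [field for field in schema if field.name in wanted]


# First few columns of the "full" table, in the order shown in its schema.
schema = [
    Field("title", "STRING"),
    Field("url", "STRING"),
    Field("text", "STRING"),
    Field("dead", "BOOLEAN"),
    Field("by", "STRING"),
]

picked = fields_named(schema, ["by", "title"])
print([field.name for field in picked])  # ['title', 'by']
```

With a real table, the call would look like client.list_rows(table, selected_fields=fields_named(table.schema, ["by", "title"]), max_results=3).to_dataframe(). The result keeps schema order rather than the order of the requested names, which is a design choice of this sketch.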

Reference:

[1] https://www.kaggle.com/dansbecker/getting-started-with-sql-and-bigquery
