TensorFlow's data object: Dataset

Note: I ran into TensorFlow's Dataset while studying and am recording the related material here (only what I have encountered myself).

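Problem description: I used pandas to read data from an Excel sheet, extracted several columns, and converted them to a NumPy array. On top of this array I used tf.data.Dataset.from_tensor_slices() together with shuffle(), batch(), and make_one_shot_iterator(). The examples below use the TensorFlow 1.x API (tf.Session, Dataset.make_one_shot_iterator()).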

Code snippet:

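import pandas as pd
import numpy as np
import tensorflow as tf  # TensorFlow 1.x API (tf.Session, make_one_shot_iterator)

# Read the data from the Excel sheet
df = pd.read_excel('./test.xlsx')
# Keep only the seven band columns and convert them to a NumPy array of shape (10, 7)
x = np.array(df[['band1', 'band2', 'band3', 'band4', 'band5', 'band6', 'band7']])
# Print x
print(x)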

Output:

[[ 423  332  643  460 2909 1973  895]
 [ 395  309  617  452 2863 1997  908]
 [ 374  291  599  448 2823 2013  919]
 [ 394  304  612  465 2820 2042  943]
 [ 393  304  613  466 2814 2048  951]
 [ 399  311  621  469 2826 2049  955]
 [ 395  311  622  467 2816 2029  953]
 [ 398  316  629  473 2798 2002  956]
 [ 351  293  617  477 2712 1965  976]
 [ 268  250  595  489 2561 1924 1021]]
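If test.xlsx is not at hand, the column selection can be tried on an in-memory DataFrame. This is a sketch under assumptions, not the author's code: the DataFrame is built from the first three rows printed above, and the extra `name` column is hypothetical, added only to show that selecting a list of columns keeps just the band columns. It uses `DataFrame.to_numpy()` (pandas 0.24 and later), which gives the same values as `np.array(df[...])`.

```python
import pandas as pd

# Hypothetical stand-in for ./test.xlsx: the first three rows of the output above,
# plus a hypothetical non-band column that the selection should drop.
df = pd.DataFrame({
    'name': ['a', 'b', 'c'],
    'band1': [423, 395, 374],
    'band2': [332, 309, 291],
    'band3': [643, 617, 599],
    'band4': [460, 452, 448],
    'band5': [2909, 2863, 2823],
    'band6': [1973, 1997, 2013],
    'band7': [895, 908, 919],
})

band_cols = ['band1', 'band2', 'band3', 'band4', 'band5', 'band6', 'band7']
# Selecting a list of columns returns a DataFrame; to_numpy() turns it into a
# 2-D array with one row per sample and one column per band.
x = df[band_cols].to_numpy()
print(x.shape)  # (3, 7)
```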

Slicing:

# Note: every component being sliced must have the same first dimension; here both are 10
db_train = tf.data.Dataset.from_tensor_slices((x, [1, 2, 3, 4, 5, 6, 7, 8, 9, 0]))
print(db_train)

Output:

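<DatasetV1Adapter shapes: ((7,), ()), types: (tf.int64, tf.int32)>

The output shows that db_train is a Dataset instance. Both components are sliced along their first dimension (size 10), so the dataset has 10 elements, each a pair (features, label):
(7,) means each slice of x is a 1-D array (rank 1) of length 7, i.e. one row of x.
() means each slice of the label list, originally a 1-D array of length 10, is a scalar.
The types show that x becomes tf.int64 (the integer dtype of the NumPy array) and the Python list of ints becomes tf.int32.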
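To make the slicing rule concrete, here is a plain-NumPy sketch of what from_tensor_slices does with a tuple of components. The helper name `slice_like_from_tensor_slices` and the `np.arange` data are hypothetical; TensorFlow performs the first-dimension check itself and reports its own error, which the sketch only imitates with a ValueError.

```python
import numpy as np

def slice_like_from_tensor_slices(*components):
    """Hypothetical helper: cut every component along its first axis and
    return element i as the tuple (component_0[i], component_1[i], ...)."""
    arrays = [np.asarray(c) for c in components]
    n = arrays[0].shape[0]
    # All components must share the same first dimension (10 in the post).
    for a in arrays[1:]:
        if a.shape[0] != n:
            raise ValueError('first dimensions differ: %d vs %d' % (n, a.shape[0]))
    return [tuple(a[i] for a in arrays) for i in range(n)]

x = np.arange(70).reshape(10, 7)  # hypothetical data with the same shape as x above
y = [1, 2, 3, 4, 5, 6, 7, 8, 9, 0]
elements = slice_like_from_tensor_slices(x, y)
features, label = elements[0]
print(len(elements), features.shape, np.shape(label))  # 10 (7,) ()
```

Each element pairs a length-7 row with a scalar label, matching the `((7,), ())` shapes reported by TensorFlow.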

Inspecting each slice:

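# Create a one-shot iterator that walks through the dataset's elements once
iterator = db_train.make_one_shot_iterator()
# get_next() returns a (features, label) tensor pair; each sess.run() of it advances to the next element
ele = iterator.get_next()
# Create a session
with tf.Session() as sess:
    for i in range(10):  # the dataset has exactly 10 elements
        (features, label) = sess.run(ele)
        print(features)
        print(label)
        print('-----')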

Output:

[ 423  332  643  460 2909 1973  895]
1
-----
[ 395  309  617  452 2863 1997  908]
2
-----
[ 374  291  599  448 2823 2013  919]
3
-----
[ 394  304  612  465 2820 2042  943]
4
-----
[ 393  304  613  466 2814 2048  951]
5
-----
[ 399  311  621  469 2826 2049  955]
6
-----
[ 395  311  622  467 2816 2029  953]
7
-----
[ 398  316  629  473 2798 2002  956]
8
-----
[ 351  293  617  477 2712 1965  976]
9
-----
[ 268  250  595  489 2561 1924 1021]
0
-----
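The loop runs exactly 10 times because the dataset holds 10 elements; in TF 1.x an eleventh `sess.run(ele)` raises `tf.errors.OutOfRangeError`, and the usual pattern when the size is not known is to loop with `while True` and break on that exception. The plain-Python analogy below shows the same one-pass behaviour; the `OneShotIterator` class and `OutOfRange` exception are hypothetical stand-ins, not TensorFlow API.

```python
class OutOfRange(Exception):
    """Hypothetical stand-in for tf.errors.OutOfRangeError."""

class OneShotIterator:
    """Hypothetical analogy of a one-shot iterator: it walks the elements once,
    in order, and cannot be re-initialized."""

    def __init__(self, elements):
        self._it = iter(elements)

    def get_next(self):
        try:
            return next(self._it)
        except StopIteration:
            raise OutOfRange('end of sequence') from None

labels = [1, 2, 3, 4, 5, 6, 7, 8, 9, 0]
it = OneShotIterator(labels)
seen = [it.get_next() for _ in range(10)]  # ten calls, one per element
try:
    it.get_next()  # an eleventh call finds nothing left
    exhausted = False
except OutOfRange:
    exhausted = True
print(seen, exhausted)
```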

Before inspecting the elements, you can also apply shuffle() or batch() to db_train, as follows:

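# Shuffle the element order, then batch in the new order
# shuffle(): fills a buffer with buffer_size elements from the dataset and draws elements from this buffer at random,
# refilling each freed slot with the next input element; buffer_size=10 covers the whole dataset, so this is a full shuffle
db_train = db_train.shuffle(buffer_size=10)
db_train = db_train.batch(2)  # batch(2) groups every 2 consecutive elements into one batch
# Print batch by batch
iterator = db_train.make_one_shot_iterator()
ele = iterator.get_next()
with tf.Session() as sess:
    for i in range(5):  # 10 elements / batch size 2 = 5 batches
        (features, labels) = sess.run(ele)
        print(features)
        print(labels)
        print('-----')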
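The buffer behaviour described in the shuffle() comment can be sketched in plain Python. This is a simplified single-pass model, not TensorFlow's implementation (which also handles options such as reshuffle_each_iteration and its own seeding); `buffered_shuffle` is a hypothetical name. It shows why buffer_size=10 fully shuffles this 10-element dataset, while a small buffer only mixes elements locally.

```python
import random

def buffered_shuffle(elements, buffer_size, seed=None):
    """Hypothetical sketch of a fixed-size shuffle buffer (assumes buffer_size >= 1)."""
    rng = random.Random(seed)
    source = iter(elements)
    buffer = []
    # Fill the buffer with the first buffer_size elements.
    for item in source:
        buffer.append(item)
        if len(buffer) == buffer_size:
            break
    # Emit a random buffered element and refill its slot with the next input element.
    for item in source:
        i = rng.randrange(len(buffer))
        yield buffer[i]
        buffer[i] = item
    # Input exhausted: drain the remaining buffered elements in random order.
    while buffer:
        yield buffer.pop(rng.randrange(len(buffer)))

data = list(range(10))
full = list(buffered_shuffle(data, buffer_size=10, seed=0))
ordered = list(buffered_shuffle(data, buffer_size=1, seed=0))
print(full, ordered)
```

With buffer_size=1 the output order equals the input order, and with a buffer of 2 the first output is always one of the first two inputs, which illustrates how a buffer smaller than the dataset weakens the shuffle.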

Output:

[[ 423  332  643  460 2909 1973  895]
 [ 394  304  612  465 2820 2042  943]]
[1 4]
-----
[[ 395  309  617  452 2863 1997  908]
 [ 374  291  599  448 2823 2013  919]]
[2 3]
-----
[[ 395  311  622  467 2816 2029  953]
 [ 398  316  629  473 2798 2002  956]]
[7 8]
-----
[[ 393  304  613  466 2814 2048  951]
 [ 399  311  621  469 2826 2049  955]]
[5 6]
-----
[[ 351  293  617  477 2712 1965  976]
 [ 268  250  595  489 2561 1924 1021]]
[9 0]
-----

As the output shows, the overall order is shuffled, and consecutive elements of the shuffled order are grouped two at a time into batches.
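The order of the two calls matters. The plain-Python sketch below contrasts shuffle-then-batch, as used above, with batch-then-shuffle, where the pairs stay fixed and only the batch order changes. The helper `batch` is hypothetical and modelled on `Dataset.batch(batch_size, drop_remainder=False)`; `random.shuffle` stands in for a shuffle whose buffer covers the whole dataset.

```python
import random

def batch(elements, batch_size, drop_remainder=False):
    """Hypothetical helper: group consecutive elements into lists of batch_size;
    a shorter final batch is kept unless drop_remainder=True."""
    batches, current = [], []
    for item in elements:
        current.append(item)
        if len(current) == batch_size:
            batches.append(current)
            current = []
    if current and not drop_remainder:
        batches.append(current)
    return batches

labels = [1, 2, 3, 4, 5, 6, 7, 8, 9, 0]
rng = random.Random(0)

# shuffle -> batch (as in the post): elements are mixed first, then paired.
shuffled = labels[:]
rng.shuffle(shuffled)
shuffle_then_batch = batch(shuffled, 2)

# batch -> shuffle: the pairs [1, 2], [3, 4], ... stay fixed; only their order changes.
batch_then_shuffle = batch(labels, 2)
rng.shuffle(batch_then_shuffle)

print(shuffle_then_batch)
print(batch_then_shuffle)
```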

Complete code:

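import pandas as pd
import numpy as np
import tensorflow as tf  # TensorFlow 1.x API (tf.Session, make_one_shot_iterator)

# Read the data from the Excel sheet
df = pd.read_excel('./test.xlsx')
# Keep only the seven band columns and convert them to a NumPy array of shape (10, 7)
x = np.array(df[['band1', 'band2', 'band3', 'band4', 'band5', 'band6', 'band7']])
# Print x
print(x)

# Note: every component being sliced must have the same first dimension; here both are 10
db_train = tf.data.Dataset.from_tensor_slices((x, [1, 2, 3, 4, 5, 6, 7, 8, 9, 0]))
print(db_train)

# Create a one-shot iterator that walks through the dataset's elements once
iterator = db_train.make_one_shot_iterator()
# get_next() returns a (features, label) tensor pair; each sess.run() of it advances to the next element
ele = iterator.get_next()
# Create a session
with tf.Session() as sess:
    for i in range(10):  # the dataset has exactly 10 elements
        (features, label) = sess.run(ele)
        print(features)
        print(label)
        print('-----')

# Shuffle the element order, then batch in the new order
# shuffle(): fills a buffer with buffer_size elements from the dataset and draws elements from this buffer at random,
# refilling each freed slot with the next input element; buffer_size=10 covers the whole dataset, so this is a full shuffle
db_train = db_train.shuffle(buffer_size=10)
db_train = db_train.batch(2)  # batch(2) groups every 2 consecutive elements into one batch
# Print batch by batch, using a new one-shot iterator over the shuffled, batched dataset
iterator = db_train.make_one_shot_iterator()
ele = iterator.get_next()
with tf.Session() as sess:
    for i in range(5):  # 10 elements / batch size 2 = 5 batches
        (features, labels) = sess.run(ele)
        print(features)
        print(labels)
        print('-----')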

More to be added as I run into it…
