Let's verify the results above with some hand-written code.
1. Generate 10 million data points.
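```python
import numpy as np

N = 10000000  # 10 million values
# Write N random floats in [0, 10) on a single line, each followed by a comma
with open('data.txt', 'w') as data:
    for _ in range(N):
        data.write(str(10 * np.random.random()) + ',')
```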
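The loop above calls `np.random.random()` and `write()` once per value, ten million times, which makes generation slow. Below is a minimal vectorized sketch, assuming the same file format (every value followed by a comma, including the last). `write_data` and its `seed` and `chunk` parameters are hypothetical names introduced here; it draws from `np.random.default_rng` instead of the legacy global generator, so the values will not match the original loop's output.

```python
import numpy as np


def write_data(path, n, seed=None, chunk=1_000_000):
    """Write n random floats in [0, 10) to path, each followed by a comma.

    Hypothetical helper: values are generated in one vectorized call and the
    text is written in chunks so the full 10M-value string is never built at once.
    """
    rng = np.random.default_rng(seed)
    values = 10 * rng.random(n)  # float64 array, uniform on [0, 10)
    with open(path, 'w') as f:
        for start in range(0, n, chunk):
            block = values[start:start + chunk].tolist()  # plain Python floats
            # Each chunk ends with a comma, so every value is followed by one
            f.write(','.join(map(str, block)) + ',')
```

For the experiment in this post this would be called as `write_data('data.txt', 10_000_000)`.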
2. Read the data as txt, then convert it and save copies in npy and csv format.
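```python
import time

import numpy as np
import pandas as pd

start = time.time()
with open('data.txt', 'r') as data:
    string_data = data.read()
list_data = string_data.split(',')
list_data.pop()  # drop the empty string left by the trailing comma
# Note: the timer stops here, before the strings are converted to floats
end = time.time()

# Convert to float64 and reshape the 10,000,000 values into a 10000 x 1000 array
data_array = np.array(list_data, dtype=float).reshape(10000, 1000)
print('### 10 million points of data ###')
print('\nData summary:\n', data_array)
print('\nData shape:\n', data_array.shape)
print(f'\nTime to read: {round(end - start, 5)} seconds.')

# Save copies in NumPy's binary .npy format and as CSV
np.save('data.npy', data_array)
df = pd.DataFrame(data_array)
df.to_csv('data.csv', index=False)
```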
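In step 2 the txt timer stops right after `split(',')`, so the reported time excludes converting ten million strings to floats, while `pd.read_csv` in step 3 includes parsing. A fairer txt figure might stop the clock only after conversion. The sketch below assumes the same comma-terminated format; `read_txt_timed` is a hypothetical helper, and it uses `time.perf_counter` (a higher-resolution clock meant for measuring intervals) in place of `time.time`.

```python
import time

import numpy as np


def read_txt_timed(path, shape=None):
    """Read a comma-separated txt file and return (array, seconds).

    Hypothetical helper: the clock stops only after the strings are parsed
    into float64, so the result is comparable to timing pd.read_csv.
    """
    start = time.perf_counter()
    with open(path) as f:
        parts = f.read().split(',')
    if parts and parts[-1] == '':
        parts.pop()  # drop the empty field after the trailing comma
    arr = np.array(parts, dtype=float)  # string -> float64 conversion is timed
    if shape is not None:
        arr = arr.reshape(shape)
    elapsed = time.perf_counter() - start
    return arr, elapsed
```

For the data in this post the call would be `read_txt_timed('data.txt', shape=(10000, 1000))`.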
3. Time reading the npy and csv files.
```python
import time

import numpy as np
import pandas as pd

# Time loading the binary .npy file
start_npy = time.time()
data = np.load('data.npy')
end_npy = time.time()

# Time parsing the CSV file (the header row written by to_csv is read as column names)
start_csv = time.time()
data_array = pd.read_csv('data.csv')
end_csv = time.time()

print(f'time to read npy: {round(end_npy - start_npy, 5)} seconds')
print(f'time to read csv: {round(end_csv - start_csv, 5)} seconds')
```
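Each format above is timed only once, so the numbers can be skewed by other processes or by whether the file is already in the OS page cache. One common remedy is to repeat each read and keep the fastest run. `best_of` below is a hypothetical helper sketching that idea with `time.perf_counter`; the standard library's `timeit.repeat(fn, number=1, repeat=5)` would serve the same purpose.

```python
import time


def best_of(fn, repeats=5):
    """Call fn() `repeats` times and return the fastest run in seconds.

    Hypothetical helper: the minimum filters out noise from other work on the
    machine. After the first call the file is usually served from the OS page
    cache, so this measures warm-cache read speed.
    """
    if repeats < 1:
        raise ValueError('repeats must be >= 1')
    best = float('inf')
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best
```

Usage with the files from this post would look like `best_of(lambda: np.load('data.npy'))` and `best_of(lambda: pd.read_csv('data.csv'))`.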
The results show read times ranking csv > txt > npy (slowest to fastest).
This differs somewhat from what users abroad reported.