Notes to myself so I don't forget these.
Contents:
- Reading a CSV
- Filtering
- Sorting
1. Read a CSV
import pandas as pd
file = pd.read_csv(r'D:\projects\PycharmProjects\final_wangwei\final_news_all.csv', usecols=['entity_id', 'post_title','publish_year','publish_month'])
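A small sketch of what usecols does, using an in-memory CSV instead of the real file. The sample rows and the extra author column are made up for illustration.

```python
import io
import pandas as pd

# Hypothetical CSV text standing in for final_news_all.csv.
csv_text = (
    "entity_id,post_title,author,publish_year,publish_month\n"
    "1,Hello,a,2018,5\n"
    "2,World,b,2019,1\n"
)

# usecols keeps only the listed columns; 'author' is never loaded.
sample = pd.read_csv(
    io.StringIO(csv_text),
    usecols=['entity_id', 'post_title', 'publish_year', 'publish_month'],
)
print(sample.columns.tolist())
```

Loading only the needed columns saves memory on wide files.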
2. Filter rows by column values
news=file[((file['publish_year']==2018) & (file['publish_month']>4))|((file['publish_year']==2019) & (file['publish_month']<5))]
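The same filter (May 2018 through April 2019) can be easier to read when the two conditions are named. A minimal sketch on toy data; the values are invented.

```python
import pandas as pd

# Toy data standing in for the real CSV (values are made up).
toy = pd.DataFrame({
    'entity_id': [1, 2, 3, 4, 5],
    'publish_year': [2018, 2018, 2019, 2019, 2017],
    'publish_month': [3, 6, 2, 8, 12],
})

# Each comparison yields a boolean Series; & and | combine them element-wise.
# The parentheses are required because & and | bind tighter than == and >.
in_2018 = (toy['publish_year'] == 2018) & (toy['publish_month'] > 4)
in_2019 = (toy['publish_year'] == 2019) & (toy['publish_month'] < 5)
selected = toy[in_2018 | in_2019]

print(selected['entity_id'].tolist())  # [2, 3]
```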
3. Sort by a column (ascending)
news=news.sort_values('publish_month',ascending=True)
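Since the filtered data spans two years, sorting by month alone interleaves 2018 and 2019 rows. If chronological order is what's wanted, one option is to sort by year first and then month. A sketch on toy data:

```python
import pandas as pd

toy = pd.DataFrame({
    'publish_year': [2019, 2018, 2018, 2019],
    'publish_month': [1, 12, 5, 4],
})

# A list of columns sorts by the first, then breaks ties with the second.
chronological = toy.sort_values(['publish_year', 'publish_month'], ascending=True)
print(chronological.values.tolist())
# [[2018, 5], [2018, 12], [2019, 1], [2019, 4]]
```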
4. Extract a column from a pandas.core.frame.DataFrame and convert it to a list
news['entity_id'].values.tolist()
5. Count occurrences of each value in a column
news['publish_month'].value_counts()
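value_counts() returns the counts sorted from most to least frequent. To read them in month order instead, sort_index() can be chained on. A sketch with invented months:

```python
import pandas as pd

months = pd.Series([5, 6, 6, 12, 6, 5], name='publish_month')

counts = months.value_counts()   # ordered by count, largest first
by_month = counts.sort_index()   # reordered by the month value itself
print(by_month.to_dict())        # {5: 2, 6: 3, 12: 1}
```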
6. Read txt files and concat them
df_empty = pd.DataFrame(columns=['doc'])
data1=pd.read_csv('linshi/5079161.txt',names=["doc"])
# data2, data3, data4 are read the same way, each from its own txt file
df=pd.concat([df_empty,data1,data2,data3,data4],axis=0) # vertical: stack rows
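With many files, a loop avoids numbering the variables by hand. A minimal sketch that uses in-memory text in place of real files (the contents are hypothetical). Note that read_csv splits on commas, so this only suits lines without commas.

```python
import io
import pandas as pd

# In-memory stand-ins for the txt files (made-up contents).
texts = ["first line\nsecond line\n", "third line\n"]

# One single-column frame per "file"; names=['doc'] means no header row is expected.
frames = [pd.read_csv(io.StringIO(t), names=['doc']) for t in texts]

# ignore_index=True renumbers the rows 0..n-1 instead of repeating each file's index.
df = pd.concat(frames, axis=0, ignore_index=True)
print(df['doc'].tolist())  # ['first line', 'second line', 'third line']
```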
7. Get the current time
import time
print(time.strftime('%Y-%m-%d %H:%M:%S',time.localtime(time.time())))
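An equivalent using the datetime module, which some find shorter. The variable name now_text is only for illustration.

```python
from datetime import datetime

# datetime.now() returns local time; strftime uses the same format codes as time.strftime.
now_text = datetime.now().strftime('%Y-%m-%d %H:%M:%S')
print(now_text)
```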
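8. Create a DataFrame from lists (one column per list)
# DataFrame.from_items was removed in pandas 1.0; a dict of lists gives the same columns in the same order
sdp=pd.DataFrame({'months':months,'shoucangs':shoucangs,'dianzans':dianzans,'pingluns':pingluns})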
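If the data arrives one record at a time rather than as separate lists, the frame can be built row by row by passing a list of tuples plus the column names. A sketch with invented numbers:

```python
import pandas as pd

# Each tuple is one row: (month, shoucang count, dianzan count). Values are made up.
rows = [(1, 10, 3), (2, 25, 7)]

by_rows = pd.DataFrame(rows, columns=['months', 'shoucangs', 'dianzans'])
print(by_rows.shape)  # (2, 3)
```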