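Hi everyone, I'm back. Here is some code I worked hard on, shared with all of you. pandas has a lot of methods, but once you remember the main ones, most problems become easy to solve.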
import numpy as np
import pandas as pd

# 1. Use the dictionary data and the list labels to complete the following tasks.
data = {
    'animal': ['cat', 'cat', 'snake', 'dog', 'dog', 'cat', 'snake', 'cat', 'dog', 'dog'],
    'age': [2.5, 3, 0.5, np.nan, 5, 2, 4.5, np.nan, 7, 3],
    'visits': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
    'priority': ['yes', np.nan, 'no', 'yes', 'no', 'no', 'no', 'yes', 'no', 'no'],
}
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']

# (1) Create the DataFrame df so that it matches the figure in the exercise.
df = pd.DataFrame(data, index=labels)
# The missing priority in row 'b' is np.nan, not the string 'NaN', so
# replace('NaN', 'yes') changes nothing; fillna('yes') or a direct assignment works.
df.loc['b', 'priority'] = 'yes'
print(df)

# (2) Print the first three rows of df, then select every row whose visits value is greater than 2.
print(df.head(3))
print(df[df['visits'] > 2])

# (3) Print the rows that contain missing values, then print the 'age' and 'animal' columns.
print(df[df.isnull().any(axis=1)])
print(df[['age', 'animal']])

# (4) Print all rows where animal == 'cat' and age < 3, then set the 'age' value in row 'f' to 1.5.
print(df[(df['animal'] == 'cat') & (df['age'] < 3)])
df.loc['f', 'age'] = 1.5

# (5) Count how many times each value occurs in the animal column.
print(df['animal'].value_counts())

# (6) Replace every 'snake' in the animal column with 'tangyudi'.
df['animal'] = df['animal'].replace('snake', 'tangyudi')

# (7) Sort df by the animal column.
print(df.sort_values(by='animal'))
print(df.sort_index(axis=1))  # sorts the columns by their labels instead

# (8) Append a column named 'No.' holding 0, 1, ..., 9 after the last column.
df['No.'] = pd.Series(range(10), index=df.index)

# (9) Compute the mean, product, and sum of the 'visits' column.
visits_mean = df['visits'].mean()
visits_prod = df['visits'].prod()  # cumprod() and cumsum() would return every running result instead
visits_sum = df['visits'].sum()
print(visits_mean)
print(visits_prod)
print(visits_sum)

# (10) Convert every string in the animal column to upper case.
df['animal'] = df['animal'].str.upper()  # str.capitalize() would only upper-case the first letter
print(df)

# (11) Create a shallow copy df2 of df and fill all of its missing values with 3.
# copy() is deep by default, and fillna() returns a new object rather than editing in place.
df2 = df.copy(deep=False).fillna(value=3)
print(df2)

# (12) Create a shallow copy df3 of df and drop the rows that contain missing values.
df3 = df.copy(deep=False).dropna(how='any')
print(df3)

# (13) Write df to the file animal.csv.
df.to_csv('animal.csv', encoding='utf_8_sig')

# 2. Read the file "haberman-kmes.dat" into a DataFrame named df and
#    perform the following operations.
# My attempt at loading the file (tab-delimited, no header row):
# import csv
# df = pd.read_csv('haberman-kmes.dat', header=None, encoding='utf-8',
#                  delimiter='\t', quoting=csv.QUOTE_NONE)
# print(df)
# (1) In the "Class" column, replace "negative" and "positive" with the numbers
#     0 and 1, and count how often 0 and 1 each occur.
# (2) Create df2, a copy of df that contains every column except the last one.
# (3) Normalize each column of df2, i.e.
#         x' = (x - x_min) / (x_max - x_min)
#     where x is any value in the column, and x_min and x_max are the column's
#     minimum and maximum.
# (4) Compute the Euclidean distance between every pair of rows (samples or
#     observations) of df2, and collect the results in a new array df3.
# (5) Sort the values in every row of df3 in ascending order.

# 3. Count how many times each word occurs in the text below, and draw a pie
#    chart of the five most frequent words.
import re
from collections import Counter

text = '''Hooray! It's snowing! It's time to make a snowman.James runs out. He
makes a big pile of snow. He puts a big snowball on top. He adds a
scarf and a hat. He adds an orange for the nose. He adds coal for the
eyes and buttons.In the evening, James opens the door. What does he
see? The snowman is moving! James invites him in. The snowman has
never been inside a house. He says hello to the cat. He plays with
paper towels.A moment later, the snowman takes James's hand and
goes out.They go up, up, up into the air ! They are flying ! What a
wonderful night!The next morning, James jumps out of bed. He runs
to the door.He wants to thank the snowman. But he's gone.'''

# Several sentences have no space after the punctuation ("snowman.James"), so
# deleting the punctuation would glue two words together. Extract runs of
# letters and apostrophes instead, and lower-case first so "He" and "he" match.
words = re.findall(r"[a-z']+", text.lower())
print(words)

word_counts = Counter(words)
for word, count in word_counts.most_common():
    print(word, 'occurrences:', count)
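Exercise 2 above is only stated, not solved, so here is a minimal sketch of how its five steps could look. Since haberman-kmes.dat is not included here, the small DataFrame below is a made-up stand-in: the column names 'Age', 'Year', 'Positive' and all the values are assumptions for illustration, and only the "Class" column with its "negative"/"positive" labels comes from the exercise text. The pairwise distances use plain NumPy broadcasting, which is one of several ways to do it.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for haberman-kmes.dat: three numeric features plus a
# "Class" label. Column names and values are invented for illustration only.
df = pd.DataFrame({
    'Age': [30, 34, 52, 61, 45],
    'Year': [64, 59, 66, 62, 60],
    'Positive': [1, 0, 4, 22, 9],
    'Class': ['negative', 'negative', 'positive', 'positive', 'negative'],
})

# (1) Encode the labels as 0/1 and count how often each one occurs.
df['Class'] = df['Class'].map({'negative': 0, 'positive': 1})
class_counts = df['Class'].value_counts()

# (2) Copy every column except the last one.
df2 = df.iloc[:, :-1].copy()

# (3) Min-max normalization, applied column by column:
#     x' = (x - x_min) / (x_max - x_min)
df2 = (df2 - df2.min()) / (df2.max() - df2.min())

# (4) Pairwise Euclidean distances between rows. Broadcasting builds an
#     (n, n, n_features) array of differences, which is then reduced per pair.
values = df2.to_numpy()
diff = values[:, np.newaxis, :] - values[np.newaxis, :, :]
df3 = np.sqrt((diff ** 2).sum(axis=2))

# (5) Sort the values within each row in ascending order.
df3_sorted = np.sort(df3, axis=1)

print(class_counts)
print(df2)
print(df3_sorted)
```

One caveat with step (3): a column whose values are all identical has x_max - x_min equal to 0, so the division yields NaN for that column and it would need separate handling.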
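Exercise 3 also asks for a pie chart of the five most frequent words, which the code above does not draw. The following is a rough sketch of that step using matplotlib, which is my choice and not something the exercise prescribes. To keep it short it uses only the opening sentences of the text, and the output file name top_words.png is a hypothetical name picked for this example.

```python
import re
from collections import Counter

import matplotlib
matplotlib.use('Agg')  # non-interactive backend, so this also runs without a display
import matplotlib.pyplot as plt

# Opening sentences of the exercise text, used here as a short sample.
sample = '''Hooray! It's snowing! It's time to make a snowman.James runs out. He
makes a big pile of snow. He puts a big snowball on top. He adds a
scarf and a hat. He adds an orange for the nose. He adds coal for the
eyes and buttons.'''

# Lower-case the text and extract runs of letters and apostrophes, so that
# "snowman.James" splits into two words and "He"/"he" are counted together.
words = re.findall(r"[a-z']+", sample.lower())
word_counts = Counter(words)

# The five most frequent words and their counts.
top5 = word_counts.most_common(5)
pie_labels = [word for word, _ in top5]
pie_sizes = [count for _, count in top5]

# Draw the pie chart and save it; 'top_words.png' is a hypothetical file name.
fig, ax = plt.subplots()
ax.pie(pie_sizes, labels=pie_labels, autopct='%1.1f%%')
ax.set_title('Five most frequent words')
fig.savefig('top_words.png')
plt.close(fig)

print(top5)
```

When working interactively, the `matplotlib.use('Agg')` line can be dropped and `fig.savefig(...)` replaced with `plt.show()`. Running the same steps on the full text only requires swapping `sample` for the complete string.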