題目描述:
6.) Proving Afzal Wrong
We have detoured from the original aim of this question for long enough. Compare the popularity of dance music genres and pop music genres across the dataset using appropiate visualisation/s. Make the assumption that the popularity of a genre is defined by the average popularity column entry across all songs in the appropriate genres.
Hint/s
- Dance sub-genres can be considered: edm, dance pop, trap music, big room, brostep
- Pop sub-genres can be considered anything with
pop
in the name
原始數據集:
本題我卡在瞭如何找出Genre列含有’pop‘字段的行,如行0、行2、行3等。然後解決這個題還涉及一些python的常識性小tips,就記錄一下。
搜索了衆多python函數後,我還是沒有找到可以一鍵替換的函數,看來只能遍歷了。
import re #正則表達式的包
import pandas as pd
songData['dance_or_pop'] = '' # 新建一列來存儲音樂類型
songData['dance_or_pop'].loc[(songData['Genre'] == 'edm')|(songData['Genre'] == 'brostep')|(songData['Genre'] == 'dance pop')|(songData['Genre'] == 'trap music')|(songData['Genre'] == 'big room')] = 'dance' # 根據題目給舞曲型賦值
pop_song = re.compile('.*pop.*') # 定義正則表達式,即任何含pop的字段
for i in songData['Genre']._stat_axis.values : # 根據行號遍歷dataframe
item = songData.loc[i,'Genre'] # python中如何找到某特定單元格的內容
if (re.match(pop_song, item) != None): # python中函數返回爲空是等於None
songData.loc[i, 'dance_or_pop'] = 'pop'
songData
完成後是這個樣子: