题目描述:
6.) Proving Afzal Wrong
We have detoured from the original aim of this question for long enough. Compare the popularity of dance music genres and pop music genres across the dataset using appropiate visualisation/s. Make the assumption that the popularity of a genre is defined by the average popularity column entry across all songs in the appropriate genres.
Hint/s
- Dance sub-genres can be considered: edm, dance pop, trap music, big room, brostep
- Pop sub-genres can be considered anything with
pop
in the name
原始数据集:
本题我卡在了如何找出Genre列含有’pop‘字段的行,如行0、行2、行3等。然后解决这个题还涉及一些python的常识性小tips,就记录一下。
搜索了众多python函数后,我还是没有找到可以一键替换的函数,看来只能遍历了。
import re #正则表达式的包
import pandas as pd
songData['dance_or_pop'] = '' # 新建一列来存储音乐类型
songData['dance_or_pop'].loc[(songData['Genre'] == 'edm')|(songData['Genre'] == 'brostep')|(songData['Genre'] == 'dance pop')|(songData['Genre'] == 'trap music')|(songData['Genre'] == 'big room')] = 'dance' # 根据题目给舞曲型赋值
pop_song = re.compile('.*pop.*') # 定义正则表达式,即任何含pop的字段
for i in songData['Genre']._stat_axis.values : # 根据行号遍历dataframe
item = songData.loc[i,'Genre'] # python中如何找到某特定单元格的内容
if (re.match(pop_song, item) != None): # python中函数返回为空是等于None
songData.loc[i, 'dance_or_pop'] = 'pop'
songData
完成后是这个样子: