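While cleaning data over the past few days I ran into a problem: the Year, Month, and Day in the raw data are stored in separate columns, and I need to merge them into a single column.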
import pandas as pd
import numpy as np
df_a = pd.DataFrame([[2019, 4, 1], [2019, 10, 13]], columns=['Year', 'Month', 'Day'])
The snippet above builds a small DataFrame for testing.
The processing code follows.
# Handle the month first: zero-pad single-digit months so every YearMonth has six characters
df_m_s = df_a[df_a['Month'] < 10].copy()
df_m_b = df_a[df_a['Month'] >= 10].copy()
df_m_s['YearMonth'] = df_m_s['Year'].astype(str) + ('0' + df_m_s['Month'].astype(str))
df_m_b['YearMonth'] = df_m_b['Year'].astype(str) + df_m_b['Month'].astype(str)
df_m = pd.concat([df_m_s, df_m_b])
if 'Day' in df_a.columns:
    # Then handle the day: zero-pad single-digit days
    df_d_s = df_m[df_m['Day'] < 10].copy()
    df_d_b = df_m[df_m['Day'] >= 10].copy()  # >= so that rows with Day == 10 are not dropped
    df_d_s['YearMonthDay'] = df_d_s['YearMonth'].astype(str) + ('0' + df_d_s['Day'].astype(str))
    df_d_b['YearMonthDay'] = df_d_b['YearMonth'].astype(str) + df_d_b['Day'].astype(str)
    df_d = pd.concat([df_d_s, df_d_b])
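As an aside, the split-and-concat steps can probably be avoided by zero-padding the strings directly. The sketch below is not the approach used in this post. It assumes the columns hold clean integers, and the helper name `add_date_keys` and the variables `df_z` and `dates` are made up for illustration. It uses `Series.str.zfill` for the padding and, as a second option, `pd.to_datetime`, which can assemble dates from `year`/`month`/`day` columns.

```python
import pandas as pd

df_a = pd.DataFrame([[2019, 4, 1], [2019, 10, 13]], columns=['Year', 'Month', 'Day'])

def add_date_keys(df):
    """Hypothetical helper: return a copy of df with zero-padded string keys."""
    out = df.copy()
    # zfill(2) left-pads with '0' up to width 2, so 4 -> '04' and 10 -> '10'
    out['YearMonth'] = out['Year'].astype(str) + out['Month'].astype(str).str.zfill(2)
    if 'Day' in out.columns:
        out['YearMonthDay'] = out['YearMonth'] + out['Day'].astype(str).str.zfill(2)
    return out

df_z = add_date_keys(df_a)
print(df_z['YearMonthDay'].tolist())  # ['20190401', '20191013']

# Alternative: build real datetimes first, then format them as strings.
# The columns are renamed to lowercase year/month/day before assembling.
dates = pd.to_datetime(df_a.rename(columns=str.lower)[['year', 'month', 'day']])
print(dates.dt.strftime('%Y%m%d').tolist())  # ['20190401', '20191013']
```

Because nothing is filtered and re-concatenated here, the original row order is kept and no boundary value such as 10 can fall between the two branches.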
Result: