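User Purchase Behavior Analysis (Repurchase Rate and Repeat-Purchase Rate)
The business requirements are as follows:
1-Count the number of users who placed orders in each month
2-Calculate the repurchase rate and the repeat-purchase rate of users in March
3-Check whether purchase frequency differs between men and women
4-For users with multiple purchases, calculate the interval between their first and last purchase
5-Check whether spending differs across age groups
6-Check the 80/20 rule: how much the top 20% of users by spending contribute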
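1. Understanding the table structure
(1) User information table (user_info)
The user information table has three fields: user ID (userid), gender (sex), and date of birth (birthday).
(2) Order information table (order_info)
The order information table consists of user ID (userid), order ID (orderid), payment status (ispaid), price (price), and payment date (paidtime).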
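To make the two tables concrete, here is a minimal sketch that creates them in an in-memory SQLite database through Python's sqlite3 module, so it runs without a MySQL server. The column names are the ones used by the queries in this article; the column types are assumptions, since the article only lists the fields.

```python
import sqlite3

# Column names come from the queries in this article; the types are assumptions.
SCHEMA = """
CREATE TABLE user_info (
    userid   INTEGER PRIMARY KEY,
    sex      TEXT,   -- may be NULL
    birthday TEXT    -- ISO date string such as '1990-05-01'
);
CREATE TABLE order_info (
    orderid  INTEGER PRIMARY KEY,
    userid   INTEGER,
    ispaid   TEXT,   -- '已支付' means paid
    price    REAL,
    paidtime TEXT    -- ISO datetime string such as '2016-03-01 10:00:00'
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# List the columns of each table to confirm the layout.
user_columns = [row[1] for row in conn.execute("PRAGMA table_info(user_info)")]
order_columns = [row[1] for row in conn.execute("PRAGMA table_info(order_info)")]
print(user_columns)
print(order_columns)
```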
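2. Solving the business questions
1-Count the number of users who placed orders in each month
Approach: (1) The month can be extracted with MONTH() or DATE_FORMAT(). Because I want to know both the year and the month, I use DATE_FORMAT() directly. (2) "Users who placed orders" has two parts: "placed orders" restricts the rows to ispaid='已支付' (paid), and "number of users" means counting distinct users, so COUNT(DISTINCT ...) is applied to userid rather than orderid. Reference answer:
SELECT DATE_FORMAT(paidtime,'%Y-%m') AS order_month,COUNT(DISTINCT userid) AS buyers
FROM order_info
WHERE ispaid='已支付'
GROUP BY DATE_FORMAT(paidtime,'%Y-%m');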
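Counting distinct orders and counting distinct users give different numbers as soon as one user orders more than once in a month. The following minimal sketch shows both counts side by side. It assumes SQLite (via Python's sqlite3) instead of MySQL, so strftime() stands in for DATE_FORMAT(); the sample rows and the unpaid status value '未支付' are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE order_info "
    "(orderid INTEGER, userid INTEGER, ispaid TEXT, price REAL, paidtime TEXT)"
)
# Hypothetical sample rows.
conn.executemany("INSERT INTO order_info VALUES (?, ?, ?, ?, ?)", [
    (1, 101, "已支付", 10.0, "2016-03-01 10:00:00"),
    (2, 101, "已支付", 20.0, "2016-03-15 12:00:00"),  # same user, second order
    (3, 102, "已支付", 30.0, "2016-03-20 09:00:00"),
    (4, 102, "未支付", 40.0, "2016-03-21 09:00:00"),  # unpaid, filtered out
    (5, 101, "已支付", 50.0, "2016-04-02 08:00:00"),
])

rows = conn.execute("""
    SELECT strftime('%Y-%m', paidtime) AS order_month,
           COUNT(DISTINCT userid)  AS buyers,
           COUNT(DISTINCT orderid) AS orders
    FROM order_info
    WHERE ispaid = '已支付'
    GROUP BY order_month
    ORDER BY order_month
""").fetchall()

for order_month, buyers, orders in rows:
    print(order_month, buyers, orders)
```

In March the sample has three paid orders but only two buyers, which is the gap between the two counts.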
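2-Calculate the repurchase rate and the repeat-purchase rate of users in March
(1) Repurchase rate of users in March
The repurchase rate is the share of users who bought in March and bought again in April.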
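# Version 1: every month compared with the following month
SELECT DATE_FORMAT(a.paidtime,'%Y-%m') AS order_month,COUNT(DISTINCT a.userid) AS buyers_this_month,
COUNT(DISTINCT b.userid) AS buyers_returning_next_month
FROM
(SELECT userid,paidtime
FROM order_info
WHERE ispaid='已支付'
GROUP BY userid,paidtime) AS a
LEFT JOIN
(SELECT userid,paidtime
FROM order_info
WHERE ispaid='已支付'
GROUP BY userid,paidtime) AS b
# compare year and month together so that December is matched with the following January
ON a.userid = b.userid AND DATE_FORMAT(b.paidtime,'%Y-%m') = DATE_FORMAT(DATE_ADD(a.paidtime,INTERVAL 1 MONTH),'%Y-%m')
GROUP BY DATE_FORMAT(a.paidtime,'%Y-%m');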
# Version 2: March and April only
# the year is fixed to 2016, as in the repeat-purchase query below
SELECT COUNT(a.userid) AS march_buyers,COUNT(b.userid) AS april_returning_buyers
FROM
(SELECT userid
FROM order_info
WHERE DATE_FORMAT(paidtime,'%Y-%m')='2016-03' AND ispaid='已支付'
GROUP BY userid) AS a
LEFT JOIN
(SELECT userid
FROM order_info
WHERE DATE_FORMAT(paidtime,'%Y-%m')='2016-04' AND ispaid='已支付'
GROUP BY userid) AS b ON a.userid=b.userid;
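The query returns two counts; the repurchase rate itself is the second divided by the first. A minimal sketch of the whole calculation follows, assuming SQLite (via Python's sqlite3) instead of MySQL, with strftime() in place of DATE_FORMAT() and hypothetical sample rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE order_info "
    "(orderid INTEGER, userid INTEGER, ispaid TEXT, price REAL, paidtime TEXT)"
)
paid = "已支付"
# Hypothetical sample rows.
conn.executemany("INSERT INTO order_info VALUES (?, ?, ?, ?, ?)", [
    (1, 101, paid, 10.0, "2016-03-01 10:00:00"),
    (2, 102, paid, 10.0, "2016-03-05 10:00:00"),
    (3, 103, paid, 10.0, "2016-03-09 10:00:00"),
    (4, 101, paid, 10.0, "2016-04-02 10:00:00"),  # 101 comes back in April
    (5, 103, paid, 10.0, "2016-04-03 10:00:00"),  # 103 comes back in April
    (6, 104, paid, 10.0, "2016-04-04 10:00:00"),  # April only, not a March buyer
])

march_buyers, april_returning = conn.execute("""
    SELECT COUNT(a.userid), COUNT(b.userid)
    FROM (SELECT DISTINCT userid FROM order_info
          WHERE strftime('%Y-%m', paidtime) = '2016-03' AND ispaid = '已支付') AS a
    LEFT JOIN
         (SELECT DISTINCT userid FROM order_info
          WHERE strftime('%Y-%m', paidtime) = '2016-04' AND ispaid = '已支付') AS b
      ON a.userid = b.userid
""").fetchone()

# Share of March buyers who bought again in April.
repurchase_rate = april_returning / march_buyers
print(march_buyers, april_returning, round(repurchase_rate, 4))
```

The LEFT JOIN keeps every March buyer, and COUNT(b.userid) skips the NULLs of those who did not return, so user 104 never enters either count.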
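(2) Repeat-purchase rate of users in March
The repeat-purchase rate is the share of users who bought two or more times within the month.
Approach: first count the users who bought in March, with 0 as a placeholder for the second column; then count the users who bought more than once in March, with 0 as a placeholder for the first column; finally combine the two rows with UNION ALL. The per-user filter (GROUP BY userid with HAVING) goes into a subquery so that the outer COUNT returns a single number of users rather than one row per user.
SELECT COUNT(DISTINCT userid) AS march_buyers,0 AS march_repeat_buyers
FROM order_info
WHERE DATE_FORMAT(paidtime,'%Y-%m')='2016-03' AND ispaid='已支付'
UNION ALL
SELECT 0 AS march_buyers,COUNT(*) AS march_repeat_buyers
FROM
(SELECT userid
FROM order_info
WHERE DATE_FORMAT(paidtime,'%Y-%m')='2016-03' AND ispaid='已支付'
GROUP BY userid
HAVING COUNT(userid)>1) AS t;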
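The same two numbers can also be obtained in a single row with conditional aggregation over per-user order counts. This is a different technique from the UNION ALL approach above, shown only as a sketch. It assumes SQLite (via Python's sqlite3) with strftime() in place of DATE_FORMAT(); the sample rows and the unpaid status value '未支付' are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE order_info "
    "(orderid INTEGER, userid INTEGER, ispaid TEXT, price REAL, paidtime TEXT)"
)
paid = "已支付"
# Hypothetical sample rows.
conn.executemany("INSERT INTO order_info VALUES (?, ?, ?, ?, ?)", [
    (1, 101, paid, 10.0, "2016-03-01 10:00:00"),
    (2, 101, paid, 10.0, "2016-03-10 10:00:00"),      # 101: two paid orders in March
    (3, 102, paid, 10.0, "2016-03-05 10:00:00"),      # 102: one paid order in March
    (4, 103, paid, 10.0, "2016-03-06 10:00:00"),
    (5, 103, paid, 10.0, "2016-03-07 10:00:00"),      # 103: two paid orders in March
    (6, 103, "未支付", 10.0, "2016-03-08 10:00:00"),  # unpaid, filtered out
    (7, 104, paid, 10.0, "2016-04-01 10:00:00"),      # April only
    (8, 102, paid, 10.0, "2016-04-02 10:00:00"),      # April order does not count
])

buyers, repeat_buyers = conn.execute("""
    SELECT COUNT(*) AS buyers,
           SUM(CASE WHEN ct > 1 THEN 1 ELSE 0 END) AS repeat_buyers
    FROM (SELECT userid, COUNT(*) AS ct
          FROM order_info
          WHERE strftime('%Y-%m', paidtime) = '2016-03' AND ispaid = '已支付'
          GROUP BY userid) AS per_user
""").fetchone()

# Share of March buyers with two or more paid orders in March.
repeat_rate = repeat_buyers / buyers
print(buyers, repeat_buyers, round(repeat_rate, 4))
```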
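3-Check whether purchase frequency differs between men and women
SELECT a.sex,COUNT(b.userid) AS buyers,SUM(b.ct) AS total_orders,AVG(b.ct) AS avg_orders_per_buyer
FROM
(SELECT userid,sex
FROM user_info) AS a
INNER JOIN
# ct is the number of paid orders per user
(SELECT userid,COUNT(orderid) AS ct
FROM order_info
WHERE ispaid='已支付'
GROUP BY userid) AS b ON a.userid = b.userid
WHERE a.sex IS NOT NULL
GROUP BY a.sex;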
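Purchase frequency is a per-user quantity, so it helps to look at the number of buyers, the total number of paid orders, and the average number of orders per buyer for each gender. A minimal sketch under stated assumptions: SQLite via Python's sqlite3 instead of MySQL, hypothetical sample rows, and '男' / '女' as assumed values of the sex column (the article does not show the actual values).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user_info (userid INTEGER, sex TEXT, birthday TEXT);
CREATE TABLE order_info (orderid INTEGER, userid INTEGER, ispaid TEXT,
                         price REAL, paidtime TEXT);
""")
# Hypothetical users; the sex values are assumptions.
conn.executemany("INSERT INTO user_info VALUES (?, ?, ?)", [
    (101, "男", "1990-01-01"),
    (102, "女", "1992-02-02"),
    (103, None, "1985-03-03"),  # unknown gender, excluded
    (104, "女", "1995-04-04"),
])
paid = "已支付"
conn.executemany("INSERT INTO order_info VALUES (?, ?, ?, ?, ?)", [
    (1, 101, paid, 10.0, "2016-03-01 10:00:00"),
    (2, 101, paid, 10.0, "2016-03-02 10:00:00"),
    (3, 101, paid, 10.0, "2016-03-03 10:00:00"),
    (4, 102, paid, 10.0, "2016-03-04 10:00:00"),
    (5, 104, paid, 10.0, "2016-03-05 10:00:00"),
    (6, 104, paid, 10.0, "2016-03-06 10:00:00"),
    (7, 103, paid, 10.0, "2016-03-07 10:00:00"),
])

rows = conn.execute("""
    SELECT u.sex, COUNT(*) AS buyers, SUM(o.ct) AS total_orders, AVG(o.ct) AS avg_orders
    FROM user_info AS u
    JOIN (SELECT userid, COUNT(*) AS ct
          FROM order_info
          WHERE ispaid = '已支付'
          GROUP BY userid) AS o
      ON u.userid = o.userid
    WHERE u.sex IS NOT NULL
    GROUP BY u.sex
""").fetchall()

# Map each gender to (buyers, total paid orders, average orders per buyer).
by_sex = {sex: (buyers, total, avg) for sex, buyers, total, avg in rows}
print(by_sex)
```

In this sample both groups place three orders in total, yet the averages differ (3.0 against 1.5), which is why a count of buyers alone does not answer the question.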
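4-For users with multiple purchases, calculate the interval between their first and last purchase
SELECT a.userid,DATEDIFF(a.max_paidtime,a.min_paidtime) AS days_between
FROM
(SELECT userid,COUNT(orderid) AS ct,MAX(paidtime) AS max_paidtime,MIN(paidtime) AS min_paidtime
FROM order_info
WHERE ispaid='已支付'
GROUP BY userid) AS a
WHERE a.ct>=2;
The filter has to be applied to a.ct itself. The subquery already returns exactly one row per user, so grouping by a.userid again and testing HAVING COUNT(a.ct)>=2 counts one row per group, is never true, and returns an empty result regardless of the data set.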
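The difference between filtering on the per-user order count and re-counting rows in an outer GROUP BY can be checked on a few rows. The sketch below assumes SQLite (via Python's sqlite3) instead of MySQL; since SQLite has no DATEDIFF(), the day difference is computed from julianday() on the date parts, which mirrors how DATEDIFF() ignores the time of day. The sample rows and the unpaid status value '未支付' are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE order_info "
    "(orderid INTEGER, userid INTEGER, ispaid TEXT, price REAL, paidtime TEXT)"
)
paid = "已支付"
# Hypothetical sample rows.
conn.executemany("INSERT INTO order_info VALUES (?, ?, ?, ?, ?)", [
    (1, 101, paid, 10.0, "2016-03-01 10:00:00"),
    (2, 101, paid, 10.0, "2016-03-15 12:00:00"),      # 14 days after the first order
    (3, 102, paid, 10.0, "2016-03-05 10:00:00"),      # single paid order
    (4, 102, "未支付", 10.0, "2016-03-20 10:00:00"),  # unpaid, does not count
    (5, 103, paid, 10.0, "2016-03-10 23:00:00"),
    (6, 103, paid, 10.0, "2016-03-11 01:00:00"),      # next calendar day
])

# Filter on the number of paid orders per user.
gaps = dict(conn.execute("""
    SELECT userid,
           CAST(julianday(date(MAX(paidtime))) - julianday(date(MIN(paidtime))) AS INTEGER)
    FROM order_info
    WHERE ispaid = '已支付'
    GROUP BY userid
    HAVING COUNT(orderid) >= 2
""").fetchall())

# Re-grouping the one-row-per-user subquery: every group has exactly one row.
regrouped = conn.execute("""
    SELECT a.userid
    FROM (SELECT userid, COUNT(orderid) AS ct
          FROM order_info
          WHERE ispaid = '已支付'
          GROUP BY userid) AS a
    GROUP BY a.userid
    HAVING COUNT(a.ct) >= 2
""").fetchall()

print(gaps)
print(regrouped)
```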
5-Check whether spending differs across age groups
Approach: the age groups can be defined according to the actual business requirements; here I split users arbitrarily into three groups.
SELECT b.age AS age_group,SUM(a.price) AS total_spend,AVG(a.price) AS avg_spend_per_user
FROM
(SELECT userid,SUM(price) AS price
FROM order_info
WHERE ispaid='已支付'
GROUP BY userid) AS a
INNER JOIN
# TIMESTAMPDIFF counts completed years, so a user turns 18 on the birthday rather than on January 1
(SELECT userid,
CASE WHEN TIMESTAMPDIFF(YEAR,birthday,CURDATE()) BETWEEN 0 AND 17 THEN '0-17'
WHEN TIMESTAMPDIFF(YEAR,birthday,CURDATE()) BETWEEN 18 AND 29 THEN '18-29'
ELSE '30+' END AS age
FROM user_info
WHERE birthday>='1901-01-01') AS b ON a.userid = b.userid
GROUP BY b.age
ORDER BY b.age;
In the query result, users aged 18 to 29 have the highest total spending, while users aged 0 to 17 have the lowest.
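Which group a user lands in depends on how age is derived from the birthday. Subtracting calendar years alone treats a user as one year older before this year's birthday has passed, which can move users near a boundary into the next group. A minimal sketch of the two calculations follows; the function names, the group labels, and the reporting date are hypothetical.

```python
from datetime import date


def naive_age(birthday, today):
    """Year subtraction only, as in YEAR(now()) - YEAR(birthday)."""
    return today.year - birthday.year


def exact_age(birthday, today):
    """Completed years: subtract one if this year's birthday has not happened yet."""
    before_birthday = (today.month, today.day) < (birthday.month, birthday.day)
    return today.year - birthday.year - int(before_birthday)


def age_group(age):
    """Three illustrative groups with a half-open upper bound at 18 and 30."""
    if 0 <= age < 18:
        return "0-17"
    if 18 <= age < 30:
        return "18-29"
    return "30+"


today = date(2016, 6, 1)        # hypothetical reporting date
birthday = date(1998, 12, 31)   # birthday later in the year than the reporting date

naive_group = age_group(naive_age(birthday, today))
exact_group = age_group(exact_age(birthday, today))
print(naive_group, exact_group)
```

For this user the year subtraction gives 18 while the completed-years calculation gives 17, so the two methods assign different groups.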
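6-Check the 80/20 rule: how much the top 20% of users by spending contribute
Approach: I use a window function here (window functions require MySQL 8.0 or later). NTILE() splits the ordered rows into buckets; since 20% is one fifth, the users are split into five buckets ordered by spending in descending order, and the top 20% are the first bucket.
SELECT SUM(price) AS top20_spend
FROM
(SELECT userid,SUM(price) AS price,
NTILE(5) OVER(ORDER BY SUM(price) DESC) AS rank_
FROM order_info
WHERE ispaid='已支付'
GROUP BY userid) AS a
WHERE a.rank_=1;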
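To judge the 80/20 rule, the amount spent by the first bucket is more informative as a share of total spending. A minimal sketch of that extra step, assuming SQLite 3.25 or later (needed for window functions) via Python's sqlite3 instead of MySQL, with hypothetical sample rows of one paid order per user:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE order_info "
    "(orderid INTEGER, userid INTEGER, ispaid TEXT, price REAL, paidtime TEXT)"
)
# Hypothetical spending of ten users, one paid order each (total 2000).
prices = [1000.0, 500.0, 100.0, 90.0, 80.0, 70.0, 60.0, 50.0, 30.0, 20.0]
conn.executemany(
    "INSERT INTO order_info VALUES (?, ?, ?, ?, ?)",
    [(i, 100 + i, "已支付", price, "2016-03-01 10:00:00")
     for i, price in enumerate(prices, start=1)],
)

top20_spend, all_spend = conn.execute("""
    SELECT SUM(CASE WHEN bucket = 1 THEN total ELSE 0 END) AS top20_spend,
           SUM(total) AS all_spend
    FROM (SELECT userid, total,
                 NTILE(5) OVER (ORDER BY total DESC) AS bucket
          FROM (SELECT userid, SUM(price) AS total
                FROM order_info
                WHERE ispaid = '已支付'
                GROUP BY userid) AS per_user) AS ranked
""").fetchone()

# Share of total spending contributed by the top fifth of users.
top20_share = top20_spend / all_spend
print(top20_spend, all_spend, top20_share)
```

With ten users, NTILE(5) puts two users in each bucket, so the first bucket holds the two highest spenders.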
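Summary: SQL problems can usually be solved in more than one way. My solutions may contain omissions or mistakes; if you spot any, please point them out so that we can all learn together.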