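Top 10 videos by view count
- Approach: use order by to sort all rows globally on the views column in descending order, then limit the output to the first 10 rows.
SQL: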
select videoid,uploader,age,category,length,views,rate,ratings,comments
from guliyingyin_video_orc
order by views desc
limit 10;
- Result:
![Query result: Top 10 videos by view count](https://img-blog.csdnimg.cn/20200222225830179.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L2xpYW5naGVjYWk1MjE3MTMxNA==,size_16,color_FFFFFF,t_70)
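As a small illustration of the sort-then-limit idea, here is a Python sketch on made-up rows. The sample data and the helper name `top_by_views` are assumptions for illustration only; they are not part of the original data set or of Hive.

```python
import heapq

# Hypothetical sample rows: (videoid, views). Not taken from the real table.
videos = [("v1", 120), ("v2", 980), ("v3", 45), ("v4", 560)]


def top_by_views(rows, n):
    """Return the n rows with the largest views, highest first
    (the same result as ORDER BY views DESC LIMIT n)."""
    return heapq.nlargest(n, rows, key=lambda row: row[1])


print(top_by_views(videos, 2))  # [('v2', 980), ('v4', 560)]
```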
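Top 10 video categories by popularity
Approach: explode the array-typed category column, group by category, and sort by popularity (the number of videos in each category).
In the current table, each video belongs to one or more categories. To group by category, the category array must first be expanded so that each category gets its own row; after that a simple count is enough.
- Analysis:
- Explode the categories; call the result temporary table t1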
select videoId,catName
from guliyingyin_video_orc
lateral view explode(category) tb_category as catName; -- t1
- Group by category and count the videoId values in each group; call the result temporary table t2
select catName,count(*) hot
from t1
group by catName; -- t2
- Finally, sort by popularity and show the first 10 rows.
select catName, hot
from t2
order by hot desc
limit 10;
- Complete SQL:
SELECT catName, hot
FROM (
SELECT catName, count(*) hot
FROM (
SELECT videoId, catName
FROM guliyingyin_video_orc
lateral VIEW explode ( category ) tb_category AS catName
) t1
GROUP BY catName
) t2
ORDER BY hot DESC
LIMIT 10;
Save the complete SQL statement to the file guliyingying.sql and upload it to the Linux host. Then run the command:
Result:
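To make the explode, group and count steps of this section concrete, the following Python sketch walks through them on made-up rows. The sample data and the helper names `explode` and `category_hotness` are assumptions for illustration; the real work is done by Hive's lateral view explode and group by.

```python
from collections import Counter

# Hypothetical sample rows: (videoId, list of categories).
videos = [
    ("v1", ["Music", "Comedy"]),
    ("v2", ["Music"]),
    ("v3", ["Sports", "Comedy"]),
    ("v4", ["Music"]),
]


def explode(rows):
    """Emit one (videoId, catName) pair per category, like t1."""
    return [(video_id, cat) for video_id, cats in rows for cat in cats]


def category_hotness(rows, n):
    """Count videos per category (t2) and keep the n largest counts."""
    counts = Counter(cat for _, cat in explode(rows))
    return counts.most_common(n)


print(category_hotness(videos, 2))  # [('Music', 3), ('Comedy', 2)]
```

A video with two categories contributes one row to each of them, which is why the counts can add up to more than the number of videos.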
Categories of the Top 20 videos by view count
- Approach:
- First find the 20 most viewed videos; call the result temporary table t1
select videoid,views,category
from guliyingyin_video_orc
order by views desc
limit 20; -- t1
- Split out the category values of these 20 rows (one row per category); call the result temporary table t2
select videoid,catName
from t1
lateral view explode(category) tb_category as catName; -- t2
- Remove duplicates
select distinct catName
from t2;
- Complete SQL:
SELECT DISTINCT
catName
FROM
(
SELECT
videoid,
catName
FROM
( SELECT videoid, views, category FROM guliyingyin_video_orc ORDER BY views DESC LIMIT 20 ) t1 lateral VIEW explode ( category ) tb2_category AS catName
) t2;
Save the complete SQL statement to the file guliyingying.sql and upload it to the Linux host. Then run the command:
Result:
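The order of the steps matters in this section: the limit is applied before the categories are exploded, so exactly 20 videos feed into the category list. A minimal Python sketch on made-up rows shows the same order of operations (the sample data and the helper name `categories_of_top` are assumptions for illustration):

```python
# Hypothetical sample rows: (videoid, views, list of categories).
videos = [
    ("v1", 900, ["Music", "Comedy"]),
    ("v2", 100, ["Sports"]),
    ("v3", 700, ["Music"]),
    ("v4", 50, ["News"]),
]


def categories_of_top(rows, n):
    """Take the n most viewed videos first (t1), then explode their
    categories (t2) and remove duplicates."""
    top = sorted(rows, key=lambda r: r[1], reverse=True)[:n]
    distinct = {cat for _, _, cats in top for cat in cats}
    return sorted(distinct)  # sorted only to make the output stable


print(categories_of_top(videos, 2))  # ['Comedy', 'Music']
```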
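Category rank of the videos related to the Top 50 videos by view count
Approach:
- Query the 50 most viewed videos with all the columns needed (including each video's related videos); call the result temporary table t1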
select videoId, views, category, relatedId
from guliyingyin_video_orc
order by views desc
limit 50; -- t1
- Explode the related video ids: expand the relatedId array of these 50 videos into one row per id; call the result temporary table t2
select distinct videoId_name
from t1
lateral view explode(relatedId) tb_relatedId as videoId_name; -- t2
- Inner join the related video ids with the guliyingyin_video_orc table to get the full record of each related video; call the result temporary table t4
select *
from t2
inner join guliyingyin_video_orc t3
on t2.videoId_name=t3.videoId; -- t4
- Explode the categories of the related videos
select *
from t4
lateral view explode(category) tb_category as catName; -- t5
- Count the videos in each category
select catName, count(*) hot
from t5
group by catName; -- t6
- Rank the categories by popularity
select *
from t6
order by hot desc;
- Complete SQL:
SELECT
*
FROM
(
SELECT
catName,
count( * ) hot
FROM
(
SELECT
*
FROM
(
SELECT
*
FROM
(
SELECT DISTINCT
videoId_name
FROM
( SELECT videoId, views, category, relatedId FROM guliyingyin_video_orc ORDER BY views DESC LIMIT 50 ) t1 lateral VIEW explode ( relatedId ) tb_relatedId AS videoId_name
) t2
INNER JOIN guliyingyin_video_orc t3 ON t2.videoId_name = t3.videoId
) t4 lateral VIEW explode ( category ) tb_category AS catName
) t5
GROUP BY
catName
) t6
ORDER BY
hot DESC;
Save the complete SQL statement to the file guliyingying.sql and upload it to the Linux host. Then run the command:
Result:
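This section chains the most steps, so a compact Python sketch of the same pipeline on made-up rows may help. The sample data and the helper name `related_category_rank` are assumptions for illustration. Note how a related id that does not exist in the video table simply disappears, which is the effect of the inner join.

```python
from collections import Counter

# Hypothetical table: videoId -> (views, categories, relatedId list).
videos = {
    "v1": (900, ["Music"], ["v3", "v4"]),
    "v2": (800, ["Comedy"], ["v3", "v9"]),  # v9 is not in the table
    "v3": (10, ["Sports", "Music"], []),
    "v4": (5, ["Sports"], ["v1"]),
}


def related_category_rank(table, n):
    """Rank the categories of the videos related to the n most viewed videos."""
    # t1: the n most viewed videos
    top = sorted(table.items(), key=lambda kv: kv[1][0], reverse=True)[:n]
    # t2: explode relatedId and keep distinct ids
    related_ids = {rid for _, (_, _, related) in top for rid in related}
    counts = Counter()
    for rid in related_ids:
        if rid in table:  # t4: inner join drops ids without a matching row
            counts.update(table[rid][1])  # t5/t6: explode categories, count
    return counts.most_common()  # ordered by count, highest first


print(related_category_rank(videos, 2))  # [('Sports', 2), ('Music', 1)]
```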
Top 10 videos by popularity within each category, using Music as an example
Approach: to find the Top 10 videos by popularity in the Music category, the rows of that category must be selected first, which requires exploding the category array.
- Create an intermediate table to hold the data with the category array exploded into categoryId
create table guliyingyin_category_orc(
videoId string,
uploader string,
age int,
categoryId string,
length int,
views int,
rate float,
ratings int,
comments int,
relatedId array<string>
)
row format delimited fields terminated by "\t"
collection items terminated by "&"
stored as orc;
- Insert data into the category-expanded table.
insert overwrite table guliyingyin_category_orc
select videoid,uploader,age,categoryId,length,views,rate,ratings,comments,relatedId
from guliyingyin_video_orc
lateral view explode(category) tb_category as categoryId;
- Rank the videos of the chosen category (Music) by popularity.
select videoId,views
from guliyingyin_category_orc
where categoryId = "Music"
order by views desc
limit 10;
Result:
Top 10 videos by traffic within each category, using Music as an example
select videoId,ratings
from guliyingyin_category_orc
where categoryId = "Music"
order by ratings desc
limit 10;
Result:
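The last two sections run the same query shape against the category-expanded table and differ only in the column they sort on (views for popularity, ratings for traffic). The Python sketch below shows that shared shape on made-up rows; the sample data, the `COLUMNS` mapping and the helper name `top_in_category` are assumptions for illustration.

```python
# Hypothetical exploded rows: (videoId, categoryId, views, ratings).
rows = [
    ("v1", "Music", 900, 12),
    ("v1", "Comedy", 900, 12),
    ("v2", "Music", 300, 40),
    ("v3", "Sports", 700, 5),
    ("v4", "Music", 500, 8),
]

# Position of each sortable column inside a row tuple.
COLUMNS = {"views": 2, "ratings": 3}


def top_in_category(rows, category, column, n):
    """Filter on categoryId, sort descending on the chosen column, keep n rows."""
    idx = COLUMNS[column]
    matching = [r for r in rows if r[1] == category]
    ranked = sorted(matching, key=lambda r: r[idx], reverse=True)[:n]
    return [(r[0], r[idx]) for r in ranked]


print(top_in_category(rows, "Music", "views", 2))    # [('v1', 900), ('v4', 500)]
print(top_in_category(rows, "Music", "ratings", 2))  # [('v2', 40), ('v1', 12)]
```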
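Top 10 users with the most uploads, and the Top 20 most viewed videos among their uploads
- Analysis
- Find the Top 10 users with the most uploaded videos; call the result table t1
select uploader from guliyingyin_user_orc order by videos desc limit 10; -- t1
- Find the 20 most viewed videos uploaded by these users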
select t1.uploader,t2.videoId,views from t1 inner join guliyingyin_video_orc t2 on t1.uploader= t2.uploader order by views desc limit 20;
- Complete SQL statement:
SELECT
t1.uploader,
t2.videoId,
views
FROM
( SELECT uploader FROM guliyingyin_user_orc ORDER BY videos DESC LIMIT 10 ) t1
INNER JOIN guliyingyin_video_orc t2 ON t1.uploader = t2.uploader
ORDER BY
views DESC
LIMIT 20;
- Result
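As the query in this section is written, the final limit applies to the joined result as a whole, so it returns the 20 most viewed videos across all ten uploaders combined rather than 20 per uploader. The Python sketch below reproduces that behavior on made-up rows; the sample data and the helper name `top_videos_of_top_uploaders` are assumptions for illustration.

```python
# Hypothetical user rows: (uploader, number of uploaded videos).
users = [("alice", 30), ("bob", 12), ("carol", 25)]

# Hypothetical video rows: (videoId, uploader, views).
videos = [
    ("v1", "alice", 100),
    ("v2", "bob", 999),
    ("v3", "carol", 400),
    ("v4", "alice", 250),
]


def top_videos_of_top_uploaders(users, videos, n_users, n_videos):
    """Join the n_users most active uploaders with their videos and
    keep the n_videos most viewed rows of the joined result."""
    ranked_users = sorted(users, key=lambda u: u[1], reverse=True)[:n_users]
    top_uploaders = {name for name, _ in ranked_users}  # t1
    joined = [
        (uploader, video_id, views)
        for video_id, uploader, views in videos
        if uploader in top_uploaders  # inner join on uploader
    ]
    return sorted(joined, key=lambda r: r[2], reverse=True)[:n_videos]


print(top_videos_of_top_uploaders(users, videos, 2, 2))
# [('carol', 'v3', 400), ('alice', 'v4', 250)]
```

In the sample, bob's video has the most views overall but is excluded because bob is not among the two most active uploaders.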
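Top 10 videos by view count in each category
- Approach:
- Start from the intermediate table in which categoryId has already been expanded (guliyingyin_category_orc)
- In a subquery, partition by categoryId, sort within each partition by views in descending order, and generate an increasing row number; name that column rank234
select videoid,categoryid,views, row_number() over(partition by categoryid order by views desc) rank234 from guliyingyin_category_orc;
- From the temporary table produced by the subquery (t1), select the rows whose rank234 value is less than or equal to 10
select * from t1 where rank234 <= 10;
- Complete SQL statement:
SELECT * FROM ( SELECT videoid, categoryid, views, row_number ( ) over ( PARTITION BY categoryid ORDER BY views DESC ) rank234 FROM guliyingyin_category_orc ) t1 WHERE rank234 <= 10;
- Result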
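The partition, sort and number pattern behind row_number() can be sketched in a few lines of Python on made-up rows. The sample data and the helper name `top_n_per_category` are assumptions for illustration; Hive performs the equivalent work inside the window function.

```python
from itertools import groupby

# Hypothetical exploded rows: (videoid, categoryid, views).
rows = [
    ("v1", "Music", 900),
    ("v2", "Music", 300),
    ("v3", "Sports", 700),
    ("v4", "Music", 500),
    ("v5", "Sports", 100),
]


def top_n_per_category(rows, n):
    """Number the rows of each category by descending views, starting at 1,
    and keep only the rows whose number is at most n."""
    # Sort by category, then by views descending inside each category.
    ordered = sorted(rows, key=lambda r: (r[1], -r[2]))
    result = []
    for _, group in groupby(ordered, key=lambda r: r[1]):
        for row_number, (video_id, cat, views) in enumerate(group, start=1):
            if row_number <= n:
                result.append((video_id, cat, views, row_number))
    return result


for line in top_n_per_category(rows, 2):
    print(line)
```

With n set to 2, the third Music video (v2) is dropped while both Sports videos are kept, because the numbering restarts in every category.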