Guliyingyin (Guli Video): Business Queries

Top 10 videos by view count

  • Approach: do a global sort on the views field with ORDER BY, then keep only the first 10 rows.
    SQL:
select videoid,uploader,age,category,length,views,rate,ratings,comments
from guliyingyin_video_orc
order by views desc
limit 10;
![Screenshot](https://img-blog.csdnimg.cn/20200222225830179.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L2xpYW5naGVjYWk1MjE3MTMxNA==,size_16,color_FFFFFF,t_70)
  • Result:
    [Screenshot: query result]
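A minimal plain-Python sketch of what this query computes, run on a few hypothetical in-memory rows (the toy data and the function name `top_n_by_views` are illustrative assumptions, not part of the project). It shows that `ORDER BY ... LIMIT` is a full sort followed by taking the first rows; in Hive, a global ORDER BY sends all rows through a single reducer to produce that total order.

```python
from typing import Dict, List

# Hypothetical toy rows standing in for guliyingyin_video_orc
# (only the two columns this query touches).
VIDEOS: List[Dict] = [
    {"videoId": "v1", "views": 500},
    {"videoId": "v2", "views": 1200},
    {"videoId": "v3", "views": 90},
    {"videoId": "v4", "views": 1200},
]


def top_n_by_views(rows: List[Dict], n: int = 10) -> List[Dict]:
    """ORDER BY views DESC LIMIT n: sort every row, then keep the first n."""
    return sorted(rows, key=lambda r: r["views"], reverse=True)[:n]


if __name__ == "__main__":
    for row in top_n_by_views(VIDEOS, n=3):
        print(row["videoId"], row["views"])
```

Python's sort is stable, so the two rows tied at 1200 views keep their input order; Hive makes no such promise for ties, so tied videos may come back in either order.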

Top 10 video categories by popularity

Approach: explode the array field "video category", group by category, and finally sort by popularity (the number of videos).

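This is needed because, in the current table structure, one video belongs to one or more categories. To GROUP BY category, the category array must first be turned from one column into multiple rows (exploded); after that, a count is all that is left.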

  • Analysis:
  1. Explode the categories; call the result temporary table t1

    select videoId,catName
    from guliyingyin_video_orc
    lateral view explode(category) tb_category as catName; -- t1

  2. GROUP BY category and count the videoIds in each group; call the result temporary table t2

    select catName,count(*) hot
    from t1
    group by catName; -- t2

  3. Finally, sort by popularity and show the top 10.

    select catName, hot
    from t2
    order by hot desc
    limit 10;

  • Complete SQL:
    SELECT
    	catName,
    	hot 
    FROM
    	(
    	SELECT
    		catName,
    		count( * ) hot 
    	FROM
    		( SELECT videoId, catName FROM guliyingyin_video_orc lateral VIEW explode ( category ) tb_category AS catName ) t1 
    	GROUP BY
    		catName 
    	) t2 
    ORDER BY
    	hot DESC 
    	LIMIT 10;
    

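Save the complete SQL statement to a file named guliyingying.sql, upload it to the Linux machine, and run it with the following command:
[Screenshot: command]
Result:
[Screenshot: query result]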
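The explode, group and sort steps above can be mimicked in plain Python to see what each temporary table holds. This is a sketch on hypothetical rows; `explode_categories` and `category_hot_top_n` are illustrative names, and `Counter.most_common` stands in for GROUP BY plus ORDER BY hot DESC LIMIT.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical toy rows: each video carries an array of categories,
# like the category column of guliyingyin_video_orc.
VIDEOS: List[Dict] = [
    {"videoId": "v1", "category": ["Music"]},
    {"videoId": "v2", "category": ["Comedy", "Entertainment"]},
    {"videoId": "v3", "category": ["Music", "Entertainment"]},
    {"videoId": "v4", "category": ["Music"]},
]


def explode_categories(rows: List[Dict]) -> List[Tuple[str, str]]:
    """LATERAL VIEW explode(category): one (videoId, catName) pair per element (t1)."""
    return [(r["videoId"], cat) for r in rows for cat in r["category"]]


def category_hot_top_n(rows: List[Dict], n: int = 10) -> List[Tuple[str, int]]:
    """GROUP BY catName with count(*) (t2), then ORDER BY hot DESC LIMIT n."""
    hot = Counter(cat for _, cat in explode_categories(rows))
    return hot.most_common(n)


if __name__ == "__main__":
    print(category_hot_top_n(VIDEOS))
```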

Categories of the top 20 videos by view count

  • Approach:
  1. First find the 20 most-viewed videos; call this temporary table t1

    select videoid,views,category
    from guliyingyin_video_orc
    order by views desc
    limit 20; ----t1

  2. Split the category arrays of these 20 rows into one row per category (explode); call this temporary table t2

    select videoid,catName
    from t1
    lateral view explode(category) tb_category as catName; -- t2

  3. Remove duplicate categories

    select distinct catName
    from t2;

  • Complete SQL
SELECT DISTINCT
	catName 
FROM
	(
	SELECT
		videoid,
		catName 
	FROM
	( SELECT videoid, views, category FROM guliyingyin_video_orc ORDER BY views DESC LIMIT 20 ) t1 lateral VIEW explode ( category ) tb2_category AS catName 
	) t2;

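Save the complete SQL statement to a file named guliyingying.sql, upload it to the Linux machine, and run it with the following command:
[Screenshot: command]
Result:
[Screenshot: query result]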
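A plain-Python sketch of the same three steps, on hypothetical rows (`categories_of_top_n` is an illustrative name). The detail it highlights is the order of operations: the 20-row limit is applied to whole videos before exploding, so a video with several categories still counts only once toward the 20.

```python
from typing import Dict, List, Set

# Hypothetical toy rows standing in for guliyingyin_video_orc.
VIDEOS: List[Dict] = [
    {"videoId": "v1", "views": 900, "category": ["Music"]},
    {"videoId": "v2", "views": 50, "category": ["Sports"]},
    {"videoId": "v3", "views": 700, "category": ["Music", "Comedy"]},
]


def categories_of_top_n(rows: List[Dict], n: int = 20) -> Set[str]:
    # t1: ORDER BY views DESC LIMIT n, applied to whole videos.
    top = sorted(rows, key=lambda r: r["views"], reverse=True)[:n]
    # t2: LATERAL VIEW explode(category), one row per (video, category).
    exploded = [(r["videoId"], cat) for r in top for cat in r["category"]]
    # SELECT DISTINCT catName.
    return {cat for _, cat in exploded}


if __name__ == "__main__":
    print(sorted(categories_of_top_n(VIDEOS, n=2)))
```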

Category rank of the videos related to the top 50 videos by view count

Analysis of the approach:
[Figure: analysis diagram]
Steps:

  1. Query all information for the 50 most-viewed videos (including each video's related videos); call this temporary table t1

    select videoId, views, category, relatedId
    from guliyingyin_video_orc
    order by views desc
    limit 50; ----t1

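  2. Explode the related-video ids: turn the relatedId arrays of these 50 videos into one row per id, keeping each id only once (DISTINCT); call this temporary table t2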

    select distinct videoId_name
    from t1
    lateral view explode(relatedId) tb_relatedId as videoId_name; ----t2

  3. INNER JOIN the related-video ids with guliyingyin_video_orc (aliased t3) to get the full record of each related video; call this temporary table t4

    select *
    from t2
    inner join guliyingyin_video_orc t3
    on t2.videoId_name=t3.videoId; ---- t4

  4. Explode the categories of the related videos (t5)

    select *
    from t4
    lateral view explode(category) tb_category as catName; ----t5

  5. Count the rows in each category (t6)

    select catName, count(*) hot
    from t5
    group by catName; ----t6

  6. Sort the categories by popularity to obtain their rank

    select *
    from t6
    order by hot desc;

  • Complete SQL:
SELECT
	* 
FROM
	(
	SELECT
		catName,
		count( * ) hot 
	FROM
		(
		SELECT
			* 
		FROM
			(
			SELECT
				* 
			FROM
				(
				SELECT DISTINCT
					videoId_name 
				FROM
					( SELECT videoId, views, category, relatedId FROM guliyingyin_video_orc ORDER BY views DESC LIMIT 50 ) t1 lateral VIEW explode ( relatedId ) tb_relatedId AS videoId_name 
				) t2
				INNER JOIN guliyingyin_video_orc t3 ON t2.videoId_name = t3.videoId 
			) t4 lateral VIEW explode ( category ) tb_category AS catName 
		) t5 
	GROUP BY
		catName 
	) t6 
ORDER BY
	hot DESC;

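Save the complete SQL statement to a file named guliyingying.sql, upload it to the Linux machine, and run it with the following command:
[Screenshot: command]
Result:
[Screenshot: query result]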
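The six steps can be traced in plain Python on a tiny hypothetical dataset (`related_category_rank` and the toy rows are illustrative assumptions). The sketch keeps the SQL's order: take the top-N videos, collect distinct related ids, inner-join them back to the video table (ids with no matching row, like `x` below, are dropped), explode the categories, count, and sort.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical toy rows standing in for guliyingyin_video_orc.
VIDEOS: List[Dict] = [
    {"videoId": "a", "views": 1000, "category": ["Music"], "relatedId": ["b", "c"]},
    {"videoId": "b", "views": 10, "category": ["Music", "Comedy"], "relatedId": ["a"]},
    {"videoId": "c", "views": 800, "category": ["Comedy"], "relatedId": ["b", "x"]},
]


def related_category_rank(rows: List[Dict], n: int = 50) -> List[Tuple[str, int]]:
    # t1: the n most-viewed videos.
    top = sorted(rows, key=lambda r: r["views"], reverse=True)[:n]
    # t2: explode relatedId and keep DISTINCT ids.
    related_ids = {rid for r in top for rid in r["relatedId"]}
    # t4: INNER JOIN on videoId (assumes videoId is unique); unmatched ids vanish.
    by_id = {r["videoId"]: r for r in rows}
    joined = [by_id[rid] for rid in related_ids if rid in by_id]
    # t5 + t6: explode category, then count(*) per catName.
    hot = Counter(cat for r in joined for cat in r["category"])
    # ORDER BY hot DESC.
    return sorted(hot.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    print(related_category_rank(VIDEOS, n=2))
```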


Top 10 videos by popularity within each category, using Music as an example

Approach: to find the Top 10 videos in the Music category, we first need to be able to filter on Music, which means exploding the category array.

  1. Create an intermediate table to hold the data with the category array exploded into a single categoryId per row

    create table guliyingyin_category_orc(
    videoId string, 
    uploader string, 
    age int, 
    categoryId string, 
    length int, 
    views int, 
    rate float, 
    ratings int, 
    comments int,
    relatedId array<string>
    )
    row format delimited 
    fields terminated by "\t"
    collection items terminated by "&"
    stored as orc;
    


  2. Insert the exploded data into the new table.

    insert overwrite table guliyingyin_category_orc
    select  videoid,uploader,age,categoryId,length,views,rate,ratings,comments,relatedId
    from guliyingyin_video_orc
    lateral view explode(category) tb_category as categoryId;
    


  3. Query video popularity within the chosen category (Music).

    select  videoId,views
    from guliyingyin_category_orc
    where categoryId = "Music"
    order by views desc
    limit 10;
    

    Result:
    [Screenshot: query result]
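A plain-Python sketch of this two-stage approach, assuming hypothetical rows and illustrative function names: `explode_to_category_table` plays the role of the INSERT OVERWRITE ... LATERAL VIEW explode step (one copy of each video per category), and `top_in_category` is the WHERE / ORDER BY / LIMIT query. The `metric` parameter is an addition of this sketch so the same function can also sort by ratings.

```python
from typing import Dict, List, Tuple

# Hypothetical toy rows standing in for guliyingyin_video_orc.
VIDEOS: List[Dict] = [
    {"videoId": "v1", "views": 300, "ratings": 40, "category": ["Music"]},
    {"videoId": "v2", "views": 900, "ratings": 5, "category": ["Music", "Comedy"]},
    {"videoId": "v3", "views": 600, "ratings": 70, "category": ["Comedy"]},
]


def explode_to_category_table(rows: List[Dict]) -> List[Dict]:
    """One copy of each row per category, with the array replaced by categoryId."""
    table = []
    for r in rows:
        for cat in r["category"]:
            flat = {k: v for k, v in r.items() if k != "category"}
            flat["categoryId"] = cat
            table.append(flat)
    return table


def top_in_category(table: List[Dict], category: str,
                    metric: str = "views", n: int = 10) -> List[Tuple[str, int]]:
    """WHERE categoryId = category ORDER BY metric DESC LIMIT n."""
    matching = [r for r in table if r["categoryId"] == category]
    ranked = sorted(matching, key=lambda r: r[metric], reverse=True)[:n]
    return [(r["videoId"], r[metric]) for r in ranked]


if __name__ == "__main__":
    table = explode_to_category_table(VIDEOS)
    print(top_in_category(table, "Music"))
```

Calling `top_in_category(table, "Music", metric="ratings")` corresponds to the traffic query in the next section. Materializing the exploded table once, as the original does, turns every later per-category query into a plain filter instead of a repeated explode.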

Top 10 videos by traffic within each category, using Music as an example (traffic is measured here by the ratings column)

select  videoId,ratings
from guliyingyin_category_orc
where categoryId = "Music"
order by ratings desc
limit 10;

Result:
[Screenshot: query result]

Top 10 users by number of uploaded videos, and their videos in the top 20 by view count

  • Analysis
  1. Find the top 10 users by number of uploaded videos; call this table t1

    select uploader
    from guliyingyin_user_orc
    order by videos desc
    limit 10; -----t1
    
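  2. Join t1 with guliyingyin_video_orc on uploader and keep the 20 most-viewed of their videos. Note that LIMIT 20 applies to the ten uploaders combined, not to each uploader separately.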

    select t1.uploader,t2.videoId,views
    from t1 inner join guliyingyin_video_orc t2
    on t1.uploader= t2.uploader 
    order by views desc
    limit 20;
    
  • Complete SQL statement
SELECT
	t1.uploader,
	t2.videoId,
	views 
FROM
	( SELECT uploader FROM guliyingyin_user_orc ORDER BY videos DESC LIMIT 10 ) t1
	INNER JOIN guliyingyin_video_orc t2 ON t1.uploader = t2.uploader 
ORDER BY
	views DESC 
	LIMIT 20;
  • Result
    [Screenshot: query result]
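A plain-Python sketch of this join, on hypothetical users and videos (`top_uploaders_videos` is an illustrative name). It reproduces the query as written: the 20-row limit applies to all selected uploaders together, so one prolific uploader can fill most of the result, and an uploader outside the top 10 (carol below) is excluded even if one of her videos has far more views.

```python
from typing import Dict, List, Tuple

# Hypothetical toy rows standing in for guliyingyin_user_orc and guliyingyin_video_orc.
USERS: List[Dict] = [
    {"uploader": "alice", "videos": 30},
    {"uploader": "bob", "videos": 12},
    {"uploader": "carol", "videos": 3},
]
VIDEOS: List[Dict] = [
    {"videoId": "a1", "uploader": "alice", "views": 100},
    {"videoId": "a2", "uploader": "alice", "views": 700},
    {"videoId": "b1", "uploader": "bob", "views": 400},
    {"videoId": "c1", "uploader": "carol", "views": 9999},
]


def top_uploaders_videos(users: List[Dict], videos: List[Dict],
                         n_users: int = 10, n_videos: int = 20) -> List[Tuple[str, str, int]]:
    # t1: ORDER BY videos DESC LIMIT n_users.
    ranked_users = sorted(users, key=lambda u: u["videos"], reverse=True)[:n_users]
    top_users = {u["uploader"] for u in ranked_users}
    # INNER JOIN on uploader.
    joined = [v for v in videos if v["uploader"] in top_users]
    # ORDER BY views DESC LIMIT n_videos, across all joined rows at once.
    ranked = sorted(joined, key=lambda v: v["views"], reverse=True)[:n_videos]
    return [(v["uploader"], v["videoId"], v["views"]) for v in ranked]


if __name__ == "__main__":
    print(top_uploaders_videos(USERS, VIDEOS, n_users=2, n_videos=2))
```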

Top 10 videos by view count in each category

  • Approach:
  1. Start from the intermediate table with categoryId exploded (guliyingyin_category_orc)

  2. In a subquery, partition by categoryId, sort within each partition, and generate an increasing number with row_number(); name this column rank234

    select videoid,categoryid,views,
    row_number() over(partition by categoryid order by views desc) rank234
    from guliyingyin_category_orc;
    
  3. From the temporary table produced by the subquery, keep the rows whose rank234 is at most 10

    select * 
    from t1
    where rank234 <= 10;
    
  • Complete SQL statement:
    SELECT * 
    FROM
    	( SELECT videoid, categoryid, views, row_number ( ) over ( PARTITION BY categoryid ORDER BY views DESC ) rank234 FROM guliyingyin_category_orc ) t1 
    WHERE
    	rank234 <= 10;
    
  • Result
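A plain-Python sketch of `row_number() over (partition by categoryid order by views desc)` followed by the `rank234 <= 10` filter, on hypothetical rows of the exploded category table (`top_n_per_category` is an illustrative name). Like row_number(), it hands out 1, 2, 3, ... inside each partition even when views are tied; rank() would instead give tied rows the same number.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical rows of the exploded table guliyingyin_category_orc
# (one row per video per category).
ROWS: List[Dict] = [
    {"videoId": "v1", "categoryId": "Music", "views": 300},
    {"videoId": "v2", "categoryId": "Music", "views": 900},
    {"videoId": "v3", "categoryId": "Music", "views": 900},
    {"videoId": "v4", "categoryId": "Comedy", "views": 50},
]


def top_n_per_category(rows: List[Dict], n: int = 10) -> List[Tuple[str, str, int, int]]:
    # PARTITION BY categoryId: bucket the rows by category.
    partitions: Dict[str, List[Dict]] = defaultdict(list)
    for r in rows:
        partitions[r["categoryId"]].append(r)

    result = []
    for category, part in partitions.items():
        # ORDER BY views DESC inside each partition.
        part.sort(key=lambda r: r["views"], reverse=True)
        # row_number(): 1, 2, 3, ... with no shared numbers for ties;
        # slicing to n plays the role of WHERE rank234 <= n.
        for rn, r in enumerate(part[:n], start=1):
            result.append((r["videoId"], category, r["views"], rn))
    return result


if __name__ == "__main__":
    for row in top_n_per_category(ROWS, n=2):
        print(row)
```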