This post is the third and final part of my Hive project practice series, covering business analysis.

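1. Top 10 videos by view count

Approach: use `order by` to sort the whole table globally on the `views` field in descending order, and add `limit 10` so only the first 10 rows are shown.

Final code: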

```sql
select
    videoId,
    uploader,
    age,
    category,
    length,
    views,
    rate,
    ratings,
    comments
from
    video_orc
order by
    views desc
limit 10;
```
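Hive's `order by` guarantees a total order by sending every row through a single reducer, so on a large table the global sort is the expensive part of this query. To make the semantics of `order by views desc limit 10` concrete without a cluster, here is a small Python sketch on made-up rows (the data and the helper name `top_n_by_views` are hypothetical, not part of the project). It uses `heapq.nlargest`, which keeps only the n largest rows instead of sorting everything. Also note that Hive does not guarantee which of several rows with equal views comes first; add a tie-breaker column such as videoId to the sort if that matters.

```python
import heapq

# Hypothetical stand-in rows for video_orc (only the columns this sketch needs).
videos = [
    {"videoId": "a", "views": 120},
    {"videoId": "b", "views": 5400},
    {"videoId": "c", "views": 310},
    {"videoId": "d", "views": 5400},
    {"videoId": "e", "views": 75},
]


def top_n_by_views(rows, n):
    """Mimics: SELECT ... FROM video_orc ORDER BY views DESC LIMIT n."""
    # nlargest keeps at most n rows on a heap instead of sorting the whole list.
    return heapq.nlargest(n, rows, key=lambda r: r["views"])


top3 = top_n_by_views(videos, 3)
print([r["videoId"] for r in top3])
```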

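2. Top 10 video categories by popularity

Approach:
1. Count how many videos each category contains, and show the 10 categories with the most videos.
2. To do that, group by category and count the videoId values in each group.
3. In the current table one video belongs to one or more categories (category is an array column). So before grouping by category we must first explode the array (column-to-rows), turning each element into its own row, and only then count.
4. Finally, sort by popularity and show the first 10 rows.

Final code: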

```sql
select
    category_name as category,
    count(t1.videoId) as hot
from (
    select
        videoId,
        category_name
    from
        video_orc lateral view explode(category) t_category as category_name) t1
group by
    t1.category_name
order by
    hot desc
limit 10;
```
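The key step is the explode: `lateral view explode(category)` emits one row per array element and joins it back to the source row, so a video in two categories is counted once in each. Below is a minimal Python sketch of that pipeline on hypothetical rows; the helper names are illustrative only. As in Hive, a row with an empty array produces no output (Hive keeps such rows only if you write `lateral view outer`).

```python
from collections import Counter

# Hypothetical rows: category is an array, as in video_orc.
videos = [
    {"videoId": "a", "category": ["Music"]},
    {"videoId": "b", "category": ["Music", "Comedy"]},
    {"videoId": "c", "category": ["Music"]},
    {"videoId": "d", "category": ["Comedy"]},
    {"videoId": "e", "category": ["Sports"]},
]


def lateral_view_explode(rows, array_col, alias):
    """Yield one copy of each row per array element, like LATERAL VIEW explode(array_col) ... AS alias."""
    for row in rows:
        for item in row[array_col]:
            yield {**row, alias: item}


def category_heat(rows, n):
    """GROUP BY category_name, COUNT(videoId), ORDER BY hot DESC LIMIT n."""
    exploded = lateral_view_explode(rows, "category", "category_name")
    hot = Counter(r["category_name"] for r in exploded)
    return hot.most_common(n)


print(category_heat(videos, 10))  # [('Music', 3), ('Comedy', 2), ('Sports', 1)]
```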

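3. Categories of the 20 most-viewed videos, and how many of those Top 20 videos each category contains

Approach:
1. Fetch all fields of the 20 most-viewed videos, sorted by views in descending order.
2. Explode the category array of these 20 rows (column-to-rows).
3. Finally, return each category name together with the number of Top 20 videos that fall into it.

Final code: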

```sql
select
    category_name as category,
    count(t2.videoId) as hot_with_views
from (
    select
        videoId,
        category_name
    from (
        select
            *
        from
            video_orc
        order by
            views desc
        limit 20) t1 lateral view explode(category) t_category as category_name) t2
group by
    category_name
order by
    hot_with_views desc;
```
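The order of operations matters here: `limit 20` runs inside the innermost subquery, so categories are exploded and counted only for those 20 videos. The hedged Python sketch below replays this on hypothetical rows, shrunk to a Top 3 so the effect is visible: video d has low views, so its categories must not be counted. The function name is made up for illustration.

```python
import heapq
from collections import Counter

# Hypothetical rows; d has low views, so its categories must not be counted.
videos = [
    {"videoId": "a", "views": 900, "category": ["Music"]},
    {"videoId": "b", "views": 800, "category": ["Music", "Comedy"]},
    {"videoId": "c", "views": 700, "category": ["Music"]},
    {"videoId": "d", "views": 10, "category": ["Sports", "Comedy"]},
]


def categories_of_top_videos(rows, n):
    # t1: ORDER BY views DESC LIMIT n, applied before the explode
    top = heapq.nlargest(n, rows, key=lambda r: r["views"])
    # t2: LATERAL VIEW explode(category), one value per (video, category)
    exploded = (cat for r in top for cat in r["category"])
    # GROUP BY category_name, COUNT(videoId), ORDER BY hot_with_views DESC
    return Counter(exploded).most_common()


print(categories_of_top_videos(videos, 3))  # [('Music', 3), ('Comedy', 1)]
```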

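4. Rank the categories of the videos related to the Top 50 most-viewed videos

Approach:
1. Query all fields of the 50 most-viewed videos (these include each video's related videos in relatedId); call this temporary table t1.
2. Explode the relatedId array of these 50 rows (column-to-rows); call this temporary table t2.
3. Inner join the related video IDs with the video_orc table to get each related video's categories.
4. Group by category, count the videos in each group, and rank the groups.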

1. The 50 most-viewed videos

```sql
select
    *
from
    video_orc
order by
    views desc
limit 50;
```

2. Explode the related video IDs (column-to-rows)

```sql
select
    explode(relatedId) as videoId
from
    t1;
```

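3. Join the related video IDs back to video_orc to get two columns, the related video ID and its category array, then explode the categories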

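```sql
-- t2 is the exploded relatedId result from step 2
select
    t4.videoId,
    category_name
from (
    select distinct
        t2.videoId,
        t3.category
    from
        t2
    inner join
        video_orc t3 on t2.videoId = t3.videoId) t4
lateral view explode(t4.category) t_category as category_name;
```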

4. Group by category, count the videos in each group, then rank the groups

Final code:

```sql
select
    category_name as category,
    count(t5.videoId) as hot
from (
    select
        videoId,
        category_name
    from (
        select distinct
            t2.videoId,
            t3.category
        from (
            select
                explode(relatedId) as videoId
            from (
                select
                    *
                from
                    video_orc
                order by
                    views desc
                limit 50) t1) t2
        inner join
            video_orc t3 on t2.videoId = t3.videoId) t4 lateral view explode(category) t_category as category_name) t5
group by
    category_name
order by
    hot desc;
```
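Read from the inside out, the final query is a five-stage pipeline: Top 50 (t1), explode relatedId (t2), inner join back to video_orc with distinct (t4), explode category (t5), then group and count. The sketch below replays those stages in Python on a tiny hypothetical table, using a Top 2 instead of a Top 50; the dictionary layout and the function name are assumptions made for illustration. Two details of the SQL become explicit here: the inner join drops related IDs that do not exist in video_orc (the made-up ID "x"), and the distinct makes a video that is related to several Top videos count only once.

```python
import heapq
from collections import Counter

# Hypothetical stand-in for video_orc, keyed by videoId.
video_orc = {
    "a": {"views": 900, "category": ["Music"], "relatedId": ["b", "c", "x"]},
    "b": {"views": 50, "category": ["Music", "Comedy"], "relatedId": ["a"]},
    "c": {"views": 40, "category": ["Comedy"], "relatedId": ["a", "b"]},
    "d": {"views": 800, "category": ["Sports"], "relatedId": ["b", "c"]},
}


def related_category_rank(table, n):
    # t1: the n most-viewed videos
    top = heapq.nlargest(n, table.values(), key=lambda row: row["views"])
    # t2: explode relatedId into one videoId per row
    related = [rid for row in top for rid in row["relatedId"]]
    # t4: inner join with video_orc (unknown IDs such as "x" vanish) + DISTINCT
    joined = {rid for rid in related if rid in table}
    # t5: explode each related video's categories
    categories = (cat for rid in joined for cat in table[rid]["category"])
    # GROUP BY category_name, COUNT(videoId), ORDER BY hot DESC
    return Counter(categories).most_common()


print(related_category_rank(video_orc, 2))  # [('Comedy', 2), ('Music', 1)]
```

Without the distinct step, b and c would each be counted twice (they are related to both a and d), doubling every count.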

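5. Top 10 videos by popularity within each category, using Music as an example

Approach:
1. To find the Top 10 videos in the Music category, we first have to isolate the Music rows, which means the category array must be exploded. So we create a table that stores the exploded data, one row per (video, categoryId).
2. Insert the exploded data into that table.
3. Query the popularity (views) of the videos in the target category (Music).

Final code:

1. Create the category table: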

```sql
create table gulivideo_category(
    videoId string,
    uploader string,
    age int,
    categoryId string,
    length int,
    views int,
    rate float,
    ratings int,
    comments int,
    relatedId array<string>)
row format delimited
fields terminated by "\t"
collection items terminated by "&"
stored as orc;
```

2. Insert data into the category table:

```sql
insert into table gulivideo_category
    select
        videoId,
        uploader,
        age,
        categoryId,
        length,
        views,
        rate,
        ratings,
        comments,
        relatedId
    from
        video_orc lateral view explode(category) t_category as categoryId;
```

3. Top 10 in the Music category (any other category works the same way)

```sql
select
    videoId,
    categoryId,
    views
from
    gulivideo_category
where
    categoryId = "Music"
order by
    views desc
limit 10;
```
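Materializing gulivideo_category trades storage for simpler queries: each video is copied once per category, after which a category's Top 10 is just a filter plus a sort. The Python sketch below mirrors that on hypothetical rows; the helper name and its `key` parameter are my own additions, the latter so the same helper can sort by a different column.

```python
import heapq

# Hypothetical stand-in for video_orc.
video_orc = [
    {"videoId": "a", "views": 900, "ratings": 12, "category": ["Music"]},
    {"videoId": "b", "views": 300, "ratings": 40, "category": ["Music", "Comedy"]},
    {"videoId": "c", "views": 700, "ratings": 5, "category": ["Comedy"]},
    {"videoId": "d", "views": 500, "ratings": 30, "category": ["Music"]},
]

# INSERT INTO gulivideo_category ... LATERAL VIEW explode(category) AS categoryId:
# one row per (video, category), with the array replaced by a single categoryId.
gulivideo_category = [
    {**{k: v for k, v in row.items() if k != "category"}, "categoryId": cat}
    for row in video_orc
    for cat in row["category"]
]


def top_in_category(rows, category_id, n, key="views"):
    """WHERE categoryId = category_id ORDER BY key DESC LIMIT n."""
    matches = (r for r in rows if r["categoryId"] == category_id)
    return heapq.nlargest(n, matches, key=lambda r: r[key])


print([r["videoId"] for r in top_in_category(gulivideo_category, "Music", 2)])  # ['a', 'd']
```

Calling `top_in_category(gulivideo_category, "Music", 10, key="ratings")` corresponds to the ratings-based query in section 6.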

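6. Top 10 videos by traffic within each category, using Music as an example

Approach:
1. Use the category-exploded table (categoryId turned from an array into one row per category), i.e. gulivideo_category from section 5.
2. Filter to Music and sort by ratings.

Final code: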

```sql
select
    videoId,
    views,
    ratings
from
    gulivideo_category
where
    categoryId = "Music"
order by
    ratings desc
limit 10;
```

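7. Top 10 videos by view count in every category

Approach:
1. Start from the categoryId-exploded table.
2. In a subquery, partition by categoryId, sort each partition by views in descending order, and generate an increasing number with row_number(); name this column rank.
3. From the temporary table produced by the subquery, keep the rows whose rank is at most 10.

Final code: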

```sql
select
    t1.*
from (
    select
        videoId,
        categoryId,
        views,
        row_number() over(partition by categoryId order by views desc) rank
    from
        gulivideo_category) t1
where
    rank <= 10;
```
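`row_number()` numbers rows 1, 2, 3, ... within each partition, so filtering on `rank <= 10` keeps at most 10 rows per category even when views tie; `rank()` or `dense_rank()` would instead give tied rows the same number and could return more than 10. Here is a Python sketch of the same partition, sort, and number pattern on hypothetical rows; the function name is illustrative.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical rows in the shape of gulivideo_category.
gulivideo_category = [
    {"videoId": "a", "categoryId": "Music", "views": 900},
    {"videoId": "b", "categoryId": "Music", "views": 300},
    {"videoId": "d", "categoryId": "Music", "views": 500},
    {"videoId": "b", "categoryId": "Comedy", "views": 300},
    {"videoId": "c", "categoryId": "Comedy", "views": 700},
]


def top_n_per_category(rows, n):
    # PARTITION BY categoryId ORDER BY views DESC: sort by partition key, then views descending
    ordered = sorted(rows, key=lambda r: (r["categoryId"], -r["views"]))
    result = []
    for _, partition in groupby(ordered, key=itemgetter("categoryId")):
        # row_number() restarts at 1 in every partition
        for rank, row in enumerate(partition, start=1):
            if rank > n:  # WHERE rank <= n
                break
            result.append({**row, "rank": rank})
    return result


for r in top_n_per_category(gulivideo_category, 2):
    print(r["categoryId"], r["videoId"], r["rank"])
```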

That covers everything in this hands-on project.

