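Example scenario

Analysis of student scores in Beijing

Score data format

Each line of exercise5_1.txt holds the score record of one student. Fields are separated by ",".

year,school,grade,name,course,score

Sample data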

2013,北大,1,黄渤,语文,97
2013,北大,1,徐峥,语文,52
2013,北大,1,刘德华,语文,85
2012,清华,0,马云,英语,61
2015,北理工,3,李彦宏,物理,81
2016,北科,4,马化腾,化学,92
2014,北航,2,刘强东,数学,70
2012,清华,0,刘诗诗,英语,59
2014,北航,2,刘亦菲,数学,49
2014,北航,2,刘嘉玲,数学,77
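
To make the record layout concrete, here is a minimal Python sketch that splits one line into typed fields. The names ScoreRecord and parse_line are hypothetical helpers introduced for illustration only; the field order and types follow the format line above and the table definition used later.

```python
from typing import NamedTuple


class ScoreRecord(NamedTuple):
    """One line of exercise5_1.txt (hypothetical helper type)."""
    year: int
    school: str
    grade: int
    name: str
    course: str
    score: int


def parse_line(line: str) -> ScoreRecord:
    """Split a comma-separated line and convert the numeric fields."""
    year, school, grade, name, course, score = line.strip().split(",")
    return ScoreRecord(int(year), school, int(grade), name, course, int(score))


record = parse_line("2013,北大,1,黄渤,语文,97")
print(record.school, record.course, record.score)  # 北大 语文 97
```

Hive does the equivalent split itself when the table is declared with fields terminated by ','.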

Create the table and load the data

create database if not exists exercise;
use exercise;
drop table if exists exercise5_1;
create table exercise5_1(year int, school string, grade int, name string, course string, score int) row format delimited fields terminated by ',';
load data local inpath "/home/hadoop/exercise5_1.txt" into table exercise5_1;
select * from exercise5_1;
desc exercise5_1;

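Requirements

1. Grouped top-N: for this year (2017 in the queries), select the three highest-scoring courses for each school and grade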

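select t.*
from
(select school, grade, course, score,
-- number the rows of each partition from the highest score down
-- course is part of the partition, so the numbering restarts for every course;
-- remove it from partition by to rank across all courses of a grade
row_number() over (partition by school, grade, course order by score desc) rank_code
from exercise5_1
-- year is an int column, so compare it with an int literal
where year = 2017
) t
where t.rank_code <= 3;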

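Explanation:
row_number(): numbers the rows of each partition sequentially, starting at 1, in the order given by order by. Each further row in the same partition gets the next number, so rows with equal scores still receive different numbers.
over clause: defines the window (partitioning, ordering, and frame) over which the preceding function is evaluated. It is a clause, not a function in its own right.
over (order by score): sorts by score. With an aggregate such as sum(), the default frame then runs from the start of the partition to the current row (including rows that tie with it), so the result is a running total.
over (partition by grade): partitions the rows by grade.
over (partition by grade order by score): partitions the rows by grade and sorts each partition by score.
over (order by score range between 2 preceding and 2 following): the frame contains every row whose score lies between the current row's score minus 2 and the current row's score plus 2. sum() over this frame adds up those scores.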
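
The numbering rule for row_number() can be sketched in plain Python. This is only an illustration of the semantics described above, not how Hive executes the query; the function name row_number and its parameters are hypothetical, and the rows are taken from the sample data.

```python
from collections import defaultdict


def row_number(rows, partition_by, order_by, descending=True):
    """Return copies of rows with a 1-based 'rank_code' per partition."""
    partitions = defaultdict(list)
    for row in rows:
        key = tuple(row[col] for col in partition_by)
        partitions[key].append(row)

    numbered = []
    for members in partitions.values():
        # Sort inside the partition, then number the rows 1, 2, 3, ...
        members.sort(key=lambda r: r[order_by], reverse=descending)
        for position, row in enumerate(members, start=1):
            numbered.append({**row, "rank_code": position})
    return numbered


rows = [
    {"school": "北大", "grade": 1, "name": "黄渤", "course": "语文", "score": 97},
    {"school": "北大", "grade": 1, "name": "徐峥", "course": "语文", "score": 52},
    {"school": "北大", "grade": 1, "name": "刘德华", "course": "语文", "score": 85},
    {"school": "北航", "grade": 2, "name": "刘强东", "course": "数学", "score": 70},
    {"school": "北航", "grade": 2, "name": "刘亦菲", "course": "数学", "score": 49},
    {"school": "北航", "grade": 2, "name": "刘嘉玲", "course": "数学", "score": 77},
]

ranked = row_number(rows, ["school", "grade", "course"], "score")
# Keep the top two rows of each partition, like "where rank_code <= 2".
top2 = [(r["name"], r["rank_code"]) for r in ranked if r["rank_code"] <= 2]
print(top2)  # [('黄渤', 1), ('刘德华', 2), ('刘嘉玲', 1), ('刘强东', 2)]
```

The numbering restarts at 1 in every partition, which is why filtering on the generated number in an outer query yields a top-N per group.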

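2. For this year at 北航 (Beihang University): the score for each grade and course, plus the sum of all scores within 2 points above or below it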

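select school, grade, course, score,
-- no partition by here: the frame spans all selected rows whose score is within 2 points of the current row
sum(score) over (order by score range between 2 preceding and 2 following) sscore
from exercise5_1
where year = 2017 and school = "北航";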
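
A value-based range frame is easy to confuse with a row-count frame, so the following Python sketch emulates what sum(score) over (order by score range between 2 preceding and 2 following) computes. The function name range_sum is hypothetical, and the score list is made up for illustration (the sample data contains no two scores this close together).

```python
def range_sum(scores, preceding=2, following=2):
    """For each score, sum all scores within [score - preceding, score + following]."""
    ordered = sorted(scores)
    result = []
    for current in ordered:
        low, high = current - preceding, current + following
        # The frame is defined by value distance, not by the number of rows.
        total = sum(s for s in ordered if low <= s <= high)
        result.append((current, total))
    return result


# Hypothetical scores, chosen so that some frames hold more than one row.
windowed = range_sum([70, 72, 73, 77, 49])
print(windowed)  # [(49, 49), (70, 142), (72, 215), (73, 145), (77, 77)]
```

For the score 72 the frame is [70, 74], so 70, 72 and 73 are summed; 77 and 49 have no neighbours within 2 points, so each is summed alone.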

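3. where vs. having: for this year, the first-year students at 清华 (Tsinghua University) whose total score exceeds 200, together with the number of such students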

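select school, grade, name, sum(score) as total_score,
-- evaluated after group by and having, so it counts the students that remain
count(1) over (partition by school, grade) nct
from exercise5_1
where year = 2017 and school = "清华" and grade = 1
group by school, grade, name
-- filter on the aggregate itself rather than on its alias
having sum(score) > 200;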

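having filters after grouping (group by): it is applied to the groups, so conditions on aggregates such as sum(score) belong here.
where filters individual records before grouping and aggregation, so it takes effect before the group by and having clauses.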
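
The order of the two filters can be tried out locally with the sketch below. It assumes SQLite (through Python's sqlite3 module) as a stand-in for Hive, and the student names and scores are invented for illustration, since the sample data has no 2017 rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table exercise5_1 "
    "(year int, school text, grade int, name text, course text, score int)"
)

# Hypothetical rows: three students with three courses each.
rows = [
    (2017, "清华", 1, "student_a", "语文", 90),
    (2017, "清华", 1, "student_a", "数学", 80),
    (2017, "清华", 1, "student_a", "英语", 70),  # total 240
    (2017, "清华", 1, "student_b", "语文", 60),
    (2017, "清华", 1, "student_b", "数学", 65),
    (2017, "清华", 1, "student_b", "英语", 70),  # total 195
    (2016, "清华", 1, "student_c", "语文", 99),
    (2016, "清华", 1, "student_c", "数学", 99),
    (2016, "清华", 1, "student_c", "英语", 99),  # total 297, but wrong year
]
conn.executemany("insert into exercise5_1 values (?, ?, ?, ?, ?, ?)", rows)

query = """
select name, sum(score) as total_score
from exercise5_1
where year = 2017 and school = '清华' and grade = 1  -- row filter, before grouping
group by name
having sum(score) > 200                               -- group filter, after aggregation
order by name
"""
result = conn.execute(query).fetchall()
print(result)  # [('student_a', 240)]
```

student_c is dropped by where before any sum is computed, while student_b survives where and is dropped by having because the group total is only 195.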
