50 Must-Practice Hive SQL Exercises: From Beginner to Proficient (1) (Repost)


Original link: https://blog.csdn.net/Thomson617/article/details/83212338


Original post by Thomson617, last published 2018-10-20 12:22:19
Classic 50 SQL exercises for learning Hive (Hive version)

Create the tables:

create table student(s_id string,s_name string,s_birth string,s_sex string) row format delimited fields terminated by '\t';

create table course(c_id string,c_name string,t_id string) row format delimited fields terminated by '\t';

create table teacher(t_id string,t_name string) row format delimited fields terminated by '\t';

create table score(s_id string,c_id string,s_score int) row format delimited fields terminated by '\t';
Generate the data (separate the fields with a tab character, as the table definitions above expect)

vi /export/data/hivedatas/student.csv

01 趙雷 1990-01-01 男
02 錢電 1990-12-21 男
03 孫風 1990-05-20 男
04 李雲 1990-08-06 男
05 周梅 1991-12-01 女
06 吳蘭 1992-03-01 女
07 鄭竹 1989-07-01 女
08 王菊 1990-01-20 女

vi /export/data/hivedatas/course.csv

01 語文 02
02 數學 01
03 英語 03

vi /export/data/hivedatas/teacher.csv

01 張三
02 李四
03 王五

vi /export/data/hivedatas/score.csv

01 01 80
01 02 90
01 03 99
02 01 70
02 02 60
02 03 80
03 01 80
03 02 80
03 03 80
04 01 50
04 02 30
04 03 20
05 01 76
05 02 87
06 01 31
06 03 34
07 02 89
07 03 98

Load the data into Hive

load data local inpath '/export/data/hivedatas/student.csv' into table student;

load data local inpath '/export/data/hivedatas/course.csv' into table course;

load data local inpath '/export/data/hivedatas/teacher.csv' into table teacher;

load data local inpath '/export/data/hivedatas/score.csv' into table score;
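
A quick sanity check after loading can catch delimiter problems early. This is a minimal sketch, assuming the four files were saved exactly as listed above with tab-separated fields; the counts in the comments simply reflect the number of sample rows shown.

```sql
-- Row counts should match the sample files.
select count(*) from student;  -- 8 rows in the sample data
select count(*) from course;   -- 3
select count(*) from teacher;  -- 3
select count(*) from score;    -- 18

-- If a file was not tab-separated, the whole line lands in the first column
-- and the remaining columns come back as NULL.
select * from student limit 3;
```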
-- Note: Hive query syntax

SELECT [ALL | DISTINCT] select_expr, select_expr, ...
    FROM table_reference
    [WHERE where_condition]
    [GROUP BY col_list [HAVING condition]]
    [CLUSTER BY col_list
      | [DISTRIBUTE BY col_list] [SORT BY| ORDER BY col_list]
    ]
    [LIMIT number]
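
The ordering clauses in this syntax are easy to mix up, so here is a small sketch on the score table created earlier. It is an illustration only and not part of the original post.

```sql
-- ORDER BY: one global ordering of the whole result.
select s_id, c_id, s_score from score order by s_score desc;

-- DISTRIBUTE BY + SORT BY: rows with the same c_id go to the same reducer,
-- and each reducer sorts only its own rows.
select s_id, c_id, s_score from score
distribute by c_id sort by c_id, s_score desc;

-- CLUSTER BY c_id is shorthand for DISTRIBUTE BY c_id SORT BY c_id.
select s_id, c_id, s_score from score cluster by c_id;

-- LIMIT caps the number of rows returned.
select s_id, c_id, s_score from score order by s_score desc limit 3;
```

SORT BY alone only guarantees order within each reducer, so ORDER BY is the one to use when the final output must be totally ordered.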
-- 1. Query the information and course scores of students whose score in course "01" is higher than their score in course "02":

select student.*,a.s_score as 01_score,b.s_score as 02_score
from student
  join score a on student.s_id=a.s_id and a.c_id='01'
  left join score b on student.s_id=b.s_id and b.c_id='02'
where  a.s_score>b.s_score;
-- Answer 2

select student.*,a.s_score as 01_score,b.s_score as 02_score
from student
join score a on  a.c_id='01'
join score b on  b.c_id='02'
where  a.s_id=student.s_id and b.s_id=student.s_id and a.s_score>b.s_score;
-- 2. Query the information and course scores of students whose score in course "01" is lower than their score in course "02":

select student.*,a.s_score as 01_score,b.s_score as 02_score
from student
join score a on student.s_id=a.s_id and a.c_id='01'
left join score b on student.s_id=b.s_id and b.c_id='02'
where a.s_score<b.s_score;
-- Answer 2

select student.*,a.s_score as 01_score,b.s_score as 02_score
from student
join score a on  a.c_id='01'
join score b on  b.c_id='02'
where  a.s_id=student.s_id and b.s_id=student.s_id and a.s_score<b.s_score;
-- 3. Query the student ID, name, and average score of students whose average score is at least 60:

select student.s_id,student.s_name,tmp.avg_score from student
  join (
    select score.s_id,round(avg(score.s_score),1) as avg_score
        from score group by s_id) as tmp
  on student.s_id = tmp.s_id
where tmp.avg_score >= 60;
-- Answer 2

select student.s_id,student.s_name,round(avg(score.s_score),1) as avg_score from student
join score on student.s_id = score.s_id
group by student.s_id,student.s_name
having avg(score.s_score) >= 60;
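-- 4. Query the student ID, name, and average score of students whose average score is below 60:
-- (including students with scores and students without any scores)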

select student.s_id,student.s_name,tmp.avgScore from student
join (
select score.s_id,round(avg(score.s_score),1) as avgScore from score group by s_id) as tmp
on student.s_id=tmp.s_id
where tmp.avgScore < 60
union all
select s2.s_id,s2.s_name,0 as avgScore from student s2
where s2.s_id not in
    (select distinct sc2.s_id from score sc2);
-- Answer 2

select  score.s_id,student.s_name,round(avg (score.s_score),1) as avgScore from student
inner join score on student.s_id=score.s_id
group by score.s_id,student.s_name
having avg (score.s_score) < 60
union all
select  s2.s_id,s2.s_name,0 as avgScore from student s2
where s2.s_id not in
    (select distinct sc2.s_id from score sc2);
-- 5. Query every student's ID, name, number of courses taken, and total score across all courses:

select student.s_id,student.s_name,(count(score.c_id) )as total_count,sum(score.s_score)as total_score
from student
left join score on student.s_id=score.s_id
group by student.s_id,student.s_name ;
-- 6. Query the number of teachers surnamed "李":

select count(1) from teacher where t_name like '李%';
-- 7. Query the information of students who have taken a course taught by teacher "張三":

select student.* from student
join score on student.s_id =score.s_id
join  course on course.c_id=score.c_id
join  teacher on course.t_id=teacher.t_id and t_name='張三';
-- 8. Query the information of students who have not taken any course taught by teacher "張三":

select student.* from student
left join (select s_id from score
      join  course on course.c_id=score.c_id
      join  teacher on course.t_id=teacher.t_id and t_name='張三')tmp
on  student.s_id =tmp.s_id
where tmp.s_id is null;
-- 9. Query the information of students who have taken both course "01" and course "02":

select student.* from student
join (select s_id from score where c_id='01')tmp1
    on student.s_id=tmp1.s_id
join (select s_id from score where c_id='02')tmp2
    on student.s_id=tmp2.s_id;
-- 10. Query the information of students who have taken course "01" but have not taken course "02":

select student.* from student
join (select s_id from score where c_id='01')tmp1
    on student.s_id=tmp1.s_id
left join (select s_id from score where c_id='02')tmp2
    on student.s_id =tmp2.s_id
where tmp2.s_id is null;
-- 11. Query the information of students who have not taken all of the courses:
-- First query the total number of courses

   select count(1) from course;
-- Then query the required result (3 is the course count returned by the previous query)

select student.* from student
left join(
      select s_id
        from score
          group by s_id
            having count(c_id)=3)tmp
on student.s_id=tmp.s_id
where tmp.s_id is null;
-- Method 2 (in a single step):

select student.* from student
join (select count(c_id)num1 from course)tmp1
left join(
      select s_id,count(c_id)num2
        from score group by s_id)tmp2
on student.s_id=tmp2.s_id and tmp1.num1=tmp2.num2
where tmp2.s_id is null;
-- 12. Query the information of students who share at least one course with student "01":

select student.* from student
join (select c_id from score where score.s_id='01')tmp1
join (select s_id,c_id from score)tmp2
    on tmp1.c_id =tmp2.c_id and student.s_id =tmp2.s_id
where student.s_id  not in('01')
group by student.s_id,s_name,s_birth,s_sex;
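-- 13. Query the information of the other students whose courses are exactly the same as those of student "01":
-- Note: Hive does not support group_concat; concat_ws('|', collect_set(str)) can be used instead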

-- sort_array makes the comparison independent of the order in which collect_set returns the ids
select student.*,tmp1.course_id from student
join (select s_id,concat_ws('|', sort_array(collect_set(c_id))) course_id from score
      where s_id <> '01' group by s_id)tmp1
  on student.s_id = tmp1.s_id
join (select concat_ws('|', sort_array(collect_set(c_id))) course_id2
            from score where s_id='01')tmp2
      on tmp1.course_id = tmp2.course_id2;
-- 14. Query the names of students who have not taken any course taught by teacher "張三":

select student.* from student
  left join (select s_id from score
          join (select c_id from course join  teacher on course.t_id=teacher.t_id and t_name='張三')tmp2
          on score.c_id=tmp2.c_id )tmp
  on student.s_id = tmp.s_id
  where tmp.s_id is null;
-- 15. Query the student ID, name, and average score of students who failed two or more courses:

select student.s_id,student.s_name,tmp.avg_score from student
inner join (select s_id from score
      where s_score<60
        group by score.s_id having count(s_id)>1)tmp2
on student.s_id = tmp2.s_id
left join (
    select s_id,round(AVG (score.s_score)) avg_score
      from score group by s_id)tmp
      on tmp.s_id=student.s_id;
-- 16. Retrieve the information of students who scored below 60 in course "01", sorted by score in descending order:

select student.*,s_score from student,score
where student.s_id=score.s_id and s_score<60 and c_id='01'
order by s_score desc;
-- 17. Show every student's score in each course and their average score, ordered by average score from high to low:

select a.s_id,tmp1.s_score as chinese,tmp2.s_score as math,tmp3.s_score as english,
    round(avg (a.s_score),2) as avgScore
from score a
left join (select s_id,s_score  from score s1 where  c_id='01')tmp1 on  tmp1.s_id=a.s_id
left join (select s_id,s_score  from score s2 where  c_id='02')tmp2 on  tmp2.s_id=a.s_id
left join (select s_id,s_score  from score s3 where  c_id='03')tmp3 on  tmp3.s_id=a.s_id
group by a.s_id,tmp1.s_score,tmp2.s_score,tmp3.s_score order by avgScore desc;
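-- 18. Query the highest, lowest, and average score of each course, displayed as: course ID, course name, highest score, lowest score, average score, pass rate, moderate rate, good rate, excellent rate:
-- Pass: >=60; moderate: 70-80; good: 80-90; excellent: >=90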

select course.c_id,course.c_name,tmp.maxScore,tmp.minScore,tmp.avgScore,tmp.passRate,tmp.moderate,tmp.goodRate,tmp.excellentRates from course
join(select c_id,max(s_score) as maxScore,min(s_score)as minScore,
    round(avg(s_score),2) avgScore,
    round(sum(case when s_score>=60 then 1 else 0 end)/count(c_id),2)passRate,
    round(sum(case when s_score>=70 and s_score<80 then 1 else 0 end)/count(c_id),2) moderate,
    round(sum(case when s_score>=80 and s_score<90 then 1 else 0 end)/count(c_id),2) goodRate,
    round(sum(case when s_score>=90 then 1 else 0 end)/count(c_id),2) excellentRates
from score group by c_id)tmp on tmp.c_id=course.c_id;
-- 19. Sort the scores of each course and show the ranking:
-- row_number() over() numbers the rows in sorted order (MySQL before 8.0 has no such function)

-- partition by c_id restarts the ranking for each course
select s.*,row_number()over(partition by s.c_id order by s.s_score desc) Ranking
    from score s
order by c_id, Ranking;
-- 20. Query each student's total score and rank the students by it:

select score.s_id,s_name,sum(s_score) sumscore,row_number()over(order by sum(s_score) desc) Ranking
  from score ,student
    where score.s_id=student.s_id
    group by score.s_id,s_name order by sumscore desc;
For the remaining exercises, see:
https://blog.csdn.net/Thomson617/article/details/83280617
Summary of SQL syntax in Hive:

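(1). Older Hive versions (before 2.2.0, according to the Hive language manual) do not support non-equality conditions or OR in a join's ON clause.
Examples and workarounds are given below.
  Non-equi joins are not supported
       Wrong: select * from a inner join b on a.id<>b.id
       Workaround: select * from a join b where a.id<>b.id;   (a cross product filtered in WHERE)
  OR is not supported
       Wrong: select * from a inner join b on a.id=b.id or a.name=b.name
       Workaround: select * from a inner join b on a.id=b.id
                union all
                select * from a inner join b on a.name=b.name
  The two queries combined with union all must have the same column names or the same column aliases. Note that a pair of rows matching both conditions appears twice in the union all result.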
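
On the sample tables, the inequality can often stay out of the ON clause entirely: keep an equality condition for the join and move the comparison into WHERE. The following sketch is an illustration only and is not from the original post. For course "01", it lists each student together with every student who scored higher.

```sql
select a.s_id, a.s_score, b.s_id as higher_s_id, b.s_score as higher_score
from score a
join score b on a.c_id = b.c_id          -- equality only in ON
where a.c_id = '01'
  and b.s_score > a.s_score;             -- inequality moved to WHERE
```

Keeping an equality key in ON lets Hive run an ordinary join instead of a full cross product.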
        
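(2). Semicolons: Hive cannot recognise the ';' in concat(';',key) as part of a string and simply treats it as the end of the SQL statement.
    • The semicolon marks the end of a statement in SQL, and in HiveQL as well, but HiveQL is less clever about recognising it. For example:
        • select concat(key,concat(';',key)) from dual;
    • When parsing this statement, HiveQL reports:
        FAILED: Parse Error: line 0:-1 mismatched input '<EOF>' expecting ) in function specification
    • The fix is to escape the semicolon with its octal ASCII code, so the statement above becomes:
        • select concat(key,concat('\073',key)) from dual;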

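(3). INSERT INTO table VALUES(), UPDATE, and DELETE are not supported (in Hive versions before 0.14), so no complex locking mechanism is needed for reading and writing data.
    INSERT INTO syntax is only available starting in version 0.8. INSERT INTO appends data to a table or partition.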

(4). In HiveQL, if a String field holds an empty string (length 0), an IS NULL test on it evaluates to False; rows can be filtered using a left join.
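
A short sketch of the empty-string behaviour described in item (4). It assumes a Hive version that allows select without a from clause, and the last query is a hypothetical check on the student table rather than something from the original post.

```sql
-- An empty string is a value, not NULL.
select '' is null;                    -- false
select cast(null as string) is null;  -- true
select length('');                    -- 0

-- To catch both missing and empty names, test for both explicitly.
select * from student where s_name is null or s_name = '';
```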

(5). Range lookups written in the chained form '< dt <' are not supported; use dt in ('', '') or between instead.
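
As an illustration of item (5) on the sample data, a range over s_birth can be written with between or with two explicit comparisons. This sketch relies on the dates being stored as yyyy-MM-dd strings, which compare in chronological order.

```sql
-- Students born in 1990, using between (both bounds inclusive).
select * from student
where s_birth between '1990-01-01' and '1990-12-31';

-- The same range written as two comparisons joined by and.
select * from student
where s_birth >= '1990-01-01' and s_birth <= '1990-12-31';
```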

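(6). Before INSERT INTO became available (version 0.8, see item 3), Hive could not insert data into an existing table or partition; it could only overwrite the whole table, for example: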
    INSERT OVERWRITE TABLE t1 SELECT * FROM t2;
    
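(7). Every non-aggregated column in the select list must appear in group by; the select list cannot contain more plain columns than group by does.
    If the select list mixes aggregate functions with plain columns, the statement must have a group by clause,
    and group by cannot refer to column aliases.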
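
The group by rules in item (7) can be sketched on the sample tables as follows. The queries are illustrations only, and birth_year is a hypothetical alias.

```sql
-- Valid: the only plain column in the select list, c_id, is in group by.
select c_id, count(s_id) as student_count, round(avg(s_score), 2) as avg_score
from score
group by c_id;

-- Invalid: s_id is neither aggregated nor listed in group by.
-- select c_id, s_id, avg(s_score) from score group by c_id;

-- Aliases cannot be used in group by, so repeat the expression.
select substr(s_birth, 1, 4) as birth_year, count(*) as student_count
from student
group by substr(s_birth, 1, 4);
```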
    
(8). Before Hive 0.13, select, where, and having cannot be followed by a subquery (left join, right join, or inner join is generally used instead).

(9). Write join (that is, inner join) first, then left join or right join.

(10). Hive does not support group_concat; concat_ws('|', collect_set(str)) can be used instead.
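
A minimal sketch of the group_concat replacement on the score table. Wrapping collect_set in sort_array is an addition here, not part of the original tip; it is used because collect_set does not promise any particular element order.

```sql
-- One row per student with a '|'-separated list of course ids.
-- For student 01 in the sample data this yields 01|02|03.
select s_id, concat_ws('|', sort_array(collect_set(c_id))) as course_ids
from score
group by s_id;
```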

(11). Where not in and <> do not work, left join tmp on tableName.id = tmp.id where tmp.id is null can be used instead.
... ...
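
The left join replacement from item (11) can be sketched on the sample tables like this. The query is an illustration only; it finds students that have no rows in score.

```sql
-- Anti-join: keep only students without a match in score.
-- With the sample data this returns student 08, who has no scores.
select student.*
from student
left join (select distinct s_id from score) tmp
  on student.s_id = tmp.s_id
where tmp.s_id is null;
```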

