User visit table (visit_table)
user_id (user ID) | url (visited URL) |
1 | A |
1 | B |
2 | C |
2 | A |
1 | A |
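SQL query: count the users who visited both A and B.
Implementation 1: collect each user's distinct A/B URLs with collect_set and keep the users whose set has size 2.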
with user_visit as (
    select 1 as user_id, 'A' as url union all
    select 1 as user_id, 'A' as url union all
    select 1 as user_id, 'B' as url union all
    select 1 as user_id, 'C' as url union all
    select 2 as user_id, 'B' as url union all
    select 2 as user_id, 'B' as url union all
    select 3 as user_id, 'A' as url union all
    select 4 as user_id, 'A' as url union all
    select 4 as user_id, 'B' as url union all
    select 5 as user_id, 'C' as url union all
    select 1 as user_id, 'A' as url
)
-- users who visited both page A and page B
select count(user_id)
from (
    -- distinct set of A/B URLs per user
    select user_id, collect_set(url) as url_set
    from user_visit
    where url = 'A' or url = 'B'
    group by user_id
) a
-- a set of size 2 means the user visited both A and B
where size(url_set) = 2;
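Implementation 2: take the distinct users who visited A and the distinct users who visited B, then join the two sets on user_id.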
with user_visit as (
    select 1 as user_id, 'A' as url union all
    select 1 as user_id, 'A' as url union all
    select 1 as user_id, 'B' as url union all
    select 1 as user_id, 'C' as url union all
    select 2 as user_id, 'B' as url union all
    select 2 as user_id, 'B' as url union all
    select 3 as user_id, 'A' as url union all
    select 4 as user_id, 'A' as url union all
    select 4 as user_id, 'B' as url union all
    select 5 as user_id, 'C' as url union all
    select 1 as user_id, 'A' as url
)
-- users who visited both page A and page B
select count(a.user_id)
from (
    select distinct user_id from user_visit where url = 'A'
) a
join (
    select distinct user_id from user_visit where url = 'B'
) b
    -- join on user_id; without an ON clause this would be a Cartesian product
    on a.user_id = b.user_id;
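A third variant, offered as a sketch rather than the author's method: filter to A and B, group by user, and keep the users with two distinct URLs via HAVING. It needs neither the Hive-specific collect_set/size nor a join, so it should also run on other engines that support CTEs (for example SQLite or PostgreSQL); the alias user_cnt is an illustrative name. On the same sample data, users 1 and 4 qualify, so the query returns 2.

```sql
with user_visit as (
    select 1 as user_id, 'A' as url union all
    select 1 as user_id, 'A' as url union all
    select 1 as user_id, 'B' as url union all
    select 1 as user_id, 'C' as url union all
    select 2 as user_id, 'B' as url union all
    select 2 as user_id, 'B' as url union all
    select 3 as user_id, 'A' as url union all
    select 4 as user_id, 'A' as url union all
    select 4 as user_id, 'B' as url union all
    select 5 as user_id, 'C' as url union all
    select 1 as user_id, 'A' as url
)
-- users who visited both page A and page B (expected result on this data: 2)
select count(*) as user_cnt  -- user_cnt: hypothetical alias for illustration
from (
    select user_id
    from user_visit
    where url in ('A', 'B')
    group by user_id
    -- two distinct URLs among {A, B} means the user visited both
    having count(distinct url) = 2
) t;
```

Because duplicates are removed by count(distinct url), repeated visits to the same page (user 1 visits A three times) do not inflate the result.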