hive 两个表关联查询 连接用法(join、left outer join、left semi join和full outer join):
建表:
CREATE TABLE table1(id INT, name STRING) COMMENT 'this is a test2'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;
CREATE TABLE table2(id INT, score STRING) COMMENT 'this is a test2'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;
CREATE TABLE table3(id INT, book STRING) COMMENT 'this is a test2'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;
从本地磁盘导入数据:
LOAD DATA LOCAL INPATH '/home/dic_khxw/table1' INTO TABLE table1;
LOAD DATA LOCAL INPATH '/home/dic_khxw/table2' INTO TABLE table2;
LOAD DATA LOCAL INPATH '/home/dic_khxw/table3' INTO TABLE table3;
hive> LOAD DATA LOCAL INPATH '/home/dic_khxw/table2' INTO TABLE table2;
Copying data from file:/home/dic_khxw/table2
Copying file: file:/home/dic_khxw/table2
Loading data to table test_yh.table2
Table test_yh.table2 stats: [numFiles=1, numRows=0, totalSize=16, rawDataSize=0]
OK
Time taken: 0.734 seconds
三张表:
表1:table1 表2:table2 表3:table3
id nameid socreidbook
1 yuhan11001数据结构
2 test2802C语言
3 lbj4601离散数据
select a.id a_id,b.id b_id, a.name,b.book from table1 a , table3 where a.id=b.id
OK
11yuhan数据结构
22testC语言
11yuhan离散数学
==========================================================
join 用法:满足关联条件的左右表都会显示, 关联的结果有重复记录,那么记录会全部保留。
select a.id as id_a, a.name, b.id as id_b, b.score from table1 a join table2 b on a.id = b.id;
idnameid_bscore
1yuhan1100
2test280
OK
1yuhan1100
2test280
==========================================================
如果关联的结果有重复记录,那么记录会全部保留。为了说明这一点,请先看下面一段代码的输出结果:
select a.id as id_a, a.name, c.id as id_c, c.book from table1 a join table3 c on a.id = c.id;
idnameid_cbook
1yuhan1数据结构
2test280
1yuhan1离散数学
OK
1yuhan1数据结构
2test2C语言
1yuhan1离散数学
===========================================================
left outer join的用法:以左表为主,若右表有多条记录对应左边 ,则显示多行,若左表对应右表没有 则显示NULL
select a.id as id_a, a.name, b.id as id_b, b.score from table1 a left outer join table2 b on a.id = b.id;
idnameid_bscore
1yuhan1100
2test280
3lbjNULLNULL
==========================================================
select a.id as id_a, a.name, b.id as id_b, b.book from table1 a left outer join table3 b on a.id = b.id;
OK
1yuhan1数据结构
1yuhan1离散数学
2test2C语言
3lbjNULLNULL
==========================================================
上述代码中加上where b.id is not null,把在b表中值为空的记录剔除,实现的其实是传统sql中的exists in操作。输出结果为:
idnameid_bscore
1yuhan1100
2test280
============================================================
取a中存在但b中不存在的记录
select a.id as id_a, a.name, b.id as id_b, b.score from table1 a left outer join table2 b on a.id = b.id where b.id is null;
idnameid_bscore
3lbjNULLNULL
OK
3lbjNULLNULL
同样的,在left outer join操作中,如果结果集出现重复记录,可以使用distinct去重。
===========================================================
left semi join的用法: 有一个限制条件,即右表的字段只能出现在on子句中,而不能在select和where子句中引用。
select a.id, a.name from table1 a left semi join table2 b on a.id = b.id
OK
1yuhan
2test
============================================================
full outer join的用法:full outer join可以实现全连结
select a.id as id_a, a.name, b.id as id_b, b.score from table1 a full outer join table2 b on a.id = b.id;
OK
1yuhan1100
2test280
3lbjNULLNULL
NULLNULL460
===========================================================
left outer join where is not null与left semi join的联系与区别:两者均可实现exists in操作,不同的是,前者允许右表的字段在select或where子句中引用,而后者不允许。
除了left outer join,Hive QL中还有right outer join,其功能与前者相当,只不过左表和右表的角色刚好相反。
另外,Hive QL中没有left join、right join、full join以及right semi join等操作。
本文暂时没有评论,来添加一个吧(●'◡'●)