专业的编程技术博客社区

网站首页 > 博客文章 正文

hive关联查询举例(hive连接查询)

baijin 2024-08-27 11:22:38 博客文章 3 ℃ 0 评论

hive 两个表关联查询 连接用法(join、left outer join、left semi join和full outer join):

建表:

CREATE TABLE table1(id INT, name STRING) COMMENT 'this is a test2'

ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'

STORED AS TEXTFILE;

CREATE TABLE table2(id INT, score STRING) COMMENT 'this is a test2'

ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'

STORED AS TEXTFILE;

CREATE TABLE table3(id INT, book STRING) COMMENT 'this is a test2'

ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'

STORED AS TEXTFILE;

从本地磁盘导入数据:

LOAD DATA LOCAL INPATH '/home/dic_khxw/table1' INTO TABLE table1;

LOAD DATA LOCAL INPATH '/home/dic_khxw/table2' INTO TABLE table2;

LOAD DATA LOCAL INPATH '/home/dic_khxw/table3' INTO TABLE table3;

hive> LOAD DATA LOCAL INPATH '/home/dic_khxw/table2' INTO TABLE table2;

Copying data from file:/home/dic_khxw/table2

Copying file: file:/home/dic_khxw/table2

Loading data to table test_yh.table2

Table test_yh.table2 stats: [numFiles=1, numRows=0, totalSize=16, rawDataSize=0]

OK

Time taken: 0.734 seconds

三张表:

表1:table1 表2:table2 表3:table3

id nameid socreidbook

1 yuhan11001数据结构

2 test2802C语言

3 lbj4601离散数据

select a.id a_id,b.id b_id, a.name,b.book from table1 a , table3 where a.id=b.id

OK

11yuhan数据结构

22testC语言

11yuhan离散数学

==========================================================

join 用法:满足关联条件的左右表都会显示, 关联的结果有重复记录,那么记录会全部保留。

select a.id as id_a, a.name, b.id as id_b, b.score from table1 a join table2 b on a.id = b.id;

idnameid_bscore

1yuhan1100

2test280

OK

1yuhan1100

2test280

==========================================================

如果关联的结果有重复记录,那么记录会全部保留。为了说明这一点,请先看下面一段代码的输出结果:

select a.id as id_a, a.name, c.id as id_c, c.book from table1 a join table3 c on a.id = c.id;

idnameid_cbook

1yuhan1数据结构

2test280

1yuhan1离散数学

OK

1yuhan1数据结构

2test2C语言

1yuhan1离散数学

===========================================================

left outer join的用法:以左表为主,若右表有多条记录对应左边 ,则显示多行,若左表对应右表没有 则显示NULL

select a.id as id_a, a.name, b.id as id_b, b.score from table1 a left outer join table2 b on a.id = b.id;

idnameid_bscore

1yuhan1100

2test280

3lbjNULLNULL

==========================================================

select a.id as id_a, a.name, b.id as id_b, b.book from table1 a left outer join table3 b on a.id = b.id;

OK

1yuhan1数据结构

1yuhan1离散数学

2test2C语言

3lbjNULLNULL

==========================================================

上述代码中加上where b.id is not null,把在b表中值为空的记录剔除,实现的其实是传统sql中的exists in操作。输出结果为:

idnameid_bscore

1yuhan1100

2test280

============================================================

取a中存在但b中不存在的记录

select a.id as id_a, a.name, b.id as id_b, b.score from table1 a left outer join table2 b on a.id = b.id where b.id is null;

idnameid_bscore

3lbjNULLNULL

OK

3lbjNULLNULL

同样的,在left outer join操作中,如果结果集出现重复记录,可以使用distinct去重。

===========================================================

left semi join的用法: 有一个限制条件,即右表的字段只能出现在on子句中,而不能在select和where子句中引用。

select a.id, a.name from table1 a left semi join table2 b on a.id = b.id

OK

1yuhan

2test

============================================================

full outer join的用法:full outer join可以实现全连结

select a.id as id_a, a.name, b.id as id_b, b.score from table1 a full outer join table2 b on a.id = b.id;

OK

1yuhan1100

2test280

3lbjNULLNULL

NULLNULL460

===========================================================

left outer join where is not null与left semi join的联系与区别:两者均可实现exists in操作,不同的是,前者允许右表的字段在select或where子句中引用,而后者不允许。

除了left outer join,Hive QL中还有right outer join,其功能与前者相当,只不过左表和右表的角色刚好相反。

另外,Hive QL中没有left join、right join、full join以及right semi join等操作。

Tags:

本文暂时没有评论,来添加一个吧(●'◡'●)

欢迎 发表评论:

最近发表
标签列表