hive关联查询举例（hive连接查询）

hive 两个表关联查询连接用法(join、left outer join、left semi join和full outer join)：

建表：

CREATE TABLE table1(id INT, name STRING) COMMENT 'this is a test2'

ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'

STORED AS TEXTFILE;

CREATE TABLE table2(id INT, score STRING) COMMENT 'this is a test2'

ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'

STORED AS TEXTFILE;

CREATE TABLE table3(id INT, book STRING) COMMENT 'this is a test2'

ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'

STORED AS TEXTFILE;

从本地磁盘导入数据：

LOAD DATA LOCAL INPATH '/home/dic_khxw/table1' INTO TABLE table1;

LOAD DATA LOCAL INPATH '/home/dic_khxw/table2' INTO TABLE table2;

LOAD DATA LOCAL INPATH '/home/dic_khxw/table3' INTO TABLE table3;

hive> LOAD DATA LOCAL INPATH '/home/dic_khxw/table2' INTO TABLE table2;

Copying data from file:/home/dic_khxw/table2

Copying file: file:/home/dic_khxw/table2

Loading data to table test_yh.table2

Table test_yh.table2 stats: [numFiles=1, numRows=0, totalSize=16, rawDataSize=0]

Time taken: 0.734 seconds

三张表：

表1：table1 表2：table2 表3：table3

id nameid socreidbook

1 yuhan11001数据结构

2 test2802C语言

3 lbj4601离散数据

select a.id a_id,b.id b_id, a.name,b.book from table1 a , table3 where a.id=b.id

11yuhan数据结构

22testC语言

11yuhan离散数学

==========================================================

join 用法：满足关联条件的左右表都会显示，关联的结果有重复记录，那么记录会全部保留。

select a.id as id_a, a.name, b.id as id_b, b.score from table1 a join table2 b on a.id = b.id;

idnameid_bscore

1yuhan1100

2test280

1yuhan1100

2test280

==========================================================

如果关联的结果有重复记录，那么记录会全部保留。为了说明这一点，请先看下面一段代码的输出结果：

select a.id as id_a, a.name, c.id as id_c, c.book from table1 a join table3 c on a.id = c.id;

idnameid_cbook

1yuhan1数据结构

2test280

1yuhan1离散数学

1yuhan1数据结构

2test2C语言

1yuhan1离散数学

===========================================================

left outer join的用法：以左表为主，若右表有多条记录对应左边，则显示多行，若左表对应右表没有则显示NULL

select a.id as id_a, a.name, b.id as id_b, b.score from table1 a left outer join table2 b on a.id = b.id;

idnameid_bscore

1yuhan1100

2test280

3lbjNULLNULL

==========================================================

select a.id as id_a, a.name, b.id as id_b, b.book from table1 a left outer join table3 b on a.id = b.id;

1yuhan1数据结构

1yuhan1离散数学

2test2C语言

3lbjNULLNULL

==========================================================

上述代码中加上where b.id is not null，把在b表中值为空的记录剔除，实现的其实是传统sql中的exists in操作。输出结果为：

idnameid_bscore

1yuhan1100

2test280

============================================================

取a中存在但b中不存在的记录

select a.id as id_a, a.name, b.id as id_b, b.score from table1 a left outer join table2 b on a.id = b.id where b.id is null;

idnameid_bscore

3lbjNULLNULL

同样的，在left outer join操作中，如果结果集出现重复记录，可以使用distinct去重。

===========================================================

left semi join的用法: 有一个限制条件，即右表的字段只能出现在on子句中，而不能在select和where子句中引用。

select a.id, a.name from table1 a left semi join table2 b on a.id = b.id

1yuhan

2test

============================================================

full outer join的用法:full outer join可以实现全连结

select a.id as id_a, a.name, b.id as id_b, b.score from table1 a full outer join table2 b on a.id = b.id;

1yuhan1100

2test280

3lbjNULLNULL

NULLNULL460

===========================================================

left outer join where is not null与left semi join的联系与区别：两者均可实现exists in操作，不同的是，前者允许右表的字段在select或where子句中引用，而后者不允许。

除了left outer join，Hive QL中还有right outer join，其功能与前者相当，只不过左表和右表的角色刚好相反。

另外，Hive QL中没有left join、right join、full join以及right semi join等操作。

网站首页 > 博客文章正文

猜你喜欢

本文暂时没有评论，来添加一个吧(●'◡'●)

取消回复欢迎你发表评论:

网站首页 > 博客文章 正文

hive关联查询举例（hive连接查询）

猜你喜欢

本文暂时没有评论，来添加一个吧(●'◡'●)

取消回复欢迎 你 发表评论:

网站首页 > 博客文章正文

取消回复欢迎你发表评论: