[Hadoop] Hive HQL
http://wiki.apache.org/hadoop/Hive/GettingStarted
Hive Shell 例子
设置Map和Reduce
set mapred.map.tasks=1;
SET mapred.reduce.tasks=1;
http://www.cxrs.org/SQL/10584.html
CREATE TABLE pokes (foo INT, bar STRING);
/user/hive/warehouse/pokes
CREATE TABLE invites (foo INT, bar STRING) PARTITIONED BY (ds STRING);
SHOW TABLES;
SHOW TABLES '.*s';
ALTER TABLE pokes ADD COLUMNS (new_col INT);
ALTER TABLE invites ADD COLUMNS (new_col2 INT COMMENT 'a comment');
ALTER TABLE events RENAME TO 3koobecaf;
DROP TABLE pokes;
DML 操作
默认列之间,用"",new String(new byte[]{1})进行分隔。
一般表
LOAD DATA LOCAL INPATH './examples/files/kv1.txt' OVERWRITE INTO TABLE pokes;
分区表
LOAD DATA LOCAL INPATH './examples/files/kv2.txt' OVERWRITE INTO TABLE invites PARTITION (ds='2008-08-15');
LOAD DATA LOCAL INPATH './examples/files/kv3.txt' OVERWRITE INTO TABLE invites PARTITION (ds='2008-08-08');
LOAD DATA INPATH '/user/myname/kv2.txt' OVERWRITE INTO TABLE invites PARTITION (ds='2008-08-15');
SQL操作
设置MR参数
SET mapred.reduce.tasks=1;
查看当前所有参数
SET -v ;
查询
SELECT a.foo FROM invites a WHERE a.ds='2008-08-15';
INSERT OVERWRITE DIRECTORY '/tmp/hdfs_out' SELECT a.* FROM invites a WHERE a.ds='<DATE>';
INSERT OVERWRITE LOCAL DIRECTORY '/tmp/local_out' SELECT a.* FROM pokes a;
INSERT OVERWRITE TABLE events SELECT a.* FROM profiles a;
INSERT OVERWRITE TABLE events SELECT a.* FROM profiles a WHERE a.key < 100;
INSERT OVERWRITE LOCAL DIRECTORY '/tmp/reg_3' SELECT a.* FROM events a;
INSERT OVERWRITE DIRECTORY '/tmp/reg_4' select a.invites, a.pokes FROM profiles a;
INSERT OVERWRITE DIRECTORY '/tmp/reg_5' SELECT COUNT(1) FROM invites a WHERE a.ds='<DATE>';
INSERT OVERWRITE DIRECTORY '/tmp/reg_5' SELECT a.foo, a.bar FROM invites a;
INSERT OVERWRITE LOCAL DIRECTORY '/tmp/sum' SELECT SUM(a.pc) FROM pc1 a;
Sum of a column. avg, min, max can also be used
双Partition Table,可以按天,按小时分配数据
CREATE TABLE day_hour_ptable (id INT, content STRING) PARTITIONED BY (ds STRING, hour string);
LOAD DATA LOCAL INPATH './examples/files/kv2.txt' OVERWRITE INTO TABLE day_hour_ptable PARTITION (ds='2008-04-01',hour='01');
LOAD DATA LOCAL INPATH './examples/files/kv3.txt' OVERWRITE INTO TABLE day_hour_ptable PARTITION (ds='2008-04-01',hour='02');
select * from day_hour_ptable a where a.ds='2008-04-01' and a.hour='01' and a.id=238;
生成1天,24小时的数据,启用下列语句的时候,Hadoop生成24个Map。
set mapred.map.tasks=1;
SET mapred.reduce.tasks=1;
select count(a.id) from day_hour_ptable a where a.ds='2008-04-01' and a.hour='01';
执行时间:23秒
SET mapred.reduce.tasks=24;
select count(a.id) from day_hour_ptable a;
执行时间:37秒
创建用'\t'作为表的原始文件分隔符,取代''
CREATE TABLE u_data (
userid INT,
movieid INT,
rating INT,
unixtime STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;
创建分析Apache Web log的数据表
CREATE TABLE apachelog (
host STRING,
identity STRING,
user STRING,
time STRING,
request STRING,
status STRING,
size STRING,
referer STRING,
agent STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
"input.regex" = "([^ ]*) ([^ ]*) ([^ ]*) (-|\\[[^\\]]*\\]) ([^ \"]*|\"[^\"]*\") (-|[0-9]*) (-|[0-9]*)(?: ([^ \"]*|\"[^\"]*\") ([^ \"]*|\"[^\"]*\"))?",
"output.format.string" = "%1$s %2$s %3$s %4$s %5$s %6$s %7$s %8$s %9$s"
)
STORED AS TEXTFILE;
Hive QL例子
hive-0.3.99.1+0\examples
D:\7g\Personal\Resources\Architecture\Hadoop\hive-0.3.99.1+0\src\ql\src\test\queries\clientnegative
分享到:
相关推荐
Hadoop+ZooKeeper+HBase+hive(HQL)安装步骤
Hive开发规范及要点,hql 开发基础知识,规范
1、 load data local inpath '/input/files/tb.txt' overwrite into table tb; ...2 、hive –e 执行hql语句 -i 初始化参数,多用于加载UDF -f 执行hql文件,因为无法传参数,所以项目中基本不用
hadoop-hive的hql知识点详细总结,纯干货,建表,分区,分桶,hive的dml语句,hive的函数,hive的序列化与反序列化
解压下载的Hive0.8.1版本 [hadoop@hadoop ~]$ tar xvf hive-0.8.1.tar.gz [hadoop@hadoop ~]$mv hive-0.8.1/usr/local/hadoop/ 3.2
大数据分析环境安装与设置,大数据分析课程,hadoop原理,hive操作,函数,Hadoop原理与概念,HQL核心技能1--常用函数--表连接
Hadoop-2.8.0-HA-Hive安装部署与HQL05.mysql的安装详细步骤.mp4
Hadoop-2.8.0-HA-Hive安装部署与HQL07.hive.mp4
Hadoop-2.8.0-HA-Hive安装部署与HQL06.hive的安装和基本使用.mp4
Hadoop-2.8.0-HA-Hive安装部署与HQL08.hive的脚本化运行使用方式.mp4
Hadoop-2.8.0-HA-Hive安装部署与HQL04.hive的基本功能机制和概念.mp4
Hadoop-2.8.0-Day08-Hive函数与HQL详解-课件与资料.zip
Hadoop-2.8.0-HA-Hive安装部署与HQL09.hive的基本语法--建表语法.mp4
Hadoop-2.8.0-HA-Hive安装部署与HQL11.hive的基本语法--CTAS建表.mp4
由于 SQL 被广泛的应用在数据仓库中,因此,专门针对 Hive 的特性设计了类 SQL 的查询语言 HQL。熟悉 SQL 开发的开发者可以很方便的使用 Hive 进行开发。 2. 数据存储位置。Hive 是建立在 Hadoop 之上的,所有 Hive ...
Hive是基于Hadoop的一个数据仓库工具,可以将结构化的数据文件映射为一张数据库表,并提供类SQL查询功能(HQL); 其本质是将SQL转换为MapReduce的任务进行运算,底层由HDFS来提供数据的存储,hive可以理解为一个将...
Hadoop-2.8.0-HA-Hive安装部署与HQL10.hive的基本语法--内部表和外部表.mp4
Hadoop-2.8.0-HA-Hive安装部署与HQL12.hive的基本语法--数据导入--从本地--从hdfs.mp4
Hadoop-2.8.0-Day07-HA-Hive安装部署与HQL-课件与资料.zip
hadoop hive 数据仓库 入门 hive是一个基于hadoop的数据仓库。使用hadoop-hdfs作为数据存储层;提供类似SQL的语言(HQL),通过hadoop-mapreduce完成数据计算;通过HQL语言提供使用者部分传统RDBMS一样的表格查询...