[Hadoop大数据]——Hive部署入门教程

  1. 云栖社区>
  2. 博客>
  3. 正文

[Hadoop大数据]——Hive部署入门教程

青夜之衫 2017-12-05 16:20:00 浏览1169
展开阅读全文

Hive是为了解决hadoop中mapreduce编写困难,提供给熟悉sql的人使用的。只要你对SQL有一定的了解,就能通过Hive写出mapreduce的程序,而不需要去学习hadoop中的api。

在部署前需要确认安装jdk以及Hadoop

如果需要安装jdk以及hadoop可以参考我之前的博客:

Linux下安装jdk
Linux下安装hadoop伪分布式

449064-20160816120734734-1205093992.png

在安装之前,先了解下Hive都有哪些东西。

下载并解压缩

去主页选择镜像地址:

http://www.apache.org/dyn/closer.cgi/hive/

在镜像地址中选择下载的版本,我这里使用的是最新的2.1.0

https://mirrors.tuna.tsinghua.edu.cn/apache/hive/hive-2.1.0/

下载后,解压缩

# 解压缩
tar -zxvf apache-hive-2.1.0-bin.tar.gz

# 拷贝到指定目录
mv apache-hive-2.1.0-bin /usr

修改环境变量

# 编辑/etc/profile
vi /etc/profile

# set hive environment
export HIVE_HOME=/usr/apache-hive-2.1.0-bin
export PATH=$PATH:$HIVE_HOME/bin

# 配置文件生效
source /etc/profile

创建并修改配置文件

# 进入conf目录
cd $HIVE_HOME/conf

# 拷贝hive-site.xml
cp hive-default.xml.template hive-site.xml

# 修改其中的参数
修改所有的system.io.tmpdir为指定的目录(自己定,我放在/usr/hive/tmp下了)
修改所有的system.user.name为自己的名字(我直接修改成xingoo了)

初始化schema

由于Hive中所有的原信息都需要存储到关系型数据库里面,因此需要初始化数据库表。我这里直接使用的是默认的derby数据库,关于数据的配置就不用修改了,直接使用默认的就好了。

# 进入指定的目录
cd $HIVE_HOME/bin

# 初始化,如果是mysql则derby可以直接替换成mysql
./schematool -initSchema -dbType derby

注意这里有可能报错:

[root@localhost bin]# ./schematool -initSchema -dbType derby createDatabaseIfNotExist=true
which: no hbase in (/usr/lib64/qt-3.3/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/usr/java/jdk1.8.0/bin:/root/bin:/usr/apache-hive-2.1.0-bin/bin:/usr/java/jdk1.8.0/bin:/usr/apache-hive-2.1.0-bin/bin:/usr/java/jdk1.8.0/bin:/usr/hadoop/hadoop-2.6.4/bin:/usr/apache-hive-2.1.0-bin/bin)
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/apache-hive-2.1.0-bin/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hadoop/hadoop-2.6.4/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Metastore connection URL:    jdbc:derby:;databaseName=metastore_db;create=true
Metastore Connection Driver :    org.apache.derby.jdbc.EmbeddedDriver
Metastore connection User:   APP
Starting metastore schema initialization to 2.1.0
Initialization script hive-schema-2.1.0.derby.sql
Error: FUNCTION 'NUCLEUS_ASCII' already exists. (state=X0Y68,code=30000)
org.apache.hadoop.hive.metastore.HiveMetaException: Schema initialization FAILED! Metastore state would be inconsistent !!
Underlying cause: java.io.IOException : Schema script failed, errorcode 2
Use --verbose for detailed stacktrace.
*** schemaTool failed ***

这时需要修改derby初始化脚本

# 进入指定的目录
cd $HIVE_HOME/scripts/metastore/upgrade/derby

# 修改错误堆栈中的脚本hive-schema-2.1.0.derby.sql,注释上面两句
-- ----------------------------------------------
-- DDL Statements for functions
-- ----------------------------------------------

-- CREATE FUNCTION "APP"."NUCLEUS_ASCII" (C CHAR(1)) RETURNS INTEGER LANGUAGE JAVA PARAMETER STYLE JAVA READS SQL DATA CALLED ON NULL INPUT EXTERNAL NAME 'org.datanucleus.store.rdbms.adapter.DerbySQLFunction.ascii' ;

-- CREATE FUNCTION "APP"."NUCLEUS_MATCHES" (TEXT VARCHAR(8000),PATTERN VARCHAR(8000)) RETURNS INTEGER LANGUAGE JAVA PARAMETER STYLE JAVA READS SQL DATA CALLED ON NULL INPUT EXTERNAL NAME 'org.datanucleus.store.rdbms.adapter.DerbySQLFunction.matches' ;

再次执行,就可以了

[root@localhost bin]# vim ../scripts/metastore/upgrade/derby/hive-schema-2.1.0.derby.sql 
[root@localhost bin]# ./schematool -initSchema -dbType derby createDatabaseIfNotExist=true
which: no hbase in (/usr/lib64/qt-3.3/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/usr/java/jdk1.8.0/bin:/root/bin:/usr/apache-hive-2.1.0-bin/bin:/usr/java/jdk1.8.0/bin:/usr/apache-hive-2.1.0-bin/bin:/usr/java/jdk1.8.0/bin:/usr/hadoop/hadoop-2.6.4/bin:/usr/apache-hive-2.1.0-bin/bin)
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/apache-hive-2.1.0-bin/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hadoop/hadoop-2.6.4/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Metastore connection URL:    jdbc:derby:;databaseName=metastore_db;create=true
Metastore Connection Driver :    org.apache.derby.jdbc.EmbeddedDriver
Metastore connection User:   APP
Starting metastore schema initialization to 2.1.0
Initialization script hive-schema-2.1.0.derby.sql
Initialization script completed
schemaTool completed

启动hive cli测试

[root@localhost bin]# hive
which: no hbase in (/usr/lib64/qt-3.3/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/usr/java/jdk1.8.0/bin:/root/bin:/usr/apache-hive-2.1.0-bin/bin:/usr/java/jdk1.8.0/bin:/usr/apache-hive-2.1.0-bin/bin:/usr/java/jdk1.8.0/bin:/usr/hadoop/hadoop-2.6.4/bin:/usr/apache-hive-2.1.0-bin/bin)
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/apache-hive-2.1.0-bin/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hadoop/hadoop-2.6.4/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]

Logging initialized using configuration in jar:file:/usr/apache-hive-2.1.0-bin/lib/hive-common-2.1.0.jar!/hive-log4j2.properties Async: true
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
hive> show databases;
OK
default
Time taken: 1.931 seconds, Fetched: 1 row(s)

至此,Hive就算安装完啦!

踩过的坑

没有初始化数据库,定义Schema

[root@localhost bin]# hive
which: no hbase in (/usr/lib64/qt-3.3/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/usr/java/jdk1.8.0/bin:/root/bin:/usr/apache-hive-2.1.0-bin/bin:/usr/java/jdk1.8.0/bin:/usr/apache-hive-2.1.0-bin/bin:/usr/java/jdk1.8.0/bin:/usr/hadoop/hadoop-2.6.4/bin:/usr/apache-hive-2.1.0-bin/bin)
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/apache-hive-2.1.0-bin/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hadoop/hadoop-2.6.4/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]

Logging initialized using configuration in jar:file:/usr/apache-hive-2.1.0-bin/lib/hive-common-2.1.0.jar!/hive-log4j2.properties Async: true
Exception in thread "main" java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Hive metastore database is not initialized. Please use schematool (e.g. ./schematool -initSchema -dbType ...) to create the schema. If needed, don't forget to include the option to auto-create the underlying database in your JDBC connection string (e.g. ?createDatabaseIfNotExist=true for mysql))
    at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:578)
    at org.apache.hadoop.hive.ql.session.SessionState.beginStart(SessionState.java:518)
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:705)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:641)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:483)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Hive metastore database is not initialized. Please use schematool (e.g. ./schematool -initSchema -dbType ...) to create the schema. If needed, don't forget to include the option to auto-create the underlying database in your JDBC connection string (e.g. ?createDatabaseIfNotExist=true for mysql))
    at org.apache.hadoop.hive.ql.metadata.Hive.registerAllFunctionsOnce(Hive.java:226)
    at org.apache.hadoop.hive.ql.metadata.Hive.<init>(Hive.java:366)
    at org.apache.hadoop.hive.ql.metadata.Hive.create(Hive.java:310)
    at org.apache.hadoop.hive.ql.metadata.Hive.getInternal(Hive.java:290)
    at org.apache.hadoop.hive.ql.metadata.Hive.get(Hive.java:266)
    at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:545)
    ... 9 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Hive metastore database is not initialized. Please use schematool (e.g. ./schematool -initSchema -dbType ...) to create the schema. If needed, don't forget to include the option to auto-create the underlying database in your JDBC connection string (e.g. ?createDatabaseIfNotExist=true for mysql))
    at org.apache.hadoop.hive.ql.metadata.Hive.getAllFunctions(Hive.java:3593)
    at org.apache.hadoop.hive.ql.metadata.Hive.reloadFunctions(Hive.java:236)
    at org.apache.hadoop.hive.ql.metadata.Hive.registerAllFunctionsOnce(Hive.java:221)
    ... 14 more
Caused by: MetaException(message:Hive metastore database is not initialized. Please use schematool (e.g. ./schematool -initSchema -dbType ...) to create the schema. If needed, don't forget to include the option to auto-create the underlying database in your JDBC connection string (e.g. ?createDatabaseIfNotExist=true for mysql))
    at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3364)
    at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3336)
    at org.apache.hadoop.hive.ql.metadata.Hive.getAllFunctions(Hive.java:3590)
    ... 16 more

解决办法:

执行命令$HIVE_HOME/bin/schematool -initSchema -dbType derby

derby数据库脚本定义有问题

[root@localhost bin]# ./schematool -initSchema -dbType derby createDatabaseIfNotExist=true
which: no hbase in (/usr/lib64/qt-3.3/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/usr/java/jdk1.8.0/bin:/root/bin:/usr/apache-hive-2.1.0-bin/bin:/usr/java/jdk1.8.0/bin:/usr/apache-hive-2.1.0-bin/bin:/usr/java/jdk1.8.0/bin:/usr/hadoop/hadoop-2.6.4/bin:/usr/apache-hive-2.1.0-bin/bin)
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/apache-hive-2.1.0-bin/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hadoop/hadoop-2.6.4/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Metastore connection URL:    jdbc:derby:;databaseName=metastore_db;create=true
Metastore Connection Driver :    org.apache.derby.jdbc.EmbeddedDriver
Metastore connection User:   APP
Starting metastore schema initialization to 2.1.0
Initialization script hive-schema-2.1.0.derby.sql
Error: FUNCTION 'NUCLEUS_ASCII' already exists. (state=X0Y68,code=30000)
org.apache.hadoop.hive.metastore.HiveMetaException: Schema initialization FAILED! Metastore state would be inconsistent !!
Underlying cause: java.io.IOException : Schema script failed, errorcode 2
Use --verbose for detailed stacktrace.
*** schemaTool failed ***

解决办法:

注释掉sql脚本中的两句定义..

配置文件中${system.io.tmpdir}不识别

[root@localhost bin]# hive
which: no hbase in (/usr/lib64/qt-3.3/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/usr/java/jdk1.8.0/bin:/root/bin:/usr/apache-hive-2.1.0-bin/bin:/usr/java/jdk1.8.0/bin:/usr/apache-hive-2.1.0-bin/bin:/usr/java/jdk1.8.0/bin:/usr/hadoop/hadoop-2.6.4/bin:/usr/apache-hive-2.1.0-bin/bin)
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/apache-hive-2.1.0-bin/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hadoop/hadoop-2.6.4/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]

Logging initialized using configuration in jar:file:/usr/apache-hive-2.1.0-bin/lib/hive-common-2.1.0.jar!/hive-log4j2.properties Async: true
Exception in thread "main" java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: ${system:java.io.tmpdir%7D/$%7Bsystem:user.name%7D
    at org.apache.hadoop.fs.Path.initialize(Path.java:206)
    at org.apache.hadoop.fs.Path.<init>(Path.java:172)
    at org.apache.hadoop.hive.ql.session.SessionState.createSessionDirs(SessionState.java:631)
    at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:550)
    at org.apache.hadoop.hive.ql.session.SessionState.beginStart(SessionState.java:518)
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:705)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:641)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:483)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.net.URISyntaxException: Relative path in absolute URI: ${system:java.io.tmpdir%7D/$%7Bsystem:user.name%7D
    at java.net.URI.checkPath(URI.java:1823)
    at java.net.URI.<init>(URI.java:745)
    at org.apache.hadoop.fs.Path.initialize(Path.java:203)
    ... 12 more

解决办法:

替换hive-site.xml中的${system.io.tmpdir}属性

hive-site.xml中${system.user.name}不识别

[root@localhost bin]# hive
which: no hbase in (/usr/lib64/qt-3.3/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/usr/java/jdk1.8.0/bin:/root/bin:/usr/apache-hive-2.1.0-bin/bin:/usr/java/jdk1.8.0/bin:/usr/apache-hive-2.1.0-bin/bin:/usr/java/jdk1.8.0/bin:/usr/hadoop/hadoop-2.6.4/bin:/usr/apache-hive-2.1.0-bin/bin)
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/apache-hive-2.1.0-bin/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hadoop/hadoop-2.6.4/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]

Logging initialized using configuration in jar:file:/usr/apache-hive-2.1.0-bin/lib/hive-common-2.1.0.jar!/hive-log4j2.properties Async: true
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
hive> show databases;
OK
Failed with exception java.io.IOException:java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: ${system:user.name%7D
Time taken: 1.781 seconds

解决办法:

替换hive-site.xml中的${system.user.name}属性

参考

FUNCTION 'NUCLEUS_ASCII' already exists.
如何在hive中定义schema
Hive部署注意事项
Hive配置入门

本文转自博客园xingoo的博客,原文链接:[Hadoop大数据]——Hive部署入门教程,如需转载请自行联系原博主。

网友评论

登录后评论
0/500
评论
青夜之衫
+ 关注