Sqoop Setup

Overview:

Sqoop version 1.99.7. # Note: this walkthrough ultimately failed when starting the job at the very end; the differences between versions are considerable.

Download link: http://pan.baidu.com/s/1pKYrusz  password: 7ib5

Hadoop and Java were already set up and configured before installing Sqoop.

After the first startup, killing the Hadoop processes caused all sorts of problems; rebooting the machine cleared them up.

This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh

Error: Could not find or load main class org.apache.hadoop.hdfs.tools.GetConf

Error: Could not find or load main class org.apache.hadoop.hdfs.server.namenode.NameNode


/usr/local/hadoop/sbin/hadoop-daemon.sh start namenode  ## once up, jps shows NameNode and DataNode

/usr/local/hadoop/sbin/yarn-daemon.sh start resourcemanager

## start the daemons one service at a time

Whatever odd warnings appear during this can be ignored; I got burned here once.
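
As a quick sanity check after starting the daemons one by one (a sketch; which names appear depends on the services running on this node):

jps | egrep 'NameNode|DataNode|ResourceManager'   # each started daemon should be listed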

1. Download the package

2. Extract the package

tar -xzvf sqoop-1.99.7-bin-hadoop200.tar.gz -C /usr/local/

cd /usr/local/

mv sqoop-1.99.7-bin-hadoop200 sqoop

3. Configure environment variables

export SQOOP_HOME=/usr/local/sqoop

export PATH=$PATH:$SQOOP_HOME/bin

Hadoop's own environment variables are already set; Sqoop2 additionally needs the following, otherwise a restart brings this error:

Can't load the Hadoop related java lib, please check the setting for the following environment variables:

    HADOOP_COMMON_HOME, HADOOP_HDFS_HOME, HADOOP_MAPRED_HOME, HADOOP_YARN_HOME



export HADOOP_COMMON_HOME=$HADOOP_HOME/share/hadoop/common

export HADOOP_HDFS_HOME=$HADOOP_HOME/share/hadoop/hdfs

export HADOOP_MAPRED_HOME=$HADOOP_HOME/share/hadoop/mapreduce

export HADOOP_YARN_HOME=$HADOOP_HOME/share/hadoop/yarn
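
To survive reboots, persist the exports; a minimal sketch, assuming global settings live in /etc/profile on this host (the quoted heredoc keeps $HADOOP_HOME unexpanded so it is resolved at login time):

cat >> /etc/profile <<'EOF'
export SQOOP_HOME=/usr/local/sqoop
export PATH=$PATH:$SQOOP_HOME/bin
export HADOOP_COMMON_HOME=$HADOOP_HOME/share/hadoop/common
export HADOOP_HDFS_HOME=$HADOOP_HOME/share/hadoop/hdfs
export HADOOP_MAPRED_HOME=$HADOOP_HOME/share/hadoop/mapreduce
export HADOOP_YARN_HOME=$HADOOP_HOME/share/hadoop/yarn
EOF
source /etc/profile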

4. Edit the Sqoop configuration file

/usr/local/sqoop/conf/sqoop.properties, around line 144:

org.apache.sqoop.submission.engine.mapreduce.configuration.directory=/usr/local/hadoop/etc/hadoop/

# This points at Hadoop's configuration directory; if it is wrong the server fails with

"not a directory or permission issues"
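
If you would rather not hunt for the line by hand, a hedged one-liner that matches on the key instead of the line number (assuming the key is present and uncommented in this build):

sed -i 's|^org.apache.sqoop.submission.engine.mapreduce.configuration.directory=.*|org.apache.sqoop.submission.engine.mapreduce.configuration.directory=/usr/local/hadoop/etc/hadoop/|' /usr/local/sqoop/conf/sqoop.properties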

5. Copy the JDBC driver into Sqoop's lib directory

Copy mysql-connector-java-5.1.6-bin.jar into it.

Download: http://pan.baidu.com/s/1qXIGeSG  password: iykt
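
A sketch of the copy, assuming the layout from step 2 (some 1.99.x builds instead pick up extra jars from a directory named by the SQOOP_SERVER_EXTRA_LIB environment variable, so check where your build expects drivers):

cp mysql-connector-java-5.1.6-bin.jar /usr/local/sqoop/lib/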

6. Configure Hadoop proxy access

Sqoop submits work to Hadoop's MapReduce by impersonating users, so Hadoop must be configured with the proxy user and groups it will accept.

Add the properties below to Hadoop's core-site.xml. Separately, if the Linux container executor is in use, allow the user in container-executor.cfg:

vi /usr/local/hadoop/etc/hadoop/container-executor.cfg

allowed.system.users=hadoop

<property>
  <name>hadoop.proxyuser.hadoop.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.hadoop.groups</name>
  <value>*</value>
</property>
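
After editing core-site.xml, either restart HDFS/YARN or refresh the proxy-user settings in place; a sketch, assuming it runs as the Hadoop admin user:

hdfs dfsadmin -refreshSuperUserGroupsConfiguration
yarn rmadmin -refreshSuperUserGroupsConfiguration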


7. Fix "Tool class org.apache.sqoop.tools.tool.VerifyTool has failed" (mapreduce.framework.name was not configured; the property below belongs in mapred-site.xml):

<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>

8. Verify the configuration

bin/sqoop2-tool verify should return:

Verification was successful.

Tool class org.apache.sqoop.tools.tool.VerifyTool has finished correctly.
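
If verification fails instead, a quick checklist before digging into logs (a sketch; both checks cover the settings made in steps 3 and 4):

# the four Hadoop variables from step 3 must be visible to the server's environment
echo $HADOOP_COMMON_HOME $HADOOP_HDFS_HOME $HADOOP_MAPRED_HOME $HADOOP_YARN_HOME

# the directory from step 4 must exist and be readable
grep 'configuration.directory' /usr/local/sqoop/conf/sqoop.properties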

9. Start the Sqoop server

sqoop2-server start

Check with the JDK's jps tool that it started correctly; normally a SqoopJettyServer process appears (the Sqoop server is built on Jetty).
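
To confirm, and for later shutdowns (a sketch):

jps | grep SqoopJettyServer   # should print exactly one process
sqoop2-server stop            # for when the server needs to come down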

10. Enter the Sqoop shell

Start the client shell: ./sqoop2-shell

Problem: Exception in thread "main" java.lang.UnsatisfiedLinkError:

 /usr/local/jdk/jre/lib/i386/xawt/libmawt.so: libXext.so.6: cannot open shared object file: No such file or directory


yum install glibc.i686 libXtst.i686


Problem 2:

 /usr/local/jdk/jre/lib/i386/xawt/libmawt.so: libXrender.so.1:

An RPM matching the system version (642.4.2.el6.x86_64 #1) needs to be installed.

Package download: http://down.51cto.com/data/2260998

然后rpm -ivh libXrender-0.9.10-1.fc26.i686.rpm 

Missing system libraries are a real trap.

Finally, run sqoop2-shell again:

[root@nod2 bin]# ./sqoop2-shell

Setting conf dir: /usr/local/sqoop/bin/../conf

Sqoop home directory: /usr/local/sqoop

Sqoop Shell: Type 'help' or '\h' for help.


sqoop:000> 


Testing a MySQL-to-Hadoop data import

1. Create a user in MySQL

grant all privileges on *.* to sqoop@'192.168.%' identified by 'sqoop' with grant option;


mysql> create database sqoop;

Query OK, 1 row affected (0.02 sec)


mysql> use sqoop

Database changed

mysql> create table sqoop(id int, c_time timestamp, name varchar(20));

Query OK, 0 rows affected (0.10 sec)


mysql> insert into sqoop(id,name) value(1,'sqoop');

Query OK, 1 row affected (0.02 sec)


mysql> insert into sqoop(id,name) value(2,'hadoop');

Query OK, 1 row affected (0.02 sec)


mysql> insert into sqoop(id,name) value(3,'hive');

Query OK, 1 row affected (0.02 sec)


mysql> select * from sqoop;

+------+---------------------+--------+

| id   | c_time              | name   |

+------+---------------------+--------+

|    1 | 2016-11-22 15:04:04 | sqoop  |

|    2 | 2016-11-22 15:04:13 | hadoop |

|    3 | 2016-11-22 15:04:21 | hive   |

+------+---------------------+--------+

3 rows in set (0.00 sec)
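
Before wiring Sqoop to this database, it is worth confirming the grant works remotely; a sketch, assuming the Sqoop host falls inside the 192.168.% range of the grant and the MySQL server is the 192.168.1.107 host used in the link later:

mysql -h 192.168.1.107 -u sqoop -psqoop -e 'select count(*) from sqoop.sqoop;'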


2. Test the connection from Sqoop

First, be aware that the 1.99.7 shell commands differ from those of other versions:

# show version information

show version / show version --all

# list all available connectors (this is the key one)

sqoop:000> show connector

+------------------------+---------+------------------------------------------------------------+----------------------+

|          Name          | Version |                           Class                            | Supported Directions |

+------------------------+---------+------------------------------------------------------------+----------------------+

| oracle-jdbc-connector  | 1.99.7  | org.apache.sqoop.connector.jdbc.oracle.OracleJdbcConnector | FROM/TO              |

| sftp-connector         | 1.99.7  | org.apache.sqoop.connector.sftp.SftpConnector              | TO                   |

| kafka-connector        | 1.99.7  | org.apache.sqoop.connector.kafka.KafkaConnector            | TO                   |

| kite-connector         | 1.99.7  | org.apache.sqoop.connector.kite.KiteConnector              | FROM/TO              |

| ftp-connector          | 1.99.7  | org.apache.sqoop.connector.ftp.FtpConnector                | TO                   |

| hdfs-connector         | 1.99.7  | org.apache.sqoop.connector.hdfs.HdfsConnector              | FROM/TO              |

| generic-jdbc-connector | 1.99.7  | org.apache.sqoop.connector.jdbc.GenericJdbcConnector       | FROM/TO              |

+------------------------+---------+------------------------------------------------------------+----------------------+


# list the current links

show link

# list jobs

show job

# Create an HDFS link; this differs from other versions:

## in this version a link must be created against a connector name taken from the `show connector` output.
sqoop:000> create link -c hdfs-connector

Creating link for connector with name hdfs-connector

Please fill following values to create new link object

Name: hdfs_link


HDFS cluster


URI: hdfs://mycat:9000

Conf directory: /usr/local/hadoop/etc/hadoop

Additional configs:: 

There are currently 0 values in the map:

entry# 

New link was successfully created with validation status OK and name hdfs_link

sqoop:000> show link

+-----------+----------------+---------+

|   Name    | Connector Name | Enabled |

+-----------+----------------+---------+

| show link | hdfs-connector | true    |

| hdfs_link | hdfs-connector | true    |

+-----------+----------------+---------+

## Create the MySQL link

sqoop:000> create link  -connector generic-jdbc-connector

Creating link for connector with name generic-jdbc-connector

Please fill following values to create new link object

Name: mysql


Database connection


Driver class: com.mysql.jdbc.Driver

Connection String: jdbc:mysql://192.168.1.107/sqoop

Username: sqoop

Password: *******

Fetch Size: 

Connection Properties: 

There are currently 0 values in the map:

entry# protocol=tcp

There are currently 1 values in the map:

protocol = tcp

entry# 

Identifier enclose: 

New link was successfully created with validation status OK and name mysql

sqoop:000> show link

+-----------+------------------------+---------+

|   Name    |     Connector Name     | Enabled |

+-----------+------------------------+---------+

| show link | hdfs-connector         | true    |

| hdfs_link | hdfs-connector         | true    |

| mysql     | generic-jdbc-connector | true    |

+-----------+------------------------+---------+

With both links configured, set up the transfer job to be submitted to MapReduce (started here with something like `create job -f mysql -t hdfs_link`; the prompts below follow):

Name: mysql-hdfs


Database source


Schema name: sqoop

Table name: sqoop

SQL statement: 

Column names: 

There are currently 1 values in the list:

1

element# 

Partition column: 

Partition column nullable: 

Boundary query: 


Incremental read


Check column: 

Last value: 


Target configuration


Override null value: false

Null value: 

File format: 

  0 : TEXT_FILE

  1 : SEQUENCE_FILE

  2 : PARQUET_FILE

Choose: 2

Compression codec: 

  0 : NONE

  1 : DEFAULT

  2 : DEFLATE

  3 : GZIP

  4 : BZIP2

  5 : LZO

  6 : LZ4

  7 : SNAPPY

  8 : CUSTOM

Choose: 0

Custom codec: 

Output directory: hdfs:/home/sqoop

Append mode: false


Throttling resources


Extractors: 2

Loaders: 2


Classpath configuration


Extra mapper jars: 

There are currently 1 values in the list:

1

element# 

New job was successfully created with validation status OK  and name mysql-hdfs

sqoop:000> 


For comparison, an id-based transcript in the style of an older 1.99.x shell, which references links by numeric id rather than by name:

sqoop:000> create job -f 2 -t 1

Creating job for links with from id 1 and to id 6

Please fill following values to create new job object

Name: mysql_openfire  -- job name

FromJob configuration

Schema name: (Required) sqoop  -- database name: required

Table name: (Required) sqoop  -- table name: required

Table SQL statement: (Optional)  -- optional

Table column names: (Optional)  -- optional

Partition column name: (Optional) id  -- optional

Null value allowed for the partition column: (Optional)  -- optional

Boundary query: (Optional)  -- optional

ToJob configuration

Output format:

0 : TEXT_FILE

1 : SEQUENCE_FILE

Choose: 0  -- choose the output file format

Compression format:

0 : NONE

1 : DEFAULT

2 : DEFLATE

3 : GZIP

4 : BZIP2

5 : LZO

6 : LZ4

7 : SNAPPY

8 : CUSTOM

Choose: 0  -- choose the compression type

Custom compression format: (Optional)  -- optional

Output directory: hdfs:/ns1/sqoop  -- HDFS output directory (the destination)

Driver Config

Extractors: 2  -- extractors

Loaders: 2  -- loaders

New job was successfully created with validation status OK and persistent id 1

sqoop:000> show job

+----+------------+--------------------------------+----------------------------+---------+

| Id |    Name    |         From Connector         |        To Connector        | Enabled |

+----+------------+--------------------------------+----------------------------+---------+

| 1  | mysql-hdfs | mysql (generic-jdbc-connector) | hdfs_link (hdfs-connector) | true    |

+----+------------+--------------------------------+----------------------------+---------+

sqoop:000> 

Common commands

sqoop:001> show link                -- list all links

sqoop:001> create link --cid 1      -- create a link

sqoop:000> delete link --lid 1      -- delete a link

sqoop:001> show job                 -- list all jobs

sqoop:001> create job --f 2 --t 1   -- create a job (import data from link 2 into link 1)

sqoop:000> start job --jid 1        -- start a job

sqoop:000> status job --jid 1       -- check import status

sqoop:000> delete job --jid 1       -- delete a job
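
Note that 1.99.7 itself references jobs by name rather than id (as in the `update job -n` transcript below); a sketch using the job created above:

start job -n mysql-hdfs     -- start the job
status job -n mysql-hdfs    -- poll its status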




ERROR tool.ImportTool: Encountered IOException running import job: 

java.io.IOException: No columns to generate for ClassWriter

A connector problem; switch to a different connector.

Exception in thread "main" java.lang.IncompatibleClassChangeError:

Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected


The Hadoop and Sqoop versions are mismatched.


Exception has occurred during processing command

Exception: org.apache.sqoop.common.SqoopException Message: CLIENT_0001:Server has returned exception

The message by itself says nothing useful; turn on verbose output to surface the real error:

set option --name verbose --value true


##### Recreated the link and the job from scratch; in the end it still failed....

sqoop:000> update job -n cmd

Updating job with name cmd

Please update job:

Name: cmd


Database source


Schema name: sqoop

Table name: test

SQL statement: 

Column names: 

There are currently 1 values in the list:

proctor=tcp

element# 

Partition column: 

Partition column nullable: 

Boundary query: 


Incremental read


Check column: 

Last value: 


Target configuration


Override null value: 

Null value: 

File format: 

  0 : TEXT_FILE

  1 : SEQUENCE_FILE

  2 : PARQUET_FILE

Choose: 0

Compression codec: 

  0 : NONE

  1 : DEFAULT

  2 : DEFLATE

  3 : GZIP

  4 : BZIP2

  5 : LZO

  6 : LZ4

  7 : SNAPPY

  8 : CUSTOM

Choose: 0

Custom codec: 

Output directory: hdfs://mycat:9000/home/sqoop

Append mode: 


Throttling resources


Extractors: 2

Loaders: 2


Classpath configuration


Extra mapper jars: 

There are currently 1 values in the list:

proctor=tcp

element# 

Job was successfully updated with status OK


Caused by: Exception: com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException Message: 

You have an error in your SQL syntax; check the manual that corresponds to your MySQL 

server version for the right syntax to use near '"sqoop"."test"' at line 1

Stack trace:

at  sun.reflect.GeneratedConstructorAccessor77 (null:-1)  

at  sun.reflect.DelegatingConstructorAccessorImpl (DelegatingConstructorAccessorImpl.java:45)  

at  java.lang.reflect.Constructor (Constructor.java:526)  

at  com.mysql.jdbc.Util (Util.java:406)  

at  com.mysql.jdbc.Util (Util.java:381)  

at  com.mysql.jdbc.SQLError (SQLError.java:1030)  

at  com.mysql.jdbc.SQLError (SQLError.java:956)  

at  com.mysql.jdbc.MysqlIO (MysqlIO.java:3491)  

at  com.mysql.jdbc.MysqlIO (MysqlIO.java:3423)  

at  com.mysql.jdbc.MysqlIO (MysqlIO.java:1936)  

at  com.mysql.jdbc.MysqlIO (MysqlIO.java:2060)  

at  com.mysql.jdbc.ConnectionImpl (ConnectionImpl.java:2542)  

at  com.mysql.jdbc.PreparedStatement (PreparedStatement.java:1734)  

at  com.mysql.jdbc.PreparedStatement (PreparedStatement.java:1885)  

at  org.apache.sqoop.connector.jdbc.GenericJdbcFromInitializer (GenericJdbcFromInitializer.java:257)  

at  org.apache.sqoop.connector.jdbc.GenericJdbcFromInitializer (GenericJdbcFromInitializer.java:61)  

at  org.apache.sqoop.connector.jdbc.GenericJdbcFromInitializer (GenericJdbcFromInitializer.java:47)  

at  org.apache.sqoop.driver.JobManager$1 (JobManager.java:520)  

at  org.apache.sqoop.driver.JobManager$1 (JobManager.java:517)  

at  org.apache.sqoop.utils.ClassUtils (ClassUtils.java:281)  

at  org.apache.sqoop.driver.JobManager (JobManager.java:516)  

at  org.apache.sqoop.driver.JobManager (JobManager.java:423)  

at  org.apache.sqoop.driver.JobManager (JobManager.java:317)  

at  org.apache.sqoop.handler.JobRequestHandler (JobRequestHandler.java:353)  

at  org.apache.sqoop.handler.JobRequestHandler (JobRequestHandler.java:114)  

at  org.apache.sqoop.server.v1.JobServlet (JobServlet.java:84)  

at  org.apache.sqoop.server.SqoopProtocolServlet (SqoopProtocolServlet.java:81)  

at  javax.servlet.http.HttpServlet (HttpServlet.java:710)  

at  javax.servlet.http.HttpServlet (HttpServlet.java:790)  

at  org.eclipse.jetty.servlet.ServletHolder (ServletHolder.java:808)  

at  org.eclipse.jetty.servlet.ServletHandler$CachedChain (ServletHandler.java:1669)  

at  org.apache.hadoop.security.authentication.server.AuthenticationFilter (AuthenticationFilter.java:595)  

at  org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticationFilter (DelegationTokenAuthenticationFilter.java:291)  

at  org.apache.hadoop.security.authentication.server.AuthenticationFilter (AuthenticationFilter.java:554)  

at  org.eclipse.jetty.servlet.ServletHandler$CachedChain (ServletHandler.java:1652)  

at  org.eclipse.jetty.servlet.ServletHandler (ServletHandler.java:585)  

at  org.eclipse.jetty.server.handler.ContextHandler (ContextHandler.java:1127)  

at  org.eclipse.jetty.servlet.ServletHandler (ServletHandler.java:515)  

at  org.eclipse.jetty.server.handler.ContextHandler (ContextHandler.java:1061)  

at  org.eclipse.jetty.server.handler.ScopedHandler (ScopedHandler.java:141)  

at  org.eclipse.jetty.server.handler.HandlerWrapper (HandlerWrapper.java:97)  

at  org.eclipse.jetty.server.Server (Server.java:499)  

at  org.eclipse.jetty.server.HttpChannel (HttpChannel.java:310)  

at  org.eclipse.jetty.server.HttpConnection (HttpConnection.java:257)  

at  org.eclipse.jetty.io.AbstractConnection$2 (AbstractConnection.java:540)  

at  java.util.concurrent.ThreadPoolExecutor (ThreadPoolExecutor.java:1145)  

at  java.util.concurrent.ThreadPoolExecutor$Worker (ThreadPoolExecutor.java:615)  

at  java.lang.Thread (Thread.java:745)  
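
The failure above is most likely an identifier-quoting problem: the generic-jdbc-connector wraps schema and table names in the link's "Identifier enclose" character, which defaults to a double quote, and MySQL rejects double-quoted identifiers unless ANSI_QUOTES is enabled; that matches the failing fragment '"sqoop"."test"'. The commonly reported fix for this version is to update the MySQL link and type a single space (or a backtick) at the "Identifier enclose" prompt, which was left at its default when the link was created above. A sketch:

sqoop:000> update link -n mysql
(step through the prompts unchanged until "Identifier enclose:", then enter a single space)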


Source: DBAspace's 51CTO blog. Original post: http://blog.51cto.com/dbaspace/1875955
