Setting up a Gateway to submit jobs to an E-MapReduce cluster

Gateway

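Some customers need to set up their own Gateway to submit jobs to an E-MapReduce cluster. At present the E-MapReduce product page does not offer a Gateway for purchase. Later it will be possible to buy a Gateway directly in the product, with the Hadoop environment already prepared for the user.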

Purchase an ECS instance

Purchase an ECS instance in the ECS console and choose CentOS 7.2 as the public image.

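Network

First make sure the Gateway machine is in the security group of the target EMR cluster, so that the Gateway node can reach the EMR cluster without obstruction. For how to set a machine's security group, see the ECS security group documentation.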
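
Before running a deployment script it can help to confirm that the Gateway really can reach the master node. The following is a minimal sketch, not part of the original procedure: port_open is a hypothetical helper name, and it relies on bash's /dev/tcp redirection and the coreutils timeout command. Port 22 matters because the scripts below copy files with scp. Any other port you probe depends on the services in your cluster and is an assumption to adapt.

```shell
#!/bin/bash
# port_open HOST PORT: succeed if a TCP connection can be opened within 3 seconds.
# Hypothetical helper for illustration; uses bash's built-in /dev/tcp pseudo-device.
port_open() {
    local host=$1 port=$2
    timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# Example (10.27.227.223 is the sample master IP used in this article):
#   port_open 10.27.227.223 22 && echo "ssh reachable" || echo "check the security group"
```

A non-zero return usually means the port is closed or filtered, which on ECS most often points back at the security group rules.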

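Environment

  • EMR-3.1.1

Copy the following script to the Gateway machine and run it.

Example: sh deploy.sh 10.27.227.223 /root/master_password_file
Notes:

master_ip is the internal IP of the master node.

master_password_file stores the password used to log in to the master machine.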
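
Because master_password_file holds the master's root password in plain text, it is worth creating it so that only the current user can read it. The sketch below is one possible way to do that. save_password is a hypothetical helper name, and the only thing it assumes about sshpass is that -f takes the password from the first line of the file.

```shell
#!/bin/bash
# save_password FILE: read one line from stdin and store it in FILE,
# readable and writable only by the current user. Hypothetical helper for illustration.
save_password() {
    local file=$1 password
    IFS= read -rs password             # -s: do not echo when typed at a terminal
    (
        umask 077                      # a newly created file gets mode 600
        printf '%s\n' "$password" > "$file"
    )
    chmod 600 "$file"                  # also tighten a file that already existed
}

# Example:
#   save_password /root/master_password_file    # type the password, then press Enter
```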

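#!/usr/bin/bash
# Usage: sh deploy.sh master_ip master_password_file
if [ $# -ne 2 ]
then
   echo "Usage: $0 master_ip master_password_file"
   exit 1
fi
masterip=$1          # internal IP of the EMR master node
masterpwdfile=$2     # file holding the master's root password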

if ! type sshpass >/dev/null 2>&1; then
   yum install -y sshpass
fi

if ! type java >/dev/null 2>&1; then
   yum install -y java-1.8.0-openjdk
fi

mkdir -p /opt/apps
mkdir -p /etc/emr

echo "Start to copy package from $masterip to local gateway(/opt/apps)"
echo " -copying hadoop-2.7.2"
sshpass -f $masterpwdfile scp -r -o 'StrictHostKeyChecking no' root@$masterip:/opt/apps/hadoop-2.7.2 /opt/apps/
echo " -copying hive-2.0.1"
sshpass -f $masterpwdfile scp -r root@$masterip:/opt/apps/apache-hive-2.0.1-bin /opt/apps/
echo " -copying spark-2.1.1"
sshpass -f $masterpwdfile scp -r root@$masterip:/opt/apps/spark-2.1.1-bin-hadoop2.7 /opt/apps/

echo "Start to link /usr/lib/\${app}-current to /opt/apps/\${app}"
if [ -L /usr/lib/hadoop-current ]
then
   unlink /usr/lib/hadoop-current
fi
ln -s /opt/apps/hadoop-2.7.2  /usr/lib/hadoop-current
if [ -L /usr/lib/hive-current ]
then
   unlink /usr/lib/hive-current
fi
ln -s /opt/apps/apache-hive-2.0.1-bin  /usr/lib/hive-current
if [ -L /usr/lib/spark-current ]
then
   unlink /usr/lib/spark-current
fi
ln -s /opt/apps/spark-2.1.1-bin-hadoop2.7 /usr/lib/spark-current

echo "Start to copy conf from $masterip to local gateway(/etc/emr)"
sshpass -f $masterpwdfile scp -r root@$masterip:/etc/emr/hadoop-conf-2.7.2  /etc/emr/hadoop-conf-2.7.2
if [ -L /etc/emr/hadoop-conf ]
then
   unlink /etc/emr/hadoop-conf
fi
ln -s /etc/emr/hadoop-conf-2.7.2  /etc/emr/hadoop-conf
sshpass -f $masterpwdfile scp -r root@$masterip:/etc/emr/hive-conf-2.0.1  /etc/emr/hive-conf-2.0.1/
if [ -L /etc/emr/hive-conf ]
then
   unlink /etc/emr/hive-conf
fi
ln -s /etc/emr/hive-conf-2.0.1  /etc/emr/hive-conf
sshpass -f $masterpwdfile scp -r root@$masterip:/etc/emr/spark-conf /etc/emr/spark-conf-2.1.1
if [ -L /etc/emr/spark-conf ]
then
   unlink /etc/emr/spark-conf
fi
ln -s /etc/emr/spark-conf-2.1.1  /etc/emr/spark-conf

echo "Start to copy environment from $masterip to local gateway(/etc/profile.d)"
sshpass -f $masterpwdfile scp root@$masterip:/etc/profile.d/hadoop.sh /etc/profile.d/

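# Point /usr/lib/jvm/java at the JRE installed by yum. The directory name below
# must match the OpenJDK build actually installed; check with: ls /usr/lib/jvm
if [ -L /usr/lib/jvm/java ]
then
   unlink /usr/lib/jvm/java
fi
ln -s /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.131-3.b12.el7_3.x86_64/jre /usr/lib/jvm/java
echo "Start to copy host info from $masterip to local gateway(/etc/hosts)"
sshpass -f $masterpwdfile scp root@$masterip:/etc/hosts /etc/hosts_bak
# Append only the cluster's own entries (lines containing both "emr" and "cluster")
grep emr /etc/hosts_bak | grep cluster >>/etc/hosts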

# Create the hadoop user if it does not exist yet
if ! id hadoop >/dev/null 2>&1
then
   useradd hadoop
fi
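
The script links /usr/lib/jvm/java to the JRE directory java-1.8.0-openjdk-1.8.0.131-3.b12.el7_3.x86_64, which exists only if yum installed exactly that build. One possible way to avoid the fixed name is to follow the symlink chain from the java launcher. This is a sketch under the assumption that the launcher lives at <jre-home>/bin/java, as it does in the OpenJDK 8 packages. jre_home_of is a hypothetical helper name.

```shell
#!/bin/bash
# jre_home_of JAVA_BIN: print the directory two levels above the real java binary,
# e.g. .../jre/bin/java -> .../jre. Hypothetical helper for illustration.
jre_home_of() {
    local real
    real=$(readlink -f "$1") || return 1
    dirname "$(dirname "$real")"
}

# Example, in place of the fixed path used by the script:
#   ln -s "$(jre_home_of "$(command -v java)")" /usr/lib/jvm/java
```
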
  • EMR-3.2.0

Copy the following script to the Gateway machine and run it.

Example: sh deploy.sh 10.27.227.223 /root/master_password_file

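#!/usr/bin/bash
# Usage: sh deploy.sh master_ip master_password_file
if [ $# -ne 2 ]
then
   echo "Usage: $0 master_ip master_password_file"
   exit 1
fi
masterip=$1          # internal IP of the EMR master node
masterpwdfile=$2     # file holding the master's root password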

if ! type sshpass >/dev/null 2>&1; then
   yum install -y sshpass
fi

if ! type java >/dev/null 2>&1; then
   yum install -y java-1.8.0-openjdk
fi

mkdir -p /opt/apps
mkdir -p /etc/ecm

echo "Start to copy package from $masterip to local gateway(/opt/apps)"
echo " -copying hadoop-2.7.2"
sshpass -f $masterpwdfile scp -r -o 'StrictHostKeyChecking no' root@$masterip:/opt/apps/ecm/service/hadoop/2.7.2/package/hadoop-2.7.2 /opt/apps/
echo " -copying hive-2.0.1"
sshpass -f $masterpwdfile scp -r root@$masterip:/opt/apps/ecm/service/hive/2.0.1/package/apache-hive-2.0.1-bin /opt/apps/
echo " -copying spark-2.1.1"
sshpass -f $masterpwdfile scp -r root@$masterip:/opt/apps/ecm/service/spark/2.1.1/package/spark-2.1.1-bin-hadoop2.7 /opt/apps/

echo "Start to link /usr/lib/\${app}-current to /opt/apps/\${app}"
if [ -L /usr/lib/hadoop-current ]
then
   unlink /usr/lib/hadoop-current
fi
ln -s /opt/apps/hadoop-2.7.2  /usr/lib/hadoop-current
if [ -L /usr/lib/hive-current ]
then
   unlink /usr/lib/hive-current
fi
ln -s /opt/apps/apache-hive-2.0.1-bin  /usr/lib/hive-current
if [ -L /usr/lib/spark-current ]
then
   unlink /usr/lib/spark-current
fi
ln -s /opt/apps/spark-2.1.1-bin-hadoop2.7 /usr/lib/spark-current

echo "Start to copy conf from $masterip to local gateway(/etc/ecm)"
sshpass -f $masterpwdfile scp -r root@$masterip:/etc/ecm/hadoop-conf-2.7.2  /etc/ecm/hadoop-conf-2.7.2
if [ -L /etc/ecm/hadoop-conf ]
then
   unlink /etc/ecm/hadoop-conf
fi
ln -s /etc/ecm/hadoop-conf-2.7.2  /etc/ecm/hadoop-conf
sshpass -f $masterpwdfile scp -r root@$masterip:/etc/ecm/hive-conf-2.0.1  /etc/ecm/hive-conf-2.0.1/
if [ -L /etc/ecm/hive-conf ]
then
   unlink /etc/ecm/hive-conf
fi
ln -s /etc/ecm/hive-conf-2.0.1  /etc/ecm/hive-conf
sshpass -f $masterpwdfile scp -r root@$masterip:/etc/ecm/spark-conf /etc/ecm/spark-conf-2.1.1
if [ -L /etc/ecm/spark-conf ]
then
   unlink /etc/ecm/spark-conf
fi
ln -s /etc/ecm/spark-conf-2.1.1  /etc/ecm/spark-conf

echo "Start to copy environment from $masterip to local gateway(/etc/profile.d)"
sshpass -f $masterpwdfile scp root@$masterip:/etc/profile.d/hdfs.sh /etc/profile.d/
sshpass -f $masterpwdfile scp root@$masterip:/etc/profile.d/yarn.sh /etc/profile.d/
sshpass -f $masterpwdfile scp root@$masterip:/etc/profile.d/hive.sh /etc/profile.d/
sshpass -f $masterpwdfile scp root@$masterip:/etc/profile.d/spark.sh /etc/profile.d/

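# Export JAVA_HOME and point /usr/lib/jvm/java at the JRE installed by yum. The directory
# name below must match the OpenJDK build actually installed; check with: ls /usr/lib/jvm
if [ -L /usr/lib/jvm/java ]
then
   unlink /usr/lib/jvm/java
fi
echo "" >>/etc/profile.d/hdfs.sh
echo export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.131-3.b12.el7_3.x86_64/jre >>/etc/profile.d/hdfs.sh
ln -s /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.131-3.b12.el7_3.x86_64/jre /usr/lib/jvm/java
echo "Start to copy host info from $masterip to local gateway(/etc/hosts)"
sshpass -f $masterpwdfile scp root@$masterip:/etc/hosts /etc/hosts_bak
# Append only the cluster's own entries (lines containing both "emr" and "cluster")
grep emr /etc/hosts_bak | grep cluster >>/etc/hosts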

# Create the hadoop user if it does not exist yet
if ! id hadoop >/dev/null 2>&1
then
   useradd hadoop
fi
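
Both scripts append the cluster's host entries to /etc/hosts with >>, so running a script a second time leaves duplicate lines. A possible idempotent variant is sketched below. merge_cluster_hosts is a hypothetical helper name, and its emr/cluster filter simply mirrors the grep chain used in the scripts.

```shell
#!/bin/bash
# merge_cluster_hosts SRC DEST: append to DEST every line of SRC that mentions
# both "emr" and "cluster" and is not already present in DEST verbatim.
merge_cluster_hosts() {
    local src=$1 dest=$2 line
    grep emr "$src" | grep cluster | while IFS= read -r line; do
        # -F: fixed string, -x: whole-line match, -q: quiet
        grep -Fxq -- "$line" "$dest" || printf '%s\n' "$line" >> "$dest"
    done
}

# Example, in place of the final grep line of the host-info step:
#   merge_cluster_hosts /etc/hosts_bak /etc/hosts
```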

Once the steps above are done, the configuration is complete.
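
A quick sanity check after the script has run is to confirm that every symlink it creates resolves to a real directory. The helper below is a sketch for that purpose. broken_links is a hypothetical name, and the paths in the usage comment are the ones the scripts above create.

```shell
#!/bin/bash
# broken_links PATH...: print each PATH that is not a symlink resolving to an
# existing directory; return non-zero if any path was printed.
broken_links() {
    local path status=0
    for path in "$@"; do
        if [ ! -L "$path" ] || [ ! -d "$path" ]; then
            printf '%s\n' "$path"
            status=1
        fi
    done
    return "$status"
}

# Example for the EMR-3.2.0 layout (use /etc/emr/... for EMR-3.1.1):
#   broken_links /usr/lib/hadoop-current /usr/lib/hive-current /usr/lib/spark-current \
#       /etc/ecm/hadoop-conf /etc/ecm/hive-conf /etc/ecm/spark-conf /usr/lib/jvm/java
```

No output and a zero exit status mean all links are in place. Any path that is printed usually corresponds to an scp step that failed.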

Testing

Switch to the hadoop account.

  • Hive
[hadoop@iZ23bc05hrvZ ~]$ hive
hive> show databases;
OK
default
Time taken: 1.124 seconds, Fetched: 1 row(s)
hive> create database school;
OK
Time taken: 0.362 seconds
hive>
  • Run a Hadoop job
[hadoop@iZ23bc05hrvZ ~]$ hadoop jar /usr/lib/hadoop-current/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar pi 10 10
Number of Maps  = 10
Samples per Map = 10
Wrote input for Map #0
Wrote input for Map #1
Wrote input for Map #2
Wrote input for Map #3
Wrote input for Map #4
Wrote input for Map #5
Wrote input for Map #6
Wrote input for Map #7
Wrote input for Map #8
Wrote input for Map #9
  File Input Format Counters 
      Bytes Read=1180
  File Output Format Counters 
      Bytes Written=97
Job Finished in 29.798 seconds
Estimated value of Pi is 3.20000000000000000000
  • Run a Spark job
[hadoop@iZ23bc05hrvZ ~]$ spark-submit --class org.apache.spark.examples.JavaWordCount --master yarn-client ./sparkbench-4.0-SNAPSHOT-MR2-spark1.4-jar-with-dependencies.jar /path/Input /path/Output
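
Spark 2.x still accepts the yarn-client master value but reports it as deprecated since 2.0 in favor of --master yarn combined with --deploy-mode. The small helper below sketches that translation. yarn_master_args is a hypothetical name introduced only for illustration, not something Spark or EMR provides.

```shell
#!/bin/bash
# yarn_master_args MASTER: translate the legacy "yarn-client" / "yarn-cluster" master
# values into the "--master yarn --deploy-mode ..." form; pass other values through.
yarn_master_args() {
    case $1 in
        yarn-client)  echo "--master yarn --deploy-mode client" ;;
        yarn-cluster) echo "--master yarn --deploy-mode cluster" ;;
        *)            echo "--master $1" ;;
    esac
}

# Example, equivalent to the spark-submit command above (the substitution is left
# unquoted on purpose so that it splits into separate arguments):
#   spark-submit --class org.apache.spark.examples.JavaWordCount $(yarn_master_args yarn-client) \
#       ./sparkbench-4.0-SNAPSHOT-MR2-spark1.4-jar-with-dependencies.jar /path/Input /path/Output
```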