Big Data Tools: Flume 1.4 Installation and Deployment Guide


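1. Introduction

  flume-ng is a distributed, highly reliable and efficient log collection system. The name refers to the new version of Flume, where "ng" stands for "next generation". At the time of writing, flume-ng 1.4 is the latest release. flume-ng differs substantially from the original Flume. We had stayed on Flume 0.9 and never upgraded to flume-ng; a recent project required the upgrade, and I ran into a few problems along the way. They are recorded here for others to use.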

2. Version

  flume-ng 1.4.0

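3. Installation steps

  Downloading, unpacking, installing the JDK and setting environment variables are already covered by many introductory articles, so they are not described here. One point is worth calling out: flume-ng does not need ZooKeeper, so there is nothing to configure for it.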
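
  For readers who want a starting point anyway, the environment variables can be set roughly as below. This is a minimal sketch: both directory paths are hypothetical and should be replaced with the locations where the JDK and the Flume tarball were actually unpacked. JAVA_HOME and FLUME_HOME are the variables the flume-ng launcher script reads.

```shell
#!/bin/bash
# Hypothetical install locations; adjust them to your own machine.
export JAVA_HOME=/usr/java/default
export FLUME_HOME=/opt/apache-flume-1.4.0-bin

# Put the flume-ng launcher on the PATH so it can be run from any directory.
export PATH="$FLUME_HOME/bin:$PATH"

echo "FLUME_HOME=$FLUME_HOME"
```

  Adding these lines to the shell profile of the account that runs Flume keeps them in effect across logins.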

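4. The flume-ng bug

  After installation, running flume-ng prints error messages. The cause is a problem in the launcher shell script. The complete modified flume-ng script is posted below; the line directly under each #zhangzl marker is one that needs to be changed. The full script: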

#!/bin/bash
#
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#   http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied.  See the License for the
# specific language governing permissions and limitations
# under the License.
#

################################
# constants
################################

FLUME_AGENT_CLASS="org.apache.flume.node.Application"
FLUME_AVRO_CLIENT_CLASS="org.apache.flume.client.avro.AvroCLIClient"
FLUME_VERSION_CLASS="org.apache.flume.tools.VersionInfo"
FLUME_TOOLS_CLASS="org.apache.flume.tools.FlumeToolsMain"

CLEAN_FLAG=1
################################
# functions
################################

info() {
  if [ ${CLEAN_FLAG} -ne 0 ]; then
    local msg=$1
    echo "Info: $msg" >&2
  fi
}

warn() {
  if [ ${CLEAN_FLAG} -ne 0 ]; then
    local msg=$1
    echo "Warning: $msg" >&2
  fi
}

error() {
  local msg=$1
  local exit_code=$2

  echo "Error: $msg" >&2

  if [ -n "$exit_code" ] ; then
    exit $exit_code
  fi
}

# If avail, add Hadoop paths to the FLUME_CLASSPATH and to the
# FLUME_JAVA_LIBRARY_PATH env vars.
# Requires Flume jars to already be on FLUME_CLASSPATH.
add_hadoop_paths() {
  local HADOOP_IN_PATH=$(PATH="${HADOOP_HOME:-${HADOOP_PREFIX}}/bin:$PATH" \
      which hadoop 2>/dev/null)

  if [ -f "${HADOOP_IN_PATH}" ]; then
    info "Including Hadoop libraries found via ($HADOOP_IN_PATH) for HDFS access"

    # determine hadoop java.library.path and use that for flume
    local HADOOP_CLASSPATH=""
    local HADOOP_JAVA_LIBRARY_PATH=$(HADOOP_CLASSPATH="$FLUME_CLASSPATH" \
        ${HADOOP_IN_PATH} org.apache.flume.tools.GetJavaProperty \
        java.library.path)

    # look for the line that has the desired property value
    # (considering extraneous output from some GC options that write to stdout)
    # IFS = InternalFieldSeparator (set to recognize only newline char as delimiter)
    IFS=$'\n'
    for line in $HADOOP_JAVA_LIBRARY_PATH; do
      #if [[ $line =~ ^java\.library\.path=(.*)$ ]]; then
      if [[ "$line" =~ "^java\.library\.path=(.*)$" ]]; then
        HADOOP_JAVA_LIBRARY_PATH=${BASH_REMATCH[1]}
        break
      fi
    done
    unset IFS

    if [ -n "${HADOOP_JAVA_LIBRARY_PATH}" ]; then
      FLUME_JAVA_LIBRARY_PATH="$FLUME_JAVA_LIBRARY_PATH:$HADOOP_JAVA_LIBRARY_PATH"
    fi

    # determine hadoop classpath
    HADOOP_CLASSPATH=$($HADOOP_IN_PATH classpath)

    # hack up and filter hadoop classpath
    local ELEMENTS=$(sed -e 's/:/ /g' <<<${HADOOP_CLASSPATH})
    local ELEMENT
    for ELEMENT in $ELEMENTS; do
      local PIECE
      for PIECE in $(echo $ELEMENT); do
        #zhangzl
        if [[ $PIECE =~ "slf4j-(api|log4j12).*\.jar" ]]; then
          info "Excluding $PIECE from classpath"
          continue
        else
          FLUME_CLASSPATH="$FLUME_CLASSPATH:$PIECE"
        fi
      done
    done

  fi
}
add_HBASE_paths() {
  local HBASE_IN_PATH=$(PATH="${HBASE_HOME}/bin:$PATH" \
      which hbase 2>/dev/null)

  if [ -f "${HBASE_IN_PATH}" ]; then
    info "Including HBASE libraries found via ($HBASE_IN_PATH) for HBASE access"

    # determine HBASE java.library.path and use that for flume
    local HBASE_CLASSPATH=""
    local HBASE_JAVA_LIBRARY_PATH=$(HBASE_CLASSPATH="$FLUME_CLASSPATH" \
        ${HBASE_IN_PATH} org.apache.flume.tools.GetJavaProperty \
        java.library.path)

    # look for the line that has the desired property value
    # (considering extraneous output from some GC options that write to stdout)
    # IFS = InternalFieldSeparator (set to recognize only newline char as delimiter)
    IFS=$'\n'
    for line in $HBASE_JAVA_LIBRARY_PATH; do
      #zhangzl
      if [[ $line =~ "^java\.library\.path=(.*)$" ]]; then
        HBASE_JAVA_LIBRARY_PATH=${BASH_REMATCH[1]}
        break
      fi
    done
    unset IFS

    if [ -n "${HBASE_JAVA_LIBRARY_PATH}" ]; then
      FLUME_JAVA_LIBRARY_PATH="$FLUME_JAVA_LIBRARY_PATH:$HBASE_JAVA_LIBRARY_PATH"
    fi

    # determine HBASE classpath
    HBASE_CLASSPATH=$($HBASE_IN_PATH classpath)

    # hack up and filter HBASE classpath
    local ELEMENTS=$(sed -e 's/:/ /g' <<<${HBASE_CLASSPATH})
    local ELEMENT
    for ELEMENT in $ELEMENTS; do
      local PIECE
      for PIECE in $(echo $ELEMENT); do
        #zhangzl
        if [[ $PIECE =~ "slf4j-(api|log4j12).*\.jar" ]]; then
          info "Excluding $PIECE from classpath"
          continue
        else
          FLUME_CLASSPATH="$FLUME_CLASSPATH:$PIECE"
        fi
      done
    done
    FLUME_CLASSPATH="$FLUME_CLASSPATH:$HBASE_HOME/conf"

  fi
}

set_LD_LIBRARY_PATH(){
#Append the FLUME_JAVA_LIBRARY_PATH to whatever the user may have specified in
#flume-env.sh
  if [ -n "${FLUME_JAVA_LIBRARY_PATH}" ]; then
    export LD_LIBRARY_PATH="${LD_LIBRARY_PATH}:${FLUME_JAVA_LIBRARY_PATH}"
  fi
}

display_help() {
  cat <<EOF
Usage: $0 <command> [options]...

commands:
  help                  display this help text
  agent                 run a Flume agent
  avro-client           run an avro Flume client
  version               show Flume version info

global options:
  --conf,-c <conf>      use configs in <conf> directory
  --classpath,-C <cp>   append to the classpath
  --dryrun,-d           do not actually start Flume, just print the command
  --plugins-path <dirs> colon-separated list of plugins.d directories. See the
                        plugins.d section in the user guide for more details.
                        Default: \$FLUME_HOME/plugins.d
  -Dproperty=value      sets a Java system property value
  -Xproperty=value      sets a Java -X option

agent options:
  --conf-file,-f <file> specify a config file (required)
  --name,-n <name>      the name of this agent (required)
  --help,-h             display help text

avro-client options:
  --rpcProps,-P <file>   RPC client properties file with server connection params
  --host,-H <host>       hostname to which events will be sent
  --port,-p <port>       port of the avro source
  --dirname <dir>        directory to stream to avro source
  --filename,-F <file>   text file to stream to avro source (default: std input)
  --headerFile,-R <file> File containing event headers as key/value pairs on each new line
  --help,-h              display help text

  Either --rpcProps or both --host and --port must be specified.

Note that if <conf> directory is specified, then it is always included first
in the classpath.

EOF
}

run_flume() {
  local FLUME_APPLICATION_CLASS

  if [ "$#" -gt 0 ]; then
    FLUME_APPLICATION_CLASS=$1
    shift
  else
    error "Must specify flume application class" 1
  fi

  if [ ${CLEAN_FLAG} -ne 0 ]; then
    set -x
  fi
  $EXEC $JAVA_HOME/bin/java $JAVA_OPTS -cp "$FLUME_CLASSPATH" \
      -Djava.library.path=$FLUME_JAVA_LIBRARY_PATH "$FLUME_APPLICATION_CLASS" $*
}

################################
# main
################################

# set default params
FLUME_CLASSPATH=""
FLUME_JAVA_LIBRARY_PATH=""
JAVA_OPTS="-Xmx20m"
LD_LIBRARY_PATH=""

opt_conf=""
opt_classpath=""
opt_plugins_dirs=""
opt_java_props=""
opt_dryrun=""

mode=$1
shift

case "$mode" in
  help)
    display_help
    exit 0
    ;;
  agent)
    opt_agent=1
    ;;
  node)
    opt_agent=1
    warn "The \"node\" command is deprecated. Please use \"agent\" instead."
    ;;
  avro-client)
    opt_avro_client=1
    ;;
  tool)
    opt_tool=1
    ;;
  version)
   opt_version=1
   CLEAN_FLAG=0
   ;;
  *)
    error "Unknown or unspecified command '$mode'"
    echo
    display_help
    exit 1
    ;;
esac

args=""
while [ -n "$*" ] ; do
  arg=$1
  shift

  case "$arg" in
    --conf|-c)
      [ -n "$1" ] || error "Option --conf requires an argument" 1
      opt_conf=$1
      shift
      ;;
    --classpath|-C)
      [ -n "$1" ] || error "Option --classpath requires an argument" 1
      opt_classpath=$1
      shift
      ;;
    --dryrun|-d)
      opt_dryrun="1"
      ;;
    --plugins-path)
      opt_plugins_dirs=$1
      shift
      ;;
    -D*)
      opt_java_props="$opt_java_props $arg"
      ;;
    -X*)
      opt_java_props="$opt_java_props $arg"
      ;;
    *)
      args="$args $arg"
      ;;
  esac
done

# make opt_conf absolute
if [[ -n "$opt_conf" && -d "$opt_conf" ]]; then
  opt_conf=$(cd $opt_conf; pwd)
fi

# allow users to override the default env vars via conf/flume-env.sh
if [ -z "$opt_conf" ]; then
  warn "No configuration directory set! Use --conf <dir> to override."
elif [ -f "$opt_conf/flume-env.sh" ]; then
  info "Sourcing environment configuration script $opt_conf/flume-env.sh"
  source "$opt_conf/flume-env.sh"
fi

# append command-line java options to stock or env script JAVA_OPTS
if [ -n "${opt_java_props}" ]; then
  JAVA_OPTS="${JAVA_OPTS} ${opt_java_props}"
fi

# prepend command-line classpath to env script classpath
if [ -n "${opt_classpath}" ]; then
  if [ -n "${FLUME_CLASSPATH}" ]; then
    FLUME_CLASSPATH="${opt_classpath}:${FLUME_CLASSPATH}"
  else
    FLUME_CLASSPATH="${opt_classpath}"
  fi
fi

if [ -z "${FLUME_HOME}" ]; then
  FLUME_HOME=$(cd $(dirname $0)/..; pwd)
fi

# prepend $FLUME_HOME/lib jars to the specified classpath (if any)
if [ -n "${FLUME_CLASSPATH}" ] ; then
  FLUME_CLASSPATH="${FLUME_HOME}/lib/*:$FLUME_CLASSPATH"
else
  FLUME_CLASSPATH="${FLUME_HOME}/lib/*"
fi

# load plugins.d directories
PLUGINS_DIRS=""
if [ -n "${opt_plugins_dirs}" ]; then
  PLUGINS_DIRS=$(sed -e 's/:/ /g' <<<${opt_plugins_dirs})
else
  PLUGINS_DIRS="${FLUME_HOME}/plugins.d"
fi

unset plugin_lib plugin_libext plugin_native
for PLUGINS_DIR in $PLUGINS_DIRS; do
  if [[ -d ${PLUGINS_DIR} ]]; then
    for plugin in ${PLUGINS_DIR}/*; do
      if [[ -d "$plugin/lib" ]]; then
        plugin_lib="${plugin_lib}${plugin_lib+:}${plugin}/lib/*"
      fi
      if [[ -d "$plugin/libext" ]]; then
        plugin_libext="${plugin_libext}${plugin_libext+:}${plugin}/libext/*"
      fi
      if [[ -d "$plugin/native" ]]; then
        plugin_native="${plugin_native}${plugin_native+:}${plugin}/native"
      fi
    done
  fi
done

if [[ -n "${plugin_lib}" ]]
then
  FLUME_CLASSPATH="${FLUME_CLASSPATH}:${plugin_lib}"
fi

if [[ -n "${plugin_libext}" ]]
then
  FLUME_CLASSPATH="${FLUME_CLASSPATH}:${plugin_libext}"
fi

if [[ -n "${plugin_native}" ]]
then
  if [[ -n "${FLUME_JAVA_LIBRARY_PATH}" ]]
  then
    FLUME_JAVA_LIBRARY_PATH="${FLUME_JAVA_LIBRARY_PATH}:${plugin_native}"
  else
    FLUME_JAVA_LIBRARY_PATH="${plugin_native}"
  fi
fi

# find java
if [ -z "${JAVA_HOME}" ] ; then
  warn "JAVA_HOME is not set!"
  # Try to use Bigtop to autodetect JAVA_HOME if it's available
  if [ -e /usr/libexec/bigtop-detect-javahome ] ; then
    . /usr/libexec/bigtop-detect-javahome
  elif [ -e /usr/lib/bigtop-utils/bigtop-detect-javahome ] ; then
    . /usr/lib/bigtop-utils/bigtop-detect-javahome
  fi

  # Using java from path if bigtop is not installed or couldn't find it
  if [ -z "${JAVA_HOME}" ] ; then
    JAVA_DEFAULT=$(type -p java)
    [ -n "$JAVA_DEFAULT" ] || error "Unable to find java executable. Is it in your PATH?" 1
    JAVA_HOME=$(cd $(dirname $JAVA_DEFAULT)/..; pwd)
  fi
fi

# look for hadoop libs
add_hadoop_paths
add_HBASE_paths

# prepend conf dir to classpath
if [ -n "$opt_conf" ]; then
  FLUME_CLASSPATH="$opt_conf:$FLUME_CLASSPATH"
fi

set_LD_LIBRARY_PATH
# allow dryrun
EXEC="exec"
if [ -n "${opt_dryrun}" ]; then
  warn "Dryrun mode enabled (will not actually initiate startup)"
  EXEC="echo"
fi

# finally, invoke the appropriate command
if [ -n "$opt_agent" ] ; then
  run_flume $FLUME_AGENT_CLASS $args
elif [ -n "$opt_avro_client" ] ; then
  run_flume $FLUME_AVRO_CLIENT_CLASS $args
elif [ -n "${opt_version}" ] ; then
  run_flume $FLUME_VERSION_CLASS $args
elif [ -n "${opt_tool}" ] ; then
  run_flume $FLUME_TOOLS_CLASS $args
else
  error "This message should never appear" 1
fi

exit 0
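
  A note on the #zhangzl lines: they put the regular expression on the right-hand side of =~ in double quotes. That is likely to suit an older bash that cannot parse an inline pattern containing parentheses, but from bash 3.2 onward a quoted right-hand side is matched as a literal string, so the quoted patterns no longer act as regular expressions there. One commonly used alternative, sketched below, is to keep the pattern in a variable and expand it unquoted. The two function names are hypothetical helpers written for this illustration; they are not part of the flume-ng script.

```shell
#!/bin/bash
# Hypothetical helpers for illustration only; not part of flume-ng.

# Succeeds when the given path looks like an slf4j-api or slf4j-log4j12 jar.
is_slf4j_jar() {
  local re='slf4j-(api|log4j12).*\.jar'
  # $re is deliberately unquoted so bash treats it as a regular expression.
  [[ $1 =~ $re ]]
}

# Prints the value part of a "java.library.path=..." line, or nothing.
extract_library_path() {
  local re='^java\.library\.path=(.*)$'
  if [[ $1 =~ $re ]]; then
    echo "${BASH_REMATCH[1]}"
  fi
}

# The jar path and property value below are made-up sample inputs.
if is_slf4j_jar "/opt/hadoop/lib/slf4j-log4j12-1.6.1.jar"; then
  echo "would be excluded"
fi
extract_library_path "java.library.path=/usr/lib/native"
```

  Checking the installed version with bash --version before editing the script helps decide which form of the pattern is appropriate.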

5. Test configuration file

  Create a file named example-conf.properties in the conf directory with the following properties:

# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = avro
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

# Describe the sink
# Write incoming events to the log
a1.sinks.k1.type = logger


# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
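
  The last two properties are what connect the pieces: the source writes into channel c1 and the sink reads from it, so both must name a channel that the agent declares. The snippet below is a rough sanity check of that wiring. It is a hypothetical helper, not a Flume tool, and it works on a temporary copy of the relevant lines rather than on the real file.

```shell
#!/bin/bash
# Hypothetical sanity check, not part of Flume: confirm that the channel named
# on the source and on the sink is the one declared for agent a1.
conf=$(mktemp)
cat > "$conf" <<'EOF'
a1.sources = r1
a1.sinks = k1
a1.channels = c1
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
EOF

# Print the value of the property whose key matches the given regex.
get() { sed -n "s/^$1 *= *//p" "$conf"; }

declared=$(get 'a1\.channels')
src=$(get 'a1\.sources\.r1\.channels')
sink=$(get 'a1\.sinks\.k1\.channel')

if [ "$src" = "$declared" ] && [ "$sink" = "$declared" ]; then
  status="wired"
else
  status="mismatch"
fi
echo "$status"
rm -f "$conf"
```

  Note the asymmetry in the key names: a source can feed several channels (channels, plural), while a sink drains exactly one (channel, singular).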

6. Run commands

  6.1 Start the agent

[hadoop@hadoop1 conf]$ flume-ng agent -n a1 -f example-conf.properties
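
  If the agent's log output does not show up in the terminal, one common approach is to pass the configuration directory as well and route the root logger to the console. The sketch below only assembles and prints such a command, assuming it is run from the conf directory as in the prompt above. The option names come from the help text of the flume-ng script; flume.root.logger is the logging property used by Flume's stock log4j.properties.

```shell
#!/bin/bash
# Assumption: the current directory is Flume's conf directory.
agent_cmd=(flume-ng agent
  --conf .
  --conf-file example-conf.properties
  --name a1
  -Dflume.root.logger=INFO,console)

# Print the assembled command for review.
printf '%s ' "${agent_cmd[@]}"
echo

# To actually start the agent once flume-ng is on the PATH, run:
#   "${agent_cmd[@]}"
```

  Adding --dryrun to the list makes flume-ng print the final java command line without starting the agent, which is handy for checking the classpath.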

  6.2 Start the avro-client to send data to the agent (this must be done in a separate terminal window)

[hadoop@hadoop1 conf]$ flume-ng avro-client -H localhost -p 44444 -F file01
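
  The command above expects a text file named file01 in the current directory. A minimal way to create one is sketched below, assuming a single line of content; the text "hello world" is taken from the event body shown in the result section, and as far as I know the avro-client sends each line of the file as one event.

```shell
#!/bin/bash
# Assumption: file01 is a plain-text file with one line of test data.
printf 'hello world\n' > file01

# Show how many lines (and therefore events) the file holds.
wc -l < file01
```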

7. Checking the result

14/01/16 22:26:34 INFO ipc.NettyServer: [id: 0x0100c7e4, /127.0.0.1:54289 => /127.0.0.1:44444] OPEN
14/01/16 22:26:34 INFO ipc.NettyServer: [id: 0x0100c7e4, /127.0.0.1:54289 => /127.0.0.1:44444] BOUND: /127.0.0.1:44444
14/01/16 22:26:34 INFO ipc.NettyServer: [id: 0x0100c7e4, /127.0.0.1:54289 => /127.0.0.1:44444] CONNECTED: /127.0.0.1:54289
14/01/16 22:26:36 INFO ipc.NettyServer: [id: 0x0100c7e4, /127.0.0.1:54289 :> /127.0.0.1:44444] DISCONNECTED
14/01/16 22:26:36 INFO ipc.NettyServer: [id: 0x0100c7e4, /127.0.0.1:54289 :> /127.0.0.1:44444] UNBOUND
14/01/16 22:26:36 INFO ipc.NettyServer: [id: 0x0100c7e4, /127.0.0.1:54289 :> /127.0.0.1:44444] CLOSED
14/01/16 22:26:36 INFO ipc.NettyServer: Connection to /127.0.0.1:54289 disconnected.
14/01/16 22:26:38 INFO sink.LoggerSink: Event: { headers:{} body: 68 65 6C 6C 6F 20 77 6F 72 6C 64                hello world }
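
  The logger sink prints the event body twice: first as hexadecimal bytes, then as text. The short sketch below converts the hex bytes from the last log line back into characters, which is a quick way to confirm that the event arrived unchanged. It relies on the \x escape of bash's built-in printf.

```shell
#!/bin/bash
# Hex bytes copied from the LoggerSink line above.
body_hex="68 65 6C 6C 6F 20 77 6F 72 6C 64"

decoded=""
for byte in $body_hex; do
  # printf expands \xHH into the character with that byte value.
  decoded+=$(printf "\\x$byte")
done

echo "$decoded"
```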

 


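Author: 张子良 (Zhang Ziliang)
Source: http://www.cnblogs.com/hadoopdev
Copyright of this article belongs to the author. Reposting is welcome, but unless the author agrees otherwise, this notice must be kept and a link to the original article must be placed prominently on the page; otherwise the author reserves the right to pursue legal liability.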
