hadoop初探--性能分析1

本文涉及的产品
服务治理 MSE Sentinel/OpenSergo,Agent数量 不受限
简介:

    研究Hadoop也有2个星期了,由于之前有MooseFS的基础,所以学习Hadoop的HDFS部分还算比较顺利。我的环境是非常典型的Hadoop环境,一台NameNode+Hmaster,4台DataNode+RegionServer。如下图所示

    image

   说明: DataNode1和DataNode2是VM,配置4VCPU和4G内存,虚拟化平台是ESXI4.1。DataNode3和DataNode4是物理机,OS为Ubuntu11.04,2cpu和2G内存

 

   实验:利用hadoop自身提供的程序,利用MapReduce计算“TestTable”这个表的“行数” 
[root@master bin]# ./hadoop jar /root/hbase/hbase-0.90.4.jar rowcounter TestTable

 

11/10/25 10:34:36 INFO zookeeper.ZooKeeper: Client environment:zookeeper.version=3.3.2-1031432, built on 11/05/2010 05:32 GMT

11/10/25 10:34:36 INFO zookeeper.ZooKeeper: Client environment:host.name=master

11/10/25 10:34:36 INFO zookeeper.ZooKeeper: Client environment:java.version=1.6.0_26

11/10/25 10:34:36 INFO zookeeper.ZooKeeper: Client environment:java.vendor=Sun Microsystems Inc.

。。。。。。。。。。。。。。。。。。。。。。

11/10/25 10:34:36 INFO zookeeper.ZooKeeper: Client environment:java.library.path=/root/hadoop/bin/../lib/native/Linux-i386-32

11/10/25 10:34:36 INFO zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/tmp

11/10/25 10:34:36 INFO zookeeper.ZooKeeper: Client environment:java.compiler=<NA>

11/10/25 10:34:36 INFO zookeeper.ZooKeeper: Client environment:os.name=Linux

11/10/25 10:34:36 INFO zookeeper.ZooKeeper: Client environment:os.arch=i386

11/10/25 10:34:36 INFO zookeeper.ZooKeeper: Client environment:os.version=2.6.18-164.el5

11/10/25 10:34:36 INFO zookeeper.ZooKeeper: Client environment:user.name=root

11/10/25 10:34:36 INFO zookeeper.ZooKeeper: Client environment:user.home=/root

11/10/25 10:34:36 INFO zookeeper.ZooKeeper: Client environment:user.dir=/root/hadoop-0.20.2/bin

11/10/25 10:34:36 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=datanode2:2181,datanode1:2181,master:2181 sessionTimeout=180000 watcher=hconnection

11/10/25 10:34:36 INFO zookeeper.ClientCnxn: Opening socket connection to server datanode2/172.20.10.193:2181

11/10/25 10:34:36 INFO zookeeper.ClientCnxn: Socket connection established to datanode2/172.20.10.193:2181, initiating session

11/10/25 10:34:36 INFO zookeeper.ClientCnxn: Session establishment complete on server datanode2/172.20.10.193:2181, sessionid = 0x23338df2f890004, negotiated timeout = 180000

11/10/25 10:34:37 INFO mapred.JobClient: Running job: job_201110241323_0001

11/10/25 10:34:38 INFO mapred.JobClient: map 0% reduce 0%

11/10/25 10:35:34 INFO mapred.JobClient: map 2% reduce 0%

11/10/25 10:39:30 INFO zookeeper.ClientCnxn: Socket connection established to datanode1/172.20.10.201:2181, initiating session

11/10/25 10:40:05 INFO zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x23338df2f890004, likely server has closed socket, closing socket connection and attempting reconnect

11/10/25 10:40:06 INFO zookeeper.ClientCnxn: Opening socket connection to server datanode2/172.20.10.193:2181

11/10/25 10:40:06 INFO zookeeper.ClientCnxn: Socket connection established to datanode2/172.20.10.193:2181, initiating session

11/10/25 10:40:06 INFO zookeeper.ClientCnxn: Session establishment complete on server datanode2/172.20.10.193:2181, sessionid = 0x23338df2f890004, negotiated timeout = 180000

11/10/25 10:40:13 INFO mapred.JobClient: map 7% reduce 0%

11/10/25 10:42:31 INFO mapred.JobClient: map 10% reduce 0%

11/10/25 10:43:04 INFO mapred.JobClient: map 13% reduce 0%

11/10/25 10:53:08 INFO mapred.JobClient: map 15% reduce 0%

。。。。。。。。。。。。。。。

11/10/25 11:11:28 INFO mapred.JobClient: map 36% reduce 0%

11/10/25 11:18:00 INFO mapred.JobClient: Task Id : attempt_201110241323_0001_m_000008_0, Status : FAILED

Task attempt_201110241323_0001_m_000008_0 failed to report status for 608 seconds. Killing!

Task attempt_201110241323_0001_m_000008_0 failed to report status for 611 seconds. Killing!

11/10/25 11:18:36 INFO mapred.JobClient: Task Id : attempt_201110241323_0001_m_000009_0, Status : FAILED

Task attempt_201110241323_0001_m_000009_0 failed to report status for 646 seconds. Killing!

Task attempt_201110241323_0001_m_000009_0 failed to report status for 617 seconds. Killing!

11/10/25 11:18:37 INFO mapred.JobClient: Task Id : attempt_201110241323_0001_m_000013_0, Status : FAILED

Task attempt_201110241323_0001_m_000013_0 failed to report status for 624 seconds. Killing!

Task attempt_201110241323_0001_m_000013_0 failed to report status for 604 seconds. Killing!

11/10/25 11:26:17 INFO mapred.JobClient: map 39% reduce 0%

11/10/25 11:26:48 INFO mapred.JobClient: map 42% reduce 0%

11/10/25 11:35:48 INFO mapred.JobClient: Task Id : attempt_201110241323_0001_m_000014_0, Status : FAILED

Task attempt_201110241323_0001_m_000014_0 failed to report status for 600 seconds. Killing!

11/10/25 11:42:07 INFO mapred.JobClient: map 44% reduce 0%

11/10/25 11:57:43 INFO mapred.JobClient: map 47% reduce 0%

11/10/25 12:12:16 INFO mapred.JobClient: map 49% reduce 0%

。。。。。。。。。。。。。。。。。。。。。

11/10/25 15:45:26 INFO mapred.JobClient: map 97% reduce 0%

11/10/25 16:16:57 INFO mapred.JobClient: map 100% reduce 0%

11/10/25 16:16:59 INFO mapred.JobClient: Job complete: job_201110241323_0001

11/10/25 16:16:59 INFO mapred.JobClient: Counters: 7

11/10/25 16:16:59 INFO mapred.JobClient: Job Counters

11/10/25 16:16:59 INFO mapred.JobClient: Rack-local map tasks=21

11/10/25 16:16:59 INFO mapred.JobClient: Launched map tasks=58

11/10/25 16:16:59 INFO mapred.JobClient: Data-local map tasks=37

11/10/25 16:16:59 INFO mapred.JobClient: org.apache.hadoop.hbase.mapreduce.RowCounter$RowCounterMapper$Counters

 

11/10/25 16:16:59 INFO mapred.JobClient: ROWS=6003531

 

11/10/25 16:16:59 INFO mapred.JobClient: Map-Reduce Framework

11/10/25 16:16:59 INFO mapred.JobClient: Map input records=6003531

11/10/25 16:16:59 INFO mapred.JobClient: Spilled Records=0

11/10/25 16:16:59 INFO mapred.JobClient: Map output records=0

web监控如下显示

image

由于篇幅限制,日志输出中有大量略去,经过5h42min的运算,可以看到TestTable这个表总共有6003531行。日志中出现10个错误,基本就2类,

  第一类是“状态报告超时”

Task attempt_201110251718_0001_m_000014_0 failed to report status for 602 seconds. Killing!

Task attempt_201110251718_0001_m_000014_0 failed to report status for 600 seconds. Killing!

  第二类是“租约到期”

org.apache.hadoop.hbase.regionserver.LeaseException: org.apache.hadoop.hbase.regionserver.LeaseException: lease '7594235277433071278' does not exist at

org.apache.hadoop.hbase.regionserver.Leases.removeLease(Leases.java:230) at org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:1847) at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:570) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1039) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27) at java.lang.reflect.Constructor.newInstance(Constructor.java:513) at

 

由于篇幅限制,关于性能的分析见下一篇文章《hadoop初探--性能分析2





本文转自 taojin1240 51CTO博客,原文链接:http://blog.51cto.com/taotao1240/698496,如需转载请自行联系原作者

相关实践学习
云数据库HBase版使用教程
&nbsp; 相关的阿里云产品:云数据库 HBase 版 面向大数据领域的一站式NoSQL服务,100%兼容开源HBase并深度扩展,支持海量数据下的实时存储、高并发吞吐、轻SQL分析、全文检索、时序时空查询等能力,是风控、推荐、广告、物联网、车联网、Feeds流、数据大屏等场景首选数据库,是为淘宝、支付宝、菜鸟等众多阿里核心业务提供关键支撑的数据库。 了解产品详情:&nbsp;https://cn.aliyun.com/product/hbase &nbsp; ------------------------------------------------------------------------- 阿里云数据库体验:数据库上云实战 开发者云会免费提供一台带自建MySQL的源数据库&nbsp;ECS 实例和一台目标数据库&nbsp;RDS实例。跟着指引,您可以一步步实现将ECS自建数据库迁移到目标数据库RDS。 点击下方链接,领取免费ECS&amp;RDS资源,30分钟完成数据库上云实战!https://developer.aliyun.com/adc/scenario/51eefbd1894e42f6bb9acacadd3f9121?spm=a2c6h.13788135.J_3257954370.9.4ba85f24utseFl
目录
相关文章
|
29天前
|
分布式计算 Hadoop 测试技术
Hadoop【环境搭建 05】【hadoop-3.1.3 单机版基准测试 TestDFSIO + mrbench + nnbench + Terasort + sort 举例】
【4月更文挑战第1天】Hadoop【环境搭建 05】【hadoop-3.1.3 单机版基准测试 TestDFSIO + mrbench + nnbench + Terasort + sort 举例】
37 3
|
9月前
|
存储 分布式计算 运维
Hadoop---10、生产调优手册(二)
Hadoop---10、生产调优手册(二)
|
8月前
|
分布式计算 资源调度 Hadoop
Hadoop调优
Hadoop调优
84 1
|
9月前
|
存储 分布式计算 Hadoop
Hadoop---10、生产调优手册(一)
Hadoop---10、生产调优手册(一)
|
5月前
|
分布式计算 监控 Hadoop
Hadoop MR 任务运行时日志分析
Hadoop MR 任务运行时日志分析
32 0
|
分布式计算 资源调度 Hadoop
Hadoop之MapReduce03【wc案例流程分析】
Hadoop之MapReduce03【wc案例流程分析】
Hadoop之MapReduce03【wc案例流程分析】
|
SQL 存储 JSON
Hive性能调优工具
HQL提供了两个查看查询性能的工具:explain与analyze,除此之外Hive的日志也提供了非常详细的信息,方便查看执行性能和报错排查。
4925 0
Hive性能调优工具
|
分布式计算 监控 Hadoop
|
分布式计算 监控 JavaScript
《Hadoop实战第2版》——2.5节日志分析及几个小技巧
本节书摘来自华章社区《Hadoop实战第2版》一书中的第2章,第2.5节日志分析及几个小技巧,作者:陆嘉恒,更多章节内容可以访问云栖社区“华章社区”公众号查看
1201 0
|
分布式计算 Hadoop
hadoop 排重优化
java hadoop 排重 优化
1140 0