OProfile & SystemTap

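Summary:
OProfile has low overhead as long as the CPU supports hardware performance monitoring (most current CPUs do). Unlike stap, however, OProfile cannot use a timer to print statistics periodically or cumulatively. stap, in turn, has noticeably higher overhead.
OProfile is suited to performance diagnosis: finding the process that uses the most CPU in the system, the functions inside a process that use the most CPU, and the code inside a function that uses the most CPU. operf starts profiling, and opreport and opannotate then produce call reports or per-function and per-instruction (assembly) statistics.
stap is suited to tracing.
Example: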

[root@digoal ~]# cd /data06
[root@digoal data06]#  operf --system-wide --lazy-conversion
operf: Press Ctl-c or 'kill -SIGINT 45366' to stop profiling
operf: Profiler started
^C
Profiling done.
Converting profile data to OProfile format
................

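Generate the reports (operf stores its samples under ./oprofile_data in the current directory, here /data06/oprofile_data):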

[root@digoal data06]# opreport -l -f -w -x -t 1 
Using /data06/oprofile_data/samples/ for samples directory.
CPU: Intel Core/i7, speed 1995.14 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (No unit mask) count 100000
vma      samples  %        app name                 symbol name
007827a0 2091381  26.6819  /opt/pgsql9.4.1/bin/postgres HeapTupleSatisfiesVacuum
00490300 988600   12.6126  /opt/pgsql9.4.1/bin/postgres heap_page_prune
0078a8c0 698665    8.9136  /opt/pgsql9.4.1/bin/postgres pg_qsort
0058afb0 676022    8.6247  /opt/pgsql9.4.1/bin/postgres vac_cmp_itemptr
0058baf0 385039    4.9123  /opt/pgsql9.4.1/bin/postgres lazy_vacuum_rel
004c4d00 365497    4.6630  /opt/pgsql9.4.1/bin/postgres XLogInsert
00675420 229805    2.9319  /opt/pgsql9.4.1/bin/postgres itemoffcompare
00675d20 184668    2.3560  /opt/pgsql9.4.1/bin/postgres PageRepairFragmentation
0078a7e0 169808    2.1664  /opt/pgsql9.4.1/bin/postgres swapfunc
00655590 147647    1.8837  /opt/pgsql9.4.1/bin/postgres BufferGetBlockNumber
00488940 139389    1.7783  /opt/pgsql9.4.1/bin/postgres heap_prepare_freeze_tuple
007624d0 86239     1.1002  /opt/pgsql9.4.1/bin/postgres hash_search_with_hash_value

[root@digoal data06]# opreport -l -f -g -w -x -t 1 /opt/pgsql/bin/postgres
Using /data06/oprofile_data/samples/ for samples directory.
CPU: Intel Core/i7, speed 1995.14 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (No unit mask) count 100000
vma      samples  %        linenr info                 symbol name
007827a0 2091381  26.7572  /opt/soft_bak/postgresql-9.4.1/src/backend/utils/time/tqual.c:1116 HeapTupleSatisfiesVacuum
00490300 988600   12.6482  /opt/soft_bak/postgresql-9.4.1/src/backend/access/heap/pruneheap.c:174 heap_page_prune
0078a8c0 698665    8.9387  /opt/soft_bak/postgresql-9.4.1/src/port/qsort.c:104 pg_qsort
0058afb0 676022    8.6491  /opt/soft_bak/postgresql-9.4.1/src/backend/commands/vacuumlazy.c:1728 vac_cmp_itemptr
0058baf0 385039    4.9262  /opt/soft_bak/postgresql-9.4.1/src/backend/commands/vacuumlazy.c:172 lazy_vacuum_rel
004c4d00 365497    4.6762  /opt/soft_bak/postgresql-9.4.1/src/backend/access/transam/xlog.c:844 XLogInsert
00675420 229805    2.9401  /opt/soft_bak/postgresql-9.4.1/src/backend/storage/page/bufpage.c:415 itemoffcompare
00675d20 184668    2.3626  /opt/soft_bak/postgresql-9.4.1/src/backend/storage/page/bufpage.c:433 PageRepairFragmentation
0078a7e0 169808    2.1725  /opt/soft_bak/postgresql-9.4.1/src/port/qsort.c:78 swapfunc
00655590 147647    1.8890  /opt/soft_bak/postgresql-9.4.1/src/backend/storage/buffer/bufmgr.c:1898 BufferGetBlockNumber
00488940 139389    1.7833  /opt/soft_bak/postgresql-9.4.1/src/backend/access/heap/heapam.c:5756 heap_prepare_freeze_tuple
007624d0 86239     1.1033  /opt/soft_bak/postgresql-9.4.1/src/backend/utils/hash/dynahash.c:824 hash_search_with_hash_value

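Both reports show which calls consume the most CPU. The second run adds -g to print the source file and line of each symbol. Because it is limited to the postgres binary, its percentages are computed against that binary's samples only, which is why they are slightly higher than in the first report.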
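To post-process reports like the ones above (for example, to see how much of the profile the top few symbols cover), the text output can be parsed directly. The following is a minimal Python sketch, assuming the five-column layout shown above (vma, samples, %, app name or linenr info, symbol name) with no spaces inside the path or the symbol name. The function names are illustrative helpers, not part of OProfile.

```python
from dataclasses import dataclass


@dataclass
class SymbolRow:
    vma: int        # symbol address (printed in hex by opreport -w)
    samples: int    # samples attributed to the symbol
    percent: float  # share of the report's samples, in percent
    image: str      # "app name" column, or "linenr info" when -g is used
    symbol: str     # symbol name


def parse_opreport(text: str) -> list[SymbolRow]:
    """Parse the data rows of `opreport -l -w` output; other lines are skipped."""
    rows = []
    for line in text.splitlines():
        parts = line.split(None, 4)
        if len(parts) != 5:
            continue
        vma, samples, percent, image, symbol = parts
        try:
            rows.append(SymbolRow(int(vma, 16), int(samples), float(percent), image, symbol))
        except ValueError:
            continue  # header lines such as "vma samples % ..." are not numeric
    return rows


def top_share(rows: list[SymbolRow], n: int) -> float:
    """Cumulative percentage covered by the n symbols with the most samples."""
    hottest = sorted(rows, key=lambda r: r.samples, reverse=True)[:n]
    return sum(r.percent for r in hottest)


def estimated_total_samples(row: SymbolRow) -> float:
    """Back out the report's total sample count from a single row."""
    return row.samples / (row.percent / 100.0)


report = """vma      samples  %        app name                 symbol name
007827a0 2091381  26.6819  /opt/pgsql9.4.1/bin/postgres HeapTupleSatisfiesVacuum
00490300 988600   12.6126  /opt/pgsql9.4.1/bin/postgres heap_page_prune
0078a8c0 698665    8.9136  /opt/pgsql9.4.1/bin/postgres pg_qsort
"""

rows = parse_opreport(report)
print(f"top 2 symbols cover {top_share(rows, 2):.4f}% of samples")
print(f"estimated total samples: {estimated_total_samples(rows[0]):.0f}")
```

The same parser accepts the -g report, in which case the image field holds the source file and line instead of the binary path.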
 
       

[root@digoal data06]# opannotate -x -s -t 1 /opt/pgsql/bin/postgres -i HeapTupleSatisfiesVacuum|less
Using /data06/oprofile_data/samples/ for session-dir
/* 
 * Command line: opannotate -x -s -t 1 /opt/pgsql/bin/postgres -i HeapTupleSatisfiesVacuum 
 * 
 * Interpretation of command line:
 * Output annotated source file with samples
 * Output files where samples count reach 1% of the samples
 * 
 * CPU: Intel Core/i7, speed 1995.14 MHz (estimated)
 * Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (No unit mask) count 100000
 */
/* 
 * Total samples for file : "/opt/soft_bak/postgresql-9.4.1/src/backend/utils/time/tqual.c"
 * 
 * 2091381 100.000
 */


               :/*-------------------------------------------------------------------------
               : *
               : * tqual.c
               : *        POSTGRES "time qualification" code, ie, tuple visibility rules.
               : *
               : * NOTE: all the HeapTupleSatisfies routines will update the tuple's
               : * "hint" status bits if we see that the inserting or deleting transaction
               : * has now committed or aborted (and it is safe to set the hint bits).
               : * If the hint bits are changed, MarkBufferDirtyHint is called on
               : * the passed-in buffer.  The caller must hold not only a pin, but at least
               : * shared buffer content lock on the buffer containing the tuple.
               : *
               : * NOTE: must check TransactionIdIsInProgress (which looks in PGXACT array)
......
1879024 89.8461 :       if (!HeapTupleHeaderXminCommitted(tuple))
               :        {
    63  0.0030 :                if (HeapTupleHeaderXminInvalid(tuple))
               :                        return HEAPTUPLE_DEAD;
               :                /* Used by pre-9.0 binary upgrades */
    18 8.6e-04 :                else if (tuple->t_infomask & HEAP_MOVED_OFF)
               :                {
               :                        TransactionId xvac = HeapTupleHeaderGetXvac(tuple);
               :
......

The most expensive spot in the function is this line, which accounts for 89.8461% of its samples:
if (!HeapTupleHeaderXminCommitted(tuple))
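Finding the hottest line in a long opannotate listing can be automated as well. A minimal Python sketch, assuming the source-annotation layout shown above: a sample count, a percentage relative to the file's total samples (1879024 / 2091381 is about 89.85%), a colon, then the source text. Lines that received no samples start with blanks and a colon and are skipped. The function name is hypothetical.

```python
import re

# "<samples> <percent> :<source>"; percent may use exponent form such as 8.6e-04.
# Source lines without samples have only blanks before the ":" and do not match.
ANNOTATED_LINE = re.compile(r"^\s*(\d+)\s+([0-9.]+(?:[eE][+-]?\d+)?)\s+:(.*)$")


def hottest_lines(annotated: str, n: int = 1) -> list[tuple[int, float, str]]:
    """Return the n source lines with the most samples as (samples, percent, code)."""
    hits = []
    for line in annotated.splitlines():
        m = ANNOTATED_LINE.match(line)
        if m:
            samples, percent, code = m.groups()
            hits.append((int(samples), float(percent), code.strip()))
    hits.sort(key=lambda hit: hit[0], reverse=True)
    return hits[:n]


listing = """\
1879024 89.8461 :       if (!HeapTupleHeaderXminCommitted(tuple))
               :        {
    63  0.0030 :                if (HeapTupleHeaderXminInvalid(tuple))
               :                        return HEAPTUPLE_DEAD;
    18 8.6e-04 :                else if (tuple->t_infomask & HEAP_MOVED_OFF)
"""

for samples, percent, code in hottest_lines(listing, n=2):
    print(f"{samples:>8} {percent:>8.4f}%  {code}")
```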
To see the events OProfile supports, run opcontrol --list-events (ophelp, mentioned in the operf man page excerpt below, lists them as well):

[root@digoal data06]# opcontrol --list-events
oprofile: available events for CPU type "Intel Core/i7"

See Intel Architecture Developer's Manual Volume 3B, Appendix A and
Intel Architecture Optimization Reference Manual

For architectures using unit masks, you may be able to specify
unit masks by name.  See 'opcontrol' or 'operf' man page for more details.

CPU_CLK_UNHALTED: (counter: all)
        Clock cycles when not halted (min count: 6000)
UNHALTED_REFERENCE_CYCLES: (counter: all)
        Unhalted reference cycles (min count: 6000)
        Unit masks (default 0x1)
        ----------
        0x01: No unit mask
......

Event configuration (excerpt from the operf man page):

       --events / -e event1[,event2[,...]]
              This option is for passing a comma-separated list of event specifications for profiling. Each event spec
              is of the form:
                 name:count[:unitmask[:kernel[:user]]]
              You can specify unit mask values using either a numerical value (hex values must begin with "0x")  or  a
              symbolic  name  (if  the name=<um_name> field is shown in the ophelp output). For some named unit masks,
              the hex value is not unique; thus, OProfile tools enforce specifying such unit masks value by name.

              Event names for some IBM PowerPC systems include a _GRP<n> (group number) suffix. You  can  pass  either
              the  full event name or the base event name (i.e., without the suffix) to operf.  If the base event name
              is passed, operf will automatically choose an appropriate group number suffix for the event; thus, OProfile
              post-processing tools will always show real event names that include the group number suffix.

              When  no event specification is given, the default event for the running processor type will be used for
              profiling.  Use ophelp to list the available events for your processor type.
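The event specification grammar above is simple enough to check before starting a long profiling run. A minimal Python sketch, assuming: unit masks are given either as a number (hex with a 0x prefix, otherwise decimal) or as a symbolic name; kernel and user are 0/1 flags that default to 1 when omitted (an assumption, the excerpt does not state the defaults); and the minimum counts are the ones from the opcontrol --list-events output above. The PowerPC _GRP<n> suffix handling is not modeled. All names are hypothetical helpers, not OProfile APIs.

```python
from dataclasses import dataclass
from typing import Optional, Union

# Minimum counts copied from the `opcontrol --list-events` output above (Intel Core/i7).
MIN_COUNT = {"CPU_CLK_UNHALTED": 6000, "UNHALTED_REFERENCE_CYCLES": 6000}


@dataclass
class EventSpec:
    name: str
    count: int
    unit_mask: Optional[Union[int, str]] = None  # int for numeric masks, str for named ones
    kernel: bool = True  # assumed default: also sample kernel code
    user: bool = True    # assumed default: also sample user code


def _flag(value: str, field: str) -> bool:
    if value not in ("0", "1"):
        raise ValueError(f"{field} must be 0 or 1, got {value!r}")
    return value == "1"


def parse_event_spec(spec: str) -> EventSpec:
    """Parse one name:count[:unitmask[:kernel[:user]]] specification."""
    fields = spec.split(":")
    if not 2 <= len(fields) <= 5:
        raise ValueError(f"expected name:count[:unitmask[:kernel[:user]]], got {spec!r}")
    name, count = fields[0], int(fields[1])
    if count < MIN_COUNT.get(name, 1):
        raise ValueError(f"count {count} is below the minimum for {name}")
    event = EventSpec(name, count)
    if len(fields) > 2 and fields[2]:
        mask = fields[2]
        if mask.lower().startswith("0x"):
            event.unit_mask = int(mask, 16)  # hex values must begin with 0x
        elif mask.isdigit():
            event.unit_mask = int(mask)      # plain decimal value
        else:
            event.unit_mask = mask           # symbolic unit-mask name
    if len(fields) > 3:
        event.kernel = _flag(fields[3], "kernel")
    if len(fields) > 4:
        event.user = _flag(fields[4], "user")
    return event


def parse_events(arg: str) -> list[EventSpec]:
    """Parse the comma-separated list passed to --events / -e."""
    return [parse_event_spec(spec) for spec in arg.split(",")]


events = parse_events("CPU_CLK_UNHALTED:100000:0x00:1:0,UNHALTED_REFERENCE_CYCLES:6000")
for ev in events:
    print(ev)
```

The first specification corresponds to the report header shown earlier (CPU_CLK_UNHALTED, unit mask 0x00, count 100000), here restricted to kernel code for illustration.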


The following is excerpted from the Red Hat administration documentation:
OProfile is a low overhead, system-wide performance monitoring tool. It uses the performance monitoring hardware on the processor to retrieve information about the kernel and executables on the system, such as when memory is referenced, the number of L2 cache requests, and the number of hardware interrupts received. On a Red Hat Enterprise Linux system, the oprofile package must be installed to use this tool.
Many processors include dedicated performance monitoring hardware. This hardware makes it possible to detect when certain events happen (such as the requested data not being in cache). The hardware normally takes the form of one or more counters that are incremented each time an event takes place. When the counter value increments, an interrupt is generated, making it possible to control the amount of detail (and therefore, overhead) produced by performance monitoring.
OProfile uses this hardware (or a timer-based substitute in cases where performance monitoring hardware is not present) to collect samples of performance-related data each time a counter generates an interrupt. These samples are periodically written out to disk; later, the data contained in these samples can then be used to generate reports on system-level and application-level performance.
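The reports earlier in this post were collected with CPU_CLK_UNHALTED and "count 100000", meaning one sample is taken each time the counter has seen 100,000 events (the count field of the event specification). The sketch below is a simplified Python model of that overflow-driven sampling, plus the arithmetic it implies. It ignores the skid described in the limitations below and assumes the estimated 1995.14 MHz clock stays constant, so the derived times are rough. The function names are illustrative only.

```python
from collections import Counter
from typing import Iterable


def sample_by_overflow(event_sites: Iterable[str], count: int) -> Counter:
    """Simplified event-based sampling: record one sample every `count` events.

    event_sites yields the code location at which each event occurs.
    Real hardware may attribute a sample to a nearby instruction (skid);
    this model does not.
    """
    remaining = count
    samples: Counter = Counter()
    for site in event_sites:
        remaining -= 1
        if remaining == 0:        # sampling period elapsed: the interrupt fires
            samples[site] += 1    # attribute the sample to the current location
            remaining = count     # reload the counter for the next period
    return samples


def max_samples_per_second(event_rate_hz: float, count: int) -> float:
    """Upper bound on the sample rate of one fully busy CPU."""
    return event_rate_hz / count


def seconds_from_samples(samples: int, count: int, event_rate_hz: float) -> float:
    """Rough time represented by `samples`, assuming one event per clock cycle."""
    return samples * count / event_rate_hz


CLOCK_HZ = 1995.14e6   # "speed 1995.14 MHz (estimated)" from the report header
COUNT = 100_000        # "count 100000" from the report header

toy = sample_by_overflow(["A"] * 900 + ["B"] * 100, count=10)
print(toy)  # Counter({'A': 90, 'B': 10})
print(f"{max_samples_per_second(CLOCK_HZ, COUNT):.1f} samples/s per busy CPU")
print(f"HeapTupleSatisfiesVacuum: ~{seconds_from_samples(2091381, COUNT, CLOCK_HZ):.1f} CPU-seconds")
```

A larger count lowers overhead but gives fewer samples; the minimum counts printed by opcontrol --list-events bound how small it can be.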
Be aware of the following limitations when using OProfile:
  • Use of shared libraries — Samples for code in shared libraries are not attributed to the particular application unless the --separate=library option is used.
  • Performance monitoring samples are inexact — When a performance monitoring register triggers a sample, the interrupt handling is not precise like a divide by zero exception. Due to the out-of-order execution of instructions by the processor, the sample may be recorded on a nearby instruction.
  • opreport does not associate samples for inline functions properly — opreport uses a simple address range mechanism to determine which function an address is in. Inline function samples are not attributed to the inline function but rather to the function the inline function was inserted into.
  • OProfile accumulates data from multiple runs — OProfile is a system-wide profiler and expects processes to start up and shut down multiple times. Thus, samples from multiple runs accumulate. Use the command opcontrol --reset to clear out the samples from previous runs.
  • Hardware performance counters do not work on guest virtual machines — Because the hardware performance counters are not available on virtual systems, you need to use the timer mode. Enter the command opcontrol --deinit, and then execute modprobe oprofile timer=1 to enable the timer mode.
  • Non-CPU-limited performance problems — OProfile is oriented to finding problems with CPU-limited processes. OProfile does not identify processes that are asleep because they are waiting on locks or for some other event to occur (for example an I/O device to finish an operation).

SystemTap is a tracing and probing tool that allows users to study and monitor the activities of the operating system in fine detail. It provides information similar to the output of tools like netstat, ps, top, and iostat; however, SystemTap is designed to provide more filtering and analysis options for the collected information.
While using OProfile is suggested in cases of collecting data on where and why the processor spends time in a particular area of code, it is less usable when finding out why the processor stays idle.
You might want to use SystemTap when instrumenting specific places in code. Because SystemTap allows you to run the code instrumentation without having to stop and restart the instrumented code, it is particularly useful for instrumenting the kernel and daemons.
