[20180301]模拟cursor pin S wait on X.txt

  1. 云栖社区>
  2. 博客>
  3. 正文

[20180301]模拟cursor pin S wait on X.txt

lfreeali 2018-03-01 10:47:08 浏览793
展开阅读全文

[20180301]模拟cursor pin S wait on X等待事件.txt

--//昨天看链接http://www.askmaclean.com/archives/cursor-pin-s-wait-on-x.html,自己也按照例子重复演示看看.
--//出现'cursor: pin * events'等待事件主要原因就是子光标太多,或者硬解析太多,还有就是一些sql语句太复杂,
--//导致分析时间太长,这些都是我的理解.

--//链接http://www.askmaclean.com/archives/cursor-pin-s-wait-on-x.html给出一些可能的情况.

该类等待事件一般是为了pin相关的子游标

'Cursor: pin S on X' 最常见的等待事件,  进程为了共享操作例如执行pin游标而以SHRD S mode申请mutex, 但是未立即获得。原因
是该游标被其他进程以EXCL X mode 持有了。

实际该 cursor: pin S wait on X等待事件往往是由于其他因素诱发的。Mutex争用仅仅是问题的症状,但根本原因需要Database
Consultant 进一步挖掘。

下面我们列出一些已知的常见案例, 在这些例子中可以看到 我上面提到的 Mutex的争用仅仅是伪争用:

过多的子游标 High Version Counts

过多的子游标版本Version Count可能导致Mutex 争用,一般一个SQL的Version Count不要高于500。

检查High Version Count很简单, 在AWR里就有SQL ordered by High Version Count,也可以写SQL查V$SQL、V$SQLAREA

昂贵的X$、V$视图查询

一些对于V$、X$视图的查询,需要访问X$KGL*之类的fixed table,可能触发Mutex争用。
--//我测试过几个会话访问v$sql视图,出现的是library cache: mutex X,链接:http://blog.itpub.net/267265/viewspace-2142625/
Mutex持有者得不到CPU

Mutex持有者若得不到足够的CPU片可能一直阻塞他人,直到它拿到需要的CPU。

这种情况可能由于OS操作系统的实际情况或者使用Resource Manager而引起。需要配合AWR中的Host CPU、Instance CPu一起看。

已经被KILLED的SESSION仍持有Mutex

当session正持有Mutex,而其对应的Process被强制KILL掉, 则直到PMON彻底清理掉该Dead Process并释放Mutex,其他session才能不再
等待。  诊断该类问题,最好能检查PMON的TRACE。 当然也存在部分BUG会导致PMON清理过程非常慢。

举例来说,bug 9312879描述了一种场景:PMON 需要获得某个Mutex以便清理某个dead process,但是该Mutex又被其他进程持有,则PMON
甚至无法开始真正清理并释放Mutex。

详见 《深入理解Oracle中的Mutex》一文

如何模拟 cursor pin S wait on X 等待事件

--//我自己也遇到一例:http://blog.itpub.net/267265/viewspace-2141699/,当时网络出现环路,导致后续调用相同语句hang住.

1.环境:
SCOTT@book> @ &r/ver1
PORT_STRING                    VERSION        BANNER
------------------------------ -------------- --------------------------------------------------------------------------------
x86_64/Linux 2.4.xx            11.2.0.4.0     Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production

2.测试:
--//session 1:
SCOTT@book> @ &r/s
SCOTT@book(274,7)> @ &r/spid
       SID    SERIAL# PROCESS                  SERVER    SPID       PID  P_SERIAL# C50
---------- ---------- ------------------------ --------- ------ ------- ---------- --------------------------------------------------
       274          7 36826                    DEDICATED 36827       21          4 alter system kill session '274,7' immediate;
--//进程号=36826.

SCOTT@book(274,7)> var b1 number;
SCOTT@book(274,7)> exec :b1 :=10;
PL/SQL procedure successfully completed.

SCOTT@book(274,7)> select * from dept where deptno=:b1;
    DEPTNO DNAME          LOC
---------- -------------- -------------
        10 ACCOUNTING     NEW YORK

3.使用gdb跟踪session 1
$ gdb $(which oracle) 36827
...
(gdb) break kxsPeekBinds
Breakpoint 1 at 0x2123fa0
(gdb) command
Type commands for when breakpoint 1 is hit, one per line.
End with a line saying just "end".
>bt 4
>end
(gdb) cont
Continuing.

--//说明:break kxsPeekBinds 表示调用kxsPeekBinds中断,调用bt 4命令.
(gdb) help bt
Print backtrace of all stack frames, or innermost COUNT frames.
With a negative argument, print outermost -COUNT frames.
Use of the 'full' qualifier also prints the values of the local variables.
--//gdb不是很熟悉,按照提示:Print backtrace of all stack frames, or innermost COUNT frames.

--//session 1:
SCOTT@book(274,7)> select * from dept where deptno=:b1;
    DEPTNO DNAME          LOC
---------- -------------- -------------
        10 ACCOUNTING     NEW YORK

--//可以发现正常,因为我跟踪的调用绑定变量探查的函数,我以前的语句已经执行1次,这样第2次执行不会调用绑定变量窥视的函数.
--//修改select为Select ,这样语句会硬解析同时出现调用绑定变量窥视的函数.

SCOTT@book(274,7)> Select * from dept where deptno=:b1;
--//挂起.

--//gdb界面出现如下:
Breakpoint 1, 0x0000000002123fa0 in kxsPeekBinds ()
#0  0x0000000002123fa0 in kxsPeekBinds ()
#1  0x00000000025b821c in opitca ()
#2  0x0000000001ecf7a9 in kksSetBindType ()
#3  0x0000000009641a89 in kksfbc ()

--//session 3:
SYS@book> @ &r/wait
P1RAW            P2RAW            P3RAW                    P1         P2         P3        SID    SERIAL#       SEQ# EVENT                                    STATUS   STATE               WAIT_TIME_MICRO SECONDS_IN_WAIT WAIT_CLASS
---------------- ---------------- ---------------- ---------- ---------- ---------- ---------- ---------- ---------- ---------------------------------------- -------- ------------------- --------------- --------------- --------------------
0000000062657100 0000000000000001 00               1650815232          1          0        274          7         58 SQL*Net message from client              ACTIVE   WAITED KNOWN TIME         120254317             182 Idle
--//并没有出现cursor pin S wait on X

--//session 2:
SCOTT@book> @ &r/s
SCOTT@book(28,25)> @ &r/spid
       SID    SERIAL# PROCESS                  SERVER    SPID       PID  P_SERIAL# C50
---------- ---------- ------------------------ --------- ------ ------- ---------- --------------------------------------------------
        28         25 37014                    DEDICATED 37015       26         11 alter system kill session '28,25' immediate;

SCOTT@book> var b1 number;
SCOTT@book> exec :b1 :=20;
PL/SQL procedure successfully completed.

SCOTT@book(28,25)> Select * from dept where deptno=:b1;

--//挂起!!

SCOTT@book> select * from empy where t1 =:b1;
---//hang...在t1 = 之间加入空格.

--//session 3:
SYS@book> set numw 12
SYS@book> @ &r/wait
P1RAW            P2RAW            P3RAW                        P1             P2             P3            SID        SERIAL#           SEQ# EVENT                                    STATUS   STATE               WAIT_TIME_MICRO SECONDS_IN_WAIT WAIT_CLASS
---------------- ---------------- ---------------- -------------- -------------- -------------- -------------- -------------- -------------- ---------------------------------------- -------- ------------------- --------------- --------------- --------------------
0000000062657100 0000000000000001 00                   1650815232              1              0            274              7             58 SQL*Net message from client              ACTIVE   WAITED KNOWN TIME         120254317             437 Idle
00000000AB221382 0000011200000000 0000000500000000     2871137154  1176821039104    21474836480             28             25             42 cursor: pin S wait on X                  ACTIVE   WAITING                    54427696              54 Concurrency

SYS@book> select * from v$mutex_sleep;
MUTEX_TYPE                       LOCATION                                         SLEEPS      WAIT_TIME
-------------------------------- ---------------------------------------- -------------- --------------
Library Cache                    kgllkdl1  85                                          1              0
Cursor Pin                       kkslce [KKSCHLPIN2]                              132160              0

SYS@book> select * from v$mutex_sleep;
MUTEX_TYPE                       LOCATION                                         SLEEPS      WAIT_TIME
-------------------------------- ---------------------------------------- -------------- --------------
Cursor Pin                       kkslce [KKSCHLPIN2]                              144350              0

SYS@book> @ &r/mutex 5
old  18: ORDER BY sleeps DESC ) where rownum<= &1
new  18: ORDER BY sleeps DESC ) where rownum<= 5
          HASH         SLEEPS LOCATION             MUTEX_TYPE           SQLID         KGLNAOWN             C100
-------------- -------------- -------------------- -------------------- ------------- -------------------- -----------------------------------
    2871137154         339228 kkslce [KKSCHLPIN2]  Cursor Pin           g0vgm2upk44w2                      Select * from dept where deptno=:b1
--//停在LOCATION=kkslce,与前面gdb看到的一致.

--//做hanganalyze分析:
SYS@book> oradebug setmypid
Statement processed.
SYS@book> oradebug hanganalyze 3;
Hang Analysis in /u01/app/oracle/diag/rdbms/book/book/trace/book_ora_32088.trc

--//在执行中,看等待事件如下:
SYS@book> @ &r/wait
P1RAW            P2RAW            P3RAW                        P1             P2             P3            SID        SERIAL#           SEQ# EVENT                                    STATUS   STATE               WAIT_TIME_MICRO SECONDS_IN_WAIT WAIT_CLASS
---------------- ---------------- ---------------- -------------- -------------- -------------- -------------- -------------- -------------- ---------------------------------------- -------- ------------------- --------------- --------------- --------------------
0000000062657100 0000000000000001 00                   1650815232              1              0            274              7             58 SQL*Net message from client              ACTIVE   WAITED KNOWN TIME         120254317             656 Idle
00000000AB221382 0000011200000000 0000000500000000     2871137154  1176821039104    21474836480             28             25             45 cursor: pin S wait on X                  ACTIVE   WAITING                    57412911              57 Concurrency
00               00               00                            0              0              0             94             25             21 ksdxexeotherwait                         INACTIVE WAITING                    10567976              11 Other

--//检查跟踪文件:
*** 2018-03-01 10:22:19.845
===============================================================================
HANG ANALYSIS:
  instances (db_name.oracle_sid): book.book
  oradebug_node_dump_level: 3
  analysis initiated by oradebug
  os thread scheduling delay history: (sampling every 1.000000 secs)
    0.000000 secs at [ 10:22:19 ]
      NOTE: scheduling delay has not been sampled for 0.231213 secs    0.000000 secs from [ 10:22:15 - 10:22:20 ], 5 sec avg
    0.000000 secs from [ 10:21:20 - 10:22:20 ], 1 min avg
    0.000000 secs from [ 10:17:20 - 10:22:20 ], 5 min avg
  vktm time drift history
===============================================================================

Chains most likely to have caused the hang:
[a] Chain 1 Signature: <not in a wait><='cursor: pin S wait on X'
     Chain 1 Signature Hash: 0x3a7b30c
[b] Chain 2 Signature: 'LNS ASYNC end of log'
     Chain 2 Signature Hash: 0x8ceed34f

===============================================================================
Non-intersecting chains:

-------------------------------------------------------------------------------
Chain 1:
-------------------------------------------------------------------------------
    Oracle session identified by:
    {
                instance: 1 (book.book)
                   os id: 37015
              process id: 26, oracle@gxqyydg4 (TNS V1-V3)
              session id: 28
        session serial #: 25
    }
    is waiting for 'cursor: pin S wait on X' with wait info:
    {
                      p1: 'idn'=0xab221382
                      p2: 'value'=0x11200000000
                      p3: 'where'=0x500000000
            time in wait: 46.772297 sec
      heur. time in wait: 5 min 34 sec
           timeout after: never
                 wait id: 44
                blocking: 0 sessions
             current sql: Select * from dept where deptno=:b1
             short stack: ksedsts()+465<-ksdxfstk()+32<-ksdxcb()+1927<-sspuser()+112<-__sighandler()<-__select()+19<-skgpwwait()+332<-kgxWait()+774<-kgxSharedExamine()+568<-kxsGetRuntimeLock()+240<-kkscsCheckCursor()+556<-kkscsSearchChildList()+1171<-kksfbc()+12417<-kksp
            wait history:
              * time between current wait and wait #1: 0.000021 sec
              1.       event: 'cursor: pin S wait on X'
                 time waited: 1 min 12 sec
                     wait id: 43              p1: 'idn'=0xab221382
                                              p2: 'value'=0x11200000000
                                              p3: 'where'=0x500000000
              * time between wait #1 and #2: 0.000021 sec
              2.       event: 'cursor: pin S wait on X'
                 time waited: 1 min 11 sec
                     wait id: 42              p1: 'idn'=0xab221382
                                              p2: 'value'=0x11200000000
                                              p3: 'where'=0x500000000
              * time between wait #2 and #3: 0.000020 sec
              3.       event: 'cursor: pin S wait on X'
                 time waited: 1 min 11 sec
                     wait id: 41              p1: 'idn'=0xab221382
                                              p2: 'value'=0x11200000000
                                              p3: 'where'=0x500000000
    }
    and is blocked by
=> Oracle session identified by:
    {
                instance: 1 (book.book)
                   os id: 36827
              process id: 21, oracle@gxqyydg4 (TNS V1-V3)
              session id: 274
        session serial #: 7
    }
    which is not in a wait:
    {
               last wait: 8 min 44 sec ago
                blocking: 1 session
             current sql: Select * from dept where deptno=:b1
             short stack: <none: error encountered - ORA-32516: cannot wait for process 'Unix process pid: 36827, image: oracle@gxqyydg4 (TNS V1-V3)' to finish executing ORADEBUG command 'SHORT_STACK'; wait time exceeds 30000 ms>
            wait history:
              1.       event: 'SQL*Net message from client'
                 time waited: 2 min 0 sec
                     wait id: 57              p1: 'driver id'=0x62657100
                                              p2: '#bytes'=0x1
              * time between wait #1 and #2: 0.000101 sec
              2.       event: 'SQL*Net message to client'
                 time waited: 0.000004 sec
                     wait id: 56              p1: 'driver id'=0x62657100
                                              p2: '#bytes'=0x1
              * time between wait #2 and #3: 0.000384 sec
              3.       event: 'SQL*Net message from client'
                 time waited: 2 min 1 sec
                     wait id: 55              p1: 'driver id'=0x62657100
                                              p2: '#bytes'=0x1
    }

Chain 1 Signature: <not in a wait><='cursor: pin S wait on X'
Chain 1 Signature Hash: 0x3a7b30c
-------------------------------------------------------------------------------
...//省略
===============================================================================
Extra information that will be dumped at higher levels:
[level  4] :   1 node dumps -- [LEAF] [LEAF_NW]
[level  5] :   2 node dumps -- [NO_WAIT] [INVOL_WT] [SINGLE_NODE] [NLEAF] [SINGLE_NODE_NW]

State of ALL nodes
([nodenum]/cnode/sid/sess_srno/session/ospid/state/[adjlist]):
[27]/1/28/25/0x85f9f0a0/37015/NLEAF/[273]
[273]/1/274/7/0x8595c500/36827/LEAF_NW/
[299]/1/300/15/0x85984060/36369/SINGLE_NODE/

*** 2018-03-01 10:22:19.848
===============================================================================
END OF HANG ANALYSIS
===============================================================================

*** 2018-03-01 10:22:19.849
===============================================================================
HANG ANALYSIS DUMPS:
  oradebug_node_dump_level: 3
===============================================================================

State of LOCAL nodes
([nodenum]/cnode/sid/sess_srno/session/ospid/state/[adjlist]):
[27]/1/28/25/0x85f9f0a0/37015/NLEAF/[273]
[273]/1/274/7/0x8595c500/36827/LEAF_NW/
[299]/1/300/15/0x85984060/36369/SINGLE_NODE/


No processes qualify for dumping.

===============================================================================
HANG ANALYSIS DUMPS: END
===============================================================================

*** 2018-03-01 10:22:19.849
Oradebug command 'hanganalyze 3' console output:
Hang Analysis in /u01/app/oracle/diag/rdbms/book/book/trace/book_ora_37100.trc

--//对比等待事件,可以对上:
SYS@book> @ &r/wait
P1RAW            P2RAW            P3RAW                        P1             P2             P3            SID        SERIAL#           SEQ# EVENT                                    STATUS   STATE               WAIT_TIME_MICRO SECONDS_IN_WAIT WAIT_CLASS
---------------- ---------------- ---------------- -------------- -------------- -------------- -------------- -------------- -------------- ---------------------------------------- -------- ------------------- --------------- --------------- --------------------
0000000062657100 0000000000000001 00                   1650815232              1              0            274              7             58 SQL*Net message from client              ACTIVE   WAITED KNOWN TIME         120254317             949 Idle
00000000AB221382 0000011200000000 0000000500000000     2871137154  1176821039104    21474836480             28             25             49 cursor: pin S wait on X                  ACTIVE   WAITING                    63781034              64 Concurrency

---sid,serial#= 274,7 28,25,

SYS@book> select spid from v$process where addr in (select paddr from v$SESSION where sid in (274,28));
SPID
------
36827
37015
--//进程号也能对上.

--//转载:http://www.oracleblog.org/working-case/deal-with-row-cache-lock/
--//我们来看看([nodenum]/cnode/sid/sess_srno/session/ospid/state/start/finish/[adjlist]/predecessor)部分:
nodenum是hanganalyze自己为了记录这些会话而定制的编号,从0开始排起。
cnode是Node Id
sid是 Session ID
sess_srno是serial#
proc_ptr是Process Pointer
ospid 是OS Process ID
state 是node的状态
adjlist是临近的node(通常代表一个blocker node)
predecessor是Predecessor node ,通常代表一个 waiter node

state部分有几个比较重要的状态:
1.IN_HANG:这表示该node处于死锁状态,通常还有其他node(blocker)也处于该状态
2.LEAF/LEAF_NW:该node通常是blocker。通过条目的"predecessor"列可以判断这个node是否是blocker。LEAF说明该NODE没有等待其他
  资源,而LEAF_NW则可能是没有等待其他资源或者是在使用CPU.
3.NLEAF:通常可以看作这些会话是被阻塞的资源。发生这种情况一般说明数据库发生性能问题而不是数据库hang
4.IGN/IGN_DMP:这类会话通常被认为是空闲会话,除非其adjlist列里存在node。如果是非空闲会话则说明其adjlist里的node正在等待其他node释放资源。
5.SINGLE_NODE/SINGLE_NODE_NW:近似于空闲会话

--//也可以执行如下,链接:http://blog.itpub.net/267265/viewspace-2141699/
/* Formatted on 2018/3/1 10:28:02 (QP5 v5.252.13127.32867) */
SELECT s.inst_id AS inst
      ,s.sid AS blocked_sid
      ,s.username AS blocked_user
      ,sa.sql_id AS blocked_sql_id
      ,TRUNC (s.p2 / 4294967296) AS blocking_sid
      ,b.username AS blocking_user
      ,b.sql_id AS blocking_sql_id
  FROM gv$session s
       JOIN gv$sqlarea sa ON sa.hash_value = s.p1
       JOIN gv$session b
          ON TRUNC (s.p2 / 4294967296) = b.sid AND s.inst_id = b.inst_id
       JOIN gv$sqlarea sa2 ON b.sql_id = sa2.sql_id
WHERE s.event = 'cursor: pin S wait on X';

          INST    BLOCKED_SID BLOCKED_USER                   BLOCKED_SQL_I   BLOCKING_SID BLOCKING_USER                  BLOCKING_SQL_
-------------- -------------- ------------------------------ ------------- -------------- ------------------------------ -------------
             1             28 SCOTT                          g0vgm2upk44w2            274 SCOTT                          g0vgm2upk44w2
--//sid=274阻塞了sid=28.

--//在gdb界面选择继续:
(gdb) cont
Continuing.

Program received signal SIGUSR2, User defined signal 2.
0x0000000002123fa0 in kxsPeekBinds ()
(gdb) cont
Continuing.

Program received signal SIGSEGV, Segmentation fault.
0x0000000009805bd5 in slaac_int ()
(gdb) cont
Continuing.

Breakpoint 1, 0x0000000002123fa0 in kxsPeekBinds ()
#0  0x0000000002123fa0 in kxsPeekBinds ()
#1  0x00000000025b821c in opitca ()
#2  0x0000000001ecf7a9 in kksSetBindType ()
#3  0x0000000009641a89 in kksfbc ()
(gdb) cont
Continuing.

--//知道session 1执行完成,在看session 2,执行也ok.
--//session 1:
SCOTT@book(274,7)> Select * from dept where deptno=:b1;
    DEPTNO DNAME          LOC
---------- -------------- -------------
        10 ACCOUNTING     NEW YORK
--//session 2:
SCOTT@book(28,25)> Select * from dept where deptno=:b1;
  =  DEPTNO DNAME          LOC
---------- -------------- -------------
        20 RESEARCH       DALLAS
--//session 3:
SYS@book> @ &r/wait
no rows selected

4.补充说明:
cursor: pin S wait on X 的p1,p2,p3参数:
--//参考http://www.oratea.com/2017/02/03/single-task-message%e7%ad%89%e5%be%85%e4%ba%8b%e4%bb%b6/

P1=2871137154
P2=1176821039104
P3=21474836480

--//idn 通过以前的测试表示hash_value
--//PARAMETER2参数:
--//它的前一半(高位):就是要加Mutex的会话的SID,这里也就是274号会话。274号会话要申请Mutex。后一半(低位)是"引用计数"或模式。
SYS@book> @ &r/10to16 1176821039104
10 to 16 HEX   REVERSE16
-------------- -----------------------------------
0011200000000 0x00000000-12010000

--//0x112 = 274,就是session 1.

SYS@book> select 1176821039104,32,trunc(1176821039104/power(2,32)) ,mod(1176821039104,power(2,32))   from dual;
1176821039104             32 TRUNC(1176821039104/POWER(2,32)) MOD(1176821039104,POWER(2,32))
-------------- -------------- -------------------------------- ------------------------------
1176821039104             32                              274                              0

SYS@book> select sql_id,sql_text from v$sql where hash_value=2871137154 ;
SQL_ID        SQL_TEXT
------------- ------------------------------------------------------------
g0vgm2upk44w2 Select * from dept where deptno=:b1

5.测试脚本:
--//附上mutex.sql脚本:

# cat mutex.sql
column kglnaown format a20
column MUTEX_TYPE format a20
column kglnaobj format a100
column LOCATION format a20
  select * from (
  SELECT kglnahsh hash
        ,SUM (sleeps) sleeps
        ,location
        ,mutex_type
        ,kglobt03 sqlid
        ,kglnaown
        ,replace(kglnaobj,chr(13)) c100
    --,SUBSTR (kglnaobj, 1, 140) object
    FROM x$kglob, v$mutex_sleep_history
   WHERE kglnahsh = mutex_identifier
GROUP BY kglnaobj
        ,kglobt03
        ,kglnaown
        ,kglnahsh
        ,location
        ,mutex_type
ORDER BY sleeps DESC ) where rownum<= &1;

# cat wait.sql
select p1raw,p2raw,p3raw,p1,p2,p3,sid,serial#,seq#,event,status,state,wait_time_micro,seconds_in_wait,wait_class
from v$session where ( wait_class<>'Idle' or (status='ACTIVE' and STATE='WAITED KNOWN TIME'))
and sid not in (select sid from v$mystat where rownum=1)
order by event ;

网友评论

登录后评论
0/500
评论
lfreeali
+ 关注