1. OS :
3. man strace
2. /var/log/message :
3. LOAD :
4. PostgreSQL csvlog :
5. strace未做.
1. OS :
Red Hat Enterprise Linux Server release 5.5 (Tikanga)
2.6.18-194.el5 #1 SMP Tue Mar 16 21:52:39 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux
2. /var/log/message :
task hung >120秒, 涉及进程号 :
May 31 16:35:11 postgres:915
May 31 16:37:02 postgres:18159
May 31 16:39:11 postgres:915
May 31 16:41:37 postgres:916
May 31 16:45:39 postgres:917
May 31 16:49:22 postgres:918
May 31 16:52:15 postgres:972
May 31 16:55:47 postgres:21304
May 31 16:59:33 postgres:20706
May 31 17:01:38 postgres:19971
详细内容 :
May 31 16:35:11 db-xxx.xxx kernel: INFO: task postgres:915 blocked for more than 120 seconds.
May 31 16:35:20 db-xxx.xxx kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 31 16:35:29 db-xxx.xxx kernel: postgres D ffff810227072120 0 915 676 916 677 (NOTLB)
May 31 16:35:33 db-xxx.xxx kernel: ffff8102bc463d88 0000000000000082 0001a548c08e672d 00000000000af397
May 31 16:35:43 db-xxx.xxx kernel: ffff8101f6819a48 0000000000000009 ffff8101f6819860 ffff81041fecc820
May 31 16:35:49 db-xxx.xxx kernel: 0001a55ab4cbed65 0000000000000985 ffff8101f6819a48 0000000576ed6890
May 31 16:35:50 db-xxx.xxx kernel: Call Trace:
May 31 16:35:52 db-xxx.xxx kernel: [<ffffffff88036d8a>] :jbd:log_wait_commit+0xa3/0xf5
May 31 16:35:53 db-xxx.xxx kernel: [<ffffffff800a1ba4>] autoremove_wake_function+0x0/0x2e
May 31 16:35:54 db-xxx.xxx kernel: [<ffffffff8803178a>] :jbd:journal_stop+0x1cf/0x1ff
May 31 16:35:58 db-xxx.xxx kernel: [<ffffffff8002ff40>] __writeback_single_inode+0x1e9/0x328
May 31 16:36:15 db-xxx.xxx kernel: [<ffffffff800484d0>] pagevec_lookup_tag+0x1a/0x21
May 31 16:36:21 db-xxx.xxx kernel: [<ffffffff8004a66a>] wait_on_page_writeback_range+0xd6/0x12e
May 31 16:36:30 db-xxx.xxx kernel: [<ffffffff800f3d9d>] sync_inode+0x24/0x33
May 31 16:36:42 db-xxx.xxx kernel: [<ffffffff8804c36d>] :ext3:ext3_sync_file+0xc9/0xdc
May 31 16:36:48 db-xxx.xxx kernel: [<ffffffff8005073a>] do_fsync+0x52/0xa4
May 31 16:36:52 db-xxx.xxx kernel: [<ffffffff800e1ce9>] __do_fsync+0x23/0x36
May 31 16:36:57 db-xxx.xxx kernel: [<ffffffff8005e28d>] tracesys+0xd5/0xe0
May 31 16:36:59 db-xxx.xxx kernel:
May 31 16:37:02 db-xxx.xxx kernel: INFO: task postgres:18159 blocked for more than 120 seconds.
May 31 16:37:06 db-xxx.xxx kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 31 16:37:09 db-xxx.xxx kernel: postgres D ffff810227072120 0 18159 676 18208 18149 (NOTLB)
May 31 16:37:11 db-xxx.xxx kernel: ffff81035b3e7d88 0000000000000082 ffff81035b3e7d98 ffffffff80063ff8
May 31 16:37:15 db-xxx.xxx kernel: 0000000000001000 0000000000000008 ffff81025d50f7e0 ffff81041fecc820
May 31 16:37:22 db-xxx.xxx kernel: 0001a55ab4da022a 0000000000000857 ffff81025d50f9c8 000000050000b23e
May 31 16:37:29 db-xxx.xxx kernel: Call Trace:
May 31 16:37:33 db-xxx.xxx kernel: [<ffffffff80063ff8>] thread_return+0x62/0xfe
May 31 16:37:38 db-xxx.xxx kernel: [<ffffffff88036d8a>] :jbd:log_wait_commit+0xa3/0xf5
May 31 16:37:45 db-xxx.xxx kernel: [<ffffffff800a1ba4>] autoremove_wake_function+0x0/0x2e
May 31 16:37:49 db-xxx.xxx kernel: [<ffffffff80098e91>] process_timeout+0x0/0x5
May 31 16:37:51 db-xxx.xxx kernel: [<ffffffff8803178a>] :jbd:journal_stop+0x1cf/0x1ff
May 31 16:37:53 db-xxx.xxx kernel: [<ffffffff8002ff40>] __writeback_single_inode+0x1e9/0x328
May 31 16:37:53 db-xxx.xxx kernel: [<ffffffff80067b88>] do_page_fault+0x4fe/0x874
May 31 16:37:58 db-xxx.xxx kernel: [<ffffffff800f3d9d>] sync_inode+0x24/0x33
May 31 16:38:13 db-xxx.xxx kernel: [<ffffffff8804c36d>] :ext3:ext3_sync_file+0xc9/0xdc
May 31 16:38:23 db-xxx.xxx kernel: [<ffffffff8005073a>] do_fsync+0x52/0xa4
May 31 16:38:35 db-xxx.xxx kernel: [<ffffffff800e1ce9>] __do_fsync+0x23/0x36
May 31 16:38:46 db-xxx.xxx kernel: [<ffffffff8005e28d>] tracesys+0xd5/0xe0
May 31 16:38:55 db-xxx.xxx kernel:
May 31 16:39:11 db-xxx.xxx kernel: INFO: task postgres:915 blocked for more than 120 seconds.
May 31 16:39:25 db-xxx.xxx kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 31 16:39:36 db-xxx.xxx kernel: postgres D ffff81018667af78 0 915 676 916 677 (NOTLB)
May 31 16:39:36 db-xxx.xxx kernel: ffff8102bc463d88 0000000000000082 ffff8102bc463d98 ffffffff80063ff8
May 31 16:39:51 db-xxx.xxx kernel: ffff8101f6819a48 0000000000000009 ffff8101f6819860 ffff81025d50f7e0
May 31 16:40:00 db-xxx.xxx kernel: 0001a5817aa3b719 00000000000006d1 ffff8101f6819a48 00000005000004e5
May 31 16:40:17 db-xxx.xxx kernel: Call Trace:
May 31 16:40:38 db-xxx.xxx kernel: [<ffffffff80063ff8>] thread_return+0x62/0xfe
May 31 16:40:45 db-xxx.xxx kernel: [<ffffffff88036d8a>] :jbd:log_wait_commit+0xa3/0xf5
May 31 16:40:58 db-xxx.xxx kernel: [<ffffffff800a1ba4>] autoremove_wake_function+0x0/0x2e
May 31 16:41:08 db-xxx.xxx kernel: [<ffffffff8803178a>] :jbd:journal_stop+0x1cf/0x1ff
May 31 16:41:22 db-xxx.xxx kernel: [<ffffffff8002ff40>] __writeback_single_inode+0x1e9/0x328
May 31 16:41:31 db-xxx.xxx kernel: [<ffffffff800484d0>] pagevec_lookup_tag+0x1a/0x21
May 31 16:41:31 db-xxx.xxx kernel: [<ffffffff8004a66a>] wait_on_page_writeback_range+0xd6/0x12e
May 31 16:41:31 db-xxx.xxx kernel: [<ffffffff800f3d9d>] sync_inode+0x24/0x33
May 31 16:41:31 db-xxx.xxx kernel: [<ffffffff8804c36d>] :ext3:ext3_sync_file+0xc9/0xdc
May 31 16:41:31 db-xxx.xxx kernel: [<ffffffff8005073a>] do_fsync+0x52/0xa4
May 31 16:41:31 db-xxx.xxx kernel: [<ffffffff800e1ce9>] __do_fsync+0x23/0x36
May 31 16:41:32 db-xxx.xxx kernel: [<ffffffff8005e28d>] tracesys+0xd5/0xe0
May 31 16:41:32 db-xxx.xxx kernel:
May 31 16:41:37 db-xxx.xxx kernel: INFO: task postgres:916 blocked for more than 120 seconds.
May 31 16:41:42 db-xxx.xxx kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 31 16:41:58 db-xxx.xxx kernel: postgres D ffff81021e7f6600 0 916 676 917 915 (NOTLB)
May 31 16:42:13 db-xxx.xxx kernel: ffff81032593faa8 0000000000000082 ffff810105d5b538 0000000000000286
May 31 16:42:26 db-xxx.xxx kernel: ffff81032593fa28 000000000000000a ffff8103e65810c0 ffff810290158080
May 31 16:42:39 db-xxx.xxx kernel: 0001a58a6429ef21 0000000000003616 ffff8103e65812a8 0000000600200200
May 31 16:42:47 db-xxx.xxx kernel: Call Trace:
May 31 16:43:04 db-xxx.xxx kernel: [<ffffffff800cd378>] zone_statistics+0x3e/0x6d
May 31 16:43:16 db-xxx.xxx kernel: [<ffffffff88032002>] :jbd:start_this_handle+0x2e5/0x36c
May 31 16:43:26 db-xxx.xxx kernel: [<ffffffff800a1ba4>] autoremove_wake_function+0x0/0x2e
May 31 16:43:35 db-xxx.xxx kernel: [<ffffffff88032152>] :jbd:journal_start+0xc9/0x100
May 31 16:43:48 db-xxx.xxx kernel: [<ffffffff88050347>] :ext3:ext3_write_begin+0x9a/0x1cc
May 31 16:43:59 db-xxx.xxx kernel: [<ffffffff8000fd40>] generic_file_buffered_write+0x14b/0x675
May 31 16:44:05 db-xxx.xxx kernel: [<ffffffff880317ae>] :jbd:journal_stop+0x1f3/0x1ff
May 31 16:44:21 db-xxx.xxx kernel: [<ffffffff80016641>] __generic_file_aio_write_nolock+0x369/0x3b6
May 31 16:44:38 db-xxx.xxx kernel: [<ffffffff801209aa>] sys_semtimedop+0x6e7/0x720
May 31 16:45:07 db-xxx.xxx kernel: [<ffffffff80021763>] generic_file_aio_write+0x65/0xc1
May 31 16:45:07 db-xxx.xxx kernel: [<ffffffff8804c1b6>] :ext3:ext3_file_write+0x16/0x91
May 31 16:45:24 db-xxx.xxx kernel: [<ffffffff80018266>] do_sync_write+0xc7/0x104
May 31 16:45:27 db-xxx.xxx kernel: [<ffffffff8000eab5>] find_get_pages_tag+0x34/0x89
May 31 16:45:33 db-xxx.xxx kernel: [<ffffffff800a1ba4>] autoremove_wake_function+0x0/0x2e
May 31 16:45:33 db-xxx.xxx kernel: [<ffffffff80063ff8>] thread_return+0x62/0xfe
May 31 16:45:34 db-xxx.xxx kernel: [<ffffffff80016a49>] vfs_write+0xce/0x174
May 31 16:45:36 db-xxx.xxx kernel: [<ffffffff80017316>] sys_write+0x45/0x6e
May 31 16:45:37 db-xxx.xxx kernel: [<ffffffff8005e28d>] tracesys+0xd5/0xe0
May 31 16:45:38 db-xxx.xxx kernel:
May 31 16:45:39 db-xxx.xxx kernel: INFO: task postgres:917 blocked for more than 120 seconds.
May 31 16:45:39 db-xxx.xxx kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 31 16:45:40 db-xxx.xxx kernel: postgres D ffff810227072120 0 917 676 918 916 (NOTLB)
May 31 16:45:41 db-xxx.xxx kernel: ffff81041b507b88 0000000000000086 ffff81023ae60380 ffffffff8022ff8d
May 31 16:45:46 db-xxx.xxx kernel: 0000000000000000 000000000000000a ffff8102272730c0 ffff81041fecc820
May 31 16:45:52 db-xxx.xxx kernel: 0001a5924a73a860 0000000000006aac ffff8102272732a8 0000000500000000
May 31 16:46:02 db-xxx.xxx kernel: Call Trace:
May 31 16:46:17 db-xxx.xxx kernel: [<ffffffff8022ff8d>] dev_hard_start_xmit+0x1b7/0x28a
May 31 16:46:36 db-xxx.xxx kernel: [<ffffffff800322bb>] ip_output+0x29a/0x2dd
May 31 16:46:51 db-xxx.xxx kernel: [<ffffffff88032002>] :jbd:start_this_handle+0x2e5/0x36c
May 31 16:47:01 db-xxx.xxx kernel: [<ffffffff800a1ba4>] autoremove_wake_function+0x0/0x2e
May 31 16:47:19 db-xxx.xxx kernel: [<ffffffff88032152>] :jbd:journal_start+0xc9/0x100
May 31 16:47:26 db-xxx.xxx kernel: [<ffffffff88050c72>] :ext3:ext3_dirty_inode+0x28/0x7b
May 31 16:47:32 db-xxx.xxx kernel: [<ffffffff80013c93>] __mark_inode_dirty+0x29/0x16e
May 31 16:47:37 db-xxx.xxx kernel: [<ffffffff8000c44a>] do_generic_mapping_read+0x342/0x354
May 31 16:47:50 db-xxx.xxx kernel: [<ffffffff8000d0fb>] file_read_actor+0x0/0x159
May 31 16:48:03 db-xxx.xxx kernel: [<ffffffff8000c5a8>] __generic_file_aio_read+0x14c/0x198
May 31 16:48:10 db-xxx.xxx kernel: [<ffffffff80016dd4>] generic_file_aio_read+0x34/0x39
May 31 16:48:20 db-xxx.xxx kernel: [<ffffffff8000ce24>] do_sync_read+0xc7/0x104
May 31 16:48:35 db-xxx.xxx kernel: [<ffffffff800a1ba4>] autoremove_wake_function+0x0/0x2e
May 31 16:48:47 db-xxx.xxx kernel: [<ffffffff8000df43>] do_mmap_pgoff+0x431/0x7d7
May 31 16:48:47 db-xxx.xxx kernel: [<ffffffff8000e17e>] do_mmap_pgoff+0x66c/0x7d7
May 31 16:48:57 db-xxx.xxx kernel: [<ffffffff8000b6b0>] vfs_read+0xcb/0x171
May 31 16:49:06 db-xxx.xxx kernel: [<ffffffff80011c01>] sys_read+0x45/0x6e
May 31 16:49:06 db-xxx.xxx kernel: [<ffffffff8005e28d>] tracesys+0xd5/0xe0
May 31 16:49:17 db-xxx.xxx kernel:
May 31 16:49:22 db-xxx.xxx kernel: INFO: task postgres:918 blocked for more than 120 seconds.
May 31 16:49:33 db-xxx.xxx kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 31 16:49:45 db-xxx.xxx kernel: postgres D ffff81021e7f6600 0 918 676 924 917 (NOTLB)
May 31 16:49:50 db-xxx.xxx kernel: ffff81025f38dcd8 0000000000000086 0000000000000009 ffffffff8000b192
May 31 16:49:50 db-xxx.xxx kernel: 0000000000000009 0000000000000009 ffff81021c66a7a0 ffff8101b13d57e0
May 31 16:49:51 db-xxx.xxx kernel: 0001a589ed252ae3 0000000000347ea8 ffff81021c66a988 000000010454f400
May 31 16:49:51 db-xxx.xxx kernel: Call Trace:
May 31 16:49:51 db-xxx.xxx kernel: [<ffffffff8000b192>] release_pages+0x14d/0x15a
May 31 16:49:51 db-xxx.xxx kernel: [<ffffffff88032002>] :jbd:start_this_handle+0x2e5/0x36c
May 31 16:49:51 db-xxx.xxx kernel: [<ffffffff800a1ba4>] autoremove_wake_function+0x0/0x2e
May 31 16:49:51 db-xxx.xxx kernel: [<ffffffff88032152>] :jbd:journal_start+0xc9/0x100
May 31 16:49:52 db-xxx.xxx kernel: [<ffffffff8804d5f2>] :ext3:start_transaction+0x1c/0x48
May 31 16:50:02 db-xxx.xxx kernel: [<ffffffff880501d9>] :ext3:ext3_delete_inode+0x0/0xd4
May 31 16:50:26 db-xxx.xxx kernel: [<ffffffff88050208>] :ext3:ext3_delete_inode+0x2f/0xd4
May 31 16:50:44 db-xxx.xxx kernel: [<ffffffff880501d9>] :ext3:ext3_delete_inode+0x0/0xd4
May 31 16:51:03 db-xxx.xxx kernel: [<ffffffff8002f6a6>] generic_delete_inode+0xc6/0x143
May 31 16:51:13 db-xxx.xxx kernel: [<ffffffff8000d4b3>] dput+0xf6/0x114
May 31 16:51:23 db-xxx.xxx kernel: [<ffffffff80036f13>] sys_renameat+0x18a/0x1eb
May 31 16:51:39 db-xxx.xxx kernel: [<ffffffff8000e4a1>] free_pages_and_swap_cache+0x67/0x7e
May 31 16:51:50 db-xxx.xxx kernel: [<ffffffff800b878c>] audit_syscall_entry+0x180/0x1b3
May 31 16:51:50 db-xxx.xxx kernel: [<ffffffff8005e28d>] tracesys+0xd5/0xe0
May 31 16:52:02 db-xxx.xxx kernel:
May 31 16:52:15 db-xxx.xxx kernel: INFO: task postgres:972 blocked for more than 120 seconds.
May 31 16:52:28 db-xxx.xxx kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 31 16:52:40 db-xxx.xxx kernel: postgres D ffff81022707a7a0 0 972 676 989 960 (NOTLB)
May 31 16:52:54 db-xxx.xxx kernel: ffff81041bfcfaa8 0000000000000086 0000000000000000 0000000000000000
May 31 16:53:02 db-xxx.xxx kernel: 0000000000000000 000000000000000a ffff81041e6580c0 ffff81022714c7a0
May 31 16:53:21 db-xxx.xxx kernel: 0001a58f2d4d93ee 0000000000055ef9 ffff81041e6582a8 0000000736d40100
May 31 16:53:21 db-xxx.xxx kernel: Call Trace:
May 31 16:53:37 db-xxx.xxx kernel: [<ffffffff800cd378>] zone_statistics+0x3e/0x6d
May 31 16:53:37 db-xxx.xxx kernel: [<ffffffff88032002>] :jbd:start_this_handle+0x2e5/0x36c
May 31 16:53:49 db-xxx.xxx kernel: [<ffffffff800a1ba4>] autoremove_wake_function+0x0/0x2e
May 31 16:54:04 db-xxx.xxx kernel: [<ffffffff88032152>] :jbd:journal_start+0xc9/0x100
May 31 16:54:17 db-xxx.xxx kernel: [<ffffffff88050347>] :ext3:ext3_write_begin+0x9a/0x1cc
May 31 16:54:47 db-xxx.xxx kernel: [<ffffffff8000fd40>] generic_file_buffered_write+0x14b/0x675
May 31 16:54:53 db-xxx.xxx kernel: [<ffffffff880317ae>] :jbd:journal_stop+0x1f3/0x1ff
May 31 16:55:07 db-xxx.xxx kernel: [<ffffffff80016641>] __generic_file_aio_write_nolock+0x369/0x3b6
May 31 16:55:23 db-xxx.xxx kernel: [<ffffffff8000e324>] current_fs_time+0x3b/0x40
May 31 16:55:39 db-xxx.xxx kernel: [<ffffffff80021763>] generic_file_aio_write+0x65/0xc1
May 31 16:55:41 db-xxx.xxx kernel: [<ffffffff8804c1b6>] :ext3:ext3_file_write+0x16/0x91
May 31 16:55:42 db-xxx.xxx kernel: [<ffffffff80018266>] do_sync_write+0xc7/0x104
May 31 16:55:42 db-xxx.xxx kernel: [<ffffffff80067b88>] do_page_fault+0x4fe/0x874
May 31 16:55:43 db-xxx.xxx kernel: [<ffffffff800a1ba4>] autoremove_wake_function+0x0/0x2e
May 31 16:55:43 db-xxx.xxx kernel: [<ffffffff800274fb>] do_filp_open+0x1c/0x38
May 31 16:55:43 db-xxx.xxx kernel: [<ffffffff80016a49>] vfs_write+0xce/0x174
May 31 16:55:43 db-xxx.xxx kernel: [<ffffffff80017316>] sys_write+0x45/0x6e
May 31 16:55:46 db-xxx.xxx kernel: [<ffffffff8005e28d>] tracesys+0xd5/0xe0
May 31 16:55:46 db-xxx.xxx kernel:
May 31 16:55:47 db-xxx.xxx kernel: INFO: task postgres:21304 blocked for more than 120 seconds.
May 31 16:55:48 db-xxx.xxx kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 31 16:55:59 db-xxx.xxx kernel: postgres D ffff81021e7f6600 0 21304 676 24881 7368 (NOTLB)
May 31 16:56:06 db-xxx.xxx kernel: ffff81025277db88 0000000000000082 ffff8102460080c0 ffff8103d4d58780
May 31 16:56:17 db-xxx.xxx kernel: ffff8102460080c0 0000000000000009 ffff8102460080c0 ffff81024b0ac860
May 31 16:56:23 db-xxx.xxx kernel: 0001a58d6411b200 0000000000002302 ffff8102460082a8 0000000780063ff8
May 31 16:56:52 db-xxx.xxx kernel: Call Trace:
May 31 16:57:09 db-xxx.xxx kernel: [<ffffffff88032002>] :jbd:start_this_handle+0x2e5/0x36c
May 31 16:57:18 db-xxx.xxx kernel: [<ffffffff800a1ba4>] autoremove_wake_function+0x0/0x2e
May 31 16:57:29 db-xxx.xxx kernel: [<ffffffff88032152>] :jbd:journal_start+0xc9/0x100
May 31 16:57:39 db-xxx.xxx kernel: [<ffffffff88050c72>] :ext3:ext3_dirty_inode+0x28/0x7b
May 31 16:57:55 db-xxx.xxx kernel: [<ffffffff80013c93>] __mark_inode_dirty+0x29/0x16e
May 31 16:58:07 db-xxx.xxx kernel: [<ffffffff8000c44a>] do_generic_mapping_read+0x342/0x354
May 31 16:58:23 db-xxx.xxx kernel: [<ffffffff8000d0fb>] file_read_actor+0x0/0x159
May 31 16:58:35 db-xxx.xxx kernel: [<ffffffff8000c5a8>] __generic_file_aio_read+0x14c/0x198
May 31 16:58:48 db-xxx.xxx kernel: [<ffffffff80016dd4>] generic_file_aio_read+0x34/0x39
May 31 16:59:05 db-xxx.xxx kernel: [<ffffffff8000ce24>] do_sync_read+0xc7/0x104
May 31 16:59:21 db-xxx.xxx kernel: [<ffffffff8000e324>] current_fs_time+0x3b/0x40
May 31 16:59:28 db-xxx.xxx kernel: [<ffffffff8002b9a9>] sys_recvfrom+0xd4/0x130
May 31 16:59:28 db-xxx.xxx kernel: [<ffffffff800a1ba4>] autoremove_wake_function+0x0/0x2e
May 31 16:59:29 db-xxx.xxx kernel: [<ffffffff800634b6>] __sched_text_start+0xf6/0xbd6
May 31 16:59:29 db-xxx.xxx kernel: [<ffffffff8000b6b0>] vfs_read+0xcb/0x171
May 31 16:59:29 db-xxx.xxx kernel: [<ffffffff80011c01>] sys_read+0x45/0x6e
May 31 16:59:32 db-xxx.xxx kernel: [<ffffffff8005e28d>] tracesys+0xd5/0xe0
May 31 16:59:32 db-xxx.xxx kernel:
May 31 16:59:33 db-xxx.xxx kernel: INFO: task postgres:20706 blocked for more than 120 seconds.
May 31 16:59:33 db-xxx.xxx kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 31 16:59:36 db-xxx.xxx kernel: postgres D ffff81021e7f6600 0 20706 676 20710 20702 (NOTLB)
May 31 16:59:37 db-xxx.xxx kernel: ffff8102ffea1aa8 0000000000000086 000000054f79df8f ffffffff8022ff8d
May 31 16:59:38 db-xxx.xxx kernel: ffff81041f108000 000000000000000a ffff81026add10c0 ffff8102e9cf20c0
May 31 16:59:38 db-xxx.xxx kernel: 0001a589f380b25e 0000000000078593 ffff81026add12a8 0000000771664800
May 31 16:59:38 db-xxx.xxx kernel: Call Trace:
May 31 16:59:38 db-xxx.xxx kernel: [<ffffffff8022ff8d>] dev_hard_start_xmit+0x1b7/0x28a
May 31 16:59:39 db-xxx.xxx kernel: [<ffffffff800cd378>] zone_statistics+0x3e/0x6d
May 31 16:59:48 db-xxx.xxx kernel: [<ffffffff88032002>] :jbd:start_this_handle+0x2e5/0x36c
May 31 17:00:22 db-xxx.xxx kernel: [<ffffffff800a1ba4>] autoremove_wake_function+0x0/0x2e
May 31 17:00:52 db-xxx.xxx kernel: [<ffffffff88032152>] :jbd:journal_start+0xc9/0x100
May 31 17:00:56 db-xxx.xxx kernel: [<ffffffff88050347>] :ext3:ext3_write_begin+0x9a/0x1cc
May 31 17:00:57 db-xxx.xxx kernel: [<ffffffff8000fd40>] generic_file_buffered_write+0x14b/0x675
May 31 17:00:58 db-xxx.xxx kernel: [<ffffffff880317ae>] :jbd:journal_stop+0x1f3/0x1ff
May 31 17:01:00 db-xxx.xxx kernel: [<ffffffff80016641>] __generic_file_aio_write_nolock+0x369/0x3b6
May 31 17:01:00 db-xxx.xxx kernel: [<ffffffff80021763>] generic_file_aio_write+0x65/0xc1
May 31 17:01:00 db-xxx.xxx kernel: [<ffffffff8804c1b6>] :ext3:ext3_file_write+0x16/0x91
May 31 17:01:00 db-xxx.xxx kernel: [<ffffffff80018266>] do_sync_write+0xc7/0x104
May 31 17:01:00 db-xxx.xxx kernel: [<ffffffff8022678b>] sock_from_file+0xf/0x60
May 31 17:01:01 db-xxx.xxx kernel: [<ffffffff8002b9a9>] sys_recvfrom+0xd4/0x130
May 31 17:01:17 db-xxx.xxx kernel: [<ffffffff800a1ba4>] autoremove_wake_function+0x0/0x2e
May 31 17:01:35 db-xxx.xxx kernel: [<ffffffff80019ebe>] current_kernel_time+0x26/0x38
May 31 17:01:37 db-xxx.xxx kernel: [<ffffffff80016a49>] vfs_write+0xce/0x174
May 31 17:01:37 db-xxx.xxx kernel: [<ffffffff80017316>] sys_write+0x45/0x6e
May 31 17:01:37 db-xxx.xxx kernel: [<ffffffff8005e28d>] tracesys+0xd5/0xe0
May 31 17:01:37 db-xxx.xxx kernel:
May 31 17:01:38 db-xxx.xxx kernel: INFO: task postgres:19971 blocked for more than 120 seconds.
May 31 17:01:38 db-xxx.xxx kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 31 17:01:38 db-xxx.xxx kernel: postgres D ffff81021e7f6600 0 19971 676 19972 20717 (NOTLB)
May 31 17:01:38 db-xxx.xxx kernel: ffff810242c5dbe8 0000000000000086 0000000000000001 ffffffff8001bf98
May 31 17:01:41 db-xxx.xxx kernel: ffff81028b0b19c0 000000000000000a ffff8103f7f4b7a0 ffff81023b3ec100
May 31 17:01:41 db-xxx.xxx kernel: 0001a58c690dd582 0000000000044dbb ffff8103f7f4b988 000000078b0b19c0
May 31 17:01:42 db-xxx.xxx kernel: Call Trace:
May 31 17:01:42 db-xxx.xxx kernel: [<ffffffff8001bf98>] tcp_rcv_established+0x8bd/0x925
May 31 17:01:42 db-xxx.xxx kernel: [<ffffffff8003b875>] tcp_v4_do_rcv+0x2a/0x2fa
May 31 17:01:42 db-xxx.xxx kernel: [<ffffffff80228f92>] sk_wait_data+0xac/0xbf
May 31 17:01:42 db-xxx.xxx kernel: [<ffffffff88032002>] :jbd:start_this_handle+0x2e5/0x36c
May 31 17:01:43 db-xxx.xxx kernel: [<ffffffff800a1ba4>] autoremove_wake_function+0x0/0x2e
May 31 17:01:59 db-xxx.xxx kernel: [<ffffffff88032152>] :jbd:journal_start+0xc9/0x100
May 31 17:02:22 db-xxx.xxx kernel: [<ffffffff88050c72>] :ext3:ext3_dirty_inode+0x28/0x7b
May 31 17:02:40 db-xxx.xxx kernel: [<ffffffff80013c93>] __mark_inode_dirty+0x29/0x16e
May 31 17:02:42 db-xxx.xxx kernel: [<ffffffff80016562>] __generic_file_aio_write_nolock+0x28a/0x3b6
May 31 17:02:42 db-xxx.xxx kernel: [<ffffffff80021763>] generic_file_aio_write+0x65/0xc1
May 31 17:02:43 db-xxx.xxx kernel: [<ffffffff8804c1b6>] :ext3:ext3_file_write+0x16/0x91
May 31 17:02:43 db-xxx.xxx kernel: [<ffffffff80018266>] do_sync_write+0xc7/0x104
May 31 17:02:43 db-xxx.xxx kernel: [<ffffffff8002b9a9>] sys_recvfrom+0xd4/0x130
May 31 17:02:43 db-xxx.xxx kernel: [<ffffffff800a1ba4>] autoremove_wake_function+0x0/0x2e
May 31 17:02:44 db-xxx.xxx kernel: [<ffffffff80016a49>] vfs_write+0xce/0x174
May 31 17:02:44 db-xxx.xxx kernel: [<ffffffff80017316>] sys_write+0x45/0x6e
May 31 17:02:46 db-xxx.xxx kernel: [<ffffffff8005e28d>] tracesys+0xd5/0xe0
May 31 17:02:46 db-xxx.xxx kernel:
3. STAT :
sar -q
06:00:02 PM runq-sz plist-sz ldavg-1 ldavg-5 ldavg-15
04:30:01 PM 1 679 4.50 4.69 4.89
04:40:19 PM 0 930 360.01 203.95 90.12
04:50:18 PM 0 1341 424.38 390.62 244.93
05:00:29 PM 0 629 163.70 266.70 266.76
05:10:21 PM 0 545 151.23 171.54 215.05
05:20:19 PM 50 638 89.74 124.40 170.83
05:30:08 PM 0 505 85.16 90.21 131.29
05:40:11 PM 0 505 87.55 90.32 111.86
05:50:01 PM 1 183 4.01 36.36 78.12
sar -b
04:20:01 PM 505.29 310.97 194.32 30629.99 26180.39
04:30:01 PM 503.24 301.42 201.82 29280.97 26050.53
04:40:19 PM 1423.54 66.54 1357.00 2722.51 21954.34
04:50:18 PM 1586.79 5.16 1581.63 128.76 19194.44
05:00:29 PM 1413.65 2.62 1411.03 58.10 15683.34
05:10:21 PM 1286.75 8.75 1278.00 329.97 15370.59
05:20:19 PM 1330.03 19.53 1310.51 1450.70 16408.34
05:30:08 PM 1275.22 4.95 1270.28 289.15 14080.58
05:40:11 PM 1359.69 0.07 1359.62 4.72 15609.15
05:50:01 PM 924.40 87.56 836.84 11463.61 21348.42
06:00:02 PM 203.88 83.46 120.43 7030.39 5903.88
sar -c
03:20:01 PM 0.23
03:30:03 PM 11.39
03:40:02 PM 36.32
03:50:01 PM 36.44
04:00:01 PM 33.89
04:10:01 PM 30.26
04:20:01 PM 35.78
04:30:01 PM 34.99
04:40:19 PM 10.07
04:50:18 PM 1.21
05:00:29 PM 0.33
05:10:21 PM 0.59
05:20:19 PM 2.30
05:30:08 PM 0.79
05:40:11 PM 0.34
05:50:01 PM 16.07
06:00:02 PM 4.37
sar -f
03:20:01 PM all 17.44 0.00 1.43 19.37 0.00 61.75
03:30:03 PM all 16.12 0.00 2.69 19.42 0.00 61.77
03:40:02 PM all 8.96 0.00 6.01 17.24 0.00 67.80
03:50:01 PM all 8.56 0.00 5.96 15.68 0.00 69.81
04:00:01 PM all 8.12 0.00 5.62 16.45 0.00 69.81
04:10:01 PM all 13.17 0.00 4.93 17.36 0.00 64.53
04:20:01 PM all 13.59 0.00 5.80 16.49 0.00 64.12
04:30:01 PM all 8.09 0.00 5.83 16.30 0.00 69.78
04:40:19 PM all 3.23 0.00 37.95 28.69 0.00 30.13
04:50:18 PM all 0.29 0.00 0.83 45.72 0.00 53.16
05:00:29 PM all 0.16 0.00 0.66 25.89 0.00 73.29
05:10:21 PM all 0.49 0.00 1.07 15.32 0.00 83.12
05:20:19 PM all 1.17 0.00 13.05 22.89 0.00 62.89
05:30:08 PM all 0.61 0.00 4.39 31.01 0.00 63.99
05:40:11 PM all 0.10 0.00 0.30 16.26 0.00 83.34
05:50:01 PM all 2.33 0.00 2.85 21.73 0.00 73.09
06:00:02 PM all 0.77 0.00 0.72 5.17 0.00 93.34
06:00:02 PM CPU %user %nice %system %iowait %steal %idle
06:10:01 PM all 7.90 0.00 0.90 13.15 0.00 78.05
06:20:01 PM all 7.68 0.00 0.59 8.63 0.00 83.10
06:30:01 PM all 2.81 0.00 0.49 8.64 0.00 88.06
4. PostgreSQL csvlog :
被hung > 120秒的PostgreSQL 进程在干什么?
2013-05-31 16:28:10.018 CST,,,915,,515e4339.393,32377,,2013-04-05 11:21:29 CST,,0,LOG,00000,"checkpoint starting: time",,,,,,,,,""
915是bgwriter进程.
2013-05-31 16:32:07.486 CST,,,18159,"",51a86007.46ef,1,"",2013-05-31 16:32:07 CST,,0,LOG,00000,"connection received: host=xxx port=35511",,,,,,,,,""
2013-05-31 16:32:07.556 CST,"xxx","postgres",18159,"xxx:35511",51a86007.46ef,2,"authentication",2013-05-31 16:32:07 CST,418/819705,0,LOG,00000,"connection authorized: user=xxx database=postgres",,,,,,,,,""
正在等待执行csvlog monitor, 如下.
2013-05-31 16:31:27.626 CST,,,917,,515e4339.395,3,,2013-04-05 11:21:29 CST,1/0,0,WARNING,01000,"pgstat wait timeout",,,,,,,,,""
2013-05-31 16:34:25.689 CST,,,917,,515e4339.395,4,,2013-04-05 11:21:29 CST,1/0,0,WARNING,01000,"pgstat wait timeout",,,,,,,,,""
2013-05-31 16:34:35.716 CST,,,917,,515e4339.395,5,,2013-04-05 11:21:29 CST,1/0,0,WARNING,01000,"pgstat wait timeout",,,,,,,,,""
2013-05-31 16:34:45.788 CST,,,917,,515e4339.395,6,,2013-04-05 11:21:29 CST,1/0,0,WARNING,01000,"pgstat wait timeout",,,,,,,,,""
2013-05-31 16:36:37.164 CST,,,917,,515e4339.395,7,,2013-04-05 11:21:29 CST,1/0,0,WARNING,01000,"pgstat wait timeout",,,,,,,,,""
2013-05-31 16:36:47.178 CST,,,917,,515e4339.395,8,,2013-04-05 11:21:29 CST,1/0,0,WARNING,01000,"pgstat wait timeout",,,,,,,,,""
2013-05-31 16:36:57.194 CST,,,917,,515e4339.395,9,,2013-04-05 11:21:29 CST,1/0,0,WARNING,01000,"pgstat wait timeout",,,,,,,,,""
917是stats collector process进程.
2013-05-31 16:55:46.693 CST,"xxx","xxx",972,"xxx.xxx.6.40:34720",515e433b.3cc,862,"INSERT",2013-04-05 11:21:31 CST,54/2689795,1641358131,LOG,00000,"duration: 840612.144 ms statement: INSERT INTO xxx (xxx) VALUES (xxx)",,,,,,,,,""
2013-05-31 16:41:39.754 CST,"xxx","xxx",21304,"xxx.xxx.6.253:56320",515eb0b2.5338,752,"INSERT",2013-04-05 19:08:34 CST,157/1012458,1641350121,LOG,00000,"duration: 298353.462 ms execute <unnamed>: insert into xxx(
被hung住的应该是下一条SQL, 只是还未打印出来, 客户端已经强行退出了.
2013-05-31 16:51:07.220 CST,"xxx","xxx",21304,"xxx.xxx.6.253:56320",515eb0b2.5338,753,"idle",2013-04-05 19:08:34 CST,157/0,0,LOG,08P01,"unexpected EOF on client connection",,,,,,,,,""
正在等待扩展heap表存储.
2013-05-31 17:01:02.195 CST,"xxx","xxx",19971,"xxx.xxx.6.40:53112",5162317c.4e03,864,"INSERT waiting",2013-04-08 10:54:52 CST,143/2890876,1641362432,LOG,00000,"process 19971 still waiting for ExclusiveLock on extension of relation 886049 of database 16644 after 1000.799 ms",,,,,,"INSERT INTO xxx (xxx) VALUES (xxx)",,,""
4. 连接 :
由于数据库响应速度下降, 程序不断发起新连接请求.
postgres@db-xxx-xxx-8-83-> cat postgresql-2013-05-31_000000.csv |grep "connection authorized"|grep "2013-05-31 16:2"|awk -F "user=" '{print $2}'|awk '{print $1}'|sort|uniq -c
1 u4
postgres@db-xxx-xxx-8-83-> cat postgresql-2013-05-31_000000.csv |grep "connection authorized"|grep "2013-05-31 16:3"|awk -F "user=" '{print $2}'|awk '{print $1}'|sort|uniq -c
72 u1
6 u2
8 u3
187 u4
4 u5
1 u6
postgres@db-xxx-xxx-8-83-> cat postgresql-2013-05-31_000000.csv |grep "connection authorized"|grep "2013-05-31 16:4"|awk -F "user=" '{print $2}'|awk '{print $1}'|sort|uniq -c
1 u1
191 u2
160 u3
242 u4
[参考]