PostgreSQL hung case

本文涉及的产品
云原生数据库 PolarDB MySQL 版,Serverless 5000PCU 100GB
简介:
1. OS : 
2. /var/log/message : 
3. LOAD : 
4. PostgreSQL csvlog : 
5. strace未做.

1. OS : 
Red Hat Enterprise Linux Server release 5.5 (Tikanga)
2.6.18-194.el5 #1 SMP Tue Mar 16 21:52:39 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux

2. /var/log/message : 
task hung >120秒, 涉及进程号 : 
May 31 16:35:11 postgres:915
May 31 16:37:02 postgres:18159
May 31 16:39:11 postgres:915
May 31 16:41:37 postgres:916
May 31 16:45:39 postgres:917
May 31 16:49:22 postgres:918
May 31 16:52:15 postgres:972
May 31 16:55:47 postgres:21304
May 31 16:59:33 postgres:20706
May 31 17:01:38 postgres:19971
详细内容 : 
May 31 16:35:11 db-xxx.xxx kernel: INFO: task postgres:915 blocked for more than 120 seconds.
May 31 16:35:20 db-xxx.xxx kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 31 16:35:29 db-xxx.xxx kernel: postgres      D ffff810227072120     0   915    676           916   677 (NOTLB)
May 31 16:35:33 db-xxx.xxx kernel:  ffff8102bc463d88 0000000000000082 0001a548c08e672d 00000000000af397
May 31 16:35:43 db-xxx.xxx kernel:  ffff8101f6819a48 0000000000000009 ffff8101f6819860 ffff81041fecc820
May 31 16:35:49 db-xxx.xxx kernel:  0001a55ab4cbed65 0000000000000985 ffff8101f6819a48 0000000576ed6890
May 31 16:35:50 db-xxx.xxx kernel: Call Trace:
May 31 16:35:52 db-xxx.xxx kernel:  [<ffffffff88036d8a>] :jbd:log_wait_commit+0xa3/0xf5
May 31 16:35:53 db-xxx.xxx kernel:  [<ffffffff800a1ba4>] autoremove_wake_function+0x0/0x2e
May 31 16:35:54 db-xxx.xxx kernel:  [<ffffffff8803178a>] :jbd:journal_stop+0x1cf/0x1ff
May 31 16:35:58 db-xxx.xxx kernel:  [<ffffffff8002ff40>] __writeback_single_inode+0x1e9/0x328
May 31 16:36:15 db-xxx.xxx kernel:  [<ffffffff800484d0>] pagevec_lookup_tag+0x1a/0x21
May 31 16:36:21 db-xxx.xxx kernel:  [<ffffffff8004a66a>] wait_on_page_writeback_range+0xd6/0x12e
May 31 16:36:30 db-xxx.xxx kernel:  [<ffffffff800f3d9d>] sync_inode+0x24/0x33
May 31 16:36:42 db-xxx.xxx kernel:  [<ffffffff8804c36d>] :ext3:ext3_sync_file+0xc9/0xdc
May 31 16:36:48 db-xxx.xxx kernel:  [<ffffffff8005073a>] do_fsync+0x52/0xa4
May 31 16:36:52 db-xxx.xxx kernel:  [<ffffffff800e1ce9>] __do_fsync+0x23/0x36
May 31 16:36:57 db-xxx.xxx kernel:  [<ffffffff8005e28d>] tracesys+0xd5/0xe0
May 31 16:36:59 db-xxx.xxx kernel: 
May 31 16:37:02 db-xxx.xxx kernel: INFO: task postgres:18159 blocked for more than 120 seconds.
May 31 16:37:06 db-xxx.xxx kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 31 16:37:09 db-xxx.xxx kernel: postgres      D ffff810227072120     0 18159    676         18208 18149 (NOTLB)
May 31 16:37:11 db-xxx.xxx kernel:  ffff81035b3e7d88 0000000000000082 ffff81035b3e7d98 ffffffff80063ff8
May 31 16:37:15 db-xxx.xxx kernel:  0000000000001000 0000000000000008 ffff81025d50f7e0 ffff81041fecc820
May 31 16:37:22 db-xxx.xxx kernel:  0001a55ab4da022a 0000000000000857 ffff81025d50f9c8 000000050000b23e
May 31 16:37:29 db-xxx.xxx kernel: Call Trace:
May 31 16:37:33 db-xxx.xxx kernel:  [<ffffffff80063ff8>] thread_return+0x62/0xfe
May 31 16:37:38 db-xxx.xxx kernel:  [<ffffffff88036d8a>] :jbd:log_wait_commit+0xa3/0xf5
May 31 16:37:45 db-xxx.xxx kernel:  [<ffffffff800a1ba4>] autoremove_wake_function+0x0/0x2e
May 31 16:37:49 db-xxx.xxx kernel:  [<ffffffff80098e91>] process_timeout+0x0/0x5
May 31 16:37:51 db-xxx.xxx kernel:  [<ffffffff8803178a>] :jbd:journal_stop+0x1cf/0x1ff
May 31 16:37:53 db-xxx.xxx kernel:  [<ffffffff8002ff40>] __writeback_single_inode+0x1e9/0x328
May 31 16:37:53 db-xxx.xxx kernel:  [<ffffffff80067b88>] do_page_fault+0x4fe/0x874
May 31 16:37:58 db-xxx.xxx kernel:  [<ffffffff800f3d9d>] sync_inode+0x24/0x33
May 31 16:38:13 db-xxx.xxx kernel:  [<ffffffff8804c36d>] :ext3:ext3_sync_file+0xc9/0xdc
May 31 16:38:23 db-xxx.xxx kernel:  [<ffffffff8005073a>] do_fsync+0x52/0xa4
May 31 16:38:35 db-xxx.xxx kernel:  [<ffffffff800e1ce9>] __do_fsync+0x23/0x36
May 31 16:38:46 db-xxx.xxx kernel:  [<ffffffff8005e28d>] tracesys+0xd5/0xe0
May 31 16:38:55 db-xxx.xxx kernel: 
May 31 16:39:11 db-xxx.xxx kernel: INFO: task postgres:915 blocked for more than 120 seconds.
May 31 16:39:25 db-xxx.xxx kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 31 16:39:36 db-xxx.xxx kernel: postgres      D ffff81018667af78     0   915    676           916   677 (NOTLB)
May 31 16:39:36 db-xxx.xxx kernel:  ffff8102bc463d88 0000000000000082 ffff8102bc463d98 ffffffff80063ff8
May 31 16:39:51 db-xxx.xxx kernel:  ffff8101f6819a48 0000000000000009 ffff8101f6819860 ffff81025d50f7e0
May 31 16:40:00 db-xxx.xxx kernel:  0001a5817aa3b719 00000000000006d1 ffff8101f6819a48 00000005000004e5
May 31 16:40:17 db-xxx.xxx kernel: Call Trace:
May 31 16:40:38 db-xxx.xxx kernel:  [<ffffffff80063ff8>] thread_return+0x62/0xfe
May 31 16:40:45 db-xxx.xxx kernel:  [<ffffffff88036d8a>] :jbd:log_wait_commit+0xa3/0xf5
May 31 16:40:58 db-xxx.xxx kernel:  [<ffffffff800a1ba4>] autoremove_wake_function+0x0/0x2e
May 31 16:41:08 db-xxx.xxx kernel:  [<ffffffff8803178a>] :jbd:journal_stop+0x1cf/0x1ff
May 31 16:41:22 db-xxx.xxx kernel:  [<ffffffff8002ff40>] __writeback_single_inode+0x1e9/0x328
May 31 16:41:31 db-xxx.xxx kernel:  [<ffffffff800484d0>] pagevec_lookup_tag+0x1a/0x21
May 31 16:41:31 db-xxx.xxx kernel:  [<ffffffff8004a66a>] wait_on_page_writeback_range+0xd6/0x12e
May 31 16:41:31 db-xxx.xxx kernel:  [<ffffffff800f3d9d>] sync_inode+0x24/0x33
May 31 16:41:31 db-xxx.xxx kernel:  [<ffffffff8804c36d>] :ext3:ext3_sync_file+0xc9/0xdc
May 31 16:41:31 db-xxx.xxx kernel:  [<ffffffff8005073a>] do_fsync+0x52/0xa4
May 31 16:41:31 db-xxx.xxx kernel:  [<ffffffff800e1ce9>] __do_fsync+0x23/0x36
May 31 16:41:32 db-xxx.xxx kernel:  [<ffffffff8005e28d>] tracesys+0xd5/0xe0
May 31 16:41:32 db-xxx.xxx kernel: 
May 31 16:41:37 db-xxx.xxx kernel: INFO: task postgres:916 blocked for more than 120 seconds.
May 31 16:41:42 db-xxx.xxx kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 31 16:41:58 db-xxx.xxx kernel: postgres      D ffff81021e7f6600     0   916    676           917   915 (NOTLB)
May 31 16:42:13 db-xxx.xxx kernel:  ffff81032593faa8 0000000000000082 ffff810105d5b538 0000000000000286
May 31 16:42:26 db-xxx.xxx kernel:  ffff81032593fa28 000000000000000a ffff8103e65810c0 ffff810290158080
May 31 16:42:39 db-xxx.xxx kernel:  0001a58a6429ef21 0000000000003616 ffff8103e65812a8 0000000600200200
May 31 16:42:47 db-xxx.xxx kernel: Call Trace:
May 31 16:43:04 db-xxx.xxx kernel:  [<ffffffff800cd378>] zone_statistics+0x3e/0x6d
May 31 16:43:16 db-xxx.xxx kernel:  [<ffffffff88032002>] :jbd:start_this_handle+0x2e5/0x36c
May 31 16:43:26 db-xxx.xxx kernel:  [<ffffffff800a1ba4>] autoremove_wake_function+0x0/0x2e
May 31 16:43:35 db-xxx.xxx kernel:  [<ffffffff88032152>] :jbd:journal_start+0xc9/0x100
May 31 16:43:48 db-xxx.xxx kernel:  [<ffffffff88050347>] :ext3:ext3_write_begin+0x9a/0x1cc
May 31 16:43:59 db-xxx.xxx kernel:  [<ffffffff8000fd40>] generic_file_buffered_write+0x14b/0x675
May 31 16:44:05 db-xxx.xxx kernel:  [<ffffffff880317ae>] :jbd:journal_stop+0x1f3/0x1ff
May 31 16:44:21 db-xxx.xxx kernel:  [<ffffffff80016641>] __generic_file_aio_write_nolock+0x369/0x3b6
May 31 16:44:38 db-xxx.xxx kernel:  [<ffffffff801209aa>] sys_semtimedop+0x6e7/0x720
May 31 16:45:07 db-xxx.xxx kernel:  [<ffffffff80021763>] generic_file_aio_write+0x65/0xc1
May 31 16:45:07 db-xxx.xxx kernel:  [<ffffffff8804c1b6>] :ext3:ext3_file_write+0x16/0x91
May 31 16:45:24 db-xxx.xxx kernel:  [<ffffffff80018266>] do_sync_write+0xc7/0x104
May 31 16:45:27 db-xxx.xxx kernel:  [<ffffffff8000eab5>] find_get_pages_tag+0x34/0x89
May 31 16:45:33 db-xxx.xxx kernel:  [<ffffffff800a1ba4>] autoremove_wake_function+0x0/0x2e
May 31 16:45:33 db-xxx.xxx kernel:  [<ffffffff80063ff8>] thread_return+0x62/0xfe
May 31 16:45:34 db-xxx.xxx kernel:  [<ffffffff80016a49>] vfs_write+0xce/0x174
May 31 16:45:36 db-xxx.xxx kernel:  [<ffffffff80017316>] sys_write+0x45/0x6e
May 31 16:45:37 db-xxx.xxx kernel:  [<ffffffff8005e28d>] tracesys+0xd5/0xe0
May 31 16:45:38 db-xxx.xxx kernel: 
May 31 16:45:39 db-xxx.xxx kernel: INFO: task postgres:917 blocked for more than 120 seconds.
May 31 16:45:39 db-xxx.xxx kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 31 16:45:40 db-xxx.xxx kernel: postgres      D ffff810227072120     0   917    676           918   916 (NOTLB)
May 31 16:45:41 db-xxx.xxx kernel:  ffff81041b507b88 0000000000000086 ffff81023ae60380 ffffffff8022ff8d
May 31 16:45:46 db-xxx.xxx kernel:  0000000000000000 000000000000000a ffff8102272730c0 ffff81041fecc820
May 31 16:45:52 db-xxx.xxx kernel:  0001a5924a73a860 0000000000006aac ffff8102272732a8 0000000500000000
May 31 16:46:02 db-xxx.xxx kernel: Call Trace:
May 31 16:46:17 db-xxx.xxx kernel:  [<ffffffff8022ff8d>] dev_hard_start_xmit+0x1b7/0x28a
May 31 16:46:36 db-xxx.xxx kernel:  [<ffffffff800322bb>] ip_output+0x29a/0x2dd
May 31 16:46:51 db-xxx.xxx kernel:  [<ffffffff88032002>] :jbd:start_this_handle+0x2e5/0x36c
May 31 16:47:01 db-xxx.xxx kernel:  [<ffffffff800a1ba4>] autoremove_wake_function+0x0/0x2e
May 31 16:47:19 db-xxx.xxx kernel:  [<ffffffff88032152>] :jbd:journal_start+0xc9/0x100
May 31 16:47:26 db-xxx.xxx kernel:  [<ffffffff88050c72>] :ext3:ext3_dirty_inode+0x28/0x7b
May 31 16:47:32 db-xxx.xxx kernel:  [<ffffffff80013c93>] __mark_inode_dirty+0x29/0x16e
May 31 16:47:37 db-xxx.xxx kernel:  [<ffffffff8000c44a>] do_generic_mapping_read+0x342/0x354
May 31 16:47:50 db-xxx.xxx kernel:  [<ffffffff8000d0fb>] file_read_actor+0x0/0x159
May 31 16:48:03 db-xxx.xxx kernel:  [<ffffffff8000c5a8>] __generic_file_aio_read+0x14c/0x198
May 31 16:48:10 db-xxx.xxx kernel:  [<ffffffff80016dd4>] generic_file_aio_read+0x34/0x39
May 31 16:48:20 db-xxx.xxx kernel:  [<ffffffff8000ce24>] do_sync_read+0xc7/0x104
May 31 16:48:35 db-xxx.xxx kernel:  [<ffffffff800a1ba4>] autoremove_wake_function+0x0/0x2e
May 31 16:48:47 db-xxx.xxx kernel:  [<ffffffff8000df43>] do_mmap_pgoff+0x431/0x7d7
May 31 16:48:47 db-xxx.xxx kernel:  [<ffffffff8000e17e>] do_mmap_pgoff+0x66c/0x7d7
May 31 16:48:57 db-xxx.xxx kernel:  [<ffffffff8000b6b0>] vfs_read+0xcb/0x171
May 31 16:49:06 db-xxx.xxx kernel:  [<ffffffff80011c01>] sys_read+0x45/0x6e
May 31 16:49:06 db-xxx.xxx kernel:  [<ffffffff8005e28d>] tracesys+0xd5/0xe0
May 31 16:49:17 db-xxx.xxx kernel: 
May 31 16:49:22 db-xxx.xxx kernel: INFO: task postgres:918 blocked for more than 120 seconds.
May 31 16:49:33 db-xxx.xxx kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 31 16:49:45 db-xxx.xxx kernel: postgres      D ffff81021e7f6600     0   918    676           924   917 (NOTLB)
May 31 16:49:50 db-xxx.xxx kernel:  ffff81025f38dcd8 0000000000000086 0000000000000009 ffffffff8000b192
May 31 16:49:50 db-xxx.xxx kernel:  0000000000000009 0000000000000009 ffff81021c66a7a0 ffff8101b13d57e0
May 31 16:49:51 db-xxx.xxx kernel:  0001a589ed252ae3 0000000000347ea8 ffff81021c66a988 000000010454f400
May 31 16:49:51 db-xxx.xxx kernel: Call Trace:
May 31 16:49:51 db-xxx.xxx kernel:  [<ffffffff8000b192>] release_pages+0x14d/0x15a
May 31 16:49:51 db-xxx.xxx kernel:  [<ffffffff88032002>] :jbd:start_this_handle+0x2e5/0x36c
May 31 16:49:51 db-xxx.xxx kernel:  [<ffffffff800a1ba4>] autoremove_wake_function+0x0/0x2e
May 31 16:49:51 db-xxx.xxx kernel:  [<ffffffff88032152>] :jbd:journal_start+0xc9/0x100
May 31 16:49:52 db-xxx.xxx kernel:  [<ffffffff8804d5f2>] :ext3:start_transaction+0x1c/0x48
May 31 16:50:02 db-xxx.xxx kernel:  [<ffffffff880501d9>] :ext3:ext3_delete_inode+0x0/0xd4
May 31 16:50:26 db-xxx.xxx kernel:  [<ffffffff88050208>] :ext3:ext3_delete_inode+0x2f/0xd4
May 31 16:50:44 db-xxx.xxx kernel:  [<ffffffff880501d9>] :ext3:ext3_delete_inode+0x0/0xd4
May 31 16:51:03 db-xxx.xxx kernel:  [<ffffffff8002f6a6>] generic_delete_inode+0xc6/0x143
May 31 16:51:13 db-xxx.xxx kernel:  [<ffffffff8000d4b3>] dput+0xf6/0x114
May 31 16:51:23 db-xxx.xxx kernel:  [<ffffffff80036f13>] sys_renameat+0x18a/0x1eb
May 31 16:51:39 db-xxx.xxx kernel:  [<ffffffff8000e4a1>] free_pages_and_swap_cache+0x67/0x7e
May 31 16:51:50 db-xxx.xxx kernel:  [<ffffffff800b878c>] audit_syscall_entry+0x180/0x1b3
May 31 16:51:50 db-xxx.xxx kernel:  [<ffffffff8005e28d>] tracesys+0xd5/0xe0
May 31 16:52:02 db-xxx.xxx kernel: 
May 31 16:52:15 db-xxx.xxx kernel: INFO: task postgres:972 blocked for more than 120 seconds.
May 31 16:52:28 db-xxx.xxx kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 31 16:52:40 db-xxx.xxx kernel: postgres      D ffff81022707a7a0     0   972    676           989   960 (NOTLB)
May 31 16:52:54 db-xxx.xxx kernel:  ffff81041bfcfaa8 0000000000000086 0000000000000000 0000000000000000
May 31 16:53:02 db-xxx.xxx kernel:  0000000000000000 000000000000000a ffff81041e6580c0 ffff81022714c7a0
May 31 16:53:21 db-xxx.xxx kernel:  0001a58f2d4d93ee 0000000000055ef9 ffff81041e6582a8 0000000736d40100
May 31 16:53:21 db-xxx.xxx kernel: Call Trace:
May 31 16:53:37 db-xxx.xxx kernel:  [<ffffffff800cd378>] zone_statistics+0x3e/0x6d
May 31 16:53:37 db-xxx.xxx kernel:  [<ffffffff88032002>] :jbd:start_this_handle+0x2e5/0x36c
May 31 16:53:49 db-xxx.xxx kernel:  [<ffffffff800a1ba4>] autoremove_wake_function+0x0/0x2e
May 31 16:54:04 db-xxx.xxx kernel:  [<ffffffff88032152>] :jbd:journal_start+0xc9/0x100
May 31 16:54:17 db-xxx.xxx kernel:  [<ffffffff88050347>] :ext3:ext3_write_begin+0x9a/0x1cc
May 31 16:54:47 db-xxx.xxx kernel:  [<ffffffff8000fd40>] generic_file_buffered_write+0x14b/0x675
May 31 16:54:53 db-xxx.xxx kernel:  [<ffffffff880317ae>] :jbd:journal_stop+0x1f3/0x1ff
May 31 16:55:07 db-xxx.xxx kernel:  [<ffffffff80016641>] __generic_file_aio_write_nolock+0x369/0x3b6
May 31 16:55:23 db-xxx.xxx kernel:  [<ffffffff8000e324>] current_fs_time+0x3b/0x40
May 31 16:55:39 db-xxx.xxx kernel:  [<ffffffff80021763>] generic_file_aio_write+0x65/0xc1
May 31 16:55:41 db-xxx.xxx kernel:  [<ffffffff8804c1b6>] :ext3:ext3_file_write+0x16/0x91
May 31 16:55:42 db-xxx.xxx kernel:  [<ffffffff80018266>] do_sync_write+0xc7/0x104
May 31 16:55:42 db-xxx.xxx kernel:  [<ffffffff80067b88>] do_page_fault+0x4fe/0x874
May 31 16:55:43 db-xxx.xxx kernel:  [<ffffffff800a1ba4>] autoremove_wake_function+0x0/0x2e
May 31 16:55:43 db-xxx.xxx kernel:  [<ffffffff800274fb>] do_filp_open+0x1c/0x38
May 31 16:55:43 db-xxx.xxx kernel:  [<ffffffff80016a49>] vfs_write+0xce/0x174
May 31 16:55:43 db-xxx.xxx kernel:  [<ffffffff80017316>] sys_write+0x45/0x6e
May 31 16:55:46 db-xxx.xxx kernel:  [<ffffffff8005e28d>] tracesys+0xd5/0xe0
May 31 16:55:46 db-xxx.xxx kernel: 
May 31 16:55:47 db-xxx.xxx kernel: INFO: task postgres:21304 blocked for more than 120 seconds.
May 31 16:55:48 db-xxx.xxx kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 31 16:55:59 db-xxx.xxx kernel: postgres      D ffff81021e7f6600     0 21304    676         24881  7368 (NOTLB)
May 31 16:56:06 db-xxx.xxx kernel:  ffff81025277db88 0000000000000082 ffff8102460080c0 ffff8103d4d58780
May 31 16:56:17 db-xxx.xxx kernel:  ffff8102460080c0 0000000000000009 ffff8102460080c0 ffff81024b0ac860
May 31 16:56:23 db-xxx.xxx kernel:  0001a58d6411b200 0000000000002302 ffff8102460082a8 0000000780063ff8
May 31 16:56:52 db-xxx.xxx kernel: Call Trace:
May 31 16:57:09 db-xxx.xxx kernel:  [<ffffffff88032002>] :jbd:start_this_handle+0x2e5/0x36c
May 31 16:57:18 db-xxx.xxx kernel:  [<ffffffff800a1ba4>] autoremove_wake_function+0x0/0x2e
May 31 16:57:29 db-xxx.xxx kernel:  [<ffffffff88032152>] :jbd:journal_start+0xc9/0x100
May 31 16:57:39 db-xxx.xxx kernel:  [<ffffffff88050c72>] :ext3:ext3_dirty_inode+0x28/0x7b
May 31 16:57:55 db-xxx.xxx kernel:  [<ffffffff80013c93>] __mark_inode_dirty+0x29/0x16e
May 31 16:58:07 db-xxx.xxx kernel:  [<ffffffff8000c44a>] do_generic_mapping_read+0x342/0x354
May 31 16:58:23 db-xxx.xxx kernel:  [<ffffffff8000d0fb>] file_read_actor+0x0/0x159
May 31 16:58:35 db-xxx.xxx kernel:  [<ffffffff8000c5a8>] __generic_file_aio_read+0x14c/0x198
May 31 16:58:48 db-xxx.xxx kernel:  [<ffffffff80016dd4>] generic_file_aio_read+0x34/0x39
May 31 16:59:05 db-xxx.xxx kernel:  [<ffffffff8000ce24>] do_sync_read+0xc7/0x104
May 31 16:59:21 db-xxx.xxx kernel:  [<ffffffff8000e324>] current_fs_time+0x3b/0x40
May 31 16:59:28 db-xxx.xxx kernel:  [<ffffffff8002b9a9>] sys_recvfrom+0xd4/0x130
May 31 16:59:28 db-xxx.xxx kernel:  [<ffffffff800a1ba4>] autoremove_wake_function+0x0/0x2e
May 31 16:59:29 db-xxx.xxx kernel:  [<ffffffff800634b6>] __sched_text_start+0xf6/0xbd6
May 31 16:59:29 db-xxx.xxx kernel:  [<ffffffff8000b6b0>] vfs_read+0xcb/0x171
May 31 16:59:29 db-xxx.xxx kernel:  [<ffffffff80011c01>] sys_read+0x45/0x6e
May 31 16:59:32 db-xxx.xxx kernel:  [<ffffffff8005e28d>] tracesys+0xd5/0xe0
May 31 16:59:32 db-xxx.xxx kernel: 
May 31 16:59:33 db-xxx.xxx kernel: INFO: task postgres:20706 blocked for more than 120 seconds.
May 31 16:59:33 db-xxx.xxx kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 31 16:59:36 db-xxx.xxx kernel: postgres      D ffff81021e7f6600     0 20706    676         20710 20702 (NOTLB)
May 31 16:59:37 db-xxx.xxx kernel:  ffff8102ffea1aa8 0000000000000086 000000054f79df8f ffffffff8022ff8d
May 31 16:59:38 db-xxx.xxx kernel:  ffff81041f108000 000000000000000a ffff81026add10c0 ffff8102e9cf20c0
May 31 16:59:38 db-xxx.xxx kernel:  0001a589f380b25e 0000000000078593 ffff81026add12a8 0000000771664800
May 31 16:59:38 db-xxx.xxx kernel: Call Trace:
May 31 16:59:38 db-xxx.xxx kernel:  [<ffffffff8022ff8d>] dev_hard_start_xmit+0x1b7/0x28a
May 31 16:59:39 db-xxx.xxx kernel:  [<ffffffff800cd378>] zone_statistics+0x3e/0x6d
May 31 16:59:48 db-xxx.xxx kernel:  [<ffffffff88032002>] :jbd:start_this_handle+0x2e5/0x36c
May 31 17:00:22 db-xxx.xxx kernel:  [<ffffffff800a1ba4>] autoremove_wake_function+0x0/0x2e
May 31 17:00:52 db-xxx.xxx kernel:  [<ffffffff88032152>] :jbd:journal_start+0xc9/0x100
May 31 17:00:56 db-xxx.xxx kernel:  [<ffffffff88050347>] :ext3:ext3_write_begin+0x9a/0x1cc
May 31 17:00:57 db-xxx.xxx kernel:  [<ffffffff8000fd40>] generic_file_buffered_write+0x14b/0x675
May 31 17:00:58 db-xxx.xxx kernel:  [<ffffffff880317ae>] :jbd:journal_stop+0x1f3/0x1ff
May 31 17:01:00 db-xxx.xxx kernel:  [<ffffffff80016641>] __generic_file_aio_write_nolock+0x369/0x3b6
May 31 17:01:00 db-xxx.xxx kernel:  [<ffffffff80021763>] generic_file_aio_write+0x65/0xc1
May 31 17:01:00 db-xxx.xxx kernel:  [<ffffffff8804c1b6>] :ext3:ext3_file_write+0x16/0x91
May 31 17:01:00 db-xxx.xxx kernel:  [<ffffffff80018266>] do_sync_write+0xc7/0x104
May 31 17:01:00 db-xxx.xxx kernel:  [<ffffffff8022678b>] sock_from_file+0xf/0x60
May 31 17:01:01 db-xxx.xxx kernel:  [<ffffffff8002b9a9>] sys_recvfrom+0xd4/0x130
May 31 17:01:17 db-xxx.xxx kernel:  [<ffffffff800a1ba4>] autoremove_wake_function+0x0/0x2e
May 31 17:01:35 db-xxx.xxx kernel:  [<ffffffff80019ebe>] current_kernel_time+0x26/0x38
May 31 17:01:37 db-xxx.xxx kernel:  [<ffffffff80016a49>] vfs_write+0xce/0x174
May 31 17:01:37 db-xxx.xxx kernel:  [<ffffffff80017316>] sys_write+0x45/0x6e
May 31 17:01:37 db-xxx.xxx kernel:  [<ffffffff8005e28d>] tracesys+0xd5/0xe0
May 31 17:01:37 db-xxx.xxx kernel: 
May 31 17:01:38 db-xxx.xxx kernel: INFO: task postgres:19971 blocked for more than 120 seconds.
May 31 17:01:38 db-xxx.xxx kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 31 17:01:38 db-xxx.xxx kernel: postgres      D ffff81021e7f6600     0 19971    676         19972 20717 (NOTLB)
May 31 17:01:38 db-xxx.xxx kernel:  ffff810242c5dbe8 0000000000000086 0000000000000001 ffffffff8001bf98
May 31 17:01:41 db-xxx.xxx kernel:  ffff81028b0b19c0 000000000000000a ffff8103f7f4b7a0 ffff81023b3ec100
May 31 17:01:41 db-xxx.xxx kernel:  0001a58c690dd582 0000000000044dbb ffff8103f7f4b988 000000078b0b19c0
May 31 17:01:42 db-xxx.xxx kernel: Call Trace:
May 31 17:01:42 db-xxx.xxx kernel:  [<ffffffff8001bf98>] tcp_rcv_established+0x8bd/0x925
May 31 17:01:42 db-xxx.xxx kernel:  [<ffffffff8003b875>] tcp_v4_do_rcv+0x2a/0x2fa
May 31 17:01:42 db-xxx.xxx kernel:  [<ffffffff80228f92>] sk_wait_data+0xac/0xbf
May 31 17:01:42 db-xxx.xxx kernel:  [<ffffffff88032002>] :jbd:start_this_handle+0x2e5/0x36c
May 31 17:01:43 db-xxx.xxx kernel:  [<ffffffff800a1ba4>] autoremove_wake_function+0x0/0x2e
May 31 17:01:59 db-xxx.xxx kernel:  [<ffffffff88032152>] :jbd:journal_start+0xc9/0x100
May 31 17:02:22 db-xxx.xxx kernel:  [<ffffffff88050c72>] :ext3:ext3_dirty_inode+0x28/0x7b
May 31 17:02:40 db-xxx.xxx kernel:  [<ffffffff80013c93>] __mark_inode_dirty+0x29/0x16e
May 31 17:02:42 db-xxx.xxx kernel:  [<ffffffff80016562>] __generic_file_aio_write_nolock+0x28a/0x3b6
May 31 17:02:42 db-xxx.xxx kernel:  [<ffffffff80021763>] generic_file_aio_write+0x65/0xc1
May 31 17:02:43 db-xxx.xxx kernel:  [<ffffffff8804c1b6>] :ext3:ext3_file_write+0x16/0x91
May 31 17:02:43 db-xxx.xxx kernel:  [<ffffffff80018266>] do_sync_write+0xc7/0x104
May 31 17:02:43 db-xxx.xxx kernel:  [<ffffffff8002b9a9>] sys_recvfrom+0xd4/0x130
May 31 17:02:43 db-xxx.xxx kernel:  [<ffffffff800a1ba4>] autoremove_wake_function+0x0/0x2e
May 31 17:02:44 db-xxx.xxx kernel:  [<ffffffff80016a49>] vfs_write+0xce/0x174
May 31 17:02:44 db-xxx.xxx kernel:  [<ffffffff80017316>] sys_write+0x45/0x6e
May 31 17:02:46 db-xxx.xxx kernel:  [<ffffffff8005e28d>] tracesys+0xd5/0xe0
May 31 17:02:46 db-xxx.xxx kernel: 

3. STAT : 
sar -q
06:00:02 PM   runq-sz  plist-sz   ldavg-1   ldavg-5  ldavg-15
04:30:01 PM         1       679      4.50      4.69      4.89
04:40:19 PM         0       930    360.01    203.95     90.12
04:50:18 PM         0      1341    424.38    390.62    244.93
05:00:29 PM         0       629    163.70    266.70    266.76
05:10:21 PM         0       545    151.23    171.54    215.05
05:20:19 PM        50       638     89.74    124.40    170.83
05:30:08 PM         0       505     85.16     90.21    131.29
05:40:11 PM         0       505     87.55     90.32    111.86
05:50:01 PM         1       183      4.01     36.36     78.12

sar -b
04:20:01 PM    505.29    310.97    194.32  30629.99  26180.39
04:30:01 PM    503.24    301.42    201.82  29280.97  26050.53
04:40:19 PM   1423.54     66.54   1357.00   2722.51  21954.34
04:50:18 PM   1586.79      5.16   1581.63    128.76  19194.44
05:00:29 PM   1413.65      2.62   1411.03     58.10  15683.34
05:10:21 PM   1286.75      8.75   1278.00    329.97  15370.59
05:20:19 PM   1330.03     19.53   1310.51   1450.70  16408.34
05:30:08 PM   1275.22      4.95   1270.28    289.15  14080.58
05:40:11 PM   1359.69      0.07   1359.62      4.72  15609.15
05:50:01 PM    924.40     87.56    836.84  11463.61  21348.42
06:00:02 PM    203.88     83.46    120.43   7030.39   5903.88

sar -c
03:20:01 PM      0.23
03:30:03 PM     11.39
03:40:02 PM     36.32
03:50:01 PM     36.44
04:00:01 PM     33.89
04:10:01 PM     30.26
04:20:01 PM     35.78
04:30:01 PM     34.99
04:40:19 PM     10.07
04:50:18 PM      1.21
05:00:29 PM      0.33
05:10:21 PM      0.59
05:20:19 PM      2.30
05:30:08 PM      0.79
05:40:11 PM      0.34
05:50:01 PM     16.07
06:00:02 PM      4.37

sar -f
03:20:01 PM       all     17.44      0.00      1.43     19.37      0.00     61.75
03:30:03 PM       all     16.12      0.00      2.69     19.42      0.00     61.77
03:40:02 PM       all      8.96      0.00      6.01     17.24      0.00     67.80
03:50:01 PM       all      8.56      0.00      5.96     15.68      0.00     69.81
04:00:01 PM       all      8.12      0.00      5.62     16.45      0.00     69.81
04:10:01 PM       all     13.17      0.00      4.93     17.36      0.00     64.53
04:20:01 PM       all     13.59      0.00      5.80     16.49      0.00     64.12
04:30:01 PM       all      8.09      0.00      5.83     16.30      0.00     69.78
04:40:19 PM       all      3.23      0.00     37.95     28.69      0.00     30.13
04:50:18 PM       all      0.29      0.00      0.83     45.72      0.00     53.16
05:00:29 PM       all      0.16      0.00      0.66     25.89      0.00     73.29
05:10:21 PM       all      0.49      0.00      1.07     15.32      0.00     83.12
05:20:19 PM       all      1.17      0.00     13.05     22.89      0.00     62.89
05:30:08 PM       all      0.61      0.00      4.39     31.01      0.00     63.99
05:40:11 PM       all      0.10      0.00      0.30     16.26      0.00     83.34
05:50:01 PM       all      2.33      0.00      2.85     21.73      0.00     73.09
06:00:02 PM       all      0.77      0.00      0.72      5.17      0.00     93.34
06:00:02 PM       CPU     %user     %nice   %system   %iowait    %steal     %idle
06:10:01 PM       all      7.90      0.00      0.90     13.15      0.00     78.05
06:20:01 PM       all      7.68      0.00      0.59      8.63      0.00     83.10
06:30:01 PM       all      2.81      0.00      0.49      8.64      0.00     88.06

4. PostgreSQL csvlog : 
被hung > 120秒的PostgreSQL 进程在干什么?
2013-05-31 16:28:10.018 CST,,,915,,515e4339.393,32377,,2013-04-05 11:21:29 CST,,0,LOG,00000,"checkpoint starting: time",,,,,,,,,""
915是bgwriter进程.

2013-05-31 16:32:07.486 CST,,,18159,"",51a86007.46ef,1,"",2013-05-31 16:32:07 CST,,0,LOG,00000,"connection received: host=xxx port=35511",,,,,,,,,""
2013-05-31 16:32:07.556 CST,"xxx","postgres",18159,"xxx:35511",51a86007.46ef,2,"authentication",2013-05-31 16:32:07 CST,418/819705,0,LOG,00000,"connection authorized: user=xxx database=postgres",,,,,,,,,""
正在等待执行csvlog monitor, 如下.

2013-05-31 16:31:27.626 CST,,,917,,515e4339.395,3,,2013-04-05 11:21:29 CST,1/0,0,WARNING,01000,"pgstat wait timeout",,,,,,,,,""
2013-05-31 16:34:25.689 CST,,,917,,515e4339.395,4,,2013-04-05 11:21:29 CST,1/0,0,WARNING,01000,"pgstat wait timeout",,,,,,,,,""
2013-05-31 16:34:35.716 CST,,,917,,515e4339.395,5,,2013-04-05 11:21:29 CST,1/0,0,WARNING,01000,"pgstat wait timeout",,,,,,,,,""
2013-05-31 16:34:45.788 CST,,,917,,515e4339.395,6,,2013-04-05 11:21:29 CST,1/0,0,WARNING,01000,"pgstat wait timeout",,,,,,,,,""
2013-05-31 16:36:37.164 CST,,,917,,515e4339.395,7,,2013-04-05 11:21:29 CST,1/0,0,WARNING,01000,"pgstat wait timeout",,,,,,,,,""
2013-05-31 16:36:47.178 CST,,,917,,515e4339.395,8,,2013-04-05 11:21:29 CST,1/0,0,WARNING,01000,"pgstat wait timeout",,,,,,,,,""
2013-05-31 16:36:57.194 CST,,,917,,515e4339.395,9,,2013-04-05 11:21:29 CST,1/0,0,WARNING,01000,"pgstat wait timeout",,,,,,,,,""
917是stats collector process进程.

2013-05-31 16:55:46.693 CST,"xxx","xxx",972,"xxx.xxx.6.40:34720",515e433b.3cc,862,"INSERT",2013-04-05 11:21:31 CST,54/2689795,1641358131,LOG,00000,"duration: 840612.144 ms  statement: INSERT INTO xxx (xxx) VALUES (xxx)",,,,,,,,,""

2013-05-31 16:41:39.754 CST,"xxx","xxx",21304,"xxx.xxx.6.253:56320",515eb0b2.5338,752,"INSERT",2013-04-05 19:08:34 CST,157/1012458,1641350121,LOG,00000,"duration: 298353.462 ms  execute <unnamed>: insert into xxx(
被hung住的应该是下一条SQL, 只是还未打印出来, 客户端已经强行退出了.
2013-05-31 16:51:07.220 CST,"xxx","xxx",21304,"xxx.xxx.6.253:56320",515eb0b2.5338,753,"idle",2013-04-05 19:08:34 CST,157/0,0,LOG,08P01,"unexpected EOF on client connection",,,,,,,,,""

正在等待扩展heap表存储.
2013-05-31 17:01:02.195 CST,"xxx","xxx",19971,"xxx.xxx.6.40:53112",5162317c.4e03,864,"INSERT waiting",2013-04-08 10:54:52 CST,143/2890876,1641362432,LOG,00000,"process 19971 still waiting for ExclusiveLock on extension of relation 886049 of database 16644 after 1000.799 ms",,,,,,"INSERT INTO xxx (xxx) VALUES (xxx)",,,""

4. 连接 : 
由于数据库响应速度下降, 程序不断发起新连接请求.
postgres@db-xxx-xxx-8-83-> cat postgresql-2013-05-31_000000.csv |grep "connection authorized"|grep "2013-05-31 16:2"|awk -F "user=" '{print $2}'|awk '{print $1}'|sort|uniq -c
      1 u4
postgres@db-xxx-xxx-8-83-> cat postgresql-2013-05-31_000000.csv |grep "connection authorized"|grep "2013-05-31 16:3"|awk -F "user=" '{print $2}'|awk '{print $1}'|sort|uniq -c
     72 u1
      6 u2
      8 u3
    187 u4
      4 u5
      1 u6
postgres@db-xxx-xxx-8-83-> cat postgresql-2013-05-31_000000.csv |grep "connection authorized"|grep "2013-05-31 16:4"|awk -F "user=" '{print $2}'|awk '{print $1}'|sort|uniq -c
      1 u1
    191 u2
    160 u3
    242 u4

[参考]
3. man strace
相关实践学习
使用PolarDB和ECS搭建门户网站
本场景主要介绍基于PolarDB和ECS实现搭建门户网站。
阿里云数据库产品家族及特性
阿里云智能数据库产品团队一直致力于不断健全产品体系,提升产品性能,打磨产品功能,从而帮助客户实现更加极致的弹性能力、具备更强的扩展能力、并利用云设施进一步降低企业成本。以云原生+分布式为核心技术抓手,打造以自研的在线事务型(OLTP)数据库Polar DB和在线分析型(OLAP)数据库Analytic DB为代表的新一代企业级云原生数据库产品体系, 结合NoSQL数据库、数据库生态工具、云原生智能化数据库管控平台,为阿里巴巴经济体以及各个行业的企业客户和开发者提供从公共云到混合云再到私有云的完整解决方案,提供基于云基础设施进行数据从处理、到存储、再到计算与分析的一体化解决方案。本节课带你了解阿里云数据库产品家族及特性。
相关文章
|
存储 关系型数据库 PostgreSQL
PostgreSQL 优化CASE - 有序UUID插件
标签 PostgreSQL , uuid , 无序uuid , 索引分裂 , io , 性能诊断 背景 无序UUID会带来很多问题,例如索引分裂膨胀,离散IO,WAL膨胀等,详见以前的分析。 Regular random UUIDs are distributed uniformly over the whole range of possible values.
2266 0
|
SQL 关系型数据库 数据库
PostgreSQL 设计优化case - 大宽表任意字段组合查询索引如何选择(btree, gin, rum) - (含单个索引列数超过32列的方法)
标签 PostgreSQL , adhoc查询 , 大宽表 , 任意字段组合查询 , 索引 , btree , gin , rum 背景 大宽表,任意字段组合查询,透视。是实时分析系统中的常见需求: 1、实时写入。
2482 0
|
SQL 存储 关系型数据库
PostgreSQL 设计优化case - 多对多 转 一对多(数组)
标签 PostgreSQL , 数组 , 多对多 , 一对多 , udf , JOIN 背景 某个系统存储了会员的标签,以及标签的描述信息。业务上需要通过会员ID得到会员的标签,再得到描述信息。 每个会员有若干标签,原来是这么存储的 1、会员标签表,人数5亿左右,每个人平均有几百个标签,1500亿行左右。
2116 0
|
SQL 算法 关系型数据库
PostgreSQL 10.1 手册_部分 II. SQL 语言_第 10 章 类型转换_10.5. UNION、CASE和相关结构
10.5. UNION、CASE和相关结构 SQL UNION结构必须使可能不相似的类型匹配成为一个单一的结果集。该决定算法被独立地应用到一个联合查询的每个输出列。 INTERSECT和EXCEPT采用和UNION相同的方法来决定不相似的类型。
1275 0