linux – Server overloaded as if it were out of memory, but it isn't

I have a CentOS 6.5 server running QEMU-KVM virtualization:

Hardware:

> 40 CPUs
> 400 GB RAM

Software:

> Kernel: 2.6.32-431.17.1.el6.x86_64
> Qemu: 0.12.1.2
> Libvirt: 0.10.2

There are 3 guests with identical hardware configuration:

> 16 CPUs
> 120 GB RAM

<memory unit='KiB'>125829120</memory>
<currentMemory unit='KiB'>125829120</currentMemory>
<vcpu placement='static'>16</vcpu>
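A quick sanity check on these numbers: 125829120 KiB is exactly 120 GiB, so the three guests together may use 360 GiB. The shell sketch below (the function names are made up for illustration) subtracts that from the host's MemTotal of 396867932 kB, taken from the /proc/meminfo dump further down. It counts guest RAM only and ignores QEMU's per-process overhead and the host kernel's own slab and page tables, so the real headroom would be smaller still.

```shell
#!/bin/sh
# All sizes are in KiB, the unit of libvirt's unit='KiB' and of /proc/meminfo.

# guest_headroom_kib MEMTOTAL_KIB GUEST_KIB GUEST_COUNT
# Host memory left over if every guest uses its full allocation.
guest_headroom_kib() {
    echo $(( $1 - $2 * $3 ))
}

# kib_to_gib KIB: whole GiB, rounded down
kib_to_gib() {
    echo $(( $1 / 1024 / 1024 ))
}

memtotal_kib=396867932   # MemTotal from the host's /proc/meminfo
guest_kib=125829120      # <memory unit='KiB'> of each guest

echo "per guest: $(kib_to_gib "$guest_kib") GiB"
echo "headroom:  $(kib_to_gib "$(guest_headroom_kib "$memtotal_kib" "$guest_kib" 3)") GiB"
```

This prints 120 GiB per guest and 18 GiB (19380572 KiB) of headroom. With 80 GiB guests (83886080 KiB each), `guest_headroom_kib 396867932 83886080 3` gives 145209692 KiB, about 138 GiB.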

The guests run Apache and MySQL.

Apart from the VMs, the host runs only some backup and maintenance scripts, nothing else.

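After a few days of uptime the problems start. The load on the guests randomly climbs to about 150, with 10-15% CPU steal time. On the host the load is around 38-40, with about 30-40% user CPU time and 40-50% system CPU time.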

At that moment the most CPU-hungry processes on the host are the QEMU processes of the guests, followed closely by kswapd0 and kswapd1 at 100% CPU.

Memory usage at that moment:

> RAM total 378.48 GB
> RAM used 330.82 GB
> RAM free 47.66 GB
> SWAP total 500.24 MB
> SWAP used 497.13 MB
> SWAP free 3192 kB

Plus 10-20 GB of RAM in buffers.

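So from the memory-usage point of view there shouldn't be any problem. But the heavy work of the kswapd processes indicates a memory shortage, and the completely full swap points the same way (when I turn swap off and on again, it fills up within a few minutes). And occasionally the OOM killer kills some process: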
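Not from the post: a hypothetical shell helper to put a number on "heavy kswapd work". It samples /proc/vmstat twice and reports how many pages kswapd scanned per second in between. It sums every counter whose name starts with pgscan_kswapd, on the assumption that 2.6.32 reports them per zone (pgscan_kswapd_normal and so on) while newer kernels have a single pgscan_kswapd.

```shell
#!/bin/sh
# kswapd_scanned FILE
# Sum of all counters whose name starts with "pgscan_kswapd"; this matches both
# the per-zone names of older kernels and the single counter of newer ones.
kswapd_scanned() {
    awk '$1 ~ /^pgscan_kswapd/ { sum += $2 } END { printf "%.0f\n", sum }' "$1"
}

# kswapd_scan_rate [INTERVAL_SECONDS] [FILE]
# Pages scanned by kswapd per second over one sampling interval.
kswapd_scan_rate() {
    interval=${1:-5}
    file=${2:-/proc/vmstat}
    before=$(kswapd_scanned "$file")
    sleep "$interval"
    after=$(kswapd_scanned "$file")
    echo $(( (after - before) / interval ))
}

# Usage on the host: kswapd_scan_rate 5
```

A rate that stays high while MemFree still shows tens of GiB would be consistent with the symptom described here.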

Nov 20 12:42:42 wv2-f302 kernel: active_anon:79945387 inactive_anon:3660742 isolated_anon:0
Nov 20 12:42:42 wv2-f302 kernel: active_file:252 inactive_file:0 isolated_file:0
Nov 20 12:42:42 wv2-f302 kernel: unevictable:0 dirty:2 writeback:0 unstable:0
Nov 20 12:42:42 wv2-f302 kernel: free:12513746 slab_reclaimable:5001 slab_unreclaimable:1759785
Nov 20 12:42:42 wv2-f302 kernel: mapped:213 shmem:41 pagetables:188243 bounce:0
Nov 20 12:42:42 wv2-f302 kernel: Node 0 DMA free:15728kB min:0kB low:0kB high:0kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15332kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
Nov 20 12:42:42 wv2-f302 kernel: lowmem_reserve[]: 0 2965 193855 193855
Nov 20 12:42:42 wv2-f302 kernel: Node 0 DMA32 free:431968kB min:688kB low:860kB high:1032kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3037072kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
Nov 20 12:42:42 wv2-f302 kernel: lowmem_reserve[]: 0 0 190890 190890
Nov 20 12:42:42 wv2-f302 kernel: Node 0 Normal free:6593828kB min:44356kB low:55444kB high:66532kB active_anon:178841380kB inactive_anon:7783292kB active_file:540kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:195471360kB mlocked:0kB dirty:8kB writeback:0kB mapped:312kB shmem:48kB slab_reclaimable:11136kB slab_unreclaimable:1959664kB kernel_stack:5104kB pagetables:397332kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
Nov 20 12:42:42 wv2-f302 kernel: lowmem_reserve[]: 0 0 0 0
Nov 20 12:42:42 wv2-f302 kernel: Node 1 Normal free:43013460kB min:45060kB low:56324kB high:67588kB active_anon:140940168kB inactive_anon:6859676kB active_file:468kB inactive_file:56kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:198574076kB mlocked:0kB dirty:0kB writeback:0kB mapped:540kB shmem:116kB slab_reclaimable:8868kB slab_unreclaimable:5079476kB kernel_stack:2856kB pagetables:355640kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
Nov 20 12:42:42 wv2-f302 kernel: lowmem_reserve[]: 0 0 0 0
Nov 20 12:42:42 wv2-f302 kernel: Node 0 DMA: 2*4kB 1*8kB 2*16kB 2*32kB 2*64kB 1*128kB 0*256kB 0*512kB 1*1024kB 1*2048kB 3*4096kB = 15728kB
Nov 20 12:42:42 wv2-f302 kernel: Node 0 DMA32: 10*4kB 11*8kB 12*16kB 13*32kB 12*64kB 5*128kB 7*256kB 10*512kB 9*1024kB 6*2048kB 98*4096kB = 431968kB
Nov 20 12:42:42 wv2-f302 kernel: Node 0 Normal: 1648026*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 6592104kB
Nov 20 12:42:42 wv2-f302 kernel: Node 1 Normal: 8390977*4kB 1181188*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 43013412kB
Nov 20 12:42:42 wv2-f302 kernel: 49429 total pagecache pages
Nov 20 12:42:42 wv2-f302 kernel: 48929 pages in swap cache
Nov 20 12:42:42 wv2-f302 kernel: Swap cache stats: add 2688331, delete 2639402, find 16219898/16530111
Nov 20 12:42:42 wv2-f302 kernel: Free swap  = 3264kB
Nov 20 12:42:42 wv2-f302 kernel: Total swap = 512248kB
Nov 20 12:42:44 wv2-f302 kernel: 100663294 pages RAM
Nov 20 12:42:44 wv2-f302 kernel: 1446311 pages reserved
Nov 20 12:42:44 wv2-f302 kernel: 10374115 pages shared
Nov 20 12:42:44 wv2-f302 kernel: 84534113 pages non-shared
Oct 27 14:24:43 wv2-f302 kernel: [ pid ]   uid  tgid total_vm      rss cpu oom_adj oom_score_adj name
Oct 27 14:24:43 wv2-f302 kernel: [ 3878]     0  3878 32042399 31569413  10       0             0 qemu_wl52
Oct 27 14:24:43 wv2-f302 kernel: [ 4321]     0  4321 32092081 31599762  20       0             0 qemu_wl51
Oct 27 14:24:43 wv2-f302 kernel: [ 4394]     0  4394 32106979 31575717  15       0             0 qemu_wl50
...
Oct 27 14:24:43 wv2-f302 kernel: Out of memory: Kill process 3878 (qemu_wl52) score 318 or sacrifice child
Oct 27 14:24:43 wv2-f302 kernel: Killed process 3878, UID 0, (qemu_wl52) total-vm:128169596kB, anon-rss:126277476kB, file-rss:176kB
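The total_vm and rss columns of the OOM task dump are in pages, not kB. Assuming the usual 4 KiB x86_64 page, multiplying out the qemu_wl52 row reproduces the figures in the "Killed process" line exactly, and shows the QEMU process holding about 123318 MiB resident, roughly 438 MiB more than the guest's configured 120 GiB (122880 MiB). The variable names are only for this illustration.

```shell
#!/bin/sh
# OOM task dump columns are in pages; 4 KiB pages are assumed (x86_64).
page_kib=4

# Values copied from the qemu_wl52 (pid 3878) row and the "Killed process" line.
total_vm_pages=32042399
rss_pages=31569413
kill_total_vm_kib=128169596
kill_rss_kib=$(( 126277476 + 176 ))   # anon-rss + file-rss

total_vm_kib=$(( total_vm_pages * page_kib ))
rss_kib=$(( rss_pages * page_kib ))

echo "total_vm: ${total_vm_kib} kB (kill line: ${kill_total_vm_kib} kB)"
echo "rss:      ${rss_kib} kB (kill line: ${kill_rss_kib} kB)"
echo "rss in MiB: $(( rss_kib / 1024 ))"
```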

Full dump: http://evilcigi.eu/msg/msg.txt

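Then I start the killed guest again, and from that moment everything is fine for a few days, with the same memory usage as before the problem:

> RAM total 378.48 GB
> RAM used 336.15 GB
> RAM free 42.33 GB
> SWAP total 500.24 MB
> SWAP used 344.55 MB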
> SWAP free 155.69 MB

Is it possible that the server somehow miscalculates memory? Or is there something I'm missing?

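One thing that occurred to me: could the host put all its free memory into buffers and cache and then run short of memory (invoking the OOM killer)? But I don't think that should happen, right? Also, it doesn't explain the behaviour before the killing.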

Thank you in advance.

So the problem appeared again today. Here is the content of /proc/meminfo:

MemTotal:       396867932 kB
MemFree:         9720268 kB
Buffers:        53354000 kB
Cached:            22196 kB
SwapCached:       343964 kB
Active:         331872796 kB
Inactive:       41283992 kB
Active(anon):   305458432 kB
Inactive(anon): 14322324 kB
Active(file):   26414364 kB
Inactive(file): 26961668 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:        512248 kB
SwapFree:              0 kB
Dirty:                48 kB
Writeback:             0 kB
AnonPages:      319438656 kB
Mapped:             8536 kB
Shmem:               164 kB
Slab:            9052784 kB
SReclaimable:    2014752 kB
SUnreclaim:      7038032 kB
KernelStack:        8064 kB
PageTables:       650892 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    198946212 kB
Committed_AS:   383832752 kB
VmallocTotal:   34359738367 kB
VmallocUsed:     1824832 kB
VmallocChunk:   34157271228 kB
HardwareCorrupted:     0 kB
AnonHugePages:  31502336 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:        7852 kB
DirectMap2M:     3102720 kB
DirectMap1G:    399507456 kB
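To read a dump like this at a glance it helps to convert the fields that matter into GiB. The sketch below is a hypothetical helper, not a standard tool; it relies only on /proc/meminfo reporting values in kB (which are KiB). On the dump above it would show, for example, about 304.6 GiB of AnonPages (mostly guest RAM), 50.9 GiB of Buffers and only 9.3 GiB of MemFree.

```shell
#!/bin/sh
# meminfo_gib FILE FIELD...
# Prints each requested /proc/meminfo field converted from kB (KiB) to GiB.
meminfo_gib() {
    file=$1
    shift
    for field in "$@"; do
        awk -v f="${field}:" '$1 == f { printf "%-14s %8.2f GiB\n", f, $2 / 1048576 }' "$file"
    done
}

# Usage on the host:
#   meminfo_gib /proc/meminfo MemFree Buffers AnonPages AnonHugePages SUnreclaim
```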

It seems that all the "free" memory is being used for buffers.

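Following @Matthew Ife's hint about memory fragmentation, I compacted memory and also dropped the caches on the host (to free the 60 GB held in buffers) with these commands: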

echo 3 > /proc/sys/vm/drop_caches
echo 1 > /proc/sys/vm/compact_memory

This is what memory fragmentation looks like now:

# cat /proc/buddyinfo
Node 0, zone      DMA      2      1      2      2      2      1      0      0      1      1      3
Node 0, zone    DMA32     12     12     13     16     10      5      7     10      9      6     98
Node 0, zone   Normal 2398537 469407 144288  97224  58276  24155   8153   3141   1299    451     75
Node 1, zone   Normal 9182926 2727543 648104  81843   7915   1267    244     67      3      1      0
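Each column of /proc/buddyinfo counts free blocks of one order, from order 0 (a single page) upward, so the raw counts hide how much memory each column represents. A small sketch, assuming 4 KiB pages, converts every zone's row into free KiB overall and free KiB held in blocks of at least a given order, which makes snapshots before and after compaction easier to compare. The function name is hypothetical.

```shell
#!/bin/sh
# buddy_free MIN_ORDER [FILE]
# For every "Node N, zone Z c0 c1 ..." line of /proc/buddyinfo, print the free
# memory of the zone and the part of it held in blocks of order >= MIN_ORDER.
# Assumes 4 KiB pages; a block of order k is 2^k pages.
buddy_free() {
    awk -v min="$1" '
        BEGIN { min += 0 }
        $1 == "Node" {
            node = $2; sub(/,$/, "", node)      # "0," -> "0"
            total = 0; high = 0
            for (i = 5; i <= NF; i++) {         # $5 holds the order-0 count
                order = i - 5
                kib = $i * 4 * 2 ^ order
                total += kib
                if (order >= min) high += kib
            }
            printf "node %s zone %s: free %.0f KiB, order>=%d: %.0f KiB\n", node, $4, total, min, high
        }' "${2:-/proc/buddyinfo}"
}

# Usage on the host: buddy_free 4
```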

Update 2014/11/25 – the server is overloaded again:

# cat /proc/buddyinfo
Node 0, zone   Normal 4374385  85408      0      0      0      0      0      0      0      0      0
Node 1, zone   Normal 1830850 261703    460     14      0      0      0      0      0      0      0
# cat /proc/meminfo
MemTotal:       396867932 kB
MemFree:        28038892 kB
Buffers:        49126656 kB
Cached:            19088 kB
SwapCached:       303624 kB
Active:         305426204 kB
Inactive:       49729776 kB
Active(anon):   292040988 kB
Inactive(anon): 13969376 kB
Active(file):   13385216 kB
Inactive(file): 35760400 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:        512248 kB
SwapFree:             20 kB
Dirty:                28 kB
Writeback:             0 kB
AnonPages:      305706632 kB
Mapped:             9324 kB
Shmem:               124 kB
Slab:            8616228 kB
SReclaimable:    1580736 kB
SUnreclaim:      7035492 kB
KernelStack:        8200 kB
PageTables:       702268 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    198946212 kB
Committed_AS:   384014048 kB
VmallocTotal:   34359738367 kB
VmallocUsed:     1824832 kB
VmallocChunk:   34157271228 kB
HardwareCorrupted:     0 kB
AnonHugePages:  31670272 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:        7852 kB
DirectMap2M:     3102720 kB
DirectMap1G:    399507456 kB

There are some page allocation failures in syslog:

Nov 25 09:14:07 wv2-f302 kernel: qemu_wl50: page allocation failure. order:4, mode:0x20
Nov 25 09:14:07 wv2-f302 kernel: Pid: 4444, comm: qemu_wl50 Not tainted 2.6.32-431.17.1.el6.x86_64 #1
Nov 25 09:14:07 wv2-f302 kernel: Call Trace:
Nov 25 09:14:07 wv2-f302 kernel: <IRQ>  [<ffffffff8112f64a>] ? __alloc_pages_nodemask+0x74a/0x8d0
Nov 25 09:14:07 wv2-f302 kernel: [<ffffffff8116e082>] ? kmem_getpages+0x62/0x170
Nov 25 09:14:07 wv2-f302 kernel: [<ffffffff8116ec9a>] ? fallback_alloc+0x1ba/0x270
Nov 25 09:14:07 wv2-f302 kernel: [<ffffffff8116ea19>] ? ____cache_alloc_node+0x99/0x160
Nov 25 09:14:07 wv2-f302 kernel: [<ffffffff8116fbe0>] ? kmem_cache_alloc_node_trace+0x90/0x200
Nov 25 09:14:07 wv2-f302 kernel: [<ffffffff8116fdfd>] ? __kmalloc_node+0x4d/0x60
Nov 25 09:14:07 wv2-f302 kernel: [<ffffffff8144ff5a>] ? __alloc_skb+0x7a/0x180
Nov 25 09:14:07 wv2-f302 kernel: [<ffffffff81451070>] ? skb_copy+0x40/0xb0
Nov 25 09:14:07 wv2-f302 kernel: [<ffffffffa013a55c>] ? tg3_start_xmit+0xa8c/0xd80 [tg3]
Nov 25 09:14:07 wv2-f302 kernel: [<ffffffff814603e4>] ? dev_hard_start_xmit+0x224/0x480
Nov 25 09:14:07 wv2-f302 kernel: [<ffffffff8147be6a>] ? sch_direct_xmit+0x15a/0x1c0
Nov 25 09:14:07 wv2-f302 kernel: [<ffffffff814608e8>] ? dev_queue_xmit+0x228/0x320
Nov 25 09:14:07 wv2-f302 kernel: [<ffffffffa02c8898>] ? br_dev_queue_push_xmit+0x88/0xc0 [bridge]
Nov 25 09:14:07 wv2-f302 kernel: [<ffffffffa02c8928>] ? br_forward_finish+0x58/0x60 [bridge]
Nov 25 09:14:07 wv2-f302 kernel: [<ffffffffa02c8ae8>] ? __br_deliver+0x78/0x110 [bridge]
Nov 25 09:14:07 wv2-f302 kernel: [<ffffffffa02c8bb5>] ? br_deliver+0x35/0x40 [bridge]
Nov 25 09:14:07 wv2-f302 kernel: [<ffffffffa02c78f4>] ? br_dev_xmit+0x114/0x140 [bridge]
Nov 25 09:14:07 wv2-f302 kernel: [<ffffffff814603e4>] ? dev_hard_start_xmit+0x224/0x480
Nov 25 09:14:07 wv2-f302 kernel: [<ffffffff8146087d>] ? dev_queue_xmit+0x1bd/0x320
Nov 25 09:14:07 wv2-f302 kernel: [<ffffffff81466785>] ? neigh_resolve_output+0x105/0x2d0
Nov 25 09:14:07 wv2-f302 kernel: [<ffffffff8149a2f8>] ? ip_finish_output+0x148/0x310
Nov 25 09:14:07 wv2-f302 kernel: [<ffffffff8149a578>] ? ip_output+0xb8/0xc0
Nov 25 09:14:07 wv2-f302 kernel: [<ffffffff8149983f>] ? __ip_local_out+0x9f/0xb0
Nov 25 09:14:07 wv2-f302 kernel: [<ffffffff81499875>] ? ip_local_out+0x25/0x30
Nov 25 09:14:07 wv2-f302 kernel: [<ffffffff81499d50>] ? ip_queue_xmit+0x190/0x420
Nov 25 09:14:07 wv2-f302 kernel: [<ffffffff814af06e>] ? tcp_transmit_skb+0x40e/0x7b0
Nov 25 09:14:07 wv2-f302 kernel: [<ffffffff814b15b0>] ? tcp_write_xmit+0x230/0xa90
Nov 25 09:14:07 wv2-f302 kernel: [<ffffffff814b2130>] ? __tcp_push_pending_frames+0x30/0xe0
Nov 25 09:14:07 wv2-f302 kernel: [<ffffffff814a9893>] ? tcp_data_snd_check+0x33/0x100
Nov 25 09:14:07 wv2-f302 kernel: [<ffffffff814ad491>] ? tcp_rcv_established+0x381/0x7f0
Nov 25 09:14:07 wv2-f302 kernel: [<ffffffff814b5873>] ? tcp_v4_do_rcv+0x2e3/0x490
Nov 25 09:14:07 wv2-f302 kernel: [<ffffffffa02b1557>] ? ipv4_confirm+0x87/0x1d0 [nf_conntrack_ipv4]
Nov 25 09:14:07 wv2-f302 kernel: [<ffffffffa0124441>] ? nf_nat_fn+0x91/0x260 [iptable_nat]
Nov 25 09:14:07 wv2-f302 kernel: [<ffffffff814b717a>] ? tcp_v4_rcv+0x51a/0x900
Nov 25 09:14:07 wv2-f302 kernel: [<ffffffff81494300>] ? ip_local_deliver_finish+0x0/0x2d0
Nov 25 09:14:07 wv2-f302 kernel: [<ffffffff814943dd>] ? ip_local_deliver_finish+0xdd/0x2d0
Nov 25 09:14:07 wv2-f302 kernel: [<ffffffff81494668>] ? ip_local_deliver+0x98/0xa0
Nov 25 09:14:07 wv2-f302 kernel: [<ffffffff81493b2d>] ? ip_rcv_finish+0x12d/0x440
Nov 25 09:14:07 wv2-f302 kernel: [<ffffffff814940b5>] ? ip_rcv+0x275/0x350
Nov 25 09:14:07 wv2-f302 kernel: [<ffffffff81489509>] ? nf_iterate+0x69/0xb0
Nov 25 09:14:07 wv2-f302 kernel: [<ffffffff8145b5db>] ? __netif_receive_skb+0x4ab/0x750
Nov 25 09:14:07 wv2-f302 kernel: [<ffffffff8145f1f0>] ? netif_receive_skb+0x0/0x60
Nov 25 09:14:07 wv2-f302 kernel: [<ffffffff8145f248>] ? netif_receive_skb+0x58/0x60
Nov 25 09:14:07 wv2-f302 kernel: [<ffffffffa02c9af8>] ? br_handle_frame_finish+0x1e8/0x2a0 [bridge]
Nov 25 09:14:07 wv2-f302 kernel: [<ffffffffa02c9d5a>] ? br_handle_frame+0x1aa/0x250 [bridge]
Nov 25 09:14:07 wv2-f302 kernel: [<ffffffff8145b659>] ? __netif_receive_skb+0x529/0x750
Nov 25 09:14:07 wv2-f302 kernel: [<ffffffff8145f248>] ? netif_receive_skb+0x58/0x60
Nov 25 09:14:07 wv2-f302 kernel: [<ffffffff8145f350>] ? napi_skb_finish+0x50/0x70
Nov 25 09:14:07 wv2-f302 kernel: [<ffffffff81460ab9>] ? napi_gro_receive+0x39/0x50
Nov 25 09:14:07 wv2-f302 kernel: [<ffffffffa0136b54>] ? tg3_poll_work+0xc24/0x1020 [tg3]
Nov 25 09:14:07 wv2-f302 kernel: [<ffffffffa0136f9c>] ? tg3_poll_msix+0x4c/0x150 [tg3]
Nov 25 09:14:07 wv2-f302 kernel: [<ffffffff81460bd3>] ? net_rx_action+0x103/0x2f0
Nov 25 09:14:07 wv2-f302 kernel: [<ffffffff810a6da9>] ? ktime_get+0x69/0xf0
Nov 25 09:14:07 wv2-f302 kernel: [<ffffffff8107a551>] ? __do_softirq+0xc1/0x1e0
Nov 25 09:14:07 wv2-f302 kernel: [<ffffffff810e6b20>] ? handle_IRQ_event+0x60/0x170
Nov 25 09:14:07 wv2-f302 kernel: [<ffffffff8100c30c>] ? call_softirq+0x1c/0x30
Nov 25 09:14:07 wv2-f302 kernel: [<ffffffff8100fa75>] ? do_softirq+0x65/0xa0
Nov 25 09:14:07 wv2-f302 kernel: [<ffffffff8107a405>] ? irq_exit+0x85/0x90
Nov 25 09:14:07 wv2-f302 kernel: [<ffffffff815312c5>] ? do_IRQ+0x75/0xf0
Nov 25 09:14:07 wv2-f302 kernel: <EOI>  [<ffffffffa018e271>] ? kvm_arch_vcpu_ioctl_run+0x4c1/0x10b0 [kvm]
Nov 25 09:14:07 wv2-f302 kernel: [<ffffffffa018e25f>] ? kvm_arch_vcpu_ioctl_run+0x4af/0x10b0 [kvm]
Nov 25 09:14:07 wv2-f302 kernel: [<ffffffff810aee2e>] ? futex_wake+0x10e/0x120
Nov 25 09:14:07 wv2-f302 kernel: [<ffffffffa0175b04>] ? kvm_vcpu_ioctl+0x434/0x580 [kvm]
Nov 25 09:14:07 wv2-f302 kernel: [<ffffffff8100b9ce>] ? common_interrupt+0xe/0x13
Nov 25 09:14:07 wv2-f302 kernel: [<ffffffff8119d802>] ? vfs_ioctl+0x22/0xa0
Nov 25 09:14:07 wv2-f302 kernel: [<ffffffff8119dcca>] ? do_vfs_ioctl+0x3aa/0x580
Nov 25 09:14:07 wv2-f302 kernel: [<ffffffff810b186b>] ? sys_futex+0x7b/0x170
Nov 25 09:14:07 wv2-f302 kernel: [<ffffffff8119df21>] ? sys_ioctl+0x81/0xa0
Nov 25 09:14:07 wv2-f302 kernel: [<ffffffff8100b072>] ? system_call_fastpath+0x16/0x1b
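The failing request is order 4, i.e. 2^4 = 16 physically contiguous pages (64 KiB). In the 2.6.32 headers GFP_ATOMIC is defined as __GFP_HIGH, which is 0x20, so mode:0x20 suggests an atomic allocation; the trace shows it coming from the tg3 driver's transmit path in interrupt context (`<IRQ>`), where the kernel cannot sleep to reclaim or compact memory. Such a request can only succeed if a free block of order 4 or higher already exists. The hypothetical helper below counts, per zone, how many blocks of a given order /proc/buddyinfo could supply. It ignores the per-zone watermarks the allocator also checks, so a non-zero result does not guarantee success.

```shell
#!/bin/sh
# blocks_for_order ORDER [FILE]
# Count, per zone, how many ORDER-sized blocks the free lists could supply:
# a free block of order k >= ORDER splits into 2^(k - ORDER) of them, while
# smaller blocks cannot be used at all, however many of them there are.
blocks_for_order() {
    awk -v want="$1" '
        BEGIN { want += 0 }
        $1 == "Node" {
            node = $2; sub(/,$/, "", node)      # "0," -> "0"
            n = 0
            for (i = 5 + want; i <= NF; i++)    # $5 holds the order-0 count
                n += $i * 2 ^ (i - 5 - want)
            printf "node %s zone %s: %.0f\n", node, $4, n
        }' "${2:-/proc/buddyinfo}"
}

# Usage on the host: blocks_for_order 4
```

On the 2014/11/25 buddyinfo snapshot both Normal zones give 0 for order 4, matching the failure; on the snapshot taken right after compaction, Node 0 Normal gives 204342 and Node 1 Normal 12041.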

Edit:
Solved by reducing the memory allocated to the guests. There are now 3 guests with 80 GB RAM each, leaving about 150 GB RAM for the host system:

# free -h
              total        used        free      shared  buff/cache   available
Mem:           377G        243G         29G        1,9G        104G        132G

Answer: There is a lot of free memory, but the zones are completely fragmented:

Node 0 Normal: 1648026*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 6592104kB
Node 1 Normal: 8390977*4kB 1181188*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB

There are very few pages of non-zero order left, and in one zone there are none at all.
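These lines can be checked by hand: each token is count*blocksize, so the zone total is the sum of the products. The hypothetical shell helper below re-adds one such line (in the format the 2.6.32 OOM report prints) and reports the largest block size that still has a free block. For the two Normal zones above it reports 6592104 kB with nothing larger than 4 kB, and 43013412 kB with nothing larger than 8 kB: order-0 pages only in Node 0, orders 0 and 1 only in Node 1.

```shell
#!/bin/sh
# oom_zone_summary LINE
# LINE is one "Node N Zone: count*sizekB ... = totalkB" line from an OOM report.
# Re-adds count*size over all such tokens and reports the largest block size
# that has at least one free block.
oom_zone_summary() {
    printf '%s\n' "$1" | awk '{
        total = 0; largest = 0
        for (i = 1; i <= NF; i++) {
            if ($i !~ /^[0-9]+\*[0-9]+kB$/) continue
            split($i, p, "[*]")   # "1648026*4kB" -> p[1] = "1648026", p[2] = "4kB"
            count = p[1] + 0
            size = p[2] + 0       # numeric prefix of "4kB" is 4
            total += count * size
            if (count > 0 && size > largest) largest = size
        }
        printf "free %.0f kB, largest free block %d kB\n", total, largest
    }'
}
```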

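I can't guarantee anything, but you may want to try turning off ksmd and re-compacting memory. Compaction is only invoked automatically for high-order page allocations and never calls the OOM killer, so I assume the system tried to allocate memory of order 2 or 3 and got stuck.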

To compact memory, run:

echo 1 > /proc/sys/vm/compact_memory

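There's only so much to go on in this question, but I suspect ksmd is causing the fragmentation by scanning for pages duplicated in both VMs and swapping them all around.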
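If you want to try the ksmd suggestion, KSM is controlled through sysfs under /sys/kernel/mm/ksm. The sketch below reports roughly how much KSM is currently saving and stops ksmd. It writes 0 to `run`, which stops ksmd but leaves already-merged pages merged; writing 2 would also unmerge them, and every unmerged page needs its own memory again. On CentOS 6 the qemu-kvm package normally ships `ksm` and `ksmtuned` services that can start ksmd again, so those would presumably need stopping too (`service ksmtuned stop`, `service ksm stop`). KSM_DIR and the function names are illustrative assumptions.

```shell
#!/bin/sh
# KSM_DIR is overridable only so the functions can be tried on a copy;
# on a real host it is /sys/kernel/mm/ksm.
KSM_DIR=${KSM_DIR:-/sys/kernel/mm/ksm}

# ksm_saved_kib: pages_sharing counts the extra mappings that share an
# already-merged page, i.e. roughly how many pages KSM saves (4 KiB assumed).
ksm_saved_kib() {
    echo $(( $(cat "$KSM_DIR/pages_sharing") * 4 ))
}

# ksm_stop: 0 stops ksmd but keeps pages that are already merged.
# (2 would also unmerge them, and every unmerged page needs its own memory.)
ksm_stop() {
    echo 0 > "$KSM_DIR/run"
}

# Usage on the host, as root:
#   echo "KSM saves about $(ksm_saved_kib) KiB"; ksm_stop
```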

Source: http://www.outofmemory.cn/yw/1039185.html