Today I've got an alert message because one of our EC2 instances exceed 70% limitation of its memory usage.
Its server type is c3.large and it has 3.75G of memory.
The main running application of this instance is a tomcat server and -Xmx2g option was given when it started.
At the point of failure, memory usage of the java process was 1.48g and there was no any memory leak on our JVM monitoring system.
Let's see the running information of normal instance first.
Result of landscape :
vdscm@:~$ landscape-sysinfo
System load: 0.0 Processes: 113
Usage of /: 1.9% of 98.30GB Users logged in: 1
Memory usage: 47% IP address for eth0: XX.XX.XX.XX
Swap usage: 0%
It uses 47% of memory.
Result of top : (sorted by %MEM desc)
top - 04:06:30 up 155 days, 21:37, 1 user, load average: 0.00, 0.01, 0.05
Tasks: 113 total, 1 running, 112 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.2 us, 0.0 sy, 0.0 ni, 99.8 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem: 3854812 total, 3691880 used, 162932 free, 154504 buffers
KiB Swap: 0 total, 0 used, 0 free. 1702992 cached Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
3707 vdscm 20 0 3621092 1.204g 16140 S 0.0 32.7 123:55.73 java
5197 syslog 20 0 256616 30612 892 S 0.0 0.8 1:33.79 rsyslogd
8973 root 20 0 1065348 8204 2304 S 0.0 0.2 4:04.69 s3fs
22462 root 20 0 105628 4248 3268 S 0.0 0.1 0:00.01 sshd
22549 vdscm 20 0 21200 3660 1748 S 0.0 0.1 0:00.07 bash
5276 root 20 0 172124 3144 2428 S 0.0 0.1 7:42.74 controller
588 root 20 0 10224 2916 624 S 0.0 0.1 0:04.53 dhclient
1 root 20 0 33496 2564 1212 S 0.0 0.1 0:15.51 init
...
3 root 20 0 0 0 0 S 0.0 0.0 0:01.36 ksoftirqd/0
5 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 kworker/0:0H
7 root 20 0 0 0 0 S 0.0 0.0 12:15.05 rcu_sched
8 root 20 0 0 0 0 S 0.0 0.0 21:04.46 rcuos/0
9 root 20 0 0 0 0 S 0.0 0.0 13:23.79 rcuos/1
10 root 20 0 0 0 0 S 0.0 0.0 0:00.00 rcuos/2
11 root 20 0 0 0 0 S 0.0 0.0 0:00.00 rcuos/3
Java process uses 32.7%. Total memory usage of all process is about 35%.
All process uses 35% and landscape says 47%. It's quite reasonable.
Next, Let's see the running information of the instance of problem.
Result of landscape :
vdscm@:~$ landscape-sysinfo
System load: 0.0 Processes: 117
Usage of /: 1.7% of 98.30GB Users logged in: 1
Memory usage: 72% IP address for eth0: XX.XX.XX.XX
Swap usage: 0%
It uses 72% of memory.
Result of top : (Sorted by %MEM desc)
top - 02:19:10 up 155 days, 19:11, 2 users, load average: 0.02, 0.02, 0.05
Tasks: 117 total, 1 running, 116 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.3 us, 0.0 sy, 0.0 ni, 99.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem: 3854812 total, 3689104 used, 165708 free, 155256 buffers
KiB Swap: 0 total, 0 used, 0 free. 727764 cached Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
26472 vdscm 20 0 3618176 1.480g 14536 S 0.3 40.3 160:22.85 java
983 syslog 20 0 260848 30372 452 S 0.0 0.8 1:41.61 rsyslogd
367 root 20 0 1205188 6000 1616 S 0.0 0.2 93:47.09 s3fs
24356 root 20 0 105628 4248 3264 S 0.0 0.1 0:00.00 sshd
20759 root 20 0 105628 4228 3244 S 0.0 0.1 0:00.00 sshd
24412 vdscm 20 0 21232 3696 1748 S 0.0 0.1 0:00.04 bash
20857 vdscm 20 0 21224 3684 1748 S 0.0 0.1 0:00.07 bash
590 root 20 0 10224 2908 620 S 0.0 0.1 0:04.45 dhclient
...
1198 root 20 0 12788 152 0 S 0.0 0.0 0:00.00 getty
2261 root 20 0 15252 152 0 S 0.0 0.0 2:09.86 nimbus
1128 root 20 0 4368 148 0 S 0.0 0.0 0:00.00 acpid
2 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kthreadd
3 root 20 0 0 0 0 S 0.0 0.0 0:35.27 ksoftirqd/0
5 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 kworker/0:0H
7 root 20 0 0 0 0 S 0.0 0.0 36:53.03 rcu_sched
memory usage = used - (buffers + cached Mem)
2,806,084(0.7279%) = 3,689,104 - (155,256 + 727,764) ==> 72%
Memory usage doesn't contain buffered and cached memory.
Java process uses 40.3%, rsyslogd uses 0.8 and almost other processes use less or equal 0.1%. Its summary is about 43%.
It's only 8% much more than normal instance but landscape says 72%. Where the 29% of memory was gone ?
vdscm@:~$ cat /proc/meminfo
MemTotal: 3854812 kB
MemFree: 165828 kB
Buffers: 155256 kB
Cached: 727768 kB
SwapCached: 0 kB
Active: 2099668 kB
Inactive: 376808 kB
Active(anon): 1593532 kB
Inactive(anon): 304 kB
Active(file): 506136 kB
Inactive(file): 376504 kB
Unevictable: 0 kB
Mlocked: 0 kB
SwapTotal: 0 kB
SwapFree: 0 kB
Dirty: 76 kB
Writeback: 0 kB
AnonPages: 1593452 kB
Mapped: 22940 kB
Shmem: 384 kB
Slab: 282956 kB
SReclaimable: 164592 kB
SUnreclaim: 118364 kB
KernelStack: 1712 kB
PageTables: 6736 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 1927404 kB
Committed_AS: 2718656 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 8288 kB
VmallocChunk: 34359713979 kB
HardwareCorrupted: 0 kB
AnonHugePages: 1546240 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
DirectMap4k: 28672 kB
DirectMap2M: 3903488 kB
slab only uses 0.2g of memory.
We killed the java process and restarted it. These are only things we did.
After that, 72% memory usage went down to 30%.
It seems that the java process had been occupying more 30% of system memory in addition to its %MEM usage shown by top command.
How is it possible ?
After Restart Java,
Result of landscape :
vdscm@:~$ landscape-sysinfo
System load: 0.0 Processes: 113
Usage of /: 1.7% of 98.30GB Users logged in: 1
Memory usage: 30% IP address for eth0: XX.XX.XX.XX
Swap usage: 0%
Result of top : (Sorted by %MEM desc)
top - 01:01:25 up 156 days, 17:53, 2 users, load average: 0.00, 0.01, 0.05
Tasks: 116 total, 1 running, 115 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.3 us, 0.0 sy, 0.0 ni, 99.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
31567 vdscm 20 0 3599936 999640 10872 S 0.0 25.9 3:07.55 java
983 syslog 20 0 260848 30372 452 S 0.0 0.8 1:42.34 rsyslogd
367 root 20 0 1204736 7420 1616 S 0.0 0.2 93:47.58 s3fs
31367 root 20 0 843872 4288 1124 S 0.0 0.1 8:07.27 controller
10820 root 20 0 105628 4244 3264 S 0.0 0.1 0:00.00 sshd
11280 root 20 0 105628 4224 3244 S 0.0 0.1 0:00.00 sshd
10875 vdscm 20 0 21220 3648 1748 S 0.0 0.1 0:00.11 bash
11367 vdscm 20 0 21220 3536 1668 S 0.0 0.1 0:00.03 bash
590 root 20 0 10224 2908 620 S 0.0 0.1 0:04.48 dhclient
31453 root 20 0 12608 2476 2056 S 0.0 0.1 21:57.17 cdm
1 root 20 0 33508 2040 696 S 0.0 0.1 0:15.01 init
10874 vdscm 20 0 105628 1888 904 S 0.0 0.0 0:00.09 sshd
11366 vdscm 20 0 105628 1720 752 S 0.0 0.0 0:00.00 sshd
11269 vdscm 20 0 23684 1620 1096 R 0.0 0.0 0:00.04 top
996 root 20 0 43452 1184 808 S 0.0 0.0 0:00.85 systemd-logind
...
Aucun commentaire:
Enregistrer un commentaire