mercredi 29 avril 2015

About EC2 Instance Memory leak, JAVA vs EC2

Today I've got an alert message because one of our EC2 instances exceed 70% limitation of its memory usage.
Its server type is c3.large and it has 3.75G of memory.
The main running application of this instance is a tomcat server and -Xmx2g option was given when it started.
At the point of failure, memory usage of the java process was 1.48g and there was no any memory leak on our JVM monitoring system.


Let's see the running information of normal instance first.

Result of landscape :




vdscm@:~$ landscape-sysinfo System load: 0.0 Processes: 113 Usage of /: 1.9% of 98.30GB Users logged in: 1 Memory usage: 47% IP address for eth0: XX.XX.XX.XX Swap usage: 0%
It uses 47% of memory.

Result of top : (sorted by %MEM desc)




top - 04:06:30 up 155 days, 21:37, 1 user, load average: 0.00, 0.01, 0.05 Tasks: 113 total, 1 running, 112 sleeping, 0 stopped, 0 zombie %Cpu(s): 0.2 us, 0.0 sy, 0.0 ni, 99.8 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st KiB Mem: 3854812 total, 3691880 used, 162932 free, 154504 buffers KiB Swap: 0 total, 0 used, 0 free. 1702992 cached Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 3707 vdscm 20 0 3621092 1.204g 16140 S 0.0 32.7 123:55.73 java 5197 syslog 20 0 256616 30612 892 S 0.0 0.8 1:33.79 rsyslogd 8973 root 20 0 1065348 8204 2304 S 0.0 0.2 4:04.69 s3fs 22462 root 20 0 105628 4248 3268 S 0.0 0.1 0:00.01 sshd 22549 vdscm 20 0 21200 3660 1748 S 0.0 0.1 0:00.07 bash 5276 root 20 0 172124 3144 2428 S 0.0 0.1 7:42.74 controller 588 root 20 0 10224 2916 624 S 0.0 0.1 0:04.53 dhclient 1 root 20 0 33496 2564 1212 S 0.0 0.1 0:15.51 init ... 3 root 20 0 0 0 0 S 0.0 0.0 0:01.36 ksoftirqd/0 5 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 kworker/0:0H 7 root 20 0 0 0 0 S 0.0 0.0 12:15.05 rcu_sched 8 root 20 0 0 0 0 S 0.0 0.0 21:04.46 rcuos/0 9 root 20 0 0 0 0 S 0.0 0.0 13:23.79 rcuos/1 10 root 20 0 0 0 0 S 0.0 0.0 0:00.00 rcuos/2 11 root 20 0 0 0 0 S 0.0 0.0 0:00.00 rcuos/3
Java process uses 32.7%. Total memory usage of all process is about 35%.
All process uses 35% and landscape says 47%. It's quite reasonable.
Next, Let's see the running information of the instance of problem.

Result of landscape :




vdscm@:~$ landscape-sysinfo System load: 0.0 Processes: 117 Usage of /: 1.7% of 98.30GB Users logged in: 1 Memory usage: 72% IP address for eth0: XX.XX.XX.XX Swap usage: 0%
It uses 72% of memory.

Result of top : (Sorted by %MEM desc)




top - 02:19:10 up 155 days, 19:11, 2 users, load average: 0.02, 0.02, 0.05 Tasks: 117 total, 1 running, 116 sleeping, 0 stopped, 0 zombie %Cpu(s): 0.3 us, 0.0 sy, 0.0 ni, 99.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st KiB Mem: 3854812 total, 3689104 used, 165708 free, 155256 buffers KiB Swap: 0 total, 0 used, 0 free. 727764 cached Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 26472 vdscm 20 0 3618176 1.480g 14536 S 0.3 40.3 160:22.85 java 983 syslog 20 0 260848 30372 452 S 0.0 0.8 1:41.61 rsyslogd 367 root 20 0 1205188 6000 1616 S 0.0 0.2 93:47.09 s3fs 24356 root 20 0 105628 4248 3264 S 0.0 0.1 0:00.00 sshd 20759 root 20 0 105628 4228 3244 S 0.0 0.1 0:00.00 sshd 24412 vdscm 20 0 21232 3696 1748 S 0.0 0.1 0:00.04 bash 20857 vdscm 20 0 21224 3684 1748 S 0.0 0.1 0:00.07 bash 590 root 20 0 10224 2908 620 S 0.0 0.1 0:04.45 dhclient ... 1198 root 20 0 12788 152 0 S 0.0 0.0 0:00.00 getty 2261 root 20 0 15252 152 0 S 0.0 0.0 2:09.86 nimbus 1128 root 20 0 4368 148 0 S 0.0 0.0 0:00.00 acpid 2 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kthreadd 3 root 20 0 0 0 0 S 0.0 0.0 0:35.27 ksoftirqd/0 5 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 kworker/0:0H 7 root 20 0 0 0 0 S 0.0 0.0 36:53.03 rcu_sched
memory usage = used - (buffers + cached Mem)
2,806,084(0.7279%) = 3,689,104 - (155,256 + 727,764) ==> 72%
Memory usage doesn't contain buffered and cached memory.
Java process uses 40.3%, rsyslogd uses 0.8 and almost other processes use less or equal 0.1%. Its summary is about 43%. It's only 8% much more than normal instance but landscape says 72%. Where the 29% of memory was gone ?
vdscm@:~$ cat /proc/meminfo MemTotal: 3854812 kB MemFree: 165828 kB Buffers: 155256 kB Cached: 727768 kB SwapCached: 0 kB Active: 2099668 kB Inactive: 376808 kB Active(anon): 1593532 kB Inactive(anon): 304 kB Active(file): 506136 kB Inactive(file): 376504 kB Unevictable: 0 kB Mlocked: 0 kB SwapTotal: 0 kB SwapFree: 0 kB Dirty: 76 kB Writeback: 0 kB AnonPages: 1593452 kB Mapped: 22940 kB Shmem: 384 kB Slab: 282956 kB SReclaimable: 164592 kB SUnreclaim: 118364 kB KernelStack: 1712 kB PageTables: 6736 kB NFS_Unstable: 0 kB Bounce: 0 kB WritebackTmp: 0 kB CommitLimit: 1927404 kB Committed_AS: 2718656 kB VmallocTotal: 34359738367 kB VmallocUsed: 8288 kB VmallocChunk: 34359713979 kB HardwareCorrupted: 0 kB AnonHugePages: 1546240 kB HugePages_Total: 0 HugePages_Free: 0 HugePages_Rsvd: 0 HugePages_Surp: 0 Hugepagesize: 2048 kB DirectMap4k: 28672 kB DirectMap2M: 3903488 kB
slab only uses 0.2g of memory.
We killed the java process and restarted it. These are only things we did. After that, 72% memory usage went down to 30%. It seems that the java process had been occupying more 30% of system memory in addition to its %MEM usage shown by top command. How is it possible ?
After Restart Java,

Result of landscape :




vdscm@:~$ landscape-sysinfo System load: 0.0 Processes: 113 Usage of /: 1.7% of 98.30GB Users logged in: 1 Memory usage: 30% IP address for eth0: XX.XX.XX.XX Swap usage: 0%

Result of top : (Sorted by %MEM desc)




top - 01:01:25 up 156 days, 17:53, 2 users, load average: 0.00, 0.01, 0.05 Tasks: 116 total, 1 running, 115 sleeping, 0 stopped, 0 zombie %Cpu(s): 0.3 us, 0.0 sy, 0.0 ni, 99.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 31567 vdscm 20 0 3599936 999640 10872 S 0.0 25.9 3:07.55 java 983 syslog 20 0 260848 30372 452 S 0.0 0.8 1:42.34 rsyslogd 367 root 20 0 1204736 7420 1616 S 0.0 0.2 93:47.58 s3fs 31367 root 20 0 843872 4288 1124 S 0.0 0.1 8:07.27 controller 10820 root 20 0 105628 4244 3264 S 0.0 0.1 0:00.00 sshd 11280 root 20 0 105628 4224 3244 S 0.0 0.1 0:00.00 sshd 10875 vdscm 20 0 21220 3648 1748 S 0.0 0.1 0:00.11 bash 11367 vdscm 20 0 21220 3536 1668 S 0.0 0.1 0:00.03 bash 590 root 20 0 10224 2908 620 S 0.0 0.1 0:04.48 dhclient 31453 root 20 0 12608 2476 2056 S 0.0 0.1 21:57.17 cdm 1 root 20 0 33508 2040 696 S 0.0 0.1 0:15.01 init 10874 vdscm 20 0 105628 1888 904 S 0.0 0.0 0:00.09 sshd 11366 vdscm 20 0 105628 1720 752 S 0.0 0.0 0:00.00 sshd 11269 vdscm 20 0 23684 1620 1096 R 0.0 0.0 0:00.04 top 996 root 20 0 43452 1184 808 S 0.0 0.0 0:00.85 systemd-logind ...



Aucun commentaire:

Enregistrer un commentaire