Wednesday, 16 September 2015

Aerospike cluster crashed after index creation

We have a cluster of 4 t2.micro machines on AWS (1 CPU, 1 GB RAM, 15 GB SSD each) and we were testing Aerospike. We used the AWS Marketplace AMI to install Aerospike v3 Community Edition, and changed only the aerospike.conf file so that the namespace is stored on disk.

We had one namespace with two sets, totaling 18M documents and occupying 2 GB of RAM and approx. 40 GB of disk space. After we created an index on a set of 12M records, the system crashed.
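As a rough sanity check on those figures, the sketch below estimates the RAM taken by the primary index alone. It relies on Aerospike's documented cost of 64 bytes of primary index per record, and it assumes the default replication factor of 2, which the post does not state. A secondary index is also kept in RAM, so building one over 12M records would add to this; the sketch does not try to estimate by how much.

```python
# Rough primary-index memory estimate for the figures quoted in the post.
PRIMARY_INDEX_BYTES_PER_RECORD = 64   # Aerospike's documented per-record primary index cost
RECORDS = 18_000_000                  # from the post
NODES = 4                             # from the post
RAM_PER_NODE_BYTES = 1 * 1024**3      # t2.micro, 1 GB (from the post)
REPLICATION_FACTOR = 2                # ASSUMPTION: default value, not stated in the post


def primary_index_bytes(records, replication_factor):
    """Cluster-wide bytes of RAM used by the primary index."""
    return records * replication_factor * PRIMARY_INDEX_BYTES_PER_RECORD


total = primary_index_bytes(RECORDS, REPLICATION_FACTOR)
per_node = total / NODES  # assumes records are spread evenly across nodes

print(f"cluster-wide primary index: {total / 1024**3:.2f} GiB")
print(f"per node: {per_node / 1024**2:.0f} MiB "
      f"of {RAM_PER_NODE_BYTES / 1024**2:.0f} MiB")
```

Under these assumptions the primary index alone comes to a little over 2 GiB cluster-wide, which is in line with the 2 GB reported, and it leaves each 1 GB node with limited headroom for anything else held in memory.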

Some info:

aql on the instance:

[ec2-user@ip-172-XX-XX-XXX ~]$ aql
2015-09-16 18:44:37 WARN AEROSPIKE_ERR_CLIENT Socket write error: 111
Error -1: Failed to seed cluster
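On Linux, errno 111 is ECONNREFUSED, which usually means nothing is listening on the port the client tried to reach. A minimal sketch to confirm this from the instance itself, assuming aql was connecting to the service port 3000 on localhost (the helper name `port_is_open` is hypothetical):

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers ConnectionRefusedError (errno 111 on Linux) and timeouts.
        return False


if __name__ == "__main__":
    # 3000 is the Aerospike service port used in the commands in this post.
    print("listening on 127.0.0.1:3000:", port_is_open("127.0.0.1", 3000))
```

If this prints False while the instance is otherwise up, the Aerospike daemon is most likely not running or has not yet opened its service port.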

Tail of the log (it just keeps appending the same line):

Sep 16 2015 19:08:26 GMT: INFO (drv_ssd): (drv_ssd.c::2406) device /opt/aerospike/data/bar.dat: used 6980578688, contig-free 5382M (5382 wblocks), swb-free 0, n-w 0, w-q 0 w-tot 23 (0.0/s), defrag-q 0 defrag-tot 128 (0.0/s)
Sep 16 2015 19:08:46 GMT: INFO (drv_ssd): (drv_ssd.c::2406) device /opt/aerospike/data/bar.dat: used 6980578688, contig-free 5382M (5382 wblocks), swb-free 0, n-w 0, w-q 0 w-tot 23 (0.0/s), defrag-q 0 defrag-tot 128 (0.0/s)
Sep 16 2015 19:09:06 GMT: INFO (drv_ssd): (drv_ssd.c::2406) device /opt/aerospike/data/bar.dat: used 6980578688, contig-free 5382M (5382 wblocks), swb-free 0, n-w 0, w-q 0 w-tot 23 (0.0/s), defrag-q 0 defrag-tot 128 (0.0/s)
Sep 16 2015 19:09:26 GMT: INFO (drv_ssd): (drv_ssd.c::2406) device /opt/aerospike/data/bar.dat: used 6980578688, contig-free 5382M (5382 wblocks), swb-free 0, n-w 0, w-q 0 w-tot 23 (0.0/s), defrag-q 0 defrag-tot 128 (0.0/s)
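To check whether these lines really are identical apart from the timestamp, the counters can be pulled out and compared. The sketch below parses the drv_ssd line format exactly as it appears above. The field names are read off the log text, and the reading of w-tot and defrag-tot as running totals of writes and defrags is an assumption; `parse_drv_ssd` and `is_idle` are hypothetical helper names.

```python
import re

# Matches the drv_ssd status line format shown in the log excerpt.
PATTERN = re.compile(
    r"device (?P<device>\S+): used (?P<used>\d+), "
    r"contig-free (?P<contig_free_mb>\d+)M .*?"
    r"w-tot (?P<w_tot>\d+) .*?"
    r"defrag-tot (?P<defrag_tot>\d+)"
)


def parse_drv_ssd(line):
    """Extract the device path and counters from one drv_ssd line, or None."""
    m = PATTERN.search(line)
    if m is None:
        return None
    fields = m.groupdict()
    return {
        "device": fields["device"],
        "used": int(fields["used"]),                      # bytes
        "contig_free_mb": int(fields["contig_free_mb"]),
        "w_tot": int(fields["w_tot"]),
        "defrag_tot": int(fields["defrag_tot"]),
    }


def is_idle(lines):
    """True if every parsed line reports the same counters (no progress)."""
    parsed = [p for p in map(parse_drv_ssd, lines) if p is not None]
    return len(parsed) > 1 and all(p == parsed[0] for p in parsed)


if __name__ == "__main__":
    import sys
    # Usage: tail -n 20 <path to the aerospike log> | python this_script.py
    print("no progress between samples:", is_idle(sys.stdin.readlines()))
```

For the four lines above, every counter is unchanged across a full minute, so the device is neither being written to nor defragmented during that time.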

asmonitor:

$ asmonitor -h 54.XX.XXX.XX
request to 54.XX.XXX.XX : 3000 returned error
skipping 54.XX.XXX.XX:3000
***failed to connect to any hosts

asadm:

$ asadm -h 54.XXX.XXX.XX -p 3000
Aerospike Interactive Shell, version 0.0.10-6-gdd6fb61
Found 1 nodes
Offline: 54.207.67.238:3000

We tried restarting the instances. One of them came back, but it is running as a standalone node; the rest are still in the state described above. The instances themselves are up, but the Aerospike service on them is not.




1 comment: