Tuesday, October 13, 2015

SSH connection error to an EC2 instance (Spark cluster)

I have a Spark (1.3.1) cluster on EC2 (region: us-east). For the past two months I haven't had any problems with it, but since yesterday I can't SSH into one of the slaves (or I can, but it takes a very long time). My jobs don't fail; they just hang, because they are trying to connect to that slave and it doesn't answer.

I tried to create a new Spark cluster with spark-ec2, but I got this error:

Warning: SSH connection error. (This could be temporary.)
Host: 54.90.24.42
SSH return code: 255
SSH output: ssh: connect to host 54.90.24.42 port 22: Connection refused

.

Warning: SSH connection error. (This could be temporary.)
Host: XX.XXX.XXX.XX
SSH return code: 255
SSH output: ssh: connect to host XX.XXX.XXX.XX port 22: Connection refused
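
To narrow this down, it may help to check whether port 22 is refused immediately or only answers very slowly. "Connection refused" usually means the host replied but nothing accepted the connection on that port, while a timeout suggests the packets are being dropped or the host is unresponsive. Below is a minimal sketch of such a probe, assuming Python 3 is available on the machine running it; `probe_port` is a hypothetical helper name, not part of spark-ec2 or Spark.

```python
import socket
import time


def probe_port(host, port=22, timeout=5.0):
    """Try one TCP connection and classify the outcome.

    Returns (status, elapsed_seconds), where status is one of
    'open', 'refused', 'timeout', or 'error'.
    Hypothetical helper for illustration only.
    """
    start = time.monotonic()
    try:
        # create_connection resolves the host and attempts a TCP connect.
        with socket.create_connection((host, port), timeout=timeout):
            status = "open"
    except ConnectionRefusedError:
        # The host answered, but nothing is listening on this port.
        status = "refused"
    except socket.timeout:
        # No answer within the timeout: dropped packets or a hung host.
        status = "timeout"
    except OSError:
        # Anything else, e.g. DNS failure or unreachable network.
        status = "error"
    return status, time.monotonic() - start
```

For example, calling `probe_port("54.90.24.42")` several times and comparing the status and elapsed time for each slave would show whether the problem is a refused port, a slow answer, or no answer at all.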

As I am writing this, a colleague has reported a similar problem on another cluster:

org.apache.spark.shuffle.FetchFailedException: Failed to connect to ip-10-231-187-233.ec2.internal/10.231.187.233:54801

All of these problems seem to be linked.

Does anyone have an idea what it could be?



