Monday, October 12, 2015

K-Means code throws Spark exceptions when input data size grows to several GB on an AWS cluster

I tried to run the K-Means code on an AWS EMR cluster (1 master, 6 core nodes, all m3.xlarge). http://ift.tt/1LJIMBB

I am running into Spark exceptions ("timeout", "Recipient Termination" and "Java heap space"). None of these errors originate in kmeans.py itself, since I am running the built-in example code unmodified. K-Means runs fine as long as the input data is up to 3 GB, but once I give it an input file larger than 3 GB, it fails with Spark exceptions.

Command used to run the code:

./bin/spark-submit examples/src/main/python/mllib/kmeans.py k > output.txt
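The command above passes no memory options, so the job runs with whatever defaults the cluster provides. Assuming the "Java heap space" error is related to those defaults (which is not confirmed here), the following is a minimal sketch of how the same command could be assembled with explicit memory settings for comparison. `--driver-memory` and `--executor-memory` are standard spark-submit options; the helper name `build_submit_command` is hypothetical, and the `4g` values are placeholders, not recommendations for m3.xlarge nodes.

```python
import shlex


def build_submit_command(script, script_args, driver_memory="4g", executor_memory="4g"):
    """Assemble a spark-submit command line with explicit memory settings.

    The memory values are illustrative placeholders (assumptions), to be
    adjusted to what the cluster nodes can actually provide.
    """
    cmd = [
        "./bin/spark-submit",
        "--driver-memory", driver_memory,
        "--executor-memory", executor_memory,
        script,
    ]
    # Script arguments are appended unchanged after the script path.
    cmd.extend(str(arg) for arg in script_args)
    return cmd


# Same script and argument as in the command above.
cmd = build_submit_command("examples/src/main/python/mllib/kmeans.py", ["k"])

# Print a shell-safe version that can be pasted into a terminal.
print(" ".join(shlex.quote(part) for part in cmd))
```

Running the job once with the defaults and once with explicit values would at least show whether the failures depend on the memory configuration.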

Any ideas on what is happening?

Regards -Ashwin



