I have set up an AWS EMR cluster with Spark 1.4, with one master node and two slave nodes. Looking at the load distribution, it seems like one slave is always maxed out while the other one is not doing much. Has anyone faced a similar issue? What might be causing this?
Note: I am running Spark MLlib to generate recommendations. The job pulls data from Elasticsearch and does the recommendation computation in Spark. One slave is always maxed out on network usage while the other seems to use minimal resources and sits almost idle. The master is using 10 GB of network while each slave is using 1 GB.
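For reference, here is a minimal sketch of the kind of job described above, using the elasticsearch-spark connector and MLlib's ALS. The Elasticsearch endpoint, index/type name, document field names, user id, and ALS parameters are all assumptions for illustration, not details from the actual setup:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.recommendation.{ALS, Rating}
import org.elasticsearch.spark._ // adds esRDD to SparkContext

object RecommendJob {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("ESRecommendation")
      .set("es.nodes", "elasticsearch-host:9200") // hypothetical ES endpoint
    val sc = new SparkContext(conf)

    // Pull rating documents from a hypothetical "ratings/events" index/type.
    // esRDD returns (documentId, Map[String, AnyRef]) pairs.
    val docs = sc.esRDD("ratings/events")

    // Convert each document into an MLlib Rating (field names are assumptions).
    val ratings = docs.map { case (_, fields) =>
      Rating(
        fields("userId").toString.toInt,
        fields("itemId").toString.toInt,
        fields("score").toString.toDouble)
    }

    // Spread the data across all executors before the heavy computation.
    val balanced = ratings.repartition(sc.defaultParallelism).cache()

    // Train a collaborative-filtering model (rank, iterations, lambda are placeholders).
    val model = ALS.train(balanced, 10, 10, 0.01)

    // Example: top-5 recommendations for a hypothetical user id 42.
    model.recommendProducts(42, 5).foreach(println)

    sc.stop()
  }
}
```

One thing worth checking in a setup like this is the partitioning of the input: the elasticsearch-hadoop connector creates one Spark partition per Elasticsearch shard, so an index with few shards (or with most data on one shard) can leave a single executor doing the bulk of the work, which would match the skew described above. The explicit `repartition` in the sketch is one way to rebalance after the read.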