Saturday, 22 August 2015

How to kill long-running steps on my AWS EMR cluster using Java?

I have been facing an issue with my AWS EMR cluster where some jobs in a step get stuck. I know the best solution would be to fix whatever is causing the jobs to get stuck. In the meantime, is there a way I can do the following?

  1. List the current step on the cluster.
  2. List the jobs in the step with status RUNNING.
  3. Kill one of those jobs at random, then wait 3600 seconds before checking the running jobs again. At the same time, keep track of the number of jobs with PENDING status.
  4. If the number of jobs with status PENDING still has not decreased, kill another job.
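
To make the intended logic concrete, the steps above can be sketched in Java roughly as follows. This is only a minimal sketch of the decision loop, not a working EMR integration: `JobClient` is a hypothetical interface invented for illustration, and it is an assumption that it could be backed by something like the AWS SDK's `ListSteps` call for step 1 and Hadoop's `YarnClient` (or `yarn application -kill` on the master node) for steps 2 to 4.

```java
import java.util.List;
import java.util.Random;
import java.util.concurrent.TimeUnit;

/** Sketch of the kill-and-wait loop; the cluster access itself is abstracted away. */
final class StuckJobWatchdog {

    /** Hypothetical abstraction over the cluster; not part of any AWS or Hadoop API. */
    interface JobClient {
        List<String> runningJobIds();   // jobs of the current step with status RUNNING
        int pendingJobCount();          // jobs of the current step with status PENDING
        void kill(String jobId);
    }

    private final JobClient client;
    private final Random random;
    private int lastPending = -1;       // -1 means there is no earlier observation yet

    StuckJobWatchdog(JobClient client, Random random) {
        this.client = client;
        this.random = random;
    }

    /**
     * Performs one check. Kills a randomly chosen RUNNING job unless the PENDING
     * count has dropped since the previous check. Returns the id of the killed
     * job, or null if nothing was killed.
     */
    String checkOnce() {
        int pending = client.pendingJobCount();
        List<String> running = client.runningJobIds();

        boolean progressed = lastPending >= 0 && pending < lastPending;
        lastPending = pending;

        if (progressed || running.isEmpty()) {
            return null;                // the queue is draining, or nothing is running
        }
        String victim = running.get(random.nextInt(running.size()));
        client.kill(victim);
        return victim;
    }

    /** Repeats the check every intervalSeconds until no RUNNING jobs remain. */
    void run(long intervalSeconds) throws InterruptedException {
        while (!client.runningJobIds().isEmpty()) {
            checkOnce();
            TimeUnit.SECONDS.sleep(intervalSeconds);
        }
    }
}
```

With a real `JobClient` implementation, this would be started as `new StuckJobWatchdog(client, new Random()).run(3600)`. Note that the sketch follows the steps literally, so it keeps killing jobs while the PENDING count stays flat, including when that count is already zero; a guard for that case would probably be needed.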

I have been facing this issue for the last 3 days and have been manually killing the jobs by logging into the cluster. Help would be greatly appreciated! Thanks in advance!



