EMR memory errors and job failures occur on large runs but not on smaller test runs. To save time and expense, I check the job logs to confirm that the expected parameters were handed over from boto to hadoop, in
http://s3job_flow_bucket/j_123ABC/jobs/job_..._conf.xml
Question 1: Is this the correct place to look for the purpose above?
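For what it's worth, that check can be scripted rather than done by hand in the S3 console. Here is a minimal sketch using boto's S3 API; the bucket name and job flow id are the placeholders from the URL above, and the assumption that name/value pairs sit adjacent in the conf XML may need adjusting:
import re
import boto

s3 = boto.connect_s3()
bucket = s3.get_bucket('s3job_flow_bucket')        # placeholder bucket name
for key in bucket.list(prefix='j_123ABC/jobs/'):   # placeholder job flow id
    if key.name.endswith('_conf.xml'):
        conf = key.get_contents_as_string()
        for prop in ('mapred.child.java.opts',
                     'mapred.cluster.reduce.memory.mb',
                     'mapred.job.reduce.memory.mb'):
            m = re.search(r'<name>%s</name>\s*<value>([^<]+)</value>'
                          % re.escape(prop), conf)
            print '%s -> %s' % (prop, m.group(1) if m else 'NOT FOUND')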
What I observe is that some of my bootstrap action parameters are not actually applied by hadoop. Also, adding a wrong or bogus parameter doesn't generate any warning in the logs (that I can find).
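One place a bad parameter might surface is the bootstrap action's own stdout/stderr, which EMR writes under node/<instance-id>/bootstrap-actions/ in the log bucket. A sketch that dumps whatever is there, assuming the log_uri from the code below and that the layout is as described:
import boto

s3 = boto.connect_s3()
bucket = s3.get_bucket('thebucket')                     # bucket from log_uri below
for key in bucket.list(prefix='jobflowlogs/j_123ABC/node/'):   # placeholder id
    if '/bootstrap-actions/' in key.name:
        print '==== %s ====' % key.name
        print key.get_contents_as_string()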
Question 2: If I use the following bootstrap action parameters, how can I confirm that the hadoop call actually sees the request?
At this point I have to wait over an hour for the memory error to occur. There has to be a more efficient way to debug such memory issues; one candidate, sketched after the code below, is to keep the cluster alive and read the live configuration off the master node.
from boto.emr.bootstrap_action import BootstrapAction

# Each '-m' pair sets one mapred-site.xml property via the
# configure-hadoop bootstrap action.
params = ['-m', 'mapred.child.java.opts=-Xmx2g',
          '-m', 'mapred.cluster.reduce.memory.mb=2000',
          '-m', 'mapred.job.reduce.memory.mb=2000']

config_bootstrapper = BootstrapAction(
    name="Bootstrap name",
    path='s3://elasticmapreduce/bootstrap-actions/configure-hadoop',
    bootstrap_action_args=params)

# conn is an EmrConnection and step is defined earlier (not shown).
jobid = conn.run_jobflow(name='The Debug Jobflow',
                         #api_params=api_params,
                         #ec2_keyname="thekey",
                         bootstrap_actions=[config_bootstrapper],
                         ami_version='latest',
                         log_uri='s3://thebucket/jobflowlogs',
                         master_instance_type='m1.medium',
                         slave_instance_type='m1.medium',
                         num_instances=4,
                         steps=[step],
                         enable_debugging=True,
                         keep_alive=False)
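To answer Question 2 without the hour-long wait, one approach (a sketch, not something I have from the docs) is to launch with keep_alive=True so the cluster survives, poll until it leaves the bootstrap phase, then SSH to the master and read the live mapred-site.xml. boto can report the state and the SSH target; the conf path on the node is an assumption:
import time

# Reuses conn and jobid from above; masterpublicdnsname is boto's
# lowercased mapping of the API's MasterPublicDnsName field.
jf = conn.describe_jobflow(jobid)
while jf.state not in ('WAITING', 'RUNNING', 'COMPLETED', 'FAILED'):
    time.sleep(30)
    jf = conn.describe_jobflow(jobid)
print '%s %s' % (jf.state, jf.masterpublicdnsname)
# Then SSH in (requires ec2_keyname to be set above) and check:
#   grep -A1 'mapred.child.java.opts' /home/hadoop/conf/mapred-site.xml
If the value is missing there, the bootstrap action never applied it, and the memory error was going to happen regardless of the step.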