Saturday, 25 April 2015

How to start an Elastic MapReduce cluster inside an existing VPC (using boto)

I have created a VPC and then created 4 subnets, each one in a different availability zone:

10.0.1.0 - us-east-1b
10.0.2.0 - us-east-1d
10.0.3.0 - us-east-1e
10.0.4.0 - us-east-1b

Now I want to launch a MapReduce job inside that VPC.

But I can't find which parameter to use to specify that the job should be launched inside one of the subnets I created.

Do you know how I can list the available subnets in my VPC, and which parameter I can pass in my function below so that the job is launched in one of those subnets?

This is my function to start a MapReduce job:

import boto.emr
from boto.emr.step import StreamingStep

def mapreducejob(data):
    print("Connecting to EMR")
    conn = boto.emr.connect_to_region('us-east-1')

    # One streaming step: mapper and reducer scripts plus input/output locations
    print("Creating Streaming step")
    step = StreamingStep(name='My wordcount example',
                         mapper=data['mapper'],
                         reducer=data['reducer'],
                         input=data['datafile'],
                         output='s3n://folder/uploads/')

    # Start the job flow; no subnet is specified yet
    print("Creating job flow")
    jobid = conn.run_jobflow(name='My jobflow',
                             log_uri='s3://folder/uploads/erm_logs/',
                             steps=[step],
                             num_instances=1)
    return jobid
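For reference, here is a minimal sketch of one possible approach. It is not a confirmed answer. It assumes boto 2, where `run_jobflow` accepts an `api_params` dict that is passed straight through to the EMR API, and it assumes the subnet is selected with the `Instances.Ec2SubnetId` request parameter. Subnets are listed with `get_all_subnets()` from `boto.vpc` and filtered on the client side. The helper names (`subnet_api_params`, `available_subnet_ids`, `list_vpc_subnet_ids`, `run_jobflow_in_subnet`) are hypothetical and exist only for this illustration.

```python
def subnet_api_params(subnet_id):
    """Extra EMR API parameter asking for the job flow to run in a subnet.

    Assumption: the EMR API reads the subnet from 'Instances.Ec2SubnetId'.
    """
    return {'Instances.Ec2SubnetId': subnet_id}


def available_subnet_ids(subnets, vpc_id):
    """Keep the ids of subnets that belong to vpc_id and are 'available'."""
    return [s.id for s in subnets
            if s.vpc_id == vpc_id and s.state == 'available']


def list_vpc_subnet_ids(vpc_id, region='us-east-1'):
    """Ask EC2 for all subnets, then filter them down to one VPC."""
    import boto.vpc  # lazy import: the pure helpers above do not need boto
    conn = boto.vpc.connect_to_region(region)
    return available_subnet_ids(conn.get_all_subnets(), vpc_id)


def run_jobflow_in_subnet(emr_conn, subnet_id, **jobflow_kwargs):
    """Call run_jobflow with the subnet passed through api_params."""
    return emr_conn.run_jobflow(api_params=subnet_api_params(subnet_id),
                                **jobflow_kwargs)
```

Inside `mapreducejob`, the call to `conn.run_jobflow(...)` could then become `run_jobflow_in_subnet(conn, subnet_id, name='My jobflow', log_uri='s3://folder/uploads/erm_logs/', steps=[step], num_instances=1)`, where `subnet_id` is one of the ids returned by `list_vpc_subnet_ids` for the VPC in question.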



