Saturday, August 22, 2015

Iterative MapReduce jobs with Amazon Elastic MapReduce (EMR)

I have a large text file. After each MapReduce operation, I get only half of the rows.

My question is: how can I do this with Amazon EMR? I need to:
1) Run a job on a text file.
2) Concatenate all the results into a single file, which then contains only half of the rows.
3) Run the job again on the new file.
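
The three steps can be sketched locally in plain Python, with no Hadoop involved, to show the shape of the loop. The job itself is hypothetical: assuming, for illustration, that the mapper gives adjacent rows a shared key and the reducer keeps one row per key (here the larger one), every pass halves the row count. The driver then reruns the job on the previous pass's combined output until one row is left.

```python
from collections import defaultdict
from typing import List, Tuple


def mapper(index: int, row: str) -> Tuple[int, str]:
    # Hypothetical job: rows 2i and 2i+1 share key i, so each key gets at most two rows.
    return index // 2, row


def reducer(key: int, rows: List[str]) -> str:
    # Hypothetical rule: keep one row per key (the lexicographic maximum).
    return max(rows)


def run_job(rows: List[str]) -> List[str]:
    # One MapReduce pass; the dict of lists stands in for Hadoop's shuffle.
    groups = defaultdict(list)
    for i, row in enumerate(rows):
        key, value = mapper(i, row)
        groups[key].append(value)
    # Step 2: gather every reducer's output into a single "file" (a list).
    return [reducer(k, groups[k]) for k in sorted(groups)]


def iterate_until_one(rows: List[str]) -> Tuple[str, int]:
    # Step 3: rerun the job on the previous output. Expects a non-empty input.
    passes = 0
    while len(rows) > 1:
        rows = run_job(rows)
        passes += 1
    return rows[0], passes
```

For example, `run_job(["b", "d", "a", "c", "e"])` returns `["d", "c", "e"]`, and `iterate_until_one` on the same five rows finishes after three passes.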

I need to repeat this about log n times.
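
Where the log n comes from: if each pass turns m rows into half as many (assuming an odd count rounds up), the row count after k passes and the number of passes needed to reach a single row are:

```latex
m_0 = n, \qquad m_{k+1} = \left\lceil \frac{m_k}{2} \right\rceil
\;\Longrightarrow\; m_k = \left\lceil \frac{n}{2^k} \right\rceil,
\qquad m_k = 1 \iff 2^k \ge n \iff k \ge \lceil \log_2 n \rceil .
```

For n = 1,000,000 rows that is 20 passes, since 2^19 = 524,288 < 1,000,000 ≤ 2^20 = 1,048,576.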

Can I do this with Amazon EMR? It seems quite limited to me at the moment.
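
One way to chain the passes on EMR, sketched under assumptions: an EMR 4.x or later cluster (for `command-runner.jar`), Hadoop Streaming, and hypothetical S3 paths and script names (`s3://my-bucket/...`, `mapper.py`, `reducer.py`). Each pass is submitted as a separate step whose `-input` is the previous step's `-output` directory. Because Hadoop reads every part file in an input directory, the part files do not have to be concatenated by hand between passes. The boto3 call `add_job_flow_steps` submits the list; it sits in a function that is not called here, so building the steps needs no AWS credentials.

```python
from typing import Dict, List


def passes_needed(n_rows: int) -> int:
    # ceil(log2 n) computed with integers: number of halvings to reach one row.
    return (n_rows - 1).bit_length() if n_rows > 1 else 0


def build_halving_steps(passes: int, input_uri: str, work_prefix: str,
                        mapper_uri: str, reducer_uri: str) -> List[Dict]:
    """One EMR step per pass; pass k reads the output directory of pass k-1."""
    mapper = mapper_uri.rsplit("/", 1)[-1]    # name the script has once shipped with -files
    reducer = reducer_uri.rsplit("/", 1)[-1]
    steps = []
    current_input = input_uri
    for k in range(1, passes + 1):
        output = f"{work_prefix.rstrip('/')}/pass-{k:02d}/"
        steps.append({
            "Name": f"halve-pass-{k}",
            # If a pass fails, cancel the remaining passes instead of running them on missing input.
            "ActionOnFailure": "CANCEL_AND_WAIT",
            "HadoopJarStep": {
                "Jar": "command-runner.jar",
                "Args": [
                    "hadoop-streaming",
                    "-files", f"{mapper_uri},{reducer_uri}",  # generic option, must come first
                    "-mapper", mapper,
                    "-reducer", reducer,
                    "-input", current_input,
                    "-output", output,
                ],
            },
        })
        current_input = output  # the next pass consumes this pass's part files
    return steps


def submit(cluster_id: str, steps: List[Dict]) -> List[str]:
    # Requires boto3, AWS credentials and a running cluster; not called in this sketch.
    import boto3
    emr = boto3.client("emr")
    response = emr.add_job_flow_steps(JobFlowId=cluster_id, Steps=steps)
    return response["StepIds"]
```

Queuing all the steps up front is the simplest design when the number of passes is known in advance. If it is only known at run time (for example, stopping once the output is small enough), the driver could instead add one step at a time, wait for it with `describe_step`, and then decide whether to add the next.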
