Monday, 27 July 2015

Can AWS Elastic MapReduce take S3 folders as input?

I'm currently trying to run a MapReduce job whose inputs are scattered across different folders underneath a catch-all bucket in S3.

My original approach was to create a cluster for each input file and write a separate output for each one. However, that would require spinning up more than 200 clusters, and I don't think that's the most efficient way.

I was wondering whether, instead of specifying a single file as the input to EMR, I could specify a folder whose subfolders contain all of the input files.
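To make the idea concrete, here is a rough sketch of what I have in mind. The bucket name, the keys, and the helper `build_input_paths` are all made up for illustration. I'm also assuming the job's input setting will accept a comma-separated list of folder paths, the way Hadoop's FileInputFormat does; I haven't confirmed that EMR handles it this way.

```python
from posixpath import dirname


def build_input_paths(bucket, keys):
    """Collapse object keys into a comma-separated list of s3:// folder URIs.

    Hypothetical helper: each key is reduced to its parent "folder" (prefix),
    duplicates are dropped, and the folders are joined with commas.
    """
    folders = sorted({dirname(key) for key in keys})
    uris = []
    for folder in folders:
        # A key at the top level of the bucket has an empty parent folder.
        uris.append(f"s3://{bucket}/{folder}/" if folder else f"s3://{bucket}/")
    return ",".join(uris)


# Made-up bucket and keys, standing in for the real scattered inputs.
example_keys = [
    "logs/2015/07/part-0001.txt",
    "logs/2015/07/part-0002.txt",
    "exports/a/data.csv",
]
print(build_input_paths("my-catch-all-bucket", example_keys))
```

If something like this works, a single cluster could be given one input string covering every folder, rather than one cluster per file.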

Thanks!



