Monday, 24 August 2015

Understanding the S3DistCp command in Amazon Elastic MapReduce (EMR)

First of all, how do I run this command from the AWS EMR console?
Do I have to create a JAR step with this location: /home/hadoop/lib/emr-s3distcp-1.0.jar?

Second, after I run this step, my input files from S3 should be on HDFS.
However, when I want to run a MapReduce task, I can't(!) choose an input directory that is not on S3.

So what are the use cases for this command?
How do I really use HDFS instead of S3 in Amazon EMR?



