First of all, how do I run this S3DistCp command from the AWS EMR console?
Do I have to create a JAR step with this location: /home/hadoop/lib/emr-s3distcp-1.0.jar?
Second, after I run this step, my input files from S3 should be on HDFS.
However, when I want to run a MapReduce task, I can't(!) choose an input directory that is not on S3.
So what are the use cases for this command?
How do I actually use HDFS instead of S3 in Amazon EMR?
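For reference, here is a minimal sketch of the two-step pattern being asked about. It assumes the step format accepted by boto3's EMR `add_job_flow_steps` call. The first step runs the S3DistCp JAR from the path in the question with its `--src`/`--dest` options, copying S3 data into HDFS. The second step is a custom JAR step whose arguments point at an `hdfs:///` path instead of S3. The bucket names, `my-job.jar`, and the assumption that the job reads `<input> <output>` from its first two arguments are all hypothetical.

```python
from typing import Any, Dict, List

# JAR path taken from the question; newer EMR releases may ship S3DistCp elsewhere.
S3DISTCP_JAR = "/home/hadoop/lib/emr-s3distcp-1.0.jar"


def build_steps(s3_input: str, hdfs_dir: str, job_jar: str, s3_output: str) -> List[Dict[str, Any]]:
    """Return EMR step definitions: copy S3 -> HDFS, then run a job reading HDFS."""
    copy_step = {
        "Name": "Copy input from S3 to HDFS",
        "ActionOnFailure": "CANCEL_AND_WAIT",
        "HadoopJarStep": {
            "Jar": S3DISTCP_JAR,
            "Args": ["--src", s3_input, "--dest", hdfs_dir],
        },
    }
    job_step = {
        "Name": "MapReduce job reading from HDFS",
        "ActionOnFailure": "CANCEL_AND_WAIT",
        "HadoopJarStep": {
            "Jar": job_jar,
            # Hypothetical: the job's main class takes <input> <output> as arguments.
            "Args": [hdfs_dir, s3_output],
        },
    }
    return [copy_step, job_step]


if __name__ == "__main__":
    steps = build_steps(
        "s3://my-bucket/input/",
        "hdfs:///input/",
        "s3://my-bucket/jars/my-job.jar",
        "s3://my-bucket/output/",
    )
    for step in steps:
        print(step["Name"], step["HadoopJarStep"]["Args"])
    # Submitting requires boto3 and AWS credentials, e.g.:
    # import boto3
    # boto3.client("emr").add_job_flow_steps(JobFlowId="j-XXXXXXXX", Steps=steps)
```

Because a custom JAR step's arguments are plain strings passed to the program, nothing in this step definition restricts the input to S3; whether a given console form does is a separate question.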