mardi 1 septembre 2015

File Does Not Exist in Amazon EMR even though it tries to upload it

I used Amazon EMR to create an emr-4.0.0 cluster:

However, whenever I try to submit a spark application on it, it fails and gives the following error:

File does not exist: hdfs://ip-xx-xx-xxx-xx.ec2.internal:8020/user/hadoop/.sparkStaging/application_1441035668468_0001/spark-assembly-1.4.1-hadoop2.6.0-amzn-0.jar

This is even though earlier in the log it uploads this exact same file without issuing any error message:

2015-08-31 15:43:29,070 INFO [main] yarn.Client (Logging.scala:logInfo(59)) - Uploading resource file:/usr/lib/spark/lib/spark-assembly-1.4.1-hadoop2.6.0-amzn-0.jar -> hdfs://ip-xx-xx-xxx-xx.ec2.internal:8020/user/hadoop/.sparkStaging/application_1441035668468_0001/spark-assembly-1.4.1-hadoop2.6.0-amzn-0.jar

(I've verified that the source file indeed exists at /usr/lib/spark/lib/spark-assembly-1.4.1-hadoop2.6.0-amzn-0.jar on the master machine).

The command I use is:

spark-submit --deploy-mode cluster --master yarn-cluster --class com.sundaysky.ads.spark.cluster.TrackingLogsAnalysis /tmp/oz/AdsTests-1.0-SNAPSHOT.jar

BTW, I've noticed that this uses Java 1.7 (even though it's the newest EMR version by Amazon), but I don't think that is relevant.

Do you have any ideas what could be the issue, or alternatively, how to debug the problem? I've tried many way of adding parameters to the spark-submit command to get TRACE level messages from yarn-client, but without success.

Thanks, Oz




Aucun commentaire:

Enregistrer un commentaire