Sunday, April 19, 2015

Apache Nutch 1.9 in local Eclipse to run on Amazon EMR remotely

I am on 32-bit Windows 8, running Eclipse Juno.


I have just started working with Amazon EMR. So far, I am able to connect to EMR remotely from my local machine, both over SSH and from inside Eclipse. I could run my custom JAR on EMR remotely by creating an AWS project in Eclipse and using the EMR Custom JAR execution commands.


I am now trying to run Apache Nutch 1.9 from inside Eclipse. I ran the Ant build to create the Nutch Eclipse project and was able to import it into my Eclipse workspace successfully. Now, when I run the Injector, I get the following error:



Injector: starting at 2015-04-20 00:56:08
Injector: crawlDb: crawl/crawldb
Injector: urlDir: urls
Injector: Converting injected urls to crawl db entries.
Injector: java.io.IOException: Failed to set permissions of path: \tmp\hadoop-Kajari_G\mapred\staging\Kajari_G881485826\.staging to 0700


I found out this is due to a Hadoop permissions issue. After a lot of searching online, I realized this is a common problem on Windows. I ran it via Cygwin as Administrator and still couldn't fix it.


So now I still want to run the Injector code, but on my remote EMR cluster instead of on my local machine.


Can you please guide me on how to tell my Apache Nutch Eclipse project to run on Amazon EMR rather than locally? I don't want to create a JAR and run it. I want to run it through the usual Run As --> option in Eclipse.


Is this possible at all? I searched online but couldn't find a working solution.


Thanks!




