Sunday, May 3, 2015

Attach additional files to a Python streaming job in Hadoop

My question is straightforward.

I want to run a MapReduce job on EC2.

I have a mapper.py, a reducer.py, a helper.py, and a package.

Basically, mapper.py calls helper.py, and helper.py imports modules from the package (which is a bunch of Python files).

What should my command look like when I run the Hadoop job?

Should I use -file or -cacheFile? I tried both, but they don't work.
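One approach that is commonly suggested for this situation: ship the individual scripts with the generic `-files` option and the package as a zip archive with `-archives` (the archive is unpacked on each node, and the `#name` suffix creates a symlink to it in the task's working directory). This is a hedged sketch, not a verified answer; the paths, the jar location, and the package name `mypackage` are all placeholders you would substitute with your own.

```shell
# Zip the package directory so Hadoop can ship and unpack it on each node
zip -r mypackage.zip mypackage/

# Generic options (-files, -archives) must come before streaming options.
# "#mypackage" symlinks the unpacked archive into the task's working dir,
# so helper.py can do `import mypackage...` if the zip contains the package dir.
hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-*.jar \
    -files mapper.py,reducer.py,helper.py \
    -archives mypackage.zip#mypackage \
    -input /user/hadoop/input \
    -output /user/hadoop/output \
    -mapper mapper.py \
    -reducer reducer.py
```

Depending on how the zip is laid out, helper.py may also need to add the symlinked directory to `sys.path` before importing from the package. The older `-file` and `-cacheFile`/`-cacheArchive` flags exist but are deprecated in favor of the generic `-files`/`-archives` options.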



