Thursday, April 30, 2015

EMR issues with reducer.py

I'm trying to run a simulation on AWS EMR. I know my mapper.py file is correct, but I can't figure out why my reducer.py file isn't working. The idea is to sort a movies.cvs file that holds data from IMDB and find the worst 20 movies from a voting and rating perspective. All the logs show that mapper.py runs correctly but reducer.py does not. I have included the code for my reducer.py below and would love some help if possible. Thank you.

reducer.py

#! /usr/bin/env python

import sys
from operator import itemgetter

arraysize = 20
q = 0

for line in sys.stdin:
     line = line.strip()
     title,votes,rating = line.split("\t")

try: 
        results = (title, int(votes), rating)
        results_printed.append(results)
        results_printed = [('x', int(0), 'x')]
        for q in range (0,arraysize):
            print(results_printed[q])
            q = q + 1
except ValueError:  pass
sorted(results_printed, key=itemgetter('votes'))
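
A few things in the reducer above would stop it from working as posted: the try block sits outside the for loop, so only the last input line is ever processed; results_printed is appended to before it is defined and is then overwritten with a one-element list; printing index q up to arraysize runs past the end of that list; itemgetter('votes') cannot index a tuple; and the return value of sorted() is discarded. Below is a minimal sketch of one way to restructure it, not a definitive fix. It assumes each mapper line is title, votes, rating separated by tabs, that rating parses as a decimal number, and that "worst" means lowest rating with fewest votes breaking ties. The names parse_line, worst_movies and run are hypothetical helpers introduced for illustration.

```python
#!/usr/bin/env python
import heapq
import sys

ARRAY_SIZE = 20  # same role as `arraysize` in the posted reducer


def parse_line(line):
    """Return (title, votes, rating), or None if the line is malformed."""
    parts = line.rstrip("\r\n").split("\t")
    if len(parts) != 3:
        return None
    title, votes, rating = parts
    try:
        # Assumption: votes is an integer and rating is a decimal number.
        return title, int(votes), float(rating)
    except ValueError:
        return None


def worst_movies(lines, n=ARRAY_SIZE):
    """Return the n records with the lowest rating, fewest votes first on ties."""
    parsed = (parse_line(line) for line in lines)
    valid = (record for record in parsed if record is not None)
    # Assumption: "worst" is ordered by (rating, votes), both ascending.
    return heapq.nsmallest(n, valid, key=lambda record: (record[2], record[1]))


def run(stream, out):
    """Read mapper output from stream and write the worst movies to out."""
    for title, votes, rating in worst_movies(stream):
        out.write("%s\t%d\t%s\n" % (title, votes, rating))


# In reducer.py, finish the script with: run(sys.stdin, sys.stdout)
```

Malformed lines are skipped one at a time, so a single bad record does not discard the rest of the input. Note that if the job runs with more than one reducer, each reducer would emit its own 20 rows, so a single global list would need one reducer or a second pass over the combined output.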



