The map stage of my EMR streaming job-flow fails with java.io.IOException: Stream closed.
I've simplified the mapping code as far as it will go while staying compatible with the reduce step, and I'm still getting the Stream closed error. All map tasks fail, so the job-flow never reaches the reduce step.
There is no break statement involved, and the pipeline test completes without any issues. I don't know what I'm overlooking here.
#!/usr/bin/env python
import re
import csv
import sys
import json
import datetime
header = [ 'x' + str(i) for i in range(11) ]
for line in csv.reader( sys.stdin , delimiter = '|' ):
    row = { header[i]:line[i].replace( '\t' , '' ) for i in range(len(header)) } # sidestep any output issues
    print row["x0"] + '\t' + json.dumps( row , sort_keys=True , separators=(',',': ') )
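For reference, here is a minimal sketch of how the same mapping logic could be exercised locally, without Hadoop, to confirm the shape of the key/value output. The function name `map_lines` and the sample record are hypothetical and used only for illustration. The sketch checks the output format only and does not reproduce the EMR environment or the failure.

```python
import csv
import json

# Same 11 column names as the mapper above.
HEADER = ['x' + str(i) for i in range(11)]

def map_lines(lines):
    """Yield 'key<TAB>json' records, mirroring the mapper's logic.

    `lines` is any iterable of pipe-delimited strings (hypothetical helper,
    not part of the original script).
    """
    for fields in csv.reader(lines, delimiter='|'):
        # Strip tabs from values so the only tab is the key/value separator.
        row = {HEADER[i]: fields[i].replace('\t', '') for i in range(len(HEADER))}
        yield row['x0'] + '\t' + json.dumps(row, sort_keys=True, separators=(',', ': '))

# Hypothetical sample record with 11 pipe-delimited fields.
sample = ['|'.join('v' + str(i) for i in range(11))]
records = list(map_lines(sample))

for record in records:
    key, value = record.split('\t', 1)
    print(key, json.loads(value)['x10'])
```

Each emitted record has exactly one tab, separating the key (the `x0` field) from the JSON payload, which is the format a streaming reducer would read line by line.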