Tuesday, 21 April 2015

AWS-EMR: java.io.IOException: Stream closed

The map stage of my EMR streaming job flow fails with java.io.IOException: Stream closed.

I've simplified the mapper as far as it can go while staying compatible with the reduce step, and I still get the Stream closed error. Every map task fails, so the job flow never reaches the reduce step.

The mapper contains no break statement, so it does not stop reading its input early, and the pipeline test completes without any issues. I don't know what I'm overlooking here.

#!/usr/bin/env python

import re
import csv
import sys
import json
import datetime

header = ['x' + str(i) for i in range(11)]

for line in csv.reader(sys.stdin, delimiter='|'):

    row = {header[i]: line[i].replace('\t', '') for i in range(len(header))}  # sidestep any output issues
    print row["x0"] + '\t' + json.dumps(row, sort_keys=True, separators=(',', ': '))
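In Hadoop streaming, a Stream closed error on the map side typically means the mapper process exited before it had consumed all of its input, so one way to narrow this down is to make the mapper harder to kill and more talkative on stderr. What follows is only a sketch under assumptions the post does not confirm: that some input rows may have fewer than 11 fields, and that the interpreter on the cluster may be older than the one used for the pipeline test. The names map_lines, NUM_FIELDS and HEADER are hypothetical and introduced here for illustration.

```python
import csv
import json

NUM_FIELDS = 11  # same width as the header in the mapper above
HEADER = ['x' + str(i) for i in range(NUM_FIELDS)]


def map_lines(lines, out, err):
    """Write one key<TAB>json record per well-formed row to out.

    Rows with too few fields are reported on err and skipped, so a single
    bad row cannot raise IndexError and end the process early.
    """
    for fields in csv.reader(lines, delimiter='|'):
        if len(fields) < NUM_FIELDS:
            err.write('skipping row with %d fields\n' % len(fields))
            continue
        # dict() over pairs instead of a dict comprehension, so the file
        # also parses on interpreters older than Python 2.7 (assumption:
        # the cluster might run one).
        row = dict((HEADER[i], fields[i].replace('\t', ''))
                   for i in range(NUM_FIELDS))
        payload = json.dumps(row, sort_keys=True, separators=(',', ': '))
        out.write(row['x0'] + '\t' + payload + '\n')
```

To use it as the mapper, add import sys and end the script with map_lines(sys.stdin, sys.stdout, sys.stderr). Anything written to stderr should then show up in the stderr log of the failed task attempt, which is usually the quickest place to see why the process died.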



