I have a Spark (1.3.1) application in Python, running on YARN on EMR clusters, using S3-compatible storage.
My application transforms a CSV file into an RDD and performs regex transformations (ETL). We need to build a line-level logging solution to capture errors and identify the source of each problem (record and column). I have no idea how to approach this.
def lineMap(column):
    # Keep only the second and third fields of each record
    return (
        column[1],
        column[2]
    )

fileContent = sc.textFile(s3FilePathInput)
RDDcru = (fileContent
    .map(lambda x: x.split(";"))
    .map(lineMap)
)
I tried wrapping the body of the lineMap function in a try/except block, using Python's standard logging library. I also tried creating a new SparkContext inside the except block to write a log file to S3.
Both approaches failed.
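For reference, here is roughly the kind of try/except wrapper I attempted. The "OK"/"ERROR" tagging scheme is just an illustration (not what Spark or any library prescribes): instead of logging from inside the map, each record is tagged so that bad rows survive the transformation and can be filtered out and inspected later. Shown on plain Python lists so it is runnable standalone; in the job it would be plugged into the .map(lineMap) call above.

```python
def lineMap(column):
    # Tag each record instead of raising, so one malformed line
    # does not kill the whole Spark job.
    try:
        return ("OK", (column[1], column[2]))
    except IndexError as e:
        # Keep the raw record so the source problem (record and
        # column) can be identified afterwards.
        return ("ERROR", (str(e), column))

# Standalone demonstration on plain lists (no SparkContext needed):
records = [["a", "b", "c"], ["only-one-field"]]
mapped = [lineMap(r) for r in records]

good = [value for tag, value in mapped if tag == "OK"]
bad = [value for tag, value in mapped if tag == "ERROR"]
```

With an RDD, the same idea would give two derived RDDs (one filtered on "OK", one on "ERROR"), and the error RDD could be saved to S3 with saveAsTextFile instead of logging from inside the executors.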
Thanks, and sorry for my bad English :)