Friday, July 24, 2015

Spark Application Log Solution

I have a Spark (1.3.1) application in Python, running on YARN on EMR clusters and using S3-like storage.
My application transforms a CSV file into an RDD and performs regex transformations (ETL). We need to build a line-level logging solution to capture errors and identify the source of each problem (record and column). I don't have any idea how to do this.


def lineMap(column):
    # Keep only the second and third fields of the split record
    return (
        column[1],
        column[2]
    )

fileContent = sc.textFile(s3FilePathInput)

RDDcru = (fileContent
          .map(lambda x: x.split(";"))  # split each CSV line on ";"
          .map(lineMap))


I've tried wrapping the body of the lineMap function in a try/except block, using Python's standard logging library. I also tried creating a new SparkContext and writing a log file to S3 from inside the except block.

All of these attempts failed ...
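One pattern that often works here (a sketch, not the only approach): logging from inside map() runs on the executors, so the log output never reaches the driver's filesystem, and a second SparkContext cannot be created on a worker. Instead, tag each record's parse result as "ok" or "error" and split the results into two RDDs; the error RDD can then be saved to S3 as the error log. The names safeLineMap, s3ErrorPath, and the "ok"/"error" tags below are illustrative assumptions, not from the original post.

```python
# Sketch: per-record error capture without executor-side logging.
# Each record is tagged ("ok", payload) or ("error", details) so that
# failures can be filtered out and persisted, instead of being logged
# from inside a worker process.

def safeLineMap(indexed_line):
    """Parse one CSV line; return ("ok", payload) or ("error", details)."""
    line_no, line = indexed_line
    columns = line.split(";")
    try:
        return ("ok", (columns[1], columns[2]))
    except IndexError as e:
        # Capture the record number and the offending content so the
        # source of the problem (record and column) can be identified.
        return ("error", (line_no, line, str(e)))

# With Spark this would look roughly like:
#   tagged = fileContent.zipWithIndex().map(lambda t: safeLineMap((t[1], t[0])))
#   good   = tagged.filter(lambda t: t[0] == "ok").map(lambda t: t[1])
#   errors = tagged.filter(lambda t: t[0] == "error").map(lambda t: t[1])
#   errors.saveAsTextFile(s3ErrorPath)  # one error "log" per run on S3
#
# The same logic, demonstrated locally with plain Python:
lines = ["a;b;c", "only-one-column", "x;y;z"]
tagged = [safeLineMap(t) for t in enumerate(lines)]
good = [payload for tag, payload in tagged if tag == "ok"]
errors = [payload for tag, payload in tagged if tag == "error"]
```

With this layout the good records continue through the ETL pipeline unchanged, while the error RDD carries enough context (record number and raw line) to trace every failure back to its source.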

Thanks, and sorry for my bad English :)



