Sunday, August 23, 2015

Spark - set null when a column does not exist in a DataFrame

I'm loading many versions of files into a Spark DataFrame. Some of the files contain columns A,B and others A,B,C or A,C.

If I run this command:

from pyspark.sql import SQLContext

sqlContext = SQLContext(sc)

df = sqlContext.sql("SELECT A,B,C FROM table")

After loading several files I can get a "column does not exist" error if I loaded only files that do not contain column C. How can I set this value to null instead of getting an error?

Thanks!
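
One possible approach, sketched here under assumptions rather than as a confirmed answer: check which columns the loaded table actually has, and build the SELECT so that each missing column is replaced by a typed NULL. The helper name `null_safe_select` and its `null_type` parameter are hypothetical, introduced only for illustration. The string type for the placeholder is also an assumption, so pick whatever type column C really has.

```python
def null_safe_select(existing_columns, wanted_columns, table_name, null_type="STRING"):
    """Build a SELECT that substitutes a typed NULL for each missing column.

    Hypothetical helper: the name and signature are not part of Spark.
    """
    existing = set(existing_columns)
    parts = []
    for name in wanted_columns:
        if name in existing:
            # Column is present in the loaded data: select it as-is.
            parts.append(name)
        else:
            # Column is absent: emit a NULL under the same name.
            parts.append("CAST(NULL AS {0}) AS {1}".format(null_type, name))
    return "SELECT {0} FROM {1}".format(", ".join(parts), table_name)


# Pure-Python check of the generated query, no Spark needed:
query = null_safe_select(["A", "B"], ["A", "B", "C"], "table")
print(query)

# With Spark (assumes `sqlContext` exists and "table" is registered):
# existing = sqlContext.table("table").columns
# df = sqlContext.sql(null_safe_select(existing, ["A", "B", "C"], "table"))
```

The comparison is case-sensitive, so the names in `wanted_columns` must match the names in the DataFrame exactly.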



