lundi 29 juin 2015

Loading JSON data to AWS Redshift results in NULL values

I am trying to perform a load/copy operation to import data from JSON files in an S3 bucket directly to Redshift. The COPY operation succeeds, and after the COPY, the table has the correct number of rows/records, but every record is NULL !

It takes the expected amount of time for the load, the COPY command returns OK, the Redshift console reports successful and no errors... but if I perform a simple query from the table, it returns only NULL values.

The JSON is very simple + flat, and formatted correctly (according to examples I found here: http://ift.tt/1U2hyaN)

Basically, it is one row per line, formatted like:

{ "col1": "val1", "col2": "val2", ... }
{ "col1": "val1", "col2": "val2", ... }
{ "col1": "val1", "col2": "val2", ... }

I have tried things like rewriting the schema based on values and data types found in the JSON objects and also copying from uncompressed files. I thought perhaps the JSON was not being parsed correctly upon load, but it should presumably raise an error if the objects cannot be parsed.

My COPY command looks like this:

copy events from 's3://mybucket/json/prefix' 
with credentials 'aws_access_key_id=xxx;aws_secret_access_key=xxx'
json 'auto' gzip;

Any guidance would be appreciated! Thanks.




Aucun commentaire:

Enregistrer un commentaire