Monday, 28 September 2015

AWS Data Pipeline doesn't use DynamoDB's indexes

I have an AWS Data Pipeline that runs every hour and uses a HiveCopyActivity to copy the past hour's data from DynamoDB into S3. The source table has a hash key VisitorID and a range key Timestamp, holds around 4 million rows, and is 7.5 GB in size. To shorten the job, I created a global secondary index on Timestamp, but the CloudWatch metrics suggest that HiveCopyActivity doesn't use the index. I've read through all the relevant AWS documentation and can't find any mention of indexes.
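To make the hourly filter concrete, here is a minimal Python sketch of the time window the job selects, expressed as the parameter dict for a low-level DynamoDB Scan request. It is only an illustration and not what HiveCopyActivity does internally. The table name `Visits`, the index name `Timestamp-index`, and the assumption that Timestamp is stored as a number of epoch seconds are all hypothetical; the question doesn't state them.

```python
from datetime import datetime, timedelta, timezone


def previous_hour_window(now):
    """Return (start, end) in epoch seconds for the last full hour before `now`."""
    end = now.replace(minute=0, second=0, microsecond=0)
    start = end - timedelta(hours=1)
    return int(start.timestamp()), int(end.timestamp())


def build_scan_params(table_name, now, index_name=None):
    """Build Scan parameters that keep only items from the previous full hour.

    Assumption: Timestamp is a numeric attribute holding epoch seconds.
    """
    start, end = previous_hour_window(now)
    params = {
        "TableName": table_name,
        # TIMESTAMP is a DynamoDB reserved word, so it is aliased as #ts.
        "FilterExpression": "#ts >= :start AND #ts < :end",
        "ExpressionAttributeNames": {"#ts": "Timestamp"},
        "ExpressionAttributeValues": {
            ":start": {"N": str(start)},
            ":end": {"N": str(end)},
        },
    }
    if index_name is not None:
        # Hypothetical index name; a Scan can target a secondary index.
        params["IndexName"] = index_name
    return params


if __name__ == "__main__":
    now = datetime(2015, 9, 28, 14, 30, tzinfo=timezone.utc)
    print(build_scan_params("Visits", now, index_name="Timestamp-index"))
```

A dict like this could be passed to a boto3 DynamoDB client as `client.scan(**params)`. A filter expression is applied after items are read, so a filtered Scan still reads every item in the table or index it targets.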

Is there a way to force Data Pipeline to use an index when filtering like this? If not, are there any alternative tools that could transfer hourly (or any other period's) data from DynamoDB to S3?



