Monday, June 1, 2015

Best Amazon technologies for an ETL application

My system will be as follows:

There will be a Java service running in Mule that continuously polls several MySQL data sources for new data. The objective is to make that data available to our web clients as soon as possible. We currently use a Spark cluster to make that data available (the data must be queryable).

My investigation led me to the following pipeline: Java polling service --> Amazon Kinesis --> Redshift <-- Spark cluster.
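
To make the polling stage concrete, here is a minimal sketch of how I picture it. It assumes each source table has a monotonically increasing id column, which is my assumption and may not hold for every data source. The names `IncrementalPoller`, `Row`, and `pollOnce` are hypothetical, the in-memory list stands in for a MySQL query, and the `Consumer` sink stands in for whatever would write to Kinesis.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class IncrementalPoller {

    /** One source row; the field names are hypothetical. */
    public static final class Row {
        public final long id;
        public final String payload;

        public Row(long id, String payload) {
            this.id = id;
            this.payload = payload;
        }
    }

    /**
     * Forwards every row whose id is greater than lastSeenId to the sink
     * and returns the new high-water mark to use on the next poll.
     */
    public static long pollOnce(List<Row> source, long lastSeenId, Consumer<Row> sink) {
        long highWaterMark = lastSeenId;
        for (Row row : source) {
            if (row.id > lastSeenId) {
                sink.accept(row); // in the real service this would publish to the stream
                highWaterMark = Math.max(highWaterMark, row.id);
            }
        }
        return highWaterMark;
    }

    public static void main(String[] args) {
        // In-memory stand-in for a MySQL table.
        List<Row> table = new ArrayList<>();
        table.add(new Row(1, "a"));
        table.add(new Row(2, "b"));

        List<Row> sent = new ArrayList<>();
        long mark = pollOnce(table, 0, sent::add); // first poll forwards rows 1 and 2
        table.add(new Row(3, "c"));
        mark = pollOnce(table, mark, sent::add);   // second poll forwards only row 3
        System.out.println(sent.size() + " rows forwarded, mark=" + mark);
    }
}
```

Keeping the sink behind a plain `Consumer` means the same polling loop would work whether the downstream is Kinesis, SQS, or files dropped in S3, which is part of what I am trying to decide.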

Is this the best Amazon solution? What about Amazon SQS, S3, EMR, etc.?



