Tuesday, March 3, 2015

How to move lots of data from an AWS Oracle RDS instance to S3?

I have a very large table (~7 billion rows) in an AWS Oracle RDS instance. I want to end up with that table as pipe-separated values in S3 so that I can read it into EMR. This is basically a one-time job, so it needs to be accurate, and I don't want to re-run the whole upload because something timed out. I don't care how it works or how difficult or annoying it is to set up. I have root access on the Oracle box. I looked at Data Pipelines, but it appears to support only MySQL, and I need this to work with Oracle. Also, I don't have enough disk space to dump the whole table to a CSV on the Oracle instance and then upload it. How can I get this done?
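To make the constraints concrete (no full local dump, no full re-run after a timeout), here is a minimal sketch of a chunked, resumable export. It is not a tested solution. It assumes the table has a numeric key column that can be range-scanned, which the description above does not confirm. The names `key_ranges`, `to_pipe_separated`, `export_table`, and the `export/part-...` object keys are hypothetical. The callables `fetch_rows`, `upload`, and `already_done` are placeholders for an Oracle query by key range, an S3 upload, and an S3 existence check.

```python
import csv
import io
from typing import Callable, Iterable, Iterator, List, Sequence, Tuple


def key_ranges(min_id: int, max_id: int, chunk_size: int) -> Iterator[Tuple[int, int]]:
    """Yield inclusive (low, high) key ranges that cover min_id..max_id."""
    low = min_id
    while low <= max_id:
        high = min(low + chunk_size - 1, max_id)
        yield low, high
        low = high + 1


def to_pipe_separated(rows: Iterable[Sequence[object]]) -> bytes:
    """Serialize rows as pipe-separated lines; NULLs become empty fields."""
    buf = io.StringIO()
    # csv.writer quotes any field that itself contains a pipe or a newline.
    writer = csv.writer(buf, delimiter="|", lineterminator="\n")
    for row in rows:
        writer.writerow(["" if value is None else value for value in row])
    return buf.getvalue().encode("utf-8")


def export_table(
    min_id: int,
    max_id: int,
    chunk_size: int,
    fetch_rows: Callable[[int, int], Iterable[Sequence[object]]],  # assumed: Oracle query by key range
    upload: Callable[[str, bytes], None],                          # assumed: S3 put
    already_done: Callable[[str], bool],                           # assumed: S3 existence check
) -> List[str]:
    """Export one object per key range, skipping ranges that already exist."""
    uploaded = []
    for low, high in key_ranges(min_id, max_id, chunk_size):
        key = f"export/part-{low:012d}-{high:012d}.psv"  # hypothetical naming scheme
        if already_done(key):
            continue  # a restart resumes here instead of redoing finished chunks
        upload(key, to_pipe_separated(fetch_rows(low, high)))
        uploaded.append(key)
    return uploaded
```

Each chunk is held only in memory, so `chunk_size` bounds the footprint and nothing is written to the Oracle host's disk. Because every chunk maps to a deterministic object name, a run that times out can be restarted and will only export the ranges that are still missing.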




