I have a very large table (~7 billion rows) in an AWS Oracle RDS instance. I want to end up with that table as pipe-separated values stored in S3 so that I can read it into EMR. This is basically a one-time job, so it needs to work accurately and without my having to re-run the whole upload because something timed out; I don't really care how it works or how difficult or annoying it is to set up. I have root access on the Oracle box. I looked at Data Pipelines, but it appears to support only MySQL, and I need this to work with Oracle. Also, I do not have enough hard drive space to dump the whole table to a CSV on the Oracle instance and then upload it. How can I get this done?
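To make the constraints concrete (no full local dump, no full re-run after a timeout), here is a minimal sketch of one possible approach, not a tested solution: split the table into primary-key ranges, write each range to its own S3 object, and skip ranges that already exist so a restart resumes where it stopped. Everything named here is an assumption for illustration: the table is assumed to have a numeric key, and `fetch_rows`, `object_exists`, `put_object`, and the `export/part-...` key layout are hypothetical stand-ins for an Oracle range query and S3 client calls.

```python
import csv
import io
from typing import Callable, Iterable, Iterator, Sequence, Tuple


def key_ranges(lo: int, hi: int, step: int) -> Iterator[Tuple[int, int]]:
    """Yield inclusive (start, end) key ranges covering lo..hi, `step` keys each."""
    start = lo
    while start <= hi:
        end = min(start + step - 1, hi)
        yield start, end
        start = end + 1


def to_pipe_separated(rows: Iterable[Sequence[object]]) -> bytes:
    """Serialize rows as pipe-separated lines; NULLs become empty fields.

    Fields containing '|', quotes, or newlines are quoted by the csv module.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="|", lineterminator="\n")
    for row in rows:
        writer.writerow(["" if value is None else value for value in row])
    return buf.getvalue().encode("utf-8")


def export_table(
    lo: int,
    hi: int,
    step: int,
    fetch_rows: Callable[[int, int], Iterable[Sequence[object]]],  # hypothetical: range query against Oracle
    object_exists: Callable[[str], bool],                          # hypothetical: S3 existence check
    put_object: Callable[[str, bytes], None],                      # hypothetical: S3 upload
) -> int:
    """Upload one object per key range, skipping ranges already uploaded.

    Returns the number of objects uploaded during this run.
    """
    uploaded = 0
    for start, end in key_ranges(lo, hi, step):
        key = f"export/part-{start:012d}-{end:012d}.psv"  # assumed naming scheme
        if object_exists(key):
            continue  # finished in an earlier run, so a restart resumes here
        put_object(key, to_pipe_separated(fetch_rows(start, end)))
        uploaded += 1
    return uploaded
```

Only one chunk is held in memory at a time, so nothing is written to the database host's disk, and a failure costs at most one chunk of rework. The chunk size, the choice of key column, and where the script runs (for example, a separate machine with access to both the database and S3) are left open.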