Thursday, May 28, 2015

Linux: huge files vs huge number of files

I am writing software in C, on Linux running on AWS, that has to handle 240 terabytes of data in 72 million files.

The data will be spread across 24 or more nodes, so there will only be 10 terabytes on each node, and 3 million files per node.

Because I have to append data to each of these 3 million files every 60 seconds, the easiest and fastest approach would be to keep all of them open at the same time.
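To get a feel for whether that is realistic, here is a minimal sketch of the keep-everything-open approach, assuming the per-process descriptor limit can be raised that far (which needs root or CAP_SYS_RESOURCE, plus large enough fs.nr_open and fs.file-max sysctls). The NFILES count, the stream_%07lu.dat naming scheme, and the sample record are made up for illustration.

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/resource.h>
#include <unistd.h>

#define NFILES 3000000UL   /* assumed per-node file count */

int main(void)
{
    /* Ask for enough descriptors; raising the hard limit this far needs
     * root or CAP_SYS_RESOURCE, and fs.nr_open / fs.file-max must allow it. */
    struct rlimit rl = { .rlim_cur = NFILES + 1024, .rlim_max = NFILES + 1024 };
    if (setrlimit(RLIMIT_NOFILE, &rl) != 0) {
        perror("setrlimit(RLIMIT_NOFILE)");
        return 1;
    }

    int *fds = malloc(NFILES * sizeof *fds);
    if (fds == NULL) {
        perror("malloc");
        return 1;
    }

    char path[64];
    for (unsigned long i = 0; i < NFILES; i++) {
        snprintf(path, sizeof path, "stream_%07lu.dat", i);
        /* O_APPEND: every write() lands at the current end of the file. */
        fds[i] = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
        if (fds[i] < 0) {
            fprintf(stderr, "open() failed at file %lu: %s\n", i, strerror(errno));
            return 1;
        }
    }

    /* Once per minute, append one record to every open descriptor. */
    const char record[] = "sample-record\n";
    for (unsigned long i = 0; i < NFILES; i++) {
        if (write(fds[i], record, sizeof record - 1) < 0)
            perror("write");
    }

    return 0;
}

The point of O_APPEND here is that the periodic writes need no seek bookkeeping at all: the kernel appends each write at the current end of its file.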

I can't store the data in a database, because read/write performance would be too slow, and I need to be able to read the data back very quickly.

My questions:

1) Is it even possible to keep 3 million files open at once?

2) If it is possible, how much memory would that consume?

3) If it is possible, would performance be terrible?

4) If it is not possible, I will need to combine all of the individual files into a couple of dozen large files. Is there a maximum file size on Linux?

5) If it is not possible, what technique should I use to append data every 60 seconds and keep track of it? (One possible technique is sketched after this list.)
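If keeping millions of descriptors open turns out to be impossible or too slow, one possible technique for question 5 (a sketch under stated assumptions, not a definitive answer) is a log-structured layout: append every record to one large per-node file with a small per-record header, and keep a stream-id to offset index so reads can seek straight to the data. The rec_hdr layout, the node.log filename, and the single-writer assumption below are all illustrative.

#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/uio.h>
#include <unistd.h>

/* Per-record header so an index can locate records later.  The layout
 * (and the single shared "node.log" file) are illustrative assumptions. */
struct rec_hdr {
    uint32_t stream_id;   /* which of the ~3 million logical files */
    uint32_t length;      /* payload bytes that follow the header */
};

/* Append one record and return the offset of its header, or -1 on error.
 * With O_APPEND the write always lands at the end of the file, and this
 * descriptor's position afterwards is the new end of file (single writer
 * assumed), so offset of record = current position - bytes written. */
static off_t append_record(int fd, uint32_t stream_id,
                           const void *payload, uint32_t length)
{
    struct rec_hdr hdr = { stream_id, length };
    struct iovec iov[2] = {
        { &hdr, sizeof hdr },
        { (void *)payload, length },
    };
    ssize_t n = writev(fd, iov, 2);
    if (n != (ssize_t)(sizeof hdr + length))
        return -1;
    return lseek(fd, 0, SEEK_CUR) - n;
}

int main(void)
{
    int fd = open("node.log", O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd < 0) { perror("open"); return 1; }

    const char payload[] = "sample-record";
    off_t off = append_record(fd, 42, payload, sizeof payload - 1);
    if (off < 0) { perror("append_record"); return 1; }

    /* A real system would record (stream_id -> offset) in an in-memory
     * index or side file, so reads can pread() the record directly. */
    printf("stream 42: record at offset %lld\n", (long long)off);
    return 0;
}

With this layout, a minute's worth of appends becomes a sequence of writev() calls into a single file regardless of how many logical streams exist, and "keeping track of it" reduces to maintaining the stream-id to offset index in memory or in a small side file.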

