Abstract
I have my processing done using two console applications (Stage-estimate, Stage-step), each application processes files on disk, files are organized into folders. Each folder represents one step of processing which is considered completed when all files are estimated.
As an example lets consider that we are at Step 0 and the folder 0 contains the following files:
Folder 0 contains:
000.data
001.data
002.data
...
999.data
We have the data files, now we need to estimate them, we run Stage-estimate application 1000 times that result with the following directory structure:
Folder 0 contains:
000.data
000.estimate
001.data
001.estimate
002.data
002.estimate
...
999.data
999.estimate
Step 0 is now complete we have all the data/estimate pairs. In order to switch to Step 1 we run Stage-step application 1000 times on every data/estimate pair files and it results with new set of 1000 *.data files into folder 1. After Stage-step application completed, we have a folder 1 with the same structure as we had on Step 0:
Folder 1 contains:
000.data
001.data
002.data
...
999.data
From now on the process repeats until it is canceled.
The Problem
Application Stage-estimate does some pretty heavy calculations it consumes 99% of overall processing power compared to Stage-step application.
I was planing to use AWS in order to speed the things up. I don't want to start inventing special batch files that would call my applications the way described above, I know that there is special software that does some high-lifting at scheduling processes and other cluster related stuff.
Question
I was never dealing with cluster computing, off top of my head I see that application is parallelized really nice and it fits into AWS infrastructure. On the other hand I'm complete newbie in the world of cluster-computing and I don't know where to start from. I was dealing with AWS however nothing related to cluster computing, I don't know how to organize the flow I've described and how to make it run efficiently, so I would appreciate if you point me in right direction or provide some links on demos / best practices.
Thank you in advance!
Aucun commentaire:
Enregistrer un commentaire