Wednesday, September 2, 2015

Performance with S3 streaming in Java

I am streaming data from S3 and processing it on an EC2 instance. The download speed, as measured with nload, averages 450-500 Mbit/s. I am trying to benchmark my solution, and I am not sure whether the bottleneck is downloading the file or reading from the stream. The objects in S3 are gzip-compressed. Here is the reading code:

// s3Object is the S3Object returned by AmazonS3Client.getObject()
// try-with-resources closes the reader and the underlying S3 stream
try (BufferedReader reader = new BufferedReader(new InputStreamReader(
        new GZIPInputStream(s3Object.getObjectContent())))) {
    String line;
    while ((line = reader.readLine()) != null) {
        // Process the line
    }
}

I profiled the application with VisualVM, and here is what I get for an overall execution time of 701,890 ms:

Method                                                     Time (ms)
java.io.BufferedReader.readLine()                            561,825
   SelfTime                                                  552,825
   java.io.BufferedReader.readLine()                           8,775

com.amazonaws.services.s3.AmazonS3Client.getObject()          12,485

The rest of the time (701,890 - 561,825 - 12,485 = 127,580 ms) is spent on object creation and other processing.

This SelfTime does not tell me whether the time goes to gzip decompression, to the download, or to something else.
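One way to split that number might be to wrap the raw stream in a timing filter before handing it to GZIPInputStream. The sketch below is only an illustration: TimingInputStream and StreamTimingDemo are hypothetical names, and an in-memory gzip buffer stands in for s3Object.getObjectContent(). Time spent inside read() on the raw stream approximates waiting on the source (the network, in the S3 case). The remainder of the total is roughly decompression, character decoding, and line splitting.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical helper: accumulates wall-clock time spent in read() on the wrapped stream. */
class TimingInputStream extends FilterInputStream {
    private long nanosInRead = 0;
    private long bytesRead = 0;

    TimingInputStream(InputStream in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        long start = System.nanoTime();
        try {
            int b = super.read();
            if (b >= 0) bytesRead++;
            return b;
        } finally {
            nanosInRead += System.nanoTime() - start;
        }
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        long start = System.nanoTime();
        try {
            int n = super.read(buf, off, len);
            if (n > 0) bytesRead += n;
            return n;
        } finally {
            nanosInRead += System.nanoTime() - start;
        }
    }

    long nanosInRead() { return nanosInRead; }
    long bytesRead() { return bytesRead; }
}

class StreamTimingDemo {
    /** Gzips a string in memory; a stand-in for a compressed S3 object. */
    static byte[] gzip(String text) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    /** Returns {lines, compressedBytesRead, nanosInRawRead, totalNanos}. */
    static long[] measure(InputStream raw) {
        TimingInputStream timed = new TimingInputStream(raw);
        long start = System.nanoTime();
        long lines = 0;
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(timed), StandardCharsets.UTF_8))) {
            while (reader.readLine() != null) {
                lines++; // real per-line processing would go here
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        long total = System.nanoTime() - start;
        return new long[] {lines, timed.bytesRead(), timed.nanosInRead(), total};
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 100_000; i++) {
            sb.append("line ").append(i).append('\n');
        }
        long[] r = measure(new ByteArrayInputStream(gzip(sb.toString())));
        System.out.printf("lines=%d compressedBytes=%d rawReadMs=%.1f totalMs=%.1f%n",
                r[0], r[1], r[2] / 1e6, r[3] / 1e6);
    }
}
```

In the real application, measure(s3Object.getObjectContent()) would take the place of the in-memory buffer. If the raw read time dominates, the download is the likely bottleneck; if it is small compared to the total, the time is going to decompression and parsing instead.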

Does anyone have benchmarking results for processing data from S3? Could streaming the data take longer than downloading the objects first and then processing them from disk?
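To compare the two approaches directly, something like the following sketch could time the download-then-process variant separately. It rests on assumptions: DownloadThenProcess is a hypothetical class name, counting lines stands in for the real processing, and any InputStream (such as the one from s3Object.getObjectContent()) can be passed in. It copies the compressed bytes to a temporary file first, then decompresses and reads from disk, reporting the two phases independently.

```java
import java.io.BufferedInputStream;
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class DownloadThenProcess {
    /** Gzips a string in memory; a stand-in for a compressed S3 object. */
    static byte[] gzip(String text) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    /** Counts lines in a gzip stream; stands in for the real per-line processing. */
    static long countLines(InputStream gzipped) throws IOException {
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(gzipped), StandardCharsets.UTF_8))) {
            long n = 0;
            while (reader.readLine() != null) {
                n++;
            }
            return n;
        }
    }

    /** Returns {lines, downloadNanos, processNanos}. */
    static long[] downloadThenProcess(InputStream raw) {
        try {
            Path tmp = Files.createTempFile("s3-object-", ".gz");
            try {
                long t0 = System.nanoTime();
                // Phase 1: pull the compressed bytes to local disk without decompressing.
                Files.copy(raw, tmp, StandardCopyOption.REPLACE_EXISTING);
                long t1 = System.nanoTime();
                // Phase 2: decompress and process from the local file.
                long lines;
                try (InputStream fileIn = new BufferedInputStream(Files.newInputStream(tmp))) {
                    lines = countLines(fileIn);
                }
                long t2 = System.nanoTime();
                return new long[] {lines, t1 - t0, t2 - t1};
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = gzip("first\nsecond\nthird\n");
        long[] r = downloadThenProcess(new ByteArrayInputStream(data));
        System.out.printf("lines=%d downloadMs=%.2f processMs=%.2f%n",
                r[0], r[1] / 1e6, r[2] / 1e6);
    }
}
```

Running both variants against the same object would show whether the two-phase total is actually lower than the streaming total. Note that the disk copy adds its own I/O cost, so the outcome depends on the instance's storage.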



