I am streaming data from S3 and processing it on an EC2 instance. The download speed, measured with nload, averages 450-500 Mbit/s. I am trying to benchmark my solution, and I am not sure whether the bottleneck is downloading the object or reading from the stream. The objects in S3 are gzip-compressed. Here is the code I use to read them:
GZIPInputStream gzipInputStream = new GZIPInputStream(s3Object.getObjectContent());
BufferedReader reader = new BufferedReader(new InputStreamReader(gzipInputStream));
String line;
while ((line = reader.readLine()) != null) {
//Process
}
I profiled the application with VisualVM. Here is what I get for an overall execution time of 701,890 ms:
java.io.BufferedReader.readline() 561,825
SelfTime 552,825
java.io.BufferedReader.readline() 8,775
com.amazonaws.services.s3.AmazonS3Client.getObject() 12,485
The rest of the time (701,890 - 561,825 - 12,485 = 127,580 ms) is spent on object creation and other processing.
This self time doesn't tell me whether it is gzip decompression time, download time, or something else.
Does anyone have benchmark results for processing data from S3? Could streaming the data take longer than downloading it first and then processing it from disk?
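One way I could split that self time is to wrap the raw stream in a timing decorator before handing it to GZIPInputStream. This is only a minimal sketch: `StreamTiming`, `TimingInputStream`, `Result`, and `measure` are hypothetical names of my own, not part of the AWS SDK, the UTF-8 charset is an assumption, and the in-memory gzip data in `main` stands in for `s3Object.getObjectContent()`. Time accumulated inside the raw `read()` calls roughly corresponds to waiting on the download; the remainder of the total is roughly decompression, character decoding, and line splitting.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class StreamTiming {

    /** Hypothetical helper: accumulates wall-clock time spent in read() calls on the wrapped stream. */
    static class TimingInputStream extends FilterInputStream {
        private long nanos;
        private long bytes;

        TimingInputStream(InputStream in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            long start = System.nanoTime();
            try {
                int b = super.read();
                if (b >= 0) bytes++;
                return b;
            } finally {
                nanos += System.nanoTime() - start;
            }
        }

        @Override
        public int read(byte[] buf, int off, int len) throws IOException {
            long start = System.nanoTime();
            try {
                int n = super.read(buf, off, len);
                if (n > 0) bytes += n;
                return n;
            } finally {
                nanos += System.nanoTime() - start;
            }
        }

        long nanos() { return nanos; }
        long bytes() { return bytes; }
    }

    /** Simple holder for the measurements. */
    static final class Result {
        final long lines, rawReadNanos, totalNanos, compressedBytes;

        Result(long lines, long rawReadNanos, long totalNanos, long compressedBytes) {
            this.lines = lines;
            this.rawReadNanos = rawReadNanos;
            this.totalNanos = totalNanos;
            this.compressedBytes = compressedBytes;
        }
    }

    /** Reads every line from a gzip stream and reports raw-read time versus total time. */
    static Result measure(InputStream raw) throws IOException {
        TimingInputStream timed = new TimingInputStream(raw);
        long start = System.nanoTime();
        long lines = 0;
        // UTF-8 is an assumption; the original code relies on the platform default charset.
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new GZIPInputStream(timed), StandardCharsets.UTF_8))) {
            while (reader.readLine() != null) {
                lines++; // per-line processing would go here
            }
        }
        long total = System.nanoTime() - start;
        return new Result(lines, timed.nanos(), total, timed.bytes());
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for s3Object.getObjectContent(): gzip data built in memory.
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            for (int i = 0; i < 100_000; i++) {
                gz.write(("line " + i + "\n").getBytes(StandardCharsets.UTF_8));
            }
        }
        Result r = measure(new ByteArrayInputStream(bos.toByteArray()));
        System.out.printf("lines=%d compressedBytes=%d rawReadMs=%d totalMs=%d%n",
                r.lines, r.compressedBytes, r.rawReadNanos / 1_000_000, r.totalNanos / 1_000_000);
    }
}
```

Against S3, the call would be `StreamTiming.measure(s3Object.getObjectContent())`. If `rawReadNanos` is close to `totalNanos`, the loop is mostly waiting on the network; if it is small, the time is going into decompression and line handling. Dividing `compressedBytes` by the raw read time also gives a throughput figure to compare with the nload numbers.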