Thursday, September 24, 2015

AWS multipart upload from InputStream has bad offset

I am using the AWS SDK for Java to perform multipart uploads from HDFS to S3. My code is the following:

for (int i = startingPart; currentFilePosition < contentLength; i++) {
    FSDataInputStream inputStream = fs.open(new Path(hdfsFullPath));

    // Last part can be less than 5 MB. Adjust part size.
    partSize = Math.min(partSize, (contentLength - currentFilePosition));

    // Create request to upload a part.
    UploadPartRequest uploadRequest = new UploadPartRequest()
            .withBucketName(bucket).withKey(s3Name)
            .withUploadId(currentUploadId)
            .withPartNumber(i)
            .withFileOffset(currentFilePosition)
            .withInputStream(inputStream)
            .withPartSize(partSize);

    // Upload part and add response to our list.
    partETags.add(s3Client.uploadPart(uploadRequest).getPartETag());
    currentFilePosition += partSize;

    inputStream.close();

    lastFilePosition = currentFilePosition;
}

However, the uploaded file is not identical to the original. More specifically, I am testing with a file of about 20 MB, uploaded in 5 MB parts. At the end of each 5 MB part I see some extra text, which is always 96 characters long.

Even stranger, if I pass something obviously wrong to .withFileOffset(), for example,

.withFileOffset(currentFilePosition-34)

the corruption stays the same. I was expecting different garbage, but I get the EXACT same 96 extra characters, as if I hadn't modified the line at all.
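For what it's worth, one thing I'd check (an assumption on my part, not something I've confirmed in the SDK docs): withFileOffset may only take effect when the part is backed by a file via withFile(); when an InputStream is supplied, the SDK seems to read partSize bytes from wherever the stream currently is, which would explain why changing the offset changes nothing. That would mean the stream has to be positioned explicitly before each part, e.g. with FSDataInputStream.seek(currentFilePosition). Below is a minimal, self-contained sketch of that position-then-read-partSize logic using plain java.io (the helper readPart is hypothetical, standing in for what each loop iteration would do):

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

public class PartOffsetDemo {

    // Read one "part" of up to partSize bytes starting at offset:
    // position the stream first, then read. This mirrors what
    // seek(currentFilePosition) before uploadPart would achieve.
    static byte[] readPart(byte[] source, long offset, int partSize) throws IOException {
        InputStream in = new ByteArrayInputStream(source);
        long skipped = 0;
        while (skipped < offset) {
            // InputStream.skip may skip fewer bytes than requested, so loop.
            long n = in.skip(offset - skipped);
            if (n <= 0) break;
            skipped += n;
        }
        byte[] part = new byte[partSize];
        int read = 0;
        while (read < partSize) {
            int n = in.read(part, read, partSize - read);
            if (n < 0) break; // end of stream: last part is short
            read += n;
        }
        in.close();
        return Arrays.copyOf(part, read);
    }

    public static void main(String[] args) throws IOException {
        // 100 bytes of known content, split into 30-byte parts.
        byte[] data = new byte[100];
        for (int i = 0; i < data.length; i++) data[i] = (byte) i;

        byte[] joined = new byte[100];
        int pos = 0;
        while (pos < data.length) {
            byte[] part = readPart(data, pos, 30);
            System.arraycopy(part, 0, joined, pos, part.length);
            pos += part.length;
        }

        // Reassembled parts must match the original byte-for-byte.
        if (!Arrays.equals(joined, data)) {
            throw new AssertionError("reassembly mismatch");
        }
        System.out.println("ok");
    }
}
```

If the InputStream is never repositioned, every part starts reading from byte 0 of a freshly opened stream regardless of the offset argument, which would produce exactly the kind of offset-independent corruption described above.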

Any ideas what might be wrong?

Thanks, Serban
