Multipart Uploads

Speeding up uploads


If you are looking to upload an object greater than ~100 MB in size, you should consider using multipart uploads. Uploading parts in parallel speeds up your total upload time, and you get the added benefit that if a part fails to upload, you can re-upload just that part rather than the whole object.

Using multipart upload provides the following advantages:

  • Improved throughput - You can upload parts in parallel to improve throughput.
  • Quick recovery from any network issues - Smaller part size minimizes the impact of restarting a failed upload due to a network error.
  • Pause and resume object uploads - You can upload object parts over time. After you initiate a multipart upload, there is no expiry; you must explicitly complete or stop the multipart upload.
  • Begin an upload before you know the final object size - You can upload an object as you are creating it.

There are three steps to a multipart upload:

  • You initiate a request to S3 and get an upload ID back
  • You upload the parts
  • You send a complete-multipart-upload request to S3, and S3 reconstructs the object for you

When uploading, your file is split into parts, anywhere between 1 and 10,000, and if you are doing this programmatically you need to track the part numbers and the ETag responses from S3 within your application. The good news is the AWS CLI automatically splits large files for you. I’ve included information from the AWS website below showing you how to upload and how to tweak the number of concurrent connections you make, so if you have a fast and stable connection you can use it to its full potential; equally, you can reduce the number of concurrent uploads to limit the load.
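
To make that bookkeeping concrete, here is a minimal bash sketch of the three steps, using the same aws s3api commands covered in detail later in this section (the bucket name, key, and part-file names are placeholders, and error handling is omitted):

#!/bin/bash
# Assumes the file has already been split into parts named
# large_test_file.001, large_test_file.002, and so on.
BUCKET=DOC-EXAMPLE-BUCKET
KEY=large_test_file

# Step 1: initiate the upload and capture the upload ID
UPLOAD_ID=$(aws s3api create-multipart-upload --bucket "$BUCKET" --key "$KEY" \
    --query UploadId --output text)

# Step 2: upload each part in order, recording its part number and ETag
PARTS=""
PART_NUMBER=1
for part in large_test_file.*; do
    # --query ETag --output text prints the ETag including its quotes,
    # so it can be dropped straight into the JSON fragment below
    ETAG=$(aws s3api upload-part --bucket "$BUCKET" --key "$KEY" \
        --part-number "$PART_NUMBER" --body "$part" --upload-id "$UPLOAD_ID" \
        --query ETag --output text)
    PARTS="$PARTS{\"ETag\": $ETAG, \"PartNumber\": $PART_NUMBER},"
    PART_NUMBER=$((PART_NUMBER + 1))
done

# Step 3: ask S3 to reconstruct the object from the uploaded parts
aws s3api complete-multipart-upload --bucket "$BUCKET" --key "$KEY" \
    --upload-id "$UPLOAD_ID" \
    --multipart-upload "{\"Parts\": [${PARTS%,}]}"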

To use a high-level aws s3 command for your multipart upload, run this command:

$ aws s3 cp large_test_file s3://DOC-EXAMPLE-BUCKET/

This example uses the command aws s3 cp, but other aws s3 commands that involve uploading objects into an S3 bucket (for example, aws s3 sync or aws s3 mv) also automatically perform a multipart upload when the object is large.

Objects that are uploaded to Amazon S3 using multipart uploads have a different ETag format than objects that are uploaded using a traditional PUT request. To store the MD5 checksum value of the source file as a reference, upload the file with the checksum value as custom metadata. To add the MD5 checksum value as custom metadata, include the optional parameter --metadata md5="examplemd5value1234/4Q" in the upload command, similar to the following:

$ aws s3 cp large_test_file s3://DOC-EXAMPLE-BUCKET/ --metadata md5="examplemd5value1234/4Q"
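
If you need to generate the base64-encoded MD5 value yourself, one common way to do so on Linux or macOS is:

$ openssl md5 -binary large_test_file | base64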

To use more of your host’s bandwidth and resources during the upload, increase the maximum number of concurrent requests set in your AWS CLI configuration. By default, the AWS CLI uses 10 maximum concurrent requests. This command sets the maximum concurrent number of requests to 20:

$ aws configure set default.s3.max_concurrent_requests 20
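
Two related settings, multipart_threshold and multipart_chunksize, control when the CLI switches to a multipart upload and how large each part is; the values below are illustrative:

$ aws configure set default.s3.multipart_threshold 64MB
$ aws configure set default.s3.multipart_chunksize 16MB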

Upload the file in multiple parts using low-level (aws s3api) commands

Important: Use this aws s3api procedure only when aws s3 commands don’t support a specific upload need, such as when the multipart upload involves multiple servers, a multipart upload is being manually stopped and resumed, or when the aws s3 command doesn’t support a required request parameter. For other multipart uploads, use aws s3 cp or other high-level s3 commands.

  1. Split the file that you want to upload into multiple parts. Tip: If you’re using a Linux operating system, use the split command.
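
For example, with GNU split, the following creates 100 MB parts named large_test_file.001, large_test_file.002, and so on (options differ on BSD/macOS versions of split):

$ split -b 100MB --numeric-suffixes=1 --suffix-length=3 large_test_file large_test_file.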

  2. Run this command to initiate a multipart upload and to retrieve the associated upload ID. The command returns a response that contains the UploadID:

aws s3api create-multipart-upload --bucket DOC-EXAMPLE-BUCKET --key large_test_file
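
The response is similar to the following (the UploadId shown is the example value used throughout these steps):

{
    "Bucket": "DOC-EXAMPLE-BUCKET",
    "Key": "large_test_file",
    "UploadId": "exampleTUVGeKAk3Ob7qMynRKqe3ROcavPRwg92eA6JPD4ybIGRxJx9R0VbgkrnOVphZFK59KCYJAO1PXlrBSW7vcH7ANHZwTTf0ovqe6XPYHwsSp7eTRnXB1qjx40Tk"
}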
  3. Copy the UploadID value as a reference for later steps.

  4. Run this command to upload the first part of the file. Be sure to replace all values with the values for your bucket, file, and multipart upload. The command returns a response that contains an ETag value for the part of the file that you uploaded. For more information on each parameter, see upload-part.

aws s3api upload-part --bucket DOC-EXAMPLE-BUCKET --key large_test_file --part-number 1 --body large_test_file.001 --upload-id exampleTUVGeKAk3Ob7qMynRKqe3ROcavPRwg92eA6JPD4ybIGRxJx9R0VbgkrnOVphZFK59KCYJAO1PXlrBSW7vcH7ANHZwTTf0ovqe6XPYHwsSp7eTRnXB1qjx40Tk --content-md5 exampleaAmjr+4sRXUwf0w==
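
The response contains the ETag for the uploaded part, similar to:

{
    "ETag": "\"example8be9a0268ebfb8b115d4c1fd3\""
}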
  5. Copy the ETag value as a reference for later steps.

  6. Repeat steps 4 and 5 for each part of the file. Be sure to increase the part number with each new part that you upload.

  7. After you upload all the file parts, run this command to list the uploaded parts and confirm that the list is complete:

aws s3api list-parts --bucket DOC-EXAMPLE-BUCKET --key large_test_file --upload-id exampleTUVGeKAk3Ob7qMynRKqe3ROcavPRwg92eA6JPD4ybIGRxJx9R0VbgkrnOVphZFK59KCYJAO1PXlrBSW7vcH7ANHZwTTf0ovqe6XPYHwsSp7eTRnXB1qjx40Tk
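
The response includes an entry for each uploaded part, similar to the following (abbreviated; additional fields omitted):

{
    "Parts": [{
        "PartNumber": 1,
        "ETag": "\"example8be9a0268ebfb8b115d4c1fd3\"",
        ...
    },

    ....
    ]
}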
  8. Compile the ETag values for each file part that you uploaded into a JSON-formatted file that is similar to the following:
{
    "Parts": [{
        "ETag": "example8be9a0268ebfb8b115d4c1fd3",
        "PartNumber":1
    },

    ....

    {
        "ETag": "example246e31ab807da6f62802c1ae8",
        "PartNumber":4
    }]
}
  9. Name the file fileparts.json.

  10. Run this command to complete the multipart upload. Replace the value for --multipart-upload with the path to the JSON-formatted file with ETags that you created.

aws s3api complete-multipart-upload --multipart-upload file://fileparts.json --bucket DOC-EXAMPLE-BUCKET --key large_test_file --upload-id exampleTUVGeKAk3Ob7qMynRKqe3ROcavPRwg92eA6JPD4ybIGRxJx9R0VbgkrnOVphZFK59KCYJAO1PXlrBSW7vcH7ANHZwTTf0ovqe6XPYHwsSp7eTRnXB1qjx40Tk
  11. If the previous command is successful, then you receive a response similar to the following:
{
    "ETag": "\"exampleae01633ff0af167d925cad279-2\"",
    "Bucket": "DOC-EXAMPLE-BUCKET",
    "Location": "https://DOC-EXAMPLE-BUCKET.s3.amazonaws.com/large_test_file",
    "Key": "large_test_file"
}

Resolve upload failures

If you use the high-level aws s3 commands for a multipart upload and the upload fails (due either to a timeout or a manual cancellation), you must start a new multipart upload. In most cases, the AWS CLI automatically cancels the multipart upload and then removes any multipart files that you created. This process can take several minutes.

If you use aws s3api commands for a multipart upload and the process is interrupted, you must remove incomplete parts of the upload, and then re-upload the parts.

To remove the incomplete parts, use the AbortIncompleteMultipartUpload lifecycle action (a sample rule is shown after these steps). Or, use aws s3api commands to remove the incomplete parts by following these steps:

  1. Run this command to list incomplete multipart file uploads. Replace the value for --bucket with the name of your bucket.
aws s3api list-multipart-uploads --bucket DOC-EXAMPLE-BUCKET
  2. The command returns a message with any file parts that weren’t processed, similar to the following:
{
    "Uploads": [
        {
            "Initiator": {
                "DisplayName": "multipartmessage",
                "ID": "290xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
            },
            "Initiated": "2016-03-31T06:13:15.000Z",
            "UploadId": "examplevQpHp7eHc_J5s9U.kzM3GAHeOJh1P8wVTmRqEVojwiwu3wPX6fWYzADNtOHklJI6W6Q9NJUYgjePKCVpbl_rDP6mGIr2AQJNKB_A-",
            "StorageClass": "STANDARD",
            "Key": "",
            "Owner": {
                "DisplayName": "multipartmessage",
                "ID": "290xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
            }
        }
    ]
}
  3. Run this command to remove the incomplete parts:
aws s3api abort-multipart-upload --bucket DOC-EXAMPLE-BUCKET --key large_test_file --upload-id examplevQpHp7eHc_J5s9U.kzM3GAHeOJh1P8wVTmRqEVojwiwu3wPX6fWYzADNtOHklJI6W6Q9NJUYgjePKCVpbl_rDP6mGIr2AQJNKB
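
Alternatively, as noted above, a bucket lifecycle rule can clean up incomplete uploads automatically. A minimal sketch, assuming a seven-day window and an illustrative rule ID:

aws s3api put-bucket-lifecycle-configuration --bucket DOC-EXAMPLE-BUCKET \
    --lifecycle-configuration '{
        "Rules": [{
            "ID": "abort-incomplete-multipart-uploads",
            "Status": "Enabled",
            "Filter": {"Prefix": ""},
            "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 7}
        }]
    }'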