AWS S3 Backup to support concurrent uploads for faster transport
As a System Administrator, I would like the AWS S3 backup process to upload multiple parts simultaneously, so that users with high-bandwidth connections can make full use of that bandwidth.
===============
The AWS S3 backup process uses multipart uploads, splitting each backup file into 20MB parts. These parts are then uploaded individually with PUT requests in a synchronous loop. This process is slow and cannot make use of the high-bandwidth connections available in datacentres.
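For context, the current behaviour is essentially the following (a simplified sketch; `_upload_part`, `@parts` and the surrounding variables are placeholders, not the actual names used in AmazonS3.pm):

```perl
# Simplified sketch of the current synchronous behaviour; names are placeholders.
my %etags;
for my $part (@parts) {
    # Each 20MB part is PUT on its own, and the loop blocks until it finishes,
    # so only one connection to S3 is ever in flight at a time.
    my $etag = $self->_upload_part( $local_file, $remote_key, $upload_id, $part );
    $etags{ $part->{number} } = $etag;
}
```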
I suggest that this system be modified to upload multiple parts concurrently.
In general, each individual upload request to S3 only achieves 2-4MB/s, whereas most servers have at least a Gigabit connection (~120MB/s). I suggest that the number of concurrent connections be configurable, so as not to saturate the connection, with a default of 10 (the same default the AWS CLI uses).
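For comparison, the AWS CLI exposes the equivalent setting as `max_concurrent_requests`, which also defaults to 10 and can be adjusted with:

```
aws configure set default.s3.max_concurrent_requests 10
```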
As an example, backing up 100GiB of accounts to S3 currently takes about 6-8 hours. With concurrent uploads this could be reduced to less than 1 hour.
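Rough numbers behind that estimate: 100 GiB is roughly 107,000 MB, so a single connection at ~4 MB/s needs about 26,800 seconds (~7.5 hours), which lines up with the 6-8 hours observed today. Ten connections at ~4 MB/s each would bring that down to about 2,700 seconds, or roughly 45 minutes.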
Other S3 solutions already use concurrent connections, including the AWS CLI tool and the official AWS PHP SDK (unfortunately there is no official Perl SDK).
This should also be relatively simple to implement, as the backup file is already split into parts for transport. The while loop in `_do_multipart_upload` in `/usr/local/cpanel/Cpanel/Transport/Files/AmazonS3.pm` would need to be replaced so that multiple workers (threads or forked processes) each upload their own subset of parts, as sketched below.
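A minimal sketch of what that could look like, using Parallel::ForkManager from CPAN rather than Perl threads; `_upload_part`, `@parts` and the `max_concurrent_uploads` configuration key are hypothetical placeholders for the real AmazonS3.pm internals:

```perl
use strict;
use warnings;
use Parallel::ForkManager;

sub _do_multipart_upload_concurrent {
    my ( $self, $local_file, $remote_key, $upload_id, @parts ) = @_;

    # Hypothetical config key; default of 10 matches the AWS CLI.
    my $max_workers = $self->{max_concurrent_uploads} // 10;
    my $pm          = Parallel::ForkManager->new($max_workers);

    my %etags;    # part number => ETag, needed to complete the multipart upload

    # Each child sends back [ part number, ETag ] via finish(); collect it here.
    $pm->run_on_finish(
        sub {
            my ( $pid, $exit_code, $ident, $signal, $core, $data ) = @_;
            # Real code would retry the part or abort the multipart upload here.
            die "Part $ident failed" if $exit_code || !$data;
            $etags{ $data->[0] } = $data->[1];
        }
    );

    for my $part (@parts) {
        $pm->start( $part->{number} ) and next;    # parent: queue the next part

        # Child process: PUT a single 20MB part and hand back its ETag.
        my $etag = $self->_upload_part( $local_file, $remote_key, $upload_id, $part );
        $pm->finish( 0, [ $part->{number}, $etag ] );
    }
    $pm->wait_all_children;

    return \%etags;    # caller completes the multipart upload with these
}
```

Forked workers are used in this sketch because ithreads are often unavailable in the bundled Perl; the `run_on_finish` callback is what carries each part's ETag back to the parent so the multipart upload can be completed afterwards.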
More information on concurrent uploads is available in the AWS documentation: https://docs.aws.amazon.com/AmazonS3/latest/dev/mpuoverview.html#distributedmpupload