Backups broken into multiple sectional tarballs/gzips for faster backups, archiving, etc.
Currently the new backup scheme has two options: you either do tarballed backups, or you do incremental ones.
With incremental backups, the problem for anybody who is managing backup servers, or any volume of real content, becomes the number of tiny files and the FS performance hit involved in retaining multiple [remote] copies of the 1:1 file backups.
The tarballed/gzip'd backups present a different problem: no matter what, every single tarball is created from scratch, every single night, despite > 90% of the content being static.
So I bring this forward: pkgacct clearly works through various operations to come up with its packaged file, but why stop at one file? Why not produce a tarball for /mail, one for /public_html, one for [mysql DB's], etc? That way, the contents can be compared to the last tarball, if no checksums/etc. have changed, don't touch the old tarball, no need to re-create them, no need to re-transfer them to whatever backup host, etc.
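To make the idea concrete, here is a minimal Python sketch. It is not how pkgacct works; the function names, the `.sha256` sidecar file, and the one-directory-per-section layout are all assumptions made for illustration. It hashes a section's files in a stable order and only rebuilds the section's tarball when that digest differs from the one stored next to the previous tarball.

```python
import hashlib
import os
import tarfile

def section_digest(root):
    """Hash relative paths, sizes and contents of regular files under root, in a stable order."""
    h = hashlib.sha256()
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # sort in place so os.walk descends deterministically
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, root)
            h.update(f"{rel}\0{os.path.getsize(path)}\0".encode("utf-8"))
            with open(path, "rb") as f:
                for chunk in iter(lambda: f.read(1 << 16), b""):
                    h.update(chunk)
    return h.hexdigest()

def backup_section(root, tar_path):
    """Rebuild tar_path only if the section changed. Returns True if it was rebuilt."""
    digest_path = tar_path + ".sha256"  # hypothetical sidecar holding the last digest
    new_digest = section_digest(root)
    if os.path.exists(tar_path) and os.path.exists(digest_path):
        with open(digest_path) as f:
            if f.read().strip() == new_digest:
                return False  # unchanged: keep the old tarball, nothing to re-transfer
    with tarfile.open(tar_path, "w:gz") as tar:
        tar.add(root, arcname=os.path.basename(os.path.normpath(root)))
    with open(digest_path, "w") as f:
        f.write(new_digest)
    return True
```

Note that the comparison never opens the old tarball; it only reads the live files and a small sidecar file. Whether that read cost is acceptable on a busy box is exactly what would need measuring.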
This might mean for most hosting companies, the tarball with /public_html, and [cpanel settings] remains static for 98%+ of their accounts for every backup, while the /mail, and [mysql DB's] tarballs change for most accounts nightly [who doesn't get any spam, or any DB hits in a day?], etc.
This could mean > 90% fewer bits transferred, and make remote retention of numerous copies much more viable/straightforward without needing the full remote incremental system.
Heck, this could allow the cpanel account transfer mechanism to go more smoothly (since it uses pkgacct, same as the backups), providing a clear progress meter on the various tarballs created, transferred, uncompressed, etc. making the entire process smoother.
Also, for many this will simplify grabbing backups remotely. No good admin lets their cPanel box *push* to their remote backup destination, because that undermines the security of the backups: any compromise of the box means the remote content is compromised as well. Everybody worth their salt should be *pulling* their backups from their cPanel boxes.
I'd like to try and offer some further insight by iterating through some of your topic points:
"Why not produce a tarball for /mail, one for /public_html, one for [mysql DB's], etc?"
The single biggest reason for not breaking up the pieces is one of customer clarity/organization. With the goal of cPanel & WHM being to allow hosting operation without in-depth knowledge of internals, we don't expect system owners to have to understand anything beyond "this big archive file contains everything that makes up my account". Breaking up the archive would add significant complexity to the system and also raise situations where "mix'n'match" might occur, where you'll get DBs and static files from public_html that are not from the same time period, resulting (in the customer's eyes) in a failed restore/transfer. Keeping everything "whole" reduces complexity and ensures a smoother restore/transfer experience.
"That way, the contents can be compared to the last tarball, if no checksums/etc. have changed, don't touch the old tarball, no need to re-create them, no need to re-transfer them to whatever backup host, etc."
Regarding your concern about I/O intensity: iterating through a compressed tarball for checksums, and iterating through the disk for checksums to compare against, would itself be a significant amount of I/O. I am not convinced that this would result in I/O performance gains, although this would need further investigation.
"this could allow the cpanel account transfer mechanism to go more smoothly" and "this will simplify grabbing backups remotely"
An approach that would seem more suitable for this, at least with regard to transfers, would be live streaming of the particular bits of data requested: the ability, for instance, to transfer "Only Mail" during the transfer process in an effort to bring over the bits of email that ended up on an "old" account during DNS propagation. This is something that has been discussed for the future, but I don't have a hard ETA for you.
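As a rough illustration of what streaming a single piece of an account could look like, here is a small Python sketch. It does not reflect any existing or planned cPanel interface; `stream_section` and the idea that mail lives in one directory are assumptions for the example. It writes one directory as a gzip-compressed tar stream to any writable binary object, without building an archive on disk first.

```python
import os
import tarfile

def stream_section(section_dir, out):
    """Write section_dir as a gzip-compressed tar stream to the writable binary object out."""
    # "w|gz" is tarfile's stream mode: it never seeks, so out may be a pipe or socket file.
    with tarfile.open(fileobj=out, mode="w|gz") as tar:
        tar.add(section_dir, arcname=os.path.basename(os.path.normpath(section_dir)))
```

The receiving side could unpack the same stream as it arrives by opening it with mode `"r|gz"`.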
In short, I am not convinced of the performance or organizational benefits of what's proposed. At the very least, the performance aspect of it requires significant investigation and testing. I would like to see further comments and voting occur for this feature to gauge community interest and opinion.
I was thinking more of uncompressed backups (rather than tar/gzip'd ones). I've always found rsync can complete a comparison faster than you could otherwise copy the files (suggesting better times when doing read-only operations), and remember that with the wear characteristics of SSDs, this will become increasingly important going forward.
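For reference, rsync's default "quick check" compares only file size and modification time, so it never has to read file contents for unchanged files. A minimal Python sketch of the same kind of read-only comparison follows; `stat_manifest` and `changed_paths` are names invented for this example.

```python
import os

def stat_manifest(root):
    """Map each relative file path under root to (size, mtime_ns) without reading contents."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)  # lstat: describe symlinks themselves, do not follow them
            manifest[os.path.relpath(path, root)] = (st.st_size, st.st_mtime_ns)
    return manifest

def changed_paths(old, new):
    """Return sorted paths that were added, removed, or whose size/mtime differ."""
    return sorted(p for p in old.keys() | new.keys() if old.get(p) != new.get(p))
```

Saving last night's manifest next to the backup and diffing it against tonight's gives the list of files that actually need attention, at the cost of one stat call per file.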
cPanel could always artificially adjust the date/time-stamps of the tar/gzip files so that all items in a backup coincide, if need be, when the checksums match.
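Something along these lines is what I have in mind, as a tiny Python sketch (`align_mtimes` is a made-up helper, and the file names are only examples). It changes the filesystem timestamps only, not anything stored inside the archives.

```python
import os

def align_mtimes(paths, timestamp):
    """Set the access and modification time of every path to timestamp (seconds since the epoch)."""
    for path in paths:
        os.utime(path, (timestamp, timestamp))
```

For example, after a nightly run where only the mail tarball was rebuilt, calling `align_mtimes(["mail.tar.gz", "public_html.tar.gz"], run_time)` would make the whole set look like it came from the same backup.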