Backup failures / corrupt files should be automatically removed
Open Discussion
With the new automated backup system, if a backup fails for whatever reason, it should delete (or temporarily move somewhere) the files it created before trying again. I recently had an instance where a file was corrupt, and every time the system tried to generate a backup it would fail midway through creating the archive. The corrupt file was another backup file that was incomplete (not sure how that happened). I had 5 consecutive backups created with large chunks of files missing, and no way of knowing until I opened one of the generated backups.
If the backup fails, an email notification should be sent to the administrator and it should move (or delete?) the broken files before starting again.
One major point of discussion I see with this request is that none of the actions will be favorable for 100% of our customer base (Leaving partial/failed backup, moving failed backup away, auto-deleting failed backup).
Leaving the partial/failed backup in place would be favorable to those who want "whatever could be backed up to be backed up"; that way, in the event of primary drive failure, they at least have something in place to re-use.
Deleting the backup would potentially mean silent failure and that the user would have *nothing* available in the event that the primary drive failed while the backup system was having issues.
Moving the backup risks ever-growing disk space consumption, since each auto-moved backup adds to the total. If a limit is imposed to prevent runaway disk usage, then this simply moves the problem to a few backups down the road versus immediately.
Of the 3, having it fail and leave the partial backup in place seems the most favorable to me. Regardless of which of the 3 actions would potentially be taken, any failed backup would be reported as such and warrant manual investigation by the server owner/cPanel support. The delete/move actions seem to encourage ignoring the problem.
This topic really warrants further discussion and input to see what the consensus is.
Have you spoken to cPanel Support yet as to why the backup started failing in the first place? Addressing the root cause of the failure is critical, and if there's a bug there we need it sorted out.
I suspect it failed the first time because the server was rebooted or the NAS was disconnected while it was creating the archive. Since the archive was corrupt, it started failing. I think in my haste I missed a few details...
You are right, we don't want to be moving files around or deleting things. The main problem here is that I had no idea the automated backup was failing until I tried to use one of the backups, and then found nearly all of the public_html files missing from a corrupt archive. Perhaps we simply need an option to automatically TEST the archive (monthly?) for errors once it is generated? The problem is, if a backup fails, it doesn't do anything else after that.
Originally I thought it was the generation of the backup file that was failing, but it actually does generate a .tar.gz; you just can't extract it. A backup was being created, then it started to back up an old backup file in the home directory, and the old backup was corrupt, which caused the automated backup to just stop.
Perhaps this isn't a bug, but maybe just a matter of keeping up with the server? I'm not sure what the solution is here, but I don't think half-created backups are a good place to leave it either.
Here is a quote from the ticket I had open with cPanel:
When attempting to extract the backup file, the process exited on the following:
4411509 root@server [~/4411509]# tar zxvf user.tar.gz
(...)
user/homedir/backup-8.17.2012_14-27-27_user.tar.gz
gzip: stdin: unexpected end of file
tar: Unexpected EOF in archive
tar: Unexpected EOF in archive
tar: Error is not recoverable: exiting now
This would indicate the backup process can't get past this file, and the homedir tar is then left incomplete.
You should be able to move this backup out of the homedir, into /home or similar, then the backup should be able to finish correctly.
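For what it's worth, the "test the archive once it is generated" idea could be sketched roughly like this. This is only an illustration using Python's standard tarfile module, not how cPanel's backup system works; the function name verify_tar_gz is made up for the example. It reads every member to the end, so a truncated archive like the one in the tar output above is reported as bad instead of being discovered at restore time.

```python
import tarfile
import zlib


def verify_tar_gz(path):
    """Return (ok, detail) for a .tar.gz archive.

    Hypothetical helper for illustration only. Every regular file in the
    archive is read to the end, so a truncated gzip stream is detected.
    """
    count = 0
    try:
        with tarfile.open(path, "r:gz") as archive:
            for member in archive:
                if member.isfile():
                    data = archive.extractfile(member)
                    # Read in 1 MiB chunks to keep memory use low.
                    while data.read(1024 * 1024):
                        pass
                count += 1
    except (tarfile.TarError, EOFError, OSError, zlib.error) as exc:
        return False, f"failed after {count} members: {type(exc).__name__}: {exc}"
    return True, f"{count} members read"
```

A scheduled job could call this on each new archive and send the detail string to the administrator whenever the first value is False.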
We just lost 10+ years of transaction data, as all of our latest cPanel backup cpmove snapshots are corrupt!
I would also like this problem fixed.
Recently a couple of my backups were corrupted, and I would not have known about the corruption if I had not tested them myself. This is a big issue, as you can unknowingly have corrupted backups. I would love some sort of notification system to be implemented here.
Now we get to another problem: because of this, some of the backups were left on my cPanel server, even though I have "retain backups locally" unchecked. Suddenly my cPanel server had critically low disk space, as a week's worth of daily backups had filled my server's HDD. I had to manually delete these backups. This is a huge problem, as simply a disconnected backup destination can lead to serious HDD space issues.
Overall, I see these 2 points as a major issue. An automated backup should be just that; without error notifications we are flying blind.
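Until something like this is built in, a small check along these lines could at least flag leftover local archives and low disk space. It is only a sketch under my own assumptions: the function name, the ".tar.gz" filter, and the 10% free-space threshold are made up for the example, and the backup directory has to be supplied by whoever runs it.

```python
import os
import shutil


def local_backup_report(backup_dir, min_free_fraction=0.10):
    """Summarize leftover .tar.gz archives and free space for backup_dir.

    Hypothetical helper; the threshold and file filter are assumptions.
    """
    usage = shutil.disk_usage(backup_dir)
    archives = []
    for entry in os.scandir(backup_dir):
        if entry.is_file() and entry.name.endswith(".tar.gz"):
            archives.append((entry.name, entry.stat().st_size))
    archives.sort()
    free_fraction = usage.free / usage.total
    return {
        "archives": archives,
        "archive_bytes": sum(size for _, size in archives),
        "free_fraction": free_fraction,
        "low_space": free_fraction < min_free_fraction,
    }
```

Run from cron, the result could be mailed to the administrator whenever low_space is true or the archive list is unexpectedly non-empty.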
Maybe the best option is to ask, on the backup settings page, what the sysadmin would prefer cPanel to do if this happens.
What to do with possibly corrupted files?
() Delete the file.
() Leave it in place.
I would leave it in place, so I could count on at least some files.
Obviously, a failed backup notification must be issued and sent to the designated sysadmin email address.
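To make the proposal concrete, here is a rough sketch of how such a setting might behave. None of these names exist in cPanel; FailedBackupPolicy, handle_failed_backup, and the notify callback are all hypothetical. The point it illustrates is that the notification is sent regardless of which option the sysadmin picked.

```python
import os
from enum import Enum


class FailedBackupPolicy(Enum):
    # Hypothetical setting mirroring the two options proposed above.
    DELETE = "delete"
    LEAVE = "leave"


def handle_failed_backup(archive_path, policy, notify):
    """Apply the chosen policy to a failed backup, then always notify.

    notify is any callable that takes a message string, for example a
    function that emails the designated sysadmin address.
    """
    if policy is FailedBackupPolicy.DELETE:
        if os.path.exists(archive_path):
            os.remove(archive_path)
        action = "deleted"
    else:
        action = "left in place"
    notify(f"Backup FAILED: {archive_path} ({action}). Manual investigation needed.")
    return action
```

Passing the notifier in as a callable keeps the policy logic separate from however the email actually gets sent.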