Server Side Archive Failure and the Fix
Category Administration
As we've been upgrading the clients, and their mail files, we've been adding Server Side Archiving (SSA) combined with a policy/setting document. I've always liked SSA, and am nearly ecstatic to implement it at this site. However, we've had some hitches which have slowed us down.
Most of the problems were straightforward. For instance, the mail ownership was wrong or the policy/setting document was wrong. However, today the system admins could not shake a mail file loose from a persistent error. The error blocked "compact -A" from working and halted the compact and its document transfer, leaving only a hint of the problem: a console error that a "Document has been deleted."
Hm. The sys admins had tried rebuilding the views (updall -R), running fixup, copy-style compact, all to no avail. I took a closer look at the file. It's important for us to keep the SSA working, as we have a fixed amount of SAN space. This mail file was nearly a gig.
I have a monitoring and administration tool that I built, which includes a function to delete the archive profile document. Sometimes, rebuilding the archive settings from scratch gets the SSA working right (after all, some of these databases have been around since early 4.x days, and have gone through many release iterations). I deleted and rebuilt the archive settings, replacing the policy/setting with a manual entry (after clearing out the person document entry for the setting). No dice.
This had me puzzled. What "document has been deleted" could it be referencing? I assumed it was some design element, so I renewed the design. Nope, still got the same error.
Maybe it's as simple as a corrupted view. Even though all the utilities had been run against the file, clearly something was wrong. I couldn't find any IBM technote referencing the same symptoms, so I began to suspect the database views. After all, if my code builds a collection from a view, then tries to access the referenced documents, and one of them is deleted, then the error would be the same as what I was seeing on the server console.
I looked more carefully in the trash folder, and tried to open up some documents. They were fine. Then I tried to clear out the trash folder. Guess what? I got an error. I narrowed it down to a few mysterious documents without a subject, which couldn't be opened (only the view entry existed; there was no matching document).
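To make that hunch concrete, here is a rough Python model of a view index that still lists entries whose documents are gone. It is only an illustration of the failure mode, not Domino code: the names archive_from_view, find_ghost_entries and DocumentDeletedError, and the note IDs, are all made up for the sketch.

```python
class DocumentDeletedError(Exception):
    """Hypothetical stand-in for the "Document has been deleted" console error."""


def archive_from_view(view_entries, documents):
    """Walk the view index and fetch each referenced document.

    Fails on the first entry whose document no longer exists,
    which is how a single ghost entry can stop the whole run.
    """
    archived = []
    for note_id in view_entries:
        if note_id not in documents:
            raise DocumentDeletedError(f"Document has been deleted: {note_id}")
        archived.append(documents[note_id])
    return archived


def find_ghost_entries(view_entries, documents):
    """Return view entries that have no matching document."""
    return [note_id for note_id in view_entries if note_id not in documents]


# Made-up data: two real documents, and a trash view with two stale entries.
documents = {"A1": "Lunch on Friday", "A2": "Quarterly report"}
trash_view = ["A1", "GHOST1", "A2", "GHOST2"]

ghosts = find_ghost_entries(trash_view, documents)

# Dropping the stale entries models purging and rebuilding the view.
purged_view = [note_id for note_id in trash_view if note_id not in ghosts]
```

In this model, archive_from_view(trash_view, documents) raises as soon as it reaches the first ghost entry, while the same call on purged_view completes.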
Manually rebuilding the view didn't help, so I used the admin client and purged the $Trash view. That freed up the view to be rebuilt, and when I checked it, the ghost entries were completely gone. Within a few minutes, the SSA was working great.
Corrupted data in the trash had stopped the SSA.
Comments
Posted by Charles Robinson At 07:57:06 PM On 03/10/2008