
Recover a bit but lose a block

When attempting a data recovery from a Microsoft Exchange email server after a catastrophic failure (and when I say catastrophic I mean no backup to restore from, and file system corruption or file deletion that has rendered the Exchange information store files inaccessible), one of the tools in Altirium’s data recovery arsenal was to trawl the entire disk or RAID volume, identify pages of Exchange data and rebuild the information store from the ashes. However, when Microsoft engineers decided to change their page error correction method so that they could correct a single bit error in a page, this seemingly minor ‘upgrade’ had a dramatic effect on the ability to identify Exchange page data.

When a file system has become severely corrupt, or has been wiped out by re-partitioning or re-formatting, the ability to examine every block of data within a partition, hard drive or RAID array to identify and extract specific fragments of a specific file could be your only hope of recovery. This may sound extreme but it does happen, especially with business-critical systems such as email and database servers.

Back in the days of Exchange 5.5, Exchange 2000 and Exchange 2003 (pre service pack 1), the internal structure of the 4K or 8K data pages that made up the information store files had enough encoded data within them to allow the pages not only to be recognised but also to identify where in the file each page was located. The values that allowed us to do this included the page CRC (cyclic redundancy check) checksum value and the pgnoThis page number. These were two 32-bit values that sat side by side at the start of the page structure.

Whilst the CRC value allowed a page to be checked to see if the data was valid, and the pgnoThis could identify if the page was in the correct position within the file, neither allowed any errors within the page to be rectified. Research by Microsoft apparently identified that approximately 40 percent of the CRC errors found were caused by single bit “bit flip” errors, so with the release of Exchange 2003 service pack 1 in May 2004 they replaced the CRC and pgnoThis values in the page header structure with an updated CRC value and an ECC value to allow these single bit errors to be corrected when the page is loaded.

The algorithm to produce the first CRC value is seeded with the page number. This means that Exchange can still determine if the page it read is the page it expected to read. However, the CRC value can no longer easily be used to identify a page of Exchange data if you don’t know its location, as is the case when scavenging data during a data recovery process. Therefore, new techniques have had to be developed. Microsoft’s changes to the ESE (Extensible Storage Engine) architecture to allow a single bit within a block of Exchange data to be corrected mean it is no longer easy to recover a block or page of Exchange data when the allocation information for that file no longer exists.
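The sketch below is an added example, not Altirium's tooling or Microsoft's code: it carves a constructed disk image for old-style pages using the checksum and pgnoThis at the start of each block, then shows that a page whose checksum is seeded with its page number can only be checked once that number is already known. The checksum is modelled as a 32-bit XOR with an assumed seed of 0x89ABCDEF because the article does not give the algorithm; the ECC field is left as zero and all page contents are constructed.

```python
import struct

PAGE_SIZE = 4096
LEGACY_SEED = 0x89ABCDEF  # assumed seed for the pre-SP1 checksum (not from the article)

def xor32(data, seed):
    """Fold every little-endian 32-bit word of data into seed with XOR."""
    for (word,) in struct.iter_unpack("<I", data):
        seed ^= word
    return seed

def make_legacy_page(pgno, payload):
    """Constructed pre-SP1 layout: [checksum][pgnoThis][data...]."""
    rest = struct.pack("<I", pgno) + payload.ljust(PAGE_SIZE - 8, b"\0")
    return struct.pack("<I", xor32(rest, LEGACY_SEED)) + rest

def make_seeded_page(pgno, payload):
    """Constructed SP1-style layout: [checksum seeded with pgno][ECC][data...].
    The ECC field is left as zero; computing it is outside this sketch."""
    data = payload.ljust(PAGE_SIZE - 8, b"\0")
    return struct.pack("<II", xor32(data, pgno), 0) + data

def carve_legacy(image):
    """Return {pgnoThis: byte offset} for every block whose checksum validates."""
    found = {}
    for off in range(0, len(image) - PAGE_SIZE + 1, PAGE_SIZE):
        block = image[off:off + PAGE_SIZE]
        stored, pgno = struct.unpack_from("<II", block)
        if stored == xor32(block[4:], LEGACY_SEED):
            found[pgno] = off
    return found

def validates_seeded(block, assumed_pgno):
    """A seeded page can only be checked against a page number known in advance."""
    (stored,) = struct.unpack_from("<I", block)
    return stored == xor32(block[8:], assumed_pgno)

# Constructed sample image: pages out of order, separated by non-Exchange blocks.
JUNK = b"\xAA" * PAGE_SIZE
IMAGE = (JUNK + make_legacy_page(7, b"mailbox B") + bytes(PAGE_SIZE)
         + make_legacy_page(3, b"mailbox A") + make_seeded_page(5, b"mailbox C"))

if __name__ == "__main__":
    for pgno, off in sorted(carve_legacy(IMAGE).items()):
        print(f"legacy page {pgno} found at offset {off:#x}")
    seeded = IMAGE[4 * PAGE_SIZE:]
    print("seeded page, pgno 5 assumed:", validates_seeded(seeded, 5))
    print("seeded page, pgno 6 assumed:", validates_seeded(seeded, 6))
```

In this model the legacy scan places pages 7 and 3 back in file order from content alone, while the seeded page passes only when the carver already supplies the right page number.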

Data recovery work of this nature requires an in-depth knowledge of the internal structures of files and a lot of dedication to produce the best possible result, and this is how we approach every data recovery we do.

Extensible Storage Engine Architecture:
