Attempting to process large numbers of tapes natively (i.e. using the originating backup application and IT environment) can prove to be a daunting task when large volumes of data are needed quickly. Most infrastructures are designed around getting data written out to tapes, with only an occasional need to restore. This is not an adverse criticism: if daily restores are needed then either it is time to stop getting IT equipment from fire sales or there are some staff training issues to address.
However, when a large volume of data is needed quickly from a broad cross-section of backup tapes, perhaps a legal case requiring email data spanning several years, how can it be done?
One UK insurance company had over 1000 LTO tapes, LTO1 to LTO3, containing Backup Exec backups from a number of Windows-based systems. These included MSSQL backups, Exchange IS backups, and user data file backups from a number of servers running SIS (single-instance-store). They were not in a desperate hurry, but they knew a requirement to disclose data was likely in the next 4 months and that it could include email and users’ file data, so they were looking for a 3rd party tape restoration service.
As fortune would have it we had just released the latest version of our ADR Tape Restoration software suite, complete with a new module to handle SIS backups without the need to have SIS installed, and the Exchange and file data were duly restored to USB disks. To meet the deadline, our processing meant using multiple drives restoring in parallel on standalone PC systems, with no need for Exchange servers or Windows servers with client systems. Each time a set of 4TB USB3 drives became full they were shipped post-haste to the customer so they could introduce the data to their new archive management system, and by the time they needed it they had it all in place.
Being a generally optimistic person and not believing in bad luck, not even if someone turns up with an OnStream cartridge, the prospect of 13 x LTO2 tapes containing ARCserve backups was not of particular concern. Then the additional information was provided: they had been to another company for restoration and there was a problem with the data, the backups were corrupted and could not be restored. “Was there anything we could do?”
Not having seen the tapes, it would not have been honest to give more than a cautiously optimistic opinion. No-one here had encountered a corrupted ARCserve backup since some problems with Adaptec 1542 cards and MSDOS with too much memory installed back in the early 1990s. It seemed more likely we would find that they were either not ARCserve at all, or else were encrypted.
When the tapes arrived it all became clear. They were ARCserve with multiplexing, which means that the backup data from several backups can be interleaved and any attempt to proceed in a linear manner without first loading and interpreting the ARCserve MUX (multiplex) tables is going to end in tears, or at least with worthless data. The next challenge was restoration within an average life-span. Restoring a single backup would be relatively straightforward once the MUX tables had been correctly interpreted, but with over a hundred backups per tape the idea of restoring one set at a time was not overly appealing, as each tape would have to be read over 100 times, effectively turning a 13 tape restore into a 1300+ tape restoration exercise. This is where the benefits of developing software for tape restoration come to the fore: we were able to modify the code to enable the simultaneous processing of all backups, so each tape took less than 3 hours to read.
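The idea of processing every interleaved backup in one read of the tape can be sketched roughly as follows. This is only an illustration: the `(session_id, payload)` block layout and the `demultiplex` function are hypothetical and do not model the real ARCserve MUX tables, which have to be loaded and interpreted before any block can be attributed to a backup.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical block layout: (session_id, payload). The real ARCserve
# MUX structures are more involved and are not modelled here.
Block = Tuple[int, bytes]


def demultiplex(blocks: Iterable[Block]) -> Dict[int, bytes]:
    """Route every block to its own backup session in a single pass."""
    sessions: Dict[int, List[bytes]] = defaultdict(list)
    for session_id, payload in blocks:
        sessions[session_id].append(payload)
    # Join the pieces of each session in the order they were read.
    return {sid: b"".join(parts) for sid, parts in sessions.items()}


# Three backups interleaved on one "tape", read once from start to end.
tape = [(1, b"AA"), (2, b"xx"), (1, b"BB"), (3, b"--"), (2, b"yy")]
restored = demultiplex(tape)
```

Reading once and fanning out to every session is what avoids the 100+ passes per tape; a real implementation would stream each session to disk rather than hold it in memory.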
Once the tapes had been catalogued the required Exchange email data was located. Anyone fancy a guess at which number tape it was on? Sorry, that would have been too poetic, it was on tape 7.
3,500 backup tapes containing Commvault Galaxy backups from which selected emails are required within 30 days might seem like a tall order, until the tapes arrive and turn out to be 1400 TSM backups on LTO2, 1200 Galaxy on LTO4, 900 NetBackup on LTO1 and LTO2, along with a selection of additional DATs and AIT tapes of unknown origin (it transpired that these were AS/400 SAVLIB). The water having been muddied it now turned to sludge as the court deadline to get the data turned out to be 30 days from 18 days earlier, so there were 12 days until the deadline. One other small detail, the email system in use had changed at some point from Notes to Exchange.
Planning around formats such as NetBackup and Galaxy, where there is at least the option to position along the tape to filemarks and get backup set information without having read every block of data, is one thing; for TSM there was no option but to read every block of every tape and identify all of the files present.
Under such circumstances using the originating backup applications is not an option. For NetBackup and Galaxy, where this would be possible, the infrastructure set-up required prior to starting work would take us past the deadline. With TSM it is just not an option. To meet the deadline tapes had to be “spinning” from day 0.
This is where the benefit of having written your own “non-native” restoration software and having spent years proving it in live situations reaps rewards. Rather than needing media servers to host drives and backup servers to host backup software, we were able to process the tapes using single PC systems, each with 4 tape drives attached, and scale up to 60+ drives running simultaneously on a 24/7 basis. We filtered the file information as we went to identify Exchange backups and Notes files and, where they were found, processed the tapes in question and restored the data. The deadline was met, not easily, but a day early.
Whilst there are cases where using the originating “native” backup application is the way to go, in a case like this being able to scale up processing with the relatively simple addition of Windows PCs each with multiple tape drives and no requirement for additional servers is what made it possible.
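To put the scale of that exercise into rough numbers, here is a back-of-envelope sketch using the figures quoted above (3,500 tapes, 12 days, 60 drives, 4 drives per PC). It assumes, purely for illustration, that all 60 drives ran for the full 12 days and that every tape needed a complete read, neither of which is stated; the function name is made up for this example.

```python
import math


def hours_per_tape(tapes: int, drives: int, days: float) -> float:
    """Average drive-hours available per tape with every drive running 24/7."""
    return drives * days * 24 / tapes


# Figures from the case above; full utilisation is an assumption.
budget = hours_per_tape(tapes=3500, drives=60, days=12)
pcs_needed = math.ceil(60 / 4)  # 4 tape drives per PC

print(f"{budget:.1f} drive-hours per tape, {pcs_needed} PCs")
```

Under those assumptions there are a little under five drive-hours available per tape, which shows how little slack there was for setting up a native backup infrastructure first.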
The restoration of data that was backed up from NetApp and EMC appliances, when those appliances have been retired, has long been a source of angst for IT departments. Do you have to retain appliances in case data is needed? Do you just accept that data is lost or that legacy hardware will have to be re-commissioned if a restoration from an NDMP backup is required?
Altirium’s “restore-on-demand” service now provides a solution, with NDMP support being an integral part of Altirium’s much vaunted ADR Suite tape restoration software. Whether you have NetWorker NDMP backups from a NetApp filer or NetBackup backups from an EMC Celerra, files can be restored from your tapes direct to USB disk and returned to you quickly.
Contact Mark Sear or Laura Sangster on 01296 658737 to find out how access to your NDMP backups can be retained without the pain of maintaining legacy systems.
If you’ve ever wondered what tools a technical data recovery engineer might use on a daily basis, then here are my top five tools, although they may not be quite what you’re expecting.
If you think that data recovery is just about running a bunch of software tools on hard disks or RAIDs or data from backup tapes then, for a professional technical company, this is far from the truth. As a data recovery expert my job at Altirium involves interrogating raw data and writing software to solve often complex problems. This is one thing that I think sets us apart from some of the other companies in the data recovery industry. Yes, we use “off the shelf” recovery software where it’s appropriate, but such tools are often found lacking and don’t give the best results or properly report their findings. Therefore, to recover data where there are no tools available, we develop them in-house.
Some of the achievements we’ve delivered in the past 12 months, using my top five tools include:
- Reverse engineering the MS SQL Server data structures and writing software to recover data from dropped tables, where off the shelf tools failed.
- Extending our Tivoli Storage Manager recovery software capabilities, adding extractors for more data sources and software compressed data.
- Identifying and implementing the processing of many undocumented structures in the Microsoft Tape Format, used in software such as Symantec BackupExec.
- Reverse engineering Atempo Time Navigator backup format including processing of software compressed data.
All of this has contributed to solving genuine data recovery issues and has saved the companies that have come to us thousands of pounds in lost revenue, many hours of support time and countless terabytes of potentially “unrecoverable” data.
Here are the top five tools that I use pretty much every day during the course of my job.
Many off-the-shelf Microsoft SQL Recovery tools state that they can recover from corrupt files and deleted data, and some claim to recover from dropped tables, so the recent arrival of an MSSQL Recovery into the lab (all of the tables within the database had been dropped) gave us the ideal opportunity to undertake some tests. From looking at the MDF file of the database, the data was still present, yet out of the 4 or so packages we tried “NONE” of them could retrieve any of the dropped tables. We were still able to recover the required data for our client.
I’ve often heard it said, “the RAID has been rebuilt – the data cannot be recovered” and often this is the case. With RAID5, if the configuration is changed, and new parity is calculated, then there will be a significant loss of any data that was previously stored on the RAID.
As Hamlet so eloquently put it “There are more things in heaven and earth, Horatio, than are dreamt of in your philosophy.”, just because something is outside of our normal experience does not mean that it is not possible.
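For readers who have not looked at RAID5 parity up close, the following minimal sketch shows the XOR relationship that rebuilds rely on. The block values and the `xor_blocks` helper are invented for the example; real arrays rotate parity across disks and work on much larger stripes.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe across three data disks plus one parity block.
d0, d1, d2 = b"\x11\x22", b"\x33\x44", b"\x55\x66"
parity = xor_blocks(d0, d1, d2)

# If the disk holding d1 is lost, XOR of the survivors gives it back.
recovered = xor_blocks(d0, d2, parity)
```

The same arithmetic explains the danger: when an array is re-initialised with a different configuration, parity is calculated afresh and written over blocks that previously held data, and whatever those blocks held before is gone.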
Tivoli Storage Manager (“TSM”) provides a sophisticated heterogeneous data storage environment within which large volumes of data can be held. These might include email backups, user documents and SQL databases; in fact, all of the information that might be just a little bit useful in a computer forensic investigation or a tape data discovery exercise.
So, you are an investigator who has been handed a case containing 25 LTO4 cartridges from a TSM archive, now what?
(With apologies to Mark Twain)
The release of LTO5 by Quantum Corporation brings 1.5TB native/3TB compressed tape to the market, and it is a sure fire bet that IBM and HP will shortly follow with their own offerings. This means that for the past 20 years or so, a technology many said was going the way of the Dodo has managed to more than keep pace with competing technologies, and seen quite a few off (remember how optical disk was the future of storage back in the late 1980s?).
When attempting a data recovery from a Microsoft Exchange email server after a catastrophic failure (and when I say catastrophic I mean no backup to restore from, and file system corruption or file deletion that has rendered the Exchange information store files inaccessible), one of the tools in Altirium’s data recovery arsenal was to trawl the entire disk or RAID volume, identify pages of Exchange data and rebuild the information store from the ashes. However, when Microsoft engineers decided to change their page error correction method so that they could correct a single bit error in a page, this seemingly minor ‘upgrade’ had a dramatic effect on the ability to identify Exchange page data.
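The general shape of trawling a volume for database pages can be sketched as below. Everything specific here is an assumption made for illustration: the 4096-byte page size, the page-aligned scan, and above all `toy_checksum_ok`, which is a stand-in check and not the Exchange checksum or its error-correcting successor.

```python
import struct
from typing import Callable, Iterator, Tuple

PAGE_SIZE = 4096  # assumed page size, for illustration only


def toy_checksum_ok(page: bytes) -> bool:
    """Stand-in check: first 32-bit word equals the XOR of all the others."""
    words = struct.unpack(f"<{len(page) // 4}I", page)
    acc = 0
    for word in words[1:]:
        acc ^= word
    return words[0] == acc


def carve_pages(
    image: bytes,
    is_valid: Callable[[bytes], bool] = toy_checksum_ok,
    page_size: int = PAGE_SIZE,
) -> Iterator[Tuple[int, bytes]]:
    """Yield (offset, page) for every non-empty page that passes the check."""
    for offset in range(0, len(image) - page_size + 1, page_size):
        page = image[offset:offset + page_size]
        if any(page) and is_valid(page):  # skip all-zero pages
            yield offset, page


# Build a tiny image: one junk page, one valid page, one empty page.
body = list(range(1, PAGE_SIZE // 4))
checksum = 0
for word in body:
    checksum ^= word
good = struct.pack(f"<{PAGE_SIZE // 4}I", checksum, *body)
junk = b"\x01" + b"\x00" * (PAGE_SIZE - 1)
image = junk + good + bytes(PAGE_SIZE)

found = [offset for offset, _ in carve_pages(image)]
```

Because the validity test is passed in as a function, a change in how pages are checksummed means swapping that one predicate, which is also why such a change can break an existing page scanner outright.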