Data migration – making a molehill out of a mountain

It is very easy to accrue data, and with large, diverse systems it is just as easy to accrue very large volumes of the stuff from a wide range of sources. The problem comes when you want to move the data out of this massive archive and onto a new format of tape, to be accessed via a new system.

The more complex the system, the more pitfalls there are when attempting to migrate the data. Where data is being streamed from multiple sources, a technique known as multiplexing is often preferred, as this makes the best use of the bandwidth available for data backup. The problem comes with restoring: each data set is potentially spread across multiple tapes, and the restore process using the originating backup software might well require that each set be restored individually, so you can end up having to read each tape multiple times. In effect, restoring everything from an archive of 1000 tapes can take 5,000 or 10,000 tape reads.
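To see why multiplexing multiplies the number of reads, here is a toy model in Python. It is only a sketch under stated assumptions: the round-robin interleaving, the block counts and the names `multiplex`, `reads_per_set_restore` and `reads_sequential` are all invented for illustration, and it does not reflect how any particular backup product actually lays data out on tape.

```python
def multiplex(streams, blocks_per_tape):
    """Interleave blocks round-robin from each stream, then cut the result into tapes.

    `streams` maps a (hypothetical) source name to its list of blocks.
    """
    iters = {name: iter(blocks) for name, blocks in streams.items()}
    interleaved = []
    while iters:
        for name in list(iters):
            try:
                interleaved.append((name, next(iters[name])))
            except StopIteration:
                del iters[name]  # this source has no more data
    return [interleaved[i:i + blocks_per_tape]
            for i in range(0, len(interleaved), blocks_per_tape)]


def reads_per_set_restore(tapes):
    """Restore one data set at a time: each set reads every tape holding any of its blocks."""
    sets = {name for tape in tapes for name, _ in tape}
    return sum(1 for s in sets for tape in tapes
               if any(name == s for name, _ in tape))


def reads_sequential(tapes):
    """Scan each tape once, pulling out every data set in a single pass."""
    return len(tapes)


# Assumed example: 5 sources, 20 blocks each, 25 blocks fit on a tape.
streams = {f"server{i}": list(range(20)) for i in range(5)}
tapes = multiplex(streams, blocks_per_tape=25)

print(len(tapes))                    # 4 tapes written
print(reads_per_set_restore(tapes))  # 20 tape reads, set by set
print(reads_sequential(tapes))       # 4 tape reads, tape by tape
```

In this model every tape holds blocks from all five sources, so restoring set by set reads each tape five times. Scale the same ratio up and 1000 tapes become 5,000 reads.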

Factor in the need to have systems set up for the restoration, which could include NAS and filer devices, Exchange servers and SQL servers, and the temptation might well be to lock the archive and throw away the key. This is fine until you need some data back or the regulators come a-knocking.

Faced with customers who are despairing at the size of the data migration project needed to turn their 500 or 1000 DLT tapes that contain NetBackup, TSM or NetWorker archives into a smaller number of LTO3 or LTO4 tapes, we decided the only approach was the development one. Trying to restore tapes under these circumstances using the originating software is just too restrictive, and requires too much time and other resources from an IT department.

Having developed our own data migration utilities, we are able to scan the tapes and then restore them sequentially. This means that a 1000 tape data migration job might require only 1000 tape reads, and that these can be done in parallel over multiple systems. Use 10 systems and, rather than one system grinding through 10,000 read passes for 1000 tapes, each system ends up needing only 100. This equates to a job taking less than a month rather than several years. An extreme example, but a fair illustration of the point.
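The arithmetic behind that comparison can be sketched in a few lines of Python. The tape and system counts come from the example above; the figure of 4 hours per read pass is purely an assumption for illustration (real read times depend on the drive, the tape and how full it is), and the function names are hypothetical.

```python
import math


def passes_per_system(tapes, reads_per_tape, systems):
    """Read passes that each system has to perform, rounded up."""
    return math.ceil(tapes * reads_per_tape / systems)


def elapsed_days(passes, hours_per_pass):
    """Wall-clock days for one system to work through its passes, running non-stop."""
    return passes * hours_per_pass / 24


HOURS_PER_PASS = 4  # assumed figure, not a measured one

# Originating software: each of 1000 tapes read 10 times, on a single system.
native = passes_per_system(tapes=1000, reads_per_tape=10, systems=1)
# Sequential restore: each tape read once, spread over 10 systems.
parallel = passes_per_system(tapes=1000, reads_per_tape=1, systems=10)

print(native, round(elapsed_days(native, HOURS_PER_PASS), 1))      # 10000 1666.7
print(parallel, round(elapsed_days(parallel, HOURS_PER_PASS), 1))  # 100 16.7
```

Under that assumed read time, 10,000 passes on one system come to roughly four and a half years of continuous running, while 100 passes per system come to about 17 days.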

