Data migration and the curse of multiplexed NetBackup

It is a bit unfair to single out NetBackup: any format that supports multiplexing can leave similar problems for anyone attempting an archive-wide data migration project. The problem is this: multiplexing intersperses data from several sources within a backup set. This improves backup performance, but the payback is potentially degraded restore performance, especially when attempting a complete restoration of data. Why?

Well, the problem is that the data from any single backup is distributed throughout the data from several other backups, and to identify the data from the required backup you have to read all of the data, including that from the other backups. For example, one backup set might occupy only 10% of the space on an LTO4 tape, but the restoration might require that 100% of the tape be read as each block must be read to see if it belongs to the required backup set.

The problem, then, with an archive-wide tape data migration project is that if there are 1000 tapes containing backups from 100 different sources, the migration process might require that each tape be read 100 times, so the entire reading process would be the equivalent of transferring data from 100,000 tapes.

A bit more than one of those jobs to leave for a wet afternoon.
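
To put rough numbers on that, here is a back-of-envelope sketch in Python. It is an illustration only: the function names are made up for this post, every tape is assumed to be full and to stream at the nominal LTO-4 native figures (800 GB, 120 MB/s), and mount, positioning and rewind times are ignored.

```python
# Rough cost model for the worst case described above.
# Assumptions (not measurements): full tapes, nominal LTO-4 native
# capacity and transfer rate, no mount or seek overhead.
LTO4_NATIVE_BYTES = 800 * 10**9         # nominal native capacity
LTO4_NATIVE_BYTES_PER_S = 120 * 10**6   # nominal native transfer rate


def tape_passes(num_tapes, num_sources, single_pass=False):
    """Full-tape reads needed: one per source per tape, or one per tape."""
    return num_tapes if single_pass else num_tapes * num_sources


def read_days(passes, drives=1):
    """Days of pure streaming time for the given number of full-tape reads."""
    seconds_per_pass = LTO4_NATIVE_BYTES / LTO4_NATIVE_BYTES_PER_S
    return passes * seconds_per_pass / drives / 86_400


if __name__ == "__main__":
    for single_pass in (False, True):
        passes = tape_passes(1000, 100, single_pass)
        print(f"single_pass={single_pass}: {passes} passes, "
              f"about {read_days(passes):.0f} drive-days")
```

Adding drives divides the elapsed time but not the total amount of tape that has to pass over the heads, which is why the number of passes is the figure that matters.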

I’m not denigrating multiplexing as a procedure. It improves backup performance and logistics, and it is backup that occupies the most time. Unless we are very unlucky, a lot less time is spent restoring data from tapes than is spent writing to them. It is just that when the time comes to move a large amount of data, the option of using the original backup software and systems can be just too daunting and resource-hungry.

There are ways around the problem. We write our tape data migration software to cater for this type of issue, so that a faster, high-level scan of the tapes means the tapes have to be read only once. This means that the data migration process can be completed in an acceptable time without the need for a massive investment in infrastructure and staff.
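
The difference between the two approaches can be sketched in a few lines of Python. This is a toy model, not our migration software and not the NetBackup on-tape format: a "tape" here is just a sequence of hypothetical `(image_id, payload)` blocks, and the function names are invented for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical block layout for illustration: (image_id, payload).
Block = Tuple[str, bytes]


def restore_one(tape: Iterable[Block], wanted: str) -> bytes:
    """Per-image restore: walks the whole tape, keeps one image's blocks."""
    return b"".join(payload for image_id, payload in tape if image_id == wanted)


def demultiplex(tape: Iterable[Block]) -> Dict[str, bytes]:
    """Single pass: route every block to its own image as it streams past."""
    parts: Dict[str, List[bytes]] = defaultdict(list)
    for image_id, payload in tape:
        parts[image_id].append(payload)
    return {image_id: b"".join(chunks) for image_id, chunks in parts.items()}


if __name__ == "__main__":
    tape = [("A", b"a1"), ("B", b"b1"), ("A", b"a2"), ("C", b"c1"), ("B", b"b2")]
    # Three calls to restore_one would mean three passes over the tape;
    # demultiplex recovers all three images in one.
    print(demultiplex(tape))
```

A real tool would stream each image to its destination rather than hold it in memory, but the principle is the same: every block is read once and sent to the right place.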


5 Responses to “Data migration and the curse of multiplexed NetBackup”

  • You said “For example, one backup set might occupy only 10% of the space on an LTO4 tape, but the restoration might require that 100% of the tape be read as each block must be read to see if it belongs to the required backup set.”

    NetBackup doesn’t work that way. In general we would be skipping ahead as needed to just the data we need. There are exceptions of course, but you shouldn’t have to read 100% of the tape.

    One exception might be if you have lost the NetBackup catalog and you need to recreate that from scratch. This underscores the necessity of maintaining a good backup of the NetBackup catalog and practicing disaster recovery scenarios.

    If you have questions feel free to email or comment.

    thanks,

    Tim Burlowski
    Product Manager
    Symantec

  • Mark says:

    Thank you for your comments.

    I should begin by making clear that I did not intend to claim that the entire tape will always have to be read for each restore, but sometimes either all or a significant proportion might have to be read, and I apologise if I gave the wrong impression.

    NetBackup organises data along a tape in sequential sections named “fragments”. Each fragment is a number of data blocks terminated by a tape file mark. The fragments of a single image (backup) can be distributed over many tapes and in a multiplexed backup each of these fragments can contain data from a number of different images.

    I agree that NetBackup can reference the catalog to identify only the fragments that are required to restore a specific image and skip any fragments that do not contain relevant data, and that this can dramatically reduce the area of tape that needs to be read. In my experience, however, this is not the case with multiplexed backups when the data is distributed such that every fragment contains data from the image being restored and must therefore be read. In such cases the multiplexed fragments contain data from multiple images, the gap between relevant data is significantly reduced, and any fast block-seeking capabilities of the tape drives are of little benefit, as the drive is unable to seek quickly before the next relevant portion of data is required. Therefore the time to read and restore all images is multiplied by the number of multiplexed images.

  • This is a great blog you got here but I can’t seem to find the RSS button.

  • Steven says:

    The RSS button is positioned bottom left in the page footer.

  • Interesting to say the least.

