On copying DVDs

I have an extensive collection of DVDs. They take up three bookcases, which is a bit annoying, because I have books which I’d like to put on them instead. So I’ve been copying the DVDs onto computer storage. This has taught me surprising things.

The wrong approach

I started out using dvdbackup to grab the video-specific bits of the discs, and genisoimage from the cdrkit collection to put them back into an ‘.iso’ image. (The ‘.iso’ name is a poor fit, because DVDs primarily use UDF for their filesystem, rather than – or possibly in addition to – ISO 9660, but the name is now traditional.)

This turned out to be undesirable for two reasons.

Firstly, some DVDs have interesting things other than video in their filesystems, and dvdbackup ignores this stuff. For example, Doctor Who DVDs often have random PDF things, and the original disc for Shada – the one with Tom Baker’s narration filling in for the pieces which were never filmed – has a complete Big Finish animated version, in Flash, of all things, with voice acting by Paul McGann, Lalla Ward, and Jonathan Pryce; so missing this stuff out is unfortunate.

More importantly, some DVDs, notably ones from Lionsgate, have weird filesystems, with multiple VTS (‘video title set’) file entries referring to the same sectors on the actual disc. (This is a violation of the DVD spec, which insists on the VTS files being laid out in a specific order on the disc, but players don’t care so it works anyway.) The dvdbackup program gets confused by this, and tries to make separate copies of each VTS, which requires much more storage than the 8.5 GB notional maximum for a DVD, and causes genisoimage to fail. Worse, the common sector range includes a number of bad sectors which slow down copying significantly, and dvdbackup hits these for each of the duplicate VTS files. The result is that a copy takes aaages to finish, exhausts the available temporary storage, and can’t be packed back into an ‘.iso’ image anyway.
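To put rough numbers on the storage explosion, here is a minimal sketch comparing a per-VTS copy with a copy that reads each sector once. The disc layout (five entries sharing one extent) and the function names are made up for illustration, not taken from dvdbackup or any real disc; the only hard fact is the 2048-byte DVD sector size.

```python
SECTOR_SIZE = 2048  # bytes per DVD sector


def naive_sectors(extents):
    """Sectors copied if every half-open (start, end) extent is copied separately."""
    return sum(end - start for start, end in extents)


def distinct_sectors(extents):
    """Sectors copied if each sector on the disc is read at most once."""
    total = 0
    run_start = run_end = None
    for start, end in sorted(extents):
        if run_end is None or start > run_end:
            # A gap: close the previous run and start a new one.
            if run_end is not None:
                total += run_end - run_start
            run_start, run_end = start, end
        else:
            # Overlaps or touches the current run: extend it.
            run_end = max(run_end, end)
    if run_end is not None:
        total += run_end - run_start
    return total


def gib(sectors):
    """Convert a sector count to GiB."""
    return sectors * SECTOR_SIZE / 2**30


# Hypothetical layout: five VTS entries sharing one 3,000,000-sector extent.
extents = [(1000, 3_001_000)] * 5
print(f"per-VTS copy: {gib(naive_sectors(extents)):.1f} GiB")
print(f"sector copy:  {gib(distinct_sectors(extents)):.1f} GiB")
```

The per-VTS figure grows with the number of duplicate entries, while the sector-copy figure can never exceed the size of the disc.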

A new program

A change of approach was necessary. I wrote a new set of utilities, which I’ve named dvdrip. The cornerstone of the collection is dvd-sector-copy, which copies an entire disc, sector by sector, in order, to an image file. The clever bit is that, beforehand, it uses the VideoLAN project’s libdvdread library to identify where the encrypted ‘.vob’ files are on the disc, and then uses libdvdread’s services to decrypt the applicable sectors rather than reading them directly from the DVD block device. It still reads each sector at most once, so the Lionsgate storage explosion can’t happen. It reads and copies every sector, so all of the non-video extras are preserved.
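The planning step can be sketched roughly as follows. This is an illustration in Python, not the code from dvd-sector-copy: the names merge_extents and plan_copy are hypothetical, the extents are assumed to have been collected already (for example via libdvdread), and the sketch glosses over which key applies when entries overlap.

```python
def merge_extents(extents):
    """Merge overlapping or touching half-open (start, end) sector extents."""
    merged = []
    for start, end in sorted(extents):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(e) for e in merged]


def plan_copy(disc_sectors, vob_extents):
    """Yield (start, end, decrypt) runs covering [0, disc_sectors) in order.

    Each sector appears in exactly one run, so duplicate or overlapping
    '.vob' entries cannot cause a sector to be read twice.
    """
    pos = 0
    for start, end in merge_extents(vob_extents):
        end = min(end, disc_sectors)   # ignore anything past the end of the disc
        if start >= end:
            continue
        if pos < start:
            yield (pos, start, False)  # plain filesystem data: read the raw device
        yield (start, end, True)       # encrypted video: read via the decrypting path
        pos = end
    if pos < disc_sectors:
        yield (pos, disc_sectors, False)


# Three entries, two of them identical, all overlapping one another.
for run in plan_copy(100, [(10, 20), (15, 30), (10, 20)]):
    print(run)
```

The copier then walks the runs from first to last, choosing the raw or decrypting read for each one.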

It’s important to deal sensibly with bad sectors. Some discs are damaged, but others – notably from Lionsgate, Sony, and Disney – frequently have intentional bad sectors in places where DVD players won’t look because of the navigation data. DVD drives slow down dramatically when they actually try to read bad sectors, and the bad areas are often quite large, so wading through them one sector at a time is impractical.

To deal with this, I’ve developed a rather fiddly algorithm which tries to skip past regions of bad sectors. Once I encounter a bad sector, I first try reading it a few more times: this sometimes works on discs which are actually damaged and provides a useful clue that a disc requires some TLC, or maybe attention from a different drive. If the sector is persistently bad, then the next task is to determine the end of the bad region. I do this by trying sectors in a geometrically increasing sequence, skipping 1, 2, 4, 8, … sectors until I find one that can be read successfully. (The precise details of the sequence can be tuned by setting parameters.) Once the region is bounded on both sides, I use a binary search to refine the upper bound.

Bad regions often have groups of good and bad sectors, and bounding a bad region too early means that I go through the retry and bounding stages again for the next group, which slows everything down a lot. So instead I only consider a sector to be ‘good’ if an area following it is completely clear. The size of the area I require to be clear is related to the size of the bad region I’ve found so far. This approach seems to work fairly well, but I did find one disc – Memoirs of a Geisha – where I needed to tweak the parameters because a good group sandwiched between two bad groups was actually critical to the disc playing correctly.
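A simplified sketch of the bounding step, in Python rather than the C of the real program. Everything here is an assumption made for illustration: is_readable is a hypothetical probe, clear_fraction is a made-up tuning parameter standing in for the real ones, and the retry stage is left out.

```python
def find_resume_sector(first_bad, disc_end, is_readable, clear_fraction=0.0):
    """Return the sector at which copying should resume after a bad region.

    first_bad is a sector already known to be unreadable; is_readable(n) is a
    hypothetical probe returning True if sector n can be read. A result of
    disc_end means the bad region ran to the end of the disc.
    """
    def looks_good(sector):
        # Only accept a sector if a run after it is clear too; the required
        # run grows with the size of the bad region found so far.
        need = 1 + int(clear_fraction * (sector - first_bad))
        stop = min(sector + need, disc_end)
        return all(is_readable(s) for s in range(sector, stop))

    # Phase 1: skip ahead by 1, 2, 4, 8, ... sectors until something looks good.
    lo, step = first_bad, 1           # lo is always treated as bad
    while True:
        probe = lo + step
        if probe >= disc_end:
            hi = disc_end             # ran off the disc: use the end as the bound
            break
        if looks_good(probe):
            hi = probe
            break
        lo, step = probe, step * 2

    # Phase 2: binary search between the last bad probe and the good bound.
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if looks_good(mid):
            hi = mid
        else:
            lo = mid
    return hi


# Two bad groups with a five-sector good gap (130-134) between them.
bad = set(range(100, 130)) | set(range(135, 200))
readable = lambda s: s not in bad
print(find_resume_sector(100, 1000, readable))
print(find_resume_sector(100, 1000, readable, clear_fraction=0.5))
```

With clear_fraction left at zero, the search stops at the small good gap and would have to start again at the second bad group; requiring a clear run carries it past both groups in one go. That is also exactly how a short but important good group can get skipped, so the parameter has to stay tunable.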

Another thing that I put a lot of effort into – which dvdbackup ignores – is completing and fixing partial or erroneous copies. My program can automatically pick up where a previously interrupted copy finished. (This is easy, because its output is a single ‘.iso’ file, and I read the raw disc from start to finish in order, so the length of the output file is a good indicator of how far it got last time.) When it skips regions of bad sectors, it can write a file listing the regions it skipped, and it can be instructed to copy just particular regions of the disc. (I skip regions of bad sectors using lseek, not by writing any particular pattern, so a subsequent attempt won’t destroy old good data if it encounters problems.) This all turns out to be useful with discs with intentionally bad sectors: my drives tend to run rather slowly after encountering a bad sector, so once it looks like it’s recovered properly, I’ll interrupt the copy, eject and reinsert the disc, and continue the copy, which then runs at full speed until it hits the next bad region.
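The resume and skip tricks both fall out of the image being a flat file of 2048-byte sectors. A minimal Python sketch, assuming a POSIX system; the function names are hypothetical and this is not the code from my program.

```python
import os

SECTOR_SIZE = 2048  # bytes per DVD sector


def resume_sector(image_path):
    """Guess where an interrupted copy stopped from the image's length."""
    try:
        size = os.path.getsize(image_path)
    except FileNotFoundError:
        return 0                     # no previous attempt: start at the beginning
    return size // SECTOR_SIZE       # ignore any partially written final sector


def skip_sectors(fd, count):
    """Move the output position forward without writing anything.

    Existing data under the skipped range is left untouched; if the range
    lies past the end of the file, the next write leaves a hole there.
    """
    return os.lseek(fd, count * SECTOR_SIZE, os.SEEK_CUR)
```

One wrinkle: seeking past the end of the file does not by itself extend it, so if a copy is interrupted immediately after a skip, the file length reflects only the last sector actually written.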

The final thing I put (probably too much) effort into was progress reporting. A while ago, I wrote a small library in Python to display multiple progress bars on a terminal, leaning on the blessed library to do the fiddly terminal handling. I translated this library into C, using the traditional termcap or terminfo facilities, and use it to show progress through: the sectors it’s been asked to copy; the entire disc; a VTS, when applicable; and regions of bad blocks, when applicable.

Differences between drives

I’m using four different DVD drives, all attached to the same computer, a 13-year-old Lenovo ThinkPad T500. One is the Matshita UJ862A built into it; the other three are USB-attached LG drives (GP70NS50s), which were very cheap but work pretty well.

The built-in Matshita drive is usually twice as fast as the LG drives. It also reports bad sectors much more quickly, so I’ll often switch a problematic disc into it. Sometimes, though, the LG drives decide that they really like a disc, and I can hear them spinning up their hyperdrives (pausing data transfer for a bit) so that they can really get going. Also, the Matshita drive seems generally rather pickier: the LG drives are better at coping with clearly damaged discs.

The LG drives don’t appear to care about region coding, while the built-in Matshita drive certainly does. I have a small number of out-of-region discs, and I have to copy those using the LG drives.

On the other hand, the LG drives don’t seem to do CSS at all (maybe because they don’t have a region configured). The Matshita drive always cracks keys very quickly, while libdvdcss sometimes takes a long time to crack keys using the LG drives, and sometimes fails completely – usually on very short VTSes which likely don’t offer enough guessed-plaintext for the cryptanalysis. Usually this doesn’t cause trouble; sometimes it does, and I just re-rip the problematic VTS using the Matshita drive.

Drive speed

As part of my progress display, I try to guess and report a time to completion. This turns out to be really difficult.

The data on a DVD starts in the middle, works out towards the rim, and, on a dual-layer disc, comes back towards the middle again on the other layer. Disc performance improves towards the edge, presumably because the bits are moving past the read head faster for the same angular velocity, and the angular velocity is limited more by the physical need to keep the disc stable than by the ability to read the data. I’ve not yet factored this into the performance model, so the ETA is rather inaccurate. This seems to be traditional for progress indicators anyway.
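For what it’s worth, here is a sketch of what such a model might look like. It is not something the program does. It assumes the drive spins at a constant angular velocity throughout, that data is packed at constant density along the track, that the layer is full out to the rim, and that the data area runs from roughly 24 mm to 58 mm radius; all of these are simplifications. Under those assumptions the data read so far grows with r² − r₀², while the time taken grows with r − r₀.

```python
import math

R_INNER_MM = 24.0  # assumed inner radius of the data area
R_OUTER_MM = 58.0  # assumed outer radius of the data area


def time_fraction(data_fraction, r_inner=R_INNER_MM, r_outer=R_OUTER_MM):
    """Fraction of one layer's read time elapsed after data_fraction of its data.

    Assumes constant angular velocity, so the read rate is proportional to
    the radius of the head at that moment.
    """
    r = math.sqrt(r_inner**2 + data_fraction * (r_outer**2 - r_inner**2))
    return (r - r_inner) / (r_outer - r_inner)


def eta_seconds(elapsed, data_fraction):
    """Estimate the time remaining on one layer, given the time spent so far."""
    done = time_fraction(data_fraction)
    if done <= 0:
        raise ValueError("no progress yet, cannot estimate")
    return elapsed * (1 - done) / done


# Ten minutes in, half the layer copied: the faster outer half is still to come.
print(f"{eta_seconds(600, 0.5):.0f} s remaining")
```

A naive linear estimate would say another ten minutes at this point; the model says rather less, because the remaining data sits on the faster outer part of the disc. On the second layer of a dual-layer disc the head moves inward, so the correction runs the other way.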

All the drives slow down a lot when they encounter bad sectors. The Matshita drive slows down less, but doesn’t return to high-speed reading afterwards until the disc is ejected. This is a little vexing, since the drive actually seems to be worse at reading reliably when it’s in this go-slow state. (If anyone knows of a way to get an optical drive to go back to high speed without ejecting, maybe by issuing an ioctl or something, I’d love to know.) The LG drives seem much more reluctant to report a sector as bad, so bad regions slow them down much more, but they bounce back better. (Even though they’re very slow to report sectors as bad, they don’t seem much more reliable than the Matshita drive, and retrying in software still pays off on actually damaged discs.)

Annoying discs

Probably the most annoying disc I’ve found so far has been Brass Eye. It uses the duplicate-VTS trick, with copious regions of bad sectors, so copying was slow. But it has a further trick: it presents 99 different titles, but many are mangled duplicates of each other, with the pieces of episodes stitched together in the wrong order, and the disc menu only works correctly if you wade through the tedious copyright notices which VLC usually skips over.

Damaged discs

Surprisingly few discs in my collection have actually been damaged. I’ve found that I could usually resurrect a disc, at least temporarily, by spraying it with Impega screen cleaning fluid and wiping (radially!) with the microfibre cloth provided with my old Nokia N900 phone. (I ordered replacements for the affected discs anyway; fortunately, I’ve not yet had any problems with discs which are hard to replace.)

Software

My program is available from its Git repository. There’s currently no documentation, and pretty much no commentary, so you’ll have to take your chances. Besides, there’s quite a bit more to using the program effectively than knowing what the command-line options are. Anyway, once things settle down a bit, I’ll tidy it up and make it actually usable. Although (lack of commentary!) it doesn’t say, this is free software and you can copy and/or distribute it under the terms of the GNU General Public License version 3, or, at your option, any later version.

(Edited 2022-03-03: Identified the drives precisely.)

About this Entry

This page contains a single entry by Mark Wooding published on March 1, 2022 2:03 PM.
