BookID, Book IDunno

In my previous post, I raised a few questions about BookID regarding potential false positives and unjustified removals. Coker has provided numbers for Smashwords removals, but there was no data on the validity of each of those cases. In my search for that data, I found an article from Nate Hoffelder at The Digital Reader, comparing BookID to YouTube’s ContentID. Scribd’s system (“an automated that checks user uploads for pirated content”—an automated what? My dear, you seem to have forgotten something) works “too well,” he writes, and references Smashwords users who have posted details about receiving false positives on their books.

Hoffelder briefly addresses, albeit with a slightly different focus, the skepticism I raised earlier about treating the Smashwords document as the master text:

“this has led to more than a few problems. Whenever an author quoted a court document, public domain work, or other legitimately copyright-free document in their book, Scribd logs the quoted text as belonging to the author and their automated system flags and removes any user-uploaded documents that contain the same text.”

I don’t want to belabour this point, but it is an important side effect Scribd needs to take into account with future software updates.

Hoffelder spends the remainder of the post reflecting on YouTube’s numerous unfair removals. As interesting as the comparison is, he unfortunately devotes almost as many words to ContentID as he does to BookID. He concludes with a verdict on digital fingerprinting software in general, based primarily on ContentID but reinforced by BookID: he is critical of systems that make decisions without any human element. While it is unfortunate that legitimate uploads are being taken down, what is the alternative? Hoffelder doesn’t offer any solution. He claims a human decision maker is necessary, but with 48,000 Smashwords removals alone, the time and labour required would be astronomical. It is simply not feasible to involve a person in every single removal. It would be far more practical to allow a user to file a dispute (Dammit Jim, I’m a writer, not a pirate) over a removal, and involve a Scribd employee at that point.

One other aspect of BookID that concerns me (which Hoffelder does not address) is its focus on uploads rather than downloads. BookID detects pirated material being added to the site (false positives aside), but does not seem to do anything to prevent users from downloading and illegally copying legitimate texts in the first place. I’m no authority on the finer points of DRM, but I wonder how well (if at all) the software can prevent book piracy at that earlier stage. I am aware of the problematic nature of DRM itself, because it also limits legitimate usage, but perhaps BookID needs to take downloading into account just as much as uploading.

Hoffelder raises some important points about the weaknesses of BookID, but leaves me wanting more. His post is brief and barely skims the surface. It’s easy to be critical of Scribd, especially with the bad press they’ve received over the last five years, but they’ve taken a step in the right direction. The software is far from flawless, but the fact remains that they are on the front lines of the battle against book piracy.