On May 11th, 2014, Smashwords founder Mark Coker updated the official blog to address concerns from the Smashwords community regarding the December 2013 announcement that the company would be joining the Scribd subscription service. Coker briefly explains that the tension centres around Scribd users, many of whom are uploading content that violates copyright. Many of Smashwords’ authors expressed anxieties, having seen their books posted illegally, and criticized the decision to join Scribd. Coker emphasizes that “we wouldn’t have partnered with Scribd if we weren’t confident their heart was in the right place, and if we weren’t confident our relationship with Scribd would benefit all indie authors.”
Coker then moves into a summary and discussion of Scribd’s new digital fingerprint software, called BookID. He explains roughly how the software works:
“BookID automatically scans all Smashwords-delivered books, and analyzes the text for semantic data such as word count, letter frequency, phrases, and other elements. BookID then creates a digital fingerprint of the authorized Smashwords book, and uses this fingerprint to automatically detect and remove unauthorized versions. It proactively removes all files at Scribd that match the same fingerprint, and also uses this fingerprint to proactively block the upload of future unauthorized versions.”
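To make that description a little more concrete, here is a minimal sketch of what a fingerprint built from word count, letter frequency, and phrases might look like. This is not BookID's actual method, which the post does not describe beyond the quote above. The function names, the five-word phrase length, and the overlap measure are all assumptions chosen for illustration.

```python
import hashlib
import re
from collections import Counter


def words(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def fingerprint(text, phrase_len=5):
    """Build a toy fingerprint: word count, letter frequency, hashed phrases.

    phrase_len is an assumed value; the real system's settings are unknown.
    """
    tokens = words(text)
    letters = Counter(ch for ch in text.lower() if ch.isalpha())
    # Hash every run of phrase_len consecutive words.
    phrases = {
        hashlib.sha1(" ".join(tokens[i:i + phrase_len]).encode("utf-8")).hexdigest()
        for i in range(max(len(tokens) - phrase_len + 1, 0))
    }
    return {"word_count": len(tokens), "letters": letters, "phrases": phrases}


def phrase_overlap(fp_a, fp_b):
    """Jaccard overlap of the two phrase sets, from 0.0 (none) to 1.0 (identical)."""
    a, b = fp_a["phrases"], fp_b["phrases"]
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


authorized = fingerprint("The quick brown fox jumps over the lazy dog near the quiet river bank")
reupload = fingerprint("the quick brown fox jumps over the lazy dog, near the quiet river bank.")
unrelated = fingerprint("A completely different story about ships and storms far out at sea")
```

In this toy version, a re-upload that differs only in capitalisation and punctuation produces the same phrase set as the authorized copy, while an unrelated text shares none of it. Everything in between is where the hard decisions would have to be made.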
He then provides some hard data about the number of unauthorized uploads that have now been removed. As of his post, Scribd has taken down about 48,000 copies of Smashwords books alone. Sounds impressive. Except, that’s just the successful Smashwords takedowns. How many pirated copies still haven’t been recognized? How many pirated documents do they have across the board? How did those numbers get so high?
Coker acknowledges that “no automated scanning system will every [sic] be 100% accurate,” but remains confident that Scribd will continue to increase the number of cases it catches. Typos aside, Coker’s account of the matter is highly optimistic, and makes a valiant effort to defend his new business partner while reassuring his writers of the safety of their work. Actually, typos not aside, because the situational irony is too glorious to ignore. In an argument about the impossibility of accuracy with automation, he misspells a word he has manually typed.
As mentioned in my previous post, I don’t want to paint Scribd as the devil, but they have indirectly enabled piracy on a grand scale. Developing BookID reflects well on Scribd, but I am highly skeptical of Coker’s sunny outlook. There are two primary reasons this Always Look on the Bright Side of Life attitude is illogical, Captain. First, as Coker describes it, the software makes no allowance for similarities between stories. Many stories, especially those passed down through oral culture and mythology, carry strong resemblances to one another. I cannot imagine the system would be able to cope, and I suspect there have been and will be a high number of false positives. How will it cope with quotes, or stories within stories? If a character is retelling a myth, for example, will BookID flag it because that same myth has appeared in another book?
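The stories-within-stories worry can be illustrated with a small, hypothetical experiment. The sketch below is not how BookID works; it assumes a simple comparison of four-word sequences, and the sample texts, function names, and sequence length are all invented for the example. Two otherwise unrelated books each have a character retell the same myth, and the code measures how much of one book's phrasing also turns up in the other.

```python
def shingles(text, size=4):
    """Return the set of all runs of `size` consecutive words in the text."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + size]) for i in range(max(len(tokens) - size + 1, 0))}


def containment(candidate, reference, size=4):
    """Share of the candidate's word runs that also appear in the reference."""
    cand = shingles(candidate, size)
    if not cand:
        return 0.0
    return len(cand & shingles(reference, size)) / len(cand)


# An invented myth that both (invented) books retell word for word.
myth = "long ago the sun and the moon quarrelled over who should rule the sky"
book_a = "my grandmother always began the same way " + myth + " and then she would laugh"
book_b = "the old sailor cleared his throat " + myth + " but nobody on deck believed him"

# How much of book_b's phrasing is already present in book_a's fingerprint?
score = containment(book_b, book_a)
print(f"{score:.0%} of book_b's phrases also appear in book_a")
```

In full-length books a single retold passage would be a much smaller share of the text than in these two-line samples, so whether a match like this leads to a takedown depends entirely on where the matching threshold sits and on how shared or public-domain passages are handled. None of that is spelled out in Coker's post.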
Second, the software uses the Smashwords document as the Rosetta Stone to find all the pirated versions. While I agree that this method is effective for locating illegal uploads of that original document, it is perhaps dangerous to put Smashwords on such a high pedestal. They are an ebook distributor primarily for independent and self-published authors. I do not by any means wish to imply the work is of a lower quality, but I have no way of knowing whether every book has been checked for plagiarism. If Scribd and Smashwords accept books without question, then use those books to create the fingerprints, what happens if a master text contains material stolen from a book by a different publisher? In the future, if that other publisher uploads its book, it could be unjustly removed while the real case of plagiarism remains.
Are cases such as these even possible? Or likely? I haven’t a clue. The software is too new, and there has been too little public conversation about it. There is simply not enough information to warrant the level of confidence that Coker has. I, therefore, choose to remain a skeptic. That said, I really do give Scribd a pat on the back for trying, and will keep a sharp eye on their development.