google print: google’s evilness is beside the point (Bonus Rant Included)

I’m pleased to see the Google Print issue spurring discussion of the role of corporations in controlling access to information. See, e.g., today’s post @ Gnuosphere [link from sivacracy]

Gnuosphere, Siva, and others point out that Google isn’t doing Google Print out of the goodness of its heart; the company is scanning, indexing, and providing access to information for lots and lots of money. These warnings are a helpful antidote to Google worship.

But the problem with this complaint can be seen in the Gnuosphere post:

Personally, I’m not against having an institution be granted the right to create such a database. But I’m wary about handing over such privilege and control to a body that is not working for the people. Should a corporation control what could potentially become the world’s first digital library? What is the purpose of a library? Why do libraries exist? For who do libraries exist? If this project is to become a globally accessible library, should there be someone controlling your right to read?

As the database of books increases in size and therefore scientific and cultural value, is an unregulated for-profit corporation the best choice to manage and control that database?

I think not.

The impulses guiding this post are clearly pro-public access and use, and pro-library, and I wholeheartedly support that. But the issue is couched as “handing over … control” of this information to a corporation. “[G]rant[ing] the right to create such a database.” Making a “choice” of entities to “manage and control” that database.

No, no, no.

The point of people’s support for Google Print is not that we support Google, love Google, or want Google to control our access to information. The point is that Google, and any other entity that wants to do it, should be able to add value to information. Google should not be THE ONE; Google should be ONE OF MANY. Picking and choosing a single entity presupposes that the information is already controlled, and that this new use, this new added value, is to be carefully metered as a scarce resource.

We should be concerned about Google Print’s contractual restrictions on holders of its scanned works. But we should not fear Google simply for being the first entrant into the market. Google turns out to be evil? Implementing DRM, gathering and exploiting private personal data, indexing our DNA, imposing restrictive licensing agreements on its source material holders? Fine, criticize the evil practices (and Google too). Some other entity turns out to be evil, and wants to restrict copyright such that only Google’s database is valid? Criticize them, too. But I want to recommend that we resist the conflation of evils. If we’re concerned that Google is going to control a big, huge, really valuable database, possibly to the detriment of those who want to use it, then the answer is, in First Amendment terms, more speech. More databases, more indexers, more more more.

Bonus: Free Rant!

And by the way, you publishers, authors, and copyright-holders. You want to cash in on this market? Why don’t you consider selling the electronic texts to the aggregators and indexers for less than it costs them to scan the books, and with reasonable licensing terms? There’s your market, right there. In fact, technology has made that market available to you for MORE THAN THIRTY YEARS. Dialog, Lexis, Westlaw, and other database vendors could have been using the full text of books for a really long time. Libraries would have killed to have full-text access to books.

As it happens, ignoring obvious markets is not new to the publishing industry. Book publishers ignored the market for enriched information content for years before they began ignoring the market for searchable full-text. Libraries and indexers could have used, at any time in the 20th century, a flourishing market for bibliographic and enriched descriptive information about books. Instead, with no such market, librarians CREATED, from scratch, and at very great expense, indexes and catalogs of information about books, with virtually no assistance from publishers. All those major research databases like MEDLINE, AGRICOLA, and the like? Laboriously created by individual librarians, basically indexing and cataloging research journals by hand.

Compare book publishers to research journal publishers. After some time, research journal publishers figured out there was a market in enhanced information content, and began figuring out how to take advantage of that market. They facilitated the indexing process by including keywords and abstracts. They began selling tables of contents and journal indexes to the literature indexes. Ultimately they began selling full-text to databases and aggregators. In fact, once they figured it out, research publishers became incredibly successful at capturing monetary value from information that the authors mostly want to give away for free. (So successful, in fact, that academic authors are having to fight their own publishers to get that valuable research information out of the market. That is another very interesting topic for another time.)

Could book publishers have done something similar? Sure. But for decades, literally, book publishers ignored this opportunity. As with research journals, individual librarians created the catalogs and indexes of books, hand-examining each book, figuring out what the book was about and how to describe it, etc. Libraries organized consortia and union catalogs to share this information and reduce the expense of creating it. For most of the time that cataloging took place, book publishers weren’t much help. Only in the last few years have book publishers even begun to scratch the surface of providing enriched content to libraries and information vendors, by providing tables of contents to library systems vendors, and dipping their toes into very limited full-text databases that are scarcely available to anyone.

So the book publishing industry quaked in its boots and sat on its ass and ignored the market for searchable full-text, focusing solely on the market for information packaged as a physical artifact. And now the industry wants to complain that Google is jumping into the market? And doing it, not by licensing the full texts from the publishers, but in the most expedient fashion possible, by scanning? Please. Cry me a river, and while you’re at it, shed a few tears for the recording industry’s failure to jump online in the mid-90s.