Tag Archives: Google Book Search

Google Book Search panel at ALA Midwinter

The ALA’s Copyright Subcommittee (Committee on Legislation) is hosting a panel on the Google Book Settlement at ALA Midwinter this year — Saturday at 1:30 at the Grand Hyatt. (I’m on the committee and on the panel.) Should be interesting.

Come to the Google Book Settlement Session at ALA Midwinter Conference January 24th, 2009, 1:30-3:30, Grand Hyatt, Maroon Peak Room

If you’ll be at ALA’s Midwinter Conference in Denver at the end of January, please check out the session “Google Book Search: What’s In It for Libraries?” The open forum will be hosted by the ALA Committee on Legislation’s Copyright Subcommittee to discuss the proposed Google Book Search settlement. The discussion will take place on Saturday, January 24, from 1:30 to 3:30 p.m. at the Grand Hyatt, Maroon Peak (listed as the Washington Office Breakout Session IV – Google Book Search in the program).

Panelists will include Dan Clancy, Engineering Director for the Google Book Search Project, Karen Coyle, Digital Librarian and Consultant, Paul Courant, Dean of Libraries at the University of Michigan, and Laura Quilter, Librarian and Attorney at Law. The session will be moderated by Nancy Kranich, chair of the COL Copyright Subcommittee. Following brief opening remarks by each panelist, there will be an opportunity for dialogue and questions from the audience.

Additional information about the proposed Google Book Search settlement is available at http://wo.ala.org/gbs/.

“scan this book”?

siva linked to “scan this book!”, a NYT magazine article by kevin kelly, with a promise to post comments about it soon. i look forward to them, and in the meantime will post my own (hurried & no-doubt flawed) quick reactions to one point:

Authors and publishers (including publishers of music and film) have relied for years on cheap mass-produced copies protected from counterfeits and pirates by a strong law based on the dominance of copies and on a public educated to respect the sanctity of a copy. This model has, in the last century or so, produced the greatest flowering of human achievement the world has ever seen, a magnificent golden age of creative works. Protected physical copies have enabled millions of people to earn a living directly from the sale of their art to the audience, without the weird dynamics of patronage. Not only did authors and artists benefit from this model, but the audience did, too. For the first time, billions of ordinary people were able to come in regular contact with a great work. In Mozart’s day, few people ever heard one of his symphonies more than once. With the advent of cheap audio recordings, a barber in Java could listen to them all day long.

Um, no. I mean, partly yes, but partly no. Let’s not get the “protected physical copies” cart before the horse of creativity and economic power. The “greatest flowering of human achievement” has been enabled by a relatively wealthier populace; the wealth has been largely enabled by technology which enabled faster & more efficient production and resource extraction. Billions of ordinary people can come into contact with works, great or otherwise, because they have surplus capital and time to purchase them (technology, democracy, FEMINISM, the labor movement, etc.); it is cheap to reproduce them (because of technology); and as a result of this expanded marketplace, and greater leisure / capital, more people could create their own works. (And the barber in Java, if she is like barbers in many other developing nations, is listening to the recordings despite copyright law, not because of it.)

The transitional technological moment when works could be mass-produced but only with expensive equipment and with relatively expensive resources allowed a chokehold on that production, and certain parties were able to make a killing on that chokehold — record producers, publishers, and the like. But the chokehold didn’t enable the flowering — like a dam, it just siphoned off energy from a river that was already flowing. Sorry for the mixed metaphors. But let’s not isolate creativity and copyright from history and the real world.

There’s no question that, as Kelly suggests, this is a clash of business models. But it’s important to characterize the middle-man business model correctly: not as the cause of creativity, but as a by-product of creativity + a transitional technological state. If you think about it that way, you quickly figure out that middle-men’s interests not only aren’t protected by copyright law, they’re not even the interests described in the Constitutional “Authors and Inventors”. If the middle-men want to continue making revenue, they’ll have to do it in some way that adds to the value. The principal means they formerly had of adding value no longer cut it. Creation and capture are getting cheaper, being reduced just to their human inputs of creativity and ingenuity — as they should be. And distribution costs are approaching zero. Thanks for bringing your expensive printing presses, recording and processing equipment, and land-based distribution methods to the party, guys, but we can party on without them now. What else ya got? Editing? Selection? Indexing? Archiving? Tagging? Promotion? Because all those things could be very helpful.

… Anyway, I largely enjoyed the article, although I skimmed it very quickly during a short break. I look forward to reading it again at more leisure (and to Siva’s eventual commentary).

google print postscript

More Google Print …

Although the media coverage is only picking up, to me, it seems that some of the most interesting discussions have died down. So, here’s a very few that caught my attention for one reason or another:

  • Google Kills Children, A Whole Lotta Nothing, 2005/11/5: Link and complaint about interview with spokesperson for the children’s hospital that owns the royalty rights for Peter Pan.

    This is the most intellectually dishonest article title I’ve seen in quite some time: Google Print upsets children’s hospital.

    Commentator Rob Begbie added:

    To my eyes, reading the article, your complaint should be with the lazy hack who presumably phoned up GOSH out of the blue asking for a comment, in order to produce a stupidly sensational article which would get them lots of pageviews.

  • Farhad Manjoo covered the issue @ salon.com (11/9)
  • Rebecca Tushnet covers her reactions to a talk by Clifford Lynch (CNI) & Jonathan Band in Massive Digitization Projects.
  • Siva explicated his concerns in the Chronicle. I liked the article: rather than looking at the legal questions, Siva looked at the larger social questions raised and brought to the fore by the project. I hope Siva’s article triggers more examination of the social concerns, riding on the media’s love of Google to get attention to the general questions of information access and control.

    As for the substantive issues Siva addresses, I have two reactions. 1) I wholly agree that libraries ought to be scanning and digitizing and creating databases. But I’m not down on Google because they’re not my preferred party. 2) I’m also not down on Google for what they might do, although I’m leery of amassed power. But if the concern is that Google will have access to all this information, then I have a two-word solution: Arms Race.

For lots more, there are roundups from ipta blog 12/2 and 11/9; Charles Bailey’s Google Print Bibliography (10/25); and kestrell’s roundup of coverage (10/3).

A number of panels & debates & radio stories have been filed on it, too, with some interesting responses from listeners / watchers / participants:

The Google Print ‘debate’ offers a great example of debate polarization: over time, people with an opinion grow more confirmed in that opinion, rather than more nuanced. They dig their heels in more, reflexively responding to critiques with arguments. As they mull over the issue, both supporters and opponents find even more reasons why they were originally right. The intemperate may move towards shrill. (Some people are driven to shrillness by other reasons.) Siva’s article managed to buck that polarizing trend, remaining thoughtful & nuanced (even though I may disagree with him). Hmm. Taking in critical information, letting it affect your opinions, and becoming more nuanced as a result. What a useful skill that would be.

Google Print: whither goest this debate …

Oy. The Google Print discussion just keeps going on, and on …

  1. Karl-Friedrich Lenz makes the point that Yahoo!’s opt-in only venture, and older search databases that licensed content, weaken Google’s case. Yep. As a copyfighter, though, I resist the idea that copyright-holders already legally possess that level of control over copyrighted works, and can force an indexer to the bargaining table to index the content. Yahoo! claims that they didn’t do it this way to screw Google, but I can’t help but think that they did; or at least there was a serious sense of two-dead-birds in their choice of the opt-in model.

  2. Derek Slater made the point that supporting Google Print does not mean not supporting libraries, and he did it very well. If I had seen his post, I could have saved myself the trouble of writing mine. The video store / library example is quite right (although, Derek, video stores are not necessarily superior to library rentals; library rentals are in most instances free. And that must be good, since the music industry has told us nothing can compete with free….)

  3. Farber’s IP list has had debate over Google Print, too, summarized on an o’reilly blog.

  4. copyfight rounded up the coverage in the Oct. 24-ish period.

  5. I’m enjoying gnuosphere’s blog, which I recently discovered. gnuosphere asks:

    Of what use will the freedom to exercise “fair use” in the creation of an electronic database be if the deliverance of that database is controlled by software idea patents?

    A lot, because the freedom to exercise ‘fair use’ is much, much broader than its single application in the Google Print project.

    This is why my original call was to resist conflating the problems and critiques of Google: If we’re concerned that Google Print isn’t fair use, then that’s a copyright concern and has some implications. If we’re concerned that Google Inc. is anticompetitive, that has other implications. Google’s privacy policy is yet another issue. The issues could interrelate, as in Google’s exclusivity clauses in its contracts with the libraries. But the privacy, DRM, and patent over scanning and search algorithms — none of that particularly relates to the question of fair use, as far as I can tell. Separate problems and worthy of attention in their own right, to be sure. But it almost seems as though there is a piling on effect happening: People are suddenly nervous about Google and suddenly the spotlight is on Google Print. But the main distinctive points about Google Print, from everything I’ve read so far, are (a) the copyright fair use questions, and (b) the exclusive licensing agreements with libraries. So … at this point I think it’s more helpful to keep the issues separate.

  6. Siva rightly points out that there is a risk, maybe a very good one, maybe a certainty (“they will…”) that Google will use its market power to maintain its market power. That’s surely true, as I suspect it is and will be true of almost every other private for-profit enterprise in the world.

    I guess I’m not sure where that point takes us, in terms of thinking about Google Print. Ought we be fighting DRM, closed algorithms, privacy invasions, abuses of monopolistic power, and the like? Yes. Does that mean we ought to fight the very creation of Google Print? No. So I am, honestly, puzzled at this critique.

    Siva wonders “Why would Google allow competition? I am afraid ‘more speech’ is wishful thinking here.” But again, I’m not sure where that leaves us. If “more speech” is wishful thinking in a Google Print world, then what will we get in a no-Google Print world? Less speech and fewer indexes.

    If we don’t like Google Print, and we want the law to take a hand, then we have to think of legal grounds on which to stop it. The only available grounds, as far as I can see, are copyright. Antitrust violations still seem to be only in Google’s future as far as I can tell, and wouldn’t in any case stop the database; even in the years of much stronger antitrust enforcement, we didn’t shut down technologies, just dispersed them or restricted their contractual terms. Privacy? DRM? Tort law? I don’t see any colorable claims but copyright.

    Lots of virtual ink has already been spilled on that discussion, but for the record, I think Google Print ought to be fair use, and I think there’s a good chance that a court will find so. And to make an argument that fair use does not permit full-text indexing will only hurt libraries and the library digitizing projects of the future. Yes, it will also hurt for-profit value-added indexing companies like Google, Microsoft, and Yahoo!, because under such an argument, those entities will have to reach licensing agreements with copyright holders. They will do so, because they actually do, when it comes down to it, have the money to reach a licensing accord. Such a situation will not help libraries and non-profit digitizing projects.

    Libraries and non-profit digitizing projects do not have the money to reach licensing accords. That will leave them with two choices: (1) Rely instead on Google, Microsoft, Yahoo!; Dialog, Lexis/Nexis, Westlaw, Ebsco, Elsevier, and other kind-hearted private companies to treat them fairly. (2) Go it on their own and push open access, full-text indexing. The kind the library community has been lobbying for, unsuccessfully, for years. Beyond lobbying, will libraries push the envelope that has been created by Amazon.com, Google, Yahoo!, Microsoft, and every other major content indexer? They absolutely will not. Fair use is determined on a case-by-case basis, by folks willing and able to litigate. Google is (maybe). Libraries aren’t.

    In fact, I can state with 99.8% certainty that if fair use is abridged for the private sector, libraries will not risk litigation, and instead they will heed the advice of their conservative, cautious University general counsels and municipal legal departments — and follow in the footsteps of private entities’ approach to copyright and digitization. Because no library can afford to take on the litigation, even if ALA backs them and even if libraries are more attractive defendants than Google (or Napster).

    And this has been the case for the past twenty years. Libraries (and museums, let’s not leave them out) have not led the way in digitizing copyrighted content, much as I would wish it to have been so. Librarians and their lawyers have been, reasonably, scared to death of the really crazy expansions of copyright in the last 30 years. And those expansions have stifled what librarians would have liked to have done with technology. Three examples: InterLibrary Loan (ILL); digital reserves; TV archives.

  7. Siva also pointed out that publishers have been selling the full-text. It’s true that in the past few years they have entered this market. Hardly full-force, from what I can see, but I have appreciated that they’ve been doing it. But Siva also says that this renders Google’s fair use claims flimsy, because they are choking off the market before it matures. I beg to disagree, for two reasons.

    First, it certainly may “undermine” a market for someone to come in with a new approach; that’s called innovation. Merely undermining a market, whether developing or extant, is not a problem. It’s how you undermine it: are you taking advantage of a monopolistic position? or using some other illegal or immoral method of undermining a market? You can argue that Google is taking advantage of a monopoly in the Gigantic Search Engine market, but Yahoo!’s efforts and the efforts of the European libraries suggest otherwise.

    Second, I disagree that the market is being choked off before it matures. This market is mature. It is so mature, it is practically ancient and wizened. At least, if by “mature” you mean having willing customers. I know that University Presses have been trying to figure out how to capitalize on electronic distribution, as have other book publishers. But they’ve been having this discussion since the nineties. It’s 2005. The technology to offer text has been around since the seventies. Library catalogs, and Dialog, and others, have been buying full-text as soon as it became available. Book publishers, like music publishers in the nineties, have been too shortsighted and/or confounded by fear, anxiety and loathing to actually get the texts out there. Instead, they’ve been “experimenting” with DRM-restricted, unusable electronic books since the nineties; and in the last five years dipping their toes into very limited fulltext database sales to libraries.

    Come on. Technologically speaking, how crazy is it that music has an iTunes before text? The recording industry was notoriously slow and retrograde on this issue — and they still beat the book publishers. How long do we have to wait for this market to “mature”?

    Consider the Amazon.com ‘search inside the book’ example. Siva mentions in another post that “[P]ublishers fear that a market that Amazon created for them: ‘search inside the book’ licensing, will evaporate.” This isn’t how I remember the search-inside-the-book issue. I remember publishers complaining and freaking out, and the Authors Guild complaining about copyright infringement. Amazon didn’t “create a market for publishers”: Amazon forced publishers to come to the bargaining table, by threatening to do what Google is now doing wholesale. And publishers did come to the bargaining table, and did so in such a way that they and Amazon were able to come to an accord. Amazon, in the business of selling books, had a real incentive to work with publishers on this. Google, in the business of selling ads to people using a search engine, needs the publishers less, but I bet could still do business with them if the terms were right.

    And the market isn’t dead even if Google rips apart each & every available title & scans it in. Because other competitors will want to come into the market, and will need to get the text themselves. Publishers have unique services they can offer as middle-agents between authors and aggregators: Publishers offer editorial and enhancement services to authors, and they can offer digitizing, formatting, and text-enhancing services to aggregators. So Yahoo!, European libraries, and other conceivable aggregators are all possible customers for publishers, even if Google takes itself out of the market. Scanning isn’t free or even cheap, and publishers already have the goods, so …

lost licensing revenue & Google Print

I just got around to reading the weekend’s Washington Post Google Print editorials, pro (Mary Sue Coleman, UMich Pres) & con (Nick Taylor, Authors’ Guild). Short editorials, and I suppose the format limits their ability to go beyond rhetoric (“access to vast libraries of content” … “this is a socialist plot!”) into any actual legal or policy nuances. But I was particularly disappointed with Nick Taylor’s editorial, in a few ways. Taylor wisely doesn’t actually make any legal arguments. Instead, his editorial boils down to the complaint that Google Print is lost licensing revenue for publishers. It’s okay that he makes that point, because that’s actually the publishers’ and Authors’ Guild’s real (and only) point. I just resent the rhetorical slurs that are used to pad the actual argument.

  1. Red-baiting? “It’s been tradition in this country to believe in property rights. When did we decide that socialism was the way to run the Internet?” Man. Best response: Peter Suber, at Open Access News, who said:

    Nick Taylor’s piece shows that he’s as clueless as I feared. First, he doesn’t understand what socialism is. Second and more important, he complains that the Google project will deprive him of revenue but doesn’t offer a single reason to think so.

  2. Taylor uses socialism as a slur in one breath, and in the next apparently would like to see — what? a government panel passing over each and every use of a copyrighted work?

    Google contends that the portions of books it will make available to searchers amount to “fair use,” the provision under copyright that allows limited use of protected works without seeking permission. That makes a private company, which is profiting from the access it provides, the arbiter of a legal concept it has no right to interpret.

    <shaking my head in disbelief> What? A user has no “right to interpret” fair use? Okay, but I think that government bureaucracy’s gonna be pretty large when every teacher, every forwarded email, every reviewer, every parodist, every sampler, every quoter, and so on, and so on, has to file permission slips with the “arbiters” of “fair use”.

    Once again, if Google Print goes forward, that doesn’t mean that Google Print will be the only big database, and it doesn’t mean that Google is now the arbiter of, well, anything other than its own sweat-of-the-brow compilation of data (the words used in books and the order in which they are used).1

As for the actual argument, yeah, there’s lost licensing revenue. Every use of a work, including every fair use, involves potential licensing revenue.2 That, alone, won’t win their case. But I suppose they think red-baiting and appeals to public sympathy for starving artists (not exactly a coherent set of positions) can only help.


Footnote Meanderings

1. The total number of words, the presence of particular words, and the arrangement of those words in a work are, among other things, facts about the work. So are the author, the title, chapter titles, publication date, etc. Creation of an index to a work or multiple works includes gathering facts about the works. Conceptually, it’s quite distinct from the activities the Copyright Act is aimed at: the rights to copy and distribute works are clearly aimed at competitive copying, what used to be termed “piracy”. The copy(ies) that Google makes in the course of its scanning and indexing are technical copies, like RAM copies, and treating them as infringing would be an unpleasant route for courts to try to follow. (Although they have in the past; see, e.g., MP3.com, 92 F.Supp.2d 349 (SDNY 2000).)

The Google Print distributions are small pieces of the text, not easily framed with all the other pieces of that text, but instead contextualized with small pieces of other texts that match the search terms. Again, this isn’t the sort of competitive distribution which leaps easily to mind when one thinks of the exclusive right to distribute a work. [Note: this is true for Google’s Google Print Library program for copyrighted books, not the Google Print Publishers program, or Google Print for public domain works. I’ve seen several articles, like this one, that conflate or obfuscate the different programs.]

The derivative works right is aimed at translations, movie scripts, and the like. Again, not quite the right fit. I know some people will argue that an index is a derivative work, but treating derivative works in this way skirts too close to any and all fair uses. The caselaw shows this kind of interpretation, which is why the derivative works right is the most troubling of the exclusive rights, but I’m going to steer clear of that morass of a discussion for purposes of this footnote.

Performance and display are also aimed, obviously, at specific actions. Oddly, I think performance might be the best fit for Google’s use, in some kind of weird philosophical way. A performance enacts a work, simultaneously interpreting it and creating the possibility of interaction with the audience. Interpretative performance necessarily demands recourse to information about the work, as well as the work itself. An index is also centrally about user interactivity, in a way that mere consumption of the text work is not. An index, then, performs the work, interpreting it by recourse to information beyond the text itself (for instance, bibliographic data; retail or location data; or the meta-structures of the work’s organization, in paragraphs, sections, chapters, parts, pages) and opening it to dialog with the audience.

Ahem. Or not. I confess to some recent exposure to critical continental literary queer performative prepoststructuralist theory stuff.

2. Hell, you could sell a copy of a book with a separate shrinkwrapped license that charges a new fee for each and every individual use. (I think Adobe may already have a patent for that method of doing business, though.)

google print: google’s evilness is beside the point (Bonus Rant Included)

I’m pleased to see the Google Print issue spurring discussion of the role of corporations in controlling access to information. See, e.g., today’s post @ Gnuosphere [link from sivacracy]

Gnuosphere, Siva, and others point out that Google isn’t doing Google Print out of the goodness of its heart; the company is scanning, indexing, and providing access to information for lots and lots of money. These warnings are a helpful antidote to Google worship.

But the problem with this complaint can be seen in the gnuosphere post:

Personally, I’m not against having an institution be granted the right to create such a database. But I’m wary about handing over such privilege and control to a body that is not working for the people. Should a corporation control what could potentially become the world’s first digital library? What is the purpose of a library? Why do libraries exist? For who do libraries exist? If this project is to become a globally accessible library, should there be someone controlling your right to read?

As the database of books increases in size and therefore scientific and cultural value, is an unregulated for-profit corporation the best choice to manage and control that database?

I think not.

The impulses guiding this post are clearly pro-public access and use, and pro-library, and I wholeheartedly support that. But the issue is couched as “handing over … control” of this information to a corporation. “[G]rant[ing] the right to create such a database.” Making a “choice” of entities to “manage and control” that database.

No, no, no.

The point of people’s support for Google Print is not that we support Google, love Google, or want Google to control our access to information. The point is that Google, and any other entity who wants to do it, should be able to add value to information. Google should not be THE ONE; Google should be ONE OF MANY. Picking and choosing a single entity presupposes that the information is already controlled, and this new use, this new added value, is to be carefully metered as a scarce resource.

We should be concerned about Google Print’s contractual restrictions on holders of its scanned works. But we should not fear Google simply for being the first entrant into the market. Google turns out to be evil? Implementing DRM, gathering and exploiting private personal data, indexing our DNA, imposing restrictive licensing agreements on its source material holders? Fine, criticize the evil practices (and Google too). Some other entity turns out to be evil, and wants to restrict copyright such that only Google’s database is valid? Criticize them, too. But I want to recommend that we resist the conflation of evils. If we’re concerned that Google is going to control a big huge really valuable database, and possibly to the detriment of those who want to use the database, then the answer is, in First Amendment terms, more speech. More databases, more indexers, more more more.

Bonus: Free Rant!

And by the way, you publishers, authors, and copyright-holders. You want to cash in on this market? Why don’t you consider selling the electronic texts to the aggregators and indexers for cheaper than they can scan them in and with reasonable licensing terms? There’s your market, right there. In fact, technology has made that market available to you for MORE THAN THIRTY YEARS. Dialog, Lexis, WestLaw, and other database vendors could have been using the full text of books for a really long time. Libraries would have killed to have full-text access to books.

As it happens, ignoring obvious markets is not new to the publishing industry. Book publishers ignored the market for enriched information content for years before they began ignoring the market for searchable full-text. Libraries and indexers could have used, at any time in the 20th century, a flourishing market for bibliographic and enriched descriptive information about books. Instead, with no such market, librarians CREATED, from scratch, and at very great expense, indexes and catalogs of information about books — with virtually no assistance from publishers. All those major research databases like MedLine, Agricola, and the like? Laboriously created by individual librarians, basically indexing and cataloging research journals by hand. Compare book publishers to research journal publishers. After some time research journal publishers figured out there was a market in enhanced information content, and began figuring out how to take advantage of that market. They facilitated the indexing process by including keywords and abstracts. They began selling tables of contents and journal indexes to the literature indexes. Ultimately they began selling full-text to databases and aggregators. In fact, once they figured it out, research publishers have been incredibly successful at capturing monetary value from information that the authors mostly want to give away for free. (So successful, in fact, that academic authors are having to fight their own publishers to get that valuable research information out of the market — another very interesting topic for another time.)

Could book publishers have done something similar? Sure. But for decades, literally, book publishers ignored this opportunity. As with research journals, individual librarians created the catalogs and indexes of books, hand-examining each book, figuring out what the book is about and how to describe it, etc. Libraries organized consortia and union catalogs to share this information and reduce the expense of creating it. For most of the time that cataloging took place, book publishers weren’t much help. Only in the very last few years have book publishers even begun to scratch the surface of providing enriched content to libraries and information vendors, by providing tables of contents to library systems vendors, and dipping their toes into very limited full-text databases that are scarcely available to anyone.

So the book publishing industry quaked in its boots and sat on its ass and ignored the market for searchable full-text, focusing solely on the market for information packaged as a physical artifact. And now the industry wants to complain that Google is jumping into the market? And doing it, not by licensing the full texts from the publishers, but in the most expedient fashion possible, by scanning? Please. Cry me a river, and while you’re at it, shed a few tears for the recording industry’s failure to jump online in the mid-90s.

authors vs. copyright owners

Meghann Marco, a new author, would like to have her book indexed by Google, but her publisher says no, they’d rather sue. [link from kottke.org]

As a person who spends a large part of her day trying to get people to read her book, I asked my publisher to include me in Google Print.

They said no.

I think the majority of authors would benefit from something like Google Print.

essence of library

I like the flow of the google / library discussion: what’s the essence of library? and suspect I’ll be thinking about that one for a long time to come. (It sounds like a delightful perfume: a bit musty with a sweet undernote of decaying paper and an overnote of astringent preservative, maybe.)

Just picking out a few of the responses & adding a few more comments:

Michael Madison laid out a best-case defense for google based on google’s added-value of meta-information, and then termed the discussion: “is there an ‘essence’ of library?” He also points out that we ought to focus “more what Google does than on what Google is”.

Siva Vaidhyanathan responded that Google doesn’t come close to the ‘essence of a library’.

This is the heart of the discussion that really intrigues me. Not because I truly am arguing that Google is a library, but because I suspect that the ways that information is being transmitted might start to render moot our current definitions of “library”. In my earlier post, I wasn’t really suggesting that Google take advantage of the warm feelings towards libraries; I doubt it would be a very helpful strategy, because most judges, like everyone else, would intuitively distinguish between the classical public library and Google. Rather, I was suggesting that library exceptionalism is only going to work so long as libraries are conceptually distinct.

Michael M then responded to Siva with some discussion of the essence of a library, ultimately concluding that we really have to talk about libraries in terms of information flows. And then he brings it back to Google:

Do we experience Google Print content as we experience other collections that we regard as libraries, or do we experience that content as we experience the Web — a functionally unlimited aggregation of data? Right now, the answer to that question has to rely on intuition and speculation. My money is on the second option, but in the end: who knows?

I’d like to suggest two basic functions for libraries: One is warehousing and archiving physical collections; serving effectively as a museum of information. The second function is providing information services. Storage, and access.

In the past and even today these two functions are, practically, inseparable. And each implicates a whole host of sub-functions many of which serve both masters — e.g., cataloging, which organizes the stored collections.

But these functions have been splitting and will continue to. Digitizing projects, like Google Print, will put the physical artifacts on the same plane with museum artifacts: nice if you’re a scholar and need the original, but for most people, the digitized content will suffice. [Google Print is not the only digitizing project, of course; there are plenty of others on smaller scales that have gotten less attention. I would be interested to get some examples of public-private partnerships because I suspect Google Print isn't the only one.]

As more of the information content becomes digital, the subfunctions used to service both the storage and access functions will shift. Two examples: cataloging and preservation. Electronic information needs much less in the way of cataloging; full-text searching obviates a lot of cataloging needs. (No, not all; I believe in subject headings and hierarchical thesauruses — although I’m not sure they’re ultimately scalable if we’re talking about organizing all information.) Digital media have their own preservation problems, fairly distinct from those relevant in most special collections. The central problem in preserving digital media collections is shifting formats; the central problem in preserving physical collections is preserving the original artifact.

So as these transitions within libraries move forward, the easy and obvious distinctions present today between libraries and Google Print will erode.

Now, Eric Goldman in a comment here said another of his maxims was never build a business on fair use. Google Print, of course, relies entirely on fair use (17 USC 107), so far as I can see. One way we might distinguish libraries at present is that most libraries, operating in the book-warehousing business today, rely not very much at all on fair use, and rather a lot on first sale (17 USC 109). Libraries vary with respect to the library exemptions in 108, which are used principally, so far as I know, to (a) establish reserves collections; and (b) make backups of software, videos, records, etc.

But the bedrock library provisions we rely on today, 108 and 109, won’t be enough for some collections that need to be built in the future. For instance, I don’t know what libraries are currently archiving popular digital ephemera (besides the Internet Archive). But just as libraries have begun to collect popular culture media in DVDs, CDs, comic books, and zines, so there will have to be archiving projects dedicated to archiving purely digital media, including digital media that are distributed for free via the web. I’m thinking of things like JibJab’s “This Land Is Our Land”, Mark Fiore’s shockwave commentaries, and similar such materials.

Let’s consider the Mark Fiore shockwave animated cartoons. [This is purely my example, because I love Mark Fiore; I have no idea if he has been approached by any libraries or what his response might have been.] The cartoons are distributed for free over the Internet; but they are not (so far as I know) licensed for free reproduction & distribution, and the author retains copyright. If a library wanted to begin collecting them, how would it analyze this collection & provision of access to it? 109 protects the rights of “the owner of a particular copy or phonorecord lawfully made under this title … to sell or otherwise dispose of the possession of that copy or phonorecord”. But “computer programs” are exempted. Are shockwave files “computer programs”? Maybe we have to resort, at last, to fair use. Now what do American Geophysical Union, Kelly v. Arriba, MP3.com, et al, tell us? Michael Madison talked about it, but I think it was summed up by Eric Goldman: “Don’t build a business on fair use … multi-factor tests lead to complete unpredictability.”

This is obviously not a fullbore analysis of the relevant provisions as applied to publicly distributed shockwave files, but it does make my point: digital media and new ways of distributing content are already troubling the current copyright categories that are designed around brick-and-mortar libraries and physical artifacts.

And that’s just one example looking at only one aspect of the question of collecting & providing access to Mark Fiore shockwave animations. Consider the reams of problems that digital media pose in the realm of licensing, DRM, and DMCA-type technical protection measures, notwithstanding the protections allegedly offered by 109, 108, and 1201(d). (Is there any point in even citing to 1201(c)? I feel it’s been effectively read out of the statute in the same way, and perhaps for similar reasons, that the 9th Amendment to the Constitution has been politely ignored.)

Libraries qua libraries — well, libraries qua public and academic libraries, anyway — will always have recourse to Congress, and I predict they will prove as popular there in the future as they have in the past: not popular enough to sway Congress from granting very broad rights to copyright holders that end up hurting libraries, but popular enough to get some limited library-specific protections.

But most librarians, myself included, want to preserve BOTH today’s model of the library: the brick-and-mortar warehouse-and-cataloger-of-physical-media (which I do think will always be around) — AND the idea of the library: the collector and provider of information. So the question is, how, or why, do we copyfighters / librarians / information activists / legal scholars distinguish Google Print in a way that doesn’t hurt Essence of Library down the line? And why, tactically, should we? Maybe, we should focus on building a more robust fair use, fixing 109 so it works with digital media, or even adding in more 108 exemptions. Or maybe on the DMCA Library of Congress anticircumvention comment rounds that are coming up again.

Further reading on this discussion at copy this blog and copy this blog again. copyfight is following the debate and a number of people are commenting: See google print is as google print does and google print library shoulda coulda woulda. More from “real librarians” and others responding on Siva’s blog: Eileen Snyder, 8/17; Siva responding to Michael Madison, and including comments from other folks too.

I’d like to link to some good discussions on 109 (I seem to recall Derek Slater recently talked about 109 and digital music files, for instance, but can’t find his post — is there a search function I’m missing? Derek?) but will need to do some more digging … later.


As I write I follow one of those social sciences rules about mobs or group discussions or something: I make myself more firm in my opinions the longer I write. This is why it would be much better if I had time to write a long post, then sit on it for a while — my tone could be measured & even the whole way through. But I was already delayed in responding, so wanted to get some thoughts out in a hurry.

update 8/18: a few last posts on this discussion: madisonian.net 8/17; siva 8/18 and siva again 8/18.

siva also posted about an aspect of this issue which i didn’t really touch at all in this discussion, which is the trustworthiness of private actors in general and google in particular. my interest was piqued by the essence-of-library question, but this was a significant thread in comments & subtexts in various discussions. See siva 8/17; copy this blog (previously cited) linked to a post & comment discussion of the google / library contract on the library law blog; and seth finkelstein wrote about what’s in it for google.

update 9/1: the best response to it all came from the onion: Google Announces Plan To Destroy All Information It Can’t Index …

The new project, dubbed Google Purge…. The company’s new directive may explain its recent acquisition of Celera Genomics, the company that mapped the human genome, and its buildup of a vast army of laser-equipped robots. ‘Google finally has what it needs to catalog the DNA of every organism on Earth,’ said analyst Imran Kahn of J.P. Morgan Chase. ‘Of course, some people might not want their DNA indexed. Hence, the robot army. It’s crazy, it’s brilliant—typical Google.’ … ‘This announcement is a red flag,’ said Daniel Brandt, founder of Google-Watch.org. ‘I certainly don’t want to accuse them of having bad intentions. But this campaign of destruction and genocide raises some potential privacy concerns.’

related posts: interesting reading early saturday morning 8/13; google & not-for-profit libraries 8/13

google & not-for-profit libraries

More on Google and Siva’s response (and my responses to Siva):

Recap: In response to publisher anxieties & thinly-veiled threats of litigation, Google is implementing an opt-out provision in its scan-copyrighted-library-books program, and delaying scans of copyrighted books until November. [google blog] This has been widely reported as Google backing down. See, e.g., “Chilled by Publishers” (BoingBoing), “Google Sells Out Users” (Copyfight).

Siva Vaidhyanathan had a different take, predicated largely (it seems to me) on the fact that Google is a for-profit corporation. For once, I disagree with Siva, and on two grounds: both with library exceptionalism in this instance and the take on American Geophysical Union. Siva:

Google did not have the right to make wholesale copies of millions of copyrighted books without permission from the copyright holders. Google’s original plan fails every possible fair use test ever tried. See, for example, American Geophysical Union v. Texaco.

If copyright is to mean anything at all, then corporations may not copy entire works that they have never purchased without permission for commercial gain.

Usually I agree (not slavishly. who said slavishly?) with everything Siva (and his minions on Sivacracy) has to say, but I have to disagree with him here on a couple of points.

First, the for-profit corporation issue. Yes, Google is a for-profit corporation, and while they try not to be evil, one could argue that they won’t be able to help it. Siva wishes that libraries would take greater advantage of fair use, and so do I — libraries are wonderful and should be able to do anything they want including lots of things they don’t do now (like, yeah, scan in everything they own). But I take issue with this form of library exceptionalism. Libraries should push fair use in the service and interests of their users, history, and humanity. But libraries are not the sole beneficiaries of fair use, nor should they be. For-profit corporations, not-for-profit corporations, heck, even tax-exempt religions — all should be able to exercise fair use broadly.

Well, Siva says Google is not a library. It’s true that Google is not the mom-and-apple-pie ALA version of a downtown library, complete with modern atrium and skylights for Mayoral gatherings. But I think we have to push on “library” for a bit. The Internet Archive is certainly a library. My home collection is certainly a library. (It even circulates, and I have remote storage, and I recently began a belated investment in DVDs.) Libraries may be private, semi-private, public; for- or not-for-profit; paper or digital. Why is Google not a library?

And tactically speaking, it just doesn’t make sense for information activists / copyfighters to start downwardly limiting various users’ sets of rights. Ultimately, this will come back to bite us: what if libraries start to look more like corporations? In fact, library exceptionalism has not served the library community well: Despite numerous statutory exemptions for libraries, librarians have still retreated into deep conservatism and fear of copyright liability. Librarians realize that the laws governing information transmission are porous, and the laws that apply to for-profit corporations will also affect not-for-profit libraries.

Second, Siva cites American Geophysical Union, 60 F.3d 913 (2d Cir. 1994), very quickly in support of his point that “Google’s original plan fails every possible fair use test ever tried. See, for example, American Geophysical Union v. Texaco.”

AGU is not the law of the land, much less every possible fair use test ever tried. While influential, AGU is the law of the 2nd Circuit. (Not the Fifth, although my brain always short-circuits me there, linking “Texaco” to “Texas/5th Circuit”.) I like to remember that fair use is a fact-based, multi-factor analysis. Paraphrasing one of my copyright professors, multi-factor tests = completely unpredictable results. Each and every case looks quite different and yes, different caselaw applies. There’s a limit to how far you can draw even an influential appellate precedent, as the p2p cases show.

Unfortunately, Siva and everyone else like to just drop-cite AGU: It was a broad decision that, famously, stands for the idea that potential licensing revenue counts as an (apparently significant) effect on the market. That’s scary, and big, and consequently the decision weighs heavily in the set of bad anti-fair-use opinions. But over-reading it has led to significant nail-biting in the library community. I do agree with Siva that it’s important to remember that AGU took place in a for-profit environment; in fact, I’ve argued that not-for-profit libraries & archives have a lot less to worry about than they think they do from AGU. But the for-profit/not-for-profit status is not the be-all and end-all of the story. AGU demonstrates a sophisticated relationship between the various fair use factors. The potential licensing revenue was significant in large part because of the for-profit status. That means that it’s not the horror story that librarians sometimes fear, but it also means that you can’t take the fair use factors as a simplistic checklist: for-profit or non-profit? market effect (including lost licensing) or no market effect? It doesn’t work that way. The market that is considered is necessarily shaped by the environment in which the alleged infringement took place. Texaco was a for-profit corporation with the resources to do licensing. Librarians have been scared because the lost-licensing-revenue aspect looks even more insane in a public or academic library context than it did in Texaco’s internal special library, routing & private desk copy context. But that particular horror has never fully paraded itself, probably because the outcome is so insane outside of the particular circumstances of Texaco. Context is everything.

And, again thinking tactically, I would argue we ought to work to limit the reactionary conservatism this case fosters, rather than trying to puff it up even more. By drop-citing AGU in the service of anti-corporate use of information, Siva made the copyright maximalists’ case. And that’s not good for libraries or Google.


A little aside: Derek Slater disagrees with Siva on AGU, too, from a different angle. Derek points out that the Appellate Court found “undue emphasis” on commerciality in the District Court’s opinion. Derek’s point is well-taken, but I still read the commercial context as significant. Between the District Court & the Appellate Court opinions, the Supreme Court issued Campbell, which expressly reversed any presumption that for-profit uses were not fair. The Appellate Court wanted to uphold the lower court’s ruling, but had to deal with Campbell; hence the nod to Campbell. But the Appellate Court was really pointing out that Texaco’s use was still a traditional library use, even if in a for-profit environment.

We do not mean to suggest that the District Court overlooked these principles; in fact, the Court discussed them insightfully, see 802 F. Supp. at 12-13. Rather, our concern here is that the Court let the for-profit nature of Texaco’s activity weigh against Texaco without differentiating between a direct commercial use and the more indirect relation to commercial activity that occurred here. Texaco was not gaining direct or immediate commercial advantage from the photocopying at issue in this case – i.e., Texaco’s profits, revenues, and overall commercial performance were not tied to its making copies of eight Catalysis articles for Chickering. Cf. Basic Books, Inc. v. Kinko’s Graphics Corp., 758 F. Supp. 1522 (S.D.N.Y. 1991) (revenues of reprographic business stemmed directly from selling unauthorized photocopies of copyrighted books). Rather, Texaco’s photocopying served, at most, to facilitate Chickering’s research, which in turn might have led to the development of new products and technology that could have improved Texaco’s commercial performance. Texaco’s photocopying is more appropriately labeled an “intermediate use.” See Sega Enterprises, 977 F.2d at 1522-23 (labeling secondary use “intermediate” and finding first factor in favor of for-profit company, even though ultimate purpose of copying was to develop competing commercial product, because immediate purpose of copying computer code was to study idea contained within computer program).

[38] We do not consider Texaco’s status as a for-profit company irrelevant to the fair use analysis.

The Appellate Court then goes on to talk about the value to the user of the allegedly infringing activity. This discussion is critical, because it sets up the fourth factor discussion about the lost revenues.

As a pragmatic reading, I see this tweaking of analysis as a way for the Appellate Court to deal with Campbell. In its effect, the case has been bad; it has, as I’ve stated, been an oft-cited case when librarians are playing conservative. In its reasoning, the case is also bad: the potential-lost-revenue argument is virtually boundless. But my sense is that the potential-lost-revenue argument, although terrible, has not yet fulfilled its potential — maybe because it is so boundless.

In short, I think American Geophysical Union is over-rated, and the commercial context is critical.

… a bit more coming later hopefully

update 8/14: The massive amounts of media coverage given to the Google withdrawal confirm my opinion that tactically this sucks, for libraries, authors, readers and anybody else who actually uses copyrights. So much of this coverage is described as a copyright flap, Google’s copyright misstep, etc. The bounds of fair use have just shrunk in the court of public opinion, and that’s a much longer-lasting loss than American Geophysical Union, Napster or any other case.

update 8/15: See, this is why I like Siva so well: I wish I had time today to respond to all of the good comments zooming around the blogosphere and e-mail. …. They are all helping me formulate my arguments better. I can’t help but compare favorably this response to certain other thread-baiting that’s happening on a nearby (non-IP-related) blog. And I know Siva will eventually come up with some very cogent ideas on this issue that will make me go hmm.

Related posts: interesting reading early saturday morning 8/13; essence of library, 8/17

interesting reading, early saturday morning

Up early for my spouse who caught a red-eye. Now she’s resting peacefully and I of course can’t get back to sleep. But that’s okay, because there’s the Internet!

  • Positive outcomes of BlogHer: Mary Hodder at Napsterization is establishing a Speakers’ Wiki.

  • In response to publisher anxieties & thinly-veiled threats of litigation, Google is implementing an opt-out provision in its scan-copyrighted-library-books program, and delaying scans of copyrighted books until November. [google blog] This has been widely reported as Google backing down. See, e.g., “Chilled by Publishers” (BoingBoing), “Google Sells Out Users” (Copyfight). I agree, sell-out, chill, yes, yes, but am taking a moment to appreciate the sweetness of the opt-out option as default.

    Siva Vaidhyanathan had a different take, predicated largely (it seems to me) on the fact that Google is a for-profit corporation. For once, I disagree with Siva, and on two grounds: both with library exceptionalism in this instance and the take on American Geophysical Union.

  • Ed Felten on Freedom to Tinker [8/9] talked about the DRM in Microsoft’s Longhorn-cum-Vista. Copyfight (8/9) summed it up and added this pithy observation: “[T]his isn’t about stopping mass copyright infringement or pleasing Hollywood. It’s about keeping “consumers” locked in and people who develop potentially competing products locked out.” See also Derek Slater at EFF Deeplinks (8/9).

  • On Balkinization, Brian Tamanaha ponders intelligent design, reminding us that the whole kerfuffle is not about debates between religion and science, but about debates between a few modern religious leaders who are picking issues:

    Darwin’s 1859 publication of The Origin of Species incited a wicked backlash from religious quarters in the United States, pitting science directly against religion. But within three decades an accommodation had been achieved, as Richard Hofstadter described in Social Darwinism in American Thought (1944):

    Science, [Le Conte] urged, should be looked upon not as the foe of religion, but rather as a complementary study of the ways in which the First Cause operated in the natural world. Whatever science might learn, the existence of God as First Cause could always be assumed.

    This raises the question: why has a sensible way to reconcile faith and science that has worked for so long become unacceptable to many religious leaders in this country? This is not like the other ongoing battles over religion in the public sphere and the separation between state and church (school prayer, Decalogue displays, funding for parochial schools), all of which raise debatable issues of public and private values.

    Putting it this way helps keep the focus on the small set of religious leaders who are sowing all this unnecessary discord.

    I feel I must document the provenance of this observation: I’m quoting Brian Tamanaha who’s quoting Richard Hofstadter who’s citing Joseph Le Conte who “followed” Asa Gray. I’m just tickled by the lengthy chain, but the observation stands on its own regardless of sources.

  • fafblog has been brilliant recently: two on intelligent design: creation science, creation technology! [fafnir 8/10] and overwhelming scientific proof [giblets 8/2]. Then more on torture: claustrophobic techniques [medium lobster 8/4] … in the kingdom of the one-eyed man, the best wars are blind [medium lobster, 7/28]. Segueing nicely from torture, the democrats: the great divorce [fafnir 8/3]. Last but not least, response to some recent efforts by the American Family Assn to provide gay checklists for childrearing: how to tell how gay your gay son is [giblets 8/9]. How despicable is this fear-mongering checklist in the light of this fearful Christian response? [See queerday 7/18, Tampa Bay Online 7/13] Too much anger. That’s why I read fafblog. I could just do a blog indexing fafblog. And still keep the title, ‘derivative work’.

  • A wretched decision out of the NLRB, restricting employees’ off-duty fraternization. Guardsmark, LLC, 344 NLRB No. 97 (2005) (decision in pdf); more info at american rights at work; linked from tom tomorrow. A bit more from me on this case.

Of course, two hours later, the spouse is still sleeping like a baby, and now “Adelaide’s Lament” is going through my head. It’s my own fault for putting iTunes on random shuffle through my entire 80+G music library last week, but still, I last heard that song over a week ago. Probably at some point this morning I had a low-level meditation on my own minor cold and it triggered a “Guys & Dolls” flashback. Unlike LSD, perhaps “Guys & Dolls” really does hang out in your fat cells waiting to be re-triggered.