Subject: Re: govdocs/google

Message 00056: Re: govdocs/google

To: "Carl Malamud" <xxxx@media.org>
Subject: Re: govdocs/google
From: "Aaron Swartz" <xx@aaronsw.com>
Date: Mon, 31 Dec 2007 13:24:52 -0800
In-reply-to: <200712312121.lBVLLHYb029125@bulk.resource.org>
References: <dc21c7860712311305j1f57bbedg1ffc4ab758a3fe24@mail.gmail.com> <200712312121.lBVLLHYb029125@bulk.resource.org>
Sender: xxxxxxx@gmail.com

> Have you found the Google book IDs embedded in the PDFs by
> any chance?  It is such a pain to grab them out of the URLs.

Yeah, there's apparently a line like:

<< /Type /Annot /Subtype /Link /C [0 0 1] /Border [0 0 1]   /Rect [022
227 167 238]   /H /I   /A << /S /URI /URI
(http://books.google.com/books?id=2Sw6AAAAMAAJ&ie=ISO-8859-1) >> >>
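If it helps, here is a rough sketch of pulling the id out of a line like that. It assumes the link annotation is stored uncompressed in the PDF (not inside a compressed object stream) and that the URL contains no escaped parentheses; I have not checked either across the whole collection. The function name `google_book_ids` and the `sample` bytes are just illustrative.

```python
import re
from urllib.parse import urlparse, parse_qs

# Matches a URI action whose target is a books.google.com URL.
# \s* tolerates a line break between "/URI" and the parenthesized string.
URI_RE = re.compile(rb"/URI\s*\((https?://books\.google\.com/books\?[^)]*)\)")

def google_book_ids(pdf_bytes):
    """Return distinct Google Books ids found in uncompressed link annotations.

    Assumption: the annotation dictionary appears as plain bytes in the file.
    """
    ids = []
    for match in URI_RE.finditer(pdf_bytes):
        url = match.group(1).decode("latin-1")
        # The book id is carried in the "id" query parameter.
        for value in parse_qs(urlparse(url).query).get("id", []):
            if value not in ids:
                ids.append(value)
    return ids

# The annotation quoted above, with the same line breaks.
sample = (
    b"<< /Type /Annot /Subtype /Link /C [0 0 1] /Border [0 0 1]   /Rect [022\n"
    b"227 167 238]   /H /I   /A << /S /URI /URI\n"
    b"(http://books.google.com/books?id=2Sw6AAAAMAAJ&ie=ISO-8859-1) >> >>"
)
print(google_book_ids(sample))
```

For a real file you would read it in binary mode, e.g. `google_book_ids(open(path, "rb").read())`, where `path` is whatever PDF you are inspecting.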

> By 530k ... 530,000 separate titles?  Is there an easy way to
> find those?  What are you doing for metadata?

Yeah, 530K books. We extract the metadata from the HTML page that goes
with them and archive both. Here's an example:
http://www.archive.org/details/reportscasesarg255courgoog/

We haven't announced them yet, but when we do they'll be in the search engine.
