Message 00056: Re: govdocs/google

> Have you found the Google book IDs embedded in the PDFs by
> any chance?  It is such a pain to grab them out of the URLs.

Yeah, the PDFs apparently contain a link annotation line like:

<< /Type /Annot /Subtype /Link /C [0 0 1] /Border [0 0 1]   /Rect [022
227 167 238]   /H /I   /A << /S /URI /URI
(http://books.google.com/books?id=2Sw6AAAAMAAJ&ie=ISO-8859-1) >> >>
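If it helps, here is a rough Python sketch of pulling the id out of an
annotation like the one above. It assumes the link annotation is stored
uncompressed in the PDF (if it sits inside a compressed object stream, a
plain byte search won't see it), and the function name google_book_ids is
just something made up for this example, not part of any existing tool.

```python
import re
from urllib.parse import urlparse, parse_qs

# Matches "/URI (http://books.google.com/books?...)" link targets.
# Whitespace (including a line break) may separate /URI from the "(".
URI_RE = re.compile(rb"/URI\s*\((https?://books\.google\.com/books\?[^)]*)\)")


def google_book_ids(pdf_bytes):
    """Return the distinct Google Books ids found in raw PDF bytes.

    Assumption: the annotations are not compressed, so the URL appears
    literally in the file.
    """
    ids = []
    for match in URI_RE.finditer(pdf_bytes):
        url = match.group(1).decode("latin-1")
        # The id is carried in the "id" query parameter of the URL.
        for book_id in parse_qs(urlparse(url).query).get("id", []):
            if book_id not in ids:
                ids.append(book_id)
    return ids
```

You would call it on the file contents read in binary mode, e.g.
google_book_ids(open(path, "rb").read()), where path is whatever PDF you
have on disk.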

> By 530k ... 530,000 separate titles?  Is there an easy way to
> find those?  What are you doing for metadata?

Yeah, 530K books. We extract the metadata from the HTML page that goes
with them and archive both. Here's an example:

We haven't announced them yet, but when we do they'll be in the search engine.