Message 00670: Re: is it really that simple?
That's the basic idea. The cookie will last for a week. The crawling
involves some annoying parsing (including generating POST requests)
but once you have it figured out it's not terribly complex.
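
A minimal sketch of the cookie hand-off in Python, assuming the saved
cookie sits in a Netscape-format cookies file (the format that wget's
--load-cookies reads). The function name make_opener and the file name
are hypothetical. Python's loader checks the file's first line, so a
file written by another tool may need its header adjusted.

```python
import http.cookiejar
import urllib.request


def make_opener(cookie_file):
    """Return (opener, jar); the opener sends the saved cookies on every request."""
    jar = http.cookiejar.MozillaCookieJar(cookie_file)
    # ignore_discard keeps session cookies, which login cookies often are.
    jar.load(ignore_discard=True, ignore_expires=True)
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar)
    )
    return opener, jar
```

Every later request would then go through opener.open(url), so the
cookie is sent back without any further bookkeeping.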
You start with a given case number, go to a standard URL to grab the
docket, parse the docket to get the document sub-pages, request each of
those, parse each one to see whether you need to get another sub-page,
and ultimately parse out the PDF link (different versions of PACER use
different standards), and then request the PDF.
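
The steps above can be sketched roughly as follows. This is only an
outline under stated assumptions, not the actual crawler: find_pdfs,
extract_links and is_pdf are hypothetical names, fetch stands for any
function that returns a page's HTML for a URL, and the ".pdf" suffix
test is a placeholder for the version-specific link parsing. The POST
requests mentioned earlier are left out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def extract_links(html, base_url):
    """Return absolute URLs for all links found in html."""
    parser = LinkCollector()
    parser.feed(html)
    return [urljoin(base_url, href) for href in parser.links]


def is_pdf(url):
    # Placeholder test; real pages may need different rules per version.
    return url.lower().endswith(".pdf")


def find_pdfs(fetch, docket_url, max_depth=3):
    """Walk from the docket through sub-pages and collect PDF links."""
    pdfs, seen = [], set()
    frontier = [(docket_url, 0)]
    while frontier:
        url, depth = frontier.pop()
        if url in seen or depth > max_depth:
            continue
        seen.add(url)
        for link in extract_links(fetch(url), url):
            if is_pdf(link):
                if link not in pdfs:
                    pdfs.append(link)
            else:
                # Not a PDF yet, so treat it as another sub-page to parse.
                frontier.append((link, depth + 1))
    return pdfs
```

Passing fetch in as a parameter keeps the parsing logic separate from
the cookie handling, so the walk can be tried against saved pages
before any live request is made.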
On Apr 1, 2009, at 6:21 PM, Carl Malamud wrote:
Hi -
Is a PACER crawl as simple as downloading one file, saving the cookie,
then handing that cookie back with every subsequent request you make?
e.g., wget --load-cookies=file.txt --output-document=out.html
http.....
Carl
--
Stephen Schultze
Fellow, Berkman Center for Internet and Society
xxxxxxx@cyber.law.harvard.edu