

Message 00670: Re: is it really that simple?



That's the basic idea. The cookie will last for a week. The crawling involves some annoying parsing (including generating POST requests) but once you have it figured out it's not terribly complex.
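For what it's worth, here is a rough Python sketch of the "hand the cookie back with every request" part. It assumes the cookie file is in the Netscape cookies.txt format that wget's --load-cookies reads, with a header line Python's loader accepts. The function name load_session is made up for illustration, and nothing here is specific to PACER.

```python
import http.cookiejar
import urllib.request


def load_session(cookie_path):
    """Return (opener, jar) where the opener replays cookies from cookie_path.

    Assumption: cookie_path is a Netscape-format cookies.txt file.
    """
    jar = http.cookiejar.MozillaCookieJar(cookie_path)
    # Keep session cookies and ones past their expiry so nothing is
    # silently dropped on load; the server decides whether they are valid.
    jar.load(ignore_discard=True, ignore_expires=True)
    # Every request made through this opener carries the matching cookies.
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar)
    )
    return opener, jar
```

You would then call opener.open(url) for each page in place of a separate wget invocation. Since the cookie only lasts about a week, the file needs to be refreshed by logging in again once it goes stale.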

You start with a given case number, go to a standard URL to grab the docket, and parse the docket to get the document sub-pages. You then request each of those, parse each one to see whether you need to get another sub-page, ultimately parse out the PDF link (different versions of PACER use different standards), and request the PDF.
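In case it helps, the flow above can be sketched roughly like this in Python. Everything about the page structure is an assumption for illustration: the "/doc/" marker for document sub-pages, the ".pdf" suffix test, and the example.invalid URLs are placeholders, not PACER's real layout. The sketch also only follows plain links, so the POST requests and the per-version PDF link formats would still need their own handling.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def extract_links(html, base_url):
    """Return absolute URLs for all links found in html."""
    parser = LinkCollector()
    parser.feed(html)
    return [urljoin(base_url, href) for href in parser.links]


def is_pdf_link(url):
    # Assumption: PDF links end in ".pdf" (ignoring any query string).
    return url.lower().split("?", 1)[0].endswith(".pdf")


def is_document_link(url):
    # Hypothetical marker for document sub-pages; the real pattern differs.
    return "/doc/" in url


def find_pdf_urls(fetch, docket_url, max_depth=3):
    """Walk docket -> sub-pages -> PDF links.

    fetch is any callable that takes a URL and returns the page HTML,
    for example a wrapper around a cookie-carrying HTTP client.
    """
    pdf_urls = []
    seen = set()

    def visit(url, depth):
        if url in seen or depth > max_depth:
            return
        seen.add(url)
        for link in extract_links(fetch(url), url):
            if is_pdf_link(link):
                if link not in pdf_urls:
                    pdf_urls.append(link)
            elif is_document_link(link):
                # A sub-page may lead to yet another sub-page.
                visit(link, depth + 1)

    visit(docket_url, 0)
    return pdf_urls
```

Passing fetch in as a parameter keeps the parsing separate from the HTTP and cookie handling, so the traversal can be tried against saved pages before pointing it at a live server.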

On Apr 1, 2009, at 6:21 PM, Carl Malamud wrote:

Hi -

Is a PACER crawl as simple as downloading one file, saving the cookie, and then handing that cookie back with every subsequent request you make?

e.g., wget --load-cookies=file.txt --output-document=out.html http.....

Carl




--
Stephen Schultze
Fellow, Berkman Center for Internet and Society
xxxxxxx@cyber.law.harvard.edu