Message 00214: Re: pacer program
I've got a quickie for Tim. How do you get the info for your case
listings, and are they definitely exhaustive back to 2004? Do you
have a special XML feed from the PACER folks or do you just somehow
parse the site? If the answer is the former, is there a special
agreement involved?
And a quick follow-up for Carl: I believe we already have the "magic
pacer header" turned on. If it appears that this is not the case,
please let us know.

On Sep 6, 2008, at 7:03 PM, Carl Malamud wrote:
Tim Stanley (who drains pacer at Justia on a proxy basis) and John
Joergensen (librarian at Rutgers who is organizing the court
reporters on the east coast), please meet two very talented new
recruits, certified MIT rocket scientists. ;)
Aaron and Stephen have decided to adopt a local district court and
then take advantage of the local pacer "public" trial to
systematically grab all opinions for their jurisdiction and then
put them on bulk.resource.org. I've given Aaron an account. Once
we have an archive of their data, we'll scrub it for SSNs, then
figure out how to inform the chief judge that we have his data
available if he wants it.
Tim, can you review sample docs that they harvest? John, I wanted
to make you aware we have a couple ringers helping kick this off.
Both these guys are highly clueful. I've asked them to a) turn on
the magic pacer header on the top of the pdf and b) embed the
information we need for a unique id in the metadata for the pdf file.
Carl

On Sep 6, 2008, at 3:57 PM, Aaron Swartz wrote:
Can you introduce us to Tim Stanley? Schultze is trying to figure out
which cases to focus on with his Thumb Drive Corps members and is
looking at the list of case names in Justia and wondering how it was
generated. Perhaps Tim can also review some sample output and make
sure we're getting the right stuff.