
Message 00206: Re: pacer program

Hi -

This looks great.

On http://watchdog.net/static/.tmp/vtd/15151/docket.html you need to adjust the URLs to be relative.
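
One way to do the relative-URL conversion, as a rough Python sketch. The function name to_relative and the file name 1.pdf are made up for illustration, and it assumes the documents live under the same host as docket.html:

```python
import posixpath
from urllib.parse import urlsplit


def to_relative(href, page_url):
    """Rewrite href relative to the directory of page_url.

    Links on a different scheme/host, and links that are already
    relative, are returned unchanged.
    """
    link, page = urlsplit(href), urlsplit(page_url)
    if not link.scheme or (link.scheme, link.netloc) != (page.scheme, page.netloc):
        return href
    page_dir = posixpath.dirname(page.path) or "/"
    rel = posixpath.relpath(link.path or "/", start=page_dir)
    # Carry over any query string or fragment from the original link.
    if link.query:
        rel += "?" + link.query
    if link.fragment:
        rel += "#" + link.fragment
    return rel


PAGE = "http://watchdog.net/static/.tmp/vtd/15151/docket.html"
print(to_relative("http://watchdog.net/static/.tmp/vtd/15151/1.pdf", PAGE))
```

With relative links the whole docket directory can be moved or mirrored without rewriting docket.html again.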

You have no metadata in the PDF docs. At the very least, we need to stamp in the following pieces of information:

1. the url of the doc you got (e.g., what is in your docket.html file)
2. the court: district court for the district of vermont
3. the office, which is on your docket.html file
4. the case number
5. the docket number (which is embedded inside of the case number)
6. the document number
7. the fact that it is public domain

(Tim, please chime in if I've forgotten anything.)

Do you guys have/use exiftool?


I believe we want to do everything in the XMP headers. Aaron, you might be able to help me get this more precise. A couple things I think we need to set:

xmp:rights False
xmp:license [url of creative commons public domain license]

xmp:contributor (name of downloader?)
xmp:date (date on the document?)
xmp:publisher (name of the court?)
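
If exiftool is available, the fields above could be stamped roughly like this. This is only a sketch: the mapping of the loose names above onto the exiftool tags XMP-xmpRights:Marked, XMP-cc:License and XMP-dc:* is an assumption, not something settled in this thread, and the function name and all argument values are placeholders.

```python
def exiftool_args(pdf_path, *, license_url, contributor, date, publisher):
    """Build an exiftool command line that writes the XMP fields.

    Assumed mapping (not agreed on in this thread):
      rights False -> XMP-xmpRights:Marked=False (public domain)
      license      -> XMP-cc:License
      contributor  -> XMP-dc:Contributor
      date         -> XMP-dc:Date
      publisher    -> XMP-dc:Publisher
    """
    return [
        "exiftool",
        "-XMP-xmpRights:Marked=False",
        f"-XMP-cc:License={license_url}",
        f"-XMP-dc:Contributor={contributor}",
        f"-XMP-dc:Date={date}",
        f"-XMP-dc:Publisher={publisher}",
        pdf_path,
    ]


# Placeholder values only; substitute the real license URL, name and date.
args = exiftool_args(
    "doc.pdf",
    license_url="https://example.org/public-domain-license",
    contributor="Example Downloader",
    date="2008:09:06",
    publisher="Example Court",
)
print(args)
```

Passing the list to subprocess.run(args, check=True) would run it. By default exiftool keeps the untouched file alongside as doc.pdf_original.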

Where I get lost is where and how to put the identifying information. One project I work on (archimedespalimpsest.org) shoves it all in the description field as a bunch of name-value pairs. There is a proposal Tom Bruce has advanced (see the open case list for details), but I could never figure out from his spec where to shoehorn in things such as the name of the office, or the case number.
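
For illustration, the name-value-pairs approach might look like the Python sketch below. The field names, the "name=value; name=value" layout, and the sample values are all assumptions made up here; this is neither the Archimedes Palimpsest format nor Tom Bruce's proposal.

```python
# Hypothetical field names covering the identifying items listed earlier.
FIELDS = (
    "url", "court", "office", "case_number",
    "docket_number", "document_number", "rights",
)


def pack_description(fields):
    """Join the identifying fields into one description string."""
    parts = []
    for name in FIELDS:
        value = str(fields[name])
        if ";" in value:
            # ";" is the pair separator, so it cannot appear in a value.
            raise ValueError(f"';' not allowed in {name!r}")
        parts.append(f"{name}={value}")
    return "; ".join(parts)


def unpack_description(description):
    """Split a packed description back into a dict of strings."""
    out = {}
    for part in description.split("; "):
        # Split on the first "=" only, so URLs with "=" survive.
        name, _, value = part.partition("=")
        out[name] = value
    return out


sample = {
    "url": "http://watchdog.net/static/.tmp/vtd/15151/docket.html",
    "court": "District Court for the District of Vermont",
    "office": "EXAMPLE-OFFICE",
    "case_number": "EXAMPLE-CASE-NUMBER",
    "docket_number": "EXAMPLE-DOCKET-NUMBER",
    "document_number": "1",
    "rights": "public domain",
}
print(pack_description(sample))
```

The drawback is that everything lands in a single free-text field, so readers need to know this private layout to get the case number back out.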

It would be very nice if we all came up with a standard list of what gets stamped where. We can write that up as a precise guide for others to follow and that would be very useful.

Note that none of this should slow you down from harvesting ... if you keep everything collected as sets in a docket, we can go back and do that part later as long as you keep a record of the urls you were at.
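
A minimal way to keep that record while harvesting, sketched in Python. The manifest.json name and layout and the record_download helper are assumptions for illustration, not part of the existing harvester:

```python
import json
from pathlib import Path


def record_download(docket_dir, filename, source_url):
    """Note which URL a saved file came from in <docket_dir>/manifest.json."""
    manifest_path = Path(docket_dir) / "manifest.json"
    if manifest_path.exists():
        manifest = json.loads(manifest_path.read_text())
    else:
        manifest = {}
    manifest[filename] = source_url
    manifest_path.write_text(json.dumps(manifest, indent=2, sort_keys=True))
    return manifest
```

Calling this once per saved file leaves each docket directory with a filename-to-URL map, which is enough to go back and stamp the metadata later.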


On Sep 6, 2008, at 4:12 PM, Aaron Swartz wrote:

Hello! I've put up a sample case (PACER ID 15151 at the Vermont
District Court) at this temporary URL:


Can you all look it over and see if we're missing anything or if
there's something we can do better? Thanks.