Message 00368: Re: a couple of questions

On Oct 20, 2008, at 4:31 PM, Aaron Swartz wrote:

Hmmm .... so here is my summary table so far. The results don't necessarily look right here for some of the districts. Can you look at these and see what you think? The numbers in the columns are the results of your script ... the last column is the total number of gigabytes.

Your numbers look right to me. It's important to note that the % here
is the % of cases, whereas my 25% number is a % of data. It seems the
majority of cases in PACER are old cases for which they've only
digitized the dockets and not the documents, so they make up a large
percentage of the number of cases, but very little of the actual data.

So, for example, mad has 100,000 dockets, but Justia reports that only
14K of them have real data.
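
To make the cases-versus-data distinction concrete, here is a minimal sketch using the mad figures quoted above. The docket counts come from this thread; the average sizes are hypothetical placeholders chosen only for illustration, not measured values, and the helper name `share` is likewise made up for this example.

```python
def share(part, whole):
    """Return part/whole as a percentage; 0.0 when whole is zero."""
    return 100.0 * part / whole if whole else 0.0

# From the thread: mad has 100,000 dockets, 14K with real data (per Justia).
total_dockets = 100_000
dockets_with_docs = 14_000

# HYPOTHETICAL average sizes in bytes, for illustration only.
avg_bytes_docket_only = 20_000      # docket sheet with no digitized documents
avg_bytes_with_docs = 2_000_000     # docket plus its digitized documents

docket_only = total_dockets - dockets_with_docs
bytes_docket_only = docket_only * avg_bytes_docket_only
bytes_with_docs = dockets_with_docs * avg_bytes_with_docs

# Same set of cases, measured two different ways.
pct_cases_docket_only = share(docket_only, total_dockets)
pct_data_docket_only = share(bytes_docket_only,
                             bytes_docket_only + bytes_with_docs)

print(f"docket-only cases: {pct_cases_docket_only:.1f}% of cases, "
      f"{pct_data_docket_only:.1f}% of data")
```

Under these assumed sizes the docket-only cases are the large majority by count but a small fraction by bytes, which is why a percentage of cases and a percentage of data can differ so much for the same district.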

oy ... ok. i give up. :) Arizona has 400,000 valid docket numbers going up. I wanted to come up with some indicator of how much of each of the districts we have already and how much remains to be done.

The only way to do this is to have all the data, then count.

That's OK, I'll have some interesting metrics like number of privacy violations as a percentage of total number of docs by circuit, a graph of violations by date filed (I have the header for each of my audited files), and a few other interesting ones. I might even be able to get a top offender list using the initials in the case number.