Message 00368: Re: a couple of questions
On Oct 20, 2008, at 4:31 PM, Aaron Swartz wrote:
Hmmm .... so here is my summary table so far. The results don't necessarily
look right here for some of the districts. Can you look at these and see
what you think? The numbers in the columns are the results of your script
... the last column is total number of gigabytes.
Your numbers look right to me. It's important to note that the % here
is the % of cases, whereas my 25% number is a % of data. It seems the
majority of cases in PACER are old cases for which they've only
digitized the dockets and not the documents, so they make up a large
percentage of the number of cases, but very little of the actual data.
So, for example, mad has 100,000 dockets but Justia reports that only
14K of them have real data
(http://dockets.justia.com/browse/state-massachusetts/court-madce/).
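A rough sketch of why the two percentages can diverge so sharply. The case counts are the mad figures quoted above; the average sizes per case are purely hypothetical placeholders picked to illustrate the effect, not measurements from PACER or from the script discussed in this thread.

```python
def share(part, whole):
    """Return part/whole expressed as a percentage."""
    return 100.0 * part / whole

# From the message: roughly 100,000 dockets in mad, about 14K with real data.
total_cases = 100_000
cases_with_documents = 14_000
docket_only_cases = total_cases - cases_with_documents

# HYPOTHETICAL average sizes in megabytes (assumptions, not measured values).
avg_mb_docket_only = 0.05     # a docket sheet with no digitized documents
avg_mb_with_documents = 5.0   # a docket plus its filed documents

data_docket_only = docket_only_cases * avg_mb_docket_only
data_with_documents = cases_with_documents * avg_mb_with_documents
total_data = data_docket_only + data_with_documents

# Share of cases versus share of data for the docket-only group.
pct_cases_docket_only = share(docket_only_cases, total_cases)
pct_data_docket_only = share(data_docket_only, total_data)

print(f"docket-only share of cases: {pct_cases_docket_only:.1f}%")
print(f"docket-only share of data:  {pct_data_docket_only:.1f}%")
```

Under these assumed sizes the docket-only cases are the large majority of cases but only a small slice of the data, which is the distinction between a % of cases and a % of data.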
oy ... ok. i give up. :) Arizona has 400,000 valid docket numbers
going up. I wanted to come up with some indicator of how much of each
of the districts we have already and how much remains to be done.
The only way to do this is to have all the data, then count.
That's OK, I'll have some interesting metrics like number of privacy
violations as a percentage of total number of docs by circuit, a graph
of violations by date filed (I have the header for each of my audited
files), and a few other interesting ones. I might even be able to get
a top offender list using the initials in the case number.
Carl