Message 00357: Re: a couple of questions
On Oct 19, 2008, at 5:41 PM, Aaron Swartz wrote:
Can you tell me more about how access to the pacer data was secured? You
said a cookie ... clear-text password transported in the cookie or somehow
encrypted? How did one get to that point? Was https involved at any point?
The librarian logs the computer in, then the person at the library
asks the browser for the cookie. The cookie doesn't seem to contain a
password; it looks like a hash.
Do they log you in specifically, or do they log in at the beginning of
the day and there it sits?
If they are logging you in specifically, how did that interaction go?
Second, do you believe that for the districts you have, you are complete?
Or are these incomplete snapshots? How would one estimate what portion of
a district we have in that case?
For ilcd, ded, almd, mad, cand, dcd, ilnd, casd, and nysd I started at
the last case and worked down, so you should be able to figure out
what percentage you have with:
$ ls ilcd | sort -n | head -n 1
27
$ ls ilcd | sort -rn | head -n 1
3000
(3000-27)/3000 == .991
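A minimal sketch (not from the original thread) that wraps the same calculation in a shell function so it can be run per district. It assumes one directory per district, with one file per case named by its bare case number. The function name `coverage` is hypothetical. Like the calculation above, it assumes the district was fetched from the last case downward with no gaps, so the highest number present is the district's last case.

```shell
# coverage DIR: hypothetical helper, not from the original thread.
# Assumes DIR holds one file per case, named by its bare case number,
# and that cases were fetched from the last case downward with no gaps.
coverage() {
  dir=$1
  first=$(ls "$dir" | sort -n | head -n 1)   # lowest case number fetched
  last=$(ls "$dir" | sort -rn | head -n 1)   # highest case number fetched
  # Same formula as above: (last - first) / last, printed to 3 decimals.
  awk -v f="$first" -v l="$last" 'BEGIN { printf "%.3f\n", (l - f) / l }'
}
```

Usage: `coverage ilcd` prints 0.991 for the ilcd numbers above (lowest 27, highest 3000).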
OK ... are you sticking with our estimate of .25 of the full thing? I'll
try to take a more sophisticated look at the percentages. I'd like to
be fairly precise.
New question ... do you still have a copy of the data? Does anybody
else?
For vaed, mdd, njd, ord, prd, azd, cod, ctd, hid, pawd, mnd, ohsd,
txd, paed, akd, pamd, laed, flsd, I started at 1 and worked up, so
you'd need to find the last case number to estimate. (I found last
case numbers by doing binary search on the pacer server.)
The rest are all done.
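For the districts fetched from case 1 upward, the fraction held would be the highest case number downloaded divided by the district's last case number. The thread says the last case numbers were found by binary search on the pacer server but does not show how; what follows is only a minimal sketch of that idea under stated assumptions. `case_exists` and `last_case` are hypothetical names: `case_exists` is a placeholder for whatever request reveals whether a case number exists (it is not a real PACER API), and the search assumes case numbers run contiguously from 1 and that case 1 exists.

```shell
# case_exists N: hypothetical placeholder for a real lookup against the
# court's server. Must succeed (exit 0) iff case number N exists.
# This stand-in pretends the district's last case is 1234.
case_exists() {
  [ "$1" -le 1234 ]
}

# last_case: print the largest N for which case_exists N succeeds,
# assuming cases are numbered contiguously from 1 and case 1 exists.
last_case() {
  lo=1
  hi=1
  # Exponential phase: double hi until it points past the last case.
  while case_exists "$hi"; do
    lo=$hi
    hi=$((hi * 2))
  done
  # Binary phase. Invariant: case lo exists, case hi does not.
  while [ $((hi - lo)) -gt 1 ]; do
    mid=$(((lo + hi) / 2))
    if case_exists "$mid"; then
      lo=$mid
    else
      hi=$mid
    fi
  done
  echo "$lo"
}
```

With the placeholder above, `last_case` prints 1234. The doubling phase means no upper bound has to be guessed in advance, and the whole search needs only about 2·log2(N) lookups, which matters when every probe is a request to a remote server.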
I'd prefer to be anonymous.
You got it. That actually makes my life a bit easier. (But I am always
very sensitive to attribution and credit ... we're all scientists, and
it all comes down to citation.)
Thanks for everything.
Well, this was not on the schedule, but we're definitely making a fine
cheese out of soured milk. The important part for me is to see how we
can turn the quarter pacer into the whole deal. In my letter to Judge
Rosenthal, I'm actually going to point out that if they made the rest
of their database available, I'd be happy to scrub it for them.
My friend in the librarian association talked to the Government Printing
Office, who said that the Administrative Office (AO) of the Courts informed
them that a) librarians weren't to blame and b) they were "conducting an
investigation." So, let's still be very careful. You being anonymous means
I can control the message even more, and controlling the message is going
to be absolutely crucial in pulling this off (particularly if we are going
to use the quarter pacer as a gateway to bigger things ... they're not
going to want to be made to look stupid, so it is important to keep a
straight face and explain why this all came out so great for everybody).
Did you and Vixie ever close the loop? For MIT, do you know Jeff
Schiller who runs the campus network? For Harvard, do you know Scott
Bradner who works for the university? I'm thinking one of them might
be able to help you solve your 3480 problem ... Media Lab may not have
a drive, but I bet academic computing has had them at various points.
Carl