Message 00357: Re: a couple of questions

On Oct 19, 2008, at 5:41 PM, Aaron Swartz wrote:

Can you tell me more about how access to the pacer data was secured? You said a cookie ... clear-text password transported in the cookie or somehow encrypted? How did one get to that point? Was https involved at any point?

The librarian logs the computer in, then the person at the library
asks the browser for the cookie. The cookie doesn't seem to contain a
password; it looks like a hash.

Do they log you in specifically, or do they log in at the beginning of the day and there it sits?

If they are logging you in specifically, how did that interaction go?

Second, do you believe that for the districts you have, you are complete? Or are these incomplete snapshots? How would one estimate what portion of a district we have in that case?

For ilcd, ded, almd, mad, cand, dcd, ilnd, casd, and nysd I started at
the last case and worked down, so you should be able to figure out
what percentage you have with:

$ ls ilcd | sort -n | head -n 1
$ ls ilcd | sort -rn | head -n 1

(3000-27)/3000 == .991
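
The same estimate can be wrapped in a small shell function. This is only a sketch: the name `coverage` is made up for illustration, and it assumes that every entry in the district directory is named by its numeric case number and that the download ran from the last case downward with no gaps.

```shell
# coverage DIR: hypothetical helper, not part of the original workflow.
# Assumes DIR holds one entry per case, named by numeric case number,
# and that cases were fetched from the highest number downward.
coverage() {
    dir=$1
    lowest=$(ls "$dir" | sort -n | head -n 1)    # lowest case number fetched
    highest=$(ls "$dir" | sort -rn | head -n 1)  # last (highest) case number
    # Bail out on an empty directory rather than divide by zero.
    [ -n "$highest" ] || return 1
    # Fraction of the district covered: (highest - lowest) / highest
    awk -v lo="$lowest" -v hi="$highest" \
        'BEGIN { printf "%.3f\n", (hi - lo) / hi }'
}
```

Running `coverage ilcd` would print the fraction for that district, following the same arithmetic as the line above.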

OK ... are you sticking with our .25 of the full thing estimate? I'll try to do a more sophisticated look at the percentages. I'd like to be fairly precise.

New question ... do you still have a copy of the data? Does anybody else?

For vaed, mdd, njd, ord, prd, azd, cod, ctd, hid, pawd, mnd, ohsd,
txd, paed, akd, pamd, laed, flsd, I started at 1 and worked up, so
you'd need to find the last case number to estimate. (I found last
case numbers by doing binary search on the pacer server.)
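
A rough sketch of that kind of binary search, in shell. The probe `case_exists` is hypothetical and is stubbed against a local variable here so the sketch runs without touching any server; a real probe would have to ask the server whether a given case number exists. The sketch also assumes that case 1 exists and that case numbers have no gaps below the last one.

```shell
# case_exists N: hypothetical probe that succeeds if case number N exists.
# Stubbed against LAST_CASE (an assumed value) so the sketch is runnable.
LAST_CASE=${LAST_CASE:-4821}
case_exists() { [ "$1" -le "$LAST_CASE" ]; }

find_last_case() {
    lo=1
    hi=1
    # Double the upper bound until it lands past the last existing case.
    while case_exists "$hi"; do
        lo=$hi
        hi=$((hi * 2))
    done
    # Invariant from here on: case lo exists, case hi does not.
    while [ $((hi - lo)) -gt 1 ]; do
        mid=$(( (lo + hi) / 2 ))
        if case_exists "$mid"; then
            lo=$mid
        else
            hi=$mid
        fi
    done
    echo "$lo"
}
```

The doubling step means no upper bound has to be guessed in advance, and the total number of probes grows only with the logarithm of the last case number.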

The rest are all done.

I'd prefer to be anonymous.

You got it. That actually makes my life a bit easier. (But, I always am very sensitive to attribution and credit ... we're all scientists and it all comes down to citation).

Thanks for everything.

Well, this was not on the schedule, but we're definitely making a fine cheese out of soured milk. The important part for me is to see how we can turn the quarter pacer into the whole deal. In my letter to Judge Rosenthal, I'm actually going to point out that if they made the rest of their database available, I'd be happy to scrub it for them.

My friend in the librarian association talked to the Government Printing Office, who said that the AO of the Courts informed them that a) librarians weren't to blame and b) they were "conducting an investigation." So, let's be very careful still. You being anonymous means I can control the message even more, and controlling the message is going to be absolutely crucial in pulling this off (particularly if we are going to use the quarter pacer as a gateway to bigger things ... they're not going to want to be made to look stupid, so it is important to keep a straight face and explain why this all came out so great for everybody).

Did you and Vixie ever close the loop? For MIT, do you know Jeff Schiller who runs the campus network? For Harvard, do you know Scott Bradner who works for the university? I'm thinking one of them might be able to help you solve your 3480 problem ... Media Lab may not have a drive, but I bet academic computing has had them at various points.