
Message 00371: Re: a couple of questions

On Oct 20, 2008, at 5:27 PM, Aaron Swartz wrote:

 the data's a real mess -

That's ultimately my biggest defense in all of this. But I'm still *really* nervous when I hear the Superintendent of Documents talk about "security breach" and "investigation."

As for the data being a mess, I'm thinking of disallowing Google in my robots.txt for this, or maybe even just releasing really big tarballs ... I'm positive this should be public, but I'm not convinced this stuff deserves to show up in random Google searches until more volunteers have done more scrubbing. I caught some really bad crap, which means there is a whole bunch I didn't catch.
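A rough sketch of what the robots.txt disallow could look like, with a quick check using Python's standard urllib.robotparser. The rules, the host example.org, and the path are placeholders made up for illustration, not the real site layout. Keep in mind robots.txt only asks well-behaved crawlers to stay away; it does not block anyone.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: keep Googlebot out entirely, leave other crawlers alone.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /

User-agent: *
Allow: /
"""


def can_fetch(agent, url, robots_txt=ROBOTS_TXT):
    """Return True if robots_txt permits `agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    url = "https://example.org/bulk/case-0001.pdf"  # placeholder URL
    for agent in ("Googlebot", "SomeOtherBot"):
        print(agent, can_fetch(agent, url))
```

Narrowing the Disallow line to a single directory would keep the rest of the site searchable while the raw documents stay out of the index.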

It really sucks, of course, that all the commercial guys don't care and have all this live, but our biggest defense (again) is that we actually care. My letter to Rosenthal will point out very clearly that her computer people and the commercial boys never told her any of this stuff, nor did they redact the data ... by making the data public, ironically, we protect privacy much better.

What do you think about the initial release consisting of:

1. Scribd uploads of all my letters to the Judicial Conference (with all the private information, like the lists of hits and even the case numbers with hits, redacted, of course)
2. a bunch of 50 GB or so tarballs.

That lets us get the data out but not have to be in the end user business.
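For the tarballs, here is a minimal sketch of one way to cut a directory tree into archives of roughly 50 GB each. The function names, the "bulk" prefix, and the greedy in-order grouping are all assumptions for illustration, not a description of how the data is actually laid out or packaged.

```python
import os
import tarfile


def plan_batches(files, max_bytes):
    """Group (path, size) pairs, in order, into batches of at most max_bytes.

    A single file larger than max_bytes ends up in a batch of its own.
    """
    batches, current, current_size = [], [], 0
    for path, size in files:
        if current and current_size + size > max_bytes:
            batches.append(current)
            current, current_size = [], 0
        current.append(path)
        current_size += size
    if current:
        batches.append(current)
    return batches


def write_tarballs(root, out_dir, max_bytes=50 * 10**9, prefix="bulk"):
    """Write uncompressed tarballs of the files under root; return their paths."""
    files = []
    for dirpath, _dirs, names in os.walk(root):
        for name in names:
            path = os.path.join(dirpath, name)
            files.append((path, os.path.getsize(path)))
    files.sort()  # stable order so reruns produce the same grouping
    os.makedirs(out_dir, exist_ok=True)
    written = []
    for i, batch in enumerate(plan_batches(files, max_bytes), start=1):
        tar_path = os.path.join(out_dir, f"{prefix}-{i:04d}.tar")
        with tarfile.open(tar_path, "w") as tar:
            for path in batch:
                # Store paths relative to root so archives unpack cleanly.
                tar.add(path, arcname=os.path.relpath(path, root))
        written.append(tar_path)
    return written
```

Keeping out_dir outside root avoids sweeping earlier tarballs into a later run.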

We should realize that if we put out even just big tarballs, there will be some jerks who take all the data, slap Google AdWords on it, and don't care if, for example, some really bad document is found and needs to be redacted.