Message 00736: Re: why no eo?

yeah, privacy issues. several hundred thousand ssn's. been working the issue for quite a while, not making a huge amount of progress. the world needs a public domain redaction toolkit, trying to get google to make tesseract do that.

(on the other hand, did get gpo to scrub all evidence of ssn's for the congressional record.)

On Aug 13, 2009, at 8:43 PM, Aaron Swartz wrote:

Why does the robots.txt ban access to the 990 PDF scans?

Disallow: /irs.gov/eo/
Disallow: /irs.gov/eo2/

Was about to load the metadata into watchdog and link to them. Should
I not do that for some reason?
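
For reference, a minimal sketch of how a compliant crawler would read the two Disallow rules quoted above, using Python's standard urllib.robotparser. The "User-agent: *" line and the example file names are assumptions for illustration; the message quotes only the two Disallow lines.

```python
from urllib.robotparser import RobotFileParser

# The two Disallow rules come from the message above. The User-agent
# line is assumed, since the quoted excerpt does not show it.
RULES = """\
User-agent: *
Disallow: /irs.gov/eo/
Disallow: /irs.gov/eo2/
"""


def build_parser(rules_text):
    """Parse robots.txt text into a RobotFileParser without fetching anything."""
    parser = RobotFileParser()
    parser.parse(rules_text.splitlines())
    return parser


parser = build_parser(RULES)


def is_allowed(path, agent="*"):
    """Return True if the given agent may fetch the path under RULES."""
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    # Hypothetical paths; the real file names are not given in the thread.
    for path in ("/irs.gov/eo/example.pdf",
                 "/irs.gov/eo2/example.pdf",
                 "/irs.gov/other/example.pdf"):
        print(path, "allowed" if is_allowed(path) else "disallowed")
```

The rules are prefix matches, so everything under the two eo directories is excluded for crawlers that honor robots.txt, while other paths on the host are unaffected. The rules do not restrict direct links from another site; they only tell well-behaved robots not to fetch those paths.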