
Message 00407: Fwd: audit

hi -

you didn't answer this ... if you'd rather not take money (too complicated, don't want the obligation, whatever), I won't be offended ... and, happy to talk about the project. wasn't trying to assign you work. :)


p.s. acrobat 9 does pattern-based redaction over multiple files. i'm upgrading in the morning ... this will *really* help.

Begin forwarded message:

From: Carl Malamud <xxxxxxx@media.org>
Date: November 7, 2008 4:50:39 PM PST
To: Aaron Swartz <me@aaronsw.com>
Subject: Re: audit

On Nov 7, 2008, at 4:42 PM, Aaron Swartz wrote:

what is your corporate form these days? are you incorporated? filed for c4
or c3?  are you under official "fiscal sponsorship" of Sunlight?

we incorporated in MA and filed for c4. no fiscal sponsorship.

ok. I can work with that. I'm going to make a $25k contribution to Watchdog. my thinking is you write the general auditing software/scripts and I apply them to .gov (these scripts could easily be used on state governments as well). please don't start yet ... I want a few days to put together some thoughts on this. I ran some dry runs of this audit concept a couple of years ago and there are some things I learned that I want to pass along before you get started. I also want to make sure we stay very clearly on the above-board side of this thing (e.g., we'll run nmap to look for open ports and fingerprint OSes, but we're *not* going to crack their password files. :)).

Anyway, let's talk Monday.

BTW, good progress on my irs project. I've got the 12-dvd loader up and running (finder screendump attached) and today I got the program working that scarfs a dozen dvds, reads an index file to figure out which tiffs go with which return (they are one page per tiff in semi-random order), uses tiffcp to concatenate them together, uses tiff2pdf to create a pdf, and uses exiftool to stamp the metadata into the pdf header. I still need to automate running them through OCR, looking for SSNs (there are a bunch), and doing a few other housekeeping tasks. This is definitely a big project, but this is certainly progress.
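
A rough sketch of the grouping and conversion steps described above, in Python. The index file layout is not given in this message, so the three-column CSV form used here (return id, page number, tiff filename) is purely an assumption, as are the function names and the output file naming. The sketch only builds the tiffcp, tiff2pdf and exiftool command lines; it does not run them.

```python
import csv
import io
from collections import defaultdict


def group_pages(index_text):
    """Map each return id to its TIFF files, sorted into page order.

    Assumes a hypothetical index format: return_id,page_number,tiff_name
    """
    pages = defaultdict(list)
    for row in csv.reader(io.StringIO(index_text)):
        if not row:
            continue  # skip blank lines
        return_id, page_no, tiff_name = row
        pages[return_id].append((int(page_no), tiff_name))
    # Sort each return's pages numerically, then drop the page numbers.
    return {rid: [name for _, name in sorted(entries)]
            for rid, entries in pages.items()}


def build_commands(return_id, tiffs):
    """Build the three command lines for one return (names are illustrative)."""
    merged = f"{return_id}.tif"
    pdf = f"{return_id}.pdf"
    return [
        ["tiffcp", *tiffs, merged],            # concatenate the single-page tiffs
        ["tiff2pdf", "-o", pdf, merged],       # convert the multi-page tiff to pdf
        ["exiftool", f"-Title=Return {return_id}", pdf],  # stamp metadata
    ]
```

Each command list could then be handed to subprocess.run(cmd, check=True) once the real index format and metadata fields are known.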

For the CFR, I'm now able to go from their broken sgml to well-formed xml. Now I need to figure out how to lay it out as xhtml, convert the eps files to png and pdf, and automate loading it all into svn so you can do diffs.


PNG image