

Message 00344: Re: irs



you'd probably want to mturk the irs stuff as well as court case metadata. the alternative is a bunch of wikis, but that is usually easier when things are more solid.

one of the more interesting projects going on is Tim Stanley's lawyer directory ... he thinks he can get the 1m lawyers in the country to mturk for him in return for reputation points. He has a very impressive little thing put together where he provides an attorney directory, they can blog and fill out their profiles, give money to folks, etc... he's got berkman and cornell as part of this so that attorneys can help, e.g., make cornell's metadata better.

If you are in the bay area, you should go meet Tim. you also owe paul vixie a lunch; he said he'd like to meet you.

Carl

On Oct 6, 2008, at 3:43 PM, Aaron Swartz wrote:

heh, poor guy. (dunno why he couldn't just send you a drive, tho)

glad to see the stuff showing up in Google; wondering how
machine-parseable the OCRed PDFs will be for the 501c3s. It looks like
Guidestar has people do data-entry by hand.

I'm currently trying to get BK to let me scan the personal financial
disclosures for members of congress as part of the kahle-omidyar
govdocs project; if we get those the plan is to mturk them.

On Mon, Oct 6, 2008 at 6:36 PM, Carl Malamud <xxxxxxx@media.org> wrote:
irs says they sent me my 6 tbytes of data on dvds today. took them 3 months to fill the order ... evidently there is some dude in utah who does all of these by himself and I screwed up his whole summer.

guess I better figure out how to do a dvd jukebox or this is going to get old really quickly ... 1500 dvds to copy and run through ocr. but, this is starting to work. My 527 stuff is starting to show up in google:

http://www.google.com/search?q=%22political+organization%22++site%3Abulk.resource.org

Carl