Message 00423: Re: .gov crawl



I did an fyi with ellen miller and her assembled troops (I *really* hate these interminable conference calls!), and sent an fyi to brewster with a copy of the statement of work. he hasn't answered, but I'm sure he'll be fine.

that said, I don't want this to become a tribal thing right away ... let's you and me get the core aspects of the architecture right before this goes out the door to become a group contribution thing. Becky will be very helpful on the reporting side of this as well. the trick is going to be getting the core crawlers working right, then anybody can add any metric that they want. but, doing that initial crawl and digest will be hard and I'd like to keep the participants minimal. you ok with that?

there is one other crawl we have access to ... I'm on the board of Common Crawl, which is done by Gil Elbaz, one of my donors. He's the guy that did AdSense and sold it to Google (I think he was the largest outside shareholder when they went public). They don't have a .gov focus yet, but could. And, there are a variety of other operations like Wikia and Metaweb ... I'd like our audit stuff to stay agnostic and potentially work on any of those.

On Nov 11, 2008, at 4:23 PM, Aaron Swartz wrote:

here's a statement of work ... let me know if this works for you.

Looks good to me. I'm trying to get access to the corpus now and will
start with link checking.