Message 00423: Re: .gov crawl
I did an fyi with ellen miller and her assembled troops (I *really*
hate these interminable conference calls!), and sent an fyi to
brewster with a copy of the statement of work. he hasn't answered, but
I'm sure he'll be fine.
that said, I don't want this to become a tribal thing right away ...
let's you and me get the core aspects of the architecture right before
this goes out the door to become a group contribution thing. Becky
will be very helpful on the reporting side of this as well. the trick
is going to be getting the core crawlers working right, then anybody
can add any metric that they want. but, doing that initial crawl and
digest will be hard and I'd like to keep the participants minimal.
you ok with that?
there is one other crawl we have access to ... I'm on the board of
common crawl, which is done by Gil Elbaz, one of my donors. He's the
guy that did AdSense and sold it to Google (I think he was the
largest outside shareholder when they went public). They don't have
a .gov focus yet, but could. And, there are a variety of other
operations like wikia and metaweb ... I'd like our audit stuff to stay
agnostic and potentially work on any of those.
On Nov 11, 2008, at 4:23 PM, Aaron Swartz wrote:
>> here's a statement of work ... let me know if this works for you.
> Looks good to me. I'm trying to get access to the corpus now and will
> start with link checking.