Message 00428: Re: dotgov stuff
On Nov 20, 2008, at 3:20 PM, Aaron Swartz wrote:
Built an index of the dotgov crawl (it's huge) and working on link
checking now. I was thinking that as an intermediate format I should
just use:
URL
type param param param
so, e.g. for the link checking:
http://www.costello.house.gov/art/photos/2008-art-competition/ARatsNestofCurrentIssues2.jpg
link http://candicemiller.house.gov/images/print/print_left.jpg 404
and then I'll just do a count of successful links for percentages:
link 82829373 200
then we can write other tools to split these up by domain and move
them around and stuff.
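
A rough sketch of how that intermediate format might be read and tallied, not the actual tooling. It assumes a line holding a single http(s) URL starts a new record, and that each following line is "type param param ..." with link lines carrying a target URL and a status code. The function names and the example.gov URLs are made up for illustration.

```python
from collections import Counter


def parse_records(lines):
    """Yield (source_url, type, params) for each 'type param ...' line.

    Assumption: a line that is a single http(s) token is the URL header
    for the 'type' lines that follow it.
    """
    current_url = None
    for raw in lines:
        fields = raw.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) == 1 and fields[0].startswith(("http://", "https://")):
            current_url = fields[0]
            continue
        if current_url is None:
            raise ValueError("record line before any URL line: %r" % raw)
        yield current_url, fields[0], fields[1:]


def link_status_counts(records):
    """Count 'link <target> <status>' records by status code."""
    counts = Counter()
    for _url, kind, params in records:
        if kind == "link" and len(params) == 2:
            counts[params[1]] += 1
    return counts


def summary_lines(counts):
    """Render the counts as 'link <count> <status>' lines, sorted by status."""
    return ["link %d %s" % (counts[status], status) for status in sorted(counts)]


def success_percentage(counts):
    """Percentage of checked links that returned 200."""
    total = sum(counts.values())
    return 100.0 * counts["200"] / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical sample input, not taken from the crawl.
    sample = [
        "http://example.gov/page.html",
        "link http://example.gov/ok.html 200",
        "link http://example.gov/missing.jpg 404",
    ]
    counts = link_status_counts(parse_records(sample))
    print("\n".join(summary_lines(counts)))
    print("%.1f%% of links succeeded" % success_percentage(counts))
```

Since every record carries its source URL, splitting by domain later is just grouping on the host part of that first field.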
that sounds like a plan.
how big is the dotgov crawl?
any news on transition? looks like the blair-julius-sonal axis is the
place to be right now
interesting operation. julius and crew seem focused on national tech
policy things, so fcc and whatever the office of the cto becomes.
Slaby went radio silent on me, so I'm not sure what he and the ops
folks are up to.
you can invoice me for $12,500 if you want.
Carl