

Message 00428: Re: dotgov stuff




On Nov 20, 2008, at 3:20 PM, Aaron Swartz wrote:

Built an index of the dotgov crawl (it's huge) and working on link
checking now. I was thinking that as an intermediate format I should
just use:

URL
   type param param param

so, e.g. for the link checking:

http://www.costello.house.gov/art/photos/2008-art-competition/ARatsNestofCurrentIssues2.jpg
   link http://candicemiller.house.gov/images/print/print_left.jpg 404

and then I'll just do a count of successful links for percentages:
   link 82829373 200

then we can write other tools to split these up by domain and move
them around and stuff.
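
A rough sketch of what one of those split-by-domain tools might look like, in Python. This is only an illustration under assumptions the message does not spell out: that each indented record is whitespace-separated, that a "link" record is "link TARGET STATUS", and that records get grouped by the host of the unindented URL above them. The function names (parse_records, link_status_by_domain, percent_ok) are made up for this example.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def parse_records(lines):
    """Yield (url, type, params) for each indented record under a URL line."""
    current_url = None
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        if line[0].isspace():
            if current_url is None:
                continue  # record with no URL line above it; ignore
            fields = line.split()
            yield current_url, fields[0], fields[1:]
        else:
            # An unindented line starts a new block of records for that URL.
            current_url = line.strip()


def link_status_by_domain(records):
    """Count link statuses per source host (assumes 'link TARGET STATUS')."""
    counts = defaultdict(lambda: defaultdict(int))
    for url, rtype, params in records:
        if rtype != "link" or len(params) < 2:
            continue
        host = urlsplit(url).hostname or ""
        counts[host][params[-1]] += 1  # last param is assumed to be the status
    return counts


def percent_ok(status_counts):
    """Percentage of links in one host's tally that returned 200."""
    total = sum(status_counts.values())
    return 100.0 * status_counts.get("200", 0) / total if total else 0.0
```

Feeding an open file of the intermediate format to parse_records and passing the result to link_status_by_domain gives a per-host tally, and percent_ok turns one host's tally into a success percentage.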

that sounds like a plan.

how big is the dotgov crawl?

any news on transition? looks like the blair-julius-sonal axis is the
place to be right now

interesting operation. julius and crew seem focused on national tech policy things, so fcc and whatever the office of the cto becomes. Slaby went radio silent on me, so I'm not sure what he and the ops folks are up to.

you can invoice me for $12,500 if you want.

Carl