
Message 00427: dotgov stuff

I've built an index of the dotgov crawl (it's huge) and am working on
link checking now. I was thinking that as an intermediate format I
should just use one record per line:

    type param param param

so, e.g. for the link checking:

    link http://candicemiller.house.gov/images/print/print_left.jpg 404

and then I'll just emit a count of successful links, which we can use
to compute percentages:

    link 82829373 200
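
A minimal sketch of how that tally could work, assuming records are whitespace-separated, URLs contain no spaces, and the per-URL records look like the 404 example above. The function names (parse_record, summarize_links) are made up for illustration and are not part of any existing tool:

```python
from collections import Counter


def parse_record(line):
    """Split one 'type param param ...' line into (type, [params]).

    Returns None for blank lines.
    """
    fields = line.split()
    if not fields:
        return None
    return fields[0], fields[1:]


def summarize_links(lines):
    """Tally per-URL 'link URL STATUS' records by status code.

    Returns summary lines in the 'link COUNT STATUS' shape, sorted by status.
    """
    counts = Counter()
    for line in lines:
        record = parse_record(line)
        if record is None:
            continue
        rtype, params = record
        # Assumption: a per-URL link record has exactly two params.
        if rtype != "link" or len(params) != 2:
            continue
        _url, status = params
        counts[status] += 1
    return ["link %d %s" % (n, status) for status, n in sorted(counts.items())]
```

Note that the count line reuses the "link" type, so this sketch assumes the per-URL records and the summary records are kept in separate files.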

then we can write other tools to split these up by domain and move
them around and stuff.
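
Splitting by domain could be as small as the sketch below. It assumes the same three-field "link URL STATUS" records and uses the URL's hostname as the domain; split_by_domain is a hypothetical name, and grouping under "unknown" is just one way to handle lines without a parseable host:

```python
from collections import defaultdict
from urllib.parse import urlsplit


def split_by_domain(lines):
    """Group 'link URL STATUS' records by the hostname of their URL."""
    groups = defaultdict(list)
    for line in lines:
        fields = line.split()
        # Skip anything that is not a three-field link record.
        if len(fields) != 3 or fields[0] != "link":
            continue
        # hostname is None when the field is not an absolute URL.
        host = urlsplit(fields[1]).hostname or "unknown"
        groups[host].append(line)
    return dict(groups)
```

Each group could then be written to its own file, so later tools only need to read the domains they care about.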

Any news on transition? Looks like the blair-julius-sonal axis is the
place to be right now.