Message 00427: dotgov stuff
- To: "Carl Malamud" <xxxx@media.org>
- Subject: dotgov stuff
- From: "Aaron Swartz" <xx@aaronsw.com>
- Date: Thu, 20 Nov 2008 18:20:00 -0500
- Sender: xxxxxxx@gmail.com
I built an index of the dotgov crawl (it's huge) and am working on link
checking now. I was thinking that as an intermediate format I should
just use:
URL
type param param param
so, e.g. for the link checking:
http://www.costello.house.gov/art/photos/2008-art-competition/ARatsNestofCurrentIssues2.jpg
link http://candicemiller.house.gov/images/print/print_left.jpg 404
and then I'll just do a count of successful links for percentages:
link 82829373 200
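A rough sketch of how that count could be produced from the intermediate format, in Python. It assumes a line with no whitespace is a source URL and that a link record is "link <target> <status>"; the function names parse_records, status_counts and summary_lines are made up for illustration and are not from any existing tool.

```python
from collections import Counter


def parse_records(lines):
    """Yield (source_url, record_type, params) for each record line.

    Assumption: a line containing a single token is a source URL, and
    every other non-empty line is "type param param ..." for that URL.
    """
    source = None
    for raw in lines:
        parts = raw.split()
        if not parts:
            continue  # skip blank lines
        if len(parts) == 1:
            source = parts[0]  # new source URL
        else:
            yield source, parts[0], parts[1:]


def status_counts(records):
    """Count link records per status code (second param of a link record)."""
    counts = Counter()
    for _source, record_type, params in records:
        if record_type == "link" and len(params) == 2:
            counts[params[1]] += 1
    return counts


def summary_lines(counts):
    """Format counts as "link <count> <status>" lines, sorted by status."""
    return [f"link {n} {status}" for status, n in sorted(counts.items())]
```

Percentages would then be the count for 200 divided by the sum over all statuses.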
then we can write other tools to split these up by domain and move
them around and stuff.
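
One possible shape for a split-by-domain tool, again only a sketch in Python. It assumes the same layout as above (a single-token line is a source URL, following lines are its records) and groups by the host of the source URL; split_by_domain is a hypothetical name.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def split_by_domain(lines):
    """Group intermediate-format lines by the host of their source URL.

    Assumption: a single-token line is a source URL; record lines that
    follow it are filed under that URL's host. Records seen before any
    URL line end up under the key None.
    """
    by_domain = defaultdict(list)
    domain = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if len(line.split()) == 1:
            domain = urlsplit(line).hostname  # lowercased host, no port
        by_domain[domain].append(line)
    return dict(by_domain)
```

Each value could then be written out to its own per-domain file and moved around independently.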
any news on transition? looks like the blair-julius-sonal axis is the
place to be right now