Message 00426: archive crawl
- To: "Carl Malamud" <xxxx@media.org>
- Subject: archive crawl
- From: "Aaron Swartz" <xx@aaronsw.com>
- Date: Fri, 14 Nov 2008 20:41:52 -0500
- Sender: xxxxxxx@gmail.com
OK, got access to the archive crawl (save one last bit they're
scraping off a failed drive--sigh). They already break out the links
by images, CSS, JS, and just normal links. So doing the link checking
at least should be pretty easy...