Message 00284: Re: gpo archive
On Sep 19, 2008, at 7:23 PM, Aaron Swartz wrote:
http://bulk.resource.org/gpo.gov/bills/108/h5352ih.txt
Those documents are actually lame HTML, not text. (They're wrapped in
a <pre> tag and &, <, and > are all quoted.)
we just shelve what they ship. :)
yeah, but you should serve them with content-type: text/html so they
render correctly.
er. I suppose I could do that if I knew exactly which directories
should get that .htaccess mod to the normal handling for txt and which
ones really are ascii txt.
in our next stage of evolution, I hope to have people spending more
time making the data better, but right now the focus is much more on
proving the point and homing in. seriously ... you don't want my
mirror of the waisgate system, you want the raw data from gpo.
If someone wants to look at the txt files in gpo.gov/... and say which
ones are really html, happy to throw the right mime types back.
Otherwise, we assume it is a client issue.
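
For anyone who wants to take a pass at sorting the files, here is a
rough sketch of one way to guess which .txt files are really HTML. It
assumes, going only on the description above, that the HTML ones start
with a <pre> tag, possibly after a few other opening tags. The function
names and the 1024-character window are made up for illustration; this
is not how the server decides anything today.

```python
import re

# Assumption: an HTML-ish file opens with zero or more tags followed by
# a <pre> tag, with nothing but whitespace in between. Plain ASCII bill
# text is not expected to start that way.
_LEADING_PRE = re.compile(r"^\s*(?:<[^>]+>\s*)*<pre\b", re.IGNORECASE)


def looks_like_html(text: str) -> bool:
    """Guess whether a .txt document is really HTML wrapped in <pre>."""
    # Only the start of the document matters for this heuristic.
    return bool(_LEADING_PRE.match(text[:1024]))


def content_type_for(text: str) -> str:
    """Pick the Content-Type header value to serve the document with."""
    return "text/html" if looks_like_html(text) else "text/plain"
```

Running something like this over each directory and counting the
results would show which directories are all HTML, all plain text, or
mixed, which is the list the .htaccess change needs.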
Carl