On Sep 19, 2008, at 7:23 PM, Aaron Swartz wrote:
http://bulk.resource.org/gpo.gov/bills/108/h5352ih.txt
Those documents are actually lame HTML, not text. (They're wrapped in
a <pre> tag and &, <, and > are all quoted.)
we just shelve what they ship. :)
yeah, but you should serve them with content-type: text/html so they
render correctly.
er. I suppose I could do that if I knew exactly which directories
should get that .htaccess mod to the normal handling for txt and which
ones really are ascii txt.
in our next stage of evolution, I hope to have people spending more
time making the data better, but right now the focus is much more on
proving the point and honing in. seriously ... you don't want my
mirror of the waisgate system, you want the raw data from gpo.
If someone wants to look at the txt files in gpo.gov/... and say which
ones are really html, happy to throw the right mime types back.
Otherwise, we assume it is a client issue.
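
For anyone who wants to take a pass at sorting the files, here is a
rough sketch of one way to do it. It assumes the "really html" files
all open with a <pre> tag (optionally after <html> and <body>), as
described above for the bills; that has not been checked against the
whole archive. The names looks_like_pre_html and guess_content_type
are made up for illustration.

```python
import re

# Assumption: an HTML-in-disguise file starts with <pre>, possibly
# preceded by <html> and/or <body>. Matching is case-insensitive.
PRE_OPEN = re.compile(r"^\s*(?:<html>\s*)?(?:<body>\s*)?<pre>", re.IGNORECASE)


def looks_like_pre_html(text: str) -> bool:
    """Heuristic only: True if the text opens with a <pre> wrapper."""
    return bool(PRE_OPEN.match(text))


def guess_content_type(text: str) -> str:
    """Suggest a Content-Type for a .txt file based on its opening bytes."""
    return "text/html" if looks_like_pre_html(text) else "text/plain"
```

Running guess_content_type over the first few hundred characters of
each file and tallying the results per directory would give a list of
directories that are candidates for the text/html treatment.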
Carl