

Message 00284: Re: gpo archive



On Sep 19, 2008, at 7:23 PM, Aaron Swartz wrote:

http://bulk.resource.org/gpo.gov/bills/108/h5352ih.txt

Those documents are actually lame HTML, not text. (They're wrapped in
a <pre> tag and &, <, and > are all quoted.)

we just shelve what they ship.  :)


yeah, but you should serve them with content-type: text/html so they
render correctly.



er. I suppose I could do that if I knew exactly which directories should get that .htaccess mod to the normal handling for txt and which ones really are ascii txt.

in our next stage of evolution, I hope to have people spending more time making the data better, but right now the focus is much more on proving the point and homing in. seriously ... you don't want my mirror of the waisgate system, you want the raw data from gpo.

If someone wants to look at the txt files in gpo.gov/... and say which ones are really html, happy to throw the right MIME types back. Otherwise, we assume it is a client issue.

Carl
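
One way to do the survey asked for above is a small script run over a local copy of the mirror. This is only a rough sketch, not anything the archive itself uses: it assumes the HTML-ish files open with an <html> or <pre> tag in their first few hundred bytes, and the names looks_like_html and classify_tree are made up for illustration.

```python
import re
from collections import Counter
from pathlib import Path

# Assumption: a ".txt" file that is really HTML starts with <html> or <pre>
# (ignoring leading whitespace and letter case).
_HTML_START = re.compile(rb"^\s*<(html|pre)\b", re.IGNORECASE)


def looks_like_html(head: bytes) -> bool:
    """Heuristic: True if the leading bytes begin with an <html> or <pre> tag."""
    return _HTML_START.match(head) is not None


def classify_tree(root: str) -> dict:
    """Count HTML-looking vs plain .txt files for each directory under root."""
    counts = {}
    for path in Path(root).rglob("*.txt"):
        with path.open("rb") as fh:
            head = fh.read(512)  # the opening tag, if any, should be near the top
        kind = "html" if looks_like_html(head) else "text"
        counts.setdefault(str(path.parent), Counter())[kind] += 1
    return counts
```

Calling classify_tree on a local copy and printing the result gives a per-directory tally. Directories where every file comes back as "html" would be the candidates for the .htaccess change, and mixed directories would need a closer look by hand.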