Message 00288: Re: gpo archive



cc'ing Mike on this.

Joel Hardi set up thumper with a braindead config with nginx on top of Apache (among other things). On our list for unwinding Hardi's setup is shooting nginx and going to a straight apache2 config.

Right now, though, the gpo.gov subtree you see over http is nginx output. Apparently, the AddType directive in .htaccess does not carry over to what nginx serves. We could change the entire site so that .txt = text/html, but I don't want to do that because it screws up a ton of other readmes.
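
The scoping concern above can be pictured with a minimal Python sketch. This is not anything running on thumper: the function name `content_type_for` and the `HTML_TXT_PREFIXES` tuple are hypothetical, made up for illustration. The idea is to treat .txt as text/html only under the gpo.gov subtree and leave every other .txt (the readmes) alone.

```python
import mimetypes
import posixpath

# Assumption: only this URL subtree holds HTML saved under a .txt name.
HTML_TXT_PREFIXES = ("/gpo.gov/",)


def content_type_for(url_path):
    """Return the Content-Type to send for a URL path (illustrative only)."""
    # Collapse "." and ".." segments so "/gpo.gov/../README.txt"
    # is not mistaken for a file inside the subtree.
    normalized = posixpath.normpath(url_path)
    if normalized.endswith(".txt") and normalized.startswith(HTML_TXT_PREFIXES):
        return "text/html"
    # Everything else keeps the standard extension-based guess.
    guessed, _encoding = mimetypes.guess_type(normalized)
    return guessed or "application/octet-stream"
```

With this mapping, `/gpo.gov/bills/108/h5352ih.txt` would go out as text/html while a top-level readme stays text/plain, which is the same effect a per-directory AddType line is meant to have.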

Long story short: the mime type will get fixed, but only after we rip out nginx, which is on Mike's list of things to do. It might be a few days, since this is not urgent.

Carl

On Sep 20, 2008, at 4:52 AM, Aaron Swartz wrote:

Yep.

On Fri, Sep 19, 2008 at 10:47 PM, Carl Malamud <xxxxxxx@media.org> wrote:
so ....

you want

thumper.public.resource.org:/pro/bulk.resource.org/htdocs/gpo.gov/.htaccess

to have the line

  AddType text/html txt

Is that right?

Carl

On Sep 19, 2008, at 7:38 PM, Aaron Swartz wrote:

I'm pretty sure their waisgate interface returns everything as html

On Fri, Sep 19, 2008 at 10:34 PM, Carl Malamud <xxxxxxx@media.org> wrote:

On Sep 19, 2008, at 7:23 PM, Aaron Swartz wrote:

http://bulk.resource.org/gpo.gov/bills/108/h5352ih.txt

Those documents are actually lame HTML, not text. (They're wrapped in
a <pre> tag and &, <, and > are all quoted.)

we just shelve what they ship.  :)


yeah, but you should serve them with content-type: text/html so they
render correctly.



er. I suppose I could do that if I knew exactly which directories should get that .htaccess mod to the normal handling for txt and which ones really are ascii txt.

in our next stage of evolution, I hope to have people spending more time making the data better, but right now the focus is much more on proving the point and honing in.  seriously ... you don't want my mirror of the waisgate system, you want the raw data from gpo.

If someone wants to look at the txt files in gpo.gov/... and say which ones are really html, happy to throw the right mime types back. Otherwise, we assume it is a client issue.

Carl
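
A rough way to answer the "which directories" question, sketched in Python under stated assumptions: a .txt file counts as HTML if a `<pre>` or `<html>` tag shows up in its first 4096 characters. The names `looks_like_html` and `survey`, and the 4096 cutoff, are made up for this sketch. It is a heuristic and could misjudge a plain-text file that happens to mention those tags near the top.

```python
import re
from collections import defaultdict
from pathlib import Path

# Assumption: HTML-in-.txt files open with a <pre> or <html> tag near the top.
TAG_RE = re.compile(r"<\s*(pre|html)\b", re.IGNORECASE)
HEAD_CHARS = 4096  # arbitrary cutoff; only the start of each file is inspected


def looks_like_html(text):
    """Heuristic: True if the start of the text contains a <pre> or <html> tag."""
    return TAG_RE.search(text[:HEAD_CHARS]) is not None


def survey(root):
    """Map each directory under root to [html_like_count, total_txt_count]."""
    counts = defaultdict(lambda: [0, 0])
    for path in Path(root).rglob("*.txt"):
        # latin-1 decodes any byte sequence, so odd encodings will not raise.
        with open(path, encoding="latin-1") as handle:
            head = handle.read(HEAD_CHARS)
        entry = counts[str(path.parent)]
        entry[1] += 1
        if looks_like_html(head):
            entry[0] += 1
    return dict(counts)
```

Calling `survey` on a local copy of the gpo.gov tree gives per-directory counts. A directory where the two numbers match is a candidate for the AddType line, and a mixed directory needs a closer look.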