

Message 00285: Re: gpo archive



I'm pretty sure their waisgate interface returns everything as HTML.

On Fri, Sep 19, 2008 at 10:34 PM, Carl Malamud <xxxxxxx@media.org> wrote:
> On Sep 19, 2008, at 7:23 PM, Aaron Swartz wrote:
>
>>>>> http://bulk.resource.org/gpo.gov/bills/108/h5352ih.txt
>>>>
>>>> Those documents are actually lame HTML, not text. (They're wrapped in
>>>> a <pre> tag and &, <, and > are all quoted.)
>>>
>>> we just shelve what they ship.  :)
>>>
>>
>> yeah, but you should serve them with content-type: text/html so they
>> render correctly.
>>
>
>
> er.  I suppose I could do that if I knew exactly which directories should get
> that .htaccess mod to the normal handling for txt and which ones really are
> ascii txt.
>
> in our next stage of evolution, I hope to have people spending more time
> making the data better, but right now the focus is much more on proving the
> point and honing in.  seriously ... you don't want my mirror of the waisgate
> system, you want the raw data from gpo.
>
> If someone wants to look at the txt files in gpo.gov/... and say which ones
> are really html, happy to throw the right mime types back.  Otherwise, we
> assume it is a client issue.
>
> Carl
>
>
>
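
For anyone who wants to sort the txt files as Carl suggests, here is a minimal sketch in Python of one way to tell which files are really HTML. It relies only on what the thread describes (a <pre> wrapper with &, <, and > quoted). The function names are hypothetical, and allowing an optional <html>/<body> wrapper before <pre> is an assumption, not something confirmed about the GPO files.

```python
import html
import re

# Assumption: an HTML-in-.txt file opens with <pre>, possibly preceded by
# <html> and/or <body>. Only the <pre> wrapper is mentioned in the thread.
PRE_OPEN = re.compile(r"\s*(<html>\s*)?(<body>\s*)?<pre>", re.IGNORECASE)
WRAPPER_TAGS = re.compile(r"</?(html|body|pre)>", re.IGNORECASE)


def looks_like_html(text: str) -> bool:
    """Heuristic: True if the text starts with a <pre> wrapper."""
    return PRE_OPEN.match(text) is not None


def to_plain_text(text: str) -> str:
    """Drop the wrapper tags and unescape quoted characters such as &amp;."""
    return html.unescape(WRAPPER_TAGS.sub("", text))
```

Running looks_like_html over the first few hundred bytes of each file in a directory would give a rough list of which directories need text/html and which are plain ASCII. It is only a heuristic, so spot-check the results before changing any server configuration.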