Message 00287: Re: gpo archive



Yep.

On Fri, Sep 19, 2008 at 10:47 PM, Carl Malamud <xxxxxxx@media.org> wrote:
> so ....
>
> you want
>
> thumper.public.resource.org:/pro/bulk.resource.org/htdocs/gpo.gov/.htaccess
>
> to have the line
>
>    AddType text/html txt
>
> Is that right?
>
> Carl
>
> On Sep 19, 2008, at 7:38 PM, Aaron Swartz wrote:
>
>> I'm pretty sure their waisgate interface returns everything as html
>>
>> On Fri, Sep 19, 2008 at 10:34 PM, Carl Malamud <xxxxxxx@media.org> wrote:
>>>
>>> On Sep 19, 2008, at 7:23 PM, Aaron Swartz wrote:
>>>
>>>>>>> http://bulk.resource.org/gpo.gov/bills/108/h5352ih.txt
>>>>>>
>>>>>> Those documents are actually lame HTML, not text. (They're wrapped in
>>>>>> a <pre> tag and &, <, and > are all quoted.)
>>>>>
>>>>> we just shelve what they ship.  :)
>>>>>
>>>>
>>>> yeah, but you should serve them with content-type: text/html so they
>>>> render correctly.
>>>>
>>>
>>>
>>> er.  I suppose I could do that if I knew exactly which directories
>>> should get that .htaccess mod to the normal handling for txt and which
>>> ones really are ascii txt.
>>>
>>> in our next stage of evolution, I hope to have people spending more
>>> time making the data better, but right now the focus is much more on
>>> proving the point and homing in.  seriously ... you don't want my mirror
>>> of the waisgate system, you want the raw data from gpo.
>>>
>>> If someone wants to look at the txt files in gpo.gov/... and say which
>>> ones are really html, happy to throw the right mime types back.
>>> Otherwise, we assume it is a client issue.
>>>
>>> Carl
>>>
>>>
>>>
>>
>
>
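
For anyone who wants to take up the offer in the thread and report which
.txt files are really HTML, here is a minimal sketch in Python. It is not
anything Carl or Aaron actually ran. The function names, the 4096-character
sample size, and the rule "an opening <html> or <pre> tag near the top means
HTML" are all assumptions made for illustration, based only on the
description above of files wrapped in a <pre> tag.

```python
import re
from collections import defaultdict
from pathlib import Path

# Assumption: a file is treated as HTML if an opening <html> or <pre> tag
# appears in the sampled text. This is a heuristic, not a real HTML parser.
_HTML_TAG = re.compile(r"<\s*(html|pre)\b[^>]*>", re.IGNORECASE)


def looks_like_html(text):
    """Return True if the text contains an opening <html> or <pre> tag."""
    return _HTML_TAG.search(text) is not None


def count_by_directory(root, sample_chars=4096):
    """Map each directory under root to (html_count, total_count) for *.txt.

    Only the first sample_chars characters of each file are inspected.
    Both names and the sample size are hypothetical choices for this sketch.
    """
    counts = defaultdict(lambda: [0, 0])
    for path in Path(root).rglob("*.txt"):
        if not path.is_file():
            continue
        # latin-1 maps every byte to a character, so decoding never fails.
        with open(path, encoding="latin-1") as handle:
            head = handle.read(sample_chars)
        entry = counts[str(path.parent)]
        entry[1] += 1
        if looks_like_html(head):
            entry[0] += 1
    return {directory: tuple(pair) for directory, pair in counts.items()}
```

A directory where html_count equals total_count would be a candidate for
the AddType line quoted above. A directory with mixed counts would need a
closer look before changing how its txt files are served.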