

Message 00196: Re: pacer crawl



no

On Thu, Sep 4, 2008 at 10:40 PM, Carl Malamud <xxxxxxx@media.org> wrote:
> do you have your library's permission/tacit agreement to drain pacer?  what
> library?
>
> Carl
>
> On Sep 4, 2008, at 7:38 PM, Aaron Swartz wrote:
>
>> the easiest thing would just be to have a screen session open with a
>> couple perl scripts calling wget on the various pacer urls
>>
>> On Thu, Sep 4, 2008 at 10:36 PM, Carl Malamud <xxxxxxx@media.org> wrote:
>>>
>>> so, what specifically do you want to do on the box?
>>>
>>> do you need to run scripts, cron jobs, etc...?  periodically dump data
>>> off local crawlers?  run python jobs?
>>>
>>> On Sep 4, 2008, at 7:30 PM, Aaron Swartz wrote:
>>>
>>>> it's a disk space thing -- last time I did something like this, i kept
>>>> filling up people's disks whenever the process moving stuff off
>>>> hiccupped. and if we're at speed the hiccups don't have to last long.
>>>>
>>>> On Thu, Sep 4, 2008 at 10:26 PM, Carl Malamud <xxxxxxx@media.org> wrote:
>>>>>
>>>>> i don't mind crawling pacer with a valid account.  that is our
>>>>> production box, so i wouldn't want to be too intensive, but in
>>>>> principle I suppose one could crawl straight from thumper.
>>>>>
>>>>> is this a bandwidth thing?  not enough bits between your local
>>>>> computers and thumper to get the data over the wall?
>>>>>
>>>>> if this is really serious, there are a couple other places we can put
>>>>> you.
>>>>>
>>>>> let me know what you have in mind.
>>>>>
>>>>> On Sep 4, 2008, at 7:23 PM, Aaron Swartz wrote:
>>>>>
>>>>>> On Thu, Sep 4, 2008 at 10:23 PM, Carl Malamud <xxxxxxx@media.org> wrote:
>>>>>>>
>>>>>>> On Sep 4, 2008, at 7:22 PM, Aaron Swartz wrote:
>>>>>>>
>>>>>>>> I assume running the pacer crawl from thumper is not on, right?
>>>>>>>>
>>>>>>>
>>>>>>> so, what are you crawling?
>>>>>>>
>>>>>>>> the thumb drive corps is based on going to the library and using
>>>>>>>> their access.  other access is $0.08/page.  do you have some kind
>>>>>>>> of magic account or something?
>>>>>>>
>>>>>>>
>>>>>>
>>>>>> just the library's account.