Subject: Re: pacer crawl

Message 00195: Re: pacer crawl

To: "Aaron Swartz" <xx@aaronsw.com>
Subject: Re: pacer crawl
From: Carl Malamud <xxxx@media.org>
Date: Thu, 4 Sep 2008 19:40:35 -0700
In-reply-to: <dc21c7860809041938g555f7ad7r506692f2cd691d21@mail.gmail.com>
References: <dc21c7860809041922i64893bd7p2c8bdac1df1a137f@mail.gmail.com> <4DA167E8-5E2E-402E-8331-1A4616F8D129@media.org> <dc21c7860809041923p41c4b55dtf3e5264ac217fd31@mail.gmail.com> <762EBFD0-5A93-423D-BFF6-93D974E441A7@media.org> <dc21c7860809041930v1cd6719dueddd61d949df57cc@mail.gmail.com> <8D62995A-C4A4-4D86-8032-EA7792662283@media.org> <dc21c7860809041938g555f7ad7r506692f2cd691d21@mail.gmail.com>

do you have your library's permission/tacit agreement to drain pacer?
what library?


Carl

On Sep 4, 2008, at 7:38 PM, Aaron Swartz wrote:

the easiest thing would just be to have a screen session open with a
couple perl scripts calling wget on the various pacer urls

On Thu, Sep 4, 2008 at 10:36 PM, Carl Malamud <xxxxxxx@media.org> wrote:
so, what specifically do you want to do on the box?
do you need to run scripts, cron jobs, etc...? periodically dump data off
local crawlers? run python jobs?

On Sep 4, 2008, at 7:30 PM, Aaron Swartz wrote:
it's a disk space thing -- last time I did something like this, i kept
filling up people's disks whenever the process moving stuff off
hiccupped. and if we're at speed the hiccups don't have to last long.
On Thu, Sep 4, 2008 at 10:26 PM, Carl Malamud <xxxxxxx@media.org> wrote:
i don't mind crawling pacer with a valid account. that is our production
box, so i wouldn't want to be too intensive, but in principle I suppose one
could crawl straight from thumper.
is this a bandwidth thing? not enough bits between your local computers and
thumper to get the data over the wall?
if this is really serious, there are a couple other places we can put you.

let me know what you have in mind.

On Sep 4, 2008, at 7:23 PM, Aaron Swartz wrote:
On Thu, Sep 4, 2008 at 10:23 PM, Carl Malamud <xxxxxxx@media.org> wrote:
On Sep 4, 2008, at 7:22 PM, Aaron Swartz wrote:
I assume running the pacer crawl from thumper is not on, right?
so, what are you crawling?
the thumb drive corps is based on going to the library and using their
access. other access is $0.08/page. do you have some kind of magic account
or something?
just the library's account.
