Message 00193: Re: pacer crawl



so, what specifically do you want to do on the box?

do you need to run scripts, cron jobs, etc...? periodically dump data off local crawlers? run python jobs?

On Sep 4, 2008, at 7:30 PM, Aaron Swartz wrote:

it's a disk space thing -- last time I did something like this, i kept
filling up people's disks whenever the process moving stuff off
hiccupped. and if we're at speed the hiccups don't have to last long.

On Thu, Sep 4, 2008 at 10:26 PM, Carl Malamud <xxxxxxx@media.org> wrote:
i don't mind crawling pacer with a valid account. that is our production
box, so i wouldn't want to be too intensive, but in principle I suppose
one could crawl straight from thumper.

is this a bandwidth thing? not enough bits between your local computers and
thumper to get the data over the wall?

if this is really serious, there are a couple other places we can put you.

let me know what you have in mind.

On Sep 4, 2008, at 7:23 PM, Aaron Swartz wrote:

On Thu, Sep 4, 2008 at 10:23 PM, Carl Malamud <xxxxxxx@media.org> wrote:

On Sep 4, 2008, at 7:22 PM, Aaron Swartz wrote:

I assume running the pacer crawl from thumper is not on, right?


so, what are you crawling?

the thumb drive corps is based on going to the library and using their
access. other access is $0.08/page. do you have some kind of magic
account or something?



just the library's account.