atreides
I was wondering if someone here has experience running cron jobs in Python. I have a script that scrapes data from a website and writes the data to a file.
The script should write about 60K records to the file if everything goes smoothly, but this hasn't happened yet. After about 1000 records have been written, the script either hangs or gets a URL error saying 'connection has been reset by peer'.
EDIT: It seems my connection times out
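Since the failures look like timeouts and resets, one thing I could try is an explicit timeout on each request plus a few retries with a growing delay. This is only a rough sketch, assuming the pages are fetched with urllib.request; the function name fetch_with_retry and its default values are made up for illustration, not taken from my actual script.

```python
import socket
import time
import urllib.error
import urllib.request


def fetch_with_retry(url, retries=5, timeout=30, base_delay=2.0,
                     opener=urllib.request.urlopen, sleep=time.sleep):
    """Return the response body as bytes, retrying on network errors.

    All names and defaults here are hypothetical. `opener` and `sleep`
    are parameters only so the function can be exercised without a network.
    """
    for attempt in range(retries):
        try:
            # The timeout makes a stalled connection raise instead of hanging.
            with opener(url, timeout=timeout) as response:
                return response.read()
        except (urllib.error.URLError, socket.timeout, ConnectionError):
            # Note: HTTPError (e.g. a 404) is a URLError, so it is retried too.
            if attempt == retries - 1:
                raise  # out of attempts, let the caller see the error
            # Exponential backoff: 2s, 4s, 8s, ... with the default base_delay.
            sleep(base_delay * 2 ** attempt)
```

With this, a hang turns into an exception after `timeout` seconds, and a reset connection gets a few more chances before the script gives up.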
I'm currently using time.sleep to pause the script for a few seconds (the number of seconds is chosen by a random function) after every 100 records written, but even with this, it eventually stalls. I think what I need to figure out is a way to exit the script completely after writing, say, 100 records to the file, and then re-enter the script a few seconds later at the exact point where it exited.
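To make the exit-and-resume idea concrete, here is roughly what I have in mind: keep the index of the next record in a small checkpoint file, process one batch per run, and let cron start the script again. This is a sketch under assumptions, not working scraper code; load_checkpoint, save_checkpoint, run_batch, and the scrape callback are all hypothetical names, and it assumes the records can be listed in a fixed order.

```python
import os


def load_checkpoint(path):
    """Return the index of the next record to scrape (0 if no checkpoint yet)."""
    try:
        with open(path) as f:
            return int(f.read().strip() or 0)
    except FileNotFoundError:
        return 0


def save_checkpoint(path, next_index):
    """Write the checkpoint via a temp file so a crash cannot leave it half-written."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        f.write(str(next_index))
    os.replace(tmp, path)


def run_batch(record_ids, scrape, out_path, checkpoint_path, batch_size=100):
    """Scrape up to batch_size records from the checkpoint on; return True when all are done.

    `scrape` is a hypothetical callable that takes a record id and returns one line of text.
    """
    start = load_checkpoint(checkpoint_path)
    end = min(start + batch_size, len(record_ids))
    with open(out_path, "a") as out:  # append, so earlier batches are kept
        for i in range(start, end):
            out.write(scrape(record_ids[i]) + "\n")
            out.flush()
            # Saved after every record; a crash between the write and this
            # line could duplicate that one record on the next run.
            save_checkpoint(checkpoint_path, i + 1)
    return end >= len(record_ids)
```

Each run of the script would call run_batch once and then exit. A crontab entry that starts the script every minute or so would then take the place of time.sleep, and a run that dies partway through just picks up from the last saved index next time.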
The problem is probably not the page source, because I check that the page HTML is well formed before I even attempt to scrape it.
Would appreciate any thoughts.
I'm on a Mac/Linux environment.