
Django Scraper with BaseCommand not working when scheduled

Hi,

My Django Scraper with BaseCommand is not scraping when scheduled. Is there some specific prerequisite I don't know about?

Thanks!

Mark

PS: I tried all of the options on this page: https://help.pythonanywhere.com/pages/ScheduledTasks

Here is the code of the scraper. I removed some sections; the code works when run from the Bash console.

# import scraping modules
from django.core.management.base import BaseCommand
from urllib.request import urlopen
from bs4 import BeautifulSoup
from webscraping.models import Functie

# for random time intervals
from random import randint
from time import sleep

# Add the logging imports
from error_pages.logging import configure_logging

# scrape website
class Command(BaseCommand):
    help = "collect jobs"

    global counter
    counter = 0

    ### define logic of command
    def handle(self, *args, **options):
        ### set url
        url = 'XXX'

        try:
            ### collect html
            html = urlopen(url)

            ### convert to soup
            soup = BeautifulSoup(html, 'html.parser')

            ### Scraper code removed

            ### while loop to get all pages with scraper, breaks when 'volgende_link' does not exist
            while (volgend_pagina_nummer>huidig_pagina_nummer):
                try:
                    ### start with variables declared first, then change variables based on pages, wait in between

                    ### Checks if page has changed, if > 5 same records, breaks
                    global counter
                    if counter > 5:
                        break

        except Exception as e:
            logger = configure_logging(__name__)
            logger.error("Error occurred in setting scraper settings for url: %s with error: %s", url, e, exc_info=True)

The most likely cause is that your scheduled task is not setting the working directory: https://help.pythonanywhere.com/pages/ScheduledTasks/#make-sure-you-take-account-of-the-working-directory
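As a rough illustration of what taking the working directory into account can look like in Python: a scheduled task may start in a different directory than a Bash console session, so relative paths that resolve in the console can fail when scheduled. The helper names `script_dir` and `run_from` are hypothetical and not part of the scraper above; this is a sketch, not the only way to do it.

```python
import os
from pathlib import Path


def script_dir(script_file):
    """Return the absolute directory that contains script_file."""
    return Path(script_file).resolve().parent


def run_from(directory):
    """Change the working directory so later relative paths resolve under it."""
    os.chdir(directory)
    return Path.cwd()
```

In a script you would call `run_from(script_dir(__file__))` before opening any relative paths. Alternatively, leave the working directory alone and build every path from `script_dir(__file__)`.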

Thanks,

The solution was in here: https://www.pythonanywhere.com/forums/topic/7556/

/home/user/.virtualenvs/myvirtualenv/python /home/<user>/<project>/manage.py scrape_XXX
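The command above uses absolute paths for both the virtualenv's interpreter and `manage.py`, so it does not depend on the scheduler's environment. The sketch below composes such a command. It is illustrative only: `scheduled_command` and the example directories are hypothetical. It also assumes the usual virtualenv layout with the interpreter under `bin/python`, so check where the interpreter actually lives in your own virtualenv.

```python
import shlex
from pathlib import PurePosixPath


def scheduled_command(venv_dir, project_dir, command):
    """Compose a manage.py invocation that uses absolute paths only."""
    # Assumption: standard virtualenv layout with the interpreter in bin/.
    python = PurePosixPath(venv_dir) / "bin" / "python"
    manage = PurePosixPath(project_dir) / "manage.py"
    if not (python.is_absolute() and manage.is_absolute()):
        raise ValueError("venv_dir and project_dir must be absolute paths")
    # shlex.join quotes each part so the result is safe to paste into a shell.
    return shlex.join([str(python), str(manage), command])


print(scheduled_command(
    "/home/user/.virtualenvs/myvirtualenv",  # hypothetical virtualenv directory
    "/home/user/project",                    # hypothetical project directory
    "scrape_XXX",
))
```

The resulting string can be pasted into the scheduled task's command field. Running it from the home directory in a Bash console first is a quick way to confirm it does not rely on the working directory.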

But now I have a new challenge: the scraper runs, but it does not save to the database. Challenge accepted ;)

Glad you found the solution! And good luck with the challenge ;-)