Hi,
My Django scraper, a management command built on BaseCommand, doesn't scrape anything when I run it as a scheduled task. Is there some prerequisite I'm missing?
Thanks!
Mark
PS: I tried all of the options described on this page: https://help.pythonanywhere.com/pages/ScheduledTasks
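The command works in a Bash console but not as a scheduled task, so one way to narrow it down is to have the task report its interpreter, working directory and relevant environment variables, then compare that output with the same script run in the console. A minimal standard-library sketch; the function name `describe_environment` is only illustrative:

```python
import os
import sys


def describe_environment() -> dict:
    """Collect details that often differ between a console session and a scheduled task."""
    return {
        "python": sys.executable,          # which interpreter (virtualenv or system Python)
        "version": sys.version.split()[0],
        "cwd": os.getcwd(),                # relative paths resolve against this directory
        "virtualenv": os.environ.get("VIRTUAL_ENV"),  # usually set only when a virtualenv is activated
        "settings": os.environ.get("DJANGO_SETTINGS_MODULE"),
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

If the interpreter or working directory differs between the two runs, imports such as `webscraping.models` or any relative file paths may behave differently.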
Here is the scraper code (I removed some sections). It works when I run it in a Bash console.

    # import scraping modules
    from django.core.management.base import BaseCommand
    from urllib.request import urlopen
    from bs4 import BeautifulSoup
    from webscraping.models import Functie

    # for random time intervals
    from random import randint
    from time import sleep

    # add the logging imports
    from error_pages.logging import configure_logging

    # module-level counter (previously declared with `global` in the class body)
    counter = 0


    # scrape website
    class Command(BaseCommand):
        help = "collect jobs"

        # define logic of command
        def handle(self, *args, **options):
            global counter
            # set url
            url = 'XXX'
            try:
                # collect html
                html = urlopen(url)
                # convert to soup
                soup = BeautifulSoup(html, 'html.parser')
                # Scraper code removed (it sets volgend_pagina_nummer and huidig_pagina_nummer)

                # while loop to get all pages with scraper, breaks when 'volgende_link' does not exist
                while volgend_pagina_nummer > huidig_pagina_nummer:
                    # start with variables declared first, then change variables based on pages, wait in between
                    # check if the page has changed; if > 5 same records, break
                    if counter > 5:
                        break
            except Exception as e:
                logger = configure_logging(__name__)
                logger.error("Error occurred in setting scraper settings for url: %s with error: %s", url, e, exc_info=True)

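The paging logic described in the loop comments (follow pages, wait a random interval between requests, stop after more than 5 repeated records) can be sketched without Django or network access. This is a minimal sketch under assumptions, not the actual scraper: `fetch_page` is a hypothetical callable standing in for the removed scraping code, and plain record IDs stand in for `Functie` rows. Keeping the counter local also avoids the module-level `counter`, which keeps its value if `handle()` is called more than once in the same process.

```python
import random
import time
from typing import Callable, List, Optional, Tuple

# Hypothetical fetcher: takes a page number, returns (record IDs on that page,
# next page number or None when there is no 'volgende' link).
FetchPage = Callable[[int], Tuple[List[str], Optional[int]]]


def scrape_all_pages(
    fetch_page: FetchPage,
    max_duplicates: int = 5,
    min_wait: int = 1,
    max_wait: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Follow pages until there is no next page or more than
    `max_duplicates` already-seen records have been encountered."""
    seen = set()     # stands in for records already stored, e.g. Functie rows
    collected: List[str] = []
    duplicates = 0   # local replacement for the module-level `counter`
    page: Optional[int] = 1
    while page is not None:
        records, next_page = fetch_page(page)
        for record in records:
            if record in seen:
                duplicates += 1
            else:
                seen.add(record)
                collected.append(record)
        if duplicates > max_duplicates:
            break    # the site keeps returning records we already have
        page = next_page
        if page is not None:
            # random pause between requests, like randint + sleep in the original
            sleep(random.randint(min_wait, max_wait))
    return collected
```

In the management command, `fetch_page` could wrap the `urlopen`/`BeautifulSoup` calls, and `seen` could be replaced by a database lookup; passing `sleep` as a parameter just makes the loop testable without real waits.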