Read timeout with requests on a paid account

I am hosting a Telegram bot on PythonAnywhere, using a paid account, and it works like a charm. The problem comes when I try to run a daily scrape of stats.nba.com to gather player statistics for the bot. I get this error: HTTPSConnectionPool(host='stats.nba.com', port=443): Read timed out. (read timeout=5)

I also tried raising the timeout to one minute, but it still fails. The request line is this one: r = requests.get(url=URL, timeout=5, headers={'Referer': 'https://www.nba.com/', 'User-Agent': USERAGENT})

The same call runs fine on my PC, on both Ubuntu and Windows. What am I missing to perform the GET on PythonAnywhere?

It's possible that the site you're trying to access has some kind of protection in place to prevent scraping, based on the IP address of incoming requests -- they're spotting that the requests from PythonAnywhere are coming from an IP address associated with a cloud computing environment (as opposed to one owned by a residential ISP) and are just ignoring them.

Unfortunately there's not really anything we can do about that from our side; you'd have to get in touch with them to ask them to lift the block (which I appreciate might not be easy with a large site like nba.com).
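
For anyone who wants to check whether their requests are being silently ignored rather than failing for some other reason, here is a minimal sketch. It relies on requests accepting a (connect, read) tuple for timeout, so the two kinds of timeout can be told apart. The function name probe, the default timeout values, and the injectable get parameter are assumptions made for illustration; they are not part of the original post.

```python
import requests


def probe(url, headers=None, connect_timeout=5, read_timeout=30, get=requests.get):
    """Return a short label describing how a GET to `url` ends.

    `probe` is a hypothetical helper; `get` is injectable only so the
    function can be exercised without touching the network.
    """
    try:
        # requests accepts a (connect, read) tuple for `timeout`.
        response = get(url, headers=headers, timeout=(connect_timeout, read_timeout))
    except requests.exceptions.ConnectTimeout:
        # The connection itself could not be established in time.
        return "connect timeout"
    except requests.exceptions.ReadTimeout:
        # The connection was established, but no response arrived in time.
        return "read timeout"
    except requests.exceptions.RequestException as exc:
        # Any other requests-level failure (DNS, TLS, refused connection, ...).
        return f"other error: {type(exc).__name__}"
    return f"HTTP {response.status_code}"
```

Running probe(URL, headers={'Referer': 'https://www.nba.com/', 'User-Agent': USERAGENT}) both on PythonAnywhere and on a home machine gives a quick comparison. A "read timeout" on the server next to a normal HTTP status at home would be consistent with the site ignoring requests based on the source IP address, although it does not prove it.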

Bad to know, but at least I have an explanation now, thanks.