Forums

Scaling / Behaviour once number of users exceeds number of web workers

Hello all,

I have been working on a Python Dash data analysis web app. It is working on my local machine and on PythonAnywhere, but I have just realised that my current app comes to a complete stop when the number of users exceeds my number of PA web workers, and I am trying to understand what my options are to get around this. (As you can probably tell I am relatively new to coding, self-taught and haven’t considered this issue before!)

The way my site currently works is:

1) My home page is a Dash layout consisting of a welcome message/image, a dcc.Input component (see below), and a couple of "Help" and "About" buttons which use simple Dash callbacks to overlay "Help" and "About" modals when clicked.

2) The user provides an ID for analysis, for example 123456. Currently I allow input either into the dcc.Input component box, or alternatively into the address bar via dcc.Location, in which case the user would enter mydomain.com/123456.

3) My Dash callback checks whether this is a returning user whose ID I have already processed this week - if so, a pickled finished dictionary is already stored on local disk, and the callback reads this file and serves the dictionary content back to the user. This reading/serving seems to be very quick, on the order of 0.1 seconds.

4) If the dictionary does not exist on local disk, then I have not yet done the processing for this ID. I then call my fetching function, which gets the relevant ID data from an external API (speed varies, but circa 5 seconds), and then my processing function, which processes the data and saves down the finished dictionary (circa another 5 seconds). So roughly a 10 second total "loading" time (fetching + processing) for the user, which is acceptable. The finished dictionary is then passed back to the front end, as well as being saved to disk in case the user returns. From the user's perspective, after they input their ID number a loading spinner spins for 10 seconds in the top corner to show that the app is doing something, and then the landing page content is replaced with their processed content. This is the desired behaviour. (I've sketched this flow in code just below.)
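To make the flow concrete, here is a rough sketch of what my callback does - the names fetch_from_api, process and render are placeholders for my real functions, and the paths are illustrative:

```python
import os
import pickle

from dash import Dash, Input, Output, dcc, html
from dash.exceptions import PreventUpdate

app = Dash(__name__)
server = app.server  # what the PythonAnywhere web workers serve

CACHE_DIR = "/home/myuser/cache"  # illustrative path for finished dictionaries

app.layout = html.Div([
    dcc.Input(id="id-input", type="text", debounce=True),
    dcc.Loading(html.Div(id="content")),  # the spinner the user sees
])

@app.callback(Output("content", "children"), Input("id-input", "value"))
def serve_results(user_id):
    if not user_id:
        raise PreventUpdate
    path = os.path.join(CACHE_DIR, f"{user_id}.pkl")
    if os.path.exists(path):
        # Returning user: the finished dictionary is on disk (~0.1s to serve)
        with open(path, "rb") as f:
            result = pickle.load(f)
    else:
        # New ID: fetch (~5s) + process (~5s) happen inside the callback,
        # so this web worker is tied up for the full ~10 seconds
        result = process(fetch_from_api(user_id))  # placeholders for my functions
        with open(path, "wb") as f:
            pickle.dump(result, f)
    return render(result)  # placeholder: builds the page from the dictionary
```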

I thought that if many users wanted to use my site, I could simply increase the number of PA web workers (say to 100). Naively, I assumed that if more than 100 people tried to use the site at the same time, they would still be able to get into the site but would face a longer "spinning" load time after submitting their ID number, whilst the workers worked through the different requests. However, what actually happens when I have more visitors than PA workers is that the site grinds to a complete halt: new visitors do not even receive the home page with the entry box, help buttons, etc. So it seems that once my web workers are fully occupied (by 100 users in the above example), there are no free ones left to serve even my Dash landing page to additional visitors.


From reading around, it sounds like perhaps I should be using the web workers only to retrieve completed dictionaries from disk when they exist (as they currently do); when the dictionary does not exist, the callback should simply add the user's ID to a job list and then exit, making the worker available to serve the site to other users - displaying the landing page, opening and closing my Help/About modals, reading any already-finished dictionaries from disk. Some kind of always-on task in the background would then monitor the job list, run my fetch+process functions on each job, and save the finished dictionary to disk (I've sketched the idea in code below). My concern with this method is that I believe this would be just one worker, so it would take 1000 seconds to create all 100 dictionaries (taking 10 seconds each, as discussed above). Doing this would also use up 1000 seconds of my CPU time allocation. In contrast, with my current method, all 100 dictionaries are completed in parallel by 100 separate workers within 10 seconds (hopefully), and without using any CPU time allocation. Is this understanding correct?
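Something like this is what I have in mind - a completely untested sketch, with illustrative paths, where fetch_from_api and process again stand in for my real functions:

```python
import os
import pickle
import time

JOBS_DIR = "/home/myuser/jobs"    # one empty file per queued ID (illustrative)
CACHE_DIR = "/home/myuser/cache"  # finished pickled dictionaries

def enqueue(user_id):
    # Called from the Dash callback: record the job and return immediately,
    # so the web worker is freed up to serve other visitors.
    open(os.path.join(JOBS_DIR, str(user_id)), "w").close()

def run_worker():
    # Runs as a PythonAnywhere always-on task: loop forever, picking up
    # queued IDs one at a time and producing the finished dictionary.
    while True:
        jobs = sorted(os.listdir(JOBS_DIR))
        if not jobs:
            time.sleep(1)
            continue
        user_id = jobs[0]
        result = process(fetch_from_api(user_id))  # placeholders, ~10s total
        with open(os.path.join(CACHE_DIR, f"{user_id}.pkl"), "wb") as f:
            pickle.dump(result, f)
        os.remove(os.path.join(JOBS_DIR, user_id))

if __name__ == "__main__":
    run_worker()
```

The front end would then presumably need something like a dcc.Interval callback that polls until the pickle appears on disk, keeping the spinner up in the meantime.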


1) It would be highly appreciated if anyone could point me in the right direction to achieve my goal of having a working landing page even when the number of visitors exceeds my number of web workers. Ideally it would show my landing page content, have working Help/About buttons, and allow the user to submit their ID and see a spinner whilst the fetching+processing is being done (although clearly the fetching+processing would take longer than 10 seconds each at a busy time, when 300 requests are being shared between 100 web workers). I did read some comments about having some kind of static HTML landing page - would this help to solve my problem, and is there any kind of guide to how I could get this to interact correctly with my existing Dash app?

2) As a less important question: from my limited testing it seems that even when I have fewer users (e.g. 3) than web workers (e.g. 5), my fetch+process function is often slower for each user than when there is only 1 user. I had assumed the web workers were totally separate, so the speed would be the same for each user until I exceeded the number of web workers. Is this expected behaviour?

Many thanks for any help anyone is able to provide!

See this help page on how to determine how many workers you may need.

Thank you, but it is difficult to estimate how many users my site will have and I am trying to ensure some level of basic functionality even if it becomes popular and the number of simultaneous visitors far exceeds the number of web workers I have.

It sounds to me like Dash is keeping connections open even when it's not currently getting data.

The way that the system operates is that all incoming requests to your site go into a queue; the workers then pick up requests from the end of that queue, call your code to do the processing, then send the responses back. So if your site is a normal HTTP one and makes a request, gets the response, and then closes the connection, you can handle large numbers of users per worker.

However, if the front-end of the site is holding connections open, then that would break things because the workers would never get freed up after they'd handled a request. Perhaps there's some way to configure Dash not to do that? Or perhaps you have inadvertently switched on a setting that makes it do it? We haven't heard of an issue like this before, and I would have thought that we would have done if it was a general issue with Dash on our platform.

Hi Giles, thanks for your response. I have just found out about a newish feature in Dash called "background callbacks", which appears to be designed to fix exactly this problem; it is described here:

https://community.plotly.com/t/dash-2-6-released-background-callbacks-unit-testing-persistent-selections-dropdown-features/66322

Background callbacks run your callbacks in a background queue, enabling your apps to scale to more concurrent users and preventing your apps from running into network timeout issues.

Consider an app that has a callback that takes 10 seconds to run. Every time that callback is fired, a worker (CPU) is allocated to crunch the numbers. If your app is deployed with 4 CPU workers and if 4 people visit your app and execute the same callback at the same time, your app will have no more workers to serve the 5th app visitor. As a result, the 5th visitor will see a blank page for up to 10 seconds until the first worker becomes available.

Background callbacks offer a scalable solution for using long-running callbacks by running them in a separate background queue. In the background queue, the callbacks are executed one-by-one in the order that they came in by dedicated queue worker(s). The web workers remain available to load the dash app, return results of the background callback, and run non-background callbacks.

The same post says:

Deploying Apps with Background Callbacks

To deploy an app with background callbacks, you’ll need:

1) A deployment platform that can run two commands:

gunicorn app:server --workers 4 - For running the Dash app and “regular” callbacks. In this case, 4 CPU are serving these requests.

celery app:server --concurrency=2 - For running the background job workers. In this case, 2 CPU will be running the background callbacks in the order that they are submitted.

2) A Redis database with network access available to both of those commands. Dash will submit and read jobs to and from Redis.
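For concreteness, the app-code side of this looks roughly like the following - a minimal sketch based on the Dash docs, using the DiskcacheManager that the docs suggest for local development (it needs neither Redis nor Celery), with the 10-second sleep standing in for my fetch+process:

```python
import time

import diskcache
from dash import Dash, DiskcacheManager, Input, Output, html

# DiskcacheManager is the docs' local-development option;
# CeleryManager (backed by Redis) is their production option.
cache = diskcache.Cache("./cache")
background_callback_manager = DiskcacheManager(cache)

app = Dash(__name__, background_callback_manager=background_callback_manager)
server = app.server

app.layout = html.Div([html.Button("Run", id="run"), html.Div(id="out")])

@app.callback(
    Output("out", "children"),
    Input("run", "n_clicks"),
    background=True,  # runs in the background queue, not on a web worker
    prevent_initial_call=True,
)
def slow_job(n_clicks):
    time.sleep(10)  # stand-in for my ~10s fetch+process
    return f"Done ({n_clicks})"
```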

From some googling, it sounds like PythonAnywhere doesn't support Redis or Celery, and therefore I may not be able to use this background callback method to solve the issue. Please could you confirm whether this is correct? Thank you.

Celery would not work currently; as for Redis, you could use an external instance - since you are on a paid account, you have unrestricted internet access (see this help page). Regarding the whole solution, have a look at this help page, as there may be a kind of workaround for the issue.
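For example, connecting to an externally hosted Redis instance with the redis-py client would look something like this - the hostname and credentials are placeholders for whatever your provider gives you:

```python
import redis

# Placeholder connection details for an externally hosted Redis instance
# (e.g. a managed provider); replace with your real host and credentials.
r = redis.Redis(
    host="my-redis-host.example.com",
    port=6379,
    password="REDIS_PASSWORD",
    ssl=True,  # most hosted Redis providers require TLS
)
r.ping()  # raises an exception if the connection or auth fails
```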