# ask-metaflow
a
Hello, my analytics team noted that, since they have a lot of runs, perhaps the 'Last 30 days' default time filter causes severely slowed loading times. I tried to dig through the code, and also asked about possible solutions here, where I got a reply pointing to `REACT_APP_MF_DEFAULT_TIME_FILTER_DAYS`. Unfortunately, since we build the UI through the metadata-service repository, Dockerfile.ui-service does not support this argument and downloads the UI part via a .sh script. Now, I am not sure if that is even relevant to the slowness, but I did see this. Could I get some clarification on this? Does `PREFETCH_RUNS_LIMIT` override `PREFETCH_RUNS_SINCE` if, for example, I have 1000 runs in the past two days? Thanks.

Configure the amount of runs to prefetch during server startup (artifact cache):
- `PREFETCH_RUNS_SINCE` [in seconds, defaults to 2 days ago (86400 * 2 seconds)]
- `PREFETCH_RUNS_LIMIT` [defaults to 50]
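For context, a minimal sketch of how the two documented settings combine, assuming the defaults quoted above (illustrative Python only, not the actual metadata-service code; the exact parsing in the service may differ):

```python
import os
import time

# Sketch of the documented defaults; the real metadata-service parsing may differ.
prefetch_runs_since = int(os.environ.get("PREFETCH_RUNS_SINCE", 86400 * 2))  # seconds, default 2 days
prefetch_runs_limit = int(os.environ.get("PREFETCH_RUNS_LIMIT", 50))          # max runs to prefetch

# The two act together: only runs newer than the cutoff are considered,
# and at most `prefetch_runs_limit` of them are prefetched at startup.
cutoff_epoch = time.time() - prefetch_runs_since
print(f"prefetch up to {prefetch_runs_limit} runs started after {time.ctime(cutoff_epoch)}")
```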
s
Hi! We don't expect a slowdown to occur unless the UI is somehow misconfigured (we run the UI with the default settings and have millions of runs).
re: `PREFETCH_RUNS_SINCE` and `PREFETCH_RUNS_LIMIT` - @bulky-afternoon-92433 do you remember the behavior?
b
I'll have to check this tomorrow, but as I recall, `runs_since` is not meant to override `runs_limit`; rather, they act together. these are also only used once during the service startup, when the local cache is empty and we want to populate it with some data for the most recent runs.
is the UI slowness only in certain views, or across the board? which version of the services are you running? the most likely culprit would be missing indices from the DB if some of the migrations have not been applied.
a
I think it only happens when the analytics team opens up the metaflow UI for the first time during their work day, but we use metaflow version 2.4.3 and 2.4.13 had numerous fixes, so maybe that was fixed along the way? also, `FEATURE_CACHE_DISABLE=0` - what is this setting? Does 0 mean it is enabled? Thanks!
b
correct, `FEATURE_CACHE_DISABLE` is a flag for disabling local cache usage for the service. 0 means that caching is used (not disabled), but going through the code, I would leave the environment variable unset unless you need to disable the cache.
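As a rough illustration of how such a disable flag is typically read (a sketch only, under the assumption of the usual truthy-string convention; the service's actual parsing may differ):

```python
import os

# Sketch of the usual boolean-flag convention; the service's exact parsing may differ.
# Unset or "0" leaves the local cache enabled.
def cache_disabled() -> bool:
    return os.environ.get("FEATURE_CACHE_DISABLE", "0").lower() in ("1", "true", "yes")

print("cache disabled" if cache_disabled() else "cache enabled")
```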
also verified the behaviour of the previous env vars:
- `PREFETCH_RUNS_SINCE` applies to the sql query as `where ts_epoch > ?` when prefetching runs for caching during service startup
- `PREFETCH_RUNS_LIMIT` applies to the sql query as a `LIMIT` when prefetching

as such, even with your 1000 runs in the last two days, the prefetching should not lead to any slowdowns with the service.
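Putting those two together, the startup prefetch query would look roughly like the sketch below (illustrative only; the table name `runs_v3` and the millisecond unit for `ts_epoch` are assumptions, and the real query in the service may differ):

```python
import time

# Rough shape of the startup prefetch query described above -- illustrative only.
cutoff_ms = int((time.time() - 86400 * 2) * 1000)  # PREFETCH_RUNS_SINCE default: 2 days

query = """
    SELECT * FROM runs_v3
    WHERE ts_epoch > %(cutoff)s   -- PREFETCH_RUNS_SINCE applied as a time filter
    ORDER BY ts_epoch DESC
    LIMIT %(limit)s               -- PREFETCH_RUNS_LIMIT caps the rows fetched
"""
params = {"cutoff": cutoff_ms, "limit": 50}  # PREFETCH_RUNS_LIMIT default
```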
can you verify the usage statistics of your ui_service pod and db instance during regular usage where users are experiencing slowdowns? a heavy utilization of the DB would point towards missing indices
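If it helps when checking the DB, here is a small sketch for listing the indexes that actually exist in the backing Postgres instance (connection details are placeholders, and it assumes the service tables live in the default `public` schema):

```python
import psycopg2  # assumes the backing DB is Postgres, as used by the metadata service

# List existing indexes to help check whether all migrations were applied.
# The connection string is a placeholder; adjust for your deployment.
conn = psycopg2.connect("dbname=metaflow user=metaflow host=localhost")
with conn, conn.cursor() as cur:
    cur.execute(
        "SELECT tablename, indexname, indexdef "
        "FROM pg_indexes WHERE schemaname = 'public' ORDER BY tablename, indexname"
    )
    for table, index, definition in cur.fetchall():
        print(f"{table}: {index}\n    {definition}")
```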
a
Sorry for the delayed response, the holidays and stuff 🙂
my analytics team isn't very skilled in Kubernetes and the background details, so the only info I have is that on startup, when they start their day, going to the metaflow UI takes a lot of time. Their guess was that since the production cluster has hundreds of thousands of runs, loading all of those causes some delays. I just wanted to confirm what these envs do so that later on I can check them if pod resources are not the issue. Thank you very much.