# ask-metaflow
a
Hello, my analytics team noted that, since they have a lot of runs, perhaps the 'Last 30 days' default time filter causes severely slowed loading times. I tried to dig through the code, and also asked about possible solutions here, where I got a reply pointing to `REACT_APP_MF_DEFAULT_TIME_FILTER_DAYS`. Unfortunately, since we build the UI through the metadata-service repository, Dockerfile.ui-service does not support this argument and downloads the UI part via a .sh script. Now, I am not sure if that is even relevant to the slowness, but I did see this. Could I get some clarification on this? Does `PREFETCH_RUNS_LIMIT` override `PREFETCH_RUNS_SINCE` if, for example, I have 1000 runs in the past two days? Thanks.

Configure the amount of runs to prefetch during server startup (artifact cache):
- `PREFETCH_RUNS_SINCE` [in seconds, defaults to 2 days ago (86400 * 2 seconds)]
- `PREFETCH_RUNS_LIMIT` [defaults to 50]
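For context, a minimal sketch of how the two documented settings combine, assuming the defaults quoted above (illustrative Python only, not the actual metadata-service code; the exact parsing in the service may differ):

```python
import os
import time

# Sketch of the documented defaults; the real metadata-service parsing may differ.
prefetch_runs_since = int(os.environ.get("PREFETCH_RUNS_SINCE", 86400 * 2))  # seconds, default 2 days
prefetch_runs_limit = int(os.environ.get("PREFETCH_RUNS_LIMIT", 50))          # max runs to prefetch

# The two act together: only runs newer than the cutoff are considered,
# and at most `prefetch_runs_limit` of them are prefetched at startup.
cutoff_epoch = time.time() - prefetch_runs_since
print(f"prefetch up to {prefetch_runs_limit} runs started after {time.ctime(cutoff_epoch)}")
```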
s
Hi! We don't expect a slowdown to occur unless the UI is somehow misconfigured (we run the UI with the default settings and have millions of runs).
re: `PREFETCH_RUNS_SINCE` and `PREFETCH_RUNS_LIMIT` - @bulky-afternoon-92433 do you remember the behavior?
b
I'll have to check this tomorrow, but as I recall, `runs_since` is not meant to override `runs_limit`; rather, they act together. these are also only used once during the service startup, when the local cache is empty and we want to populate it with some data for the most recent runs.
is the UI slowness only in certain views, or across the board? which version of the services are you running? the most likely culprit would be missing indices from the DB if some of the migrations have not been applied.
a
I think it only happens when the analytics team opens up the metaflow UI for the first time during their work day, but we use metaflow version 2.4.3 and 2.4.13 had numerous fixes, so maybe that was fixed along the way? also, `FEATURE_CACHE_DISABLE=0` - what is this setting? Does 0 mean it is enabled? Thanks!
b
correct, `FEATURE_CACHE_DISABLE` is a flag for disabling local cache usage for the service. 0 means that caching is used (not disabled), but going through the code, I would leave the environment variable unset unless you need to disable the cache.
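As a rough illustration of how such a disable flag is typically read (a sketch only, under the assumption of the usual truthy-string convention; the service's actual parsing may differ):

```python
import os

# Sketch of the usual boolean-flag convention; the service's exact parsing may differ.
# Unset or "0" leaves the local cache enabled.
def cache_disabled() -> bool:
    return os.environ.get("FEATURE_CACHE_DISABLE", "0").lower() in ("1", "true", "yes")

print("cache disabled" if cache_disabled() else "cache enabled")
```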
also verified the behaviour of the previous env vars:
- `PREFETCH_RUNS_SINCE` applies to the sql query as `where ts_epoch > ?` when prefetching runs for caching during service startup
- `PREFETCH_RUNS_LIMIT` applies to the sql query as a `LIMIT` when prefetching

as such, even with your 1000 runs in the last two days, the prefetching should not lead to any slowdowns with the service.
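Putting those two together, the startup prefetch query would look roughly like the sketch below (illustrative only; the table name `runs_v3` and the millisecond unit for `ts_epoch` are assumptions, and the real query in the service may differ):

```python
import time

# Rough shape of the startup prefetch query described above -- illustrative only.
cutoff_ms = int((time.time() - 86400 * 2) * 1000)  # PREFETCH_RUNS_SINCE default: 2 days

query = """
    SELECT * FROM runs_v3
    WHERE ts_epoch > %(cutoff)s   -- PREFETCH_RUNS_SINCE applied as a time filter
    ORDER BY ts_epoch DESC
    LIMIT %(limit)s               -- PREFETCH_RUNS_LIMIT caps the rows fetched
"""
params = {"cutoff": cutoff_ms, "limit": 50}  # PREFETCH_RUNS_LIMIT default
```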
can you verify the usage statistics of your ui_service pod and db instance during regular usage where users are experiencing slowdowns? a heavy utilization of the DB would point towards missing indices
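If it helps when checking the DB, here is a small sketch for listing the indexes that actually exist in the backing Postgres instance (connection details are placeholders, and it assumes the service tables live in the default `public` schema):

```python
import psycopg2  # assumes the backing DB is Postgres, as used by the metadata service

# List existing indexes to help check whether all migrations were applied.
# The connection string is a placeholder; adjust for your deployment.
conn = psycopg2.connect("dbname=metaflow user=metaflow host=localhost")
with conn, conn.cursor() as cur:
    cur.execute(
        "SELECT tablename, indexname, indexdef "
        "FROM pg_indexes WHERE schemaname = 'public' ORDER BY tablename, indexname"
    )
    for table, index, definition in cur.fetchall():
        print(f"{table}: {index}\n    {definition}")
```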
a
Sorry for the delayed response, the holidays and stuff 🙂
my analytics team isn't very skilled in Kubernetes and the background details, so the only info I have is that on startup, when they start their day, going to the metaflow UI takes a lot of time. Their guess was that since the production cluster has hundreds of thousands of runs, loading all of those causes some delays. I just wanted to confirm what these envs do so that later on I can check them if pod resources are not the issue. Thank you very much.