<@U0251HKBS4T>
# ask-metaflow
d
@hallowed-glass-14538
h
hey Christian! can you run this simple example once and let me know what you get https://docs.metaflow.org/metaflow/visualizing-results/dynamic-cards#monitoring-progress-with-progressbar . Let me also check the cloudformation template WRT the images of services we are deploying
n
hey valay! Thanks for looking into this. Here's what I see when I run that. I refresh at the end and see the complete progress bar
h
cards have a `refresh_interval` which dictates how often they get refreshed. Can you set the interval to something smaller like `1`? Can you also set a longer duration in the flow for the step to run (i.e. change the `range` and `max` in the flow file)
n
For this example, the refresh_interval is already set to 1. I set the max of the range to 1000. I'm getting the same behavior. Something interesting that I'm noticing as well is that when I terminate these runs locally with `ctrl + c` they are still showing as running in the UI. Take a look at Run ID 170. I terminated it locally, but it says it's still running.
Seems to recognize immediately that the task failed, but leaves the duration going and doesn't update the finished at time for a while.
h
the status on the UI is eventually consistent. The Metaflow service derives the status of a task based on when it received the last heartbeat. We are working on methods to make it more consistent. If you need to fine-tune this, you can configure the heartbeat threshold.
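A minimal sketch of what "status derived from the last heartbeat" can look like, to show why a task killed with ctrl + c may keep showing as running for a while. The function name, the status labels, and the 60 second threshold are assumptions made for illustration; the thread does not give the service's actual logic or default value.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical threshold, chosen only for this illustration.
HEARTBEAT_THRESHOLD = timedelta(seconds=60)


def derive_status(last_heartbeat, finished_at, now, threshold=HEARTBEAT_THRESHOLD):
    """Guess a task status from timestamps alone (illustrative, not the service's code)."""
    if finished_at is not None:
        return "completed"
    # No finish record: trust the task as long as a heartbeat arrived recently.
    if now - last_heartbeat <= threshold:
        return "running"
    return "failed"


now = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
killed_at = now - timedelta(seconds=20)  # last heartbeat before the local ctrl + c

print(derive_status(killed_at, None, now))                         # running
print(derive_status(killed_at, None, now + timedelta(minutes=5)))  # failed
```

Under this model the UI can only flip the status once the threshold has elapsed, which is one way to read "eventually consistent" here.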
n
thanks valay, that makes sense. Would adjusting the heartbeat threshold fix the cards not dynamically updating as well? For the progress bar example, increasing the range size and setting the refresh_interval to 1 still results in no updates on the progress bar until the task completes.
h
No, heartbeat changes will only affect the status. Can you share what you see in the Chrome network tab for the /card route? And the contents of the responses?
n
I see a /cards route, which I assume is what we're looking for. Attached is what I'm seeing filtering by that route. Do you want the full payload for the successful request?
h
Are there no other routes but that ?
n
thats all i see when i filter by card
h
Ideally the UI should also be calling the /data route under /cards. Can you check for /data?
n
this is all im seeing for filter on data
h
Hmm, I see. Seems like the Metaflow UI is not the version we want. Let me circle back on this
n
sounds good. thanks so much for your help on this!
h
Can you also check if there are any logs in the console?
n
yea looks like there's some errors here
h
Thanks. Let me check this and get back to you
can you share the version shown in the quick links section?
n
Application version: v1.2.4
Service version: 2.4.12--
h
I see. So the UI version is older than the current one. But the service version seems to be the latest.
can you check which version of the docker container for the metaflow ui service is deployed in your setup?
n
I believe it's the following:
netflixoss/metaflow_metadata_service:v2.4.12
I just took a crack at trying to see what was going on here. I ran the container locally and it appears that the `UI_VERSION` environment variable is set to `v1.3.13`. It seems like on building the container this was passed in and `~/services/ui_backend_service/download_ui.sh` downloads this UI version into `~/services/ui_backend_service/ui`. I forked the repo that builds the container and ran the GitHub action to build it (removing the step that pushes the container), so I could see the logs. It says `Download UI version v1.3.13 from <https://github.com/Netflix/metaflow-ui/releases/download/v1.3.13/metaflow-ui-v1.3.13.zip> to /root/services/ui_backend_service/ui`. Any thoughts on the discrepancy here? It seems like things should be working with the latest version, but maybe I'm thinking about this wrong.
h
So I checked one thing. The `netflixoss/metaflow_metadata_service:v2.4.12` image has the right version of the UI. Is it possible for you to bounce the containers that are running the metaflow ui in your deployment?
The UI JavaScript in `netflixoss/metaflow_metadata_service:v2.4.12` has the same checksum as this release. It seems very weird that an older version of the UI is running there. Can you also verify how old your deployment is? Because this is the first time we are seeing this issue 😅
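For anyone wanting to repeat the checksum comparison themselves, a minimal sketch using Python's standard `hashlib`. The thread does not say which hash was used, so SHA-256 is an assumption, and the file paths in the usage comment are placeholders.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_file(path_a, path_b):
    """True if both files have identical contents according to SHA-256."""
    return sha256_of(path_a) == sha256_of(path_b)


# Usage (placeholder paths): one bundle copied out of the running container,
# one extracted from the release zip.
# same_file("from_container/main.js", "from_release/main.js")
```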
n
thanks for looking into this valay. super weird. The deployment is quite new. I did it about two weeks ago using the cloudformation template here. I did a diff on the original template and the one I modified for deployment. The only thing that was different were the variables you have to add (EnableUI, PublicDomainName, CertificateArn) for the UI and I had to bump the version of the RDSMasterInstance from 16.1 to 16.3 and add point in time recovery for StepFunctionsStateDDB. Everything else is the same as the template. I'll try bouncing the containers on monday and let you know what I see. Thanks again for the help!
@hallowed-glass-14538 wait i think i see the issue. totally missed this looking earlier. in the default cloudformation template in the outerbounds/metaflow-tools repo it has the following:
ServiceInfoUIStatic:
    StackName:
      value: 'metaflow-infrastructure'
    ServiceName:
      value: 'metadata-ui-static'
    ImageUrl:
      value: 'public.ecr.aws/outerbounds/metaflow_ui:v1.2.4'
    ContainerPort:
      value: 3000
    ContainerCpu:
      value: 512
    ContainerMemory:
      value: 1024
    Path:
      value: '*'
    Priority:
      value: 1
    DesiredCount:
      value: 1
    Role:
      value: ""
take a look here
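A quick way to spot pinned image tags like this one across a whole template is to scan for every `ImageUrl` value. A rough sketch, assuming the `ImageUrl:` / `value:` layout shown above; the regex is deliberately simple and is not a YAML parser, and `image_tags` is a made-up helper name.

```python
import re

# Matches an "ImageUrl:" key followed on the next line by "value: '<url>'".
IMAGE_URL = re.compile(r"""ImageUrl:\s*\n\s*value:\s*['"]?([^'"\s]+)""")


def image_tags(template_text):
    """Return (repository, tag) pairs for every ImageUrl value found."""
    pairs = []
    for url in IMAGE_URL.findall(template_text):
        if ":" in url:
            repo, _, tag = url.rpartition(":")
        else:
            repo, tag = url, ""  # no explicit tag on the image
        pairs.append((repo, tag))
    return pairs


sample = """
ServiceInfoUIStatic:
    ImageUrl:
      value: 'public.ecr.aws/outerbounds/metaflow_ui:v1.2.4'
"""
print(image_tags(sample))
# [('public.ecr.aws/outerbounds/metaflow_ui', 'v1.2.4')]
```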
h
oh I see! That's a great point. We can submit a change to fix that.
Thanks a lot for identifying the bug!
n
@hallowed-glass-14538 sorry to bother, but I may have spoken too soon on this. I changed the ImageUrl value for `ServiceInfoUIStatic` from `public.ecr.aws/outerbounds/metaflow_ui:v1.2.4` to `public.ecr.aws/outerbounds/metaflow_ui:v1.3.3` and redeployed the cloudformation stack. It appears to have worked correctly as I'm seeing the correct version in quick links (attached), but I'm still having the same issue where the cards are not dynamically updating. Any other suggestions for what might be wrong here?
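Three UI tags come up in this thread (`v1.2.4`, `v1.3.3`, `v1.3.13`), and the last two are easy to mix up by eye. A small sketch for ordering such tags numerically rather than as strings; `parse_tag` is a made-up helper and assumes plain `vMAJOR.MINOR.PATCH` tags. Whether the difference between `v1.3.3` and `v1.3.13` matters for the card issue is not established here.

```python
import re


def parse_tag(tag):
    """Turn 'v1.3.13' into (1, 3, 13) so tags compare numerically."""
    match = re.fullmatch(r"v?(\d+)\.(\d+)\.(\d+)", tag)
    if match is None:
        raise ValueError(f"unrecognized version tag: {tag!r}")
    return tuple(int(part) for part in match.groups())


tags = ["v1.3.3", "v1.2.4", "v1.3.13"]
print(sorted(tags))                 # ['v1.2.4', 'v1.3.13', 'v1.3.3']  (string order)
print(sorted(tags, key=parse_tag))  # ['v1.2.4', 'v1.3.3', 'v1.3.13']  (numeric order)
```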
h
how long are you running your tasks? can you run them for longer? can you also share the outputs from the network tab in your browser filtered by the `/cards` route (like API responses, API routes, timings etc.)?
n
I've been using the clock example and setting it to run for 1000 steps. I think there's definitely something up on the networking side. Filtering by the cards route I see the following. Then looking at the console I see the following, which makes me think that it could be something with the way we configured the public DNS. It seems like this CORS issue might be it
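To sanity-check a CORS suspicion like this, one can copy the response headers of a failing request from the network tab and test them against the page's origin. A simplified sketch: it only looks at `Access-Control-Allow-Origin` and ignores preflight requests and credentials, and the origins used below are placeholders, not values from this deployment.

```python
def origin_allowed(response_headers, page_origin):
    """True if Access-Control-Allow-Origin would let page_origin read the response."""
    # Header names are case-insensitive, so normalize before the lookup.
    normalized = {name.lower(): value for name, value in response_headers.items()}
    allowed = normalized.get("access-control-allow-origin")
    if allowed is None:
        return False
    return allowed == "*" or allowed == page_origin


# Placeholder values for illustration only.
headers = {"Access-Control-Allow-Origin": "https://ui.example.com"}
print(origin_allowed(headers, "https://ui.example.com"))     # True
print(origin_allowed(headers, "https://other.example.com"))  # False
print(origin_allowed({}, "https://ui.example.com"))          # False
```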