# ask-metaflow
Question about the expected behavior of the heartbeat-daemon launched under Argo Workflows in the just-released 2.12.23 metaflow package. Specifically, the heartbeat-daemon template hard-codes the memory request/limit to 100Mi, which causes the heartbeat pod to be OOMKilled within 30s. Increasing that memory limit (I edited the template on the fly and re-triggered) allows the pod to launch and run, but its eventual exit code is 143, which I would like to confirm is the expected exit behavior.
I think the 143 is expected: once the task finishes successfully, Kubernetes (via the kubelet) sends a kill -15 (SIGTERM) to the heartbeat-daemon pod to shut it down, since it has no natural termination. But the error messages are a little alarming to me and our users, so I'm wondering if there's a configuration I'm missing that would let the heartbeat-daemon exit more gracefully, with less scary-looking exit codes and logs.
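For anyone wondering where 143 comes from: it is the usual 128 + signal number convention, and SIGTERM is signal 15. Here is a rough sketch (POSIX only) that reproduces it with a plain Python child process. It also shows that a process which installs its own SIGTERM handler can exit 0 instead. The helper names `container_exit_code` and `run_and_terminate` are made up for illustration, and this is not how the Metaflow heartbeat daemon is implemented.

```python
import signal
import subprocess
import sys

# Shells and container runtimes report "killed by signal N" as 128 + N.
SIGTERM_EXIT = 128 + signal.SIGTERM  # 128 + 15


def container_exit_code(returncode):
    """Map a subprocess returncode (negative = killed by signal) to 128 + N."""
    return 128 - returncode if returncode < 0 else returncode


def run_and_terminate(child_source):
    """Start a child, wait until it prints a line, send SIGTERM, return its code."""
    child = subprocess.Popen(
        [sys.executable, "-u", "-c", child_source],
        stdout=subprocess.PIPE,
        text=True,
    )
    child.stdout.readline()  # block until the child reports it is ready
    child.send_signal(signal.SIGTERM)
    child.wait()
    child.stdout.close()
    return container_exit_code(child.returncode)


# Hypothetical stand-in for a daemon with no SIGTERM handling.
NO_HANDLER = "import time; print('ready', flush=True); time.sleep(60)"

# Hypothetical stand-in for a daemon that treats SIGTERM as a clean shutdown.
WITH_HANDLER = (
    "import signal, sys, time\n"
    "signal.signal(signal.SIGTERM, lambda signum, frame: sys.exit(0))\n"
    "print('ready', flush=True)\n"
    "time.sleep(60)\n"
)

default_code = run_and_terminate(NO_HANDLER)
graceful_code = run_and_terminate(WITH_HANDLER)
print(default_code, graceful_code)
```

So a 143 by itself only says the process was stopped with SIGTERM and did not handle it, not that anything crashed.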
@prehistoric-salesclerk-95013 we are also looking into some of the issues with the daemon container implementation within Argo Workflows and trying to work around them. the heartbeat daemons are best-effort, with no impact on the execution of metaflow itself - the next release of metaflow will actually turn them off by default.