# ask-metaflow
b
Hi, I've been trying to debug this situation but I can't seem to find more logs. For context, I'm running the random forest example. It has worked a couple of times, but in other runs the task hangs, and all I can see from the logs is this:
2024-11-07 08:40:17.452 [44/train_model/155 (pid 1995363)] RandomizedSearchCV initialized.
2024-11-07 08:40:17.452 [44/train_model/155 (pid 1995363)] Proceeding to the evaluation step.
2024-11-07 08:41:59.554 1 task is running: train_model (1 running; 0 done).
2024-11-07 08:41:59.554 No tasks are waiting in the queue.
2024-11-07 08:41:59.554 2 steps have not started: end, evaluate.
2024-11-07 08:46:59.851 1 task is running: train_model (1 running; 0 done).
2024-11-07 08:46:59.851 No tasks are waiting in the queue.
2024-11-07 08:46:59.851 2 steps have not started: end, evaluate.
2024-11-07 08:48:33.830 [44/train_model/155 (pid 1995363)] Internal error:
2024-11-07 08:48:33.831 [44/train_model/155 (pid 1995363)] ('Connection aborted.', TimeoutError('The write operation timed out'))
2024-11-07 08:48:40.879 [44/train_model/155 (pid 1995363)] 
2024-11-07 08:48:41.211 [44/train_model/155 (pid 1995363)] Task failed.
The "Proceeding to the evaluation step" log is the last print statement I have in the task before calling self.next() in the workflow; it hangs forever and never comes back. I feel like there might be a timeout when writing to the datastore, but I cannot confirm this. Is there anywhere I can see more detailed logs from Metaflow to see what is going on?
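A minimal sketch of how the datastore write can be probed in isolation with Metaflow's S3 client; the s3root below is a placeholder for illustration, not the path from this run:

```python
# Minimal sketch: probe the S3 datastore directly with Metaflow's S3 client.
# The s3root below is a placeholder; point it at the datastore root you actually use.
from metaflow import S3

with S3(s3root="s3://my-metaflow-datastore/debug-probe/") as s3:
    url = s3.put("connectivity-check", "hello")  # a broken or unauthenticated S3 connection will hang and retry here
    print("wrote", url)
    print("read back:", s3.get("connectivity-check").text)
```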
a
Hi! What does your infrastructure look like?
b
@square-wire-39606 I realized my S3 connection was hanging (there is a retry configuration, METAFLOW_S3_RETRY_COUNT, set to 7 by default, hence the waiting forever). The main reason was that I didn't have my credentials in place: for local dev I have a k3s cluster and a LocalStack instance, and I just needed to set the environment variables for LocalStack to be happy, which resolved my issue. I would still like to know how to turn on more detailed logs for situations like this. Have you found anything?
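For reference, the environment a LocalStack-backed Metaflow setup usually needs looks roughly like the sketch below; the endpoint, bucket, and dummy credentials are illustrative assumptions, not the exact values from this setup, and METAFLOW_S3_RETRY_COUNT is only lowered here to fail fast while debugging:

```python
# Rough sketch of the environment variables a LocalStack-backed Metaflow setup
# usually needs. Set them in the shell or before importing metaflow; all values
# below (endpoint, bucket, dummy credentials) are placeholders.
import os

os.environ.setdefault("AWS_ACCESS_KEY_ID", "test")                # LocalStack accepts dummy credentials
os.environ.setdefault("AWS_SECRET_ACCESS_KEY", "test")
os.environ.setdefault("AWS_DEFAULT_REGION", "us-east-1")
os.environ.setdefault("METAFLOW_S3_ENDPOINT_URL", "http://localhost:4566")        # LocalStack edge port
os.environ.setdefault("METAFLOW_DATASTORE_SYSROOT_S3", "s3://metaflow-local/metaflow")
os.environ.setdefault("METAFLOW_S3_RETRY_COUNT", "1")             # fail fast instead of retrying 7 times
```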