brave-address-82092
03/20/2023, 9:29 PM
metaflow.local, which I have only configured in my /etc/hosts file on my Mac - where I'm running the next command from)
2. I run this command locally and launch my workflow with Argo - I can see the workflow template with the environment variables, and I can see the workflow executed in the Argo UI. Note: I have metaflow.local on my Mac pointing to the external load balancer IP address (the /etc/hosts entry is sketched after the command below).
METAFLOW_S3_ENDPOINT_URL=<S3_ENDPOINT> AWS_DEFAULT_REGION=<S3_REGION> AWS_ACCESS_KEY_ID=<S3_ACCESS_KEY> AWS_SECRET_ACCESS_KEY=<S3_SECRET_KEY> METAFLOW_DEFAULT_DATASTORE=s3 METAFLOW_DATASTORE_SYSROOT_S3=s3://metaflow-test METAFLOW_SERVICE_INTERNAL_URL=http://metaflow-metaflow-ui.default.svc.cluster.local:8083/ METAFLOW_SERVICE_URL=http://metaflow.local python3 hello_metaflow.py --with retry argo-workflows create
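(For completeness, the /etc/hosts entry on my Mac looks roughly like the line below - the IP is just a placeholder standing in for the load balancer's external address:)
# /etc/hosts on the Mac - placeholder IP for the external load balancer
203.0.113.10    metaflow.local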
3. Following other threads with a similar issue:
• Switching METAFLOW_SERVICE_URL to http://metaflow-metaflow-ui.default.svc.cluster.local:8083/ has the same result.
• Adding METAFLOW_DEFAULT_METADATA=service to the above variables leads to the error below, which is weird since I used the latest Helm chart, which is at 2.3.something:
You are running a version of the metaflow service that currently doesn't support Argo Workflows.
For more information on how to upgrade your service to a compatible version (>= 2.0.2), visit:
https://admin-docs.metaflow.org/metaflow-on-aws/operations-guide/metaflow-service-migration-guide
• Configuring Metaflow for Kubernetes and adding these env vars in there gives me the same error as above, so for now I've deleted the config.json file (a rough sketch of what it contained is after the pod logs below).
• Running --with kubernetes also works, but nothing shows up in the UI.
• There are no logs in the actual pods that suggest it is even trying to hit the Metaflow service:
Setting up task environment.
Downloading code package...
Code package downloaded.
Task is starting.
Metaflow says: Hi!
time="2023-03-20T21:27:41.103Z" level=info msg="sub-process exited" argo=true error="<nil>"
time="2023-03-20T21:27:41.104Z" level=info msg="no need to save parameter - on overlapping volume: /mnt/out/task_id" argo=true
Questions:
• What should METAFLOW_SERVICE_URL and METAFLOW_SERVICE_INTERNAL_URL be pointing to?
• Do the pods report to the service, or does the service need to hit Argo to get details of the execution? If the latter, how do I tell it Argo's namespace? Or does Argo have to be in the same namespace as Metaflow?
• Is there a debug flag, and if I set it, will I see some pod trying to hit the Metaflow service internal URL, failing, and throwing an exception? (A manual reachability check is sketched after this list.)
• Maybe it's something else?
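(In case it's useful context for the last two questions: a minimal way to sanity-check whether the internal service URL is even reachable from inside the cluster is a throwaway curl pod - the pod name and image here are arbitrary, nothing Metaflow-specific:)
kubectl run curl-test --rm -it --restart=Never --image=curlimages/curl --command -- curl -sv http://metaflow-metaflow-ui.default.svc.cluster.local:8083/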