# dev-metaflow
e
I came across an interesting bug when using S3 (the actual one, not Minio) in a local k8s setup. When we specified the S3 bucket via the Metaflow config variables (specifically `METAFLOW_DEFAULT_DATASTORE` and `METAFLOW_DATASTORE_SYSROOT_S3`), Boto3 defaulted to using `us-east-1` when the bucket region was `us-west-2`. We isolated it down to Boto3 and not Metaflow by shelling into a worker pod and running these commands:
>>> import boto3, os
>>> boto3.client("s3").download_file("our-bucket--usw2-az1--x-s3", "metaflow/OurFlow/data/4e/4ef8def1d5cc6f655cd71109ee45ef2d6be4f55c", "job.tar")
...
  File "/usr/local/lib/python3.9/site-packages/botocore/httpsession.py", line 493, in send
    raise EndpointConnectionError(endpoint_url=request.url, error=e)
botocore.exceptions.EndpointConnectionError: Could not connect to the endpoint URL: "https://our-bucket--usw2-az1--x-s3.s3express-usw2-az1.us-east-1.amazonaws.com/?session"
Notice that `METAFLOW_S3_ENDPOINT_URL` is not set or used in the `boto3` invocation. We're also using an S3 Express directory bucket, not sure if that changes things. It looks like `boto3` defaults to `us-east-1` when a region is not set, cf. https://github.com/boto/boto3-legacy/blob/b3091c5c3062c5a8ddd19926069d069cb6957dae/boto3/core/constants.py#L13. Is it desirable for Metaflow to provide some way to set this region? `METAFLOW_S3_ENDPOINT_URL` could work, but that seems clunky to me. Thoughts?
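For reference, the hostname in the traceback can be reproduced with a small sketch. It assumes the zonal endpoint pattern visible in the error itself (bucket name, then `s3express-<az-id>`, then the region); the helper name `express_host` is made up for illustration and is not part of boto3 or Metaflow.

```python
def express_host(bucket: str, region: str) -> str:
    """Build the zonal endpoint host for an S3 Express directory bucket.

    Assumes bucket names of the form <base>--<az-id>--x-s3, as in the
    traceback above. Hypothetical helper, for illustration only.
    """
    parts = bucket.split("--")
    if len(parts) < 3 or parts[-1] != "x-s3":
        raise ValueError(f"not a directory bucket name: {bucket!r}")
    az_id = parts[-2]  # e.g. "usw2-az1"
    return f"{bucket}.s3express-{az_id}.{region}.amazonaws.com"


bucket = "our-bucket--usw2-az1--x-s3"
# Host that boto3 tried when it fell back to us-east-1 (matches the traceback).
print(express_host(bucket, "us-east-1"))
# Host for the bucket's actual region.
print(express_host(bucket, "us-west-2"))
```

The first host pairs a `usw2` availability zone ID with `us-east-1`, which would explain why the connection fails for the directory bucket.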
FWIW, using a normal S3 bucket instead of S3 Express solved this!
h
Should work if you set `AWS_DEFAULT_REGION`. Did you try that?
e
Should I set it in the Metaflow config? Would it get passed down to the pod making the request?
h
it's just a normal environment variable (not a metaflow config)
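A minimal sketch of pinning the region from inside the pod, assuming boto3 is installed there and that `us-west-2` is the bucket's region as described above. `pick_region` and `make_s3_client` are hypothetical helper names; `AWS_DEFAULT_REGION` and the `region_name` argument are the standard boto3 settings.

```python
import os


def pick_region(explicit=None, env=None):
    """Return the explicit region if given, else AWS_DEFAULT_REGION, else None."""
    env = os.environ if env is None else env
    return explicit or env.get("AWS_DEFAULT_REGION")


def make_s3_client(explicit=None):
    """Create an S3 client with the region pinned, instead of relying on a fallback."""
    import boto3  # imported lazily so pick_region works without boto3 installed

    region = pick_region(explicit)
    if region is None:
        # Stricter than boto3 itself, which would also consult ~/.aws/config.
        raise RuntimeError("no region: set AWS_DEFAULT_REGION or pass one explicitly")
    return boto3.client("s3", region_name=region)
```

With `AWS_DEFAULT_REGION=us-west-2` exported in the pod, `make_s3_client()` should build a client for that region; `make_s3_client("us-west-2")` does the same without touching the environment.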
e
Gotcha, makes sense! I guess I’m unsure how to get it in the pod given that pods are created by Metaflow
h
are you using the kubernetes decorator? i think you can do
@kubernetes(env={"MY_VAR": "some_value", "OTHER_VAR": "123"})
if not, then `@environment` would do it: https://docs.outerbounds.com/set-env-vars-with-decorator/
e
Oh nice! I need to check the Outerbounds docs :P thanks for the help!