# ask-metaflow
l
Hi Team, I have a question about the data store and metadata provider: I understand that I can use S3 as the data store through something like:
```
export METAFLOW_DEFAULT_DATASTORE=s3
export METAFLOW_DATASTORE_SYSROOT_S3='s3://<mybucket>/metaflow_datastore'
```
But can I at the same time also assign a dedicated local directory (instead of `.metaflow` in the run directory) as the metadata provider? I tried
```
export METAFLOW_DATASTORE_SYSROOT_LOCAL=$HOME/metaflow_datastore
```
but it basically enforces the local directory as the data store as well.
v
interesting.. let's see
@dry-beach-38304 it seems `DATASTORE_LOCAL_DIR`, which specifies `.metaflow`, doesn't have `from_conf`, so it's not currently overridable afaics if one wants to use another directory besides `.metaflow`. Do you remember why that is, or maybe I'm missing something? 🤔
@late-xylophone-98814 you could use a symlink to point `.metaflow` to some other location, but that's not very helpful if you want to run flows in many different places.
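For instance, something like this in the run directory (just a sketch; the target path is an arbitrary example):
```
# point .metaflow in the current run directory at another location
# (the target directory below is only an example)
mkdir -p $HOME/metaflow_metadata
ln -s $HOME/metaflow_metadata .metaflow
```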
Besides changing the `.metaflow` directory, using the S3 datastore with local metadata should work, like in this config:
```
"METAFLOW_DATASTORE_SYSROOT_S3": "s3://...",
"METAFLOW_DATATOOLS_S3ROOT": "s3://...",
"METAFLOW_DEFAULT_DATASTORE": "s3",
"METAFLOW_DEFAULT_METADATA": "local"
```
l
Thank you very much for the pointers, Ville. I also thought about just making a softlink for `.metaflow` in the run directory.
👍 1
So in your suggestion, where do I specify the local metadata provider path?
v
it uses `.metaflow` currently - we need to check if that can be changed, stay tuned
🙏 1
l
Ah I see. Thanks for the clarification.
I know that ideally I should just do a minimal deployment (have the metadata service deployed), but right now at work I have some limitations
v
makes sense. We have a new template for easily deploying a metadata service locally with Minikube, which might also work in your case
l
interesting, first time I've heard about "minikube"
I guess it's a more robust local metadata provider compared to the local metadata service that currently ships with metaflow.
v
if you need centralized metadata across projects, using a proper metadata service with a database is the way to go
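Once such a service is running, pointing Metaflow at it looks roughly like this (a sketch; the service URL is a placeholder for your own deployment):
```
# sketch: use a deployed metadata service instead of local metadata
# (the URL below is a placeholder for your deployment)
export METAFLOW_DEFAULT_METADATA=service
export METAFLOW_SERVICE_URL=http://localhost:8080
```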
l
yeah completely agree.
d
For context, `DATASTORE_LOCAL_DIR` and `LOCAL_CONFIG_FILE` are used in `init_local_config`, which is itself used by `from_conf`; that is why the value is not overridable via `from_conf`. Further context, @victorious-lawyer-58417, is here: https://github.com/Netflix/metaflow/pull/1850. We could look to make it work differently if that is causing issues.
👍 1
l
Following up on this thread, am I right to understand that right now there is no easy way to use S3 as the datastore while dedicating a local dir (other than `.metaflow`) as the metadata provider?
Update: somehow I was able to get this working using the following:
```
export METAFLOW_DATASTORE_SYSROOT_LOCAL=$HOME/metaflow_datastore
export METAFLOW_DEFAULT_DATASTORE=s3
export METAFLOW_DATASTORE_SYSROOT_S3='s3://<mybucket>/metaflow_datastore'
```
this way `METAFLOW_DATASTORE_SYSROOT_LOCAL` acts as the root for the local metadata provider while the datastore is located on S3
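As a quick sanity check (a sketch; `myflow.py` is just a placeholder for your own flow file):
```
# sketch: run any flow with the exports above in place
# (myflow.py is a placeholder for your own flow)
python myflow.py run

# metadata should land under the local sysroot,
# while artifacts go to the S3 sysroot
ls $HOME/metaflow_datastore
```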