# dev-metaflow
r
https://github.com/zillow/metaflow/pull/81 Thoughts on this PR? It keeps compatibility with the Metaflow origin to make merging easy.
```python
@resources(
    local_storage="100",
    cpu_limit="0.6",
    memory="500",
    memory_limit="1G",
    volume="11G",
)
```
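For context, a minimal sketch of how that decorator would sit in a complete flow; treating all five attributes as accepted by `@resources` is an assumption about the PR, not confirmed behavior:

```python
from metaflow import FlowSpec, resources, step

class StorageFlow(FlowSpec):
    # Attribute names copied from the PR snippet above; whether @resources
    # accepts all of them is an assumption about the zillow fork.
    @resources(
        local_storage="100",
        cpu_limit="0.6",
        memory="500",
        memory_limit="1G",
        volume="11G",
    )
    @step
    def start(self):
        self.next(self.end)

    @step
    def end(self):
        pass

if __name__ == "__main__":
    StorageFlow()
```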
s
Would adding a `disk` attribute to `@resources` make sense? `cpu_limit` and `memory_limit` seem more specific to the underlying compute infrastructure.
r
`disk` or `volume`? ... https://www.maketecheasier.com/difference-between-disk-drive-volume-partition-image/ I believe a `disk` can have multiple volumes, hence why I selected `volume`.
s
that's true - I am operating under the assumption that `disk`, as a keyword signifying the amount of physical storage in user space needed for a particular step, is easier to grok for the end user. We can expose `volume` as advanced functionality.
r
You're thinking of a single name/variable to specify total storage available? And the user doesn't need to think about volumes, etc.?
That'd be ideal from the user's perspective... However, with k8s containers there is the root filesystem plus attached volumes, and some enterprise scenarios want read-only root filesystems. I also have no way to specify total user-space storage available in k8s; it'd have to be either `ephemeral-storage` (`local_storage` in `@resources`) or an attached volume (PVC).
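For reference, `ephemeral-storage` is requested and limited like any other container resource in k8s; a minimal sketch using the official `kubernetes` Python client (values illustrative, unrelated to Metaflow internals):

```python
from kubernetes import client

# Plain Kubernetes API usage, independent of Metaflow: ephemeral-storage
# is requested and limited on the container like cpu or memory.
container = client.V1Container(
    name="metaflow-step",
    image="python:3.9",
    resources=client.V1ResourceRequirements(
        requests={"ephemeral-storage": "110G"},
        limits={"ephemeral-storage": "110G"},
    ),
)
```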
s
Yes, I was wondering if it is feasible for the user to specify just `disk`, which works for, say, 80% of the use cases, while we still expose `volumes` and `volume_types` as advanced functionality. That way different enterprises can configure their deployments appropriately by setting global environment variables on the user's workstation.
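A rough sketch of what "configure via global environment variables" could look like; the variable names below are hypothetical, not real Metaflow configuration keys:

```python
import os

# Hypothetical deployment-wide defaults; these variable names are
# illustrative, not actual Metaflow configuration keys.
DEFAULT_VOLUME_TYPE = os.environ.get("METAFLOW_DEFAULT_VOLUME_TYPE", "ephemeral")
DEFAULT_DISK_SIZE = os.environ.get("METAFLOW_DEFAULT_DISK_SIZE", "10G")

def resolve_disk(disk=None):
    """Fall back to the deployment default when a step omits `disk`."""
    return disk or DEFAULT_DISK_SIZE
```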
r
What would `disk` mean? Maybe it makes sense in the AWS Batch context? I'm guessing it means total available storage on the root file system?
💯 1
s
A concrete idea would be to support `disk` in `@resources` and `volume` in `@k8s`, and if we are able to map `disk` appropriately to a specific volume configuration in k8s, we can allow folks to set `disk` at the resource level.
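As a sketch, that mapping could be a small translation layer; the function and backend names below are hypothetical:

```python
# Hypothetical translation layer: map the platform-neutral
# @resources(disk=...) onto the scheduler-specific setting per backend.
def map_disk(disk: str, backend: str) -> dict:
    if backend == "kubernetes":
        # On k8s, "disk" would become an ephemeral-storage request.
        return {"ephemeral-storage": disk}
    if backend == "batch":
        # On AWS Batch, it could size the job's local storage instead.
        return {"local_storage": disk}
    raise ValueError(f"unsupported backend: {backend}")
```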
r
Straw-man of how this looks from the customer's perspective:
```python
@kubernetes(volume="11G") # attaches a volume to a default /opt/metaflow_volume
@step
def start(self):
    pass
```
```python
@resources(disk="110G") # works on AWS Batch, and on k8s "disk" maps to ephemeral-storage!
@step
def start(self):
    pass
```
```python
@kubernetes(volume="11G") # attaches a volume to a default /opt/metaflow_volume
@resources(disk="110G") # ephemeral-storage of 110G
@step
def start(self):
    pass
```
thoughts? 💭
s
Makes sense. Is the `volume` for `kubernetes` a persistent volume?
@narrow-lion-2703 What are your thoughts?
r
yes - a PVC attached to the step only, until EKS supports https://kubernetes.io/docs/concepts/storage/ephemeral-volumes/ , but that would be transparent to the user. In both cases they'd get 110G in /opt/metaflow_volume
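For reference, the per-step claim described here would look roughly like this with the official `kubernetes` Python client (claim name and size are illustrative):

```python
from kubernetes import client

# Roughly what a per-step @kubernetes(volume="11G") claim would create;
# the claim name is illustrative.
pvc = client.V1PersistentVolumeClaim(
    metadata=client.V1ObjectMeta(name="metaflow-step-volume"),
    spec=client.V1PersistentVolumeClaimSpec(
        access_modes=["ReadWriteOnce"],
        resources=client.V1ResourceRequirements(requests={"storage": "11G"}),
    ),
)
```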
However, on second thought, must `volume` be a `@kubernetes`-only concept? Why not make it meta and a `@resources` concept, as originally proposed? AWS Batch may not support it; that's fine.
```python
@resources(volume="11G") # attaches a volume to a default /opt/metaflow_volume
@step
def start(self):
    pass
```
```python
@resources(disk="110G") # works on AWS Batch, and on k8s "disk" maps to ephemeral-storage!
@step
def start(self):
    pass
```
```python
@resources(disk="110G", volume="11G") # ephemeral-storage of 110G, plus a volume at the default /opt/metaflow_volume
@step
def start(self):
    pass
```
s
And what use cases would a PVC attached to a step support that ephemeral storage wouldn't (other than EKS not supporting ephemeral storage at the moment)?
Is there a use case for
```python
@resources(disk="110G", volume="11G") # ephemeral-storage of 110G and a volume at the default /opt/metaflow_volume
@step
def start(self):
    pass
```
?
r
Imagine an ML library that uses "/tmp/" or some user-space disk path, and you don't have the wherewithal to update it, and you'd like to limit ephemeral-storage (disk) usage in k8s while mounting a very large, say 1TB, volume?
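Concretely, a step could redirect such a library's scratch path onto the mounted volume without modifying the library; the decorator arguments below reuse the straw-man from this thread and are not real Metaflow options:

```python
import os
import tempfile
from metaflow import FlowSpec, resources, step

class TmpRedirectFlow(FlowSpec):
    # Straw-man arguments from this thread, not real Metaflow options:
    # keep ephemeral-storage small, mount a large volume for scratch data.
    @resources(disk="1G", volume="1T")
    @step
    def start(self):
        # Point libraries that write to "/tmp" at the attached volume.
        os.environ["TMPDIR"] = "/opt/metaflow_volume/tmp"
        os.makedirs(os.environ["TMPDIR"], exist_ok=True)
        tempfile.tempdir = None  # make tempfile re-read TMPDIR
        self.next(self.end)

    @step
    def end(self):
        pass

if __name__ == "__main__":
    TmpRedirectFlow()
```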
👍 1
PVCs can span steps and even runs (existing long-term disks), which the coming EKS ephemeral volumes do not support. I think this is a lower-priority use case..
s
Yeah, with storage that spans steps/runs we need to be a bit careful; our guidance so far has been to use `s3` alongside the `metaflow.s3` client. That also ensures that stored artifacts are visible and recoverable later using the Metaflow client.
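For completeness, that handoff pattern with the documented `metaflow.S3` client looks like this:

```python
from metaflow import FlowSpec, S3, step

class S3HandoffFlow(FlowSpec):
    @step
    def start(self):
        # Scoping objects to the run keeps them visible and recoverable
        # later through the Metaflow client.
        with S3(run=self) as s3:
            s3.put("large_blob", "some serialized payload")
        self.next(self.end)

    @step
    def end(self):
        with S3(run=self) as s3:
            print(s3.get("large_blob").text)

if __name__ == "__main__":
    S3HandoffFlow()
```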
n
i'm curious how you delete PVCs there, @rough-terabyte-71304
ephemeral volumes indeed would make a lot of this easier..
my hunch would be, for now, to expose parts of the pod spec directly as parameters to `@kubernetes`, but it seems like that wouldn't work for non-ephemeral volumes, as you still have to create a separate PVC resource and manage its lifecycle somehow
r
Hi Oleg, we don't delete the PVCs... we let them be garbage collected when the workflow is TTL'ed, by setting the owner reference to the Argo workflow
👍 1
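For reference, setting that owner reference with the official `kubernetes` Python client looks roughly like this (workflow name and uid are placeholders):

```python
from kubernetes import client

# Tie the PVC's lifetime to the Argo Workflow: once the workflow is
# TTL'ed and deleted, the garbage collector removes the PVC as well.
# Name and uid below are placeholders for the real workflow's values.
owner = client.V1OwnerReference(
    api_version="argoproj.io/v1alpha1",
    kind="Workflow",
    name="my-metaflow-run",
    uid="00000000-0000-0000-0000-000000000000",
)
pvc_metadata = client.V1ObjectMeta(
    name="metaflow-step-volume",
    owner_references=[owner],
)
```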