# ask-metaflow
e
Are there any recommendations about setting a TTL for Metaflow objects in S3 / other object stores? https://github.com/Netflix/metaflow/issues/21 only says that the TTL needs to be longer than the lifetime of the workflow, not sure if that has changed in the last six years. Any info would be great!
āœ… 1
v
the advice is based on the behavior of the content-addressed store (CAS) which is used to store artifacts:
1. Each flow (name) has a CAS of its own.
2. Outside flow execution, the CAS is accessed by the Client API to retrieve artifacts.
3. During flow execution, if an entry is missing in the CAS, a new one is created automatically.

Consequently, if you don't care about a specific flow, you can safely delete (TTL away) its contents without affecting other flows. If your datastore allows prefix-specific TTLs (like S3), you can use this for safe garbage collection. The path structure has flow names as a prefix for this purpose by design.

In contrast, if you want to TTL artifacts of an active flow, note the following:
• If an artifact disappears during flow execution, any task trying to access the value will fail. However, this should get fixed by re-executing the flow, as missing values will get re-created automatically. If you can design your system with this behavior in mind, you should be able to set aggressively short TTLs.
• A client trying to access a missing artifact will fail. Metaflow uses some artifacts for internal bookkeeping, so the client may fail in unexpected ways if artifacts have been TTL'ed. If you don't use the client to access results outside flow execution, this isn't an issue.
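As a concrete sketch of the prefix-scoped garbage collection described above: you can express a per-flow TTL as an S3 lifecycle rule whose filter prefix is that flow's datastore path. The `metaflow/<FlowName>/` prefix layout, bucket name, and day count below are illustrative assumptions; check your actual datastore root before applying anything like this.

```python
def flow_expiration_rule(flow_name, days, datastore_root="metaflow"):
    """Build an S3 lifecycle rule that expires one flow's artifacts.

    ASSUMPTION: artifacts live under <datastore_root>/<flow_name>/ in the
    bucket; verify against your configured S3 datastore root first.
    """
    return {
        "ID": f"ttl-{flow_name}",
        "Filter": {"Prefix": f"{datastore_root}/{flow_name}/"},
        "Status": "Enabled",
        # Objects under the prefix are deleted once they are this many days old.
        "Expiration": {"Days": days},
    }

# Applying it with boto3 (needs AWS credentials; shown for illustration only):
# import boto3
# s3 = boto3.client("s3")
# s3.put_bucket_lifecycle_configuration(
#     Bucket="my-metaflow-bucket",  # hypothetical bucket name
#     LifecycleConfiguration={"Rules": [flow_expiration_rule("RetiredFlow", 30)]},
# )
```

Because the rule is keyed on the flow-name prefix, it only ever touches that one flow's CAS, matching the safety property described above.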
hence the safest approach is to use e.g. S3 Intelligent Tiering to move infrequently accessed objects to cheaper tiers, and then TTL objects away from lower tiers
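The tier-then-expire idea can be sketched as a single lifecycle configuration: transition objects into the Intelligent-Tiering storage class right away (S3 then moves infrequently accessed objects to cheaper access tiers automatically), and set a hard expiration as the final TTL. The prefix and day counts here are assumptions, not recommended values.

```python
# Sketch of a lifecycle configuration combining Intelligent-Tiering with a
# final expiration. The "metaflow/" prefix and 365-day TTL are illustrative.
lifecycle = {
    "Rules": [
        {
            "ID": "tier-then-expire",
            "Filter": {"Prefix": "metaflow/"},  # assumed datastore root
            "Status": "Enabled",
            # Day 0: move objects into Intelligent-Tiering so rarely
            # accessed artifacts drift to cheaper access tiers on their own.
            "Transitions": [
                {"Days": 0, "StorageClass": "INTELLIGENT_TIERING"}
            ],
            # Hard TTL: delete everything under the prefix after a year,
            # regardless of which tier it has ended up in.
            "Expiration": {"Days": 365},
        }
    ]
}
```

Note that lifecycle expiration is age-based, not access-based, so the expiration here acts as a backstop after the tiering has already reduced storage cost.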
if this is a big need for you, please leave a comment in this long-standing issue
e
Makes sense. Thanks for the advice!
šŸ‘ 1