# dev-metaflow
f
Hi, is it possible to set a lifecycle policy on the S3 buckets used for the datastore? I wonder if, after a while, there might be a lot of logs and artifacts lying around there.
v
Yes, you can set a lifecycle policy on the bucket used by Metaflow. One gotcha is that artifacts are shared across runs, so old objects may be used by newer runs too. If an old object is suddenly deleted by a lifecycle policy, a task may fail, but @retry should take care of it. A safer option is S3 Intelligent-Tiering, which moves the least frequently accessed objects to a cheaper storage tier, where you can delete them safely.
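For example, here is a minimal sketch of attaching such a rule with boto3; the bucket name and the `metaflow/` prefix are placeholders, so substitute the values from your own datastore configuration:

```python
import boto3

s3 = boto3.client("s3")

# Transition objects under the datastore prefix to S3 Intelligent-Tiering
# 30 days after creation. Bucket name and prefix are placeholders: use the
# bucket and root prefix from your own Metaflow datastore settings.
s3.put_bucket_lifecycle_configuration(
    Bucket="my-metaflow-bucket",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "metaflow-intelligent-tiering",
                "Status": "Enabled",
                "Filter": {"Prefix": "metaflow/"},
                "Transitions": [
                    {"Days": 30, "StorageClass": "INTELLIGENT_TIERING"}
                ],
            }
        ]
    },
)
```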
c
A related question: is there a DB script or process to clean up older data in the Metaflow DB? The `artifact_v3` table in our deployment has ~30 million rows.
f
Ah yes, I forgot there is a DB that needs to be cleaned too.
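No official cleanup script is mentioned here, but a batched delete along these lines is one option. This sketch assumes a Postgres metadata database and that `artifact_v3` carries a `ts_epoch` column in epoch milliseconds; verify both against your deployment, back up first, and keep in mind that deleting metadata for runs you still query will break lineage lookups:

```python
import time
import psycopg2

# Assumptions (verify against your deployment!): the metadata service uses
# Postgres, and artifact_v3 has a ts_epoch column in epoch milliseconds.
RETENTION_DAYS = 180
cutoff_ms = int((time.time() - RETENTION_DAYS * 86400) * 1000)

conn = psycopg2.connect("dbname=metaflow user=metaflow host=localhost")
conn.autocommit = True

with conn.cursor() as cur:
    while True:
        # Delete in small batches to keep transactions short and avoid
        # holding long locks on a ~30M-row table.
        cur.execute(
            """
            DELETE FROM artifact_v3
            WHERE ctid IN (
                SELECT ctid FROM artifact_v3
                WHERE ts_epoch < %s
                LIMIT 10000
            )
            """,
            (cutoff_ms,),
        )
        if cur.rowcount == 0:
            break

conn.close()
```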