Hi, I had a quick question about `merge_artifacts`...
# ask-metaflow
l
Hi, I had a quick question about `merge_artifacts`. In one of our use cases, we have an object persisted as, say, `self.a` in an upstream step. `self.a` has a nested class structure, but fundamentally just contains methods/attributes with pandas dataframes/list/dict/int/float/str. `self.a` gets passed into a `foreach` step and is not modified, but then a downstream `merge_artifacts` call is unable to reconcile the hashes for this attribute. Would you have any idea what could cause this? Manually comparing attributes of `self.a` across the inputs shows that the objects contain the same data, so I'm wondering if there's something that could affect the hash without modifying the object explicitly?
d
Hello, you have unfortunately run into a problem with `merge_artifacts` that is well known but hard to solve. The basic issue is that pickle isn't stable: if you access `self.a` in your foreach, it is loaded into the step and then, at the end of the step, we pickle it again. If it isn't modified, you would hope that the new pickled representation is the same as the old one, but that is not always the case (though it frequently is). When `merge_artifacts` comes along, it looks only at hashes, to avoid having to load all the artifacts (comparing them would also be hard).
That’s the issue
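To make the hash mismatch concrete, here is a minimal sketch outside Metaflow. It does not reproduce the exact load-and-repickle path described above; it only shows the underlying point that two objects can compare equal while their pickled bytes, and therefore any hash of those bytes, differ. `pickle_digest` is a hypothetical helper, and SHA-1 is used purely for illustration, not as a statement about what Metaflow does internally.

```python
import hashlib
import pickle

# Two dicts with the same contents but a different insertion order.
first = {"x": 1, "y": 2}
second = {"y": 2, "x": 1}


def pickle_digest(obj):
    """Hash the pickled bytes of obj (hypothetical helper, for illustration only)."""
    return hashlib.sha1(pickle.dumps(obj, protocol=4)).hexdigest()


# The data is equal, but pickle writes keys in iteration order,
# so the serialized bytes (and their hashes) are different.
same_data = first == second
same_hash = pickle_digest(first) == pickle_digest(second)
print(same_data, same_hash)  # prints: True False
```

Anything that changes the byte layout without changing the data, such as ordering or how shared references are recorded, can have the same effect.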
The solution is not ideal, but it does work. You can just do:
self.a = inputs[0].a
before your call to `merge_artifacts`. If you know for a fact that all the `a`s are the same, you can pick any of them (the 0th one is sure to exist), and `merge_artifacts` will then ignore it when it tries to merge the other artifacts.
l
Are there any other checks we can do? I think the part I'm concerned with is that I know right now all the `a`s are the same, but someone may inadvertently make a change in the future which breaks this and we'd want to know if that happens.
d
Unfortunately, unless you compare them yourself manually, it would be a bit tough to be 100% sure. You could try moving to a data structure that is more pickle-stable, if that is possible.
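For the manual comparison, one possible shape is a small value-based equality check that raises if any input differs. This is only a sketch under assumptions: `deep_equal` and `pick_if_all_equal` are hypothetical helpers, not part of Metaflow; the object is assumed to be built from plain class instances, dicts, lists, tuples, scalars, and objects exposing an `equals` method (as pandas DataFrames do); and cyclic references, NaN values, and classes using `__slots__` are not handled.

```python
def deep_equal(left, right):
    """Recursively compare two objects by value rather than by pickled bytes."""
    if type(left) is not type(right):
        return False
    # Objects such as pandas DataFrames expose an `equals` method; prefer it,
    # because `==` on a DataFrame is element-wise and does not return a bool.
    equals = getattr(left, "equals", None)
    if callable(equals):
        return bool(equals(right))
    if isinstance(left, dict):
        return left.keys() == right.keys() and all(
            deep_equal(left[key], right[key]) for key in left
        )
    if isinstance(left, (list, tuple)):
        return len(left) == len(right) and all(
            deep_equal(a, b) for a, b in zip(left, right)
        )
    # Plain class instances: compare their attribute dictionaries.
    if hasattr(left, "__dict__"):
        return deep_equal(vars(left), vars(right))
    return left == right


def pick_if_all_equal(values):
    """Return the first value, raising if any other value differs from it."""
    first = values[0]
    for index, other in enumerate(values[1:], start=1):
        if not deep_equal(first, other):
            raise ValueError(f"value at index {index} differs from value at index 0")
    return first


# In a join step this might be used as follows (not run here):
#     self.a = pick_if_all_equal([inp.a for inp in inputs])
#     self.merge_artifacts(inputs)
```

Note the trade-off: this loads `a` from every input, which is exactly the cost that comparing hashes is meant to avoid, so it is only reasonable when the artifact is small enough or the number of inputs is modest.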