lemon-magazine-70561 (03/07/2025, 6:51 PM):
Question about merge_artifacts: in one of our use cases, we have an object persisted as, say, self.a in an upstream step. self.a has a nested class structure, but fundamentally just contains methods/attributes holding pandas dataframes/lists/dicts/ints/floats/strs. self.a gets passed into a foreach step and is not modified, but a downstream merge_artifacts call is then unable to reconcile the hashes for this attribute. Would you have any idea what could cause this? Manually comparing the attributes of self.a across the inputs shows that the objects contain the same data, so I'm wondering if there's something that could affect the hash without modifying the object explicitly?
dry-beach-38304 (03/07/2025, 7:23 PM):
This is an issue with merge_artifacts that is well known but hard to solve. The basic problem is that pickle isn't stable: if you access self.a in your foreach, it gets loaded into the step and then, at the end of the step, we try to pickle it again. If it wasn't modified, you would hope that the new pickled representation is the same, but that is not always the case (though it frequently is). When merge_artifacts comes along, it looks only at the hashes, to avoid having to load all the artifacts (and also because comparing them directly would be hard).
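A quick way to see this kind of pickle instability is with two dicts that are equal by value but were built in different insertion orders: they serialize to different bytes, and therefore to different content hashes. (This is a sketch of the general mechanism using stdlib sha1 hashing; Metaflow's actual hashing details may differ.)

```python
import hashlib
import pickle

# Two dicts that are equal by value but built in different orders.
# pickle serializes dict items in insertion order, so the bytes differ.
d1 = {"x": 1, "y": 2}
d2 = {"y": 2, "x": 1}

print(d1 == d2)  # True: equal by value

h1 = hashlib.sha1(pickle.dumps(d1)).hexdigest()
h2 = hashlib.sha1(pickle.dumps(d2)).hexdigest()
print(h1 == h2)  # False: the content hashes disagree
```

Any hash-based equality check will therefore flag these two objects as different, even though comparing them attribute by attribute shows identical data.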
dry-beach-38304 (03/07/2025, 7:23 PM):
The workaround is to set self.a = inputs[0].a before your call to merge_artifacts.
dry-beach-38304 (03/07/2025, 7:24 PM):
merge_artifacts will then ignore it when it is trying to merge the other artifacts.
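To see why reassigning the artifact in the join step sidesteps the conflict, here is a toy model of a hash-based merge (my own simplified sketch, not Metaflow's actual implementation): equal-by-value artifacts can still hash differently across branches, but an artifact the join step has already set is skipped entirely.

```python
import hashlib
import pickle

def content_hash(obj):
    # Simplified stand-in for hashing an artifact's serialized bytes.
    return hashlib.sha1(pickle.dumps(obj)).hexdigest()

def merge_one_artifact(name, branch_values, set_in_join=False):
    """Toy model of merging one artifact across foreach branches:
    skip it if the join step already assigned it, otherwise require
    all branches to agree on the content hash."""
    if set_in_join:
        return "skipped"
    if len({content_hash(v) for v in branch_values}) == 1:
        return "merged"
    raise ValueError(f"cannot merge {name!r}: hashes differ across branches")

# Equal-by-value dicts whose pickled bytes (and hashes) differ:
a0 = {"x": 1, "y": 2}
a1 = {"y": 2, "x": 1}

try:
    merge_one_artifact("a", [a0, a1])
except ValueError:
    print("conflict")  # hashes disagree even though a0 == a1

# The workaround (self.a = inputs[0].a) corresponds to set_in_join=True:
print(merge_one_artifact("a", [a0, a1], set_in_join=True))  # skipped
```

In the real flow, assigning self.a in the join step before calling merge_artifacts plays the role of set_in_join=True: the attribute already exists on the joining step, so no hash reconciliation is attempted for it.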
lemon-magazine-70561 (03/07/2025, 7:52 PM):
That works for now, since the a's are the same, but someone may inadvertently make a change in the future which breaks this, and we'd want to know if that happens.

dry-beach-38304 (03/07/2025, 7:58 PM):