melodic-train-1526
11/22/2022, 4:25 PM
ancient-application-36103
11/22/2022, 4:54 PM
melodic-train-1526
11/22/2022, 4:55 PM
melodic-train-1526
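11/22/2022, 6:14 PM
import tempfile
import torch

class InferentiaCompiledModel:
    @classmethod
    def save_model(cls, model):
        # Serialize the compiled TorchScript model to bytes via a temp file.
        with tempfile.NamedTemporaryFile() as f:
            torch.jit.save(model, f.name)
            return f.read()

    @classmethod
    def load_model(cls, blob):
        # Write the bytes back to a temp file and load the model from its path.
        with tempfile.NamedTemporaryFile() as f:
            f.write(blob)
            f.flush()
            return torch.jit.load(f.name)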
So at the end of step 2 above we can do self.model_obj = InferentiaCompiledModel(compiled_model). Step 3, which is a parallel AWS Batch step, can read it back similarly: model = InferentiaCompiledModel(self.model_obj). This works very nicely when orchestrating our code locally.
However, when using Step Functions, this fails on the load step (step 3) with a failure-to-initialise-neuron error raised from the loading line. The Docker images seem fine: the step runs on an Inferentia-capable machine and the Docker image itself has Neuron enabled. As I say, everything is the same as when we orchestrate this locally. Is there anything you can think of that would be messing this up?
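One way to narrow this down is to log what the step 3 container can actually see before the load line runs, then compare the output of a locally orchestrated run against a Step Functions run. The sketch below is only a diagnostic aid, not a confirmed cause. It assumes the Neuron driver exposes its devices as /dev/neuron* inside the container and that any relevant runtime settings live in environment variables starting with NEURON. The function name neuron_runtime_report is hypothetical.

```python
import glob
import json
import os


def neuron_runtime_report(dev_pattern="/dev/neuron*", env_prefix="NEURON"):
    """Collect what this container can see of the Neuron runtime.

    dev_pattern and env_prefix are assumptions about where the driver and
    runtime surface themselves; adjust them for your image if they differ.
    """
    devices = sorted(glob.glob(dev_pattern))
    return {
        # Device nodes visible inside the container.
        "devices": devices,
        # Subset of those nodes the current user can read and write.
        "devices_accessible": [
            d for d in devices if os.access(d, os.R_OK | os.W_OK)
        ],
        # Environment variables that may configure the runtime.
        "env": {
            k: v for k, v in os.environ.items() if k.startswith(env_prefix)
        },
    }


if __name__ == "__main__":
    # Print the report so it shows up in the AWS Batch job logs.
    print(json.dumps(neuron_runtime_report(), indent=2, sort_keys=True))
```

Calling neuron_runtime_report() at the top of step 3 in both setups gives two reports to diff. An empty "devices" list in only one of them would suggest the two job definitions differ in how the device is mapped into the container, rather than a problem with the serialized model.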
Note, there is a small ‘launching’ intermediate step sandwiched between the compile step (which runs on AWS Batch, but on only one machine) and the inference step (which runs on AWS Batch in parallel). This is the only difference between the two methods: with the local orchestrator, this step receives self.model_obj from AWS and passes it to the parallel step, whereas on Step Functions this is all handled by AWS. The flow graph is attached for reference if any of this is unclear.
I will say that, in my opinion, this shouldn’t affect the PR, as the container with inf enabled fires up correctly according to the decorator.
melodic-train-1526
11/22/2022, 6:19 PM
melodic-train-1526
11/24/2022, 8:56 AM
square-wire-39606
11/30/2022, 7:01 PM
dry-beach-38304
11/30/2022, 7:04 PM
dry-beach-38304
11/30/2022, 7:05 PM
melodic-train-1526
11/30/2022, 7:10 PM
dry-beach-38304
11/30/2022, 7:12 PM
dry-beach-38304
11/30/2022, 7:13 PM
melodic-train-1526
11/30/2022, 7:14 PM
dry-beach-38304
11/30/2022, 7:50 PM
dry-beach-38304
11/30/2022, 7:50 PM
dry-beach-38304
11/30/2022, 7:51 PM
melodic-train-1526
12/02/2022, 5:28 PM
dry-beach-38304
12/02/2022, 5:45 PM
melodic-train-1526
12/05/2022, 1:48 PM
ancient-application-36103
12/05/2022, 3:42 PM
melodic-train-1526
12/13/2022, 12:35 PM
ancient-application-36103
12/13/2022, 3:55 PM
melodic-train-1526
12/19/2022, 12:15 PM
ancient-application-36103
12/19/2022, 3:04 PM
melodic-train-1526
12/19/2022, 3:07 PM