# ask-metaflow
Hi, I have a question about artifacts and the large amount of time being wasted between steps.

Issues:
1. See first_step. The print statement shows me that the step took 8 minutes to run. However, the time between that print statement (the last line of code before self.next) and the "task completed" message is more than 5 minutes! I assume this time is spent passing the artifacts to the next step (?)
2. In second_step, it takes me 10 minutes to retrieve the artifact (df).

So in total I get an additional 15 minutes of runtime, just because I've split up the steps. I'd like to keep the steps, for a number of reasons (the same reasons most people use steps in Metaflow). So before I jump to the obvious solution of combining everything into a single large step, I'd like to see if there's anything obvious I'm missing, or anything I can do to avoid this major waste of time. Here's some skeleton code to supplement my text above:
import time

from metaflow import FlowSpec, batch, step


class MyFlow(FlowSpec):  # placeholder name; start/end and other steps omitted

    @batch(cpu=8, memory=125000, use_tmpfs=True)
    @step
    def first_step(self):
        start_time = time.time()
        # df = blah  (placeholder: build the large DataFrame here)
        # self.df = df
        print("time for this step: {} seconds".format(time.time() - start_time))
        # It takes 5 minutes between this print statement and the "task is completed" message
        self.next(self.second_step)

    @batch(cpu=8, memory=125000, use_tmpfs=True)
    @step
    def second_step(self):
        start_time = time.time()
        retrieved_df = self.df  # this takes 10 minutes!
        print("time for retrieving artifacts: {} seconds".format(time.time() - start_time))
        self.next(self.end)  # hand off to the following step (end step not shown)
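
To check whether the missing minutes really go into serializing the artifact rather than into the network transfer, a rough local measurement could look like the sketch below. It assumes, as far as I understand it, that Metaflow persists artifacts by pickling them and gzip-compressing the result before upload; the compression level of 3 is also an assumption. `time_pickle_roundtrip` and the sample payload are hypothetical stand-ins, and the real df would be passed in instead. Upload and download time to the datastore is not covered here.

```python
import gzip
import pickle
import time


def time_pickle_roundtrip(obj, compresslevel=3):
    """Time pickle+gzip serialization of obj and the reverse; return a small report."""
    t0 = time.perf_counter()
    raw = pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)
    blob = gzip.compress(raw, compresslevel=compresslevel)  # level is an assumption
    t1 = time.perf_counter()
    pickle.loads(gzip.decompress(blob))
    t2 = time.perf_counter()
    return {
        "raw_bytes": len(raw),
        "compressed_bytes": len(blob),
        "serialize_seconds": t1 - t0,
        "deserialize_seconds": t2 - t1,
    }


if __name__ == "__main__":
    # Hypothetical stand-in for the real df: a list of row dicts.
    payload = [{"id": i, "value": float(i) * 0.5} for i in range(100_000)]
    for key, value in time_pickle_roundtrip(payload).items():
        print(f"{key}: {value}")
```

If the serialize and deserialize times come out small compared with the 5 and 10 minutes observed, the overhead is more likely in the transfer to and from the datastore than in pickling itself.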