# ask-metaflow
h
Here is some pseudo-code I would like to run in parallel, to a degree. Let us assume I have 10 folds to establish whether some new approach works better, so we may have an incumbent vs. challenger situation or the like:
```python
@step
def train_models(self):

    for fold in self.fold_number_list:

        # train on every fold except the current one
        some_train_data = self.some_data[
            (self.some_data["fold"] != fold)
            # ... further filtering elided
        ]
        # fit incumbent model
        # fit challenger model
        # establish some performance metric(s) on test/validation etc.
```
I used AWS Batch previously, but from what I have observed it still seems to run sequentially (or possibly that is just how our AWS infra is configured?). What other approaches do people tend to use to speed things up, ideally cutting the runtime to roughly a tenth in this example (scaling out, that is)? Thanks.
😃 Anyone?
f
It looks like you have a single model but want to train it in parallel? That is a kind of distributed training. Maybe you can find something here: https://outerbounds.com/blog/distributed-training-with-metaflow/
h
Thanks for the reply. Not sure the above is applicable, as I only want to fit bread-and-butter/basic xgb models. So all 10 models are fitted on (n-1)/n of the data and tested on 1/n of the data, where n is the number of folds. So a foreach should work. I used AWS Batch for this in the past, but as far as I could tell all n jobs were executed as a batch/sequentially … so not in roughly 1/nth of the time …
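The split being described is plain leave-one-fold-out: each model trains on the other n-1 folds ((n-1)/n of the rows) and is scored on the held-out fold. A stdlib-only sketch of that masking (function and variable names are invented for illustration):

```python
import random


def leave_one_fold_out(rows, fold_ids, held_out_fold):
    """Split rows into train (all other folds) and test (the held-out fold)."""
    train = [r for r, f in zip(rows, fold_ids) if f != held_out_fold]
    test = [r for r, f in zip(rows, fold_ids) if f == held_out_fold]
    return train, test


# assign each of 100 rows to one of 10 folds
random.seed(0)
rows = list(range(100))
fold_ids = [random.randrange(10) for _ in rows]

train, test = leave_one_fold_out(rows, fold_ids, held_out_fold=3)
# the two parts are disjoint and together cover the whole data set
assert len(train) + len(test) == len(rows)
```

In a foreach layout, each branch would run this once with its own `held_out_fold`, so no branch depends on another and they can all execute at the same time.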
Having said that, maybe (most likely) I am not using AWS Batch + foreach correctly, and should study this: https://docs.aws.amazon.com/batch/latest/userguide/multi-node-parallel-jobs.html and the @parallel decorator more, which may indeed behave differently from foreach.
h
Thanks, sure, I am aware of foreach, but in our AWS Batch setup the steps still appear to be executed sequentially …