# ask-metaflow
b
Hi team! I'm using the `Runner` API with `decospecs` as an argument to the `run()` method. I need to use the AWS Batch operator with different CPU and memory specifications, and at the moment I'm passing `batch:cpu=1,memory=1024`. Can I specify these compute specs for each individual step?
c
Hi Bruno! A few approaches that come to mind are: 1) use the `@batch` step decorator to target steps directly in the flow definition, 2) use the `@resources` step decorator, and when running:

```python
with Runner("flow.py").run(decospecs=["batch"]) as runner:
    ...
```

3) assuming all steps have the same config, leave the steps without a compute/resources decorator, and:

```python
with Runner("flow.py").run(decospecs=["batch:cpu=1,memory=1024"]) as runner:
    ...
```

Do any of these make sense in your scenario?
b
I think the `@resources` step scenario works great for what I am trying to accomplish! I also need to check whether I can parametrize that decorator with Hydra configurations, but it's good enough! Thank you!
c
Nice! There was a recent Metaflow release (I believe ≥2.13.4) that includes a new config-management feature that is probably relevant here. These are the docs. Here is an example with Hydra.
b
I'm already using the Config class to load my Hydra configurations, so I could add them there too! Seeing that example, it seems I just need to update my JSON config.
Quick question: for the Deployer API, how can I pass the decorators `retry` and `batch`?
c
It maps onto CLI args in the same way the `Runner` does:

```python
deployer = Deployer('flow.py', decospecs=['retry'])
deployed_flow = deployer.step_functions().create()
```

FYI, `@batch` or `@kubernetes` would be redundant in the Deployer case, since a deployed flow already runs on remote compute.
b
Thank you!
I came across another problem: when passing my configs through the Deployer, the only way I can resolve interpolations and Hydra-specific instantiations (with `_target_`, for example) is by using the Compose API, which I think only accepts file paths, not a Python object like a dict. Is there any way to keep using these Hydra features without saving to a temporary file during the deployer/runner phase and loading it in the pipeline? I was resolving the interpolations during the deployer/runner phase, but some of them depend on the current datetime (getting the current date and going back X, Y, Z days, for instance), so I need to resolve them while the pipeline is running. I could do it all in Python, but I'm wondering if there's a cleaner way of doing this @crooked-jordan-29960
c
Here is an example that uses instantiation and resolves it at flow runtime. That said, I'm still not entirely sure of the use case for resolving interpolations that depend on the current datetime. Could you educate me on the benefits of doing that in Hydra/OmegaConf instead of plain Python?
b
For instance, my training/prediction job runs periodically and I want to retrieve the latest dataset. I configured my config like this:

```yaml
# dataset intervals
start_date: ${now:"%Y-%m-%d"}
close_date: ${date_subtract_days:${.start_date}, 30}
```
`now` is a built-in Hydra resolver and `date_subtract_days` is my own resolver. I was already using the `instantiate` method, but not for these dates, because I don't have a dataclass / a `_target_` for them. I was trying to use `OmegaConf.resolve`, but it didn't work... I don't remember exactly why right now, but maybe if I create a class for them Hydra will resolve them; I'm going to test it! The benefit is that I can specify a simple date like `2024-12-30` or `${date_subtract_days:${.start_date}, 30}` without handling the logic in the class that builds the dataset itself.
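For anyone following along, the resolver logic above boils down to plain date arithmetic. Here is a hypothetical stdlib-only sketch of what the two resolvers compute; the function names mirror the config keys for readability and are not Hydra's API:

```python
from datetime import datetime, timedelta

def now(fmt: str) -> str:
    # Mirrors what Hydra's built-in ``now`` resolver produces: the
    # current time rendered with the given strftime format.
    return datetime.now().strftime(fmt)

def date_subtract_days(date_str: str, days: int) -> str:
    # Hypothetical equivalent of the custom resolver: subtract ``days``
    # from an ISO-formatted date string and return the result in the
    # same format.
    d = datetime.strptime(date_str, "%Y-%m-%d")
    return (d - timedelta(days=days)).strftime("%Y-%m-%d")

start_date = now("%Y-%m-%d")
close_date = date_subtract_days(start_date, 30)
print(start_date, close_date)
```

Registering `date_subtract_days` as an OmegaConf resolver would let the YAML above resolve to exactly these values at runtime.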
c
Ahh, I see; that's an interesting idea to streamline the built-in resolvers. I'll tinker with this a bit, shouldn't be too much of a diff.
Added to the example to dynamically set the window. It doesn't use the OmegaConf resolvers, as the code was getting not very clean haha. But it achieves the same goal without too much boilerplate (it could easily be hidden in a Mixin class).
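The single-config-value approach can be sketched in plain Python along these lines; names such as `dataset_window` are illustrative, not taken from the shared example:

```python
from datetime import date, timedelta
from typing import Optional, Tuple

def dataset_window(lookback_days: int, end: Optional[date] = None) -> Tuple[str, str]:
    # Resolve the dataset date window at flow runtime from a single
    # ``lookback_days`` config value instead of two interpolated dates.
    end = end or date.today()
    start = end - timedelta(days=lookback_days)
    return start.isoformat(), end.isoformat()

# Example: a 30-day window ending on a fixed date.
print(dataset_window(30, date(2025, 1, 15)))  # ('2024-12-16', '2025-01-15')
```

Because the window is computed inside the step, the config stays deployable: only the integer travels through the Deployer, and the dates are resolved when the pipeline actually runs.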
b
Thank you! It seems like the best solution so far, having a single, simple `lookback_days` config.