# ask-metaflow
Hi folks, I'm running into an error in my metaflow process and need some help to debug it. The code works fine outside of metaflow, but once I try to execute it within a metaflow step it reliably hits a "SIGKILL" message (locally) or exit 137 (on k8s). I get this message even with a massive (>80GB ram) resource allocation or local execution. Locally, I have profiled the code and it's <100MB of usage. Locally, I also don't see any entries in dmesg or system log. My memory usage on my local machine is within normal bounds at the time of the task being killed.

This fails reliably in each of two steps:
1. Installing (or re-installing) the R package I am using, which includes compiling a model via cmdstanr.
2. Running (fitting) the model except when I limit to one process (chain) at a time.

I have included a reproducible example of the first, and not of the second.

I have a way to get around these two limitations (docker image & limiting to single process), but I'd like to be able to relax both. Are there other limits to the virtualenv in which steps run (file size? swap? multi-threading?) that I should be aware of? Any other hints to help debug this further?

Here is a reproducible example using a public package (and the same pypi context our package requires). For me, it fails right during the cmdstanr.install_cmdstan() line:
from metaflow import FlowSpec, step, pypi#, environment
import os

class MinimumFlow(FlowSpec):
    
    @step
    def start(self):
        self.next(self.install)
    
    # @environment(vars={'GITHUB_PAT': os.getenv('GITHUB_PAT')})
    @pypi(python='3.9.18', packages = {'numpy': '1.26.2', 'pandas': '1.5.3',
                                       'rpy2': '3.5.14', 'cmdstanpy': '1.2.0',
                                       'rpy2-arrow': '0.0.8'})
    @step
    def install(self):
        print("Installing packages")
        from rpy2.robjects.packages import importr
        os.environ['R_REMOTES_STANDALONE'] = "true"
        utils = importr('utils')
        utils.chooseCRANmirror(ind=1)
        print("Installing remotes")
        utils.install_packages('remotes')
        remotes = importr('remotes')
        print("Installing cmdstanr")
        remotes.install_github('stan-dev/cmdstanr', force = True)
        cmdstanr = importr('cmdstanr')
        print("Checking cmdstan requirements")
        cmdstanr.check_cmdstan_toolchain()
        print("Installing cmdstan")
        cmdstanr.install_cmdstan()
        print("Packages installed")
        self.next(self.end)

    @step
    def end(self):
        print("Flow is done!")

if __name__ == "__main__":
    MinimumFlow()
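
To check the limits asked about above (file size, address space, number of processes), one option is to print the process resource limits from inside the failing step and compare them with the same output from a plain shell. This is only a rough diagnostic sketch using the Python standard library (`resource` is Unix-only). The helper names `describe_limits` and `print_limits` are made up for illustration and are not part of Metaflow.

```python
import os
import resource

# Limits worth comparing between a plain shell and a Metaflow step.
LIMITS = {
    "address space (RLIMIT_AS)": resource.RLIMIT_AS,
    "data segment (RLIMIT_DATA)": resource.RLIMIT_DATA,
    "file size (RLIMIT_FSIZE)": resource.RLIMIT_FSIZE,
    "open files (RLIMIT_NOFILE)": resource.RLIMIT_NOFILE,
    "processes/threads (RLIMIT_NPROC)": resource.RLIMIT_NPROC,
    "stack (RLIMIT_STACK)": resource.RLIMIT_STACK,
}


def fmt(value):
    """Render a limit value, showing RLIM_INFINITY as 'unlimited'."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def describe_limits():
    """Return {label: (soft, hard)} for the limits listed above."""
    report = {}
    for label, key in LIMITS.items():
        soft, hard = resource.getrlimit(key)
        report[label] = (fmt(soft), fmt(hard))
    return report


def print_limits():
    """Print each limit plus the CPU count visible to this process."""
    for label, (soft, hard) in describe_limits().items():
        print(f"{label}: soft={soft} hard={hard}")
    print("cpu count:", os.cpu_count())
```

Calling `print_limits()` as the first line of the `install` step, and again in a normal Python session on the same machine, would show whether any of these limits differ between the two environments. It would not detect cgroup memory limits on k8s, which are enforced outside of these per-process limits.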