# ask-metaflow
i
Hi! I'm trying to scale my flow up to 256 tasks in parallel on GKE (with 128 workers everything works fine). I'm facing an issue where my connection breaks (I don't know if it's a networking or SSH server issue) and processes get killed. Has anyone had similar problems? If so, how did you overcome them?
Ok, it seems the problem was OOM. Btw, why is there such a peak in RAM during flow init?
a
Where was the OOM happening?
i
On the host during task creation; at peak it was 25 GB for 256 workers.
a
What decorators are you using for your tasks?
i
```python
@kubernetes(
    memory=4096,
    cpu=8,
    image="xxxx",
    namespace="default",
    service_account="my-ksa",
    node_selector={
        "cloud.google.com/compute-class": "Performance",
        "cloud.google.com/machine-family": "n2d",
    },
)
@retry
@step
```
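(For context, a minimal sketch of how these decorators might sit in a flow that fans out into 256 parallel tasks. The flow name, foreach split, and step bodies below are hypothetical; only the decorator settings come from the snippet above.)

```python
from metaflow import FlowSpec, kubernetes, retry, step


class ScaleSketchFlow(FlowSpec):
    """Hypothetical flow illustrating a 256-way fan-out on Kubernetes."""

    @step
    def start(self):
        # Fan out into 256 parallel tasks (hypothetical split key).
        self.shards = list(range(256))
        self.next(self.work, foreach="shards")

    @kubernetes(
        memory=4096,
        cpu=8,
        image="xxxx",
        namespace="default",
        service_account="my-ksa",
        node_selector={
            "cloud.google.com/compute-class": "Performance",
            "cloud.google.com/machine-family": "n2d",
        },
    )
    @retry
    @step
    def work(self):
        # Per-shard work goes here; self.input holds the current shard id.
        self.shard = self.input
        self.next(self.join)

    @step
    def join(self, inputs):
        # Merge results from the fan-out.
        self.next(self.end)

    @step
    def end(self):
        pass


if __name__ == "__main__":
    ScaleSketchFlow()
```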
a
Are there any flow-level decorators? The only thing worth noting here is that the code package is downloaded when the task is being initialized - depending on what you have in your code package, there might be a memory spike, but not anything that would consume 4 GB.
Do you see the OOM before the "task is starting" log line?
i
No, I don't have flow decorators. Btw, my project has the following structure:
• root
  ◦ flows
    ▪︎ A.py
    ▪︎ B.py
  ◦ other libs
And I run
`python flows/A.py run ....`
I assume that only the flows folder is packaged, is that true? (The OOM happens before the task is started.)
a
Your current working directory is packaged. You can check what is packaged using the ‘package list’ command
i
But does it pack only *.py files?
How do I use 'package list'?
I see, it packed all my .venv🤦
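(For reference, given the run command above, checking the package contents would look roughly like this; the subcommand name comes from the 'package list' suggestion earlier:)

```
python flows/A.py package list
```

This should print every file included in the code package that gets shipped to each task, which is how the .venv contents showed up here.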