# ask-metaflow
r
Hi team, sorry if this is a vague question, but I couldn't quite follow how the internals of dependency management work 🤯

> It ships the locally resolved environment for remote execution, even when the remote environment uses a different operating system and CPU architecture than the client (OS X vs. Linux).

1. In the picture (attached in thread), during the "download packages" and "upload packages to datastore" steps, does it download a `whl` for every possible platform and upload them all to the datastore?
2. At this point, Metaflow only knows what the local env is (say, macOS) and doesn't yet know about the remote env (whatever my Argo pod is running), right?
3. When the remote instance hydrates the environment, does it search the datastore for its platform-specific wheel?

Is this about right? For context, I'm exploring how to support source distributions (`tar.gz`) for `@pypi`; it seems the remote instance has to run `pip install` from the `tar.gz`, since the local machine may not be able to build a wheel for the remote machine.
cc @dry-beach-38304 as this relates to one of the previous threads you helped on.
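(For readers skimming this thread, a minimal sketch of the kind of flow being discussed; the package, version, and `@kubernetes` target are placeholders, not the asker's actual setup:)

```python
from metaflow import FlowSpec, step, pypi, kubernetes

class DepsFlow(FlowSpec):
    # @pypi resolves and downloads packages on the client (e.g. macOS),
    # uploads them to the datastore, and the remote task rehydrates the
    # environment from the datastore instead of hitting PyPI directly.
    @pypi(python="3.11", packages={"pandas": "2.2.2"})
    @kubernetes  # e.g. an Argo/Kubernetes pod running linux-64
    @step
    def start(self):
        import pandas as pd
        print(pd.__version__)
        self.next(self.end)

    @step
    def end(self):
        pass

if __name__ == "__main__":
    DepsFlow()
```

(Run with `python deps_flow.py --environment=pypi run`; the `--environment=pypi` flag enables the pypi/conda machinery.)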
a
hi! we make an assumption about the target platform (most likely you are using linux-64, but we also support linux-aarch64, though we need to know that ahead of time). we cannot simply upload wheels for multiple architectures, because there is no guarantee that the solved output would be the same across platforms
💡 1
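(Side note for the thread: pip itself illustrates the same constraint. A macOS client can fetch wheels for a foreign platform only when every package ships a matching binary wheel; the package and versions below are placeholders:)

```bash
# Download linux-64 wheels from a macOS client. --only-binary=:all:
# forbids sdists, so this fails for anything without a matching wheel.
pip download numpy==1.26.4 \
    --platform manylinux2014_x86_64 \
    --python-version 3.11 \
    --only-binary=:all: \
    -d ./linux64-wheels
```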
ideally you wouldn't want to run `pip install` from a `tar.gz` on the remote machine, since that isn't going to be deterministic, and you may inadvertently DDoS your package repository: `pip install` from a `tar.gz` will very likely result in a fresh solve (which isn't going to be stable either)
💡 1
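(One way to avoid per-task solves, following from this advice: build the wheel once up front and ship the binary artifact. For a pure-Python sdist this works from any machine; the filename is a placeholder:)

```bash
# Turn the sdist into a wheel once, locally. --no-deps keeps pip from
# resolving and building the whole dependency tree at the same time.
# Pure-Python packages only: arch-specific code must be built on linux-64.
pip wheel ./mypackage-1.0.tar.gz --no-deps -w ./wheels
```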
d
To add a bit:
• Source distributions are not buildable for remote architectures even with the Metaflow extensions (well, caveat: only if they actually have an arch-specific component). This is because cross-platform building is hard.
• The Metaflow extension DOES support building environments for multiple architectures (the caveat above notwithstanding, i.e. pure conda envs, envs that use source distributions without an arch-specific component, etc.), and Metaflow will properly know that they are the "same" environments (so you can execute on different archs).
• I have a pattern (I think I linked to it earlier) for how you can use a trampoline node to build your environment properly.
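(I don't have the link either, but one plausible reading of the trampoline idea, sketched with hypothetical names; treat this as a guess at the pattern, not the actual linked write-up. The first remote step does nothing except run on the target architecture, so the environment is materialized and cached for linux-64 before any real work starts.)

```python
from metaflow import FlowSpec, step, pypi_base, kubernetes

# Hypothetical trampoline layout (a guess, not the linked pattern):
# `start` is a no-op whose only job is to force the environment to be
# materialized on the target architecture (linux-64) before `work` runs.
@pypi_base(python="3.11", packages={"pandas": "2.2.2"})
class TrampolineFlow(FlowSpec):

    @kubernetes
    @step
    def start(self):
        self.next(self.work)

    @kubernetes
    @step
    def work(self):
        import pandas as pd
        print(pd.__version__)
        self.next(self.end)

    @step
    def end(self):
        pass

if __name__ == "__main__":
    TrampolineFlow()
```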
r
Thanks all! (and hi Savin, here's the thread where I posted my setup and what I'm trying to do) So my understanding now is that it's preferred to:
1. Build the wheel for the target platform (linux-64) on my local machine (macOS).
   a. > but cross-platform building is hard
      i. Since I have control of my remote instance, maybe I can hardcode the platform and use a specific container to build it? (rough sketch below)
      ii. Is the `trampoline node` pattern meant to help with this? (@dry-beach-38304 I don't think I got the link. Appreciate it if you can attach it again, thanks!)
2. Then upload it to the datastore.
a
yeah, we went down the path of having a low-latency builder to work around these limitations and directly bake the container image
👀 1
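(A sketch of what "directly bake the container image" can look like in practice; the pins and tag are placeholders, not the actual builder mentioned above:)

```bash
# Bake pre-resolved pins into the image at build time, so the task
# container never runs a dependency solve at runtime.
docker build -t flow-image:pinned -f - . <<'EOF'
FROM python:3.11-slim
RUN pip install --no-cache-dir pandas==2.2.2 pyarrow==16.1.0
EOF
```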