# ask-metaflow
h
Has anyone had any luck installing a private Python package (in this case it happens to be using Google Artifact Registry) with the `@pypi_base` decorator and the metaflow-netflixext package? When running this locally, I seem to be running into all sorts of issues getting it to work.
h
Yes, used to use it with AWS CodeArtifact. What issues are you seeing?
h
I'm having an issue when initializing. I had assumed that when using the `@pypi_base` decorator I would need to use the `--environment=pypi` parameter based on Metaflow's docs, but that results in an error message:
```
Metaflow 2.15.4+netflix-ext(1.2.3) executing AnalyticsFlow for user:patricktravis
    Incompatible environment:
    The pypi_base decorator requires --environment=conda
```
When using the `--environment=conda` parameter, there seems to be an issue grabbing my conda environment version. I raised an issue here, but I think it could potentially be a bug. Pretty much I'm just running:
```
python analytics_flow.py --environment=conda run
```
d
Ah right. Sorry I saw this and didn’t respond. Use mamba < 2 for now. I have the fix internally.
And it’s always `--environment=conda` for either conda or pypi.
I’ll add the alias if it’s confusing. The pypi environment is literally there so people don’t have to see conda, but it does the same thing.
h
Ah, I gotcha. Assuming it's just not hooked up right now, given this error when attempting to use `--environment=pypi`?
```
Metaflow 2.15.4+netflix-ext(1.2.3) executing AnalyticsFlow for user:patricktravis
    Incompatible environment:
    The pypi_base decorator requires --environment=conda
```
d
Right, in the mainline pypi support, `--environment=conda` == `--environment=pypi` (and I really mean that 🙂: https://github.com/Netflix/metaflow/blob/master/metaflow/plugins/pypi/pypi_environment.py#L4). I did not do that in the extension.
😄 2
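For context, the aliasing referenced in mainline Metaflow is essentially a one-line subclass; paraphrasing the linked file:

```python
# Paraphrased sketch of metaflow/plugins/pypi/pypi_environment.py:
# the "pypi" environment is the conda environment under another name.
from .conda_environment import CondaEnvironment


class PyPIEnvironment(CondaEnvironment):
    TYPE = "pypi"
```

which is why `--environment=pypi` and `--environment=conda` behave identically there.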
h
Seems to be working with the downgrade! Last question: what's a reasonable amount of time for environment solving, and do you have any advice on speeding it up? I generally don't use conda, but I've basically been stuck on this step for over an hour. I tried both being more specific (i.e. every pip package currently installed, plus versions) and reducing the number of packages specified (just the custom package I'm attempting to install), and I haven't gotten past this step:
```
Bootstrapping Conda environment... (this could take a few minutes)
    Resolving 1 environment ...
```
d
Is your environment solely composed of pypi packages, with nothing else?
You can see where it is stuck by running with `METAFLOW_DEBUG_CONDA=1` as an env var. That will print out the command it is executing; typically it will be a pip command.
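For example, assuming the same flow file as above, that would look like `METAFLOW_DEBUG_CONDA=1 python analytics_flow.py --environment=conda run`.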
h
Seems like it's hung up on retrieving the private Python package. I attempted to install (with a recently refreshed token) both leaving it to default credentials, and attempting to add the token to the URL like so:
```python
import urllib.parse

from google.auth import default
from google.auth.transport.requests import Request

from metaflow import pypi_base


def refresh_access_token() -> str:
    credentials, _ = default()
    credentials.refresh(Request())
    return credentials.token


GCP_ARTIFACT_REGISTRY_ACCESS_TOKEN = refresh_access_token()


@pypi_base(
    python="3.11",
    extra_indices=[
        urllib.parse.quote(
            f"https://{GCP_ARTIFACT_REGISTRY_ACCESS_TOKEN}@my/artifact/registry/url",
            safe="",
        )
    ],
    packages={"my_package": "0.2.5"},
)
```
Any advice on auth here?
h
You seem to be escaping the whole URL (including the `https://` part). Can you try with only escaping the token?
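A minimal sketch of that suggestion, reusing the placeholder registry URL and token variable from the snippet above, so only the credential is percent-escaped and the scheme/host stay parseable:

```python
import urllib.parse

# Escape the token only; pip needs the scheme and host unescaped to
# parse the index URL. The registry URL is a placeholder from above.
token = urllib.parse.quote(GCP_ARTIFACT_REGISTRY_ACCESS_TOKEN, safe="")
extra_index = f"https://{token}@my/artifact/registry/url"
```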
h
Tried that and I'm still running into an auth issue (401). My general process for auth/installing packages so far has been to have all the packages installed via poetry before running locally, like so:
```
gcloud auth login
poetry config http-basic.my-source oauth2accesstoken $(gcloud auth print-access-token)
```
h
Side question: is there a reason you need to refresh the token inside the flow? The credentials are only needed when deploying the flow, so if you have it in your pip.conf it will work too; after that the env is already resolved and cached.
Can you confirm that the URL you have generated works with pip directly? I.e.:
```
pip install xyz --extra-index-url=<your_generated_url>
```
Ah, looks like the format is like this:
```
extra-index-url = https://oauth2accesstoken:$(gcloud auth print-access-token)@your-repository-url/simple/
```
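A hedged sketch of building that form of URL in Python, mirroring the poetry config above (`oauth2accesstoken` is the fixed basic-auth username Google Artifact Registry expects for token auth; the repository URL is a placeholder, and `gcloud` is assumed to be installed and authenticated):

```python
import subprocess

# Fetch a fresh access token via the gcloud CLI, as in the poetry
# setup shown earlier in the thread.
token = subprocess.run(
    ["gcloud", "auth", "print-access-token"],
    capture_output=True, text=True, check=True,
).stdout.strip()

# Fixed username "oauth2accesstoken", token as the password.
extra_index = f"https://oauth2accesstoken:{token}@your-repository-url/simple/"
```

The same value can also live in pip.conf as `extra-index-url`, in which case it's picked up at resolve time without any token code in the flow file.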
h
@hundreds-rainbow-67050 Maybe I don't, but I'm still getting familiar with our setup. Right now we have flows separated by repo (largely an integration thing for the Data Science team I'm supporting), and each is deployed on GKE/Argo using an image we build in a separate process that's almost completely decoupled from this deployment. As a result, we are pretty much constrained to using only packages on that image, unless we dynamically install/import the package in a step/flow. That's not a great approach for a number of reasons (hence why I'm trying to get this decorator to work).
The reason I threw the token refresh in was that I assumed this decorator would run each time the flow was triggered, and given the default token expiration of one hour, I assumed this would cause issues otherwise. Even if it's cached, if the cache needs to be refreshed for whatever reason, we would still want this to be able to authenticate and load the package on refresh. Hopefully I have some missed assumptions here and the process can be easier 🙂
d
I actually have something with tokens but haven’t tested it (the person it was for disappeared): https://github.com/Netflix/metaflow-nflx-extensions/pull/41
But it should work without that, as Nissan mentioned.
To clarify: the only time we access the repo is when resolving (not running); this is why the token is only needed then.
h
Adding keyring would be excellent
And that makes sense in terms of refresh
Also, in terms of getting this to build, it seems that updating the token formatting worked, but now access to the specific imports seems off. I.e., it looks like someone had previously imported something like this WITHIN a step:
```python
from daily_aggregates.utils.utils import read_config
```
and I'm getting a ModuleNotFound error:
```
ModuleNotFoundError: No module named 'daily_aggregates'
```
I guess my question then becomes: is the flow package not automatically included in the `@pypi_base` install?
d
The flow package (and by that I suppose you mean the code from the flow) is definitely included. Could you share a bit more detail about how the code is structured?
h
The flow is contained in this top-level py file, and the directory structure is included below.
Flow py file: `analytics_flow.py`
Structure: (directory tree was attached as a snippet; not captured in this log)
Looks like previously they were also adding `src` to the path in each step dynamically:
```python
sys.path.insert(0, "src")
```
Ah, removing `src` as the top level solved this.
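For readers following along, a plausible layout after the fix (hypothetical names, inferred from the import path and the removal of `src`):

```
analytics_flow.py
daily_aggregates/
    __init__.py
    utils/
        __init__.py
        utils.py
```

With the package sitting next to the flow file, Metaflow's code packaging picks it up and the in-step import resolves without touching `sys.path`.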
Appreciate all the help guys! This was really awesome 🙂
👍 1