# ask-metaflow
h
Has anyone had any luck installing a private Python package (in this case it happens to be using Google Artifact Registry) with the `@pypi_base` decorator and the metaflow-netflixext package? When running this locally, I seem to be running into all sorts of issues getting it to work.
h
Yes, used to use it with AWS CodeArtifact. What issues are you seeing?
h
I'm having an issue when initializing. I had assumed that when using the `@pypi_base` decorator I would need to use the `--environment=pypi` parameter based on Metaflow's docs, but that results in an error message:
```
Metaflow 2.15.4+netflix-ext(1.2.3) executing AnalyticsFlow for user:patricktravis
    Incompatible environment:
    The pypi_base decorator requires --environment=conda
```
When using the `--environment=conda` parameter, there seems to be an issue grabbing my conda environment version. I raised an issue here, but I think it could potentially be a bug. Pretty much I'm just running:
```
python analytics_flow.py --environment=conda run
```
d
Ah right. Sorry I saw this and didn’t respond. Use mamba < 2 for now. I have the fix internally.
And it’s always `--environment=conda` for either conda or pypi.
I’ll add the alias if it’s confusing. The pypi environment is literally there so people don’t have to see conda, but it does the same thing.
h
Ah, I gotcha. Assuming it's just not hooked up right now, given this error when attempting to use `--environment=pypi`?
```
Metaflow 2.15.4+netflix-ext(1.2.3) executing AnalyticsFlow for user:patricktravis
    Incompatible environment:
    The pypi_base decorator requires --environment=conda
```
d
Right, in the mainline pypi support, `--environment=conda` == `--environment=pypi` (and I really mean that 🙂: https://github.com/Netflix/metaflow/blob/master/metaflow/plugins/pypi/pypi_environment.py#L4). I did not do that in the extension.
😄 2
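For context, the aliasing referenced in mainline Metaflow is essentially a one-line subclass; paraphrasing the linked file:

```python
# Paraphrased sketch of metaflow/plugins/pypi/pypi_environment.py:
# the "pypi" environment is the conda environment under another name.
from .conda_environment import CondaEnvironment


class PyPIEnvironment(CondaEnvironment):
    TYPE = "pypi"
```

which is why `--environment=pypi` and `--environment=conda` behave identically there.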
h
Seems to be working with the downgrade! Last question: what's a reasonable amount of time for environment solving, and do you have any advice on speeding it up? I generally don't use conda, but I've basically been stuck on this step for over an hour. I tried both being more specific (i.e. every pip package currently installed, plus versions) and reducing the number of packages specified (just the custom package I'm attempting to install), and I haven't gotten past this step:
```
Bootstrapping Conda environment... (this could take a few minutes)
    Resolving 1 environment ...
```
d
Is your environment solely composed of pypi packages, with nothing else?
You can see where it is stuck by running with `METAFLOW_DEBUG_CONDA=1` as an env var. That will print out the command it is executing; typically it will be a pip command.
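For example, assuming the same flow file as above, that would look like `METAFLOW_DEBUG_CONDA=1 python analytics_flow.py --environment=conda run`.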
h
Seems like it's hung up on retrieving the private Python package. I attempted to install (with a recently refreshed token) both leaving it to default credentials, and attempting to add the token to the URL like so:
```python
import urllib.parse

from google.auth import default
from google.auth.transport.requests import Request

from metaflow import pypi_base


def refresh_access_token() -> str:
    credentials, _ = default()
    credentials.refresh(Request())
    return credentials.token


GCP_ARTIFACT_REGISTRY_ACCESS_TOKEN = refresh_access_token()


@pypi_base(
    python="3.11",
    extra_indices=[
        urllib.parse.quote(
            f"https://{GCP_ARTIFACT_REGISTRY_ACCESS_TOKEN}@my/artifact/registry/url",
            safe="",
        )
    ],
    packages={"my_package": "0.2.5"},
)
```
Any advice on auth here?
h
You seem to be escaping the whole URL (including the `https://` part). Can you try with only escaping the token?
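A minimal sketch of that suggestion, reusing the placeholder registry URL and token variable from the snippet above, so only the credential is percent-escaped and the scheme/host stay parseable:

```python
import urllib.parse

# Escape the token only; pip needs the scheme and host unescaped to
# parse the index URL. The registry URL is a placeholder from above.
token = urllib.parse.quote(GCP_ARTIFACT_REGISTRY_ACCESS_TOKEN, safe="")
extra_index = f"https://{token}@my/artifact/registry/url"
```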
h
Tried that and I'm still running into an auth issue (401). My general process for auth/installing packages so far has been to have all the packages installed via poetry before running locally, like so:
```
gcloud auth login
poetry config http-basic.my-source oauth2accesstoken $(gcloud auth print-access-token)
```
h
Side question: is there a reason you need to refresh the token inside the flow? The credentials are only needed when deploying the flow, so if you have it in your pip.conf it will work too; after that the env is already resolved and cached.
Can you confirm that the URL you have generated works with pip directly? I.e.:
```
pip install xyz --extra-index-url=<your_generated_url>
```
Ah, looks like the format is like this:
```
extra-index-url = https://oauth2accesstoken:$(gcloud auth print-access-token)@your-repository-url/simple/
```
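A hedged sketch of building that form of URL in Python, mirroring the poetry config above (`oauth2accesstoken` is the fixed basic-auth username Google Artifact Registry expects for token auth; the repository URL is a placeholder, and `gcloud` is assumed to be installed and authenticated):

```python
import subprocess

# Fetch a fresh access token via the gcloud CLI, as in the poetry
# setup shown earlier in the thread.
token = subprocess.run(
    ["gcloud", "auth", "print-access-token"],
    capture_output=True, text=True, check=True,
).stdout.strip()

# Fixed username "oauth2accesstoken", token as the password.
extra_index = f"https://oauth2accesstoken:{token}@your-repository-url/simple/"
```

The same value can also live in pip.conf as `extra-index-url`, in which case it's picked up at resolve time without any token code in the flow file.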
h
@hundreds-rainbow-67050 Maybe I don't, but I'm still getting familiar with our setup. Right now we have flows separated by repo (largely an integration thing for the Data Science team I'm supporting), and each is deployed on GKE/Argo using an image we build in a separate process that's almost completely decoupled from this deployment. As a result, we are pretty much constrained to using only packages on that image, unless we dynamically install/import the package in a step/flow. That's not a great approach for a number of reasons (hence why I'm trying to get this decorator to work).
The reason I threw the token refresh in was that I assumed this decorator would run each time the flow was triggered, and given the default token expiration of one hour, I assumed this would cause issues otherwise. Even if it's cached, if the cache needs to be refreshed for whatever reason, we would still want this to be able to authenticate and load the package on refresh. Hopefully I have some missed assumptions here and the process can be easier 🙂
d
I actually have something with tokens but haven’t tested it (the person it was for disappeared): https://github.com/Netflix/metaflow-nflx-extensions/pull/41
But it should work without that, as Nissan mentioned.
To clarify: the only time we access the repo is when resolving (not running); this is why the token is only needed then.
h
Adding keyring would be excellent
And that makes sense in terms of refresh
Also, in terms of getting this to build, it seems that updating the token formatting worked, but now access to the specific imports seems off. I.e., it looks like someone had previously imported something like this WITHIN a step:
```python
from daily_aggregates.utils.utils import read_config
```
and I'm getting a ModuleNotFound error:
```
ModuleNotFoundError: No module named 'daily_aggregates'
```
I guess my question then becomes: is the flow package not automatically included in the `@pypi_base` install?
d
The flow package (and by that I suppose you mean the code from the flow) is definitely included. Could you share a bit more detail about how the code is structured?
h
The flow is contained in this top-level py file, and the directory structure is included below.
Flow py file: `analytics_flow.py`
Structure: (directory tree was attached as a snippet; not captured in this log)
Looks like previously they were also adding `src` to the path in each step dynamically:
```python
sys.path.insert(0, "src")
```
Ah, removing `src` as the top level solved this.
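For readers following along, a plausible layout after the fix (hypothetical names, inferred from the import path and the removal of `src`):

```
analytics_flow.py
daily_aggregates/
    __init__.py
    utils/
        __init__.py
        utils.py
```

With the package sitting next to the flow file, Metaflow's code packaging picks it up and the in-step import resolves without touching `sys.path`.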
Appreciate all the help guys! This was really awesome 🙂
👍 1