# ask-metaflow
c
Hey all, is there any way I can get FFmpeg on a Batch instance using the @batch decorator and @pypi? I know it's a CLI; is there some workaround or trick I can use to have that library on the machine after it's bootstrapped the environment?
c
Not that I’m aware of via pypi only, unfortunately. Last time I tried, I had to install it in a base Docker image at the system level.
c
Yeah! Your blog post was great @crooked-jordan-29960, and I'm trying to do a similar thing to your whisper post. The only difference is using the transformers library for a more performant version of Whisper.
c
Nice, would be good to update that content btw, many improvements to whisper models since. If you’re able to share any findings it would be awesome.
c
The community has really taken the model and run with it: there are some cpp implementations being wrapped that are 80%+ faster than the original models, as well as some of the new inference engines like JAX showing insane quality and speed increases. It's a crazy time to be alive 😄
Not to mention things like Groq or BetterTransformer working at a lower level to improve inference times
c
For sure, whisper.cpp and groq are both mind blowing.
On the install part: theoretically I think conda could do this, so it's quite possible I missed some viable package/channel combination there.
d
if you use the bleeding edge decorators, you can definitely do that with a mixture of pypi and conda decorator
this exact use case (ffmpeg) is actually in use currently
it also supports an extension to requirements.txt:
```
--conda-pkg ffmpeg
```
as an example.
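For concreteness, a sketch of what such a requirements.txt might look like (hedged: `--conda-pkg` is the bleeding-edge extension syntax mentioned above; the pip pin below is purely illustrative):
```
# requirements.txt -- sketch only; --conda-pkg is the bleeding-edge
# extension syntax, the package pin is just an example
--conda-pkg ffmpeg
transformers==4.44.0
```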
v
this seems to work out of the box
```python
@conda(libraries={'ffmpeg': '7.0.2'})
@step
def start(self):
    from subprocess import call
    call(['ffmpeg'])
    self.next(self.end)
```
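As a side note, a quick way to sanity-check inside a step that the conda-provided binary actually landed on PATH (a sketch; these helpers are hypothetical, not part of Metaflow):

```python
import shutil
import subprocess

def ffmpeg_available() -> bool:
    """Return True if an ffmpeg binary is on PATH (e.g. installed by @conda)."""
    return shutil.which("ffmpeg") is not None

def ffmpeg_version_line():
    """First line of `ffmpeg -version` output, or None if ffmpeg is missing."""
    if not ffmpeg_available():
        return None
    out = subprocess.run(["ffmpeg", "-version"], capture_output=True, text=True)
    return out.stdout.splitlines()[0]
```

Printing the version line at the top of the step makes bootstrap problems obvious in the task logs.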
d
Yes, it will work. If you need both conda and pypi packages, though, it won't.
Technically it could also be supported for regular decorators too for non-Python packages in conda, but I don't think it is. It doesn't require resolving with conda-lock in that case, but a few extra checks are needed.
c
The next hurdle: I'm developing on a Mac currently and obviously don't have glibc, so mamba struggles to bootstrap the environments because ffmpeg and torch both need glibc. I think I can work around this one unless it's also a solved problem 🙂 . Also, I feel honored to have both Romain and Ville in the thread 😄
d
Where are you doing the resolve? At least for the bleeding-edge one, if it knows you are resolving for remote execution (Batch, Kubernetes, etc.), it will inject a glibc dependency.
For the regular one, depending on what you are doing, you may be able to inject it manually.
c
The decorators are as follows on the step:
```python
@batch(cpu=2, memory=12288)
@conda(python='3.10.8', packages={'boto3': '1.35.1', 'transformers': '4.44.0', 'pytorch': '2.3.1', 'numpy': '1.23.1', 'ffmpeg': '7.0.2'})
@step
def step_name(self):
```
The CLI is just
```shell
python main.py --environment conda run
```
Output failure:
```
2024-08-21 08:24:19.697 Bootstrapping virtual environment(s) ...
    Micromamba ran into an error while setting up environment:
    command '/Users/daniel.mcgoldrick/.metaflowconfig/micromamba/bin/micromamba create --yes --quiet --dry-run --no-extra-safety-checks --repodata-ttl=86400 --retry-clean-cache --prefix=/var/folders/2l/zhhpcjjx6sqct256svgv1myh0000gp/T/tmpg5hcms0k/prefix --channel=conda-forge requests==>=2.21.0 boto3==1.35.1 transformers==4.44.0 pytorch==2.3.1 numpy==1.23.1 ffmpeg==7.0.2 python==3.10.8' returned error (1)
    nothing provides __glibc >=2.17,<3.0.a0 needed by pytorch-2.3.1-cpu_generic_py310ha4c588e_0
    nothing provides __glibc >=2.17,<3.0.a0 needed by ffmpeg-7.0.2-gpl_h0db5852_100
```
Same issue with `--with batch` appended at runtime
d
You can try setting `CONDA_OVERRIDE_GLIBC` to `2.35`, for example
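In case it helps anyone landing here later: the override is an environment variable read by the conda/micromamba solver, so a sketch would be (`2.35` is just an example value, and the run command is the one from earlier in the thread):

```shell
# Tell the solver to assume the target linux image provides glibc 2.35,
# so linux-64 packages like pytorch/ffmpeg can resolve from macOS.
export CONDA_OVERRIDE_GLIBC=2.35
# then resolve and run as before:
# python main.py --environment conda run
```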
c
Overriding using the env variable worked! Thank you @dry-beach-38304
d
cool