# ask-metaflow
c
Hey all, is there any way I can get FFmpeg on a Batch instance using the @batch decorator and @pypi? I know it's a CLI; is there some workaround or trick I can use to have that library on the machine after it's bootstrapped the environment?
c
Not that I’m aware of via pypi only, unfortunately. Last time I tried, I had to install it in a base Docker image at the system level.
c
Yeah! Your blog post was great @crooked-jordan-29960, and I'm trying to do a similar thing to your whisper post. The only difference is using the transformers library for a more performant version of Whisper.
c
Nice, would be good to update that content btw, many improvements to whisper models since. If you’re able to share any findings it would be awesome.
c
The community has really taken the model and run with it: there are some cpp implementations being wrapped that are 80%+ faster than the original models, as well as some of the new inference engines like JAX showing insane quality and speed increases. It's a crazy time to be alive 😄
Not to mention things like Groq or BetterTransformer working at a lower level to improve inference times
c
For sure, whisper.cpp and groq are both mind blowing.
On the install part: theoretically I think conda could do this, so it's quite possible I missed some viable package/channel combination there.
d
if you use the bleeding edge decorators, you can definitely do that with a mixture of pypi and conda decorator
this exact use case (ffmpeg) is actually in use currently
it also supports an extension to requirements.txt:
```
--conda-pkg ffmpeg
```
as an example.
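For concreteness, a sketch of what such a requirements.txt might look like (hedged: `--conda-pkg` is the bleeding-edge extension syntax mentioned above; the pip pin below is purely illustrative):
```
# requirements.txt -- sketch only; --conda-pkg is the bleeding-edge
# extension syntax, the package pin is just an example
--conda-pkg ffmpeg
transformers==4.44.0
```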
v
this seems to work out of the box
```python
@conda(libraries={'ffmpeg': '7.0.2'})
@step
def start(self):
    from subprocess import call
    call(['ffmpeg'])
    self.next(self.end)
```
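As a side note, a quick way to sanity-check inside a step that the conda-provided binary actually landed on PATH (a sketch; these helpers are hypothetical, not part of Metaflow):

```python
import shutil
import subprocess

def ffmpeg_available() -> bool:
    """Return True if an ffmpeg binary is on PATH (e.g. installed by @conda)."""
    return shutil.which("ffmpeg") is not None

def ffmpeg_version_line():
    """First line of `ffmpeg -version` output, or None if ffmpeg is missing."""
    if not ffmpeg_available():
        return None
    out = subprocess.run(["ffmpeg", "-version"], capture_output=True, text=True)
    return out.stdout.splitlines()[0]
```

Printing the version line at the top of the step makes bootstrap problems obvious in the task logs.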
d
Yes, it will work. If you need both conda and pypi packages, though, it won't.
Technically it could also be supported for regular decorators too for non-Python packages in conda, but I don't think it is. It doesn't require resolving with conda-lock in that case, but a few extra checks are needed.
c
The next hurdle: I'm developing on a Mac currently and obviously don't have glibc, so mamba struggles to bootstrap the environments because ffmpeg and torch both need glibc. I think I can work around this one unless it's also a solved problem 🙂 . Also, I feel honored to have both Romain and Ville in the thread 😄
d
Where are you doing the resolve? At least for the bleeding-edge one, if it knows you are resolving for remote execution (Batch, Kubernetes, etc.), it will inject a glibc dependency.
For the regular one, depending on what you are doing, you may be able to inject it manually.
c
The decorators are as follows on the step:
```python
@batch(cpu=2, memory=12288)
@conda(python='3.10.8', packages={'boto3': '1.35.1', 'transformers': '4.44.0', 'pytorch': '2.3.1', 'numpy': '1.23.1', 'ffmpeg': '7.0.2'})
@step
def step_name(self):
```
The CLI is just
```shell
python main.py --environment conda run
```
Output failure:
```
2024-08-21 08:24:19.697 Bootstrapping virtual environment(s) ...
    Micromamba ran into an error while setting up environment:
    command '/Users/daniel.mcgoldrick/.metaflowconfig/micromamba/bin/micromamba create --yes --quiet --dry-run --no-extra-safety-checks --repodata-ttl=86400 --retry-clean-cache --prefix=/var/folders/2l/zhhpcjjx6sqct256svgv1myh0000gp/T/tmpg5hcms0k/prefix --channel=conda-forge requests==>=2.21.0 boto3==1.35.1 transformers==4.44.0 pytorch==2.3.1 numpy==1.23.1 ffmpeg==7.0.2 python==3.10.8' returned error (1)
    nothing provides __glibc >=2.17,<3.0.a0 needed by pytorch-2.3.1-cpu_generic_py310ha4c588e_0
    nothing provides __glibc >=2.17,<3.0.a0 needed by ffmpeg-7.0.2-gpl_h0db5852_100
```
Same issue with `--with batch` appended at runtime
d
You can try setting `CONDA_OVERRIDE_GLIBC` to `2.35`, for example
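In case it helps anyone landing here later: the override is an environment variable read by the conda/micromamba solver, so a sketch would be (`2.35` is just an example value, and the run command is the one from earlier in the thread):

```shell
# Tell the solver to assume the target linux image provides glibc 2.35,
# so linux-64 packages like pytorch/ffmpeg can resolve from macOS.
export CONDA_OVERRIDE_GLIBC=2.35
# then resolve and run as before:
# python main.py --environment conda run
```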
c
Overriding using the env variable worked! Thank you @dry-beach-38304
d
cool