# ask-metaflow
Hi team, was wondering if I could kindly ask for some assistance. I'm working on a demo for LLM fine-tuning with PyTorch Lightning that leverages Metaflow's `@pytorch_parallel` decorator. I'm still rather new to distributed training, so I'm learning the ropes. I tried `fsdp` to do sharded training, but it had trouble with the model parameters: I kept getting `ValueError: optimizer got an empty parameter list`, even though the parameters are definitely there when I print them out. I also tried `ddp` as the strategy, which didn't raise any explicit error, but it just stalled with no progress output in the stdout logs. I have a reproducible example here: https://github.com/rileyhun/llm_finetuning_metaflow/blob/main/gpt-j-8bit-flow.py. Any pointers or guidance would be greatly appreciated.
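For context, the training piece inside the parallel step follows roughly this shape (a simplified sketch, not the exact code in the repo; the model name, hyperparameters, and node count below are placeholders):

```python
import torch
import pytorch_lightning as pl
from transformers import AutoModelForCausalLM


class FineTuner(pl.LightningModule):
    """Sketch of the Lightning module; the real flow fine-tunes GPT-J in 8-bit."""

    def __init__(self, model_name="EleutherAI/gpt-j-6B"):  # placeholder model id
        super().__init__()
        self.model = AutoModelForCausalLM.from_pretrained(model_name)

    def training_step(self, batch, batch_idx):
        # batch is assumed to be a dict with input_ids / labels
        out = self.model(**batch)
        return out.loss

    def configure_optimizers(self):
        # This is where the "optimizer got an empty parameter list" error is
        # raised under fsdp, even though printing self.parameters() shows them.
        return torch.optim.AdamW(self.parameters(), lr=1e-5)


trainer = pl.Trainer(
    accelerator="gpu",
    devices=-1,
    num_nodes=2,        # placeholder; one node per parallel Metaflow task
    strategy="fsdp",    # with "ddp" instead, training stalls with no stdout output
    max_epochs=1,
)
# trainer.fit(FineTuner(), train_dataloader)
```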