# ask-metaflow
I am getting the error below when I try to run multi-GPU training on Metaflow using:
```bash
python -m torch.distributed.launch --nproc_per_node=3 --node_rank=0 --master_port=8888
```
```
2023-04-07 20:13:21.739 [206/train/707 (pid 32119)] [e50c4a8b-8e8a-44fc-bf67-5eb1dcc912fa]   File "trainMF.py", line 1205, in <module>
2023-04-07 20:13:21.740 [206/train/707 (pid 32119)] [e50c4a8b-8e8a-44fc-bf67-5eb1dcc912fa]     main()
2023-04-07 20:13:21.740 [206/train/707 (pid 32119)] [e50c4a8b-8e8a-44fc-bf67-5eb1dcc912fa]   File "trainMF.py", line 607, in main
2023-04-07 20:13:21.740 [206/train/707 (pid 32119)] [e50c4a8b-8e8a-44fc-bf67-5eb1dcc912fa]     transform=transform
2023-04-07 20:13:21.740 [206/train/707 (pid 32119)] [e50c4a8b-8e8a-44fc-bf67-5eb1dcc912fa]   File "/root/metaflow/cottonDataExp.py", line 52, in __init__
2023-04-07 20:13:21.740 [206/train/707 (pid 32119)] [e50c4a8b-8e8a-44fc-bf67-5eb1dcc912fa]     self.img_blobs = [img.blob for img in raw_imgs]
2023-04-07 20:13:21.740 [206/train/707 (pid 32119)] [e50c4a8b-8e8a-44fc-bf67-5eb1dcc912fa]   File "/root/metaflow/cottonDataExp.py", line 52, in <listcomp>
2023-04-07 20:13:21.740 [206/train/707 (pid 32119)] [e50c4a8b-8e8a-44fc-bf67-5eb1dcc912fa]     self.img_blobs = [img.blob for img in raw_imgs]
2023-04-07 20:13:21.740 [206/train/707 (pid 32119)] [e50c4a8b-8e8a-44fc-bf67-5eb1dcc912fa]   File "/root/metaflow/metaflow/plugins/datatools/s3/s3.py", line 274, in blob
2023-04-07 20:13:21.741 [206/train/707 (pid 32119)] [e50c4a8b-8e8a-44fc-bf67-5eb1dcc912fa]     return f.read()
2023-04-07 20:13:21.741 [206/train/707 (pid 32119)] [e50c4a8b-8e8a-44fc-bf67-5eb1dcc912fa] OSError: [Errno 14] Bad address
```
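The failing line, `cottonDataExp.py` line 52, reads every object's full contents into memory inside the dataset's `__init__` (`[img.blob for img in raw_imgs]`). With `--nproc_per_node=3`, each of the three launched processes does this independently. One restructuring that may help isolate the problem is to keep only local file paths in `__init__` and read bytes lazily per item. The sketch below is a hypothetical, framework-free illustration of that pattern; `LazyBlobDataset` is an invented name, and it assumes the files are already on local disk. In Metaflow, that would be each downloaded object's `.path`. Those local files are tied to the lifetime of the `S3` scope, so the paths must be used or copied before it exits. Whether this resolves `Errno 14` here is not established.

```python
import os
from typing import Callable, List, Optional


class LazyBlobDataset:
    """Hypothetical dataset that stores file paths and reads bytes on demand.

    Unlike `[img.blob for img in raw_imgs]`, which reads every object into
    memory in __init__ (once per launched process), this defers each read
    to __getitem__, so only the requested item's bytes are loaded.
    """

    def __init__(self, paths: List[str], transform: Optional[Callable] = None):
        # Keep only the paths; no file contents are read here.
        self.paths = list(paths)
        self.transform = transform

    def __len__(self) -> int:
        return len(self.paths)

    def __getitem__(self, idx: int):
        # Read the raw bytes for a single item only when it is requested.
        with open(self.paths[idx], "rb") as f:
            data = f.read()
        return self.transform(data) if self.transform is not None else data


if __name__ == "__main__":
    import tempfile

    # Write two small stand-in "image" files to a temporary directory.
    with tempfile.TemporaryDirectory() as tmp:
        paths = []
        for i, payload in enumerate([b"abc", b"defg"]):
            p = os.path.join(tmp, f"img{i}.bin")
            with open(p, "wb") as f:
                f.write(payload)
            paths.append(p)

        ds = LazyBlobDataset(paths, transform=len)
        print(len(ds), ds[0], ds[1])  # prints: 2 3 4
```

Because the class only implements `__len__` and `__getitem__`, the same shape can subclass `torch.utils.data.Dataset` unchanged. Decoding the bytes can then happen in `transform`.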