Hi, I am trying to read images from an S3 bucket. The URI of each image is read from a CSV file that is loaded into a pandas DataFrame.
I am noticing a weird pattern in the read times: after reading 4 images (approx. 0.07 ms per image), the 5th image takes approx. 200 secs. There are 64 images per batch and 119 batches in total, so this slows down the entire training pipeline.

This is what my dataloader's `__getitem__()` looks like:
```
def __getitem__(self, index):
    with S3() as s3:
        byte_data = s3.get(self.data.iloc[index]['url']).blob
    if byte_data is None:
        return self.__getitem__((index + 1) % len(self.data))
    img = Image.open(BytesIO(byte_data)).convert('RGB')
    if self.transform is not None:
        img = self.transform(img)
    if self.target_transform is not None:
        label = self.target_transform(self.data.iloc[index]['label'])
    else:
        label = self.data.iloc[index]['label']
    return img, label
```
Is there anything I might have missed in the code that is slowing down the loading process?
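
For anyone who wants to reproduce the pattern, per-image timings like the ones above can be collected with a small wrapper. This is only a minimal sketch: `time_fetches`, `slow_fetches`, and the `fetch` callable are hypothetical names for illustration, not part of Metaflow or PyTorch.

```python
import time
from typing import Callable, Iterable, List, Tuple


def time_fetches(
    fetch: Callable[[str], bytes], urls: Iterable[str]
) -> List[Tuple[str, float]]:
    """Call fetch(url) for each URL and record the wall-clock seconds per call."""
    timings = []
    for url in urls:
        start = time.perf_counter()
        fetch(url)  # hypothetical callable that downloads one object
        timings.append((url, time.perf_counter() - start))
    return timings


def slow_fetches(
    timings: List[Tuple[str, float]], factor: float = 10.0
) -> List[Tuple[str, float]]:
    """Return the entries that took more than `factor` times the (upper) median."""
    durations = sorted(seconds for _, seconds in timings)
    if not durations:
        return []
    median = durations[len(durations) // 2]
    return [(url, seconds) for url, seconds in timings if seconds > factor * median]
```

In my case `fetch` would presumably be something like `lambda url: s3.get(url).blob` called inside a single `with S3() as s3:` block, which should show whether the stall comes from the download itself or from somewhere else in the loader.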