curved-island-17262
03/23/2023, 1:21 AM
DockerTimeoutError
Yesterday, our AWS EBS Burst Balance reached 0% (image 1). Based on this thread and a couple of others, I did the following:
1. Set METAFLOW_DEFAULT_CONTAINER_REGISTRY to public.ecr.aws/docker/library/ (see the sanity-check snippet after the launch template)
2. Moved from gp2 to gp3 with the following custom AWS launch template (AWS EBS):
ebs {
  volume_size           = 100
  delete_on_termination = true
  encrypted             = true
  volume_type           = "gp3"
  iops                  = 3000
  throughput            = 125
}
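For reference, a quick sanity check that should confirm the registry override is actually picked up (this assumes the installed Metaflow version exposes these values in metaflow.metaflow_config; adjust if yours does not):
from metaflow.metaflow_config import (
    DEFAULT_CONTAINER_IMAGE,
    DEFAULT_CONTAINER_REGISTRY,
)

# Both values come from ~/.metaflowconfig/config.json or the METAFLOW_* env vars;
# the registry should print public.ecr.aws/docker/library/ after the change above.
print("registry:", DEFAULT_CONTAINER_REGISTRY)
print("image:   ", DEFAULT_CONTAINER_IMAGE)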
But I am still noticing large StorageWriteBytes from a single pipeline that will probably cause more issues (image 2).
Here is a bit of the code for the pipeline:
from metaflow import FlowSpec, resources, step
# @pip and @download_nlp_libraries are custom decorators defined elsewhere in the repo.

class Flow(FlowSpec):

    @step
    def start(self):
        # self.chunks is populated in code omitted from this snippet.
        self.next(self.process_video, foreach="chunks")

    @resources(memory=8_000)
    @pip(
        libraries={
            "mux-python": "3.7.1",
            "opencv-python-headless": "4.6.0.66",
            "openai": "0.27.0",
            "scikit-learn": "1.2.1",
        }
    )
    @step
    def process_video(self):
        # ... video processing omitted ...
        self.next(self.join)

    @step
    def join(self, inputs):
        self.next(self.extract_keywords)

    @resources(memory=32_000, cpu=4)
    @pip(
        libraries={
            "mux-python": "3.7.1",
            "opencv-python-headless": "4.6.0.66",
            "wheel": "0.38.4",
            "setuptools": "65.6.3",
            "spacy": "3.4.3",
            "nltk": "3.7",
            "keybert": "0.7.0",
            "pytextrank": "3.2.4",
            "rake_nltk": "1.0.6",
            "yake": "0.4.8",
        }
    )
    @download_nlp_libraries()
    @step
    def extract_keywords(self):
        # ... keyword extraction omitted; end step not shown in this excerpt ...
        self.next(self.end)
I noticed that the owner of the Flow uses @pip more than @conda, and even extended it to create a @download_nlp_libraries decorator, which basically installs spaCy and NLTK extensions such as "stopwords", "punkt", "averaged_perceptron_tagger", "wordnet", and "omw-1.4". The pipeline does a lot of processing on video and text data, leveraging third-party APIs.
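To make the disk-write angle concrete, here is my rough reconstruction of the kind of setup @download_nlp_libraries is described as doing (the real decorator isn't shown above, and the spaCy model downloads are left out). Every task that runs the decorated step writes these corpora onto the instance's local, EBS-backed Docker storage:
import nltk

# Corpora named above; assumption: they are fetched at task startup.
NLTK_PACKAGES = [
    "stopwords",
    "punkt",
    "averaged_perceptron_tagger",
    "wordnet",
    "omw-1.4",
]

def fetch_nltk_data(download_dir="/tmp/nltk_data"):  # hypothetical path
    """Download the NLTK corpora; each call writes them to local disk."""
    for pkg in NLTK_PACKAGES:
        nltk.download(pkg, download_dir=download_dir)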
I am struggling to understand what could be writing so much to storage. Is it the use of the NLP libraries via @pip, especially since it's done within a foreach? Could creating a Docker image and using it be a solution to this?
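If it helps clarify what I have in mind by "creating a Docker image", here is a minimal sketch (not the real pipeline; the flow name and image URI are placeholders): bake the @pip packages plus the NLTK/spaCy data into an image pushed to ECR, then point the heavy step at it via @batch(image=...), so tasks stop pip-installing and downloading onto the instance's Docker storage on every run.
from metaflow import FlowSpec, batch, resources, step

class ImageBasedFlow(FlowSpec):  # hypothetical flow, for illustration only

    @step
    def start(self):
        self.next(self.extract_keywords)

    @resources(memory=32_000, cpu=4)
    @batch(image="<account>.dkr.ecr.<region>.amazonaws.com/nlp-pipeline:latest")  # placeholder URI
    @step
    def extract_keywords(self):
        # Dependencies and corpora already live in the image, so no @pip or
        # @download_nlp_libraries work happens at task start.
        self.next(self.end)

    @step
    def end(self):
        pass

if __name__ == "__main__":
    ImageBasedFlow()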