microscopic-painting-61536
07/09/2024, 10:39 PM
failed (code 400): {"message": "need to register run_id and task_id first"}
prehistoric-salesclerk-95013
07/09/2024, 10:46 PM
microscopic-painting-61536
07/09/2024, 11:02 PM
prehistoric-salesclerk-95013
07/09/2024, 11:07 PM
microscopic-painting-61536
07/09/2024, 11:10 PM
microscopic-painting-61536
07/09/2024, 11:11 PM
prehistoric-salesclerk-95013
07/09/2024, 11:11 PM
• USE_SEPARATE_READER_POOL set to "True"
• MF_METADATA_DB_READ_REPLICA_HOST set to the host for the readonly connections.
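A minimal sketch of how those two settings could interact, assuming they are read as environment variables. The helper name `resolve_db_hosts`, the fallback to the primary host, and the use of `MF_METADATA_DB_HOST` for the primary are assumptions for illustration, not the service's confirmed behavior.

```python
import os


def resolve_db_hosts(env=None):
    """Hypothetical helper: choose writer and reader hosts from the environment."""
    env = os.environ if env is None else env
    # Assumption: the primary (read/write) host comes from MF_METADATA_DB_HOST.
    writer = env.get("MF_METADATA_DB_HOST", "localhost")
    # Assumption: the flag is treated as on for "True", "true", or "1".
    use_reader = env.get("USE_SEPARATE_READER_POOL", "False") in ("True", "true", "1")
    if use_reader:
        # Readonly connections go to the replica; fall back to the primary if unset.
        reader = env.get("MF_METADATA_DB_READ_REPLICA_HOST", writer)
    else:
        reader = writer
    return {"writer": writer, "reader": reader}
```

With the flag unset, both roles resolve to the same host, so reads and writes compete for the same database.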
prehistoric-salesclerk-95013
07/09/2024, 11:12 PM
MF_METADATA_DB_READ_REPLICA_HOST variable.
prehistoric-salesclerk-95013
07/09/2024, 11:16 PM
microscopic-painting-61536
07/10/2024, 4:42 PM
prehistoric-salesclerk-95013
07/10/2024, 4:54 PM
microscopic-painting-61536
07/10/2024, 9:26 PM
square-wire-39606
07/10/2024, 10:25 PM
ancient-application-36103
07/10/2024, 10:29 PM
ancient-application-36103
07/10/2024, 10:29 PM
prehistoric-salesclerk-95013
07/10/2024, 10:32 PM
square-wire-39606
07/10/2024, 10:34 PM
square-wire-39606
07/10/2024, 10:35 PM
prehistoric-salesclerk-95013
07/10/2024, 10:43 PM
microscopic-painting-61536
07/10/2024, 10:59 PM
microscopic-painting-61536
07/10/2024, 10:59 PM
square-wire-39606
07/11/2024, 6:31 AM
square-wire-39606
07/11/2024, 6:36 AM
microscopic-painting-61536
07/11/2024, 4:55 PM
microscopic-painting-61536
07/11/2024, 4:55 PM
ancient-application-36103
07/11/2024, 5:03 PM
gifted-helicopter-39432
07/11/2024, 9:18 PM
square-wire-39606
07/11/2024, 9:32 PM
gifted-helicopter-39432
07/11/2024, 9:39 PM
ml-ops-metaflow-service-5576458685-9njk7 metaflow-service INFO:aiohttp.access:127.0.0.6 [11/Jul/2024:21:29:53 +0000] "GET /flows/AutoSegmentsInferenceFlow/runs/argo-autosegments.prod.autosegmentsinferenceflow-1720731600 HTTP/1.1" 200 689 "-" "python-requests/2.32.3"
ml-ops-metaflow-service-5576458685-tgxd2 metaflow-service ERROR:AsyncPostgresDB:global:Exception occurred
ml-ops-metaflow-service-5576458685-tgxd2 metaflow-service Traceback (most recent call last):
ml-ops-metaflow-service-5576458685-tgxd2 metaflow-service File "/opt/latest/lib/python3.11/site-packages/aiopg/pool.py", line 317, in _acquire
ml-ops-metaflow-service-5576458685-tgxd2 metaflow-service await self._cond.wait()
ml-ops-metaflow-service-5576458685-tgxd2 metaflow-service File "/usr/local/lib/python3.11/asyncio/locks.py", line 267, in wait
ml-ops-metaflow-service-5576458685-tgxd2 metaflow-service await fut
ml-ops-metaflow-service-5576458685-tgxd2 metaflow-service asyncio.exceptions.CancelledError
ml-ops-metaflow-service-5576458685-tgxd2 metaflow-service
ml-ops-metaflow-service-5576458685-tgxd2 metaflow-service During handling of the above exception, another exception occurred:
ml-ops-metaflow-service-5576458685-tgxd2 metaflow-service
ml-ops-metaflow-service-5576458685-tgxd2 metaflow-service Traceback (most recent call last):
ml-ops-metaflow-service-5576458685-tgxd2 metaflow-service File "/root/services/data/postgres_async_db.py", line 366, in update_row
ml-ops-metaflow-service-5576458685-tgxd2 metaflow-service await self.db.pool.cursor(
ml-ops-metaflow-service-5576458685-tgxd2 metaflow-service File "/opt/latest/lib/python3.11/site-packages/aiopg/pool.py", line 414, in cursor
ml-ops-metaflow-service-5576458685-tgxd2 metaflow-service conn = await self.acquire()
ml-ops-metaflow-service-5576458685-tgxd2 metaflow-service ^^^^^^^^^^^^^^^^^^^^
ml-ops-metaflow-service-5576458685-tgxd2 metaflow-service File "/opt/latest/lib/python3.11/site-packages/aiopg/pool.py", line 307, in _acquire
ml-ops-metaflow-service-5576458685-tgxd2 metaflow-service async with async_timeout.timeout(self._timeout), self._cond:
ml-ops-metaflow-service-5576458685-tgxd2 metaflow-service File "/opt/latest/lib/python3.11/site-packages/async_timeout/__init__.py", line 141, in __aexit__
ml-ops-metaflow-service-5576458685-tgxd2 metaflow-service self._do_exit(exc_type)
ml-ops-metaflow-service-5576458685-tgxd2 metaflow-service File "/opt/latest/lib/python3.11/site-packages/async_timeout/__init__.py", line 228, in _do_exit
ml-ops-metaflow-service-5576458685-tgxd2 metaflow-service raise asyncio.TimeoutError
ml-ops-metaflow-service-5576458685-tgxd2 metaflow-service TimeoutError
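The traceback above ends in a timeout while waiting to acquire a pooled connection. A rough sketch of that failure mode, using only `asyncio` rather than aiopg: `TinyPool`, `query`, and `run_burst` are hypothetical names, and the pool size, hold time, and timeout are made-up values chosen only to show the effect.

```python
import asyncio


class TinyPool:
    """Hypothetical stand-in for a fixed-size connection pool."""

    def __init__(self, size, timeout):
        self._slots = asyncio.Semaphore(size)
        self._timeout = timeout

    async def acquire(self):
        # Wait for a free slot; give up once the timeout elapses.
        await asyncio.wait_for(self._slots.acquire(), self._timeout)

    def release(self):
        self._slots.release()


async def query(pool, hold_seconds):
    # Hold a slot for hold_seconds to imitate a slow database call.
    await pool.acquire()
    try:
        await asyncio.sleep(hold_seconds)
        return "ok"
    finally:
        pool.release()


async def run_burst(n_queries, pool_size, hold_seconds, timeout):
    # Fire n_queries at once and count successes versus acquire timeouts.
    pool = TinyPool(pool_size, timeout)
    results = await asyncio.gather(
        *(query(pool, hold_seconds) for _ in range(n_queries)),
        return_exceptions=True,
    )
    ok = sum(r == "ok" for r in results)
    timed_out = sum(isinstance(r, asyncio.TimeoutError) for r in results)
    return ok, timed_out
```

Calling `asyncio.run(run_burst(4, 1, 0.2, 0.05))` sends four requests at a one-slot pool; the callers that cannot get a slot within the timeout fail in the same way as the `update_row` call in the log.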
It's simply one inference flow causing the spike. Its normal dataset has increased by about 10x over the last two months.
ancient-application-36103
07/11/2024, 9:41 PM
ancient-application-36103
07/11/2024, 9:42 PM
gifted-helicopter-39432
07/11/2024, 9:49 PM