user
05/31/2023, 3:23 AMdb.m5.2xlarge instance. The following query on a 200 million row artifact_v3 takes up to four minutes to run:
SELECT * FROM ( SELECT flow_id,run_number,run_id,step_name,task_id,task_name,name,location,ds_type,sha,type,content_type,user_name,attempt_id,ts_epoch,tags,system_tags FROM artifact_v3 ) T WHERE flow_id = '[REDACTED]' AND run_id = '[REDACTED]' AND ("ds_type" = 'local') LIMIT 1;
While acknowledging that searching 200 million rows would be potentially slow, we’ve noticed via EXPLAIN and EXPLAIN ANALYZE that the query does not actually make use of the indexes defined on artifacts_v3. There is an index:
"artifact_v3_idx_str_ids_primary_key" btree (flow_id, run_id, step_name, task_name, attempt_id, name) WHERE run_id IS NOT NULL AND task_name IS NOT NULL
but for PostgreSQL to use the index we need to add AND task_name IS NOT NULL to the query.
In general, would we ever expect either run_id or task_name to be null?