Optimizing views in pgSQL
During upgrades to IMSMA v6 (especially in Afghanistan), it has been noted that although the translation of MySQL queries/views into pgSQL queries/views (for details see here) is relatively easy, sometimes views have to be rewritten due to poor performance. This is due to the fact that in each database system the query optimizer works a bit differently, and therefore a query that is fast in one system, might be very slow in another system. Generally, the query optimizer has to make the following decisions:
- Scan method: e.g. sequential scan, (bitmap) index scan
- Join method: e.g. nested loop, hash join, merge join
- Join order
In PostgreSQL, the query execution plan can be viewed using EXPLAIN. For example, the following statement details the query execution plan for a query joining HAZARD, LINK and HAZREDUC:
EXPLAIN SELECT * FROM hazard JOIN link ON (hazard.hazard_guid = link.to_guid) JOIN hazreduc ON (link.from_guid = hazreduc.hazreduc_guid);
This produces the following output:
Hash Join (cost=82524.41..518756.82 rows=87716 width=8756)
Hash Cond: ((link.from_guid)::text = (hazreduc.hazreduc_guid)::text)
-> Hash Join (cost=7586.82..148077.88 rows=224849 width=4125)
Hash Cond: ((link.to_guid)::text = (hazard.hazard_guid)::text)
-> Seq Scan on link (cost=0.00..38279.60 rows=1028260 width=335)
-> Hash (cost=1702.03..1702.03 rows=12303 width=3790)"
-> Seq Scan on hazard (cost=0.00..1702.03 rows=12303 width=3790)
-> Hash (cost=7386.93..7386.93 rows=116293 width=4631)
-> Seq Scan on hazreduc (cost=0.00..7386.93 rows=116293 width=4631)
In some cases, particularly when many tables are joined together, the query optimizer might not choose the best execution plan. In such a case, it can be tried to rewrite the query differently in order to make it faster. Examples and some guiding principles are outlined on his page.
Examples
View my_view_hazardreductions in Afghanistan
Version with a very poor performance (approx. 860 seconds):
CREATE OR REPLACE VIEW my_view_hazardreductions AS SELECT type.enumvalue AS Type, imsmaenum.enumvalue AS Status, hazreduc.hazreduc_guid AS hazreduc_guid, round(hazreduc.areasize,0) AS areasize, hazreduc.enddate AS enddate, areatype.stringvalue AS stringvalue, my_view_gazetteer.Region AS Region, my_view_gazetteer.Province AS Province, my_view_gazetteer.District AS District, hazreduc.destructionmethod AS destructionmethod, areatype.cdf_id AS cdf_id, method_used.enumcategory AS enumcategory, method_used.enumvalue AS method_used, my_view_organization.ShortName AS ShortName FROM hazreduc LEFT JOIN imsmaenum Type ON (type.imsmaenum_guid = hazreduc.hazreductypeenum_guid) LEFT JOIN imsmaenum ON (imsmaenum.imsmaenum_guid = hazreduc.status_guid) JOIN hazreduc_has_cdfvalue ON (hazreduc.hazreduc_guid = hazreduc_has_cdfvalue."HazReduc_GUID") JOIN cdfvalue AreaType ON (hazreduc_has_cdfvalue."CDFValue_GUID" = areatype.cdfvalue_guid) LEFT JOIN location ON (hazreduc.location_guid = location.location_guid) LEFT JOIN my_view_gazetteer ON (location.gazetteer_guid = my_view_gazetteer.gazetteer_guid) LEFT JOIN hazreduc_has_imsmaenum ON (hazreduc.hazreduc_guid = hazreduc_has_imsmaenum.hazreduc_guid) LEFT JOIN imsmaenum method_used ON n (hazreduc_has_imsmaenum.imsmaenum_guid = method_used.imsmaenum_guid) JOIN customdefinedfield ON (customdefinedfield.cdf_id = areatype.cdf_id) LEFT JOIN my_view_organization ON (hazreduc.org_guid = my_view_organization.org_guid) WHERE type.enumvalue = 'Progress Report' AND method_used.enumcategory = 'Clearance Operation' AND customdefinedfield.name = 'CDF-Clearance Type';
Optimised version (approx. 160 seconds):
WITH hazreduc_imsmaenum AS ( SELECT hazreduc.hazreduc_guid, type.enumvalue AS Type, imsmaenum.enumvalue AS Status, round(hazreduc.areasize,0) AS areasize, hazreduc.enddate AS enddate, hazreduc.destructionmethod AS destructionmethod, hazreduc.org_guid, hazreduc.location_guid LEFT JOIN imsmaenum Type on (type.imsmaenum_guid = hazreduc.hazreductypeenum_guid) LEFT JOIN imsmaenum on (imsmaenum.imsmaenum_guid = hazreduc.status_guid) where type.enumvalue = 'Progress Report' ), hazreduc_cdfs AS ( SELECT hazreduc.hazreduc_guid, method_used.enumcategory AS enumcategory, method_used.enumvalue AS method_used, areatype.stringvalue AS stringvalue, areatype.cdf_id AS cdf_id FROM hazreduc JOIN hazreduc_has_cdfvalue on (hazreduc.hazreduc_guid = hazreduc_has_cdfvalue."HazReduc_GUID") JOIN cdfvalue AreaType on (hazreduc_has_cdfvalue."CDFValue_GUID" = areatype.cdfvalue_guid) JOIN customdefinedfield on (areatype.cdf_id = customdefinedfield.cdf_id) LEFT JOIN hazreduc_has_imsmaenum on (hazreduc.hazreduc_guid = hazreduc_has_imsmaenum.hazreduc_guid) LEFT JOIN imsmaenum method_used on (hazreduc_has_imsmaenum.imsmaenum_guid = method_used.imsmaenum_guid) WHERE method_used.enumcategory = 'Clearance Operation' AND customdefinedfield.name = 'CDF-Clearance Type' ) SELECT hazreduc_imsmaenum.Type, hazreduc_imsmaenum.Status, hazreduc_imsmaenum.hazreduc_guid, hazreduc_imsmaenum.areasize, hazreduc_imsmaenum.enddate, hazreduc_cdfs.stringvalue, my_view_gazetteer.Region AS Region, my_view_gazetteer.Province AS Province, my_view_gazetteer.District AS District, hazreduc_imsmaenum.destructionmethod, hazreduc_cdfs.cdf_id, hazreduc_cdfs.enumcategory, hazreduc_cdfs.method_used, my_view_organization.ShortName AS ShortName FROM hazreduc_imsmaenum JOIN my_view_gazetteer on (hazreduc_imsmaenum.location_guid = my_view_gazetteer.location_guid) JOIN hazreduc_cdfs on (hazreduc_imsmaenum.hazreduc_guid = hazreduc_cdfs.hazreduc_guid) LEFT JOIN my_view_organization on (hazreduc_imsmaenum.org_guid = my_view_organization.org_guid);
View annualreport_view_bac in Afghanistan
Version with a very poor performance (more than 15 minutes):
SELECT progress.hazreduc_guid AS "ProgressGUID", cast(progress.startdate as date) AS "StartofReportingPeriod", cast(progress.enddate as date) AS "EndofReportingPeriod", progress.areasize AS "AreaCleared", sum(case when annualreport_view_devicesdestroyed.Model = 'SAA' then annualreport_view_devicesdestroyed.Quantity else 0 end) AS "SAA", sum(case when annualreport_view_devicesdestroyed.Model = 'AP' then annualreport_view_devicesdestroyed.Quantity else 0 end) AS "AP", sum(case when annualreport_view_devicesdestroyed.Model = 'UXO' then annualreport_view_devicesdestroyed.Quantity else 0 end) AS "UXO", sum(case when annualreport_view_devicesdestroyed.Model = 'AT' then annualreport_view_devicesdestroyed.Quantity else 0 end) AS "AT", sum(case when annualreport_view_devicesdestroyed.Model = 'Fragments' then annualreport_view_devicesdestroyed.Quantity else 0 end) AS "Fragments", sum(case when annualreport_view_devicesdestroyed.Model = 'AIED' then annualreport_view_devicesdestroyed.Quantity else 0 end) AS "AIED", annualreport_view_clearancetype."Clearance Type" AS "ClearanceType", my_view_organization.ShortName AS "Agency", my_view_gazetteer.Region AS "Region", my_view_gazetteer.Province AS "Province", my_view_gazetteer.District AS "District", my_view_gazetteer.Village AS "Village" FROM hazreduc progress JOIN imsmaenum on (progress.hazreductypeenum_guid = imsmaenum.imsmaenum_guid) LEFT JOIN annualreport_view_devicesdestroyed on (annualreport_view_devicesdestroyed.ProgressGUID = progress.hazreduc_guid) JOIN annualreport_view_clearancetype on (annualreport_view_clearancetype.Hazreduc_guid = progress.hazreduc_guid) JOIN my_view_organization on (progress.org_guid = my_view_organization.org_guid) JOIN my_view_gazetteer on (my_view_gazetteer.location_guid = progress.location_guid) WHERE (imsmaenum.enumvalue = 'Progress Report') GROUP BY progress.hazreduc_guid, cast(progress.startdate as date), cast(progress.enddate as date), progress.areasize, annualreport_view_clearancetype."Clearance Type", my_view_organization.ShortName, my_view_gazetteer.Region, my_view_gazetteer.Province, my_view_gazetteer.District, my_view_gazetteer.Village;
Optimised version (approx. 15 seconds):
WITH progress as ( SELECT progress.hazreduc_guid AS "ProgressGUID", progress.org_guid, progress.location_guid, cast(progress.startdate as date) AS "StartofReportingPeriod", cast(progress.enddate as date) AS "EndofReportingPeriod", progress.areasize AS "AreaCleared", sum(case when annualreport_view_devicesdestroyed.Model = 'SAA' then annualreport_view_devicesdestroyed.Quantity else 0 end) AS "SAA", sum(case when annualreport_view_devicesdestroyed.Model = 'AP' then annualreport_view_devicesdestroyed.Quantity else 0 end) AS "AP", sum(case when annualreport_view_devicesdestroyed.Model = 'UXO' then annualreport_view_devicesdestroyed.Quantity else 0 end) AS "UXO", sum(case when annualreport_view_devicesdestroyed.Model = 'AT' then annualreport_view_devicesdestroyed.Quantity else 0 end) AS "AT", sum(case when annualreport_view_devicesdestroyed.Model = 'Fragments' then annualreport_view_devicesdestroyed.Quantity else 0 end) AS "Fragments", sum(case when annualreport_view_devicesdestroyed.Model = 'AIED' then annualreport_view_devicesdestroyed.Quantity else 0 end) AS "AIED", annualreport_view_clearancetype."Clearance Type" AS "ClearanceType" FROM hazreduc progress LEFT JOIN annualreport_view_devicesdestroyed on (progress.hazreduc_guid = annualreport_view_devicesdestroyed.ProgressGUID) JOIN annualreport_view_clearancetype on (progress.hazreduc_guid = annualreport_view_clearancetype.Hazreduc_guid) JOIN imsmaenum on (imsmaenum.imsmaenum_guid = progress.hazreductypeenum_guid) WHERE (imsmaenum.enumvalue = 'Progress Report') GROUP BY progress.hazreduc_guid, progress.org_guid, progress.location_guid, cast(progress.startdate as date), cast(progress.enddate as date), progress.areasize, annualreport_view_clearancetype."Clearance Type") SELECT "ProgressGUID", "StartofReportingPeriod", "EndofReportingPeriod", "AreaCleared", "SAA", "AP", "UXO", "AT", "Fragments", "AIED", "ClearanceType", my_view_organization.ShortName AS "Agency", my_view_gazetteer.Region AS "Region", my_view_gazetteer.Province AS "Province", my_view_gazetteer.District AS "District", my_view_gazetteer.Village AS "Village" FROM progress JOIN my_view_organization on (progress.org_guid = my_view_organization.org_guid) JOIN my_view_gazetteer on (progress.location_guid = my_view_gazetteer.location_guid);
Problematic WHERE clause example
In Afghanistan, there is an Access-based reporting system that is building on views created in the IMSMA database. When upgrading to V6, some of the Access queries had a very poor performance. In particular, the following query was affected (it initially took about 30 minutes!):
SELECT districtfree_view_hazardclosed.Region, districtfree_view_hazardclosed.Province, districtfree_view_hazardclosed.District, districtfree_view_hazardclosed.district_guid, Max(districtfree_view_hazardclosed.StatusChangedDate) AS [Last Hazard Closed] FROM districtfree_view_hazardclosed WHERE (((districtfree_view_hazardclosed.Status)="Expired")) GROUP BY districtfree_view_hazardclosed.Region, districtfree_view_hazardclosed.Province, districtfree_view_hazardclosed.District, districtfree_view_hazardclosed.district_guid;
After investigation, it turned out that the WHERE clause (Status = "Expired") was the bottleneck. The following, equivalent query takes approx. 18 seconds:
SELECT districtfree_view_hazardclosed.Region, districtfree_view_hazardclosed.Province, districtfree_view_hazardclosed.District, districtfree_view_hazardclosed.district_guid, Max(districtfree_view_hazardclosed.StatusChangedDate) AS [Last Hazard Closed] FROM districtfree_view_hazardclosed WHERE (((districtfree_view_hazardclosed.Status) <> "Active" AND (districtfree_view_hazardclosed.Status) <> "Transitional")) GROUP BY districtfree_view_hazardclosed.Region, districtfree_view_hazardclosed.Province, districtfree_view_hazardclosed.District, districtfree_view_hazardclosed.district_guid;
| These are Access queries, not pgSQL queries |
In this case, it seemed safe to make the according changes, because it is not very likely that another status will be added to the system (and thus the two where clauses are and will stay equivalent). However, this cannot be generalised to other cases.
Summary of recommendations
Things to check before looking into rewriting the query itself:
- Gather database statistics by running the following command on the database:
ANALYZE;
Freshly updated database statistics can improve the performance of specific queries (especially after big changes - such as after heavy imports - when the automatic statistics gathering process has not yet run).
- Rebuilding the indexes on the affected table(s) by running the following command (specifying the index or table name):
REINDEX { INDEX | TABLE | DATABASE | SYSTEM } name
Rewriting the query:
- Run subsets of the query in order to identify which part is causing the performance problem (isolate joins, remove WHERE clauses, etc.)
- Isolate well performing parts using the WITH clause
- Reformulate WHERE clauses, if possible
- If using (NOT) IN in a query (and this is identified as the problem), try using (NOT) EXISTS instead