Optimizing views in pgSQL: Difference between revisions
No edit summary |
No edit summary |
||
| (One intermediate revision by one other user not shown) | |||
| Line 31: | Line 31: | ||
== Examples ==__NOEDITSECTION__ | == Examples ==__NOEDITSECTION__ | ||
=== View my_view_hazardreductions in Afghanistan ===__NOEDITSECTION__ | === View my_view_hazardreductions in Afghanistan ===__NOEDITSECTION__ | ||
Version with a very poor performance (approx. 860 seconds / 86' | Version with a very poor performance (approx. 860 seconds / approx. 86'000 rows): | ||
<pre> | <pre> | ||
SELECT type.enumvalue AS Type, | SELECT type.enumvalue AS Type, | ||
| Line 62: | Line 62: | ||
AND customdefinedfield.name = 'CDF-Clearance Type'; | AND customdefinedfield.name = 'CDF-Clearance Type'; | ||
</pre> | </pre> | ||
Optimised version (approx. 160 seconds / 86' | Optimised version (approx. 160 seconds / approx. 86'000 rows): | ||
<pre> | <pre> | ||
WITH hazreduc_imsmaenum AS ( | WITH hazreduc_imsmaenum AS ( | ||
| Line 115: | Line 115: | ||
=== View annualreport_view_bac in Afghanistan ===__NOEDITSECTION__ | === View annualreport_view_bac in Afghanistan ===__NOEDITSECTION__ | ||
Version with a very poor performance (more than 15 minutes): | Version with a very poor performance (more than 15 minutes / approx. 86'000 rows): | ||
<pre> | <pre> | ||
SELECT progress.hazreduc_guid AS "ProgressGUID", | SELECT progress.hazreduc_guid AS "ProgressGUID", | ||
| Line 151: | Line 151: | ||
my_view_gazetteer.Village; | my_view_gazetteer.Village; | ||
</pre> | </pre> | ||
Optimised version (approx. 15 seconds): | Optimised version (approx. 15 seconds / approx. 86'000 rows): | ||
<pre> | <pre> | ||
WITH progress as ( | WITH progress as ( | ||
| Line 256: | Line 256: | ||
{{NavBox Hub}} | {{NavBox Hub}} | ||
[[Category:NoPublic]] | [[Category:NoPublic]] | ||
Latest revision as of 20:28, 20 February 2020
During upgrades to IMSMA v6 (especially in Afghanistan), it has been noted that although the translation of MySQL queries/views into pgSQL queries/views (for details see here) is relatively easy, sometimes views have to be rewritten due to poor performance. This is due to the fact that in each database system (e.g. MySQL vs. PostgreSQL vs. SQL Server) the query optimizer works a bit differently, and therefore a query that is fast in one system, might be very slow in another system. Generally, the query optimizer has to make the following decisions:
- Scan method: e.g. sequential scan, (bitmap) index scan
- Join method: e.g. nested loop, hash join, merge join
- Join order
In PostgreSQL, the query execution plan can be viewed using EXPLAIN. For example, the following statement details the query execution plan for a query joining HAZARD, LINK and HAZREDUC:
EXPLAIN SELECT * FROM hazard JOIN link ON (hazard.hazard_guid = link.to_guid) JOIN hazreduc ON (link.from_guid = hazreduc.hazreduc_guid);
This produces the following output:
Hash Join (cost=82524.41..518756.82 rows=87716 width=8756)
Hash Cond: ((link.from_guid)::text = (hazreduc.hazreduc_guid)::text)
-> Hash Join (cost=7586.82..148077.88 rows=224849 width=4125)
Hash Cond: ((link.to_guid)::text = (hazard.hazard_guid)::text)
-> Seq Scan on link (cost=0.00..38279.60 rows=1028260 width=335)
-> Hash (cost=1702.03..1702.03 rows=12303 width=3790)
-> Seq Scan on hazard (cost=0.00..1702.03 rows=12303 width=3790)
-> Hash (cost=7386.93..7386.93 rows=116293 width=4631)
-> Seq Scan on hazreduc (cost=0.00..7386.93 rows=116293 width=4631)
In some cases, particularly when many tables are joined together, the query optimizer might not choose the best execution plan. In such a case, it can be tried to rewrite the query differently in order to make it faster. Examples and some guiding principles are outlined on his page.
Examples
View my_view_hazardreductions in Afghanistan
Version with a very poor performance (approx. 860 seconds / approx. 86'000 rows):
SELECT type.enumvalue AS Type, imsmaenum.enumvalue AS Status, hazreduc.hazreduc_guid AS hazreduc_guid, round(hazreduc.areasize,0) AS areasize, hazreduc.enddate AS enddate, areatype.stringvalue AS stringvalue, my_view_gazetteer.Region AS Region, my_view_gazetteer.Province AS Province, my_view_gazetteer.District AS District, hazreduc.destructionmethod AS destructionmethod, areatype.cdf_id AS cdf_id, method_used.enumcategory AS enumcategory, method_used.enumvalue AS method_used, my_view_organization.ShortName AS ShortName FROM hazreduc LEFT JOIN imsmaenum Type ON (type.imsmaenum_guid = hazreduc.hazreductypeenum_guid) LEFT JOIN imsmaenum ON (imsmaenum.imsmaenum_guid = hazreduc.status_guid) JOIN hazreduc_has_cdfvalue ON (hazreduc.hazreduc_guid = hazreduc_has_cdfvalue."HazReduc_GUID") JOIN cdfvalue AreaType ON (hazreduc_has_cdfvalue."CDFValue_GUID" = areatype.cdfvalue_guid) LEFT JOIN location ON (hazreduc.location_guid = location.location_guid) LEFT JOIN my_view_gazetteer ON (location.gazetteer_guid = my_view_gazetteer.gazetteer_guid) LEFT JOIN hazreduc_has_imsmaenum ON (hazreduc.hazreduc_guid = hazreduc_has_imsmaenum.hazreduc_guid) LEFT JOIN imsmaenum method_used ON (hazreduc_has_imsmaenum.imsmaenum_guid = method_used.imsmaenum_guid) JOIN customdefinedfield ON (customdefinedfield.cdf_id = areatype.cdf_id) LEFT JOIN my_view_organization ON (hazreduc.org_guid = my_view_organization.org_guid) WHERE type.enumvalue = 'Progress Report' AND method_used.enumcategory = 'Clearance Operation' AND customdefinedfield.name = 'CDF-Clearance Type';
Optimised version (approx. 160 seconds / approx. 86'000 rows):
WITH hazreduc_imsmaenum AS ( SELECT hazreduc.hazreduc_guid, type.enumvalue AS Type, imsmaenum.enumvalue AS Status, round(hazreduc.areasize,0) AS areasize, hazreduc.enddate AS enddate, hazreduc.destructionmethod AS destructionmethod, hazreduc.org_guid, hazreduc.location_guid FROM hazreduc LEFT JOIN imsmaenum Type on (type.imsmaenum_guid = hazreduc.hazreductypeenum_guid) LEFT JOIN imsmaenum on (imsmaenum.imsmaenum_guid = hazreduc.status_guid) where type.enumvalue = 'Progress Report' ), hazreduc_cdfs AS ( SELECT hazreduc.hazreduc_guid, method_used.enumcategory AS enumcategory, method_used.enumvalue AS method_used, areatype.stringvalue AS stringvalue, areatype.cdf_id AS cdf_id FROM hazreduc JOIN hazreduc_has_cdfvalue on (hazreduc.hazreduc_guid = hazreduc_has_cdfvalue."HazReduc_GUID") JOIN cdfvalue AreaType on (hazreduc_has_cdfvalue."CDFValue_GUID" = areatype.cdfvalue_guid) JOIN customdefinedfield on (areatype.cdf_id = customdefinedfield.cdf_id) LEFT JOIN hazreduc_has_imsmaenum on (hazreduc.hazreduc_guid = hazreduc_has_imsmaenum.hazreduc_guid) LEFT JOIN imsmaenum method_used on (hazreduc_has_imsmaenum.imsmaenum_guid = method_used.imsmaenum_guid) WHERE method_used.enumcategory = 'Clearance Operation' AND customdefinedfield.name = 'CDF-Clearance Type' ) SELECT hazreduc_imsmaenum.Type, hazreduc_imsmaenum.Status, hazreduc_imsmaenum.hazreduc_guid, hazreduc_imsmaenum.areasize, hazreduc_imsmaenum.enddate, hazreduc_cdfs.stringvalue, my_view_gazetteer.Region AS Region, my_view_gazetteer.Province AS Province, my_view_gazetteer.District AS District, hazreduc_imsmaenum.destructionmethod, hazreduc_cdfs.cdf_id, hazreduc_cdfs.enumcategory, hazreduc_cdfs.method_used, my_view_organization.ShortName AS ShortName FROM hazreduc_imsmaenum JOIN my_view_gazetteer on (hazreduc_imsmaenum.location_guid = my_view_gazetteer.location_guid) JOIN hazreduc_cdfs on (hazreduc_imsmaenum.hazreduc_guid = hazreduc_cdfs.hazreduc_guid) LEFT JOIN my_view_organization on (hazreduc_imsmaenum.org_guid = my_view_organization.org_guid);
View annualreport_view_bac in Afghanistan
Version with a very poor performance (more than 15 minutes / approx. 86'000 rows):
SELECT progress.hazreduc_guid AS "ProgressGUID", cast(progress.startdate as date) AS "StartofReportingPeriod", cast(progress.enddate as date) AS "EndofReportingPeriod", progress.areasize AS "AreaCleared", sum(case when annualreport_view_devicesdestroyed.Model = 'SAA' then annualreport_view_devicesdestroyed.Quantity else 0 end) AS "SAA", sum(case when annualreport_view_devicesdestroyed.Model = 'AP' then annualreport_view_devicesdestroyed.Quantity else 0 end) AS "AP", sum(case when annualreport_view_devicesdestroyed.Model = 'UXO' then annualreport_view_devicesdestroyed.Quantity else 0 end) AS "UXO", sum(case when annualreport_view_devicesdestroyed.Model = 'AT' then annualreport_view_devicesdestroyed.Quantity else 0 end) AS "AT", sum(case when annualreport_view_devicesdestroyed.Model = 'Fragments' then annualreport_view_devicesdestroyed.Quantity else 0 end) AS "Fragments", sum(case when annualreport_view_devicesdestroyed.Model = 'AIED' then annualreport_view_devicesdestroyed.Quantity else 0 end) AS "AIED", annualreport_view_clearancetype."Clearance Type" AS "ClearanceType", my_view_organization.ShortName AS "Agency", my_view_gazetteer.Region AS "Region", my_view_gazetteer.Province AS "Province", my_view_gazetteer.District AS "District", my_view_gazetteer.Village AS "Village" FROM hazreduc progress JOIN imsmaenum on (progress.hazreductypeenum_guid = imsmaenum.imsmaenum_guid) LEFT JOIN annualreport_view_devicesdestroyed on (annualreport_view_devicesdestroyed.ProgressGUID = progress.hazreduc_guid) JOIN annualreport_view_clearancetype on (annualreport_view_clearancetype.Hazreduc_guid = progress.hazreduc_guid) JOIN my_view_organization on (progress.org_guid = my_view_organization.org_guid) JOIN my_view_gazetteer on (my_view_gazetteer.location_guid = progress.location_guid) WHERE (imsmaenum.enumvalue = 'Progress Report') GROUP BY progress.hazreduc_guid, cast(progress.startdate as date), cast(progress.enddate as date), progress.areasize, annualreport_view_clearancetype."Clearance Type", my_view_organization.ShortName, my_view_gazetteer.Region, my_view_gazetteer.Province, my_view_gazetteer.District, my_view_gazetteer.Village;
Optimised version (approx. 15 seconds / approx. 86'000 rows):
WITH progress as ( SELECT progress.hazreduc_guid AS "ProgressGUID", progress.org_guid, progress.location_guid, cast(progress.startdate as date) AS "StartofReportingPeriod", cast(progress.enddate as date) AS "EndofReportingPeriod", progress.areasize AS "AreaCleared", sum(case when annualreport_view_devicesdestroyed.Model = 'SAA' then annualreport_view_devicesdestroyed.Quantity else 0 end) AS "SAA", sum(case when annualreport_view_devicesdestroyed.Model = 'AP' then annualreport_view_devicesdestroyed.Quantity else 0 end) AS "AP", sum(case when annualreport_view_devicesdestroyed.Model = 'UXO' then annualreport_view_devicesdestroyed.Quantity else 0 end) AS "UXO", sum(case when annualreport_view_devicesdestroyed.Model = 'AT' then annualreport_view_devicesdestroyed.Quantity else 0 end) AS "AT", sum(case when annualreport_view_devicesdestroyed.Model = 'Fragments' then annualreport_view_devicesdestroyed.Quantity else 0 end) AS "Fragments", sum(case when annualreport_view_devicesdestroyed.Model = 'AIED' then annualreport_view_devicesdestroyed.Quantity else 0 end) AS "AIED", annualreport_view_clearancetype."Clearance Type" AS "ClearanceType" FROM hazreduc progress LEFT JOIN annualreport_view_devicesdestroyed on (progress.hazreduc_guid = annualreport_view_devicesdestroyed.ProgressGUID) JOIN annualreport_view_clearancetype on (progress.hazreduc_guid = annualreport_view_clearancetype.Hazreduc_guid) JOIN imsmaenum on (imsmaenum.imsmaenum_guid = progress.hazreductypeenum_guid) WHERE (imsmaenum.enumvalue = 'Progress Report') GROUP BY progress.hazreduc_guid, progress.org_guid, progress.location_guid, cast(progress.startdate as date), cast(progress.enddate as date), progress.areasize, annualreport_view_clearancetype."Clearance Type" ) SELECT "ProgressGUID", "StartofReportingPeriod", "EndofReportingPeriod", "AreaCleared", "SAA", "AP", "UXO", "AT", "Fragments", "AIED", "ClearanceType", my_view_organization.ShortName AS "Agency", my_view_gazetteer.Region AS "Region", my_view_gazetteer.Province AS "Province", my_view_gazetteer.District AS "District", my_view_gazetteer.Village AS "Village" FROM progress JOIN my_view_organization on (progress.org_guid = my_view_organization.org_guid) JOIN my_view_gazetteer on (progress.location_guid = my_view_gazetteer.location_guid);
Problematic WHERE clause example
In Afghanistan, there is an Access-based reporting system that is building on views created in the IMSMA database. When upgrading to V6, some of the Access queries had a very poor performance. In particular, the following query was affected (it initially took about 30 minutes!):
SELECT districtfree_view_hazardclosed.Region, districtfree_view_hazardclosed.Province, districtfree_view_hazardclosed.District, districtfree_view_hazardclosed.district_guid, Max(districtfree_view_hazardclosed.StatusChangedDate) AS [Last Hazard Closed] FROM districtfree_view_hazardclosed WHERE (((districtfree_view_hazardclosed.Status)="Expired")) GROUP BY districtfree_view_hazardclosed.Region, districtfree_view_hazardclosed.Province, districtfree_view_hazardclosed.District, districtfree_view_hazardclosed.district_guid;
After investigation, it turned out that the WHERE clause (Status = "Expired") was the bottleneck. The following, equivalent query takes approx. 18 seconds:
SELECT districtfree_view_hazardclosed.Region, districtfree_view_hazardclosed.Province, districtfree_view_hazardclosed.District, districtfree_view_hazardclosed.district_guid, Max(districtfree_view_hazardclosed.StatusChangedDate) AS [Last Hazard Closed] FROM districtfree_view_hazardclosed WHERE (((districtfree_view_hazardclosed.Status) <> "Active" AND (districtfree_view_hazardclosed.Status) <> "Transitional")) GROUP BY districtfree_view_hazardclosed.Region, districtfree_view_hazardclosed.Province, districtfree_view_hazardclosed.District, districtfree_view_hazardclosed.district_guid;
In this case, it seemed safe to make the according changes, because it is not very likely that another status will be added to the system (and thus the two where clauses are and will stay equivalent). However, this cannot be generalised to other cases.
Summary of recommendations
Things to check before looking into rewriting the query itself:
- Refresh the database statistics used by the query optimizer (to determine the best or a good query execution plan) by running the following command on the database:
ANALYZE;
There will be no visible output of that command, but the database statistics will be refreshed. Freshly updated database statistics can improve the performance of specific queries (especially after big changes - such as after heavy imports - when the automatic statistics gathering process has not yet run).
- Rebuilding the indexes on the table(s) used in the problematic query by running the following command (specifying the index or table name):
REINDEX { INDEX | TABLE | DATABASE | SYSTEM } name
Rewriting the query:
- Run subsets of the query in order to identify which part is causing the performance problem (isolate joins, remove WHERE clauses, etc.)
- Isolate well performing parts using the WITH clause
- Reformulate WHERE clauses, if possible
- If using (NOT) IN in a query (and this is identified as the problem), try using (NOT) EXISTS instead