Optimizing views in pgSQL: Difference between revisions

Latest revision as of 20:28, 20 February 2020

During upgrades to IMSMA v6 (especially in Afghanistan), it has been noted that although the translation of MySQL queries/views into pgSQL queries/views (for details see here) is relatively easy, sometimes views have to be rewritten due to poor performance. This is due to the fact that in each database system (e.g. MySQL vs. PostgreSQL vs. SQL Server) the query optimizer works a bit differently, and therefore a query that is fast in one system, might be very slow in another system. Generally, the query optimizer has to make the following decisions:

Scan method: e.g. sequential scan, (bitmap) index scan
Join method: e.g. nested loop, hash join, merge join
Join order

In PostgreSQL, the query execution plan can be viewed using EXPLAIN. For example, the following statement details the query execution plan for a query joining HAZARD, LINK and HAZREDUC:

EXPLAIN
SELECT *
FROM hazard
JOIN link ON (hazard.hazard_guid = link.to_guid)
JOIN hazreduc ON (link.from_guid = hazreduc.hazreduc_guid);

This produces the following output:

Hash Join  (cost=82524.41..518756.82 rows=87716 width=8756)
  Hash Cond: ((link.from_guid)::text = (hazreduc.hazreduc_guid)::text)
  ->  Hash Join  (cost=7586.82..148077.88 rows=224849 width=4125)
        Hash Cond: ((link.to_guid)::text = (hazard.hazard_guid)::text)
        ->  Seq Scan on link  (cost=0.00..38279.60 rows=1028260 width=335)
        ->  Hash  (cost=1702.03..1702.03 rows=12303 width=3790)
              ->  Seq Scan on hazard  (cost=0.00..1702.03 rows=12303 width=3790)
  ->  Hash  (cost=7386.93..7386.93 rows=116293 width=4631)
        ->  Seq Scan on hazreduc  (cost=0.00..7386.93 rows=116293 width=4631)

In some cases, particularly when many tables are joined together, the query optimizer might not choose the best execution plan. In such a case, it can be tried to rewrite the query differently in order to make it faster. Examples and some guiding principles are outlined on his page.

Examples

View my_view_hazardreductions in Afghanistan

Version with a very poor performance (approx. 860 seconds / approx. 86'000 rows):

SELECT type.enumvalue AS Type,
imsmaenum.enumvalue AS Status,
hazreduc.hazreduc_guid AS hazreduc_guid,
round(hazreduc.areasize,0) AS areasize,
hazreduc.enddate AS enddate,
areatype.stringvalue AS stringvalue,
my_view_gazetteer.Region AS Region,
my_view_gazetteer.Province AS Province,
my_view_gazetteer.District AS District,
hazreduc.destructionmethod AS destructionmethod,
areatype.cdf_id AS cdf_id,
method_used.enumcategory AS enumcategory,
method_used.enumvalue AS method_used,
my_view_organization.ShortName AS ShortName
FROM hazreduc
LEFT JOIN imsmaenum Type ON (type.imsmaenum_guid = hazreduc.hazreductypeenum_guid)
LEFT JOIN imsmaenum ON (imsmaenum.imsmaenum_guid = hazreduc.status_guid)
JOIN hazreduc_has_cdfvalue ON (hazreduc.hazreduc_guid = hazreduc_has_cdfvalue."HazReduc_GUID")
JOIN cdfvalue AreaType ON (hazreduc_has_cdfvalue."CDFValue_GUID" = areatype.cdfvalue_guid)
LEFT JOIN location ON (hazreduc.location_guid = location.location_guid)
LEFT JOIN my_view_gazetteer ON (location.gazetteer_guid = my_view_gazetteer.gazetteer_guid)
LEFT JOIN hazreduc_has_imsmaenum ON (hazreduc.hazreduc_guid = hazreduc_has_imsmaenum.hazreduc_guid)
LEFT JOIN imsmaenum method_used ON (hazreduc_has_imsmaenum.imsmaenum_guid = method_used.imsmaenum_guid)
JOIN customdefinedfield ON (customdefinedfield.cdf_id = areatype.cdf_id)
LEFT JOIN my_view_organization ON (hazreduc.org_guid = my_view_organization.org_guid)
WHERE type.enumvalue = 'Progress Report'
AND method_used.enumcategory = 'Clearance Operation'
AND customdefinedfield.name = 'CDF-Clearance Type';

Optimised version (approx. 160 seconds / approx. 86'000 rows):

WITH hazreduc_imsmaenum AS (
SELECT hazreduc.hazreduc_guid,
type.enumvalue AS Type,
imsmaenum.enumvalue AS Status,
round(hazreduc.areasize,0) AS areasize,
hazreduc.enddate AS enddate,
hazreduc.destructionmethod AS destructionmethod,
hazreduc.org_guid,
hazreduc.location_guid
FROM hazreduc
LEFT JOIN imsmaenum Type on (type.imsmaenum_guid = hazreduc.hazreductypeenum_guid)
LEFT JOIN imsmaenum on (imsmaenum.imsmaenum_guid = hazreduc.status_guid)
where type.enumvalue = 'Progress Report'
),
hazreduc_cdfs AS (
SELECT hazreduc.hazreduc_guid,
method_used.enumcategory AS enumcategory,
method_used.enumvalue AS method_used,
areatype.stringvalue AS stringvalue,
areatype.cdf_id AS cdf_id
FROM hazreduc
JOIN hazreduc_has_cdfvalue on (hazreduc.hazreduc_guid = hazreduc_has_cdfvalue."HazReduc_GUID")
JOIN cdfvalue AreaType on (hazreduc_has_cdfvalue."CDFValue_GUID" = areatype.cdfvalue_guid)
JOIN customdefinedfield on (areatype.cdf_id = customdefinedfield.cdf_id)
LEFT JOIN hazreduc_has_imsmaenum on (hazreduc.hazreduc_guid = hazreduc_has_imsmaenum.hazreduc_guid)
LEFT JOIN imsmaenum method_used on (hazreduc_has_imsmaenum.imsmaenum_guid = method_used.imsmaenum_guid)
WHERE method_used.enumcategory = 'Clearance Operation'
AND customdefinedfield.name = 'CDF-Clearance Type'
)
SELECT
hazreduc_imsmaenum.Type,
hazreduc_imsmaenum.Status,
hazreduc_imsmaenum.hazreduc_guid,
hazreduc_imsmaenum.areasize,
hazreduc_imsmaenum.enddate,
hazreduc_cdfs.stringvalue,
my_view_gazetteer.Region AS Region,
my_view_gazetteer.Province AS Province,
my_view_gazetteer.District AS District,
hazreduc_imsmaenum.destructionmethod,
hazreduc_cdfs.cdf_id,
hazreduc_cdfs.enumcategory,
hazreduc_cdfs.method_used,
my_view_organization.ShortName AS ShortName
FROM hazreduc_imsmaenum
JOIN my_view_gazetteer on (hazreduc_imsmaenum.location_guid = my_view_gazetteer.location_guid)
JOIN hazreduc_cdfs on (hazreduc_imsmaenum.hazreduc_guid = hazreduc_cdfs.hazreduc_guid)
LEFT JOIN my_view_organization on (hazreduc_imsmaenum.org_guid = my_view_organization.org_guid);

View annualreport_view_bac in Afghanistan

Version with a very poor performance (more than 15 minutes / approx. 86'000 rows):

SELECT progress.hazreduc_guid AS "ProgressGUID",
cast(progress.startdate as date) AS "StartofReportingPeriod",
cast(progress.enddate as date) AS "EndofReportingPeriod",
progress.areasize AS "AreaCleared",
sum(case when annualreport_view_devicesdestroyed.Model = 'SAA' then annualreport_view_devicesdestroyed.Quantity else 0 end) AS "SAA",
sum(case when annualreport_view_devicesdestroyed.Model = 'AP' then annualreport_view_devicesdestroyed.Quantity else 0 end) AS "AP",
sum(case when annualreport_view_devicesdestroyed.Model = 'UXO' then annualreport_view_devicesdestroyed.Quantity else 0 end) AS "UXO",
sum(case when annualreport_view_devicesdestroyed.Model = 'AT' then annualreport_view_devicesdestroyed.Quantity else 0 end) AS "AT",
sum(case when annualreport_view_devicesdestroyed.Model = 'Fragments' then annualreport_view_devicesdestroyed.Quantity else 0 end) AS "Fragments",
sum(case when annualreport_view_devicesdestroyed.Model = 'AIED' then annualreport_view_devicesdestroyed.Quantity else 0 end) AS "AIED",
annualreport_view_clearancetype."Clearance Type" AS "ClearanceType",
my_view_organization.ShortName AS "Agency",
my_view_gazetteer.Region AS "Region",
my_view_gazetteer.Province AS "Province",
my_view_gazetteer.District AS "District",
my_view_gazetteer.Village AS "Village"
FROM hazreduc progress
JOIN imsmaenum on (progress.hazreductypeenum_guid = imsmaenum.imsmaenum_guid)
LEFT JOIN annualreport_view_devicesdestroyed on (annualreport_view_devicesdestroyed.ProgressGUID = progress.hazreduc_guid)
JOIN annualreport_view_clearancetype on (annualreport_view_clearancetype.Hazreduc_guid = progress.hazreduc_guid)
JOIN my_view_organization on (progress.org_guid = my_view_organization.org_guid)
JOIN my_view_gazetteer on (my_view_gazetteer.location_guid = progress.location_guid)
WHERE (imsmaenum.enumvalue = 'Progress Report')
GROUP BY progress.hazreduc_guid,
cast(progress.startdate as date),
cast(progress.enddate as date),
progress.areasize,
annualreport_view_clearancetype."Clearance Type",
my_view_organization.ShortName,
my_view_gazetteer.Region,
my_view_gazetteer.Province,
my_view_gazetteer.District,
my_view_gazetteer.Village;

Optimised version (approx. 15 seconds / approx. 86'000 rows):

WITH progress as (
SELECT progress.hazreduc_guid AS "ProgressGUID",
progress.org_guid,
progress.location_guid,
cast(progress.startdate as date) AS "StartofReportingPeriod",
cast(progress.enddate as date) AS "EndofReportingPeriod",
progress.areasize AS "AreaCleared",
sum(case when annualreport_view_devicesdestroyed.Model = 'SAA' then annualreport_view_devicesdestroyed.Quantity else 0 end) AS "SAA",
sum(case when annualreport_view_devicesdestroyed.Model = 'AP' then annualreport_view_devicesdestroyed.Quantity else 0 end) AS "AP",
sum(case when annualreport_view_devicesdestroyed.Model = 'UXO' then annualreport_view_devicesdestroyed.Quantity else 0 end) AS "UXO",
sum(case when annualreport_view_devicesdestroyed.Model = 'AT' then annualreport_view_devicesdestroyed.Quantity else 0 end) AS "AT",
sum(case when annualreport_view_devicesdestroyed.Model = 'Fragments' then annualreport_view_devicesdestroyed.Quantity else 0 end) AS "Fragments",
sum(case when annualreport_view_devicesdestroyed.Model = 'AIED' then annualreport_view_devicesdestroyed.Quantity else 0 end) AS "AIED",
annualreport_view_clearancetype."Clearance Type" AS "ClearanceType"
FROM hazreduc progress
LEFT JOIN annualreport_view_devicesdestroyed on (progress.hazreduc_guid = annualreport_view_devicesdestroyed.ProgressGUID)
JOIN annualreport_view_clearancetype on (progress.hazreduc_guid = annualreport_view_clearancetype.Hazreduc_guid)
JOIN imsmaenum on (imsmaenum.imsmaenum_guid = progress.hazreductypeenum_guid)
WHERE (imsmaenum.enumvalue = 'Progress Report')
GROUP BY progress.hazreduc_guid,
progress.org_guid,
progress.location_guid,
cast(progress.startdate as date),
cast(progress.enddate as date),
progress.areasize,
annualreport_view_clearancetype."Clearance Type"
)
SELECT
"ProgressGUID",
"StartofReportingPeriod",
"EndofReportingPeriod",
"AreaCleared",
"SAA",
"AP",
"UXO",
"AT",
"Fragments",
"AIED",
"ClearanceType",
my_view_organization.ShortName AS "Agency",
my_view_gazetteer.Region AS "Region",
my_view_gazetteer.Province AS "Province",
my_view_gazetteer.District AS "District",
my_view_gazetteer.Village AS "Village"
FROM progress
JOIN my_view_organization on (progress.org_guid = my_view_organization.org_guid)
JOIN my_view_gazetteer on (progress.location_guid = my_view_gazetteer.location_guid);

Problematic WHERE clause example

In Afghanistan, there is an Access-based reporting system that is building on views created in the IMSMA database. When upgrading to V6, some of the Access queries had a very poor performance. In particular, the following query was affected (it initially took about 30 minutes!):

	These two queries are in MS Access syntax, not pgSQL syntax. In order to get a working pgSQL syntax, the [] (square brackets) need to be replaced by "" (double quotes) and the "" (double quotes) in the WHERE clause by single quotes

SELECT districtfree_view_hazardclosed.Region,
districtfree_view_hazardclosed.Province,
districtfree_view_hazardclosed.District,
districtfree_view_hazardclosed.district_guid,
Max(districtfree_view_hazardclosed.StatusChangedDate) AS [Last Hazard Closed]
FROM districtfree_view_hazardclosed
WHERE (((districtfree_view_hazardclosed.Status)="Expired"))
GROUP BY districtfree_view_hazardclosed.Region, districtfree_view_hazardclosed.Province, districtfree_view_hazardclosed.District, districtfree_view_hazardclosed.district_guid;

After investigation, it turned out that the WHERE clause (Status = "Expired") was the bottleneck. The following, equivalent query takes approx. 18 seconds:

SELECT districtfree_view_hazardclosed.Region,
districtfree_view_hazardclosed.Province,
districtfree_view_hazardclosed.District,
districtfree_view_hazardclosed.district_guid,
Max(districtfree_view_hazardclosed.StatusChangedDate) AS [Last Hazard Closed]
FROM districtfree_view_hazardclosed
WHERE (((districtfree_view_hazardclosed.Status) <> "Active" AND (districtfree_view_hazardclosed.Status) <> "Transitional"))
GROUP BY districtfree_view_hazardclosed.Region, districtfree_view_hazardclosed.Province, districtfree_view_hazardclosed.District, districtfree_view_hazardclosed.district_guid;

In this case, it seemed safe to make the according changes, because it is not very likely that another status will be added to the system (and thus the two where clauses are and will stay equivalent). However, this cannot be generalised to other cases.

	This does not mean that it is generally better to have negative WHERE clauses instead of positive ones! It is just an example encountered in a specific context, with a specific database and data volume and content. Therefore, all the previous examples are just illustrative and aim at giving ideas on how to optimize a query.

Summary of recommendations

Things to check before looking into rewriting the query itself:

Refresh the database statistics used by the query optimizer (to determine the best or a good query execution plan) by running the following command on the database:

ANALYZE;

There will be no visible output of that command, but the database statistics will be refreshed. Freshly updated database statistics can improve the performance of specific queries (especially after big changes - such as after heavy imports - when the automatic statistics gathering process has not yet run).

Rebuilding the indexes on the table(s) used in the problematic query by running the following command (specifying the index or table name):

REINDEX { INDEX | TABLE | DATABASE | SYSTEM } name

Rewriting the query:

Run subsets of the query in order to identify which part is causing the performance problem (isolate joins, remove WHERE clauses, etc.)
Isolate well performing parts using the WITH clause
Reformulate WHERE clauses, if possible
If using (NOT) IN in a query (and this is identified as the problem), try using (NOT) EXISTS instead

External links

Template:NavBox Hub

Optimizing views in pgSQL: Difference between revisions

Latest revision as of 20:28, 20 February 2020

Contents

Examples

View my_view_hazardreductions in Afghanistan

View annualreport_view_bac in Afghanistan

Problematic WHERE clause example

Summary of recommendations

External links

Navigation menu

@@ Line 1: / Line 1: @@
 {{TOC right}}
-During upgrades to IMSMA v6 (especially in Afghanistan), it has been noted that although the translation of MySQL queries/views into pgSQL queries/views (for details see [[Difference_MySQL_PostGreSQL|here]]) is relatively easy, sometimes views have to be rewritten due to poor performance. This is due to the fact that in each database system the query optimizer works a bit differently, and therefore a query that is fast in one system, might be very slow in another system.
+During upgrades to IMSMA v6 (especially in Afghanistan), it has been noted that although the translation of MySQL queries/views into pgSQL queries/views (for details see [[Difference_MySQL_PostGreSQL|here]]) is relatively easy, sometimes views have to be rewritten due to poor performance. This is due to the fact that in each database system (e.g. MySQL vs. PostgreSQL vs. SQL Server) the query optimizer works a bit differently, and therefore a query that is fast in one system, might be very slow in another system.
 Generally, the query optimizer has to make the following decisions:
 * Scan method: e.g. sequential scan, (bitmap) index scan
@@ Line 21: / Line 21: @@
          Hash Cond: ((link.to_guid)::text = (hazard.hazard_guid)::text)
          ->  Seq Scan on link  (cost=0.00..38279.60 rows=1028260 width=335)
-         ->  Hash  (cost=1702.03..1702.03 rows=12303 width=3790)"
+         ->  Hash  (cost=1702.03..1702.03 rows=12303 width=3790)
                ->  Seq Scan on hazard  (cost=0.00..1702.03 rows=12303 width=3790)
    ->  Hash  (cost=7386.93..7386.93 rows=116293 width=4631)
@@ Line 31: / Line 31: @@
 == Examples ==__NOEDITSECTION__
 === View my_view_hazardreductions in Afghanistan ===__NOEDITSECTION__
-Version with a very poor performance (approx. 860 seconds):
+Version with a very poor performance (approx. 860 seconds / approx. 86'000 rows):
 <pre>
-CREATE OR REPLACE VIEW my_view_hazardreductions AS
 SELECT type.enumvalue AS Type,
 imsmaenum.enumvalue AS Status,
@@ Line 56: / Line 55: @@
 LEFT JOIN my_view_gazetteer ON (location.gazetteer_guid = my_view_gazetteer.gazetteer_guid)
 LEFT JOIN hazreduc_has_imsmaenum ON (hazreduc.hazreduc_guid = hazreduc_has_imsmaenum.hazreduc_guid)
-LEFT JOIN imsmaenum method_used ON n (hazreduc_has_imsmaenum.imsmaenum_guid = method_used.imsmaenum_guid)
+LEFT JOIN imsmaenum method_used ON (hazreduc_has_imsmaenum.imsmaenum_guid = method_used.imsmaenum_guid)
 JOIN customdefinedfield ON (customdefinedfield.cdf_id = areatype.cdf_id)
 LEFT JOIN my_view_organization ON (hazreduc.org_guid = my_view_organization.org_guid)
@@ Line 63: / Line 62: @@
 AND customdefinedfield.name = 'CDF-Clearance Type';
 </pre>
-Optimised version (approx. 160 seconds):
+Optimised version (approx. 160 seconds / approx. 86'000 rows):
 <pre>
 WITH hazreduc_imsmaenum AS (
@@ Line 74: / Line 73: @@
 hazreduc.org_guid,
 hazreduc.location_guid
+FROM hazreduc
 LEFT JOIN imsmaenum Type on (type.imsmaenum_guid = hazreduc.hazreductypeenum_guid)
 LEFT JOIN imsmaenum on (imsmaenum.imsmaenum_guid = hazreduc.status_guid)
@@ Line 115: / Line 115: @@
 === View annualreport_view_bac in Afghanistan ===__NOEDITSECTION__
-Version with a very poor performance (more than 15 minutes):
+Version with a very poor performance (more than 15 minutes / approx. 86'000 rows):
 <pre>
 SELECT progress.hazreduc_guid AS "ProgressGUID",
@@ Line 151: / Line 151: @@
 my_view_gazetteer.Village;
 </pre>
-Optimised version (approx. 15 seconds):
+Optimised version (approx. 15 seconds / approx. 86'000 rows):
 <pre>
 WITH progress as (
@@ Line 178: / Line 178: @@
 cast(progress.enddate as date),
 progress.areasize,
-annualreport_view_clearancetype."Clearance Type")
+annualreport_view_clearancetype."Clearance Type"
+)
 SELECT
 "ProgressGUID",
@@ Line 203: / Line 204: @@
 === Problematic WHERE clause example ===__NOEDITSECTION__
 In Afghanistan, there is an Access-based reporting system that is building on views created in the IMSMA database. When upgrading to V6, some of the Access queries had a very poor performance. In particular, the following query was affected (it initially took about 30 minutes!):
+{{Note|These two queries are in MS Access syntax, not pgSQL syntax. In order to get a working pgSQL syntax, the [] (square brackets) need to be replaced by "" (double quotes) and the "" (double quotes) in the WHERE clause by single quotes}}
 <pre>
-SELECT districtfree_view_hazardclosed.Region, districtfree_view_hazardclosed.Province, districtfree_view_hazardclosed.District, districtfree_view_hazardclosed.district_guid, Max(districtfree_view_hazardclosed.StatusChangedDate) AS [Last Hazard Closed]
+SELECT districtfree_view_hazardclosed.Region,
+districtfree_view_hazardclosed.Province,
+districtfree_view_hazardclosed.District,
+districtfree_view_hazardclosed.district_guid,
+Max(districtfree_view_hazardclosed.StatusChangedDate) AS [Last Hazard Closed]
 FROM districtfree_view_hazardclosed
 WHERE (((districtfree_view_hazardclosed.Status)="Expired"))
@@ Line 211: / Line 217: @@
 After investigation, it turned out that the WHERE clause (Status = "Expired") was the bottleneck. The following, equivalent query takes approx. 18 seconds:
 <pre>
-SELECT districtfree_view_hazardclosed.Region, districtfree_view_hazardclosed.Province, districtfree_view_hazardclosed.District, districtfree_view_hazardclosed.district_guid, Max(districtfree_view_hazardclosed.StatusChangedDate) AS [Last Hazard Closed]
+SELECT districtfree_view_hazardclosed.Region,
+districtfree_view_hazardclosed.Province,
+districtfree_view_hazardclosed.District,
+districtfree_view_hazardclosed.district_guid,
+Max(districtfree_view_hazardclosed.StatusChangedDate) AS [Last Hazard Closed]
 FROM districtfree_view_hazardclosed
 WHERE (((districtfree_view_hazardclosed.Status) <> "Active" AND (districtfree_view_hazardclosed.Status) <> "Transitional"))
@@ Line 223: / Line 233: @@
 == Summary of recommendations ==__NOEDITSECTION__
 '''Things to check before looking into rewriting the query itself:'''
-* Gather database statistics by running the following command on the database:
+* Refresh the database statistics used by the query optimizer (to determine the best or a good query execution plan) by running the following command on the database:
 <pre>
 ANALYZE;
 </pre>
-Freshly updated database statistics can improve the performance of specific queries (especially after big changes - such as after heavy imports - when the automatic statistics gathering process has not yet run).
+There will be no visible output of that command, but the database statistics will be refreshed. Freshly updated database statistics can improve the performance of specific queries (especially after big changes - such as after heavy imports - when the automatic statistics gathering process has not yet run).
-* Rebuilding the indexes on the affected table(s) by running the following command (specifying the index or table name):
+* Rebuilding the indexes on the table(s) used in the problematic query by running the following command (specifying the index or table name):
 <pre>
 REINDEX { INDEX | TABLE | DATABASE | SYSTEM } name
@@ Line 246: / Line 256: @@
 {{NavBox Hub}}
 [[Category:NoPublic]]
-[[Category:VIE]]

Optimizing views in pgSQL: Difference between revisions

Latest revision as of 20:28, 20 February 2020

Examples

View my_view_hazardreductions in Afghanistan

View annualreport_view_bac in Afghanistan

Problematic WHERE clause example

Summary of recommendations

External links

Navigation menu

Search