Contents
Introduction
IMSMANG version 6.0 is building on a PostgreSQL database management system, whereas previous IMSMANG versions were built on a MySQL database. Both systems are relational database systems and using SQL, the Structured Query Language, as means for interacting with it. However, there are some differences between the MySQL and PosrgreSQL syntax. It is not possible to list all differences here, but the main ones encountered in the IMSMANG world are highlighted below. The biggest differences are:
- PostgreSQL is case sensitive
- Built-in functions (e.g. conditional expressions)
- Dates and time formats and queries
- Custom functions
For more details about the PostgreSQL syntax it is advised to consult the official PostgreSQL documentation at [1]. For more details about using SQL statements in IMSMANG import scripts contact your GICHD IM advisor.
PostgreSQL complies to ANSI SQL standards, whereas MySQL does not. |
Most common changes in queries and views from MySQL to PostgreSQL
When upgrading from IMSMANG 5.08.04 to 6.0, the database views need to be adapted, i.e. changed from MySQL syntax to PostgreSQL syntax. The country focal point is taking care of this. For reference, this section describes the main changes encountered. This can also be used as a tutorial for creating views in PostgreSQL, for IMSMANG administrators who were used to creating them in MySQL.
Case sensitivity and column names
PostgreSQL, as opposed to MySQL, is case-sensitive. This means that strings need to be provided in exactly the way they are stored in the database. For example, in the table IMSMAENUM, one out of many values in the column ENUMVALUE is Progress Report, with capital P and capital R. In MySQL it was possible to write:
SELECT * FROM imsmaenum WHERE enumvalue = 'progress report'
and still get one row as result. In PostgreSQL, the same query will return no result.
Therefore, a create view statement like this:
MySQL
CREATE OR REPLACE VIEW my_view AS SELECT ... FROM `imsmaenum`, ... WHERE `imsmaenum`.`enumvalue` = 'progress report';
needs to be changed to:
PostgreSQL
CREATE OR REPLACE VIEW my_view AS SELECT ... FROM imsmaenum, ... WHERE imsmaenum.enumvalue = 'Progress Report';
The single quote ` present in the MySQL queries is an artifact of the tool Navicat from which these queries have been extracted. |
Similarly, in MySQL it is possible to define column names with a mix of upper and lower case as well as blanks and other special characters without any trouble. In PostgreSQL, this is also possible, but in this case the column name must be enclosed in double quotes. For example, the following create view statement:
MySQL
CREATE VIEW my_view AS SELECT `hazreducdeviceinfo`.`hazreduc_guid` AS `hazreduc_guid`, `ordnance`.`model` AS `Device Type` FROM `hazreducdeviceinfo`;
needs to be changed to:
PostgreSQL
CREATE VIEW my_view AS SELECT hazreducdeviceinfo.hazreduc_guid AS hazreduc_guid, ordnance.model AS "Device Type" FROM hazreducdeviceinfo;
This also applies to querying the data. For select rows where the device type is equal to 'xy', the syntax is the following: ... WHERE "Device Type" = 'xy' |
If an column name in PostgreSQL is written with upper case but not enclosed into double quotes, it will be created in lower case in the database. |
CAST AS CHAR CHARSET utf8 and _utf8 (mySQL)
In some (older) MySQL versions, UTF-8 was not the default character set and strings thus needed to be explicitly converted to UTF-8. This is not the case in PostgreSQL and UTF-8 related syntax as used in MySQL does not work in PostgreSQL. Below are two examples encountered in IMSMANG views.
MySQL
AND (`customdefinedfield`.`label` = _utf8'CDF-Area Reduction MDU')
PostgreSQL
AND (customdefinedfield.label = 'CDF-Area Reduction MDU')
MySQL
cast(`geopoint`.`latitude` as char charset utf8) AS `lat`
PostgreSQL
cast(geopoint.latitude as char) AS lat
IF and IFNULL (mySQL) / CASE (PostgreSQL) statements (conditional expressions)
The statements IF() and IFNULL() do not exist in PostgreSQL. In PostgreSQL, the CASE statement is used for conditional expressions. The syntax of the CASE statement is as follows:
CASE WHEN condition THEN result [WHEN ...] [ELSE result] END
Therefore, for example the following create view statement:
MySQL
CREATE VIEW `my_view` AS SELECT sum(if((`my_view2`.`Device Type` = _utf8'SAA'),`my_view2`.`Quantity`,0)) AS`SAA`, sum(if((`my_view2`.`Device Type` = _utf8'AP'),`my_view2`.`Quantity`,0)) AS `AP`, sum(if((`my_view2`.`Device Type` = _utf8'AIED'),`my_view2`.`Quantity`,0)) AS `AIED`, sum(if((`my_view2`.`Device Type` = _utf8'UXO'),`my_view2`.`Quantity`,0)) AS `UXO`, sum(if((`my_view2`.`Device Type` = _utf8'AT'),`my_view2`.`Quantity`,0)) AS `AT`, sum(if((`my_view2`.`Device Type` = _utf8'Fragments'),`my_view2`.`Quantity`,0)) AS `Fragments` FROM `my_view2`;
needs to be changed to:
PostgreSQL
CREATE VIEW my_view AS SELECT sum(case when my_view2.Device Type = 'SAA') then my_view2.Quantity else 0 end) AS "SAA", sum(case when my_view2.Device Type = 'AP') then my_view2.Quantity else 0 end) AS "AP", sum(case when my_view2.Device Type = 'AIED') then my_view2.Quantity else 0 end) AS "AIED", sum(case when my_view2.Device Type = 'UXO') then my_view2.Quantity else 0 end) AS "UXO", sum(case when my_view2.Device Type = 'AT') then my_view2.Quantity else 0 end) AS "AT", sum(case when my_view2.Device Type = 'Fragments') then my_view2.Quantity else 0 end) AS "Fragments" FROM my_view2;
The ISNULL() function can also be translated into a CASE statement:
CASE WHEN field IS NULL THEN result END
Alternatively, the built-in function COALESCE() can be used in PostgreSQL:
SELECT COALESCE(field, 'N/A') FROM table;
This query will return the value of field if it is not null, or the string N/A if field is null. In general, COALESCE() returns the first value that is not null:
SELECT COALESCE(field1, field2, field3, 'N/A') FROM table;
If field1 is null, this query will evaluate field2 -- if field2 is null, it will evaluate field3, etc. If all the provided fields are null, it will return the last provided value, in this case the string N/A.
Date manipulations
There are differences between MySQL and PostgreSQL when it comes to manipulating fields of type DATE, such as for example extracting the year or the month of a date. In MySQL, the related functions are year() and month(). In PostgreSQL, there is an extract() function that does the job. See the below example for reference. The following create view statement:
MySQL
CREATE VIEW `my_view` AS SELECT (year(`my_view2`.`DateofAccident`) - year(`my_view2`.`DateofBirth`)) AS `Age`, year(`my_view2`.`DateofAccident`) AS `Year`, month(`my_view2`.`DateofAccident`) AS `Month` FROM `my_view2`;
needs to be changed to:
PostgreSQL
CREATE VIEW my_view AS SELECT extract(year from my_view2."DateofAccident") - extract(year from my_view2."DateofBirth") AS "Age", extract(year from my_view2."DateofAccident") AS "Year", extract(month from my_view2."DateofAccident") AS "Month" FROM my_view2;
The GROUP BY clause
MySQL has much more "relaxed" rules about using the GROUP BY clause than PostgreSQL has. Therefore, adaptations might need to be done in views using grouping with GROUP BY. The things to consider are the following:
- PostgreSQL: When GROUP BY is present, it is not valid for the SELECT list expressions to refer to ungrouped columns except within aggregate functions or if the ungrouped column is functionally dependent on the grouped columns (cf. [2]). In other words, when the SELECT part of a query or create view statement contains three columns without aggregation and two with an aggregation function, all three columns without the aggregation must be specified in the GROUP BY clause. This is not the case in MySQL.
An example best illustrates that behavior. Consider this MySQL view from an IMSMANG deployment in a country:
CREATE VIEW `annualreport_view_bac` AS select `progress`.`hazreduc_guid` AS `ProgressGUID`, cast(`progress`.`startdate` as date) AS `StartofReportingPeriod`, cast(`progress`.`enddate` as date) AS `EndofReportingPeriod`, `progress`.`areasize` AS `AreaCleared`, sum(if((`annualreport_view_devicesdestroyed`.`Model` = _utf8'SAA'),`annualreport_view_devicesdestroyed`.`Quantity`,0)) AS `SAA`, sum(if((`annualreport_view_devicesdestroyed`.`Model` = _utf8'AP'),`annualreport_view_devicesdestroyed`.`Quantity`,0)) AS `AP`, sum(if((`annualreport_view_devicesdestroyed`.`Model` = _utf8'UXO'),`annualreport_view_devicesdestroyed`.`Quantity`,0)) AS `UXO`, sum(if((`annualreport_view_devicesdestroyed`.`Model` = _utf8'AT'),`annualreport_view_devicesdestroyed`.`Quantity`,0)) AS `AT`, sum(if((`annualreport_view_devicesdestroyed`.`Model` = _utf8'Fragments'),`annualreport_view_devicesdestroyed`.`Quantity`,0)) AS `Fragments`, sum(if((`annualreport_view_devicesdestroyed`.`Model` = _utf8'AIED'),`annualreport_view_devicesdestroyed`.`Quantity`,0)) AS `AIED`, `annualreport_view_clearancetype`.`Clearance Type` AS `ClearanceType`, `my_view_organization`.`ShortName` AS `Agency`, `my_view_gazetteer`.`Region` AS `Region`, `my_view_gazetteer`.`Province` AS `Province`, `my_view_gazetteer`.`District` AS `District`, `my_view_gazetteer`.`Village` AS `Village` from (((((`hazreduc` `Progress` join `imsmaenum` on((`progress`.`hazreductypeenum_guid` = `imsmaenum`.`imsmaenum_guid`))) left join `annualreport_view_devicesdestroyed` on((`annualreport_view_devicesdestroyed`.`ProgressGUID` = `progress`.`hazreduc_guid`))) join `annualreport_view_clearancetype` on((`annualreport_view_clearancetype`.`Hazreduc_guid` = `progress`.`hazreduc_guid`))) join `my_view_organization` on((`progress`.`org_guid` = `my_view_organization`.`org_guid`))) join `my_view_gazetteer` on((`my_view_gazetteer`.`location_guid` = `progress`.`location_guid`))) where (`imsmaenum`.`enumvalue` = _utf8'progress report') group by `progress`.`hazreduc_guid`;
Note the following regarding the above code:
- All the column definitions starting with sum(if... use aggregation functions (in this case sum()).
- Only one column (hazreduc_guid) is present in the GROUP BY clause, but many columns without an aggregation function are specified in the SELECT clause.
In PostgreSQL, even after performing all changes described above on this page, this view definition will not work. The reason is that all the non-aggregation columns of the SELECT statement need to be referred to in the GROUP BY clause. An exception can only be made for those that are dependent on another attribute that is specified in the GROUP BY. In this example, since the hazreduc_guid is in the GROUP BY and it is the primary key of the table HAZREDUC (renamed to progress in the above query), all other columns of that same table (in this case cast(progress.startdate as date), cast(progress.enddate as date) and progress.areasize can be omitted from the GROUP BY clause. Taking into account all the changes to be made as described on this page, and looking into the GROUP BY clause issue, the following code would be equivalent in PostgreSQL:
CREATE VIEW annualreport_view_bac AS select progress.hazreduc_guid AS "ProgressGUID", cast(progress.startdate as date) AS "StartofReportingPeriod", cast(progress.enddate as date) AS "EndofReportingPeriod", progress.areasize AS "AreaCleared", sum(case when annualreport_view_devicesdestroyed.Model = 'SAA' then annualreport_view_devicesdestroyed.Quantity else 0 end) AS "SAA", sum(case when annualreport_view_devicesdestroyed.Model = 'AP' then annualreport_view_devicesdestroyed.Quantity else 0 end) AS "AP", sum(case when annualreport_view_devicesdestroyed.Model = 'UXO' then annualreport_view_devicesdestroyed.Quantity else 0 end) AS "UXO", sum(case when annualreport_view_devicesdestroyed.Model = 'AT' then annualreport_view_devicesdestroyed.Quantity else 0 end) AS "AT", sum(case when annualreport_view_devicesdestroyed.Model = 'Fragments' then annualreport_view_devicesdestroyed.Quantity else 0 end) AS "Fragments", sum(case when annualreport_view_devicesdestroyed.Model = 'AIED' then annualreport_view_devicesdestroyed.Quantity else 0 end) AS "AIED", annualreport_view_clearancetype."Clearance Type" AS "ClearanceType", my_view_organization.ShortName AS "Agency", my_view_gazetteer.Region AS "Region", my_view_gazetteer.Province AS "Province", my_view_gazetteer.District AS "District", my_view_gazetteer.Village AS "Village" from hazreduc progress join imsmaenum on (progress.hazreductypeenum_guid = imsmaenum.imsmaenum_guid) left join annualreport_view_devicesdestroyed on (annualreport_view_devicesdestroyed.ProgressGUID = progress.hazreduc_guid) join annualreport_view_clearancetype on (annualreport_view_clearancetype.Hazreduc_guid = progress.hazreduc_guid) join my_view_organization on (progress.org_guid = my_view_organization.org_guid) join my_view_gazetteer on (my_view_gazetteer.location_guid = progress.location_guid) where (imsmaenum.enumvalue = 'Progress Report') group by progress.hazreduc_guid, cast(progress.startdate as date), cast(progress.enddate as date), progress.areasize, annualreport_view_clearancetype."Clearance Type", my_view_organization.ShortName, my_view_gazetteer.Region, my_view_gazetteer.Province, my_view_gazetteer.District, my_view_gazetteer.Village;
Note that the columns cast(progress.startdate as date), cast(progress.enddate as date) and progress.areasize in the GROUP BY are optional.
The MySQL way of using the GROUP BY can also be emulated by the DISTINCT ON in PostgreSQL. The MySQL query
SELECT a,b,c,d,e FROM table GROUP BY a
can be translated into the PostgreSQL query
SELECT DISTINCT ON (a) a,b,c,d,e FROM table ORDER BY a,b,c
Postgres Tips & Tricks
The following is a collection of useful PostgreSQL queries using special functions.
String manipulations
Concatenation
SELECT hazard_localid||'----'||hazardname AS full_haz_desc FROM hazard;
Substring
SELECT DISTINCT substr(hazreduc_localid, 1, 2) FROM hazreduc;
Rounding numeric values
SELECT round(areasize) FROM hazard; SELECT round(areasize, 2) FROM hazard;
Type conversions
- to_char
- to_date
- to_timestamp
- to_number
- cast
SELECT to_char(areasize, '9999999999') FROM hazard; SELECT cast(areasize as varchar) FROM hazard;
Query for duplicates
SELECT hazard_localid, count(*) FROM hazard GROUP BY hazard_localid HAVING count(*) > 1;
WITH clause
Provides a very convenient way to write auxiliary statements for use in a larger query
WITH my_subquery1 AS ( SELECT * FROM a ), my_subquery2 AS ( SELECT * FROM b ) SELECT * FROM my_subquery1 WHERE my_subquery1.x IN (SELECT x FROM my_subquery2);
References
- PostgreSQL 1.9.4 documentation: http://www.postgresql.org/docs/9.1/static/index.html