Difference MySQL PostGreSQL

Revision as of 10:29, 19 June 2014 by Evinek (talk | contribs)
Ambox warning blue construction.png This page is under construction

Contents

Introduction

IMSMANG version 6.0 is building on a PostgreSQL database management system, whereas previous IMSMANG versions were built on a MySQL database. Both systems are relational database systems and using SQL, the Structured Query Language, as means for interacting with it. However, there are some differences between the MySQL and PosrgreSQL syntax. It is not possible to list all differences here, but the main ones encountered in the IMSMANG world are highlighted below. The biggest differences are:

  • PostgreSQL is case sensitive
  • Built-in functions (e.g. conditional expressions)
  • Dates and time formats and queries
  • Custom functions

For more details about the PostgreSQL syntax it is advised to consult the official PostgreSQL documentation at [1].

Note.jpg PostgreSQL complies to ANSI SQL standards, whereas MySQL does not.

Changes in database views from MySQL to PostreSQL

When upgrading from IMSMANG 5.08.04 to 6.0, the database views need to be adapted, i.e. changed from MySQL syntax to PostgreSQL syntax. The country focal point is taking care of this. For reference, this section describes the main changes encountered. This can also be used as a tutorial for creating views in PostgreSQL, for IMSMANG administrators who were used to creating them in MySQL.

Case sensitivity and column names

PostgreSQL, as opposed to MySQL, is case-sensitive. This means that strings need to be provided in exactly the way they are stored in the database. For example, in the table IMSMAENUM, one out of many values in the column ENUMVALUE is Progress Report, with capital P and capital R. In MySQL it was possible to write:

SELECT *
FROM imsmaenum
WHERE enumvalue = 'progress report'

and still get one row as result. In PostgreSQL, the same query will return no result.

Therefore, a create view statement like this:
MySQL

CREATE OR REPLACE VIEW my_view AS
SELECT ...
FROM imsmaenum, ...
WHERE imsmaenum.enumvalue = 'progress report';

needs to be changed to:
PostgreSQL

CREATE OR REPLACE VIEW my_view AS
SELECT ...
FROM imsmaenum, ...
WHERE imsmaenum.enumvalue = 'Progress Report';

Similarly, in MySQL it is possible to define column names with a mix of upper and lower case as well as blanks and other special characters without any trouble. In PostgreSQL, this is also possible, but in this case the column name must be enclosed in double quotes. For example, the following create view statement:
MySQL

CREATE VIEW my_view AS
SELECT hazreducdeviceinfo.hazreduc_guid AS hazreduc_guid,
ordnance.model AS Device Type
FROM hazreducdeviceinfo;

needs to be changed to:
PostgreSQL

CREATE VIEW my_view AS
SELECT hazreducdeviceinfo.hazreduc_guid AS hazreduc_guid,
ordnance.model AS "Device Type"
FROM hazreducdeviceinfo;
Note.jpg This also applies to querying the data. For select rows where the device type is equal to 'xy', the syntax is the following:
 ... WHERE "Device Type" = 'xy'
Note.jpg If an column name in PostgreSQL is written with upper case but not enclosed into double quotes, it will be created in lower case in the database.

CAST AS CHAR CHARSET utf8 and _utf8 (mySQL)

In some (older) MySQL versions, UTF-8 was not the default character set and strings thus needed to be explicitly converted to UTF-8. This is not the case in PostgreSQL and UTF-8 related syntax as used in MySQL does not work in PostgreSQL. Below are two examples encountered in IMSMANG views.

MySQL

AND (`customdefinedfield`.`label` = _utf8'CDF-Area Reduction MDU')

PostgreSQL

AND (customdefinedfield.label = 'CDF-Area Reduction MDU')

MySQL

cast(`geopoint`.`latitude` as char charset utf8) AS `lat`

PostgreSQL

cast(geopoint.latitude as char) AS lat

IF and IFNULL (mySQL) / CASE (PostgreSQL) statements (conditional expressions)

The statements IF() and IFNULL() do not exist in PostgreSQL. In PostgreSQL, the CASE statement is used for conditional expressions. The syntax of the CASE statement is as follows:

CASE WHEN condition THEN result
     [WHEN ...]
     [ELSE result]
END

Therefore, for example the following create view statement:
MySQL

CREATE VIEW `my_view` AS
SELECT
sum(if((`my_view2`.`Device Type` = _utf8'SAA'),`my_view2`.`Quantity`,0)) AS`SAA`,
sum(if((`my_view2`.`Device Type` = _utf8'AP'),`my_view2`.`Quantity`,0)) AS `AP`,
sum(if((`my_view2`.`Device Type` = _utf8'AIED'),`my_view2`.`Quantity`,0)) AS `AIED`,
sum(if((`my_view2`.`Device Type` = _utf8'UXO'),`my_view2`.`Quantity`,0)) AS `UXO`,
sum(if((`my_view2`.`Device Type` = _utf8'AT'),`my_view2`.`Quantity`,0)) AS `AT`,
sum(if((`my_view2`.`Device Type` = _utf8'Fragments'),`my_view2`.`Quantity`,0)) AS `Fragments`
FROM
`my_view2`;

needs to be changed to:
PostgreSQL

CREATE VIEW my_view AS
SELECT
sum(case when my_view2.Device Type = 'SAA') then my_view2.Quantity else 0 end) AS "SAA",
sum(case when my_view2.Device Type = 'AP') then my_view2.Quantity else 0 end) AS "AP",
sum(case when my_view2.Device Type = 'AIED') then my_view2.Quantity else 0 end) AS "AIED",
sum(case when my_view2.Device Type = 'UXO') then my_view2.Quantity else 0 end) AS "UXO",
sum(case when my_view2.Device Type = 'AT') then my_view2.Quantity else 0 end) AS "AT",
sum(case when my_view2.Device Type = 'Fragments') then my_view2.Quantity else 0 end) AS "Fragments"
FROM
my_view2;

The ISNULL() function can also be translated into a CASE statement:

CASE WHEN field IS NULL THEN result
END

Alternatively, the built-in function COALESCE() can be used in PostgreSQL:

SELECT COALESCE(field, 'N/A')
FROM table;

This query will return the value of field if it is not null, or the string N/A if field is null. In general, COALESCE() returns the first value that is not null:

SELECT COALESCE(field1, field2, field3, 'N/A')
FROM table;

If field1 is null, this query will evaluate field2 -- if field2 is null, it will evaluate field3, etc. If all the provided fields are null, it will return the last provided value, in this case the string N/A.

Date manipulations

There are differences between MySQL and PostgreSQL when it comes to manipulating fields of type DATE, such as for example extracting the year or the month of a date. In MySQL, the related functions are year() and month(). In PostgreSQL, there is an extract() function that does the job. See the below example for reference. The following create view statement:
MySQL

CREATE VIEW `my_view` AS
SELECT
(year(`my_view2`.`DateofAccident`) - year(`my_view2`.`DateofBirth`)) AS `Age`,
year(`my_view2`.`DateofAccident`) AS `Year`,
month(`my_view2`.`DateofAccident`) AS `Month`
FROM `my_view2`;

needs to be changed to:
PostgreSQL

CREATE VIEW my_view AS
SELECT
extract(year from my_view2."DateofAccident") - extract(year from my_view2."DateofBirth") AS "Age",
extract(year from my_view2."DateofAccident") AS "Year",
extract(month from my_view2."DateofAccident") AS "Month"
FROM my_view2;

The GROUP BY clause

MySQL has much more "relaxed" rules about using the GROUP BY clause than PostgreSQL has. Therefore, adaptations might need to be done in views using grouping with GROUP BY. The things to consider are the following:

  • PostgreSQL: When GROUP BY is present, it is not valid for the SELECT list expressions to refer to ungrouped columns except within aggregate functions or if the ungrouped column is functionally dependent on the grouped columns (cf. [2]). In other words, when the SELECT part of a query or create view statement contains three columns without aggregation and two with an aggregation function, all three columns without the aggregation must be specified in the GROUP BY clause. This is not the case in MySQL.

An example best illustrates that behavior. Consider this MySQL view from an IMSMANG deployment in a country:

CREATE VIEW `annualreport_view_bac` AS
select `progress`.`hazreduc_guid` AS `ProgressGUID`,
cast(`progress`.`startdate` as date) AS `StartofReportingPeriod`,
cast(`progress`.`enddate` as date) AS `EndofReportingPeriod`,
`progress`.`areasize` AS `AreaCleared`,
sum(if((`annualreport_view_devicesdestroyed`.`Model` = _utf8'SAA'),`annualreport_view_devicesdestroyed`.`Quantity`,0)) AS `SAA`,
sum(if((`annualreport_view_devicesdestroyed`.`Model` = _utf8'AP'),`annualreport_view_devicesdestroyed`.`Quantity`,0)) AS `AP`,
sum(if((`annualreport_view_devicesdestroyed`.`Model` = _utf8'UXO'),`annualreport_view_devicesdestroyed`.`Quantity`,0)) AS `UXO`,
sum(if((`annualreport_view_devicesdestroyed`.`Model` = _utf8'AT'),`annualreport_view_devicesdestroyed`.`Quantity`,0)) AS `AT`,
sum(if((`annualreport_view_devicesdestroyed`.`Model` = _utf8'Fragments'),`annualreport_view_devicesdestroyed`.`Quantity`,0)) AS `Fragments`,
sum(if((`annualreport_view_devicesdestroyed`.`Model` = _utf8'AIED'),`annualreport_view_devicesdestroyed`.`Quantity`,0)) AS `AIED`,
`annualreport_view_clearancetype`.`Clearance Type` AS `ClearanceType`,
`my_view_organization`.`ShortName` AS `Agency`,
`my_view_gazetteer`.`Region` AS `Region`,
`my_view_gazetteer`.`Province` AS `Province`,
`my_view_gazetteer`.`District` AS `District`,
`my_view_gazetteer`.`Village` AS `Village`
from (((((`hazreduc` `Progress`
join `imsmaenum` on((`progress`.`hazreductypeenum_guid` = `imsmaenum`.`imsmaenum_guid`)))
left join `annualreport_view_devicesdestroyed` on((`annualreport_view_devicesdestroyed`.`ProgressGUID` = `progress`.`hazreduc_guid`))) 
join `annualreport_view_clearancetype` on((`annualreport_view_clearancetype`.`Hazreduc_guid` = `progress`.`hazreduc_guid`)))
join `my_view_organization` on((`progress`.`org_guid` = `my_view_organization`.`org_guid`)))
join `my_view_gazetteer` on((`my_view_gazetteer`.`location_guid` = `progress`.`location_guid`)))
where (`imsmaenum`.`enumvalue` = _utf8'progress report')
group by `progress`.`hazreduc_guid`;

Note the following regarding the above code:

  • All the column definitions starting with sum(if... use aggregation functions (in this case sum()).
  • Only one column (hazreduc_guid) is present in the GROUP BY clause, but many columns without an aggregation function are specified in the SELECT clause.

In PostgreSQL, even after performing all changes described above on this page, this view definition will not work. The reason is that all the non-aggregation columns of the SELECT statement need to be referred to in the GROUP BY clause. An exception can only be made for those that are dependent on another attribute that is specified in the GROUP BY. In this example, since the hazreduc_guid is in the GROUP BY and it is the primary key of the table HAZREDUC (renamed to progress in the above query), all other columns of that same table (in this case cast(progress.startdate as date), cast(progress.enddate as date) and progress.areasize can be omitted from the GROUP BY clause. Taking into account all the changes to be made as described on this page, and looking into the GROUP BY clause issue, the following code would be equivalent in PostgreSQL:

CREATE VIEW annualreport_view_bac AS
select progress.hazreduc_guid AS "ProgressGUID",
cast(progress.startdate as date) AS "StartofReportingPeriod",
cast(progress.enddate as date) AS "EndofReportingPeriod",
progress.areasize AS "AreaCleared",
sum(case when annualreport_view_devicesdestroyed.Model = 'SAA' then annualreport_view_devicesdestroyed.Quantity else 0 end) AS "SAA",
sum(case when annualreport_view_devicesdestroyed.Model = 'AP' then annualreport_view_devicesdestroyed.Quantity else 0 end) AS "AP",
sum(case when annualreport_view_devicesdestroyed.Model = 'UXO' then annualreport_view_devicesdestroyed.Quantity else 0 end) AS "UXO",
sum(case when annualreport_view_devicesdestroyed.Model = 'AT' then annualreport_view_devicesdestroyed.Quantity else 0 end) AS "AT",
sum(case when annualreport_view_devicesdestroyed.Model = 'Fragments' then annualreport_view_devicesdestroyed.Quantity else 0 end) AS "Fragments",
sum(case when annualreport_view_devicesdestroyed.Model = 'AIED' then annualreport_view_devicesdestroyed.Quantity else 0 end) AS "AIED",
annualreport_view_clearancetype."Clearance Type" AS "ClearanceType",
my_view_organization.ShortName AS "Agency",
my_view_gazetteer.Region AS "Region",
my_view_gazetteer.Province AS "Province",
my_view_gazetteer.District AS "District",
my_view_gazetteer.Village AS "Village"
from hazreduc progress
join imsmaenum on (progress.hazreductypeenum_guid = imsmaenum.imsmaenum_guid)
left join annualreport_view_devicesdestroyed on (annualreport_view_devicesdestroyed.ProgressGUID = progress.hazreduc_guid)
join annualreport_view_clearancetype on (annualreport_view_clearancetype.Hazreduc_guid = progress.hazreduc_guid)
join my_view_organization on (progress.org_guid = my_view_organization.org_guid)
join my_view_gazetteer on (my_view_gazetteer.location_guid = progress.location_guid)
where (imsmaenum.enumvalue = 'Progress Report')
group by progress.hazreduc_guid,
cast(progress.startdate as date),
cast(progress.enddate as date),
progress.areasize,
annualreport_view_clearancetype."Clearance Type",
my_view_organization.ShortName,
my_view_gazetteer.Region,
my_view_gazetteer.Province,
my_view_gazetteer.District,
my_view_gazetteer.Village;

Note that the columns cast(progress.startdate as date), cast(progress.enddate as date) and progress.areasize in the GROUP BY are optional.

Note.jpg Although the PostgreSQL version can seem more "tedious" here, it is actually more correct and standard-compliant. In fact, it is important to specify all the groupings, otherwise the system (e.g. MySQL) will choose the remaining ones and apply some sorting to do so.

Template:Important

The MySQL way of using the GROUP BY can also be emulated by the DISTINCT ON in PostgreSQL. The MySQL query

SELECT a,b,c,d,e FROM table GROUP BY a

can be translated into the PostgreSQL query

SELECT DISTINCT ON (a) a,b,c,d,e FROM table ORDER BY a,b,c

Postgres Tips & Tricks

The following is a collection of useful PostgreSQL queries using special functions.

String manipulations

Concatenation

SELECT hazard_localid||'----'||hazardname AS full_haz_desc
FROM hazard;

Substring

SELECT DISTINCT substr(hazreduc_localid, 1, 2)
FROM hazreduc;

Rounding numeric values

SELECT round(areasize)
FROM hazard;

SELECT round(areasize, 2)
FROM hazard;

Type conversions

  • to_char
  • to_date
  • to_timestamp
  • to_number
  • cast
SELECT to_char(areasize, '9999999999')
FROM hazard;

SELECT cast(areasize as varchar)
FROM hazard;

Query for duplicates

SELECT hazard_localid, count(*)
FROM hazard
GROUP BY hazard_localid
HAVING count(*) > 1;

WITH clause

Provides a very convenient way to write auxiliary statements for use in a larger query

WITH my_subquery1 AS (
        SELECT *
        FROM a
     ), my_subquery2 AS (
        SELECT *
        FROM b
     )
SELECT *
FROM my_subquery1
WHERE my_subquery1.x IN (SELECT x FROM my_subquery2);

References