Data quality checks before upgrading to V6.0

From IMSMA Wiki

Revision as of 11:15, 3 May 2015

All IMSMA databases are different and all of the steps below are important for getting all countries to a functional V6.0.

Invalid / space coordinates (mandatory)

It was not possible to upgrade to V5.08.04 if there were invalid coordinates in the database. Please run this SQL query anyway, because imports may bypass the application rules. If you get any rows here, there are more SQL queries that you may run to help you identify which object has the invalid coordinates.

MGRS coordinates (mandatory)

In V5.xxx it was possible to enter MGRS in the wrong format. The first query checks for coordinates written with lower-case letters, e.g. 4qfj12345678, and the second query checks for blanks, e.g. 4 QFJ 1234 5678.
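The two format problems described above can also be checked outside the database. The following is a minimal sketch, not the wiki's SQL queries; the function name `mgrs_format_problems` is hypothetical, and it only looks for the two issues named here (lower-case letters and blanks), not for full MGRS validity.

```python
def mgrs_format_problems(value):
    """Return the format problems found in one MGRS string."""
    problems = []
    # First check: coordinates written with lower-case letters.
    if any(ch.islower() for ch in value):
        problems.append("lower-case letters")
    # Second check: coordinates containing blanks.
    if any(ch.isspace() for ch in value):
        problems.append("blanks")
    return problems


for candidate in ["4QFJ12345678", "4qfj12345678", "4 QFJ 1234 5678"]:
    print(candidate, mgrs_format_problems(candidate))
```

An empty list means that neither of the two problems was found in the value.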

Coordinates display rules (mandatory)

In IMSMA NG there are 3 rules on how to handle and display coordinates, based on how they were entered:

  • MGRS,
  • Distance/Bearing,
  • X and Y.

When data has been migrated or imported, it is possible that the column userinputformat in the table geopoint, which controls coordinate display, has not been set correctly. When the display field is set correctly, the columns are highlighted in yellow. If the user tries to edit a row which does not have one of the rules set, the coordinates will not be populated in the Point window. In 5.08.04 this issue only affects the editing of Auxiliary data coordinates and Data Entry Forms in the Workbench.

Example of valid combinations

coordrefsys  coordformat              userinputformat       userenteredcoord
MGRS         MGRS                     MGRS                  18NXL0251208301|en|GB|
MGRS         MGRS                     Bearing and distance  18NXL0251208301|en|GB|
UTM          X/Y                      X and Y               508301|602512|en|GB|
UTM          X/Y                      Bearing and distance  508301|602512|en|GB|
WGS 1984     Decimal Degrees          X and Y               -74.07584|4.59806|en|GB|
WGS 1984     Decimal Degrees          Bearing and distance  -74.07584|4.59806|en|GB|
WGS 1984     Degrees:Decimal Minutes  X and Y               -74:04.55040|004:35.88360|en|GB|
WGS 1984     Degrees:Decimal Minutes  Bearing and distance  -74:04.55040|004:35.88360|en|GB|
WGS 1984     Degrees:Minutes:Seconds  X and Y               -74:4:33.0234|4:35:53.016|en|GB|
WGS 1984     Degrees:Minutes:Seconds  Bearing and distance  -74:4:33.0234|4:35:53.016|en|GB|

The real values of Distance and Bearing are stored in columns other than the ones above, and what is stored in userenteredcoord is in the same format as the first point.

Later in the upgrade process the calculated area/length of polygons/polylines needs to be recalculated, and the upgrade script will split the field userenteredcoord into three different columns:

  • user_entered_x,
  • user_entered_y,
  • user_entered_mgrs.

The three fields coordrefsys, coordformat and userinputformat MUST be in a valid combination AND they MUST be consistent with the value of userenteredcoord e.g.

  • coordrefsys = WGS 84
  • coordformat = Degrees:Minutes:Seconds
  • userinputformat = X and Y

in combination with

  • userenteredcoord = 17.2345|25.5678|en|UK

is not a valid combination, because the value is written in decimal degrees rather than Degrees:Minutes:Seconds (query).

If invalid combinations are not corrected, the area/length recalculation and the database upgrade will not work.
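The two requirements above (a valid combination of the three fields, and a value that matches coordformat) can be sketched as a small check. This is only an illustration under stated assumptions: the set of combinations is copied from the example table, the regular expressions are inferred from the example values in that table and are not an official specification, and the names `VALID_COMBINATIONS`, `TOKEN_PATTERNS` and `check_geopoint` are hypothetical.

```python
import re

# Combinations copied from the "Example of valid combinations" table.
VALID_COMBINATIONS = {
    ("MGRS", "MGRS", "MGRS"),
    ("MGRS", "MGRS", "Bearing and distance"),
    ("UTM", "X/Y", "X and Y"),
    ("UTM", "X/Y", "Bearing and distance"),
    ("WGS 1984", "Decimal Degrees", "X and Y"),
    ("WGS 1984", "Decimal Degrees", "Bearing and distance"),
    ("WGS 1984", "Degrees:Decimal Minutes", "X and Y"),
    ("WGS 1984", "Degrees:Decimal Minutes", "Bearing and distance"),
    ("WGS 1984", "Degrees:Minutes:Seconds", "X and Y"),
    ("WGS 1984", "Degrees:Minutes:Seconds", "Bearing and distance"),
}

# Assumed shape of one coordinate token per coordformat (inferred from the examples).
TOKEN_PATTERNS = {
    "MGRS": re.compile(r"\d{1,2}[A-Z]{3}\d+"),
    "X/Y": re.compile(r"-?\d+(\.\d+)?"),
    "Decimal Degrees": re.compile(r"-?\d+(\.\d+)?"),
    "Degrees:Decimal Minutes": re.compile(r"-?\d+:\d+(\.\d+)?"),
    "Degrees:Minutes:Seconds": re.compile(r"-?\d+:\d+:\d+(\.\d+)?"),
}


def check_geopoint(coordrefsys, coordformat, userinputformat, userenteredcoord):
    """Return a list of problems for one geopoint row (empty list = no problem found)."""
    problems = []
    if (coordrefsys, coordformat, userinputformat) not in VALID_COMBINATIONS:
        problems.append("invalid combination")
    pattern = TOKEN_PATTERNS.get(coordformat)
    if pattern is not None:
        # Assumption: MGRS values carry one coordinate token, all others carry two,
        # followed by language/country tokens such as en|GB|.
        expected = 1 if coordformat == "MGRS" else 2
        tokens = userenteredcoord.split("|")[:expected]
        if len(tokens) < expected or not all(pattern.fullmatch(t) for t in tokens):
            problems.append("value does not match coordformat")
    return problems


print(check_geopoint("WGS 1984", "Degrees:Minutes:Seconds", "X and Y",
                     "17.2345|25.5678|en|UK"))
```

The call at the end mirrors the invalid example above, using the coordrefsys spelling from the table, and reports that the value does not match coordformat.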

Length of Point ID

Point (local) ID should be unique and clearly identify single and polygon points. However, it is possible to type blanks instead of characters/numbers. If blanks have been used, users cannot see the difference when entering/editing the points. If you get length 0, 1 or 2, it is recommended to look into the values of Point ID (query).
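As a rough illustration of this check, the sketch below flags Point IDs that are very short or consist only of blanks. The function name `suspicious_point_ids` and the treatment of NULL (`None`) as length 0 are assumptions for the example, not part of the wiki's query.

```python
def suspicious_point_ids(point_ids, max_length=2):
    """Return the Point IDs that deserve a closer look."""
    flagged = []
    for point_id in point_ids:
        text = point_id or ""  # assumption: treat NULL like an empty string
        # Flag IDs of length 0, 1 or 2, and IDs made up of blanks only.
        if len(text) <= max_length or text.strip() == "":
            flagged.append(point_id)
    return flagged


print(suspicious_point_ids(["P001", "", " ", "12", "     ", None]))
```

A flagged value is not necessarily wrong (a two-character ID may be legitimate); it is only a candidate for manual review.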

Location Point Type

The upgrade script should have changed all points that used Location point to Reference point but it does not. If there are any Location points you need to inform the country that the type has changed to Reference points. These queries will do the update.

Missing approval information (mandatory)

Due to changed behaviour of the migration/import scripts in version 5.08.02 and a bug in 5.08.04, approval information may be missing. Note that Last updated gets updated e.g. when templates are switched and should not be used for setting Approved Date. If the Data Entry Form that is missing Approved Date is an Activity or Education, you might be able to use End date (query).

Column               Old label       New label
reportReceivedDate   Initiated       Data Entry Date
reportCompletedDate  Submitted       Submitted Date
reportVerifiedDate   Approved        Approved Date
dateofreport         Date of Report  Date of Information
dataentrydate        Last updated    (no change)

Missing Date of Information (mandatory)

Date of Information (stored in the table fieldreport) is used when IMSMANG calculates the Summary (formerly Current view). If this data is missing, the result may not be as expected when the Summary is updated. Note that Last updated gets updated e.g. when templates are switched and should NOT be used for setting Date of Information (query). There is also a query for checking whether any Date of Information is in the future, e.g. year 2024.
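The two conditions described here (missing, or in the future) can be sketched as follows. The function name `date_of_information_problem` is hypothetical, and the example assumes the value has already been read from the database as a Python `date` or `None`.

```python
from datetime import date


def date_of_information_problem(dateofreport, today=None):
    """Classify one Date of Information value; None means no problem found."""
    if today is None:
        today = date.today()
    if dateofreport is None:
        return "missing"
    if dateofreport > today:
        return "in the future"
    return None


# Reference date fixed for the example so that the result is reproducible.
reference = date(2015, 5, 3)
for value in [None, date(2014, 12, 31), date(2024, 1, 1)]:
    print(value, date_of_information_problem(value, today=reference))
```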

Missing Local ID (mandatory)

In some databases, Form ID and the items' Local ID have also been missing. The queries do not include Country structure, Organisation, Place, Task and WorkItem.

Data Entry Form Templates

Some countries have many published templates that have never been used, which makes it difficult to know which template to update etc. It is also good to know which templates have been used for data entry in case you need to update or switch them (queries).

DIM categories (mandatory)

If categories have been deleted there will be errors when the upgrade script is applied. As a quick indication of whether there will be errors, the number of categories may be used. There should be at least 52 categories.

These 5 categories for Task and Work Item should exist (queries).

Parent     Category
TASK       GENERAL_INFO
TASK       PLANNING_MONITORING
TASK       UNCATEGORISED
WORK_ITEM  GENERAL_INFO
WORK_ITEM  UNCATEGORISED
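Checking that the five categories listed above exist amounts to a set difference. The sketch below assumes the existing (parent, category) pairs have already been fetched from the database; the names `REQUIRED_CATEGORIES` and `missing_categories` are hypothetical.

```python
# The five (parent, category) pairs listed in the table above.
REQUIRED_CATEGORIES = {
    ("TASK", "GENERAL_INFO"),
    ("TASK", "PLANNING_MONITORING"),
    ("TASK", "UNCATEGORISED"),
    ("WORK_ITEM", "GENERAL_INFO"),
    ("WORK_ITEM", "UNCATEGORISED"),
}


def missing_categories(existing):
    """Return the required (parent, category) pairs that are not present."""
    return sorted(REQUIRED_CATEGORIES - set(existing))


found = [("TASK", "GENERAL_INFO"), ("TASK", "UNCATEGORISED"), ("WORK_ITEM", "GENERAL_INFO")]
print(missing_categories(found))
```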

Number of enumeration categories and values (mandatory)

If the country has deleted standard enums from the table imsmaenum, the upgrade scripts will not give the expected result, which will create a lot of problems. As a quick indication of whether there will be errors, the numbers of categories and values may be used. There should be at least 117 categories and 978 values in the result set of these queries.

Duplicate enumvalues (mandatory)

Duplicate enumvalues cause import problems, see Duplicate Enumvalue. The upgrade scripts will create some duplicates so it is important to know if there were duplicates in 5.08.04 too (query).

Number of translations (mandatory)

If the country has deleted standard translations from the table translation then upgrade scripts will not give the expected result which will create a lot of problems. The result of this query should be at least 1269 (English) translations.
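The count checks in this document share the same pattern: compare a query result with a minimum (52 DIM categories, 117 enumeration categories, 978 enumeration values, 1269 English translations). A minimal sketch of that comparison follows; the dictionary keys and the function name `below_minimum` are hypothetical labels, and the counts themselves would come from running the wiki's queries.

```python
# Minimum counts quoted in this document; the key names are illustrative labels.
MINIMUMS = {
    "dim_categories": 52,
    "enum_categories": 117,
    "enum_values": 978,
    "english_translations": 1269,
}


def below_minimum(counts):
    """Return {name: (actual, minimum)} for every count that is too low."""
    too_low = {}
    for name, minimum in MINIMUMS.items():
        actual = counts.get(name, 0)  # a missing count is treated as 0
        if actual < minimum:
            too_low[name] = (actual, minimum)
    return too_low


print(below_minimum({"dim_categories": 52, "enum_categories": 110,
                     "enum_values": 978, "english_translations": 1300}))
```

An empty result is only a quick indication; a database can meet the minimum counts and still be missing individual standard rows.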

CDF display format (mandatory)

The combination of cdf_datatype in the table customdefinedfield and fieldtype in the table field must be correct. When the data type has been changed directly in the table customdefinedfield or data has been migrated, these combinations might be wrong (query).

Common errors are:

  • display_mechanism for multi select is set to RADIO_BUTTON;
  • no display_mechanism is set for single select fields;
  • wrong fieldtype for multi and single selects;
  • display_mechanism is "" i.e. length is 0.

The combinations in the table below are the correct 5.08.04 values:

fieldtype   display_mechanism  cdf_datatype
cdf         (null)             DATE
cdf         (null)             GAZETTEER
cdf         (null)             MULTI_SELECT
cdf         (null)             NUMBER
cdf         (null)             ORGANISATION
cdf         (null)             PLACE
select_cdf  COMBO_BOX          SINGLE_SELECT
select_cdf  RADIO_BUTTONS      SINGLE_SELECT
cdf         (null)             TEXT_FIELD
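The table of correct values can be used as a lookup when reviewing rows. The sketch below is an illustration only: it represents the database NULL as Python `None`, and the names `VALID_CDF_COMBINATIONS` and `is_valid_cdf_combination` are hypothetical.

```python
# (fieldtype, display_mechanism, cdf_datatype) combinations from the table above.
# None stands for a NULL display_mechanism.
VALID_CDF_COMBINATIONS = {
    ("cdf", None, "DATE"),
    ("cdf", None, "GAZETTEER"),
    ("cdf", None, "MULTI_SELECT"),
    ("cdf", None, "NUMBER"),
    ("cdf", None, "ORGANISATION"),
    ("cdf", None, "PLACE"),
    ("select_cdf", "COMBO_BOX", "SINGLE_SELECT"),
    ("select_cdf", "RADIO_BUTTONS", "SINGLE_SELECT"),
    ("cdf", None, "TEXT_FIELD"),
}


def is_valid_cdf_combination(fieldtype, display_mechanism, cdf_datatype):
    """True if the row matches one of the correct 5.08.04 combinations."""
    return (fieldtype, display_mechanism, cdf_datatype) in VALID_CDF_COMBINATIONS


rows = [
    ("cdf", None, "NUMBER"),                 # correct
    ("cdf", "RADIO_BUTTON", "MULTI_SELECT"),  # multi select with a display mechanism
    ("select_cdf", None, "SINGLE_SELECT"),    # single select without a display mechanism
    ("cdf", "", "TEXT_FIELD"),                # empty string instead of NULL
]
for row in rows:
    print(row, is_valid_cdf_combination(*row))
```

Because the empty string is not the same as NULL, a display_mechanism of "" is reported as invalid, which matches the last of the common errors listed above.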

CDF missing in table field (mandatory)

We have had a few cases where CDFs have been missing in the table field or in the table customdefinedfield (queries).

Duplicate CDFs (mandatory)

In 5.08.04 CDFs were allowed to have the same name and the same parent if they had different data types. This query checks if there are duplicates.

Multi-select CDFs (mandatory)

There is a display issue with imported/migrated Multi-select CDFs in data entry templates. Manually entered data or xml imported data is not affected. Novetta has verified that there is a bug in the code in both 5.08.04 and 6.0 (ICR-114), but it will take time until this is fixed, so we need to add an extra step to all import/migration scripts (query).

CDF never used for Data Entry

Note that a CDF may be included in Data Entry templates, Summary templates, Saved searches, etc. but have no values in the table cdfvalue. This is not an error, but if there are many unused CDFs it is an indication that IM procedures may be improved (query).

Records with empty values in table cdfvalue

The next level of check is whether the value ‘ ‘ has been stored AND whether the columns for all the different data types are NULL. An old bug created empty strings, especially for Auxiliary data CDFs. This bug created unnecessarily many rows in the table cdfvalue and makes it impossible to delete CDFs that have actually never been used for data entry (query).

Duplicated values in allow_value_set

Duplicates may have been caused by migration and import mistakes (query). This could be addressed when CDF values are moved to standard fields after the upgrade.

Invalid date in tables (mandatory)

The date value 0000-00-00 00:00:00 is invalid and has been found in the tables fieldreport and link (query).

Attachment table (mandatory)

If the database was created with an early version of IMSMANG, the name of the column Filedescription might be written without a leading capital F.

Old tables

hazreducmarkinginfo         injury              qamethodinfo            vegetationinfo
hazreducmarkinginfoversion  injuryversion       qamethodinfoversion     vegetationinfoversion
hazreducmethodinfo          markinginfo         suitableforinfo
hazreducmethodinfoversion   markinginfoversion  suitableforinfoversion

If there are old tables like these in the database, there is a high likelihood that the column name is wrong! These tables were used for storing multi-select values before tables like hazard_has_imsmaenum were introduced. The upgrade functionality will not remove these old tables, and it is recommended to delete them manually before the upgrade because they may cause upgrade errors.

Check the table design with Navicat and, if necessary, update the name of the column so that it has a leading capital F (query).

Orphans in Country Structure (mandatory)

If there are orphans in the country structure there will be problems e.g. with creating a staging area database (queries).
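An orphan, in the sense used here, can be pictured as a node whose parent no longer exists. The sketch below illustrates the idea on an in-memory mapping; the function name `find_orphans`, the node ids, and the representation of the country structure as node id to parent id are all assumptions for the example and do not reflect the actual table layout.

```python
def find_orphans(parent_of):
    """Return the node ids whose parent id is not itself a known node.

    parent_of maps node id -> parent id, with None for top-level nodes.
    """
    return sorted(
        node
        for node, parent in parent_of.items()
        if parent is not None and parent not in parent_of
    )


structure = {
    "country": None,
    "province-1": "country",
    "district-1": "province-1",
    "village-9": "district-7",  # district-7 does not exist, so village-9 is an orphan
}
print(find_orphans(structure))
```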

Duplicates in the Country Structure (mandatory)

Having duplicate names (on the same level/node) in the country structure, e.g. two villages called “Berg” in one municipality, creates a lot of problems when importing data and creating statistics. Note that the duplicates are not shown in the GUI (query).
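Finding such duplicates amounts to counting names per parent node. The following sketch shows the idea on a list of (parent, name) pairs; the function name `duplicate_names` and the sample data are hypothetical, and the comparison is exact (case-sensitive), which may differ from how the database compares names.

```python
from collections import Counter


def duplicate_names(rows):
    """Return the (parent, name) pairs that occur more than once."""
    counts = Counter(rows)
    return sorted(pair for pair, count in counts.items() if count > 1)


rows = [
    ("Municipality A", "Berg"),
    ("Municipality A", "Berg"),  # duplicate under the same parent
    ("Municipality B", "Berg"),  # same name, different parent: not a duplicate
    ("Municipality A", "Dal"),
]
print(duplicate_names(rows))
```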

Accident - Unknown device (mandatory)

The field Unknown device is an old Yes/No field from Legacy. This field very often conflicts with or duplicates the values in the table accdeviceinfo, and therefore it was decided to remove it from the database. The upgrade script will update the Type of Accident depending on the values of Unknown Device, see the rules here. Run the query and highlight to the country that Accident type will be updated.

Inactive items (mandatory)

The upgrade functionality should have set inactive items to active. Here you will find SQL that fixes that. You need to notify the country how many were changed.

Data quality queries - Duplicates

The upgrade script will add some of these queries to the database as saved searches. You may run them now as SQL queries or later as Saved searches.

Data quality queries - No geographical data

The upgrade script will add these queries to the database as saved searches. You may run them now as SQL queries or later as Saved searches.

Data quality queries - No links

The upgrade script will add these queries to the database as saved searches. You may run them now as SQL queries or later as Saved searches.

Data quality queries - Different Location

The upgrade script will add these queries to the database as saved searches. You may run them now as SQL queries or later as Saved searches.

Data quality queries - Task

The upgrade script will add these queries to the database as saved searches. You may run them now as SQL queries or later as Saved searches.

Data quality queries - Mine Action quality

The upgrade script will add these queries to the database as saved searches. You may run them now as SQL queries or later as Saved searches. There are more Saved searches added to the database than those mentioned here, but they reference tables/fields that do not exist in 5.08.04.

Delete Data quality queries?

If the country has added the SQL Saved searches developed by GICHD to the database you should delete them since:

  • they use MySQL syntax and will not work after the upgrade;
  • it is confusing for the countries to have both MySQL and PostgreSQL Saved searches;
  • the upgrade script will throw errors if MySQL and PostgreSQL Saved searches have the same names e.g. Accident Duplicate ID.

If the country has reported that they have Saved searches that do not work and/or Saved searches that they cannot delete, then it is recommended to remove them before the upgrade.
