Data quality checks before upgrading to V6.0

From IMSMA Wiki
Revision as of 12:32, 11 July 2014 by Rana (talk | contribs)
All IMSMA databases are different and all of the steps below are important for getting all countries to a functional V6.0.

Invalid / space coordinates (mandatory)

It was not possible to upgrade to V5.08.04 if there were invalid coordinates in the database. Run this SQL query anyway, because importing may bypass application rules. If the query returns any rows, there are further SQL queries that will help you identify which object has the invalid coordinates.

MGRS coordinates (mandatory)

In V5.xxx it was possible to enter MGRS coordinates in the wrong format. The first query checks for coordinates written with lower-case letters, e.g. 4qfj12345678, and the second query checks for blanks, e.g. 4 QFJ 1234 5678.
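
The two format problems can also be illustrated outside the database. The following is a minimal Python sketch, not the SQL queries referred to above; it assumes the MGRS strings have already been exported, and the function names are hypothetical. The `normalise` helper shows one possible correction (upper-casing and removing blanks), which is an assumption rather than a documented fix.

```python
def has_lowercase(mgrs: str) -> bool:
    """True if the MGRS string contains any lower-case letter."""
    return any(ch.islower() for ch in mgrs)


def has_blanks(mgrs: str) -> bool:
    """True if the MGRS string contains any whitespace."""
    return any(ch.isspace() for ch in mgrs)


def normalise(mgrs: str) -> str:
    """Assumed correction: drop all whitespace and upper-case the letters."""
    return "".join(mgrs.split()).upper()


# The two problem examples from the text, plus a well-formed value.
for value in ["4qfj12345678", "4 QFJ 1234 5678", "4QFJ12345678"]:
    print(value, has_lowercase(value), has_blanks(value), normalise(value))
```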

Coordinates display rules (mandatory)

In IMSMA NG there are 3 rules on how to handle and display coordinates, based on how they were entered:

  • MGRS
  • Distance/Bearing
  • X and Y

When data has been migrated or imported it is possible that the column userinputformat in the table geopoint, which controls coordinate display, has not been set correctly. When the display field is set correctly the columns are highlighted in yellow. If the user tries to edit a row which does not have one of the rules set, the coordinates will not be populated in the Point window. This issue only affects the editing of Auxiliary data coordinates and Data Entry Forms in the Workbench (query).

The three fields coordrefsys, coordformat and userinputformat must be in a valid combination and must be consistent with the value of userenteredcoord. For example, (WGS 84, Degrees:Minutes:Seconds, X and Y) with 17.2345|25.5678|en|UK is not a valid combination. If these records are not corrected, the area/length recalculation and the upgrade will not work (query).

Length of Point ID

Point (local) ID should be unique and clearly identify single and polygon points. However, it is possible to type blanks instead of characters/numbers. If spaces have been used, users cannot see the difference when entering/editing the points. If you get length 0, 1 or 2 it is recommended to look into the values of Point ID (query).
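
As an illustration of the length check, here is a minimal Python sketch that assumes the Point ID values have been exported as a list of strings (with None for NULL). Trimming whitespace before measuring the length is an assumption made here so that blank-only IDs are reported as length 0; the function names are hypothetical.

```python
from collections import Counter


def trimmed_length(point_id):
    """Length of the ID after removing surrounding whitespace (None counts as 0)."""
    return len((point_id or "").strip())


def length_report(point_ids):
    """Count how many Point IDs there are for each trimmed length."""
    return Counter(trimmed_length(pid) for pid in point_ids)


def suspicious_point_ids(point_ids, max_len=2):
    """Return the IDs whose trimmed length is 0, 1 or 2 (by default)."""
    return [pid for pid in point_ids if trimmed_length(pid) <= max_len]


# Hypothetical sample values.
ids = ["P1", "   ", "", None, "MF-001-01"]
print(length_report(ids))
print(suspicious_point_ids(ids))
```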

Missing approval information (mandatory)

Due to changed behaviour of the migration/import scripts in version 5.08.02 and a bug in 5.08.04, approval information may be missing. Note that Last updated is changed e.g. when templates are switched, so it should not be used for setting Approved Date. If the Data Entry Form that lacks Approved Date is an Activity or Education, you might be able to use End date (query).

Column | Old label | New label
reportReceivedDate | Initiated | Data Entry Date
reportCompletedDate | Submitted | Submitted Date
reportVerifiedDate | Approved | Approved Date
dateofreport | Date of Report | Date of Information
dataentrydate | Last updated | (no change)

Missing Date of Information (mandatory)

Date of Information (stored in table fieldreport) is used when IMSMANG calculates the Summary (formerly Current view). If this data is missing, the result may not be as expected when the Summary is updated. Note that Last updated is changed e.g. when templates are switched, so it should not be used for setting Date of Information (query).

Missing Local ID (mandatory)

In some databases, Form ID and the items' local ID have also been missing. The queries do not include Country structure, Organisation, Place, Task and WorkItem.

Data Entry Form Templates

Some countries have many published templates that have never been used, which makes it difficult to know which template to update. It is also good to know which templates have been used for data entry in case you need to update or switch them (queries).

DIM categories (mandatory)

If categories have been deleted there will be errors when the upgrade script is applied. The number of categories may be used as a quick indication of whether there will be errors. There should be at least 52 categories.

These 5 categories for Task and Work Item should exist (queries).

Parent | Category
TASK | GENERAL_INFO
TASK | PLANNING_MONITORING
TASK | UNCATEGORISED
WORK_ITEM | GENERAL_INFO
WORK_ITEM | UNCATEGORISED

Number of enumeration categories and values (mandatory)

If the country has deleted standard enums from the table imsmaenum, the upgrade scripts will not give the expected result, which will create a lot of problems. The number of categories and values may be used as a quick indication of whether there will be errors. There should be at least 117 categories and 978 values in the result set of these queries.

Duplicate enumvalues (mandatory)

Duplicate enumvalues cause import problems, see Duplicate Enumvalue. The upgrade scripts will create some duplicates so it is important to know if there were duplicates in 5.08.04 too (query).

Number of translations (mandatory)

If the country has deleted standard translations from the table translation, the upgrade scripts will not give the expected result, which will create a lot of problems. The result of this query should be at least 1269 (English) translations.
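
The minimum counts quoted for DIM categories, enumeration categories and values, and translations can be compared with the query results in one step. The following is a minimal Python sketch, assuming the counts have been copied by hand from the query results into a dictionary; the dictionary keys and the function name are hypothetical.

```python
# Minimum counts quoted on this page.
MINIMUMS = {
    "DIM categories": 52,
    "enumeration categories": 117,
    "enumeration values": 978,
    "English translations": 1269,
}


def below_minimum(counts):
    """Return {name: (actual, minimum)} for every count under its minimum.

    A count missing from the input is treated as 0.
    """
    return {
        name: (counts.get(name, 0), minimum)
        for name, minimum in MINIMUMS.items()
        if counts.get(name, 0) < minimum
    }


# Hypothetical query results for one database.
counts = {
    "DIM categories": 52,
    "enumeration categories": 117,
    "enumeration values": 970,
    "English translations": 1269,
}
print(below_minimum(counts))
```

An empty result means that all four counts reach their minimum; it does not prove that every standard row is present.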

CDF display format (mandatory)

The combination of cdf_datatype in table customdefinedfield and fieldtype in table field must be correct. When the data type has been changed directly in table customdefinedfield or data has been migrated, these combinations might be wrong (query).

Common errors are:

  • display_mechanism for multi select is set to RADIO_BUTTON;
  • no display_mechanism is set for single select fields;
  • wrong fieldtype for multi and single selects;
  • display_mechanism is "" i.e. length is 0.

The combinations in the table below are the correct 5.08.04 values.

fieldtype | display_mechanism | cdf_datatype
cdf | (null) | DATE
cdf | (null) | GAZETTEER
cdf | (null) | MULTI_SELECT
cdf | (null) | NUMBER
cdf | (null) | ORGANISATION
cdf | (null) | PLACE
select_cdf | COMBO_BOX | SINGLE_SELECT
select_cdf | RADIO_BUTTONS | SINGLE_SELECT
cdf | (null) | TEXT_FIELD
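
The table can be used directly as a lookup. The sketch below is a minimal Python illustration, assuming the three columns have been exported as one row per CDF, with None standing for NULL. The function name, the sample rows and the extra "name" key are hypothetical.

```python
# Valid (fieldtype, display_mechanism, cdf_datatype) combinations for 5.08.04,
# taken from the table above. None stands for a NULL display_mechanism.
VALID_COMBINATIONS = {
    ("cdf", None, "DATE"),
    ("cdf", None, "GAZETTEER"),
    ("cdf", None, "MULTI_SELECT"),
    ("cdf", None, "NUMBER"),
    ("cdf", None, "ORGANISATION"),
    ("cdf", None, "PLACE"),
    ("select_cdf", "COMBO_BOX", "SINGLE_SELECT"),
    ("select_cdf", "RADIO_BUTTONS", "SINGLE_SELECT"),
    ("cdf", None, "TEXT_FIELD"),
}


def invalid_cdf_rows(rows):
    """Return the rows whose three-column combination is not in the table."""
    return [
        row for row in rows
        if (row["fieldtype"], row["display_mechanism"], row["cdf_datatype"])
        not in VALID_COMBINATIONS
    ]


# Hypothetical sample rows; the "name" key is for illustration only.
rows = [
    {"name": "ok_date", "fieldtype": "cdf",
     "display_mechanism": None, "cdf_datatype": "DATE"},
    {"name": "multi_radio", "fieldtype": "cdf",
     "display_mechanism": "RADIO_BUTTON", "cdf_datatype": "MULTI_SELECT"},
    {"name": "single_none", "fieldtype": "select_cdf",
     "display_mechanism": None, "cdf_datatype": "SINGLE_SELECT"},
    {"name": "empty_string", "fieldtype": "cdf",
     "display_mechanism": "", "cdf_datatype": "TEXT_FIELD"},
]
for row in invalid_cdf_rows(rows):
    print(row["name"])
```

Because an empty string is not the same as None, a display_mechanism of "" is reported as invalid, which matches the last common error listed above.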

CDF missing in table field (mandatory)

We have had a few cases where CDFs have been missing in the table field or in the table customdefinedfield (queries).

Duplicate CDFs (mandatory)

In 5.08.04, CDFs were allowed to have the same name and the same parent if they had different data types. This query checks if there are duplicates.

Multi-select CDFs (mandatory)

There is a display issue with imported/migrated Multi-select CDFs in data entry templates. Manually entered data is not affected. Novetta is currently (2014-06-18) looking into a long-term solution for this. The short-term solution is a database update and to ask the country whether they regularly import/migrate any multi-select CDFs (query).

CDF never used for Data Entry

Note that a CDF may be included in Data Entry templates, Summary templates, Saved searches, etc. but have no values in table cdfvalue. This is not an error, but if there are many unused CDFs it is an indication that IM procedures may be improved (query).

Records with empty values in table cdfvalue

The next level of checking is whether the value ‘ ‘ has been stored AND all the data type columns are NULL. An old bug created empty strings, especially for Auxiliary data CDFs. This bug created unnecessarily many rows in the table cdfvalue and makes it impossible to delete CDFs that have actually never been used for data entry (query).

Duplicated values in allow_value_set

Duplicates may have been caused by migration and import mistakes (query). This could be addressed when CDF values are moved to standard fields after the upgrade.

Invalid date in link table (mandatory)

The date value 0000-00-00 00:00:00 is invalid and has been found in the table link (query).
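
The value is invalid because there is no month 0 or day 0. A minimal Python sketch of the same test, assuming the date values have been exported as text in the format shown above (the function name is hypothetical):

```python
from datetime import datetime


def is_valid_timestamp(text: str) -> bool:
    """True if the text parses as a real calendar date and time."""
    try:
        datetime.strptime(text, "%Y-%m-%d %H:%M:%S")
        return True
    except ValueError:
        # Raised for impossible dates such as month 00 or day 00.
        return False


for value in ["0000-00-00 00:00:00", "2014-07-11 12:32:00"]:
    print(value, is_valid_timestamp(value))
```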

Attachment table (mandatory)

One country had changed the name of the column Filedescription. Note that the name has to start with a capital F. Check this with Navicat and, if necessary, update the name in Table design (query).

Orphans in Country Structure (mandatory)

If there are orphans in the country structure there will be problems e.g. with creating a staging area database (queries).
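
To make the idea concrete, here is a minimal Python sketch. It assumes that an orphan is a node whose parent reference points to a node that does not exist, and that the country structure has been exported as a mapping from node ID to parent ID, with None for the top level. Both the definition and the data layout are assumptions made for illustration.

```python
def find_orphans(parent_of):
    """Return the IDs of nodes whose parent ID is not itself a known node.

    parent_of maps node ID -> parent ID; top-level nodes have parent None.
    """
    return sorted(
        node_id
        for node_id, parent_id in parent_of.items()
        if parent_id is not None and parent_id not in parent_of
    )


# Hypothetical country structure: "V2" refers to a municipality that is missing.
parent_of = {
    "COUNTRY": None,
    "PROV1": "COUNTRY",
    "MUN1": "PROV1",
    "V1": "MUN1",
    "V2": "MUN9",
}
print(find_orphans(parent_of))
```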

Duplicates in the Country Structure (mandatory)

Having duplicate names on the same level/node in the country structure, e.g. two villages called “Berg” in the same municipality, creates a lot of problems when importing data and creating statistics. Note that the duplicates are not shown in the GUI (query).
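
A minimal Python sketch of this check, assuming the country structure has been exported as (parent ID, name) pairs. Ignoring upper/lower case and surrounding blanks when comparing names is an assumption made here, and the function name is hypothetical.

```python
from collections import Counter


def duplicate_names(nodes):
    """Return {(parent_id, normalised_name): count} for names used more than once
    under the same parent."""
    counts = Counter(
        (parent_id, name.strip().casefold()) for parent_id, name in nodes
    )
    return {key: n for key, n in counts.items() if n > 1}


# Hypothetical data: two villages called "Berg" in municipality MUN1,
# and a third "Berg" in MUN2, which is not a duplicate.
nodes = [("MUN1", "Berg"), ("MUN1", "Berg"), ("MUN2", "Berg"), ("MUN1", "Dal")]
print(duplicate_names(nodes))
```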

Accident - Unknown device (mandatory)

The field Unknown device is an old Yes/No field from Legacy. This field very often conflicts with or duplicates the values in table accdeviceinfo, and it was therefore decided to remove it from the database. The upgrade script will update the Type of Accident depending on the values of Unknown Device, see the rules here. Run the query and point out to the country that Accident type will be updated.

Data quality queries - Duplicates

The upgrade script will add some of these queries to the database as saved searches. You may run them now as SQL queries or later as Saved searches.

Data quality queries - No geographical data

The upgrade script will add these queries to the database as saved searches. You may run them now as SQL queries or later as Saved searches.

Data quality queries - No links

The upgrade script will add these queries to the database as saved searches. You may run them now as SQL queries or later as Saved searches.

Data quality queries - Different Location

The upgrade script will add these queries to the database as saved searches. You may run them now as SQL queries or later as Saved searches.

Data quality queries - Task

The upgrade script will add these queries to the database as saved searches. You may run them now as SQL queries or later as Saved searches.

Data quality queries - Mine Action quality

The upgrade script will add these queries to the database as saved searches. You may run them now as SQL queries or later as Saved searches. More Saved Searches than those mentioned here are added to the database, but they reference tables/fields that do not exist in 5.08.04.
