Difference between revisions of "Staging Area Generator"

From IMSMA Wiki
Jump to: navigation, search
(11 intermediate revisions by 2 users not shown)
Line 1: Line 1:
 
{{TOC right}}
 
{{TOC right}}
The IMSMA Staging Area Generator is a tool that creates a staging area from an {{IMSMANG}} database. It is thus part of the {{IMSMANG}} tool set, but not included in the {{IMSMANG}} installation package. IMSMA Staging Area Generator thus needs to be installed separately. Please contact your [[Management Consulting Team | GICHD IM advisor]] in order to get the latest installation package.
+
The IMSMA Staging Area Generator is a tool that creates a staging area from an {{IMSMANG}} database. It is thus part of the {{IMSMANG}} tool set, but not included in the {{IMSMANG}} installation package. IMSMA Staging Area Generator thus needs to be installed separately. Please contact your [[Information Management Team | GICHD IM advisor]] in order to get the latest installation package.
  
 
{{Note | The Staging Area Generator can only be used with {{IMSMANG}} 6.0 and higher.}}
 
{{Note | The Staging Area Generator can only be used with {{IMSMANG}} 6.0 and higher.}}
 
==Basic Principles==__NOEDITSECTION__
 
==Basic Principles==__NOEDITSECTION__
The Staging Area Generator takes an {{IMSMANG}} database as input and outputs a staging area database, i.e. a flattened version of the IMSMA database, containing only '''approved''' data. The structure of the staging area and a details on the data it contains can be found on the [[IMSMA Staging Area|IMSMA Staging Area]] page.
+
The Staging Area Generator takes an {{IMSMANG}} database as input and outputs a staging area database, i.e. a flattened version of the IMSMA database, containing only '''approved''' data. The structure of the staging area and a details on the data it contains can be found on the [[Staging Area Database| IMSMA Staging Area Database]] page.
==Where to install and use it==__NOEDITSECTION__
 
In terms of the technical architecture for IMSMA and the staging area, there are the following options:
 
* The staging area can be generated on the same server and database server as IMSMA. In this case, the tool is run locally and makes only local connections to the source and target databases.
 
* The staging area can be generated on another server and database server than IMSMA. For example, there could be a dedicated reporting server, in order not to overload the server on which data entry in IMSMA is done. In this case, the Staging Area Generator has to be run on the '''target server'''.
 
== Installing the IMSMA Staging Area Generator==__NOEDITSECTION__
 
Prior to installing the Staging Area Generator
 
<ol>
 
<li>To install the staging area generator (SAG) double-click on the '''etl.exe''' file.</li>
 
<li>The graphical installer then starts. On the first screen, choose the language you would like to use for the installation and click '''OK'''.
 
[[Image:SAGInstallStep1.png|center|150px]]</li>
 
<li>Navigate the Installation panels of the graphical installer as described below:
 
<ol>
 
<li>On the Welcome screen, click '''Next'''.</li>
 
<li>On the Infromation screen, click '''Next'''.</li>
 
<li>We strongly recommend to use the default installation path for the Staging Area Generator. Click '''Next'''.</li>
 
<li>If the target folder (C:\IMSMAETLTool) does not already exist a message will be shown, click '''OK'''. </li>
 
<li>In the packages selection panel, click '''Next'''.</li>
 
<li>Wait for the installation progress bar to finish, then click '''Next'''.</li>
 
<li>When the Setup Shortcuts panel displays, click '''Next'''.</li>
 
<li>When the installation has completed successfully, click '''Done'''.</li>
 
</li>
 
</ol>
 
</ol>
 
After the installation, the following menu entry will be visible in the Start menu &rarr; All Programs &rarr; IMSMA:
 
[[File:Staging_Area_Generator_Icon.png]]<br />
 
  
==Using the IMSMA Staging Area Generator==__NOEDITSECTION__
+
When creating the Staging Area database with the [[Using Staging Area Generator | Staging Area Generator]] is possible to choose:
To start the Staging Area Generator, click on the Staging Area Generator Icon in the Start menu &rarr; All Programs &rarr; IMSMA. This will open the following graphical user interface.
+
* which language to use
[[Image:Staging_Area_Generator_GUI.png|center]]<br />
+
* which Country structure name to use (main or any of the alternative names)
{| class="wikitable" width="1000"
+
* if PostGIS geodata should be created
|-
+
* which projection to use
| align="center" colspan="2" | '''Sections in the Staging Area Generator user interface'''
+
* to do data post-processing.
|-
 
| width="200pt" | '''Section'''
 
| width="800pt" | '''Description'''
 
|-
 
| align="left" | '''1 - Source Database'''
 
| align="left" | This section defines the connection of the source database, i.e. the {{IMSMANG}} database from which a staging area should be generated. If the source and the target databases should be on the same server, the default connection parameters can be kept. If the staging area is generated on another server, then the IP address of the source server hosting the {{IMSMANG}} database has to be provided in the '''Host''' field. Remember to update the password in the '''Password''' field if you have changed the default imsma password on your {{IMSMANG}} installation.
 
|-
 
| align="left" | '''2 - Target Database'''
 
| align="left" | This section defined the connection to the target database, i.e. the staging area database to be created. The parameters to be provided are the following:
 
* '''Host:''' since the SAG tool has to be run on the target server, the value for this field should always be '''localhost'''
 
* '''Port:''' port of your target Postgres database server installation - the default is 5432 (which is also the port on which IMSMA is running on the source)
 
* '''Database:''' this is the name of the staging area database to be created. You can specify any name here, using '''lower-case alphanumeric characters and underscores'''. If a database with the provided name '''does not yet exist''', the SAG tool will create a database with that name on the target database server. If a database with the provided name '''already exists''', the content will be '''dropped''' and '''recreated''' by the SAG tool. If, for any reason, the database cannot be created, an error message will be displayed. For example, if the specified target database name contains upper-case letters, the following error message will be displayed:
 
[[Image:SAG_error_msg_wrong_db_name.png|center|450px]]
 
Contact your GICHD IM advisor if you are unsure how to solve that.
 
* '''User:''' this is the username of the database user that should be the '''owner''' of the staging area database. The default is imsma - however, this will only work if the staging area is created on the same database server on which IMSMA is running. On another, independent, database, there might not be any user named imsma. In that case, another, existing, user can be provided, or the database superuser named ''postgres'' can be used.
 
* '''Password:''' password of the above user.
 
* '''Postgres User:''' the above defined user that was specified as the owner of the staging area database might not have the permissions to actually ''create'' the new staging area database. Therefore, a user with enough permissions has to be provided here. The default is '''postgres''', which is the superuser having all necessary permissions.
 
* '''Postgres Password:''' password of the above user - in case of doubt, leave the default (''password'').
 
|-
 
| align="left" | '''3 - Post Processing SQL Scripts (optional)'''
 
| align="left" | In this optional field it is possible to provide the full path and name to a '''validated SQL file'''. Whatever is specified inside that file will be executed on the staging area database after its generation. This can be valuable in several use cases, for example:
 
* If a staging area with only a subset of the IMSMA data should be generated (in this case the SQL file would include DELETE statements)
 
* If sensitive data (e.g. victim names and addresses) should not be transferred to the staging area (in this case the SQL file would include UPDATE statements to set all the victim names and addresses to NULL)
 
* If data has to be pre-computed for reporting purposes
 
* If additional database views are necessary, also for reporting and analysis purposes
 
Contact your GICHD IM advisor for help in developing SQL statements according to specific requirements.
 
|-
 
| align="left" | '''Validate Selections'''
 
| align="left" | After having provided all required connection details (sections 1 and 2), click on '''Validate Selections'''. The following actions will be performed:
 
* A connection to the source database will be attempted
 
* A connection to the target database will be attempted - if a database with the specified name does not yet exist, it will be created
 
* Some data will be read from the source database in order to populate the '''Additional Options''' section.
 
In case of an error, the related message usually specifies what the exact issue is.
 
|-
 
| align="left" | '''4 - Additional Options'''
 
| align="left" | This section is only enabled after a successful validation of the connection details.
 
* '''Conversion Language:''' specifies the language in which the staging area database should be created. This refers to the translation concept in {{IMSMANG}}. For example, the English values for the status of a Land include ''Open'', ''Worked on'' and ''Closed''. However, in IMSMA translations of these terms into other languages can be defined. Considering this example, the option chosen in the Conversion Language field specifies which value will be written into the status_enum field of the HAZARD table in the staging area. If a translation does not exist in the specified language, it will default to English or, if this is not available, to the IMSMA base term. When translations are heavily used, it would be possible for instance to create one staging area in language 1 and another one in language 2, with the same data but values in different languages.
 
* '''Gazetteer Seq No:''' this option allows to specify the flattening of the country structure into the main object tables, if gazetteer alternative names are used in {{IMSMANG}}. Alternative names in the country structure, if defined, have a sequence number. For example, if for each country structure element two alternative names are defined, the first one has sequence number 0 , the second one sequence number 1. If the standard gazetteer names should be used in the staging area, choose '''None'''. Otherwise, choose the appropriate sequence number.
 
* '''Do Geodata:''' this is a switch to enable/disable the generation of geodata. The staging area can either be created with non-geodata only, or including all the coordinates (points, polylines and polygons), i.e. the data stored in the GEOPOINT, GEOSPATIALINFO and related tables in {{IMSMANG}}. Refer to the section on geodata below for more details.
 
* '''Target SRID:''' when generating geodata in the staging area, the data can be re-projected. The Target SRID allows to define the projection. The default, 4326, corresponds to '''WGS 84'''. For a complete list of available projections, issue the following query on an empty staging area (i.e. the one created after clicking on '''Validate Selections'''):
 
<pre>
 
select *
 
from spatial_ref_sys
 
</pre>
 
|-
 
|}
 
[[Image:Staging_Area_Generator_GUI_options.png|center]]
 
  
Finally, click on '''Start''' to start the staging area generation. In the next window, click on '''Start''' once more:
+
It is easy to [[Scheduling the Staging Area creation | schedule the creation]] of the Staging Area database.
[[Image:Data_Staging_Progreess.png|center|400px]]
 
A log with all the steps is written in this window as the generation process goes on. This might take some time, depending on the size of the source IMSMA database. The range is from very few minutes up to two hours for very big databases. Once the generation is finished, the newly generated staging area database can be explored with a database query tool such as ''pgAdmin3'' or ''Navicat'', and reporting tools can be connected to it.
 
  
==Scheduling the Staging Area creation==__NOEDITSECTION__
 
The above described generation of a staging area needs human interaction. However, once all the options are fixed, there might be the requirement to re-generate the staging area regularly, in order to have an up-to-date reporting database. It is possible to automatize the staging area generation by following these steps:
 
<ol>
 
<li>In the C:\IMSMAETLTool folder, create a .bat script named for example '''runStagingArea.bat''' with the following content:
 
<pre>
 
cd C:\IMSMAETLTool
 
call bin\runnogui.bat
 
</pre>
 
</li>
 
<li>Edit the properties file C:\IMSMAETLTool\conf\config.PROPERTIES by specifying the required values for all the parameters (they correspond to the ones from the graphical user interface described above). If a parameter should not be used, e.g. '''script_file''', then comment it by placing a # at the beginning of the line:
 
<pre>
 
conversion_language=en
 
gazetteer_sequence_level=0
 
#script_file=scripfile.sql
 
</pre>
 
</li>
 
<li>Schedule a task in an operating system's task scheduler. For example, using Windows 7, follow those steps:
 
<ol>
 
<li>Go to Control Panel &rarr; Administrative Tools &rarr; Task Scheduler</li>
 
<li>In the Task Scheduler interface, click on '''Create Basic Task...'''</li>
 
<li>In the Create Basic Task Wizard, provide a name, e.g. '''Staging Area Generation''' and an optional description. Click '''Next'''.</li>
 
<li>In the next window, specify the required recurrence (daily, weekly, etc.) and click '''Next'''.</li>
 
<li>Next, refine recurrence criteria (e.g. time of the day) and click '''Next'''.</li>
 
<li>In the next window, select '''Start a Program''' and click '''Next'''.</li>
 
<li>Browser to the previously created script, e.g. C:\IMSMAETLTool\runStagingArea.bat. Click '''Next''' and then '''Finish'''.</li>
 
</ol>
 
</li>
 
</ol>
 
The staging area generation should now start at the specified time. <br/>
 
[[Image:SAGTaskScheduler.png|center|800px]]
 
It is advised to plan the generation time carefully, i.e. best during the night when no one is accessing and entering data into {{IMSMANG}}, as this might lead to inconsistencies. Ideally, the IMSMA server should be stopped before the staging area generation and then restarted. It also needs to be assured that no one (and no automatic reporting process) is accessing the staging area database at the time it has to be re-created. As long as there are open connections to the database, it cannot be dropped and re-created. Contact your GICHD IM advisor in case of questions.
 
 
{{NavBox Business Intelligence}}
 
{{NavBox Business Intelligence}}
 
[[Category:VIE]]
 
[[Category:VIE]]

Revision as of 19:54, 29 September 2015

The IMSMA Staging Area Generator is a tool that creates a staging area from an IMSMANG database. It is thus part of the IMSMANG tool set, but not included in the IMSMANG installation package. IMSMA Staging Area Generator thus needs to be installed separately. Please contact your GICHD IM advisor in order to get the latest installation package.

Note.jpg The Staging Area Generator can only be used with IMSMANG 6.0 and higher.

Basic Principles

The Staging Area Generator takes an IMSMANG database as input and outputs a staging area database, i.e. a flattened version of the IMSMA database, containing only approved data. The structure of the staging area and a details on the data it contains can be found on the IMSMA Staging Area Database page.

When creating the Staging Area database with the Staging Area Generator is possible to choose:

  • which language to use
  • which Country structure name to use (main or any of the alternative names)
  • if PostGIS geodata should be created
  • which projection to use
  • to do data post-processing.

It is easy to schedule the creation of the Staging Area database.