Generating the Staging Update Procedure

Once a stage table has been defined in the metadata and created in the database, an update procedure can be generated to handle the joining of any tables and the lookup of the dimension table artificial keys.

Note

You can also generate an update procedure via a template, refer to Rebuilding Update Procedures for details.

Generating a Procedure

To generate a procedure, right-click on the stage table in the left pane and select Properties.
From the Update Procedure drop-down list, select (Build Procedure...).
Click OK to update the properties and start the process of generating the new procedure.
A series of prompts are displayed during the procedure generation to join the tables and link the dimensions.

Procedure Type

The first dialog box asks for the type of procedure that is to be generated. An example of this follows:

Note

Default settings for hint and truncate options are set in Tools > Options > Code Generation > Default Update Procedure Options.

Oracle

Tip

Oracle Only

Select Hint: Enter a database hint to be used in the SELECT statement.
Insert Hint: Enter a database hint to be used in the INSERT statement. Factory default value is APPEND.
Update Hint: Enter a database hint to be used in the UPDATE statement.
Truncate Options: Optional Oracle truncate options such as REUSE STORAGE (default value) or DROP STORAGE.

SQL Server

Tip

SQL Server Only

Enter a database hint to be used in the INSERT statement. Factory default value is TABLOCK.
Enter a database hint to be used in the UPDATE statement.

DB2

Tip

DB2 Only
Use Minimal Logging (for set insert): The staging procedure uses the LOAD FROM CURSOR option in DB2 to minimize database logging.

A number of different types of procedure generation are available. Each type is discussed below.
There is a check-box at the bottom of the dialog box. This check-box is only present if advanced procedure building features are enabled.
This check-box allows the editing of the 'Where' clause when no table joining is being performed, and hence the 'Where' clause would not be exposed.

Cursor Based Procedure

This option creates a procedure that loops through each source transaction performing specific tasks on each transaction before writing the transaction to the stage table. The tasks performed include the lookup of each dimension key, and any bespoke value assignments or adjustments.

Sorted Cursor Procedure

This is a variant on the regular cursor procedure. In this case dimension key lookups only occur when there is a change in the business key for a given dimension. This can improve the performance of the procedure particularly when one or more of the dimensions has a low cardinality. Where slowly changing dimensions are used this advantage is lessened as the relative date of the dimension records also comes into play.

Set based procedure

A set based procedure performs one SQL statement to join all the dimensions and source tables together and then insert this data into the stage table. This is normally the fastest method of building the stage table. It is also the least flexible method and is easily prone to incorrect syntax. Caution and experience must be used with this option to ensure that all source transactions are processed and that they are processed correctly. This option is only recommended for the experienced practitioner where performance is the overriding requirement. Particular care must be taken with regards to NULL values in the business keys that are used to make the table joins. The generated code deliberately does not handle such null values.

Set + Cursor Procedure

This option is normally used where a number of dimension keys will fail to join and we wish to automatically add new dimension records. The first phase of the generated code will perform a set based join and insert of the data. The dimensions that failed to join will be assigned the zero key value. The second phase of the procedure will use a cursor to cycle through all the stage records that have a zero key dimension join. By setting the appropriate flags in the dimension joins when building this procedure we can request that any missing dimension records be automatically added.

Set Merge Procedure

This option is to allow the merging of two or more identical tables. The tables to be merged must have exactly the same number of columns and column names. If necessary additional blank columns could be added to one or other of the tables to make them identical. To use this procedure you must simply have the tables to be merged mentioned at least once in the Source Table field of a column's properties. Refer to Set Merge Procedure for details.

Set Distinct

Essentially the same as Set, except for the DISTINCT key word being added to the SELECT statement. This option therefore removes duplicate rows.

Set Minus

The Set Minus option can be used to determine change data or for programmatic referential integrity checking. This option works in a similar way to Set Merge. It generates SQL code in this form: SELECT ... FROM source_table1 {where} MINUS SELECT ... FROM source_table2 {where}. It requires exactly two source tables to be specified. All source columns must exist in both source tables.

Bulk Bind Procedure

The bulk bind procedure is only available for Oracle. It uses memory arrays to join all the information before writing it to the table. It can be the fastest method of performing a stage table update when a lot of dimension joins are being used.
There are some other additional options available in this dialog common to SQL Server, Oracle and DB2:

Allow modification of the Where clause: the 'Where' clause dialog is displayed automatically if the table has more than one source table, but not for a single source table stage table unless this option is chosen.
Distinct data select: choosing this option adds the DISTINCT key word to the SELECT statement in the staging procedure.

Parameters

If WhereScape RED parameters exist in the metadata, the following window is displayed. Any parameters selected in this dialog (by moving them to the right pane), are included in the generated update procedure as variables.

The variables can also be used in column transformations and in the From/Where clause for the update procedure.
Some databases have a 30 character limit for variable names. WhereScape RED ensures the variables added for any parameters are less than 30 characters long by creating variable names in the form v_ followed by the first 28 characters of the parameter name.
For example, a parameter called MINIMUM_ORDER_NUMBER_SINCE_LAST_SOURCE_LOAD will be available as the variable v_MINIMUM_ORDER_NUMBER_SINCE_L.

Tip

WhereScape RED parameters should be unique within the first 28 characters to avoid conflicting variables names.
If the desired parameter doesn't exist in the metadata yet, a new parameter can be added by clicking on the Add New button on the bottom leftmost corner of the Select Parameters dialog.

Refer to Parameters for more information on WhereScape RED Parameters.

Multiple Source Tables

WhereScape RED provides a wizard that enables you to graphically generate the join clause for an object with multiple source tables. The Source Join wizard enables you to specify the primary (driving) table and the source tables to join. A graphical representation of the tables and the join criteria is displayed by the wizard which you can manipulate to change the defined joins.

The Source Join wizard guides you in generating the join clause through the following steps:

Choose Primary Table The first source table encountered is the default primary (driving) table which can be changed before any joins are defined.
Choose Tables to Join Once the primary table is set, the remaining source tables are listed in this drop down. Select a table to join and then click Add. The two tables are graphically added in pane 4, which is the basis for the join displayed in pane 3.
Defined Joins The join type defaults to Inner join but this can be changed from the drop down.
Join criteria of selected join The column(s) used to join the two tables can be joined manually by clicking the corresponding join column connection point from the table on the left and then dragging the line that appears to the connection point of the required column in the table on the right.Alternatively, if a column with a matching or similar name exists in the left hand table(s) then a context menu is provided on the columns on the right-hand table. You can right click the column to display the context menu and then select a column from the list.
Tip
To aid manual column join, you can expand or collapse columns in the tables on the left and toggle between natural and alphabetic column ordering.
Once the join column(s) for the table pair are defined, then any additional source tables can be added from the drop-down list in step 2 and the above column mapping process repeat to join this table.
This process is repeated until all source tables are joined.
Clicking the Next button or the Join Query tab progresses to the join editor window, where custom joins, database functions, etc. can be added.

Note

If RED is unable to parse a join statement, the user cannot navigate back to the define join tab and the following warning is displayed:

Dimension Joins

For each dimension key, a dialog appears asking for the business key from the stage table that matches the business key for the dimension.
In the example below, we are asked for the stage table business key for the customer dimension. The dimension name is shown both on the first prompt line and at the lower left side of the window.
The customer dimension has a unique business key named code.
We must provide the corresponding business key from the staging table. In the case of our example, this is the customer_code column.

Click OK after the correct business key has been entered.
If the business key does not exist and will be derived from another dimension or from some form of lookup, then enter any column and edit the procedure once produced.

Note

The Add Text button and the associated message and edit box are only shown if the user possess a full license, thus enabling the advanced procedure build options. When the Add Text button is clicked, any data in its edit box is placed in the Stage table column list. In this way, a number or string can be assigned as part or all of a Dimension join.

Slowly Changing Dimension information

If the dimension being joined was defined as a slowly changing dimension, then an additional window appears, as shown below. This window asks for a date field in the Stage table that enables RED to determine which version of the slowly changing dimension (the customer_name field, below) to use based on the specified date range in the Customer Dimension.
Select the appropriate date field for your business needs and click OK. If you wish to take the last (or current) version for the dimension, select No Date.

For Example:
As shown in the screen above, we have defined the customer_name as an item in the dimension that we expect to have versions for over time, e.g. each time the data warehouse processing sees a new customer_name value, the dimension will record the date range for that version's validity even though the business key (customer_code in this example) remains the same. This implies we want to create a new dimensional record whenever a customer name is changed even though the customer_code remains the same.
Let's say a customer changes their name on the 5th of the month. If the Staging Table Dates field is set to order_date, any order received before the 5th of the month is identified under the old customer name and any order received on or after the 5th has the new customer name.
Alternatively, by setting the Staging Table Dates to ship_date, we can specify that any order shipped on or after the 5th of the month is shipped with the new name.

Staging Business Key

Once all the dimensional joins have been defined, we will be asked to define the unique business key for the staging table. This is the column or columns that allow us to uniquely identify each record in the staging table.
In the example below, the unique business key is a combination of the order_number and the order_line_no.

Note

The order of the columns in the business key list is set to the order that the columns appear in the table. This is done to optimize performance for any indexes.
NULL VALUES: None of the columns chosen as the business key should ever contain a NULL value. See the note in the start of the Dimension chapter.

The Include Update Statement check-box provides two options in the generated procedure. If selected, then an update/insert combination will be included. If clear, then only an Insert statement will be included in the procedure. If cleared, you must be sure that the business key for the table is unique, otherwise either a duplicate key error will occur or records will be lost, depending on the type of procedure.

Index re-creation

Finally, you are asked if you want to drop and re-create any indexes defined on the table:

Click Yes to drop and re-create.
No to leave existing indexes in place.

Building and Compiling the Procedure

Once the above questions are completed, the procedure is built and compiled automatically.
If the compile fails, an error is displayed along with the first few lines of error messages. Compilation failures typically occur when the physical creation of the table was not done.
If the compile fails for some other reason, the best approach is to use the procedure editor to edit and compile the procedure. The procedure editor highlights all the errors within the context of the procedure.
Once the procedure has been successfully compiled, it can either be executed interactively or passed to the scheduler.

Content

Space Tools

Generating a Procedure

Procedure Type

Oracle

SQL Server

DB2

Cursor Based Procedure

Sorted Cursor Procedure

Set based procedure

Set + Cursor Procedure

Set Merge Procedure

Set Distinct

Set Minus

Bulk Bind Procedure

Parameters

Multiple Source Tables

Dimension Joins

Slowly Changing Dimension information

Staging Business Key

Index re-creation

Building and Compiling the Procedure