Once a dimension has been defined in the meta data and created in the database, an update procedure can be generated to handle the joining of any tables and the update of the dimension records.

Note

The zero key row

WhereScape RED by default, inserts a record into the dimension with an artificial key value of zero. This record is used to link any fact records that do not have valid dimension joins.
The values of the various columns in this record are acquired from the contents of the field Zero Key Value which is set in the Properties screen of each dimension column. 

Generating a Procedure

  1. To generate a procedure, right-click on the dimension name to edit the properties for the dimension.
  2. From the Update Procedure drop-down list select (Build Procedure...).
  3. Click OK to update the properties and start the process of generating the new procedure.
  4. A Procedure Build Type dialog appears, allowing you to select between a Cursor and Set procedure build types from the drop-down list.

A Set based procedure performs one SQL statement to join all the source tables together and then insert this data into the Dimension. This is normally the fastest method of populating a table.
See below in this section for different options using the Cursor procedure building type.

Set Based Procedure Building types

  1. Select Set from the drop-down menu.
  2. The following window appears to enter the Update Procedure Build Options.
  3. Click on the ellipsis button on the top rightmost corner to select the Business Key Columns for the dimension.

Processing tab

Business key

A business key can be made up of multiple columns, but it must provide a unique identifier. Where multiple columns uniquely and separately identify the dimension, choose one to act as the primary business key.
For example, a source table may have a unique constraint on both a product code and a product description. Therefore, the description as well as the code must be unique. It is of course possible to combine the two columns, but the normal practice would be to choose the code as the business key.

Tip

  • Use the column name ascending/descending buttons to sort column names. To revert to the meta column order, click on the meta column order button.
  • NULL Values - none of the columns chosen as the business key should ever contain a NULL value. See the note at the start of the Dimensions chapter.

Parameters

If WhereScape RED parameters exist in the metadata, the following window is displayed. Any parameters selected in this window (by moving them to the right pane), are included in the generated update procedure as variables.
The procedure includes code to retrieve the value of the parameter at run time and store it in the declared variable.
 
The variables can also be used in column transformations and in the from/where clause for the update procedure. Some databases have a 30 character limit for variable names. WhereScape RED ensures the variables added for any parameters are less than 30 characters long by creating variable names in the form v_ followed by the first 28 characters of the parameter name.
For example, a parameter called MINIMUM_ORDER_NUMBER_SINCE_LAST_SOURCE_LOAD will be available as the variable v_MINIMUM_ORDER_NUMBER_SINCE_L.

Tip

  • WhereScape RED parameters should be unique within the first 28 characters to avoid conflicting variables names.
  • If the desired parameter doesn't exist in the metadata yet, a new parameter can be added by clicking on the Add New button on the bottom leftmost corner of the Select Parameters dialog.

Refer to Parameters for more information about WhereScape RED Parameters.

Fields

Description

Enable Parallel DML

Oracle Only - Adds all the code required to the update procedure to enable Oracle parallel inserts. The default for this option is not set.

Insert Zero Key Record

This option enables you to include the zero key (unknown record) for tables with an artificial key. The default for this field is set.

Include Initial Load insert

This option adds an additional insert statement to the update procedure that runs if the target Dimension Object is empty. The benefit of this is improved performance inserting into an empty table without performing any checks to see if rows already exist. The default for this field is FALSE (i.e. an initial insert statement is not added to the procedure).

Process by Batch

This field enables the user to select a column to break up the data being processed in a loop based on the distinct values in the field. The update procedure loops on this field and performs the delete, update and/or insert for each value. If the field chosen is a date datatype (date, datetime or timestamp), then the user is able to specify yearly, monthly, daily or column level looping. The default for this field is False (do not do batch processing).

Batch Processing Field

Enables selecting a field to batch process on. If you select a date field you will have the ability to process by date part. If you select a join field to process by you can choose and attribute of that related table to group by.

Include Explicit Lock

Enables locking of the table to avoid concurrent transactions. Double click the Lock Clause field to open a window that enables you to create the lock clause to insert.

Delete before insert

This option enables a delete statement to be added to the update procedure before any update or insert statement. This is a particularly useful option for purging old data and for updates based on a source system batch number. The default for this field is No which automatically grays out the Issue Warning if a Delete Occurs and the Delete Where Clause fields.

  • Issue Warning if a Delete occurs - sets the procedure to a warning state if deletes occur.
  • Delete Where Clause - The delete where clause is appended to the generated delete statement to constrain the rows deleted.

Process Method

Enables selecting whether the table should be updated using an Insert/Update statement or a Merge statement.

Source Table Locking

Enables a locking request modifier to be specified for each source table. The specified locking request modifier is applied to each source table during generated update procedures. By default this is set to 'ACCESS' which locks each row being accessed, a blank entry will result in no locking clause in the generated procedure. This option may also be presented in a separate dialog

Include Insert Statement

The Include Insert Statement option includes an insert statement in the procedure to insert new rows in the Dimension. If this option is set, the Insert New Rows Only option is available. If this option is turned off, the update procedure will not contain an insert statement. The default for this field is set (i.e. an insert statement is included).

Insert New Rows only

The insert new rows only option uses change detection to work out what rows require inserting.

  • New Row Identification Method - Method used to identify that the records in the source are not currently recorded in the target table.

Insert Hint

Enter a database hint to be used in the INSERT statement. This is an Oracle and SQL Server only option. Defaults can be configured in Tools > Options > Default Update Procedure Options.

  Default is TABLOCK.

   Default is APPEND.

Insert New Rows Only

Uses change detection to work out what rows require inserting.

New Row Identification Method

Method used to identify that records in source are not currently recorded in the target table.

Existing Data Selection Hint

Database-compliant hint to be used for the existing data select statement.

Include Update Statement

Includes an update statement in the procedure to update changing rows in the Data Store. If this option is chosen, then the Update Changed rows only option is available.

Include Hint

Enter a database hint to be used in the INCLUDE statement. This is an Oracle and SQL Server only option. Defaults can be configured in Tools > Options > Default Update Procedure Options.

  Default is TABLOCK.

   Default is APPEND.

Update Changed Rows Only

Uses change detection to work out what rows require updating. Choosing this option, enables the Change Row identification Method.

Change Row Identification Method

Method used to identify that records in source have changed from what is currently recorded in the target table.

Existing Data Selection Hint

Database-compliant hint to be used for the existing data select statement.

Merge Hint

Enter a database hint to be used in the MERGE statement. This is an Oracle and SQL Server only option. Defaults can be configured in Tools > Options > Default Update Procedure Options.

  Default is TABLOCK.

   Default is APPEND.

Source tab

If only one source table was used, adding the Business Key and checking the above fields is enough to proceed, otherwise use the Source tab to Join the relevant tables.

Fields

Description

Distinct Data Select

Ensures duplicate rows are not added to the Dimension. This is achieved by the word DISTINCT being added to the source select in the update procedure. The default for this field is not set.

Source Join

The From clause, including Source Join information. See example below for Joining multiple source tables.

Where Clause

The Where Clause. Use as a filter to extract only the necessary records that fulfill a specified criteria.

Group By

The Group by clause. Use in collaboration with the SELECT statement to arrange identical data into groups.

Joining multiple source tables

WhereScape RED provides a wizard that enables you to graphically generate the join clause for an object with multiple source tables. The Source Join wizard enables you to specify the primary (driving) table and the source tables to join. A graphical representation of the tables and the join criteria is displayed by the wizard which you can manipulate to change the defined joins.
 
 
The Source Join wizard guides you in generating the Join Clause through the following steps:

  1. Choose Primary Table
    The first source table encountered is the default primary (driving) table which can be changed before any joins are defined.
  2. Choose Tables to Join
    Once the primary table is set, the remaining source tables are listed in this drop down. Select a table to join and then click Add. The two tables are added to pane 4 and the basis for the join is displayed in pane 3.
  3. Defined Joins
    The join type defaults to Inner join but this can be changed from the drop down.
  4. Join criteria of selected join
    The column(s) used to join the two tables can be joined manually by clicking the corresponding join column connection point from the left-hand table and then dragging the line that appears to the connection point of the required column in the table on the right.
    Alternatively, if a column with a matching or similar name exists in the left hand table(s) then a context menu is provided on the columns on the right-hand table. You can right click on the column to display the context menu and then select a column from the list.

    Tip

    To aid manual column join, you can expand or collapse columns in the left-hand tables and toggle between natural and alphabetic column ordering.
  5. Once the join column(s) for the table pair are defined, then any additional source tables can be added from the drop-down list on step 2 and the above column mapping process repeat to join this table.This process is repeated until all source tables are joined.
  6. Clicking the Next button or the Join query tab progresses to the join editor window, where custom joins, database functions, etc. can be added.

Note

If RED is unable to parse a join statement, the user cannot navigate back to the Define joins tab and the following warning is displayed:

Change Detection tab

For Slowly Changing Dimension columns. If the dimension was defined as a Slowly Changing Dimension, the Change Detection tab is displayed in the table Update Build Options screen.

Note

For custom database targets, you can set the initial default values for the DSS columns that are used in the procedure generation for slowly changing dimensions. Please refer to the section DSS Columns for Custom Targets for details. The default values set are populated in the corresponding fields of the Change Detection tab.

A change detection field(s) must be selected for this Dimension type.

Fields

Description

Change Detection Fields

Click the ellipsis button to select the change detection fields that are required for the dimension

Null Support

If the Add Null support is selected, the change detect column management caters for null values in any change detect columns. If this option is not selected and null values are present, errors may occur running the update procedure. The default for this value is ON (i.e. null values are catered for).

Update Current Records Only

The update current record only option only applies changes to non-change detect columns on the current record. If this option is not selected, all past (non-current) rows will also be updated to reflect changes to non-change detect columns. The default for this value is ON (i.e. only the current record is updated).

Reset Dates to Initial Values

Resets dss_start date and dss_end_date date values to original values.

Start Date for Initial Member

The start date for initial member field contains the start date for the first version of a particular business key. The value should be specified in an appropriate form, taking into account the default date format in the databases. The date may need to be explicitly cast to the current data type. The default value provided is usually cast to the correct database and can be treated as a template. The default for this field is 1 January 1900.

End Date for Current Member

The end date for current member field contains the end date for the current version (the row with a current flag of Y and the maximum version number) of a particular business key. The value must be specified in an appropriate form, taking into account the default date format in the databases. The date may need to be explicitly cast to the current data type. The default value provided is usually cast to the correct database and can be treated as a template. The default for this field is 31 December 2999.

Start Date for New Member Entry

The start date for new member entry field contains the start date for any subsequent rows added to the history table (not the first row for a particular business key i.e. not version 1). The value must be specified in an appropriate form, taking into account the default date format in the databases. The date may need to be explicitly cast to the current data type. The default value provided is usually cast to the correct database and can be treated as a template. The default for this field is the current date and time.

End Date for Expiring Member Entry

The end date for the expiring member entry field contains the end date for any rows updated no longer to no longer be the current row in the history table (i.e. rows that are replaced by a new current row). The value must be specified in an appropriate form, taking into account the default date format in the databases. The date may need to be explicitly cast to the current data type. The default value provided is usually cast to the correct database and can be treated as a template. The default for this field is the current date and time less an arbitrary small amount (for SQL Server this is 0.00000005 of a day, or about 4 thousandth of a second).

Cursor procedure building types

Source Join / Lookup

If columns from multiple source tables were used to create the Dimension with a Cursor Procedure building type, the window below prompt users to Define the Primary Dimension source table.
The fields need to be joined in the Source Join field.
 
The choice provided is either to create a join that combines all the information from the various tables into one large select statement, or to do a series of lookups.
The normal process would be to join the tables to create one large select statement. There are however situations where this is not desirable or possible. When a large number of tables are involved then a join is slower than a series of lookups. In some situations, we may not have all the information to do an initial join. A series of lookups may be required to build up the information.
For complex situations, it is necessary to edit and enhance the generated procedure code.
If Lookup is chosen, then the primary table for the Dimension must be chosen. The philosophy here is that a series of lookups is conducted against the secondary tables using information from this primary table.

Lookup

If multiple source tables were used to build the Dimension, then an option would have appeared in an earlier window to choose between a join or a lookup.
If Lookup was chosen, then a window appears prompting for the joins between the primary table and each secondary (lookup) table.
The load_product table is the main table and a lookup is being performed on the load_prod_line table. Columns are selected from the drop-down column list for each table and then the Add Join button clicked to add these two columns to the 'Where' clause as shown in the bottom box.
This 'Where' clause may be edited and goes to form a lookup select statement within the main cursor loop of the update procedure. In this example the 'Where' clause has had the second line manually added to identify the type of lookup in the code table.
 
Multiple lookups can occur on the same table. Where we have a generic code table, for example, we may do multiple lookups to get descriptions.
The check-box Multiple Lookups Required on This Table results in repeated lookups against the second table until all columns sourced from this table have been utilized in the lookup statements.
If the Multiple Lookup check-box is selected, then the following window appears to enable selection of the columns that are to be retrieved by the lookup.

Business Key

Define the Business Key Columns by adding one or more business key columns in the right pane of the Define Dimension Business Key Columns window.

Allow Where Clause Editing

  • Not Exists Select - if this check-box is selected, it adds additional code to the main select statement. This code excludes every row in the source table that has not changed in the Dimension table and can result in faster execution of the update procedure for large Dimensions.
  • Add a hierarchy for Cubes - this check-box opens another window that enables the definition of a simple hierarchy structure. This is useful if the hierarchy of the Dimension is known and the Dimension is used in the creation of Analysis services cubes. Hierarchies may be defined now or later when required. They may also be defined or modified after the dimension has been created.

Slowly Changing columns

If the Dimension was defined as a slowly changing dimension, then the following additional window appears.
The first requests the selection of the columns to be managed as Slowly Changing Dimension columns.
Select the required columns and then click OK to proceed.
In the example below, the customer name is to be managed as a Slowly Changing column in a customer Dimension.
 
The advantages and disadvantages of Slowly Changing Dimensions are discussed earlier in this chapter, but as a general rule try to minimize their usage, as they invariably complicate the processing and end user queries.
The following window appears if one or more columns are chosen for management.
 
NULL values are the enemy of a successful data warehouse. They result in unreliable query results and can often lead to a lack of confidence in the data.
If a column is considered important enough to be managed as a Slowly Changing column, then it should not normally contain null values. It is often best to ensure that a Null cannot occur by using a IsNull() (SQL Server), Nvl() (Oracle) or Coalesce() (DB2) transformation when loading the column.
If however, this situation is unavoidable then answering Yes to this question results in additional code to test for Nulls, during the update of the Dimension records.
If No is answered and Nulls are encountered then a unique constraint violation occurs during the Dimension update.

Building and Compiling the Procedure

  • Once the above questions are completed the procedure is built and compiled automatically.
  • If the compile fails an error is displayed along with the first few lines of error messages. Compile fails typically occur when the physical creation of the table was not done.
  • If the compile fails for some other reason, the best approach is to use the Procedure editor to edit and compile the procedure. The procedure editor highlights all the errors within the context of the procedure.
  • Once the procedure has been successfully compiled, it can either be executed interactively or passed to the RED Scheduler.

Indexes

By default, a number of indexes is created to support the Dimension. These indexes is added once the procedure has been built.
An example of the type of indexes created is as follows:
 
This example shows three indexes created:

  1. A primary key constraint placed on the artificial key for the Dimension.
  2. A unique index placed on the business key for the Dimension.
  3. A unique index placed on the business key and a Slowly Changing column from the Dimension.

This third index is only created when a Slowly Changing Dimension is chosen.
Additional indexes can be added, or these indexes changed. Refer to Indexes for details.
 

  • No labels