A dimension table is normally defined, for our purposes, as a table that enables us to constrain queries on the fact table.A dimension is built from the Data Warehouse connection. Unless you are doing a retro-fit of an existing system, dimensions are typically built from one or more load tables.
The normal steps for creating a dimension are defined below and are covered in this chapter. The steps are:
- Identify the source transactional data that will constitute the dimension. If the data is sourced from multiple tables, ascertain if a join between the source tables is possible, or if a series of lookups would be a better option.
- Using the 'drag and drop' functionality, drag the load table that is the primary source of information for the dimension into a dimension target. Refer to Building a Dimension for details.
- If only one table is being sourced and most of the columns are to be used (or if prototyping) you can select the auto create option to build and load the dimension and skip the next 4 steps. Refer to Building a Dimension for details.
- Add columns from other load tables if required. Refer to Building a Dimension for details.
- Create the dimension table in the database. Refer to Building a Dimension for details.
- Build the update procedure. Refer to Generating the Dimension Update Procedure for details.
- Run the update procedure and analyze the results. Refer to Dimension Initial Build for details.
- Modify the update procedure as required. Refer to Dimension Initial Build for details.
Dimension Keys
Dimensions have two types of keys that we will refer to frequently. These are the Business Key and the Artificial Key. A definition of these two key types follows:
Artificial Key
The artificial key is the unique identifier that is used to join a dimension record to a Fact table. When joining dimensions to Fact tables it would be possible to perform the join using the business key. For Fact tables with a large number of records this however would result in slow query times and very large indexes. As query time is one of our key drivers in data warehouse implementations, the best answer is to always use some form of artificial key. A price is paid in the additional processing required to build the Fact table rows, but this is offset by the reduced query times and index sizes.
The artificial key is an integer and is built sequentially from 1 upwards.
Refer to Dimension Artificial Keys for a more detailed explanation. An artificial key is sometimes referred to as a surrogate key.
Business Key
The business key is the column or columns that uniquely identify a record within the dimension. Where the dimension maps back to a single or a main table in the source system, it is usually possible to ascertain the business key by looking at the unique keys for that source table. Some people refer to the business key as the natural key. Examples of business keys are:
- The product SKU in a product dimension
- The customer code in a customer dimension
- The calendar date in a date dimension
- The 24 hour time in a time dimension (i.e. HHMM) (e.g.1710)
- The airport short code in an airport dimension.
It is assumed that business keys will never be NULL. If a null value is possible in a business key then the generated code will need to be modified to handle the null value by assigning some default value.
For example the 'Where' clause in a dimension update may become:
- SQL Server: Where isnull(business_key,'N/A') = isnull(v_LoadRec.business_key,'N/A')
- DB2: Where coalesce(business_key,'N/A') = coalesce(v_LoadRec.business_key,'N/A')
Business keys are assumed to never be Null. If they could be null it is best to transform them to some value before dimension or stage table update. If this is not done, an unmodified update will probably fail with a duplicate key error on the business key index.