Flat file loads can be performed from a Windows, UNIX/Linux or Hadoop connection. As with all other load types, it is easiest to use the drag and drop functionality to create Load tables.
Flat files can also be loaded using SQL Server Integration Services (SSIS). For Flat File load instructions using SSIS, refer to the next section Loading Data from a Flat File using SSIS.

Note

Greenplum - To process Greenplum file loads via the Windows Scheduler, please ensure that the Windows Scheduler user has all the necessary permissions, including having the ssh host key saved.

The drag and drop process for flat files is as follows:

  1. Browse to the directory and file via the appropriate Connections.
  2. Double-click on the Load Table object in the left pane to create a drop target.
  3. Drag the file from the right pane and drop it into the middle pane. 
  4. The dialog below appears. Rename the file if necessary and then click the ADD button.

    Note

    The option Add meta data columns to table is used for creating Load tables that are used in creating Data Vault objects. If this option is selected, two DSS columns (dss_record_source and dss_load_date) are included in the metadata for the table and are populated by transformations. These two DSS columns could equally be applied to other Load tables not used in a Data Vault system, but they are particularly important for complying with the Data Vault standards. Refer to Data Vaults for details.

  5. The following dialog displays for file loads from Windows, UNIX/Linux and Hadoop connections.

    Note

    • First Row is a Header - For file loads into relational database tables, the First Row is a Header option can be changed after the table has been defined, in the Source tab of the relevant table.
    • Hive - For file loads into Hive tables, the First Row is a Header option is a table option. Ensure this option is selected in the file load wizard if you want to load files with header rows. To make any changes after Hive tables have been defined, the First Row is a Header option can be found in the Storage tab of the relevant Hive table.
  6. The load type selected in the New Table Default Load Type field in the connection Properties screen is the pre-selected option in the Load type drop-down list.
    To change the desired load type and file parsing, use the Load type and File parsing drop-down list options.

    Load type options
    • The File based load option results in a load where the bulk of the load management is handled by the scheduler.
    • The Script based load option makes WhereScape RED generate a host script, and the Load table becomes a Script based load. This host script is executed by the scheduler to perform the load. For further details about Script based loads, refer to Script based load.
    • The XML file load option is only available from a Windows connection. For more details about XML loads, refer to XML File load.
    • The Integration Services load option will load the file via an Integration Services Package that is generated and executed dynamically at run time. This option is only available from a Windows connection. For specific details about this load option, refer to Loading Data from a Flat File using SSIS.
    • The Externally loaded option will not execute an actual load into the table but will process the actions specified in the Post Load procedure property. Any After transformations recorded against any of the columns in an Externally loaded table will also be processed.

    Post load procedures can either be generated manually from a RED provided procedure outline or generated using a RED template. Refer to Rebuilding Update Procedures for details.

    Note

    For custom database targets, the option to use Post Load Scripts is provided as an alternative to a post load procedure. Post load scripts can be generated using a RED template. Refer to Rebuilding Update Procedures for details.


    File parsing options
    • Single data column - with this option, the file is loaded into a single data column, and the bulk of the parsing work must occur in a subsequent procedure within the data warehouse (a sketch of such a parsing step is shown after this list). Under Oracle, three columns are created: the data column, a sequence column (row_sequence) and a file name column (row_file_name). The file name and sequence columns can be deleted if they are not required for a File based load.
    • Columns parsed - with this option, WhereScape RED attempts to parse the columns. You are asked for details, including the column delimiter, and then step through the columns providing names and data types. WhereScape RED attempts to guess each data type, but the guesses need to be checked and the field lengths will probably need to be adjusted.

      Note

      • DB2 - DB2 databases do not support data import from files with a header row.
      • The Decimal Code button shows the decimal value of each character in the lines retrieved from the source file. These decimal codes are shown in green below each line.
  7. Once the selections on the screen above are completed, a screen displays that enables the breakdown of the source data into columns.

    Note

    • If no delimiter is entered, then width based parsing is assumed and an additional width size is prompted for.
    • Teradata - If this is a Fixed Width file loaded via TPT, the source file format can be specified later in the File Load Source Screen of the Load table's Properties window.
    The following screen is an example of the file parsing technique.
    Use the Back button to revert to the previous column if an incorrect width or delimiter is entered.
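
The sketch below is the subsequent parsing step referred to in the Single data column option above. It is illustrative only and assumes an Oracle Load table whose single data column is named data and contains comma-delimited values; the table and column names (load_customer_txt, stage_customer, customer_code, customer_name) are hypothetical:

-- Illustrative only: split a comma-delimited single data column into fields.
-- REGEXP_SUBSTR(source, pattern, position, occurrence) returns the nth delimited value;
-- the [^,]+ pattern skips empty fields, so adjust it if the data can contain them.
INSERT INTO stage_customer (customer_code, customer_name, row_sequence)
SELECT REGEXP_SUBSTR(data, '[^,]+', 1, 1),
       REGEXP_SUBSTR(data, '[^,]+', 1, 2),
       row_sequence
FROM   load_customer_txt;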

Conversion 

Oracle

During the parsing of the columns, an Oracle SQL*Loader conversion string can be used and is required in the case of dates for Oracle. Refer to the Oracle SQL*Loader manual for syntax.

Any Oracle function can be used if enclosed in quotes and with the column name prefixed by a colon. For example: when loading a column called 'product_name', use the following syntax to bring over only the first 30 characters:

product_name "substr(:product_name,1,30)"

A special variable %FILE_NAME% can be used in an Oracle File based load. This will be substituted with the actual source file name.
To load the full file name (including the path) into a field, enter this transformation:

"RTRIM('%FILE_NAME%')"

To remove the path from the full file name on a UNIX or Linux system, enter this transformation:

"SUBSTR('%FILE_NAME%',INSTR('%FILE_NAME%','/',-1,1)+1)"

SQL

After Load transformations can be processed. The conversion string can be entered in the relevant column conversion field during the parsing of the columns; however, to process the column conversion, users need to do the following after the table is created and loaded:

  1. Go into the Properties of the loaded table's relevant column(s) by double-clicking on the column(s) in the middle pane.
  2. Click the Transformation tab.
  3. Select the After load option in the Transformation Stage drop-down list.

  4. Recreate the load table.
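
As an illustrative sketch only, assuming a SQL Server target and a hypothetical column named product_name, an After load transformation that keeps the first 30 characters would be entered in the column's Transformation tab as:

SUBSTRING(product_name,1,30)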

Note

If running from the Windows scheduler, a wildcard must be used in the file name if you wish to use the %FILE_NAME% variable. If a wildcard is not used, the literal file name appears enclosed in double quotes. When a wildcard is used, Windows looks up the file and returns the actual file name with no enclosing double quotes.

Understanding SQL*Loader

When using either type of loading method, SQL*Loader is called to perform the actual load. The executable 'sqlldr' must therefore be in the path.
Under UNIX, this means that the UNIX user must be able to execute 'sqlldr' from the command prompt under a normal logon.
Under Windows, 'sqlldr' must execute when the command is issued from a DOS prompt.
In both the Windows and UNIX environments, sqlldr returns a result code indicating whether the load worked fully, partially or failed. WhereScape RED uses this returned code to report on the results of the load.
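
For readers unfamiliar with SQL*Loader, the following is a simplified sketch of the kind of control file sqlldr consumes; the file, table and column names are illustrative and this is not necessarily the control file that RED generates:

LOAD DATA
INFILE 'product.txt'
APPEND
INTO TABLE load_product
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
(product_code,
 product_name "SUBSTR(:product_name,1,30)",
 order_date   "TO_DATE(:order_date,'YYYY-MM-DD')")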
