The Apache Sqoop load type enables data to be loaded directly from Hive/HDFS into any non-Hive target. Because it is not database specific, this load method is especially useful when loading data directly from HDFS or Hive into SQL Server and Oracle target databases.
Apache Sqoop loads from a Hive connection to a Hive target are not supported.
To load tables directly from HDFS/Hive into SQL Server and Oracle target databases:

  1. Ensure the relevant Data Warehouse connection has the following fields set (illustrative example values are shown after these steps):
    • JDBC Connection string (JDBC URL)
    • JDBC Driver Class Name
    • Omit Sqoop Driver Option - tick this check-box for loads into Oracle target databases
    • JDBC User ID
    • JDBC Password
  2. When loading from a Hadoop connection, ensure that the Hadoop connection (as well as any Hive connection) has its BDA server host and port fields set.
     
  3. Browse the desired Hadoop/Hive connection.
  4. Drag and drop the table from the Hadoop/Hive connection displayed in the right pane into the middle pane.
    • Change the table name if necessary, and select the target location for the table from the drop-down list.

    Note

    The option Add meta data columns to table is used for creating Load tables that are used in creating Data Vault objects. If this option is selected, two DSS columns (dss_record_source and dss_load_date) are included in the table's metadata and are populated by transformations. These two DSS columns can equally be added to other Load tables that are not part of a Data Vault system, but they are particularly important for complying with the Data Vault standards. Refer to Data Vaults for details.

  5. Select Apache Sqoop Load from the Load Type drop-down list.
  6. Click the Source tab to add any Apache Sqoop specific options (a sketch of the generated command line is shown after these steps). The following fields are available:

    Temporary HDFS Directory
    Loading from a Hive source into the data warehouse is implemented in two steps: the data is first extracted from the Hive table into a temporary HDFS directory, and this temporary directory is then loaded using "sqoop export". The location of the temporary directory can be configured in RED on the Source tab of the Load table. When this field is left blank, the default is "/tmp".

    Generic Hadoop Arguments
    This field enables you to add arguments immediately after the Sqoop command keyword ("export" in this case) in the Sqoop command line.

    Additional Tool Arguments
    This field enables you to add arguments to the end of the generated Sqoop command line.

  7. Click Create and Load to create and load the table.
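
For reference, the Data Warehouse connection fields listed in step 1 typically take values like the ones below. The host names, ports, database names, and users shown here are placeholders only; substitute the values for your own environment.

    SQL Server target (example):
      JDBC Connection string (JDBC URL):  jdbc:sqlserver://dwserver01:1433;databaseName=warehouse
      JDBC Driver Class Name:             com.microsoft.sqlserver.jdbc.SQLServerDriver

    Oracle target (example):
      JDBC Connection string (JDBC URL):  jdbc:oracle:thin:@dwserver01:1521:ORCL
      JDBC Driver Class Name:             oracle.jdbc.OracleDriver
      Omit Sqoop Driver Option:           ticked

    The JDBC User ID and JDBC Password should identify a database user with permission to load the target tables.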
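
The Source tab options in step 6 correspond to positions in the Sqoop command line that RED generates. The sketch below is illustrative only; it assumes a SQL Server target, a load table named load_customer, and a queue-name argument, none of which come from RED itself, and the command RED actually builds may differ in detail.

      # Generic Hadoop Arguments are placed immediately after the command keyword (export),
      # e.g. -D mapreduce.job.queuename=default
      # Additional Tool Arguments are appended at the end of the generated command line,
      # e.g. --num-mappers 4
      sqoop export \
          -D mapreduce.job.queuename=default \
          --connect "jdbc:sqlserver://dwserver01:1433;databaseName=warehouse" \
          --driver com.microsoft.sqlserver.jdbc.SQLServerDriver \
          --username dssuser \
          --password-file /user/dssuser/.sqoop-password \
          --table load_customer \
          --export-dir /tmp/load_customer \
          --num-mappers 4

    Here --export-dir points at the data staged under the Temporary HDFS Directory ("/tmp" when the field is left blank).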