This topic describes in greater detail the connection properties as they apply to Hadoop connections that use either Oracle's SQL Connector for Hadoop (OSCH) or Oracle's Loader for Hadoop (OLH).
If the connection to Hadoop will not use Oracle's Big Data Connectors, refer to the previous section, Hadoop. In RED, Hadoop as a source works with connections to Hadoop on UNIX/Linux, from which users can perform script-based loads. The connection must be set via the Secure Shell (SSH) protocol.
For details of the system prerequisites required to set up a connection within RED using either of these connectors, refer to Hadoop Using Oracle's Big Data Connectors.

Oracle Wallet for OSCH

An Oracle Wallet must be configured and used for the OSCH connector to work.
The Oracle user + SID used in the RED Connection object must be added to the Oracle Wallet.
If the Oracle user is not added to the wallet (or if it is added against a different SID, even one equivalent to the SID used in the RED Connection object), then OSCH fails with the message: "KUP-04076: file name cannot contain a path specification."
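
The wallet side of this can be sketched with Oracle's mkstore utility; the wallet location, SID, and user name below are placeholders, so substitute the values from your RED Connection object:

    # Create the wallet (prompts for a wallet password); /home/oracle/wallet is an assumed location
    mkstore -wrl /home/oracle/wallet -create

    # Add the Oracle user + SID used in the RED Connection object (ORCL and dssuser are placeholders);
    # mkstore prompts for the database user's password
    mkstore -wrl /home/oracle/wallet -createCredential ORCL dssuser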

Oracle Wallet for OLH

Although OLH itself does not require the use of an Oracle Wallet (unlike OSCH), Oracle discourages using OLH without one, and RED therefore supports OLH only with an Oracle Wallet. The Oracle user + SID used in the RED Connection object must be added to the Oracle Wallet.
If the UNIX/Linux connection returns a blank screen or an error message in the Results pane after the connection is browsed, take the necessary action through the Server (SSH) tab next to the main Builder and Scheduler tabs. This tab is displayed after the UNIX connection has been browsed.

Sample Hadoop OSCH/OLH connection screen:

General

Connection Name

Name used to label the connection within WhereScape RED.

Connection Type

Indicates the connection source type or connection method, such as Database, ODBC, Windows, or UNIX. Select the Hadoop connection type.

Apache Hadoop

UNIX/Linux Host Name

IP address or host name that identifies the Hadoop server.

Script Shell

Path to the POSIX-compliant UNIX/Linux shell to use for generated scripts. For UNIX hosts, set to /bin/ksh; for Linux hosts, set to /bin/sh.
If this field is left blank, a default is chosen based on the name of the connection and the type of database used for the WhereScape RED metadata repository.

Work Directory

Windows directory used by WhereScape RED to create temporary files for minimal logged extracts. The directory must exist and allow write access. There must be a different work directory for each WhereScape RED Scheduler running on the same machine to avoid file conflicts. Typically C:\Temp or a sub-directory of C:\Temp is used.

Database ID

Database Identifier (e.g. Oracle SID or TNS Name, Teradata TDPID) or Database Name (e.g. as in DB2 or SQL Server).

Database Server/Home Directory

Optional. Specify the database home directory if it differs from the standard home directory.

Connection Protocol

Telnet or Secure Shell (SSH) protocol to use to connect to the UNIX/Linux machine. For SSH, the Secure Shell (SSH) Command property is enabled to specify how to connect.

Secure Shell (SSH) Command

Command to execute to connect to a Hadoop machine using the Secure Shell (SSH) protocol, such as C:\putty\plink.exe -ssh some_host_name.
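
For instance, a variant of the sample command that also supplies the login user (the host and user names are placeholders):

    C:\putty\plink.exe -ssh -l oracle some_host_name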

Pre-Login Action, Login Prompt, Password Prompt, Post-Login Action, and Command Prompt.

These fields are only used to create a Telnet connection to the host machine. WhereScape RED uses the Telnet connection in the drag and drop functionality. They are not used in the actual production running of the Data Warehouse, and are only necessary if you wish to use the drag and drop functionality.

Pre-Login Action

Response or command to send BEFORE logging in to the UNIX/Linux machine. Typically this is NOT necessary, but it can be used to indicate that the UNIX/Linux Login Prompt is preceded by a line-feed (\n). However, it is preferable that the UNIX/Linux login displays the Login Prompt without anything preceding it. [Optional]

Login Prompt

The UNIX login prompt, or the tail end of the login prompt, e.g. ogin as:.

Password Prompt

The UNIX password prompt, or the tail end of the password prompt, e.g. ssword:.

Post-Login Action

Not often used but may be necessary to respond to a login question. It is preferable that the UNIX login goes straight to the command prompt.

Command Prompt

Enter the UNIX/Linux command prompt, or the tail end of that prompt, typically >.

Note

To ascertain the values of some of the above fields, you need to log in to the UNIX/Linux system.
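
As an illustration, a hypothetical login session, with the tail ends that the prompt fields would match marked on the right:

    login as: oracle                  <-- matches Login Prompt "ogin as:"
    oracle@hadoop-host's password:    <-- matches Password Prompt "ssword:"
    Last login: ...
    hadoop-host >                     <-- matches Command Prompt ">"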

Big Data Adapter Settings

When using Oracle's Big Data Connectors to load data from Hadoop into Hive, it is not necessary to set the Big Data Adapter (BDA) fields.

Hadoop Connectors

Connector Type

Big Data Connectors that connect Hadoop with the data warehouse database. The available connectors are Oracle SQL Connector for HDFS and Oracle Loader for Hadoop. Enter the relevant settings for each connector type in the fields displayed.

Note

To load data using one of these two specific connectors, the relevant connector must be selected from the Hadoop Loader field in the load table's Source screen. Refer to Flat File Load - Source Screen for more details.

Oracle Loader for Hadoop

Hadoop Log Path

Hadoop path to the main log directory, under which a sub-directory is created with the logs for each load run. The sub-directory created for each job run is passed to OLH; if the directory passed to OLH does not exist or is not empty, the job run fails.
Example: if "/user/oracle/hadoop/output/" is supplied in the connection object and the user then runs a job to load the "hundredR.csv" file, the sub-directory "/user/oracle/hadoop/output/WSL_hundredR_SEQ_RESTARTCOUNT/" is created to store the logs, where "SEQ" is RED's sequence number and "RESTARTCOUNT" is the job's restart count.
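
Using the sample path above, the directory handling can be sketched with the standard Hadoop CLI (the paths are the placeholders from the example):

    # Create the main log directory in HDFS
    hadoop fs -mkdir -p /user/oracle/hadoop/output

    # Remove a leftover log sub-directory before restarting a failed job,
    # since the directory passed to OLH must be empty or absent
    hadoop fs -rm -r /user/oracle/hadoop/output/WSL_hundredR_SEQ_RESTARTCOUNT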

Oracle SQL Connector for HDFS

'bin' Directory Object

Oracle DIRECTORY object that points to the 'bin' directory of the OSCH installation on the Oracle server, i.e. the location of hdfs_stream. This is best set up within the home directory of the user that runs Oracle, but outside of ORACLE_HOME; e.g. if the Oracle user is 'oracle', then a good choice would be '/home/oracle/osch/<install_dir>/bin'.

Note

The Hadoop client must be installed on the Oracle server. The Oracle user must have read and execute permissions on this directory.
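
As a quick check, run as the Oracle OS user on the Oracle server ('<install_dir>' is the placeholder from above):

    # Confirm the Hadoop client is installed and on the PATH
    hadoop version

    # Confirm the Oracle user can read and execute the OSCH 'bin' contents
    ls -l /home/oracle/osch/<install_dir>/bin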

External Tables Directory Object

Oracle DIRECTORY object that points to the directory where OSCH keeps its 'location files' (XML files that link an external table in Oracle to a target file in Hadoop). This directory must be on the Oracle server and is best set up within the home directory of the user that runs Oracle, but outside of ORACLE_HOME; e.g. if the Oracle user is 'oracle', then a good choice would be '/home/oracle/osch/exttab'. The Oracle user must have read and write permissions on this directory.

Log Directory Object

Oracle DIRECTORY object that points to the OSCH log directory. It can point to the directory where the 'location files' are kept; if it is not specified, it defaults to the loader's default directory. The Oracle user must have read and write permissions on this directory.

Oracle Admin Directory Path

OS path to the directory that contains the 'tnsnames.ora' file, typically "$ORACLE_HOME/network/admin".

Oracle Wallet Directory Path

OS path to the Oracle Wallet directory where the connection credentials are stored. It can be set to any directory, e.g. "$ORACLE_HOME/network/admin" or "$ORACLE_HOME/wallets"; however, if the wallet is created on the Oracle server, it will likely be in Oracle ACFS when available (e.g. "/u02/app/oracle/wallet/").

Example

To create an Oracle directory object named "osch_bin_path" pointing to the "/home/oracle/osch/orahdfs-3.1.0/bin" directory, and then grant read and execute permissions on that object to the Oracle user "oschuser":

    CREATE DIRECTORY osch_bin_path AS '/home/oracle/osch/orahdfs-3.1.0/bin';
    GRANT EXECUTE ON DIRECTORY osch_bin_path TO oschuser;
    GRANT READ ON DIRECTORY osch_bin_path TO oschuser;

Note

It is possible to provide either literal values or the names of UNIX environment variables in the connection object parameter values. If UNIX environment variables are used instead of actual values, note that the UNIX/Linux scheduler must be restarted after any changes to those variables, or the changes will not be available to processes scheduled to run on that scheduler. Since these values should be very stable (they will probably never change), this is only an issue during environment setup: the administrator must remember to add the variables to the environment before starting the scheduler, and to restart the scheduler if the variables are ever changed. All environment variable references must start with $.
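
A sketch of that setup, with a hypothetical variable name and value:

    # In the scheduler user's profile (e.g. ~/.profile), exported before the scheduler is started
    export HDP_LOG_PATH=/user/oracle/hadoop/output

The connection object parameter would then reference the value as $HDP_LOG_PATH, and the scheduler must be restarted whenever the exported value changes.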

Credentials

UNIX/Linux User ID

User Account to login to the UNIX/Linux Host.

UNIX/Linux User Password

Password to login to the UNIX/Linux Host.

DSS User ID

Database user used to connect to the WhereScape RED metadata repository. Not required; all authentication is handled through the Oracle Wallet, since Oracle SQL Connector for HDFS will not work without it.

DSS User Password

Database password used to connect to the WhereScape RED metadata repository. Not required; all authentication is handled through the Oracle Wallet, since Oracle SQL Connector for HDFS will not work without it.

Note

To perform loads from Hadoop using multiple schemas, the RED user must be granted the extra set of privileges described in section 9.3 Creating an Oracle DSS User of the RED Installation Guide.

Other

Default Path for Browsing

Optional default path for the browser pane filter. When a path is selected in this field, it becomes the initial point for browsing and is also expanded on open in the right-hand browser pane.

New Table Default Load Type

The default Load type for new tables created using this connection. Select from the Script based load, Native SSH or Externally loaded options.

New Table Default Load Script Template

The default Script Template to use when a Script based load type is defined for a Load table object that is sourced from this connection.

Data Type Mapping Set

XML files have been created to store mappings from one set of data types to another. Setting this field to (Default) will cause RED to automatically select the relevant mapping set; otherwise you can choose one of the standard mapping sets from the drop-down list or create a new one.

To test the drag and drop functionality

  • From the menu strip, select Browse > Source Tables
  • Drill down to the area required
  • Select an object in the left pane and drag it to the middle pane

Closing the Connection

To close the connection, right-click in the browser pane and select Close UNIX/LINUX session.
