This topic describes the connection properties in greater detail as they apply to Hadoop connections.
Hadoop as a source works only with Hadoop connections, from which users can only perform flat file loads. The connection must be set up using the Secure Shell (SSH) protocol.
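In practice, a flat file load over this SSH connection amounts to reading delimited files from the cluster's file system. The commands below are only an illustrative sketch of that kind of access, not what RED itself executes; the host name, user, and HDFS path are hypothetical.

    # List candidate flat files on the cluster
    ssh loaduser@hadoop-edge01 "hdfs dfs -ls /landing/sales"

    # Copy one file out of HDFS onto the edge node so it can be transferred for the load
    ssh loaduser@hadoop-edge01 "hdfs dfs -get /landing/sales/sales_20240101.csv /tmp/sales_20240101.csv"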
For Hadoop script loads from Oracle using Oracle's Big Data Connectors, refer to Hadoop using Oracle's Big Data Connectors for details.

Note

WhereScape RED only fully supports HDFS as the underlying file system.

Tip

When the Big Data Adapter Settings are populated in Hadoop connections, RED can load data from Hadoop into Hive and/or Datawarehouse tables, and can also perform loads from Hadoop directly into the Datawarehouse using Sqoop through WhereScape RED's Big Data Adapter (BDA). For more information about these settings, refer to the Big Data Adapter Settings field descriptions below, and also refer to Connections to the Data Warehouse/Metadata Repository, Configuring your database for use by BDA, and Apache Sqoop Load.
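As background, the Sqoop loads that BDA performs have the same general shape as the command below. This is only an orientation sketch; RED and BDA generate and run the real operation themselves, and the JDBC URL, credentials file, table, and HDFS directory shown here are hypothetical.

    sqoop export \
        --connect "jdbc:sqlserver://dw-server:1433;databaseName=EDW" \
        --username dssuser \
        --password-file /user/dssuser/.sqoop_pwd \
        --table load_sales \
        --export-dir /landing/sales \
        --input-fields-terminated-by ','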

Note

If the Hadoop connection returns a blank screen or an error message in the Results pane after the connection is browsed, take the necessary action in the Server (SSH) tab, next to the main Builder and Scheduler tabs. This tab is displayed after the UNIX connection is browsed.

Sample Hadoop Connection screen:

General

Connection Name

Name used to label the connection within WhereScape RED.

Connection Type

Indicates the connection source type or the connection method, such as Database, ODBC, Windows, or UNIX. Select the Hadoop connection type.

Apache Hadoop

UNIX/Linux Host Name

IP address or host name that identifies the Hadoop server.

Script Shell

Path to the POSIX-compliant UNIX/Linux shell to use for generated scripts. For UNIX hosts, set to /bin/ksh. For Linux hosts, set to /bin/sh.
If this field is left blank, a default will be chosen based on the name of the connection and the type of database used for the WhereScape RED metadata repository.
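If you are unsure which shell path to enter, you can confirm it on the host first; the paths below are the usual defaults but may differ on your system.

    # On the UNIX/Linux host, confirm the shells exist at the expected paths
    ls -l /bin/ksh /bin/sh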

Work Directory

Directory used by WhereScape RED to create temporary files for minimal logged extracts. The directory must exist and allow write access. A different work directory is required for each WhereScape RED Scheduler running on the same machine, to avoid file conflicts. Typically, /tmp or a sub-directory of the UNIX user's home directory is used.
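For example, dedicated work directories for two schedulers on the same host could be prepared as follows; the directory names are only illustrative.

    # Create a distinct work directory per scheduler and ensure the load user can write to it
    mkdir -p /tmp/red_work_sched1 /tmp/red_work_sched2
    chmod 770 /tmp/red_work_sched1 /tmp/red_work_sched2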

Database ID

Database Identifier (e.g. Oracle SID or TNS Name, Teradata TDPID) or Database Name (e.g. as in DB2 or SQL Server).

Database Server/Home Directory

Optionally specify the Database Home Directory if it is different from the standard home directory.

Connection Protocol

Telnet or Secure Shell (SSH) protocol to use to connect to the Hadoop machine. For SSH, the Secure Shell (SSH) Command property is enabled to specify how to connect.

Secure Shell (SSH) Command

Command to execute to connect to a Hadoop machine using the Secure Shell (SSH) protocol, such as C:\putty\plink.exe -ssh some_host_name.
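A fuller plink command line can also supply the login name and a private key, for example (the host name, user, and key path are hypothetical):

    C:\putty\plink.exe -ssh -l loaduser -i C:\keys\red_hadoop.ppk hadoop-edge01.example.com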

Pre-Login Action, Login Prompt, Password Prompt, Post-Login Action, and Command Prompt.

These fields are only used to create a Telnet connection to the host machine. WhereScape RED uses the Telnet connection for its drag and drop functionality. They are not used in the actual production running of the data warehouse, and are only necessary if you wish to use the drag and drop functionality.

Pre-Login Action

Response or command to send BEFORE logging in to the host machine. Typically this is NOT necessary but it can be used to indicate that the host Login Prompt is preceded by a line-feed (\n). However, it is preferable that the host login displays the Login Prompt without anything preceding it. [Optional]

Login Prompt

The host login prompt, or the tail end of the login prompt, e.g. ogin as:

Password Prompt

The host password prompt, or the tail end of the password prompt, e.g. ssword:

Post-Login Action

Not often used but may be necessary to respond to a login question. It is preferable that the host login goes straight to the command prompt.

Command Prompt

Enter the UNIX/Linux command prompt, or the tail end of that prompt, typically >.
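As an illustration, for a host whose login exchange looks like the following (the host name and exact prompt text are hypothetical), the prompt fields would be set to the tail ends shown on the right.

    login as: loaduser                 ->  Login Prompt:    ogin as:
    loaduser@hadoop01's password:      ->  Password Prompt: ssword:
    hadoop01:/home/loaduser >          ->  Command Prompt:  >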

Note

To ascertain the values for some of the above fields, you need to log in to the UNIX system.

Big Data Adapter Settings

Set the two fields below to enable RED to communicate with BDA, so that data can be loaded from Hadoop into Hive and/or into data warehouse tables using Sqoop. For further information about setting these fields, refer to Connections to the Data Warehouse/Metadata Repository and Configuring the BDA Server/Configuring your database for use by BDA.

Big Data Adapter Host

Host machine on which the Big Data Adapter is running its web-server.

Big Data Adapter Port

Port on which Tomcat is running. The default is 8080.
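A quick way to confirm that Tomcat is listening on the configured host and port is a plain HTTP request; the host name below is hypothetical and the exact BDA URL path is not shown.

    # Check that Tomcat responds on the Big Data Adapter host and port
    curl -I http://bda-host.example.com:8080/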

Hadoop Connectors

Only available for Oracle databases using Oracle's Big Data Connectors. Refer to Hadoop using Oracle's Big Data Connectors for details.

Credentials

UNIX/Linux User ID

User Account to login to the UNIX/Linux Host.

UNIX/Linux User Password

Password to login to the UNIX/Linux Host.

DSS User ID

Database user to connect to the WhereScape RED metadata repository.

DSS User Password

Database password to connect to the WhereScape RED metadata repository.

Other

Default Path for Browsing

Optional default path for the browser pane filter. When a path is selected in this field, it becomes the initial point for browsing and is also expanded on open in the right-hand browser pane.

New Table Default Load Type

The default Load type for new tables created using this connection. Select from the Script based load, Native SSH or Externally loaded options.

Note

The available options in this drop-down list are configured from Tools > Options > Available Load Types.

New Table Default Load Script Template

The default Script Template to use when a Script based load type is defined for a Load table object that is sourced from this connection.

SQL Server Hadoop Native SSH Loads
Please note that for SQL Server Hadoop Native SSH loads to be processed successfully, you must run RED and the RED Scheduler on the same machine as SQL Server, to ensure that the files are accessible from the same path.

Data Type Mapping Set

XML files have been created to store mappings from one set of data types to another. Setting this field to (Default) will cause RED to automatically select the relevant mapping set; otherwise, you can choose one of the standard mapping sets from the drop-down list or create a new one.

To test the drag and drop functionality:

  • From the menu strip, select Browse > Source Tables
  • Drill down to the area required
  • Select the object in the left pane, then drag it to the middle pane

Closing the Connection

To close the connection, right-click in the browser pane and select Close UNIX/LINUX session.
