This topic describes the connection properties in greater detail as they apply to Hadoop connections.
Hadoop as a source works only with Hadoop connections, from which users can only perform flat file loads. The connection must be set up using the Secure Shell (SSH) protocol.
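In practice, a flat file load over this SSH connection amounts to reading delimited files from the cluster's file system. The commands below are only an illustrative sketch of that kind of access, not what RED itself executes; the host name, user, and HDFS path are hypothetical.

    # List candidate flat files on the cluster
    ssh loaduser@hadoop-edge01 "hdfs dfs -ls /landing/sales"

    # Copy one file out of HDFS onto the edge node so it can be transferred for the load
    ssh loaduser@hadoop-edge01 "hdfs dfs -get /landing/sales/sales_20240101.csv /tmp/sales_20240101.csv"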
For Hadoop script loads from Oracle using Oracle's Big Data Connectors, refer to Hadoop using Oracle's Big Data Connectors for details.

Note

WhereScape RED only fully supports HDFS as the underlying file system.

Tip

When the Big Data Adapter Settings are populated in Hadoop connections, RED can load data from Hadoop into Hive and/or Datawarehouse tables, and can also perform loads from Hadoop directly into the Datawarehouse using Sqoop through WhereScape RED's Big Data Adapter (BDA). For more information about these settings, refer to the Big Data Adapter Settings field descriptions below, and also refer to Connections to the Data Warehouse/Metadata Repository, Configuring your database for use by BDA, and Apache Sqoop Load.
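As background, the Sqoop loads that BDA performs have the same general shape as the command below. This is only an orientation sketch; RED and BDA generate and run the real operation themselves, and the JDBC URL, credentials file, table, and HDFS directory shown here are hypothetical.

    sqoop export \
        --connect "jdbc:sqlserver://dw-server:1433;databaseName=EDW" \
        --username dssuser \
        --password-file /user/dssuser/.sqoop_pwd \
        --table load_sales \
        --export-dir /landing/sales \
        --input-fields-terminated-by ','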

Note

If the Hadoop connection returns a blank screen or an error message in the Results pane after the connection is browsed, take the necessary action in the Server (SSH) tab, next to the main Builder and Scheduler tabs. This tab is displayed after the UNIX connection is browsed.

Sample Hadoop Connection screen:

General

Connection Name

Name used to label the connection within WhereScape RED.

Connection Type

Indicates the connection source type or the connection method, such as Database, ODBC, Windows, or UNIX. Select the Hadoop connection type.

Apache Hadoop

UNIX/Linux Host Name

IP address or host name that identifies the Hadoop server.

Script Shell

Path to the POSIX-compliant UNIX/Linux shell to use for generated scripts. For UNIX hosts, set to /bin/ksh. For Linux hosts, set to /bin/sh.
If this field is left blank, a default will be chosen based on the name of the connection and the type of database used for the WhereScape RED metadata repository.
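If you are unsure which shell path to enter, you can confirm it on the host first; the paths below are the usual defaults but may differ on your system.

    # On the UNIX/Linux host, confirm the shells exist at the expected paths
    ls -l /bin/ksh /bin/sh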

Work Directory

Directory used by WhereScape RED to create temporary files for minimal logged extracts. The directory must exist and allow write access. A different work directory is required for each WhereScape RED Scheduler running on the same machine, to avoid file conflicts. Typically, /tmp or a sub-directory of the UNIX user's home directory is used.
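For example, dedicated work directories for two schedulers on the same host could be prepared as follows; the directory names are only illustrative.

    # Create a distinct work directory per scheduler and ensure the load user can write to it
    mkdir -p /tmp/red_work_sched1 /tmp/red_work_sched2
    chmod 770 /tmp/red_work_sched1 /tmp/red_work_sched2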

Database ID

Database Identifier (e.g. Oracle SID or TNS Name, Teradata TDPID) or Database Name (e.g. as in DB2 or SQL Server).

Database Server/Home Directory

Optionally specify the Database Home Directory if it is different from the standard home directory.

Connection Protocol

Telnet or Secure Shell (SSH) protocol to use to connect to the Hadoop machine. For SSH, the Secure Shell (SSH) Command property is enabled to specify how to connect.

Secure Shell (SSH) Command

Command to execute to connect to a Hadoop machine using the Secure Shell (SSH) protocol, such as C:\putty\plink.exe -ssh some_host_name.
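A fuller plink command line can also supply the login name and a private key, for example (the host name, user, and key path are hypothetical):

    C:\putty\plink.exe -ssh -l loaduser -i C:\keys\red_hadoop.ppk hadoop-edge01.example.com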

Pre-Login Action, Login Prompt, Password Prompt, Post-Login Action, and Command Prompt.

These fields are only used to create a Telnet connection to the host machine. WhereScape RED uses the Telnet connection for its drag and drop functionality. They are not used in the actual production running of the data warehouse, and are only necessary if you wish to use the drag and drop functionality.

Pre-Login Action

Response or command to send BEFORE logging in to the host machine. Typically this is NOT necessary but it can be used to indicate that the host Login Prompt is preceded by a line-feed (\n). However, it is preferable that the host login displays the Login Prompt without anything preceding it. [Optional]

Login Prompt

The host login prompt, or the tail end of the login prompt, e.g. ogin as:

Password Prompt

The host password prompt, or the tail end of the password prompt, e.g. ssword:

Post-Login Action

Not often used but may be necessary to respond to a login question. It is preferable that the host login goes straight to the command prompt.

Command Prompt

Enter the UNIX/Linux command prompt, or the tail end of that prompt, typically >.
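As an illustration, for a host whose login exchange looks like the following (the host name and exact prompt text are hypothetical), the prompt fields would be set to the tail ends shown on the right.

    login as: loaduser                 ->  Login Prompt:    ogin as:
    loaduser@hadoop01's password:      ->  Password Prompt: ssword:
    hadoop01:/home/loaduser >          ->  Command Prompt:  >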

Note

To ascertain the values for some of the above fields, you need to log in to the UNIX system.

Big Data Adapter Settings

Set the two fields below to enable RED to communicate with BDA, so that data can be loaded from Hadoop into Hive and/or into data warehouse tables using Sqoop. For further information about setting these fields, refer to Connections to the Data Warehouse/Metadata Repository and Configuring the BDA Server/Configuring your database for use by BDA.

Big Data Adapter Host

Host machine on which the Big Data Adapter is running its web-server.

Big Data Adapter Port

Port on which Tomcat is running. The default is 8080.
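A quick way to confirm that Tomcat is listening on the configured host and port is a plain HTTP request; the host name below is hypothetical and the exact BDA URL path is not shown.

    # Check that Tomcat responds on the Big Data Adapter host and port
    curl -I http://bda-host.example.com:8080/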

Hadoop Connectors

Only available for Oracle databases using Oracle's Big Data Connectors. Refer to Hadoop using Oracle's Big Data Connectors for details.

Credentials

UNIX/Linux User ID

User Account to login to the UNIX/Linux Host.

UNIX/Linux User Password

Password to login to the UNIX/Linux Host.

DSS User ID

Database user to connect to the WhereScape RED metadata repository.

DSS User Password

Database password to connect to the WhereScape RED metadata repository.

Other

Default Path for Browsing

Optional default path for the browser pane filter. When a path is selected in this field, it becomes the initial point for browsing and is also expanded on open in the right-hand browser pane.

New Table Default Load Type

The default Load type for new tables created using this connection. Select from the Script based load, Native SSH or Externally loaded options.

Note

The available options in this drop-down list are configured from Tools > Options > Available Load Types.

New Table Default Load Script Template

The default Script Template to use when a Script based load type is defined for a Load table object that is sourced from this connection.

SQL Server Hadoop Native SSH Loads
Please note that for SQL Server Hadoop Native SSH loads to be processed successfully, you must run RED and the RED Scheduler on the same machine as SQL Server, to ensure that the files are accessible from the same path.

Data Type Mapping Set

XML files have been created to store mappings from one set of data types to another. Setting this field to (Default) will cause RED to automatically select the relevant mapping set; otherwise, you can choose one of the standard mapping sets from the drop-down list or create a new one.

To test the drag and drop functionality:

  • From the menu strip, select Browse > Source Tables
  • Drill down to the area required
  • Select the object in the left pane, then drag it to the middle pane

Closing the Connection

To close the connection, right-click in the browser pane and select Close UNIX/LINUX session.
