This topic describes in greater detail the connection properties as they apply to Hadoop connections.
Hadoop as a source works only with Hadoop connections, from which users can perform flat file loads only. The connection must be made via the Secure Shell (SSH) protocol.
For Hadoop script loads from Oracle using Oracle's Big Data Connectors, refer to Hadoop using Oracle's Big Data Connectors for details.
Note
WhereScape RED only fully supports HDFS as the underlying file system.
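Because HDFS is the only fully supported file system, it can be worth confirming that the source flat files are visible on HDFS before defining a load. A minimal sketch using the standard `hdfs dfs` shell (the directory path is a placeholder, not a RED default):

```shell
# Hypothetical landing directory; substitute your own HDFS path.
SRC_DIR="/landing/sales"

# Build the listing command first so it can be inspected or logged;
# running it requires the hdfs client on the host's PATH, e.g.:
#   hdfs dfs -ls "$SRC_DIR"
LIST_CMD="hdfs dfs -ls $SRC_DIR"
echo "$LIST_CMD"
```

If the listing fails or shows no files, the RED browse of the connection will typically come back empty as well.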
Tip
When the Big Data Adapter Settings are populated in Hadoop connections, RED can load data from Hadoop into Hive and/or Datawarehouse tables, and can also perform loads from Hadoop directly into the Datawarehouse using Sqoop through WhereScape RED's Big Data Adapter (BDA). For more information about these settings, refer to the Big Data Adapter Settings fields description below, and also refer to Connections to the Data Warehouse/Metadata Repository, Configuring your database for use by BDA and Apache Sqoop Load.
Note
If the Hadoop connection returns a blank screen or an error message in the Results pane after the connection is browsed, take necessary action through the Server (SSH) tab next to the main Builder and Scheduler tabs. This tab is displayed after browsing the UNIX connection.
Sample Hadoop Connection screen:
General
Options | Description
---|---
Connection Name | Name used to label the connection within WhereScape RED. |
Connection Type | Indicates the connection source type or the connection method, such as Database, ODBC, Windows, Unix. Select the Hadoop connection type. |
Apache Hadoop
Options | Description
---|---
UNIX/Linux Host Name | IP address or host name that identifies the Hadoop server. |
Script Shell | Path to the POSIX-compliant UNIX/Linux shell to use for generated scripts. For UNIX hosts, set to /bin/ksh; for Linux hosts, set to /bin/sh.
Work Directory | Directory used by WhereScape RED to create temporary files for minimal logged extracts. The directory must exist and allow write access. There must be a different work directory for each WhereScape RED Scheduler running on the same machine to avoid file conflicts. Typically /tmp or a sub-directory of the UNIX user is used. |
Database ID | Database Identifier (e.g. Oracle SID or TNS Name, Teradata TDPID) or Database Name (e.g. as in DB2 or SQL Server). |
Database Server/Home Directory | Optionally specifies the Database Home Directory, if it is different from the standard home directory.
Connection Protocol | Telnet or Secure Shell (SSH) protocol to use to connect to the Hadoop machine. For SSH, the Secure Shell (SSH) Command property is enabled to specify how to connect. |
Secure Shell (SSH) Command | Command to execute to connect to a Hadoop machine using the Secure Shell (SSH) protocol, such as C:\putty\plink.exe -ssh some_host_name. |
Pre-Login Action, Login Prompt, Password Prompt, Post-Login Action, and Command Prompt | These fields are only used to create a Telnet connection to the host machine. WhereScape RED uses the Telnet connection in the drag and drop functionality. They are not used in the actual production running of the Data Warehouse and are only necessary if you wish to use the drag and drop functionality.
Pre-Login Action | Response or command to send BEFORE logging in to the host machine. Typically this is NOT necessary but it can be used to indicate that the host Login Prompt is preceded by a line-feed (\n). However, it is preferable that the host login displays the Login Prompt without anything preceding it. [Optional] |
Login Prompt | The host login prompt, or the tail end of the login prompt, e.g. ogin as: |
Password Prompt | The host password prompt, or the tail end of the password prompt, e.g. ssword: |
Post-Login Action | Not often used but may be necessary to respond to a login question. It is preferable that the host login goes straight to the command prompt. |
Command Prompt | Enter the UNIX/Linux command prompt, or the tail end of that prompt, typically >. Note: to ascertain some of the above fields, you may need to log in to the UNIX system.
Big Data Adapter Settings | Set the two fields below to enable RED to communicate with BDA and enable loading data from Hadoop into Hive and/or into data warehouse tables using Sqoop. For further information about setting these fields, refer to Connections to the Data Warehouse/Metadata Repository and Configuring the BDA Server/Configuring your database for use by BDA for details. |
Big Data Adapter Host | Host machine on which the Big Data Adapter is running its web-server. |
Big Data Adapter Port | Port on which Tomcat is running. The default is 8080.
Hadoop Connectors | Only available for Oracle databases using Oracle's Big Data Connectors. Refer to Hadoop using Oracle's Big Data Connectors for details. |
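The Secure Shell (SSH) Command field above is simply the command line RED executes to reach the Hadoop host. A hedged sketch of such a command, following the plink example given in the table (the host name and user are placeholders, and on a Linux client a plain `ssh` invocation works equally well):

```shell
# Hypothetical connection details; replace with your own host and account.
SSH_USER="dssadm"
SSH_HOST="hadoopnode.example.com"

# On Windows, RED typically invokes PuTTY's plink; build the command
# string so it can be pasted into the Secure Shell (SSH) Command field.
SSH_CMD="C:/putty/plink.exe -ssh $SSH_USER@$SSH_HOST"
echo "$SSH_CMD"
```

Testing this command interactively first (it should land you at the host's command prompt) is a quick way to rule out connectivity problems before browsing the connection in RED.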
Credentials
Options | Description
---|---
UNIX/Linux User ID | User Account to login to the UNIX/Linux Host. |
UNIX/Linux User Password | Password to login to the UNIX/Linux Host. |
DSS User ID | Database user to connect to the WhereScape RED metadata repository. |
DSS User Password | Database password to connect to the WhereScape RED metadata repository. |
Other
Options | Description
---|---
Default Path for Browsing | Optional default Path for browser pane filter. When a path has been selected in this field, it becomes the initial point for browsing and it is also expanded on open in the right-hand browser pane. |
New Table Default Load Type | The default Load type for new tables created using this connection. Select from the Script based load, Native SSH or Externally loaded options. Note The available options in this drop-down list are configured from Tools > Options > Available Load Types.
New Table Default Load Script Template | The default Script Template to use when a Script based load type is defined for a Load table object that is sourced from this connection. Refer to SQL Server Hadoop Native SSH Loads for details.
Data Type Mapping Set | XML files have been created to store mappings from one set of data types to another. Setting this field to (Default) will cause RED to automatically select the relevant mapping set; otherwise, you can choose one of the standard mapping sets from the drop-down list or create a new one. |
To test the drag and drop functionality
- From the menu strip select Browse > Source Tables
- Drill down to the area required
- Select the object in the left pane, then drag it to the middle pane
Closing the Connection
To close the connection, right-click in the browser pane and select Close UNIX/LINUX session: