This is a guide to installing the WhereScape Enablement Pack for Databricks for WhereScape RED.


Prerequisites For PostgreSQL Metadata


Before you begin, the following prerequisites must be met:

  • Create Database and ODBC DSN (see the sketch below):
    • A supported* version of PostgreSQL (PostgreSQL 12 or higher), with:
      • A database to house the RED Metadata Repository
      • A database for the Range Table DB (optional)
      • A database to house the scheduler (optional)
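
As a sketch of the ODBC DSN step (the DSN name, server, port, and database name below are placeholders for your environment, and the psqlODBC driver is assumed to be installed), a 64-bit System DSN can be created from an elevated PowerShell prompt:

# Hypothetical names - adjust the DSN name, server, port, and database to your environment
Add-OdbcDsn -Name "RED_Metadata" -DriverName "PostgreSQL Unicode(x64)" -DsnType "System" -Platform "64-bit" -SetPropertyValue @("Servername=localhost", "Port=5432", "Database=red_metadata")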
  • Software Installations
    • WhereScape RED10 with a valid license key entered and the EULA accepted
    • WhereScape Enablement Pack for the target database, RED10 version
  • Windows PowerShell (64-bit) version 4 or higher
    • To check the Windows PowerShell version:
      • Run the command below in Windows PowerShell

Get-Host|Select-Object Version

      • Or run the command below in Command Prompt

powershell $psversiontable

  • Run the following commands in PowerShell:
      • The TLS 1.0 and 1.1 security protocols that PowerShell used to communicate with the PowerShell Gallery have been deprecated, and TLS 1.2 is now mandatory.

[Net.ServicePointManager]::SecurityProtocol = [Net.ServicePointManager]::SecurityProtocol -bor [Net.SecurityProtocolType]::Tls12
Register-PSRepository -Default -Verbose
Set-PSRepository -Name "PSGallery" -InstallationPolicy Trusted
      • Install the progress bar module used by the setup scripts:

Install-Module -Name PoshProgressBar -SkipPublisherCheck -Force


Prerequisites For Target Database Databricks

Before you begin, the following prerequisites must be met:

  • Create Database and ODBC

...


    • Add Python 3.8 to PATH from the installation window
    • Install the pip package manager with the command: python -m pip install --upgrade pip
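
To confirm that Python and pip are available on PATH after installation (a quick sanity check, not part of the install scripts), run:

python --version
python -m pip --version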

Enablement Pack Setup Scripts

The Enablement Pack install process is entirely driven by scripts. The table below outlines these scripts, their purpose, and whether "Run as Administrator" is required.

1. Setup_Enablement_Pack.ps1
   Purpose: Sets up and configures a RED Metadata Repository for the target database. If a RED repository already exists, updates the repository with: 1. Templates 2. Scripts 3. Extended Properties 4. Datatype Mappings 5. UI Configurations
   Run as Administrator: Yes
   Applies to: New and existing installations

2. install_WslPython_Modules.bat
   Purpose: Installs or updates WslPython Modules and required Python libraries on this machine.
   Run as Administrator: Yes
   Applies to: New and existing installations

3. import_python_templates.ps1
   Purpose: Imports or updates the Python templates in a RED Metadata Repository; also includes any script imports.
   Run as Administrator: No*
   Applies to: Existing installations

4. set_default_templates.ps1
   Purpose: Applies the RED connection defaults in a RED Metadata Repository for Python or PowerShell templates.
   Run as Administrator: No*
   Applies to: Existing installations
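
As a sketch, the scripts above can be launched from an elevated command prompt using the execution-policy bypass described under Troubleshooting and Tips; any required script parameters are not shown here:

cmd:> Powershell -ExecutionPolicy Bypass -File .\Setup_Enablement_Pack.ps1
cmd:> install_WslPython_Modules.bat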

...

Step-By-Step Guide

Setup and configure RED Metadata Repository

...

Connection UI Config            Load UI Config

Amazon S3                       Load From Amazon S3
Azure Data Lake Storage Gen2    Load From Azure Data Lake Storage Gen2
Google Cloud                    Load From Google Cloud

...

Install or Update WhereScape Python Templates (For Existing Installations)

...

Set Connection defaults for a Template Set (For Existing Installations)

...



Post Install Steps - Optional

If you used the script Setup_Enablement_Pack.ps1, then the following optional post-install steps are available.

Configure Connections

Three connections were added that may require your attention:

  1. Connection: 'Database Source System' - this connection was set up as an example source connection;
    • open its properties and set it up for a source DB in your environment,
    • or remove it if not required.
  2. Connection: 'Databricks' - this connection was set as per the parameters provided in script 1;
    • open its properties and check that the Database ID is set correctly,
    • on the Extended Properties tab, set values for HTTP_PATH, SERVER_HOSTNAME, DB_ACCESS_TOKEN, and DBFS_TMP (placeholder formats are sketched below).
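
For reference, a sketch of what those extended property values typically look like; every value below is a placeholder, and the real values come from your Databricks workspace (for example, the compute resource's "Connection details" page):

# Hypothetical placeholder values - substitute your workspace's details
$extendedProperties = @{
    HTTP_PATH       = "/sql/1.0/warehouses/1234567890abcdef"    # or sql/protocolv1/o/<org-id>/<cluster-id> for a cluster
    SERVER_HOSTNAME = "adb-1234567890123456.7.azuredatabricks.net"
    DB_ACCESS_TOKEN = "dapiXXXXXXXXXXXXXXXXXXXXXXXX"            # a Databricks personal access token
    DBFS_TMP        = "dbfs:/tmp"                               # DBFS location used for staging files
}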

Enable Script Launcher Toolbar

Several stand-alone scripts provide features such as "Ranged Loading". These scripts have been added to the Script Launcher menu, but you will need to enable the menu toolbar item to see them.
To enable the Script Launcher menu in RED, select the menu item View > Toolbars > Script Launcher.

Source Enablement Pack Support

Amazon S3
  • Supported by Databricks: Yes
  • Supported Features: Bulk load to Databricks
  • Prerequisites: Include the Access Key and Secret Key in the Amazon S3 Cloud Parser Connection for S3. For guidance on obtaining these credentials, refer to: https://docs.aws.amazon.com/IAM/latest/UserGuide/security-creds.html

Azure Data Lake Storage Gen2
  • Supported by Databricks: Yes
  • Supported Features: Bulk load to Databricks
  • Prerequisites: Add the SAS Token to the ADLG2 Cloud Parser Connection. Refer to https://learn.microsoft.com/en-us/azure/storage/common/storage-sas-overview for information on SAS Tokens.

Google Cloud Storage
  • Supported by Databricks: Yes
  • Supported Features: Bulk load to Databricks
  • Prerequisites: complete the four steps below.

Step 1: Service Account Setup

  1. Create a service account in Google Cloud Console.
  2. Navigate to IAM and Admin > Service Accounts.
  3. Click + CREATE SERVICE ACCOUNT, enter details, and create the account.

Step 2: Generate Access Key for GCS Bucket

  4. In the service accounts list, click the created account.
  5. In the Keys section, click ADD KEY > Create new key.
  6. Choose JSON key type and click CREATE to download the key file.

Step 3: Bucket Configuration

  7. Configure bucket details in Google Cloud Console.
  8. Navigate to the Permissions tab and click ADD next to Permissions.
  9. Grant Storage Admin permission to the service account on the bucket.
  10. Click SAVE.

Step 4: Databricks Cluster Configuration

  11. In the Spark Config tab, set the keys using the following snippet:

    spark.hadoop.google.cloud.auth.service.account.enable true
    spark.hadoop.fs.gs.auth.service.account.email <client-email>
    spark.hadoop.fs.gs.project.id <project-id>
    spark.hadoop.fs.gs.auth.service.account.private.key secrets/scope/gsa_private_key
    spark.hadoop.fs.gs.auth.service.account.private.key.id secrets/scope/gsa_private_key_id

Replace <client-email> and <project-id> with values from the downloaded JSON key.
    For detailed documentation, refer to:
https://learn.microsoft.com/en-us/azure/databricks/storage/gcs
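
The two private key settings above reference Databricks secrets. As a sketch, assuming the legacy databricks-cli and a secret scope literally named "scope" (match the scope and key names used in your Spark config), the secrets could be created like this:

databricks secrets create-scope --scope scope
databricks secrets put --scope scope --key gsa_private_key     # paste private_key from the downloaded JSON key file
databricks secrets put --scope scope --key gsa_private_key_id  # paste private_key_id from the same file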

Windows Parser
  • Supported file formats: CSV, Excel, JSON, XML, AVRO, ORC, PARQUET
  • Supported Features: In the Load Template, Source Properties will have the option to select the parser type used to load the files.
  • Prerequisites: Refer to the Windows Parser Guide.

Troubleshooting and Tips

Run As Administrator

...


Databricks


  1. Add a system variable DATABRICKS_CONFIG_FILE that points to a location where you are permitted to configure the databricks-cli.
  2. Open a command prompt and configure the databricks-cli using "databricks configure --aad-token".
  3. On running this command, a config file should be created in the location specified by the system variable.
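
A minimal sketch of steps 1-2 from a command prompt (the config file path is a placeholder - any writable location works; setx takes effect in newly opened prompts):

setx DATABRICKS_CONFIG_FILE "C:\Users\<you>\.databrickscfg"
databricks configure --aad-token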

Windows PowerShell Script Execution

On some systems Windows PowerShell script execution is disabled by default. There are a number of workarounds for this, which can be found by searching for the term "PowerShell Execution Policy".
Here is the most common workaround, which WhereScape suggests because it does not permanently change the execution rights:
Start a Windows CMD prompt as Administrator, change directory to your script directory, and run the WhereScape PowerShell scripts with this command:

  • cmd:>Powershell -ExecutionPolicy Bypass -File .\<script_file_name.ps1>

Restarting failed scripts

Some of the setup scripts track each step and output the step number when there is a failure. To restart from the failed step (or to skip it), provide the parameter "-startAtStep <step number>" to the script.
Example:
Powershell -ExecutionPolicy Bypass -File .\<script_file_name.ps1> -startAtStep 123
Tip: to avoid having to provide all the parameters again, copy the full command line with parameters from the first "INFO" message at the beginning of the console output.

Python requirements for offline install

In addition to the base Python installation, the WhereScape Python Template set requires certain additional Python libraries. The install scripts use PIP (the Python package manager) to download these libraries; for offline installs, you will need to install the required libraries yourself.
Required Python libraries/add-ons:

  • pywin32-ctypes
  • python-tds
  • pywin32
  • glob2
  • gzip-reader
  • regex
  • pyodbc
  • databricks
  • databricks-sql-connector
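
A minimal sketch of one way to stage these libraries for an offline machine with pip (assuming an internet-connected machine with the same Python version and OS architecture):

# On the internet-connected machine: download the packages into a folder
python -m pip download pywin32-ctypes python-tds pywin32 glob2 gzip-reader regex pyodbc databricks databricks-sql-connector -d wheels

# Copy the "wheels" folder to the offline machine, then install from it
python -m pip install --no-index --find-links wheels pywin32-ctypes python-tds pywin32 glob2 gzip-reader regex pyodbc databricks databricks-sql-connector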

If a valid RED installation cannot be found

...
