WhereScape Enablement Pack for Databricks - RED 10
This guide describes how to install the WhereScape Enablement Pack for Databricks for WhereScape RED10.
- Prerequisites For PostgreSQL Metadata
- Prerequisites For Databricks Target Database
- Installation Through Setup Wizard
- Upgrade Of Existing Repository
- Post Install Steps – Optional
- Source Enablement Pack Support
- Troubleshooting and Tips
Prerequisites For PostgreSQL Metadata
Before you begin, the following prerequisites must be met:
- Create Database and ODBC DSN:
- Supported* version of PostgreSQL (PostgreSQL 12 or higher)
- A database to house the RED Metadata Repository.
- A database for the Range Table DB (Optional)
- A database to house scheduler (Optional)
- Software Installations
- WhereScape RED10 with valid license key entered and EULA accepted
- WhereScape Enablement Pack for target database version RED10
- Windows PowerShell (64-bit) version 4 or higher
- To check the Windows PowerShell version, use either of the following:
- In Windows PowerShell, run:
Get-Host|Select-Object Version
- In Command Prompt, run:
powershell $psversiontable
- Run the following commands in PowerShell:
- TLS 1.0 and 1.1, previously used by PowerShell to communicate with the PowerShell Gallery, have been deprecated, and TLS 1.2 is now mandatory:
[Net.ServicePointManager]::SecurityProtocol = [Net.ServicePointManager]::SecurityProtocol -bor [Net.SecurityProtocolType]::Tls12
Register-PSRepository -Default -Verbose
Set-PSRepository -Name "PSGallery" -InstallationPolicy Trusted
- Install the PoshProgressBar module, which provides the progress bar:
Install-Module -Name PoshProgressBar -SkipPublisherCheck -Force
- RED supports the following versions for the metadata repository: PostgreSQL 12 or higher
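As a rough illustration of the "PostgreSQL 12 or higher" requirement, the sketch below compares a server version string (for example, the value returned by the SQL statement SHOW server_version) with the minimum major version. The function name postgres_version_ok is hypothetical and is not part of RED or the Enablement Pack.

```python
import re

MIN_POSTGRES_MAJOR = 12  # minimum major version stated in the prerequisites


def postgres_version_ok(version_string, minimum=MIN_POSTGRES_MAJOR):
    """Return True if the leading major version number is at least `minimum`."""
    match = re.match(r"\s*(\d+)", version_string)
    if match is None:
        raise ValueError(f"Unrecognised version string: {version_string!r}")
    return int(match.group(1)) >= minimum


if __name__ == "__main__":
    # Sample strings only; substitute the value reported by your own server.
    for sample in ("12.4", "15.3 (Debian 15.3-1)", "9.6.24"):
        print(sample, "->", postgres_version_ok(sample))
```

Only the major version is compared, since the prerequisite names a major version and no minor release.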
Prerequisites For Databricks Target Database
Before you begin, the following prerequisites must be met:
- Create Database and ODBC DSN:
- Databricks (64-bit ODBC driver version 2.7.5 or higher)
- At least one schema available to use as a RED Data Warehouse Target
- Software Installations
- Databricks CLI - Refer to Setup Guide Databricks CLI Setup
- Python 3.8 or higher
- Select "Add Python 3.8 to PATH" in the installation window
- Install or upgrade the pip package manager with the command:
python -m pip install --upgrade pip
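To confirm that the interpreter on PATH satisfies the Python 3.8 requirement and that pip can be imported, a short check such as the following may help. It is only a sketch; the function names python_version_ok and pip_available are hypothetical and do not come from the Enablement Pack.

```python
import importlib.util
import sys

MIN_PYTHON = (3, 8)  # minimum version stated in the prerequisites


def python_version_ok(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True if the (major, minor) version is at least `minimum`."""
    return tuple(version_info[:2]) >= minimum


def pip_available():
    """Return True if the pip package can be located by this interpreter."""
    return importlib.util.find_spec("pip") is not None


if __name__ == "__main__":
    print("Python version:", ".".join(str(part) for part in sys.version_info[:3]))
    print("Meets minimum :", python_version_ok())
    print("pip available :", pip_available())
```

Save the block to a file and run it with the same python command that the pip install line above uses, so that both refer to the same interpreter.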
Installation Through Setup Wizard
Run Setup Wizard as administrator
Create new repository or upgrade already existing repository.
Select the created ODBC DSN, enter the login details, and then select "Validate". Press Next.
Select the directory that contains the unzipped Enablement Pack for installation. Press Next.
Using the checkbox list, include or exclude the components to be installed. Press Next.
Configure a target connection (for example, Data Warehouse) and its target locations.
Validate and press ADD.
When done, press ADD and then press Next to advance.
Configure a data source connection (optional) and its target locations. Validate and press ADD. Press Next to advance.
Review the installation summary and click Install.
Click View Logs to open the installation log. Click Finish once the installation has completed successfully.
Log in to WhereScape RED.
A post-install script runs at the first login to RED10 to complete the post-Setup Wizard installation process.
A PowerShell window opens with a brief explanation of the post-installation process.
Press OK to start the post-installation. If you press Cancel, the installation stops and you are directed to RED.
In the next window, select the target connection to be configured. You can also deselect any of the provided options that you do not want to install.
In the following PowerShell window, provide the directory that contains the unzipped Enablement Pack.
Press OK.
The progress bar shows the post-installation progress.
Choose the schema for each of the target settings provided. One pop-up appears for setting the default target schema for the Date Dimension.
After you select the target schema, the progress bar shows the installation progress, and a pop-up appears once it is completed.
After you press OK, RED10 opens automatically.
Refresh the All Objects tree once.
Upgrade of Existing Repository
To upgrade an existing repository:
- In the host scripts, set the script type of wsl_post_install_enablement_pack to Auto Execute - PowerShell Script
Important Upgrade Notes
If the RED option to upgrade the repository is chosen, this Enablement Pack will overwrite any existing Source Enablement Pack UI Configs:
Connection UI Config | Load UI Config |
Amazon S3 | Load From Amazon S3 |
Azure Data Lake Storage Gen2 | Load From Azure Data Lake Storage Gen2 |
Google Cloud | Load From Google Cloud |
To ensure existing Source Enablement Pack connections and associated Load Tables continue to browse and load:
In RED, go to UI Configuration Maintenance before installing this Enablement Pack and rename the affected UI Configurations. While the updated Load Template will work with previous Source Enablement Packs, we recommend moving these previous versions of Load Tables to newly created Parser-based connections following this install. The earlier versions of the Source Enablement Pack will be deprecated following this release.
Post Install Steps – Optional
If you used the Setup Wizard for installation, the following optional post-install steps are available.
Configure Connections
The following connections were added and may require your attention:
- Connection: Data Warehouse ('Databricks') - This connection was set up as per the parameters provided in the Setup Wizard
- Open its properties and check that the Database ID is set up correctly
- Open its properties and, on the extended properties tab, set HTTP_PATH, SERVER_HOSTNAME, DB_ACCESS_TOKEN, and DBFS_TMP
- Connection: 'Database Source System' - This connection was set up as an example source connection
- Open its properties and set it up for a source DB in your environment
- Or remove it if it is not required
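The four extended properties named for the Data Warehouse connection can be treated as a simple checklist. The sketch below assumes you have copied the values into a Python dictionary by hand. It does not read anything from RED, and the function name and sample values are hypothetical placeholders.

```python
# Extended properties named in the connection checklist above.
REQUIRED_EXTENDED_PROPERTIES = (
    "HTTP_PATH",
    "SERVER_HOSTNAME",
    "DB_ACCESS_TOKEN",
    "DBFS_TMP",
)


def missing_extended_properties(properties):
    """Return the required property names that are absent or blank."""
    return [
        name
        for name in REQUIRED_EXTENDED_PROPERTIES
        if not str(properties.get(name) or "").strip()
    ]


if __name__ == "__main__":
    # Placeholder values for illustration only.
    example = {
        "HTTP_PATH": "<http path of your cluster or warehouse>",
        "SERVER_HOSTNAME": "<your workspace host name>",
        "DB_ACCESS_TOKEN": "",
    }
    print("Still to be set:", missing_extended_properties(example))
```

A blank value is reported as missing because an empty property is as unusable as an absent one.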
Enable Script Launcher Toolbar
Several stand-alone scripts provide features such as "Ranged Loading". These scripts have been added to the Script Launcher menu, but you will need to enable the menu toolbar item to see them.
To enable the Script Launcher menu in RED, select Home>Script Launcher
Source Enablement Pack Support
Source Pack Name | Supported By Databricks | Supported Features | Prerequisites |
Amazon S3 | Yes | Bulk load to Databricks | Include the Access Key and Secret Key in the Amazon S3 Cloud Parser Connection for S3. For guidance on obtaining these credentials, refer to https://docs.aws.amazon.com/IAM/latest/UserGuide/security-creds.html |
Azure Data Lake Storage Gen2 | Yes | Bulk load to Databricks | Add the SAS Token to the ADLG2 Cloud Parser Connection. Refer to https://learn.microsoft.com/en-us/azure/storage/common/storage-sas-overview |
Google Cloud Storage | Yes | Bulk load to Databricks | Step 1: Service Account Setup |
Windows Parser | 1. CSV | Load Template, Source Properties will have option to select parser type to load the files. | Refer to Windows Parser Guide. |
Troubleshooting and Tips
Run As Administrator
Press the Windows key on your keyboard and start typing cmd.exe. When the cmd.exe icon shows up in the search list, right-click it to bring up the context menu and select "Run As Administrator".
Now that you have an admin prompt, navigate to the folder where you unpacked your WhereScape RED Enablement Pack using the 'cd' command:
C:\Windows\system32> cd <full path to the unpacked folder>
Run batch (.bat) scripts from the administrator prompt by typing the name at the prompt and pressing Enter, for example:
C:\temp\EnablementPack>install_WslPython_Modules.bat
Run PowerShell (.ps1) scripts from the administrator prompt by typing the PowerShell run script command, for example:
C:\temp\EnablementPack>Powershell -ExecutionPolicy Bypass -File .\Setup_Enablement_Pack.ps1
If you cannot bypass the PowerShell execution policy due to group policies, you can instead try "-ExecutionPolicy RemoteSigned", which should allow unsigned local scripts.
Setting Up Databricks Configuration
- Add a system variable DATABRICKS_CONFIG_FILE that points to a location where you are permitted to configure the databricks-cli.
- Open a command prompt and configure databricks-cli using:
databricks configure --aad-token
- After you run this command, the config file should be created at the location specified in the DATABRICKS_CONFIG_FILE system variable.
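To confirm that the configuration file was created where DATABRICKS_CONFIG_FILE points, a check along these lines may be useful. It is a sketch that assumes the file uses the INI-style profile layout written by databricks-cli (a [DEFAULT] section plus optional named profiles). The function name read_databricks_profiles is hypothetical.

```python
import configparser
import os
from pathlib import Path


def read_databricks_profiles(env=os.environ):
    """Return the profile names found in the file named by DATABRICKS_CONFIG_FILE."""
    config_path = env.get("DATABRICKS_CONFIG_FILE")
    if not config_path:
        raise RuntimeError("DATABRICKS_CONFIG_FILE is not set")
    path = Path(config_path)
    if not path.is_file():
        raise FileNotFoundError(f"No config file at {path}")

    parser = configparser.ConfigParser()
    parser.read(path)

    # configparser treats [DEFAULT] specially, so it is reported separately.
    profiles = parser.sections()
    if parser.defaults():
        profiles.insert(0, "DEFAULT")
    return profiles


if __name__ == "__main__":
    try:
        print("Profiles found:", read_databricks_profiles())
    except (RuntimeError, FileNotFoundError) as error:
        print("Configuration problem:", error)
```

The sketch only lists profile names and never prints the token, so its output can be shared when asking for help.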
Windows Powershell Script Execution
On some systems, Windows PowerShell script execution is disabled by default. There are several workarounds for this, which can be found by searching for the term "PowerShell Execution Policy".
Here is the most common workaround that WhereScape suggests, which does not permanently change the execution rights:
Start a Windows CMD prompt as Administrator, change the directory to your script directory, and run the WhereScape Powershell scripts with this command:
- cmd:
>Powershell -ExecutionPolicy Bypass -File .\<script_file_name.ps1>
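If you prefer to launch the scripts from Python rather than typing the command, the same command line can be assembled as an argument list. This is a minimal sketch: the helper names are hypothetical, and it assumes powershell is on PATH and that the process is already running with administrator rights.

```python
import subprocess


def build_powershell_command(script_path, policy="Bypass"):
    """Build the argument list equivalent to the command shown above."""
    return ["powershell", "-ExecutionPolicy", policy, "-File", str(script_path)]


def run_powershell_script(script_path, policy="Bypass"):
    """Run the script and return its exit code (Windows only)."""
    completed = subprocess.run(build_powershell_command(script_path, policy), check=False)
    return completed.returncode


if __name__ == "__main__":
    # Prints the command instead of running it, so this is safe on any system.
    print(" ".join(build_powershell_command(r".\Setup_Enablement_Pack.ps1")))
```

Passing policy="RemoteSigned" gives the alternative mentioned under Run As Administrator for environments where group policy blocks Bypass.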
Re-install Python Libraries
Press the Windows key on your keyboard and start typing cmd.exe. When the cmd.exe icon shows up in the search list, right-click it to bring up the context menu and select "Run As Administrator".
Now that you have an admin prompt, navigate to the folder where you unpacked your WhereScape RED Enablement Pack using the 'cd' command:
C:\Windows\system32> cd <full path to the unpacked folder>
Run batch (.bat) scripts from the administrator prompt by typing the name at the prompt and pressing Enter, for example:
C:\temp\EnablementPack>uninstall_WslPython_Modules.bat
For the installation of Python libraries, there are two methods:
- Method 1
Press the Windows key on your keyboard and start typing cmd.exe. When the cmd.exe icon shows up in the search list, right-click it to bring up the context menu and select "Run As Administrator".
Now that you have an admin prompt, navigate to the folder where you unpacked your WhereScape RED Enablement Pack using the 'cd' command:
C:\Windows\system32> cd <full path to the unpacked folder>
Run batch (.bat) scripts from the administrator prompt by typing the name at the prompt and pressing Enter, for example:
C:\temp\EnablementPack>install_WslPython_Modules.bat
- Method 2
Press the Windows key on your keyboard and start typing cmd.exe. When the cmd.exe icon shows up in the search list, right-click it to bring up the context menu and select "Run As Administrator".
Now that you have an admin prompt, navigate to the folder where you unpacked your WhereScape RED Enablement Pack using the 'cd' command:
C:\Windows\system32> cd <full path to the unpacked folder>
Run the command below:
python -m pip install -r requirements.txt
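After either method, you may want to confirm which of the listed libraries are actually installed. The sketch below reads package names from requirement lines and looks each one up with importlib.metadata. The function names and the sample lines are hypothetical, and the parsing covers only simple "name" or "name==version" style entries.

```python
import re
from importlib import metadata


def requirement_names(lines):
    """Extract bare package names from simple requirements.txt lines."""
    names = []
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):   # skip blanks and pip options
            continue
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.append(match.group(0))
    return names


def missing_packages(names):
    """Return the names for which no installed distribution is found."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Sample lines only; read your own requirements.txt instead.
    sample = ["pip", "not-a-real-package-for-illustration==1.0  # example"]
    print("Missing:", missing_packages(requirement_names(sample)))
```

Version specifiers are ignored here, so a package that is installed at the wrong version is still reported as present.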
For upgrade of existing repository
If the above error appears during the upgrade of an existing repository, the script type of wsl_post_install_enablement_pack is set to PowerShell (64-bit). Change the script type to Auto Execute - PowerShell before the upgrade, or manually run the wsl_post_install_enablement_pack host script from RED after the upgrade.
If a valid RED installation cannot be found
If you have RED 10.x or higher installed but the script (Setup_Enablement_Pack.ps1) fails to find it on your system, you are most likely running the PowerShell (x86) version, which does not show installed 64-bit apps by default. Open a 64-bit version of PowerShell instead and re-run the script.