RED Enablement Packs : Install Guide - SetupWizard - Databricks

Created by Rekha Singh, last modified by Nikhil Bhamere on Dec 11, 2023

WhereScape Enablement Pack for Databricks - RED 10

This is a guide to installing the WhereScape Enablement Pack for Databricks for WhereScape RED 10.



Prerequisites For PostgreSQL Metadata -WIP


 Prerequisites For Databricks Target Database

...

Before you begin, the following prerequisites must be met:

  • Create Database and ODBC

...

  • DSN:

    ...

    • A database to house the RED Metadata Repository.
    • A database for the Range Table DB (Optional)
    • A database to house scheduler (Optional)

    ...

    • WhereScape RED10 with valid license key entered and EULA accepted
    • WhereScape Enablement Pack for target database version RED10

    ...

    • To check the Windows PowerShell version:
      • Run the following command in Windows PowerShell:

    Get-Host|Select-Object Version

      • Or run the following command in Command Prompt:

    powershell $psversiontable

    • Run the following commands in PowerShell:
      • TLS 1.0 and 1.1, previously used by PowerShell to communicate with the PowerShell Gallery, have been deprecated, and TLS 1.2 is now mandatory. The first command below enables TLS 1.2 for the session:

    [Net.ServicePointManager]::SecurityProtocol = [Net.ServicePointManager]::SecurityProtocol -bor [Net.SecurityProtocolType]::Tls12
    Register-PSRepository -Default -Verbose
    Set-PSRepository -Name "PSGallery" -InstallationPolicy Trusted

      • Install the progress bar module (PoshProgressBar):

    Install-Module -Name PoshProgressBar -SkipPublisherCheck -Force

    ...

    Prerequisites For Databricks Target Database

    ...

    • Software Installations
      • Databricks CLI - Refer to the Setup Guide "Databricks CLI Setup"
    • Python 3.8 or higher
      • Select "Add Python 3.8 to PATH" in the installation window
      • Upgrade pip with the command: python -m pip install --upgrade pip
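The software prerequisites above can be checked before starting the Setup Wizard. The following is a minimal sketch, not part of the Enablement Pack: the function name `check_prerequisites` is hypothetical, and it assumes the Databricks CLI executable is called `databricks` (the name used by the configure command later in this guide).

```python
import importlib.util
import shutil
import sys

MIN_PYTHON = (3, 8)  # minimum Python version named in the prerequisites


def check_prerequisites(version_info=sys.version_info, which=shutil.which):
    """Return a dict mapping each prerequisite to True when it is satisfied."""
    return {
        "python>=3.8": tuple(version_info[:2]) >= MIN_PYTHON,
        "pip": importlib.util.find_spec("pip") is not None,
        # Assumption: the Databricks CLI is on PATH under the name "databricks".
        "databricks-cli": which("databricks") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'OK' if ok else 'MISSING'}")
```

The `version_info` and `which` parameters exist only so the check can be exercised without changing the machine; in normal use call `check_prerequisites()` with no arguments.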

    Installation Through Setup Wizard

    Run the Setup Wizard as administrator.
    Create a new repository or upgrade an existing repository.
    Select the created ODBC DSN, enter the login details, and then select Validate. Click Next.
    Select the directory that contains the unzipped Enablement Pack for installation. Click Next.
    Using the checkbox list, include or exclude the components to be installed. Click Next.
    Configure a target connection (for example, Data Warehouse) and its target locations.
    Validate and click Add.
    Click Done when completed, and click Next to continue.
    Configure a data source connection (optional) and its target locations. Validate and click Add. Click Next to continue.
    Review the installation summary and click Install.
    Click View Logs to open the installation log. Click Finish once the installation has completed successfully.
    Log in to WhereScape RED.

    Note

    A post-install script runs at the first login to RED10 to complete the post-setup wizard installation process.

    A PowerShell window opens with a brief explanation of the post-installation process.
    Click OK to start the post-installation. If you click Cancel, the installation stops and you are directed to RED.
    In the next window, select the target connection to be configured. You can also deselect any of the provided options that you do not want to install.
    In the next PowerShell window, provide the directory that contains the unzipped Enablement Pack.
    Click OK.
    The progress bar shows the post-installation progress.
    Choose the schema for the target settings that were provided. A pop-up appears for setting the default target schema for the Date Dimension.
    After you select the target schema, the progress bar shows the installation progress, and a pop-up appears once it is complete.
    After you click OK, RED opens automatically.
    Refresh the All Objects tree once.

    Upgrade of Existing Repository

    To upgrade an existing repository:

    • In the host script wsl_post_install_enablement_pack, set the script type to Auto Execute - PowerShell Script.


    Important Upgrade Notes
    If the RED "upgrade the repository" option is chosen, this Enablement Pack will overwrite any existing Source Enablement Pack UI Configs:

    Connection UI Config         | Load UI Config
    Amazon S3                    | Load From Amazon S3
    Azure Data Lake Storage Gen2 | Load From Azure Data Lake Storage Gen2
    Google Cloud                 | Load From Google Cloud

    To ensure existing Source Enablement Pack connections and associated Load Tables continue to browse and load:
    Go into UI Configuration Maintenance in RED before installing this Enablement Pack and rename the affected UI Configurations. While the updated Load Template will work with previous Source Enablement Packs, we recommend moving these previous versions of Load Tables to newly created Parser-based connections following this install. The earlier versions of the Source Enablement Pack will be deprecated following this release.

    Post Install Steps – Optional

    If you used the Setup Wizard for installation, the following optional post-install steps are available.
    Configure Connections
    The following connections were added and may require your attention:

    1. Connection: Data Warehouse ('Databricks') - This connection was set up as per the parameters provided in the Setup Wizard.
      1. Open its properties and check that the Database ID is set correctly.
      2. Open its properties, go to the Extended Properties tab, and set HTTP_PATH, SERVER_HOSTNAME, DB_ACCESS_TOKEN, and DBFS_TMP.
    2. Connection: 'Database Source System' - This connection was set up as an example source connection.
      1. Open its properties and set it up for a source DB in your environment,
      2. or remove it if it is not required.
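As a quick aid when reviewing the Data Warehouse connection, the four extended properties named above can be checked for completeness. This is a minimal sketch under stated assumptions: the helper `missing_properties` is hypothetical, and the property values are assumed to have been copied by hand into a dictionary (RED does not expose them this way).

```python
# Extended properties named in the post-install steps.
REQUIRED_PROPERTIES = ("HTTP_PATH", "SERVER_HOSTNAME", "DB_ACCESS_TOKEN", "DBFS_TMP")


def missing_properties(extended_properties):
    """Return the required property names that are absent or blank."""
    return [
        name
        for name in REQUIRED_PROPERTIES
        if not str(extended_properties.get(name) or "").strip()
    ]


if __name__ == "__main__":
    # Placeholder values for illustration only.
    example = {
        "HTTP_PATH": "<http-path>",
        "SERVER_HOSTNAME": "<server-hostname>",
        "DB_ACCESS_TOKEN": "",
    }
    print(missing_properties(example))
```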

    Enable Script Launcher Toolbar

    Several stand-alone scripts provide features such as "Ranged Loading". These scripts have been added to the Script Launcher menu, but you need to enable the menu toolbar item to see them.
    To enable the Script Launcher menu in RED, select Home > Script Launcher.

    Source Enablement Pack Support

    Source Pack Name: Amazon S3
    Supported By Databricks: Yes
    Supported Features: Bulk load to Databricks
    Prerequisites: Include the Access Key and Secret Key in the Amazon S3 Cloud Parser Connection for S3. For guidance on obtaining these credentials, refer to https://docs.aws.amazon.com/IAM/latest/UserGuide/security-creds.html

    Source Pack Name: Azure Data Lake Storage Gen2
    Supported By Databricks: Yes
    Supported Features: Bulk load to Databricks
    Prerequisites: Add the SAS Token to the ADLG2 Cloud Parser Connection. For information on SAS Tokens, refer to https://learn.microsoft.com/en-us/azure/storage/common/storage-sas-overview

    Source Pack Name: Google Cloud Storage
    Supported By Databricks: Yes
    Supported Features: Bulk load to Databricks
    Prerequisites:

    Step 1: Service Account Setup
    1. Create a service account in Google Cloud Console.
    2. Navigate to IAM and Admin > Service Accounts.
    3. Click + CREATE SERVICE ACCOUNT, enter details, and create the account.

    Step 2: Generate Access Key for GCS Bucket
    4. In the service accounts list, click the created account.
    5. In the Keys section, click ADD KEY > Create new key.
    6. Choose JSON key type and click CREATE to download the key file.

    Step 3: Bucket Configuration
    7. Configure bucket details in Google Cloud Console.
    8. Navigate to the Permissions tab and click ADD next to Permissions.
    9. Grant Storage Admin permission to the service account on the bucket.
    10. Click SAVE.

    Step 4: Databricks Cluster Configuration
    11. In the Spark Config tab, set the keys using the following snippet:

      spark.hadoop.google.cloud.auth.service.account.enable true
      spark.hadoop.fs.gs.auth.service.account.email <client-email>
      spark.hadoop.fs.gs.project.id <project-id>
      spark.hadoop.fs.gs.auth.service.account.private.key {{secrets/scope/gsa_private_key}}
      spark.hadoop.fs.gs.auth.service.account.private.key.id {{secrets/scope/gsa_private_key_id}}

      Replace `<client-email>` and `<project-id>` with values from the downloaded JSON key.
      For detailed documentation, refer to https://learn.microsoft.com/en-us/azure/databricks/storage/gcs
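The Spark Config lines in Step 4 can be generated from the JSON key downloaded in Step 2, which avoids copy errors in the email and project ID. This is a minimal sketch, not part of the Enablement Pack: the function `spark_config_lines` and its `scope` parameter are hypothetical, it relies on the `client_email` and `project_id` fields of a Google service account key file, and it writes the two private key entries using the Databricks `{{secrets/<scope>/<key>}}` secret reference syntax, which is an assumption about how the secret paths in the snippet are meant to be entered.

```python
import json


def spark_config_lines(key_json_text, scope="scope"):
    """Build the Spark Config lines for GCS access from a service account key."""
    key = json.loads(key_json_text)
    return [
        "spark.hadoop.google.cloud.auth.service.account.enable true",
        f"spark.hadoop.fs.gs.auth.service.account.email {key['client_email']}",
        f"spark.hadoop.fs.gs.project.id {key['project_id']}",
        # The private key and its ID are referenced as secrets instead of
        # being pasted in clear text; the secret names follow the snippet above.
        "spark.hadoop.fs.gs.auth.service.account.private.key "
        f"{{{{secrets/{scope}/gsa_private_key}}}}",
        "spark.hadoop.fs.gs.auth.service.account.private.key.id "
        f"{{{{secrets/{scope}/gsa_private_key_id}}}}",
    ]


if __name__ == "__main__":
    # Made-up key content for illustration only.
    sample = json.dumps(
        {"client_email": "svc@demo-project.iam.gserviceaccount.com", "project_id": "demo-project"}
    )
    print("\n".join(spark_config_lines(sample)))
```

The secrets themselves still have to be created in a Databricks secret scope separately; the sketch only formats the configuration text.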

    Windows Parser

    1.  CSV
    2.  Excel
    3.  JSON
    4.  XML
    5.  AVRO
    6.  ORC
    7.  PARQUET

    In the Load Template, the Source Properties include an option to select the parser type used to load the files.

    Refer to the Windows Parser Guide.

    Troubleshooting and Tips

    Run As Administrator

    Press the Windows key on your keyboard and start typing cmd.exe. When the cmd.exe icon shows up in the search list, right-click it to bring up the context menu and select "Run As Administrator".
    At the admin prompt, navigate to the folder where you unpacked your WhereScape RED Enablement Pack using the 'cd' command:
    C:\Windows\system32> cd <full path to the unpacked folder>
    Run batch (.bat) scripts from the administrator prompt by typing the name at the prompt and pressing Enter, for example:
    C:\temp\EnablementPack>install_WslPython_Modules.bat
    Run PowerShell (.ps1) scripts from the administrator prompt by typing the PowerShell run script command, for example:
    C:\temp\EnablementPack>Powershell -ExecutionPolicy Bypass -File .\Setup_Enablement_Pack.ps1

    Note

    If you cannot bypass the PowerShell execution policy due to group policies, you can instead try "-ExecutionPolicy RemoteSigned", which should allow unsigned local scripts.
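If the scripts are launched from an automation wrapper rather than typed by hand, the same command line can be assembled programmatically. The following is a minimal sketch: the helper `powershell_command` is hypothetical, and restricting the policy to the two values mentioned in this guide is a choice made for the sketch, not a PowerShell limitation.

```python
def powershell_command(script_path, policy="Bypass"):
    """Return the argument list that runs a .ps1 script with an execution policy."""
    # Only the two policies discussed in this guide are accepted here.
    if policy not in ("Bypass", "RemoteSigned"):
        raise ValueError(f"unsupported execution policy: {policy}")
    return ["powershell", "-ExecutionPolicy", policy, "-File", script_path]


if __name__ == "__main__":
    # On Windows, the list can be passed to subprocess.run(..., check=True)
    # from an administrator prompt; here it is only printed.
    print(powershell_command(r".\Setup_Enablement_Pack.ps1"))
    print(powershell_command(r".\Setup_Enablement_Pack.ps1", policy="RemoteSigned"))
```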

    Setting Up Databricks Configuration

    1. Add a system variable DATABRICKS_CONFIG_FILE that points to a location where you have permission to configure the databricks-cli.
    2. Open a command prompt and configure the databricks-cli using databricks configure --aad-token.
    3. After running this command, the config file should be created in the location specified by the DATABRICKS_CONFIG_FILE system variable.
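To confirm that step 3 produced a usable file, the configuration can be read back. This is a minimal sketch under stated assumptions: the function names are hypothetical, and the file is assumed to be INI-style with a `host` key under a `[DEFAULT]` profile, which is the layout the databricks-cli typically writes.

```python
import configparser
import os


def host_from_config_text(config_text, profile="DEFAULT"):
    """Return the 'host' value for a profile in INI-style config text, or None."""
    # Interpolation is disabled so that '%' characters in tokens are read literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    if profile == "DEFAULT":
        return parser.defaults().get("host")
    if parser.has_section(profile):
        return parser.get(profile, "host", fallback=None)
    return None


def configured_host(environ=os.environ, profile="DEFAULT"):
    """Look up the host in the file named by DATABRICKS_CONFIG_FILE, if any."""
    config_path = environ.get("DATABRICKS_CONFIG_FILE")
    if not config_path or not os.path.isfile(config_path):
        return None
    with open(config_path, encoding="utf-8") as handle:
        return host_from_config_text(handle.read(), profile)


if __name__ == "__main__":
    print(configured_host() or "No host found via DATABRICKS_CONFIG_FILE")
```

A result of `None` means either the system variable is not set in the current session, the file does not exist yet, or the profile has no host entry.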

    Windows Powershell Script Execution

    On some systems, Windows PowerShell script execution is disabled by default. There are several workarounds for this, which can be found by searching for the term "PowerShell Execution Policy".
    Here is the most common workaround that WhereScape suggests, which does not permanently change the execution rights:
    Start a Windows CMD prompt as Administrator, change the directory to your script directory, and run the WhereScape PowerShell scripts with this command:

    • cmd:>Powershell -ExecutionPolicy Bypass -File .\<script_file_name.ps1>

    Re-install Python Libraries

    Press the Windows key on your keyboard and start typing cmd.exe. When the cmd.exe icon shows up in the search list, right-click it to bring up the context menu and select "Run As Administrator".
    At the admin prompt, navigate to the folder where you unpacked your WhereScape RED Enablement Pack using the 'cd' command:
    C:\Windows\system32> cd <full path to the unpacked folder>
    Run the uninstall batch (.bat) script from the administrator prompt by typing its name at the prompt and pressing Enter:
    C:\temp\EnablementPack>uninstall_WslPython_Modules.bat
    For the installation of Python libraries, there are two methods:

    • Method 1
      Press the Windows key on your keyboard and start typing cmd.exe. When the cmd.exe icon shows up in the search list, right-click it to bring up the context menu and select "Run As Administrator".
      At the admin prompt, navigate to the folder where you unpacked your WhereScape RED Enablement Pack using the 'cd' command:
      C:\Windows\system32> cd <full path to the unpacked folder>
      Run the install batch (.bat) script from the administrator prompt by typing its name at the prompt and pressing Enter:
      C:\temp\EnablementPack>install_WslPython_Modules.bat
    • Method 2
      Press the Windows key on your keyboard and start typing cmd.exe. When the cmd.exe icon shows up in the search list, right-click it to bring up the context menu and select "Run As Administrator".
      At the admin prompt, navigate to the folder where you unpacked your WhereScape RED Enablement Pack using the 'cd' command:
      C:\Windows\system32> cd <full path to the unpacked folder>
      Run the below command:
      python -m pip install -r requirements.txt
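Method 2 can also be driven from a script so that pip runs under the same interpreter that RED will use. The following is a minimal sketch: the helper `pip_install_command` is hypothetical, and it assumes requirements.txt sits directly in the unpacked Enablement Pack folder, as the steps above imply.

```python
import sys
from pathlib import Path


def pip_install_command(pack_dir, python=sys.executable):
    """Return the pip command that installs the pack's requirements.txt."""
    # Assumption: requirements.txt is in the root of the unpacked folder.
    requirements = Path(pack_dir) / "requirements.txt"
    return [python, "-m", "pip", "install", "-r", str(requirements)]


if __name__ == "__main__":
    # The list can be passed to subprocess.run(..., check=True) from an
    # administrator prompt; here it is only printed.
    print(pip_install_command(r"C:\temp\EnablementPack"))
```

Using `sys.executable` rather than a bare `python` avoids installing the libraries into a different Python installation when several are present on the machine.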

    For upgrade of an existing repository 

    Image Added
    If the above error appears during the upgrade of an existing repository, the script type of wsl_post_install_enablement_pack is set to PowerShell (64-bit). Either change the script type to Auto Execute - PowerShell Script before upgrading, or manually run the wsl_post_install_enablement_pack host script from RED after the upgrade.

    If a valid RED installation cannot be found

    If you have RED 10.x or higher installed but the script (Setup_Enablement_Pack.ps1) fails to find it on your system, you are most likely running the PowerShell (x86) version, which does not show installed 64-bit apps by default. Open a 64-bit version of PowerShell instead and re-run the script.

    ...