This is a guide to installing the WhereScape Enablement Pack for Databricks for WhereScape RED.




Include Page: Prerequisites For PostgreSQL Metadata -WIP

Prerequisites For Target Database Databricks

Before you begin, the following prerequisites must be met:

  • Create the database and an ODBC DSN.
  • Software installations:
    • Databricks CLI - refer to the setup guide: Databricks CLI Setup
    • Python 3.8 or higher
      • Select "Add Python 3.8 to PATH" in the installation window.
      • Install the pip manager with the command: python -m pip install --upgrade pip
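The Python prerequisite above can be verified with a short check before running the installer (a sketch; the 3.8 version floor is the one stated in this guide):

```python
import sys

def python_meets_requirement(version=None, minimum=(3, 8)):
    """Return True when the interpreter version satisfies the minimum (3.8 here)."""
    if version is None:
        version = sys.version_info[:2]
    return tuple(version) >= tuple(minimum)

if __name__ == "__main__":
    if python_meets_requirement():
        print("Python version OK:", sys.version.split()[0])
    else:
        print("Python 3.8 or higher is required.")
```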


Include Page: Install Guides-Common

Post Install Steps - Optional

If you used the script Setup_Enablement_Pack.ps1, the following optional post-install steps are available.

Configure Connections

The following connections were added and may require your attention:

  1. Connection: 'Database Source System' - this connection was set up as an example source connection:
    • open its properties and configure it for a source database in your environment,
    • or remove it if it is not required.
  2. Connection: 'Databricks' - this connection was set up from the parameters provided to the install script:
    1. open its properties and check that the Database ID is set correctly
    2. on the Extended Properties tab, set HTTP_PATH, SERVER_HOSTNAME, DB_ACCESS_TOKEN, and DBFS_TMP
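As a sketch (the four property names are the ones listed above; the validation helper itself is hypothetical, not part of the pack), the extended properties can be checked for completeness before running a load:

```python
# Hypothetical helper: verifies that the Databricks connection's extended
# properties listed above are all present and non-empty.
REQUIRED_EXTENDED_PROPERTIES = (
    "HTTP_PATH",
    "SERVER_HOSTNAME",
    "DB_ACCESS_TOKEN",
    "DBFS_TMP",
)

def missing_extended_properties(properties):
    """Return the names of required extended properties that are absent or blank."""
    return [
        name for name in REQUIRED_EXTENDED_PROPERTIES
        if not str(properties.get(name, "")).strip()
    ]

# Example (placeholder values, not real credentials):
config = {
    "SERVER_HOSTNAME": "adb-1234567890123456.7.azuredatabricks.net",
    "HTTP_PATH": "/sql/1.0/warehouses/abc123",
    "DBFS_TMP": "dbfs:/tmp/wherescape",
}
print(missing_extended_properties(config))  # DB_ACCESS_TOKEN is missing
```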

Enable Script Launcher Toolbar

Several stand-alone scripts provide features such as "Ranged Loading". These scripts have been added to the Script Launcher menu, but you will need to enable the menu toolbar item to see them.
To enable the Script Launcher menu in RED, select the menu item View > Toolbars > Script Launcher.

Source Enablement Pack Support

Amazon S3

  • Supported by Databricks: Yes
  • Supported Features: Bulk load to Databricks
  • Prerequisites:

Include the Access Key and Secret Key in the Amazon S3 Cloud Parser Connection for S3. For guidance on obtaining these credentials, refer to https://docs.aws.amazon.com/IAM/latest/UserGuide/security-creds.html
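One common pattern (an illustration only; the environment variable names are the standard AWS ones, not something this pack mandates) is to keep the key pair out of scripts and read it from the environment, failing fast when either value is unset:

```python
import os

def aws_credentials_from_env(env=None):
    """Read the access key pair from the standard AWS environment variables.

    Raises a clear error when either variable is unset, rather than failing
    later during the bulk load.
    """
    env = os.environ if env is None else env
    access_key = env.get("AWS_ACCESS_KEY_ID")
    secret_key = env.get("AWS_SECRET_ACCESS_KEY")
    missing = [name for name, value in
               (("AWS_ACCESS_KEY_ID", access_key),
                ("AWS_SECRET_ACCESS_KEY", secret_key))
               if not value]
    if missing:
        raise RuntimeError("Missing AWS credentials: " + ", ".join(missing))
    return access_key, secret_key
```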

Azure Data Lake Storage Gen2

  • Supported by Databricks: Yes
  • Supported Features: Bulk load to Databricks
  • Prerequisites:

Add the SAS Token to the ADLG2 Cloud Parser Connection. Refer to https://learn.microsoft.com/en-us/azure/storage/common/storage-sas-overview for information on SAS Tokens.
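As an illustration (the account, container, and token values below are placeholders), a SAS token authorizes a request by being appended to the resource URL as its query string:

```python
def url_with_sas(account, container, blob_path, sas_token):
    """Build an Azure Blob URL with the SAS token appended as the query string."""
    base = f"https://{account}.blob.core.windows.net/{container}/{blob_path}"
    return f"{base}?{sas_token.lstrip('?')}"

# Placeholder values, not a real token:
print(url_with_sas("myaccount", "staging", "data/orders.csv",
                   "sv=2024-01-01&ss=b&sig=EXAMPLE"))
```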

Google Cloud Storage

  • Supported by Databricks: Yes
  • Supported Features: Bulk load to Databricks
  • Prerequisites:

Step 1: Service Account Setup

  1. Create a service account in Google Cloud Console.
  2. Navigate to IAM and Admin > Service Accounts.
  3. Click + CREATE SERVICE ACCOUNT, enter details, and create the account.

Step 2: Generate Access Key for GCS Bucket

  1. In the service accounts list, click the created account.
  2. In the Keys section, click ADD KEY > Create new key.
  3. Choose the JSON key type and click CREATE to download the key file.

Step 3: Bucket Configuration

  1. Configure bucket details in Google Cloud Console.
  2. Navigate to the Permissions tab and click ADD next to Permissions.
  3. Grant the Storage Admin permission to the service account on the bucket.
  4. Click SAVE.

Step 4: Databricks Cluster Configuration

In the Spark Config tab, set the keys using the following snippet:

    spark.hadoop.google.cloud.auth.service.account.enable true
    spark.hadoop.fs.gs.auth.service.account.email <client-email>
    spark.hadoop.fs.gs.project.id <project-id>
    spark.hadoop.fs.gs.auth.service.account.private.key {{secrets/scope/gsa_private_key}}
    spark.hadoop.fs.gs.auth.service.account.private.key.id {{secrets/scope/gsa_private_key_id}}
    Replace `<client-email>` and `<project-id>` with values from the downloaded JSON key.
    For detailed documentation, refer to:
    https://learn.microsoft.com/en-us/azure/databricks/storage/gcs
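The substitution described above can be scripted. This sketch pulls `client_email` and `project_id` out of the downloaded JSON key and prints the matching Spark config lines; the two private-key lines stay as shown above, since they reference a secret scope rather than the key file:

```python
import json

def spark_conf_from_key(key_json):
    """Render the substituted Spark config lines from a service-account key.

    Accepts either the JSON text of the downloaded key file or an
    already-parsed dict containing client_email and project_id.
    """
    key = json.loads(key_json) if isinstance(key_json, str) else key_json
    return "\n".join([
        "spark.hadoop.google.cloud.auth.service.account.enable true",
        f"spark.hadoop.fs.gs.auth.service.account.email {key['client_email']}",
        f"spark.hadoop.fs.gs.project.id {key['project_id']}",
    ])

# Example with placeholder key values:
print(spark_conf_from_key({
    "client_email": "sa@example-project.iam.gserviceaccount.com",
    "project_id": "example-project",
}))
```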

Windows Parser

  • Supported by Databricks: CSV, Excel, JSON, XML, AVRO, ORC, PARQUET
  • Supported Features: In the Load Template, the Source Properties include an option to select the parser type used to load the files.
  • Prerequisites: Refer to the Windows Parser Guide.
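The parser choice usually follows the file extension. The mapping below is illustrative only; the actual option names come from the Source Properties dialog, and the helper itself is hypothetical:

```python
import os

# Illustrative mapping from file extension to the parser types listed above.
PARSER_BY_EXTENSION = {
    ".csv": "CSV",
    ".xlsx": "Excel",
    ".json": "JSON",
    ".xml": "XML",
    ".avro": "AVRO",
    ".orc": "ORC",
    ".parquet": "PARQUET",
}

def suggest_parser(filename):
    """Suggest a parser type from the file extension (case-insensitive)."""
    ext = os.path.splitext(filename.lower())[1]
    return PARSER_BY_EXTENSION.get(ext)

print(suggest_parser("orders.PARQUET"))  # PARQUET
```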

Include Page: Troubleshooting and Tips - Databricks