This is a guide for installing Source Enablement Packs for WhereScape RED 8.6.1.x
Prerequisites
- Python 3.8 or higher
- Download the Python installer from {+}https://www.python.org/downloads/+
- Select Add Python 3.8 to PATH from the installation Window
- PIP Manager
- From Command Prompt (Run As Administrator) run the below commandPIP Manager Install
python -m pip install --upgrade pip
- From Command Prompt (Run As Administrator) run the below command
- Azure Data Lake Storage Gen2
- Azure Data Lake Storage Gen2 Account Name
- Azure Data Lake Storage Gen2 Access Key
- Azure Data Lake Storage Gen2 SAS Token
- Azure Data Lake Storage Gen2 File System Name (Created in Storage Explorer Preview). For example:
- Azure Data Lake Storage Gen2 Directory Name (Created in Storage Explorer Preview). For example:
- Install Python package
- pip install azure-storage-file-datalake
- Net Framework 4.8 or higher
- Windows Powershell version 5 or higher
- Install Python package
- Run these commands in Windows PowerShell:Note: Use a 64-bit powershell terminal
[Net.ServicePointManager]::SecurityProtocol = [Net.SecurityProtocolType]::Tls12 Install-Module Az.Storage
Enablement Pack Setup Scripts
Scripts entirely drive the Enablement Pack Install process. The table below outlines these scripts, their purpose, and if Run as Administrator is required.
# | Enablement Pack Setup Scripts | Script Purpose | Run as Admin | Intended Application |
1 | install_Source_Enablement_Pack.ps1 | Install Python scripts and UI Config Files for browsing files from Amazon S3, Azure Data Lake Gen2, Google Drive | Yes | New and Existing installations |
The Powershell script above provides some help at the command line, this can be output by passing the -help
parameter to the script.
Source Enablement Pack Installation
Run Windows Powershell as Administrator
<Script1 Location > Powershell -ExecutionPolicy Bypass -File .\install_Source_Enablement_Pack.ps1
If prompted enter the source enablement pack as Azure
Azure Data Lake Storage Gen2 Connection Setup
- Login to RED
- Check-in Host Script Browse_Azure_DataLakeStorageGen2 in the objects list.
- Check UI Configurations in Menu, Tools → UI Configurations → Maintain UI Configurations
- Create a new connection in RED
- Select properties as shown below screenshot
- Property Section Azure Data Lake Gen2 Storage Authentication
- Azure Data Lake Gen2 Storage Account: Azure Data Lake Gen2 Storage Account Name.
The token used to read the storage account name in the scripts is $WSL_SRCCFG_azureStorageAccountName$ - Azure Data Lake Gen2 Storage Account Access Key(Account Key): Azure Data Lake Gen2 Storage Account Access Key also called Account Key.
The token used to read the access key is the environment variable: WSL_SRCCFG_azureStorageAccountAccessKey - Azure Data Lake Gen2 Storage Account SAS Token: Azure Data Lake Gen2 Storage Account Shared Access Signature (SAS) Token.
The token used to read environment variable: WSL_SRCCFG_azureSASToken
- Azure Data Lake Gen2 Storage Account: Azure Data Lake Gen2 Storage Account Name.
- Property Section Azure Data Lake Gen2 Storage Settings
- Azure Data Lake Gen2 Storage File System: Azure Data Lake Gen2 Storage File System name.
The token used to read the storage file system name in the scripts is $WSL_SRCCFG_azureStorageFileSystem$ - Azure Data Lake Gen2 Storage File System Directory: Azure Data Lake Gen2 Storage Directory name where blob exists.
The token used to read the directory name in the browse script is $WSL_SRCCFG_azureStorageFileSystemDirectory$ - File Download Path: Local directory where the file needs to be downloaded for data profiling from the sourceAzure Data Lake Gen2 Storage. For Example Eg: C:\\Source\\Subfolder
or C:/Source/Subfolder/ The token used to read path name in the browse script is $WSL_SRCCFG_fileDownloadPath$
- Azure Data Lake Gen2 Storage File System: Azure Data Lake Gen2 Storage File System name.
- Property Section Azure Data Lake Gen2 Storage File Filter Options
- Field Headings/Labels: Indicates whether the first line of the source file contains a heading/label for each field, which is not regarded as data so it should not be loaded. The token used to read field header boolean value in the script is $WSL_SRCCFG_azureDataLakeGen2FirstLineHeader$
- File Filter Name: Indicates source file name. Provide Azure Blob filename pattern. The file list filters with file extensions, and file name patterns.
- *.*
- *.<File Extension>
- <File Name>.<File Extension>
- <File Name Start>*
The Token used to read File Filter Name in the scripts is $WSL_SRCCFG_azureDataLakeGen2FileFilterName$
- Field Delimiter: This is a character that separates the fields within each record of the source file. The field delimiter identifies the end of each field. For Example, comma ( , ),pipe( | ). The token used to reader field delimiter in the script is $WSL_SRCCFG_azureDataLakeGen2FieldDelimiter$
- Field Enclosure Delimiter: This is a character that delimits BOTH start and end of field value i.e. encapsulates value. A double quote is a common enclosure delimiter. The token used to read the field enclosure delimiter in the script is $WSL_SRCCFG_azureDataLakeGen2FieldEnclosureDelimiter$
- Record Delimiter: This is to identify how each line/record in the source file is ended/terminated/delineated. Default is '\n' The token used to read the record delimiter value in the script is $WSL_SRCCFG_azureDataLakeGen2RecordDelimiter$
- Row Limit for Data Profiling: Number of records to scan for Data Profiling. Data profiling is used to get the column names and data types from the source file. By default, 100 records will be scanned. The token used to read the record delimiter value in the script is $WSL_SRCCFG_azureDataLakeGen2RowLimit$
Troubleshooting and Tips
Run As Administrator
Press the Windows Key on your keyboard and start typing cmd.exe, when the cmd.exe icon shows up in the search list right click it to bring up the context menu, select Run As Administrator
Now you have an admin prompt navigate to to the folder where you have unpacked your WhereScape Source Enablement Pack to using the cd
command:C:\Windows\system32> cd <full path to the unpacked folder>
Run Powershell (.ps1) scripts from the administrator prompt by typing the Powershell run script command, for example:C:\temp\EnablementPack>Powershell -ExecutionPolicy Bypass -File .\install_Source_Enablement_Pack.ps1
-ExecutionPolicy RemoteSigned
which should allow unsigned local scripts.Windows Powershell Script Execution
On some systems, Windows Powershell script execution is disabled by default. There are several workarounds for this which can be found by searching the term "Powershell Execution Policy".
Here is the most common workaround that WhereScape suggests, which does not permanently change the execution rights:
Start a Windows CMD prompt as Administrator, change the directory to your script directory, and run the WhereScape Powershell scripts with this command:
cmd:>Powershell -ExecutionPolicy Bypass -File .\<script_file_name.ps1>
Restarting failed scripts
Some of the setup scripts will track each step and output the step number when there is a failure. To restart from the failed step (or to skip the step) provide the parameter -startAtStep <step number>
to the script.
Example: Powershell -ExecutionPolicy Bypass -File .\<script_file_name.ps1> -startAtStep 123
If a valid RED installation can not be found
If you have Red 8.6.1.x or higher installed but the script (install_Source_Enablement_Pack.ps1) fails to find it on your system then you are most likely running the PowerShell (x86) version which does not show installed 64-bit apps by default. Please open a 64-bit version of PowerShell instead and re-run the script.