
Databricks: Configuring the Databricks CLI

The Databricks CLI offers a straightforward way to automate and script work in your workspace, particularly when it comes to setting up secrets and managing mounts. I've been running the majority of my tooling through the Windows Subsystem for Linux (WSL). Below, I capture the steps to set up the CLI in WSL, create a new secret, and mount my storage account, showing how easily Databricks resources can be managed this way. Commands in this post target version 0.205 and later of the CLI.

Install the CLI

Following the Databricks CLI documentation, we use curl to download and run the CLI install script. Before that, I run apt update and make sure unzip is installed.

sudo apt update
sudo apt install unzip
curl -fsSL https://raw.githubusercontent.com/databricks/setup-cli/main/install.sh | sudo sh
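Running databricks -v prints the installed version, which is the quickest way to confirm the install. A setup script can go one step further and fail fast if an older CLI (the legacy pip-installed CLI reports versions of 0.18 and below) is still first on the PATH. This is a minimal sketch, assuming the version banner contains a number such as v0.205.0; version_at_least and extract_version are hypothetical helpers, and the comparison relies on GNU sort -V.

```shell
#!/usr/bin/env bash
# Minimal sketch: confirm the installed Databricks CLI is 0.205.0 or newer.
# Assumes `databricks -v` prints a banner containing a version such as v0.205.0.
# version_at_least and extract_version are hypothetical helpers, not CLI commands.

# Succeeds when version $1 >= version $2 (GNU sort -V compares each field numerically).
version_at_least() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

# Prints the first x.y.z number found in the given text (nothing if none).
extract_version() {
  printf '%s\n' "$1" | grep -oE '[0-9]+\.[0-9]+\.[0-9]+' | head -n1
}

if command -v databricks >/dev/null 2>&1; then
  installed="$(extract_version "$(databricks -v 2>&1)")"
  if [ -n "$installed" ] && version_at_least "$installed" "0.205.0"; then
    echo "Databricks CLI $installed found at $(command -v databricks)"
  else
    echo "CLI version '${installed:-unknown}' is below 0.205.0; rerun the install script" >&2
  fi
else
  echo "databricks is not on PATH; check /usr/local/bin" >&2
fi
```

Comparing with sort -V rather than plain string comparison matters here: as strings, "0.18.0" sorts after "0.205.0", but as versions it is older.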

The CLI binary is now installed at /usr/local/bin/databricks. Next, we create a CLI profile to connect to the workspace. A Databricks configuration profile stores the connection settings and credentials the CLI needs for a workspace, so you can access different workspaces without re-entering credentials each time.

To do that, we run databricks configure, which prompts for the host URL of our Databricks workspace and a personal access token.

Configure Databricks CLI

Configuring the Databricks CLI requires your host name and a personal access token. The host name is the URL of your Databricks workspace, and the personal access token is a secret credential the CLI uses to authenticate to that workspace programmatically.

In the Databricks UI, open your user profile menu and select Settings. From the settings screen, open Developer and find Access tokens. Note that the workspace URL in your browser is also the host name (ex: https://adb-<<WORKSPACEID>>.azuredatabricks.net/).
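The address bar usually shows more than the host: Azure Databricks URLs often carry a ?o=<workspace id> query and a path or fragment for the current page. Here is a minimal sketch that trims a copied URL down to the scheme and host before pasting it into the configure prompt; host_from_url is a hypothetical helper written for this example, and it assumes https when no scheme is present.

```shell
#!/usr/bin/env bash
# Minimal sketch: reduce a URL copied from the browser to "https://<host>".
# host_from_url is a hypothetical helper, not part of the Databricks CLI.

host_from_url() {
  local url="$1" scheme="https" rest
  if [[ "$url" == *://* ]]; then
    scheme="${url%%://*}"   # text before "://"
    rest="${url#*://}"      # text after "://"
  else
    rest="$url"             # no scheme given; assume https
  fi
  # Cut everything from the first '/', '?', or '#' after the host.
  printf '%s://%s\n' "$scheme" "${rest%%[/?#]*}"
}

host_from_url 'https://adb-<<WORKSPACEID>>.azuredatabricks.net/'
```

Using bash parameter expansion keeps the helper free of external tools, so it behaves the same in any WSL distribution with bash.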

Select the 'Manage' button, then 'Generate New Token'. I named mine CLI and gave it a 90-day lifetime. Copy the token value; you won't be able to see it again. You should now have the host and token values needed to configure the CLI.

From the command line:

databricks configure

Enter your host name and personal access token. You can now run:

databricks auth profiles

This lists the newly added profile. Because we didn't pass a profile name to databricks configure, the settings were saved under DEFAULT, the profile the CLI uses whenever no other profile is specified.
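Once more than one workspace is configured, it helps to script against profile names. This minimal sketch reads the section headers out of .databrickscfg as an offline complement to databricks auth profiles (which also checks each profile), then runs the databricks clusters spark-versions check once per profile using the CLI's global --profile flag. list_profiles is a hypothetical helper; it honours the DATABRICKS_CONFIG_FILE environment variable and otherwise reads ~/.databrickscfg.

```shell
#!/usr/bin/env bash
# Minimal sketch: list profile names in .databrickscfg and check each one.
# list_profiles is a hypothetical helper, not a CLI command.

list_profiles() {
  local cfg="${DATABRICKS_CONFIG_FILE:-$HOME/.databrickscfg}"
  [ -f "$cfg" ] || return 0
  # Print the text between [ and ] on each section-header line.
  sed -n 's/^[[:space:]]*\[\([^]]*\)][[:space:]]*$/\1/p' "$cfg"
}

# Only attempt the live check when the CLI is installed.
if command -v databricks >/dev/null 2>&1; then
  while IFS= read -r p; do
    printf '%s: ' "$p"
    # </dev/null keeps the CLI from consuming the loop's input.
    if databricks clusters spark-versions --profile "$p" </dev/null >/dev/null 2>&1; then
      echo "ok"
    else
      echo "failed"
    fi
  done < <(list_profiles)
fi
```

The same --profile flag works on any command, so a second, hypothetical profile named staging could be targeted with databricks clusters spark-versions --profile staging.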

Our profile is set up, but note where it lives: the .databrickscfg file sits in the user's home directory in WSL (run echo $HOME to see the path). The Databricks CLI and other Databricks utilities read this configuration file for the credentials and settings used to connect to Databricks workspaces, and each new profile we set up is saved to it.
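Because .databrickscfg is a plain INI-style file, a profile can also be written without the interactive prompts, which is handy when rebuilding a WSL environment from a script. This is a minimal sketch, not the CLI's own mechanism: write_profile is a hypothetical helper, the host and token values are placeholders, and the target path honours DATABRICKS_CONFIG_FILE, falling back to ~/.databrickscfg.

```shell
#!/usr/bin/env bash
# Minimal sketch: append a named profile to .databrickscfg without prompts.
# write_profile is a hypothetical helper, not a Databricks CLI command.

write_profile() {
  local name="$1" host="$2" token="$3"
  local cfg="${DATABRICKS_CONFIG_FILE:-$HOME/.databrickscfg}"

  # Refuse to write a second section with the same name.
  if [ -f "$cfg" ] && grep -qxF "[$name]" "$cfg"; then
    echo "profile [$name] already exists in $cfg" >&2
    return 1
  fi

  # The file holds a token in plain text; if it does not exist yet, create it
  # owner-only. umask runs in a subshell so the caller's umask is unchanged.
  (
    umask 077
    printf '[%s]\nhost  = %s\ntoken = %s\n\n' "$name" "$host" "$token" >> "$cfg"
  )
}

# Example with placeholder values:
# write_profile DEFAULT "https://adb-<<WORKSPACEID>>.azuredatabricks.net" "<token>"
```

Refusing duplicates is a deliberate choice: two sections with the same name in one INI file are ambiguous, so the sketch leaves editing an existing profile to databricks configure.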

Running cat ~/.databrickscfg shows the [DEFAULT] profile's host and token values. As noted in the Databricks docs, you can run databricks clusters spark-versions to list the available Databricks Runtime versions and confirm that authentication is set up correctly.
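Since cat prints the token in full, it may be worth inspecting the file in a way that keeps the secret off the screen, and checking that no other user can read it. This is a minimal sketch under stated assumptions: show_cfg_redacted and check_cfg_permissions are hypothetical helpers, and stat -c assumes GNU coreutils, as shipped with Ubuntu on WSL.

```shell
#!/usr/bin/env bash
# Minimal sketch: view .databrickscfg with tokens masked, and flag loose permissions.
# show_cfg_redacted and check_cfg_permissions are hypothetical helpers.

show_cfg_redacted() {
  local cfg="${1:-${DATABRICKS_CONFIG_FILE:-$HOME/.databrickscfg}}"
  # Replace whatever follows "token =" with a placeholder; other lines pass through.
  sed -E 's/^([[:space:]]*token[[:space:]]*=[[:space:]]*).*/\1<redacted>/' "$cfg"
}

check_cfg_permissions() {
  local cfg="${1:-${DATABRICKS_CONFIG_FILE:-$HOME/.databrickscfg}}" mode
  [ -f "$cfg" ] || { echo "no config file at $cfg" >&2; return 2; }
  mode="$(stat -c '%a' "$cfg")"   # octal mode, e.g. 600 or 644 (GNU stat)
  # Flag any group or other permission bit; owner-only (600) passes.
  if (( 8#$mode & 8#077 )); then
    echo "$cfg has mode $mode; consider: chmod 600 $cfg" >&2
    return 1
  fi
}
```

Called with no argument, both helpers read ~/.databrickscfg (or the file named by DATABRICKS_CONFIG_FILE), so show_cfg_redacted can stand in for cat when sharing a screen.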

References

Install or update the Databricks CLI
Learn how to install the Databricks CLI. The Databricks CLI is a command-line tool that works with Databricks.
Use Azure managed identities in Unity Catalog to access storage - Azure Databricks
Learn how to use Azure managed identities to connect to Azure Databricks Unity Catalog metastore root storage and other external storage accounts.