How to configure databricks-connect
databricks-connect is a Python package that lets you run code on Databricks clusters from your local machine. This can significantly improve the developer experience and makes debugging easier. To get the most from this guide, you should be familiar with virtual environments (if not, see this tutorial from Corey Schafer). The following steps create a virtual environment, install databricks-connect, configure it, and test that it’s working.
Install and Configure databricks-connect
Create a new Python virtual environment using Python 3.6 or greater. The minor Python version should match the one on your cluster (Python 3.8 for Databricks Runtime 9.1 LTS). If you are using a Windows laptop, I recommend working in WSL or WSL2, because most of the tools used in this guide are written primarily for Linux (although they may also work on Windows). If you have not used WSL, it is an extremely useful tool: https://docs.microsoft.com/en-us/windows/wsl/install
Activate the virtual environment.
Update your package index. For example, on Ubuntu this would be:
sudo apt-get update
Install Java 8. For example, on Ubuntu this would be:
sudo apt-get install openjdk-8-jdk
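Install databricks-connect, pinning the version to match your cluster’s Databricks Runtime (9.1 in this example):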
pip install databricks-connect==9.1.*
In a browser, open Databricks and create a Personal Access Token (PAT) by going to Settings -> User Settings -> Access Tokens.
On your local machine, in the same terminal/virtual environment you’ve used to install databricks-connect, configure databricks-connect by running:
databricks-connect configure
You’ll be asked for some information: the Databricks host (your workspace URL), the PAT you just created, the cluster ID, the org ID, and the port.
Decide which cluster you want to connect to by opening the compute page in Databricks and clicking on the cluster on which you want to run your code. Please note that databricks-connect only works with clusters running certain versions of the Databricks runtime. Check the requirements here. Once you have opened the right cluster, you can read the configuration values you need (host, org ID, and cluster ID) from the page URL in your browser’s address bar.
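As a rough illustration of pulling those values out of the cluster page URL, here is a minimal Python sketch. It assumes the URL has the shape https://<workspace-host>/?o=<org-id>#/setting/clusters/<cluster-id>/configuration, which may differ between Databricks versions and clouds. The function name parse_cluster_url and the example URL are made up for illustration, so check the result against your own address bar.

```python
from urllib.parse import urlparse, parse_qs


def parse_cluster_url(url):
    """Split a cluster page URL into host, org ID, and cluster ID.

    Assumed layout (hypothetical, check it against your workspace):
    https://<workspace-host>/?o=<org-id>#/setting/clusters/<cluster-id>/configuration
    """
    parts = urlparse(url)

    # The host is what `databricks-connect configure` asks for first.
    host = f"{parts.scheme}://{parts.netloc}"

    # The org ID, when present, is the value of the "o" query parameter.
    org_id = parse_qs(parts.query).get("o", [None])[0]

    # The cluster ID is the fragment segment that follows "clusters".
    segments = [s for s in parts.fragment.split("/") if s]
    cluster_id = None
    if "clusters" in segments:
        idx = segments.index("clusters")
        if idx + 1 < len(segments):
            cluster_id = segments[idx + 1]

    return {"host": host, "org_id": org_id, "cluster_id": cluster_id}


# Made-up example URL, not a real workspace.
example = (
    "https://adb-1234567890123456.7.azuredatabricks.net/"
    "?o=1234567890123456#/setting/clusters/1234-567890-abcde123/configuration"
)
info = parse_cluster_url(example)
```

The three values in info correspond to the host, org ID, and cluster ID prompts of databricks-connect configure. If a value comes back as None, the URL does not follow the assumed layout and you will need to copy that value by hand.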
Test that it’s working
To test that databricks-connect is working, run:
databricks-connect test
If the tests pass, you’re connected to the cluster. Now you can run code on the Databricks cluster from your local laptop without having to use a notebook or the Databricks user interface!
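To give a feel for what running code on the cluster looks like, here is a minimal sketch of a local Python script. It assumes databricks-connect is installed and configured in the active virtual environment, in which case the usual SparkSession builder should hand back a session that executes on the remote cluster. The function name count_even_ids is hypothetical.

```python
def count_even_ids(n=100):
    """Count the even numbers in 0..n-1 using Spark.

    Assumes databricks-connect is installed and configured, so that
    the session below runs the job on the configured cluster.
    """
    # Imported inside the function so the file can be loaded even
    # where databricks-connect is not installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Build a DataFrame of ids 0..n-1, keep the even ones, and count them.
    return spark.range(n).filter("id % 2 = 0").count()


# Usage, from the configured virtual environment:
#     print(count_even_ids())
```

The script itself runs locally, so you can set breakpoints and step through it in your editor, while the Spark job is executed on the cluster.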