How to configure databricks-connect

Databricks-connect is a Python package that lets you run code on Databricks clusters from your local machine. This can significantly improve the developer experience and makes debugging easier. To get the most from this guide, you should be familiar with virtual environments (if not, see this tutorial from Corey Schafer). The following steps create a virtual environment, install databricks-connect, configure it, and test that it’s working.

Install and Configure databricks-connect

  1. Create a new Python virtual environment using Python 3.6 or greater. Note that databricks-connect expects the local Python minor version to match the one on the cluster (Python 3.8 for Databricks Runtime 9.1 LTS, the version installed in step 5). If you are using a Windows laptop, I recommend working in WSL or WSL2, because most of the tools used in this guide are written primarily for Linux (although they may also work on Windows). If you have not used WSL, it is an extremely useful tool: https://docs.microsoft.com/en-us/windows/wsl/install

  2. Activate the virtual environment. 

  3. Update your package index. For example, on Ubuntu this would be:

    sudo apt-get update

  4. Install Java 8. For example, on Ubuntu this would be:

    sudo apt-get install openjdk-8-jdk 

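  5. Install databricks-connect. The client version should match the Databricks runtime of your cluster (9.1 in this example):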

    pip install databricks-connect==9.1.* 

  6. In a browser, open Databricks and create a Personal Access Token (PAT) by going to Settings -> User Settings -> Access Tokens.

  7. On your local machine, in the same terminal/virtual environment you’ve used to install databricks-connect, configure databricks-connect by running:

    databricks-connect configure

    • You’ll be asked for some information, including the PAT you just created.

    • Decide which cluster you want to connect to by opening the compute page in Databricks and clicking on the cluster on which you want to run your code. Please note that databricks-connect only works with clusters running certain versions of the Databricks runtime. Check the requirements here. Once you have clicked on the right cluster, you can read the configuration information you need (such as the cluster ID) from the URL in your browser’s address bar.
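
As an illustration of reading those values out of the cluster URL, here is a minimal Python sketch. It assumes the URL contains the cluster ID after /clusters/ and, on some workspaces, an org ID after o=. The URL layout, the example address and the function name parse_cluster_url are all assumptions made for this example, so compare them with what your browser actually shows.

```python
import re
from urllib.parse import urlsplit


def parse_cluster_url(url):
    """Pull the host, org ID and cluster ID out of a cluster page URL.

    Assumed layout (hypothetical, check your own workspace):
    https://<host>/?o=<org-id>#setting/clusters/<cluster-id>/configuration
    """
    parts = urlsplit(url)
    host = "{}://{}".format(parts.scheme, parts.netloc)

    # The org ID, when present, follows "o=" in the query string.
    org_match = re.search(r"[?&]o=(\d+)", url)
    # The cluster ID is the path segment right after "/clusters/".
    cluster_match = re.search(r"/clusters/([^/?#]+)", url)

    return {
        "host": host,
        "org_id": org_match.group(1) if org_match else None,
        "cluster_id": cluster_match.group(1) if cluster_match else None,
    }


# Made-up URL, for illustration only.
example_url = (
    "https://example.cloud.databricks.com/?o=1234567890123456"
    "#setting/clusters/0123-456789-abcde123/configuration"
)
print(parse_cluster_url(example_url))
```

The values it prints are the kind of host, org ID and cluster ID you would type in when databricks-connect configure asks for them.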

Test that it’s working

To test databricks-connect is working, run:

    databricks-connect test
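
If the test fails, it may be worth re-checking two prerequisites from the steps above: that the virtual environment is active and that Java 8 is the version found on your PATH. The sketch below is one way to check both from Python. The function names are made up for this example, and the version parsing assumes the usual output format of java -version.

```python
import re
import shutil
import subprocess
import sys


def in_virtualenv():
    """Return True when the interpreter is running inside a virtual environment."""
    # Inside a venv, sys.prefix points at the venv and sys.base_prefix at the base install.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


def parse_java_major(version_output):
    """Extract the major Java version from `java -version` output, or None."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    first, second = match.group(1), match.group(2)
    # Java 8 and earlier report themselves as 1.x (for example 1.8.0_292).
    if first == "1" and second is not None:
        return int(second)
    return int(first)


def installed_java_major():
    """Return the major version of the java on PATH, or None if it is missing."""
    if shutil.which("java") is None:
        return None
    # `java -version` writes to stderr, so merge it into stdout.
    result = subprocess.run(
        ["java", "-version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return parse_java_major(result.stdout)


if __name__ == "__main__":
    print("Inside a virtual environment:", in_virtualenv())
    print("Python version:", "{}.{}".format(*sys.version_info[:2]))
    print("Java major version on PATH:", installed_java_major())
```

If the script reports a Java version other than 8, or that you are not inside a virtual environment, revisit the matching step before running the test again.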

If the tests pass, you’re connected to the cluster. Now you can run code on the Databricks cluster from your local laptop without having to use a notebook or the Databricks user interface!
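
A quick way to see this in action is a small script like the sketch below. It assumes that databricks-connect provides the pyspark package in your virtual environment and that SparkSession.builder.getOrCreate() picks up the connection details you entered during configuration. The function name count_on_cluster is made up for this example.

```python
def count_on_cluster(n=100):
    """Build a DataFrame with n rows and count them using the Spark session."""
    # Imported inside the function so the file can be loaded even where
    # pyspark is not installed; it is assumed to come with databricks-connect.
    from pyspark.sql import SparkSession

    # With databricks-connect configured, this session is expected to
    # send work to the remote cluster rather than a local Spark instance.
    spark = SparkSession.builder.getOrCreate()

    # spark.range(n) creates a single-column DataFrame with n rows.
    return spark.range(n).count()
```

Calling count_on_cluster() from the virtual environment should return 100 if the job ran successfully.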


Lachlan McLachlan

Lachlan is a Machine Learning Engineer and data-nerd. He’s one of the co-founders of Machine Alchemy and loves nothing more than receiving no comments on his merge requests.
