Databricks
This is the implementation of the Databricks data handler for MindsDB.
Databricks is a data lakehouse that unifies the best of data warehouses and data lakes in one simple platform to handle all your data, analytics, and AI use cases. It’s built on an open and reliable data foundation that efficiently handles all data types and applies one common security and governance approach across all of your data and cloud platforms.
Implementation
This handler is implemented using databricks-sql-connector, a Python library that allows you to use Python code to run SQL commands on Databricks clusters and Databricks SQL warehouses.
The required arguments to establish a connection are as follows:
server_hostname is the server hostname for the cluster or SQL warehouse.
http_path is the HTTP path of the cluster or SQL warehouse.
access_token is a Databricks personal access token for the workspace.
There are several optional arguments as follows:
session_configuration is a dictionary of Spark session configuration parameters.
http_headers stores additional (key, value) pairs to set in HTTP headers on every RPC request the client makes.
catalog is the catalog to use for the connection. Typically defaults to hive_metastore if not provided.
schema is the schema (database) to use for the connection. Defaults to default if not provided.
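The arguments above map onto keyword arguments of the connector's connect function. The following is a minimal sketch of that mapping, not the handler's actual code: build_connect_kwargs and run_query are hypothetical helper names introduced here for illustration, and the sketch assumes databricks-sql-connector is installed when run_query is called.

```python
# Argument names taken from the lists above.
REQUIRED = ("server_hostname", "http_path", "access_token")
OPTIONAL = ("session_configuration", "http_headers", "catalog", "schema")


def build_connect_kwargs(**args):
    """Validate handler-style arguments and return kwargs for sql.connect().

    Hypothetical helper for illustration; not part of MindsDB.
    """
    missing = [name for name in REQUIRED if not args.get(name)]
    if missing:
        raise ValueError(f"missing required arguments: {', '.join(missing)}")
    unknown = sorted(set(args) - set(REQUIRED) - set(OPTIONAL))
    if unknown:
        raise ValueError(f"unknown arguments: {', '.join(unknown)}")
    # Drop optional arguments left as None so the connector's defaults apply.
    return {key: value for key, value in args.items() if value is not None}


def run_query(query, **args):
    """Open a connection, run one statement, and return all rows."""
    # Imported lazily so the validation helper works without the connector.
    from databricks import sql  # provided by databricks-sql-connector

    with sql.connect(**build_connect_kwargs(**args)) as connection:
        with connection.cursor() as cursor:
            cursor.execute(query)
            return cursor.fetchall()
```

Calling run_query("SELECT 1", server_hostname=..., http_path=..., access_token=...) with real workspace values would exercise the three required arguments; the optional ones can be added as further keyword arguments.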
If you installed MindsDB locally via pip, you need to install all handler dependencies manually. To do so, go to the handler’s folder (mindsdb/integrations/handlers/databricks_handler) and run this command: pip install -r requirements.txt.
Usage
To connect to Databricks from MindsDB with this handler, use the following syntax:
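As a rough sketch, the connection statement can be rendered from the arguments described earlier. This assumes the general MindsDB CREATE DATABASE ... WITH ENGINE ..., PARAMETERS ... form and the engine name databricks; the helper render_create_database, the data source name databricks_datasource, and the angle-bracket placeholder values are hypothetical and should be replaced with your own.

```python
import json


def render_create_database(name, parameters, engine="databricks"):
    """Return a MindsDB CREATE DATABASE statement (hypothetical helper)."""
    # MindsDB accepts connection parameters as a JSON-style object.
    body = json.dumps(parameters, indent=4)
    return (
        f"CREATE DATABASE {name}\n"
        f"WITH ENGINE = '{engine}',\n"
        f"PARAMETERS = {body};"
    )


statement = render_create_database(
    "databricks_datasource",  # hypothetical data source name
    {
        # Placeholder values; substitute the details of your workspace.
        "server_hostname": "<server-hostname>",
        "http_path": "<http-path>",
        "access_token": "<personal-access-token>",
        "schema": "default",
    },
)
print(statement)
```

The printed statement can be pasted into the MindsDB SQL editor once the placeholders are filled in.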
You can use this established connection to query your table as follows:
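A query against the connected data source qualifies the table with the data source name. The sketch below renders such a statement; render_select, databricks_datasource, and example_tbl are hypothetical names chosen for illustration, and the LIMIT clause is an addition to keep the result small.

```python
def render_select(datasource, table, limit=10):
    """Return a SELECT over datasource.table (hypothetical helper)."""
    # Accept only plain identifiers so the rendered SQL stays well-formed.
    if not (datasource.isidentifier() and table.isidentifier()):
        raise ValueError("datasource and table must be plain identifiers")
    return f"SELECT * FROM {datasource}.{table} LIMIT {int(limit)};"


query = render_select("databricks_datasource", "example_tbl")
print(query)
```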