---
title: Databricks
hide_title: true
sidebar_position: 37
version: 1
---

## Databricks

Databricks now offers a native DB API 2.0 driver, `databricks-sql-connector`, that can be used with the `sqlalchemy-databricks` dialect. You can install both with:

```bash
pip install "apache-superset[databricks]"
```

To connect, you need the following information from your cluster:

- Server hostname
- Port
- HTTP path

These can be found under "Configuration" -> "Advanced Options" -> "JDBC/ODBC".

You also need an access token from "Settings" -> "User Settings" -> "Access Tokens".

Once you have all this information, add a database of type "Databricks Native Connector" and use the following SQLAlchemy URI:

```
databricks+connector://token:{access_token}@{server_hostname}:{port}/{database_name}
```
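
Filling in the URI template by hand is error-prone when the token contains characters that are also URI delimiters. The following is a minimal sketch of how the template above could be filled in with the Python standard library. `build_databricks_uri` is a hypothetical helper, not part of Superset, and the host name, port 443, token, and database name are placeholder assumptions. The token is percent-encoded, which is how SQLAlchemy expects special characters in URL credentials to be escaped.

```python
from urllib.parse import quote


def build_databricks_uri(access_token, server_hostname, port, database_name,
                         driver="connector"):
    """Fill in the SQLAlchemy URI template (hypothetical helper)."""
    # Percent-encode the token so characters such as "@" or "/" are not
    # mistaken for URI delimiters.
    token = quote(access_token, safe="")
    return (
        f"databricks+{driver}://token:{token}"
        f"@{server_hostname}:{port}/{database_name}"
    )


# All values below are placeholders, not real credentials.
uri = build_databricks_uri(
    access_token="dapiEXAMPLE",
    server_hostname="example.cloud.databricks.com",
    port=443,
    database_name="default",
)
print(uri)
```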

You also need to add the following configuration to "Other" -> "Engine Parameters", with your HTTP path:

```json
{
  "connect_args": {"http_path": "sql/protocolv1/o/****"}
}
```
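
A malformed "Engine Parameters" value is easy to paste by accident. As a rough sketch, the text can be checked before it is entered in Superset; `check_engine_parameters` is a hypothetical helper written for illustration and only verifies the structure shown above, not that the HTTP path is valid for your cluster.

```python
import json


def check_engine_parameters(text):
    """Return the HTTP path if `text` has the expected shape (hypothetical helper)."""
    params = json.loads(text)  # raises ValueError on malformed JSON
    if not isinstance(params, dict):
        raise ValueError("engine parameters must be a JSON object")
    connect_args = params.get("connect_args")
    if not isinstance(connect_args, dict) or not connect_args.get("http_path"):
        raise ValueError('missing "connect_args" -> "http_path"')
    return connect_args["http_path"]


http_path = check_engine_parameters(
    '{"connect_args": {"http_path": "sql/protocolv1/o/****"}}'
)
print(http_path)
```

The same check applies unchanged to the engine parameters used with the older drivers, since they also carry `http_path` inside `connect_args`.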

## Older driver

Originally, Superset used `databricks-dbapi` to connect to Databricks. You might want to try it if you're having problems with the official Databricks connector:

```bash
pip install "databricks-dbapi[sqlalchemy]"
```

There are two ways to connect to Databricks when using `databricks-dbapi`: a Hive connector or an ODBC connector. Both work similarly, but only ODBC can be used to connect to [SQL endpoints](https://docs.databricks.com/sql/admin/sql-endpoints.html).

### Hive

To connect to a Hive cluster, add a database of type "Databricks Interactive Cluster" in Superset and use the following SQLAlchemy URI:

```
databricks+pyhive://token:{access_token}@{server_hostname}:{port}/{database_name}
```

You also need to add the following configuration to "Other" -> "Engine Parameters", with your HTTP path:

```json
{"connect_args": {"http_path": "sql/protocolv1/o/****"}}
```

### ODBC

For ODBC you first need to install the [ODBC drivers for your platform](https://databricks.com/spark/odbc-drivers-download).

For a regular connection, select either "Databricks Interactive Cluster" or "Databricks SQL Endpoint" as the database type, depending on your use case, and use the following SQLAlchemy URI:

```
databricks+pyodbc://token:{access_token}@{server_hostname}:{port}/{database_name}
```

And for the connection arguments:

```json
{"connect_args": {"http_path": "sql/protocolv1/o/****", "driver_path": "/path/to/odbc/driver"}}
```

The driver path should be:

- `/Library/simba/spark/lib/libsparkodbc_sbu.dylib` (macOS)
- `/opt/simba/spark/lib/64/libsparkodbc_sb64.so` (Linux)

For a connection to a SQL endpoint you need to use the HTTP path from the endpoint:

```json
{"connect_args": {"http_path": "/sql/1.0/endpoints/****", "driver_path": "/path/to/odbc/driver"}}
```
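
Because the ODBC driver path differs per operating system, the connection arguments can be generated instead of typed by hand. The sketch below is one possible way to do that; `odbc_engine_parameters` and `DRIVER_PATHS` are hypothetical names, and the mapping assumes the drivers were installed in the two default locations listed above.

```python
import json
import sys

# Default driver locations listed above; a custom install may differ.
DRIVER_PATHS = {
    "darwin": "/Library/simba/spark/lib/libsparkodbc_sbu.dylib",  # macOS
    "linux": "/opt/simba/spark/lib/64/libsparkodbc_sb64.so",
}


def odbc_engine_parameters(http_path, platform=sys.platform):
    """Build the ODBC "Engine Parameters" JSON text (hypothetical helper)."""
    for prefix, driver_path in DRIVER_PATHS.items():
        if platform.startswith(prefix):
            break
    else:
        raise ValueError(f"no known driver path for platform {platform!r}")
    return json.dumps(
        {"connect_args": {"http_path": http_path, "driver_path": driver_path}}
    )


# The platform is passed explicitly here so the result does not depend
# on the machine that runs the example.
print(odbc_engine_parameters("/sql/1.0/endpoints/****", platform="linux"))
```

Omitting the `platform` argument uses the operating system of the machine running the script, which is only appropriate when that machine is also the one hosting Superset.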