feat: Databricks native driver (#20320)
@@ -7,16 +7,12 @@ version: 1

## Databricks

To connect to Databricks, first install [databricks-dbapi](https://pypi.org/project/databricks-dbapi/) with the optional SQLAlchemy dependencies:
Databricks now offers a native DB API 2.0 driver, `databricks-sql-connector`, that can be used with the `sqlalchemy-databricks` dialect. You can install both with:

```bash
pip install databricks-dbapi[sqlalchemy]
pip install "superset[databricks]"
```
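
To double-check that the two packages behind the native-driver route are available in your environment, a quick import check like the one below can help. This is only a sketch, not part of the official docs, and the module names are assumed from the package names:

```python
# Sanity check (a sketch): both the native driver and the SQLAlchemy dialect
# package should import cleanly after `pip install "superset[databricks]"`.
from databricks import sql as databricks_sql  # provided by databricks-sql-connector
import sqlalchemy_databricks                  # provides the databricks+connector dialect

print(databricks_sql.__name__, sqlalchemy_databricks.__name__)
```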

There are two ways to connect to Databricks: using a Hive connector or an ODBC connector. Both ways work similarly, but only ODBC can be used to connect to [SQL endpoints](https://docs.databricks.com/sql/admin/sql-endpoints.html).

### Hive

To use the Hive connector you need the following information from your cluster:

- Server hostname

@@ -27,15 +23,44 @@ These can be found under "Configuration" -> "Advanced Options" -> "JDBC/ODBC".

You also need an access token from "Settings" -> "User Settings" -> "Access Tokens".

Once you have all this information, add a database of type "Databricks (Hive)" in Superset, and use the following SQLAlchemy URI:
Once you have all this information, add a database of type "Databricks Native Connector" and use the following SQLAlchemy URI:

```
databricks+pyhive://token:{access token}@{server hostname}:{port}/{database name}
databricks+connector://token:{access_token}@{server_hostname}:{port}/{database_name}
```

You also need to add the following configuration to "Other" -> "Engine Parameters", with your HTTP path:

```json
{
    "connect_args": {"http_path": "sql/protocolv1/o/****"},
    "http_headers": [["User-Agent", "Apache Superset"]]
}
```

The `User-Agent` header is optional, but helps Databricks identify traffic from Superset. If you need to use a different header, please reach out to Databricks and let them know.
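
Outside of Superset, the same URI and engine parameters can be combined into a plain SQLAlchemy engine to verify connectivity before configuring the database. The sketch below is illustrative only: the hostname, token, database, and HTTP path are placeholders you would replace with values from your own workspace.

```python
# Minimal connectivity check (a sketch, not part of Superset) using the native
# connector URI and the same connect_args as in "Engine Parameters".
from sqlalchemy import create_engine, text

engine = create_engine(
    "databricks+connector://token:dapiXXXX@dbc-XXXX.cloud.databricks.com:443/default",
    connect_args={"http_path": "sql/protocolv1/o/****"},  # your cluster's HTTP path
)

with engine.connect() as conn:
    print(conn.execute(text("SELECT 1")).scalar())  # should print 1
```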

## Older driver

Originally Superset used `databricks-dbapi` to connect to Databricks. You might want to try it if you're having problems with the official Databricks connector:

```bash
pip install "databricks-dbapi[sqlalchemy]"
```

There are two ways to connect to Databricks when using `databricks-dbapi`: using a Hive connector or an ODBC connector. Both ways work similarly, but only ODBC can be used to connect to [SQL endpoints](https://docs.databricks.com/sql/admin/sql-endpoints.html).

### Hive

To connect to a Hive cluster, add a database of type "Databricks Interactive Cluster" in Superset, and use the following SQLAlchemy URI:

```
databricks+pyhive://token:{access_token}@{server_hostname}:{port}/{database_name}
```

You also need to add the following configuration to "Other" -> "Engine Parameters", with your HTTP path:

```json
{"connect_args": {"http_path": "sql/protocolv1/o/****"}}
```
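
As with the native connector, the older Hive dialect can be exercised directly with SQLAlchemy to confirm the URI and HTTP path are correct. The sketch below assumes `databricks-dbapi[sqlalchemy]` is installed and uses placeholder connection values:

```python
# A sketch of the equivalent standalone connection for the older Hive dialect;
# replace the placeholder hostname, token, database, and http_path.
from sqlalchemy import create_engine, text

engine = create_engine(
    "databricks+pyhive://token:dapiXXXX@dbc-XXXX.cloud.databricks.com:443/default",
    connect_args={"http_path": "sql/protocolv1/o/****"},  # as in "Engine Parameters"
)

with engine.connect() as conn:
    print(conn.execute(text("SELECT 1")).scalar())
```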

@@ -43,15 +68,15 @@ You also need to add the following configuration to "Other" -> "Engine Parameter

For ODBC you first need to install the [ODBC drivers for your platform](https://databricks.com/spark/odbc-drivers-download).

For a regular connection use this as the SQLAlchemy URI:
For a regular connection, use this as the SQLAlchemy URI after selecting either "Databricks Interactive Cluster" or "Databricks SQL Endpoint" for the database, depending on your use case:

```
databricks+pyodbc://token:{access token}@{server hostname}:{port}/{database name}
databricks+pyodbc://token:{access_token}@{server_hostname}:{port}/{database_name}
```

And for the connection arguments:

```
```json
{"connect_args": {"http_path": "sql/protocolv1/o/****", "driver_path": "/path/to/odbc/driver"}}
```
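
For reference, the same ODBC connection can be built as a standalone SQLAlchemy engine. This is a sketch only; the hostname, token, HTTP path, and driver path are placeholders, and the ODBC driver must already be installed:

```python
# A sketch of a standalone databricks+pyodbc engine mirroring the URI and
# connection arguments above; every value shown is a placeholder.
from sqlalchemy import create_engine, text

engine = create_engine(
    "databricks+pyodbc://token:dapiXXXX@dbc-XXXX.cloud.databricks.com:443/default",
    connect_args={
        "http_path": "sql/protocolv1/o/****",    # cluster HTTP path
        "driver_path": "/path/to/odbc/driver",   # path to the installed ODBC driver
    },
)

with engine.connect() as conn:
    print(conn.execute(text("SELECT 1")).scalar())
```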

@@ -62,6 +87,6 @@ The driver path should be:

For a connection to a SQL endpoint you need to use the HTTP path from the endpoint:

```
```json
{"connect_args": {"http_path": "/sql/1.0/endpoints/****", "driver_path": "/path/to/odbc/driver"}}
```
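
The standalone sketch above carries over to SQL endpoints unchanged except for the HTTP path; for example (again a sketch with placeholder values):

```python
# For a SQL endpoint, only the HTTP path in connect_args differs.
connect_args = {
    "http_path": "/sql/1.0/endpoints/****",   # from the endpoint's connection details
    "driver_path": "/path/to/odbc/driver",
}
```
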
setup.py
@@ -129,7 +129,10 @@ setup(
        "cockroachdb": ["cockroachdb>=0.3.5, <0.4"],
        "cors": ["flask-cors>=2.0.0"],
        "crate": ["crate[sqlalchemy]>=0.26.0, <0.27"],
        "databricks": ["databricks-dbapi[sqlalchemy]>=0.5.0, <0.6"],
        "databricks": [
            "databricks-sql-connector>=2.0.2, <3",
            "sqlalchemy-databricks>=0.2.0",
        ],
        "db2": ["ibm-db-sa>=0.3.5, <0.4"],
        "dremio": ["sqlalchemy-dremio>=1.1.5, <1.3"],
        "drill": ["sqlalchemy-drill==0.1.dev"],

@@ -65,3 +65,9 @@ class DatabricksODBCEngineSpec(BaseEngineSpec):
    @classmethod
    def epoch_to_dttm(cls) -> str:
        # Delegates to the Hive spec's epoch-to-timestamp conversion.
        return HiveEngineSpec.epoch_to_dttm()


class DatabricksNativeEngineSpec(DatabricksODBCEngineSpec):
    # Spec for the native databricks-sql-connector driver; engine plus driver
    # correspond to the "databricks+connector://" SQLAlchemy URIs documented above.
    engine = "databricks"
    engine_name = "Databricks Native Connector"  # label shown when adding a database
    driver = "connector"