diff --git a/docs/developer_docs/extensions/mcp-server.md b/docs/developer_docs/extensions/mcp-server.md
new file mode 100644
index 00000000000..769ada1bc0d
--- /dev/null
+++ b/docs/developer_docs/extensions/mcp-server.md
@@ -0,0 +1,679 @@
+---
+title: MCP Server Deployment & Authentication
+hide_title: true
+sidebar_position: 9
+version: 1
+---
+
+
+
+# MCP Server Deployment & Authentication
+
+Superset includes a built-in [Model Context Protocol (MCP)](https://modelcontextprotocol.io/) server that lets AI assistants -- Claude, ChatGPT, and other MCP-compatible clients -- interact with your Superset instance. Through MCP, clients can list dashboards, query datasets, execute SQL, create charts, and more.
+
+This guide covers how to run, secure, and deploy the MCP server.
+
+```mermaid
+flowchart LR
+    A["AI Client<br/>(Claude, ChatGPT, etc.)"] -- "MCP protocol<br/>(HTTP + JSON-RPC)" --> B["MCP Server<br/>(:5008/mcp)"]
+    B -- "Superset context<br/>(app, db, RBAC)" --> C["Superset<br/>(:8088)"]
+    C --> D[("Database<br/>(Postgres)")]
+```
+
+---
+
+## Quick Start
+
+Get the MCP server running locally and connect an AI client in three steps.
+
+### 1. Start the MCP server
+
+The MCP server runs as a separate process alongside Superset:
+
+```bash
+superset mcp run --host 127.0.0.1 --port 5008
+```
+
+| Flag | Default | Description |
+|------|---------|-------------|
+| `--host` | `127.0.0.1` | Host to bind to |
+| `--port` | `5008` | Port to bind to |
+| `--debug` | off | Enable debug logging |
+
+The endpoint is available at `http://<host>:<port>/mcp`.
+
+### 2. Set a development user
+
+For local development, tell the MCP server which Superset user to impersonate (the user must already exist in your database):
+
+```python
+# superset_config.py
+MCP_DEV_USERNAME = "admin"
+```
+
+### 3. Connect an AI client
+
+Point your MCP client at the server. For **Claude Desktop**, edit the config file:
+
+- **macOS**: `~/Library/Application Support/Claude/claude_desktop_config.json`
+- **Windows**: `%APPDATA%\Claude\claude_desktop_config.json`
+- **Linux**: `~/.config/Claude/claude_desktop_config.json`
+
+```json
+{
+  "mcpServers": {
+    "superset": {
+      "url": "http://localhost:5008/mcp"
+    }
+  }
+}
+```
+
+Restart Claude Desktop. The hammer icon in the chat bar confirms the connection.
+
+See [Connecting AI Clients](#connecting-ai-clients) for Claude Code, Claude Web, ChatGPT, and raw HTTP examples.
+
+---
+
+## Prerequisites
+
+- Apache Superset 5.0+ running and accessible
+- Python 3.10+
+- The `fastmcp` package (`pip install fastmcp`)
+
+---
+
+## Authentication
+
+The MCP server supports multiple authentication methods depending on your deployment scenario.
+
+```mermaid
+flowchart TD
+    R["Incoming MCP Request"] --> F{"MCP_AUTH_FACTORY<br/>set?"}
+    F -- Yes --> CF["Custom Auth Provider"]
+    F -- No --> AE{"MCP_AUTH_ENABLED?"}
+    AE -- "True" --> JWT["JWT Validation"]
+    AE -- "False" --> DU["Dev Mode<br/>(MCP_DEV_USERNAME)"]
+
+    JWT --> ALG{"MCP_JWT_ALGORITHM"}
+    ALG -- "RS256 + JWKS" --> JWKS["Fetch keys from<br/>MCP_JWKS_URI"]
+    ALG -- "RS256 + static" --> PK["Use<br/>MCP_JWT_PUBLIC_KEY"]
+    ALG -- "HS256" --> SEC["Use<br/>MCP_JWT_SECRET"]
+
+    JWKS --> V["Validate token<br/>(exp, iss, aud, scopes)"]
+    PK --> V
+    SEC --> V
+    V --> UR["Resolve Superset user<br/>from token claims"]
+    UR --> OK["Authenticated request"]
+    CF --> OK
+    DU --> OK
+```
+
+### Development Mode (No Auth)
+
+Disable authentication and use a fixed user:
+
+```python
+# superset_config.py
+MCP_AUTH_ENABLED = False
+MCP_DEV_USERNAME = "admin"
+```
+
+All operations run as the configured user.
+
+:::warning
+Never use development mode in production. Always enable authentication for any internet-facing deployment.
+:::
+
+### JWT Authentication
+
+For production, enable JWT-based authentication. The MCP server validates a Bearer token on every request.
+
+#### Option A: RS256 with JWKS endpoint
+
+The most common setup for OAuth 2.0 / OIDC providers that publish a JWKS (JSON Web Key Set) endpoint:
+
+```python
+# superset_config.py
+MCP_AUTH_ENABLED = True
+MCP_JWT_ALGORITHM = "RS256"
+MCP_JWKS_URI = "https://your-identity-provider.com/.well-known/jwks.json"
+MCP_JWT_ISSUER = "https://your-identity-provider.com/"
+MCP_JWT_AUDIENCE = "your-superset-instance"
+```
+
+#### Option B: RS256 with static public key
+
+Use this when you have a fixed RSA key pair (e.g., self-signed tokens):
+
+```python
+# superset_config.py
+MCP_AUTH_ENABLED = True
+MCP_JWT_ALGORITHM = "RS256"
+MCP_JWT_PUBLIC_KEY = """-----BEGIN PUBLIC KEY-----
+MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEA...
+-----END PUBLIC KEY-----"""
+MCP_JWT_ISSUER = "your-issuer"
+MCP_JWT_AUDIENCE = "your-audience"
+```
+
+#### Option C: HS256 with shared secret
+
+Use this when both the token issuer and the MCP server share a symmetric secret:
+
+```python
+# superset_config.py
+MCP_AUTH_ENABLED = True
+MCP_JWT_ALGORITHM = "HS256"
+MCP_JWT_SECRET = "your-shared-secret-key"
+MCP_JWT_ISSUER = "your-issuer"
+MCP_JWT_AUDIENCE = "your-audience"
+```
+
+:::warning
+Store `MCP_JWT_SECRET` securely. Never commit it to version control.
+Use environment variables:
+```python
+import os
+MCP_JWT_SECRET = os.environ.get("MCP_JWT_SECRET")
+```
+:::
+
+#### JWT claims
+
+The MCP server validates these standard claims:
+
+| Claim | Config Key | Description |
+|-------|-----------|-------------|
+| `exp` | -- | Expiration time (always validated) |
+| `iss` | `MCP_JWT_ISSUER` | Token issuer (optional but recommended) |
+| `aud` | `MCP_JWT_AUDIENCE` | Token audience (optional but recommended) |
+| `sub` | -- | Subject -- primary claim used to resolve the Superset user |
+
+#### User resolution
+
+After validating the token, the MCP server resolves a Superset username from the claims. It checks these in order, using the first non-empty value:
+
+1. `subject` -- the standard `sub` claim (via the access token object)
+2. `client_id` -- for machine-to-machine tokens
+3. `payload["sub"]` -- fallback to raw payload
+4. `payload["email"]` -- email-based lookup
+5. `payload["username"]` -- explicit username claim
+
+The resolved value must match a `username` in the Superset `ab_user` table.
+
+#### Scoped access
+
+Require specific scopes in the JWT to limit what MCP operations a token can perform:
+
+```python
+# superset_config.py
+MCP_REQUIRED_SCOPES = ["mcp:read", "mcp:write"]
+```
+
+Only tokens that include **all** required scopes are accepted.
+
+### Custom Auth Provider
+
+For advanced scenarios (e.g., a proprietary auth system), provide a factory function.
+This takes precedence over all built-in JWT configuration:
+
+```python
+# superset_config.py
+def my_custom_auth_factory(app):
+    """Return a FastMCP auth provider instance."""
+    from fastmcp.server.auth.providers.jwt import JWTVerifier
+    return JWTVerifier(
+        jwks_uri="https://my-auth.example.com/.well-known/jwks.json",
+        issuer="https://my-auth.example.com/",
+        audience="superset-mcp",
+    )
+
+MCP_AUTH_FACTORY = my_custom_auth_factory
+```
+
+---
+
+## Connecting AI Clients
+
+### Claude Desktop
+
+**Local development (no auth):**
+
+```json
+{
+  "mcpServers": {
+    "superset": {
+      "url": "http://localhost:5008/mcp"
+    }
+  }
+}
+```
+
+**With JWT authentication:**
+
+```json
+{
+  "mcpServers": {
+    "superset": {
+      "command": "npx",
+      "args": [
+        "-y",
+        "mcp-remote@latest",
+        "http://your-superset-host:5008/mcp",
+        "--header",
+        "Authorization: Bearer YOUR_TOKEN"
+      ]
+    }
+  }
+}
+```
+
+### Claude Code (CLI)
+
+Add to your project's `.mcp.json`:
+
+```json
+{
+  "mcpServers": {
+    "superset": {
+      "type": "url",
+      "url": "http://localhost:5008/mcp"
+    }
+  }
+}
+```
+
+With authentication:
+
+```json
+{
+  "mcpServers": {
+    "superset": {
+      "type": "url",
+      "url": "http://localhost:5008/mcp",
+      "headers": {
+        "Authorization": "Bearer YOUR_TOKEN"
+      }
+    }
+  }
+}
+```
+
+### Claude Web (claude.ai)
+
+1. Open [claude.ai](https://claude.ai)
+2. Click the **+** button (or your profile icon)
+3. Select **Connectors**
+4. Click **Manage Connectors** > **Add custom connector**
+5. Enter a name and your MCP URL (e.g., `https://your-superset-host/mcp`)
+6. Click **Add**
+
+:::info
+Custom connectors on Claude Web require a Pro, Max, Team, or Enterprise plan.
+:::
+
+### ChatGPT
+
+1. Click your profile icon > **Settings** > **Apps and Connectors**
+2. Enable **Developer Mode** in Advanced Settings
+3. In the chat composer, press **+** > **Add sources** > **App** > **Connect more** > **Create app**
+4. Enter a name and your MCP server URL
+5. Click **I understand and continue**
+
+:::info
+ChatGPT MCP connectors require a Pro, Team, Enterprise, or Edu plan.
+:::
+
+### Direct HTTP requests
+
+Call the MCP server directly with any HTTP client:
+
+```bash
+curl -X POST http://localhost:5008/mcp \
+  -H 'Content-Type: application/json' \
+  -H 'Authorization: Bearer YOUR_JWT_TOKEN' \
+  -d '{"jsonrpc": "2.0", "method": "tools/list", "id": 1}'
+```
+
+---
+
+## Deployment
+
+### Single Process
+
+The simplest setup: run the MCP server alongside Superset on the same host.
+
+```mermaid
+flowchart TD
+    subgraph host["Host / VM"]
+        direction TB
+        S["Superset<br/>:8088"] --> DB[("Postgres")]
+        M["MCP Server<br/>:5008"] --> DB
+    end
+    C["AI Client"] -- "HTTPS" --> P["Reverse Proxy<br/>(Nginx / Caddy)"]
+    U["Browser"] -- "HTTPS" --> P
+    P -- ":8088" --> S
+    P -- ":5008/mcp" --> M
+```
+
+**superset_config.py:**
+
+```python
+MCP_SERVICE_HOST = "0.0.0.0"
+MCP_SERVICE_PORT = 5008
+MCP_DEV_USERNAME = "admin"  # or enable JWT auth
+
+# If behind a reverse proxy, set the public-facing URL so
+# MCP-generated links (chart previews, SQL Lab URLs) resolve correctly:
+MCP_SERVICE_URL = "https://superset.example.com"
+```
+
+**Start both processes:**
+
+```bash
+# Terminal 1 -- Superset web server
+superset run -h 0.0.0.0 -p 8088
+
+# Terminal 2 -- MCP server
+superset mcp run --host 0.0.0.0 --port 5008
+```
+
+**Nginx reverse proxy with TLS:**
+
+```nginx
+server {
+    listen 443 ssl;
+    server_name superset.example.com;
+
+    ssl_certificate /path/to/cert.pem;
+    ssl_certificate_key /path/to/key.pem;
+
+    # Superset web UI
+    location / {
+        proxy_pass http://127.0.0.1:8088;
+        proxy_set_header Host $host;
+        proxy_set_header X-Real-IP $remote_addr;
+    }
+
+    # MCP endpoint
+    location /mcp {
+        proxy_pass http://127.0.0.1:5008/mcp;
+        proxy_set_header Host $host;
+        proxy_set_header X-Real-IP $remote_addr;
+        proxy_set_header Authorization $http_authorization;
+    }
+}
+```
+
+### Docker Compose
+
+Run Superset and the MCP server as separate containers sharing the same config:
+
+```yaml
+# docker-compose.yml
+services:
+  superset:
+    image: apache/superset:latest
+    ports:
+      - "8088:8088"
+    volumes:
+      - ./superset_config.py:/app/superset_config.py
+    environment:
+      - SUPERSET_CONFIG_PATH=/app/superset_config.py
+
+  mcp:
+    image: apache/superset:latest
+    command: ["superset", "mcp", "run", "--host", "0.0.0.0", "--port", "5008"]
+    ports:
+      - "5008:5008"
+    volumes:
+      - ./superset_config.py:/app/superset_config.py
+    environment:
+      - SUPERSET_CONFIG_PATH=/app/superset_config.py
+    depends_on:
+      - superset
+```
+
+Both containers share the same `superset_config.py`, so authentication settings, database connections, and feature flags stay in sync.
+
+### Multi-Pod (Kubernetes)
+
+For high-availability deployments, configure Redis so that replicas share session state:
+
+```mermaid
+flowchart TD
+    LB["Load Balancer"] --> M1["MCP Pod 1"]
+    LB --> M2["MCP Pod 2"]
+    LB --> M3["MCP Pod 3"]
+    M1 --> R[("Redis<br/>(session store)")]
+    M2 --> R
+    M3 --> R
+    M1 --> DB[("Postgres")]
+    M2 --> DB
+    M3 --> DB
+```
+
+**superset_config.py:**
+
+```python
+MCP_STORE_CONFIG = {
+    "enabled": True,
+    "CACHE_REDIS_URL": "redis://redis-host:6379/0",
+    "event_store_max_events": 100,
+    "event_store_ttl": 3600,
+}
+```
+
+When `CACHE_REDIS_URL` is set, the MCP server uses a Redis-backed EventStore for session management, allowing replicas to share state. Without Redis, each pod manages its own in-memory sessions and stateful MCP interactions may fail when requests hit different replicas.
+
+---
+
+## Configuration Reference
+
+All MCP settings go in `superset_config.py`. Defaults are defined in `superset/mcp_service/mcp_config.py`.
+
+### Core
+
+| Setting | Default | Description |
+|---------|---------|-------------|
+| `MCP_SERVICE_HOST` | `"localhost"` | Host the MCP server binds to |
+| `MCP_SERVICE_PORT` | `5008` | Port the MCP server binds to |
+| `MCP_SERVICE_URL` | `None` | Public base URL for MCP-generated links (set this when behind a reverse proxy) |
+| `MCP_DEBUG` | `False` | Enable debug logging |
+| `MCP_DEV_USERNAME` | -- | Superset username for development mode (no auth) |
+
+### Authentication
+
+| Setting | Default | Description |
+|---------|---------|-------------|
+| `MCP_AUTH_ENABLED` | `False` | Enable JWT authentication |
+| `MCP_JWT_ALGORITHM` | `"RS256"` | JWT signing algorithm (`RS256` or `HS256`) |
+| `MCP_JWKS_URI` | `None` | JWKS endpoint URL (RS256) |
+| `MCP_JWT_PUBLIC_KEY` | `None` | Static RSA public key string (RS256) |
+| `MCP_JWT_SECRET` | `None` | Shared secret string (HS256) |
+| `MCP_JWT_ISSUER` | `None` | Expected `iss` claim |
+| `MCP_JWT_AUDIENCE` | `None` | Expected `aud` claim |
+| `MCP_REQUIRED_SCOPES` | `[]` | Required JWT scopes |
+| `MCP_JWT_DEBUG_ERRORS` | `False` | Log detailed JWT errors server-side (never exposed in HTTP responses per RFC 6750) |
+| `MCP_AUTH_FACTORY` | `None` | Custom auth provider factory `(flask_app) -> auth_provider`. Takes precedence over built-in JWT |
+
+### Response Size Guard
+
+Limits response sizes to prevent exceeding LLM context windows:
+
+```python
+MCP_RESPONSE_SIZE_CONFIG = {
+    "enabled": True,
+    "token_limit": 25000,
+    "warn_threshold_pct": 80,
+    "excluded_tools": [
+        "health_check",
+        "get_chart_preview",
+        "generate_explore_link",
+        "open_sql_lab_with_context",
+    ],
+}
+```
+
+| Key | Default | Description |
+|-----|---------|-------------|
+| `enabled` | `True` | Enable response size checking |
+| `token_limit` | `25000` | Maximum estimated token count per response |
+| `warn_threshold_pct` | `80` | Warn when response exceeds this percentage of the limit |
+| `excluded_tools` | See above | Tools exempt from size checking (e.g., tools that return URLs, not data) |
+
+### Caching
+
+Optional response caching for read-heavy workloads. Requires Redis when used with multiple replicas.
+
+```python
+MCP_CACHE_CONFIG = {
+    "enabled": False,
+    "CACHE_KEY_PREFIX": None,
+    "list_tools_ttl": 300,  # 5 min
+    "list_resources_ttl": 300,
+    "list_prompts_ttl": 300,
+    "read_resource_ttl": 3600,  # 1 hour
+    "get_prompt_ttl": 3600,
+    "call_tool_ttl": 3600,
+    "max_item_size": 1048576,  # 1 MB
+    "excluded_tools": [
+        "execute_sql",
+        "generate_dashboard",
+        "generate_chart",
+        "update_chart",
+    ],
+}
+```
+
+| Key | Default | Description |
+|-----|---------|-------------|
+| `enabled` | `False` | Enable response caching |
+| `CACHE_KEY_PREFIX` | `None` | Optional prefix for cache keys (useful for shared Redis) |
+| `list_tools_ttl` | `300` | Cache TTL in seconds for `tools/list` |
+| `list_resources_ttl` | `300` | Cache TTL for `resources/list` |
+| `list_prompts_ttl` | `300` | Cache TTL for `prompts/list` |
+| `read_resource_ttl` | `3600` | Cache TTL for `resources/read` |
+| `get_prompt_ttl` | `3600` | Cache TTL for `prompts/get` |
+| `call_tool_ttl` | `3600` | Cache TTL for `tools/call` |
+| `max_item_size` | `1048576` | Maximum cached item size in bytes (1 MB) |
+| `excluded_tools` | See above | Tools that are never cached (mutating or non-deterministic) |
+
+### Redis Store (Multi-Pod)
+
+Enables Redis-backed session and event storage for multi-replica deployments:
+
+```python
+MCP_STORE_CONFIG = {
+    "enabled": False,
+    "CACHE_REDIS_URL": None,
+    "event_store_max_events": 100,
+    "event_store_ttl": 3600,
+}
+```
+
+| Key | Default | Description |
+|-----|---------|-------------|
+| `enabled` | `False` | Enable Redis-backed store |
+| `CACHE_REDIS_URL` | `None` | Redis connection URL (e.g., `redis://redis-host:6379/0`) |
+| `event_store_max_events` | `100` | Maximum events retained per session |
+| `event_store_ttl` | `3600` | Event TTL in seconds |
+
+### Session & CSRF
+
+These values are flat-merged into the Flask app config used by the MCP server process:
+
+```python
+MCP_SESSION_CONFIG = {
+    "SESSION_COOKIE_HTTPONLY": True,
+    "SESSION_COOKIE_SECURE": False,
+    "SESSION_COOKIE_SAMESITE": "Lax",
+    "SESSION_COOKIE_NAME": "superset_session",
+    "PERMANENT_SESSION_LIFETIME": 86400,
+}
+
+MCP_CSRF_CONFIG = {
+    "WTF_CSRF_ENABLED": True,
+    "WTF_CSRF_TIME_LIMIT": None,
+}
+```
+
+---
+
+## Troubleshooting
+
+### Server won't start
+
+- Verify `fastmcp` is installed: `pip install fastmcp`
+- Check that `MCP_DEV_USERNAME` is set if auth is disabled -- the server requires a user identity
+- Confirm the port is not already in use: `lsof -i :5008`
+
+### 401 Unauthorized
+
+- Verify your JWT token has not expired (`exp` claim)
+- Check that `MCP_JWT_ISSUER` and `MCP_JWT_AUDIENCE` match the token's `iss` and `aud` claims exactly
+- For RS256 with JWKS: confirm the JWKS URI is reachable from the MCP server
+- For RS256 with static key: confirm the public key string includes the `BEGIN`/`END` markers
+- For HS256: confirm the secret matches between the token issuer and `MCP_JWT_SECRET`
+- Enable `MCP_JWT_DEBUG_ERRORS = True` for detailed server-side logging (errors are never leaked to the client)
+
+### Tool not found
+
+- Ensure the MCP server and
+  Superset share the same `superset_config.py`
+- Check server logs at startup -- tool registration errors are logged with the tool name and reason
+
+### Client can't connect
+
+- Verify the MCP server URL is reachable from the client machine
+- For Claude Desktop: fully quit the app (not just close the window) and restart after config changes
+- For remote access: ensure your firewall and reverse proxy allow traffic to the MCP port
+- Confirm the URL path ends with `/mcp` (e.g., `http://localhost:5008/mcp`)
+
+### Permission errors on tool calls
+
+- The MCP server enforces Superset's RBAC permissions -- the authenticated user must have the required roles
+- In development mode, ensure `MCP_DEV_USERNAME` maps to a user with appropriate roles (e.g., Admin)
+- Check `superset/security/manager.py` for the specific permission tuples required by each tool domain (e.g., `("can_execute_sql_query", "SQLLab")`)
+
+### Response too large
+
+- If a tool call returns an error about exceeding token limits, the response size guard is blocking an oversized result
+- Reduce `page_size` or `limit` parameters, use `select_columns` to exclude large fields, or add filters to narrow results
+- To adjust the threshold, change `token_limit` in `MCP_RESPONSE_SIZE_CONFIG`
+- To disable the guard entirely, set `MCP_RESPONSE_SIZE_CONFIG = {"enabled": False}`
+
+---
+
+## Security Best Practices
+
+- **Use TLS** for all production MCP endpoints -- place the server behind a reverse proxy with HTTPS
+- **Enable JWT authentication** for any internet-facing deployment
+- **RBAC enforcement** -- The MCP server respects Superset's role-based access control.
+  Users can only access data their roles permit
+- **Secrets management** -- Store `MCP_JWT_SECRET`, database credentials, and API keys in environment variables or a secrets manager, never in config files committed to version control
+- **Scoped tokens** -- Use `MCP_REQUIRED_SCOPES` to limit what operations a token can perform
+- **Network isolation** -- In Kubernetes, restrict MCP pod network policies to only allow traffic from your AI client endpoints
+- Review the **[Security documentation](./security)** for additional extension security guidance
+
+---
+
+## Next Steps
+
+- **[MCP Integration](./mcp)** -- Build custom MCP tools and prompts via Superset extensions
+- **[Security](./security)** -- Security best practices for extensions
+- **[Deployment](./deployment)** -- Package and deploy Superset extensions
diff --git a/docs/developer_docs/sidebars.js b/docs/developer_docs/sidebars.js
index 7926d80cf61..babdcd41025 100644
--- a/docs/developer_docs/sidebars.js
+++ b/docs/developer_docs/sidebars.js
@@ -52,6 +52,7 @@ module.exports = {
         'extensions/development',
         'extensions/deployment',
         'extensions/mcp',
+        'extensions/mcp-server',
         'extensions/security',
         'extensions/tasks',
         'extensions/registry',