From 66afdfd119c5f0b1d8adaa1ddd3e98fbd0ed235b Mon Sep 17 00:00:00 2001 From: Amin Ghadersohi Date: Thu, 20 Nov 2025 01:41:56 +1000 Subject: [PATCH] docs(mcp): add comprehensive architecture, security, and production deployment documentation (#36017) --- UPDATING.md | 101 ++ superset/mcp_service/ARCHITECTURE.md | 693 ++++++++++++++ superset/mcp_service/PRODUCTION.md | 1332 ++++++++++++++++++++++++++ superset/mcp_service/SECURITY.md | 803 ++++++++++++++++ 4 files changed, 2929 insertions(+) create mode 100644 superset/mcp_service/ARCHITECTURE.md create mode 100644 superset/mcp_service/PRODUCTION.md create mode 100644 superset/mcp_service/SECURITY.md diff --git a/UPDATING.md b/UPDATING.md index cc812016808..384b7d22233 100644 --- a/UPDATING.md +++ b/UPDATING.md @@ -23,6 +23,107 @@ This file documents any backwards-incompatible changes in Superset and assists people when migrating to a new version. ## Next + +### MCP Service + +The MCP (Model Context Protocol) service enables AI assistants and automation tools to interact programmatically with Superset. 
+ +#### New Features +- MCP service infrastructure with FastMCP framework +- Tools for dashboards, charts, datasets, SQL Lab, and instance metadata +- Optional dependency: install with `pip install apache-superset[fastmcp]` +- Runs as separate process from Superset web server +- JWT-based authentication for production deployments + +#### New Configuration Options + +**Development** (single-user, local testing): +```python +# superset_config.py +MCP_DEV_USERNAME = "admin" # User for MCP authentication +MCP_SERVICE_HOST = "localhost" +MCP_SERVICE_PORT = 5008 +``` + +**Production** (JWT-based, multi-user): +```python +# superset_config.py +MCP_AUTH_ENABLED = True +MCP_JWT_ISSUER = "https://your-auth-provider.com" +MCP_JWT_AUDIENCE = "superset-mcp" +MCP_JWT_ALGORITHM = "RS256" # or "HS256" for shared secrets + +# Option 1: Use JWKS endpoint (recommended for RS256) +MCP_JWKS_URI = "https://auth.example.com/.well-known/jwks.json" + +# Option 2: Use static public key (RS256) +MCP_JWT_PUBLIC_KEY = "-----BEGIN PUBLIC KEY-----..." + +# Option 3: Use shared secret (HS256) +MCP_JWT_ALGORITHM = "HS256" +MCP_JWT_SECRET = "your-shared-secret-key" + +# Optional overrides +MCP_SERVICE_HOST = "0.0.0.0" +MCP_SERVICE_PORT = 5008 +MCP_SESSION_CONFIG = { + "SESSION_COOKIE_SECURE": True, + "SESSION_COOKIE_HTTPONLY": True, + "SESSION_COOKIE_SAMESITE": "Strict", +} +``` + +#### Running the MCP Service + +```bash +# Development +superset mcp run --port 5008 --debug + +# Production +superset mcp run --port 5008 + +# With factory config +superset mcp run --port 5008 --use-factory-config +``` + +#### Deployment Considerations + +The MCP service runs as a **separate process** from the Superset web server. 
+ +**Important**: +- Requires same Python environment and configuration as Superset +- Shares database connections with main Superset app +- Can be scaled independently from web server +- Requires `fastmcp` package (optional dependency) + +**Installation**: +```bash +# Install with MCP support +pip install apache-superset[fastmcp] + +# Or add to requirements.txt +apache-superset[fastmcp]>=X.Y.Z +``` + +**Process Management**: +Use systemd, supervisord, or Kubernetes to manage the MCP service process. +See `superset/mcp_service/PRODUCTION.md` for deployment guides. + +**Security**: +- Development: Uses `MCP_DEV_USERNAME` for single-user access +- Production: **MUST** configure JWT authentication +- See `superset/mcp_service/SECURITY.md` for details + +#### Documentation + +- Architecture: `superset/mcp_service/ARCHITECTURE.md` +- Security: `superset/mcp_service/SECURITY.md` +- Production: `superset/mcp_service/PRODUCTION.md` +- Developer Guide: `superset/mcp_service/CLAUDE.md` +- Quick Start: `superset/mcp_service/README.md` + +--- + - [33055](https://github.com/apache/superset/pull/33055): Upgrades Flask-AppBuilder to 5.0.0. The AUTH_OID authentication type has been deprecated and is no longer available as an option in Flask-AppBuilder. OpenID (OID) is considered a deprecated authentication protocol - if you are using AUTH_OID, you will need to migrate to an alternative authentication method such as OAuth, LDAP, or database authentication before upgrading. - [35062](https://github.com/apache/superset/pull/35062): Changed the function signature of `setupExtensions` to `setupCodeOverrides` with options as arguments. - [34871](https://github.com/apache/superset/pull/34871): Fixed Jest test hanging issue from Ant Design v5 upgrade. MessageChannel is now mocked in test environment to prevent rc-overflow from causing Jest to hang. Test environment only - no production impact. 
diff --git a/superset/mcp_service/ARCHITECTURE.md b/superset/mcp_service/ARCHITECTURE.md new file mode 100644 index 00000000000..a54f73d8064 --- /dev/null +++ b/superset/mcp_service/ARCHITECTURE.md @@ -0,0 +1,693 @@ + + +# MCP Service Architecture + +## Overview + +The Apache Superset MCP (Model Context Protocol) service provides programmatic access to Superset functionality through a standardized protocol that enables AI assistants and automation tools to interact with dashboards, charts, datasets, and SQL Lab. + +The MCP service runs as a **separate process** from the Superset web server, using its own Flask application instance and HTTP server while sharing the same database and configuration with the main Superset application. + +## Flask Singleton Pattern + +### Why Module-Level Singleton? + +The MCP service uses a module-level singleton Flask application instance rather than creating a new app instance per request. This design decision is based on several important considerations: + +**Separate Process Architecture**: +- The MCP service runs as an independent process from the Superset web server +- It has its own HTTP server (via FastMCP/Starlette) handling MCP protocol requests +- Each MCP tool invocation occurs within the context of this single, long-lived Flask app + +**Benefits of Module-Level Singleton**: + +1. **Consistent Database Connection Pool** + - A single SQLAlchemy connection pool is maintained across all tool calls + - Connections are efficiently reused rather than recreated + - Connection pool configuration (size, timeout, etc.) behaves predictably + +2. **Shared Configuration Access** + - Flask app configuration is loaded once at startup + - All tools access the same configuration state + - Changes to runtime config affect all subsequent tool calls consistently + +3. 
**Thread-Safe Initialization** + - The Flask app is created exactly once using `threading.Lock()` + - Multiple concurrent requests safely share the same app instance + - No risk of duplicate initialization or race conditions + +4. **Lower Resource Overhead** + - No per-request app creation/teardown overhead + - Memory footprint remains constant regardless of request volume + - Extension initialization (Flask-AppBuilder, Flask-Migrate, etc.) happens once + +**When Module-Level Singleton Is Appropriate**: +- Service runs as dedicated daemon/process +- Application state is consistent across all requests +- No per-request application context needed +- Long-lived server process with many requests + +**When Module-Level Singleton Is NOT Appropriate**: +- Testing with different configurations (use app fixtures instead) +- Multi-tenant deployments requiring different app configs per tenant +- Dynamic plugin loading requiring app recreation +- Development scenarios requiring hot-reload of app configuration + +### Implementation Details + +The singleton is implemented in `flask_singleton.py`: + +```python +# Module-level instance - created once on import +from superset.app import create_app +from superset.mcp_service.mcp_config import get_mcp_config + +_temp_app = create_app() + +with _temp_app.app_context(): + mcp_config = get_mcp_config(_temp_app.config) + _temp_app.config.update(mcp_config) + +app = _temp_app + +def get_flask_app() -> Flask: + """Get the Flask app instance.""" + return app +``` + +**Key characteristics**: +- No complex patterns or metaclasses needed +- The module itself acts as the singleton container +- Clean, Pythonic approach following Stack Overflow recommendations +- Application context pushed during initialization to avoid "Working outside of application context" errors + +## Multitenant Architecture + +### Current Implementation + +The MCP service uses **Option B: Shared Process with Tenant Isolation**: + +```mermaid +graph LR + T1[Tenant 1] + 
T2[Tenant 2]
+    T3[Tenant 3]
+    MCP[Single MCP Process]
+    DB[(Superset Database)]
+
+    T1 --> MCP
+    T2 --> MCP
+    T3 --> MCP
+    MCP --> DB
+
+    MCP -.->|Isolation via| ISO[User authentication JWT or dev user<br/>Flask-AppBuilder RBAC<br/>Dataset access filters<br/>Row-level security]
+
+    style ISO fill:#f9f,stroke:#333,stroke-width:2px
+```
+
+### Tenant Isolation Mechanisms
+
+#### Database Level
+
+**Superset's Existing RLS (Row-Level Security)**:
+- RLS rules are defined at the dataset level
+- Rules filter queries based on user attributes (e.g., `department = '{{ current_user.department }}'`)
+- The MCP service respects all RLS rules automatically through Superset's query execution layer
+
+**No Schema-Based Isolation**:
+- The current implementation does NOT use separate database schemas per tenant
+- All Superset metadata (dashboards, charts, datasets) exists in the same database schema
+- Database-level isolation is achieved through Superset's permission system rather than physical schema separation
+
+#### Application Level
+
+**Flask-AppBuilder Security Manager**:
+- Every MCP tool call uses the `@mcp_auth_hook` decorator
+- The auth hook sets `g.user` to the authenticated user (from JWT or `MCP_DEV_USERNAME`)
+- Superset's security manager then enforces permissions based on this user's roles
+
+**User-Based Access Control**:
+- Users can only access resources they have permissions for
+- Dashboard ownership and role-based permissions are enforced
+- The `can_access_datasource()` method validates dataset access
+
+**Dataset Access Filters**:
+- All list operations (dashboards, charts, datasets) use Superset's access filters:
+  - `DashboardAccessFilter` - filters dashboards based on user permissions
+  - `ChartAccessFilter` - filters charts based on user permissions
+  - `DatasourceFilter` - filters datasets based on user permissions
+
+**Row-Level Security Enforcement**:
+- RLS rules are applied transparently during query execution
+- The MCP service makes no modifications to bypass RLS
+- SQL queries executed through the `execute_sql` tool respect RLS policies
+
+#### JWT Tenant Claims
+
+**Development Mode** (single user):
+```python
+# superset_config.py
+MCP_DEV_USERNAME = "admin"
+```
+
+**Production Mode** (JWT-based):
+```json +{ + "sub": "user@company.com", + "email": "user@company.com", + "scopes": ["superset:read", "superset:chart:create"], + "exp": 1672531200 +} +``` + +**Future Enhancement** (multi-tenant JWT): +```json +{ + "sub": "user@tenant-a.com", + "tenant_id": "tenant-a", + "scopes": ["superset:read"], + "exp": 1672531200 +} +``` + +The `tenant_id` claim could be used in future versions to: +- Further isolate data by tenant context +- Apply tenant-specific RLS rules +- Log and audit actions by tenant +- Implement tenant-specific rate limits + +## Process Model + +### Single Process Deployment + +**When to Use**: +- Development and testing environments +- Small deployments with low request volume (< 100 requests/minute) +- Single-tenant installations +- Resource-constrained environments + +**Resource Characteristics**: +- Memory: ~500MB-1GB (includes Flask app, SQLAlchemy, screenshot pool) +- CPU: Mostly I/O bound (database queries, screenshot generation) +- Database connections: Configurable via `SQLALCHEMY_POOL_SIZE` (default: 5) + +**Scaling Limitations**: +- Single Python process = GIL limitations for CPU-bound operations +- Screenshot generation can block other requests +- Limited horizontal scalability without load balancer + +**Example Command**: +```bash +superset mcp run --port 5008 +``` + +### Multi-Process Deployment + +**Using Gunicorn Workers**: +```bash +gunicorn \ + --workers 4 \ + --bind 0.0.0.0:5008 \ + --worker-class uvicorn.workers.UvicornWorker \ + superset.mcp_service.server:app +``` + +**Configuration Considerations**: +- Worker count: `2-4 x CPU cores` (typical recommendation) +- Each worker has its own Flask app instance via module-level singleton +- Workers share nothing - fully isolated processes +- Database connection pool per worker (watch total connections) + +**Process Pool Management**: +- Use process manager (systemd, supervisord) for auto-restart +- Health checks to detect and restart failed workers +- Graceful shutdown to complete 
in-flight requests + +**Load Balancing**: +- Use nginx/HAProxy to distribute requests across workers +- Round-robin or least-connections algorithms work well +- Sticky sessions NOT required (stateless API) + +### Containerized Deployment + +**Docker**: +```dockerfile +FROM apache/superset:latest +CMD ["superset", "mcp", "run", "--port", "5008"] +``` + +**Kubernetes Deployment**: +```yaml +apiVersion: apps/v1 +kind: Deployment +metadata: + name: superset-mcp +spec: + replicas: 3 + selector: + matchLabels: + app: superset-mcp + template: + metadata: + labels: + app: superset-mcp + spec: + containers: + - name: mcp + image: apache/superset:latest + command: ["superset", "mcp", "run", "--port", "5008"] + ports: + - containerPort: 5008 + env: + - name: SUPERSET_CONFIG_PATH + value: /app/pythonpath/superset_config.py + resources: + requests: + memory: "512Mi" + cpu: "500m" + limits: + memory: "1Gi" + cpu: "1000m" +``` + +**Horizontal Pod Autoscaling**: +```yaml +apiVersion: autoscaling/v2 +kind: HorizontalPodAutoscaler +metadata: + name: superset-mcp-hpa +spec: + scaleTargetRef: + apiVersion: apps/v1 + kind: Deployment + name: superset-mcp + minReplicas: 2 + maxReplicas: 10 + metrics: + - type: Resource + resource: + name: cpu + target: + type: Utilization + averageUtilization: 70 +``` + +**Service Mesh Integration**: +- Istio/Linkerd can provide: + - Automatic retries and circuit breaking + - Distributed tracing + - Mutual TLS between pods + - Advanced traffic routing + +## Database Connection Management + +### Connection Pooling + +The MCP service uses SQLAlchemy's connection pooling with configuration inherited from Superset: + +```python +# superset_config.py +SQLALCHEMY_POOL_SIZE = 5 # Max connections per worker +SQLALCHEMY_POOL_TIMEOUT = 30 # Seconds to wait for connection +SQLALCHEMY_MAX_OVERFLOW = 10 # Extra connections beyond pool_size +SQLALCHEMY_POOL_RECYCLE = 3600 # Recycle connections after 1 hour +``` + +**Connection Lifecycle**: +1. 
Request arrives at MCP tool +2. Tool calls DAO method which accesses `db.session` +3. SQLAlchemy checks out connection from pool +4. Query executes on borrowed connection +5. Connection returns to pool (not closed) +6. Connection reused for next request + +**Pool Size Recommendations**: +- **Single process**: 5-10 connections +- **Multi-worker (4 workers)**: 3-5 connections per worker = 12-20 total +- **Monitor**: Database max_connections setting must exceed total pool size across all MCP workers + +**Example with 4 Gunicorn workers**: +```python +SQLALCHEMY_POOL_SIZE = 5 +SQLALCHEMY_MAX_OVERFLOW = 5 +# Total potential connections: 4 workers × (5 + 5) = 40 connections +# Ensure database supports 40+ connections +``` + +### Transaction Handling + +**MCP Tool Transaction Pattern**: +```python +@mcp.tool +@mcp_auth_hook +def my_tool(param: str) -> Result: + # Auth hook sets g.user and manages session + try: + # Tool executes within implicit transaction + result = DashboardDAO.find_by_id(123) + return Result(data=result) + except Exception: + # On error: rollback happens in auth hook's except block + raise + finally: + # On success: rollback happens in auth hook's finally block + # (read-only operations don't commit) + pass +``` + +**Session Cleanup in Auth Hook**: + +The `@mcp_auth_hook` decorator manages session lifecycle: + +```python +# On error path +except Exception: + try: + db.session.rollback() + db.session.remove() + except Exception as e: + logger.warning("Error cleaning up session: %s", e) + raise + +# On success path (finally block) +finally: + try: + if db.session.is_active: + db.session.rollback() # Cleanup, don't commit + except Exception as e: + logger.warning("Error in finally block: %s", e) +``` + +**Why Rollback on Success?** +- MCP tools are primarily **read-only operations** +- No explicit commits needed for queries +- Rollback ensures clean slate for next request +- Write operations (create chart, etc.) 
use Superset's command pattern which handles commits internally + +## Deployment Considerations + +### Resource Requirements + +**Memory Per Process**: +- Base Flask app: ~200MB +- SQLAlchemy + models: ~100MB +- WebDriver pool (if screenshots enabled): ~200MB +- Request processing overhead: ~50MB per concurrent request +- **Total**: 500MB-1GB per process + +**CPU Usage Patterns**: +- I/O bound: Most time spent waiting on database/screenshots +- Low CPU during normal operations (< 20% per core) +- CPU spikes during: + - Screenshot generation (WebDriver rendering) + - Large dataset query processing + - Complex chart configuration validation + +**Database Connections**: +- **Single process**: 5-10 connections (pool_size + max_overflow) +- **Multi-process**: `(pool_size + max_overflow) × worker_count` +- **Example**: 4 workers × 10 max connections = 40 total database connections + +### Scaling Strategy + +**When to Scale Horizontally**: +- Request latency increases beyond acceptable threshold (e.g., p95 > 2 seconds) +- CPU utilization consistently > 70% +- Request queue depth growing +- Database connection pool frequently exhausted + +**Load Balancing Between MCP Instances**: + +**Option 1: Nginx Round-Robin**: +```nginx +upstream mcp_backend { + server mcp-1:5008; + server mcp-2:5008; + server mcp-3:5008; +} + +server { + location / { + proxy_pass http://mcp_backend; + } +} +``` + +**Option 2: Kubernetes Service**: +```yaml +apiVersion: v1 +kind: Service +metadata: + name: superset-mcp +spec: + selector: + app: superset-mcp + ports: + - port: 5008 + targetPort: 5008 + type: ClusterIP +``` + +**Session Affinity**: +- NOT required - MCP service is stateless +- Each request is independent +- No session state maintained between requests +- Load balancer can freely distribute requests + +### High Availability + +**Multiple MCP Instances**: +- Deploy at least 2 instances for redundancy +- Use load balancer health checks to detect failures +- Failed instances automatically 
removed from rotation + +**Health Checks**: + +The MCP service provides a health check tool: + +```python +# Internal health check +@mcp.tool +def health_check() -> HealthCheckResponse: + return HealthCheckResponse( + status="healthy", + timestamp=datetime.now(timezone.utc), + database_connection="ok" + ) +``` + +**Load balancer health check**: +```nginx +# Nginx example +upstream mcp_backend { + server mcp-1:5008 max_fails=3 fail_timeout=30s; + server mcp-2:5008 max_fails=3 fail_timeout=30s; +} +``` + +**Kubernetes health check**: +```yaml +livenessProbe: + httpGet: + path: /health + port: 5008 + initialDelaySeconds: 30 + periodSeconds: 10 +readinessProbe: + httpGet: + path: /health + port: 5008 + initialDelaySeconds: 10 + periodSeconds: 5 +``` + +**Failover Handling**: +- Load balancer automatically routes around failed instances +- MCP clients should implement retry logic for transient failures +- Use circuit breaker pattern for repeated failures +- Monitor and alert on instance failures + +### Database Considerations + +**Shared Database with Superset**: +- MCP service and Superset web server share the same database +- Same SQLAlchemy models and schema +- Database migrations applied once, affect both services + +**Connection Pool Sizing**: +``` +Total DB Connections = + Superset Web (workers × pool_size) + + MCP Service (workers × pool_size) + + Other services + +Must be < Database max_connections +``` + +**Example Calculation**: +- Superset web: 8 workers × 10 connections = 80 +- MCP service: 4 workers × 10 connections = 40 +- Other: 20 reserved +- **Total**: 140 connections +- **Database**: Set max_connections >= 150 + +### Monitoring Recommendations + +**Key Metrics to Track**: +- Request rate per tool +- Request latency (p50, p95, p99) +- Error rate by tool and error type +- Database connection pool utilization +- Memory usage per process +- Active concurrent requests + +**Example Prometheus Metrics** (future implementation): +```python 
+mcp_requests_total{tool="list_charts", status="success"}
+mcp_request_duration_seconds{tool="list_charts", quantile="0.95"}
+mcp_database_connections_active
+mcp_database_connections_idle
+mcp_memory_usage_bytes
+```
+
+**Log Aggregation**:
+- Centralize logs from all MCP instances
+- Use structured logging (JSON format)
+- Include trace IDs for request correlation
+- Alert on error rate spikes
+
+## Architecture Diagrams
+
+### Request Flow
+
+```mermaid
+sequenceDiagram
+    participant Client as MCP Client<br/>(Claude/automation)
+    participant FastMCP as FastMCP Server<br/>(Starlette/Uvicorn)
+    participant Auth as MCP Auth Hook
+    participant Tool as Tool Implementation<br/>(e.g., list_charts)
+    participant DAO as Superset DAO Layer<br/>(ChartDAO, DashboardDAO)
+    participant DB as Database<br/>(PostgreSQL/MySQL)
+
+    Client->>FastMCP: MCP Protocol (HTTP/SSE)
+    FastMCP->>Auth: @mcp.tool decorator
+    Auth->>Auth: Sets g.user, manages session
+    Auth->>Tool: Execute tool
+    Tool->>DAO: Uses DAO pattern
+    DAO->>DB: SQLAlchemy ORM
+    DB-->>DAO: Query results
+    DAO-->>Tool: Processed data
+    Tool-->>Auth: Tool response
+    Auth-->>FastMCP: Response with cleanup
+    FastMCP-->>Client: MCP response
+```
+
+### Multi-Instance Deployment
+
+```mermaid
+graph TD
+    LB[Load Balancer<br/>Nginx/K8s Service]
+    MCP1[MCP Instance 1<br/>port 5008]
+    MCP2[MCP Instance 2<br/>port 5008]
+    MCP3[MCP Instance 3<br/>port 5008]
+    DB[(Superset Database<br/>shared connection pool)]
+
+    LB --> MCP1
+    LB --> MCP2
+    LB --> MCP3
+    MCP1 --> DB
+    MCP2 --> DB
+    MCP3 --> DB
+```
+
+### Tenant Isolation
+
+```mermaid
+graph TD
+    UserA[User A<br/>JWT: tenant=acme]
+    UserB[User B<br/>JWT: tenant=beta]
+    MCP[MCP Service<br/>single process]
+    Auth[@mcp_auth_hook<br/>Sets g.user from JWT]
+    RBAC[Flask-AppBuilder<br/>RBAC]
+    Filters[Dataset Access<br/>Filters]
+    DB[(Superset Database<br/>single schema, filtered by permissions)]
+
+    UserA --> MCP
+    UserB --> MCP
+    MCP --> Auth
+    Auth --> RBAC
+    Auth --> Filters
+    RBAC --> |User A sees only<br/>acme dashboards| DB
+    Filters --> |User A queries filtered<br/>by RLS rules for acme| DB
+```
+
+## Comparison with Alternative Architectures
+
+### Module-Level Singleton (Current) vs Per-Request App
+
+| Aspect | Module-Level Singleton | Per-Request App |
+|--------|----------------------|-----------------|
+| Connection Pool | Single shared pool | New pool per request |
+| Memory Overhead | Constant (~500MB) | 500MB × concurrent requests |
+| Thread Safety | Must ensure thread-safe access | Each request isolated |
+| Configuration | Loaded once at startup | Can vary per request |
+| Performance | Fast (no setup overhead) | Slow (initialization cost) |
+| Use Case | Production daemon | Testing/multi-config scenarios |
+
+### Shared Process (Current) vs Separate Process Per Tenant
+
+| Aspect | Shared Process | Process Per Tenant |
+|--------|---------------|-------------------|
+| Isolation | Application-level (RBAC/RLS) | Process-level (OS isolation) |
+| Resource Usage | Efficient (shared resources) | Higher (duplicate resources) |
+| Scaling | Horizontal (add instances) | Vertical (more processes) |
+| Complexity | Simpler deployment | Complex orchestration |
+| Security | Depends on Superset RBAC | Stronger isolation |
+| Use Case | Most deployments | High-security multi-tenant |
+
+## Future Architectural Considerations
+
+### Async/Await Support
+
+The current implementation uses synchronous request handling.
Future versions could: +- Use `async`/`await` for I/O operations +- Implement connection pooling with `asyncpg` (PostgreSQL) or `aiomysql` +- Improve throughput for I/O-bound operations + +### Caching Layer + +Adding caching between MCP service and database: +- Redis cache for frequently accessed resources (dashboards, charts, datasets) +- Cache invalidation on updates +- Reduced database load for read-heavy workloads + +### Event-Driven Updates + +WebSocket support for real-time updates: +- Push notifications when dashboards/charts change +- Streaming query results for large datasets +- Live dashboard editing collaboration + +## References + +- **Flask Application Context**: https://flask.palletsprojects.com/en/stable/appcontext/ +- **SQLAlchemy Connection Pooling**: https://docs.sqlalchemy.org/en/stable/core/pooling.html +- **FastMCP Documentation**: https://github.com/jlowin/fastmcp +- **Superset Security Model**: https://superset.apache.org/docs/security diff --git a/superset/mcp_service/PRODUCTION.md b/superset/mcp_service/PRODUCTION.md new file mode 100644 index 00000000000..3ef3538b0e7 --- /dev/null +++ b/superset/mcp_service/PRODUCTION.md @@ -0,0 +1,1332 @@ + + +# MCP Service Production Deployment Guide + +## Current Status + +### What's Production-Ready + +The following components have been implemented and tested: + +- ✅ **Tool Infrastructure**: FastMCP-based tool registration and discovery +- ✅ **Flask App Context Management**: Module-level singleton pattern +- ✅ **Error Handling**: Comprehensive validation and error responses +- ✅ **Pydantic Validation**: Type-safe request/response schemas +- ✅ **Access Control**: RBAC and RLS enforcement through Superset's security manager +- ✅ **Database Connection Pooling**: SQLAlchemy pool management +- ✅ **Health Checks**: Basic health check tool for monitoring + +### What's Development-Only + +The following features are suitable for development but **require configuration for production**: + +- ❌ 
**Authentication**: `MCP_DEV_USERNAME` single-user authentication (replace with JWT) +- ❌ **Logging**: Basic debug logging (implement structured logging) +- ❌ **Rate Limiting**: No rate limiting implemented (add per-user/per-tool limits) +- ❌ **Monitoring**: No metrics export (add Prometheus/CloudWatch) +- ❌ **Caching**: No caching layer (consider Redis for performance) +- ❌ **HTTPS**: HTTP-only by default (must enable HTTPS for production) + +## Required for Production + +### 1. Authentication & Authorization + +#### JWT Authentication Setup + +**Required Configuration**: + +```python +# superset_config.py + +# Enable JWT authentication +MCP_AUTH_ENABLED = True + +# JWT validation settings +MCP_JWT_ISSUER = "https://auth.yourcompany.com" +MCP_JWT_AUDIENCE = "superset-mcp" +MCP_JWT_ALGORITHM = "RS256" # or "HS256" for shared secrets + +# Option A: Use JWKS endpoint (recommended for RS256) +MCP_JWKS_URI = "https://auth.yourcompany.com/.well-known/jwks.json" + +# Option B: Static public key (RS256) +MCP_JWT_PUBLIC_KEY = """-----BEGIN PUBLIC KEY----- +MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEA... 
+-----END PUBLIC KEY-----""" + +# Option C: Shared secret (HS256 - less secure) +MCP_JWT_ALGORITHM = "HS256" +MCP_JWT_SECRET = "your-256-bit-secret-key" + +# Optional: Require specific scopes +MCP_REQUIRED_SCOPES = ["superset:read"] + +# Disable development authentication +MCP_DEV_USERNAME = None +``` + +**JWT Issuer Setup Examples**: + +**Auth0**: +```python +MCP_JWT_ISSUER = "https://your-tenant.auth0.com/" +MCP_JWT_AUDIENCE = "superset-mcp" +MCP_JWKS_URI = "https://your-tenant.auth0.com/.well-known/jwks.json" +``` + +**Okta**: +```python +MCP_JWT_ISSUER = "https://your-domain.okta.com/oauth2/default" +MCP_JWT_AUDIENCE = "api://superset-mcp" +MCP_JWKS_URI = "https://your-domain.okta.com/oauth2/default/v1/keys" +``` + +**AWS Cognito**: +```python +MCP_JWT_ISSUER = "https://cognito-idp.us-east-1.amazonaws.com/your-pool-id" +MCP_JWT_AUDIENCE = "your-app-client-id" +MCP_JWKS_URI = "https://cognito-idp.us-east-1.amazonaws.com/your-pool-id/.well-known/jwks.json" +``` + +**Self-Hosted (Using Keycloak)**: +```python +MCP_JWT_ISSUER = "https://keycloak.yourcompany.com/realms/superset" +MCP_JWT_AUDIENCE = "superset-mcp" +MCP_JWKS_URI = "https://keycloak.yourcompany.com/realms/superset/protocol/openid-connect/certs" +``` + +**Testing JWT Configuration**: + +```bash +# Validate JWT token +curl -H "Authorization: Bearer YOUR_JWT_TOKEN" \ + https://mcp.yourcompany.com/health + +# Expected response if successful: +# {"status": "healthy", "timestamp": "2025-01-01T10:30:45Z", ...} + +# Expected response if auth fails: +# {"error": "Invalid token", "error_type": "AuthenticationError", ...} +``` + +#### Permission Checking + +Superset's existing RBAC is automatically enforced. 
Ensure roles are configured: + +```python +# In Superset UI: Security → List Roles + +# Recommended role configuration: +# - Admin: Full access to all MCP tools +# - Alpha: Create/edit charts and dashboards +# - Gamma: Read-only access to shared resources +# - Custom: Fine-grained permissions per use case +``` + +**Verify Permissions**: + +```bash +# Test as non-admin user +curl -H "Authorization: Bearer NON_ADMIN_TOKEN" \ + https://mcp.yourcompany.com/list_charts + +# Should only return charts the user has access to +``` + +### 2. Performance & Reliability + +#### Rate Limiting + +**Implementation Options**: + +**Option A: Flask-Limiter** (application-level): + +```python +# superset_config.py +from flask_limiter import Limiter +from flask_limiter.util import get_remote_address + +# Add to MCP service initialization +MCP_RATE_LIMITING = { + "enabled": True, + "storage_uri": "redis://localhost:6379/0", + "default_limits": ["100 per minute", "1000 per hour"], + "per_tool_limits": { + "execute_sql": "10 per minute", + "generate_chart": "20 per minute", + "list_charts": "100 per minute", + } +} +``` + +**Option B: Nginx Rate Limiting** (infrastructure-level): + +```nginx +# /etc/nginx/conf.d/mcp-rate-limit.conf +limit_req_zone $binary_remote_addr zone=mcp_limit:10m rate=100r/m; + +server { + listen 443 ssl; + server_name mcp.yourcompany.com; + + location / { + limit_req zone=mcp_limit burst=20 nodelay; + proxy_pass http://mcp_backend; + } +} +``` + +**Option C: API Gateway** (cloud-native): + +AWS API Gateway, Azure API Management, or Google Cloud Endpoints provide built-in rate limiting. 
+ +#### Error Handling and Monitoring + +**Structured Error Responses**: + +All MCP tools return consistent error schemas: + +```python +{ + "error": "Resource not found", + "error_type": "NotFoundError", + "timestamp": "2025-01-01T10:30:45.123Z", + "details": { + "resource_type": "dashboard", + "resource_id": 123 + } +} +``` + +**Error Tracking (Sentry)**: + +```python +# superset_config.py +import sentry_sdk +from sentry_sdk.integrations.flask import FlaskIntegration + +sentry_sdk.init( + dsn="https://your-dsn@sentry.io/project-id", + integrations=[FlaskIntegration()], + environment="production", + traces_sample_rate=0.1, # 10% of transactions +) +``` + +**Metrics Export (Prometheus)**: + +```python +# Future implementation - add to superset_config.py +from prometheus_flask_exporter import PrometheusMetrics + +MCP_PROMETHEUS_ENABLED = True +MCP_PROMETHEUS_PATH = "/metrics" +``` + +**Example Prometheus metrics**: +``` +# HELP mcp_requests_total Total MCP tool requests +# TYPE mcp_requests_total counter +mcp_requests_total{tool="list_charts",status="success"} 1234 + +# HELP mcp_request_duration_seconds MCP request duration +# TYPE mcp_request_duration_seconds histogram +mcp_request_duration_seconds_bucket{tool="list_charts",le="0.5"} 1000 +mcp_request_duration_seconds_bucket{tool="list_charts",le="1.0"} 1200 +``` + +#### Performance Optimization + +**Database Query Optimization**: + +```python +# superset_config.py + +# Connection pool sizing +SQLALCHEMY_POOL_SIZE = 10 # Connections per worker +SQLALCHEMY_MAX_OVERFLOW = 10 # Additional connections allowed +SQLALCHEMY_POOL_TIMEOUT = 30 # Seconds to wait for connection +SQLALCHEMY_POOL_RECYCLE = 3600 # Recycle connections after 1 hour + +# Query optimization +SQLALCHEMY_ECHO = False # Disable SQL logging in production +``` + +**Caching Strategy**: + +```python +# superset_config.py + +# Enable Superset's caching +CACHE_CONFIG = { + "CACHE_TYPE": "RedisCache", + "CACHE_REDIS_URL": "redis://localhost:6379/1", + 
"CACHE_DEFAULT_TIMEOUT": 300, # 5 minutes +} + +# Cache query results +DATA_CACHE_CONFIG = { + "CACHE_TYPE": "RedisCache", + "CACHE_REDIS_URL": "redis://localhost:6379/2", + "CACHE_DEFAULT_TIMEOUT": 3600, # 1 hour +} +``` + +**MCP-Specific Caching** (future enhancement): + +```python +# Cache tool responses in Redis +MCP_CACHE_CONFIG = { + "enabled": True, + "backend": "redis", + "url": "redis://localhost:6379/3", + "ttl": 300, # 5 minutes + "cache_tools": [ + "list_dashboards", + "list_charts", + "list_datasets", + "get_dataset_info", + ] +} +``` + +#### Load Testing + +**Run load tests before production deployment**: + +**Using Locust**: + +```python +# locustfile.py +from locust import HttpUser, task, between +import jwt +import time + +class MCPUser(HttpUser): + wait_time = between(1, 3) + + def on_start(self): + # Generate JWT token + self.token = generate_jwt_token() + + @task(3) + def list_charts(self): + self.client.post( + "/list_charts", + headers={"Authorization": f"Bearer {self.token}"}, + json={"request": {"page": 1, "page_size": 10}} + ) + + @task(2) + def get_chart_info(self): + self.client.post( + "/get_chart_info", + headers={"Authorization": f"Bearer {self.token}"}, + json={"request": {"identifier": 1}} + ) + + @task(1) + def generate_chart(self): + self.client.post( + "/generate_chart", + headers={"Authorization": f"Bearer {self.token}"}, + json={ + "request": { + "dataset_id": 1, + "config": { + "chart_type": "table", + "columns": [{"name": "col1"}] + } + } + } + ) +``` + +**Run load test**: +```bash +locust -f locustfile.py --host https://mcp.yourcompany.com + +# Test targets: +# - 100 concurrent users +# - < 2 second p95 response time +# - < 1% error rate +``` + +## Deployment Architecture + +### Production Deployment Overview + +```mermaid +graph TB + subgraph "External" + Clients[MCP Clients
Claude, Automation Tools]
+        AuthProvider[Auth Provider<br/>Auth0, Okta, Cognito]
+    end
+
+    subgraph "DMZ / Edge"
+        LB[Load Balancer<br/>Nginx / ALB]
+        WAF[WAF<br/>Optional]
+    end
+
+    subgraph "Application Tier"
+        MCP1[MCP Instance 1]
+        MCP2[MCP Instance 2]
+        MCP3[MCP Instance 3]
+        Superset[Superset Web Server]
+    end
+
+    subgraph "Data Tier"
+        DB[(PostgreSQL<br/>Superset Metadata)]
+        Redis[(Redis<br/>Cache)]
+    end
+
+    subgraph "Monitoring"
+        Prometheus[Prometheus]
+        Grafana[Grafana]
+        Logs[Log Aggregator
ELK, Splunk] + end + + Clients --> |HTTPS| WAF + WAF --> LB + Clients --> |Get JWT| AuthProvider + LB --> MCP1 + LB --> MCP2 + LB --> MCP3 + MCP1 --> DB + MCP2 --> DB + MCP3 --> DB + MCP1 --> Redis + MCP2 --> Redis + MCP3 --> Redis + Superset --> DB + MCP1 -.->|Metrics| Prometheus + MCP2 -.->|Metrics| Prometheus + MCP3 -.->|Metrics| Prometheus + Prometheus --> Grafana + MCP1 -.->|Logs| Logs + MCP2 -.->|Logs| Logs + MCP3 -.->|Logs| Logs +``` + +## Deployment Guide + +### Installation Requirements + +**System Dependencies**: + +```bash +# Ubuntu/Debian +apt-get update +apt-get install -y \ + python3.11 \ + python3.11-dev \ + python3-pip \ + build-essential \ + libssl-dev \ + libffi-dev \ + libsasl2-dev \ + libldap2-dev + +# RHEL/CentOS +yum install -y \ + python311 \ + python311-devel \ + gcc \ + gcc-c++ \ + openssl-devel \ + libffi-devel +``` + +**Python Package Installation**: + +```bash +# Create virtual environment +python3.11 -m venv /opt/superset/venv +source /opt/superset/venv/bin/activate + +# Install Superset with MCP support +pip install apache-superset[fastmcp] + +# Or from requirements.txt +pip install -r requirements/production.txt +``` + +**Verify Installation**: + +```bash +superset version +superset mcp --help +``` + +### Configuration + +**Create Production Config**: + +```python +# /opt/superset/superset_config.py + +# Database connection +SQLALCHEMY_DATABASE_URI = "postgresql://user:pass@db-host:5432/superset" + +# Secret key (generate with: openssl rand -base64 42) +SECRET_KEY = "your-secret-key-here" + +# MCP Service Configuration +MCP_AUTH_ENABLED = True +MCP_JWT_ISSUER = "https://auth.yourcompany.com" +MCP_JWT_AUDIENCE = "superset-mcp" +MCP_JWKS_URI = "https://auth.yourcompany.com/.well-known/jwks.json" +MCP_DEV_USERNAME = None # Disable dev auth + +# Service binding +MCP_SERVICE_HOST = "0.0.0.0" # Listen on all interfaces +MCP_SERVICE_PORT = 5008 + +# Security settings +MCP_SESSION_CONFIG = { + "SESSION_COOKIE_HTTPONLY": True, + 
"SESSION_COOKIE_SECURE": True, # Requires HTTPS + "SESSION_COOKIE_SAMESITE": "Strict", + "PERMANENT_SESSION_LIFETIME": 3600, # 1 hour +} + +# Database connection pool +SQLALCHEMY_POOL_SIZE = 10 +SQLALCHEMY_MAX_OVERFLOW = 10 +SQLALCHEMY_POOL_TIMEOUT = 30 + +# Superset webserver address (for screenshot generation) +SUPERSET_WEBSERVER_ADDRESS = "https://superset.yourcompany.com" +WEBDRIVER_BASEURL = "https://superset.yourcompany.com/" + +# Enable HTTPS +ENABLE_PROXY_FIX = True +``` + +**Set Environment Variables**: + +```bash +# /opt/superset/.env +export SUPERSET_CONFIG_PATH=/opt/superset/superset_config.py +export FLASK_APP=superset +``` + +### Process Management + +#### Systemd (Recommended) + +**Service File**: + +```ini +# /etc/systemd/system/superset-mcp.service +[Unit] +Description=Superset MCP Service +After=network.target postgresql.service redis.service +Requires=postgresql.service + +[Service] +Type=simple +User=superset +Group=superset +WorkingDirectory=/opt/superset + +Environment="SUPERSET_CONFIG_PATH=/opt/superset/superset_config.py" +Environment="FLASK_APP=superset" + +ExecStart=/opt/superset/venv/bin/superset mcp run --port 5008 + +# Restart policy +Restart=always +RestartSec=10s + +# Resource limits +LimitNOFILE=65536 +MemoryLimit=2G + +# Logging +StandardOutput=journal +StandardError=journal +SyslogIdentifier=superset-mcp + +[Install] +WantedBy=multi-user.target +``` + +**Enable and Start Service**: + +```bash +# Reload systemd +systemctl daemon-reload + +# Enable service to start on boot +systemctl enable superset-mcp + +# Start service +systemctl start superset-mcp + +# Check status +systemctl status superset-mcp + +# View logs +journalctl -u superset-mcp -f +``` + +#### Supervisord + +**Configuration**: + +```ini +# /etc/supervisor/conf.d/superset-mcp.conf +[program:superset-mcp] +command=/opt/superset/venv/bin/superset mcp run --port 5008 +directory=/opt/superset +user=superset +autostart=true +autorestart=true +redirect_stderr=true 
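; Optional (assumption, not in the original config): allow up to 30 seconds
; for in-flight requests to finish on stop/restart (supervisord default is 10)
stopwaitsecs=30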
+stdout_logfile=/var/log/superset/mcp.log +stdout_logfile_maxbytes=50MB +stdout_logfile_backups=10 +environment=SUPERSET_CONFIG_PATH="/opt/superset/superset_config.py",FLASK_APP="superset" +``` + +**Start Service**: + +```bash +supervisorctl reread +supervisorctl update +supervisorctl start superset-mcp +supervisorctl status superset-mcp +``` + +#### Docker + +**Dockerfile**: + +```dockerfile +FROM apache/superset:latest + +# Install MCP dependencies +RUN pip install apache-superset[fastmcp] + +# Copy production config +COPY superset_config.py /app/pythonpath/ + +# Expose MCP port +EXPOSE 5008 + +# Run MCP service +CMD ["superset", "mcp", "run", "--port", "5008"] +``` + +**Build and Run**: + +```bash +# Build image +docker build -t superset-mcp:latest . + +# Run container +docker run -d \ + --name superset-mcp \ + -p 5008:5008 \ + -v /opt/superset/superset_config.py:/app/pythonpath/superset_config.py:ro \ + -e SUPERSET_CONFIG_PATH=/app/pythonpath/superset_config.py \ + superset-mcp:latest + +# View logs +docker logs -f superset-mcp +``` + +#### Kubernetes + +**Deployment**: + +```yaml +# superset-mcp-deployment.yaml +apiVersion: apps/v1 +kind: Deployment +metadata: + name: superset-mcp + labels: + app: superset-mcp +spec: + replicas: 3 + selector: + matchLabels: + app: superset-mcp + template: + metadata: + labels: + app: superset-mcp + spec: + containers: + - name: mcp + image: apache/superset:latest + command: ["superset", "mcp", "run", "--port", "5008"] + ports: + - containerPort: 5008 + name: mcp + env: + - name: SUPERSET_CONFIG_PATH + value: /app/pythonpath/superset_config.py + - name: FLASK_APP + value: superset + volumeMounts: + - name: config + mountPath: /app/pythonpath + readOnly: true + resources: + requests: + memory: "512Mi" + cpu: "500m" + limits: + memory: "2Gi" + cpu: "2000m" + livenessProbe: + httpGet: + path: /health + port: 5008 + initialDelaySeconds: 30 + periodSeconds: 10 + readinessProbe: + httpGet: + path: /health + port: 5008 + 
initialDelaySeconds: 10 + periodSeconds: 5 + volumes: + - name: config + configMap: + name: superset-config +--- +apiVersion: v1 +kind: Service +metadata: + name: superset-mcp +spec: + selector: + app: superset-mcp + ports: + - port: 5008 + targetPort: 5008 + name: mcp + type: ClusterIP +--- +apiVersion: autoscaling/v2 +kind: HorizontalPodAutoscaler +metadata: + name: superset-mcp-hpa +spec: + scaleTargetRef: + apiVersion: apps/v1 + kind: Deployment + name: superset-mcp + minReplicas: 2 + maxReplicas: 10 + metrics: + - type: Resource + resource: + name: cpu + target: + type: Utilization + averageUtilization: 70 + - type: Resource + resource: + name: memory + target: + type: Utilization + averageUtilization: 80 +``` + +**Deploy to Kubernetes**: + +```bash +# Create ConfigMap from superset_config.py +kubectl create configmap superset-config \ + --from-file=superset_config.py=/opt/superset/superset_config.py + +# Apply deployment +kubectl apply -f superset-mcp-deployment.yaml + +# Check status +kubectl get pods -l app=superset-mcp +kubectl logs -l app=superset-mcp -f +``` + +### Reverse Proxy Configuration + +#### Nginx + +```nginx +# /etc/nginx/sites-available/mcp.yourcompany.com +upstream mcp_backend { + # Health checks + server mcp-1:5008 max_fails=3 fail_timeout=30s; + server mcp-2:5008 max_fails=3 fail_timeout=30s; + server mcp-3:5008 max_fails=3 fail_timeout=30s; +} + +# Rate limiting +limit_req_zone $binary_remote_addr zone=mcp_limit:10m rate=100r/m; + +server { + listen 80; + server_name mcp.yourcompany.com; + + # Redirect HTTP to HTTPS + return 301 https://$server_name$request_uri; +} + +server { + listen 443 ssl http2; + server_name mcp.yourcompany.com; + + # SSL configuration + ssl_certificate /etc/ssl/certs/mcp.yourcompany.com.crt; + ssl_certificate_key /etc/ssl/private/mcp.yourcompany.com.key; + ssl_protocols TLSv1.2 TLSv1.3; + ssl_ciphers HIGH:!aNULL:!MD5; + ssl_prefer_server_ciphers on; + + # Security headers + add_header Strict-Transport-Security 
"max-age=31536000; includeSubDomains" always;
+    add_header X-Frame-Options "DENY" always;
+    add_header X-Content-Type-Options "nosniff" always;
+    add_header X-XSS-Protection "1; mode=block" always;
+
+    # Logging
+    access_log /var/log/nginx/mcp-access.log combined;
+    error_log /var/log/nginx/mcp-error.log warn;
+
+    location / {
+        # Rate limiting
+        limit_req zone=mcp_limit burst=20 nodelay;
+
+        # Proxy configuration
+        proxy_pass http://mcp_backend;
+        proxy_http_version 1.1;
+        proxy_set_header Host $host;
+        proxy_set_header X-Real-IP $remote_addr;
+        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
+        proxy_set_header X-Forwarded-Proto $scheme;
+
+        # Timeouts
+        proxy_connect_timeout 60s;
+        proxy_send_timeout 60s;
+        proxy_read_timeout 60s;
+
+        # Buffering
+        proxy_buffering off;
+    }
+
+    # Health check endpoint (bypass rate limiting)
+    location /health {
+        proxy_pass http://mcp_backend;
+        access_log off;
+    }
+}
+```
+
+**Enable Site**:
+
+```bash
+ln -s /etc/nginx/sites-available/mcp.yourcompany.com /etc/nginx/sites-enabled/
+nginx -t
+systemctl reload nginx
+```
+
+#### Apache
+
+```apache
+# /etc/apache2/sites-available/mcp.yourcompany.com.conf
+<VirtualHost *:80>
+    ServerName mcp.yourcompany.com
+    Redirect permanent / https://mcp.yourcompany.com/
+</VirtualHost>
+
+<VirtualHost *:443>
+    ServerName mcp.yourcompany.com
+
+    SSLEngine on
+    SSLCertificateFile /etc/ssl/certs/mcp.yourcompany.com.crt
+    SSLCertificateKeyFile /etc/ssl/private/mcp.yourcompany.com.key
+    SSLProtocol all -SSLv3 -TLSv1 -TLSv1.1
+    SSLCipherSuite HIGH:!aNULL:!MD5
+
+    # Security headers
+    Header always set Strict-Transport-Security "max-age=31536000; includeSubDomains"
+    Header always set X-Frame-Options "DENY"
+    Header always set X-Content-Type-Options "nosniff"
+
+    # Proxy configuration
+    ProxyPreserveHost On
+    ProxyPass / http://localhost:5008/
+    ProxyPassReverse / http://localhost:5008/
+
+    # Timeouts
+    ProxyTimeout 60
+
+    ErrorLog ${APACHE_LOG_DIR}/mcp-error.log
+    CustomLog ${APACHE_LOG_DIR}/mcp-access.log combined
+</VirtualHost>
+```
+
+**Enable 
Site**: + +```bash +a2enmod ssl proxy proxy_http headers +a2ensite mcp.yourcompany.com +apachectl configtest +systemctl reload apache2 +``` + +### Monitoring and Alerting + +#### Health Checks + +**Manual Health Check**: + +```bash +curl https://mcp.yourcompany.com/health +``` + +**Expected Response**: +```json +{ + "status": "healthy", + "timestamp": "2025-01-01T10:30:45.123Z", + "version": "1.0.0" +} +``` + +#### Prometheus Monitoring + +**Prometheus Config**: + +```yaml +# /etc/prometheus/prometheus.yml +scrape_configs: + - job_name: 'superset-mcp' + scrape_interval: 15s + static_configs: + - targets: ['mcp-1:5008', 'mcp-2:5008', 'mcp-3:5008'] + metrics_path: /metrics +``` + +**Grafana Dashboard**: + +Create dashboard with panels for: +- Request rate per tool +- Request latency (p50, p95, p99) +- Error rate +- Active connections +- Memory/CPU usage + +#### Alerting Rules + +**Prometheus Alerting**: + +```yaml +# /etc/prometheus/rules/mcp-alerts.yml +groups: + - name: mcp-service + interval: 30s + rules: + - alert: MCPHighErrorRate + expr: rate(mcp_requests_total{status="error"}[5m]) > 0.05 + for: 5m + labels: + severity: warning + annotations: + summary: "High error rate on MCP service" + description: "Error rate is {{ $value }} req/sec" + + - alert: MCPHighLatency + expr: histogram_quantile(0.95, mcp_request_duration_seconds) > 2 + for: 10m + labels: + severity: warning + annotations: + summary: "High latency on MCP service" + description: "P95 latency is {{ $value }} seconds" + + - alert: MCPServiceDown + expr: up{job="superset-mcp"} == 0 + for: 1m + labels: + severity: critical + annotations: + summary: "MCP service is down" + description: "Instance {{ $labels.instance }} is unreachable" +``` + +#### CloudWatch Monitoring (AWS) + +**CloudWatch Agent Config**: + +```json +{ + "logs": { + "logs_collected": { + "files": { + "collect_list": [ + { + "file_path": "/var/log/superset/mcp.log", + "log_group_name": "/aws/superset/mcp", + "log_stream_name": 
"{instance_id}", + "timezone": "UTC" + } + ] + } + } + }, + "metrics": { + "namespace": "SupersetMCP", + "metrics_collected": { + "cpu": { + "measurement": [ + {"name": "cpu_usage_idle", "rename": "CPU_IDLE", "unit": "Percent"} + ], + "totalcpu": false + }, + "mem": { + "measurement": [ + {"name": "mem_used_percent", "rename": "MEM_USED", "unit": "Percent"} + ] + } + } + } +} +``` + +**CloudWatch Alarms**: + +```bash +# Create alarm for high error rate +aws cloudwatch put-metric-alarm \ + --alarm-name mcp-high-error-rate \ + --alarm-description "MCP error rate > 5%" \ + --metric-name ErrorRate \ + --namespace SupersetMCP \ + --statistic Average \ + --period 300 \ + --threshold 5 \ + --comparison-operator GreaterThanThreshold \ + --evaluation-periods 2 +``` + +## Migration Path (Development → Production) + +### Pre-Deployment Checklist + +**Configuration**: +- [ ] `MCP_AUTH_ENABLED = True` +- [ ] JWT issuer, audience, and keys configured +- [ ] `MCP_DEV_USERNAME` set to `None` +- [ ] `SESSION_COOKIE_SECURE = True` +- [ ] HTTPS enabled on load balancer/reverse proxy +- [ ] Database connection pool sized appropriately +- [ ] Superset webserver address updated for production URL + +**Security**: +- [ ] TLS 1.2+ enforced +- [ ] Security headers configured (HSTS, X-Frame-Options, etc.) 
+- [ ] Firewall rules restrict access to MCP service +- [ ] Service account credentials rotated +- [ ] Secrets stored in secure vault (not in code) + +**Monitoring**: +- [ ] Health check endpoint accessible +- [ ] Metrics exported to monitoring system +- [ ] Alerts configured for critical conditions +- [ ] Log aggregation configured +- [ ] Dashboards created for key metrics + +**Performance**: +- [ ] Load testing completed successfully +- [ ] Database queries optimized +- [ ] Caching configured (if needed) +- [ ] Rate limiting enabled +- [ ] Connection pools tuned + +**Operations**: +- [ ] Process manager configured (systemd/supervisord/k8s) +- [ ] Auto-restart on failure enabled +- [ ] Log rotation configured +- [ ] Backup and disaster recovery plan documented +- [ ] Runbook for common issues created + +### Testing Production Setup + +**1. Verify Authentication**: + +```bash +# Test with valid JWT +curl -H "Authorization: Bearer VALID_TOKEN" \ + https://mcp.yourcompany.com/health + +# Test with invalid JWT +curl -H "Authorization: Bearer INVALID_TOKEN" \ + https://mcp.yourcompany.com/health +# Expected: 401 Unauthorized +``` + +**2. Verify Authorization**: + +```bash +# Test with limited permissions user +curl -H "Authorization: Bearer LIMITED_USER_TOKEN" \ + https://mcp.yourcompany.com/list_charts +# Expected: Only returns charts user can access + +# Test permission denial +curl -H "Authorization: Bearer LIMITED_USER_TOKEN" \ + https://mcp.yourcompany.com/generate_dashboard +# Expected: 403 Forbidden if user lacks permission +``` + +**3. Verify HTTPS**: + +```bash +# Should redirect to HTTPS +curl -I http://mcp.yourcompany.com + +# Should work with HTTPS +curl -I https://mcp.yourcompany.com +``` + +**4. Verify Rate Limiting**: + +```bash +# Send many requests rapidly +for i in {1..150}; do + curl -H "Authorization: Bearer TOKEN" \ + https://mcp.yourcompany.com/health & +done +wait +# Expected: Some requests return 429 Too Many Requests +``` + +**5. 
Monitor Logs**: + +```bash +# Systemd +journalctl -u superset-mcp -f + +# Docker +docker logs -f superset-mcp + +# Kubernetes +kubectl logs -l app=superset-mcp -f +``` + +### Rollback Plan + +**If issues occur after deployment**: + +1. **Immediate Rollback**: + ```bash + # Systemd + systemctl stop superset-mcp + # Restore previous configuration + cp /opt/superset/superset_config.py.backup /opt/superset/superset_config.py + systemctl start superset-mcp + + # Kubernetes + kubectl rollout undo deployment/superset-mcp + ``` + +2. **Partial Rollback** (rollback auth only): + ```python + # Temporarily re-enable dev auth + MCP_AUTH_ENABLED = False + MCP_DEV_USERNAME = "admin" + ``` + +3. **Investigate and Fix**: + - Review logs for errors + - Check JWT configuration + - Verify network connectivity + - Test database connection + - Validate Superset configuration + +## Troubleshooting + +### Common Issues + +**Issue**: "Invalid token" errors + +**Diagnosis**: +```bash +# Decode JWT to inspect claims +echo "YOUR_JWT_TOKEN" | cut -d'.' -f2 | base64 -d | jq . 
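+
+# Note: JWT segments are base64url-encoded without padding, so the decode
+# above can fail with "invalid input". A more robust variant (token assumed
+# to be in $TOKEN) translates the URL-safe alphabet and restores the padding:
+payload=$(printf '%s' "$TOKEN" | cut -d'.' -f2 | tr '_-' '/+')
+while [ $(( ${#payload} % 4 )) -ne 0 ]; do payload="${payload}="; done
+printf '%s' "$payload" | base64 -d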
+ +# Check issuer, audience, expiration +``` + +**Solution**: +- Verify `MCP_JWT_ISSUER` matches token's `iss` claim +- Verify `MCP_JWT_AUDIENCE` matches token's `aud` claim +- Check token hasn't expired (`exp` claim) +- Ensure JWKS URI is accessible from MCP server + +--- + +**Issue**: "User not found" errors + +**Diagnosis**: +```bash +# Check if user exists in Superset +superset fab list-users | grep username +``` + +**Solution**: +- Create user in Superset: `superset fab create-user` +- Ensure JWT `sub` claim matches Superset username +- Or configure user auto-provisioning (future feature) + +--- + +**Issue**: High latency + +**Diagnosis**: +```bash +# Check database connection pool +# Look for "QueuePool limit" errors in logs +journalctl -u superset-mcp | grep -i pool + +# Check database performance +# Monitor slow queries in database logs +``` + +**Solution**: +- Increase `SQLALCHEMY_POOL_SIZE` +- Add database indexes on frequently queried columns +- Enable query result caching +- Optimize dataset queries + +--- + +**Issue**: Service crashes on startup + +**Diagnosis**: +```bash +# Check logs +journalctl -u superset-mcp -n 100 + +# Common causes: +# - Missing configuration +# - Database connection failure +# - Port already in use +``` + +**Solution**: +- Verify all required config keys present +- Test database connection: `superset db upgrade` +- Check port availability: `netstat -tuln | grep 5008` + +--- + +**Issue**: Permission denied errors + +**Diagnosis**: +```bash +# Check user's roles +superset fab list-users | grep -A 5 username + +# Check role permissions in Superset UI +# Security → List Roles → [Role Name] → Permissions +``` + +**Solution**: +- Grant required permissions to user's role +- Verify RLS rules not too restrictive +- Check dataset permissions + +## Performance Tuning + +### Database Connection Pool + +**Optimal Settings** (4 workers): + +```python +SQLALCHEMY_POOL_SIZE = 5 # 5 connections per worker +SQLALCHEMY_MAX_OVERFLOW = 5 # 5 extra 
connections when busy +# Total: 4 workers × (5 + 5) = 40 max connections +``` + +**Monitoring**: +```sql +-- PostgreSQL: Check active connections +SELECT COUNT(*) FROM pg_stat_activity WHERE state = 'active'; + +-- PostgreSQL: Check connection limit +SHOW max_connections; +``` + +### Caching + +**Enable Redis Caching**: + +```python +# superset_config.py +CACHE_CONFIG = { + "CACHE_TYPE": "RedisCache", + "CACHE_REDIS_URL": "redis://localhost:6379/1", + "CACHE_DEFAULT_TIMEOUT": 300, +} + +DATA_CACHE_CONFIG = { + "CACHE_TYPE": "RedisCache", + "CACHE_REDIS_URL": "redis://localhost:6379/2", + "CACHE_DEFAULT_TIMEOUT": 3600, +} +``` + +**Cache Hit Rate Monitoring**: +```bash +# Redis: Monitor cache performance +redis-cli INFO stats | grep -E 'keyspace_hits|keyspace_misses' +``` + +### Request Timeouts + +```python +# superset_config.py + +# SQLLab query timeout +SQLLAB_TIMEOUT = 300 # 5 minutes + +# SQL query timeout +SQLLAB_ASYNC_TIME_LIMIT_SEC = 300 + +# Superset webserver request timeout +SUPERSET_WEBSERVER_TIMEOUT = 60 +``` + +## References + +- **Superset Configuration**: https://superset.apache.org/docs/configuration/configuring-superset +- **Superset Installation**: https://superset.apache.org/docs/installation/installing-superset-from-scratch +- **FastMCP Documentation**: https://github.com/jlowin/fastmcp +- **JWT Best Practices**: https://tools.ietf.org/html/rfc8725 +- **Prometheus Monitoring**: https://prometheus.io/docs/ +- **Nginx Configuration**: https://nginx.org/en/docs/ +- **Kubernetes Deployment**: https://kubernetes.io/docs/concepts/workloads/controllers/deployment/ diff --git a/superset/mcp_service/SECURITY.md b/superset/mcp_service/SECURITY.md new file mode 100644 index 00000000000..e17431d3070 --- /dev/null +++ b/superset/mcp_service/SECURITY.md @@ -0,0 +1,803 @@ + + +# MCP Service Security + +## Overview + +The MCP service implements multiple layers of security to ensure safe programmatic access to Superset functionality. 
This document covers authentication, authorization, session management, audit logging, and compliance considerations. + +## Authentication + +### Current Implementation (Development) + +For development and testing, the MCP service uses a simple username-based authentication: + +```python +# superset_config.py +MCP_DEV_USERNAME = "admin" +``` + +**How it works**: +1. The `@mcp_auth_hook` decorator calls `get_user_from_request()` +2. `get_user_from_request()` reads `MCP_DEV_USERNAME` from config +3. User is queried from database and set as `g.user` +4. All subsequent Superset operations use this user's permissions + +**Development Use Only**: +- No token validation +- No multi-user support +- No authentication security +- Single user for all MCP requests +- NOT suitable for production + +### Production Implementation (JWT) + +For production deployments, the MCP service supports JWT (JSON Web Token) authentication: + +```python +# superset_config.py +MCP_AUTH_ENABLED = True +MCP_JWT_ISSUER = "https://your-auth-provider.com" +MCP_JWT_AUDIENCE = "superset-mcp" +MCP_JWT_ALGORITHM = "RS256" # or "HS256" for symmetric keys + +# Option 1: Use JWKS endpoint (recommended for RS256) +MCP_JWKS_URI = "https://your-auth-provider.com/.well-known/jwks.json" + +# Option 2: Use static public key (RS256) +MCP_JWT_PUBLIC_KEY = """-----BEGIN PUBLIC KEY----- +MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEA... 
+-----END PUBLIC KEY-----""" + +# Option 3: Use shared secret (HS256 - less secure) +MCP_JWT_ALGORITHM = "HS256" +MCP_JWT_SECRET = "your-shared-secret-key" +``` + +**JWT Token Structure**: + +```json +{ + "iss": "https://your-auth-provider.com", + "sub": "user@company.com", + "aud": "superset-mcp", + "exp": 1735689600, + "iat": 1735686000, + "email": "user@company.com", + "scopes": ["superset:read", "superset:chart:create"] +} +``` + +**Required Claims**: +- `iss` (issuer): Must match `MCP_JWT_ISSUER` +- `sub` (subject): User identifier (username/email) +- `aud` (audience): Must match `MCP_JWT_AUDIENCE` +- `exp` (expiration): Token expiration timestamp +- `iat` (issued at): Token creation timestamp + +**Optional Claims**: +- `email`: User's email address +- `username`: Alternative to `sub` for user identification +- `scopes`: Array of permission scopes +- `tenant_id`: Multi-tenant identifier (future use) + +**Token Validation Process**: + +1. Extract Bearer token from `Authorization` header +2. Verify token signature using public key or JWKS +3. Validate `iss`, `aud`, and `exp` claims +4. Check required scopes (if configured) +5. Extract user identifier from `sub`, `email`, or `username` claim +6. Look up Superset user from database +7. 
Set `g.user` for request context + +**Example Client Usage**: + +```bash +# Using curl +curl -H "Authorization: Bearer YOUR_JWT_TOKEN" \ + http://localhost:5008/list_charts + +# Using MCP client (Claude Desktop) +{ + "mcpServers": { + "superset": { + "url": "http://localhost:5008", + "headers": { + "Authorization": "Bearer YOUR_JWT_TOKEN" + } + } + } +} +``` + +### Token Renewal and Refresh + +**Short-lived Access Tokens** (recommended): +- Issue tokens with short expiration (e.g., 15 minutes) +- Client must refresh token before expiration +- Reduces risk of token theft + +**Refresh Token Pattern**: + +```mermaid +sequenceDiagram + participant Client + participant AuthProvider as Auth Provider + participant MCP as MCP Service + + Client->>AuthProvider: Request access + AuthProvider-->>Client: access_token (15 min)
refresh_token (30 days) + + Client->>MCP: Request with access_token + MCP-->>Client: Response + + Note over Client,MCP: Access token expires + + Client->>AuthProvider: Request new token with refresh_token + AuthProvider-->>Client: New access_token (15 min) + + Client->>MCP: Request with new access_token + MCP-->>Client: Response + + Note over Client,AuthProvider: Refresh token expires + + Client->>AuthProvider: User must re-authenticate +``` + +**MCP Service Responsibility**: +- MCP service only validates access tokens +- Refresh token handling is the client's responsibility +- Auth provider (OAuth2/OIDC server) handles token refresh + +### Service Account Patterns + +For automation and batch jobs, use service accounts instead of user credentials: + +```json +{ + "iss": "https://your-auth-provider.com", + "sub": "service-account@automation.company.com", + "aud": "superset-mcp", + "exp": 1735689600, + "client_id": "superset-automation", + "scopes": ["superset:read", "superset:chart:create"] +} +``` + +**Service Account Best Practices**: +- Create dedicated Superset users for service accounts +- Grant minimal required permissions +- Use long-lived tokens only when necessary +- Rotate service account credentials regularly +- Log all service account activity +- Use separate service accounts per automation job + +**Example Superset Service Account Setup**: + +```bash +# Create service account user in Superset +superset fab create-user \ + --role Alpha \ + --username automation-service \ + --firstname Automation \ + --lastname Service \ + --email automation@company.com \ + --password + +# Grant specific permissions +# (Use Superset UI or FAB CLI to configure role permissions) +``` + +## Authorization + +### RBAC Integration + +The MCP service fully integrates with Superset's Flask-AppBuilder role-based access control: + +**Role Hierarchy**: +- **Admin**: Full access to all resources +- **Alpha**: Can create and edit dashboards, charts, datasets +- **Gamma**: Read-only 
access to permitted resources +- **Custom Roles**: Fine-grained permission sets + +**Permission Checking Flow**: + +```python +# In MCP tool +@mcp.tool +@mcp_auth_hook # Sets g.user +def list_dashboards(filters: List[Filter]) -> DashboardList: + # Flask-AppBuilder security manager automatically filters + # based on g.user's permissions + dashboards = DashboardDAO.find_by_ids(...) + # Only returns dashboards g.user can access +``` + +**Permission Types**: + +| Permission | Description | Example | +|------------|-------------|---------| +| `can_read` | View resource | View dashboard details | +| `can_write` | Edit resource | Update chart configuration | +| `can_delete` | Delete resource | Remove dashboard | +| `datasource_access` | Access dataset | Query dataset in chart | +| `database_access` | Access database | Execute SQL in SQL Lab | + +### Row-Level Security (RLS) + +RLS rules filter query results based on user attributes: + +**RLS Rule Example**: +```sql +-- Only show records for user's department +department = '{{ current_user().department }}' +``` + +**How RLS Works with MCP**: + +```mermaid +sequenceDiagram + participant Client + participant Auth as @mcp_auth_hook + participant Tool as MCP Tool + participant DAO as Superset DAO + participant DB as Database + + Client->>Auth: Request with JWT/dev username + Auth->>Auth: Set g.user + Auth->>Tool: Execute tool + Tool->>DAO: Call ChartDAO.get_chart_data() + DAO->>DAO: Apply RLS rules
Replace template variables
with g.user attributes + DAO->>DB: Query with RLS filters in WHERE clause + DB-->>DAO: Only permitted rows + DAO-->>Tool: Filtered data + Tool-->>Client: Response +``` + +**RLS Configuration**: + +RLS is configured per dataset in Superset UI: +1. Navigate to dataset → Edit → Row Level Security +2. Create RLS rule with SQL filter template +3. Assign rule to roles or users +4. MCP service automatically applies rules (no code changes needed) + +**MCP Service Guarantees**: +- Cannot bypass RLS rules +- No privileged access mode +- RLS applied consistently across all tools +- Same security model as Superset web UI + +### Dataset Access Control + +The MCP service validates dataset access before executing queries: + +```python +# In chart generation tool +@mcp.tool +@mcp_auth_hook +def generate_chart(dataset_id: int, ...) -> ChartResponse: + dataset = DatasetDAO.find_by_id(dataset_id) + + # Check if user has access + if not has_dataset_access(dataset): + raise ValueError( + f"User {g.user.username} does not have access to dataset {dataset_id}" + ) + + # Proceed with chart creation + ... 
+``` + +**Dataset Access Filters**: + +All listing operations automatically filter by user access: + +- `list_datasets`: Uses `DatasourceFilter` - only shows datasets user can query +- `list_charts`: Uses `ChartAccessFilter` - only shows charts with accessible datasets +- `list_dashboards`: Uses `DashboardAccessFilter` - only shows dashboards user can view + +**Access Check Implementation**: + +```python +from superset import security_manager + +def has_dataset_access(dataset: SqlaTable) -> bool: + """Check if g.user can access dataset.""" + if hasattr(g, "user") and g.user: + return security_manager.can_access_datasource(datasource=dataset) + return False +``` + +### Tool-Level Permissions + +Different MCP tools require different Superset permissions: + +| Tool | Required Permissions | Notes | +|------|---------------------|-------| +| `list_dashboards` | `can_read` on Dashboard | Returns only accessible dashboards | +| `get_dashboard_info` | `can_read` on Dashboard + dataset access | Validates dashboard and dataset permissions | +| `list_charts` | `can_read` on Slice | Returns only charts with accessible datasets | +| `get_chart_info` | `can_read` on Slice + dataset access | Validates chart and dataset permissions | +| `get_chart_data` | `can_read` on Slice + `datasource_access` | Executes query with RLS applied | +| `generate_chart` | `can_write` on Slice + `datasource_access` | Creates new chart | +| `update_chart` | `can_write` on Slice + ownership or Admin | Must own chart or be Admin | +| `list_datasets` | `datasource_access` | Returns only accessible datasets | +| `get_dataset_info` | `datasource_access` | Validates dataset access | +| `execute_sql` | `can_sql_json` or `can_sqllab` on Database | Executes SQL with RLS | +| `generate_dashboard` | `can_write` on Dashboard + dataset access | Creates new dashboard | + +**Permission Denied Handling**: + +```python +# If user lacks permission, Superset raises exception +try: + result = 
DashboardDAO.find_by_id(dashboard_id) +except SupersetSecurityException as e: + raise ValueError(f"Access denied: {e}") +``` + +### JWT Scope Validation + +Future implementation will support scope-based authorization: + +```python +# superset_config.py +MCP_REQUIRED_SCOPES = ["superset:read"] # Minimum scopes required +``` + +**Scope Hierarchy**: +- `superset:read`: List and view resources +- `superset:chart:create`: Create new charts +- `superset:chart:update`: Update existing charts +- `superset:chart:delete`: Delete charts +- `superset:dashboard:create`: Create dashboards +- `superset:sql:execute`: Execute SQL queries +- `superset:admin`: Full administrative access + +**Scope Enforcement** (future): + +```python +@mcp.tool +@mcp_auth_hook +@require_scopes(["superset:chart:create"]) +def generate_chart(...) -> ChartResponse: + # Only proceeds if JWT contains required scope + ... +``` + +**Scope Validation Logic**: +1. Extract `scopes` array from JWT payload +2. Check if all required scopes present +3. Deny access if any scope missing +4. 
Log denied attempts for audit + +## Session and CSRF Handling + +### Session Configuration + +The MCP service configures sessions for authentication context: + +```python +# superset_config.py +MCP_SESSION_CONFIG = { + "SESSION_COOKIE_HTTPONLY": True, # Prevent JavaScript access + "SESSION_COOKIE_SECURE": True, # HTTPS only (production) + "SESSION_COOKIE_SAMESITE": "Strict", # CSRF protection + "SESSION_COOKIE_NAME": "superset_session", + "PERMANENT_SESSION_LIFETIME": 86400, # 24 hours +} +``` + +**Why Session Config in MCP?** + +The MCP service uses Flask's session mechanism for: +- **Authentication context**: Storing `g.user` across request lifecycle +- **CSRF token generation**: Protecting state-changing operations +- **Request correlation**: Linking related tool calls + +**Important Notes**: +- MCP service is **stateless** - no server-side session storage +- Sessions used only for request-scoped auth context +- Cookies used for auth token transmission (alternative to Bearer header) +- Session data NOT persisted between MCP service restarts + +### CSRF Protection + +CSRF (Cross-Site Request Forgery) protection is configured but currently **not enforced** for MCP tools: + +```python +MCP_CSRF_CONFIG = { + "WTF_CSRF_ENABLED": True, + "WTF_CSRF_TIME_LIMIT": None, # No time limit +} +``` + +**Why CSRF Config Exists**: +- Flask-AppBuilder and Superset expect CSRF configuration +- Prevents errors during app initialization +- Future-proofing for potential web UI for MCP service + +**Why CSRF NOT Enforced**: +- MCP protocol uses Bearer tokens (not cookies for auth) +- CSRF attacks require browser cookie-based authentication +- Stateless API design prevents CSRF vulnerability +- MCP clients are programmatic (not browsers) + +**If Using Cookie-Based Auth** (future): +- Enable CSRF token requirement +- Include CSRF token in MCP tool requests +- Validate token on state-changing operations + +**CSRF Token Flow** (if enabled): + +```mermaid +sequenceDiagram + participant 
Client + participant MCP as MCP Service + participant Session as Session Store + + Client->>MCP: Request CSRF token + MCP->>Session: Generate and store token + MCP-->>Client: Return CSRF token + + Client->>MCP: Request with CSRF token + MCP->>Session: Validate token matches session + alt Token valid + MCP-->>Client: Process request + else Token invalid/missing + MCP-->>Client: Reject request (403) + end +``` + +### Production Security Recommendations + +**HTTPS Required**: +```python +MCP_SESSION_CONFIG = { + "SESSION_COOKIE_SECURE": True, # MUST be True in production +} +``` + +Without HTTPS: +- Cookies transmitted in plaintext +- Session hijacking risk +- JWT tokens exposed +- Man-in-the-middle attacks possible + +**SameSite Configuration**: +- `Strict`: Cookies never sent cross-site (most secure) +- `Lax`: Cookies sent on top-level navigation (less secure) +- `None`: Cookies sent everywhere (requires Secure flag, least secure) + +**Recommended Production Settings**: +```python +MCP_SESSION_CONFIG = { + "SESSION_COOKIE_HTTPONLY": True, # Always + "SESSION_COOKIE_SECURE": True, # Always (HTTPS required) + "SESSION_COOKIE_SAMESITE": "Strict", # Recommended + "PERMANENT_SESSION_LIFETIME": 3600, # 1 hour (adjust as needed) +} +``` + +## Audit Logging + +### Current Logging + +The MCP service logs basic authentication events: + +```python +# In @mcp_auth_hook +logger.debug( + "MCP tool call: user=%s, tool=%s", + user.username, + tool_func.__name__ +) +``` + +**What's Logged**: +- User who made the request +- Which tool was called +- Timestamp (from log formatter) +- Success/failure (via exception logging) + +**Log Format**: +``` +2025-01-01 10:30:45,123 DEBUG [mcp_auth_hook] MCP tool call: user=admin, tool=list_dashboards +2025-01-01 10:30:45,456 ERROR [mcp_auth_hook] Tool execution failed: user=admin, tool=generate_chart, error=Permission denied +``` + +### Enhanced Audit Logging (Recommended) + +For production deployments, implement structured logging: + +```python 
+# superset_config.py +import logging +import json + +class StructuredFormatter(logging.Formatter): + def format(self, record): + log_data = { + "timestamp": self.formatTime(record), + "level": record.levelname, + "logger": record.name, + "message": record.getMessage(), + "user": getattr(record, "user", None), + "tool": getattr(record, "tool", None), + "resource_type": getattr(record, "resource_type", None), + "resource_id": getattr(record, "resource_id", None), + "action": getattr(record, "action", None), + "result": getattr(record, "result", None), + "error": getattr(record, "error", None), + } + return json.dumps(log_data) + +# Apply formatter +handler = logging.StreamHandler() +handler.setFormatter(StructuredFormatter()) +logging.getLogger("superset.mcp_service").addHandler(handler) +``` + +**Structured Log Example**: +```json +{ + "timestamp": "2025-01-01T10:30:45.123Z", + "level": "INFO", + "logger": "superset.mcp_service.auth", + "message": "MCP tool execution", + "user": "admin", + "tool": "generate_chart", + "resource_type": "chart", + "resource_id": 42, + "action": "create", + "result": "success", + "duration_ms": 234 +} +``` + +### Audit Events + +**Key Events to Log**: + +| Event | Data to Capture | Severity | +|-------|----------------|----------| +| Authentication success | User, timestamp, IP | INFO | +| Authentication failure | Username attempted, reason | WARNING | +| Tool execution | User, tool, parameters, result | INFO | +| Permission denied | User, tool, resource, reason | WARNING | +| Chart created | User, chart_id, dataset_id | INFO | +| Dashboard created | User, dashboard_id, chart_ids | INFO | +| SQL executed | User, database, query (sanitized), rows | INFO | +| Error occurred | User, tool, error type, stack trace | ERROR | + +### Integration with SIEM Systems + +**Export to External Systems**: + +**Option 1: Syslog**: +```python +import logging.handlers + +syslog_handler = logging.handlers.SysLogHandler( + address=("syslog.company.com", 
514)
+)
+logging.getLogger("superset.mcp_service").addHandler(syslog_handler)
+```
+
+**Option 2: Log Aggregation (ELK, Splunk)**:
+```python
+# Send JSON logs to stdout, collected by log shipper
+import sys
+import logging
+
+handler = logging.StreamHandler(sys.stdout)
+handler.setFormatter(StructuredFormatter())
+logging.getLogger("superset.mcp_service").addHandler(handler)
+```
+
+**Option 3: Cloud Logging (CloudWatch, Stackdriver)**:
+```python
+# AWS CloudWatch example
+import watchtower
+
+handler = watchtower.CloudWatchLogHandler(
+    log_group="/superset/mcp",
+    stream_name="mcp-service"
+)
+logging.getLogger("superset.mcp_service").addHandler(handler)
+```
+
+### Log Retention
+
+**Recommended Retention Policies**:
+- **Authentication logs**: 90 days minimum
+- **Tool execution logs**: 30 days minimum
+- **Error logs**: 180 days minimum
+- **Compliance logs**: Per regulatory requirements (e.g., 6 years for HIPAA)
+
+## Compliance Considerations
+
+### GDPR (General Data Protection Regulation)
+
+**User Data Access Tracking**:
+- Log all data access by user
+- Provide audit trail for data subject access requests (DSAR)
+- Implement data retention policies
+- Support right to be forgotten (delete user data from logs)
+
+**MCP Service Compliance**:
+- All tool calls logged with user identification
+- Can generate reports of user's data access
+- Logs can be filtered/redacted for privacy
+- No personal data stored in MCP service (only in Superset DB)
+
+### SOC 2 (System and Organization Controls 2)
+
+**Audit Trail Requirements**:
+- Log all administrative actions
+- Maintain immutable audit logs
+- Implement log integrity verification
+- Provide audit log export functionality
+
+**MCP Service Compliance**:
+- Structured logging provides audit trail
+- Logs include who, what, when for all actions
+- Export logs to secure, immutable storage (S3, etc.)
+- Implement log signing for integrity verification + +### HIPAA (Health Insurance Portability and Accountability Act) + +**PHI Access Logging**: +- Log all access to protected health information +- Include user, timestamp, data accessed +- Maintain logs for 6 years minimum +- Implement access controls on audit logs + +**MCP Service Compliance**: +- All dataset queries logged +- Row-level security enforces data access controls +- Can identify which users accessed which PHI records +- Logs exportable for compliance reporting + +**Example HIPAA Audit Log Entry**: +```json +{ + "timestamp": "2025-01-01T10:30:45.123Z", + "user": "doctor@hospital.com", + "action": "query_dataset", + "dataset_id": 123, + "dataset_name": "patient_records", + "rows_returned": 5, + "phi_accessed": true, + "purpose": "Treatment", + "ip_address": "10.0.1.25" +} +``` + +### Access Control Matrix + +For compliance audits, maintain a matrix of who can access what: + +| Role | Dashboards | Charts | Datasets | SQL Lab | Admin | +|------|-----------|--------|----------|---------|-------| +| Admin | All | All | All | All | Yes | +| Alpha | Owned + Shared | Owned + Shared | Permitted | Permitted DBs | No | +| Gamma | Shared | Shared | Permitted | No | No | +| Viewer | Shared | Shared | None | No | No | + +## Security Checklist for Production + +Before deploying MCP service to production: + +**Authentication**: +- [ ] `MCP_AUTH_ENABLED = True` +- [ ] JWT issuer, audience, and keys configured +- [ ] `MCP_DEV_USERNAME` removed or set to `None` +- [ ] Token expiration enforced (short-lived tokens) +- [ ] Refresh token mechanism implemented (client-side) + +**Authorization**: +- [ ] RBAC roles configured in Superset +- [ ] RLS rules tested for all datasets +- [ ] Dataset access permissions verified +- [ ] Minimum required permissions granted per role +- [ ] Service accounts use dedicated roles + +**Network Security**: +- [ ] HTTPS enabled (`SESSION_COOKIE_SECURE = True`) +- [ ] TLS 1.2+ enforced +- [ ] 
Firewall rules restrict access to MCP service +- [ ] Network isolation between MCP and database +- [ ] Load balancer health checks configured + +**Session Security**: +- [ ] `SESSION_COOKIE_HTTPONLY = True` +- [ ] `SESSION_COOKIE_SECURE = True` +- [ ] `SESSION_COOKIE_SAMESITE = "Strict"` +- [ ] Session timeout configured appropriately +- [ ] No sensitive data stored in sessions + +**Audit Logging**: +- [ ] Structured logging enabled +- [ ] All tool executions logged +- [ ] Authentication events logged +- [ ] Logs exported to SIEM/aggregation system +- [ ] Log retention policy implemented + +**Monitoring**: +- [ ] Failed authentication attempts alerted +- [ ] Permission denied events monitored +- [ ] Error rate alerts configured +- [ ] Unusual access patterns detected +- [ ] Service availability monitored + +**Compliance**: +- [ ] Data access logs retained per regulations +- [ ] Audit trail exportable +- [ ] Privacy policy updated for MCP service +- [ ] User consent obtained (if required) +- [ ] Security incident response plan includes MCP + +## Security Incident Response + +### Suspected Token Compromise + +**Immediate Actions**: +1. Revoke compromised token at auth provider +2. Review audit logs for unauthorized access +3. Identify affected resources +4. Notify affected users/stakeholders +5. Force token refresh for all users (if provider supports) + +**Investigation**: +1. Check MCP service logs for unusual activity +2. Correlate access patterns with compromised token +3. Determine scope of data accessed +4. Document timeline of events + +### Unauthorized Access Detected + +**Response Procedure**: +1. Block user/IP immediately (firewall/load balancer) +2. Disable user account in Superset +3. Review all actions by user in audit logs +4. Assess data exposure +5. Notify security team and management +6. Preserve logs for forensic analysis + +### Data Breach + +**MCP-Specific Considerations**: +1. Identify which datasets were accessed via MCP +2. 
Determine if RLS was bypassed (should not be possible) +3. Check for SQL injection attempts (should be prevented by Superset) +4. Review all tool executions in timeframe +5. Export detailed audit logs for incident report + +## References + +- **JWT Best Practices**: https://tools.ietf.org/html/rfc8725 +- **OWASP API Security**: https://owasp.org/www-project-api-security/ +- **Superset Security Documentation**: https://superset.apache.org/docs/security +- **Flask-AppBuilder Security**: https://flask-appbuilder.readthedocs.io/en/latest/security.html +- **GDPR Compliance Guide**: https://gdpr.eu/ +- **SOC 2 Framework**: https://www.aicpa.org/soc2 +- **HIPAA Security Rule**: https://www.hhs.gov/hipaa/for-professionals/security/