mirror of https://github.com/apache/superset.git synced 2026-04-17 07:05:04 +00:00

Files

Amin Ghadersohi 66afdfd119 docs(mcp): add comprehensive architecture, security, and production deployment documentation (#36017 )

2025-11-19 16:41:56 +01:00

24 KiB

Raw Blame History

MCP Service Security

Overview

The MCP service implements multiple layers of security to ensure safe programmatic access to Superset functionality. This document covers authentication, authorization, session management, audit logging, and compliance considerations.

Authentication

Current Implementation (Development)

For development and testing, the MCP service uses a simple username-based authentication:

# superset_config.py
MCP_DEV_USERNAME = "admin"

How it works:

The @mcp_auth_hook decorator calls get_user_from_request()
get_user_from_request() reads MCP_DEV_USERNAME from config
User is queried from database and set as g.user
All subsequent Superset operations use this user's permissions

Development Use Only:

No token validation
No multi-user support
No authentication security
Single user for all MCP requests
NOT suitable for production

Production Implementation (JWT)

For production deployments, the MCP service supports JWT (JSON Web Token) authentication:

# superset_config.py
MCP_AUTH_ENABLED = True
MCP_JWT_ISSUER = "https://your-auth-provider.com"
MCP_JWT_AUDIENCE = "superset-mcp"
MCP_JWT_ALGORITHM = "RS256"  # or "HS256" for symmetric keys

# Option 1: Use JWKS endpoint (recommended for RS256)
MCP_JWKS_URI = "https://your-auth-provider.com/.well-known/jwks.json"

# Option 2: Use static public key (RS256)
MCP_JWT_PUBLIC_KEY = """-----BEGIN PUBLIC KEY-----
MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEA...
-----END PUBLIC KEY-----"""

# Option 3: Use shared secret (HS256 - less secure)
MCP_JWT_ALGORITHM = "HS256"
MCP_JWT_SECRET = "your-shared-secret-key"

JWT Token Structure:

{
  "iss": "https://your-auth-provider.com",
  "sub": "user@company.com",
  "aud": "superset-mcp",
  "exp": 1735689600,
  "iat": 1735686000,
  "email": "user@company.com",
  "scopes": ["superset:read", "superset:chart:create"]
}

Required Claims:

iss (issuer): Must match MCP_JWT_ISSUER
sub (subject): User identifier (username/email)
aud (audience): Must match MCP_JWT_AUDIENCE
exp (expiration): Token expiration timestamp
iat (issued at): Token creation timestamp

Optional Claims:

email: User's email address
username: Alternative to sub for user identification
scopes: Array of permission scopes
tenant_id: Multi-tenant identifier (future use)

Token Validation Process:

Extract Bearer token from Authorization header
Verify token signature using public key or JWKS
Validate iss, aud, and exp claims
Check required scopes (if configured)
Extract user identifier from sub, email, or username claim
Look up Superset user from database
Set g.user for request context

Example Client Usage:

# Using curl
curl -H "Authorization: Bearer YOUR_JWT_TOKEN" \
     http://localhost:5008/list_charts

# Using MCP client (Claude Desktop)
{
  "mcpServers": {
    "superset": {
      "url": "http://localhost:5008",
      "headers": {
        "Authorization": "Bearer YOUR_JWT_TOKEN"
      }
    }
  }
}

Token Renewal and Refresh

Short-lived Access Tokens (recommended):

Issue tokens with short expiration (e.g., 15 minutes)
Client must refresh token before expiration
Reduces risk of token theft

Refresh Token Pattern:

sequenceDiagram
    participant Client
    participant AuthProvider as Auth Provider
    participant MCP as MCP Service

    Client->>AuthProvider: Request access
    AuthProvider-->>Client: access_token (15 min)<br/>refresh_token (30 days)

    Client->>MCP: Request with access_token
    MCP-->>Client: Response

    Note over Client,MCP: Access token expires

    Client->>AuthProvider: Request new token with refresh_token
    AuthProvider-->>Client: New access_token (15 min)

    Client->>MCP: Request with new access_token
    MCP-->>Client: Response

    Note over Client,AuthProvider: Refresh token expires

    Client->>AuthProvider: User must re-authenticate

MCP Service Responsibility:

MCP service only validates access tokens
Refresh token handling is the client's responsibility
Auth provider (OAuth2/OIDC server) handles token refresh

Service Account Patterns

For automation and batch jobs, use service accounts instead of user credentials:

{
  "iss": "https://your-auth-provider.com",
  "sub": "service-account@automation.company.com",
  "aud": "superset-mcp",
  "exp": 1735689600,
  "client_id": "superset-automation",
  "scopes": ["superset:read", "superset:chart:create"]
}

Service Account Best Practices:

Create dedicated Superset users for service accounts
Grant minimal required permissions
Use long-lived tokens only when necessary
Rotate service account credentials regularly
Log all service account activity
Use separate service accounts per automation job

Example Superset Service Account Setup:

# Create service account user in Superset
superset fab create-user \
  --role Alpha \
  --username automation-service \
  --firstname Automation \
  --lastname Service \
  --email automation@company.com \
  --password <generated-password>

# Grant specific permissions
# (Use Superset UI or FAB CLI to configure role permissions)

Authorization

RBAC Integration

The MCP service fully integrates with Superset's Flask-AppBuilder role-based access control:

Role Hierarchy:

Admin: Full access to all resources
Alpha: Can create and edit dashboards, charts, datasets
Gamma: Read-only access to permitted resources
Custom Roles: Fine-grained permission sets

Permission Checking Flow:

# In MCP tool
@mcp.tool
@mcp_auth_hook  # Sets g.user
def list_dashboards(filters: List[Filter]) -> DashboardList:
    # Flask-AppBuilder security manager automatically filters
    # based on g.user's permissions
    dashboards = DashboardDAO.find_by_ids(...)
    # Only returns dashboards g.user can access

Permission Types:

Permission	Description	Example
`can_read`	View resource	View dashboard details
`can_write`	Edit resource	Update chart configuration
`can_delete`	Delete resource	Remove dashboard
`datasource_access`	Access dataset	Query dataset in chart
`database_access`	Access database	Execute SQL in SQL Lab

Row-Level Security (RLS)

RLS rules filter query results based on user attributes:

RLS Rule Example:

-- Only show records for user's department
department = '{{ current_user().department }}'

How RLS Works with MCP:

sequenceDiagram
    participant Client
    participant Auth as @mcp_auth_hook
    participant Tool as MCP Tool
    participant DAO as Superset DAO
    participant DB as Database

    Client->>Auth: Request with JWT/dev username
    Auth->>Auth: Set g.user
    Auth->>Tool: Execute tool
    Tool->>DAO: Call ChartDAO.get_chart_data()
    DAO->>DAO: Apply RLS rules<br/>Replace template variables<br/>with g.user attributes
    DAO->>DB: Query with RLS filters in WHERE clause
    DB-->>DAO: Only permitted rows
    DAO-->>Tool: Filtered data
    Tool-->>Client: Response

RLS Configuration:

RLS is configured per dataset in Superset UI:

Navigate to dataset → Edit → Row Level Security
Create RLS rule with SQL filter template
Assign rule to roles or users
MCP service automatically applies rules (no code changes needed)

MCP Service Guarantees:

Cannot bypass RLS rules
No privileged access mode
RLS applied consistently across all tools
Same security model as Superset web UI

Dataset Access Control

The MCP service validates dataset access before executing queries:

# In chart generation tool
@mcp.tool
@mcp_auth_hook
def generate_chart(dataset_id: int, ...) -> ChartResponse:
    dataset = DatasetDAO.find_by_id(dataset_id)

    # Check if user has access
    if not has_dataset_access(dataset):
        raise ValueError(
            f"User {g.user.username} does not have access to dataset {dataset_id}"
        )

    # Proceed with chart creation
    ...

Dataset Access Filters:

All listing operations automatically filter by user access:

list_datasets: Uses DatasourceFilter - only shows datasets user can query
list_charts: Uses ChartAccessFilter - only shows charts with accessible datasets
list_dashboards: Uses DashboardAccessFilter - only shows dashboards user can view

Access Check Implementation:

from superset import security_manager

def has_dataset_access(dataset: SqlaTable) -> bool:
    """Check if g.user can access dataset."""
    if hasattr(g, "user") and g.user:
        return security_manager.can_access_datasource(datasource=dataset)
    return False

Tool-Level Permissions

Different MCP tools require different Superset permissions:

Tool	Required Permissions	Notes
`list_dashboards`	`can_read` on Dashboard	Returns only accessible dashboards
`get_dashboard_info`	`can_read` on Dashboard + dataset access	Validates dashboard and dataset permissions
`list_charts`	`can_read` on Slice	Returns only charts with accessible datasets
`get_chart_info`	`can_read` on Slice + dataset access	Validates chart and dataset permissions
`get_chart_data`	`can_read` on Slice + `datasource_access`	Executes query with RLS applied
`generate_chart`	`can_write` on Slice + `datasource_access`	Creates new chart
`update_chart`	`can_write` on Slice + ownership or Admin	Must own chart or be Admin
`list_datasets`	`datasource_access`	Returns only accessible datasets
`get_dataset_info`	`datasource_access`	Validates dataset access
`execute_sql`	`can_sql_json` or `can_sqllab` on Database	Executes SQL with RLS
`generate_dashboard`	`can_write` on Dashboard + dataset access	Creates new dashboard

Permission Denied Handling:

# If user lacks permission, Superset raises exception
try:
    result = DashboardDAO.find_by_id(dashboard_id)
except SupersetSecurityException as e:
    raise ValueError(f"Access denied: {e}")

JWT Scope Validation

Future implementation will support scope-based authorization:

# superset_config.py
MCP_REQUIRED_SCOPES = ["superset:read"]  # Minimum scopes required

Scope Hierarchy:

superset:read: List and view resources
superset:chart:create: Create new charts
superset:chart:update: Update existing charts
superset:chart:delete: Delete charts
superset:dashboard:create: Create dashboards
superset:sql:execute: Execute SQL queries
superset:admin: Full administrative access

Scope Enforcement (future):

@mcp.tool
@mcp_auth_hook
@require_scopes(["superset:chart:create"])
def generate_chart(...) -> ChartResponse:
    # Only proceeds if JWT contains required scope
    ...

Scope Validation Logic:

Extract scopes array from JWT payload
Check if all required scopes present
Deny access if any scope missing
Log denied attempts for audit

Session and CSRF Handling

Session Configuration

The MCP service configures sessions for authentication context:

# superset_config.py
MCP_SESSION_CONFIG = {
    "SESSION_COOKIE_HTTPONLY": True,      # Prevent JavaScript access
    "SESSION_COOKIE_SECURE": True,        # HTTPS only (production)
    "SESSION_COOKIE_SAMESITE": "Strict",  # CSRF protection
    "SESSION_COOKIE_NAME": "superset_session",
    "PERMANENT_SESSION_LIFETIME": 86400,  # 24 hours
}

Why Session Config in MCP?

The MCP service uses Flask's session mechanism for:

Authentication context: Storing g.user across request lifecycle
CSRF token generation: Protecting state-changing operations
Request correlation: Linking related tool calls

Important Notes:

MCP service is stateless - no server-side session storage
Sessions used only for request-scoped auth context
Cookies used for auth token transmission (alternative to Bearer header)
Session data NOT persisted between MCP service restarts

CSRF Protection

CSRF (Cross-Site Request Forgery) protection is configured but currently not enforced for MCP tools:

MCP_CSRF_CONFIG = {
    "WTF_CSRF_ENABLED": True,
    "WTF_CSRF_TIME_LIMIT": None,  # No time limit
}

Why CSRF Config Exists:

Flask-AppBuilder and Superset expect CSRF configuration
Prevents errors during app initialization
Future-proofing for potential web UI for MCP service

Why CSRF NOT Enforced:

MCP protocol uses Bearer tokens (not cookies for auth)
CSRF attacks require browser cookie-based authentication
Stateless API design prevents CSRF vulnerability
MCP clients are programmatic (not browsers)

If Using Cookie-Based Auth (future):

Enable CSRF token requirement
Include CSRF token in MCP tool requests
Validate token on state-changing operations

CSRF Token Flow (if enabled):

sequenceDiagram
    participant Client
    participant MCP as MCP Service
    participant Session as Session Store

    Client->>MCP: Request CSRF token
    MCP->>Session: Generate and store token
    MCP-->>Client: Return CSRF token

    Client->>MCP: Request with CSRF token
    MCP->>Session: Validate token matches session
    alt Token valid
        MCP-->>Client: Process request
    else Token invalid/missing
        MCP-->>Client: Reject request (403)
    end

Production Security Recommendations

HTTPS Required:

MCP_SESSION_CONFIG = {
    "SESSION_COOKIE_SECURE": True,  # MUST be True in production
}

Without HTTPS:

Cookies transmitted in plaintext
Session hijacking risk
JWT tokens exposed
Man-in-the-middle attacks possible

SameSite Configuration:

Strict: Cookies never sent cross-site (most secure)
Lax: Cookies sent on top-level navigation (less secure)
None: Cookies sent everywhere (requires Secure flag, least secure)

Recommended Production Settings:

MCP_SESSION_CONFIG = {
    "SESSION_COOKIE_HTTPONLY": True,      # Always
    "SESSION_COOKIE_SECURE": True,        # Always (HTTPS required)
    "SESSION_COOKIE_SAMESITE": "Strict",  # Recommended
    "PERMANENT_SESSION_LIFETIME": 3600,   # 1 hour (adjust as needed)
}

Audit Logging

Current Logging

The MCP service logs basic authentication events:

# In @mcp_auth_hook
logger.debug(
    "MCP tool call: user=%s, tool=%s",
    user.username,
    tool_func.__name__
)

What's Logged:

User who made the request
Which tool was called
Timestamp (from log formatter)
Success/failure (via exception logging)

Log Format:

2025-01-01 10:30:45,123 DEBUG [mcp_auth_hook] MCP tool call: user=admin, tool=list_dashboards
2025-01-01 10:30:45,456 ERROR [mcp_auth_hook] Tool execution failed: user=admin, tool=generate_chart, error=Permission denied

Enhanced Audit Logging (Recommended)

For production deployments, implement structured logging:

# superset_config.py
import logging
import json

class StructuredFormatter(logging.Formatter):
    def format(self, record):
        log_data = {
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            "user": getattr(record, "user", None),
            "tool": getattr(record, "tool", None),
            "resource_type": getattr(record, "resource_type", None),
            "resource_id": getattr(record, "resource_id", None),
            "action": getattr(record, "action", None),
            "result": getattr(record, "result", None),
            "error": getattr(record, "error", None),
        }
        return json.dumps(log_data)

# Apply formatter
handler = logging.StreamHandler()
handler.setFormatter(StructuredFormatter())
logging.getLogger("superset.mcp_service").addHandler(handler)

Structured Log Example:

{
  "timestamp": "2025-01-01T10:30:45.123Z",
  "level": "INFO",
  "logger": "superset.mcp_service.auth",
  "message": "MCP tool execution",
  "user": "admin",
  "tool": "generate_chart",
  "resource_type": "chart",
  "resource_id": 42,
  "action": "create",
  "result": "success",
  "duration_ms": 234
}

Audit Events

Key Events to Log:

Event	Data to Capture	Severity
Authentication success	User, timestamp, IP	INFO
Authentication failure	Username attempted, reason	WARNING
Tool execution	User, tool, parameters, result	INFO
Permission denied	User, tool, resource, reason	WARNING
Chart created	User, chart_id, dataset_id	INFO
Dashboard created	User, dashboard_id, chart_ids	INFO
SQL executed	User, database, query (sanitized), rows	INFO
Error occurred	User, tool, error type, stack trace	ERROR

Integration with SIEM Systems

Export to External Systems:

Option 1: Syslog:

import logging.handlers

syslog_handler = logging.handlers.SysLogHandler(
    address=("syslog.company.com", 514)
)
logging.getLogger("superset.mcp_service").addHandler(syslog_handler)

Option 2: Log Aggregation (ELK, Splunk):

# Send JSON logs to stdout, collected by log shipper
import sys
import logging

handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(StructuredFormatter())

Option 3: Cloud Logging (CloudWatch, Stackdriver):

# AWS CloudWatch example
import watchtower

handler = watchtower.CloudWatchLogHandler(
    log_group="/superset/mcp",
    stream_name="mcp-service"
)
logging.getLogger("superset.mcp_service").addHandler(handler)

Log Retention

Recommended Retention Policies:

Authentication logs: 90 days minimum
Tool execution logs: 30 days minimum
Error logs: 180 days minimum
Compliance logs: Per regulatory requirements (e.g., 7 years for HIPAA)

Compliance Considerations

User Data Access Tracking:

Log all data access by user
Provide audit trail for data subject access requests (DSAR)
Implement data retention policies
Support right to be forgotten (delete user data from logs)

MCP Service Compliance:

All tool calls logged with user identification
Can generate reports of user's data access
Logs can be filtered/redacted for privacy
No personal data stored in MCP service (only in Superset DB)

SOC 2 (Service Organization Control 2)

Audit Trail Requirements:

Log all administrative actions
Maintain immutable audit logs
Implement log integrity verification
Provide audit log export functionality

MCP Service Compliance:

Structured logging provides audit trail
Logs include who, what, when for all actions
Export logs to secure, immutable storage (S3, etc.)
Implement log signing for integrity verification

HIPAA (Health Insurance Portability and Accountability Act)

PHI Access Logging:

Log all access to protected health information
Include user, timestamp, data accessed
Maintain logs for 6 years minimum
Implement access controls on audit logs

MCP Service Compliance:

All dataset queries logged
Row-level security enforces data access controls
Can identify which users accessed which PHI records
Logs exportable for compliance reporting

Example HIPAA Audit Log Entry:

{
  "timestamp": "2025-01-01T10:30:45.123Z",
  "user": "doctor@hospital.com",
  "action": "query_dataset",
  "dataset_id": 123,
  "dataset_name": "patient_records",
  "rows_returned": 5,
  "phi_accessed": true,
  "purpose": "Treatment",
  "ip_address": "10.0.1.25"
}

Access Control Matrix

For compliance audits, maintain a matrix of who can access what:

Role	Dashboards	Charts	Datasets	SQL Lab	Admin
Admin	All	All	All	All	Yes
Alpha	Owned + Shared	Owned + Shared	Permitted	Permitted DBs	No
Gamma	Shared	Shared	Permitted	No	No
Viewer	Shared	Shared	None	No	No

Security Checklist for Production

Before deploying MCP service to production:

Authentication:

MCP_AUTH_ENABLED = True
JWT issuer, audience, and keys configured
MCP_DEV_USERNAME removed or set to None
Token expiration enforced (short-lived tokens)
Refresh token mechanism implemented (client-side)

Authorization:

RBAC roles configured in Superset
RLS rules tested for all datasets
Dataset access permissions verified
Minimum required permissions granted per role
Service accounts use dedicated roles

Network Security:

HTTPS enabled (SESSION_COOKIE_SECURE = True)
TLS 1.2+ enforced
Firewall rules restrict access to MCP service
Network isolation between MCP and database
Load balancer health checks configured

Session Security:

SESSION_COOKIE_HTTPONLY = True
SESSION_COOKIE_SECURE = True
SESSION_COOKIE_SAMESITE = "Strict"
Session timeout configured appropriately
No sensitive data stored in sessions

Audit Logging:

Structured logging enabled
All tool executions logged
Authentication events logged
Logs exported to SIEM/aggregation system
Log retention policy implemented

Monitoring:

Failed authentication attempts alerted
Permission denied events monitored
Error rate alerts configured
Unusual access patterns detected
Service availability monitored

Compliance:

Data access logs retained per regulations
Audit trail exportable
Privacy policy updated for MCP service
User consent obtained (if required)
Security incident response plan includes MCP

Security Incident Response

Suspected Token Compromise

Immediate Actions:

Revoke compromised token at auth provider
Review audit logs for unauthorized access
Identify affected resources
Notify affected users/stakeholders
Force token refresh for all users (if provider supports)

Investigation:

Check MCP service logs for unusual activity
Correlate access patterns with compromised token
Determine scope of data accessed
Document timeline of events

Unauthorized Access Detected

Response Procedure:

Block user/IP immediately (firewall/load balancer)
Disable user account in Superset
Review all actions by user in audit logs
Assess data exposure
Notify security team and management
Preserve logs for forensic analysis

Data Breach

MCP-Specific Considerations:

Identify which datasets were accessed via MCP
Determine if RLS was bypassed (should not be possible)
Check for SQL injection attempts (should be prevented by Superset)
Review all tool executions in timeframe
Export detailed audit logs for incident report

References

JWT Best Practices: https://tools.ietf.org/html/rfc8725
OWASP API Security: https://owasp.org/www-project-api-security/
Superset Security Documentation: https://superset.apache.org/docs/security
Flask-AppBuilder Security: https://flask-appbuilder.readthedocs.io/en/latest/security.html
GDPR Compliance Guide: https://gdpr.eu/
SOC 2 Framework: https://www.aicpa.org/soc2
HIPAA Security Rule: https://www.hhs.gov/hipaa/for-professionals/security/

24 KiB Raw Blame History

MCP Service Security

Overview

Authentication

Current Implementation (Development)

Production Implementation (JWT)

Token Renewal and Refresh

Service Account Patterns

Authorization

RBAC Integration

Row-Level Security (RLS)

Dataset Access Control

Tool-Level Permissions

JWT Scope Validation

Session and CSRF Handling

Session Configuration

CSRF Protection

Production Security Recommendations

Audit Logging

Current Logging

Enhanced Audit Logging (Recommended)

Audit Events

Integration with SIEM Systems

Log Retention

Compliance Considerations

GDPR (General Data Protection Regulation)

SOC 2 (Service Organization Control 2)

HIPAA (Health Insurance Portability and Accountability Act)

Access Control Matrix

Security Checklist for Production

Security Incident Response

Suspected Token Compromise

Unauthorized Access Detected

Data Breach

References

24 KiB

Raw Blame History