mirror of https://github.com/apache/superset.git synced 2026-04-19 08:04:53 +00:00

Amin Ghadersohi 84279acd2f feat(mcp): add unified get_schema tool for schema discovery (#36458 )

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>

2026-01-07 15:26:17 -08:00

MCP Service - LLM Agent Guide

This guide helps LLM agents understand the Superset MCP (Model Context Protocol) service architecture and development conventions.

⚠️ CRITICAL: Apache License Headers

EVERY Python file in the MCP service MUST have the Apache Software Foundation license header.

This includes:

All .py files (tool files, schemas, __init__.py files, etc.)
NEVER remove existing license headers during refactoring or edits
ALWAYS add license headers when creating new files
ALWAYS verify license headers are present after editing files

If you see a file without a license header, ADD IT IMMEDIATELY. If you accidentally remove one during editing, ADD IT BACK.

Use this exact template at the top of EVERY Python file:

# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#   http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied.  See the License for the
# specific language governing permissions and limitations
# under the License.

Note: LLM instruction files such as CLAUDE.md and AGENTS.md are excluded from this requirement (they are listed in .rat-excludes) to avoid token overhead. ALL Python files still require the header.
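A missing header can be caught locally by scanning the top of a file for the first line of the template. The sketch below is a hypothetical helper: the function name and the 20-line window are assumptions for illustration and are not part of Superset's tooling.

```python
# Hypothetical helper, not part of Superset: checks that Python source text
# carries the ASF license header near the top of the file.
ASF_MARKER = "# Licensed to the Apache Software Foundation (ASF) under one"


def has_license_header(source: str, max_lines: int = 20) -> bool:
    """Return True if the ASF marker line appears within the first max_lines lines."""
    head = source.splitlines()[:max_lines]
    return any(line.strip() == ASF_MARKER for line in head)


with_header = ASF_MARKER + "\n# or more contributor license agreements.\n\nimport os\n"
without_header = '"""A module that lost its header."""\nimport os\n'

print(has_license_header(with_header))     # True
print(has_license_header(without_header))  # False
```

The check only looks for the first template line, so it will not notice a header that has been truncated further down.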

Architecture Overview

The MCP service provides programmatic access to Superset via the Model Context Protocol, allowing AI assistants to interact with dashboards, charts, datasets, SQL Lab, and instance metadata.

Key Components

superset/mcp_service/
├── app.py                      # FastMCP app factory and tool registration
├── auth.py                     # Authentication and authorization
├── mcp_config.py              # Default configuration
├── mcp_core.py                # Reusable core classes for tools
├── flask_singleton.py         # Flask app singleton for MCP context
├── chart/                     # Chart-related tools
│   ├── schemas.py            # Pydantic schemas for chart responses
│   └── tool/                 # Chart tool implementations
│       ├── __init__.py       # Tool exports
│       ├── list_charts.py
│       └── get_chart_info.py
├── dashboard/                 # Dashboard-related tools
│   ├── schemas.py
│   └── tool/
├── dataset/                   # Dataset-related tools
│   ├── schemas.py
│   └── tool/
└── system/                    # System/instance tools
    ├── schemas.py
    └── tool/

Critical Convention: Tool, Prompt, and Resource Registration

IMPORTANT: When creating new MCP tools, prompts, or resources, you MUST add their imports to app.py for auto-registration. Do NOT add them to server.py - that approach doesn't work properly.

How to Add a New Tool

Create the tool file in the appropriate directory (e.g., chart/tool/my_new_tool.py)
Decorate the function with @tool so that it registers when the module is imported
Add import to app.py at the bottom of the file where other tools are imported (around line 210-242)

Example:

# superset/mcp_service/chart/tool/my_new_tool.py
from superset_core.mcp import tool

@tool
def my_new_tool(param: str) -> dict:
    """Tool description for LLMs."""
    return {"result": "success"}

Then add to app.py:

# superset/mcp_service/app.py (at the bottom, around line 207-224)
from superset.mcp_service.chart.tool import (  # noqa: F401, E402
    get_chart_info,
    list_charts,
    my_new_tool,  # ADD YOUR TOOL HERE
)

Why this matters: Tools use @tool decorators and register automatically on import. The import MUST be in app.py at the bottom of the file (after dependency injection is initialized). If you don't import the tool in app.py, it won't be available to MCP clients. DO NOT add imports to server.py - that file is for running the server only.
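To see why the import is what makes a tool visible, consider a simplified stand-in for a decorator-based registry. This is not the superset_core.mcp implementation; TOOL_REGISTRY and this local tool function are assumptions made up for the illustration.

```python
# Simplified stand-in for a decorator-based registry. This is NOT the
# superset_core.mcp implementation; it only illustrates register-on-import.
from typing import Any, Callable

TOOL_REGISTRY: dict[str, Callable[..., Any]] = {}


def tool(func: Callable[..., Any]) -> Callable[..., Any]:
    """Record the function under its name as soon as the decorator runs."""
    TOOL_REGISTRY[func.__name__] = func
    return func


# Nothing is registered until code containing @tool functions is executed,
# which is what importing the tool module in app.py achieves.
assert "my_new_tool" not in TOOL_REGISTRY


@tool
def my_new_tool(param: str) -> dict[str, str]:
    """Tool description for LLMs."""
    return {"result": "success"}


print(sorted(TOOL_REGISTRY))  # ['my_new_tool']
```

A module that is never imported never runs its decorators, so its tools never reach the registry.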

How to Add a New Prompt

Create the prompt file in the appropriate directory (e.g., chart/prompts/my_new_prompt.py)
Decorate the function with the unified @prompt decorator so that it registers on import
Add import to module's __init__.py (e.g., chart/prompts/__init__.py)
Ensure module is imported in app.py (around line 244-253)

Example:

# superset/mcp_service/chart/prompts/my_new_prompt.py
from fastmcp import Context
from superset_core.mcp import prompt

@prompt("my_new_prompt")
async def my_new_prompt_handler(ctx: Context) -> str:
    """Interactive prompt for doing something."""
    return "Prompt instructions here..."

Then add to chart/prompts/__init__.py:

# superset/mcp_service/chart/prompts/__init__.py
from . import create_chart_guided  # existing
from . import my_new_prompt  # ADD YOUR PROMPT HERE

Verify module import exists in app.py (around line 248):

# superset/mcp_service/app.py
from superset.mcp_service.chart import prompts as chart_prompts  # This imports all prompts

How to Add a New Resource

Create the resource file in the appropriate directory (e.g., chart/resources/my_new_resource.py)
Decorate with @mcp.resource to register it with FastMCP (resources still use direct FastMCP)
Add import to module's __init__.py (e.g., chart/resources/__init__.py)
Ensure module is imported in app.py (around line 244-253)

Example:

# superset/mcp_service/chart/resources/my_new_resource.py
from superset.mcp_service.app import mcp
from superset.mcp_service.auth import mcp_auth_hook

@mcp.resource("superset://chart/my_resource")
@mcp_auth_hook
def get_my_resource() -> str:
    """Resource description for LLMs."""
    return "Resource data here..."

Note: Resources continue to use the direct FastMCP decorators (@mcp.resource) rather than the unified @tool() decorator.

Then add to chart/resources/__init__.py:

# superset/mcp_service/chart/resources/__init__.py
from . import chart_configs  # existing
from . import my_new_resource  # ADD YOUR RESOURCE HERE

Verify module import exists in app.py (around line 249):

# superset/mcp_service/app.py
from superset.mcp_service.chart import resources as chart_resources  # This imports all resources

Why this matters: Prompts and resources work similarly to tools - they use decorators and register on import. The module-level imports (chart/prompts/__init__.py, chart/resources/__init__.py) ensure individual files are imported when the module is imported. The app.py imports ensure the modules are loaded when the MCP service starts.
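A forgotten line in a package __init__.py is easy to miss, so it can help to list what that file actually imports. The helper below is a hypothetical sketch (relative_imports is not a Superset function) that uses the standard ast module to collect names pulled in with "from . import name".

```python
# Hypothetical helper: list the sibling modules that a package __init__.py
# pulls in via "from . import name".
import ast


def relative_imports(init_source: str) -> set[str]:
    """Return names imported with 'from . import ...' in the given source."""
    names: set[str] = set()
    for node in ast.walk(ast.parse(init_source)):
        # level == 1 with no module name means "from . import x"
        if isinstance(node, ast.ImportFrom) and node.level == 1 and node.module is None:
            names.update(alias.name for alias in node.names)
    return names


init_source = "from . import create_chart_guided\nfrom . import my_new_prompt\n"
found = relative_imports(init_source)
print("my_new_prompt" in found)  # True
```

Comparing this set against the .py files in the directory would show which prompt or resource modules are never imported.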

Tool Development Patterns

1. Use Core Classes for Reusability

The mcp_core.py module provides reusable patterns:

ModelListCore: For listing resources (dashboards, charts, datasets)
ModelGetInfoCore: For getting resource details by ID/UUID
ModelGetSchemaCore: For retrieving comprehensive schema metadata (columns, filters, sortable columns)

Example:

import logging

from superset_core.mcp import tool

from superset.daos.dashboard import DashboardDAO
# DashboardFilter is assumed to live next to DashboardList
from superset.mcp_service.dashboard.schemas import DashboardFilter, DashboardList
from superset.mcp_service.mcp_core import ModelListCore

logger = logging.getLogger(__name__)

list_core = ModelListCore(
    dao_class=DashboardDAO,
    output_schema=DashboardList,
    logger=logger,
)

@tool
def list_dashboards(filters: list[DashboardFilter], page: int = 1) -> DashboardList:
    """List dashboards, ten per page."""
    return list_core.run_tool(filters=filters, page=page, page_size=10)

2. Always Use Authentication

Every tool must use @tool with authentication enabled (default) to ensure:

User authentication from JWT or configured admin user
Permission checking via JWT scopes
Audit logging of tool access

from flask import g
from superset_core.mcp import tool

@tool  # REQUIRED - protect=True by default
def my_tool() -> dict:
    # g.user is set by the tool decorator
    return {"user": g.user.username}

@tool(protect=False)  # Only for truly public tools
def public_tool() -> dict:
    # No authentication required
    return {"status": "public"}

3. Use Pydantic Schemas

All tool inputs and outputs must be Pydantic models for:

Automatic validation
LLM-friendly schema generation
Type safety

Convention: Place schemas in {module}/schemas.py

from datetime import datetime, timezone

from pydantic import BaseModel, Field

class MyToolRequest(BaseModel):
    param: str = Field(..., description="Parameter description for LLMs")

class MyToolResponse(BaseModel):
    result: str = Field(..., description="Result description")
    timestamp: datetime = Field(
        default_factory=lambda: datetime.now(timezone.utc),
        description="Response timestamp"
    )

4. Follow the DAO Pattern

Use Superset's DAO (Data Access Object) layer instead of direct database queries:

from superset.daos.dashboard import DashboardDAO

# GOOD: Use DAO
dashboard = DashboardDAO.find_by_id(dashboard_id)

# BAD: Don't query directly
dashboard = db.session.query(Dashboard).filter_by(id=dashboard_id).first()

5. Python Type Hints (Python 3.10+ Style)

CRITICAL: Always use modern Python 3.10+ union syntax for type hints.

# GOOD - Modern Python 3.10+ syntax
from typing import Any
from pydantic import BaseModel, Field

class MySchema(BaseModel):
    name: str | None = Field(None, description="Optional name")
    tags: list[str] = Field(default_factory=list)
    metadata: dict[str, Any] = Field(default_factory=dict)

def my_function(
    id: int,
    filters: list[str] | None = None,
    options: dict[str, Any] | None = None
) -> MySchema | None:
    pass

# BAD - Old-style Optional (DO NOT USE)
from typing import Optional, List, Dict, Any

class MySchema(BaseModel):
    name: Optional[str] = Field(None, description="Optional name")  # Wrong!

def my_function(
    id: int,
    filters: Optional[List[str]] = None,  # Wrong!
    options: Optional[Dict[str, Any]] = None  # Wrong!
) -> Optional[MySchema]:  # Wrong!
    pass

Key rules:

Use T | None instead of Optional[T]
Do NOT import Optional, List, or Dict from typing; use | None and the built-in generics list[...] and dict[...] instead
All new code must follow this pattern

6. Flexible Input Parsing (JSON String or Object)

MCP tools accept both JSON string and native object formats for parameters using utilities from superset.mcp_service.utils.schema_utils. This makes tools flexible for different client types (LLM clients send objects, CLI tools send JSON strings).

PREFERRED: Use the @parse_request decorator for tool functions to automatically handle request parsing:

from superset.mcp_service.utils.schema_utils import parse_request

@mcp.tool
@mcp_auth_hook
@parse_request(ListChartsRequest)  # Automatically parses string requests!
async def list_charts(request: ListChartsRequest | str, ctx: Context) -> ChartList:
    """List charts with filtering and search."""
    # request is guaranteed to be ListChartsRequest here - no manual parsing needed!
    await ctx.info(f"Listing charts: page={request.page}")
    ...

Benefits:

Eliminates 5 lines of boilerplate code per tool
Handles both async and sync functions automatically
Works around a Claude Code bug (GitHub issue #5504)
Cleaner, more maintainable code
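The behavior described above can be pictured with a small stand-in decorator. This sketch is not the real schema_utils.parse_request: it uses a dataclass in place of a Pydantic model, and the simplified ListChartsRequest and the list_charts_sync / list_charts_async functions are hypothetical.

```python
# Sketch only: a stand-in for the behavior described above, using a dataclass
# instead of a Pydantic model. Not the real schema_utils.parse_request.
import asyncio
import functools
import inspect
import json
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class ListChartsRequest:  # hypothetical, simplified request type
    page: int = 1
    page_size: int = 10


def parse_request(request_cls: type) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    def coerce(request: Any) -> Any:
        if isinstance(request, str):  # JSON string from a CLI-style client
            request = json.loads(request)
        if isinstance(request, dict):  # native object from an LLM client
            request = request_cls(**request)
        return request

    def decorate(func: Callable[..., Any]) -> Callable[..., Any]:
        if inspect.iscoroutinefunction(func):
            @functools.wraps(func)
            async def async_wrapper(request: Any, *args: Any, **kwargs: Any) -> Any:
                return await func(coerce(request), *args, **kwargs)

            return async_wrapper

        @functools.wraps(func)
        def sync_wrapper(request: Any, *args: Any, **kwargs: Any) -> Any:
            return func(coerce(request), *args, **kwargs)

        return sync_wrapper

    return decorate


@parse_request(ListChartsRequest)
def list_charts_sync(request: ListChartsRequest) -> int:
    return request.page


@parse_request(ListChartsRequest)
async def list_charts_async(request: ListChartsRequest) -> int:
    return request.page_size


print(list_charts_sync('{"page": 3}'))                     # 3
print(asyncio.run(list_charts_async({"page_size": 25})))  # 25
```

Doing the coercion in one decorator keeps each tool body free of string-versus-object checks.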

Available utilities for other use cases:

parse_json_or_passthrough

Parse JSON string or return object as-is:

from superset.mcp_service.utils.schema_utils import parse_json_or_passthrough

# Accepts both formats
config = parse_json_or_passthrough(value, param_name="config")
# value can be: '{"key": "value"}' (JSON string) OR {"key": "value"} (dict)

parse_json_or_list

Parse to list from JSON, list, or comma-separated string:

from superset.mcp_service.utils.schema_utils import parse_json_or_list

# Accepts multiple formats
items = parse_json_or_list(value, param_name="items")
# value can be:
#   '["a", "b"]' (JSON array)
#   ["a", "b"] (Python list)
#   "a, b, c" (comma-separated string)
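The three accepted formats can be handled roughly as follows. This is a minimal sketch of the described behavior, not the actual implementation in schema_utils; the function name, the handling of None, and the wrapping of other values in a list are assumptions.

```python
# Minimal sketch of the behavior described above; the real parse_json_or_list
# in schema_utils may differ in details and error handling.
import json
from typing import Any


def parse_json_or_list_sketch(value: Any) -> list[Any]:
    if value is None:  # assumption: treat a missing value as an empty list
        return []
    if isinstance(value, list):
        return value
    if isinstance(value, str):
        try:
            parsed = json.loads(value)
        except json.JSONDecodeError:
            parsed = None
        if isinstance(parsed, list):
            return parsed
        # Fall back to comma-separated parsing, dropping empty items
        return [item.strip() for item in value.split(",") if item.strip()]
    return [value]  # assumption: wrap any other single value


print(parse_json_or_list_sketch('["a", "b"]'))  # ['a', 'b']
print(parse_json_or_list_sketch(["a", "b"]))    # ['a', 'b']
print(parse_json_or_list_sketch("a, b, c"))     # ['a', 'b', 'c']
```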

parse_json_or_model

Parse to Pydantic model from JSON or dict:

from superset.mcp_service.utils.schema_utils import parse_json_or_model

# Accepts JSON string or dict
config = parse_json_or_model(value, ConfigModel, param_name="config")
# value can be: '{"name": "test"}' OR {"name": "test"}

parse_json_or_model_list

Parse to list of Pydantic models:

from superset.mcp_service.utils.schema_utils import parse_json_or_model_list

# Accepts JSON array or list of dicts
filters = parse_json_or_model_list(value, FilterModel, param_name="filters")
# value can be: '[{"col": "name"}]' OR [{"col": "name"}]

Using with Pydantic validators:

from pydantic import BaseModel, Field, field_validator
from superset.mcp_service.utils.schema_utils import (
    parse_json_or_list,
    parse_json_or_model_list,
)

# FilterModel is a placeholder for the tool's own filter schema
class MyToolRequest(BaseModel):
    filters: list[FilterModel] = Field(default_factory=list)
    select_columns: list[str] = Field(default_factory=list)

    @field_validator("filters", mode="before")
    @classmethod
    def parse_filters(cls, v):
        """Accept both JSON string and list of objects."""
        return parse_json_or_model_list(v, FilterModel, "filters")

    @field_validator("select_columns", mode="before")
    @classmethod
    def parse_columns(cls, v):
        """Accept JSON array, list, or comma-separated string."""
        return parse_json_or_list(v, "select_columns")

Core classes already use these utilities:

ModelListCore uses them for filters and select_columns
No need to add parsing logic in individual tools that use core classes

When to use:

Tool parameters that accept complex objects (dicts, lists)
Parameters that may come from CLI tools (JSON strings) or LLM clients (objects)
Any field where you want maximum flexibility

7. Error Handling

Use consistent error schemas:

from datetime import datetime, timezone

from pydantic import BaseModel, Field
from superset_core.mcp import tool

class MyError(BaseModel):
    error: str = Field(..., description="Error message")
    error_type: str = Field(..., description="Type of error")
    timestamp: datetime = Field(
        default_factory=lambda: datetime.now(timezone.utc),
        description="Error timestamp"
    )

@tool
def my_tool(id: int) -> MyResponse:
    try:
        result = process_data(id)
        return MyResponse(data=result)
    except NotFound as exc:
        # Chain the original exception so the traceback is preserved
        raise ValueError(f"Resource {id} not found") from exc

Testing Conventions

Unit Tests

Place unit tests in tests/unit_tests/mcp_service/{module}/tool/test_{tool_name}.py

Test structure:

from unittest.mock import MagicMock, patch
import pytest

class TestMyTool:
    @pytest.fixture
    def mock_dao(self):
        """Create mock DAO for testing."""
        dao = MagicMock()
        dao.find_by_id.return_value = create_mock_object()
        return dao

    @patch("superset.mcp_service.chart.tool.my_tool.ChartDAO")
    def test_my_tool_success(self, mock_dao_class, mock_dao):
        """Test successful tool execution."""
        mock_dao_class.return_value = mock_dao

        result = my_tool(id=1)

        assert result.data is not None
        mock_dao.find_by_id.assert_called_once_with(1)

Integration Tests

Run the tool inside a Flask application context for integration tests:

def test_tool_with_flask_context(app):
    """Test tool with full Flask app context."""
    with app.app_context():
        result = my_tool(id=1)
        assert result is not None

Common Pitfalls to Avoid

1. ❌ Forgetting Tool Import in app.py

Problem: Tool exists but isn't available to MCP clients. Solution: Always add tool import to app.py (at the bottom) after creating it. Never add to server.py.

2. ❌ Adding Tool Imports to server.py

Problem: Tools won't register properly, causing runtime errors. Solution: Tool imports must be in app.py at the bottom of the file, not in server.py. The server.py file is only for running the server.

3. ❌ Missing Authentication

Problem: Tool bypasses authentication and authorization. Solution: Always use @tool with default protect=True, or explicitly set protect=False only for public tools.

4. ❌ Using `Optional` Instead of Union Syntax

Problem: Old-style Optional[T] is not Python 3.10+ style. Solution: Use T | None instead of Optional[T] for all type hints.

# GOOD - Modern Python 3.10+ syntax
def my_function(param: str | None = None) -> int | None:
    pass

# BAD - Old-style Optional
from typing import Optional
def my_function(param: Optional[str] = None) -> Optional[int]:
    pass

5. ❌ Using `any` Types in Schemas

Problem: Untyped fields skip validation and give LLMs no usable schema, which is the same concern behind the TypeScript modernization goal of avoiding any. Solution: Use proper Pydantic types with Field descriptions.

6. ❌ Direct Database Queries

Problem: Bypasses Superset's security and caching layers. Solution: Use DAO classes (ChartDAO, DashboardDAO, etc.).

7. ❌ Not Using Core Classes

Problem: Duplicating list/get_info/schema logic across tools. Solution: Use ModelListCore, ModelGetInfoCore, ModelGetSchemaCore.

8. ❌ Missing Apache License Headers

Problem: CI fails on license check. Solution: Add Apache license header to all new .py files. Use this exact template at the top of every new Python file:

# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#   http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied.  See the License for the
# specific language governing permissions and limitations
# under the License.

Note: LLM instruction files like CLAUDE.md, AGENTS.md, etc. are excluded from this requirement (listed in .rat-excludes) to avoid token overhead.

9. ❌ Using `@tool()` with Empty Parentheses

Problem: Inconsistent decorator style. Solution: Use @tool without parentheses unless passing arguments.

# GOOD
from superset_core.mcp import tool

@tool
def my_tool():
    pass

# BAD
from superset_core.mcp import tool

@tool()
def my_tool():
    pass

10. ❌ Circular Imports

Problem: Importing too many things from app.py can create circular dependencies. Solution: Use the unified @tool decorator from superset_core.mcp:

# GOOD - New pattern
from superset_core.mcp import tool

@tool
def my_tool():
    pass

# ACCEPTABLE - For prompts/resources (when needed)
from superset.mcp_service.app import mcp

@mcp.prompt
def my_prompt():
    pass

# BAD - causes circular import
from superset.mcp_service.app import mcp, some_other_function

Configuration

Default configuration is in mcp_config.py. Users can override in superset_config.py:

# superset_config.py
MCP_ADMIN_USERNAME = "your_admin"
MCP_AUTH_ENABLED = True
MCP_JWT_PUBLIC_KEY = "your_public_key"

Tool Discovery

MCP clients discover tools via:

Tool listing: All tools with @tool are automatically listed
Schema introspection: Pydantic schemas generate JSON Schema for LLMs
Instructions: DEFAULT_INSTRUCTIONS in app.py documents available tools

Resources for Learning

MCP Specification: https://modelcontextprotocol.io/
FastMCP Documentation: https://github.com/jlowin/fastmcp
Superset DAO Patterns: See superset/daos/ for examples
Pydantic Documentation: https://docs.pydantic.dev/

Quick Checklist for New Tools

Created tool file in {module}/tool/{tool_name}.py
Added @tool decorator
Created Pydantic request/response schemas in {module}/schemas.py
Used DAO classes instead of direct queries when querying Superset entities
Added tool import to app.py (around line 210-242)
Added Apache license header to new files
Created unit tests in tests/unit_tests/mcp_service/{module}/tool/test_{tool_name}.py
Updated DEFAULT_INSTRUCTIONS in app.py if adding new capability
Tested locally with MCP client (e.g., Claude Desktop)

Quick Checklist for New Prompts

Created prompt file in {module}/prompts/{prompt_name}.py
Added @prompt("prompt_name") decorator
Made function async: async def prompt_handler(ctx: Context) -> str
Added import to {module}/prompts/__init__.py
Verified module import exists in app.py (around line 244-253)
Added Apache license header to new file
Updated DEFAULT_INSTRUCTIONS in app.py to list the new prompt
Tested locally with MCP client (e.g., Claude Desktop)

Quick Checklist for New Resources

Created resource file in {module}/resources/{resource_name}.py
Added @mcp.resource("superset://{path}") decorator with unique URI
Added @mcp_auth_hook decorator
Implemented resource data retrieval logic
Added import to {module}/resources/__init__.py
Verified module import exists in app.py (around line 244-253)
Added Apache license header to new file
Updated DEFAULT_INSTRUCTIONS in app.py to list the new resource
Tested locally with MCP client (e.g., Claude Desktop)

Getting Help

Check existing tool implementations for patterns (chart/tool/, dashboard/tool/)
Review core classes in mcp_core.py for reusable functionality
See CLAUDE.md in project root for general Superset development guidelines
Consult Superset documentation: https://superset.apache.org/docs/
