# MCP Service - LLM Agent Guide
This guide helps LLM agents understand the Superset MCP (Model Context Protocol) service architecture and development conventions.
## ⚠️ CRITICAL: Apache License Headers
**EVERY Python file in the MCP service MUST have the Apache Software Foundation license header.**
This includes:
- All `.py` files (tool files, schemas, `__init__.py` files, etc.)
- **NEVER remove existing license headers during refactoring or edits**
- **ALWAYS add license headers when creating new files**
- **ALWAYS verify license headers are present after editing files**
If you see a file without a license header, ADD IT IMMEDIATELY. If you accidentally remove one during editing, ADD IT BACK.
Use this exact template at the top of EVERY Python file:
```python
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
```
**Note**: LLM instruction files like `CLAUDE.md`, `AGENTS.md`, etc. are excluded from this requirement (listed in `.rat-excludes`) to avoid token overhead, but ALL other Python files require it.
## Architecture Overview
The MCP service provides programmatic access to Superset via the Model Context Protocol, allowing AI assistants to interact with dashboards, charts, datasets, SQL Lab, and instance metadata.
### Key Components
```
superset/mcp_service/
├── app.py                 # FastMCP app factory and tool registration
├── auth.py                # Authentication and authorization
├── mcp_config.py          # Default configuration
├── mcp_core.py            # Reusable core classes for tools
├── flask_singleton.py     # Flask app singleton for MCP context
├── chart/                 # Chart-related tools
│   ├── schemas.py         # Pydantic schemas for chart responses
│   └── tool/              # Chart tool implementations
│       ├── __init__.py    # Tool exports
│       ├── list_charts.py
│       └── get_chart_info.py
├── dashboard/             # Dashboard-related tools
│   ├── schemas.py
│   └── tool/
├── dataset/               # Dataset-related tools
│   ├── schemas.py
│   └── tool/
└── system/                # System/instance tools
    ├── schemas.py
    └── tool/
```
## Critical Convention: Tool, Prompt, and Resource Registration
**IMPORTANT**: When creating new MCP tools, prompts, or resources, you MUST add their imports to `app.py` for auto-registration. Do NOT add them to `server.py` - that approach doesn't work properly.
### How to Add a New Tool
1. **Create the tool file** in the appropriate directory (e.g., `chart/tool/my_new_tool.py`)
2. **Decorate the function with `@tool`** so it registers when its module is imported
3. **Add an import to `app.py`** at the bottom of the file, alongside the other tool imports (around lines 210-242)
**Example**:
```python
# superset/mcp_service/chart/tool/my_new_tool.py
from superset_core.api.mcp import tool


@tool
def my_new_tool(param: str) -> dict:
    """Tool description for LLMs."""
    return {"result": "success"}
```
**Then add to app.py**:
```python
# superset/mcp_service/app.py (at the bottom, with the other tool imports)
from superset.mcp_service.chart.tool import (  # noqa: F401, E402
    get_chart_info,
    list_charts,
    my_new_tool,  # ADD YOUR TOOL HERE
)
```
**Why this matters**: Tools use `@tool` decorators and register automatically on import. The import MUST be in `app.py` at the bottom of the file (after dependency injection is initialized). If you don't import the tool in `app.py`, it won't be available to MCP clients. DO NOT add imports to `server.py` - that file is for running the server only.
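To make import-time registration concrete, here is a toy decorator that records functions in a plain dict. This is a minimal sketch assuming a dict-based registry (`TOOL_REGISTRY` is hypothetical); it is not the actual `superset_core.api.mcp.tool` implementation, which registers with the FastMCP app and adds authentication. It only shows that registration is a side effect of the decorated `def` executing, which is why the module has to be imported from `app.py`.

```python
from typing import Any, Callable

# Hypothetical registry; in the real service the FastMCP app owns this state.
TOOL_REGISTRY: dict[str, Callable[..., Any]] = {}


def tool(func: Callable[..., Any]) -> Callable[..., Any]:
    """Register ``func`` as a side effect of its ``def`` statement running."""
    TOOL_REGISTRY[func.__name__] = func
    return func


# Before the decorated function's module code runs, nothing is registered.
print(sorted(TOOL_REGISTRY))  # → []


@tool
def my_new_tool(param: str) -> dict:
    """Tool description for LLMs."""
    return {"result": "success", "param": param}


# Executing the module (i.e. importing it) is what populated the registry.
print(sorted(TOOL_REGISTRY))  # → ['my_new_tool']
```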
### How to Add a New Prompt
1. **Create the prompt file** in the appropriate directory (e.g., `chart/prompts/my_new_prompt.py`)
2. **Decorate the handler with `@prompt`**, the unified decorator from `superset_core.api.mcp`, so it registers on import
3. **Add import to module's `__init__.py`** (e.g., `chart/prompts/__init__.py`)
4. **Ensure module is imported in `app.py`** (around line 244-253)
**Example**:
```python
# superset/mcp_service/chart/prompts/my_new_prompt.py
from fastmcp import Context
from superset_core.api.mcp import prompt


@prompt("my_new_prompt")
async def my_new_prompt_handler(ctx: Context) -> str:
    """Interactive prompt for doing something."""
    return "Prompt instructions here..."
```
**Then add to `chart/prompts/__init__.py`**:
```python
# superset/mcp_service/chart/prompts/__init__.py
from . import create_chart_guided # existing
from . import my_new_prompt # ADD YOUR PROMPT HERE
```
**Verify module import exists in `app.py`** (around line 248):
```python
# superset/mcp_service/app.py
from superset.mcp_service.chart import prompts as chart_prompts # This imports all prompts
```
### How to Add a New Resource
1. **Create the resource file** in the appropriate directory (e.g., `chart/resources/my_new_resource.py`)
2. **Decorate with `@mcp.resource`** to register it with FastMCP (resources still use direct FastMCP)
3. **Add import to module's `__init__.py`** (e.g., `chart/resources/__init__.py`)
4. **Ensure module is imported in `app.py`** (around line 244-253)
**Example**:
```python
# superset/mcp_service/chart/resources/my_new_resource.py
from superset.mcp_service.app import mcp
from superset.mcp_service.auth import mcp_auth_hook


@mcp.resource("superset://chart/my_resource")
@mcp_auth_hook
def get_my_resource() -> str:
    """Resource description for LLMs."""
    return "Resource data here..."
```
**Note**: Resources continue to use the direct FastMCP decorator (`@mcp.resource`) rather than the unified `@tool` and `@prompt` decorators from `superset_core.api.mcp`.
**Then add to `chart/resources/__init__.py`**:
```python
# superset/mcp_service/chart/resources/__init__.py
from . import chart_configs # existing
from . import my_new_resource # ADD YOUR RESOURCE HERE
```
**Verify module import exists in `app.py`** (around line 249):
```python
# superset/mcp_service/app.py
from superset.mcp_service.chart import resources as chart_resources # This imports all resources
```
**Why this matters**: Prompts and resources work similarly to tools - they use decorators and register on import. The module-level imports (`chart/prompts/__init__.py`, `chart/resources/__init__.py`) ensure individual files are imported when the module is imported. The `app.py` imports ensure the modules are loaded when the MCP service starts.
## Tool Development Patterns
### 1. Use Core Classes for Reusability
The `mcp_core.py` module provides reusable patterns:
- **`ModelListCore`**: For listing resources (dashboards, charts, datasets)
- **`ModelGetInfoCore`**: For getting resource details by ID/UUID
- **`ModelGetSchemaCore`**: For retrieving comprehensive schema metadata (columns, filters, sortable columns)
**Example**:
```python
import logging

from superset.daos.dashboard import DashboardDAO

# DashboardFilter is assumed to live alongside DashboardList in the schemas module
from superset.mcp_service.dashboard.schemas import DashboardFilter, DashboardList
from superset.mcp_service.mcp_core import ModelListCore
from superset_core.api.mcp import tool

logger = logging.getLogger(__name__)

list_core = ModelListCore(
    dao_class=DashboardDAO,
    output_schema=DashboardList,
    logger=logger,
)


@tool
def list_dashboards(filters: list[DashboardFilter], page: int = 1) -> DashboardList:
    return list_core.run_tool(filters=filters, page=page, page_size=10)
```
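The idea behind `ModelListCore` (filter, paginate, wrap the result in an output schema) can be sketched without Superset. Everything below is hypothetical: `ListCoreSketch`, `Page`, and the equality-only `{"col": ..., "value": ...}` filter format stand in for the DAO, the Pydantic output schema, and the real filter model. Pages are assumed to be 1-indexed, as the `page: int = 1` default above suggests.

```python
from dataclasses import dataclass
from typing import Any, Generic, TypeVar

T = TypeVar("T")


@dataclass
class Page(Generic[T]):
    """Hypothetical output schema: one page of results plus the total count."""

    items: list[T]
    page: int
    page_size: int
    total_count: int


@dataclass
class ListCoreSketch(Generic[T]):
    """Hypothetical simplification of a list core.

    An in-memory list stands in for the DAO; each filter is an equality
    check of the form {"col": <attribute name>, "value": <expected value>}.
    """

    rows: list[T]

    def run_tool(
        self, filters: list[dict[str, Any]], page: int = 1, page_size: int = 10
    ) -> Page[T]:
        matches = [
            row
            for row in self.rows
            if all(getattr(row, f["col"]) == f["value"] for f in filters)
        ]
        start = (page - 1) * page_size  # pages are 1-indexed
        return Page(
            items=matches[start : start + page_size],
            page=page,
            page_size=page_size,
            total_count=len(matches),
        )


@dataclass
class FakeDashboard:
    id: int
    published: bool


core = ListCoreSketch(rows=[FakeDashboard(i, i % 2 == 0) for i in range(1, 26)])
result = core.run_tool([{"col": "published", "value": True}], page=2, page_size=5)
print([d.id for d in result.items], result.total_count)  # → [12, 14, 16, 18, 20] 12
```

Centralizing this logic in one class is what lets each list tool stay a one-line call to `run_tool`.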
### 2. Always Use Authentication
**Every tool must use `@tool`** with authentication enabled (default) to ensure:
- User authentication from JWT or configured admin user
- Permission checking via JWT scopes
- Audit logging of tool access
```python
from flask import g
from superset_core.api.mcp import tool


@tool  # REQUIRED - protect=True by default
def my_tool() -> dict:
    # g.user is set by the @tool decorator
    return {"user": g.user.username}


@tool(protect=False)  # Only for truly public tools
def public_tool() -> dict:
    # No authentication required
    return {"status": "public"}
```
### 3. Use Pydantic Schemas
**All tool inputs and outputs must be Pydantic models** for:
- Automatic validation
- LLM-friendly schema generation
- Type safety
**Convention**: Place schemas in `{module}/schemas.py`
```python
from datetime import datetime, timezone

from pydantic import BaseModel, Field


class MyToolRequest(BaseModel):
    param: str = Field(..., description="Parameter description for LLMs")


class MyToolResponse(BaseModel):
    result: str = Field(..., description="Result description")
    timestamp: datetime = Field(
        default_factory=lambda: datetime.now(timezone.utc),
        description="Response timestamp",
    )
```
### 4. Follow the DAO Pattern
**Use Superset's DAO (Data Access Object) layer** instead of direct database queries:
```python
from superset.daos.dashboard import DashboardDAO
# GOOD: Use DAO
dashboard = DashboardDAO.find_by_id(dashboard_id)
# BAD: Don't query directly
dashboard = db.session.query(Dashboard).filter_by(id=dashboard_id).first()
```
### 5. Python Type Hints (Python 3.10+ Style)
**CRITICAL**: Always use modern Python 3.10+ union syntax for type hints.
```python
# GOOD - Modern Python 3.10+ syntax
from typing import Any

from pydantic import BaseModel, Field


class MySchema(BaseModel):
    name: str | None = Field(None, description="Optional name")
    tags: list[str] = Field(default_factory=list)
    metadata: dict[str, Any] = Field(default_factory=dict)


def my_function(
    id: int,
    filters: list[str] | None = None,
    options: dict[str, Any] | None = None,
) -> MySchema | None:
    pass


# BAD - Old-style Optional/List/Dict (DO NOT USE)
from typing import Any, Dict, List, Optional


class MySchema(BaseModel):
    name: Optional[str] = Field(None, description="Optional name")  # Wrong!


def my_function(
    id: int,
    filters: Optional[List[str]] = None,  # Wrong!
    options: Optional[Dict[str, Any]] = None,  # Wrong!
) -> Optional[MySchema]:  # Wrong!
    pass
```
**Key rules:**
- Use `T | None` instead of `Optional[T]`
- Do NOT import `Optional`, `List`, or `Dict` from `typing`; use `T | None`, `list[T]`, and `dict[K, V]` instead
- All new code must follow this pattern
### 6. Flexible Input Parsing (JSON String or Object)
**MCP tools accept both JSON string and native object formats for parameters** using utilities from `superset.mcp_service.utils.schema_utils`. This makes tools flexible for different client types (LLM clients send objects, CLI tools send JSON strings).
**PREFERRED: Use the `@parse_request` decorator** for tool functions to automatically handle request parsing:
```python
from superset.mcp_service.utils.schema_utils import parse_request


@mcp.tool
@mcp_auth_hook
@parse_request(ListChartsRequest)  # Automatically parses string requests!
async def list_charts(request: ListChartsRequest | str, ctx: Context) -> ChartList:
    """List charts with filtering and search."""
    # request is guaranteed to be ListChartsRequest here - no manual parsing needed!
    await ctx.info(f"Listing charts: page={request.page}")
    ...
```
**Benefits:**
- Eliminates 5 lines of boilerplate code per tool
- Handles both async and sync functions automatically
- Works around a Claude Code bug (GitHub issue #5504)
- Cleaner, more maintainable code
**Available utilities for other use cases:**
#### parse_json_or_passthrough
Parse JSON string or return object as-is:
```python
from superset.mcp_service.utils.schema_utils import parse_json_or_passthrough
# Accepts both formats
config = parse_json_or_passthrough(value, param_name="config")
# value can be: '{"key": "value"}' (JSON string) OR {"key": "value"} (dict)
```
#### parse_json_or_list
Parse to list from JSON, list, or comma-separated string:
```python
from superset.mcp_service.utils.schema_utils import parse_json_or_list
# Accepts multiple formats
items = parse_json_or_list(value, param_name="items")
# value can be:
# '["a", "b"]' (JSON array)
# ["a", "b"] (Python list)
# "a, b, c" (comma-separated string)
```
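The three input shapes accepted by `parse_json_or_list` can be illustrated with a small stand-alone function. `parse_json_or_list_sketch` is a hypothetical re-implementation of the documented behaviour, not the code in `schema_utils`; treating any string that starts with `[` as JSON and dropping empty comma-separated entries are assumptions.

```python
import json
from typing import Any


def parse_json_or_list_sketch(value: Any, param_name: str = "value") -> list[Any]:
    """Hypothetical re-implementation of the documented parse_json_or_list behaviour."""
    if isinstance(value, list):
        return value  # already a Python list
    if isinstance(value, str):
        text = value.strip()
        if text.startswith("["):
            try:
                return json.loads(text)  # JSON array
            except json.JSONDecodeError as exc:
                raise ValueError(f"{param_name}: invalid JSON array") from exc
        # Comma-separated string; surrounding whitespace and empty items are dropped.
        return [item.strip() for item in text.split(",") if item.strip()]
    raise ValueError(
        f"{param_name}: expected a list, JSON array, or comma-separated string"
    )


print(parse_json_or_list_sketch('["a", "b"]'))  # → ['a', 'b']
print(parse_json_or_list_sketch(["a", "b"]))  # → ['a', 'b']
print(parse_json_or_list_sketch("a, b, c"))  # → ['a', 'b', 'c']
```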
#### parse_json_or_model
Parse to Pydantic model from JSON or dict:
```python
from superset.mcp_service.utils.schema_utils import parse_json_or_model
# Accepts JSON string or dict
config = parse_json_or_model(value, ConfigModel, param_name="config")
# value can be: '{"name": "test"}' OR {"name": "test"}
```
#### parse_json_or_model_list
Parse to list of Pydantic models:
```python
from superset.mcp_service.utils.schema_utils import parse_json_or_model_list
# Accepts JSON array or list of dicts
filters = parse_json_or_model_list(value, FilterModel, param_name="filters")
# value can be: '[{"col": "name"}]' OR [{"col": "name"}]
```
**Using with Pydantic validators:**
```python
from pydantic import BaseModel, Field, field_validator

from superset.mcp_service.utils.schema_utils import (
    parse_json_or_list,
    parse_json_or_model_list,
)


class MyToolRequest(BaseModel):
    filters: list[FilterModel] = Field(default_factory=list)
    select_columns: list[str] = Field(default_factory=list)

    @field_validator("filters", mode="before")
    @classmethod
    def parse_filters(cls, v):
        """Accept both JSON string and list of objects."""
        return parse_json_or_model_list(v, FilterModel, "filters")

    @field_validator("select_columns", mode="before")
    @classmethod
    def parse_columns(cls, v):
        """Accept JSON array, list, or comma-separated string."""
        return parse_json_or_list(v, "select_columns")
```
**Core classes already use these utilities:**
- `ModelListCore` uses them for `filters` and `select_columns`
- No need to add parsing logic in individual tools that use core classes
**When to use:**
- Tool parameters that accept complex objects (dicts, lists)
- Parameters that may come from CLI tools (JSON strings) or LLM clients (objects)
- Any field where you want maximum flexibility
### 7. Error Handling
**Use consistent error schemas**:
```python
from datetime import datetime, timezone

from pydantic import BaseModel, Field
from superset_core.api.mcp import tool


class MyError(BaseModel):
    error: str = Field(..., description="Error message")
    error_type: str = Field(..., description="Type of error")
    timestamp: datetime = Field(
        default_factory=lambda: datetime.now(timezone.utc),
        description="Error timestamp",
    )


@tool
def my_tool(id: int) -> MyResponse:
    try:
        result = process_data(id)
        return MyResponse(data=result)
    except NotFound:
        raise ValueError(f"Resource {id} not found")
```
### 8. Dataset Validation for Chart Tools
**IMPORTANT**: All chart-related tools must validate that the chart's dataset is accessible before performing operations. Use the shared `validate_chart_dataset` utility from `chart_utils.py`.
**Why this matters**: Charts can reference datasets that have been deleted or become inaccessible. Without validation, users may see confusing errors when trying to preview or get data from charts with missing datasets.
**Usage pattern**:
```python
from superset.mcp_service.chart.chart_utils import validate_chart_dataset

# After retrieving a chart, validate its dataset
validation_result = validate_chart_dataset(chart, check_access=True)
if not validation_result.is_valid:
    await ctx.warning("Dataset not accessible: %s" % (validation_result.error,))
    return ChartError(
        error=validation_result.error or "Chart's dataset is not accessible",
        error_type="DatasetNotAccessible",
    )

# Log any warnings (e.g., virtual dataset warnings)
for warning in validation_result.warnings:
    await ctx.warning("Dataset warning: %s" % (warning,))
```
**Tools that use this pattern**:
- `get_chart_info` - Validates after retrieving chart metadata
- `get_chart_preview` - Validates before generating preview
- `get_chart_data` - Validates before querying data
- `generate_chart` - Validates after chart creation (post-creation validation)
**The `DatasetValidationResult` contains**:
- `is_valid`: Whether the dataset exists and is accessible
- `dataset_id`: The dataset ID being validated
- `dataset_name`: The dataset name (if found)
- `warnings`: List of warnings (e.g., "virtual dataset may be deleted")
- `error`: Error message if validation failed
## Testing Conventions
### Unit Tests
Place unit tests in `tests/unit_tests/mcp_service/{module}/tool/test_{tool_name}.py`
**Test structure**:
```python
from unittest.mock import patch

import pytest


class TestMyTool:
    @pytest.fixture
    def mock_chart(self):
        """Create a mock chart object for testing."""
        return create_mock_object()

    @patch("superset.mcp_service.chart.tool.my_tool.ChartDAO")
    def test_my_tool_success(self, mock_dao_class, mock_chart):
        """Test successful tool execution."""
        # DAO lookups such as find_by_id are classmethods, so configure the
        # patched class directly rather than its return_value (an instance).
        mock_dao_class.find_by_id.return_value = mock_chart
        result = my_tool(id=1)
        assert result.data is not None
        mock_dao_class.find_by_id.assert_called_once_with(1)
```
### Integration Tests
Use Flask test client for integration tests:
```python
def test_tool_with_flask_context(app):
    """Test tool with full Flask app context."""
    with app.app_context():
        result = my_tool(id=1)
        assert result is not None
```
## Common Pitfalls to Avoid
### 1. ❌ Forgetting Tool Import in app.py
**Problem**: Tool exists but isn't available to MCP clients.
**Solution**: Always add tool import to `app.py` (at the bottom) after creating it. Never add to `server.py`.
### 2. ❌ Adding Tool Imports to server.py
**Problem**: Tools won't register properly, causing runtime errors.
**Solution**: Tool imports must be in `app.py` at the bottom of the file, not in `server.py`. The `server.py` file is only for running the server.
### 3. ❌ Missing Authentication
**Problem**: Tool bypasses authentication and authorization.
**Solution**: Always use `@tool` with the default `protect=True`; set `protect=False` explicitly only for public tools.
### 4. ❌ Using `Optional` Instead of Union Syntax
**Problem**: Old-style Optional[T] is not Python 3.10+ style.
**Solution**: Use `T | None` instead of `Optional[T]` for all type hints.
```python
# GOOD - Modern Python 3.10+ syntax
def my_function(param: str | None = None) -> int | None:
    pass


# BAD - Old-style Optional
from typing import Optional


def my_function(param: Optional[str] = None) -> Optional[int]:
    pass
```
### 5. ❌ Using `Any` Types in Schemas
**Problem**: `Any` disables Pydantic validation and produces uninformative JSON Schema for LLM clients.
**Solution**: Use specific Pydantic types with `Field` descriptions.
### 6. ❌ Direct Database Queries
**Problem**: Bypasses Superset's security and caching layers.
**Solution**: Use DAO classes (ChartDAO, DashboardDAO, etc.).
### 7. ❌ Not Using Core Classes
**Problem**: Duplicating list/get_info/schema logic across tools.
**Solution**: Use ModelListCore, ModelGetInfoCore, ModelGetSchemaCore.
### 8. ❌ Missing Apache License Headers
**Problem**: CI fails on license check.
**Solution**: Add Apache license header to all new .py files. Use this exact template at the top of every new Python file:
```python
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
```
**Note**: LLM instruction files like `CLAUDE.md`, `AGENTS.md`, etc. are excluded from this requirement (listed in `.rat-excludes`) to avoid token overhead.
### 9. ❌ Using `@tool()` with Empty Parentheses
**Problem**: Inconsistent decorator style.
**Solution**: Use `@tool` without parentheses unless passing arguments.
```python
# GOOD
from superset_core.api.mcp import tool


@tool
def my_tool():
    pass


# BAD
from superset_core.api.mcp import tool


@tool()
def my_tool():
    pass
```
### 10. ❌ Circular Imports
**Problem**: Importing too many things from `app.py` can create circular dependencies.
**Solution**: Use the unified `@tool` decorator from `superset_core.api.mcp`:
```python
# GOOD - New pattern
from superset_core.api.mcp import tool


@tool
def my_tool():
    pass


# ACCEPTABLE - For resources, which still need the FastMCP instance
from superset.mcp_service.app import mcp


@mcp.resource("superset://chart/my_resource")
def my_resource() -> str:
    return "..."


# BAD - causes circular import
from superset.mcp_service.app import mcp, some_other_function
```
## Configuration
Default configuration is in `mcp_config.py`. Users can override in `superset_config.py`:
```python
# superset_config.py
MCP_ADMIN_USERNAME = "your_admin"
MCP_AUTH_ENABLED = True
MCP_JWT_PUBLIC_KEY = "your_public_key"
```
## Tool Discovery
MCP clients discover tools via:
1. **Tool listing**: All tools with `@tool` are automatically listed
2. **Schema introspection**: Pydantic schemas generate JSON Schema for LLMs
3. **Instructions**: `DEFAULT_INSTRUCTIONS` in `app.py` documents available tools
## Resources for Learning
- **MCP Specification**: https://modelcontextprotocol.io/
- **FastMCP Documentation**: https://github.com/jlowin/fastmcp
- **Superset DAO Patterns**: See `superset/daos/` for examples
- **Pydantic Documentation**: https://docs.pydantic.dev/
## Quick Checklist for New Tools
- [ ] Created tool file in `{module}/tool/{tool_name}.py`
- [ ] Added `@tool` decorator
- [ ] Created Pydantic request/response schemas in `{module}/schemas.py`
- [ ] Used DAO classes instead of direct queries when querying Superset entities
- [ ] Added tool import to `app.py` (around line 210-242)
- [ ] Added Apache license header to new files
- [ ] Created unit tests in `tests/unit_tests/mcp_service/{module}/tool/test_{tool_name}.py`
- [ ] Updated `DEFAULT_INSTRUCTIONS` in `app.py` if adding new capability
- [ ] Tested locally with MCP client (e.g., Claude Desktop)
## Quick Checklist for New Prompts
- [ ] Created prompt file in `{module}/prompts/{prompt_name}.py`
- [ ] Added `@prompt("prompt_name")` decorator
- [ ] Made function async: `async def prompt_handler(ctx: Context) -> str`
- [ ] Added import to `{module}/prompts/__init__.py`
- [ ] Verified module import exists in `app.py` (around line 244-253)
- [ ] Added Apache license header to new file
- [ ] Updated `DEFAULT_INSTRUCTIONS` in `app.py` to list the new prompt
- [ ] Tested locally with MCP client (e.g., Claude Desktop)
## Quick Checklist for New Resources
- [ ] Created resource file in `{module}/resources/{resource_name}.py`
- [ ] Added `@mcp.resource("superset://{path}")` decorator with unique URI
- [ ] Added `@mcp_auth_hook` decorator
- [ ] Implemented resource data retrieval logic
- [ ] Added import to `{module}/resources/__init__.py`
- [ ] Verified module import exists in `app.py` (around line 244-253)
- [ ] Added Apache license header to new file
- [ ] Updated `DEFAULT_INSTRUCTIONS` in `app.py` to list the new resource
- [ ] Tested locally with MCP client (e.g., Claude Desktop)
## Getting Help
- Check existing tool implementations for patterns (chart/tool/, dashboard/tool/)
- Review core classes in `mcp_core.py` for reusable functionality
- See `CLAUDE.md` in project root for general Superset development guidelines
- Consult Superset documentation: https://superset.apache.org/docs/