fix(mcp): restore missing tool registrations and fix create_dataset tests

- Restore create_virtual_dataset, query_dataset, get_database_info, list_databases, get_chart_sql, get_chart_type_schema, save_sql_query imports in app.py (accidentally dropped in the feat commit)
- Restore create_virtual_dataset and query_dataset exports in dataset tool __init__.py
- Make CreateDatasetRequest.schema optional (str | None, default None)
- Refactor create_dataset.py to use @tool decorator pattern
- Fix test_create_dataset.py: convert @patch class decorators to with patch() context managers (avoids pytest-asyncio arg injection issues), add get_superset_base_url mock for success paths, and set certified_by/certification_details=None on mock dataset
test(mcp): fix patch paths in test_create_dataset — CreateDatasetCommand is a lazy import
2026-06-10 18:19:28 +00:00 · 2026-05-28 04:32:13 +00:00 · 2026-05-28 02:03:34 +00:00 · 2026-05-28 02:03:34 +00:00 · 2026-05-28 02:03:34 +00:00 · 2026-05-28 02:03:34 +00:00
5 changed files with 533 additions and 101 deletions
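The commit message above mentions converting @patch class decorators to with patch() context managers in the async tests. A minimal sketch of the context-manager form, using only the standard library, might look like the following. The names describe_host and run_with_patches are hypothetical, and os.getcwd / os.cpu_count are patched purely as stand-ins for the helpers the real tests mock.

```python
import asyncio
import os
from unittest.mock import patch


async def describe_host() -> str:
    # Hypothetical stand-in for an async tool that calls two helpers.
    return f"{os.getcwd()} ({os.cpu_count()} cpus)"


async def run_with_patches() -> tuple:
    # The patches are active only inside the with block. Nothing is injected
    # into the function signature, so arguments supplied by a test framework
    # (for example fixtures) are left untouched.
    with patch("os.getcwd", return_value="/mocked") as mock_cwd, patch(
        "os.cpu_count", return_value=2
    ):
        text = await describe_host()
    # The mock handle stays usable after the block for call assertions.
    return text, mock_cwd.call_count


result = asyncio.run(run_with_patches())
```

With the decorator form, each patch adds a positional mock argument to the decorated test, which is presumably the "arg injection" the message refers to; the context-manager form avoids that at the cost of one extra indentation level.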
--- a/superset/mcp_service/app.py
+++ b/superset/mcp_service/app.py
@@ -130,20 +130,17 @@ Dashboard Management:
 - generate_dashboard: Create a dashboard from chart IDs (requires write access)
 - add_chart_to_existing_dashboard: Add a chart to an existing dashboard (requires write access)

-Database Connections:
-- list_databases: List database connections with advanced filters (1-based pagination)
-- get_database_info: Get detailed database connection info by ID (backend, capabilities)
-
 Dataset Management:
 - list_datasets: List datasets with advanced filters (1-based pagination)
 - get_dataset_info: Get detailed dataset information by ID (includes columns/metrics)
+- create_dataset: Register a physical table as a dataset against an existing DB connection (requires write access)
 - create_virtual_dataset: Save a SQL query as a virtual dataset for charting (requires write access)
 - query_dataset: Query a dataset using its semantic layer (saved metrics, dimensions, filters) without needing a saved chart

 Chart Management:
 - list_charts: List charts with advanced filters (1-based pagination)
 - get_chart_info: Get detailed chart information by ID
-- get_chart_preview: Get a visual preview of a chart as formatted content or URL
+- get_chart_preview: Get a visual preview of a chart with image URL
 - get_chart_data: Get underlying chart data in text-friendly format
 - get_chart_sql: Get the rendered SQL query for a chart (without executing it)
 - generate_chart: Create and save a new chart permanently (requires write access)
@@ -163,30 +160,25 @@ System Information:
 - get_instance_info: Get instance-wide statistics, metadata, and current user identity
 - find_users: Resolve a person's name to user IDs for use as a filter value
 - health_check: Simple health check tool (takes NO parameters, call without arguments)
-- generate_bug_report: Build a PII-sanitized bug report to send to Preset support
-  (use when the user says the MCP is broken or asks how to report an issue)

 Available Resources:
-- instance://metadata: Instance configuration, stats, and available dataset IDs
-- chart://configs: Valid chart configuration examples and best practices
+- instance/metadata: Access instance configuration and metadata
+- chart/templates: Access chart configuration templates

 Available Prompts:
 - quickstart: Interactive guide for getting started with the MCP service
 - create_chart_guided: Step-by-step chart creation wizard

-IMPORTANT - Using Saved Metrics vs Columns:
-When get_dataset_info returns a dataset, it includes both 'columns' and 'metrics'.
-- 'columns' are raw database columns (e.g., order_date, product_name, revenue)
-- 'metrics' are pre-defined saved metrics with SQL expressions
-  (e.g., count, total_revenue)
+Common Chart Types (viz_type) and Behaviors:

-When building chart configurations
-(generate_chart, generate_explore_link, update_chart):
-- For raw columns: use {{"name": "col_name", "aggregate": "SUM"}}
-- For saved metrics: use {{"name": "metric", "saved_metric": true}}
-  Do NOT add an aggregate when using saved_metric=true
-  (it's already defined in the metric).
-  Do NOT use a saved metric name as if it were a column — it will fail.
+Interactive Charts (support sorting, filtering, drill-down):
+- table: Standard table view with sorting and filtering
+- pivot_table_v2: Pivot table with grouping and aggregations
+- echarts_timeseries_line: Time series line chart
+- echarts_timeseries_bar: Time series bar chart
+- echarts_timeseries_area: Time series area chart
+- echarts_timeseries_scatter: Time series scatter plot
+- mixed_timeseries: Combined line/bar time series

 Example: If get_dataset_info returns metrics=[{{"metric_name": "count", ...}}], use:
  {{"name": "count", "saved_metric": true}}  ← CORRECT
@@ -315,52 +307,11 @@ Chart Types in Existing Charts (viewable via list_charts/get_chart_info):
 - word_cloud, world_map, box_plot, bubble, mixed_timeseries

 Query Examples:
-- List all tables:
-  list_charts(request={{"filters": [{{"col": "viz_type",
-    "opr": "in",
-    "value": ["table", "pivot_table_v2"]}}]}})
+- List all interactive tables:
+  filters=[{{"col": "viz_type", "opr": "in", "value": ["table", "pivot_table_v2"]}}]
 - List time series charts:
-  list_charts(request={{"filters": [{{"col": "viz_type",
-    "opr": "sw", "value": "echarts_timeseries"}}]}})
-- Search by name: list_charts(request={{"search": "sales"}})
-- My charts: list_charts(request={{"created_by_me": true}})
-- My dashboards: list_dashboards(request={{"created_by_me": true}})
-- My databases: list_databases(request={{"created_by_me": true}})
-To modify an existing chart (add filters, change metrics, etc.):
-1. get_chart_info(request={{"identifier": <chart_id>}})
-   -> examine current configuration
-2. update_chart(request={{
-     "identifier": <chart_id>, "config": {{...}}
-   }}) -> apply changes
-Do NOT use execute_sql for chart modifications.
-Use update_chart instead.
-
-CRITICAL RULES - NEVER VIOLATE:
-- NEVER fabricate or invent URLs. ALL URLs must come from tool call results.
-  If you need a link, call the appropriate tool (generate_explore_link, generate_chart,
-  open_sql_lab_with_context, etc.) and use the URL it returns.
-- NEVER call generate_dashboard when the user wants to add a chart to an EXISTING
-  dashboard. Always use add_chart_to_existing_dashboard. Only call generate_dashboard
-  to create a brand-new dashboard, or after the user explicitly confirms they want
-  a new one (e.g., after a permission_denied=True response from
-  add_chart_to_existing_dashboard).
-- To modify an existing chart's filters, metrics, or dimensions, use update_chart.
-  Do NOT use execute_sql for chart modifications.
-- Parameter name reminders: ALWAYS use the EXACT parameter names from the tool schema.
-  Do NOT use Superset's internal form_data names.
-
-IMPORTANT - Tool-Only Interaction:
-- Do NOT generate code artifacts, HTML pages, JavaScript snippets, or any code intended
-  for the user to run. All visualization, data retrieval, and authentication are handled
-  by the provided MCP tools.
-- Always call the appropriate tool directly instead of writing code. For example, use
-  generate_chart to create visualizations rather than generating plotting code.
-- When a tool returns a URL (chart URL, dashboard URL, explore link, SQL Lab link),
-  return that URL to the user. Do NOT attempt to recreate the visualization in code.
-- Do NOT generate HTML dashboards, embed scripts, or custom frontend code. Use
-  generate_dashboard and add_chart_to_existing_dashboard for dashboard operations.
-- If a user asks for something the tools cannot do, explain the limitation and suggest
-  the closest available tool rather than generating code as a workaround.
+  filters=[{{"col": "viz_type", "opr": "sw", "value": "echarts_timeseries"}}]
+- Search by name: search="sales"

 General usage tips:
 - All listing tools use 1-based pagination (first page is 1)
@@ -368,7 +319,7 @@ General usage tips:
 - Use 'filters' parameter for advanced queries with filter columns from get_schema
 - IDs can be integer or UUID format where supported
 - All tools return structured, Pydantic-typed responses
-- Chart previews can return ASCII text, Explore URLs, table data, or Vega-Lite specs
+- Chart previews are served as PNG images via custom screenshot endpoints

 Input format:
 - Tool request parameters accept structured objects (dicts/JSON)
@@ -377,10 +328,11 @@ Input format:
 {_feature_availability}Permission Awareness:
 {_instance_info_role_bullet}- ALWAYS check the user's roles BEFORE suggesting write operations (creating datasets,
  charts, or dashboards). SQL execution is a separate permission — see execute_sql below.
-- Write tools (generate_chart, generate_dashboard, update_chart, create_virtual_dataset,
-  save_sql_query, add_chart_to_existing_dashboard, update_chart_preview) require write
-  permissions. These tools are only listed for users who have the necessary access.
-  If a write tool does not appear in the tool list, the current user lacks write access.
+- Write tools (generate_chart, generate_dashboard, update_chart, create_dataset,
+  create_virtual_dataset, save_sql_query, add_chart_to_existing_dashboard,
+  update_chart_preview) require write permissions. These tools are only listed for
+  users who have the necessary access. If a write tool does not appear in the tool
+  list, the current user lacks write access.
 - execute_sql requires SQL Lab access (execute_sql_query permission), which is separate
  from write access. A user may have SQL Lab access without having write access to charts
  or dashboards, and vice versa.
@@ -584,39 +536,13 @@ def create_mcp_app(


 # Create default MCP instance for backward compatibility
+# Tool modules can import this and use @mcp.tool decorators
 mcp = create_mcp_app()

-# Initialize MCP dependency injection BEFORE importing tools/prompts
-# This replaces the abstract @tool and @prompt decorators in superset_core.api.mcp
-# with concrete implementations that can register with the mcp instance
-from superset.core.mcp.core_mcp_injection import (  # noqa: E402
-    initialize_core_mcp_dependencies,
-)
-
-initialize_core_mcp_dependencies()
-
-# Suppress known third-party deprecation warnings that leak to MCP clients.
-# The MCP SDK captures Python warnings and forwards them to clients via
-# server log entries, wasting LLM tokens and causing clients to act on
-# irrelevant internal warnings. These warnings come from transitive imports
-# triggered by tool/schema registration below.
-import warnings  # noqa: E402
-
-warnings.filterwarnings(
-    "ignore",
-    category=DeprecationWarning,
-    module=r"marshmallow\..*",
-)
-warnings.filterwarnings(
-    "ignore",
-    category=FutureWarning,
-    module=r"google\..*",
-)
-
 # Import all MCP tools to register them with the mcp instance
 # NOTE: Always add new tool imports here when creating new MCP tools.
-# Tools use the @tool decorator from `superset-core` and register automatically
-# on import. Import prompts and resources to register them with the mcp instance
+# Tools use @mcp.tool decorators and register automatically on import.
+# Import prompts and resources to register them with the mcp instance
 # NOTE: Always add new prompt/resource imports here when creating new prompts/resources.
 # Prompts use @mcp.prompt decorators and resources use @mcp.resource decorators.
 # They register automatically on import, similar to tools.
@@ -646,6 +572,7 @@ from superset.mcp_service.database.tool import (  # noqa: F401, E402
    list_databases,
 )
 from superset.mcp_service.dataset.tool import (  # noqa: F401, E402
+    create_dataset,
    create_virtual_dataset,
    get_dataset_info,
    list_datasets,
--- a/superset/mcp_service/dataset/schemas.py
+++ b/superset/mcp_service/dataset/schemas.py
@@ -324,6 +324,37 @@ class GetDatasetInfoRequest(MetadataCacheControl):
    ]


+class CreateDatasetRequest(BaseModel):
+    """Request schema for create_dataset to register a physical table as a dataset."""
+
+    database_id: Annotated[
+        int,
+        Field(
+            description="ID of the database connection to register the table against"
+        ),
+    ]
+    schema: Annotated[
+        str | None,
+        Field(
+            default=None,
+            description="Schema (namespace) where the table lives, e.g. 'public'. "
+            "Optional: omit to use the database default schema.",
+        ),
+    ]
+    table_name: Annotated[
+        str,
+        Field(description="Name of the physical table to register as a dataset"),
+    ]
+    owners: Annotated[
+        List[int] | None,
+        Field(
+            default=None,
+            description="Optional list of owner user IDs. "
+            "Defaults to the calling user.",
+        ),
+    ]
+
+
 class CreateVirtualDatasetRequest(BaseModel):
    """Request schema for create_virtual_dataset."""

--- a/superset/mcp_service/dataset/tool/__init__.py
+++ b/superset/mcp_service/dataset/tool/__init__.py
@@ -15,14 +15,16 @@
 # specific language governing permissions and limitations
 # under the License.

+from .create_dataset import create_dataset
 from .create_virtual_dataset import create_virtual_dataset
 from .get_dataset_info import get_dataset_info
 from .list_datasets import list_datasets
 from .query_dataset import query_dataset

 __all__ = [
+    "create_dataset",
    "create_virtual_dataset",
-    "list_datasets",
    "get_dataset_info",
+    "list_datasets",
    "query_dataset",
 ]
--- a/superset/mcp_service/dataset/tool/create_dataset.py
+++ b/superset/mcp_service/dataset/tool/create_dataset.py
@@ -0,0 +1,142 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+"""
+Create dataset FastMCP tool
+
+Registers a physical table as a Superset dataset against an existing
+database connection — the programmatic equivalent of Data → Datasets → +Dataset.
+Returns the same DatasetInfo shape as get_dataset_info so the caller can feed
+the resulting dataset_id directly into generate_chart.
+"""
+
+import logging
+from typing import Any
+
+from fastmcp import Context
+from superset_core.mcp.decorators import tool, ToolAnnotations
+
+from superset.extensions import event_logger
+from superset.mcp_service.dataset.schemas import (
+    CreateDatasetRequest,
+    DatasetError,
+    DatasetInfo,
+    serialize_dataset_object,
+)
+
+logger = logging.getLogger(__name__)
+
+
+@tool(
+    tags=["mutate"],
+    class_permission_name="Dataset",
+    method_permission_name="write",
+    annotations=ToolAnnotations(
+        title="Create dataset",
+        readOnlyHint=False,
+        destructiveHint=False,
+    ),
+)
+async def create_dataset(
+    request: CreateDatasetRequest, ctx: Context
+) -> DatasetInfo | DatasetError:
+    """Register a physical table as a Superset dataset.
+
+    Wraps POST /api/v1/dataset/ — the same endpoint the UI uses when you click
+    Data → Datasets → +Dataset.  Returns full dataset metadata (same shape as
+    get_dataset_info) so you can pass the resulting dataset_id straight into
+    generate_chart.
+
+    Required fields:
+    - database_id: ID of the existing database connection
+    - table_name: Exact name of the physical table to register
+
+    Optional fields:
+    - schema: Schema/namespace where the table lives (e.g. "public")
+    - owners: List of user IDs to set as owners (defaults to calling user)
+
+    Example:
+    ```json
+    {
+        "database_id": 1,
+        "schema": "public",
+        "table_name": "orders"
+    }
+    ```
+
+    Returns DatasetInfo on success or DatasetError on failure.
+    Use list_databases to find the correct database_id.
+    """
+    await ctx.info(
+        "Creating dataset: database_id=%s, schema=%r, table_name=%r"
+        % (request.database_id, request.schema, request.table_name)
+    )
+    try:
+        from superset.commands.dataset.create import CreateDatasetCommand
+        from superset.commands.dataset.exceptions import (
+            DatasetCreateFailedError,
+            DatasetExistsValidationError,
+            DatasetInvalidError,
+            TableNotFoundValidationError,
+        )
+
+        dataset_properties: dict[str, Any] = {
+            "database": request.database_id,
+            "schema": request.schema,
+            "table_name": request.table_name,
+        }
+        if request.owners is not None:
+            dataset_properties["owners"] = request.owners
+
+        with event_logger.log_context(action="mcp.create_dataset"):
+            command = CreateDatasetCommand(dataset_properties)
+            dataset = command.run()
+
+        result = serialize_dataset_object(dataset)
+        if result is None:
+            return DatasetError.create(
+                error="Dataset was created but could not be serialized",
+                error_type="SerializationError",
+            )
+
+        logger.info(
+            "Created dataset id=%s table=%s.%s",
+            dataset.id,
+            request.schema,
+            request.table_name,
+        )
+        return result
+
+    except DatasetExistsValidationError as e:
+        await ctx.error("Dataset already exists: %s" % (str(e),))
+        return DatasetError.create(error=str(e), error_type="DatasetExistsError")
+    except TableNotFoundValidationError as e:
+        await ctx.error("Table not found: %s" % (str(e),))
+        return DatasetError.create(error=str(e), error_type="TableNotFoundError")
+    except DatasetInvalidError as e:
+        await ctx.error("Dataset validation failed: %s" % (str(e),))
+        return DatasetError.create(error=str(e), error_type="ValidationError")
+    except DatasetCreateFailedError as e:
+        await ctx.error("Dataset creation failed: %s" % (str(e),))
+        return DatasetError.create(error=str(e), error_type="CreateFailedError")
+    except Exception as e:
+        logger.error("Failed to create dataset: %s", e, exc_info=True)
+        await ctx.error("Unexpected error: %s: %s" % (type(e).__name__, str(e)))
+        return DatasetError.create(
+            error=f"Failed to create dataset: {str(e)}",
+            error_type="InternalError",
+        )
--- a/tests/unit_tests/mcp_service/dataset/tool/test_create_dataset.py
+++ b/tests/unit_tests/mcp_service/dataset/tool/test_create_dataset.py
@@ -0,0 +1,330 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+"""Unit tests for create_dataset MCP tool."""
+
+import logging
+from unittest.mock import MagicMock, Mock, patch
+
+import pytest
+from fastmcp import Client
+from fastmcp.exceptions import ToolError
+
+from superset.mcp_service.app import mcp
+from superset.utils import json
+
+logging.basicConfig(level=logging.DEBUG)
+logger = logging.getLogger(__name__)
+
+
+def _make_mock_dataset(
+    dataset_id: int = 42,
+    table_name: str = "orders",
+    schema: str = "public",
+    database_name: str = "main_db",
+) -> MagicMock:
+    dataset = MagicMock()
+    dataset.id = dataset_id
+    dataset.table_name = table_name
+    dataset.schema = schema
+    dataset.description = None
+    dataset.changed_by_name = "admin"
+    dataset.changed_on = None
+    dataset.changed_on_humanized = None
+    dataset.created_by_name = "admin"
+    dataset.created_on = None
+    dataset.created_on_humanized = None
+    dataset.tags = []
+    dataset.owners = []
+    dataset.is_virtual = False
+    dataset.database_id = 1
+    dataset.certified_by = None
+    dataset.certification_details = None
+    dataset.schema_perm = f"[{database_name}].[{schema}]"
+    dataset.url = f"/tablemodelview/edit/{dataset_id}"
+    dataset.database = MagicMock()
+    dataset.database.database_name = database_name
+    dataset.sql = None
+    dataset.main_dttm_col = None
+    dataset.offset = 0
+    dataset.cache_timeout = 0
+    dataset.params = {}
+    dataset.template_params = {}
+    dataset.extra = {}
+    dataset.uuid = f"dataset-uuid-{dataset_id}"
+    dataset.columns = []
+    dataset.metrics = []
+    return dataset
+
+
+@pytest.fixture
+def mcp_server():
+    return mcp
+
+
+@pytest.fixture(autouse=True)
+def mock_auth():
+    with patch("superset.mcp_service.auth.get_user_from_request") as mock_get_user:
+        mock_user = Mock()
+        mock_user.id = 1
+        mock_user.username = "admin"
+        mock_get_user.return_value = mock_user
+        yield mock_get_user
+
+
+class TestCreateDataset:
+    """Tests for the create_dataset MCP tool."""
+
+    @pytest.mark.asyncio
+    async def test_create_dataset_success(self, mcp_server):
+        """Happy path: tool creates dataset and returns DatasetInfo."""
+        mock_dataset = _make_mock_dataset()
+        mock_command = MagicMock()
+        mock_command.run.return_value = mock_dataset
+
+        with (
+            patch(
+                "superset.commands.dataset.create.CreateDatasetCommand",
+                return_value=mock_command,
+            ) as mock_command_class,
+            patch(
+                "superset.mcp_service.utils.url_utils.get_superset_base_url",
+                return_value="http://localhost:8088",
+            ),
+        ):
+            async with Client(mcp_server) as client:
+                result = await client.call_tool(
+                    "create_dataset",
+                    {
+                        "request": {
+                            "database_id": 1,
+                            "schema": "public",
+                            "table_name": "orders",
+                        }
+                    },
+                )
+
+        assert result.content is not None
+        data = json.loads(result.content[0].text)
+        assert data["id"] == 42
+        assert data["table_name"] == "orders"
+        assert data["schema"] == "public"
+
+        # Verify the command was called with the right properties
+        call_kwargs = mock_command_class.call_args[0][0]
+        assert call_kwargs["database"] == 1
+        assert call_kwargs["schema"] == "public"
+        assert call_kwargs["table_name"] == "orders"
+        assert "owners" not in call_kwargs
+
+    @pytest.mark.asyncio
+    async def test_create_dataset_with_owners(self, mcp_server):
+        """Owners list is forwarded to the command when supplied."""
+        mock_dataset = _make_mock_dataset()
+        mock_command = MagicMock()
+        mock_command.run.return_value = mock_dataset
+
+        with (
+            patch(
+                "superset.commands.dataset.create.CreateDatasetCommand",
+                return_value=mock_command,
+            ) as mock_command_class,
+            patch(
+                "superset.mcp_service.utils.url_utils.get_superset_base_url",
+                return_value="http://localhost:8088",
+            ),
+        ):
+            async with Client(mcp_server) as client:
+                result = await client.call_tool(
+                    "create_dataset",
+                    {
+                        "request": {
+                            "database_id": 2,
+                            "schema": "sales",
+                            "table_name": "transactions",
+                            "owners": [5, 10],
+                        }
+                    },
+                )
+
+        data = json.loads(result.content[0].text)
+        assert data["id"] == 42
+
+        call_kwargs = mock_command_class.call_args[0][0]
+        assert call_kwargs["owners"] == [5, 10]
+
+    @pytest.mark.asyncio
+    async def test_create_dataset_already_exists(self, mcp_server):
+        """Returns DatasetError when a dataset for the table already exists."""
+        from superset.commands.dataset.exceptions import DatasetExistsValidationError
+        from superset.sql.parse import Table
+
+        mock_command = MagicMock()
+        mock_command.run.side_effect = DatasetExistsValidationError(
+            Table("orders", "public", None)
+        )
+
+        with patch(
+            "superset.commands.dataset.create.CreateDatasetCommand",
+            return_value=mock_command,
+        ):
+            async with Client(mcp_server) as client:
+                result = await client.call_tool(
+                    "create_dataset",
+                    {
+                        "request": {
+                            "database_id": 1,
+                            "schema": "public",
+                            "table_name": "orders",
+                        }
+                    },
+                )
+
+        data = json.loads(result.content[0].text)
+        assert data["error_type"] == "DatasetExistsError"
+        assert "error" in data
+
+    @pytest.mark.asyncio
+    async def test_create_dataset_table_not_found(self, mcp_server):
+        """Returns DatasetError when the physical table does not exist in the DB."""
+        from superset.commands.dataset.exceptions import TableNotFoundValidationError
+        from superset.sql.parse import Table
+
+        mock_command = MagicMock()
+        mock_command.run.side_effect = TableNotFoundValidationError(
+            Table("missing_table", "public", None)
+        )
+
+        with patch(
+            "superset.commands.dataset.create.CreateDatasetCommand",
+            return_value=mock_command,
+        ):
+            async with Client(mcp_server) as client:
+                result = await client.call_tool(
+                    "create_dataset",
+                    {
+                        "request": {
+                            "database_id": 1,
+                            "schema": "public",
+                            "table_name": "missing_table",
+                        }
+                    },
+                )
+
+        data = json.loads(result.content[0].text)
+        assert data["error_type"] == "TableNotFoundError"
+
+    @pytest.mark.asyncio
+    async def test_create_dataset_unexpected_error(self, mcp_server):
+        """Unexpected exceptions are caught and returned as InternalError."""
+        mock_command = MagicMock()
+        mock_command.run.side_effect = RuntimeError("DB connection lost")
+
+        with patch(
+            "superset.commands.dataset.create.CreateDatasetCommand",
+            return_value=mock_command,
+        ):
+            async with Client(mcp_server) as client:
+                result = await client.call_tool(
+                    "create_dataset",
+                    {
+                        "request": {
+                            "database_id": 1,
+                            "schema": "public",
+                            "table_name": "orders",
+                        }
+                    },
+                )
+
+        data = json.loads(result.content[0].text)
+        assert data["error_type"] == "InternalError"
+        assert "DB connection lost" in data["error"]
+
+    @pytest.mark.asyncio
+    async def test_create_dataset_missing_required_fields(self, mcp_server):
+        """Missing required fields raise a validation error before the tool runs."""
+        async with Client(mcp_server) as client:
+            with pytest.raises(ToolError):
+                await client.call_tool(
+                    "create_dataset",
+                    {
+                        "request": {
+                            # database_id and table_name are omitted intentionally
+                            "schema": "public",
+                        }
+                    },
+                )
+
+    @pytest.mark.asyncio
+    async def test_create_dataset_returns_full_dataset_info(self, mcp_server):
+        """The returned DatasetInfo includes columns, metrics, and all core fields."""
+        mock_dataset = _make_mock_dataset(
+            dataset_id=99, table_name="sales", schema="dw"
+        )
+
+        col = MagicMock()
+        col.column_name = "amount"
+        col.verbose_name = "Amount"
+        col.type = "NUMERIC"
+        col.is_dttm = False
+        col.groupby = True
+        col.filterable = True
+        col.description = "Sale amount"
+        mock_dataset.columns = [col]
+
+        metric = MagicMock()
+        metric.metric_name = "total_sales"
+        metric.verbose_name = "Total Sales"
+        metric.expression = "SUM(amount)"
+        metric.description = "Sum of amounts"
+        metric.d3format = None
+        mock_dataset.metrics = [metric]
+
+        mock_command = MagicMock()
+        mock_command.run.return_value = mock_dataset
+
+        with (
+            patch(
+                "superset.commands.dataset.create.CreateDatasetCommand",
+                return_value=mock_command,
+            ),
+            patch(
+                "superset.mcp_service.utils.url_utils.get_superset_base_url",
+                return_value="http://localhost:8088",
+            ),
+        ):
+            async with Client(mcp_server) as client:
+                result = await client.call_tool(
+                    "create_dataset",
+                    {
+                        "request": {
+                            "database_id": 1,
+                            "schema": "dw",
+                            "table_name": "sales",
+                        }
+                    },
+                )
+
+        data = json.loads(result.content[0].text)
+        assert data["id"] == 99
+        assert data["table_name"] == "sales"
+        assert data["schema"] == "dw"
+        assert data["is_virtual"] is False
+        assert len(data["columns"]) == 1
+        assert data["columns"][0]["column_name"] == "amount"
+        assert len(data["metrics"]) == 1
+        assert data["metrics"][0]["metric_name"] == "total_sales"
Author	SHA1	Message	Date
Amin Ghadersohi	9b6ff262fd	fix(mcp): restore missing tool registrations and fix create_dataset tests - Restore create_virtual_dataset, query_dataset, get_database_info, list_databases, get_chart_sql, get_chart_type_schema, save_sql_query imports in app.py (accidentally dropped in the feat commit) - Restore create_virtual_dataset and query_dataset exports in dataset tool __init__.py - Make CreateDatasetRequest.schema optional (str | None, default None) - Refactor create_dataset.py to use @tool decorator pattern - Fix test_create_dataset.py: convert @patch class decorators to with patch() context managers (avoids pytest-asyncio arg injection issues), add get_superset_base_url mock for success paths, and set certified_by/certification_details=None on mock dataset	2026-05-28 04:32:13 +00:00
Amin Ghadersohi	576d40111b	test(mcp): fix patch paths in test_create_dataset — CreateDatasetCommand is a lazy import CreateDatasetCommand is imported inside the function body, so patching at superset.mcp_service.dataset.tool.create_dataset.CreateDatasetCommand fails with AttributeError. Patch at the source module instead. Also fix data["schema_name"] assertions: DatasetInfo.model_serializer renames the field to "schema" in the serialized output.	2026-05-28 02:03:34 +00:00
Amin Ghadersohi	178fe56c9c	fix(mcp): fix create_dataset CI failures - schemas.py: restore full apache/master version and add CreateDatasetRequest (previous cherry-pick used an older shorter version missing helper functions _sanitize_dataset_info_for_llm_context, _humanize_timestamp, etc.) - create_dataset.py: remove parse_request decorator (not in apache/master yet)	2026-05-28 02:03:34 +00:00
Amin Ghadersohi	de4da995b2	style: ruff format create_dataset tool files	2026-05-28 02:03:34 +00:00
Amin Ghadersohi	0f7f92011c	feat(mcp): add create_dataset tool to register physical tables as datasets Adds create_dataset MCP tool that wraps POST /api/v1/dataset/ so skills and agents can register an existing physical table as a Superset dataset without manual UI interaction. Returns DatasetInfo (same shape as get_dataset_info) so the resulting dataset_id feeds directly into generate_chart. - CreateDatasetRequest schema (database_id, schema, table_name, owners?) - Tool file with typed error handling (exists/not-found/validation/internal) - Registered in dataset/tool/__init__.py and app.py - DEFAULT_INSTRUCTIONS updated to list create_dataset - Unit tests covering success, owners, error cases, and full DatasetInfo shape	2026-05-28 02:03:34 +00:00
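The 576d40111b message notes that CreateDatasetCommand is imported inside the function body, so patching it on the tool module fails with AttributeError and the patch has to target the source module. The sketch below reproduces that behaviour with the standard library only. The modules demo_commands and demo_tool, and the names CreateThingCommand and create_thing, are hypothetical stand-ins built with types.ModuleType; they are not Superset code.

```python
import sys
import types
from unittest.mock import MagicMock, patch

# Hypothetical "source" module that defines the command class.
source = types.ModuleType("demo_commands")


class CreateThingCommand:
    def run(self) -> str:
        return "real"


source.CreateThingCommand = CreateThingCommand
sys.modules["demo_commands"] = source

# Hypothetical "tool" module. It has no module-level CreateThingCommand
# attribute, because the import only happens inside the function body.
tool_module = types.ModuleType("demo_tool")


def create_thing() -> str:
    # Lazy import: the name is looked up on demo_commands at call time.
    from demo_commands import CreateThingCommand

    return CreateThingCommand().run()


tool_module.create_thing = create_thing
sys.modules["demo_tool"] = tool_module


def call_with_source_patch() -> str:
    fake = MagicMock()
    fake.run.return_value = "mocked"
    # Patching the defining module works: the lazy import picks up the mock.
    with patch("demo_commands.CreateThingCommand", return_value=fake):
        return create_thing()


def patching_tool_module_fails() -> bool:
    # The tool module never holds the name, so patch() cannot find it.
    try:
        with patch("demo_tool.CreateThingCommand"):
            pass
    except AttributeError:
        return True
    return False
```

This is the inverse of the usual "patch where the name is used" advice: with a function-level import there is no module attribute at the point of use, so the defining module is the only place the name can be replaced.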