Compare commits


28 Commits

Author SHA1 Message Date
Maxime Beauchemin
375fe42a68 pointing link to master 2025-07-29 12:01:05 -07:00
Maxime Beauchemin
e6e0c3c47e docs 2025-07-29 11:19:58 -07:00
Maxime Beauchemin
1d6617d809 improve startup script 2025-07-29 11:19:58 -07:00
Maxime Beauchemin
4ff2a85b11 gh 2025-07-29 11:19:58 -07:00
Maxime Beauchemin
f1a3bdd878 tweak utilities 2025-07-29 11:19:58 -07:00
Maxime Beauchemin
4b5dbf3dcf public port 2025-07-29 11:19:58 -07:00
Maxime Beauchemin
458db68929 tmux 2025-07-29 11:19:58 -07:00
Maxime Beauchemin
d4463078ad only 9001 2025-07-29 11:19:57 -07:00
Maxime Beauchemin
7ad10ac1a9 ssh 2025-07-29 11:19:57 -07:00
Maxime Beauchemin
f580f6159e ok 2025-07-29 11:19:57 -07:00
Maxime Beauchemin
a26e0ea0fe fix: Use Python 3.11 Bookworm image to match current standard
- Switch to pre-built Python 3.11 image (no compilation)
- Bookworm base matches Superset Docker images
- Python 3.11 is the current tested standard
- Faster startup, no building from source
2025-07-29 11:19:57 -07:00
Maxime Beauchemin
4eef7a65c1 fix: Remove Python feature to avoid building from source
- Ubuntu 24.04 already includes Python 3.12
- No need to build Python from source (saves ~10min)
- System Python is sufficient for host environment
- Actual Superset Python runs in Docker containers
2025-07-29 11:19:57 -07:00
Maxime Beauchemin
ba3388bf94 feat: Add Claude Code CLI to devcontainer setup
- Install Claude Code for AI-assisted development
- Perfect for using 'claude --yes' safely in Codespaces
- No risk to local machine when running automated commands
2025-07-29 11:19:57 -07:00
Maxime Beauchemin
ca57bbc1e2 feat: Add uv package installer to devcontainer setup
- Install uv via official installer script
- Provides 10-100x faster Python package operations
- Matches what CI uses for package installation
2025-07-29 11:19:56 -07:00
Maxime Beauchemin
19f414b217 fix: Update Node version to 20 to match package.json requirements
- package.json specifies Node ^20.18.1
- Update devcontainer to use Node 20 instead of 18
2025-07-29 11:19:56 -07:00
Maxime Beauchemin
bc604d54e4 fix: Use Ubuntu 24.04 base to match CI with Python 3.11
- Switch to ubuntu-24.04 to match CI environment
- Add Python 3.11 explicitly
- Keep lean setup with only needed features
2025-07-29 11:19:56 -07:00
Maxime Beauchemin
e922e51e6b fix: Use lean Python base image instead of bloated universal
- Switch from 10GB universal to ~2GB Python base
- Add only needed features: Docker, Node, Git
- Much faster Codespace startup
- Same functionality, less bloat
2025-07-29 11:19:56 -07:00
Maxime Beauchemin
8bf2e4ea3a fix: Simplify devcontainer to avoid docker-compose conflicts
- Remove all features (universal image has everything)
- Simplified config to just image + scripts
- No dockerComposeFile reference
- Plain container that runs docker-compose internally
2025-07-29 11:19:56 -07:00
Maxime Beauchemin
cf8183b67e fix: Force rebuild with clean devcontainer config 2025-07-29 11:19:56 -07:00
Maxime Beauchemin
02f90f4321 feat: Use devcontainers/universal image for better tooling
- Switch to universal:2 image which includes vim, curl, jq, tmux, etc.
- Remove redundant features (already in universal image)
- Simplify setup script - only install Superset-specific libs
- Keeps SSH feature for remote access
2025-07-29 11:19:55 -07:00
Maxime Beauchemin
a007b3020d fix: Refactor devcontainer to use base Ubuntu with Docker-in-Docker
- Switch from docker-compose service to base Ubuntu container
- Add Docker-in-Docker to run docker-compose inside Codespace
- This provides git access and full dev environment
- Superset services run via docker-compose from within the container
2025-07-29 11:19:55 -07:00
Maxime Beauchemin
26e5e637f9 feat: Add SSH support to Codespaces configuration 2025-07-29 11:19:55 -07:00
Maxime Beauchemin
8de420ec8e fix: Correct workspace paths for Codespaces
- Use /workspaces instead of /app for Codespaces compatibility
- Fix postCreateCommand and postStartCommand paths
- Make startup script more flexible with directory detection
2025-07-29 11:19:55 -07:00
Maxime Beauchemin
fd51cc65a2 feat: Add GitHub Codespaces support with docker-compose-light
## Summary

Adds full GitHub Codespaces development environment configuration leveraging the new `docker-compose-light.yml` for efficient cloud development.

## Key Features

- **Lightweight Setup**: Uses `docker-compose-light.yml` which removes Redis/nginx for faster startup and lower resource usage
- **Multi-Instance Support**: Each Codespace gets isolated database volumes, perfect for testing multiple branches
- **Auto-Configuration**: Includes VS Code extensions, Python/TypeScript settings, and auto-start script
- **Developer Friendly**: Comprehensive README with SSH, VS Code, and browser connection instructions

## Implementation Details

### Files Added
- `.devcontainer/devcontainer.json` - Main configuration with:
  - Docker-in-Docker support for compose
  - Optimized VS Code extensions for Superset development
  - Smart port forwarding (9001 for frontend, 8088 for API)
  - 4-core/8GB recommended resources

- `.devcontainer/start-superset.sh` - Auto-start script that:
  - Uses unique project names per Codespace
  - Handles Docker daemon startup
  - Shows clear status and credentials

- `.devcontainer/README.md` - Developer guide covering:
  - Multiple connection methods (SSH, VS Code, browser)
  - Port forwarding instructions
  - Cost optimization tips
  - Integration with `claude --yes` workflows

## Benefits

1. **Isolated Development**: No risk to local machine when using `claude --yes`
2. **Resource Efficiency**: Laptop stays cool, Codespaces handles the load
3. **Parallel Testing**: Spin up multiple instances for different features
4. **Quick Pause/Resume**: Auto-stops when idle, resumes in ~30 seconds

## Testing

Push to fork and create a Codespace to test. The environment auto-starts Superset and forwards port 9001 with HTTPS.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-07-29 11:19:55 -07:00
Maxime Beauchemin
16db999067 fix: rate limiting issues with example data hosted on github.com (#34381) 2025-07-29 11:19:29 -07:00
Beto Dealmeida
972be15dda feat: focus on text input when modal opens (#34379) 2025-07-29 14:01:10 -04:00
Maxime Beauchemin
c9e06714f8 fix: prevent theme initialization errors during fresh installs (#34339)
Co-authored-by: Claude <noreply@anthropic.com>
2025-07-29 09:32:53 -07:00
Beto Dealmeida
32626ab707 fix: use catalog name on generated queries (#34360) 2025-07-29 12:30:46 -04:00
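The fd51cc65a2 message says the start script "uses unique project names per Codespace". However, the `start-superset.sh` added in this compare calls `docker-compose -f docker-compose-light.yml` without `-p` and without setting `COMPOSE_PROJECT_NAME`. If you run more than one stack against the same Docker daemon, a distinct project name keeps their containers and volumes apart. The following is a minimal sketch of deriving one. It assumes the `CODESPACE_NAME` environment variable that GitHub Codespaces sets. The helper `compose_project_name` is hypothetical and is not part of Superset.

```python
from __future__ import annotations

import os
import re
from collections.abc import Mapping


def compose_project_name(prefix: str = "superset", env: Mapping[str, str] | None = None) -> str:
    """Build a Docker Compose project name that differs per Codespace.

    Hypothetical helper: reads CODESPACE_NAME (set by GitHub Codespaces) and
    falls back to "local" when it is missing, e.g. on a laptop.
    """
    env = os.environ if env is None else env
    raw = f"{prefix}-{env.get('CODESPACE_NAME', 'local')}".lower()
    # Compose project names may only contain lowercase letters, digits, '-' and '_'.
    name = re.sub(r"[^a-z0-9_-]", "-", raw)
    # They must also start with a letter or digit.
    return name.lstrip("-_") or "superset"


if __name__ == "__main__":
    print(compose_project_name())
```

You could pass the printed name to `docker compose -p NAME -f docker-compose-light.yml up` or export it as `COMPOSE_PROJECT_NAME`. Running the same command on two different Codespaces then yields two different names.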
45 changed files with 444 additions and 822 deletions

.devcontainer/README.md Normal file (+5)

@@ -0,0 +1,5 @@
# Superset Development with GitHub Codespaces
For complete documentation on using GitHub Codespaces with Apache Superset, please see:
**[Setting up a Development Environment - GitHub Codespaces](https://superset.apache.org/docs/contributing/development#github-codespaces-cloud-development)**

@@ -0,0 +1,52 @@
{
"name": "Apache Superset Development",
// Keep this in sync with the base image in Dockerfile (ARG PY_VER)
// Using the same base as Dockerfile, but non-slim for dev tools
"image": "python:3.11.13-bookworm",
"features": {
"ghcr.io/devcontainers/features/docker-in-docker:2": {
"moby": true,
"dockerDashComposeVersion": "v2"
},
"ghcr.io/devcontainers/features/node:1": {
"version": "20"
},
"ghcr.io/devcontainers/features/git:1": {},
"ghcr.io/devcontainers/features/common-utils:2": {
"configureZshAsDefaultShell": true
},
"ghcr.io/devcontainers/features/sshd:1": {
"version": "latest"
}
},
// Forward ports for development
"forwardPorts": [9001],
"portsAttributes": {
"9001": {
"label": "Superset (via Webpack Dev Server)",
"onAutoForward": "notify",
"visibility": "public"
}
},
// Run commands after container is created
"postCreateCommand": "chmod +x .devcontainer/setup-dev.sh && .devcontainer/setup-dev.sh",
// Auto-start Superset on Codespace resume
"postStartCommand": ".devcontainer/start-superset.sh",
// VS Code customizations
"customizations": {
"vscode": {
"extensions": [
"ms-python.python",
"ms-python.vscode-pylance",
"charliermarsh.ruff",
"dbaeumer.vscode-eslint",
"esbenp.prettier-vscode"
]
}
}
}
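This `devcontainer.json` uses `//` comments (JSON with Comments), so Python's `json` module cannot load it as-is. The following is a minimal sketch for checking the file from a script. It assumes only the keys shown here. It strips comments that sit outside string literals, so values containing `//` (such as URLs) survive. It then lists any `forwardPorts` entry that lacks a matching `portsAttributes` entry. It does not handle trailing commas. Both helper names are hypothetical.

```python
from __future__ import annotations

import json
from typing import Any


def strip_jsonc_comments(text: str) -> str:
    """Drop // line comments and /* */ block comments that sit outside strings."""
    out: list[str] = []
    i, n = 0, len(text)
    in_string = False
    while i < n:
        ch = text[i]
        if in_string:
            out.append(ch)
            if ch == "\\" and i + 1 < n:
                # Copy the escaped character verbatim so \" does not end the string.
                out.append(text[i + 1])
                i += 2
                continue
            if ch == '"':
                in_string = False
            i += 1
        elif ch == '"':
            in_string = True
            out.append(ch)
            i += 1
        elif text.startswith("//", i):
            end = text.find("\n", i)
            i = n if end == -1 else end  # keep the newline itself
        elif text.startswith("/*", i):
            end = text.find("*/", i + 2)
            i = n if end == -1 else end + 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def ports_missing_attributes(config: dict[str, Any]) -> list[int]:
    """Forwarded ports with no portsAttributes entry (that map is keyed by strings)."""
    attributes = config.get("portsAttributes", {})
    return [port for port in config.get("forwardPorts", []) if str(port) not in attributes]


if __name__ == "__main__":
    with open(".devcontainer/devcontainer.json", encoding="utf-8") as f:
        config = json.loads(strip_jsonc_comments(f.read()))
    print(config["image"], ports_missing_attributes(config))
```

For the configuration above, port `9001` has a `portsAttributes` entry, so the check returns an empty list.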

.devcontainer/setup-dev.sh Executable file (+32)

@@ -0,0 +1,32 @@
#!/bin/bash
# Setup script for Superset Codespaces development environment
echo "🔧 Setting up Superset development environment..."
# The universal image has most tools, just need Superset-specific libs
echo "📦 Installing Superset-specific dependencies..."
sudo apt-get update
sudo apt-get install -y \
libsasl2-dev \
libldap2-dev \
libpq-dev \
tmux \
gh
# Install uv for fast Python package management
echo "📦 Installing uv..."
curl -LsSf https://astral.sh/uv/install.sh | sh
# Add cargo/bin to PATH for uv
echo 'export PATH="$HOME/.cargo/bin:$PATH"' >> ~/.bashrc
echo 'export PATH="$HOME/.cargo/bin:$PATH"' >> ~/.zshrc
# Install Claude Code CLI via npm
echo "🤖 Installing Claude Code..."
npm install -g @anthropic-ai/claude-code
# Make the start script executable
chmod +x .devcontainer/start-superset.sh
echo "✅ Development environment setup complete!"
echo "🚀 Run '.devcontainer/start-superset.sh' to start Superset"

.devcontainer/start-superset.sh Executable file (+59)

@@ -0,0 +1,59 @@
#!/bin/bash
# Startup script for Superset in Codespaces
echo "🚀 Starting Superset in Codespaces..."
echo "🌐 Frontend will be available at port 9001"
# Find the workspace directory (Codespaces clones as 'superset', not 'superset-2')
WORKSPACE_DIR=$(find /workspaces -maxdepth 1 -name "superset*" -type d | head -1)
if [ -n "$WORKSPACE_DIR" ]; then
cd "$WORKSPACE_DIR"
echo "📁 Working in: $WORKSPACE_DIR"
else
echo "📁 Using current directory: $(pwd)"
fi
# Check if docker is running
if ! docker info > /dev/null 2>&1; then
echo "⏳ Waiting for Docker to start..."
sleep 5
fi
# Clean up any existing containers
echo "🧹 Cleaning up existing containers..."
docker-compose -f docker-compose-light.yml down
# Start services
echo "🏗️ Building and starting services..."
echo ""
echo "📝 Once started, login with:"
echo " Username: admin"
echo " Password: admin"
echo ""
echo "📋 Running in foreground with live logs (Ctrl+C to stop)..."
# Run docker-compose and capture exit code
docker-compose -f docker-compose-light.yml up
EXIT_CODE=$?
# If it failed, provide helpful instructions
if [ $EXIT_CODE -ne 0 ] && [ $EXIT_CODE -ne 130 ]; then # 130 is Ctrl+C
echo ""
echo "❌ Superset startup failed (exit code: $EXIT_CODE)"
echo ""
echo "🔄 To restart Superset, run:"
echo " .devcontainer/start-superset.sh"
echo ""
echo "🔧 For troubleshooting:"
echo " # View logs:"
echo " docker-compose -f docker-compose-light.yml logs"
echo ""
echo " # Clean restart (removes volumes):"
echo " docker-compose -f docker-compose-light.yml down -v"
echo " .devcontainer/start-superset.sh"
echo ""
echo " # Common issues:"
echo " - Network timeouts: Just retry, often transient"
echo " - Port conflicts: Check 'docker ps'"
echo " - Database issues: Try clean restart with -v"
fi
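The script above probes `docker info` once and then sleeps five seconds without probing again. A daemon that needs longer to come up therefore still leads straight into `docker-compose ... down`. The following is a minimal sketch of a bounded retry loop, written in Python to keep the examples in one language. The names `docker_is_up` and `wait_for_docker`, and the 60 s / 2 s defaults, are assumptions rather than values taken from the script.

```python
from __future__ import annotations

import subprocess
import time
from collections.abc import Callable


def docker_is_up() -> bool:
    """Return True when `docker info` exits 0, the same probe the script uses."""
    try:
        result = subprocess.run(
            ["docker", "info"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            check=False,
        )
    except FileNotFoundError:  # docker CLI is not installed
        return False
    return result.returncode == 0


def wait_for_docker(
    check: Callable[[], bool] = docker_is_up,
    timeout: float = 60.0,
    interval: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Re-run `check` until it succeeds or about `timeout` seconds of sleeping pass.

    The budget counts time spent sleeping between probes, not wall-clock time,
    which keeps the loop easy to test with a fake `sleep`.
    """
    waited = 0.0
    while True:
        if check():
            return True
        if waited >= timeout:
            return False
        sleep(interval)
        waited += interval


if __name__ == "__main__":
    if not wait_for_docker():
        raise SystemExit("Docker daemon did not become ready; try again shortly.")
```

A start script could run this (or an equivalent shell `until docker info` loop) before `down`/`up`, and stop with a clear message instead of failing later inside Compose.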

@@ -120,6 +120,78 @@ docker volume rm superset_db_home
docker-compose up
```
## GitHub Codespaces (Cloud Development)
GitHub Codespaces provides a complete, pre-configured development environment in the cloud. This is ideal for:
- Quick contributions without local setup
- Consistent development environments across team members
- Working from devices that can't run Docker locally
- Safe experimentation in isolated environments
:::info
We're grateful to GitHub for providing this excellent cloud development service that makes
contributing to Apache Superset more accessible to developers worldwide.
:::
### Getting Started with Codespaces
1. **Create a Codespace**: Use this pre-configured link that sets up everything you need:
[**Launch Superset Codespace →**](https://github.com/codespaces/new?skip_quickstart=true&machine=standardLinux32gb&repo=39464018&ref=master&geo=UsWest&devcontainer_path=.devcontainer%2Fdevcontainer.json)
:::caution
**Important**: You must select at least the **4 CPU / 16GB RAM** machine type (pre-selected in the link above).
Smaller instances will not have sufficient resources to run Superset effectively.
:::
2. **Wait for Setup**: The initial setup takes several minutes. The Codespace will:
- Build the development container
- Install all dependencies
- Start the services defined in `docker-compose-light.yml` (PostgreSQL and Superset; this lightweight stack leaves out Redis and nginx)
- Initialize the database with example data
3. **Access Superset**: Once ready, check the **PORTS** tab in VS Code for port `9001`.
Click the globe icon to open Superset in your browser.
- Default credentials: `admin` / `admin`
### Key Features
- **Auto-reload**: Both Python and TypeScript files auto-refresh on save
- **Pre-installed Extensions**: VS Code extensions for Python (Python, Pylance, Ruff) and for JavaScript/TypeScript linting and formatting (ESLint, Prettier)
- **Multiple Instances**: Run multiple Codespaces for different branches/features
- **SSH Access**: Connect via terminal using `gh cs ssh` or through the GitHub web UI
- **VS Code Integration**: Works seamlessly with VS Code desktop app
### Managing Codespaces
- **List active Codespaces**: `gh cs list`
- **SSH into a Codespace**: `gh cs ssh`
- **Stop a Codespace**: Via GitHub UI or `gh cs stop`
- **Delete a Codespace**: Via GitHub UI or `gh cs delete`
### Debugging and Logs
Since Codespaces uses `docker-compose-light.yml`, you can monitor all services:
```bash
# Stream logs from all services
docker compose -f docker-compose-light.yml logs -f
# Stream logs from a specific service
docker compose -f docker-compose-light.yml logs -f superset
# View last 100 lines and follow
docker compose -f docker-compose-light.yml logs --tail=100 -f
# List all running services
docker compose -f docker-compose-light.yml ps
```
:::tip
By default, Codespaces stop after 30 minutes of inactivity to save resources (the idle timeout is configurable in your GitHub settings).
Your work is preserved and you can restart anytime.
:::
## Installing Development Tools
:::note

@@ -97,10 +97,6 @@ export const COST_ESTIMATE_STARTED = 'COST_ESTIMATE_STARTED';
export const COST_ESTIMATE_RETURNED = 'COST_ESTIMATE_RETURNED';
export const COST_ESTIMATE_FAILED = 'COST_ESTIMATE_FAILED';
export const COST_THRESHOLD_CHECK_STARTED = 'COST_THRESHOLD_CHECK_STARTED';
export const COST_THRESHOLD_CHECK_RETURNED = 'COST_THRESHOLD_CHECK_RETURNED';
export const COST_THRESHOLD_CHECK_FAILED = 'COST_THRESHOLD_CHECK_FAILED';
export const CREATE_DATASOURCE_STARTED = 'CREATE_DATASOURCE_STARTED';
export const CREATE_DATASOURCE_SUCCESS = 'CREATE_DATASOURCE_SUCCESS';
export const CREATE_DATASOURCE_FAILED = 'CREATE_DATASOURCE_FAILED';
@@ -237,45 +233,6 @@ export function estimateQueryCost(queryEditor) {
};
}
export function checkCostThreshold(queryEditor) {
return (dispatch, getState) => {
const { dbId, catalog, schema, sql, selectedText, templateParams } =
getUpToDateQuery(getState(), queryEditor);
const requestSql = selectedText || sql;
const postPayload = {
database_id: dbId,
catalog,
schema,
sql: requestSql,
template_params: JSON.parse(templateParams || '{}'),
};
return Promise.all([
dispatch({ type: COST_THRESHOLD_CHECK_STARTED, query: queryEditor }),
SupersetClient.post({
endpoint: '/api/v1/sqllab/check_cost_threshold/',
body: JSON.stringify(postPayload),
headers: { 'Content-Type': 'application/json' },
})
.then(({ json }) =>
dispatch({ type: COST_THRESHOLD_CHECK_RETURNED, query: queryEditor, json }),
)
.catch(response =>
getClientErrorObject(response).then(error => {
const message =
error.error ||
error.statusText ||
t('Failed at checking cost threshold');
return dispatch({
type: COST_THRESHOLD_CHECK_FAILED,
query: queryEditor,
error: message,
});
}),
),
]);
};
}
export function clearInactiveQueries(interval) {
return { type: CLEAR_INACTIVE_QUERIES, interval };
}

@@ -1,131 +0,0 @@
/**
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*/
import { render, screen, fireEvent } from '@testing-library/react';
import { ThemeProvider } from '@superset-ui/core';
import { theme } from 'src/preamble';
import CostWarningModal from './index';
const mockProps = {
visible: true,
onHide: jest.fn(),
onProceed: jest.fn(),
warningMessage: 'This query will scan 10 GB of data, which exceeds the threshold of 5 GB.',
thresholdInfo: {
bytes_threshold: 5 * 1024 ** 3, // 5 GB
estimated_bytes: 10 * 1024 ** 3, // 10 GB
},
};
const renderWithTheme = (ui: React.ReactElement) =>
render(<ThemeProvider theme={theme}>{ui}</ThemeProvider>);
describe('CostWarningModal', () => {
beforeEach(() => {
jest.clearAllMocks();
});
it('renders with warning message', () => {
renderWithTheme(<CostWarningModal {...mockProps} />);
expect(screen.getByText('Query Cost Warning')).toBeInTheDocument();
expect(screen.getByText(mockProps.warningMessage)).toBeInTheDocument();
});
it('shows threshold details when provided', () => {
renderWithTheme(<CostWarningModal {...mockProps} />);
expect(screen.getByText('Threshold Details:')).toBeInTheDocument();
expect(screen.getByText('Data to scan:')).toBeInTheDocument();
expect(screen.getByText('10.0 GB')).toBeInTheDocument();
expect(screen.getByText('5.0 GB')).toBeInTheDocument();
});
it('disables proceed button until checkbox is checked', () => {
renderWithTheme(<CostWarningModal {...mockProps} />);
const proceedButton = screen.getByText('Run Query Anyway');
const checkbox = screen.getByRole('checkbox');
expect(proceedButton).toBeDisabled();
fireEvent.click(checkbox);
expect(proceedButton).not.toBeDisabled();
});
it('calls onProceed when proceed button is clicked with checkbox checked', () => {
renderWithTheme(<CostWarningModal {...mockProps} />);
const checkbox = screen.getByRole('checkbox');
const proceedButton = screen.getByText('Run Query Anyway');
fireEvent.click(checkbox);
fireEvent.click(proceedButton);
expect(mockProps.onProceed).toHaveBeenCalledTimes(1);
});
it('calls onHide when cancel button is clicked', () => {
renderWithTheme(<CostWarningModal {...mockProps} />);
const cancelButton = screen.getByText('Cancel');
fireEvent.click(cancelButton);
expect(mockProps.onHide).toHaveBeenCalledTimes(1);
});
it('renders without threshold details when not provided', () => {
const propsWithoutThreshold = {
...mockProps,
thresholdInfo: undefined,
};
renderWithTheme(<CostWarningModal {...propsWithoutThreshold} />);
expect(screen.queryByText('Threshold Details:')).not.toBeInTheDocument();
});
it('shows default message when warningMessage is null', () => {
const propsWithNoMessage = {
...mockProps,
warningMessage: null,
};
renderWithTheme(<CostWarningModal {...propsWithNoMessage} />);
expect(screen.getByText('This query may be expensive to run.')).toBeInTheDocument();
});
it('handles cost threshold details', () => {
const propsWithCostThreshold = {
...mockProps,
thresholdInfo: {
cost_threshold: 100,
estimated_cost: 250,
},
};
renderWithTheme(<CostWarningModal {...propsWithCostThreshold} />);
expect(screen.getByText('Estimated cost:')).toBeInTheDocument();
expect(screen.getByText('250')).toBeInTheDocument();
expect(screen.getByText('Cost threshold:')).toBeInTheDocument();
expect(screen.getByText('100')).toBeInTheDocument();
});
});

@@ -1,166 +0,0 @@
/**
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*/
import { useState } from 'react';
import { styled, t } from '@superset-ui/core';
import { Button, Modal, Checkbox } from '@superset-ui/core/components';
import { ModalTitleWithIcon } from 'src/components/ModalTitleWithIcon';
const StyledModal = styled(Modal)`
.ant-modal-body {
padding: 24px;
}
`;
const WarningContent = styled.div`
margin: 16px 0;
font-size: 14px;
line-height: 1.5;
`;
const DetailsSection = styled.div`
margin: 16px 0;
padding: 12px;
background-color: ${({ theme }) => theme.colors.grayscale.light4};
border-radius: 4px;
font-size: 12px;
`;
const CheckboxWrapper = styled.div`
margin: 16px 0;
`;
interface CostWarningModalProps {
visible: boolean;
onHide: () => void;
onProceed: () => void;
warningMessage: string | null;
thresholdInfo?: {
bytes_threshold?: number;
estimated_bytes?: number;
cost_threshold?: number;
estimated_cost?: number;
};
}
export default function CostWarningModal({
visible,
onHide,
onProceed,
warningMessage,
thresholdInfo,
}: CostWarningModalProps) {
const [proceedAnyway, setProceedAnyway] = useState(false);
const handleProceed = () => {
if (proceedAnyway) {
onProceed();
}
};
const formatBytes = (bytes: number) => {
if (bytes < 1024) return `${bytes} B`;
if (bytes < 1024 ** 2) return `${(bytes / 1024).toFixed(1)} KB`;
if (bytes < 1024 ** 3) return `${(bytes / 1024 ** 2).toFixed(1)} MB`;
if (bytes < 1024 ** 4) return `${(bytes / 1024 ** 3).toFixed(1)} GB`;
if (bytes < 1024 ** 5) return `${(bytes / 1024 ** 4).toFixed(1)} TB`;
return `${(bytes / 1024 ** 5).toFixed(1)} PB`;
};
const renderThresholdDetails = () => {
if (!thresholdInfo) return null;
const details = [];
if (thresholdInfo.bytes_threshold && thresholdInfo.estimated_bytes) {
details.push(
<div key="bytes">
<strong>{t('Data to scan:')}</strong> {formatBytes(thresholdInfo.estimated_bytes)}
<br />
<strong>{t('Threshold:')}</strong> {formatBytes(thresholdInfo.bytes_threshold)}
</div>
);
}
if (thresholdInfo.cost_threshold && thresholdInfo.estimated_cost) {
details.push(
<div key="cost">
<strong>{t('Estimated cost:')}</strong> {thresholdInfo.estimated_cost}
<br />
<strong>{t('Cost threshold:')}</strong> {thresholdInfo.cost_threshold}
</div>
);
}
return details.length > 0 ? (
<DetailsSection>
<div style={{ marginBottom: '8px' }}>
<strong>{t('Threshold Details:')}</strong>
</div>
{details.map((detail, index) => (
<div key={index} style={{ marginBottom: index < details.length - 1 ? '8px' : '0' }}>
{detail}
</div>
))}
</DetailsSection>
) : null;
};
return (
<StyledModal
show={visible}
onHide={onHide}
title={
<ModalTitleWithIcon
icon="exclamation-triangle"
title={t('Query Cost Warning')}
/>
}
footer={
<>
<Button onClick={onHide}>
{t('Cancel')}
</Button>
<Button
buttonStyle="primary"
onClick={handleProceed}
disabled={!proceedAnyway}
>
{t('Run Query Anyway')}
</Button>
</>
}
>
<WarningContent>
{warningMessage || t('This query may be expensive to run.')}
</WarningContent>
{renderThresholdDetails()}
<CheckboxWrapper>
<Checkbox
checked={proceedAnyway}
onChange={(e) => setProceedAnyway(e.target.checked)}
>
{t('I understand the cost implications and want to proceed anyway')}
</Checkbox>
</CheckboxWrapper>
</StyledModal>
);
}
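`formatBytes` above chooses the largest 1024-based unit that keeps the value below 1024 and prints one decimal place. As a result, its KB, MB, and later labels are binary multiples. The following is a Python port, which could be used, for example, to produce matching strings in a server-side warning. The name `format_bytes` is hypothetical.

```python
from __future__ import annotations


def format_bytes(num_bytes: float) -> str:
    """Mirror of the modal's formatBytes: 1024-based units, one decimal place."""
    if num_bytes < 1024:
        return f"{num_bytes} B"
    for power, unit in enumerate(("KB", "MB", "GB", "TB"), start=1):
        # Use this unit while the value is still below the next unit's size.
        if num_bytes < 1024 ** (power + 1):
            return f"{num_bytes / 1024 ** power:.1f} {unit}"
    return f"{num_bytes / 1024 ** 5:.1f} PB"


if __name__ == "__main__":
    print(format_bytes(10 * 1024 ** 3))
```

For integer inputs, the port matches the TypeScript output except at exact half-way values. For example, 1280 bytes is exactly 1.25 KB. Python's `.1f` rounds half to even and prints `1.2 KB`, whereas JavaScript's `toFixed(1)` rounds the tie upward and prints `1.3 KB`.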

@@ -71,7 +71,6 @@ import {
addNewQueryEditor,
CtasEnum,
estimateQueryCost,
checkCostThreshold,
persistEditorHeight,
postStopQuery,
queryEditorSetAutorun,
@@ -124,7 +123,6 @@ import SouthPane from '../SouthPane';
import SaveQuery, { QueryPayload } from '../SaveQuery';
import ScheduleQueryButton from '../ScheduleQueryButton';
import EstimateQueryCostButton from '../EstimateQueryCostButton';
import CostWarningModal from '../CostWarningModal';
import ShareSqlLabQuery from '../ShareSqlLabQuery';
import SqlEditorLeftBar from '../SqlEditorLeftBar';
import AceEditorWrapper from '../AceEditorWrapper';
@@ -272,7 +270,6 @@ const SqlEditor: FC<Props> = ({
hideLeftBar,
currentQueryEditorId,
hasSqlStatement,
costThresholdData,
} = useSelector<
SqlLabRootState,
{
@@ -281,9 +278,8 @@ const SqlEditor: FC<Props> = ({
hideLeftBar?: boolean;
currentQueryEditorId: QueryEditor['id'];
hasSqlStatement: boolean;
costThresholdData?: any;
}
>(({ sqlLab: { unsavedQueryEditor, databases, queries, tabHistory, queryCostThresholds } }) => {
>(({ sqlLab: { unsavedQueryEditor, databases, queries, tabHistory } }) => {
let { dbId, latestQueryId, hideLeftBar } = queryEditor;
if (unsavedQueryEditor?.id === queryEditor.id) {
dbId = unsavedQueryEditor.dbId || dbId;
@@ -299,7 +295,6 @@ const SqlEditor: FC<Props> = ({
latestQuery: queries[latestQueryId || ''],
hideLeftBar,
currentQueryEditorId: tabHistory.slice(-1)[0],
costThresholdData: queryCostThresholds[queryEditor.id],
};
}, shallowEqual);
@@ -322,11 +317,6 @@ const SqlEditor: FC<Props> = ({
);
const [showCreateAsModal, setShowCreateAsModal] = useState(false);
const [createAs, setCreateAs] = useState('');
const [showCostWarningModal, setShowCostWarningModal] = useState(false);
const [costWarningData, setCostWarningData] = useState<{
warningMessage: string | null;
thresholdInfo?: any;
} | null>(null);
const currentSQL = useRef<string>(queryEditor.sql);
const showEmptyState = useMemo(
() => !database || isEmpty(database),
@@ -340,69 +330,7 @@ const SqlEditor: FC<Props> = ({
const isTempId = (value: unknown): boolean => Number.isNaN(Number(value));
const checkCostThresholdAndRun = useCallback(
(ctasArg = false, ctas_method = CtasEnum.Table) => {
if (!database) {
return;
}
// Check if cost threshold checking is enabled via feature flag or configuration
// For now, we'll implement the logic directly
dispatch(checkCostThreshold(queryEditor)).then(([_, response]) => {
if (response && response.json) {
const { exceeds_threshold, formatted_warning, threshold_info } = response.json;
if (exceeds_threshold && formatted_warning) {
// Show warning modal
setCostWarningData({
warningMessage: formatted_warning,
thresholdInfo: threshold_info,
});
setShowCostWarningModal(true);
return;
}
}
// If no threshold exceeded or checking failed, proceed with query
dispatch(
runQueryFromSqlEditor(
database,
queryEditor,
defaultQueryLimit,
ctasArg ? ctas : '',
ctasArg,
ctas_method,
),
);
dispatch(setActiveSouthPaneTab('Results'));
}).catch(() => {
// If cost checking fails, proceed with query anyway
dispatch(
runQueryFromSqlEditor(
database,
queryEditor,
defaultQueryLimit,
ctasArg ? ctas : '',
ctasArg,
ctas_method,
),
);
dispatch(setActiveSouthPaneTab('Results'));
});
},
[ctas, database, defaultQueryLimit, dispatch, queryEditor],
);
const startQuery = useCallback(
(ctasArg = false, ctas_method = CtasEnum.Table) => {
// Use cost threshold checking for regular queries
checkCostThresholdAndRun(ctasArg, ctas_method);
},
[checkCostThresholdAndRun],
);
// Direct query execution without cost checking (for modal "proceed anyway")
const executeQueryDirectly = useCallback(
(ctasArg = false, ctas_method = CtasEnum.Table) => {
if (!database) {
return;
@@ -1193,20 +1121,6 @@ const SqlEditor: FC<Props> = ({
<span>{t('Name')}</span>
<Input placeholder={createModalPlaceHolder} onChange={ctasChanged} />
</Modal>
<CostWarningModal
visible={showCostWarningModal}
onHide={() => {
setShowCostWarningModal(false);
setCostWarningData(null);
}}
onProceed={() => {
setShowCostWarningModal(false);
setCostWarningData(null);
executeQueryDirectly();
}}
warningMessage={costWarningData?.warningMessage || null}
thresholdInfo={costWarningData?.thresholdInfo}
/>
</StyledSqlEditor>
);
};

@@ -264,7 +264,6 @@ export default function getInitialState({
queriesLastUpdate: Date.now(),
editorTabLastUpdatedAt,
queryCostEstimates: {},
queryCostThresholds: {},
unsavedQueryEditor,
lastUpdatedActiveTab,
destroyedQueryEditors,

@@ -315,51 +315,6 @@ export default function sqlLabReducer(state = {}, action) {
},
};
},
[actions.COST_THRESHOLD_CHECK_STARTED]() {
return {
...state,
queryCostThresholds: {
...state.queryCostThresholds,
[action.query.id]: {
completed: false,
exceedsThreshold: false,
thresholdInfo: null,
formattedWarning: null,
error: null,
},
},
};
},
[actions.COST_THRESHOLD_CHECK_RETURNED]() {
return {
...state,
queryCostThresholds: {
...state.queryCostThresholds,
[action.query.id]: {
completed: true,
exceedsThreshold: action.json.exceeds_threshold,
thresholdInfo: action.json.threshold_info,
formattedWarning: action.json.formatted_warning,
error: null,
},
},
};
},
[actions.COST_THRESHOLD_CHECK_FAILED]() {
return {
...state,
queryCostThresholds: {
...state.queryCostThresholds,
[action.query.id]: {
completed: false,
exceedsThreshold: false,
thresholdInfo: null,
formattedWarning: null,
error: action.error,
},
},
};
},
[actions.START_QUERY]() {
let newState = { ...state };
if (action.query.sqlEditorId) {

@@ -41,6 +41,9 @@ import {
import TableChartPlugin from '../../../../../plugins/plugin-chart-table/src';
import VizTypeControl, { VIZ_TYPE_CONTROL_TEST_ID } from './index';
// Mock scrollIntoView to avoid errors in test environment
jest.mock('scroll-into-view-if-needed', () => jest.fn());
jest.useFakeTimers();
class MainPreset extends Preset {
@@ -256,4 +259,22 @@ describe('VizTypeControl', () => {
expect(defaultProps.onChange).toHaveBeenCalledWith(VizType.Line);
});
it('Search input is focused when modal opens', async () => {
// Mock the focus method to track if it was called
const focusSpy = jest.fn();
const originalFocus = HTMLInputElement.prototype.focus;
HTMLInputElement.prototype.focus = focusSpy;
await waitForRenderWrapper();
const searchInput = screen.getByTestId(getTestId('search-input'));
// Verify that focus() was called on the search input
expect(focusSpy).toHaveBeenCalled();
expect(searchInput).toBeInTheDocument();
// Restore the original focus method
HTMLInputElement.prototype.focus = originalFocus;
});
});

@@ -575,6 +575,13 @@ export default function VizTypeGallery(props: VizTypeGalleryProps) {
setIsSearchFocused(true);
}, []);
// Auto-focus the search input when the modal opens
useEffect(() => {
if (searchInputRef.current) {
searchInputRef.current.focus();
}
}, []);
const changeSearch: ChangeEventHandler<HTMLInputElement> = useCallback(
event => setSearchInputValue(event.target.value),
[],

@@ -199,6 +199,11 @@ def load_data(data_uri: str, dataset: SqlaTable, database: Database) -> None:
:raises DatasetUnAllowedDataURI: If a dataset is trying
to load data from a URI that is not allowed.
"""
from superset.examples.helpers import normalize_example_data_url
# Convert example URLs to align with configuration
data_uri = normalize_example_data_url(data_uri)
validate_data_uri(data_uri)
logger.info("Downloading data from %s", data_uri)
data = request.urlopen(data_uri) # pylint: disable=consider-using-with # noqa: S310
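In `load_data`, the URI is normalized *before* `validate_data_uri` runs, so whatever the rewrite produces must itself pass the allowed-URI check. The body of `superset.examples.helpers.normalize_example_data_url` is not part of this diff. The following is a minimal sketch of one way such a helper could work. It assumes a configurable mirror base URL and a list of known source prefixes; both are hypothetical, as are the placeholder URLs.

```python
from __future__ import annotations

from collections.abc import Iterable

# Placeholder prefixes for illustration only; not Superset's real configuration.
DEFAULT_SOURCE_PREFIXES = (
    "https://github.com/example-org/examples-data/raw/master/",
    "https://raw.githubusercontent.com/example-org/examples-data/master/",
)


def normalize_example_url(
    uri: str,
    mirror_base: str | None,
    source_prefixes: Iterable[str] = DEFAULT_SOURCE_PREFIXES,
) -> str:
    """Rewrite a known example-data URL onto a mirror, keeping the relative path."""
    if not mirror_base:
        return uri  # no mirror configured: leave the URL untouched
    for prefix in source_prefixes:
        if uri.startswith(prefix):
            return f"{mirror_base.rstrip('/')}/{uri[len(prefix):]}"
    return uri  # unknown URL: leave the decision to validate_data_uri
```

In this sketch, unrecognized URLs pass through unchanged rather than raising. That keeps a single place, `validate_data_uri`, responsible for deciding which URIs are allowed.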

@@ -190,6 +190,12 @@ def load_configs(
db_ssh_tunnel_priv_key_passws[config["uuid"]]
)
# Normalize example data URLs before schema validation
if prefix == "datasets" and "data" in config:
from superset.examples.helpers import normalize_example_data_url
config["data"] = normalize_example_data_url(config["data"])
schema.load(config)
configs[file_name] = config
except ValidationError as exc:

@@ -1,230 +0,0 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
from __future__ import annotations
import logging
from typing import Any, TypedDict
from superset import app
from superset.commands.base import BaseCommand
from superset.commands.sql_lab.estimate import QueryEstimationCommand, EstimateQueryCostType
config = app.config
logger = logging.getLogger(__name__)
class CostThresholdResult(TypedDict):
exceeds_threshold: bool
estimated_cost: list[dict[str, Any]]
threshold_info: dict[str, Any]
formatted_warning: str | None
class QueryCostThresholdCheckCommand(BaseCommand):
"""
Command to check if a query's estimated cost exceeds configured thresholds.
"""
_estimation_command: QueryEstimationCommand
def __init__(self, estimation_params: EstimateQueryCostType) -> None:
self._estimation_command = QueryEstimationCommand(estimation_params)
def validate(self) -> None:
# Use the estimation command's validation
self._estimation_command.validate()
def run(self) -> CostThresholdResult:
"""
Check if query cost exceeds thresholds.
Returns a result indicating whether the query exceeds cost thresholds
and provides information for user warnings.
"""
self.validate()
# Check if cost checking is enabled
if not config.get("SQLLAB_QUERY_COST_CHECKING_ENABLED", False):
return self._create_empty_result()
estimated_cost = self._get_estimated_cost()
if not estimated_cost:
return self._create_empty_result()
thresholds = self._get_engine_thresholds()
if not thresholds:
return CostThresholdResult(
exceeds_threshold=False,
estimated_cost=estimated_cost,
threshold_info={},
formatted_warning=None,
)
return self._check_thresholds(estimated_cost, thresholds)
def _create_empty_result(self) -> CostThresholdResult:
"""Create an empty result when cost checking is disabled or fails."""
return CostThresholdResult(
exceeds_threshold=False,
estimated_cost=[],
threshold_info={},
formatted_warning=None,
)
def _get_estimated_cost(self) -> list[dict[str, Any]] | None:
"""Get cost estimation, returning None if it fails."""
try:
return self._estimation_command.run()
except Exception as ex:
logger.warning("Cost estimation failed: %s", str(ex))
return None
def _get_engine_thresholds(self) -> dict[str, Any]:
"""Get thresholds for the current database engine."""
database = self._estimation_command._database
engine_name = database.db_engine_spec.engine_name
if engine_name is None:
return {}
engine_name = engine_name.lower()
return config.get("SQLLAB_QUERY_COST_THRESHOLDS", {}).get(engine_name, {})
def _check_thresholds(
self, estimated_cost: list[dict[str, Any]], thresholds: dict[str, Any]
) -> CostThresholdResult:
"""Check if estimated cost exceeds configured thresholds."""
exceeds_threshold = False
warning_messages = []
threshold_info = {}
for cost_item in estimated_cost:
if self._check_bytes_threshold(cost_item, thresholds, threshold_info, warning_messages):
exceeds_threshold = True
if self._check_cost_threshold(cost_item, thresholds, threshold_info, warning_messages):
exceeds_threshold = True
formatted_warning = None
if warning_messages:
formatted_warning = (
" ".join(warning_messages) + " Are you sure you want to continue?"
)
return CostThresholdResult(
exceeds_threshold=exceeds_threshold,
estimated_cost=estimated_cost,
threshold_info=threshold_info,
formatted_warning=formatted_warning,
)
def _check_bytes_threshold(
self,
cost_item: dict[str, Any],
thresholds: dict[str, Any],
threshold_info: dict[str, Any],
warning_messages: list[str]
) -> bool:
"""Check bytes scanned threshold. Returns True if threshold exceeded."""
if "bytes_scanned" not in thresholds or "Bytes Scanned" not in cost_item:
return False
try:
bytes_scanned = self._parse_bytes_from_cost_item(cost_item["Bytes Scanned"])
threshold_bytes = thresholds["bytes_scanned"]
threshold_info["bytes_threshold"] = threshold_bytes
threshold_info["estimated_bytes"] = bytes_scanned
if bytes_scanned > threshold_bytes:
warning_messages.append(
f"This query will scan approximately {self._format_bytes(bytes_scanned)} "
f"of data, which exceeds the threshold of {self._format_bytes(threshold_bytes)}."
)
return True
except (ValueError, KeyError) as ex:
logger.warning("Failed to parse bytes from cost estimation: %s", str(ex))
return False
def _check_cost_threshold(
self,
cost_item: dict[str, Any],
thresholds: dict[str, Any],
threshold_info: dict[str, Any],
warning_messages: list[str]
) -> bool:
"""Check cost threshold. Returns True if threshold exceeded."""
if "cost_threshold" not in thresholds or "Cost" not in cost_item:
return False
try:
cost_value = float(cost_item["Cost"])
threshold_cost = thresholds["cost_threshold"]
threshold_info["cost_threshold"] = threshold_cost
threshold_info["estimated_cost"] = cost_value
if cost_value > threshold_cost:
warning_messages.append(
f"This query has an estimated cost of {cost_value}, "
f"which exceeds the threshold of {threshold_cost}."
)
return True
except (ValueError, KeyError) as ex:
logger.warning("Failed to parse cost from cost estimation: %s", str(ex))
return False
def _parse_bytes_from_cost_item(self, bytes_str: str) -> int:
"""Parse bytes from formatted string like '5.2 GB' or '1024 MB'."""
if not isinstance(bytes_str, str):
return int(bytes_str)
# Remove commas and split
parts = bytes_str.replace(",", "").strip().split()
if len(parts) != 2:
raise ValueError(f"Cannot parse bytes from: {bytes_str}")
value_str, unit = parts
value = float(value_str)
unit = unit.upper()
multipliers = {
"B": 1,
"KB": 1024,
"MB": 1024**2,
"GB": 1024**3,
"TB": 1024**4,
"PB": 1024**5,
}
if unit not in multipliers:
raise ValueError(f"Unknown unit: {unit}")
return int(value * multipliers[unit])
def _format_bytes(self, bytes_count: int) -> str:
"""Format bytes into human-readable string."""
if bytes_count < 1024:
return f"{bytes_count} B"
elif bytes_count < 1024**2:
return f"{bytes_count / 1024:.1f} KB"
elif bytes_count < 1024**3:
return f"{bytes_count / (1024**2):.1f} MB"
elif bytes_count < 1024**4:
return f"{bytes_count / (1024**3):.1f} GB"
elif bytes_count < 1024**5:
return f"{bytes_count / (1024**4):.1f} TB"
else:
return f"{bytes_count / (1024**5):.1f} PB"
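Stripped of the command plumbing, the threshold check above reduces to parsing the estimator's human-readable byte string and comparing it against a per-engine limit. The following is a minimal standalone sketch of that comparison, not Superset's API: `parse_bytes` and `check_cost_item` are hypothetical names, and cost items are assumed to look like `{"Bytes Scanned": "6 TB", "Cost": "12.5"}`, matching the keys the command reads.

```python
from typing import Any

# 1024-based multipliers, as in _parse_bytes_from_cost_item above
MULTIPLIERS = {
    "B": 1,
    "KB": 1024,
    "MB": 1024**2,
    "GB": 1024**3,
    "TB": 1024**4,
    "PB": 1024**5,
}


def parse_bytes(value: Any) -> int:
    """Parse '5.2 GB' or '1,024 MB' style strings (or raw numbers) into bytes."""
    if not isinstance(value, str):
        return int(value)
    parts = value.replace(",", "").strip().split()
    if len(parts) != 2:
        raise ValueError(f"Cannot parse bytes from: {value}")
    number, unit = parts
    unit = unit.upper()
    if unit not in MULTIPLIERS:
        raise ValueError(f"Unknown unit: {unit}")
    return int(float(number) * MULTIPLIERS[unit])


def check_cost_item(
    cost_item: dict[str, Any],
    all_thresholds: dict[str, dict[str, Any]],
    engine_name: str,
) -> list[str]:
    """Return one warning message per threshold the cost item exceeds."""
    # Thresholds are keyed by lower-cased engine name, like _get_engine_thresholds
    thresholds = all_thresholds.get(engine_name.lower(), {})
    warnings: list[str] = []
    if "bytes_scanned" in thresholds and "Bytes Scanned" in cost_item:
        scanned = parse_bytes(cost_item["Bytes Scanned"])
        if scanned > thresholds["bytes_scanned"]:
            warnings.append(f"scans {scanned} bytes > {thresholds['bytes_scanned']}")
    if "cost_threshold" in thresholds and "Cost" in cost_item:
        cost = float(cost_item["Cost"])
        if cost > thresholds["cost_threshold"]:
            warnings.append(f"cost {cost} > {thresholds['cost_threshold']}")
    return warnings
```

With thresholds `{"bigquery": {"bytes_scanned": 5 * 1024**4}}`, calling `check_cost_item({"Bytes Scanned": "6 TB"}, thresholds, "BigQuery")` yields one warning, since the engine name is lower-cased before the lookup.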

@@ -1191,18 +1191,6 @@ SQLLAB_ASYNC_TIME_LIMIT_SEC = int(timedelta(hours=6).total_seconds())
# timeout.
SQLLAB_QUERY_COST_ESTIMATE_TIMEOUT = int(timedelta(seconds=10).total_seconds())
# Query cost governance configuration
# Enable automatic cost checking before query execution
SQLLAB_QUERY_COST_CHECKING_ENABLED = False
# Cost thresholds that trigger warnings before query execution
# This is a dictionary where keys are database engine names and values are threshold configs
# Each threshold config can contain:
# - 'bytes_scanned': maximum bytes that can be scanned without warning
# - 'cost_threshold': monetary cost threshold (engine-specific units)
# Example: {'bigquery': {'bytes_scanned': 5 * 1024**4}, 'presto': {'cost_threshold': 1000}}
SQLLAB_QUERY_COST_THRESHOLDS = {}
# Timeout duration for SQL Lab fetching query results by the resultsKey.
# 0 means no timeout.
SQLLAB_QUERY_RESULT_TIMEOUT = 0

@@ -1368,10 +1368,23 @@ class SqlaTable(
return get_template_processor(table=self, database=self.database, **kwargs)
def get_sqla_table(self) -> TableClause:
tbl = table(self.table_name)
# For databases that support cross-catalog queries (like BigQuery),
# include the catalog in the table identifier to generate
# project.dataset.table format
if self.catalog and self.database.db_engine_spec.supports_cross_catalog_queries:
# SQLAlchemy doesn't have built-in catalog support for TableClause,
# so we need to construct the full identifier manually
if self.schema:
full_name = f"{self.catalog}.{self.schema}.{self.table_name}"
else:
full_name = f"{self.catalog}.{self.table_name}"
return table(full_name)
if self.schema:
tbl.schema = self.schema
return tbl
return table(self.table_name, schema=self.schema)
return table(self.table_name)
def get_from_clause(
self,
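The catalog handling added to `get_sqla_table` boils down to a pure function from (catalog, schema, table) to the name and schema of the resulting `TableClause`. A minimal sketch, assuming the behavior encoded in `test_get_sqla_table_with_catalog` below; `qualified_table_parts` is a hypothetical helper, not part of Superset.

```python
from typing import Optional


def qualified_table_parts(
    table_name: str,
    catalog: Optional[str],
    schema: Optional[str],
    supports_cross_catalog: bool,
) -> tuple[str, Optional[str]]:
    """Return (name, schema) for a TableClause, folding the catalog into the name.

    When the engine supports cross-catalog queries and a catalog is set, the
    catalog (and the schema, if any) are joined into one dotted name and the
    separate schema is dropped.
    """
    if catalog and supports_cross_catalog:
        parts = [catalog, schema, table_name] if schema else [catalog, table_name]
        return ".".join(parts), None
    # Otherwise the catalog is ignored and the schema stays separate
    return table_name, schema
```

Because the dotted string becomes a single identifier, how it is quoted is left to the dialect's identifier preparer; that is worth checking for engines other than BigQuery.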

@@ -38,7 +38,7 @@ def load_bart_lines(only_metadata: bool = False, force: bool = False) -> None:
if not only_metadata and (not table_exists or force):
df = read_example_data(
"bart-lines.json.gz", encoding="latin-1", compression="gzip"
"examples://bart-lines.json.gz", encoding="latin-1", compression="gzip"
)
df["path_json"] = df.path.map(json.dumps)
df["polyline"] = df.path.map(polyline.encode)

@@ -57,7 +57,7 @@ def gen_filter(
def load_data(tbl_name: str, database: Database, sample: bool = False) -> None:
pdf = read_example_data("birth_names2.json.gz", compression="gzip")
pdf = read_example_data("examples://birth_names2.json.gz", compression="gzip")
# TODO(bkyryliuk): move load examples data into the pytest fixture
if database.backend == "presto":
@@ -584,8 +584,8 @@ def create_dashboard(slices: list[Slice]) -> Dashboard:
}
}"""
)
# pylint: disable=echarts_timeseries_line-too-long
pos = json.loads(
# pylint: disable=line-too-long
pos = json.loads( # noqa: TID251
textwrap.dedent(
"""\
{
@@ -859,11 +859,11 @@ def create_dashboard(slices: list[Slice]) -> Dashboard:
""" # noqa: E501
)
)
# pylint: enable=echarts_timeseries_line-too-long
# pylint: enable=line-too-long
# dashboard v2 doesn't allow add markup slice
dash.slices = [slc for slc in slices if slc.viz_type != "markup"]
update_slice_ids(pos)
dash.dashboard_title = "USA Births Names"
dash.position_json = json.dumps(pos, indent=4)
dash.position_json = json.dumps(pos, indent=4) # noqa: TID251
dash.slug = "births"
return dash

@@ -1490,4 +1490,4 @@ columns:
python_date_format: null
version: 1.0.0
database_uuid: a2dc77af-e654-49bb-b321-40f6b559a1ee
data: https://github.com/apache-superset/examples-data/raw/master/datasets/examples/fcc_survey_2018.csv.gz
data: examples://datasets/examples/fcc_survey_2018.csv.gz

@@ -60,4 +60,4 @@ columns:
python_date_format: null
version: 1.0.0
database_uuid: a2dc77af-e654-49bb-b321-40f6b559a1ee
data: https://raw.githubusercontent.com/apache-superset/examples-data/master/datasets/examples/slack/channel_members.csv
data: examples://datasets/examples/slack/channel_members.csv

@@ -360,4 +360,4 @@ columns:
python_date_format: null
version: 1.0.0
database_uuid: a2dc77af-e654-49bb-b321-40f6b559a1ee
data: https://raw.githubusercontent.com/apache-superset/examples-data/master/datasets/examples/slack/channels.csv
data: examples://datasets/examples/slack/channels.csv

@@ -344,4 +344,4 @@ columns:
extra: null
version: 1.0.0
database_uuid: a2dc77af-e654-49bb-b321-40f6b559a1ee
data: https://raw.githubusercontent.com/apache-superset/examples-data/lowercase_columns_examples/datasets/examples/sales.csv
data: examples://datasets/examples/sales.csv

@@ -204,4 +204,4 @@ columns:
python_date_format: null
version: 1.0.0
database_uuid: a2dc77af-e654-49bb-b321-40f6b559a1ee
data: https://raw.githubusercontent.com/apache-superset/examples-data/lowercase_columns_examples/datasets/examples/covid_vaccines.csv
data: examples://datasets/examples/covid_vaccines.csv

@@ -260,4 +260,4 @@ columns:
python_date_format: null
version: 1.0.0
database_uuid: a2dc77af-e654-49bb-b321-40f6b559a1ee
data: https://raw.githubusercontent.com/apache-superset/examples-data/master/datasets/examples/slack/exported_stats.csv
data: examples://datasets/examples/slack/exported_stats.csv

@@ -480,4 +480,4 @@ columns:
python_date_format: null
version: 1.0.0
database_uuid: a2dc77af-e654-49bb-b321-40f6b559a1ee
data: https://raw.githubusercontent.com/apache-superset/examples-data/master/datasets/examples/slack/messages.csv
data: examples://datasets/examples/slack/messages.csv

@@ -180,4 +180,4 @@ columns:
python_date_format: null
version: 1.0.0
database_uuid: a2dc77af-e654-49bb-b321-40f6b559a1ee
data: https://raw.githubusercontent.com/apache-superset/examples-data/master/datasets/examples/slack/threads.csv
data: examples://datasets/examples/slack/threads.csv

@@ -90,4 +90,4 @@ columns:
python_date_format: null
version: 1.0.0
database_uuid: a2dc77af-e654-49bb-b321-40f6b559a1ee
data: https://raw.githubusercontent.com/apache-superset/examples-data/master/datasets/examples/unicode_test.csv
data: examples://datasets/examples/unicode_test.csv

@@ -220,4 +220,4 @@ columns:
python_date_format: null
version: 1.0.0
database_uuid: a2dc77af-e654-49bb-b321-40f6b559a1ee
data: https://raw.githubusercontent.com/apache-superset/examples-data/master/datasets/examples/slack/users.csv
data: examples://datasets/examples/slack/users.csv

@@ -60,4 +60,4 @@ columns:
python_date_format: null
version: 1.0.0
database_uuid: a2dc77af-e654-49bb-b321-40f6b559a1ee
data: https://raw.githubusercontent.com/apache-superset/examples-data/master/datasets/examples/slack/users_channels.csv
data: examples://datasets/examples/slack/users_channels.csv

@@ -153,4 +153,4 @@ columns:
python_date_format: null
version: 1.0.0
database_uuid: a2dc77af-e654-49bb-b321-40f6b559a1ee
data: https://github.com/apache-superset/examples-data/raw/lowercase_columns_examples/datasets/examples/video_game_sales.csv
data: examples://datasets/examples/video_game_sales.csv

@@ -49,7 +49,7 @@ def load_country_map_data(only_metadata: bool = False, force: bool = False) -> N
if not only_metadata and (not table_exists or force):
data = read_example_data(
"birth_france_data_for_country_map.csv", encoding="utf-8"
"examples://birth_france_data_for_country_map.csv", encoding="utf-8"
)
data["dttm"] = datetime.datetime.now().date()
data.to_sql(

@@ -50,7 +50,7 @@ def load_energy(
table_exists = database.has_table(Table(tbl_name, schema))
if not only_metadata and (not table_exists or force):
pdf = read_example_data("energy.json.gz", compression="gzip")
pdf = read_example_data("examples://energy.json.gz", compression="gzip")
pdf = pdf.head(100) if sample else pdf
pdf.to_sql(
tbl_name,

@@ -38,12 +38,12 @@ def load_flights(only_metadata: bool = False, force: bool = False) -> None:
if not only_metadata and (not table_exists or force):
pdf = read_example_data(
"flight_data.csv.gz", encoding="latin-1", compression="gzip"
"examples://flight_data.csv.gz", encoding="latin-1", compression="gzip"
)
# Loading airports info to join and get lat/long
airports = read_example_data(
"airports.csv.gz", encoding="latin-1", compression="gzip"
"examples://airports.csv.gz", encoding="latin-1", compression="gzip"
)
airports = airports.set_index("IATA_CODE")

@@ -54,6 +54,8 @@ from superset.connectors.sqla.models import SqlaTable
from superset.models.slice import Slice
from superset.utils import json
EXAMPLES_PROTOCOL = "examples://"
# ---------------------------------------------------------------------------
# Public sampledata mirror configuration
# ---------------------------------------------------------------------------
@@ -125,6 +127,20 @@ def get_example_url(filepath: str) -> str:
return f"{BASE_URL}{filepath}"
def normalize_example_data_url(url: str) -> str:
"""Convert example data URLs to use the configured CDN.
Transforms examples:// URLs to the configured CDN URL.
Non-example URLs are returned unchanged.
"""
if url.startswith(EXAMPLES_PROTOCOL):
relative_path = url[len(EXAMPLES_PROTOCOL) :]
return get_example_url(relative_path)
# Not an examples URL, return unchanged
return url
def read_example_data(
filepath: str,
max_attempts: int = 5,
@@ -132,9 +148,7 @@ def read_example_data(
**kwargs: Any,
) -> pd.DataFrame:
"""Load CSV or JSON from example data mirror with retry/backoff."""
from superset.examples.helpers import get_example_url
url = get_example_url(filepath)
url = normalize_example_data_url(filepath)
is_json = filepath.endswith(".json") or filepath.endswith(".json.gz")
for attempt in range(1, max_attempts + 1):
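Taken together, the helpers above resolve an `examples://` URL against the mirror base URL and then download with retries. A stdlib-only sketch of that flow, assuming a hypothetical `BASE_URL` and an injected `fetch` callable so it can run offline; the exponential backoff schedule is also an assumption, since the body of the retry loop in `read_example_data` is not shown in this hunk. Unlike the real helper, this `normalize_example_data_url` takes `base_url` as a parameter instead of calling `get_example_url`.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

EXAMPLES_PROTOCOL = "examples://"
# Hypothetical mirror root; Superset derives the real one from its configuration
BASE_URL = "https://example.invalid/examples-data/"


def normalize_example_data_url(url: str, base_url: str = BASE_URL) -> str:
    """Rewrite examples://<path> to <base_url><path>; leave other URLs alone."""
    if url.startswith(EXAMPLES_PROTOCOL):
        return base_url + url[len(EXAMPLES_PROTOCOL):]
    return url


def fetch_with_retry(
    fetch: Callable[[str], T],
    url: str,
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch(url), retrying with exponential backoff on any exception."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch(url)
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Wait base_delay, 2*base_delay, 4*base_delay, ... between attempts
            sleep(base_delay * 2 ** (attempt - 1))
    raise RuntimeError("unreachable")  # the loop always returns or raises
```

Injecting `sleep` keeps the backoff testable without real delays; in production the default `time.sleep` applies.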

@@ -48,7 +48,7 @@ def load_long_lat_data(only_metadata: bool = False, force: bool = False) -> None
if not only_metadata and (not table_exists or force):
pdf = read_example_data(
"san_francisco.csv.gz", encoding="utf-8", compression="gzip"
"examples://san_francisco.csv.gz", encoding="utf-8", compression="gzip"
)
start = datetime.datetime.now().replace(
hour=0, minute=0, second=0, microsecond=0

@@ -49,7 +49,7 @@ def load_multiformat_time_series( # pylint: disable=too-many-locals
if not only_metadata and (not table_exists or force):
pdf = read_example_data(
"multiformat_time_series.json.gz", compression="gzip"
"examples://multiformat_time_series.json.gz", compression="gzip"
)
# TODO(bkyryliuk): move load examples data into the pytest fixture

@@ -37,7 +37,7 @@ def load_paris_iris_geojson(only_metadata: bool = False, force: bool = False) ->
table_exists = database.has_table(Table(tbl_name, schema))
if not only_metadata and (not table_exists or force):
df = read_example_data("paris_iris.json.gz", compression="gzip")
df = read_example_data("examples://paris_iris.json.gz", compression="gzip")
df["features"] = df.features.map(json.dumps)
df.to_sql(

@@ -46,7 +46,9 @@ def load_random_time_series_data(
table_exists = database.has_table(Table(tbl_name, schema))
if not only_metadata and (not table_exists or force):
pdf = read_example_data("random_time_series.json.gz", compression="gzip")
pdf = read_example_data(
"examples://random_time_series.json.gz", compression="gzip"
)
if database.backend == "presto":
pdf.ds = pd.to_datetime(pdf.ds, unit="s")
pdf.ds = pdf.ds.dt.strftime("%Y-%m-%d %H:%M%:%S")

@@ -39,7 +39,9 @@ def load_sf_population_polygons(
table_exists = database.has_table(Table(tbl_name, schema))
if not only_metadata and (not table_exists or force):
df = read_example_data("sf_population.json.gz", compression="gzip")
df = read_example_data(
"examples://sf_population.json.gz", compression="gzip"
)
df["contour"] = df.contour.map(json.dumps)
df.to_sql(

@@ -55,7 +55,7 @@ def load_world_bank_health_n_pop( # pylint: disable=too-many-locals
table_exists = database.has_table(Table(tbl_name, schema))
if not only_metadata and (not table_exists or force):
pdf = read_example_data("countries.json.gz", compression="gzip")
pdf = read_example_data("examples://countries.json.gz", compression="gzip")
pdf.columns = [col.replace(".", "_") for col in pdf.columns]
if database.backend == "presto":
pdf.year = pd.to_datetime(pdf.year)

@@ -34,6 +34,7 @@ from flask_appbuilder.utils.base import get_safe_redirect
from flask_babel import lazy_gettext as _, refresh
from flask_compress import Compress
from flask_session import Session
from sqlalchemy import inspect
from werkzeug.middleware.proxy_fix import ProxyFix
from superset.constants import CHANGE_ME_SECRET_KEY
@@ -470,6 +471,31 @@ class SupersetAppInitializer: # pylint: disable=too-many-public-methods
icon="fa-lock",
)
def _init_database_dependent_features(self) -> None:
"""
Initialize features that require database tables to exist.
This is called during app initialization but checks table existence
to handle cases where the app starts before database migration.
"""
inspector = inspect(db.engine)
# Check if core tables exist (use 'dashboards' as proxy for Superset tables)
if not inspector.has_table("dashboards"):
logger.debug(
"Superset tables not yet created. Skipping database-dependent "
"initialization. These features will be initialized after migration."
)
return
# Register SQLA event listeners for tagging system
if feature_flag_manager.is_feature_enabled("TAGGING_SYSTEM"):
register_sqla_event_listeners()
# Seed system themes from configuration
from superset.commands.theme.seed import SeedSystemThemesCommand
SeedSystemThemesCommand().run()
def init_app_in_ctx(self) -> None:
"""
Runs init logic in the context of the app
@@ -487,16 +513,8 @@ class SupersetAppInitializer: # pylint: disable=too-many-public-methods
if flask_app_mutator := self.config["FLASK_APP_MUTATOR"]:
flask_app_mutator(self.superset_app)
if feature_flag_manager.is_feature_enabled("TAGGING_SYSTEM"):
register_sqla_event_listeners()
# Seed system themes from configuration
try:
from superset.commands.theme.seed import SeedSystemThemesCommand
SeedSystemThemesCommand().run()
except Exception:
logger.exception("Failed to seed system themes")
# Initialize database-dependent features only if database is ready
self._init_database_dependent_features()
self.init_views()
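The guard in `_init_database_dependent_features` follows a general pattern: probe for a sentinel table before running startup work that needs the schema. A stdlib-only sketch of the same idea, using `sqlite3` and its `sqlite_master` catalog in place of SQLAlchemy's `inspect(db.engine).has_table(...)`; `has_table` and `init_if_migrated` here are illustrative names, not Superset functions.

```python
import logging
import sqlite3
from typing import Callable

logger = logging.getLogger(__name__)


def has_table(conn: sqlite3.Connection, name: str) -> bool:
    """SQLite-specific existence check against the sqlite_master catalog."""
    row = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?", (name,)
    ).fetchone()
    return row is not None


def init_if_migrated(
    conn: sqlite3.Connection,
    init_steps: list[Callable[[], None]],
    sentinel_table: str = "dashboards",
) -> bool:
    """Run init_steps only if the sentinel table exists; return whether they ran."""
    if not has_table(conn, sentinel_table):
        logger.debug("Tables not created yet; deferring database-dependent init.")
        return False
    for step in init_steps:
        step()
    return True
```

One behavioral difference visible in the diff: the new method calls `SeedSystemThemesCommand().run()` without the `try/except` that previously wrapped it in `init_app_in_ctx`, so a seeding failure now appears to propagate rather than being logged.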

@@ -25,9 +25,6 @@ from flask_appbuilder.models.sqla.interface import SQLAInterface
from marshmallow import ValidationError
from superset import app, is_feature_enabled
from superset.commands.sql_lab.check_cost_threshold import (
QueryCostThresholdCheckCommand,
)
from superset.commands.sql_lab.estimate import QueryEstimationCommand
from superset.commands.sql_lab.execute import CommandResult, ExecuteSqlCommand
from superset.commands.sql_lab.export import SqlResultExportCommand
@@ -191,66 +188,6 @@ class SqlLabRestApi(BaseSupersetApi):
result = command.run()
return self.response(200, result=result)
@expose("/check_cost_threshold/", methods=("POST",))
@protect()
@statsd_metrics
@requires_json
@event_logger.log_this_with_context(
action=lambda self, *args, **kwargs: f"{self.__class__.__name__}"
f".check_cost_threshold",
log_to_statsd=False,
)
def check_cost_threshold(self) -> Response:
"""Check if query cost exceeds configured thresholds.
---
post:
summary: Check if query cost exceeds thresholds
requestBody:
description: SQL query and params
required: true
content:
application/json:
schema:
$ref: '#/components/schemas/EstimateQueryCostSchema'
responses:
200:
description: Cost threshold check result
content:
application/json:
schema:
type: object
properties:
exceeds_threshold:
type: boolean
description: Whether query exceeds cost thresholds
estimated_cost:
type: array
description: Detailed cost estimation
threshold_info:
type: object
description: Information about thresholds and estimates
formatted_warning:
type: string
nullable: true
description: Human-readable warning message
400:
$ref: '#/components/responses/400'
401:
$ref: '#/components/responses/401'
403:
$ref: '#/components/responses/403'
500:
$ref: '#/components/responses/500'
"""
try:
model = self.estimate_model_schema.load(request.json)
except ValidationError as error:
return self.response_400(message=error.messages)
command = QueryCostThresholdCheckCommand(model)
result = command.run()
return self.response(200, **result)
@expose("/format_sql/", methods=("POST",))
@statsd_metrics
@protect()

@@ -605,3 +605,94 @@ def test_fetch_metadata_empty_comment_field_handling(mocker: MockerFixture) -> N
# Valid comment should be set
assert columns_by_name["col_with_valid_comment"].description == "Valid comment"
@pytest.mark.parametrize(
"supports_cross_catalog,table_name,catalog,schema,expected_name,expected_schema",
[
# Database supports cross-catalog queries (like BigQuery)
(
True,
"test_table",
"test_project",
"test_dataset",
"test_project.test_dataset.test_table",
None,
),
# Database supports cross-catalog queries, catalog only (no schema)
(
True,
"test_table",
"test_project",
None,
"test_project.test_table",
None,
),
# Database supports cross-catalog queries, schema only (no catalog)
(
True,
"test_table",
None,
"test_schema",
"test_table",
"test_schema",
),
# Database supports cross-catalog queries, no catalog or schema
(
True,
"test_table",
None,
None,
"test_table",
None,
),
# Database doesn't support cross-catalog queries, catalog ignored
(
False,
"test_table",
"test_catalog",
"test_schema",
"test_table",
"test_schema",
),
# Database doesn't support cross-catalog queries, no schema
(
False,
"test_table",
"test_catalog",
None,
"test_table",
None,
),
],
)
def test_get_sqla_table_with_catalog(
mocker: MockerFixture,
supports_cross_catalog: bool,
table_name: str,
catalog: str | None,
schema: str | None,
expected_name: str,
expected_schema: str | None,
) -> None:
"""Test that get_sqla_table handles catalog inclusion correctly based on
database cross-catalog support
"""
# Mock database with specified cross-catalog support
database = mocker.MagicMock()
database.db_engine_spec.supports_cross_catalog_queries = supports_cross_catalog
# Create table with specified parameters
table = SqlaTable(
table_name=table_name,
database=database,
schema=schema,
catalog=catalog,
)
# Get the SQLAlchemy table representation
sqla_table = table.get_sqla_table()
# Verify expected table name and schema
assert sqla_table.name == expected_name
assert sqla_table.schema == expected_schema