address review: only skip first docstring in static_return_bool

A later bare string literal in a function body is not a docstring and should count as non-trivial logic. Track whether the docstring has been skipped so subsequent string-constant expression statements fall through to the conservative 'other_logic' branch.
address review: check has_implicit_cancel return value
2026-05-01 05:54:26 +00:00 · 2026-04-27 08:43:18 -07:00 · 2026-04-27 08:43:18 -07:00 · 2026-04-27 08:43:18 -07:00 · 2026-04-27 08:43:18 -07:00 · 2026-04-27 08:43:18 -07:00
3 changed files with 7302 additions and 3562 deletions
--- a/docs/scripts/generate-database-docs.mjs
+++ b/docs/scripts/generate-database-docs.mjs
@@ -141,6 +141,47 @@ def eval_node(node):
        return "<f-string>"
    return None

+def static_return_bool(func_node):
+    """
+    Statically resolve a method's return value to a bool when possible.
+
+    Returns True/False for functions whose body is (effectively) a single
+    \`return True\` / \`return False\` — allowing a leading docstring and
+    ignoring pure-comment/pass statements. Returns None for anything more
+    complex (conditional returns, computed values, no return, etc.).
+
+    Used by \`has_implicit_cancel\` handling: \`diagnose()\` in lib.py calls
+    the method and checks the return value, so an override that explicitly
+    returns False must NOT be treated as enabling query cancelation.
+    """
+    returns = []
+    other_logic = False
+    docstring_skipped = False
+    for stmt in func_node.body:
+        # Skip docstring (only the FIRST expression statement that is a
+        # string constant — later bare string literals are not docstrings
+        # and should count as non-trivial logic).
+        if (not docstring_skipped
+                and isinstance(stmt, ast.Expr)
+                and isinstance(stmt.value, ast.Constant)
+                and isinstance(stmt.value.value, str)):
+            docstring_skipped = True
+            continue
+        if isinstance(stmt, ast.Pass):
+            continue
+        if isinstance(stmt, ast.Return):
+            returns.append(stmt)
+            continue
+        # Any other statement (if/for/assign/etc.) means control flow is
+        # non-trivial; bail out to be conservative.
+        other_logic = True
+        break
+    if other_logic or len(returns) != 1:
+        return None
+    val = eval_node(returns[0].value)
+    return val if isinstance(val, bool) else None
+
+
 def deep_merge(base, override):
    """Deep merge two dictionaries. Override values take precedence."""
    if base is None:
@@ -186,8 +227,55 @@ if not os.path.isdir(specs_dir):
    print(json.dumps({"error": f"Directory not found: {specs_dir}", "cwd": os.getcwd()}))
    sys.exit(1)

-# First pass: collect all class info (name, bases, metadata)
-class_info = {}  # class_name -> {bases: [], metadata: {}, engine_name: str, filename: str}
+# Capability flag attributes with their defaults from BaseEngineSpec
+CAP_ATTR_DEFAULTS = {
+    'supports_dynamic_schema': False,
+    'supports_catalog': False,
+    'supports_dynamic_catalog': False,
+    'disable_ssh_tunneling': False,
+    'supports_file_upload': True,
+    'allows_joins': True,
+    'allows_subqueries': True,
+}
+
+# Maps source capability attribute -> output field name used in databases.json.
+# When a cap attr is assigned an unevaluable expression (e.g.
+# allows_joins = is_feature_enabled("DRUID_JOINS")), the JS layer uses this
+# mapping to preserve the corresponding field from the previously-generated
+# JSON rather than silently inheriting an incorrect parent default.
+CAP_ATTR_TO_OUTPUT_FIELD = {
+    'allows_joins': 'joins',
+    'allows_subqueries': 'subqueries',
+    'supports_dynamic_schema': 'supports_dynamic_schema',
+    'supports_catalog': 'supports_catalog',
+    'supports_dynamic_catalog': 'supports_dynamic_catalog',
+    'disable_ssh_tunneling': 'ssh_tunneling',
+    'supports_file_upload': 'supports_file_upload',
+}
+
+# Methods that indicate a capability when overridden by a non-BaseEngineSpec class.
+# Mirrors the has_custom_method checks in superset/db_engine_specs/lib.py.
+# cancel_query / has_implicit_cancel -> query_cancelation
+#   (diagnose() checks cancel_query override OR has_implicit_cancel() == True;
+#    base has_implicit_cancel returns False, so overriding it is the static
+#    equivalent of that method returning True. get_cancel_query_id is NOT
+#    part of the diagnose() heuristic and is intentionally excluded.)
+# estimate_statement_cost / estimate_query_cost -> query_cost_estimation
+# impersonate_user / update_impersonation_config / get_url_for_impersonation -> user_impersonation
+# validate_sql -> sql_validation (not used yet; validation is engine-based)
+CAP_METHODS = {
+    'cancel_query', 'has_implicit_cancel',
+    'estimate_statement_cost', 'estimate_query_cost',
+    'impersonate_user', 'update_impersonation_config', 'get_url_for_impersonation',
+    'validate_sql',
+}
+
+# Only the literal BaseEngineSpec is excluded from method-override tracking.
+# Intermediate base classes (e.g. PrestoBaseEngineSpec) do count as overrides.
+TRUE_BASE_CLASS = 'BaseEngineSpec'
+
+# First pass: collect all class info (name, bases, metadata, cap_attrs, direct_methods)
+class_info = {}  # class_name -> {bases: [], metadata: {}, engine_name: str, filename: str, ...}

 for filename in sorted(os.listdir(specs_dir)):
    if not filename.endswith('.py') or filename in ('__init__.py', 'lib.py', 'lint_metadata.py'):
@@ -218,30 +306,54 @@ for filename in sorted(os.listdir(specs_dir)):

            # Extract class attributes
            engine_name = None
+            engine_attr = None
            metadata = None
+            cap_attrs = {}   # capability flag attributes defined directly in this class
+            # Cap attrs assigned via expressions we can't statically resolve
+            # (e.g. is_feature_enabled("FLAG")). Tracked so the JS layer can
+            # fall back to the previously-generated databases.json value
+            # rather than inherit a parent default that would be wrong.
+            unresolved_cap_attrs = set()
+            direct_methods = set()  # capability methods defined directly in this class

            for item in node.body:
                if isinstance(item, ast.Assign):
                    for target in item.targets:
-                        if isinstance(target, ast.Name):
-                            if target.id == 'engine_name':
-                                val = eval_node(item.value)
-                                if isinstance(val, str):
-                                    engine_name = val
-                            elif target.id == 'metadata':
-                                metadata = eval_node(item.value)
+                        if not isinstance(target, ast.Name):
+                            continue
+                        if target.id == 'engine_name':
+                            val = eval_node(item.value)
+                            if isinstance(val, str):
+                                engine_name = val
+                        elif target.id == 'engine':
+                            val = eval_node(item.value)
+                            if isinstance(val, str):
+                                engine_attr = val
+                        elif target.id == 'metadata':
+                            metadata = eval_node(item.value)
+                        elif target.id in CAP_ATTR_DEFAULTS:
+                            val = eval_node(item.value)
+                            if isinstance(val, bool):
+                                cap_attrs[target.id] = val
+                            else:
+                                # Unevaluable expression — defer to JS fallback.
+                                unresolved_cap_attrs.add(target.id)
+                elif isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef)):
+                    if item.name in CAP_METHODS:
+                        # has_implicit_cancel is special: diagnose() uses the
+                        # method's RETURN VALUE, not just its presence. If the
+                        # override statically returns False, treat it as if
+                        # the method weren't overridden so query_cancelation
+                        # matches diagnose(). Unresolvable / True / anything
+                        # else falls through as an override (conservative).
+                        if item.name == 'has_implicit_cancel':
+                            if static_return_bool(item) is False:
+                                continue
+                        direct_methods.add(item.name)

            # Check for engine attribute with non-empty value to distinguish
            # true base classes from product classes like OceanBaseEngineSpec
-            has_non_empty_engine = False
-            for item in node.body:
-                if isinstance(item, ast.Assign):
-                    for target in item.targets:
-                        if isinstance(target, ast.Name) and target.id == 'engine':
-                            # Check if engine value is non-empty string
-                            if isinstance(item.value, ast.Constant):
-                                has_non_empty_engine = bool(item.value.value)
-                            break
+            has_non_empty_engine = engine_attr is not None and bool(engine_attr)

            # True base classes: end with BaseEngineSpec AND don't define engine
            # or have empty engine (like PostgresBaseEngineSpec with engine = "")
@@ -254,13 +366,18 @@ for filename in sorted(os.listdir(specs_dir)):
                'bases': base_names,
                'metadata': metadata,
                'engine_name': engine_name,
+                'engine': engine_attr,
                'filename': filename,
                'is_base_or_mixin': is_true_base,
+                'cap_attrs': cap_attrs,
+                'unresolved_cap_attrs': unresolved_cap_attrs,
+                'direct_methods': direct_methods,
            }
    except Exception as e:
        errors.append(f"{filename}: {str(e)}")

-# Second pass: resolve inheritance and build final metadata
+# Second pass: resolve inheritance and build final metadata + capability flags
+
 def get_inherited_metadata(class_name, visited=None):
    """Recursively get metadata from parent classes."""
    if visited is None:
@@ -286,6 +403,64 @@ def get_inherited_metadata(class_name, visited=None):

    return inherited

+def get_resolved_caps(class_name, visited=None):
+    """
+    Resolve capability flags and method overrides with inheritance.
+
+    Returns (attr_values, unresolved, methods):
+      - attr_values: {attr: bool} for attrs where the nearest MRO assignment
+        was a literal bool. Defaults are applied at the call site.
+      - unresolved: attrs where the nearest MRO assignment was an unevaluable
+        expression (e.g. is_feature_enabled("FLAG")). The JS layer falls
+        back to the previously-generated JSON value for these.
+      - methods: capability methods defined directly in some non-base ancestor,
+        matching the has_custom_method() logic in db_engine_specs/lib.py.
+
+    attr_values and unresolved are disjoint — an attr is in at most one.
+    """
+    if visited is None:
+        visited = set()
+    if class_name in visited:
+        return {}, set(), set()
+    visited.add(class_name)
+
+    info = class_info.get(class_name)
+    if not info:
+        return {}, set(), set()
+
+    attr_values = {}
+    unresolved = set()
+    resolved_methods = set()
+
+    # Collect from parents, iterating right-to-left so leftmost bases win
+    # (matches Python MRO: for class C(A, B), A's attributes take precedence).
+    for base_name in reversed(info['bases']):
+        p_vals, p_unres, p_meth = get_resolved_caps(base_name, visited.copy())
+        # A parent's literal assignments overwrite whatever we inherited so far.
+        for attr, val in p_vals.items():
+            attr_values[attr] = val
+            unresolved.discard(attr)
+        # A parent's unresolved assignments likewise take precedence.
+        for attr in p_unres:
+            unresolved.add(attr)
+            attr_values.pop(attr, None)
+        resolved_methods.update(p_meth)
+
+    # Apply this class's own assignments (override parents).
+    for attr, val in info['cap_attrs'].items():
+        attr_values[attr] = val
+        unresolved.discard(attr)
+    for attr in info['unresolved_cap_attrs']:
+        unresolved.add(attr)
+        attr_values.pop(attr, None)
+
+    # Accumulate method overrides, but skip the literal BaseEngineSpec
+    # (its implementations are stubs; only non-base overrides count).
+    if class_name != TRUE_BASE_CLASS:
+        resolved_methods.update(info['direct_methods'])
+
+    return attr_values, unresolved, resolved_methods
+
 for class_name, info in class_info.items():
    # Skip base classes and mixins
    if info['is_base_or_mixin']:
@@ -310,7 +485,14 @@ for class_name, info in class_info.items():

    if final_metadata and isinstance(final_metadata, dict) and display_name:
        debug_info["classes_with_metadata"] += 1
-        databases[display_name] = {
+
+        # Resolve capability flags from Python source
+        attr_values, unresolved_caps, cap_methods = get_resolved_caps(class_name)
+        cap_attrs = dict(CAP_ATTR_DEFAULTS)
+        cap_attrs.update(attr_values)
+        engine_attr = info.get('engine') or ''
+
+        entry = {
            'engine': display_name.lower().replace(' ', '_'),
            'engine_name': display_name,
            'module': info['filename'][:-3],  # Remove .py extension
@@ -318,19 +500,40 @@ for class_name, info in class_info.items():
            'time_grains': {},
            'score': 0,
            'max_score': 0,
-            'joins': True,
-            'subqueries': True,
-            'supports_dynamic_schema': False,
-            'supports_catalog': False,
-            'supports_dynamic_catalog': False,
-            'ssh_tunneling': False,
-            'query_cancelation': False,
-            'supports_file_upload': False,
-            'user_impersonation': False,
-            'query_cost_estimation': False,
-            'sql_validation': False,
+            # Capability flags read from engine spec class attributes/methods
+            'joins': cap_attrs['allows_joins'],
+            'subqueries': cap_attrs['allows_subqueries'],
+            'supports_dynamic_schema': cap_attrs['supports_dynamic_schema'],
+            'supports_catalog': cap_attrs['supports_catalog'],
+            'supports_dynamic_catalog': cap_attrs['supports_dynamic_catalog'],
+            'ssh_tunneling': not cap_attrs['disable_ssh_tunneling'],
+            'supports_file_upload': cap_attrs['supports_file_upload'],
+            # Method-based flags: True only when a non-base class overrides them.
+            # Matches diagnose() in lib.py: cancel_query override OR
+            # has_implicit_cancel() returning True (which, given the base
+            # returns False, is equivalent to overriding has_implicit_cancel).
+            'query_cancelation': bool({'cancel_query', 'has_implicit_cancel'} & cap_methods),
+            'query_cost_estimation': bool({'estimate_statement_cost', 'estimate_query_cost'} & cap_methods),
+            # SQL validation is implemented in external validator classes keyed by engine name
+            'sql_validation': engine_attr in {'presto', 'postgresql'},
+            'user_impersonation': bool(
+                {'impersonate_user', 'update_impersonation_config', 'get_url_for_impersonation'} & cap_methods
+            ),
        }

+        # Tell the JS layer which output fields were populated from the
+        # BaseEngineSpec default because the source assignment was an
+        # unevaluable expression; those get overridden from existing JSON.
+        unresolved_fields = sorted(
+            CAP_ATTR_TO_OUTPUT_FIELD[attr]
+            for attr in unresolved_caps
+            if attr in CAP_ATTR_TO_OUTPUT_FIELD
+        )
+        if unresolved_fields:
+            entry['_unresolved_cap_fields'] = unresolved_fields
+
+        databases[display_name] = entry
+
 if errors and not databases:
    print(json.dumps({"error": "Parse errors", "details": errors, "debug": debug_info}), file=sys.stderr)

@@ -851,24 +1054,52 @@ function loadExistingData() {
  }
 }

+/**
+ * Fall back to the previously-generated databases.json for capability flags
+ * whose source assignment couldn't be statically resolved (e.g.
+ * `allows_joins = is_feature_enabled("DRUID_JOINS")`). The Python extractor
+ * flags these via the internal `_unresolved_cap_fields` marker; without this
+ * fallback those fields would silently inherit the BaseEngineSpec default
+ * and disagree with runtime behavior. The marker is stripped before output.
+ */
+function fallbackUnresolvedCaps(newDatabases, existingData) {
+  for (const [name, db] of Object.entries(newDatabases)) {
+    const unresolved = db._unresolved_cap_fields;
+    if (!unresolved || unresolved.length === 0) {
+      delete db._unresolved_cap_fields;
+      continue;
+    }
+    const existingDb = existingData?.databases?.[name];
+    if (existingDb) {
+      for (const field of unresolved) {
+        if (existingDb[field] !== undefined) {
+          db[field] = existingDb[field];
+        }
+      }
+    }
+    delete db._unresolved_cap_fields;
+  }
+  return newDatabases;
+}
+
 /**
 * Merge new documentation with existing diagnostics
- * Preserves score, time_grains, and feature flags from existing data
+ * Preserves score, max_score, and time_grains from existing data (these require
+ * Flask context to generate and cannot be derived from static source analysis).
+ * Capability flags (joins, supports_catalog, etc.) are NOT preserved here — they
+ * are read fresh from the Python engine spec source by extractEngineSpecMetadata(),
+ * with a separate fallback for expression-based assignments (see fallbackUnresolvedCaps).
 */
 function mergeWithExistingDiagnostics(newDatabases, existingData) {
  if (!existingData?.databases) return newDatabases;

-  const diagnosticFields = [
-    'score', 'max_score', 'time_grains', 'joins', 'subqueries',
-    'supports_dynamic_schema', 'supports_catalog', 'supports_dynamic_catalog',
-    'ssh_tunneling', 'query_cancelation', 'supports_file_upload',
-    'user_impersonation', 'query_cost_estimation', 'sql_validation'
-  ];
+  // Only preserve fields that require Flask/runtime context to generate
+  const diagnosticFields = ['score', 'max_score', 'time_grains'];

  for (const [name, db] of Object.entries(newDatabases)) {
    const existingDb = existingData.databases[name];
    if (existingDb && existingDb.score > 0) {
-      // Preserve diagnostics from existing data
+      // Preserve score/time_grain diagnostics from existing data
      for (const field of diagnosticFields) {
        if (existingDb[field] !== undefined) {
          db[field] = existingDb[field];
@@ -879,7 +1110,7 @@ function mergeWithExistingDiagnostics(newDatabases, existingData) {

  const preserved = Object.values(newDatabases).filter(d => d.score > 0).length;
  if (preserved > 0) {
-    console.log(`Preserved diagnostics for ${preserved} databases from existing data`);
+    console.log(`Preserved score/time_grains for ${preserved} databases from existing data`);
  }

  return newDatabases;
@@ -927,6 +1158,12 @@ async function main() {
    databases = mergeWithExistingDiagnostics(databases, existingData);
  }

+  // For cap flags assigned via unevaluable expressions (e.g.
+  // `is_feature_enabled(...)`), prefer the value from a previously-generated
+  // JSON. Runs regardless of scores since it addresses static-analysis gaps,
+  // not missing Flask diagnostics. Always strips the internal marker.
+  databases = fallbackUnresolvedCaps(databases, existingData);
+
  // Extract and merge custom_errors for troubleshooting documentation
  const customErrors = extractCustomErrors();
  mergeCustomErrors(databases, customErrors);
--- a/docs/src/data/databases.json
+++ b/docs/src/data/databases.json
--- a/superset/db_engine_specs/lib.py
+++ b/superset/db_engine_specs/lib.py
@@ -753,6 +753,15 @@ def generate_yaml_docs(output_dir: str | None = None) -> dict[str, dict[str, Any
            continue

        name = get_name(spec)
+
+        # Skip "base" specs (e.g. PostgresBaseEngineSpec) that share an engine_name
+        # with a real product spec but have no concrete engine value.  When multiple
+        # specs share the same engine_name the one with a non-empty engine string is
+        # the authoritative product spec; letting a base class overwrite it would
+        # produce incorrect capability flags.
+        if not spec.engine and name in all_docs:
+            continue
+
        doc_data = diagnose(spec)

        # Get documentation metadata (prefers spec.metadata over DATABASE_DOCS)
@@ -766,6 +775,7 @@ def generate_yaml_docs(output_dir: str | None = None) -> dict[str, dict[str, Any
        doc_data["supports_file_upload"] = spec.supports_file_upload
        doc_data["supports_dynamic_schema"] = spec.supports_dynamic_schema
        doc_data["supports_catalog"] = spec.supports_catalog
+        doc_data["supports_dynamic_catalog"] = spec.supports_dynamic_catalog

        all_docs[name] = doc_data
Author	SHA1	Message	Date
Evan Rusackas	b8123f87ed	address review: only skip first docstring in static_return_bool A later bare string literal in a function body is not a docstring and should count as non-trivial logic. Track whether the docstring has been skipped so subsequent string-constant expression statements fall through to the conservative 'other_logic' branch.	2026-04-27 08:43:18 -07:00
Evan Rusackas	5dc9ca153e	address review: check has_implicit_cancel return value diagnose() in lib.py calls spec.has_implicit_cancel() and uses the return value; the extractor was treating any override as cancel support, regardless of what it returns. An override that explicitly returns False (e.g. ImpalaEngineSpec) should NOT enable query_cancelation — only a cancel_query override should, per has_custom_method(spec, "cancel_query") or spec.has_implicit_cancel(). Added static_return_bool() to statically resolve simple "return <bool>" methods. has_implicit_cancel overrides that explicitly return False are now excluded from direct_methods; True / unresolvable / complex bodies still count (conservative). For current specs the outcome is unchanged (Impala/Hive have cancel_query overrides; Presto/Hive return True), but the heuristic now matches diagnose() exactly and won't misclassify future specs that override has_implicit_cancel to False without a cancel_query override. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-27 08:43:18 -07:00
Evan Rusackas	88a28b180a	address review: align query_cancelation with diagnose() diagnose() in lib.py only counts cancel_query being overridden OR has_implicit_cancel() returning True (equivalent, statically, to overriding has_implicit_cancel since the base returns False). The extractor additionally counted get_cancel_query_id, which could mark cancellation as supported even when cancel_query wasn't overridden. Removed get_cancel_query_id from CAP_METHODS and the query_cancelation check so the generated docs match diagnose(). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-27 08:43:18 -07:00
Evan Rusackas	5f0fe4a2b7	address review: preserve existing JSON for unresolvable cap flags Cap attrs assigned via expressions (e.g. DruidEngineSpec.allows_joins = is_feature_enabled("DRUID_JOINS")) can't be statically evaluated, so they previously silently fell back to the BaseEngineSpec default — causing generated docs to disagree with runtime behavior. Now the Python extractor tracks unresolvable assignments per class, propagates them through the MRO walk, and emits an _unresolved_cap_fields marker on the database entry. The JS layer uses that marker to prefer the value from the previously-generated databases.json (which was produced with Flask context and reflects real runtime values) and strips the marker before writing output. Verified against druid.py: extraction correctly emits "_unresolved_cap_fields": ["joins"], and the existing JSON's "joins": false is preserved rather than overwritten with the base default of true. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-27 08:43:18 -07:00
Evan Rusackas	97af6bd9f7	address review: iterate bases right-to-left so leftmost wins (MRO) Python attribute lookup picks the first base that defines the attr; the previous left-to-right update() order caused later bases to override earlier ones, contradicting MRO. Reversing the iteration makes the leftmost base the final writer, which matches Python semantics for the common single-chain hierarchies in db_engine_specs. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-27 08:43:18 -07:00
Superset Dev	3c19df8706	fix(docs): read capability flags from engine specs in database docs generator Update the database docs generator to extract capability flag values directly from Python engine spec class attributes (supports_catalog, supports_dynamic_schema, etc.) and method overrides (cancel_query, estimate_statement_cost, impersonate_user, has_implicit_cancel, etc.) using AST parsing in the fallback path. Previously the generator hardcoded False for all capability flags and relied on mergeWithExistingDiagnostics to preserve manual edits from the existing databases.json. This made it impossible to keep flags in sync with the Python source. Changes: - Fallback AST path now extracts cap flags via AST with proper inheritance resolution (mirrors lib.py has_custom_method logic) - mergeWithExistingDiagnostics now only preserves score/max_score/ time_grains (Flask-context-only fields), not capability flags - lib.py generate_yaml_docs now includes supports_dynamic_catalog in its output, and skips base specs that would overwrite a real product spec's flags for the same engine_name - Regenerated databases.json with corrected flag values Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-27 08:43:18 -07:00