The @backoff.on_exception decorator on SlackV2Notification.send() was
configured to retry on SlackApiError, but the function's own try/except
catches every SlackApiError and re-raises as NotificationUnprocessableException
before the decorator can see it. As a result, no retries were happening —
a single transient failure (rate limit, connection blip) would fail the
report immediately, defeating the intent of the 5-attempt retry budget.
Switch the decorator to retry on NotificationUnprocessableException, which
is the exception type that send() actually raises for transient Slack
failures (SlackApiError, SlackClientNotConnectedError, and the SlackClientError
catch-all). Mirrors the working pattern already in webhook.py.
Non-transient errors (NotificationParamException, NotificationMalformedException,
NotificationAuthorizationException) still surface immediately — they aren't
retryable and shouldn't be retried.
Test changes:
- Replaces the prior "locks in broken behavior" regression test with
test_v2_send_retries_on_transient_slack_api_error asserting call_count == 5
- Adds test_v2_send_does_not_retry_param_errors verifying that BotUserAccessError
→ NotificationParamException is NOT retried (call_count == 1)
- Adds an autouse fixture that patches backoff._sync.time.sleep so unit-test
retries complete in milliseconds rather than the ~150s of real exponential
backoff. Without this, the parametrized exception-mapping cases that map
to NotificationUnprocessableException balloon the test runtime by ~75s
The v1 SlackNotification has the same bug but is being deprecated in this
release; not worth fixing there since v1's file_uploads endpoint is already
dead at Slack's side and only the text-only chat_postMessage path still works.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Flips the ALERT_REPORT_SLACK_V2 feature flag default to True so the v2
auto-upgrade path runs out of the box, and adds one-shot DeprecationWarning
+ logger.warning emissions when v1 still runs (flag explicitly off, or bot
missing the channels:read scope). Slack retired the legacy files.upload
endpoint in 2025, so v1 file uploads are already broken at the API level —
only text-only chat_postMessage sends still succeed via the legacy path.
The bulk of the change is bulletproof unit-test coverage for SlackV2Notification
ahead of v1 removal in the next major:
- files_upload_v2 invocation with PNG (single + multiple), CSV, and PDF,
asserting channel, file, title, filename, and initial_comment kwargs
- multi-channel fan-out (3 channels x 2 files = 6 uploads) and text-only
multi-channel chat_postMessage
- inline-file precedence (CSV beats screenshots beats PDF)
- parametrized exception mapping across 7 slack_sdk error types -> the
4 NotificationException subclasses
- statsd .ok and .warning gauge emission via the @statsd_gauge decorator
- execution_id propagation from g.logs_context to the success log, plus
the falsy g.logs_context fallback path
- end-to-end auto-upgrade round-trip: v1 SLACK recipient with channel
names raises SlackV1NotificationError -> update_report_schedule_slack_v2
rewrites the row to channel IDs -> SlackV2Notification fast-paths the
next send with no further channel resolution
- should_use_v2_api() warning behavior: deprecation warning emitted exactly
once across multiple calls in both the flag-off and scope-missing paths,
with the scope-missing logger.warning continuing to fire each call so
operators see the actionable scope hint in their report-execution logs
Also locks in current behavior of the @backoff.on_exception(SlackApiError, ...)
decorator on send(): because send() catches every SlackApiError internally
and re-raises as NotificationUnprocessableException, backoff never sees the
target exception type and no retries actually fire. Test asserts call_count
== 1 with a docstring marking this as a known design issue to address
separately.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>