Compare commits

1 Commits

Author SHA1 Message Date
Claude Code
7a1ded7c29 ci(docker): retry image build to absorb transient Docker Hub registry errors
A large share of "Build & publish docker images" failures on master are
transient Docker Hub registry errors during the supersetbot build/push:
base-image pull timeouts (registry-1.docker.io ... Client.Timeout), 504/401
on push (auth.docker.io), and ECONNRESET. These are infra flakiness, not
build breaks, yet they fail the whole job.

Wrap the supersetbot build invocation in a 3-attempt retry loop (fixed 30s
delay between attempts), mirroring the existing retry on the subsequent
"Docker pull" step. buildx reuses the buildkit layer cache from the failed
attempt, so retries are cheap.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-15 11:29:10 -07:00
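
The retry pattern described in the commit message could also be factored into a small reusable shell function. The following is a minimal sketch under stated assumptions, not what the commit does: the function name `retry` and its argument order (attempt count, delay in seconds, then the command) are hypothetical and are not part of the workflow or of supersetbot. The `::error::` and `::warning::` prefixes are the same GitHub Actions workflow commands the diff uses.

```shell
# Hypothetical helper (not part of this commit): run a command up to
# max_attempts times, sleeping a fixed delay between failed attempts.
# Usage: retry <max_attempts> <delay_seconds> <command> [args...]
retry() {
  local max_attempts
  local delay
  local attempt
  max_attempts="$1"
  delay="$2"
  shift 2
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    # Run the command exactly as passed; stop at the first success.
    if "$@"; then
      return 0
    fi
    if [ "$attempt" -eq "$max_attempts" ]; then
      echo "::error::'$*' failed after ${max_attempts} attempts" >&2
      return 1
    fi
    echo "::warning::Attempt ${attempt} failed; retrying in ${delay}s..." >&2
    sleep "$delay"
    attempt=$((attempt + 1))
  done
  # Only reached when max_attempts is less than 1.
  return 1
}
```

With such a helper, the build step might read `retry 3 30 supersetbot docker $PUSH_OR_LOAD ...`, keeping the original arguments unchanged. The inline loop in the diff avoids defining a function inside the workflow step, which is a reasonable choice for a single call site.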

@@ -101,13 +101,27 @@ jobs:
   PUSH_OR_LOAD="--load"
 fi
-supersetbot docker \
-  $PUSH_OR_LOAD \
-  --preset "$BUILD_PRESET" \
-  --context "$EVENT" \
-  --context-ref "$RELEASE" $FORCE_LATEST \
-  --extra-flags "--build-arg INCLUDE_CHROMIUM=false --tag $IMAGE_TAG" \
-  $PLATFORM_ARG
+# Retry to absorb transient Docker Hub registry errors (base-image
+# pull timeouts, 504/401 on push, ECONNRESET) that otherwise fail
+# the whole job. buildx reuses the buildkit layer cache from the
+# failed attempt, so a retry mostly re-does just the failed push.
+for attempt in 1 2 3; do
+  if supersetbot docker \
+    $PUSH_OR_LOAD \
+    --preset "$BUILD_PRESET" \
+    --context "$EVENT" \
+    --context-ref "$RELEASE" $FORCE_LATEST \
+    --extra-flags "--build-arg INCLUDE_CHROMIUM=false --tag $IMAGE_TAG" \
+    $PLATFORM_ARG; then
+    break
+  fi
+  if [ "$attempt" -eq 3 ]; then
+    echo "::error::supersetbot docker build failed after 3 attempts"
+    exit 1
+  fi
+  echo "::warning::Build attempt ${attempt} failed; retrying in 30s..."
+  sleep 30
+done
 # in the context of push (using multi-platform build), we need to pull the image locally
 - name: Docker pull