Skip to content

Conversation

@jwadolowski
Copy link
Contributor

@jwadolowski jwadolowski commented Sep 5, 2025

Proposed changes

Intermittent network errors may happen when a module's source code is downloaded. That happened to me recently (an unexpected 502 from GitHub) which resulted in a missing module in my Docker image (that was partially my fault, as the build process should stop in case of any error).

This PR implements automatic retries for common transient network hiccups.

Here's how that's going to work in practice:

$ wget --retry-on-http-error=429,502,503,504 --retry-on-host-error --retry-connrefused --tries 3 -O /tmp/out https://httpbingo.org/status/429
--2025-09-05 11:14:04--  https://httpbingo.org/status/429
Resolving httpbingo.org (httpbingo.org)... 66.241.125.232, 2a09:8280:1:7fcb:9efa:a365:aa2a:d036
Connecting to httpbingo.org (httpbingo.org)|66.241.125.232|:443... connected.
HTTP request sent, awaiting response... 429 Too Many Requests
Retrying.

--2025-09-05 11:14:05--  (try: 2)  https://httpbingo.org/status/429
Reusing existing connection to httpbingo.org:443.
HTTP request sent, awaiting response... 429 Too Many Requests
Retrying.

--2025-09-05 11:14:07--  (try: 3)  https://httpbingo.org/status/429
Reusing existing connection to httpbingo.org:443.
HTTP request sent, awaiting response... 429 Too Many Requests
Giving up.

As per the documentation, wget uses linear backoff between retry attempts by default:

  --waitretry=seconds
      If you don't want Wget to wait between every retrieval, but only between retries of failed downloads, you can use >this option.  Wget will use linear backoff, waiting 1 second
      after the first failure on a given file, then waiting 2 seconds after the second failure on that file, up to the maximum number of seconds you specify.

Checklist

Before creating a PR, run through this checklist and mark each as complete:

@jwadolowski jwadolowski marked this pull request as ready for review September 5, 2025 09:30
@mihaiplesa
Copy link

Any helping in getting this in would be appreciated @oxpa @thresheek.

@thresheek
Copy link
Member

thresheek commented Sep 23, 2025

Thanks for the patch. I've looked up when those options appeared in wget:

  • --retry-on-http-error: ~Feb 2017, released in v1.19.1
  • --retry-on-host-error: ~Jun 2018, released in v1.20
  • --retry-connrefused: ~Sep 2003

Now, looking at the oldest currently supported distros by pkg-oss (https://nginx.org/en/linux_packages.html):

  • ❌ RHEL 8 has wget 1.19.5, the manual page does not list --retry-on-host-error, and passing it as an option yields "unrecognized option"
  • ✅ RHEL 9 has wget 1.21.1
  • ✅ Debian 11 has wget 1.21
  • ✅ Ubuntu 22.04 has wget 1.21.2
  • ✅ SLES 15 has wget 1.20.3
  • ✅ Alpine Linux 3.19 has wget 1.21.4
  • ❌ Amazon Linux 2 has wget 1.14
  • ✅ Amazon Linux 2023 has wget 1.21.3

That means if we're to merge this change, we're going to abandon at least two operating systems, one of which is arguably very widely used. I'm inclined to NACK for this reason.

@oxpa your thoughts?

@oxpa
Copy link
Contributor

oxpa commented Sep 24, 2025

Such network issues never happened to me. So I probably tend to downplay the importance of the patch.
If a restart is not an option - I would wrap wget in a function that uses simple for or while loop to try download several times.
And it probably makes sense to make it more general like try_n_time. Very little effort and can be reused later. Like this:

try_n_times() {
    N=$1
    CMD=$2
    CLEAN=$3
    i=0
    while ! $CMD; do
        i=`expr $i + 1`
        if [ $i -le $N ]; then
            echo "attempt $i failed! retry!"
            test -n "$CLEAN" && $CLEAN
        else
            echo "$N attempts failed!"
            return 1
        fi;
    done
    return 0
}

And then

try_n_times 3 "wget ..."

This way it works anywhere and can be reused for other places where retries make sense.

@jwadolowski
Copy link
Contributor Author

Thanks for the feedback guys, these are very valid points.

I didn't check it myself, but kind of hoped for better support across supported operating systems. Adding these new params as-is is definitely not a viable solution.

The retry loop approach seems much more universal. I'm going to rewrite the PR and include this suggestion.

@jwadolowski
Copy link
Contributor Author

@oxpa @thresheek I've just updated the PR as per your suggestion.

The updated logic has been tested against one of my NGINX container images and it worked as expected:

#9 19.47 build_module.sh: INFO: Creating /tmp/build_module.sh.2138 build area
#9 19.47 build_module.sh: INFO Downloading module source
#9 19.48 --2025-10-08 13:35:36--  https://httpbingo.org/status/502
#9 19.49 Resolving httpbingo.org (httpbingo.org)... 66.241.125.232
#9 19.53 Connecting to httpbingo.org (httpbingo.org)|66.241.125.232|:443... connected.
#9 19.60 HTTP request sent, awaiting response... 502 Bad Gateway
#9 19.73 2025-10-08 13:35:36 ERROR 502: Bad Gateway.
#9 19.73
#9 19.73 Attempt 1 failed! Waiting 1 seconds before retry...
#9 20.75 --2025-10-08 13:35:37--  https://httpbingo.org/status/502
#9 20.76 Resolving httpbingo.org (httpbingo.org)... 66.241.125.232
#9 20.77 Connecting to httpbingo.org (httpbingo.org)|66.241.125.232|:443... connected.
#9 20.84 HTTP request sent, awaiting response... 502 Bad Gateway
#9 21.07 2025-10-08 13:35:37 ERROR 502: Bad Gateway.
#9 21.07
#9 21.08 Attempt 2 failed! Waiting 2 seconds before retry...
#9 23.09 --2025-10-08 13:35:39--  https://httpbingo.org/status/502
#9 23.10 Resolving httpbingo.org (httpbingo.org)... 66.241.125.232
#9 23.10 Connecting to httpbingo.org (httpbingo.org)|66.241.125.232|:443... connected.
#9 23.18 HTTP request sent, awaiting response... 502 Bad Gateway
#9 23.31 2025-10-08 13:35:40 ERROR 502: Bad Gateway.
#9 23.31
#9 23.31 Attempt 3 failed! Waiting 4 seconds before retry...
#9 27.32 --2025-10-08 13:35:44--  https://httpbingo.org/status/502
#9 27.33 Resolving httpbingo.org (httpbingo.org)... 66.241.125.232
#9 27.33 Connecting to httpbingo.org (httpbingo.org)|66.241.125.232|:443... connected.
#9 27.41 HTTP request sent, awaiting response... 502 Bad Gateway
#9 27.54 2025-10-08 13:35:44 ERROR 502: Bad Gateway.
#9 27.54
#9 27.54 3 attempts failed!
#9 27.54
#9 27.54 gzip: stdin: unexpected end of file
#9 27.54 tar: Child returned status 1
#9 27.54 tar: Error is not recoverable: exiting now
#9 27.54
#9 27.54 gzip: stdin: unexpected end of file
#9 27.54 tar: Child returned status 1
#9 27.54 tar: Error is not recoverable: exiting now
#9 27.55 mv: missing destination file operand after '/tmp/build_module.sh.2138/vts'
#9 27.55 Try 'mv --help' for more information.
#9 27.55 /pkg-oss
#9 27.55 build_module.sh: ERROR: Cannot locate module config file - quitting
#9 27.55 + exit 1

@mihaiplesa
Copy link

How does the above sound to you @oxpa @thresheek?

@oxpa
Copy link
Contributor

oxpa commented Oct 21, 2025

Looks good to me. Let's wait @thresheek and let him merge.

@thresheek
Copy link
Member

Yeah it looks good to me too. Can you squash the changes to a single commit? It does not make much sense to have at least the first one, and second and third are probably best pushed as one change.

Intermittent network errors may happen when a module's source code is
downloaded over HTTP. That happened to me recently (an unexpected 502
from GitHub), which resulted in a missing module in my Docker image
(that was partially my fault, as the build process should stop in case
of any error).

This PR implements automatic retries (with backoff delay) using an
OS-agnostic `try_n_times` helper.
@jwadolowski jwadolowski force-pushed the fix/handle-intermittent-network-errors branch from b7f5159 to e2998b0 Compare October 22, 2025 21:34
@jwadolowski
Copy link
Contributor Author

Done - all the commits have been squashed into one.

@thresheek thresheek merged commit f508d8a into nginx:master Oct 22, 2025
@thresheek
Copy link
Member

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants