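bug: Adjusting node weights on an upstream with health checks enabled, while it is serving traffic, produces error logs and occasionally causes 500 errors #11897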

Lewisyixin · 2025-01-07T10:12:32Z

Current Behavior

When node weights are modified, the error log records "all upstream nodes is unhealthy".

Expected Behavior

No error logs are produced.

Error Logs

2025/01/07 18:04:23 [error] 6714#6714: *276732 [lua] balancer.lua:83: fetch_health_nodes(): failed to get health check target status, addr: 10.58.94.168:80, host: nil, err: target not found, client: 127.0.0.1, server: _, request: "GET /l HTTP/1.1", host: "l.com"
2025/01/07 18:04:23 [error] 6714#6714: *276732 [lua] balancer.lua:83: fetch_health_nodes(): failed to get health check target status, addr: 10.58.32.145:80, host: nil, err: target not found, client: 127.0.0.1, server: _, request: "GET /l HTTP/1.1", host: "l.com"
2025/01/07 18:04:23 [warn] 6714#6714: *276732 [lua] balancer.lua:89: fetch_health_nodes(): all upstream nodes is unhealthy, use default, client: 127.0.0.1, server: _, request: "GET /l HTTP/1.1", host: "l.com"
2025/01/07 18:04:24 [error] 6716#6716: *276683 [lua] balancer.lua:83: fetch_health_nodes(): failed to get health check target status, addr: 10.58.32.145:80, host: nil, err: target not found, client: 127.0.0.1, server: _, request: "GET /l HTTP/1.1", host: "l.com"
2025/01/07 18:04:24 [error] 6716#6716: *276683 [lua] balancer.lua:83: fetch_health_nodes(): failed to get health check target status, addr: 10.58.94.168:80, host: nil, err: target not found, client: 127.0.0.1, server: _, request: "GET /l HTTP/1.1", host: "l.com"
2025/01/07 18:04:24 [warn] 6716#6716: *276683 [lua] balancer.lua:89: fetch_health_nodes(): all upstream nodes is unhealthy, use default, client: 127.0.0.1, server: _, request: "GET /l HTTP/1.1", host: "l.com"

Steps to Reproduce

1. Create a route and an upstream with two or more nodes. The upstream must have health checks enabled.
2. Run a wrk command to send requests to the server continuously.
3. Change a node's weight repeatedly, for example by running the following command over and over:
   curl -XPATCH -s -H "x-api-key: $token" http://192.168.20.226:9180/apisix/admin/upstreams/547982406942985081 -d '{"nodes":{"10.58.32.145:80": 20}}'
4. The error logs shown above appear.
5. In extreme cases, the client receives a 500 Internal Server Error (this cannot be reproduced reliably).
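
The weight-changing step can be scripted so the race is easier to hit. The following is a minimal sketch, not part of the original report: the function names (build_payload, next_weight, patch_weight_loop), the ADMIN_URL and NODE variables, and the alternating weights 20 and 10 are all illustrative assumptions. The Admin API address, upstream ID, and node address are copied from the curl command above.

```shell
# Assumed defaults, copied from the report; override them for your environment.
ADMIN_URL="${ADMIN_URL:-http://192.168.20.226:9180/apisix/admin/upstreams/547982406942985081}"
NODE="${NODE:-10.58.32.145:80}"

# Print the PATCH body that sets the weight of one node.
# $1 = node address (host:port), $2 = weight
build_payload() {
  printf '{"nodes":{"%s": %s}}' "$1" "$2"
}

# Pick the weight for a given iteration: 20 on even iterations, 10 on odd ones.
# The two values are arbitrary; they only need to differ so each PATCH is a real change.
next_weight() {
  if [ $(( $1 % 2 )) -eq 0 ]; then echo 20; else echo 10; fi
}

# Send $1 consecutive PATCH requests, alternating the node weight each time.
# Expects $token to hold the Admin API key, as in the original command.
patch_weight_loop() {
  i=0
  while [ "$i" -lt "$1" ]; do
    curl -XPATCH -s -o /dev/null -H "x-api-key: $token" \
      "$ADMIN_URL" -d "$(build_payload "$NODE" "$(next_weight "$i")")"
    i=$((i + 1))
  done
}
```

With wrk already running against the route, calling patch_weight_loop 500 in a second terminal should trigger the same sequence of weight updates as running the curl command by hand; the iteration count is arbitrary.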

Environment

APISIX version (run apisix version): 3.5
Operating system (run uname -a): 3.10.0-514.26.2.el7.x86_64
OpenResty / Nginx version (run openresty -V or nginx -V): openresty/1.21.4.2
etcd version, if relevant (run curl http://127.0.0.1:9090/v1/server_info): 3.5
APISIX Dashboard version, if relevant:
Plugin runner version, for issues related to plugin runners:
LuaRocks version, for installation issues (run luarocks --version): 2.3.0


github-project-automation bot added this to Apache APISIX backlog Jan 7, 2025

github-project-automation bot moved this to 📋 Backlog in Apache APISIX backlog Jan 7, 2025

dosubot bot added the bug Something isn't working label Jan 7, 2025
