Why is my AWS ALB routing traffic to unhealthy targets even though health checks look normal?

I’m running an application behind an AWS Application Load Balancer with two EC2 instances in an Auto Scaling group.

Both instances show as healthy in the Target Group health checks, but I’m seeing inconsistent behavior where the ALB still routes requests to an instance that is clearly failing at the application level (timeouts, 500s, etc.).

I’ve already checked:

  • Health check path is correct (/health)
  • Security groups allow ALB → EC2 traffic
  • Application logs show intermittent failures but the health endpoint still returns 200

Is there a scenario where the ALB keeps routing traffic to a target even though the application behind it isn’t responding properly?

Should I tighten the health check settings, or is there another configuration I might be missing?

Would appreciate guidance from anyone who has dealt with similar behavior.

Welcome to the club. I’m not very experienced with load balancers, and I last touched AWS about five years ago.

When you work it out, do share.

Thanks for getting back to me!

I will keep looking into how the ALB works. It looks like the problem is that the application can fail in ways that the health check doesn’t detect, so the ALB still thinks the instance is healthy.

I’m thinking about making the health checks stricter and maybe adding more in-depth checks at the application level.
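For the "stricter" part, here is a rough sketch of what tightening the target group settings could look like with boto3. The specific values (10 s interval, 5 s timeout, thresholds of 2) are illustrative assumptions, not recommendations, and the function names are made up for this example. The parameter names are the ones `modify_target_group` accepts.

```python
def detection_window_seconds(interval_s: int, unhealthy_threshold: int) -> int:
    """Rough estimate of how long a failing target keeps receiving traffic:
    the ALB needs `unhealthy_threshold` consecutive failed checks,
    spaced `interval_s` seconds apart, before marking it unhealthy."""
    return interval_s * unhealthy_threshold


def tightened_health_check(target_group_arn: str) -> dict:
    """Build the keyword arguments for elbv2.modify_target_group.
    All values below are assumptions chosen for illustration."""
    return {
        "TargetGroupArn": target_group_arn,
        "HealthCheckPath": "/health",
        "HealthCheckIntervalSeconds": 10,  # must be larger than the timeout
        "HealthCheckTimeoutSeconds": 5,
        "UnhealthyThresholdCount": 2,
        "HealthyThresholdCount": 2,
        "Matcher": {"HttpCode": "200"},    # only an exact 200 counts as healthy
    }


def apply(target_group_arn: str) -> None:
    """Send the settings to AWS. Requires boto3 and valid credentials."""
    import boto3  # third-party; imported here so the helpers above work without it

    boto3.client("elbv2").modify_target_group(
        **tightened_health_check(target_group_arn)
    )
```

With a 30 s interval and an unhealthy threshold of 2, `detection_window_seconds(30, 2)` gives about 60 seconds; the values above bring that down to about 20. This only shortens detection of failures the check can already see. It does nothing while `/health` keeps returning 200.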

If I find a good root cause or fix, I will definitely post it here so that others who have the same problem can see it.

Sounds like your health check isn’t giving an accurate picture of the application’s state, so it passes even when the application is in a failed state and returning 500s to real requests.

Is it possible to make a simple GET request to the application and assert on a known-good response to that request?
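Something along these lines, as a minimal sketch using only the Python standard library. The URL and the expected marker string are placeholders you would swap for a real endpoint and a value you know a working instance returns. As far as I know the ALB itself only matches on the response code, so a body assertion like this would have to run inside the app’s health endpoint or in a separate probe.

```python
import urllib.request


def looks_healthy(status: int, body: str, expected: str) -> bool:
    """True only if the status is 200 AND the body contains the known-good marker."""
    return status == 200 and expected in body


def probe(url: str, expected: str, timeout: float = 5.0) -> bool:
    """GET `url` and assert on the response. Any timeout, connection error,
    or non-2xx status counts as a failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
            return looks_healthy(resp.status, body, expected)
    except OSError:
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        return False
```

For example, `probe("http://10.0.1.23:8080/api/ping", '"status": "ok"')` (hypothetical address and payload) would return False for a 500, a timeout, or a 200 with the wrong body.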

The ALB does not route based on how real requests are faring, only on the health check result. As long as `/health` consistently returns 200, the target stays healthy even while the app is malfunctioning.

The fix is to make the health check reflect actual application readiness rather than just process uptime. Once the check is accurate, ALB routing behaves as expected.
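A minimal sketch of what a readiness-style `/health` could look like, using Python’s built-in `http.server`. The `check_database` function, the `CHECKS` registry, and port 8080 are hypothetical placeholders; a real app would plug in whatever it actually depends on.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def check_database() -> bool:
    """Hypothetical placeholder: replace with a cheap real check,
    e.g. running `SELECT 1` against the connection pool."""
    return True


# Hypothetical registry of dependency checks the app needs to serve requests.
CHECKS = {"database": check_database}


def evaluate(checks: dict) -> tuple:
    """Run every check. A check that raises counts as failed.
    Returns (200, results) if all pass, otherwise (503, results)."""
    results = {}
    for name, check in checks.items():
        try:
            results[name] = bool(check())
        except Exception:
            results[name] = False
    status = 200 if all(results.values()) else 503
    return status, results


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/health":
            self.send_error(404)
            return
        status, results = evaluate(CHECKS)
        body = json.dumps(results).encode("utf-8")
        self.send_response(status)  # 503 makes the ALB health check fail
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8080) -> None:
    """Start the health server (blocks until interrupted)."""
    HTTPServer(("0.0.0.0", port), HealthHandler).serve_forever()
```

One design caution: keep the checks cheap and be careful with dependencies shared by every instance, since a shared outage would then mark all targets unhealthy at the same time.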