Random HTTP Connection refused via nginx proxy

Hi!

I’m having a small issue with the HTTP service, and I wonder if anyone has any suggestions.

I’ve written a number of simple REST endpoints for interacting with the Nameko services, fronted by Nginx.

Occasionally, I get “connect() failed (111: Connection refused) while connecting to upstream” in the nginx logs.

The servers are not under any kind of extreme load; I can reproduce the issue just by refreshing the page a few times. What is odd is that the page still loads!

I’ve increased the number of Nameko workers, but it hasn’t helped.

So I’m not quite sure what is happening - maybe nginx is retrying automatically, and getting a response (?).

I’m going to upgrade all the Nameko instances to the current version and see if that helps, but I wanted to get this out there in the meantime.

A little more information…

I have been able to confirm that nginx automatically retries the request after the “connection refused” error, which is why I still get the response.
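To illustrate why a request can still succeed after a refused connection is logged, here is a minimal Python sketch of a "try the next address" loop. It is only an analogy for a proxy falling through to another upstream attempt, not nginx's actual implementation; the function name connect_first_available and the address list are made up for the example.

```python
import socket


def connect_first_available(addresses, timeout=1.0):
    """Try each (host, port) in order and return the first connection.

    Returns (connection, refused), where `refused` lists the addresses
    that refused the connection before one succeeded. Hypothetical helper
    for illustration only.
    """
    refused = []
    for host, port in addresses:
        try:
            conn = socket.create_connection((host, port), timeout=timeout)
        except ConnectionRefusedError:
            # Nothing is listening here: record it (like a log line) and move on.
            refused.append((host, port))
            continue
        return conn, refused
    raise ConnectionRefusedError(f"all addresses refused: {refused}")
```

The caller gets a working connection even though the refused list is not empty, which matches the symptom of an error in the log alongside a page that loads.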

I suspect werkzeug is receiving the incoming request but then terminating it. I have also noticed that the web services slow down after some time, so it is quite likely that I have a memory leak or something similar somewhere.

Can you check whether the Nameko RPC being called occasionally raises an exception?
If so, I think the RPC will cause the eventlet worker to terminate without replying, causing Nginx to time out while processing the HTTP request.

Are there any logs from the Nameko service? Is it possible the service is dying and being restarted again?

A “connection refused” error will be thrown if the server has stopped listening on the port you’ve bound it to. The bind happens when the webserver starts, and the port should stay bound for as long as the service stays up.
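A quick way to see this behaviour in isolation is the Python sketch below. It assumes a local test on 127.0.0.1 with a port picked by the OS; the helper names is_connection_refused and demo are made up for the example and are not part of Nameko or nginx.

```python
import errno
import socket


def is_connection_refused(host, port, timeout=1.0):
    """Return True if a TCP connect to (host, port) is refused."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns an errno value instead of raising an exception.
        return sock.connect_ex((host, port)) == errno.ECONNREFUSED


def demo():
    """Check a port while a listener is bound to it, then after it closes."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]

    refused_while_listening = is_connection_refused("127.0.0.1", port)
    listener.close()  # simulate the service going away
    refused_after_close = is_connection_refused("127.0.0.1", port)
    return refused_while_listening, refused_after_close
```

Pointing is_connection_refused at the port the service is supposed to be bound to, at the moment the error shows up in the proxy log, would show whether the listener is really gone at that point.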

Thank you for the responses.

I updated Nameko from 2.8.5 to 2.11.0 - which was far less painful than I had anticipated - and the issue seems to have resolved.

I guess I should have done that first :)

There aren’t any obvious changes between those releases that could fix this, but glad it seems to be working now! Please report back here if it happens again.