Hi,
We're experiencing a lot of closed-connection errors when a service isn't
used frequently, and we aren't sure what to do about them.
This seems to be what's happening:
* Service A tries to call a method on Service B through RpcProxy
* Service B has been idle for more than 10 minutes
* An error is raised in Service A and it seems like the entrypoint for
Service B never fires.
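To make the hypothesis above concrete, here is a minimal, self-contained
simulation of that failure mode. It does not use nameko, kombu or amqp.
`IdleTimeoutConnection` and `IDLE_LIMIT_SECONDS` are hypothetical stand-ins.
The sketch assumes that something between the services (the broker or a
network device) silently drops a connection that has been quiet for longer
than a fixed limit, and that the client only notices when it next uses the
socket. The clock is passed in explicitly so the behaviour is deterministic.

```python
class IdleTimeoutConnection(object):
    """Hypothetical stand-in for a broker connection that is silently
    closed once it has been idle for longer than ``idle_limit`` seconds."""

    def __init__(self, idle_limit, now=0.0):
        self.idle_limit = idle_limit
        self.last_activity = now
        self.closed = False

    def call(self, method, now):
        # Assumption: the far end has already dropped the socket if the
        # connection was idle too long; the caller only finds out on use.
        if now - self.last_activity > self.idle_limit:
            self.closed = True
        if self.closed:
            raise IOError("Socket closed")
        self.last_activity = now
        return "service_b.%s ok" % method


IDLE_LIMIT_SECONDS = 10 * 60  # assumed limit, matching the ~10 minutes above

conn = IdleTimeoutConnection(IDLE_LIMIT_SECONDS, now=0)
print(conn.call("ping", now=60))  # after 1 minute idle: succeeds
try:
    conn.call("ping", now=60 + 11 * 60)  # after 11 minutes idle: fails
except IOError as exc:
    print("error in Service A: %s" % exc)
```

In this model, the call that follows a long idle period is the one that
surfaces the error, which matches the pattern described in the list above.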
Here's the traceback:
Sep 06 10:33:22: Connection to broker lost, trying to re-establish connection...
Sep 06 10:33:22: Traceback (most recent call last):
Sep 06 10:33:22:   File "/venv/local/lib/python2.7/site-packages/kombu/mixins.py", line 177, in run
Sep 06 10:33:22:     for _ in self.consume(limit=None):  # pragma: no cover
Sep 06 10:33:22:   File "/venv/local/lib/python2.7/site-packages/nameko/messaging.py", line 404, in consume
Sep 06 10:33:22:     conn.drain_events(timeout=safety_interval)
Sep 06 10:33:22:   File "/venv/local/lib/python2.7/site-packages/kombu/connection.py", line 275, in drain_events
Sep 06 10:33:22:     return self.transport.drain_events(self.connection, **kwargs)
Sep 06 10:33:22:   File "/venv/local/lib/python2.7/site-packages/kombu/transport/pyamqp.py", line 95, in drain_events
Sep 06 10:33:22:     return connection.drain_events(**kwargs)
Sep 06 10:33:22:   File "/venv/local/lib/python2.7/site-packages/amqp/connection.py", line 303, in drain_events
Sep 06 10:33:22:     chanmap, None, timeout=timeout,
Sep 06 10:33:22:   File "/venv/local/lib/python2.7/site-packages/amqp/connection.py", line 366, in _wait_multiple
Sep 06 10:33:22:     channel, method_sig, args, content = read_timeout(timeout)
Sep 06 10:33:22:   File "/venv/local/lib/python2.7/site-packages/amqp/connection.py", line 337, in read_timeout
Sep 06 10:33:22:     return self.method_reader.read_method()
Sep 06 10:33:22:   File "/venv/local/lib/python2.7/site-packages/amqp/method_framing.py", line 189, in read_method
Sep 06 10:33:22:     raise m
Sep 06 10:33:22: IOError: Socket closed
It looks like Service B needs to re-establish its connection, and so an
error is raised in Service A. I think that's what's happening because I
deployed our services in a separate environment where Service B is called on
a 15-minute interval, and in that environment the connection error is not
raised when Service A calls Service B.
I don't know very much about AMQP internals, but my main questions are:
1. Is the automatic closing of idle connections the expected behavior?
2. If this is the expected behavior, what is the best way to handle this
IOError? I'd rather not wrap every RPC call in a try/except for IOError!
3. If this is an issue in amqp rather than nameko, what is the best way to
handle it?
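Regarding question 2, one generic way to avoid a try/except around every
call is to centralise it in a small wrapper that reconnects and retries. The
sketch below is hypothetical and independent of nameko: `retry_on_ioerror`,
`FlakyClient` and its `reconnect` method are illustrative names, not part of
any library. It also assumes the remote method is safe to repeat
(idempotent), since a retry after a failure could otherwise run it twice.

```python
import functools


def retry_on_ioerror(reconnect, attempts=2):
    """Hypothetical decorator: call the wrapped function up to ``attempts``
    times, calling ``reconnect()`` after every IOError except the last."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(attempts):
                try:
                    return func(*args, **kwargs)
                except IOError:
                    if attempt == attempts - 1:
                        raise  # out of retries: surface the original error
                    reconnect()
        return wrapper
    return decorator


class FlakyClient(object):
    """Illustrative client whose socket starts out closed, as if it had
    been dropped while idle."""

    def __init__(self):
        self.connected = False
        self.reconnects = 0

    def reconnect(self):
        self.connected = True
        self.reconnects += 1

    def call(self, method):
        if not self.connected:
            raise IOError("Socket closed")
        return "%s ok" % method


client = FlakyClient()
call_service_b = retry_on_ioerror(client.reconnect)(client.call)
print(call_service_b("ping"))  # first attempt fails, reconnects, then succeeds
```

Retrying a bounded number of times keeps a genuinely unreachable broker from
hanging the caller: the last IOError is re-raised unchanged.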
Any help would be greatly appreciated!
Thanks,
Grace