Nameko's performance

From @cad on Tue Aug 11 2015 15:53:36 GMT+0000 (UTC)

Did anybody take a swing at profiling nameko? For example, how much time is spent outside RabbitMQ operations?

Copied from original issue:

From @mattbennett on Wed Aug 26 2015 17:15:44 GMT+0000 (UTC)

Not yet. I have done some high-level comparisons of nameko’s HTTP entrypoint to other web frameworks (names withheld to prevent flamewars) and it is fast, presumably because it’s so lightweight. I’ll publish them when I’m confident I’m doing a fair test.

But profiling, no. I would love to see some though. I’m sure there’s plenty of low-hanging fruit to be found and picked.

From @noizex on Wed Jun 22 2016 10:18:14 GMT+0000 (UTC)

I tried profiling a non-async RPC call with Nameko over a vanilla RabbitMQ installation, and performance is pretty bad, or I’m doing something totally wrong. Using ClusterRpcProxy and calling a method that just returns a number (and does nothing else), I get about 20 calls per second. I’m not printing to screen or doing anything else that could slow it down, so I have no idea why an empty RPC call takes 50ms on average (the AMQP broker is local, as is the nameko service). How many calls per second are you getting with just RPC over AMQP (not HTTP-enabled)?

I tried running 10, 20 and 30 workers and it doesn’t make a difference (it did make a difference with 1 worker: it slowed to a crawl). I tried running 2 nameko services on 2 different machines, still no difference. Any idea what I’m doing wrong?

From @astawiarski on Wed Jun 22 2016 13:53:56 GMT+0000 (UTC)

We had similar issues and decided to turn off message persistence. Our approach was that messages need to be handled immediately or not at all. I started working on a nicely configurable approach in nameko, but didn’t have time to finish. For the moment, we are overriding the setting for the whole project.

You can see some comments and work started in

In summary, the following on its own gave a 10x increase in message rates.

from kombu import Exchange
# AMQP delivery mode 1 = transient (non-persistent); 2 = persistent
Exchange.delivery_mode = 1

From @mattbennett on Wed Jun 22 2016 14:45:29 GMT+0000 (UTC)

@noizex I suspect your benchmark involved creating a new ClusterRpcProxy for each call. Setting up the rabbit connections and reply queue is expensive.

This silly example of one service talking to another gives me ~110 calls per second locally:

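import itertools

import eventlet

from nameko.rpc import RpcProxy, rpc
from nameko.timer import timer

class Server(object):
    name = "server"

    @rpc  # expose the method over RPC
    def method(self):
        return 42

class Client(object):
    name = "client"

    # dependency provider; keeps its reply queue alive between calls
    server_rpc = RpcProxy("server")

    @timer(interval=1)  # fire periodically (interval value assumed)
    def start(self):
        # make as many calls as possible until the timeout expires
        with eventlet.Timeout(1, exception=False):
            for count in itertools.count():
                self.server_rpc.method()
        print("{} calls/s".format(count))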

The RpcProxy keeps its reply queue alive between uses, so it’s much faster. You will also get a performance boost if you turn off persistent messages as @astawiarski suggests, but that gain comes from processes inside the broker rather than nameko.

From @noizex on Wed Jun 22 2016 18:42:01 GMT+0000 (UTC)

Thanks Artur & Matt, I did as suggested and managed to get a decent number of processed calls (~600 on a better machine than my laptop). Indeed, ClusterRpcProxy was the problem: it made a new connection to RabbitMQ every time I called an RPC method, which slowed the whole thing down significantly.

I haven’t tried playing with persistent messages, but I will try this too once the number of processed calls becomes a bottleneck. For now it looks promising and we will give it a go in our upcoming microservice-oriented project.

From @noizex on Fri Jun 24 2016 14:10:23 GMT+0000 (UTC)

Hmm, one thing I don’t understand in your example, Matt. Does your method require these 2 services to be defined in the same Python module? What I used was an external script that wasn’t a service itself, so I used the nameko.standalone namespace. I guess then I need to provide AMQP config and use ClusterRpcProxy?

From @mattbennett on Fri Jun 24 2016 15:27:57 GMT+0000 (UTC)

nameko run will start all the services defined in the module you pass it. You could run the “server” service and the “client” service in two different processes and the performance would be the same (potentially better if it’s CPU-bound).

All I’m demonstrating is that the RpcProxy dependency provider reuses its connections, whereas the proxies in the standalone module (ClusterRpcProxy and ServiceRpcProxy) do not. There’s no reason why they shouldn’t reuse connections, but the current implementation does not, which makes them unsuitable for a performance test.
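To illustrate why a benchmark that builds a new proxy for every call mostly measures setup cost, here is a minimal sketch that does not use nameko at all. `FakeProxy`, its `SETUP_COST` and `CALL_COST` values, and `calls_per_second` are hypothetical stand-ins invented for this example; the real costs depend on your broker and network.

```python
import time


class FakeProxy:
    """Hypothetical stand-in for an RPC proxy; not part of nameko."""

    SETUP_COST = 0.005   # assumed: connection + reply queue setup, in seconds
    CALL_COST = 0.0005   # assumed: one RPC round trip, in seconds

    def __init__(self):
        # Simulate the expensive one-off setup work.
        time.sleep(self.SETUP_COST)

    def method(self):
        # Simulate a cheap round trip on an already established connection.
        time.sleep(self.CALL_COST)
        return 42


def calls_per_second(make_call, duration=0.2):
    """Count how many times make_call() completes within `duration` seconds."""
    deadline = time.perf_counter() + duration
    count = 0
    while time.perf_counter() < deadline:
        make_call()
        count += 1
    return count / duration


def fresh_proxy_per_call():
    # Pays the setup cost on every single call.
    return FakeProxy().method()


shared = FakeProxy()  # setup cost paid once, outside the measured loop
reused_rate = calls_per_second(shared.method)
fresh_rate = calls_per_second(fresh_proxy_per_call)

print("reused proxy: {:.0f} calls/s".format(reused_rate))
print("fresh proxy:  {:.0f} calls/s".format(fresh_rate))
```

The same shape applies to a real test: create the proxy once, outside the timed loop, and only time the calls themselves.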

From @yunfan on Fri Apr 13 2018 07:08:55 GMT+0000 (UTC)

I just ran into this issue too. After a quickstart with nameko, I found it convenient for writing services, but the benchmarks broke my heart. I only got about 300 qps on my company’s server (controlled by me; yes, I am a devops). I remember the old days when we used our own handcrafted RPC solution, which did about 3k to 6k qps. This nameko performance really means I can’t try microservices, or at least can’t try microservices on it.

From @mattbennett on Fri Apr 13 2018 13:41:58 GMT+0000 (UTC)

Nameko optimises for the first thing you mentioned, convenience of writing services. If you need to do thousands of requests/sec there are more appropriate tools.

That said, there is probably a ton of low-hanging fruit that can be picked to improve performance in Nameko, but nobody has really tried to do it yet. One area I’ve regularly seen performance bottlenecks is serialization – Nameko does a lot of it and JSON serialization in Python is generally pretty slow. You could switch out the default serializer for something faster pretty easily.
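One rough way to check whether serialization matters for your payloads is to time encode/decode round trips in isolation. The sketch below uses only the standard library and compares `json` with `pickle` purely as an example of a second format; the payload shape and run count are made up, and it says nothing about how a serializer is wired into Nameko. Note that unpickling untrusted data is unsafe.

```python
import json
import pickle
import timeit

# Hypothetical payload, roughly shaped like RPC call arguments.
payload = {
    "args": [1, 2, 3],
    "kwargs": {"name": "x" * 100, "values": list(range(200))},
}


def json_round_trip():
    # Serialize to a JSON string and parse it back.
    return json.loads(json.dumps(payload))


def pickle_round_trip():
    # Serialize to bytes with pickle and load it back.
    data = pickle.dumps(payload, protocol=pickle.HIGHEST_PROTOCOL)
    return pickle.loads(data)


runs = 2000
json_seconds = timeit.timeit(json_round_trip, number=runs)
pickle_seconds = timeit.timeit(pickle_round_trip, number=runs)

print("json:   {:.1f} us per round trip".format(json_seconds / runs * 1e6))
print("pickle: {:.1f} us per round trip".format(pickle_seconds / runs * 1e6))
```

If the per-round-trip time is small compared with the per-call latency you measure end to end, serialization is probably not the main bottleneck.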

Also note that since Nameko 2.7.0 it’s been possible to specify delivery options in your message publishers, for example to turn off persistence and publish confirmations. You will probably get decent mileage out of adjusting these options to your use case.

From @yunfan on Sun Apr 15 2018 12:08:38 GMT+0000 (UTC)

@mattbennett thanks for the tips. As for other tools, do you have any recommendations?