Serialization cache

It's possibly an edge case, but serialization seems like it can be a
significant bottleneck with large messages and simple services. So I
thought an (optional?) serialization and deserialization cache might
provide some benefit without being too hard to implement.
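To give a feel for what I mean (a rough, machine-dependent illustration with a made-up message, not a benchmark of any particular framework):

```python
import json
import timeit

# A "large message": ~10k rows of mixed data. For a trivial echo-style
# service, (de)serializing this dominates the actual work.
msg = {"rows": [[i, i * 2, "some text %d" % i] for i in range(10000)]}

dump_time = timeit.timeit(lambda: json.dumps(msg), number=10)
payload = json.dumps(msg)
load_time = timeit.timeit(lambda: json.loads(payload), number=10)
print("dumps: %.3fs, loads: %.3fs for 10 rounds" % (dump_time, load_time))
```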

There's also probably some way of going a step further and memoizing:
allow the user to specify that a method is deterministic, and if so,
simply retrieve the result from the cache when it has been called before
with the same args. (The benefit here is that it completely skips the
serialize -> send message over network -> send result over network ->
deserialize steps, which the user can't (easily) avoid, even if they
memoize their functions directly in their services.)

I'm largely thinking RPC here too.

Thoughts?

Hi,

Caching of RPC calls is probably best kept outside of the RPC framework
itself. You could possibly write a wrapper, but this is probably too tied
to your particular business logic to warrant a blanket solution.

If you are sending large (or otherwise expensive to serialise/deserialise)
messages, you might want to consider using a different serialiser. IIRC,
kombu has built-in (optional) support for msgpack that you can enable
using the `serializer` config key.
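For example, assuming a standard YAML config file (and that msgpack is installed), that would look something like:

```yaml
# nameko service config (sketch; broker URI is a placeholder)
AMQP_URI: 'amqp://guest:guest@localhost'
serializer: 'msgpack'
```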

Best,
David

···

On Sunday, 14 August 2016 19:51:38 UTC+1, kodonnell wrote:


I agree with David that caching should be separate from the built-in RPC
implementation.

Memoizing service methods is a nice pattern that is easily implemented and
very clean. But as you say, it doesn't avoid having to
serialize/deserialize on its own.
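As a sketch of that pattern (the expensive function here is a stand-in, and this is deliberately not nameko-specific): pull the deterministic work out of the service method and memoize it with `functools.lru_cache`, with the RPC-decorated method just delegating to it.

```python
from functools import lru_cache

# Memoize the expensive, deterministic part of the service method.
# The RPC entrypoint would simply call expensive(n) and return it.
@lru_cache(maxsize=1024)
def expensive(n):
    # stand-in for real work
    return sum(i * i for i in range(n))

first = expensive(1000)
second = expensive(1000)  # cache hit: no recomputation
```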

You could build a cache into the RpcProxy, as you describe, and save even
going over the network. The RpcProxy is not easy to extend at the moment
though. It's of limited value if you have lots of different callers too.

At the service end, the best way to implement a cache would be to write a
custom serializer for kombu. Nameko supports this, and you can see a toy
implementation in the tests
<https://github.com/onefinestay/nameko/blob/master/test/test_serialization.py#L183>.
If you built something like this it wouldn't be a candidate for
inclusion in the core library, but I'm sure people would find it useful. It
would make a great blog post :wink:

Matt.

···

On Monday, August 15, 2016 at 11:21:03 PM UTC+8, David Szotten wrote:


Hi both,

As Matt mentions, avoiding the serialization itself isn't really possible
outside of the framework. I had an idea for doing it in the framework,
before realising last night that it was very daft. In addition, both of
Matt's suggestions are better, so I'll have a crack at those. I'm still a
tad dinosaur-ish, so I don't have a blog, but at the least I'll post
something at the end of this thread if I find a nice implementation = )

Regarding msgpack - yes, I have been using that (along with ujson et al.).
It's certainly faster, but serializing is still by far the bottleneck.
However, moving to msgpack (or e.g. a custom serialization format) means
playing with kombu, which fits well with the above suggestion.

Thanks

···

On Tuesday, August 16, 2016 at 6:41:50 PM UTC+12, Matt Yule-Bennett wrote:


Hah, it turns out trivial implementations of a generic serialization cache
don't seem to offer any benefit (they actually slowed my tests down!). The
main reason is that caching generic objects, including non-hashable ones,
requires a way of computing a hash (see how I did it below), and that's
pretty slow. Ways that could circumvent this:

- if you only use hashable types (probably not useful)
- the user provides a key to use (which gets into the business-specific
logic you both mentioned)
- ??
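To sketch that second option (a hypothetical API; names are mine), letting the caller supply the key removes the hashing cost, at the price of trusting the caller to pick keys correctly:

```python
import json

dumps_cache = {}

def dumps_with_key(value, cache_key):
    # The caller-supplied key replaces the expensive object-hashing
    # walk; correctness now depends on callers choosing sound keys.
    if cache_key not in dumps_cache:
        dumps_cache[cache_key] = json.dumps(value)
    return dumps_cache[cache_key]

payload = dumps_with_key({"user": 1, "items": [1, 2, 3]}, "user-1")
```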

I guess I should have expected this - creating a hash of an object
involves much the same work as serializing it. (Indeed, the serialized
version could itself be considered a hash.)

Similar considerations will apply to memoizing, I guess.

For completeness, below is a naive (not production-ready!) example of
a 'caching_json' kombu serializer (which I registered with setup.py as per
the docs
<http://docs.celeryproject.org/projects/kombu/en/latest/userguide/serialization.html#creating-extensions-using-setuptools-entry-points>).
I tested it with a simple echo service.

import json

# Unbounded module-level caches: naive, with no eviction, so memory
# grows with every distinct message.
loads_cache = {}
dumps_cache = {}

def make_hashable(value):
    # Recursively convert a JSON-style object into a hashable form.
    # Dict items are sorted so that key order doesn't change the hash.
    if value is None or isinstance(value, (str, int, float)):
        return value
    elif isinstance(value, (tuple, list)):
        return tuple(make_hashable(v) for v in value)
    elif isinstance(value, dict):
        return tuple(sorted((k, make_hashable(v)) for k, v in value.items()))
    else:
        raise ValueError("cannot make %s hashable" % str(value)[:100])

def loads(value):
    # The incoming payload is already a string, so it's hashable as-is.
    hsh = hash(value)
    if hsh not in loads_cache:
        print('loading')
        loads_cache[hsh] = json.loads(value)
    return loads_cache[hsh]

def dumps(value):
    hsh = hash(make_hashable(value))
    if hsh not in dumps_cache:
        print('dumping')
        dumps_cache[hsh] = json.dumps(value)
    return dumps_cache[hsh]

# (encoder, decoder, content type, content encoding) for the kombu
# entry point.
register_args = (dumps, loads, 'application/json', 'utf-8')
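For reference, the setup.py entry point looked something like this (package and module names here are mine; the `kombu.serializers` entry-point group is per the kombu docs linked above):

```python
# setup.py (fragment) -- registers the serializer with kombu by pointing
# the entry point at the register_args tuple in the module above.
from setuptools import setup

setup(
    name='caching-json-serializer',
    py_modules=['caching_json'],
    entry_points={
        'kombu.serializers': [
            'caching_json = caching_json:register_args',
        ],
    },
)
```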

···

On Wednesday, August 17, 2016 at 6:04:39 AM UTC+12, kodonnell wrote:
