Message validation

See initial discussion of issue #345
<https://github.com/onefinestay/nameko/issues/345> - continued here.

In quick reply to some of @mattbennet's comments:

- marshmallow looks good (it's inspired by Django REST Framework, which is
actually the one I was referring to with regard to REST)
- I think we're similar in that we have an RPC service cluster which users
can only access indirectly through a separate API/GUI (which we control).
That seems nice on one hand, but on the other, where does the validation
sit? If it's not with the services, it has to be in the API - but then
we're repeating ourselves. (E.g. if I change the schema for a service, I
then have to also go and update any API that validates messages to be sent
to that service.) So it might actually be better to shift the validation to
the service itself (allowing introspection etc.). I guess tying the service
and its validation together just minimises the work required, and
guarantees that anything that describes/uses the service
(documentation/APIs/GUIs) can never get "out-of-sync" with the actual
service.
- we also control our RPC service cluster - and that's fine with few
services and few developers. However, once we've got thousands of services
and tens of developers, a lack of enforced contracts is probably asking for
trouble.
- to your question about type hints: I haven't used them much either, but I
believe the type hints are saved as an attribute (__annotations__) of a
function. So, assuming you decorate the function to enforce checking of the
type hints, then any time that function is called (indirectly or not), the
validation should occur.
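To make the type-hint point concrete, here is a minimal sketch of such a decorator. It assumes only the standard library; the name enforce_annotations is made up for illustration, and only hints that are ordinary classes (int, str, ...) are checked, anything else is ignored.

```python
import functools
import inspect


def enforce_annotations(func):
    """Check call arguments against the function's type hints.

    Only hints that are ordinary classes (int, str, ...) are checked;
    other kinds of hints are ignored in this sketch.
    """
    signature = inspect.signature(func)
    hints = func.__annotations__  # where Python stores the type hints

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # Map positional and keyword arguments onto parameter names.
        bound = signature.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            expected = hints.get(name)
            if type(expected) is type and not isinstance(value, expected):
                raise TypeError(
                    f"{name} must be {expected.__name__}, "
                    f"got {type(value).__name__}"
                )
        return func(*args, **kwargs)

    return wrapper


@enforce_annotations
def add(x: int, y: int) -> int:
    return x + y
```

Because the check lives in the wrapper, it runs on every call that goes through the decorated name, whether that call comes from an entrypoint or from other code.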

Based on this, I think we can maybe narrow down the options:

*Short term*

We can probably just write a decorator that takes a marshmallow schema and
validates, e.g.

@rpc
@schema(<marshmallow schema for the args>)
def add(x, y):
    return x + y

This is nice because it's easy, completely optional, and won't have to
affect the underlying code (I don't think). A "better" implementation would
be to tie it into the serializing/deserializing (since marshmallow does that
too), but then we're messing with existing stuff.

If you think this is reasonable, I can have a play trying to implement
something.
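As a rough sketch of the decorator idea above, using only the standard library: the validator here is a hypothetical stand-in for a marshmallow schema (as far as I know, marshmallow's Schema.validate has a similar shape, returning a dictionary of errors), and ValidationError, schema and add_args_validator are names invented for this example.

```python
import functools
import inspect


class ValidationError(Exception):
    """Raised when call arguments do not match the schema."""


def schema(validator):
    """Validate the call arguments with `validator` before calling the method.

    `validator` is any callable that takes a dict of argument names to values
    and returns a dict of error messages (empty when the arguments are valid).
    """
    def decorator(func):
        signature = inspect.signature(func)

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            bound = signature.bind(*args, **kwargs)
            # Drop `self` so only the message arguments are validated.
            payload = {k: v for k, v in bound.arguments.items() if k != "self"}
            errors = validator(payload)
            if errors:
                raise ValidationError(errors)
            return func(*args, **kwargs)

        return wrapper
    return decorator


def add_args_validator(payload):
    # Hypothetical stand-in for a marshmallow schema's validation step.
    return {
        name: "must be an integer"
        for name, value in payload.items()
        if not isinstance(value, int)
    }


class Calculator:
    # In a nameko service, @rpc would be stacked above @schema.
    @schema(add_args_validator)
    def add(self, x, y):
        return x + y
```

Calling Calculator().add(1, 2) returns 3, while Calculator().add(1, "2") raises ValidationError before the method body runs.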

*Longer term*

I'm not familiar with Thrift, but from what I've read this (or Protocol
Buffers/Avro/etc.) could be the "best" approach - it solves the message
validation (to some degree), it's standard, and it's faster (quicker
serialization and smaller message sizes). However, I'm not sure if this is in
scope for nameko, or how much work it'd be to implement. Thoughts?

Further replies inline...

- we also control our RPC service cluster - and that's fine with few
services and few developers. However, once we've got thousands of services
and tens of developers, a lack of enforced contracts is probably asking for
trouble.

For most of our use cases, the service with the "external" API does the
validation, and we trust it to call any underlying services correctly (i.e.
without validation). Or rather, since the same team develops both the
"gateway" and internal services, we don't need the constraints of strict
validation. If something has made it through the "external" validation it's
probably OK to be passed to the internal services. If it doesn't work for
some reason, we can figure it out because they're "internal" systems.

You're completely right that this only works because we have a limited
number of services and a small team.

- to your question about type hints: I haven't used them much either, but
I believe the type hints are saved as an attribute (__annotations__) of a
function. So, assuming you decorate the function to enforce checking of the
type hints, then any time that function is called (indirectly or not), the
validation should occur.

Based on this, I think we can maybe narrow down the options:

*Short term*

We can probably just write a decorator that takes a marshmallow schema and
validates, e.g.

@rpc
@schema(<marshmallow schema for the args>)
def add(x, y):
    return x + y

This is nice because it's easy, completely optional, and won't have to
affect the underlying code (I don't think). A "better" implementation would
be to tie it into the serializing/deserializing (since marshmallow does that
too), but then we're messing with existing stuff.

If you think this is reasonable, I can have a play trying to implement
something.

I like this idea a lot, for a few reasons:

* It's declarative
* It keeps all the logic out of the service method
* It could be introspectable

You probably just need to extend the @rpc entrypoint's check_signature method
to run the validator if one has been defined.
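To illustrate the shape of that suggestion without depending on nameko itself, here is a sketch built on a simplified stand-in class. StandInEntrypoint, ValidatingEntrypoint and this IncorrectSignature are assumptions made for the example, not nameko's real classes, and the real entrypoint's constructor and internals will differ.

```python
import inspect


class IncorrectSignature(Exception):
    """Stand-in for the error an entrypoint raises on a bad call."""


class StandInEntrypoint:
    """Simplified stand-in for an RPC entrypoint; NOT nameko's real class."""

    def __init__(self, func):
        self.func = func

    def check_signature(self, args, kwargs):
        try:
            # None fills the slot of the unbound method's first parameter.
            inspect.signature(self.func).bind(None, *args, **kwargs)
        except TypeError as exc:
            raise IncorrectSignature(str(exc))


class ValidatingEntrypoint(StandInEntrypoint):
    """Runs an optional validator after the normal signature check."""

    def __init__(self, func, validator=None):
        super().__init__(func)
        self.validator = validator

    def check_signature(self, args, kwargs):
        super().check_signature(args, kwargs)
        if self.validator is None:
            return  # no schema defined: behave like the plain entrypoint
        bound = inspect.signature(self.func).bind(None, *args, **kwargs)
        # Skip the first bound argument (the service instance placeholder).
        payload = dict(list(bound.arguments.items())[1:])
        errors = self.validator(payload)
        if errors:
            raise IncorrectSignature(str(errors))


def add(self, x, y):
    return x + y


def ints_only(payload):
    # Hypothetical validator: returns a dict of errors, empty when valid.
    return {
        name: "must be an integer"
        for name, value in payload.items()
        if not isinstance(value, int)
    }


entrypoint = ValidatingEntrypoint(add, validator=ints_only)
entrypoint.check_signature((1, 2), {})  # passes silently
```

The appeal of hooking in at this point is that a bad message is rejected at the same stage as a call with the wrong number of arguments, before the service method is ever invoked.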

As for using Marshmallow to do the serialization too -- that's an
interesting point. Rather than using a separate @schema decorator, you
could create an entirely new @rpc entrypoint and rewrite the internals.
That would be rather a lot of work (for limited return, I'd argue) but it's
entirely possible, even while remaining compatible with the existing
entrypoint.

*Longer term*

I'm not familiar with Thrift, but from what I've read this (or Protocol
Buffers/Avro/etc.) could be the "best" approach - it solves the message
validation (to some degree), it's standard, and it's faster (quicker
serialization and smaller message sizes). However, I'm not sure if this is in
scope for nameko, or how much work it'd be to implement. Thoughts?

Again, this means writing a new entrypoint. It's not *that* difficult
though, and I want to get some docs describing the process online soon. If
there was a Thrift entrypoint I'd list it along with the other community
extensions
<http://nameko.readthedocs.io/en/stable/community_extensions.html> in the
docs (which is woefully out of date at present).

···

On Wednesday, August 10, 2016 at 8:22:10 AM UTC+8, kodonnell wrote:

Hi Matt,

Agreed on how it currently works for your team - the same is true of us.
It's more of a consideration for if/when we open source, or allow some of
our pseudo-technical colleagues to write services etc.

*Schema decorator:*

As you say, the declarative and introspectable nature is good, in part
because it's easy to enforce the presence of a schema (e.g. for community
services), which results in that rarely found utopia of up-to-date
documentation.

I've already implemented and been using one (I didn't need to modify @rpc,
I just used a standard decorator to act as a middleman between the
arguments from @rpc, and the function call). That said, it has led to a few
more questions that I think are worth thinking about before we charge ahead:

   - how to separate "structural validation" from "business validation".
   For example, I'd consider "I expect X to be an integer" as structural
   validation, which should be taken care of by the schema. But "if X is 3 and
   Y is a datetime after 2016 and this database table is empty, then it's
   invalid" is business logic validation, which probably shouldn't go in the
   schema (otherwise they get quite polluted and tied together). My current
   approach is that the schema should do structural validation, optionally
   including data-type structure (e.g. "X is meant to be an age, so it should
   be an integer between 0 and 120"). However, I think any contextual
   validation (i.e. validating combinations of things) or business validation
   should be left in the service (and throw some sort of validation error as
   required). The user will probably be free to do whatever they want, but
   it's good to clarify what we're aiming for.
   - what schema and validator to use? I see two concerns: cross-platform
   introspectability, and are we doing more than validating? E.g.
      - jsonschema (which I went with): main benefit is I can easily send
      the direct schema to e.g. a web client, which can then also use it for
      client-side validation. In addition, its sole purpose is validation - and
      there's not great support for contextual/business validation (which is
      good). It's also not extensible (e.g. adding new structure types). So far
      it's worked OK for me.
      - marshmallow: I just realized you already suggested this, but yes,
      should we use it for serializing/deserializing too? That'd be nice as we
      could easily have it e.g. deserialize a serialized thing into its actual
      class (e.g. "DataPacket") before passing it to the service. (I actually
      wrote a separate decorator for doing this, though it's probably not a great
      general solution.) That aside, I'd say marshmallow is great in terms of
      flexibility - you can write whatever serializing/deserializing/validation
      you want. That may be a bad thing (e.g. it allows business validation),
      though we can leave that to users. Finally, it's not cross-platform (e.g.
      converting to frontend) and possibly harder to completely introspect (as it
      follows a more flexible format).
      - should we leave the actual schema/validator flexible? E.g. make it
      somehow accept a marshmallow or jsonschema schema? It's probably not too
      hard to do, though would need some thought.
   - where do we put the schema? The decorator won't care, but it'd be nice
   to come up with a standard pattern. Currently I'm storing it in a variable
   inside the same file as the service to be validated, as it might be nice to
   import it from other files (e.g. to reuse a validator for a different
   service). However, it'd be equally simple to store it in the service class
   itself. An alternative I considered was parsing the schema from a
   docstring, though I feel that's a bad idea (especially as docstrings can
   disappear). However, it'd be pretty cool if your nice sphinx documentation
   doubled as the actual validation!
   - similarly, should we use @schema(myschema), or just @schema, and have
   some convention for auto-discovering a schema?
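The split between structural and business validation described in the first point could look roughly like this. It is a sketch under stated assumptions: the schema is a plain dict using a small subset of JSON Schema keywords, validate_structure is a hand-rolled checker rather than the jsonschema library, and PATIENT_SCHEMA and register_patient are hypothetical names.

```python
class ValidationError(Exception):
    """Raised when a message fails structural or business validation."""


# Hypothetical schema, written with a small subset of JSON Schema keywords.
PATIENT_SCHEMA = {
    "type": "object",
    "required": ["patient_id", "age"],
    "properties": {
        "patient_id": {"type": "integer"},
        "age": {"type": "integer", "minimum": 0, "maximum": 120},
    },
}


def validate_structure(payload, schema):
    """Structural checks only: required keys, integer types and bounds."""
    for key in schema.get("required", []):
        if key not in payload:
            raise ValidationError(f"{key} is required")
    for key, rules in schema.get("properties", {}).items():
        if key not in payload:
            continue
        value = payload[key]
        if rules.get("type") == "integer":
            # bool is a subclass of int in Python, so exclude it explicitly.
            if not isinstance(value, int) or isinstance(value, bool):
                raise ValidationError(f"{key} must be an integer")
        if "minimum" in rules and value < rules["minimum"]:
            raise ValidationError(f"{key} must be >= {rules['minimum']}")
        if "maximum" in rules and value > rules["maximum"]:
            raise ValidationError(f"{key} must be <= {rules['maximum']}")


def register_patient(payload, registered_ids):
    """Service-style function: the schema runs first, business rules after."""
    validate_structure(payload, PATIENT_SCHEMA)
    # Business validation depends on state outside the message (here a set of
    # known ids standing in for a database table), so it stays in the service.
    if payload["patient_id"] in registered_ids:
        raise ValidationError("patient is already registered")
    registered_ids.add(payload["patient_id"])
    return payload["patient_id"]
```

Since the schema is plain data, it can be serialized and handed to a client for its own checks, while the rule about already-registered patients never leaves the service.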

My current thoughts are to stick just with "structural" validation for now
(i.e. no marshmallow serializing and no need for complex validators), and
to pick a specific schema/validator (not be flexible). Also, manually
specify schema as opposed to auto-discovery (unless we can do it really
nicely somehow).

What do you think?

*Thrift et al.*

Great. If you let me know when the docs are up, I might have a look -
currently I can spend some time on it, and I like a challenge. (Though I'm
still pretty new to the microservice business, so some words of wisdom
wouldn't go astray.) I could document the process as I went, as opposed to
you writing the docs, if that helps. Let me know how you want to progress.

···

On Tuesday, August 16, 2016 at 7:17:43 PM UTC+12, Matt Yule-Bennett wrote:

Further replies inline...
