Process not releasing memory to container

*First off, a spec of the setup I'm running:*

   - AWS EC2 instances in the bottom layer.
   - Kubernetes hosting Docker containers
   - Clustered RabbitMQ (two servers) with an ELB in front.
   - Redis
   - Influx
   - Nameko 2.6.0
   
*Problem:*

In short, I'm experiencing memory allocation problems: the Python process
running Nameko won't release memory back to the container, which causes the
container to restart when it hits its memory limit (in my case 512MB). An
easy way to reproduce this in my environment is to log some random messages;
the more I log, the bigger the leak. That isn't proof that the logging is the
core problem, but maybe it's an indication?

Below is the sample code I use to reproduce the problem. As you might notice,
I have a circular event dispatch/handling loop; this is just for testing
purposes. My "real" code doesn't have this pattern but still experiences the
same problem. Even the most minimal implementation, a service with a simple
logging.info("message"), experiences the memory leak.

*service.py*
import logging

from nameko.timer import timer
from nameko.events import event_handler
from nameko.events import EventDispatcher

from service.redis_cli import RedisDependency

class Service:
    name = "service"
    dispatcher = EventDispatcher()
    r_db = RedisDependency()

    @timer(interval=5)
    def log(self):
        logging.info("This is a rather long log message?")
        self.dispatcher("publish", {"message": "hello world"})

    @event_handler("uc_sensor_pipeline", "new_reading")
    def do_log(self, payload):
        logging.info(payload)

    @event_handler("service", "publish")
    def handle_publish(self, payload):
        self.r_db.lpush('logging', '123', payload)
        logging.info(payload['message'])

*redis_cli.py*
from nameko.extensions import DependencyProvider

from redis import StrictRedis

class RedisWrapper:
    def __init__(self, client):
        self.client = client

    def lpush(self, prefix, key, value):
        full_key = self._format_key(prefix, key)
        self.client.lpush(full_key, value)
        self.client.ltrim(full_key, 0, 9)

    def get(self, prefix, key):
        full_key = self._format_key(prefix, key)
        return self.client.lrange(full_key, 0, -1)

    def _decode(self, item):
        return { k.decode(): v.decode() for k, v in item.items() }

    def _format_key(self, prefix, key):
        return '{0}:{1}'.format(prefix, key)

class RedisDependency(DependencyProvider):
    def setup(self):
        self.client = StrictRedis.from_url(
            self.container.config.get('REDIS_URI'))

    def get_dependency(self, worker_ctx):
        return RedisWrapper(self.client)

*config.yml*
AMQP_URI: amqp://${RMQ_USER:guest}:${RMQ_PASSWORD:guest}@${RMQ_HOST:localhost}
REDIS_URI: redis://myredishost:6379

LOGGING:
    version: 1
    handlers:
        console:
            class: logging.StreamHandler
        file:
            class: logging.handlers.RotatingFileHandler
            filename: service.log
            maxBytes: 1048576
            backupCount: 3
    root:
        level: INFO
        handlers: [console]

*Overview of the mem allocation:*

<https://lh3.googleusercontent.com/-UO65BwWFeBE/WaB5lzVen3I/AAAAAAAAHc0/HO_TvJcuF8oPNuvGZv2L_nXmtJS8XT5ugCLcBGAs/s1600/Screen%2BShot%2B2017-08-25%2Bat%2B21.23.48.png>

I know this is not a huge increase, but it follows a steady pattern (and will
eventually cause the container to restart when it hits the memory limit).
Containers with more load leak memory faster.

*What have I tried (without luck) so far:*

   - Logging to file instead of stdout.
   - Manually calling gc.collect() (since this doesn't release any memory, I
   figure it's not a reference problem but rather that the process won't
   release memory back to the container once the gc has freed it).
   - Removing the dependency injection and calling redis from the service
   itself.
   - Decreasing and increasing the number of workers.
   - Simplifying the service to just a timer which logs a message every five
   seconds (memory still increases); see the sketch below.
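
That stripped-down, timer-only version looks roughly like this (an
illustrative sketch with made-up names, not the exact code I ran, but
equivalent):

import logging

from nameko.timer import timer

class TimerOnlyService:
    name = "timer_only_service"

    @timer(interval=5)
    def log(self):
        # no redis, no events -- just a log line every five seconds
        logging.info("This is a rather long log message?")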

I understand that this might be too little information to give any valuable
input, but I sincerely don't know how to proceed. Does anyone have any ideas,
or see anything really strange with the example? Has anyone experienced
something similar?

If you got this far, thank you for reading and taking the time! Any help
would be very much appreciated!

Disclaimer: I don't necessarily believe this has to do with Nameko in
particular, but rather with some combination of errors/dependencies/bad
implementation.

Hi Johan,

I'm sorry you're having issues. However, there's still quite a lot going on
in your example, and a few unknowns. Could you try reducing the example
further? (e.g. removing redis, and even the event dispatching? Do you still
see the problem? What if you remove nameko entirely and just log stuff in a
loop?) With a 5 second timer, how long before you see anything happening?

Also, can you reproduce the memory increase without docker? (do you see the
process memory usage grow over time) If not, what's your docker setup?
Could you post a complete setup that we can `docker run` (or
`docker-compose run`)? That may be easier as a GitHub repo.

Best,
David

Hi,

Thanks for your quick reply, much appreciated!

I've tried stripping down my nameko service in almost every way:

   - Removing the redis dependency
   - Removing the event dispatcher
   - Removing the event handler
   - Combinations of the above

However, I'm still experiencing the problem. The only conclusion so far is:
the more load, the bigger the leak, which to me indicates some kind of
integration problem with the underlying platform (e.g. logging not releasing
memory to the container, or something similar).

I've created a small test with just a super simple python script logging
messages to stdout.

import logging
from time import sleep

def setup():
    logger = logging.getLogger('spam_application')
    logger.setLevel(logging.DEBUG)

    ch = logging.StreamHandler()
    ch.setLevel(logging.INFO)

    fh = logging.FileHandler('spam.log')
    fh.setLevel(logging.DEBUG)

    formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
    ch.setFormatter(formatter)
    fh.setFormatter(formatter)

    logger.addHandler(ch)
    # logger.addHandler(fh)

def work():
    logger = logging.getLogger('spam_application')

    while True:
        logger.info("Hello World")
        sleep(1)

if __name__ == "__main__":
    setup()
    work()

As of now, it doesn't leak any memory. But since the memory consumption of
such a script is so small in general, I need to run it for a bit longer
before posting a graph.
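
To make the growth visible without waiting for a graph, I'm also thinking of
logging the process RSS alongside the messages, roughly like this (a sketch
only; Linux-specific, since it reads /proc/self/status):

import logging
from time import sleep

logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s')
logger = logging.getLogger('spam_application')

def rss_kb():
    # resident set size in kB, read from the kernel's per-process status file
    with open('/proc/self/status') as f:
        for line in f:
            if line.startswith('VmRSS:'):
                return int(line.split()[1])
    return -1

if __name__ == "__main__":
    while True:
        logger.info("Hello World (rss=%d kB)", rss_kb())
        sleep(1)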

Do you have any thoughts so far?

I've made another test using minikube (local Kubernetes, more or less). The
memory problem persists, and through cAdvisor I can see that the memory
increase happens continuously at a steady pace:

<https://lh3.googleusercontent.com/-fQDcGhUK4l8/WaKniJUKj6I/AAAAAAAAHdI/iyMRrS7K9Vo8Lp3cjhiSVNd2YxXTx3lQACLcBGAs/s1600/Screen%2BShot%2B2017-08-27%2Bat%2B13.01.35.png>

Sorry for spamming this topic; I'll compose a summary once I have something
valuable to share, or maybe even a solution to the problem. In the meantime I
have done some testing without Docker at all, hosting my nameko service on a
plain EC2 instance with the same configuration as my containers (dist,
installed deps etc). The process doesn't seem to leak memory in this case.

This narrows it down to Nameko in combination with Docker and/or Kubernetes.
I guess I'll have to run it in a plain Docker container as well, to rule out
Kubernetes. Just wanted to give you an update on the progress; any ideas are
most welcome!

Kindest,

I can now confirm that no leak exists with the simple Python script, only
when I incorporate a nameko service.

Hi Johan,

Thanks for keeping us up to date with your investigations. Two tools I've
found really helpful for finding memory leaks in the past are Pympler (see
the "Identifying memory leaks" section of the Pympler documentation) and
Objgraph <https://mg.pov.lt/objgraph/>.

Use the Nameko backdoor (which deserves its own page in the docs, but
doesn't have one) to get a REPL inside the running process:

$ nameko run service --backdoor 3000
...

$ nameko backdoor 3000

>>> from pympler import muppy
>>> all_objects = muppy.get_objects()  # will take a while
>>> from pympler import summary
>>> sum = summary.summarize(all_objects)
>>> summary.print_(sum)

...
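
Objgraph gives a similar quick view from the same backdoor session; a minimal
sketch (assuming objgraph is installed inside the container):

>>> import objgraph
>>> objgraph.show_most_common_types(limit=10)  # counts of the most numerous object types
>>> objgraph.show_growth()  # call once as a baseline, then again later to see what grew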

Matt.

Just wanted to get back in touch. I've not solved the problem yet; a sprint
got in the way :)

I tried the backdoor, but I can't get it to work in my docker container. I
get

netcat: invalid option -- '-'

whenever I try to connect to the backdoor. Probably some underlying
dependency is missing in my container (ubuntu:16.04 based, with rlwrap and
netcat installed). I'll get back as soon as I have anything to report.

Kindest,
Johan

Oh interesting. The backdoor command is just a convenience wrapper around
rlwrap, netcat and/or telnet. I tend to just use them directly, e.g.
"rlwrap telnet localhost 3000".

If you have time, please raise an issue on the Github page to report the
problems with the backdoor command.

I might; I just want to make sure where the problem lies first. I'm not
sure it has to do with the actual backdoor CLI command. I checked the
source and it was pretty straightforward, not much could go wrong there :)
I'll raise an issue if I can pin it down to something more specific.

Thanks for the tip!

···

On Tuesday, 5 September 2017 at 09:26:43 UTC+2, Matt Yule-Bennett wrote:

Oh interesting. The backdoor command is just a convenience wrapper around
rlwrap, netcat and/or telnet. I tend to just use them directly, e.g.
"rlwrap telnet localhost 3000"

If you have time, please raise an issue on the Github page to report the
problems with the backdoor command.
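
For what it's worth, anything that can attach to that TCP port will do; if
rlwrap and netcat are misbehaving, a throwaway script using the stdlib
telnetlib is one option. A minimal sketch (the filename is made up, and the
port is assumed to match whatever was passed to --backdoor):

*backdoor_client.py* (hypothetical helper, not part of Nameko)
import telnetlib

# connect to the backdoor opened by "nameko run service --backdoor 3000"
tn = telnetlib.Telnet("localhost", 3000)

# hand this terminal over to the remote REPL
tn.interact()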

On Tuesday, September 5, 2017 at 8:15:14 AM UTC+1, Johan Frisell wrote:

Just wanted to get back in touch. I've not solved the problem yet; a
sprint just got in the way :)

I tried the backdoor, however I can't get it to work in my Docker
container. I get

netcat: invalid option -- '-'

whenever I try to connect to the backdoor. Probably some underlying
dependency is missing in my container (ubuntu:16.04 based, with rlwrap and
netcat installed). I'll get back as soon as I have anything to report.

Kindest,
Johan

On Monday, 28 August 2017 at 08:44:28 UTC+2, Matt Yule-Bennett wrote:

Hi Johan,

Thanks for keeping us up to date with your investigations. Two tools I've
found really helpful for finding memory leaks in the past are Pympler (see
the "Identifying memory leaks" section of the Pympler documentation) and
Objgraph <https://mg.pov.lt/objgraph/>.

Use the Nameko backdoor (which deserves its own page in the docs, but
doesn't have one) to get a REPL inside the running process:

$ nameko run service --backdoor 3000
...

$ nameko backdoor 3000
>>> from pympler import muppy
>>> all_objects = muppy.get_objects() # will take a while
>>> from pympler import summary
>>> sum = summary.summarize(all_objects)
>>> summary.print_(sum)
...
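
Objgraph can be used from the same backdoor session in much the same way;
a minimal sketch, assuming objgraph has been installed inside the
container:

>>> import objgraph
>>> objgraph.show_most_common_types(limit=10)  # counts of the most numerous object types
>>> objgraph.show_growth(limit=10)  # call repeatedly; shows which type counts grew since the last call

Comparing a couple of show_growth() calls a few timer intervals apart
should point at whatever is accumulating, if anything is.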

Matt.

On Sunday, August 27, 2017 at 7:13:12 PM UTC+1, Johan Frisell wrote:

Sorry for spamming this topic; I will take the time to compose a summary
once I have something valuable to share, or maybe even a solution to the
problem. But I have done some testing without Docker at all, hosting my
Nameko service on a plain EC2 instance with the same configuration as my
containers (dist, installed deps etc.). The process doesn't seem to leak
memory in this case.

This narrows it down to Nameko in combination with Docker and/or
Kubernetes. I guess I'll have to host it in a plain Docker container as
well, to possibly rule out Kubernetes. Just wanted to give you an update
on the progress. If you have any ideas they would be most welcome!

Kindest,

On Sunday, 27 August 2017 at 11:44:30 UTC+2, David Szotten wrote:

Hi Johan,

I'm sorry you're having issues. However, there's still quite a lot going
on in your example, and a few unknowns. Could you try reducing the example
further? E.g. remove Redis, and even the event dispatching: do you still
see the problem? What if you remove Nameko entirely and just log stuff in
a loop? With a 5-second timer, how long before you see anything happening?
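
A bare-bones version of that last suggestion might look something like the
sketch below (stdlib logging only, with an arbitrary 5-second sleep to
mirror the timer; the filename is made up). Run it in the same container
image and watch memory usage with something like docker stats:

*bare_loop.py* (hypothetical, not part of the real service)
import logging
import time

logging.basicConfig(level=logging.INFO)

# no Nameko, no Redis, no RabbitMQ - just log in a loop
while True:
    logging.info("This is a rather long log message?")
    time.sleep(5)

If this also grows inside the container but not on a plain host, the
problem is unlikely to be in Nameko itself.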

Also, can you reproduce the memory increase without Docker? (Do you see
the process memory usage grow over time?) If not, what's your Docker
setup? Could you post a complete setup that we can `docker run` (or
`docker-compose run`)? That may be easier as a GitHub repo.

Best,
David
