Nameko and Selenium

Hi everyone. I’m having trouble running Selenium spiders with Nameko in production, and I don’t know if I’m doing everything right (this is my first time working with microservices, and Nameko was our choice for that). I’ll describe my architecture and the problem here; let me know if you need any additional information.

Here’s a miro board to help you understand.

  • We have a Django application where we upload a spreadsheet with the information to be extracted by the spiders. The Django application then starts a Celery task that sends this information to the spider microservice, one by one.
  • When the spider microservice receives a payload, it fires an event so that another microservice validates the data, and it returns ‘OK’ to Django.
  • The data-validating service does the work and fires one of two possible events:
    1 - data is valid: the spider microservice handles this event, runs the spider and sends the result to Django through a Django REST framework API.
    2 - data is not valid: the payload is sent to Django without being processed, with a field added to indicate that the data is invalid.
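
To make the flow above easier to follow, here is a minimal in-process model of it. It is only a sketch: the event names, the `on`/`dispatch` helpers and the "a row is valid if it has a URL" rule are all made up for illustration, and a plain dictionary of handlers stands in for the broker. In the real setup the events go through RabbitMQ, so ‘OK’ is returned before the handlers have run.

```python
from collections import defaultdict

# Hypothetical in-process stand-in for the broker: event name -> handlers.
handlers = defaultdict(list)
results = []  # stands in for the POSTs made to the Django REST API


def on(event):
    """Register a handler for an event name (stand-in for an event entrypoint)."""
    def register(func):
        handlers[event].append(func)
        return func
    return register


def dispatch(event, payload):
    """Call every handler registered for the event, in registration order."""
    for func in handlers[event]:
        func(payload)


def receive_payload(payload):
    """Spider service entrypoint: pass the row on for validation, answer 'OK'."""
    dispatch("payload_received", payload)
    return "OK"


@on("payload_received")
def validate(payload):
    # Assumed rule, for illustration only: a row is valid if it has a URL.
    event = "data_valid" if payload.get("url") else "data_invalid"
    dispatch(event, payload)


@on("data_valid")
def run_spider(payload):
    # The real service would drive Selenium here.
    results.append({**payload, "valid": True, "result": "scraped"})


@on("data_invalid")
def report_invalid(payload):
    # The payload goes back unprocessed, with a flag added.
    results.append({**payload, "valid": False})


receive_payload({"row": 1, "url": "http://example.com"})
receive_payload({"row": 2, "url": ""})
print(results)
```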

The problem is: something happens (I’m trying to find out what) and the work just stops, with no errors in the logs and nothing to suggest that anything failed. Celery keeps sending payloads, but nothing changes. It’s as if Nameko stops receiving the requests. The sheet we’re using has 17k rows; with a smaller one, about 100 rows, it runs fine. The stall happens at around 3.5k rows. We have an action in the Django admin to continue the job; when we trigger it, 2 or 3 rows are processed and then it stops again.

Next changes I’ll try: simplify and make things synchronous. I’m planning to have one Celery task per row and to make the data-validating service an RPC call, so the Celery task waits for the spider to run and gets the information from the result payload.
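
A rough sketch of what that per-row, synchronous version could look like. The service names (`data_validator`, `spider`), their methods and the row fields (`id`, `url`) are hypothetical. The function takes the RPC proxy as an argument, so it runs here against stand-in objects without a broker; the comment shows how it might be wired to Nameko's standalone `ClusterRpcProxy` inside the Celery task.

```python
from types import SimpleNamespace


def process_row(rpc, row):
    """Validate one row, then run the spider, and return the result payload.

    `rpc` is any object exposing `data_validator.validate(payload)` and
    `spider.run(payload)`. Both service and method names are hypothetical.
    """
    # Keep only what the spider needs (field names are assumptions).
    payload = {"row_id": row["id"], "url": row["url"]}
    if not rpc.data_validator.validate(payload):
        # Invalid rows go back unprocessed, with a flag added.
        return {**payload, "valid": False}
    return {**payload, "valid": True, "result": rpc.spider.run(payload)}


# Inside the Celery task, the call could be wrapped roughly like this
# (assumes nameko is installed and AMQP_URI points at the shared RabbitMQ):
#
#     from nameko.standalone.rpc import ClusterRpcProxy
#     with ClusterRpcProxy({"AMQP_URI": AMQP_URI}) as rpc:
#         return process_row(rpc, row)

# Stand-in services so the sketch runs without a broker.
fake_rpc = SimpleNamespace(
    data_validator=SimpleNamespace(validate=lambda p: bool(p["url"])),
    spider=SimpleNamespace(run=lambda p: "scraped " + p["url"]),
)

print(process_row(fake_rpc, {"id": 1, "url": "http://example.com"}))
print(process_row(fake_rpc, {"id": 2, "url": ""}))
```

Because each task blocks until the RPC reply arrives, a row that hangs shows up as one stuck Celery task instead of a silent stall somewhere in the event chain.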

*everything is running on a Kubernetes cluster
*the API gateway was a Flask application, but I changed it to use the Nameko HTTP entrypoint
*Celery and Nameko share the same RabbitMQ instance

I’d appreciate any help, and any corrections to the project architecture too. This is my first time working with microservices. Tell me if you need more information.

There are a lot of moving parts to this architecture and it’s not possible to know from your description where the problem might be.

You should be able to look into your RabbitMQ cluster and see the messages sent between the services. That might give some clues as to what’s going wrong.
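
If the management plugin is enabled on your broker, one way to do that from a script is to read the queue list from its HTTP API and flag queues that hold unacked messages, or that have a backlog and no consumer. This is a sketch under assumptions: the base URL, credentials and queue names are placeholders, and the sample data below only imitates the shape of the `/api/queues` response.

```python
import base64
import json
import urllib.request


def fetch_queues(base_url, user, password):
    """Return the queue list from the RabbitMQ management API (GET /api/queues)."""
    request = urllib.request.Request(base_url.rstrip("/") + "/api/queues")
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request.add_header("Authorization", "Basic " + token)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def suspicious_queues(queues):
    """Pick queues with unacked messages, or with a backlog but no consumer."""
    found = []
    for queue in queues:
        ready = queue.get("messages_ready", 0)
        unacked = queue.get("messages_unacknowledged", 0)
        consumers = queue.get("consumers", 0)
        if unacked > 0 or (ready > 0 and consumers == 0):
            found.append((queue["name"], ready, unacked, consumers))
    return found


# Sample data shaped like the API response; the queue names are made up.
sample = [
    {"name": "rpc-spider", "messages_ready": 120,
     "messages_unacknowledged": 1, "consumers": 1},
    {"name": "celery", "messages_ready": 0,
     "messages_unacknowledged": 0, "consumers": 4},
]
for name, ready, unacked, consumers in suspicious_queues(sample):
    print(name, ready, unacked, consumers)
```

Against a live broker you would call something like `suspicious_queues(fetch_queues("http://localhost:15672", "guest", "guest"))`, with the host and credentials replaced by your own.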

Using Celery and Nameko seems overly complex. What are you doing in Celery that can’t be accomplished with an async RPC call?

Celery applies a small transformation to the data and sends Nameko only what the spiders need to run. Maybe the right way would be to have another service do this and distribute the data afterwards?

When everything stops, there’s always one unacked message in the queue and it stays there forever, even though the services are still running.

[EDIT] add screenshot

If I restart the services, the work continues for a little while and then stops again.