I know I have a bad habit of replying to myself, but I figure it's polite
to close the loop.
I have confirmed that there is a disk IOPS issue. I used SaltStack to send
the following command to all the workers at once (race!):
`fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1
--name=test --filename=test --bs=4k --iodepth=64 --size=4G
--readwrite=randrw --rwmixread=75 | grep iops`
The results (below) clearly show wildly variable IOPS performance.
So what looked like a Nameko/Kombu issue (those were the only errors being
thrown) was actually something more fundamental.
Thanks for listening.
worker-02-p:
TERM environment variable not set.
/bin/sh: 1: cd: can't cd to ~
read : io=3071.7MB, bw=53605KB/s, iops=13401, runt= 58677msec
write: io=1024.4MB, bw=17876KB/s, iops=4469, runt= 58677msec
worker-05-p:
read : io=3071.7MB, bw=13852KB/s, iops=3463, runt=227064msec
write: io=1024.4MB, bw=4619.5KB/s, iops=1154, runt=227064msec
worker-07-p:
read : io=3071.7MB, bw=7588.3KB/s, iops=1897, runt=414510msec
write: io=1024.4MB, bw=2530.6KB/s, iops=632, runt=414510msec
worker-06-p:
read : io=3071.7MB, bw=6205.5KB/s, iops=1551, runt=506908msec
write: io=1024.4MB, bw=2069.3KB/s, iops=517, runt=506908msec
worker-03-p:
read : io=3071.7MB, bw=6203.6KB/s, iops=1550, runt=507030msec
write: io=1024.4MB, bw=2068.8KB/s, iops=517, runt=507030msec
worker-08-p:
read : io=3071.7MB, bw=6049.2KB/s, iops=1512, runt=519969msec
write: io=1024.4MB, bw=2017.3KB/s, iops=504, runt=519969msec
worker-04-p:
read : io=3071.7MB, bw=6021.4KB/s, iops=1505, runt=522370msec
write: io=1024.4MB, bw=2007.2KB/s, iops=501, runt=522370msec
worker-01-p:
read : io=3071.7MB, bw=3877.2KB/s, iops=969, runt=811256msec
write: io=1024.4MB, bw=1292.1KB/s, iops=323, runt=811256msec
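To put a number on the spread, here is a quick stdlib-only sketch (the regex assumes fio's classic `iops=` summary format shown above; only two of the workers are included for brevity) that pulls the read IOPS out of the pasted results and compares fastest to slowest:

```python
import re

# Two of the fio read-summary lines pasted above (classic fio output format).
results = {
    "worker-02-p": "read : io=3071.7MB, bw=53605KB/s, iops=13401, runt= 58677msec",
    "worker-01-p": "read : io=3071.7MB, bw=3877.2KB/s, iops=969, runt=811256msec",
}

def read_iops(line):
    """Extract the iops= value from a fio summary line."""
    match = re.search(r"iops=(\d+)", line)
    return int(match.group(1)) if match else None

iops = {host: read_iops(line) for host, line in results.items()}
fastest, slowest = max(iops.values()), min(iops.values())
print(f"spread: {fastest / slowest:.1f}x")  # worker-02-p is ~13.8x faster than worker-01-p
```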
···
On Friday, March 9, 2018 at 3:22:17 PM UTC-8, juko...@gmail.com wrote:
I hope someone can help me....
So I have a process that marshals several co-routines via a
concurrent.futures.ThreadPoolExecutor.
This works beautifully for quick processes.... but not all of my processes
are quick.... and in fact one particular one (transcoding audio) is
downright slooow.
And so I have fallen foul of nameko/nameko issue #477 ("RPC erroneously
restarted if connection between worker and broker times out"), where the
ThreadPool is blocking the heartbeats, which causes the worker to
restart (and the whole thing starts again).
Now, I was able to switch the 'worker' side of this to the 'greened'
Subprocess - so the actual Transcode shouldn't be blocking anymore.
`from eventlet.green import subprocess`
But I am pretty sure the issue is now with the ThreadPool.
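For what it's worth, the reason the executor is a problem under eventlet: `concurrent.futures.ThreadPoolExecutor` spawns real OS threads, which the eventlet hub cannot interleave with green threads unless `threading` has been monkey-patched. A minimal stdlib check (no eventlet required) showing the tasks really do run on separate OS threads:

```python
import threading
from concurrent.futures import ThreadPoolExecutor

main_ident = threading.get_ident()

with ThreadPoolExecutor(max_workers=4) as executor:
    # Each submitted task reports the identity of the OS thread it ran on.
    idents = set(executor.map(lambda _: threading.get_ident(), range(8)))

# None of the tasks ran on the main thread. Under an un-patched eventlet
# hub, blocking work in those OS threads (and the blocking waits on their
# futures) stalls the hub and therefore the heartbeats.
print(f"tasks ran on {len(idents)} worker thread(s), none of them the main thread")
```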
I am a total newbie to Eventlet - and honestly didn't anticipate this
kind of nuanced issue. So I'm not entirely sure how I can fix this. Does
anyone have any thoughts?
I could try running it inside a tpool? Or am I over-complicating it, and
simply need to use the Futures in a different way to how I am right now?
with concurrent.futures.ThreadPoolExecutor(max_workers=25) as executor:
    x = '/net/' + platform.node() + temp_location
    futures = [executor.submit(self.audio.transcode,
                               source_location + src, x + tgt, **transcode)
               for src, tgt in filemap]
    for future in concurrent.futures.as_completed(futures):
        future.result()  # Raises exceptions
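For readers following along: the submit/as_completed pattern itself is sound. A self-contained toy version (the real transcode call replaced by a hypothetical stub, and the paths dropped) showing how `future.result()` re-raises a worker's exception in the caller:

```python
import concurrent.futures

def transcode_stub(src, tgt):
    """Stand-in for the real audio transcode call (hypothetical)."""
    if src.endswith(".bad"):
        raise ValueError(f"cannot transcode {src}")
    return f"{src} -> {tgt}"

filemap = [("a.wav", "a.flac"), ("b.bad", "b.flac")]
done, errors = [], []

with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
    futures = [executor.submit(transcode_stub, src, tgt) for src, tgt in filemap]
    for future in concurrent.futures.as_completed(futures):
        try:
            done.append(future.result())  # re-raises any exception from the task
        except ValueError as exc:
            errors.append(str(exc))

print(done, errors)
```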
*any* help, guidance, or pointers *greatly* appreciated 
Geoff