Timer interval drift

I noticed that the timer() tends to drift fairly significantly, this is a one-minute timer service that gained about a second in less than 30 minutes:

starting services: service
2019-03-13T11:13:40.424248
2019-03-13T11:14:40.484334
2019-03-13T11:15:40.506224
2019-03-13T11:16:40.520174
2019-03-13T11:17:40.570385
2019-03-13T11:18:40.625177
2019-03-13T11:19:40.674702
2019-03-13T11:20:40.715643
2019-03-13T11:21:40.774473
2019-03-13T11:22:40.813017
2019-03-13T11:23:40.841699
2019-03-13T11:24:40.900536
2019-03-13T11:25:40.933248
2019-03-13T11:26:40.946421
2019-03-13T11:27:41.001691
2019-03-13T11:28:41.061067
2019-03-13T11:29:41.083242
2019-03-13T11:30:41.141310
2019-03-13T11:31:41.193806
2019-03-13T11:32:41.253867
2019-03-13T11:33:41.309020
2019-03-13T11:34:41.368205
2019-03-13T11:35:41.397235
2019-03-13T11:36:41.451821
2019-03-13T11:37:41.477595

I modified timer.py to use a generator to keep track of the time interval:

  def _run(self):
      def get_next_interval():
            t = time.time()
            count = 0
            while True:
                count += 1
                yield max(t + count * self.interval - time.time(), 0)
        interval = get_next_interval()
        sleep_time = next(interval)
        while True:
            # sleep for `sleep_time`, unless `should_stop` fires, in which
            # case we leave the while loop and stop entirely
            with Timeout(sleep_time, exception=False):
                self.should_stop.wait()
                break
            self.handle_timer_tick()
            sleep_time = next(interval)

and the time no longer drifts:

starting services: service
2019-03-13T11:57:31.396596
2019-03-13T11:58:31.396487
2019-03-13T11:59:31.396466
2019-03-13T12:00:31.396515
2019-03-13T12:01:31.394675
2019-03-13T12:02:31.396431
2019-03-13T12:03:31.374350
2019-03-13T12:04:31.380671
2019-03-13T12:05:31.384550

Does that sound like a reasonable change? If so I’ll push a PR.

Thanks

Bob

Hi Bob,

Thanks for raising this. Your implementation seems better to me and a PR would be great.

One thing to consider – is the behaviour the same if you have over-running workers? I think that in your implementation one very badly over-running worker will affect multiple subsequent ones, and I’m not sure that’s true for the existing one.

I wouldn’t necessarily mind changing this but it’d be good to have a barrage of tests showing what the behaviour actually is.

Hi Matt:

I was wondering about that too. The existing implementation of handle_timer_tick() spawns a worker thread to do the heavy lifting, so if a worker overruns it’s timeslot it will just continue and a new worker will start on the next interval, so there will be multiple overlapping workers running.

I don’t think this change affects that because it is only providing a more accurate sleep interval before the next worker thread is spawned. I’ll take a look at the existing tests to see what is covered and add whatever is needed to ensure no breakage.

Thanks

Bob

Oh yes, so it does! In that case, your implementation is definitely preferable. A PR would be great :slightly_smiling_face:

Thanks Matt - I pushed the PR, let me know if you have any comments.

I may have another enhancement for timers, I need to run 1, 5 and 15 minutes timers, and I need them to fire “close” to the associated clock times. It’s not a big deal for the 1-minute timer, but as you get to 5 and 15, they could fall anywhere in the interval depending on when the process starts, and it would be nice to be able to have them (optionally) fire roughly on the clock time interval (00:05, 00:10, 00:15, 00:20 or 00:15, 00:30, 00:45). I’ll push a separate PR for that once this change is finished.

Thanks Bob. I’ve approved your PR.

Clock-time intervals sounds like a useful feature. Depending on the implementation though, I’d be tempted to outsource it to something like an external cron or scheduler. The timer is quite limited in the sense that it operates per service instance, rather than having any awareness of the cluster when there are multiple service instances. The usefulness of cron-like scheduling but only applied to a single instance may be quite small.

Thanks Matt. I was thinking of just using a modulo on the current time to do the interval implementation, so it would be a variation of “eager” in that it would (possibly) fire the first timer earlier than the interval would indicate, and from that point on its the same as the current implementation. I agree that anything more complicated should be implemented somewhere else. I could also make it a new class, e.g. SyncTimer(Timer), rather than modifying the existing Timer if that makes more sense.

@mattbennett - the changes in #579 cause the entrypoint to wait for the previous timer execution to finish before running the next execution, so a long-running entrypoint will delay subsequent timer executions, and could potentially “skip” an interval.

The previous implementation would support overlapping entrypoint execution, regardless of how long the entrypoint took to execute. I think the “wait” changes in 579 should be enabled with a flag to prevent breaking existing scenarios.

Enabling “wait” with a flag makes sense to me. The change in #579 makes the timer behave in the way it’s actually documented (after the worker has executed “sleep however long is left of our interval, taking off the time we took to run”).

#579 is in master but hasn’t been released yet, so we have a chance to improve it. The next release will be flagged as “backwards incompatible” for the timer whatever happens, because even starting to conform to the documented behaviour may be unexpected by existing users.

Re: clock-time intervals, your plan sounds good. Happy for it to be the same class as long as the interface is clear.

Thanks very much.

Hi folks,

Nice catch with the clock drift Bob!

Just adding my 2p. Feel free to ignore since my involvement is much decreased these days. Re clock-time intervals: this behaviour sounds a bit niche to me, and in general i would favour keeping the library of bundled extensions and their apis smaller rather than larger. Like other frameworks, we should make sure we have enough framework support to make shipping behaviours as third party libraries easy, and then encourage that.

Best,
David