4 Commits

Author SHA1 Message Date
Sybren A. Stüvel
658a3d7a85 Worker Timeout: subject all but offline/error workers to timeout checks
Workers that are in `starting`, `asleep`, or `testing` state should also
be subject to the timeout check, not just workers in `awake` state.
2022-07-18 11:30:39 +02:00
Sybren A. Stüvel
7d5aae25b5 Manager: add timeout checks for workers 2022-06-13 12:33:22 +02:00
Sybren A. Stüvel
09902d201c Manager: fix task timeout check logging of assigned workers
The task's worker wasn't fetched from the database, always causing
"unknown worker" messages in the task log.
2022-06-10 14:52:03 +02:00
Sybren A. Stüvel
d90a8b987d Manager: Task Timeout Checker
Tasks that are in state `active` but haven't been 'touched' by a Worker
for 10 minutes or longer will transition to state `failed`.

In the future, it might be better to move the decision about which state
is suitable to the Task State Machine service, so that it can be smarter
and take the history of the task into account. Going to `soft-failed`
first might be a nice touch.
2022-06-10 14:32:02 +02:00