From 0396919229bd0033c961d2f9e5a4bf818d6fdc23 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Sybren=20A=2E=20St=C3=BCvel?=
Date: Fri, 17 Jun 2022 14:59:26 +0200
Subject: [PATCH] FEATURES: add new way in which jobs can get stuck

---
 FEATURES.md | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/FEATURES.md b/FEATURES.md
index d2742390..dcbc6a8e 100644
--- a/FEATURES.md
+++ b/FEATURES.md
@@ -42,8 +42,9 @@ Note that list is **not** in any specific order.
 - [x] Web: Worker action buttons
 - [x] Implementation of lazy vs. forced status change requests
 - [x] Port the old 'fail-requested' task status handling code to the new Manager
-- [x] At startup check & fix "stuck" jobs.
-      Example: jobs in statuses `cancel-requested`, `requeueing`, etc.
+- [ ] At startup check & fix "stuck" jobs.
+  - [x] Jobs in transitional statuses `cancel-requested`, `requeueing`, etc.
+  - [ ] Jobs with impossible to execute tasks. For example, consider the scenario where all but one worker were blocklisted for a certain task type, and the last worker that could run it, failed it. Now if that failure was that Worker's first one, it wouldn't get blocklisted and still counts as "can execute this task type on this job". However, since it failed the task, it won't be allowed to retry it, and thus the task will get stuck in `soft-failed` status.
 - [x] Task timeout monitoring
 - [ ] Worker blocklisting & failed task requeueing
   - [x] Keep track of which worker failed which task.