164 Commits

Author SHA1 Message Date
Sybren A. Stüvel
2c932ebad5 Show Worker's "last seen" timestamp in web interface & API responses 2022-07-04 12:49:56 +02:00
Sybren A. Stüvel
d25151184d Add a "Last Rendered" view
Add a "Last Rendered" view to the webapp.

The Manager now stores (in the database) which job most recently
received a rendered image, and serves that via the appropriate
OpenAPI endpoint.

A new SocketIO subscription + accompanying room makes it possible for
the web interface to receive all rendered images (if they survive the
queue, which discards images when it gets too full).
2022-07-01 12:34:40 +02:00
Sybren A. Stüvel
2457a63518 Manager: Show "nothing rendered yet" image in job details
Show a "nothing rendered yet" image in the job details when there is no
last-rendered image yet.
2022-06-30 19:20:19 +02:00
Sybren A. Stüvel
0fc5ba0bc6 Manager: broadcast last-rendered image info via SocketIO
After processing an image in the "last-rendered" processor, a SocketIO
object is sent to clients to indicate the last-rendered image needs to
be (re)loaded.

This also moves the previously existing "done callback" from a single
function to a per-image callback, so that it can be called with the
right information in there, and only when that particular image is
actually done processing.

The notification message sent via SocketIO also contains the necessary
info to render the image, so that the web client doesn't have to call
the `fetchJobLastRenderedInfo` operation.
2022-06-30 18:36:24 +02:00
Sybren A. Stüvel
6efd67b05c Manager: implement FetchJobLastRenderedInfo() API operation
Allow querying for the URL & available versions of a job's last-rendered
image.
2022-06-28 17:08:00 +02:00
Sybren A. Stüvel
64512c81ba Manager: implement OAPI operations to fetch blocklist & delete items 2022-06-27 11:32:35 +02:00
Sybren A. Stüvel
e687c95e5d Manager: add "last rendered image" processing pipeline
Add a handler for the OpenAPI `taskOutputProduced` operation, and an
image thumbnailing goroutine.

The queue of images to process + the function to handle queued images
is managed by `last_rendered.LastRenderedProcessor`. This queue currently
simply allows 3 requests; this should be improved such that it keeps
track of the job IDs as well, as with the current approach a spammy job
can starve the updates from a more calm job.
2022-06-24 16:51:11 +02:00
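The bounded queue described in e687c95e5d above can be illustrated with a buffered channel and a non-blocking send. This is only a minimal sketch, not the actual `last_rendered.LastRenderedProcessor` code: the `Payload` and `Processor` types and the `Enqueue` method are hypothetical names, and only the queue size of 3 comes from the commit message.

```go
package main

import "fmt"

// Payload is a hypothetical stand-in for one queued image.
// The real payload type is not shown in the commit message.
type Payload struct {
	JobID string
	Image []byte
}

// Processor owns a small buffered channel that acts as the queue.
type Processor struct {
	queue chan Payload
}

// NewProcessor creates a processor whose queue holds at most size items.
func NewProcessor(size int) *Processor {
	return &Processor{queue: make(chan Payload, size)}
}

// Enqueue tries to queue the payload without blocking.
// It returns false when the queue is full and the payload was discarded.
func (p *Processor) Enqueue(payload Payload) bool {
	select {
	case p.queue <- payload:
		return true
	default:
		return false
	}
}

func main() {
	p := NewProcessor(3) // queue size taken from the commit message
	for i := 1; i <= 4; i++ {
		ok := p.Enqueue(Payload{JobID: "spammy-job"})
		fmt.Printf("image %d queued: %v\n", i, ok)
	}
}
```

Because the queue does not look at `JobID`, a single job can occupy all slots, which is the starvation problem the commit message points out.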
Sybren A. Stüvel
b53cd67eb4 Cleanup: rename assertResponseEmpty() to assertResponseNoContent()
The function tests that the HTTP response is `204 No Content`, and now the
name reflects that better.

No functional changes.
2022-06-24 16:09:46 +02:00
Sybren A. Stüvel
2d05e1c773 Fix unit test for recent scheduler change
Fix unit test for rF1586c37b.
2022-06-20 16:05:36 +02:00
Sybren A. Stüvel
1586c37b32 Manager: mark task as active as soon as it is assigned to a worker
Move the task to 'active' status so that it won't be assigned to another
worker. This also enables the task timeout monitoring.
2022-06-20 13:00:49 +02:00
Sybren A. Stüvel
a2b667c043 Manager: log blocklist threshold 2022-06-17 17:15:23 +02:00
Sybren A. Stüvel
13bdb0ed73 Manager: remove outdated TODO 2022-06-17 17:15:13 +02:00
Sybren A. Stüvel
a368230afa Manager: fix race condition in logging of worker name/UUID
Instead of updating the logger in the context, just store a new logger
in a new sub-context.
2022-06-17 17:13:32 +02:00
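The approach named in a368230afa above (store a new logger in a new sub-context rather than updating the one already in the context) might look roughly like the sketch below. It uses the standard library `log` package purely for illustration; the Manager's actual logging library and helper names are not shown in the commit message, so `WithLogger`, `LoggerFrom`, `WithWorker` and `examplePrefixes` are all hypothetical.

```go
package main

import (
	"context"
	"fmt"
	"log"
	"os"
)

// loggerKey is an unexported context key type, so other packages cannot collide with it.
type loggerKey struct{}

// WithLogger returns a sub-context that carries the given logger.
func WithLogger(ctx context.Context, l *log.Logger) context.Context {
	return context.WithValue(ctx, loggerKey{}, l)
}

// LoggerFrom returns the logger stored in ctx, or the default logger if there is none.
func LoggerFrom(ctx context.Context) *log.Logger {
	if l, ok := ctx.Value(loggerKey{}).(*log.Logger); ok {
		return l
	}
	return log.Default()
}

// WithWorker derives a new logger that mentions the worker and stores it in a
// new sub-context. The parent's logger is never modified, so concurrent
// requests sharing the parent context cannot race on it.
func WithWorker(ctx context.Context, workerName string) context.Context {
	parent := LoggerFrom(ctx)
	child := log.New(parent.Writer(), parent.Prefix()+"worker="+workerName+" ", parent.Flags())
	return WithLogger(ctx, child)
}

// examplePrefixes shows that the parent logger keeps its prefix after deriving a child.
func examplePrefixes() (parentPrefix, childPrefix string) {
	root := WithLogger(context.Background(), log.New(os.Stderr, "manager ", 0))
	sub := WithWorker(root, "render-01")
	return LoggerFrom(root).Prefix(), LoggerFrom(sub).Prefix()
}

func main() {
	parent, child := examplePrefixes()
	fmt.Printf("parent prefix: %q\nchild prefix:  %q\n", parent, child)
}
```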
Sybren A. Stüvel
cdb7789f08 Refactor: Manager, move test code
Move code that covers `worker_task_updates.go` into
`worker_task_updates_test.go`.

No functional changes.
2022-06-17 15:51:15 +02:00
Sybren A. Stüvel
046853932d Manager: re-queue previously failed tasks of worker when blocklisting
When a Worker is blocked from a job, re-queue its previously failed tasks
so that other workers can give them a try.
2022-06-17 15:49:16 +02:00
Sybren A. Stüvel
b95bed1f96 Refactor: rename RequeueTasksOfWorker to RequeueActiveTasksOfWorker
Soon there will be another function to requeue tasks of workers by other
criteria, so being clear in the name helps.

No functional changes.
2022-06-17 15:49:16 +02:00
Sybren A. Stüvel
fd31a85bcd Manager: add blocking of workers when they fail certain tasks too much
When a worker fails too many tasks, of the same task type, on the same job,
it'll get blocked from doing those.
2022-06-17 15:49:16 +02:00
Sybren A. Stüvel
56abc825a6 Refactor: Manager, refactor handling of task failures
Split the handling of soft and hard failures into separate functions.

No functional changes intended.
2022-06-17 15:01:52 +02:00
Sybren A. Stüvel
6feee74c54 Cleanup: Manager, move worker task update handling code into its own file
Move the code related to task updates from workers to
`worker_task_updates.go`. It's going to get more complex with the
blocklisting in there; this prepares for that.

No functional changes.
2022-06-17 11:46:07 +02:00
Sybren A. Stüvel
81f81d0e0a Show task failure list in the web frontend
Show the task failure list in the web frontend's `TaskDetails` component.
2022-06-17 11:37:56 +02:00
Sybren A. Stüvel
0b5140fc5f Manager: clear task failure list on requeueing of jobs & tasks
When a job or task gets requeued from the web interface, its task
failure lists (i.e. the list of workers that previously failed this
task) will be cleared.

This clearing doesn't happen in other situations, e.g. when a worker
signs off and its task gets requeued, the task's failure list will
remain as-is.
2022-06-17 11:37:28 +02:00
Sybren A. Stüvel
9ab41984ac Adjust Go code for Nickname -> Name change
This fixes a bug where 'Worker undefined changed status' was logged in
the web interface, as that was (back then incorrectly) `workerupdate.name`.
Now that code is correct.
2022-06-16 11:03:18 +02:00
Sybren A. Stüvel
5f2712980e Manager: task scheduler, check for requested worker status change first
Before checking whether the Worker is allowed to do work (i.e. is in
`awake` state), check any queued-up status changes. Those should be
communicated, before saying "no work for you", so that the Worker can
actually respond to it.
2022-06-16 10:48:38 +02:00
Sybren A. Stüvel
ee53373878 Cleanup: compare worker state to constant instead of hard-coded state
Use the `requiredStatusToGetTask` constant to compare the worker status,
and not just for logging.

No functional changes, just better code.
2022-06-16 10:46:50 +02:00
Sybren A. Stüvel
40f711bf69 Fix two unit tests for the previous commit
I pushed too soon :'(
2022-06-16 10:42:04 +02:00
Sybren A. Stüvel
be0b10400f Manager: count workers as 'seen' even when there is no task
Fix a bug where a worker would only be counted as 'seen' by the task
scheduler if it actually got a task assigned.
2022-06-16 10:39:42 +02:00
Sybren A. Stüvel
6e12a2fb25 Manager: keep track of which worker failed which task
When a Worker indicates a task failed, mark it as `soft-failed` until
enough workers have tried & failed at the same task.

This is the first step in a blocklisting system, where tasks of an
often-failing worker will be requeued to be retried by others.

NOTE: currently the failure list of a task is NOT reset whenever it is
requeued! This will be implemented in a future commit, and is tracked in
`FEATURES.md`.
2022-06-13 18:41:38 +02:00
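The soft-failure rule from 6e12a2fb25 above can be sketched as follows. This is an in-memory illustration under assumptions: the `FailureTracker` type, the threshold of 3, and the `failed` status name are hypothetical (only `soft-failed` is named in the commit message), and the Manager itself keeps this information in its database.

```go
package main

import "fmt"

// FailureTracker remembers which workers failed which task.
// Hypothetical type for illustration only.
type FailureTracker struct {
	threshold int
	failedBy  map[string]map[string]bool // task ID -> set of worker IDs
}

// NewFailureTracker creates a tracker that hard-fails a task once
// threshold distinct workers have failed it.
func NewFailureTracker(threshold int) *FailureTracker {
	return &FailureTracker{
		threshold: threshold,
		failedBy:  make(map[string]map[string]bool),
	}
}

// RecordFailure registers that workerID failed taskID and returns the
// resulting task status. Repeated failures by the same worker count once.
func (t *FailureTracker) RecordFailure(taskID, workerID string) string {
	workers, ok := t.failedBy[taskID]
	if !ok {
		workers = make(map[string]bool)
		t.failedBy[taskID] = workers
	}
	workers[workerID] = true

	if len(workers) >= t.threshold {
		return "failed" // assumed name for the hard-failure status
	}
	return "soft-failed"
}

func main() {
	tracker := NewFailureTracker(3) // threshold is an assumption
	for _, worker := range []string{"w1", "w1", "w2", "w3"} {
		fmt.Println(worker, "->", tracker.RecordFailure("task-1", worker))
	}
}
```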
Sybren A. Stüvel
02bc03ae2b Manager: replace gorm.Model with our own persistence.Model struct
`persistence.Model` contains the common database fields for most model
structs. It is a copy of `gorm.Model`, but without the `DeletedAt`
field (which triggers Gorm's soft deletion).

Soft deletion is not used by Flamenco. If it ever becomes necessary to
support soft-deletion, see https://gorm.io/docs/delete.html#Soft-Delete
2022-06-13 18:40:42 +02:00
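As a rough illustration of 02bc03ae2b above: `gorm.Model` consists of `ID`, `CreatedAt`, `UpdatedAt` and `DeletedAt`, so a copy without `DeletedAt` could look like the struct below. The field list is inferred from `gorm.Model`, not copied from the Manager's source; the `Worker` struct and the `hasField` helper are hypothetical and exist only to show the embedding.

```go
package main

import (
	"fmt"
	"reflect"
	"time"
)

// Model holds the common database fields, mirroring gorm.Model minus DeletedAt.
// Without a DeletedAt field, Gorm performs real deletes instead of soft deletes.
type Model struct {
	ID        uint `gorm:"primarykey"`
	CreatedAt time.Time
	UpdatedAt time.Time
}

// Worker is a hypothetical model struct that embeds Model.
type Worker struct {
	Model
	Name string
}

// hasField reports whether the struct value v has a field with the given
// name, including fields promoted from embedded structs.
func hasField(v interface{}, name string) bool {
	_, ok := reflect.TypeOf(v).FieldByName(name)
	return ok
}

func main() {
	fmt.Println("has CreatedAt:", hasField(Worker{}, "CreatedAt"))
	fmt.Println("has DeletedAt:", hasField(Worker{}, "DeletedAt"))
}
```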
Sybren A. Stüvel
ec5b3aac52 Manager: on getting task update from Worker, write log before status change
When receiving a `TaskUpdate` from a Worker, write to the task log, before
handling any task status change.

If both log and task status change are sent, the log will likely contain
the cause of the task state change. Any subsequent task logs, for example
generated by the Manager in response to the status change, should be
logged after that.
2022-06-13 18:40:42 +02:00
Sybren A. Stüvel
5dac3c2dc0 Manager: mark workers as 'seen' when they send updates
Update the 'last seen at' timestamp of workers when they:
- sign on
- sign off
- get a task assigned
- send a task update
- check whether they can keep running their task

Note that this commit is necessary to not have the workers time out
immediately ;-)
2022-06-13 12:47:07 +02:00
Sybren A. Stüvel
c3525c3b1a Manager: move task requeueing to TaskStateMachine
Requeueing the tasks of a specific worker is now done in the
`TaskStateMachine`, such that it can be called from other services as
well in future commits.

This also makes the `LogStorage` service a dependency of the
`TaskStateMachine`, as it needs to write "this task was requeued" kind
of messages to the task logs.
2022-06-13 12:33:01 +02:00
Sybren A. Stüvel
24204084c1 Manager: move timestamping of log messages to task_logs package
In the future different services will write to the task log, and thus
it makes sense to move the responsibility of prepending the timestamps
to the log storage service.
2022-06-09 17:00:38 +02:00
Sybren A. Stüvel
819cad1d18 Manager: move broadcasting of task logs via SocketIO to task log service
To ensure all task logs also get broadcast via SocketIO, the responsibility
has moved from the `api_impl` to the `task_logs` package.
2022-06-09 16:49:48 +02:00
Sybren A. Stüvel
92d6693871 Show Task's "last touched" in the web interface 2022-06-09 11:59:43 +02:00
Sybren A. Stüvel
354fd29f9e Manager: Start timeout counting as soon as Worker gets task assigned
Set the task's "last touched" field in the database to "now" as soon as
the task is assigned to a worker.
2022-06-09 11:58:30 +02:00
Sybren A. Stüvel
87bce6be36 Manager: unify logging of task assignment and requeue-on-signoff
The requeue-task-on-worker-signoff operation also needs to log a timestamp.
The code for this, and the recently added code for timestamping the
"task assigned to worker" message, are now unified.
2022-06-09 11:30:46 +02:00
Sybren A. Stüvel
75903a2da3 Manager: prepend timestamp to "task assigned to worker" task log entries
Add a new `clock` service to the Flamenco struct, which allows us to mock
the passing of time, and thus test for timestamps in a stable fashion.
2022-06-09 11:24:02 +02:00
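The idea behind the `clock` service in 75903a2da3 above is that code asks an injected clock for the current time, so tests can substitute a fixed one. The sketch below uses only the standard library; the `Clock` interface, `fakeClock`, `exampleClock` and `logLine` are hypothetical names, and the Manager may well use an existing clock-mocking package instead.

```go
package main

import (
	"fmt"
	"time"
)

// Clock abstracts "what time is it", so that tests can control time.
type Clock interface {
	Now() time.Time
}

// realClock returns the actual wall-clock time.
type realClock struct{}

func (realClock) Now() time.Time { return time.Now() }

// fakeClock always returns the time it was constructed with.
type fakeClock struct {
	now time.Time
}

func (c *fakeClock) Now() time.Time { return c.now }

// exampleClock returns a fake clock frozen at a fixed moment in UTC.
func exampleClock() *fakeClock {
	return &fakeClock{now: time.Date(2022, 6, 9, 11, 24, 2, 0, time.UTC)}
}

// logLine prepends the clock's current time to a task log message.
func logLine(c Clock, msg string) string {
	return c.Now().Format(time.RFC3339) + " " + msg
}

func main() {
	var production Clock = realClock{}
	fmt.Println(logLine(production, "task assigned to worker"))

	// With the fake clock the timestamp is stable, so it can be asserted on.
	fmt.Println(logLine(exampleClock(), "task assigned to worker"))
}
```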
Sybren A. Stüvel
b186ea1828 Manager: write to task log when assigning it to a worker 2022-06-09 10:59:44 +02:00
Sybren A. Stüvel
b4d2fc4231 Manager: keep track of when a Worker last worked on a task
This will be used for keeping track of stuck tasks.
2022-06-03 16:33:50 +02:00
Sybren A. Stüvel
0be1ca30dd Cleanup: manager, move api_impl interfaces to interfaces.go
The number of interfaces declared by the `api_impl` package is getting
large, so they deserve their own file.

No functional changes.
2022-06-03 15:52:07 +02:00
Sybren A. Stüvel
8e7f1e2868 Manager: some extra unit tests for worker signoff behaviour 2022-06-02 16:37:29 +02:00
Sybren A. Stüvel
6cf82e5d43 Manager: cleanup, refactor Worker state change request persistence code
Move the setting & clearing of worker state change requests into separate
functions.

No functional changes.
2022-06-02 16:36:06 +02:00
Sybren A. Stüvel
132ce8f2ec Merge 'shutdown' and 'offline' states
Move the 'shutdown' state code to the 'offline' state, to match the
removal of the 'shutdown' state from the OpenAPI definition.
2022-06-02 16:35:07 +02:00
Sybren A. Stüvel
678308fb6d Manager: allow cancelling worker state change requests
A worker state change request can now be cancelled by requesting the worker
to go to its current state. In other words, a previously requested change
`A → B` can be cancelled by requesting that the worker go to state `A`.

Previously this would simply overwrite the last request, resulting in a
requested state change `A → A`. If that request was non-lazy, it would even
interrupt the currently running task.
2022-06-02 12:43:16 +02:00
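The cancellation rule from 678308fb6d above can be sketched as below. The `Worker` struct, its field names, and the status strings are assumptions made for this illustration; the lazy/non-lazy distinction is left out.

```go
package main

import "fmt"

// Worker is a hypothetical, stripped-down worker record.
type Worker struct {
	Status          string // current status
	StatusRequested string // pending status change; empty means none
}

// RequestStatusChange records a requested status change. Requesting the
// worker's current status cancels any pending request instead of storing
// a no-op change such as A -> A.
func (w *Worker) RequestStatusChange(status string) {
	if status == w.Status {
		w.StatusRequested = ""
		return
	}
	w.StatusRequested = status
}

func main() {
	w := &Worker{Status: "awake"}

	w.RequestStatusChange("asleep")
	fmt.Printf("pending after requesting asleep: %q\n", w.StatusRequested)

	w.RequestStatusChange("awake") // same as current status: cancels the request
	fmt.Printf("pending after requesting awake:  %q\n", w.StatusRequested)
}
```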
Sybren A. Stüvel
9ed6b6d931 Manager: adjust code for WorkerStatusChangeRequest extraction
See preceding OpenAPI change.
2022-06-02 12:17:54 +02:00
Sybren A. Stüvel
ae6831ce6e Manager: fix unit test
rFcfb17b178da2055ef12b2aa2ad8f7f778a952bc3 changed the semantics of
`SocketIOWorkerUpdate`, in the sense that any update that doesn't change
the worker status can omit `previous_status`. This commit adjusts the
unit test for this.
2022-06-02 12:13:25 +02:00
Sybren A. Stüvel
487a31624f Cleanup: manager, make workerDBtoAPI(w) use workerSummary(w)
This makes the `workerDBtoAPI(w)` and `workerSummary(w)` functions
consistent, and makes the former use the latter.
2022-06-02 12:10:53 +02:00
Sybren A. Stüvel
f97f0a34c3 Manager: implement worker status change requests
Implement the OpenAPI `RequestWorkerStatusChange` operation, and handle
these changes in the web interface.
2022-05-31 17:22:03 +02:00
Sybren A. Stüvel
dd3f99ebaa Manager: Fix unit test 2022-05-31 16:12:28 +02:00
Sybren A. Stüvel
f6dff086ef Manager: show worker version in the workers table 2022-05-31 15:47:26 +02:00