flamenco

Author	SHA1	Message	Date
Sybren A. Stüvel	84f93e7502	Transition from ex-GORM structs to sqlc structs (2/5) Replace old used-to-be-GORM datastructures (#104305) with sqlc-generated structs. This also makes it possible to use more specific structs that are more tailored to the specific queries, increasing efficiency. This commit mostly deals with workers, including the sleep schedule and task scheduler. Functional changes are kept to a minimum, as the API still serves the same data. Because this work covers so much of Flamenco's code, it's been split up into different commits. Each commit brings Flamenco to a state where it compiles and unit tests pass. Only the result of the final commit has actually been tested properly. Ref: #104343	2024-12-04 14:00:13 +01:00
Sybren A. Stüvel	76a24243f0	Manager: Introduce event bus system Introduce an "event bus"-like system. It's more like a fan-out broadcaster for certain events. Instead of directly sending events to SocketIO, they are now sent to the broker, which in turn sends them to any registered "forwarder". Currently there is only one forwarder, for SocketIO. This opens the door for a proper MQTT client that sends the same events to an MQTT server.	2024-02-03 22:55:23 +01:00
Sybren A. Stüvel	3e46322d14	Manager: reduce log level when last-rendered image was accepted Reduce the log level when a last-rendered image was accepted from a Worker.	2024-01-11 17:17:56 +01:00
Sybren A. Stüvel	3e72391cbf	Restartable workers When the worker is started with `-restart-exit-code 47` or has `restart_exit_code=47` in `flamenco-worker.yaml`, it's marked as 'restartable'. This will enable two worker actions 'Restart (immediately)' and 'Restart (after task is finished)' in the Manager web interface. When a worker is asked to restart, it will exit with exit code `47`. Of course any positive exit code can be used here.	2023-08-14 16:00:09 +02:00
Sybren A. Stüvel	02fac6a4df	Change Go package name from git.blender.org to projects.blender.org Change the package base name of the Go code, from `git.blender.org/flamenco` to `projects.blender.org/studio/flamenco`. The old location, `git.blender.org`, has no longer been in use since the [migration to Gitea][1]. The new package names now reflect the actual location where Flamenco is hosted. [1]: https://code.blender.org/2023/02/new-blender-development-infrastructure/	2023-08-01 12:42:31 +02:00
Sybren A. Stüvel	77db55bb14	Manager: when worker signs off, only remember specific statuses Limit which worker statuses are remembered (when they go offline) to those that we want to restore when they come back online. This is now set to `awake` and `asleep`. This prevents workers from being told to go to states that they cannot handle, such as `error` or `starting`.	2023-06-23 11:38:37 +02:00
Eveline Anderson	4d2200bb0c	Fix #99549: Remember Previous Status (#104217) Fix #99549: When sending Workers offline, remember their previous status When a worker goes offline, the Manager now remembers its previous status so that it can be restored once the worker comes back online. So when the Worker makes such a status change (for example `X → offline`), the Manager should immediately set `StatusRequested = "X"` once the Worker comes back online. Reviewed-on: https://projects.blender.org/studio/flamenco/pulls/104217	2023-06-02 22:50:07 +02:00
Sybren A. Stüvel	28cc7b7a3f	Manager: improve logging when workers register The info message that a worker registered now also includes its UUID. Any failure hashing the password will now also log the worker name + UUID.	2023-04-04 12:13:21 +02:00
Sybren A. Stüvel	e77bd9b841	Fix workers immediately switching state on a lazy request Fix an issue where workers would switch immediately on a state change request, even if it was of the "after task is finished" kind. The "may I keep running" endpoint wasn't checking the laziness flag, and thus any state change, lazy or otherwise, would interrupt the worker's current task.	2022-10-20 12:30:37 +02:00
Sybren A. Stüvel	449c83b94a	Manager: broadcast worker update after assigning task The Manager now broadcasts a worker update to SocketIO clients when a worker gets a new task assigned. This ensures the "current task" shown in the worker details view is up to date.	2022-08-01 14:29:08 +02:00
Sybren A. Stüvel	b26374d480	Manager: when worker goes to sleep, log in task log which worker When a worker's tasks get requeued because it goes to sleep, the task log will now mention the worker identification (name + UUID). This aids in figuring out what happened to tasks.	2022-07-28 14:27:44 +02:00
Sybren A. Stüvel	de80a09223	Manager: include job UUID in "last-rendered image received" log entries This makes it possible to collect all "last-rendered image received" entries for a single job.	2022-07-19 18:40:22 +02:00
Sybren A. Stüvel	696b97c553	Re-queue tasks of worker after changing to non-'awake' state When a Worker changes state from `awake` to something else, it cannot run tasks any more. This now triggers a requeue of its active task (should be one at most, if things are sane) so that another worker can pick it up.	2022-07-19 15:38:36 +02:00
Sybren A. Stüvel	3baac0a2d8	Manager: reduce log level when worker asks task but has wrong status This can happen quite often and it's fine, so it's not worth a warning.	2022-07-18 19:26:49 +02:00
Sybren A. Stüvel	24f921b0c8	Manager: add more logging when worker cannot be marked as 'seen' SQLite often errors out on this with only `interrupted (9)` as message. This logging should at least tell us whether it's our own "background context" timing out, or whether something else fishy is going on.	2022-07-18 19:04:15 +02:00
Sybren A. Stüvel	bfd6746f78	Manager: consult the sleep schedule on worker sign-on If there is no status change queued for the Worker, the sleep schedule should determine its initial status.	2022-07-18 18:25:24 +02:00
Sybren A. Stüvel	bc725ea7dc	Manager: mark worker as 'seen' when calling the `WorkerState` operation Fix workers timing out when they're `asleep`. When sleeping, the Worker will call the `WorkerState` operation to see if they have to wake up, but that didn't mark the workers as "seen". As a result, a sleeping worker would always time out.	2022-07-18 17:56:56 +02:00
Sybren A. Stüvel	0697f71b62	Manager: run some operations in a background context Run some API operations in a background context. This should prevent some of the SQLite "interrupted" errors, as those can occur when the context closes while a query is running. The API operations that Workers use are now mostly running in a separate background context, at least from the moment onward when they can run independently of the Worker connection.	2022-07-18 16:26:06 +02:00
Sybren A. Stüvel	0e4ed1c54d	Manager: move worker password hasher into a struct + interface Move the Worker password hashing/comparison functions into a struct, and use it via an interface. This will make it easier to switch to different hashing algorithms. Even with a low number of iterations, BCrypt is quite slow. That's good for security, but not for Flamenco Worker authentication -- the password is more of a "nice check to avoid accidentally reusing the same ID" than something for security.	2022-07-15 15:08:00 +02:00
Sybren A. Stüvel	d25151184d	Add a "Last Rendered" view Add a "Last Rendered" view to the webapp. The Manager now stores (in the database) which job was the last recipient of a rendered image, and serves that to the appropriate OpenAPI endpoint. A new SocketIO subscription + accompanying room makes it possible for the web interface to receive all rendered images (if they survive the queue, which discards images when it gets too full).	2022-07-01 12:34:40 +02:00
Sybren A. Stüvel	0fc5ba0bc6	Manager: broadcast last-rendered image info via SocketIO After processing an image in the "last-rendered" processor, a SocketIO object is sent to clients to indicate the last-rendered image needs to be (re)loaded. This also moves the previously existing "done callback" from a single function to a per-image callback, so that it can be called with the right information in there, and only when that particular image is actually done processing. The notification message sent via SocketIO also contains the necessary info to render the image, so that the web client doesn't have to call the `fetchJobLastRenderedInfo` operation.	2022-06-30 18:36:24 +02:00
Sybren A. Stüvel	e687c95e5d	Manager: add "last rendered image" processing pipeline Add a handler for the OpenAPI `taskOutputProduced` operation, and an image thumbnailing goroutine. The queue of images to process + the function to handle queued images is managed by `last_rendered.LastRenderedProcessor`. This queue currently simply allows 3 requests; this should be improved such that it keeps track of the job IDs as well, as with the current approach a spammy job can starve the updates from a more calm job.	2022-06-24 16:51:11 +02:00
Sybren A. Stüvel	1586c37b32	Manager: mark task as active as soon as it is assigned to a worker Move the task to 'active' status so that it won't be assigned to another worker. This also enables the task timeout monitoring.	2022-06-20 13:00:49 +02:00
Sybren A. Stüvel	b95bed1f96	Refactor: rename `RequeueTasksOfWorker` to `RequeueActiveTasksOfWorker` Soon there will be another function to requeue tasks of workers by other criteria, so being clear in the name helps. No functional changes.	2022-06-17 15:49:16 +02:00
Sybren A. Stüvel	6feee74c54	Cleanup: Manager, move worker task update handling code into its own file Move the code related to task updates from workers to `worker_task_updates.go`. It's going to get more complex with the blocklisting in there; this prepares for that. No functional changes.	2022-06-17 11:46:07 +02:00
Sybren A. Stüvel	9ab41984ac	Adjust Go code for Nickname -> Name change This fixes a bug where 'Worker undefined changed status' was logged in the web interface, as that was (back then incorrectly) `workerupdate.name`. Now that code is correct.	2022-06-16 11:03:18 +02:00
Sybren A. Stüvel	5f2712980e	Manager: task scheduler, check for requested worker status change first Before checking whether the Worker is allowed to do work (i.e. is in `awake` state), check any queued-up status changes. Those should be communicated, before saying "no work for you", so that the Worker can actually respond to it.	2022-06-16 10:48:38 +02:00
Sybren A. Stüvel	ee53373878	Cleanup: compare worker state to constant instead of hard-coded state Use the `requiredStatusToGetTask` constant to compare the worker status, and not just for logging. No functional changes, just better code.	2022-06-16 10:46:50 +02:00
Sybren A. Stüvel	be0b10400f	Manager: count workers as 'seen' even when there is no task Fix a bug where a worker would only be counted as 'seen' by the task scheduler if it actually got a task assigned.	2022-06-16 10:39:42 +02:00
Sybren A. Stüvel	6e12a2fb25	Manager: keep track of which worker failed which task When a Worker indicates a task failed, mark it as `soft-failed` until enough workers have tried & failed at the same task. This is the first step in a blocklisting system, where tasks of an often-failing worker will be requeued to be retried by others. NOTE: currently the failure list of a task is NOT reset whenever it is requeued! This will be implemented in a future commit, and is tracked in `FEATURES.md`.	2022-06-13 18:41:38 +02:00
Sybren A. Stüvel	ec5b3aac52	Manager: on getting task update from Worker, write log before status change When receiving a `TaskUpdate` from a Worker, write to the task log, before handling any task status change. If both log and task status change are sent, the log will likely contain the cause of the task state change. Any subsequent task logs, for example generated by the Manager in response to the status change, should be logged after that.	2022-06-13 18:40:42 +02:00
Sybren A. Stüvel	5dac3c2dc0	Manager: mark workers as 'seen' when they send updates Update the 'last seen at' timestamp of workers when they: - sign on - sign off - get a task assigned - send a task update - check whether they can keep running their task Note that this commit is necessary to not have the workers time out immediately ;-)	2022-06-13 12:47:07 +02:00
Sybren A. Stüvel	c3525c3b1a	Manager: move task requeueing to `TaskStateMachine` Requeueing the tasks of a specific worker is now done in the `TaskStateMachine`, such that it can be called from other services as well in future commits. This also makes the `LogStorage` service a dependency of the `TaskStateMachine`, as it needs to write "this task was requeued" kind of messages to the task logs.	2022-06-13 12:33:01 +02:00
Sybren A. Stüvel	24204084c1	Manager: move timestamping of log messages to `task_logs` package In the future different services will write to the task log, and thus it makes sense to move the responsibility of prepending the timestamps to the log storage service.	2022-06-09 17:00:38 +02:00
Sybren A. Stüvel	819cad1d18	Manager: move broadcasting of task logs via SocketIO to task log service To ensure all task logs also get broadcast via SocketIO, the responsibility has moved from the `api_impl` to the `task_logs` package.	2022-06-09 16:49:48 +02:00
Sybren A. Stüvel	354fd29f9e	Manager: Start timeout counting as soon as Worker gets task assigned Set the task's "last touched" field in the database to "now" as soon as the task is assigned to a worker.	2022-06-09 11:58:30 +02:00
Sybren A. Stüvel	87bce6be36	Manager: unify logging of task assignment and requeue-on-signoff The requeue-task-on-worker-signoff operation also needs to log a timestamp. The code for this, and the recently added code for timestamping the "task assigned to worker" message, are now unified.	2022-06-09 11:30:46 +02:00
Sybren A. Stüvel	75903a2da3	Manager: prepend timestamp to "task assigned to worker" task log entries Add a new `clock` service to the Flamenco struct, which allows us to mock the passing of time, and thus test for timestamps in a stable fashion.	2022-06-09 11:24:02 +02:00
Sybren A. Stüvel	b186ea1828	Manager: write to task log when assigning it to a worker	2022-06-09 10:59:44 +02:00
Sybren A. Stüvel	b4d2fc4231	Manager: keep track of when a Worker last worked on a task This will be used for keeping track of stuck tasks.	2022-06-03 16:33:50 +02:00
Sybren A. Stüvel	6cf82e5d43	Manager: cleanup, refactor Worker state change request persistence code Move the setting & clearing of worker state change requests into separate functions. No functional changes.	2022-06-02 16:36:06 +02:00
Sybren A. Stüvel	132ce8f2ec	Merge 'shutdown' and 'offline' states Move the 'shutdown' state code to the 'offline' state, to match the removal of the 'shutdown' state from the OpenAPI definition.	2022-06-02 16:35:07 +02:00
Sybren A. Stüvel	9ed6b6d931	Manager: adjust code for `WorkerStatusChangeRequest` extraction See preceding OpenAPI change.	2022-06-02 12:17:54 +02:00
Sybren A. Stüvel	2e11c1c240	Manager: Implement SocketIO worker updates	2022-05-31 15:19:12 +02:00
Sybren A. Stüvel	90b567f97c	Manager: store software version on worker sign-on	2022-05-31 12:29:25 +02:00
Sybren A. Stüvel	19db947eb4	Manager: remove `Worker.LastActivity` This removes the field both from the OpenAPI interface and the database.	2022-05-31 10:46:27 +02:00
Sybren A. Stüvel	f77b11d85e	Manager: add a small wrapper around Google's UUID library Add a small wrapper around github.com/google/uuid. That way it's clearer which functionality is used by Flamenco, doesn't link most of the code to any specific UUID library, and allows a bit of customisation. The only customisation now is that Flamenco is a bit stricter in the formats it accepts; only the `xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx` is accepted. This makes things a little bit stricter, with the advantage that we don't need to do any normalisation of received UUID strings.	2022-05-20 15:35:51 +02:00
Sybren A. Stüvel	792b4ab141	Manager: on worker signoff, add a note to any requeued task logs When a worker signs off, its tasks get requeued. This is now also saved in the task log, and broadcast via SocketIO as task log chunk.	2022-05-20 14:17:17 +02:00
Sybren A. Stüvel	3e5f681321	Task log broadcasting via SocketIO Implement task log broadcasting via SocketIO. The logs aren't shown in the web interface yet, but do arrive there in a Pinia store. That store is capped at 1000 lines to keep memory requirements low-ish.	2022-05-20 13:03:41 +02:00
Sybren A. Stüvel	3c622264a4	Manager: include 'activity' in SocketIO task updates This also changes the order in which the task is updated; the activity is now saved first, so that it can be included in the task status change notification sent to SocketIO clients.	2022-05-19 14:27:42 +02:00

77 Commits
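
Commit f77b11d85e above describes accepting UUIDs only in the `xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx` form, so that received UUID strings need no normalisation. A minimal sketch of such a strict check, using only the Go standard library, could look as follows. This is not Flamenco's actual wrapper: the names `isStrictUUID` and `isHexDigit` are hypothetical, and accepting upper-case hex digits is an assumption made here for illustration.

```go
package main

import "fmt"

// isHexDigit reports whether c is an ASCII hexadecimal digit.
// Hypothetical helper; accepting upper-case digits is an assumption.
func isHexDigit(c byte) bool {
	return (c >= '0' && c <= '9') ||
		(c >= 'a' && c <= 'f') ||
		(c >= 'A' && c <= 'F')
}

// isStrictUUID reports whether s has exactly the form
// xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx, where every x is a hex digit.
// Hypothetical function for illustration, not taken from Flamenco.
func isStrictUUID(s string) bool {
	if len(s) != 36 {
		return false
	}
	for i := 0; i < len(s); i++ {
		switch i {
		case 8, 13, 18, 23:
			// Hyphens must sit at exactly these positions.
			if s[i] != '-' {
				return false
			}
		default:
			if !isHexDigit(s[i]) {
				return false
			}
		}
	}
	return true
}

func main() {
	candidates := []string{
		"123e4567-e89b-12d3-a456-426614174000",          // hyphenated form
		"123e4567e89b12d3a456426614174000",              // no hyphens
		"{123e4567-e89b-12d3-a456-426614174000}",        // braces
		"urn:uuid:123e4567-e89b-12d3-a456-426614174000", // URN prefix
	}
	for _, c := range candidates {
		fmt.Printf("%q strict=%v\n", c, isStrictUUID(c))
	}
}
```

The last three candidates are forms that `uuid.Parse` from github.com/google/uuid accepts, which is why a stricter check in front of it removes the need to normalise the string afterwards.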