flamenco

Author	SHA1	Message	Date
Sybren A. Stüvel	1586c37b32	Manager: mark task as active as soon as it is assigned to a worker Move the task to 'active' status so that it won't be assigned to another worker. This also enables the task timeout monitoring.	2022-06-20 13:00:49 +02:00
Sybren A. Stüvel	b95bed1f96	Refactor: rename `RequeueTasksOfWorker` to `RequeueActiveTasksOfWorker` Soon there will be another function to requeue tasks of workers by other criteria, so being clear in the name helps. No functional changes.	2022-06-17 15:49:16 +02:00
Sybren A. Stüvel	6feee74c54	Cleanup: Manager, move worker task update handling code into its own file Move the code related to task updates from workers to `worker_task_updates.go`. It's going to get more complex with the blocklisting in there; this prepares for that. No functional changes.	2022-06-17 11:46:07 +02:00
Sybren A. Stüvel	9ab41984ac	Adjust Go code for Nickname -> Name change This fixes a bug where 'Worker undefined changed status' was logged in the web interface, as that was (back then incorrectly) `workerupdate.name`. Now that code is correct.	2022-06-16 11:03:18 +02:00
Sybren A. Stüvel	5f2712980e	Manager: task scheduler, check for requested worker status change first Before checking whether the Worker is allowed to do work (i.e. is in `awake` state), check any queued-up status changes. Those should be communicated, before saying "no work for you", so that the Worker can actually respond to it.	2022-06-16 10:48:38 +02:00
Sybren A. Stüvel	ee53373878	Cleanup: compare worker state to constant instead of hard-coded state Use the `requiredStatusToGetTask` constant to compare the worker status, and not just for logging. No functional changes, just better code.	2022-06-16 10:46:50 +02:00
Sybren A. Stüvel	be0b10400f	Manager: count workers as 'seen' even when there is no task Fix a bug where a worker would only be counted as 'seen' by the task scheduler if it actually got a task assigned.	2022-06-16 10:39:42 +02:00
Sybren A. Stüvel	6e12a2fb25	Manager: keep track of which worker failed which task When a Worker indicates a task failed, mark it as `soft-failed` until enough workers have tried & failed at the same task. This is the first step in a blocklisting system, where tasks of an often-failing worker will be requeued to be retried by others. NOTE: currently the failure list of a task is NOT reset whenever it is requeued! This will be implemented in a future commit, and is tracked in `FEATURES.md`.	2022-06-13 18:41:38 +02:00
Sybren A. Stüvel	ec5b3aac52	Manager: on getting task update from Worker, write log before status change When receiving a `TaskUpdate` from a Worker, write to the task log, before handling any task status change. If both log and task status change are sent, the log will likely contain the cause of the task state change. Any subsequent task logs, for example generated by the Manager in response to the status change, should be logged after that.	2022-06-13 18:40:42 +02:00
Sybren A. Stüvel	5dac3c2dc0	Manager: mark workers as 'seen' when they send updates Update the 'last seen at' timestamp of workers when they: - sign on - sign off - get a task assigned - send a task update - check whether they can keep running their task Note that this commit is necessary to not have the workers time out immediately ;-)	2022-06-13 12:47:07 +02:00
Sybren A. Stüvel	c3525c3b1a	Manager: move task requeueing to `TaskStateMachine` Requeueing the tasks of a specific worker is now done in the `TaskStateMachine`, such that it can be called from other services as well in future commits. This also makes the `LogStorage` service a dependency of the `TaskStateMachine`, as it needs to write "this task was requeued" kind of messages to the task logs.	2022-06-13 12:33:01 +02:00
Sybren A. Stüvel	24204084c1	Manager: move timestamping of log messages to `task_logs` package In the future different services will write to the task log, and thus it makes sense to move the responsibility of prepending the timestamps to the log storage service.	2022-06-09 17:00:38 +02:00
Sybren A. Stüvel	819cad1d18	Manager: move broadcasting of task logs via SocketIO to task log service To ensure all task logs also get broadcast via SocketIO, the responsibility has moved from the `api_impl` to the `task_logs` package.	2022-06-09 16:49:48 +02:00
Sybren A. Stüvel	354fd29f9e	Manager: Start timeout counting as soon as Worker gets task assigned Set the task's "last touched" field in the database to "now" as soon as the task is assigned to a worker.	2022-06-09 11:58:30 +02:00
Sybren A. Stüvel	87bce6be36	Manager: unify logging of task assignment and requeue-on-signoff The requeue-task-on-worker-signoff operation also needs to log a timestamp. The code for this, and the recently added code for timestamping the "task assigned to worker" message, are now unified.	2022-06-09 11:30:46 +02:00
Sybren A. Stüvel	75903a2da3	Manager: prepend timestamp to "task assigned to worker" task log entries Add a new `clock` service to the Flamenco struct, which allows us to mock the passing of time, and thus test for timestamps in a stable fashion.	2022-06-09 11:24:02 +02:00
Sybren A. Stüvel	b186ea1828	Manager: write to task log when assigning it to a worker	2022-06-09 10:59:44 +02:00
Sybren A. Stüvel	b4d2fc4231	Manager: keep track of when a Worker last worked on a task This will be used for keeping track of stuck tasks.	2022-06-03 16:33:50 +02:00
Sybren A. Stüvel	6cf82e5d43	Manager: cleanup, refactor Worker state change request persistence code Move the setting & clearing of worker state change requests into separate functions. No functional changes.	2022-06-02 16:36:06 +02:00
Sybren A. Stüvel	132ce8f2ec	Merge 'shutdown' and 'offline' states Move the 'shutdown' state code to the 'offline' state, to match the removal of the 'shutdown' state from the OpenAPI definition.	2022-06-02 16:35:07 +02:00
Sybren A. Stüvel	9ed6b6d931	Manager: adjust code for `WorkerStatusChangeRequest` extraction See preceding OpenAPI change.	2022-06-02 12:17:54 +02:00
Sybren A. Stüvel	2e11c1c240	Manager: Implement SocketIO worker updates	2022-05-31 15:19:12 +02:00
Sybren A. Stüvel	90b567f97c	Manager: store software version on worker sign-on	2022-05-31 12:29:25 +02:00
Sybren A. Stüvel	19db947eb4	Manager: remove `Worker.LastActivity` This removes the field both from the OpenAPI interface and the database.	2022-05-31 10:46:27 +02:00
Sybren A. Stüvel	f77b11d85e	Manager: add a small wrapper around Google's UUID library Add a small wrapper around github.com/google/uuid. That way it's clearer which functionality is used by Flamenco, doesn't link most of the code to any specific UUID library, and allows a bit of customisation. The only customisation now is that Flamenco is a bit stricter in the formats it accepts; only the `xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx` format is accepted. This makes things a little bit stricter, with the advantage that we don't need to do any normalisation of received UUID strings.	2022-05-20 15:35:51 +02:00
Sybren A. Stüvel	792b4ab141	Manager: on worker signoff, add a note to any requeued task logs When a worker signs off, its tasks get requeued. This is now also saved in the task log, and broadcast via SocketIO as task log chunk.	2022-05-20 14:17:17 +02:00
Sybren A. Stüvel	3e5f681321	Task log broadcasting via SocketIO Implement task log broadcasting via SocketIO. The logs aren't shown in the web interface yet, but do arrive there in a Pinia store. That store is capped at 1000 lines to keep memory requirements low-ish.	2022-05-20 13:03:41 +02:00
Sybren A. Stüvel	3c622264a4	Manager: include 'activity' in SocketIO task updates This also changes the order in which the task is updated; the activity is now saved first, so that it can be included in the task status change notification sent to SocketIO clients.	2022-05-19 14:27:42 +02:00
Sybren A. Stüvel	43f244ecab	Manager: move TaskUpdate API function from jobs.go to workers.go The OpenAPI spec tags this operation as `workers`, so it should be in `workers.go`. No functional changes.	2022-05-19 14:20:02 +02:00
Sybren A. Stüvel	0b39f229a1	Implement may-I-keep-running protocol Worker and Manager implementation of the "may-I-keep-running" protocol. While running tasks, the Worker will ask the Manager periodically whether it's still allowed to keep running that task. This allows the Manager to abort commands on Workers when: - the Worker should go to another state (typically 'asleep' or 'shutdown'), - the task changed status from 'active' to something non-runnable (typically 'canceled' when the job as a whole is canceled), - the task has been assigned to a different Worker. This can happen when a Worker loses its connection to its Manager, resulting in a task timeout (not yet implemented) after which the task can be assigned to another Worker. If then the connectivity is restored, the first Worker should abort (last-assigned Worker wins).	2022-05-12 15:06:05 +02:00
Sybren A. Stüvel	ba34652cd1	Implement task status changes from web interface This also reworks some of the logic due to the recently-removed `cancel-requested` task status.	2022-05-05 16:44:09 +02:00
Sybren A. Stüvel	90be370095	Manager: reduce password strength of Workers The password check of worker API calls was 2 orders of magnitude slower than actually handling the API call itself. Since the Worker authentication is not that important (it's all on the same network anyway, and Worker account registration is automatic too), lowering the BCrypt cost to the minimum helps. On my machine, this reduces the time for password checks from 50 to 2 ms.	2022-04-21 19:06:18 +02:00
Sybren A. Stüvel	65427ee38e	Manager: use `e.NoContent(http.StatusNoContent)` to return "no content" No functional changes, just the right call for the job.	2022-04-21 19:06:18 +02:00
Sybren A. Stüvel	930d7497d7	OAPI: Better 'SQLITE_BUSY' error handling SQLite can return `SQLITE_BUSY` errors when it's doing too many things at the same time. This is now improved a bit by setting a 5-second timeout, during which the SQLite driver will wait for the database to become available. If that doesn't happen, Flamenco Manager will return a `503 Service Unavailable` response so that the client knows to back off a little.	2022-04-08 12:02:30 +02:00
Sybren A. Stüvel	93616cef3a	Manager: reduce log level of "worker requesting task"	2022-03-17 10:53:00 +01:00
Sybren A. Stüvel	9f5e4cc0cc	License: license all code under "GPL-3.0-or-later" The add-on code was copy-pasted from other addons and used the GPL v2 license, whereas by accident the LICENSE text file had the GNU "Affero" GPL license v3 (instead of regular GPL v3). This is now all streamlined, and all code is licensed as "GPL v3 or later". Furthermore, the code comments just show a SPDX License Identifier instead of an entire license block.	2022-03-07 15:26:46 +01:00
Sybren A. Stüvel	47e36c927c	Change package URL to the blender.org repository	2022-03-01 20:45:09 +01:00
Sybren A. Stüvel	7689a988b1	Manager: re-queue tasks of worker when signing off	2022-02-28 12:06:50 +01:00
Sybren A. Stüvel	32af1ffaef	Manager: actually pass context to Gorm queries	2022-02-28 11:53:31 +01:00
Sybren A. Stüvel	d198e228b7	Manager: perform variable replacement on scheduled tasks	2022-02-21 19:58:13 +01:00
Sybren A. Stüvel	ef2bbd2845	Unified Command field names Some parts of Flamenco had a Command consist of "name + settings", and other parts used "type + parameters" (with the same semantics). This is now unified to "name + parameters".	2022-02-21 18:03:51 +01:00
Sybren A. Stüvel	270c54fdb7	More status change acks & checks to get stable flow between worker states	2022-02-15 17:46:37 +01:00
Sybren A. Stüvel	93517549b0	Manager: actually return worker state in /api/worker/state endpoint	2022-02-15 15:56:38 +01:00
Sybren A. Stüvel	50088b4c94	Save worker info on sign-on (not just on registration)	2022-02-15 10:57:29 +01:00
Sybren A. Stüvel	4aafb782ac	Scheduler: Assign task to worker	2022-02-14 17:47:26 +01:00
Sybren A. Stüvel	2ca8858c28	Only update status field in DB when worker changes status	2022-02-01 10:16:10 +01:00
Sybren A. Stüvel	be89349632	Very basic non-functional framework for a task runner Also has some login/logout functionality for storing stuff in the DB.	2022-01-31 16:05:27 +01:00
Sybren A. Stüvel	d3071146da	Better logging of worker info	2022-01-31 15:35:57 +01:00
Sybren A. Stüvel	d880f7e7f0	Worker authentication is working	2022-01-31 15:27:13 +01:00
Sybren A. Stüvel	7c14b2648d	Much more of the Worker life cycle implemented	2022-01-31 15:02:05 +01:00

55 Commits