flamenco

Author	SHA1	Message	Date
Sybren A. Stüvel	9ab41984ac	Adjust Go code for Nickname -> Name change This fixes a bug where 'Worker undefined changed status' was logged in the web interface, as that was (back then incorrectly) `workerupdate.name`. Now that code is correct.	2022-06-16 11:03:18 +02:00
Sybren A. Stüvel	12f0a605a4	Manager: log configured worker timeout at startup	2022-06-16 10:51:17 +02:00
Sybren A. Stüvel	5f2712980e	Manager: task scheduler, check for requested worker status change first Before checking whether the Worker is allowed to do work (i.e. is in `awake` state), check any queued-up status changes. Those should be communicated before saying "no work for you", so that the Worker can actually respond to them.	2022-06-16 10:48:38 +02:00
Sybren A. Stüvel	ee53373878	Cleanup: compare worker state to constant instead of hard-coded state Use the `requiredStatusToGetTask` constant to compare the worker status, and not just for logging. No functional changes, just better code.	2022-06-16 10:46:50 +02:00
Sybren A. Stüvel	40f711bf69	Fix two unit tests for the previous commit I pushed too soon :'(	2022-06-16 10:42:04 +02:00
Sybren A. Stüvel	be0b10400f	Manager: count workers as 'seen' even when there is no task Fix a bug where a worker would only be counted as 'seen' by the task scheduler if it actually got a task assigned.	2022-06-16 10:39:42 +02:00
Sybren A. Stüvel	7d7c2b1bd6	Cleanup: blacklist → blocklist Change "blacklist" to "blocklist", because that makes people happier. No functional changes.	2022-06-16 10:36:36 +02:00
Sybren A. Stüvel	6e12a2fb25	Manager: keep track of which worker failed which task When a Worker indicates a task failed, mark it as `soft-failed` until enough workers have tried & failed at the same task. This is the first step in a blocklisting system, where tasks of an often-failing worker will be requeued to be retried by others. NOTE: currently the failure list of a task is NOT reset whenever it is requeued! This will be implemented in a future commit, and is tracked in `FEATURES.md`.	2022-06-13 18:41:38 +02:00
Sybren A. Stüvel	c5debdeb70	Manager: add 'task failure list' to record workers failing tasks The persistence layer can now store which worker failed which task, as preparation for a blocklisting system. Such a system should be able to determine whether there are still any workers left to do the work.	2022-06-13 18:41:30 +02:00
Sybren A. Stüvel	e35911d106	Manager: add ability to delete jobs This is needed for a future unit test, and exposed the fact that SQLite didn't enforce foreign key constraints (and thus also didn't handle on-delete-cascade attributes). This has been fixed in the previous commit.	2022-06-13 18:41:19 +02:00
Sybren A. Stüvel	e5d0e987e1	Manager: enforce DB foreign key checks at startup SQLite disables foreign key checks by default, so Flamenco has to enable them explicitly.	2022-06-13 18:41:19 +02:00
Sybren A. Stüvel	6ec493d944	Manager: more efficiently create tasks When creating tasks, the inter-task dependencies are saved in a second pass, by updating the tasks in the database. This pass now only saves those dependencies, and no longer saves the entire task again.	2022-06-13 18:40:42 +02:00
Sybren A. Stüvel	02bc03ae2b	Manager: replace `gorm.Model` with our own `persistence.Model` struct `persistence.Model` contains the common database fields for most model structs. It is a copy of `gorm.Model`, but without the `DeletedAt` field (which triggers Gorm's soft deletion). Soft deletion is not used by Flamenco. If it ever becomes necessary to support soft-deletion, see https://gorm.io/docs/delete.html#Soft-Delete	2022-06-13 18:40:42 +02:00
Sybren A. Stüvel	ec5b3aac52	Manager: on getting task update from Worker, write log before status change When receiving a `TaskUpdate` from a Worker, write to the task log, before handling any task status change. If both log and task status change are sent, the log will likely contain the cause of the task state change. Any subsequent task logs, for example generated by the Manager in response to the status change, should be logged after that.	2022-06-13 18:40:42 +02:00
Sybren A. Stüvel	25d5b01b3c	Cleanup: test errors with `assert.NoError()` instead of `assert.Nil()` No functional changes, just nicer way to test.	2022-06-13 18:40:42 +02:00
Sybren A. Stüvel	6fc936d0a6	Revert accidental debug code Revert change in rF01c45afc20854918d1f18e6859b4154499d500b6 that made unit tests use an on-disk database.	2022-06-13 18:40:25 +02:00
Sybren A. Stüvel	b922722614	Manager: broadcast worker timeouts over SocketIO This way the web interface will also show timed-out workers.	2022-06-13 13:05:20 +02:00
Sybren A. Stüvel	75ca0e652e	Cleanup: timeout checker, improve readability of failed tests No functional changes	2022-06-13 12:50:27 +02:00
Sybren A. Stüvel	1de1e3a9a5	Manager: add 'canary' test to all timeout checker tests The canary test asserts that certain constants still have the expected value. Lowering those constants is good for testing the timeout logic with the actual Flamenco Manager + Worker (without having to wait 5 minutes for it to kick in), but when you leave those constants low, it's too easy to accidentally run the unit tests and get cryptic errors about everything failing horribly and miserably.	2022-06-13 12:50:02 +02:00
Sybren A. Stüvel	5dac3c2dc0	Manager: mark workers as 'seen' when they send updates Update the 'last seen at' timestamp of workers when they sign on, sign off, get a task assigned, send a task update, or check whether they can keep running their task. Note that this commit is necessary to not have the workers time out immediately ;-)	2022-06-13 12:47:07 +02:00
Sybren A. Stüvel	986b647967	Manager: re-queue tasks of timed-out workers Allow other workers to pick up the task(s) assigned to a timed-out worker.	2022-06-13 12:38:35 +02:00
Sybren A. Stüvel	7d5aae25b5	Manager: add timeout checks for workers	2022-06-13 12:33:22 +02:00
Sybren A. Stüvel	e8171fc597	Cleanup: Manager, reduce log level of task timeout checks	2022-06-13 12:33:16 +02:00
Sybren A. Stüvel	67562856d3	Manager: let Gorm create an index on `Task.LastTouchedAt` It's used in timeout queries, and there could be tens or hundreds of thousands of tasks in the database.	2022-06-13 12:33:05 +02:00
Sybren A. Stüvel	c3525c3b1a	Manager: move task requeueing to `TaskStateMachine` Requeueing the tasks of a specific worker is now done in the `TaskStateMachine`, such that it can be called from other services as well in future commits. This also makes the `LogStorage` service a dependency of the `TaskStateMachine`, as it needs to write "this task was requeued" kind of messages to the task logs.	2022-06-13 12:33:01 +02:00
Sybren A. Stüvel	e06bc484f4	Cleanup: manager, move task state machine interfaces to their own file No functional changes.	2022-06-13 12:32:18 +02:00
Sybren A. Stüvel	01c45afc20	Manager: explicitly store timestamps as UTC SQLite doesn't handle timezones when you just use something like `date1 < date2`. This makes GORM explicitly use UTC timestamps for the `CreatedAt`, `UpdatedAt`, and `DeletedAt` fields. Our own code should also use UTC when saving timestamps. That way all datetimes in the database are in the same timezone, and can be compared naively.	2022-06-13 12:10:11 +02:00
Sybren A. Stüvel	fe1627dd85	Cleanup: timeout checker, move task-specific code to `tasks.go` Just a cleanup to prepare for the addition of worker timeouts.	2022-06-10 14:58:44 +02:00
Sybren A. Stüvel	13307c5a24	Manager: add canary test to timeout checker unit test The `TestTaskTimeout()` unit test assumes specific durations for initial & subsequent sleeps of the timeout checker. The test will fail quite cryptically when that assumption doesn't hold, so just test for it at the start of the unit test.	2022-06-10 14:53:23 +02:00
Sybren A. Stüvel	09902d201c	Manager: fix task timeout check logging of assigned workers The task's worker wasn't fetched from the database, always causing "unknown worker" messages in the task log.	2022-06-10 14:52:03 +02:00
Sybren A. Stüvel	d90a8b987d	Manager: Task Timeout Checker Tasks that are in state `active` but haven't been 'touched' by a Worker for 10 minutes or longer will transition to state `failed`. In the future, it might be better to move the decision about which state is suitable to the Task State Machine service, so that it can be smarter and take the history of the task into account. Going to `soft-failed` first might be a nice touch.	2022-06-10 14:32:02 +02:00
Sybren A. Stüvel	295891a17a	Manager: ensure Gorm-generated timestamps are in UTC SQLite should store all timestamps in UTC, as the database is woefully unaware of timezones and will compare lexicographically.	2022-06-10 14:31:53 +02:00
Sybren A. Stüvel	24204084c1	Manager: move timestamping of log messages to `task_logs` package In the future different services will write to the task log, and thus it makes sense to move the responsibility of prepending the timestamps to the log storage service.	2022-06-09 17:00:38 +02:00
Sybren A. Stüvel	819cad1d18	Manager: move broadcasting of task logs via SocketIO to task log service To ensure all task logs also get broadcast via SocketIO, the responsibility has moved from the `api_impl` to the `task_logs` package.	2022-06-09 16:49:48 +02:00
Sybren A. Stüvel	04dd479248	Manager: protect task log writing with mutex A per-task mutex is used to protect the writing of task logs, so that multiple goroutines can safely write to the same task log.	2022-06-09 14:44:54 +02:00
Sybren A. Stüvel	92d6693871	Show Task's "last touched" in the web interface	2022-06-09 11:59:43 +02:00
Sybren A. Stüvel	354fd29f9e	Manager: Start timeout counting as soon as Worker gets task assigned Set the task's "last touched" field in the database to "now" as soon as the task is assigned to a worker.	2022-06-09 11:58:30 +02:00
Sybren A. Stüvel	87bce6be36	Manager: unify logging of task assignment and requeue-on-signoff The requeue-task-on-worker-signoff operation also needs to log a timestamp. The code for this, and the recently added code for timestamping the "task assigned to worker" message, are now unified.	2022-06-09 11:30:46 +02:00
Sybren A. Stüvel	75903a2da3	Manager: prepend timestamp to "task assigned to worker" task log entries Add a new `clock` service to the Flamenco struct, which allows us to mock the passing of time, and thus test for timestamps in a stable fashion.	2022-06-09 11:24:02 +02:00
Sybren A. Stüvel	b186ea1828	Manager: write to task log when assigning it to a worker	2022-06-09 10:59:44 +02:00
Sybren A. Stüvel	b4d2fc4231	Manager: keep track of when a Worker last worked on a task This will be used for keeping track of stuck tasks.	2022-06-03 16:33:50 +02:00
Sybren A. Stüvel	0be1ca30dd	Cleanup: manager, move api_impl interfaces to interfaces.go The number of interfaces declared by the `api_impl` package is getting large, so they deserve their own file. No functional changes.	2022-06-03 15:52:07 +02:00
Sybren A. Stüvel	8e7f1e2868	Manager: some extra unit tests for worker signoff behaviour	2022-06-02 16:37:29 +02:00
Sybren A. Stüvel	6cf82e5d43	Manager: cleanup, refactor Worker state change request persistence code Move the setting & clearing of worker state change requests into separate functions. No functional changes.	2022-06-02 16:36:06 +02:00
Sybren A. Stüvel	132ce8f2ec	Merge 'shutdown' and 'offline' states Move the 'shutdown' state code to the 'offline' state, to match the removal of the 'shutdown' state from the OpenAPI definition.	2022-06-02 16:35:07 +02:00
Sybren A. Stüvel	678308fb6d	Manager: allow cancelling worker state change requests A worker state change request can now be cancelled by requesting the worker to go to its current state. In other words, a previously requested change `A → B` can be cancelled by requesting that the worker go to state `A`. Previously this would simply overwrite the last request, resulting in a requested state change `A → A`. If that request was non-lazy, it would even interrupt the currently running task.	2022-06-02 12:43:16 +02:00
Sybren A. Stüvel	9ed6b6d931	Manager: adjust code for `WorkerStatusChangeRequest` extraction See preceding OpenAPI change.	2022-06-02 12:17:54 +02:00
Sybren A. Stüvel	ae6831ce6e	Manager: fix unit test rFcfb17b178da2055ef12b2aa2ad8f7f778a952bc3 changed the semantics of `SocketIOWorkerUpdate`, in the sense that any update that doesn't change the worker status can omit `previous_status`. This commit adjusts the unit test for this.	2022-06-02 12:13:25 +02:00
Sybren A. Stüvel	487a31624f	Cleanup: manager, make `workerDBtoAPI(w)` use `workerSummary(w)` This makes the `workerDBtoAPI(w)` and `workerSummary(w)` functions consistent, and makes the former use the latter.	2022-06-02 12:10:53 +02:00
Sybren A. Stüvel	f97f0a34c3	Manager: implement worker status change requests Implement the OpenAPI `RequestWorkerStatusChange` operation, and handle these changes in the web interface.	2022-05-31 17:22:03 +02:00


381 Commits
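Commits 6e12a2fb25 and c5debdeb70 describe a per-task failure list: a failing task goes to `soft-failed` until enough workers have failed it, after which it is `failed`. The Go sketch below shows one way to express that rule, assuming an in-memory set of worker IDs and a hypothetical `threshold` parameter; the real implementation stores the list in the database, and the type and function names here are illustrative only.

```go
package main

import "fmt"

// Task is a hypothetical, trimmed-down task record.
type Task struct {
	Status   string
	FailedBy map[string]bool // set of worker IDs that reported a failure
}

// recordFailure adds the worker to the task's failure list and picks the new status.
// threshold is a hypothetical setting: how many distinct workers must fail the
// task before it is considered truly failed.
func recordFailure(t *Task, workerID string, threshold int) {
	if t.FailedBy == nil {
		t.FailedBy = make(map[string]bool)
	}
	t.FailedBy[workerID] = true // a repeat failure by the same worker counts once

	if len(t.FailedBy) >= threshold {
		t.Status = "failed"
	} else {
		t.Status = "soft-failed"
	}
}

func main() {
	task := &Task{Status: "active"}
	for _, worker := range []string{"w1", "w1", "w2", "w3"} {
		recordFailure(task, worker, 3)
		fmt.Println(worker, task.Status)
	}
}
```

Using a set rather than a counter means one flaky worker failing the same task repeatedly cannot push it to `failed` on its own. As the 6e12a2fb25 note says, resetting this list on requeue is left for a later commit, and the sketch does not attempt it.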
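Commit 678308fb6d makes a request for a worker's current state cancel any queued state change, instead of storing a pointless `A → A` request. The Go sketch below captures that rule, assuming the queued request and its lazy flag live directly on a worker record; `Worker`, `StatusRequested`, `LazyRequest` and `requestStatusChange` are hypothetical names, not Flamenco's API.

```go
package main

import "fmt"

// Worker is a hypothetical, trimmed-down worker record.
type Worker struct {
	Status          string
	StatusRequested string // empty when no change is queued
	LazyRequest     bool   // lazy: wait for the current task to finish first
}

// requestStatusChange queues a status change, or cancels the queued one when
// the requested status equals the worker's current status.
func requestStatusChange(w *Worker, newStatus string, lazy bool) {
	if newStatus == w.Status {
		// "A → A" is treated as a cancellation, not as a new request.
		w.StatusRequested = ""
		w.LazyRequest = false
		return
	}
	w.StatusRequested = newStatus
	w.LazyRequest = lazy
}

func main() {
	w := &Worker{Status: "awake"}
	requestStatusChange(w, "asleep", true)
	fmt.Printf("%q\n", w.StatusRequested) // the change "awake → asleep" is queued
	requestStatusChange(w, "awake", false)
	fmt.Printf("%q\n", w.StatusRequested) // the queued change is cancelled
}
```

Clearing the request, rather than storing a non-lazy `awake → awake` request, is what prevents the currently running task from being interrupted.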