When a job or task gets requeued from the web interface, its task
failure lists (i.e. the list of workers that previously failed each
task) will be cleared.
This clearing doesn't happen in other situations; for example, when a
worker signs off and its task gets requeued, the task's failure list
remains as-is.
The persistence layer can now store which worker failed which task, as
preparation for a blocklisting system. Such a system should be able to
determine whether there are still any workers left to do the work.
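A minimal sketch of what such a failure record and the availability check could look like, assuming Gorm models; the table, field, and function names here are illustrative and not the actual Flamenco schema:

```go
package persistence

import (
	"time"

	"gorm.io/gorm"
)

// TaskFailure records that a specific worker failed a specific task.
// Illustrative only; the real Flamenco schema may differ.
type TaskFailure struct {
	CreatedAt time.Time
	TaskID    uint `gorm:"primaryKey"` // references the tasks table
	WorkerID  uint `gorm:"primaryKey"` // references the workers table
}

// countTaskFailures returns how many distinct workers failed this task. A
// blocklisting system can compare this with the number of workers that are
// still allowed to do the work.
func countTaskFailures(db *gorm.DB, taskID uint) (int64, error) {
	var count int64
	tx := db.Model(&TaskFailure{}).Where("task_id = ?", taskID).Count(&count)
	return count, tx.Error
}
```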
This is needed for a future unit test, and exposed the fact that SQLite
didn't enforce foreign key constraints (and thus also didn't handle
on-delete-cascade attributes). This has been fixed in the previous commit.
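For reference, SQLite only enforces foreign keys when that is switched on per connection. A sketch of doing so through Gorm (function name illustrative):

```go
package persistence

import (
	"gorm.io/driver/sqlite"
	"gorm.io/gorm"
)

// openDB opens the SQLite database with foreign key enforcement switched on,
// which is what makes on-delete-cascade work. Sketch only; in practice the
// setting is often passed as a driver-specific DSN parameter so that every
// pooled connection gets it.
func openDB(dsn string) (*gorm.DB, error) {
	db, err := gorm.Open(sqlite.Open(dsn), &gorm.Config{})
	if err != nil {
		return nil, err
	}
	// SQLite keeps foreign key enforcement off by default.
	if tx := db.Exec("PRAGMA foreign_keys = ON"); tx.Error != nil {
		return nil, tx.Error
	}
	return db, nil
}
```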
When creating tasks, the inter-task dependencies are saved in a second
pass, by updating the tasks in the database. This second pass now only
saves those dependencies, and no longer saves the entire task again.
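A sketch of that dependencies-only save using Gorm's association API, with an illustrative, minimal Task model (not the real one):

```go
package persistence

import "gorm.io/gorm"

// Task is a minimal stand-in for the real task model, just to show the shape
// of the dependency relation.
type Task struct {
	ID           uint
	Dependencies []*Task `gorm:"many2many:task_dependencies"`
}

// saveTaskDependencies stores only the inter-task dependency links, without
// re-saving the task rows themselves.
func saveTaskDependencies(db *gorm.DB, task *Task) error {
	return db.Model(task).Association("Dependencies").Replace(task.Dependencies)
}
```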
`persistence.Model` contains the common database fields for most model
structs. It is a copy of `gorm.Model`, but without the `DeletedAt`
field (which triggers Gorm's soft deletion).
Soft deletion is not used by Flamenco. If it ever becomes necessary to
support soft-deletion, see https://gorm.io/docs/delete.html#Soft-Delete
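Roughly, with the field set of `gorm.Model` taken from the Gorm documentation:

```go
package persistence

import "time"

// Model is Flamenco's variant of gorm.Model: the same common fields, minus
// DeletedAt, so Gorm never applies soft deletion to these models.
type Model struct {
	ID        uint `gorm:"primarykey"`
	CreatedAt time.Time
	UpdatedAt time.Time
}
```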
Update the 'last seen at' timestamp of workers when they:
- sign on
- sign off
- get a task assigned
- send a task update
- check whether they can keep running their task
Note that this commit is necessary to not have the workers time out
immediately ;-)
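A sketch of the shared 'seen' touch for the situations listed above, assuming a worker model with a `LastSeenAt` field (names are illustrative):

```go
package persistence

import (
	"time"

	"gorm.io/gorm"
)

// Worker is a minimal stand-in for the real worker model.
type Worker struct {
	ID         uint
	LastSeenAt time.Time
}

// workerSeen updates only the 'last seen at' timestamp, so it can be called
// from sign-on, sign-off, task assignment, task updates, and keep-running
// checks without touching any other worker fields.
func workerSeen(db *gorm.DB, worker *Worker) error {
	now := time.Now().UTC()
	worker.LastSeenAt = now
	return db.Model(worker).Update("last_seen_at", now).Error
}
```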
SQLite doesn't handle timezones by default when you use a plain
comparison like `date1 < date2`. This makes GORM explicitly use UTC
timestamps for the `CreatedAt`, `UpdatedAt`, and `DeletedAt` fields.
Our own code should also use UTC when saving timestamps. That way all
datetimes in the database are in the same timezone, and can be compared
naively.
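A sketch of that configuration, using Gorm's documented `NowFunc` option (the function name is illustrative):

```go
package persistence

import (
	"time"

	"gorm.io/driver/sqlite"
	"gorm.io/gorm"
)

// openDBWithUTC makes Gorm fill CreatedAt/UpdatedAt (and DeletedAt, where it
// exists) with UTC timestamps. Our own code should then also store
// time.Now().UTC(), so naive datetime comparisons in SQLite are valid.
func openDBWithUTC(dsn string) (*gorm.DB, error) {
	return gorm.Open(sqlite.Open(dsn), &gorm.Config{
		NowFunc: func() time.Time { return time.Now().UTC() },
	})
}
```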
Tasks that are in state `active` but haven't been 'touched' by a Worker
for 10 minutes or longer will transition to state `failed`.
In the future, it might be better to move the decision about which state
is suitable to the Task State Machine service, so that it can be smarter
and take the history of the task into account. Going to `soft-failed`
first might be a nice touch.
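A sketch of the timeout sweep, assuming the task model has `status` and `last_touched_at` columns (names and the helper are illustrative):

```go
package persistence

import (
	"time"

	"gorm.io/gorm"
)

const taskTimeout = 10 * time.Minute

// Task is a minimal stand-in for the real task model.
type Task struct {
	ID            uint
	Status        string
	LastTouchedAt time.Time
}

// fetchTimedOutTasks returns active tasks that no worker has touched within
// the timeout window; the caller then fails them (or, in the future, hands
// them to the task state machine for a smarter decision).
func fetchTimedOutTasks(db *gorm.DB, now time.Time) ([]*Task, error) {
	deadline := now.Add(-taskTimeout)
	var tasks []*Task
	tx := db.Where("status = ? AND last_touched_at <= ?", "active", deadline).
		Find(&tasks)
	return tasks, tx.Error
}
```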
Add a small wrapper around github.com/google/uuid. That way it's clearer
which functionality is used by Flamenco, most of the code isn't tied to
any specific UUID library, and a bit of customisation is possible.
The only customisation for now is that Flamenco is stricter in the
formats it accepts: only the `xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx` form
is accepted. The advantage is that we don't need to do any normalisation
of received UUID strings.
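A sketch of what such a wrapper can look like; the strict format check is the interesting part, the rest just forwards to github.com/google/uuid (names and the exact regexp are illustrative):

```go
// Package uuid is Flamenco's thin wrapper around github.com/google/uuid.
package uuid

import (
	"regexp"

	googleuuid "github.com/google/uuid"
)

// Only the canonical 8-4-4-4-12 hexadecimal form is considered valid, even
// though the underlying library also accepts braced, URN-prefixed, and
// unhyphenated forms. That way incoming UUID strings never need normalising.
var validUUID = regexp.MustCompile(
	`^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$`)

// New returns a random UUID in canonical string form.
func New() string {
	return googleuuid.New().String()
}

// IsValid returns whether the string is a UUID in the canonical form.
func IsValid(s string) bool {
	return validUUID.MatchString(s)
}
```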
Send & handle `JobUpdate.refresh_tasks = true` when many tasks are
updated simultaneously. This applies to things like cancelling &
requeueing an entire job.
This partially rolls back 67bf77de13d99b1bc5d7344951068822c4fadd88, as
it was too slow when 1000+ tasks were being updated all at once.
Improve how the task scheduler deals with tasks that already have a
worker assigned to them:
- When a Worker asks for a task, and there is already an active task
assigned to it, always return that task.
- Otherwise, never allow scheduling of active tasks, as those are
already being run by another worker. If that turns out not to be the
case, their status should be changed to queued/failed instead of
handling the situation in the task scheduler.
- Apart from the assigned-and-active case above, ignore the task's
worker ID when scheduling tasks. If the status is `queued` or
`soft-failed`, the task's worker ID just indicates who ran the task last.
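A sketch of that scheduling order, assuming illustrative status values and column names, and ignoring job status and task types for brevity:

```go
package persistence

import "gorm.io/gorm"

// Task is a minimal stand-in for the real task model.
type Task struct {
	ID       uint
	Status   string
	WorkerID uint
}

// scheduleTask applies the rules above. Sketch only: the real scheduler also
// checks job status and task types, and assigns the task inside a transaction.
func scheduleTask(db *gorm.DB, workerID uint) (*Task, error) {
	var task Task

	// 1) If this worker already has an active task, always hand that back.
	tx := db.Where("status = ? AND worker_id = ?", "active", workerID).
		Limit(1).Find(&task)
	if tx.Error != nil {
		return nil, tx.Error
	}
	if tx.RowsAffected > 0 {
		return &task, nil
	}

	// 2) Otherwise only consider runnable tasks. Active tasks are skipped
	//    (another worker is running them), and the stored worker ID is
	//    ignored: for queued/soft-failed tasks it only records who ran the
	//    task last.
	tx = db.Where("status IN ?", []string{"queued", "soft-failed"}).
		Limit(1).Find(&task)
	if tx.Error != nil {
		return nil, tx.Error
	}
	if tx.RowsAffected == 0 {
		return nil, nil // no task available for this worker
	}
	return &task, nil
}
```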
Check for jobs in 'cancel-requested' or 'requeued' statuses, and ensure
they transition to the right status. This happens at startup, before
even starting the web interface, so that a consistent state is presented.
When the job status changes, it impacts the task statuses as well. These
status changes are now no longer done with a single database query, but
instead each affected task is fetched, changed, and saved. This unifies
the regular & mass updates to the tasks, and causes the resulting task
changes to be broadcast to SocketIO clients.
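A sketch of the per-task loop, with a hypothetical broadcaster interface standing in for the SocketIO side (the minimal task model is also illustrative):

```go
package persistence

import "gorm.io/gorm"

// Task is a minimal stand-in for the real task model.
type Task struct {
	ID     uint
	JobID  uint
	Status string
}

// TaskChangeBroadcaster is a hypothetical stand-in for the SocketIO layer.
type TaskChangeBroadcaster interface {
	BroadcastTaskChange(task *Task)
}

// updateJobTaskStatuses changes each affected task individually instead of
// with one bulk UPDATE, so the same code path is used for single and mass
// updates and every change can be broadcast to SocketIO clients.
func updateJobTaskStatuses(
	db *gorm.DB,
	jobID uint,
	fromStatuses []string,
	newStatus string,
	broadcaster TaskChangeBroadcaster,
) error {
	var tasks []*Task
	if tx := db.Where("job_id = ? AND status IN ?", jobID, fromStatuses).Find(&tasks); tx.Error != nil {
		return tx.Error
	}
	for _, task := range tasks {
		task.Status = newStatus
		if err := db.Model(task).Update("status", newStatus).Error; err != nil {
			return err
		}
		broadcaster.BroadcastTaskChange(task)
	}
	return nil
}
```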
Add `fetchJobTasks` operation to the Jobs API. This returns a summary of
each of the job's tasks, suitable for display in a task list view.
The exact set of fields may need tweaking once there actually is a task
list view, but at least the functionality is there.
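A sketch of what such a per-task summary could contain; the types below are illustrative, not the actual generated API types:

```go
package api

import "time"

// TaskSummary is an illustrative per-task summary for a task list view; the
// field set may still change.
type TaskSummary struct {
	ID        string    `json:"id"`
	Name      string    `json:"name"`
	Status    string    `json:"status"`
	UpdatedAt time.Time `json:"updated"`
}

// JobTasksSummary is the response shape of a fetchJobTasks-style operation.
type JobTasksSummary struct {
	Tasks []TaskSummary `json:"tasks"`
}
```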
To prepare for job status changes being requestable from the API, store
the reason for any status change on the job itself.
Not yet part of the API, just on the persistence layer.
Avoid making users of the persistence layer test against Gorm errors,
by wrapping job/task errors in a new `PersistenceError` struct.
Instead of testing for `gorm.ErrRecordNotFound`, code can now test for
`persistence.ErrJobNotFound` or `persistence.ErrTaskNotFound`.
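A sketch of that wrapping, so callers can use `errors.Is` against the persistence-layer sentinels (field and function names are illustrative):

```go
package persistence

import (
	"errors"

	"gorm.io/gorm"
)

// PersistenceError hides the underlying database error behind a stable,
// comparable error value.
type PersistenceError struct {
	Message string // stable, human-readable message
	Err     error  // the wrapped database error, if any
}

func (e PersistenceError) Error() string { return e.Message }
func (e PersistenceError) Unwrap() error { return e.Err }

var (
	ErrJobNotFound  = PersistenceError{Message: "job not found", Err: gorm.ErrRecordNotFound}
	ErrTaskNotFound = PersistenceError{Message: "task not found", Err: gorm.ErrRecordNotFound}
)

// translateGormJobError converts Gorm's not-found error into ErrJobNotFound,
// so callers never have to import gorm just to check for it.
func translateGormJobError(err error) error {
	if errors.Is(err, gorm.ErrRecordNotFound) {
		return ErrJobNotFound
	}
	return err
}
```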
SQLite can return `SQLITE_BUSY` errors when it's doing too many things at
the same time. This is now improved a bit by setting a 5-second timeout,
during which the SQLite driver will wait for the database to become
available. If that doesn't happen, Flamenco Manager will return a
`503 Service Unavailable` response so that the client knows to back off
a little.
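A sketch of both halves, the busy timeout and the 503 mapping; the busy-error detection shown here is a simplification, as a real check depends on the SQLite driver's error types:

```go
package flamenco

import (
	"net/http"
	"strings"

	"gorm.io/driver/sqlite"
	"gorm.io/gorm"
)

// openDBWithBusyTimeout gives SQLite up to 5 seconds to wait for a locked
// database before giving up with SQLITE_BUSY. The timeout can also be set via
// a driver-specific DSN parameter.
func openDBWithBusyTimeout(dsn string) (*gorm.DB, error) {
	db, err := gorm.Open(sqlite.Open(dsn), &gorm.Config{})
	if err != nil {
		return nil, err
	}
	if tx := db.Exec("PRAGMA busy_timeout = 5000"); tx.Error != nil {
		return nil, tx.Error
	}
	return db, nil
}

// isDatabaseBusy is deliberately simplified; a real check should inspect the
// SQLite driver's error code instead of the message text.
func isDatabaseBusy(err error) bool {
	return err != nil && strings.Contains(err.Error(), "database is locked")
}

// writeDatabaseError maps a still-busy database to 503 Service Unavailable,
// so the client knows to back off a little before retrying.
func writeDatabaseError(w http.ResponseWriter, err error) {
	if isDatabaseBusy(err) {
		http.Error(w, "database busy, try again later", http.StatusServiceUnavailable)
		return
	}
	http.Error(w, "database error", http.StatusInternalServerError)
}
```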