220 Commits

Author SHA1 Message Date
Sybren A. Stüvel
736ca103c3 Manager: show current/last task in worker details
The Task details component already linked to the Worker it was assigned
to last, and now the Worker links back to the task.

There's only one task shown in the Worker details. If the Worker is
actively working on a task, that one's shown. Otherwise it's the
last-updated task that was assigned to the worker.
2022-07-26 10:36:02 +02:00
Sybren A. Stüvel
ab8ecc24cc Cleanup: Add missing license specifiers
Add license specifiers to Go files that were missing them:

```
// SPDX-License-Identifier: GPL-3.0-or-later
```

No functional changes.
2022-07-25 16:08:07 +02:00
Sybren A. Stüvel
83467e4c60 Sleep schedule: store 'next check' timestamp in UTC
SQLite doesn't parse the timezone info, so timestamps should always be in
UTC.
2022-07-18 19:30:17 +02:00
Sybren A. Stüvel
658a3d7a85 Worker Timeout: subject all but offline/error workers to timeout checks
Workers that are in `starting`, `asleep`, or `testing` state should also
be subject to the timeout check, not just workers in `awake` state.
2022-07-18 11:30:39 +02:00
Sybren A. Stüvel
d7b164133a Sleep Scheduler implementation for the Manager
The Manager now has a sleep scheduler for Workers. The API and background
service work, but there is no web interface yet.

Manifest Task: T99397
2022-07-17 17:27:32 +02:00
Sybren A. Stüvel
627996525e Manager: implement operations for getting & setting worker sleep schedule
This is just the API, no web interface yet.

Manifest Task: T99397
2022-07-16 16:00:25 +02:00
Sybren A. Stüvel
859a261b05 Manager: on deletion of a worker, do not cascade to deletion of its tasks
Fix an issue where deleting a Worker would also delete the tasks it was
assigned to.
2022-07-15 17:00:25 +02:00
Sybren A. Stüvel
1fceae3604 Manager: more efficient database queries
Be more selective in what's saved to the database to speed some things up.
Most importantly, this avoids saving the entire job when a task status is
updated or a task is assigned.
2022-07-15 15:08:00 +02:00
Sybren A. Stüvel
1055aabee2 Manager: optimise db.SaveActivity() query
Use an explicit `Select()` GORM call to avoid saving related objects.
2022-07-15 15:08:00 +02:00
Sybren A. Stüvel
6e28271c93 Manager: prevent saving related job & worker when "touching" task 2022-07-15 15:08:00 +02:00
Sybren A. Stüvel
6b5f9317cb Manager: clear job's blocklist when requeueing the job
Requeueing a job means that the issues that caused workers to get blocked
might be resolved, so it should be run with a clean slate.
2022-07-14 11:03:11 +02:00
Sybren A. Stüvel
d25151184d Add a "Last Rendered" view
Add a "Last Rendered" view to the webapp.

The Manager now stores (in the database) which job was the last
recipient of a rendered image, and serves that to the appropriate
OpenAPI endpoint.

A new SocketIO subscription + accompanying room makes it possible for
the web interface to receive all rendered images (if they survive the
queue, which discards images when it gets too full).
2022-07-01 12:34:40 +02:00
Sybren A. Stüvel
64512c81ba Manager: implement OAPI operations to fetch blocklist & delete items 2022-06-27 11:32:35 +02:00
Sybren A. Stüvel
87f1959e26 Manager: use blocklist to actually block workers
Actually use the blocklist in the task scheduler to block workers from
doing blocked job types.
2022-06-21 17:59:20 +02:00
Sybren A. Stüvel
64c8fa851d Show assigned worker in task details
Show the worker assigned to the task in the task details view, as link
to the worker itself.
2022-06-17 16:36:55 +02:00
Sybren A. Stüvel
046853932d Manager: re-queue previously failed tasks of worker when blocklisting
When a Worker is blocked from a job, re-queue its previously failed tasks
so that other workers can give them a try.
2022-06-17 15:49:16 +02:00
Sybren A. Stüvel
fd31a85bcd Manager: add blocking of workers when they fail certain tasks too much
When a worker fails too many tasks, of the same task type, on the same job,
it'll get blocked from doing those.
2022-06-17 15:49:16 +02:00
Sybren A. Stüvel
81f81d0e0a Show task failure list in the web frontend
Show the task failure list in the web frontend's `TaskDetails` component.
2022-06-17 11:37:56 +02:00
Sybren A. Stüvel
0b5140fc5f Manager: clear task failure list on requeueing of jobs & tasks
When a job or task gets requeued from the web interface, its task
failure lists (i.e. the list of workers that previously failed this
task) will be cleared.

This clearing doesn't happen in other situations, e.g. when a worker
signs off and its task gets requeued, the task's failure list will
remain as-is.
2022-06-17 11:37:28 +02:00
Sybren A. Stüvel
e9fca8d993 Cleanup: typo fix in comment 2022-06-17 11:03:43 +02:00
Sybren A. Stüvel
8764f8f7c1 Manager: task scheduler, don't schedule tasks the worker failed before
When a worker asks for a task to perform, don't give it a task that it
failed before.
2022-06-16 16:02:28 +02:00
Sybren A. Stüvel
7d7c2b1bd6 Cleanup: blacklist → blocklist
Change "blacklist" to "blocklist", because that makes people happier.

No functional changes.
2022-06-16 10:36:36 +02:00
Sybren A. Stüvel
c5debdeb70 Manager: add 'task failure list' to record workers failing tasks
The persistence layer can now store which worker failed which task, as
preparation for a blocklisting system. Such a system should be able to
determine whether there are still any workers left to do the work.
2022-06-13 18:41:30 +02:00
Sybren A. Stüvel
e35911d106 Manager: add ability to delete jobs
This is needed for a future unit test, and exposed the fact that SQLite
didn't enforce foreign key constraints (and thus also didn't handle
on-delete-cascade attributes). This has been fixed in the previous commit.
2022-06-13 18:41:19 +02:00
Sybren A. Stüvel
e5d0e987e1 Manager: enforce DB foreign key checks at startup
SQLite disables foreign key checks by default, so Flamenco has to enable
them explicitly.
2022-06-13 18:41:19 +02:00
Sybren A. Stüvel
6ec493d944 Manager, more efficiently create tasks
When creating tasks the inter-task dependencies are saved as a 2nd pass,by
updating the tasks in the database. This now only saves those dependencies,
and no longer saves the entire task again.
2022-06-13 18:40:42 +02:00
Sybren A. Stüvel
02bc03ae2b Manager: replace gorm.Model with our own persistence.Model struct
`persistence.Model` contains the common database fields for most model
structs. It is a copy of `gorm.Model`, but without the `DeletedAt`
field (which triggers Gorm's soft deletion).

Soft deletion is not used by Flamenco. If it ever becomes necessary to
support soft-deletion, see https://gorm.io/docs/delete.html#Soft-Delete
2022-06-13 18:40:42 +02:00
Sybren A. Stüvel
6fc936d0a6 Revert accidental debug code
Revert change in rF01c45afc20854918d1f18e6859b4154499d500b6 that made
unit tests use an on-disk database.
2022-06-13 18:40:25 +02:00
Sybren A. Stüvel
5dac3c2dc0 Manager: mark workers as 'seen' when they send updates
Update the 'last seen at' timestamp of workers when they:
- sign on
- sign off
- get a task assigned
- send a task update
- check whether they can keep running their task

Note that this commit is necessary to not have the workers time out
immediately ;-)
2022-06-13 12:47:07 +02:00
Sybren A. Stüvel
7d5aae25b5 Manager: add timeout checks for workers 2022-06-13 12:33:22 +02:00
Sybren A. Stüvel
67562856d3 Manager: let Gorm create an index on Task.LastTouchedAt
It's used in timeout queries, and there could be tens or hundreds of
thousands of tasks in the database.
2022-06-13 12:33:05 +02:00
Sybren A. Stüvel
01c45afc20 Manager: explicitly store timestamps as UTC
SQLite doesn't handle timezones by default, when you just use something
like `date1 < date2`, for example. This makes GORM explicitly use UTC
timestamps for the `CreatedAt`, `UpdatedAt`, and `DeletedAt` fields.
Our own code should also use UTC when saving timestamps. That way all
datetimes in the database are in the same timezone, and can be compared
naievely.
2022-06-13 12:10:11 +02:00
Sybren A. Stüvel
09902d201c Manager: fix task timeout check logging of assigned workers
The task's worker wasn't fetched from the database, always causing
"unknown worker" messages in the task log.
2022-06-10 14:52:03 +02:00
Sybren A. Stüvel
d90a8b987d Manager: Task Timeout Checker
Tasks that are in state `active` but haven't been 'touched' by a Worker
for 10 minutes or longer will transition to state `failed`.

In the future, it might be better to move the decision about which state
is suitable to the Task State Machine service, so that it can be smarter
and take the history of the task into account. Going to `soft-failed`
first might be a nice touch.
2022-06-10 14:32:02 +02:00
Sybren A. Stüvel
295891a17a Manager: ensure Gorm-generated timestamps are in UTC
SQLite should store all timestamps in UTC, as the database is woefully
unaware of timezones and will compare lexicographically.
2022-06-10 14:31:53 +02:00
Sybren A. Stüvel
04dd479248 Manager: protect task log writing with mutex
A per-task mutex is used to protect the writing of task logs, so that
mutliple goroutines can safely write to the same task log.
2022-06-09 14:44:54 +02:00
Sybren A. Stüvel
b4d2fc4231 Manager: keep track of when a Worker last worked on a task
This will be used for keeping track of stuck tasks.
2022-06-03 16:33:50 +02:00
Sybren A. Stüvel
6cf82e5d43 Manager: cleanup, refactor Worker state change request persistence code
Move the setting & clearing of worker state change requests into separate
functions.

No functional changes.
2022-06-02 16:36:06 +02:00
Sybren A. Stüvel
f97f0a34c3 Manager: implement worker status change requests
Implement the OpenAPI `RequestWorkerStatusChange` operation, and handle
these changes in the web interface.
2022-05-31 17:22:03 +02:00
Sybren A. Stüvel
1496736f7a Manager: wrap Worker fetching errors
Do the same wrapping as for task/job errors, but then for workers.
2022-05-31 11:18:57 +02:00
Sybren A. Stüvel
19db947eb4 Manager: remove Worker.LastActivity
This removes the field both from the OpenAPI interface and the database.
2022-05-31 10:46:27 +02:00
Sybren A. Stüvel
08676f48f4 Manager: implement fetchWorkers OpenAPI operation 2022-05-30 18:52:02 +02:00
Sybren A. Stüvel
f77b11d85e Manager: add a small wrapper around Google's UUID library
Add a small wrapper around github.com/google/uuid. That way it's clearer
which functionality is used by Flamenco, doesn't link most of the code to
any specific UUID library, and allows a bit of customisation.

The only customisation now is that Flamenco is a bit stricter in the
formats it accepts; only the `xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx` is
accepted. This makes things a little bit stricter, with the advantage
that we don't need to do any normalisation of received UUID strings.
2022-05-20 15:35:51 +02:00
Sybren A. Stüvel
7b664475ca Rename job status requeued to requeueing 2022-05-19 17:25:53 +02:00
Sybren A. Stüvel
530520b1c7 Implement mass updating of tasks when JobUpdate.refresh_tasks = true
Send & handle `JobUpdate.refresh_tasks = true` when many tasks are
updated simultaneously. This applies to things like cancelling &
requeueing an entire job.

This partially rolls back 67bf77de13d99b1bc5d7344951068822c4fadd88, as
it was too slow when 1000+ tasks were being updated all at once.
2022-05-17 14:48:50 +02:00
Sybren A. Stüvel
d35ca9d98f Manager: limit database connections
Limit the database connection pool to only a single connection. I hope that
this will solve the intermittent `SQLITE_BUSY` errors I've been seeing.
2022-05-12 13:58:15 +02:00
Sybren A. Stüvel
3d606a3fa0 Manager: task scheduler, fix handling of worker assignment of tasks
Improve how the task scheduler deals with tasks that already have a
worker assigned to them:

- When a Worker asks for a task, and there is already an active task
  assigned to it, always return that task.
- Otherwise, never allow scheduling of active tasks, as those are
  already being run by another worker. If this is not the case, their
  status should change to queued/failed, instead of handling the
  situation in the task scheduler.
- Apart from the assigned-and-active case above, ignore task's worker ID
  when scheduling tasks. If the status is 'queued' or 'soft-failed', the
  task's worker ID just indicates who ran the task last.
2022-05-12 13:52:16 +02:00
Sybren A. Stüvel
d3e2638f84 Cleanup: rename uri to dsn
"DSN" (Data Source Name) is used to indicate which database to open, and
was intermixed with "URI". This is now consistent.

No functional changes.
2022-05-12 11:08:54 +02:00
Sybren A. Stüvel
d673da7a0c Manager: check for stuck jobs at startup
Check for jobs in 'cancel-requested' or 'requeued' statuses, and ensure
they transition to the right status. This happens at startup, before
even starting the web interface, so that a consistent state is presented.
2022-05-06 16:07:27 +02:00
Sybren A. Stüvel
98da20f1a9 Manager: vacuum the database at startup 2022-05-06 14:35:34 +02:00