247 Commits

Author SHA1 Message Date
Sybren A. Stüvel
06738b8aa4 Manager: disable SQLite foreign key constraints when migrating the database
There is an issue with the GORM auto-migration, in that it doesn't
always disable foreign key constraints when it should. Due to
limitations of SQLite, not all 'alter table' commands you'd want to use
are available. As a workaround, these steps are performed:

1. create a new table with the desired schema,
2. copy the data over,
3. drop the old table,
4. rename the new table to the old name.

Step #3 will wreak havoc with the database when foreign key constraint
checks are active, so now we temporarily deactivate them while performing
database migration.
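
The four steps above, wrapped in the foreign key toggle, can be sketched as a list of SQL statements. This is a minimal illustration and not GORM's actual migration code; the function name `rebuildStatements` and the `workers` table with its columns are hypothetical. Note that SQLite ignores `PRAGMA foreign_keys` inside a transaction, so the toggle has to happen outside of one.

```go
package main

import "fmt"

// rebuildStatements returns the SQL for the create/copy/drop/rename
// workaround, surrounded by statements that switch foreign key enforcement
// off and back on. Table and column names are hypothetical examples.
func rebuildStatements(table, columnDefs, columnNames string) []string {
	tmp := table + "_new"
	return []string{
		// Must run outside a transaction; SQLite ignores it otherwise.
		"PRAGMA foreign_keys = OFF",
		fmt.Sprintf("CREATE TABLE %s (%s)", tmp, columnDefs),
		fmt.Sprintf("INSERT INTO %s (%s) SELECT %s FROM %s", tmp, columnNames, columnNames, table),
		fmt.Sprintf("DROP TABLE %s", table),
		fmt.Sprintf("ALTER TABLE %s RENAME TO %s", tmp, table),
		"PRAGMA foreign_keys = ON",
	}
}

func main() {
	stmts := rebuildStatements("workers", "id INTEGER PRIMARY KEY, name TEXT", "id, name")
	for _, stmt := range stmts {
		fmt.Println(stmt + ";")
	}
}
```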
2023-07-10 14:06:21 +02:00
Sybren A. Stüvel
60d54eabb3 Manager: avoid recreation of Worker table at startup
Mark the default value of `Worker.LazyStatusRequest` as `false`. The
previous default was configured as `0`, which was different enough to
always trigger a database migration of that column. However, since these
values do map to each other, the migration didn't do anything concrete,
and would be triggered again at the next startup.
2023-07-10 13:56:47 +02:00
Eveline Anderson
830c3fe794 Rename worker 'clusters' to 'tags'
As it was decided that the name "tags" would be better for the clarity
of the feature, all files and code named "cluster" or "worker cluster"
have been removed and replaced with "tag" and "worker tag". This is only
a name change, no other features were touched.

This addresses part of #104204.

Reviewed-on: https://projects.blender.org/studio/flamenco/pulls/104223

As a note to anyone who already ran a pre-release version of Flamenco
and configured some worker clusters, with the help of an SQLite client
you can migrate the clusters to tags. First build Flamenco Manager and
start it, to create the new database schema. Then run these SQL queries
via an SQLite command-line client:

```sql
insert into worker_tags
    (id, created_at, updated_at, uuid, name, description)
  select id, created_at, updated_at, uuid, name, description
  from worker_clusters;

insert into worker_tag_membership (worker_tag_id, worker_id)
  select worker_cluster_id, worker_id from worker_cluster_membership;
```
2023-07-10 11:11:03 +02:00
Sybren A. Stüvel
7a508c7e6b Manager: perform database integrity check at startup
Perform these two SQL calls & check their result:

- `PRAGMA integrity_check`
- `PRAGMA foreign_key_check`

See https://www.sqlite.org/pragma.html for more info on these.

This also removes the unused `PeriodicMaintenanceLoop()` function.
Periodic checking while Flamenco Manager is running might be introduced
in a future commit, after the startup-time checks have been shown to not
get in the way.
2023-07-07 16:03:06 +02:00
Sybren A. Stüvel
7f588e6dbc Manager: close database connection on startup errors
When there is an error detected at startup, close the database connection.
Before, the connection could be kept open even when an error was returned,
causing the write-ahead log files to be kept around. These are now
properly integrated into the main database file before exiting.
2023-07-07 15:48:08 +02:00
Sybren A. Stüvel
988cdf61ff Upgrade GORM & SQLite
Upgrade:
- `gorm.io/gorm` v1.23.8 → v1.25.2
- `github.com/glebarez/go-sqlite` v1.17.3 → v1.8.0
- `github.com/glebarez/sqlite` v1.4.6 → v1.8.0

and also some indirect dependencies.

This is in the hope that some weird cases at Blender Studio get resolved.
It appears that sometimes, for some unknown reason, when deleting a job,
its tasks get reassigned to another job (instead of also getting deleted).

Since there is no code in Flamenco itself to do this task deletion (it's
all depending on SQLite following the foreign keys and cascading to tasks),
I hope it was a bug in either GORM or SQLite that got fixed at some point.
2023-07-06 16:08:57 +02:00
Sybren A. Stüvel
22f4aa09f3 Manager: expand job deletion unit test
Add extra job to the database before deleting one, to ensure that job
deletion doesn't do anything with other jobs (and their tasks).

No functional changes to Flamenco itself.
2023-07-06 16:08:57 +02:00
Sybren A. Stüvel
afde952c10 Fix incompatibility with 32-bit platforms
2023-05-24 21:23:05 +02:00
Anish Bharadwaj (he)
0502498dfa Fix #104201: Task Limit error in Flamenco Manager
Insert tasks in batches so that the required SQL query stays within the limits of SQLite.

No changes to the API, only to the persistence layer.

Reviewed-on: https://projects.blender.org/studio/flamenco/pulls/104205
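
The batching idea can be sketched as follows. SQLite caps the number of bound parameters in a single statement, so a large insert is split into several smaller ones. This is only an illustration of the technique, not the code from the pull request; the helper `chunk` and the batch size of 3 are assumptions chosen for the example.

```go
package main

import "fmt"

// chunk splits items into consecutive batches of at most size elements.
// It returns nil when size is not positive.
func chunk[T any](items []T, size int) [][]T {
	if size <= 0 {
		return nil
	}
	var batches [][]T
	for start := 0; start < len(items); start += size {
		end := start + size
		if end > len(items) {
			end = len(items)
		}
		batches = append(batches, items[start:end])
	}
	return batches
}

func main() {
	tasks := make([]string, 7)
	for i := range tasks {
		tasks[i] = fmt.Sprintf("task-%d", i)
	}
	for i, batch := range chunk(tasks, 3) {
		// A real persistence layer would run one INSERT per batch here.
		fmt.Printf("batch %d: %d tasks\n", i, len(batch))
	}
}
```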
2023-04-24 15:10:59 +02:00
Sybren A. Stüvel
6a89fa346c Manager: correctly count how many workers can run a job
This accounts for the behaviour change in 3724a8874e4f22ef0740f464d9e912b19a1e061e,
where workers without a cluster only run jobs without a cluster.
2023-04-04 15:19:21 +02:00
Sybren A. Stüvel
3724a8874e Slight change of worker cluster behaviour
Workers without a cluster now only run jobs without a cluster.
2023-04-04 13:17:45 +02:00
Sybren A. Stüvel
8408d28a6b Manager: add support for worker clusters
2023-04-04 12:18:35 +02:00
Sybren A. Stüvel
e2559b1181 Cleanup: remove doubly-declared default value in persistence layer
No functional changes.
2023-04-03 16:59:22 +02:00
Sybren A. Stüvel
426b2aab4d Gracefully handle sleep schedules of deleted workers
Workers can be soft-deleted, which means that they stay in the database.
As such, foreign key constraints `ON DELETE CASCADE` do not trigger, and
thus their sleep schedule can still be active. This is now detected and
handled gracefully.
2023-02-09 11:18:38 +01:00
Sybren A. Stüvel
fe0899fd55 shaman-checkout-id-setter: Don't update job's "updated at" timestamp
The Shaman Checkout ID setter shouldn't update a job's "updated at"
timestamp. Its goal is to fake that the job was submitted with a new
enough Flamenco version, and thus should not touch the timestamps.
2023-02-07 16:24:23 +01:00
Sybren A. Stüvel
01a85d86cb Add "Shaman Checkout ID setter" command
This is a command that can be run to retroactively set the Shaman
Checkout ID of jobs, allowing the job deletion to also remove the job's
Shaman checkout directory.

This is highly experimental, and not built by default or shipped with
Flamenco releases. It's only been used once at Blender Animation Studio
to help clean up. Run at your own risk. Make backups first.
2023-02-07 15:07:41 +01:00
Sybren A. Stüvel
aa1c6b8ff3 Close the database when Flamenco shuts down
This prevents SQLite journal files from lingering around.
2023-02-07 15:05:49 +01:00
Sybren A. Stüvel
ef3cab9745 Webapp: handle job deletions properly
- Add a little confirmation overlay before deleting a job. This overlay
  also shows information about whether the Shaman checkout directory
  will be deleted or not.
- Send job updates to the web frontend when jobs are marked for
  deletion, and when they are actually deleted.
- Respond to those updates, and handle some corner cases where job info
  is missing (because it just got deleted).

This closes T99401.
2023-02-03 16:59:15 +01:00
Sybren A. Stüvel
bf0906eb95 Manager: avoid logging an error when requesting a non-existent job
This is expected to happen every once in a while, especially now that
Flamenco supports job deletion. It's not something to log at error level.
2023-02-03 16:37:55 +01:00
Sybren A. Stüvel
791d877ff1 Manager: implement API endpoint for deleting jobs
Implement the `deleteJob` API endpoint. Calling this endpoint will mark
the job as "deletion requested", after which it's queued for actual
deletion. This makes the API response fast, even when there is a lot of
work to do in the background.

A new background service "job deleter" keeps track of the queue of such
jobs, and performs the actual deletion. It removes:

- Shaman checkout for the job (but see below)
- Manager-local files of the job (task logs, last-rendered images)
- The job itself

The removal is done in the above order, so the job is only removed from the
database if the rest of the removal was successful.

Shaman checkouts are only removed if the job was submitted with Flamenco
version 3.2. Earlier versions did not record enough information to reliably
do this.
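
The ordering guarantee described above can be sketched as a list of steps that stops at the first failure, so the database record is only removed when everything before it succeeded. This is an illustrative sketch, not the actual job deleter; the `step` type, `runInOrder`, and `deletionSteps` are hypothetical names, and the step bodies are placeholders.

```go
package main

import (
	"errors"
	"fmt"
)

// step is one named removal action. The names are illustrative only.
type step struct {
	name string
	run  func() error
}

// runInOrder executes the steps in sequence and stops at the first error.
// It returns the names of the steps that completed.
func runInOrder(steps []step) ([]string, error) {
	var done []string
	for _, s := range steps {
		if err := s.run(); err != nil {
			return done, fmt.Errorf("%s: %w", s.name, err)
		}
		done = append(done, s.name)
	}
	return done, nil
}

// deletionSteps builds placeholder steps in the order given in the commit
// message. failLocalFiles simulates an error in the second step.
func deletionSteps(failLocalFiles bool) []step {
	return []step{
		{"shaman checkout", func() error { return nil }},
		{"manager-local files", func() error {
			if failLocalFiles {
				return errors.New("simulated failure")
			}
			return nil
		}},
		{"job database record", func() error { return nil }},
	}
}

func main() {
	done, err := runInOrder(deletionSteps(true))
	fmt.Println("completed:", done)
	fmt.Println("error:", err)
}
```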
2023-01-04 01:18:21 +01:00
Sybren A. Stüvel
f413a40f4e Store Shaman checkout ID when submitting a job
If Shaman is used to submit the job files, store the job's checkout ID
(i.e. the path relative to the checkout root) in the database. This will
make it possible in the future to remove the Shaman checkout along with
the job itself.
2023-01-04 01:18:21 +01:00
Sybren A. Stüvel
15e3745820 Manager: SQLite WAL journal + NORMAL sync mode
Run `PRAGMA journal_mode = WAL` and `PRAGMA synchronous = normal` when
connecting to the SQLite database. This enables the write-ahead-log journal
mode, which makes it safe to enable "normal" synchronisation (instead of
the default "full" synchronisation).
2022-11-24 17:18:06 +01:00
Sybren A. Stüvel
85d53de1f9 Manager: implement API endpoint for changing job priority
The priority of an existing job can now be changed. It will be taken into
account when assigning tasks to workers, but it will not reassign tasks
that are already active.
2022-09-30 16:30:03 +02:00
Sybren A. Stüvel
59655ea770 Manager: fix error in sleep scheduler when shutting down
When the Manager was shutting down while the sleep scheduler was running, it
could cause a null pointer dereference. This is now doubly solved:

- `worker.Identifier()` is now nil-safe, as in, `worker` can be `nil` and
  it will still return a sensible string.
- failure to apply the sleep schedule due to the context closing is not
  logged as error any more.
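
A nil-safe method in Go checks the pointer receiver before touching any fields. The sketch below shows the pattern; the `Worker` fields and the returned strings are assumptions for illustration and may differ from what Flamenco actually uses.

```go
package main

import "fmt"

// Worker is a stand-in type with hypothetical fields.
type Worker struct {
	Name string
	UUID string
}

// Identifier returns a human-readable description for logging.
// It is safe to call on a nil *Worker.
func (w *Worker) Identifier() string {
	if w == nil {
		return "-nil worker-"
	}
	return fmt.Sprintf("%s (%s)", w.Name, w.UUID)
}

func main() {
	var missing *Worker
	fmt.Println(missing.Identifier())

	present := &Worker{Name: "render-01", UUID: "1234"}
	fmt.Println(present.Identifier())
}
```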
2022-09-27 12:27:18 +02:00
Sybren A. Stüvel
2a345a3d2c API for deleting workers
Workers can now be soft-deleted. Tasks assigned to the worker will remain
associated with that Worker. Active tasks will be re-queued so other
workers can pick them up.
2022-08-11 16:59:53 -07:00
Sybren A. Stüvel
1469345f3a Manager: sort blocklist by worker name
2022-08-01 18:54:28 +02:00
Sybren A. Stüvel
be1ddaa4eb Manager test: reduce timeout to practical value
The timeout was increased to aid debugging, but shouldn't have been
committed.
2022-07-29 09:59:54 +02:00
Sybren A. Stüvel
736ca103c3 Manager: show current/last task in worker details
The Task details component already linked to the Worker it was assigned
to last, and now the Worker links back to the task.

There's only one task shown in the Worker details. If the Worker is
actively working on a task, that one's shown. Otherwise it's the
last-updated task that was assigned to the worker.
2022-07-26 10:36:02 +02:00
Sybren A. Stüvel
ab8ecc24cc Cleanup: Add missing license specifiers
Add license specifiers to Go files that were missing them:

```
// SPDX-License-Identifier: GPL-3.0-or-later
```

No functional changes.
2022-07-25 16:08:07 +02:00
Sybren A. Stüvel
83467e4c60 Sleep schedule: store 'next check' timestamp in UTC
SQLite doesn't parse the timezone info, so timestamps should always be in
UTC.
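
Converting to UTC before storing can be sketched like this. The helper names `toStorageTime` and `exampleLocalTime` are hypothetical; the conversion itself uses the standard `time.Time.UTC()` method, which keeps the same instant and only changes the location.

```go
package main

import (
	"fmt"
	"time"
)

// toStorageTime converts a timestamp to UTC before it is persisted.
// The instant stays the same; only the location changes.
func toStorageTime(t time.Time) time.Time {
	return t.UTC()
}

// exampleLocalTime returns a fixed timestamp in a UTC+2 zone.
func exampleLocalTime() time.Time {
	zone := time.FixedZone("CEST", 2*60*60)
	return time.Date(2022, time.July, 18, 19, 30, 17, 0, zone)
}

func main() {
	local := exampleLocalTime()
	stored := toStorageTime(local)
	fmt.Println("local: ", local.Format(time.RFC3339))
	fmt.Println("stored:", stored.Format(time.RFC3339))
}
```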
2022-07-18 19:30:17 +02:00
Sybren A. Stüvel
658a3d7a85 Worker Timeout: subject all but offline/error workers to timeout checks
Workers that are in `starting`, `asleep`, or `testing` state should also
be subject to the timeout check, not just workers in `awake` state.
2022-07-18 11:30:39 +02:00
Sybren A. Stüvel
d7b164133a Sleep Scheduler implementation for the Manager
The Manager now has a sleep scheduler for Workers. The API and background
service work, but there is no web interface yet.

Manifest Task: T99397
2022-07-17 17:27:32 +02:00
Sybren A. Stüvel
627996525e Manager: implement operations for getting & setting worker sleep schedule
This is just the API, no web interface yet.

Manifest Task: T99397
2022-07-16 16:00:25 +02:00
Sybren A. Stüvel
859a261b05 Manager: on deletion of a worker, do not cascade to deletion of its tasks
Fix an issue where deleting a Worker would also delete the tasks it was
assigned to.
2022-07-15 17:00:25 +02:00
Sybren A. Stüvel
1fceae3604 Manager: more efficient database queries
Be more selective in what's saved to the database to speed some things up.
Most importantly, this avoids saving the entire job when a task status is
updated or a task is assigned.
2022-07-15 15:08:00 +02:00
Sybren A. Stüvel
1055aabee2 Manager: optimise db.SaveActivity() query
Use an explicit `Select()` GORM call to avoid saving related objects.
2022-07-15 15:08:00 +02:00
Sybren A. Stüvel
6e28271c93 Manager: prevent saving related job & worker when "touching" task
2022-07-15 15:08:00 +02:00
Sybren A. Stüvel
6b5f9317cb Manager: clear job's blocklist when requeueing the job
Requeueing a job means that the issues that caused workers to get blocked
might be resolved, so it should be run with a clean slate.
2022-07-14 11:03:11 +02:00
Sybren A. Stüvel
d25151184d Add a "Last Rendered" view
Add a "Last Rendered" view to the webapp.

The Manager now stores (in the database) which job was the last
recipient of a rendered image, and serves that to the appropriate
OpenAPI endpoint.

A new SocketIO subscription + accompanying room makes it possible for
the web interface to receive all rendered images (if they survive the
queue, which discards images when it gets too full).
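
A queue that discards items when full is commonly written in Go as a non-blocking send on a buffered channel. The sketch below shows that pattern only; it is not taken from the Flamenco source, and `tryEnqueue`, the buffer size, and the file names are assumptions.

```go
package main

import "fmt"

// tryEnqueue attempts a non-blocking send and reports whether the item was
// queued. When the buffer is full, the item is dropped.
func tryEnqueue(queue chan<- string, item string) bool {
	select {
	case queue <- item:
		return true
	default:
		return false
	}
}

func main() {
	// Buffer of two; nothing is reading, so the third image is dropped.
	queue := make(chan string, 2)
	for _, img := range []string{"frame-1.jpg", "frame-2.jpg", "frame-3.jpg"} {
		fmt.Println(img, "queued:", tryEnqueue(queue, img))
	}
}
```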
2022-07-01 12:34:40 +02:00
Sybren A. Stüvel
64512c81ba Manager: implement OAPI operations to fetch blocklist & delete items
2022-06-27 11:32:35 +02:00
Sybren A. Stüvel
87f1959e26 Manager: use blocklist to actually block workers
Actually use the blocklist in the task scheduler to block workers from
doing blocked job types.
2022-06-21 17:59:20 +02:00
Sybren A. Stüvel
64c8fa851d Show assigned worker in task details
Show the worker assigned to the task in the task details view, as a link
to the worker itself.
2022-06-17 16:36:55 +02:00
Sybren A. Stüvel
046853932d Manager: re-queue previously failed tasks of worker when blocklisting
When a Worker is blocked from a job, re-queue its previously failed tasks
so that other workers can give them a try.
2022-06-17 15:49:16 +02:00
Sybren A. Stüvel
fd31a85bcd Manager: add blocking of workers when they fail certain tasks too much
When a worker fails too many tasks, of the same task type, on the same job,
it'll get blocked from doing those.
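
The blocking rule can be sketched as a counter keyed on worker, job, and task type. This is an in-memory illustration under assumed names (`failureKey`, `failureCounter`) and an assumed threshold of 3; the Manager keeps this information in its database, and its actual threshold is not stated here.

```go
package main

import "fmt"

// failureKey identifies failures of one worker for one task type on one job.
type failureKey struct {
	WorkerID string
	JobID    string
	TaskType string
}

// failureCounter is an in-memory stand-in for the persisted failure list.
type failureCounter struct {
	threshold int
	counts    map[failureKey]int
}

func newFailureCounter(threshold int) *failureCounter {
	return &failureCounter{threshold: threshold, counts: make(map[failureKey]int)}
}

// recordFailure counts one failure and reports whether the worker has now
// reached the threshold for this job and task type.
func (c *failureCounter) recordFailure(key failureKey) bool {
	c.counts[key]++
	return c.counts[key] >= c.threshold
}

func main() {
	counter := newFailureCounter(3) // hypothetical threshold
	key := failureKey{WorkerID: "worker-1", JobID: "job-1", TaskType: "blender"}
	for i := 1; i <= 3; i++ {
		fmt.Printf("failure %d, block: %v\n", i, counter.recordFailure(key))
	}
}
```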
2022-06-17 15:49:16 +02:00
Sybren A. Stüvel
81f81d0e0a Show task failure list in the web frontend
Show the task failure list in the web frontend's `TaskDetails` component.
2022-06-17 11:37:56 +02:00
Sybren A. Stüvel
0b5140fc5f Manager: clear task failure list on requeueing of jobs & tasks
When a job or task gets requeued from the web interface, its task
failure lists (i.e. the list of workers that previously failed this
task) will be cleared.

This clearing doesn't happen in other situations, e.g. when a worker
signs off and its task gets requeued, the task's failure list will
remain as-is.
2022-06-17 11:37:28 +02:00
Sybren A. Stüvel
e9fca8d993 Cleanup: typo fix in comment
2022-06-17 11:03:43 +02:00
Sybren A. Stüvel
8764f8f7c1 Manager: task scheduler, don't schedule tasks the worker failed before
When a worker asks for a task to perform, don't give it a task that it
failed before.
2022-06-16 16:02:28 +02:00
Sybren A. Stüvel
7d7c2b1bd6 Cleanup: blacklist → blocklist
Change "blacklist" to "blocklist", because that makes people happier.

No functional changes.
2022-06-16 10:36:36 +02:00
Sybren A. Stüvel
c5debdeb70 Manager: add 'task failure list' to record workers failing tasks
The persistence layer can now store which worker failed which task, as
preparation for a blocklisting system. Such a system should be able to
determine whether there are still any workers left to do the work.
2022-06-13 18:41:30 +02:00