Due to an issue (which has been fixed in the previous commit), all tasks
in the database were deleted when starting Flamenco. This tool attempts
to recompile the job and recreate its tasks.
The statuses of the tasks are set based on the job status. Basically:
- job active → tasks queued
- job completed → tasks completed
- job cancelled / failed → tasks cancelled
- otherwise → tasks queued
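A minimal sketch of that mapping (plain strings for brevity; Flamenco's
actual code uses typed status constants from its OpenAPI package):

```go
// Map a job status to the status its recreated tasks should get.
func taskStatusForJob(jobStatus string) string {
	switch jobStatus {
	case "completed":
		return "completed"
	case "cancelled", "failed":
		return "cancelled"
	default: // "active" and anything else
		return "queued"
	}
}
```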
To ensure that the tool is only used to create tasks from scratch, it
refuses to work on a job that still has tasks in the database.
There is an issue with the GORM auto-migration, in that it doesn't
always disable foreign key constraints when it should. Due to
limitations of SQLite, not all 'alter table' commands you'd want to use
are available. As a workaround, these steps are performed:
1. create a new table with the desired schema,
2. copy the data over,
3. drop the old table,
4. rename the new table to the old name.
Step #3 will wreak havoc with the database when foreign key constraint
checks are active, so we temporarily deactivate them while performing
the database migration.
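As an illustration, the rebuild looks roughly like this. The `things`
table and its columns are made up for the example; note that `PRAGMA
foreign_keys` is a no-op inside a transaction, so it has to run outside
of one:

```sql
PRAGMA foreign_keys = OFF;

-- 1. create a new table with the desired schema
CREATE TABLE things_new (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL DEFAULT ''
);
-- 2. copy the data over
INSERT INTO things_new (id, name) SELECT id, name FROM things;
-- 3. drop the old table (havoc if foreign key checks were active)
DROP TABLE things;
-- 4. rename the new table to the old name
ALTER TABLE things_new RENAME TO things;

PRAGMA foreign_keys = ON;
```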
Mark the default value of `Worker.LazyStatusRequest` as `false`. The
previous default was configured as `0`, which was different enough to
always trigger a database migration of that column. However, since these
values map to each other, the migration effectively changed nothing, and
would simply be triggered again at the next startup.
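In GORM terms the fix boils down to a struct tag; this is a simplified
sketch, not the actual model:

```go
// Only the tag on LazyStatusRequest matters here. Declaring the
// default as `false` matches what GORM reads back from the SQLite
// schema, so auto-migration no longer sees a difference.
type Worker struct {
	// ... other fields omitted ...
	LazyStatusRequest bool `gorm:"default:false"`
}
```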
As it was decided that the name "tags" would be better for the clarity
of the feature, all files and code named "cluster" or "worker cluster"
have been removed and replaced with "tag" and "worker tag". This is only
a name change, no other features were touched.
This addresses part of #104204.
Reviewed-on: https://projects.blender.org/studio/flamenco/pulls/104223
As a note to anyone who already ran a pre-release version of Flamenco
and configured some worker clusters, with the help of an SQLite client
you can migrate the clusters to tags. First build Flamenco Manager and
start it, to create the new database schema. Then run these SQL queries
via an SQLite command-line client:
```sql
insert into worker_tags
(id, created_at, updated_at, uuid, name, description)
select id, created_at, updated_at, uuid, name, description
from worker_clusters;
insert into worker_tag_membership (worker_tag_id, worker_id)
select worker_cluster_id, worker_id from worker_cluster_membership;
```
Perform these two SQL calls and check their results:
- `PRAGMA integrity_check`
- `PRAGMA foreign_key_check`
See https://www.sqlite.org/pragma.html for more info on these.
This also removes the unused `PeriodicMaintenanceLoop()` function.
Periodic checking while Flamenco Manager is running might be introduced
in a future commit, after the startup-time checks have been shown to not
get in the way.
When an error is detected at startup, close the database connection.
Previously, the connection could remain open even when an error was
returned, leaving the write-ahead log files behind. These are now
properly integrated into the main database file before exiting.
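A minimal sketch of the pattern, with hypothetical helper names:

```go
// openDB and performIntegrityCheck are illustrative names; the point
// is closing the connection on the error path, which checkpoints the
// -wal/-shm files back into the main database file.
db, err := openDB(ctx, dsn)
if err != nil {
	return nil, err
}
if err := performIntegrityCheck(ctx, db); err != nil {
	db.Close() // previously skipped, leaving WAL files behind
	return nil, err
}
return db, nil
```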
Upgrade:
- `gorm.io/gorm` v1.23.8 → v1.25.2
- `github.com/glebarez/go-sqlite` v1.17.3 → v1.8.0
- `github.com/glebarez/sqlite` v1.4.6 → v1.8.0
along with some indirect dependencies.
This is in the hope that some weird cases at Blender Studio get resolved.
It appears that sometimes, for some unknown reason, when deleting a job,
its tasks get reassigned to another job (instead of also getting deleted).
Since there is no code in Flamenco itself to do this task deletion (it's
all depending on SQLite following the foreign keys and cascading to tasks),
I hope it was a bug in either GORM or SQLite that got fixed at some point.
Add an extra job to the database before deleting one, to ensure that job
deletion doesn't affect other jobs (and their tasks).
No functional changes to Flamenco itself.
Workers can be soft-deleted, which means that they stay in the database.
As such, foreign key constraints `ON DELETE CASCADE` do not trigger, and
thus their sleep schedule can still be active. This is now detected and
handled gracefully.
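A sketch of the detection, assuming GORM's soft-delete field; the
surrounding names are made up:

```go
// A soft-deleted Worker keeps its database row, so the sleep
// scheduler has to skip it explicitly instead of relying on
// ON DELETE CASCADE having removed the schedule.
for _, worker := range workersWithSleepSchedule {
	if worker.DeletedAt.Valid { // gorm.DeletedAt: soft-deleted
		continue
	}
	applySleepSchedule(ctx, &worker)
}
```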
The Shaman Checkout ID setter shouldn't update a job's "updated at"
timestamp. Its goal is to fake that the job was submitted with a new
enough Flamenco version, and thus should not touch the timestamps.
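A minimal sketch, assuming GORM: `UpdateColumn` skips hooks and does
not track the `UpdatedAt` timestamp, unlike `Save` or `Updates`:

```go
// Set the Shaman checkout ID without touching the job's timestamps.
// The Job model and column name are assumptions for this sketch.
func setShamanCheckoutID(db *gorm.DB, jobUUID, checkoutID string) error {
	tx := db.Model(&Job{}).
		Where("uuid = ?", jobUUID).
		UpdateColumn("shaman_checkout_id", checkoutID)
	return tx.Error
}
```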
This is a command that can be run to retroactively set the Shaman
Checkout ID of jobs, allowing the job deletion to also remove the job's
Shaman checkout directory.
This is highly experimental, and not built by default or shipped with
Flamenco releases. It's only been used once at Blender Animation Studio
to help clean up. Run at your own risk. Make backups first.
- Add a little confirmation overlay before deleting a job. This overlay
also shows information about whether the Shaman checkout directory
will be deleted or not.
- Send job updates to the web frontend when jobs are marked for
deletion, and when they are actually deleted.
- Respond to those updates, and handle some corner cases where job info
is missing (because it just got deleted).
This closes T99401.
Implement the `deleteJob` API endpoint. Calling this endpoint will mark
the job as "deletion requested", after which it's queued for actual
deletion. This makes the API response fast, even when there is a lot of
work to do in the background.
A new background service "job deleter" keeps track of the queue of such
jobs, and performs the actual deletion. It removes:
- Shaman checkout for the job (but see below)
- Manager-local files of the job (task logs, last-rendered images)
- The job itself
The removal is done in the above order, so the job is only removed from
the database if the rest of the removal was successful.
Shaman checkouts are only removed if the job was submitted with Flamenco
version 3.2. Earlier versions did not record enough information to reliably
do this.
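The ordering can be sketched like this; the function names are
illustrative, not Flamenco's actual API:

```go
// Delete external data first; if any step fails, the job stays in
// the database and the deletion can be retried later.
func deleteJob(ctx context.Context, job *Job) error {
	if job.ShamanCheckoutID != "" { // only recorded by version 3.2+
		if err := removeShamanCheckout(ctx, job); err != nil {
			return err
		}
	}
	// Task logs, last-rendered images.
	if err := removeLocalFiles(ctx, job); err != nil {
		return err
	}
	return removeFromDatabase(ctx, job)
}
```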
If Shaman is used to submit the job files, store the job's checkout ID
(i.e. the path relative to the checkout root) in the database. This will
make it possible in the future to remove the Shaman checkout along with
the job itself.
Run `PRAGMA journal_mode = WAL` and `PRAGMA synchronous = normal` when
connecting to the SQLite database. This enables the write-ahead-log journal
mode, which makes it safe to enable "normal" synchronisation (instead of
the default "full" synchronisation).
The priority of an existing job can now be changed. The new priority is
taken into account when assigning tasks to workers, but tasks that are
already active will not be reassigned.
When the Manager was shutting down while the sleep scheduler was running, it
could cause a null pointer dereference. This is now doubly solved:
- `worker.Identifier()` is now nil-safe, meaning `worker` can be `nil`
  and it will still return a sensible string (see the sketch below).
- failure to apply the sleep schedule due to the context closing is no
  longer logged as an error.
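The nil-safe method boils down to something like this sketch (the
`Worker` fields and the fallback string are made up, and `fmt` is
assumed to be imported):

```go
// Identifier is nil-safe: calling it on a nil *Worker returns a
// sensible string instead of panicking.
func (w *Worker) Identifier() string {
	if w == nil {
		return "-no worker-"
	}
	return fmt.Sprintf("%s (%s)", w.Name, w.UUID)
}
```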
Workers can now be soft-deleted. Tasks assigned to the worker will remain
associated with that Worker. Active tasks will be re-queued so other
workers can pick them up.
The Task details component already linked to the Worker it was assigned
to last, and now the Worker links back to the task.
There's only one task shown in the Worker details. If the Worker is
actively working on a task, that one's shown. Otherwise it's the
last-updated task that was assigned to the worker.
Be more selective in what's saved to the database to speed some things up.
Most importantly, this avoids saving the entire job when a task status is
updated or a task is assigned.
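For example, a task status change can be written as a narrow update
instead of saving the whole struct; a sketch, assuming GORM:

```go
// Select limits the UPDATE statement to the listed columns, so the
// rest of the Task (and its Job) is left untouched.
func saveTaskStatus(db *gorm.DB, task *Task) error {
	tx := db.Model(task).
		Select("status", "updated_at").
		Updates(task)
	return tx.Error
}
```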
Add a "Last Rendered" view to the webapp.
The Manager now stores (in the database) which job was the last
recipient of a rendered image, and serves that to the appropriate
OpenAPI endpoint.
A new SocketIO subscription + accompanying room makes it possible for
the web interface to receive all rendered images (if they survive the
queue, which discards images when it gets too full).
When a job or task gets requeued from the web interface, its task
failure lists (i.e. the list of workers that previously failed this
task) will be cleared.
This clearing doesn't happen in other situations, e.g. when a worker
signs off and its task gets requeued, the task's failure list will
remain as-is.
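A sketch of the clearing step, with assumed model and column names:

```go
// Remove all failure records of the task being requeued. This only
// runs for requeues triggered from the web interface.
func clearFailureList(db *gorm.DB, task *Task) error {
	tx := db.Where("task_id = ?", task.ID).Delete(&TaskFailure{})
	return tx.Error
}
```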