175 Commits

Author SHA1 Message Date
Sybren A. Stüvel
3974770f36 Manager: refuse to delete workers without foreign key constraints
As a safety measure, refuse to delete Workers from the Manager's database
when foreign key constraints are disabled.

In the long term, the underlying problem should be solved. This is a stop-
gap measure to ensure database consistency.
2024-04-12 10:48:40 +02:00
Sybren A. Stüvel
6c28db780f Manager: refuse to delete worker tags without foreign key constraints
Before deleting a Worker Tag, check that foreign key constraints are
active for the current database connection.

Sometimes GORM decides to create a new database connection by itself,
without telling us, and then foreign key constraints are not active on
it. This commit is a workaround to avoid database corruption.
2024-04-12 10:48:40 +02:00
Sybren A. Stüvel
b313a2020d Refactor: Manager, move some test code into a function of its own
Move some of the Worker Tags test code into a function of its own, to have
a clearer separation between 'the test' and 'what needs to happen to do
this part of the test'.

Also it'll make an upcoming change easier to implement.

No functional changes.
2024-04-12 10:48:40 +02:00
Sybren A. Stüvel
b219f9b1c2 Manager tests: replace assert.NoError() with require.NoError()
Back in the days when I wrote the code, I didn't know about the
`require` package yet. Using `require.NoError()` makes the test code
more straight-forward.

No functional changes, except that when tests fail, they now fail
without panicking.
2024-03-16 12:14:39 +01:00
Sybren A. Stüvel
3f4a9025fe Manager tests: replace assert.NoError() with require.NoError()
Back in the days when I wrote the code, I didn't know about the
`require` package yet. Using `require.NoError()` makes the test code
more straight-forward.

No functional changes, except that when tests fail, they now fail
without panicking.
2024-03-16 11:09:18 +01:00
Sybren A. Stüvel
358efe7ae0 Manager: perform a database vacuum after migrations
Just to make sure the DB is properly cleaned up after a big migration
happened.
2024-03-06 11:59:17 +01:00
Sybren A. Stüvel
27cbb2ed0f Manager: increase timeout for database integrity check
With a fuller database, 2 seconds is apparently not always long enough,
so increase the timeout to 10 seconds.
2024-03-04 14:04:59 +01:00
Sybren A. Stüvel
a4e5eef83e Manager: fix database migration 0004
Fix the database migration that adds `NOT NULL` clauses. It used
`INSERT INTO temp_x SELECT * from x;`, and the `*` returns the fields in
the order they are defined on the table. Since this might be different from
the order that the `INSERT INTO temp_x` expects, strange problems can
happen where columns get swapped (or constraints can fail on columns that
they should not fail for, because they got fed data from a different
column).
2024-03-04 13:06:09 +01:00
Sybren A. Stüvel
7b72d0ca43 Refactor: move jobs-related queries to queries_jobs.sql
This makes it easier to later also create `query_workesr.sql`,
`query_meta.sql` etc. so that the sqlc-generated code can follow the
same subdivision as the persistence service code itself.

No functional changes.
2024-03-03 23:27:55 +01:00
Sybren A. Stüvel
b102b73a1f Refactor: convert more job functions to sqlc
No functional changes.
2024-03-03 23:23:51 +01:00
Sybren A. Stüvel
1ac796d0d8 Refactor: Manager: remove unused query from queries.sql
No functional changes.
2024-03-03 22:42:37 +01:00
Sybren A. Stüvel
3fbb3cde34 Manager: SQLC rename Uuid to UUID
No functional changes.
2024-03-03 20:54:43 +01:00
Sybren A. Stüvel
c046094880 Manager: start replacing GORM with SQLC
GORM has certain downsides:

- Code-first approach, where queries have to be translated to the Go code
  required to execute them.
- GORM comes with its own SQLite implementation, which doesn't provide an
  on-connect callback. This means that new connections cannot correctly
  enable foreign key constraints, causing database consistency issues.

[SQLC](https://sqlc.dev/) solves these issues for us.

This commit doesn't fully replace GORM with SQLC, but introduces it for
a few queries. Once all queries have been converted, GORM can be removed
completely.
2024-03-03 20:15:39 +01:00
Sybren A. Stüvel
7eb5eb68a3 Manager: ensure foreign keys are enabled in periodic integrity check
There are still issues with foreign keys getting disabled, so enable them
in the periodic database consistency check.

A more permanent solution is likely to drop GORM and switch to something
else that gives us an on-connect-callback, which can then be used to
turn on foreign key constraints for every connection made.
2024-03-01 23:42:04 +01:00
Sybren A. Stüvel
61cc8ff04d Manager: implement API operation to get the farm status
Add a new API operation to get the overall farm status. This is based on
the jobs and workers, and their status.

The statuses are:

- `active`: Actively working on jobs.
- `idle`: Farm could be active, but has no work to do.
- `waiting`: Work has been queued, but all workers are asleep.
- `asleep`: Farm is idle, and all workers are asleep.
- `inoperative`: Cannot work: no workers, or all are offline/error.
- `starting`: Farm is starting up.
- `unknown`: Unexpected configuration of worker and job statuses.
2024-02-29 20:42:28 +01:00
Sybren A. Stüvel
9afd79d8c0 Manager: prevent logging an error when fetching unknown worker
Prevent logging an error in the persistence layer when an unknown worker
is requested.

This reduces the noise & confusion when the web interface is showing the
details of a worker, but the worker gets removed by someone else. Or when
the Manager doesn't know about a Worker and it's trying to connect.

See #104282.
2024-01-25 12:38:13 +01:00
Sybren A. Stüvel
70faa4e225 Move URLs to the Flamenco website to constants in a dedicated package
Create a dedicated package `.../pkg/website` to contain constants for the
URLs of documentation, bug reporting, etc. That way it's easier to see
which parts of the website are being referred to from the Flamenco
binaries, and updates can happen in a central spot.

No functional changes.
2024-01-25 12:25:06 +01:00
Sybren A. Stüvel
b39f116b0e Manager: after deleting a job, perform a database consistency check
Deleting jobs from the database can still sometimes cause consistency
errors, as if foreign key constraints aren't enabled. This check is there
to try and get a grip on things.
2024-01-11 20:03:53 +01:00
Sybren A. Stüvel
6777e89589 Manager: refuse to delete job when foreign keys are disabled
Just as a safety measure, before deleting a job, check that foreign key
constraints are enabled. These are optional in SQLite, and the deletion
function assumes that they are on.
2024-01-11 17:17:56 +01:00
Sybren A. Stüvel
246916475f Manager: Implement mass mark-for-deletion of jobs
Implement the API function to mass-mark jobs for deletion, based on
their 'updated_at' timestamp.

Note that the `last_updated_max` parameter is rounded up to entire
seconds. This may mark more jobs for deletion than you expect, if their
`updated_at` timestamps differ by less than a second.
2023-12-16 23:05:52 +01:00
Sybren A. Stüvel
acc9499f2a Manager: drop the job_storage_infos database table
GORM Automigration created a separate `job_storage_infos` table (because
we used it wrong, to be fair), which is actually only used as an
embedded struct in the `jobs` table. This means this table itself can be
dropped.
2023-12-14 10:13:42 +01:00
Sybren A. Stüvel
a65f234bea Manager: replace GORM database migration with Goose
Replace GORM's auto-migration with Goose. The latter uses hand-written
SQL queries to apply database schema changes, which is safer and easier to
understand than what GORM is doing.
2023-12-14 10:13:40 +01:00
Sybren A. Stüvel
3e72391cbf Restartable workers
When the worker is started with `-restart-exit-code 47` or has
`restart_exit_code=47` in `flamenco-worker.yaml`, it's marked as
'restartable'. This will enable two worker actions 'Restart
(immediately)' and 'Restart (after task is finished)' in the Manager web
interface. When a worker is asked to restart, it will exit with exit
code `47`. Of course any positive exit code can be used here.
2023-08-14 16:00:09 +02:00
Sybren A. Stüvel
02fac6a4df Change Go package name from git.blender.org to projects.blender.org
Change the package base name of the Go code, from
`git.blender.org/flamenco` to `projects.blender.org/studio/flamenco`.

The old location, `git.blender.org`, has no longer been use since the
[migration to Gitea][1]. The new package names now reflect the actual
location where Flamenco is hosted.

[1]: https://code.blender.org/2023/02/new-blender-development-infrastructure/
2023-08-01 12:42:31 +02:00
Sybren A. Stüvel
63634361ce Manager: make periodic database integrity check configurable
Instead of always performing the periodic integrity check, make it possible
to disable it or run it at different intervals.

Currently for the Blender Studio it's crunch time, so the check should
really only run when there is someone looking at the system (i.e. at
restarts for upgrade purposes).
2023-07-18 16:33:01 +02:00
Sybren A. Stüvel
1a79c19583 Manager: improve logging of database consistency checks
The log messages now all start with `database: `.

No functional changes.
2023-07-18 16:12:26 +02:00
Sybren A. Stüvel
4121c899c3 Manager: perform database integrity check every hour
Perform a database integrity check every hour. This check was already
performed at startup, in the main goroutine.
2023-07-18 16:10:17 +02:00
Sybren A. Stüvel
b58f1e15f1 Add CLI utility to recreate tasks of jobs
Due to an issue (which has been fixed in the previous commit), all tasks
in the database were deleted when starting Flamenco. This tool attempts
to recompile the job and recreate its tasks.

The statuses of the tasks are set based on the job status. Basically:

- job active → tasks queued
- job completed → tasks completed
- job cancelled / failed → tasks cancelled
- otherwise → tasks queued

To ensure that the tool is only used to create tasks from scratch, it
refuses to work on a job that still has tasks in the database.
2023-07-10 14:10:15 +02:00
Sybren A. Stüvel
06738b8aa4 Manager: disable SQLite foreign key constraints when migrating the database
There is an issue with the GORM auto-migration, in that it doesn't
always disable foreign key constraints when it should. Due to
limitations of SQLite, not all 'alter table' commands you'd want to use
are available. As a workaround, these steps are performed:

1. create a new table with the desired schema,
2. copy the data over,
3. drop the old table,
4. rename the new table to the old name.

Step #3 will wreak havoc with the database when foreign key constraint
checks are active, so no we temporarily deactivate them while performing
database migration.
2023-07-10 14:06:21 +02:00
Sybren A. Stüvel
60d54eabb3 Manager: avoid recreation of Worker table at startup
Mark the default value of `Worker.LazyStatusRequest` as `false`. The
previous default was configured as `0`, which was different enough to
always trigger a database migration of that column. However, since these
values do map to each other, the migration didn't do anything concrete,
and would be triggered again at the next startup.
2023-07-10 13:56:47 +02:00
Eveline Anderson
830c3fe794 Rename worker 'clusters' to 'tags'
As it was decided that the name "tags" would be better for the clarity
of the feature, all files and code named "cluster" or "worker cluster"
have been removed and replaced with "tag" and "worker tag". This is only
a name change, no other features were touched.

This addresses part of #104204.

Reviewed-on: https://projects.blender.org/studio/flamenco/pulls/104223

As a note to anyone who already ran a pre-release version of Flamenco
and configured some worker clusters, with the help of an SQLite client
you can migrate the clusters to tags. First build Flamenco Manager and
start it, to create the new database schema. Then run these SQL queries
via an sqlite commandline client:

```sql
insert into worker_tags
    (id, created_at, updated_at, uuid, name, description)
  select id, created_at, updated_at, uuid, name, description
  from worker_clusters;

insert into worker_tag_membership (worker_tag_id, worker_id)
  select worker_cluster_id, worker_id from worker_cluster_membership;
```
2023-07-10 11:11:03 +02:00
Sybren A. Stüvel
7a508c7e6b Manager: perform database integrity check at startup
Perform these two SQL calls & check their result:

- `PRAGMA integrity_check`
- `PRAGMA foreign_key_check`:

See  https: //www.sqlite.org/pragma.html for more info on these.

This also removes the unused `PeriodicMaintenanceLoop()` function.
Periodic checking while Flamenco Manager is running might be introduced
in a future commit, after the startup-time checks have been shown to not
get in the way.
2023-07-07 16:03:06 +02:00
Sybren A. Stüvel
7f588e6dbc Manager: close database connection on startup errors
When there is an error detected at startup, close the database connection.
Before, the connection could be kept open even when an error was returned,
causing the write-ahead log files to be kept around. These are now
properly integrated into the main database file before exiting.
2023-07-07 15:48:08 +02:00
Sybren A. Stüvel
988cdf61ff Upgrade GORM & SQLite
Upgrade:
- `gorm.io/gorm` v1.23.8 → 1.25.2
- `github.com/glebarez/go-sqlite` v1.17.3 → v1.8.0
- `github.com/glebarez/sqlite` v1.4.6 → v1.8.0

and also some indirect dependencies.

This is in the hope that some weird cases at Blender Studio get resolved.
It appears that sometimes, for some unknown reason, when deleting a job,
its tasks get reassigned to another job (instead of also getting deleted).

Since there is no code in Flamenco itself to do this task deletion (it's
all depending on SQLite following the foreign keys and cascading to tasks),
I hope it was a bug in either GORM or SQLite that got fixed at some point.
2023-07-06 16:08:57 +02:00
Sybren A. Stüvel
22f4aa09f3 Manager: expand job deletion unit test
Add extra job to the database before deleting one, to ensure that job
deletion doesn't do anything with other jobs (and their tasks).

No functional changes to Flamenco itself.
2023-07-06 16:08:57 +02:00
Sybren A. Stüvel
afde952c10 Fix incompatibility with 32-bit platforms 2023-05-24 21:23:05 +02:00
Anish Bharadwaj (he)
0502498dfa Fix #104201: Task Limit error in Flamenco Manager
Insert tasks in batches so that the required SQL query stays within the limits of SQLite.

No changes to the API, only to the persistence layer.

Reviewed-on: https://projects.blender.org/studio/flamenco/pulls/104205
2023-04-24 15:10:59 +02:00
Sybren A. Stüvel
6a89fa346c Manager: correctly count how many workers can run a job
Basically this accounts for the change in 3724a8874e4f22ef0740f464d9e912b19a1e061e
2023-04-04 15:19:21 +02:00
Sybren A. Stüvel
3724a8874e Slight change of worker cluster behaviour
Workers without cluster now only run jobs without cluster.
2023-04-04 13:17:45 +02:00
Sybren A. Stüvel
8408d28a6b Manager: add support for worker clusters 2023-04-04 12:18:35 +02:00
Sybren A. Stüvel
e2559b1181 Cleanup: remove doubly-declared default value in persistence layer
No functional changes.
2023-04-03 16:59:22 +02:00
Sybren A. Stüvel
426b2aab4d Gracefully handle sleep schedules of deleted workers
Workers can be soft-deleted, which means that they stay in the database.
As such, foreign key constraints `ON DELETE CASCADE` do not trigger, and
thus their sleep schedule can still be active. This is now detected and
handled gracefully.
2023-02-09 11:18:38 +01:00
Sybren A. Stüvel
fe0899fd55 shaman-checkout-id-setter: Don't update job's "updated at" timestamp
The Shaman Checkout ID setter shouldn't update a job's "updated at"
timestamp. Its goal is to fake that the job was submitted with a new
enough Flamenco version, and thus should not touch the timestamps.
2023-02-07 16:24:23 +01:00
Sybren A. Stüvel
01a85d86cb Add "Shaman Checkout ID setter" command
This is a command that can be run to retroactively set the Shaman
Checkout ID of jobs, allowing the job deletion to also remove the job's
Shaman checkout directory.

This is highly experimental, and not built by default or shipped with
Flamenco releases. It's only been used once at Blender Animation Studio
to help cleaning up. Run at your own risk. Make backups first.
2023-02-07 15:07:41 +01:00
Sybren A. Stüvel
aa1c6b8ff3 Close the database when Flamenco shuts down
This prevents SQLite journal files from lingering around.
2023-02-07 15:05:49 +01:00
Sybren A. Stüvel
ef3cab9745 Webapp: handle job deletions properly
- Add a little confirmation overlay before deleting a job. This overlay
  also shows information about whether the Shaman checkout directory
  will be deleted or not.
- Send job updates to the web frontend when jobs are marked for
  deletion, and when they are actually deleted.
- Respond to those updates, and handle some corner cases where job info
  is missing (because it just got deleted).

This closes T99401.
2023-02-03 16:59:15 +01:00
Sybren A. Stüvel
bf0906eb95 Manager: avoid logging an error when requesting a non-existent job
This is expected to happen every once in a while, especially now that
Flamenco supports job deletion. It's not something to log at error level.
2023-02-03 16:37:55 +01:00
Sybren A. Stüvel
791d877ff1 Manager: implement API endpoint for deleting jobs
Implement the `deleteJob` API endpoint. Calling this endpoint will mark
the job as "deletion requested", after which it's queued for actual
deletion. This makes the API response fast, even when there is a lot of
work to do in the background.

A new background service "job deleter" keeps track of the queue of such
jobs, and performs the actual deletion. It removes:

- Shaman checkout for the job (but see below)
- Manager-local files of the job (task logs, last-rendered images)
- The job itself

The removal is done in the above order, so the job is only removed from the
database if the rest of the removal was succesful.

Shaman checkouts are only removed if the job was submitted with Flamenco
version 3.2. Earlier versions did not record enough information to reliably
do this.
2023-01-04 01:18:21 +01:00
Sybren A. Stüvel
f413a40f4e Store Shaman checkout ID when submitting a job
If Shaman is used to submit the job files, store the job's checkout ID
(i.e. the path relative to the checkout root) in the database. This will
make it possible in the future to remove the Shaman checkout along with
the job itself.
2023-01-04 01:18:21 +01:00
Sybren A. Stüvel
15e3745820 Manager: SQLite WAL journal + NORMAL sync mode
Run `PRAGMA journal_mode = WAL` and `PRAGMA synchronous = normal` when
connecting to the SQLite database. This enables the write-ahead-log journal
mode, which makes it safe to enable "normal" synchronisation (instead of
the default "full" synchronisation).
2022-11-24 17:18:06 +01:00