248 Commits

Author SHA1 Message Date
Sybren A. Stüvel
0c4240ec3a Manager: make some db fields boolean instead of smallint
Turn `workers.lazy_status_request` and `workers.can_restart` into a
`boolean`. They were `smallint` before.

Having these explicitly modeled as `boolean` will make sqlc generate the
right type for them.

No functional changes.
2024-05-28 18:15:21 +02:00
Sybren A. Stüvel
ee31316d9d Manager: more gracefully log context cancellation errors in database layer
The context passed to the database layer will auto-close when the HTTP
client disconnects. This will cancel any running query, which is the
expected behaviour. Now this no longer results in an error being logged
in the database layer. Instead, a message is logged at debug level.

The API layer is also adjusted to silence logging of `context.Canceled`
for certain operations, most notably getting all jobs and getting all
tasks of a job. These calls occur when the webapp reconnects after a
restart of the Manager. That may trigger a refresh of the page, which
immediately aborts any pending API calls. This is normal and should not
cause errors to be logged.
2024-05-28 17:27:27 +02:00
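The logging decision described in commit ee31316d9d can be sketched with only the standard library. This is a minimal illustration, not the Manager's actual code: `logLevelFor` and `simulateCancelledQuery` are hypothetical names, and the level is returned as a plain string instead of going through a real logger.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// logLevelFor is a hypothetical helper that picks the level at which a
// database error should be logged. Cancellation is expected when the HTTP
// client disconnects, so it is demoted to debug level.
func logLevelFor(err error) string {
	switch {
	case err == nil:
		return "none"
	case errors.Is(err, context.Canceled):
		return "debug"
	default:
		return "error"
	}
}

// simulateCancelledQuery mimics a query whose context was cancelled because
// the client went away. The error is wrapped, as a database layer might do.
func simulateCancelledQuery() error {
	ctx, cancel := context.WithCancel(context.Background())
	cancel() // Simulate the HTTP client disconnecting.
	return fmt.Errorf("fetching jobs: %w", ctx.Err())
}

func main() {
	fmt.Println(logLevelFor(simulateCancelledQuery()))
	fmt.Println(logLevelFor(errors.New("disk full")))
}
```

Because `errors.Is()` unwraps the error chain, the check keeps working when the database layer adds its own context to the error.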
Sybren A. Stüvel
ee4e41329a Manager: properly set task.JobUUID and task.WorkerUUID when using GORM
Add a GORM hook that sets `task.JobUUID` and `.WorkerUUID`. These were
only set by the sqlc code; this change ensures that they are now always
set, so that the caller doesn't have to worry about which function is
already ported to sqlc and which one is still GORM.
2024-05-28 16:34:09 +02:00
Sybren A. Stüvel
7fd8eca8d9 Manager: more gracefully handle SQLite "interrupted (9)" error
Wrap the SQLite error "interrupted (9)". That error is (as far as I
could figure out) caused by the context being closed. Unfortunately
there is no wrapping of the underlying context error, so it's not
possible to determine whether it was due to a 'deadline exceeded' error
or another cancellation cause (like upstream HTTP connection closing).

Primarily this makes a rather unreliable unit test properly reliable.
The code under test could return either `context.DeadlineExceeded` or
the "interrupted (9)" error (GORM + SQLite doesn't reliably choose one or
the other), and now this is cleanly tested for.
2024-05-28 16:07:23 +02:00
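The wrapping described in commit 7fd8eca8d9 could look roughly like the sketch below. Everything in it is an assumption made for illustration: `ErrContextCancelled`, `wrapInterrupted`, and `isCancellation` are hypothetical names, and matching on the error message is a stand-in for whatever check the real code performs on the driver's error.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"strings"
)

// ErrContextCancelled is a hypothetical sentinel error that wraps the SQLite
// "interrupted (9)" error.
var ErrContextCancelled = errors.New("database: query interrupted because the context closed")

// wrapInterrupted wraps the "interrupted (9)" error in ErrContextCancelled.
// Matching on the message is an assumption made for this sketch.
func wrapInterrupted(err error) error {
	if err != nil && strings.Contains(err.Error(), "interrupted (9)") {
		return fmt.Errorf("%w: %v", ErrContextCancelled, err)
	}
	return err
}

// isCancellation accepts either outcome, since which of the two errors gets
// returned is not predictable.
func isCancellation(err error) bool {
	return errors.Is(err, ErrContextCancelled) || errors.Is(err, context.DeadlineExceeded)
}

// errSimulatedInterrupt only mimics the message of the SQLite error; the real
// driver error has a different type.
var errSimulatedInterrupt = errors.New("interrupted (9)")

func main() {
	fmt.Println(isCancellation(wrapInterrupted(errSimulatedInterrupt)))
	fmt.Println(isCancellation(context.DeadlineExceeded))
	fmt.Println(isCancellation(errors.New("no such table")))
}
```

A test can then assert on `isCancellation()` instead of on one specific error value, which is what makes it independent of which error happens to come back.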
Sybren A. Stüvel
5ec479a983 Manager: remove testing.T parameter from some test setup functions
Replace the use of the `t *testing.T` parameter with just plain `panic()`
when test setup fails. This makes it easier to call the same functions
from other situations, like benchmark functions.

No functional changes to Flamenco itself.
2024-05-28 16:07:22 +02:00
Sybren A. Stüvel
88f90ef0a5 Manager: properly close database at end of test
Instead of closing the sqlite database connection, tell GORM to close the
connection. Only that properly closes the DB, so that testing with a file
on disk doesn't fail when trying to delete that file.

No functional changes to the Manager itself.
2024-05-28 16:07:22 +02:00
Sybren A. Stüvel
98cbe6a67d Manager: lightly polish job deletion
Tweak the logging a little bit so it's less noisy, properly warns when the
Shaman checkout dir cannot be removed, and optimise the database query
a bit (by just fetching the one field that's needed, instead of the entire
job).

Deletion still works the same.
2024-05-28 16:07:22 +02:00
Sybren A. Stüvel
79076be91b Manager: Convert task failure persistence to SQLC
No functional changes.
2024-05-28 16:07:22 +02:00
Sybren A. Stüvel
a99e68ec99 Manager: Convert TaskTouchedByWorker to sqlc
No functional changes.
2024-05-28 16:07:21 +02:00
Sybren A. Stüvel
7175bb469b Manager: Convert UpdateJobsTaskStatuses(Conditional) to sqlc
No functional changes.
2024-05-28 16:07:21 +02:00
Sybren A. Stüvel
4435633756 Manager: Convert FetchTasksOfJob() and FetchTasksOfJobInStatus() to sqlc
No functional changes.
2024-05-28 16:07:21 +02:00
Sybren A. Stüvel
4ab853da40 Manager: Convert JobHasTasksInStatus and CountTasksOfJobInStatus to sqlc
No functional changes.
2024-05-28 16:07:21 +02:00
Sybren A. Stüvel
b66490831c Manager: fetch jobs of tasks in FetchTasksOfWorkerInStatus()
The task state machine expects that `task.Job` is set correctly. Since
SQLC does not automatically fill this field (and rightfully so), I've added
a bit of Go code that fetches the job in a separate query.

A TODO is added as a reminder that it would be better for the task state
machine itself to fetch the job when needed.
2024-05-28 16:07:17 +02:00
Sybren A. Stüvel
1e327c510e Manager: Convert FetchTasksOfWorkerInStatusOfJob to sqlc
No functional changes.
2024-05-28 14:46:43 +02:00
Sybren A. Stüvel
950d661377 Manager: convert TaskAssignToWorker and FetchTasksOfWorkerInStatus to sqlc
No functional changes.
2024-05-28 14:46:43 +02:00
Sybren A. Stüvel
dcca9aef03 Manager: convert db.SaveTaskActivity() to SQLC
No functional changes.
2024-05-28 14:46:42 +02:00
Sybren A. Stüvel
a9be729e59 Manager: Convert db.SaveTaskStatus() to SQLC
No functional changes.
2024-05-28 14:46:42 +02:00
Sybren A. Stüvel
a54972ddd0 Manager: Convert db.SaveTask() to SQLC
No functional changes.
2024-05-28 14:46:42 +02:00
Sybren A. Stüvel
f632f2dbb6 SQLC: upgrade to 1.26.0
Doesn't change anything functional in the generated code, just the version
numbers & handling of empty comments in the query file.
2024-05-28 14:46:42 +02:00
Sybren A. Stüvel
c1cdff567e Manager: Convert FetchTask to sqlc
This is a bit more work than other queries, as it also breaks apart the
fetching of the job and the worker into separate ones. In other words,
internally the persistence layer API changes.
2024-05-28 14:46:42 +02:00
Sybren A. Stüvel
94923c628d Manager: increase wait time in worker timeout test
Instead of waiting for 1ns, wait for 1ms. That's more stable on my laptop,
and still short enough to not really slow down the test.
2024-05-28 08:53:15 +02:00
Sybren A. Stüvel
b97d27c955 Manager: add unit test for db.SaveTaskActivity()
No functional changes.
2024-05-18 12:38:03 +02:00
Sybren A. Stüvel
ae0774b440 Manager: add unit tests
Add a few more unit tests for the persistence layer. The goal is to have
100% coverage of the happy flow, to aid in conversion from GORM to sqlc.

No functional changes.
2024-05-11 10:35:17 +02:00
Sybren A. Stüvel
3974770f36 Manager: refuse to delete workers without foreign key constraints
As a safety measure, refuse to delete Workers from the Manager's database
when foreign key constraints are disabled.

In the long term, the underlying problem should be solved. This is a stop-
gap measure to ensure database consistency.
2024-04-12 10:48:40 +02:00
Sybren A. Stüvel
6c28db780f Manager: refuse to delete worker tags without foreign key constraints
Before deleting a Worker Tag, check that foreign key constraints are
active for the current database connection.

Sometimes GORM decides to create a new database connection by itself,
without telling us, and then foreign key constraints are not active on
it. This commit is a workaround to avoid database corruption.
2024-04-12 10:48:40 +02:00
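The guard from commits 6c28db780f and 3974770f36 can be sketched as follows. SQLite reports the per-connection state via `PRAGMA foreign_keys`, which returns 1 when constraints are enforced and 0 when they are not. To keep the example free of database drivers, the pragma query is represented by a function value; `guardedDelete`, `foreignKeyReader`, `tryDelete`, and `ErrForeignKeysDisabled` are hypothetical names, not Flamenco's API.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// ErrForeignKeysDisabled is a hypothetical sentinel error.
var ErrForeignKeysDisabled = errors.New("foreign key constraints are disabled, refusing to delete")

// foreignKeyReader stands in for running `PRAGMA foreign_keys` on the current
// connection; SQLite answers with 1 (enabled) or 0 (disabled).
type foreignKeyReader func(ctx context.Context) (int, error)

// guardedDelete only performs the deletion when foreign key constraints are
// active on the connection that is about to be used.
func guardedDelete(ctx context.Context, readFK foreignKeyReader, doDelete func(context.Context) error) error {
	value, err := readFK(ctx)
	if err != nil {
		return fmt.Errorf("checking foreign key constraints: %w", err)
	}
	if value != 1 {
		return ErrForeignKeysDisabled
	}
	return doDelete(ctx)
}

// tryDelete is a demo harness with a fake connection that reports pragmaValue.
func tryDelete(pragmaValue int) (deleted bool, err error) {
	readFK := func(context.Context) (int, error) { return pragmaValue, nil }
	doDelete := func(context.Context) error { deleted = true; return nil }
	err = guardedDelete(context.Background(), readFK, doDelete)
	return deleted, err
}

func main() {
	fmt.Println(tryDelete(1))
	fmt.Println(tryDelete(0))
}
```

The check has to run on the same connection as the deletion, because the setting is per connection. That is exactly why a silently created new connection is a problem.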
Sybren A. Stüvel
b313a2020d Refactor: Manager, move some test code into a function of its own
Move some of the Worker Tags test code into a function of its own, to have
a clearer separation between 'the test' and 'what needs to happen to do
this part of the test'.

Also it'll make an upcoming change easier to implement.

No functional changes.
2024-04-12 10:48:40 +02:00
Sybren A. Stüvel
b219f9b1c2 Manager tests: replace assert.NoError() with require.NoError()
Back in the days when I wrote the code, I didn't know about the
`require` package yet. Using `require.NoError()` makes the test code
more straight-forward.

No functional changes, except that when tests fail, they now fail
without panicking.
2024-03-16 12:14:39 +01:00
Sybren A. Stüvel
3f4a9025fe Manager tests: replace assert.NoError() with require.NoError()
Back in the days when I wrote the code, I didn't know about the
`require` package yet. Using `require.NoError()` makes the test code
more straight-forward.

No functional changes, except that when tests fail, they now fail
without panicking.
2024-03-16 11:09:18 +01:00
Sybren A. Stüvel
358efe7ae0 Manager: perform a database vacuum after migrations
Just to make sure the DB is properly cleaned up after a big migration
happened.
2024-03-06 11:59:17 +01:00
Sybren A. Stüvel
27cbb2ed0f Manager: increase timeout for database integrity check
With a fuller database, 2 seconds is apparently not always long enough,
so increase the timeout to 10 seconds.
2024-03-04 14:04:59 +01:00
Sybren A. Stüvel
a4e5eef83e Manager: fix database migration 0004
Fix the database migration that adds `NOT NULL` clauses. It used
`INSERT INTO temp_x SELECT * from x;`, and the `*` returns the fields in
the order they are defined on the table. Since this might be different from
the order that the `INSERT INTO temp_x` expects, strange problems can
happen where columns get swapped (or constraints can fail on columns that
they should not fail for, because they got fed data from a different
column).
2024-03-04 13:06:09 +01:00
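One way to avoid the column-order problem from commit a4e5eef83e is to name every column on both sides of the copy. The sketch below builds such a statement; it does not show the actual migration. `copyRowsSQL` is a hypothetical helper, the column names are made up, and identifiers are not quoted or escaped, so it is only suitable for trusted, hard-coded names.

```go
package main

import (
	"fmt"
	"strings"
)

// copyRowsSQL builds an INSERT ... SELECT statement that lists every column
// explicitly. The same list is used for the target and the source, so the
// column order defined on either table no longer matters.
func copyRowsSQL(dst, src string, columns []string) string {
	cols := strings.Join(columns, ", ")
	return fmt.Sprintf("INSERT INTO %s (%s) SELECT %s FROM %s;", dst, cols, cols, src)
}

func main() {
	// The column names here are illustrative only.
	fmt.Println(copyRowsSQL("temp_x", "x", []string{"id", "created_at", "name"}))
	// prints: INSERT INTO temp_x (id, created_at, name) SELECT id, created_at, name FROM x;
}
```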
Sybren A. Stüvel
7b72d0ca43 Refactor: move jobs-related queries to queries_jobs.sql
This makes it easier to later also create `query_workers.sql`,
`query_meta.sql` etc. so that the sqlc-generated code can follow the
same subdivision as the persistence service code itself.

No functional changes.
2024-03-03 23:27:55 +01:00
Sybren A. Stüvel
b102b73a1f Refactor: convert more job functions to sqlc
No functional changes.
2024-03-03 23:23:51 +01:00
Sybren A. Stüvel
1ac796d0d8 Refactor: Manager: remove unused query from queries.sql
No functional changes.
2024-03-03 22:42:37 +01:00
Sybren A. Stüvel
3fbb3cde34 Manager: SQLC rename Uuid to UUID
No functional changes.
2024-03-03 20:54:43 +01:00
Sybren A. Stüvel
c046094880 Manager: start replacing GORM with SQLC
GORM has certain downsides:

- Code-first approach, where queries have to be translated to the Go code
  required to execute them.
- GORM comes with its own SQLite implementation, which doesn't provide an
  on-connect callback. This means that new connections cannot correctly
  enable foreign key constraints, causing database consistency issues.

[SQLC](https://sqlc.dev/) solves these issues for us.

This commit doesn't fully replace GORM with SQLC, but introduces it for
a few queries. Once all queries have been converted, GORM can be removed
completely.
2024-03-03 20:15:39 +01:00
Sybren A. Stüvel
7eb5eb68a3 Manager: ensure foreign keys are enabled in periodic integrity check
There are still issues with foreign keys getting disabled, so enable them
in the periodic database consistency check.

A more permanent solution is likely to drop GORM and switch to something
else that gives us an on-connect-callback, which can then be used to
turn on foreign key constraints for every connection made.
2024-03-01 23:42:04 +01:00
Sybren A. Stüvel
61cc8ff04d Manager: implement API operation to get the farm status
Add a new API operation to get the overall farm status. This is based on
the jobs and workers, and their status.

The statuses are:

- `active`: Actively working on jobs.
- `idle`: Farm could be active, but has no work to do.
- `waiting`: Work has been queued, but all workers are asleep.
- `asleep`: Farm is idle, and all workers are asleep.
- `inoperative`: Cannot work: no workers, or all are offline/error.
- `starting`: Farm is starting up.
- `unknown`: Unexpected configuration of worker and job statuses.
2024-02-29 20:42:28 +01:00
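To make the status list of commit 61cc8ff04d concrete, here is a deliberately simplified sketch of how five of the statuses could be derived. It is an assumption-laden illustration, not the Manager's algorithm: `farmSummary` and `summarise` are hypothetical, the input is reduced to three values, and the `starting` and `unknown` statuses are left out.

```go
package main

import "fmt"

// FarmStatus is the overall status of the farm, using the names listed above.
type FarmStatus string

const (
	FarmActive      FarmStatus = "active"
	FarmIdle        FarmStatus = "idle"
	FarmWaiting     FarmStatus = "waiting"
	FarmAsleep      FarmStatus = "asleep"
	FarmInoperative FarmStatus = "inoperative"
)

// farmSummary is a hypothetical, heavily reduced view of the farm.
// Offline and error-state workers are not counted at all.
type farmSummary struct {
	HasWork       bool // At least one job is active or queued.
	AwakeWorkers  int  // Workers that are able to pick up tasks right now.
	AsleepWorkers int  // Workers that are asleep.
}

// summarise maps the reduced view onto a farm status.
func summarise(s farmSummary) FarmStatus {
	switch {
	case s.AwakeWorkers == 0 && s.AsleepWorkers == 0:
		return FarmInoperative // No usable workers.
	case s.AwakeWorkers == 0 && s.HasWork:
		return FarmWaiting // Work is queued, but everybody sleeps.
	case s.AwakeWorkers == 0:
		return FarmAsleep // Nothing to do, and everybody sleeps.
	case s.HasWork:
		return FarmActive
	default:
		return FarmIdle // Workers are awake, but there is no work.
	}
}

func main() {
	fmt.Println(summarise(farmSummary{HasWork: true, AwakeWorkers: 2}))
	fmt.Println(summarise(farmSummary{HasWork: true, AsleepWorkers: 3}))
	fmt.Println(summarise(farmSummary{}))
}
```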
Sybren A. Stüvel
9afd79d8c0 Manager: prevent logging an error when fetching unknown worker
Prevent logging an error in the persistence layer when an unknown worker
is requested.

This reduces the noise & confusion when the web interface is showing the
details of a worker, but the worker gets removed by someone else. Or when
the Manager doesn't know about a Worker and it's trying to connect.

See #104282.
2024-01-25 12:38:13 +01:00
Sybren A. Stüvel
70faa4e225 Move URLs to the Flamenco website to constants in a dedicated package
Create a dedicated package `.../pkg/website` to contain constants for the
URLs of documentation, bug reporting, etc. That way it's easier to see
which parts of the website are being referred to from the Flamenco
binaries, and updates can happen in a central spot.

No functional changes.
2024-01-25 12:25:06 +01:00
Sybren A. Stüvel
b39f116b0e Manager: after deleting a job, perform a database consistency check
Deleting jobs from the database can still sometimes cause consistency
errors, as if foreign key constraints aren't enabled. This check is there
to try and get a grip on things.
2024-01-11 20:03:53 +01:00
Sybren A. Stüvel
6777e89589 Manager: refuse to delete job when foreign keys are disabled
Just as a safety measure, before deleting a job, check that foreign key
constraints are enabled. These are optional in SQLite, and the deletion
function assumes that they are on.
2024-01-11 17:17:56 +01:00
Sybren A. Stüvel
246916475f Manager: Implement mass mark-for-deletion of jobs
Implement the API function to mass-mark jobs for deletion, based on
their 'updated_at' timestamp.

Note that the `last_updated_max` parameter is rounded up to whole
seconds. This may mark more jobs for deletion than you expect, if their
`updated_at` timestamps differ by less than a second.
2023-12-16 23:05:52 +01:00
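The rounding caveat in commit 246916475f can be illustrated with a small sketch. `ceilToSecond`, `wouldBeMarked`, and `demoTime` are hypothetical helpers, and the inclusive comparison in `wouldBeMarked` is an assumption; the commit message does not state how the comparison is done.

```go
package main

import (
	"fmt"
	"time"
)

// ceilToSecond rounds t up to the next whole second. Timestamps that are
// already on a whole second are returned unchanged.
func ceilToSecond(t time.Time) time.Time {
	truncated := t.Truncate(time.Second)
	if truncated.Equal(t) {
		return truncated
	}
	return truncated.Add(time.Second)
}

// wouldBeMarked reports whether a job with the given updated_at timestamp
// falls at or before the rounded-up limit (inclusive comparison assumed).
func wouldBeMarked(updatedAt, lastUpdatedMax time.Time) bool {
	return !updatedAt.After(ceilToSecond(lastUpdatedMax))
}

// demoTime returns a fixed timestamp with the given nanosecond part.
func demoTime(nanos int) time.Time {
	return time.Date(2023, 12, 16, 23, 5, 52, nanos, time.UTC)
}

func main() {
	limit := demoTime(100_000_000)   // 23:05:52.1
	laterJob := demoTime(900_000_000) // 23:05:52.9, after the limit

	fmt.Println(ceilToSecond(limit).Format(time.RFC3339Nano))
	fmt.Println(wouldBeMarked(laterJob, limit))
}
```

In this example the job was updated 0.8 seconds after the requested limit, yet it is still selected because the limit is rounded up to 23:05:53.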
Sybren A. Stüvel
acc9499f2a Manager: drop the job_storage_infos database table
GORM Automigration created a separate `job_storage_infos` table (because
we used it wrong, to be fair), which is actually only used as an
embedded struct in the `jobs` table. This means this table itself can be
dropped.
2023-12-14 10:13:42 +01:00
Sybren A. Stüvel
a65f234bea Manager: replace GORM database migration with Goose
Replace GORM's auto-migration with Goose. The latter uses hand-written
SQL queries to apply database schema changes, which is safer and easier to
understand than what GORM is doing.
2023-12-14 10:13:40 +01:00
Sybren A. Stüvel
3e72391cbf Restartable workers
When the worker is started with `-restart-exit-code 47` or has
`restart_exit_code=47` in `flamenco-worker.yaml`, it's marked as
'restartable'. This will enable two worker actions 'Restart
(immediately)' and 'Restart (after task is finished)' in the Manager web
interface. When a worker is asked to restart, it will exit with exit
code `47`. Of course any positive exit code can be used here.
2023-08-14 16:00:09 +02:00
Sybren A. Stüvel
02fac6a4df Change Go package name from git.blender.org to projects.blender.org
Change the package base name of the Go code, from
`git.blender.org/flamenco` to `projects.blender.org/studio/flamenco`.

The old location, `git.blender.org`, has not been in use since the
[migration to Gitea][1]. The new package names now reflect the actual
location where Flamenco is hosted.

[1]: https://code.blender.org/2023/02/new-blender-development-infrastructure/
2023-08-01 12:42:31 +02:00
Sybren A. Stüvel
63634361ce Manager: make periodic database integrity check configurable
Instead of always performing the periodic integrity check, make it possible
to disable it or run it at different intervals.

Currently for the Blender Studio it's crunch time, so the check should
really only run when there is someone looking at the system (i.e. at
restarts for upgrade purposes).
2023-07-18 16:33:01 +02:00
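A configurable periodic check, as described in commit 63634361ce, could be structured like the sketch below. It is an illustration under assumptions: `runPeriodicCheck` and `countChecks` are hypothetical names, and treating a period of zero as "disabled" is a guess at how such a setting might behave, not a statement about Flamenco's configuration.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// runPeriodicCheck calls check() once per period until ctx is done.
// A period of zero or less disables the check (an assumption of this sketch).
func runPeriodicCheck(ctx context.Context, period time.Duration, check func()) {
	if period <= 0 {
		return
	}
	ticker := time.NewTicker(period)
	defer ticker.Stop()

	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			check()
		}
	}
}

// countChecks is a demo harness: it runs the loop for runForMillis
// milliseconds and returns how often the check was performed.
func countChecks(periodMillis, runForMillis int) int {
	runFor := time.Duration(runForMillis) * time.Millisecond
	ctx, cancel := context.WithTimeout(context.Background(), runFor)
	defer cancel()

	count := 0
	period := time.Duration(periodMillis) * time.Millisecond
	runPeriodicCheck(ctx, period, func() { count++ })
	return count
}

func main() {
	fmt.Println("checks with period 0:", countChecks(0, 20))
	fmt.Println("period 1ms ran at least once:", countChecks(1, 100) > 0)
}
```

In a long-running service the loop would typically run in its own goroutine, with the context cancelled at shutdown.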
Sybren A. Stüvel
1a79c19583 Manager: improve logging of database consistency checks
The log messages now all start with `database: `.

No functional changes.
2023-07-18 16:12:26 +02:00
Sybren A. Stüvel
4121c899c3 Manager: perform database integrity check every hour
Perform a database integrity check every hour. This check was already
performed at startup, in the main goroutine.
2023-07-18 16:10:17 +02:00