Replace the use of the `t *testing.T` parameter with just plain `panic()`
when test setup fails. This makes it easier to call the same functions
from other situations, like benchmark functions.
No functional changes to Flamenco itself.
Instead of closing the sqlite database connection, tell GORM to close the
connection. Only that properly closes the DB, so that testing with a file
on disk doesn't fail when trying to delete that file.
No functional changes to the Manager itself.
Tweak the logging a little bit so it's less noisy, properly warns when the
Shaman checkout dir cannot be removed, and optimise the database query
a bit (by just fetching the one field that's needed, instead of the entire
job).
Deletion still works the same.
The task state machine expects that `task.Job` is set correctly. Since
SQLC does not automatically fill this field (and rightfully so), I've added
a bit of Go code that fetches the job in a separate query.
A TODO is added as a reminder that it would be better for the task state
machine itself to fetch the job when needed.
This is a bit more work than other queries, as it also breaks apart the
fetching of the job and the worker into separate ones. In other words,
internally the persistence layer API changes.
Add a few more unit tests for the persistence layer. The goal is to have
100% coverage of the happy flow, to aid in conversion from GORM to sqlc.
No functional changes.
As a safety measure, refuse to delete Workers from the Manager's database
when foreign key constraints are disabled.
In the long term, the underlying problem should be solved. This is a stop-
gap measure to ensure database consistency.
Before deleting a Worker Tag, check that foreign key constraints are
active for the current database connection.
Sometimes GORM decides to create a new database connection by itself,
without telling us, and then foreign key constraints are not active on
it. This commit is a workaround to avoid database corruption.
Move some of the Worker Tags test code into a function of its own, to have
a clearer separation between 'the test' and 'what needs to happen to do
this part of the test'.
Also it'll make an upcoming change easier to implement.
No functional changes.
Back in the days when I wrote the code, I didn't know about the
`require` package yet. Using `require.NoError()` makes the test code
more straight-forward.
No functional changes, except that when tests fail, they now fail
without panicking.
Back in the days when I wrote the code, I didn't know about the
`require` package yet. Using `require.NoError()` makes the test code
more straight-forward.
No functional changes, except that when tests fail, they now fail
without panicking.
Fix the database migration that adds `NOT NULL` clauses. It used
`INSERT INTO temp_x SELECT * from x;`, and the `*` returns the fields in
the order they are defined on the table. Since this might be different from
the order that the `INSERT INTO temp_x` expects, strange problems can
happen where columns get swapped (or constraints can fail on columns that
they should not fail for, because they got fed data from a different
column).
This makes it easier to later also create `query_workesr.sql`,
`query_meta.sql` etc. so that the sqlc-generated code can follow the
same subdivision as the persistence service code itself.
No functional changes.
GORM has certain downsides:
- Code-first approach, where queries have to be translated to the Go code
required to execute them.
- GORM comes with its own SQLite implementation, which doesn't provide an
on-connect callback. This means that new connections cannot correctly
enable foreign key constraints, causing database consistency issues.
[SQLC](https://sqlc.dev/) solves these issues for us.
This commit doesn't fully replace GORM with SQLC, but introduces it for
a few queries. Once all queries have been converted, GORM can be removed
completely.
There are still issues with foreign keys getting disabled, so enable them
in the periodic database consistency check.
A more permanent solution is likely to drop GORM and switch to something
else that gives us an on-connect-callback, which can then be used to
turn on foreign key constraints for every connection made.
Add a new API operation to get the overall farm status. This is based on
the jobs and workers, and their status.
The statuses are:
- `active`: Actively working on jobs.
- `idle`: Farm could be active, but has no work to do.
- `waiting`: Work has been queued, but all workers are asleep.
- `asleep`: Farm is idle, and all workers are asleep.
- `inoperative`: Cannot work: no workers, or all are offline/error.
- `starting`: Farm is starting up.
- `unknown`: Unexpected configuration of worker and job statuses.
Prevent logging an error in the persistence layer when an unknown worker
is requested.
This reduces the noise & confusion when the web interface is showing the
details of a worker, but the worker gets removed by someone else. Or when
the Manager doesn't know about a Worker and it's trying to connect.
See #104282.
Create a dedicated package `.../pkg/website` to contain constants for the
URLs of documentation, bug reporting, etc. That way it's easier to see
which parts of the website are being referred to from the Flamenco
binaries, and updates can happen in a central spot.
No functional changes.
Deleting jobs from the database can still sometimes cause consistency
errors, as if foreign key constraints aren't enabled. This check is there
to try and get a grip on things.
Just as a safety measure, before deleting a job, check that foreign key
constraints are enabled. These are optional in SQLite, and the deletion
function assumes that they are on.
Implement the API function to mass-mark jobs for deletion, based on
their 'updated_at' timestamp.
Note that the `last_updated_max` parameter is rounded up to entire
seconds. This may mark more jobs for deletion than you expect, if their
`updated_at` timestamps differ by less than a second.
GORM Automigration created a separate `job_storage_infos` table (because
we used it wrong, to be fair), which is actually only used as an
embedded struct in the `jobs` table. This means this table itself can be
dropped.
Replace GORM's auto-migration with Goose. The latter uses hand-written
SQL queries to apply database schema changes, which is safer and easier to
understand than what GORM is doing.
When the worker is started with `-restart-exit-code 47` or has
`restart_exit_code=47` in `flamenco-worker.yaml`, it's marked as
'restartable'. This will enable two worker actions 'Restart
(immediately)' and 'Restart (after task is finished)' in the Manager web
interface. When a worker is asked to restart, it will exit with exit
code `47`. Of course any positive exit code can be used here.
Change the package base name of the Go code, from
`git.blender.org/flamenco` to `projects.blender.org/studio/flamenco`.
The old location, `git.blender.org`, has no longer been use since the
[migration to Gitea][1]. The new package names now reflect the actual
location where Flamenco is hosted.
[1]: https://code.blender.org/2023/02/new-blender-development-infrastructure/
Instead of always performing the periodic integrity check, make it possible
to disable it or run it at different intervals.
Currently for the Blender Studio it's crunch time, so the check should
really only run when there is someone looking at the system (i.e. at
restarts for upgrade purposes).
Due to an issue (which has been fixed in the previous commit), all tasks
in the database were deleted when starting Flamenco. This tool attempts
to recompile the job and recreate its tasks.
The statuses of the tasks are set based on the job status. Basically:
- job active → tasks queued
- job completed → tasks completed
- job cancelled / failed → tasks cancelled
- otherwise → tasks queued
To ensure that the tool is only used to create tasks from scratch, it
refuses to work on a job that still has tasks in the database.
There is an issue with the GORM auto-migration, in that it doesn't
always disable foreign key constraints when it should. Due to
limitations of SQLite, not all 'alter table' commands you'd want to use
are available. As a workaround, these steps are performed:
1. create a new table with the desired schema,
2. copy the data over,
3. drop the old table,
4. rename the new table to the old name.
Step #3 will wreak havoc with the database when foreign key constraint
checks are active, so no we temporarily deactivate them while performing
database migration.
Mark the default value of `Worker.LazyStatusRequest` as `false`. The
previous default was configured as `0`, which was different enough to
always trigger a database migration of that column. However, since these
values do map to each other, the migration didn't do anything concrete,
and would be triggered again at the next startup.
As it was decided that the name "tags" would be better for the clarity
of the feature, all files and code named "cluster" or "worker cluster"
have been removed and replaced with "tag" and "worker tag". This is only
a name change, no other features were touched.
This addresses part of #104204.
Reviewed-on: https://projects.blender.org/studio/flamenco/pulls/104223
As a note to anyone who already ran a pre-release version of Flamenco
and configured some worker clusters, with the help of an SQLite client
you can migrate the clusters to tags. First build Flamenco Manager and
start it, to create the new database schema. Then run these SQL queries
via an sqlite commandline client:
```sql
insert into worker_tags
(id, created_at, updated_at, uuid, name, description)
select id, created_at, updated_at, uuid, name, description
from worker_clusters;
insert into worker_tag_membership (worker_tag_id, worker_id)
select worker_cluster_id, worker_id from worker_cluster_membership;
```