Replace old used-to-be-GORM datastructures (#104305) with sqlc-generated
structs. This also makes it possible to use more specific structs that
are more taylored to the specific queries, increasing efficiency.
This commit deals with worker tags (on both workers and jobs).
Functional changes are kept to a minimum, as the API still serves the
same data.
Because this work covers so much of Flamenco's code, it's been split up
into different commits. Each commit brings Flamenco to a state where it
compiles and unit tests pass. Only the result of the final commit has
actually been tested properly.
Ref: #104343
Replace old used-to-be-GORM datastructures (#104305) with sqlc-generated
structs. This also makes it possible to use more specific structs that
are more taylored to the specific queries, increasing efficiency.
This commit mostly deals with workers, including the sleep schedule and
task scheduler.
Functional changes are kept to a minimum, as the API still serves the
same data.
Because this work covers so much of Flamenco's code, it's been split up
into different commits. Each commit brings Flamenco to a state where it
compiles and unit tests pass. Only the result of the final commit has
actually been tested properly.
Ref: #104343
GORM implicitly sets 'created at', 'updated at' and 'deleted at' timestamps
to 'now' by calling a 'now function'. This is now implemented by Flamenco
directly, instead of relying on GORM.
Ref: #104305
Instead of returning an error when getting the sqlc queries object, just
panic. This'll make the calling code quite a bit simpler. The situation
in which it might error out is so rare that I've never seen it, and I
don't even know if it will ever be possible to happen with the SQLite
implementation we use now. Furthermore, once we get rid of GORM, it
should just always work anyway.
Ref: #104305
Turn `workers.lazy_status_request` and `workers.can_restart` into a
`boolean`. They were `smallint` before.
Having these explicitly modeled as `boolean` will make sqlc generate the
right type for them.
No functional changes.
This is a bit more work than other queries, as it also breaks apart the
fetching of the job and the worker into separate ones. In other words,
internally the persistence layer API changes.
As a safety measure, refuse to delete Workers from the Manager's database
when foreign key constraints are disabled.
In the long term, the underlying problem should be solved. This is a stop-
gap measure to ensure database consistency.
Add a new API operation to get the overall farm status. This is based on
the jobs and workers, and their status.
The statuses are:
- `active`: Actively working on jobs.
- `idle`: Farm could be active, but has no work to do.
- `waiting`: Work has been queued, but all workers are asleep.
- `asleep`: Farm is idle, and all workers are asleep.
- `inoperative`: Cannot work: no workers, or all are offline/error.
- `starting`: Farm is starting up.
- `unknown`: Unexpected configuration of worker and job statuses.
Prevent logging an error in the persistence layer when an unknown worker
is requested.
This reduces the noise & confusion when the web interface is showing the
details of a worker, but the worker gets removed by someone else. Or when
the Manager doesn't know about a Worker and it's trying to connect.
See #104282.
When the worker is started with `-restart-exit-code 47` or has
`restart_exit_code=47` in `flamenco-worker.yaml`, it's marked as
'restartable'. This will enable two worker actions 'Restart
(immediately)' and 'Restart (after task is finished)' in the Manager web
interface. When a worker is asked to restart, it will exit with exit
code `47`. Of course any positive exit code can be used here.
Change the package base name of the Go code, from
`git.blender.org/flamenco` to `projects.blender.org/studio/flamenco`.
The old location, `git.blender.org`, has no longer been use since the
[migration to Gitea][1]. The new package names now reflect the actual
location where Flamenco is hosted.
[1]: https://code.blender.org/2023/02/new-blender-development-infrastructure/
Mark the default value of `Worker.LazyStatusRequest` as `false`. The
previous default was configured as `0`, which was different enough to
always trigger a database migration of that column. However, since these
values do map to each other, the migration didn't do anything concrete,
and would be triggered again at the next startup.
As it was decided that the name "tags" would be better for the clarity
of the feature, all files and code named "cluster" or "worker cluster"
have been removed and replaced with "tag" and "worker tag". This is only
a name change, no other features were touched.
This addresses part of #104204.
Reviewed-on: https://projects.blender.org/studio/flamenco/pulls/104223
As a note to anyone who already ran a pre-release version of Flamenco
and configured some worker clusters, with the help of an SQLite client
you can migrate the clusters to tags. First build Flamenco Manager and
start it, to create the new database schema. Then run these SQL queries
via an sqlite commandline client:
```sql
insert into worker_tags
(id, created_at, updated_at, uuid, name, description)
select id, created_at, updated_at, uuid, name, description
from worker_clusters;
insert into worker_tag_membership (worker_tag_id, worker_id)
select worker_cluster_id, worker_id from worker_cluster_membership;
```
When the Manager was shutting down while the sleep scheduler was running, it
could cause a null pointer dereference. This is now doubly solved:
- `worker.Identifier()` is now nil-safe, as in, `worker` can be `nil` and
it will still return a sensible string.
- failure to apply the sleep schedule due to the context closing is not
logged as error any more.
Workers can now be soft-deleted. Tasks assigned to the worker will remain
associated with that Worker. Active tasks will be re-queued so other
workers can pick them up.
The Task details component already linked to the Worker it was assigned
to last, and now the Worker links back to the task.
There's only one task shown in the Worker details. If the Worker is
actively working on a task, that one's shown. Otherwise it's the
last-updated task that was assigned to the worker.
`persistence.Model` contains the common database fields for most model
structs. It is a copy of `gorm.Model`, but without the `DeletedAt`
field (which triggers Gorm's soft deletion).
Soft deletion is not used by Flamenco. If it ever becomes necessary to
support soft-deletion, see https://gorm.io/docs/delete.html#Soft-Delete
Update the 'last seen at' timestamp of workers when they:
- sign on
- sign off
- get a task assigned
- send a task update
- check whether they can keep running their task
Note that this commit is necessary to not have the workers time out
immediately ;-)
The add-on code was copy-pasted from other addons and used the GPL v2
license, whereas by accident the LICENSE text file had the GNU "Affero" GPL
license v3 (instead of regular GPL v3).
This is now all streamlined, and all code is licensed as "GPL v3 or later".
Furthermore, the code comments just show a SPDX License Identifier
instead of an entire license block.
Instead of returning an error "error doing X", just return "doing X". The
fact that it's returned as an error object says enough about that it's
an error.
This also makes it easier to chain error messages, without seeing the
word "error" in every part of the chain.
Where the PostgreSQL DB migration code could handle `NOT NULL` columns just
fine, SQLite has less table-altering functionality. As a result, migrations
have to copy entire database tables, which doesn't play well with
not-nullable columns.