762 Commits

Author SHA1 Message Date
Sybren A. Stüvel
572089f13b Manager: speed up sequential job deletion by checking db when queue empty
Job deletions are placed in an in-memory queue in batches of 100 jobs.
Between batches the Manager's job deleter would idle for 1 minute. Now,
once the in-memory queue has been emptied, the job deleter will wait
only 100ms before checking the database again.

This 100ms might not be necessary either, but I think it's nice to give
the Manager a bit of a breather before diving into another batch of
deletions.
2024-05-28 16:07:22 +02:00
Sybren A. Stüvel
286d0efa2d Manager: speed up job deletion by skipping the DB integrity check
Speed up the deletion of multiple jobs by skipping the database integrity
check. It is now clear what was causing the integrity issues (disabled
foreign key constraints), and this is now checked for before deleting
anything. This reduces the deletion time from ~500ms per job to ~150ms
(on my computer, with my database, of course).
2024-05-28 16:07:22 +02:00
Sybren A. Stüvel
390cb9445c Manager: log duration of job deletion
When a job has been deleted, log how long it took to delete.
2024-05-28 16:07:22 +02:00
Sybren A. Stüvel
5ec479a983 Manager: remove testing.T parameter from some test setup functions
Replace the use of the `t *testing.T` parameter with just plain `panic()`
when test setup fails. This makes it easier to call the same functions
from other situations, like benchmark functions.

No functional changes to Flamenco itself.
2024-05-28 16:07:22 +02:00
Sybren A. Stüvel
88f90ef0a5 Manager: properly close database at end of test
Instead of closing the SQLite database connection, tell GORM to close the
connection. Only that properly closes the DB, so that testing with a file
on disk doesn't fail when trying to delete that file.

No functional changes to the Manager itself.
2024-05-28 16:07:22 +02:00
Sybren A. Stüvel
98cbe6a67d Manager: lightly polish job deletion
Tweak the logging a little bit so it's less noisy, properly warns when the
Shaman checkout dir cannot be removed, and optimise the database query
a bit (by just fetching the one field that's needed, instead of the entire
job).

Deletion still works the same.
2024-05-28 16:07:22 +02:00
Sybren A. Stüvel
79076be91b Manager: Convert task failure persistence to SQLC
No functional changes.
2024-05-28 16:07:22 +02:00
Sybren A. Stüvel
a99e68ec99 Manager: Convert TaskTouchedByWorker to sqlc
No functional changes.
2024-05-28 16:07:21 +02:00
Sybren A. Stüvel
7175bb469b Manager: Convert UpdateJobsTaskStatuses(Conditional) to sqlc
No functional changes.
2024-05-28 16:07:21 +02:00
Sybren A. Stüvel
4435633756 Manager: Convert FetchTasksOfJob() and FetchTasksOfJobInStatus() to sqlc
No functional changes.
2024-05-28 16:07:21 +02:00
Sybren A. Stüvel
4ab853da40 Manager: Convert JobHasTasksInStatus and CountTasksOfJobInStatus to sqlc
No functional changes.
2024-05-28 16:07:21 +02:00
Sybren A. Stüvel
b66490831c Manager: fetch jobs of tasks in FetchTasksOfWorkerInStatus()
The task state machine expects that `task.Job` is set correctly. Since
SQLC does not automatically fill this field (and rightfully so), I've added
a bit of Go code that fetches the job in a separate query.

A TODO is added as a reminder that it would be better for the task state
machine itself to fetch the job when needed.
2024-05-28 16:07:17 +02:00
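The approach in b66490831c, fetching each task's job with a separate query, can be sketched as follows. `Job`, `Task`, and `attachJobs` are simplified, hypothetical types and names. The per-call cache, so that tasks of the same job trigger only one fetch, is an addition of this sketch and is not mentioned in the commit message.

```go
package main

// Job and Task are simplified, hypothetical row types.
type Job struct {
	ID   int64
	Name string
}

type Task struct {
	ID    int64
	JobID int64
	Job   *Job // not filled in by the generated query code
}

// attachJobs fills in task.Job for every task, calling fetchJob (one query per
// call) at most once per distinct job ID.
func attachJobs(tasks []*Task, fetchJob func(jobID int64) (*Job, error)) error {
	fetched := make(map[int64]*Job)
	for _, task := range tasks {
		job, ok := fetched[task.JobID]
		if !ok {
			var err error
			if job, err = fetchJob(task.JobID); err != nil {
				return err
			}
			fetched[task.JobID] = job
		}
		task.Job = job
	}
	return nil
}
```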
Sybren A. Stüvel
1e327c510e Manager: Convert FetchTasksOfWorkerInStatusOfJob to sqlc
No functional changes.
2024-05-28 14:46:43 +02:00
Sybren A. Stüvel
950d661377 Manager: convert TaskAssignToWorker and FetchTasksOfWorkerInStatus to sqlc
No functional changes.
2024-05-28 14:46:43 +02:00
Sybren A. Stüvel
dcca9aef03 Manager: convert db.SaveTaskActivity() to SQLC
No functional changes.
2024-05-28 14:46:42 +02:00
Sybren A. Stüvel
a9be729e59 Manager: Convert db.SaveTaskStatus() to SQLC
No functional changes.
2024-05-28 14:46:42 +02:00
Sybren A. Stüvel
a54972ddd0 Manager: Convert db.SaveTask() to SQLC
No functional changes.
2024-05-28 14:46:42 +02:00
Sybren A. Stüvel
f632f2dbb6 SQLC: upgrade to 1.26.0
Doesn't change anything functional in the generated code, just the version
numbers & handling of empty comments in the query file.
2024-05-28 14:46:42 +02:00
Sybren A. Stüvel
c1cdff567e Manager: Convert FetchTask to sqlc
This is a bit more work than other queries, as it also breaks apart the
fetching of the job and the worker into separate ones. In other words,
internally the persistence layer API changes.
2024-05-28 14:46:42 +02:00
Sybren A. Stüvel
dc893dcad4 Manager: regenerate Go mock after removal of SaveTask
Regenerate the Go mock implementation after the removal of the SaveTask
function from the mocked interface.

See 097d5abb7c13e6eff1facea12f89f24c144194c0
2024-05-28 14:45:34 +02:00
Sybren A. Stüvel
94923c628d Manager: increase wait time in worker timeout test
Instead of waiting for 1ns, wait for 1ms. That's more stable on my laptop,
and still short enough to not really slow down the test.
2024-05-28 08:53:15 +02:00
Sybren A. Stüvel
097d5abb7c Manager: Remove SaveTask function from interface
Remove `SaveTask(...)` from the persistence layer interface as defined
by the `api_impl` package. It's not used.
2024-05-28 08:53:15 +02:00
Sybren A. Stüvel
b97d27c955 Manager: add unit test for db.SaveTaskActivity()
No functional changes.
2024-05-18 12:38:03 +02:00
Sybren A. Stüvel
ae0774b440 Manager: add unit tests
Add a few more unit tests for the persistence layer. The goal is to have
100% coverage of the happy flow, to aid in conversion from GORM to sqlc.

No functional changes.
2024-05-11 10:35:17 +02:00
Sybren A. Stüvel
df334deca5 Add shellSplit(someString) function to the job compiler scripts
Add a function `shellSplit(string)` to the global namespace of job
compiler scripts. It splits a string into an array of strings using
shell/CLI semantics.

For example: `shellSplit("--python-expr 'print(1 + 1)'")` will return
`["--python-expr", "print(1 + 1)"]`.
2024-05-07 12:39:13 +02:00
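Flamenco's job compiler scripts are JavaScript, and the commit does not say how `shellSplit()` is implemented. To illustrate the kind of splitting involved, here is a simplified Go sketch of POSIX-style word splitting. Whitespace separates words, single quotes keep everything literally, and a backslash escapes the next character (inside double quotes, only before `"`, `\`, `$`, or a backtick). Variable expansion, globbing, and line continuations are deliberately not handled.

```go
package main

import (
	"errors"
	"strings"
	"unicode"
)

// shellSplit splits s into words using a simplified subset of POSIX shell
// quoting rules. Hypothetical sketch; not Flamenco's implementation.
func shellSplit(s string) ([]string, error) {
	var (
		words   []string
		current strings.Builder
		inWord  bool // true once a word has started, even if it is empty ('')
		quote   rune // 0 outside quotes, otherwise '\'' or '"'
		escaped bool // the previous character was an unconsumed backslash
	)
	for _, r := range s {
		switch {
		case escaped:
			// Inside double quotes a backslash only escapes a few characters;
			// before anything else it is kept literally.
			if quote == '"' && !strings.ContainsRune("\"\\$`", r) {
				current.WriteRune('\\')
			}
			current.WriteRune(r)
			escaped = false
		case quote == '\'':
			if r == '\'' {
				quote = 0
			} else {
				current.WriteRune(r)
			}
		case quote == '"':
			switch r {
			case '"':
				quote = 0
			case '\\':
				escaped = true
			default:
				current.WriteRune(r)
			}
		case r == '\\':
			escaped, inWord = true, true
		case r == '\'' || r == '"':
			quote, inWord = r, true
		case unicode.IsSpace(r):
			if inWord {
				words = append(words, current.String())
				current.Reset()
				inWord = false
			}
		default:
			current.WriteRune(r)
			inWord = true
		}
	}
	if quote != 0 || escaped {
		return nil, errors.New("shellSplit: unterminated quote or trailing backslash")
	}
	if inWord {
		words = append(words, current.String())
	}
	return words, nil
}
```

With the commit's own example, `shellSplit("--python-expr 'print(1 + 1)'")` returns the two words `--python-expr` and `print(1 + 1)`.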
Sybren A. Stüvel
94fba20ef6 Worker: reduce log level of some internal components
Reduce the log level from 'info' to 'debug' on some internal components
of Flamenco Worker. This makes the console output slightly less noisy,
and it's unlikely that these particular messages are commonly needed.
2024-04-16 10:53:29 +02:00
Sybren A. Stüvel
e2bca9ad61 Worker: add configuration for Linux out-of-memory killer
Add a Worker configuration option to configure the Linux out-of-memory
behaviour. Add `oom_score_adjust=500` to `flamenco-worker.yaml` to increase
the chance that Blender gets killed when the machine runs out of memory,
instead of Flamenco Worker itself.
2024-04-15 17:21:11 +02:00
Sybren A. Stüvel
3974770f36 Manager: refuse to delete workers without foreign key constraints
As a safety measure, refuse to delete Workers from the Manager's database
when foreign key constraints are disabled.

In the long term, the underlying problem should be solved. This is a
stop-gap measure to ensure database consistency.
2024-04-12 10:48:40 +02:00
Sybren A. Stüvel
6c28db780f Manager: refuse to delete worker tags without foreign key constraints
Before deleting a Worker Tag, check that foreign key constraints are
active for the current database connection.

Sometimes GORM decides to create a new database connection by itself,
without telling us, and then foreign key constraints are not active on
it. This commit is a workaround to avoid database corruption.
2024-04-12 10:48:40 +02:00
Sybren A. Stüvel
b313a2020d Refactor: Manager, move some test code into a function of its own
Move some of the Worker Tags test code into a function of its own, to have
a clearer separation between 'the test' and 'what needs to happen to do
this part of the test'.

Also it'll make an upcoming change easier to implement.

No functional changes.
2024-04-12 10:48:40 +02:00
Taylor Wiebe
a0cb8735c9 Manager: add optional description to job types
This description will be shown as a tooltip in the job submission UI.
2024-04-04 11:12:42 +02:00
Sybren A. Stüvel
b219f9b1c2 Manager tests: replace assert.NoError() with require.NoError()
Back in the days when I wrote the code, I didn't know about the
`require` package yet. Using `require.NoError()` makes the test code
more straightforward.

No functional changes, except that when tests fail, they now fail
without panicking.
2024-03-16 12:14:39 +01:00
Sybren A. Stüvel
3f4a9025fe Manager tests: replace assert.NoError() with require.NoError()
Back in the days when I wrote the code, I didn't know about the
`require` package yet. Using `require.NoError()` makes the test code
more straightforward.

No functional changes, except that when tests fail, they now fail
without panicking.
2024-03-16 11:09:18 +01:00
Sybren A. Stüvel
d1fbe8b9f9 Manager: set default MQTT topic prefix to 'flamenco'
Set the default MQTT topic prefix to 'flamenco'. It can still be overridden
by the config in the YAML file, but it's nice to have a sensible default
when people don't configure this.
2024-03-08 16:44:39 +01:00
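A common way to get the behaviour described in d1fbe8b9f9, a default that the configuration file can still override, is to start from a struct that holds the defaults and decode the file on top of it. The sketch below uses `encoding/json` only to stay within the standard library; the Manager's configuration is YAML. `mqttConfig` and its field names are hypothetical.

```go
package main

import "encoding/json"

// mqttConfig is a hypothetical, reduced configuration struct.
type mqttConfig struct {
	ClientURL   string `json:"client_url"`
	TopicPrefix string `json:"topic_prefix"`
}

// defaultMQTTConfig holds the values used when the file does not set them.
var defaultMQTTConfig = mqttConfig{TopicPrefix: "flamenco"}

// loadMQTTConfig copies the defaults and lets the file override them.
// encoding/json leaves struct fields untouched when the input does not
// mention them, so absent keys keep their default values.
func loadMQTTConfig(data []byte) (mqttConfig, error) {
	cfg := defaultMQTTConfig
	if err := json.Unmarshal(data, &cfg); err != nil {
		return mqttConfig{}, err
	}
	return cfg, nil
}
```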
Sybren A. Stüvel
b476e31c0c Manager: remove unused configuration defaults
Remove commented-out sections in the configuration defaults. They're a
leftover from Flamenco v2.
2024-03-08 16:41:40 +01:00
Sybren A. Stüvel
cbafacdff6 Manager: don't forward task log updates to MQTT
Task log updates are big and frequent, and should not be sent via MQTT.
At least not until we have a practical reason to do so.
2024-03-07 15:22:44 +01:00
Sybren A. Stüvel
358efe7ae0 Manager: perform a database vacuum after migrations
Just to make sure the DB is properly cleaned up after a big migration
happened.
2024-03-06 11:59:17 +01:00
Sybren A. Stüvel
16114ee529 Worker: fix Go scheduling issue in sleep command test
Add a 1ms delay in the test loop, so that other goroutines can be scheduled
as well. This should fix #104288.
2024-03-04 14:18:08 +01:00
Sybren A. Stüvel
27cbb2ed0f Manager: increase timeout for database integrity check
With a fuller database, 2 seconds is apparently not always long enough,
so increase the timeout to 10 seconds.
2024-03-04 14:04:59 +01:00
Sybren A. Stüvel
a4e5eef83e Manager: fix database migration 0004
Fix the database migration that adds `NOT NULL` clauses. It used
`INSERT INTO temp_x SELECT * from x;`, and the `*` returns the fields in
the order they are defined on the table. Since this might be different from
the order that the `INSERT INTO temp_x` expects, strange problems can
happen where columns get swapped (or constraints can fail on columns that
they should not fail for, because they got fed data from a different
column).
2024-03-04 13:06:09 +01:00
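The fix in a4e5eef83e boils down to naming the columns on both sides of the copy, so that values are matched by name rather than by position. The helper below only illustrates that principle. The actual fix is in the migration's SQL, and `copyTableSQL` is a hypothetical name. Identifiers are inserted without quoting, so the helper is only meant for trusted, hard-coded table and column names.

```go
package main

import (
	"fmt"
	"strings"
)

// copyTableSQL returns an INSERT ... SELECT statement that lists the columns
// explicitly on both sides, so values are matched by column name instead of by
// the order in which the columns were defined on each table.
// Identifiers are not quoted: use with trusted, hard-coded names only.
func copyTableSQL(fromTable, toTable string, columns []string) string {
	cols := strings.Join(columns, ", ")
	return fmt.Sprintf("INSERT INTO %s (%s) SELECT %s FROM %s;", toTable, cols, cols, fromTable)
}
```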
Sybren A. Stüvel
7b72d0ca43 Refactor: move jobs-related queries to queries_jobs.sql
This makes it easier to later also create `query_workers.sql`,
`query_meta.sql` etc. so that the sqlc-generated code can follow the
same subdivision as the persistence service code itself.

No functional changes.
2024-03-03 23:27:55 +01:00
Sybren A. Stüvel
b102b73a1f Refactor: convert more job functions to sqlc
No functional changes.
2024-03-03 23:23:51 +01:00
Sybren A. Stüvel
1ac796d0d8 Refactor: Manager: remove unused query from queries.sql
No functional changes.
2024-03-03 22:42:37 +01:00
Sybren A. Stüvel
3fbb3cde34 Manager: SQLC rename Uuid to UUID
No functional changes.
2024-03-03 20:54:43 +01:00
Sybren A. Stüvel
c046094880 Manager: start replacing GORM with SQLC
GORM has certain downsides:

- Code-first approach, where queries have to be translated to the Go code
  required to execute them.
- GORM comes with its own SQLite implementation, which doesn't provide an
  on-connect callback. This means that new connections cannot correctly
  enable foreign key constraints, causing database consistency issues.

[SQLC](https://sqlc.dev/) solves these issues for us.

This commit doesn't fully replace GORM with SQLC, but introduces it for
a few queries. Once all queries have been converted, GORM can be removed
completely.
2024-03-03 20:15:39 +01:00
Sybren A. Stüvel
1e7c059d12 Manager: check the farm status quickly after startup
The database is polled every 30 seconds to determine the farm status; at
startup the first poll is done after 1 second to get a faster status.

Note that when jobs and workers change their status, the farm status is
always updated.
2024-03-02 22:09:53 +01:00
Sybren A. Stüvel
7eb5eb68a3 Manager: ensure foreign keys are enabled in periodic integrity check
There are still issues with foreign keys getting disabled, so enable them
in the periodic database consistency check.

A more permanent solution is likely to drop GORM and switch to something
else that gives us an on-connect-callback, which can then be used to
turn on foreign key constraints for every connection made.
2024-03-01 23:42:04 +01:00
Sybren A. Stüvel
c1a9b1e877 Manager: force a poll of the farm status when a job/worker changes state
This introduces the concept of 'event listener', which is now used by
the farm status service to respond to events on the event bus.

This makes it possible to increase the regular poll period from 5 to 30
seconds, so the database is polled less often. Polling is now only necessary as a backup, just in case events are
missed or otherwise things change without the event bus logic noticing.
2024-03-01 22:36:38 +01:00
Sybren A. Stüvel
9bfb53a7f6 Manager: log error when an event doesn't have a SocketIO event type
SocketIO has 'rooms' and 'event types'. The 'event type' is set via
reflection of the OpenAPI type of the event payload. This has to be set
up in a mapping, though, and if that mapping is incomplete, an error will
now be logged.
2024-03-01 22:36:26 +01:00
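The mapping described in 9bfb53a7f6 can be sketched with Go's `reflect` package. The payload's dynamic type is looked up in a table, and a missing entry is logged as an error. The payload types and event type names below are invented for illustration; the real ones derive from Flamenco's OpenAPI event payload types.

```go
package main

import (
	"log"
	"reflect"
)

// Hypothetical payload types standing in for the OpenAPI-generated ones.
type JobUpdate struct{ ID string }
type WorkerUpdate struct{ ID string }
type TaskLogUpdate struct{ TaskID string }

// eventTypes maps payload types to SocketIO event type names. TaskLogUpdate is
// left out on purpose, to show what happens with an incomplete mapping.
var eventTypes = map[reflect.Type]string{
	reflect.TypeOf(JobUpdate{}):    "job-update",
	reflect.TypeOf(WorkerUpdate{}): "worker-update",
}

// eventTypeFor looks up the event type for the payload's dynamic type, treating
// pointers like the values they point to, and logs an error when the mapping
// has no entry for it.
func eventTypeFor(payload any) (string, bool) {
	typ := reflect.TypeOf(payload)
	if typ != nil && typ.Kind() == reflect.Pointer {
		typ = typ.Elem()
	}
	name, ok := eventTypes[typ]
	if !ok {
		log.Printf("ERROR: no SocketIO event type registered for payload type %v", typ)
	}
	return name, ok
}
```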
Sybren A. Stüvel
ee7af29748 Manager: fix unit test for farm status events
2024-03-01 22:36:26 +01:00