269 Commits

Author SHA1 Message Date
Sybren A. Stüvel
d60451a829 Manager: run some worker management API calls in a background context
Run the actually-doing-stuff parts of `RequestWorkerStatusChange()` and
`SetWorkerTags()` in a background context. That way the operation can
continue even when the HTTP client disconnects.
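
A minimal sketch of the idea, not the actual Flamenco code: `saveStatusChange`, `requestStatusChange`, `simulateDisconnect` and the 5-second timeout are hypothetical. The non-interruptible part gets its own context, derived from `context.Background()` instead of from the request.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// saveStatusChange is a hypothetical stand-in for the database call. Like a
// real query, it refuses to run when its context is already cancelled.
func saveStatusChange(ctx context.Context, status string) error {
	if err := ctx.Err(); err != nil {
		return fmt.Errorf("saving status %q: %w", status, err)
	}
	return nil
}

// requestStatusChange runs the actual work in a background context with its
// own timeout. requestCtx would only be used for cheap, interruptible work
// such as validation, which this sketch leaves out.
func requestStatusChange(requestCtx context.Context, status string) error {
	bgCtx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()
	return saveStatusChange(bgCtx, status)
}

// simulateDisconnect passes an already-cancelled request context, as if the
// HTTP client went away in the middle of the request.
func simulateDisconnect() error {
	requestCtx, cancel := context.WithCancel(context.Background())
	cancel()
	return requestStatusChange(requestCtx, "asleep")
}

func main() {
	fmt.Println("error after disconnect:", simulateDisconnect())
}
```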
2024-06-26 10:48:03 +02:00
Sybren A. Stüvel
6c2d3d7fc0 Manager: avoid logging error on HTTP disconnect on some API calls
Improve the error handling on some worker management API calls, to deal
with closed HTTP connections better.

A new function, `api_impl.handleConnectionClosed()`, can now be called when
`errors.Is(err, context.Canceled)`. This will only log at debug level, and
send a `418 I'm a Teapot` response to the client. This response will very
likely never be seen, as the connection was closed. However, in case this
function is called by mistake, this response is unlikely to be accepted
by the HTTP client.
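
The check can be sketched as below. This is only an illustration of the `errors.Is(err, context.Canceled)` pattern; `statusForError` and `canceledQueryError` are hypothetical names, and the real function also does the debug-level logging.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net/http"
)

// statusForError picks the HTTP status to respond with. A cancelled context
// means the client is gone, so it gets the "teapot" status rather than a
// server error.
func statusForError(err error) int {
	switch {
	case err == nil:
		return http.StatusNoContent
	case errors.Is(err, context.Canceled):
		return http.StatusTeapot
	default:
		return http.StatusInternalServerError
	}
}

// canceledQueryError mimics a database error that wraps context.Canceled.
func canceledQueryError() error {
	ctx, cancel := context.WithCancel(context.Background())
	cancel()
	return fmt.Errorf("fetching worker: %w", ctx.Err())
}

func main() {
	fmt.Println(statusForError(canceledQueryError()))
	fmt.Println(statusForError(errors.New("disk full")))
}
```

Because `errors.Is` follows the `%w` wrapping chain, the check keeps working when lower layers add their own context to the error.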
2024-06-26 10:26:33 +02:00
Sybren A. Stüvel
ee31316d9d Manager: more gracefully log context cancellation errors in database layer
The context passed to the database layer will auto-close when the HTTP
client disconnects. This will cancel any running query, which is the
expected behaviour. Now this no longer results in an error being logged
in the database layer. Instead, a message is logged at debug level.

The API layer is also adjusted to silence logging of `context.Canceled`
for certain operations, most notably getting all jobs and getting all
tasks of a job. These calls occur when the webapp reconnects after a
restart of the Manager. That may trigger a refresh of the page, which
immediately aborts any pending API calls. This is normal and should not
cause errors to be logged.
2024-05-28 17:27:27 +02:00
Sybren A. Stüvel
c1cdff567e Manager: Convert FetchTask to sqlc
This is a bit more work than other queries, as it also breaks apart the
fetching of the job and the worker into separate ones. In other words,
internally the persistence layer API changes.
2024-05-28 14:46:42 +02:00
Sybren A. Stüvel
dc893dcad4 Manager: regenerate Go mock after removal of SaveTask
Regenerate the Go mock implementation after the removal of the SaveTask
function from the mocked interface.

See 097d5abb7c13e6eff1facea12f89f24c144194c0
2024-05-28 14:45:34 +02:00
Sybren A. Stüvel
097d5abb7c Manager: Remove SaveTask function from interface
Remove `SaveTask(...)` from the persistence layer interface as defined
by the `api_impl` package. It's not used.
2024-05-28 08:53:15 +02:00
Taylor Wiebe
a0cb8735c9 Manager: add optional description to job types
This description will be shown as a tooltip in the job submission UI.
2024-04-04 11:12:42 +02:00
Sybren A. Stüvel
3f4a9025fe Manager tests: replace assert.NoError() with require.NoError()
Back in the days when I wrote the code, I didn't know about the
`require` package yet. Using `require.NoError()` makes the test code
more straightforward.

No functional changes, except that when tests fail, they now fail
without panicking.
2024-03-16 11:09:18 +01:00
Sybren A. Stüvel
61cc8ff04d Manager: implement API operation to get the farm status
Add a new API operation to get the overall farm status. This is based on
the jobs and workers, and their status.

The statuses are:

- `active`: Actively working on jobs.
- `idle`: Farm could be active, but has no work to do.
- `waiting`: Work has been queued, but all workers are asleep.
- `asleep`: Farm is idle, and all workers are asleep.
- `inoperative`: Cannot work: no workers, or all are offline/error.
- `starting`: Farm is starting up.
- `unknown`: Unexpected configuration of worker and job statuses.
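
One way to read the list above as code. This is a sketch under assumptions: `WorkerJobSummary` and `deriveFarmStatus` are hypothetical, the exact rules in the Manager may differ, and `starting` is left out because it does not follow from worker and job counts alone.

```go
package main

import "fmt"

type FarmStatus string

const (
	FarmActive      FarmStatus = "active"
	FarmIdle        FarmStatus = "idle"
	FarmWaiting     FarmStatus = "waiting"
	FarmAsleep      FarmStatus = "asleep"
	FarmInoperative FarmStatus = "inoperative"
	FarmStarting    FarmStatus = "starting"
	FarmUnknown     FarmStatus = "unknown"
)

// WorkerJobSummary is a hypothetical summary of the farm. Offline and
// erroring workers are simply not counted in either worker field.
type WorkerJobSummary struct {
	WorkersAwake  int
	WorkersAsleep int
	JobsActive    int
	JobsQueued    int
}

func deriveFarmStatus(s WorkerJobSummary) FarmStatus {
	hasWork := s.JobsActive+s.JobsQueued > 0
	switch {
	case s.WorkersAwake+s.WorkersAsleep == 0:
		return FarmInoperative // no workers, or all offline/error
	case s.WorkersAwake > 0 && s.JobsActive > 0:
		return FarmActive
	case s.WorkersAwake > 0 && !hasWork:
		return FarmIdle
	case s.WorkersAwake == 0 && hasWork:
		return FarmWaiting // work is there, but everybody is asleep
	case s.WorkersAwake == 0 && !hasWork:
		return FarmAsleep
	default:
		return FarmUnknown
	}
}

func main() {
	fmt.Println(deriveFarmStatus(WorkerJobSummary{WorkersAsleep: 2, JobsQueued: 1}))
}
```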
2024-02-29 20:42:28 +01:00
Sybren A. Stüvel
e7c4285ac6 Manager: Adjust code for renaming SocketIO... types to Event...
No functional changes, just adjusting to the OpenAPI renames.
2024-02-05 09:25:43 +01:00
Sybren A. Stüvel
76a24243f0 Manager: Introduce event bus system
Introduce an "event bus"-like system. It's more like a fan-out
broadcaster for certain events. Instead of directly sending events to
SocketIO, they are now sent to the broker, which in turn sends it to any
registered "forwarder". Currently there is only one forwarder, for
SocketIO.

This opens the door for a proper MQTT client that sends the same events
to an MQTT server.
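
The fan-out idea can be sketched like this. The names `Broker`, `Forwarder` and `recorder` are made up for the example and are not taken from the Flamenco code base.

```go
package main

import (
	"fmt"
	"sync"
)

// Forwarder receives every event the broker broadcasts.
type Forwarder interface {
	Forward(topic string, payload string)
}

// Broker fans out events to all registered forwarders.
type Broker struct {
	mu         sync.Mutex
	forwarders []Forwarder
}

func (b *Broker) AddForwarder(f Forwarder) {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.forwarders = append(b.forwarders, f)
}

func (b *Broker) Broadcast(topic, payload string) {
	b.mu.Lock()
	defer b.mu.Unlock()
	for _, f := range b.forwarders {
		f.Forward(topic, payload)
	}
}

// recorder is a test double that stands in for a SocketIO or MQTT forwarder.
type recorder struct {
	name   string
	events []string
}

func (r *recorder) Forward(topic, payload string) {
	r.events = append(r.events, topic+": "+payload)
}

func main() {
	broker := &Broker{}
	socketIO := &recorder{name: "socketio"}
	mqtt := &recorder{name: "mqtt"}
	broker.AddForwarder(socketIO)
	broker.AddForwarder(mqtt)

	broker.Broadcast("/jobs", "job updated")
	fmt.Println(socketIO.name, socketIO.events)
	fmt.Println(mqtt.name, mqtt.events)
}
```

The sender only knows about the broker, so adding a second forwarder does not touch the code that emits events.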
2024-02-03 22:55:23 +01:00
Sybren A. Stüvel
f464aea137 Manager & website: provide more helpful info when Worker auth fails
Provide more useful info when a Worker tries to communicate but fails
the authentication check. The message about this is now more friendly
and links to a new FAQ entry at
https://flamenco.blender.org/faq/#what-does-unknown-worker-is-trying-to-communicate-mean
2024-01-25 14:19:24 +01:00
Sybren A. Stüvel
aac2ec7bf6 Manager: when requesting job deletion, also log its low-level database ID
When an API request comes in to delete a job, not only log the job's UUID,
but also include its database ID. This can help in figuring out database
issues, as when the job is deleted, it's unknown what UUID it had. Database
relations use the ID, and not the UUID.
2024-01-11 17:17:56 +01:00
Sybren A. Stüvel
3e46322d14 Manager: reduce log level when last-rendered image was accepted
Reduce the log level when a last-rendered image was accepted from a Worker.
2024-01-11 17:17:56 +01:00
Sybren A. Stüvel
246916475f Manager: Implement mass mark-for-deletion of jobs
Implement the API function to mass-mark jobs for deletion, based on
their 'updated_at' timestamp.

Note that the `last_updated_max` parameter is rounded up to whole
seconds. This may mark more jobs for deletion than you expect, if their
`updated_at` timestamps differ by less than a second.
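
To illustrate the rounding: the sketch below assumes a job is marked when its `updated_at` is at or before the rounded limit. `roundUpToSecond`, `isMarked` and the example helpers are hypothetical.

```go
package main

import (
	"fmt"
	"time"
)

// roundUpToSecond rounds t up to the next whole second, unless it already
// is on a whole second.
func roundUpToSecond(t time.Time) time.Time {
	truncated := t.Truncate(time.Second)
	if truncated.Equal(t) {
		return t
	}
	return truncated.Add(time.Second)
}

// isMarked reports whether a job updated at updatedAt falls within the
// rounded-up limit (assumption: the comparison is inclusive).
func isMarked(updatedAt, lastUpdatedMax time.Time) bool {
	return !updatedAt.After(roundUpToSecond(lastUpdatedMax))
}

// exampleRounding rounds 23:05:52.250 up and returns the result as text.
func exampleRounding() string {
	limit := time.Date(2023, 12, 16, 23, 5, 52, 250_000_000, time.UTC)
	return roundUpToSecond(limit).Format(time.RFC3339)
}

// exampleMarksLaterJob checks a job that was updated half a second after
// the requested limit, but within the same second.
func exampleMarksLaterJob() bool {
	limit := time.Date(2023, 12, 16, 23, 5, 52, 250_000_000, time.UTC)
	updatedAt := time.Date(2023, 12, 16, 23, 5, 52, 750_000_000, time.UTC)
	return isMarked(updatedAt, limit)
}

func main() {
	fmt.Println(exampleRounding())
	fmt.Println(exampleMarksLaterJob())
}
```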
2023-12-16 23:05:52 +01:00
Sybren A. Stüvel
ef726da17b SocketIO broadcasting for worker tags CUD operations
Broadcast create/update/delete operations on worker tags via SocketIO.

Ref: #104204
2023-08-23 13:54:02 +00:00
Sybren A. Stüvel
e231f6f221 Manager: better logging of tag create/update/delete
Emit an info-level log message when worker tags are created, updated, or
deleted.
2023-08-15 10:36:54 +02:00
Sybren A. Stüvel
c477992467 Manager: tag update without description now keeps the description
Updating a tag without `description` field in the request body will keep
the tag's description as-is. Previously this caused it to become empty,
which is now still possible by using an explicit `description: ""`.
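
A common way to tell "field absent" from "field set to empty" in Go is a pointer field. The sketch below shows that pattern; `WorkerTag`, `tagUpdate` and `applyTagUpdate` are hypothetical and not the generated OpenAPI types.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type WorkerTag struct {
	Name        string
	Description string
}

// tagUpdate mirrors the request body. Description is a pointer, so it stays
// nil when the field is missing from the JSON.
type tagUpdate struct {
	Name        string  `json:"name"`
	Description *string `json:"description,omitempty"`
}

func applyTagUpdate(tag *WorkerTag, requestBody string) error {
	var update tagUpdate
	if err := json.Unmarshal([]byte(requestBody), &update); err != nil {
		return fmt.Errorf("parsing tag update: %w", err)
	}
	tag.Name = update.Name
	if update.Description != nil {
		tag.Description = *update.Description
	}
	return nil
}

func main() {
	tag := WorkerTag{Name: "gpu", Description: "Workers with a GPU"}

	if err := applyTagUpdate(&tag, `{"name": "cuda"}`); err != nil {
		fmt.Println("update failed:", err)
	}
	fmt.Printf("%+v\n", tag) // description is kept

	if err := applyTagUpdate(&tag, `{"name": "cuda", "description": ""}`); err != nil {
		fmt.Println("update failed:", err)
	}
	fmt.Printf("%+v\n", tag) // description is now empty
}
```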
2023-08-15 10:29:44 +02:00
Sybren A. Stüvel
3e72391cbf Restartable workers
When the worker is started with `-restart-exit-code 47` or has
`restart_exit_code=47` in `flamenco-worker.yaml`, it's marked as
'restartable'. This will enable two worker actions 'Restart
(immediately)' and 'Restart (after task is finished)' in the Manager web
interface. When a worker is asked to restart, it will exit with exit
code `47`. Of course any positive exit code can be used here.
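
Parsing such an option could look roughly like this. Only the flag name `-restart-exit-code` comes from the text above; `parseRestartExitCode` and the "positive means restartable" rule as coded here are a sketch.

```go
package main

import (
	"flag"
	"fmt"
	"io"
)

// parseRestartExitCode returns the configured exit code and whether the
// worker counts as restartable.
func parseRestartExitCode(args []string) (int, bool, error) {
	var code int
	fs := flag.NewFlagSet("flamenco-worker", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep usage text out of the example output
	fs.IntVar(&code, "restart-exit-code", 0, "exit code that signals a restart request")
	if err := fs.Parse(args); err != nil {
		return 0, false, err
	}
	return code, code > 0, nil
}

func main() {
	code, restartable, err := parseRestartExitCode([]string{"-restart-exit-code", "47"})
	if err != nil {
		fmt.Println("invalid arguments:", err)
		return
	}
	fmt.Println("exit code:", code, "restartable:", restartable)
	// When asked to restart, the worker would end with os.Exit(code), and
	// whatever supervises the process starts it again on that code.
}
```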
2023-08-14 16:00:09 +02:00
Sybren A. Stüvel
02fac6a4df Change Go package name from git.blender.org to projects.blender.org
Change the package base name of the Go code, from
`git.blender.org/flamenco` to `projects.blender.org/studio/flamenco`.

The old location, `git.blender.org`, has not been in use since the
[migration to Gitea][1]. The new package names now reflect the actual
location where Flamenco is hosted.

[1]: https://code.blender.org/2023/02/new-blender-development-infrastructure/
2023-08-01 12:42:31 +02:00
Sybren A. Stüvel
dae5b1a571 Fix #104237: fix issue with drive-only paths on Windows
Fix an issue where a shared storage path on Linux, that maps via two-way
variables to a drive root on Windows, caused problems with the path
translation system.

Windows paths that consist only of a drive letter (`F:`) cannot just be
concatenated to a relative path, as that will result in `F:path\to\file`,
which is still a relative path of sorts. This is now handled correctly,
and should result in `F:\path\to\file`.
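
The difference can be shown with plain string handling. `isDriveOnly` and `joinWindows` are hypothetical helpers for this example, not the functions from the fix itself.

```go
package main

import (
	"fmt"
	"strings"
)

// isDriveOnly reports whether path is just a drive letter and colon, like "F:".
func isDriveOnly(path string) bool {
	if len(path) != 2 || path[1] != ':' {
		return false
	}
	c := path[0]
	return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
}

// joinWindows appends a relative path to a Windows base path.
func joinWindows(base, relative string) string {
	if isDriveOnly(base) {
		// "F:" + "path" would mean "path, relative to the current
		// directory on F:", so the separator has to be added explicitly.
		return base + `\` + relative
	}
	return strings.TrimRight(base, `\`) + `\` + relative
}

func main() {
	fmt.Println("F:" + `path\to\file`)               // naive concatenation
	fmt.Println(joinWindows("F:", `path\to\file`))   // drive-only base
	fmt.Println(joinWindows(`F:\shared\`, `file`))   // base with trailing separator
}
```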

This fixes #104237.
2023-07-31 15:28:07 +02:00
Sybren A. Stüvel
7dc3def1d5 Manager: simplify variable expansion
Simplify the variable expansion code. Instead of using a separate goroutine
and two channels, use a struct + a simple function call.

No functional changes.
2023-07-31 15:15:20 +02:00
Sybren A. Stüvel
7d1ce8131a Manager: simplify value-to-variable replacement
Simplify the code for the two-way variables' value-to-variable replacement.

Instead of using a goroutine and two channels, use a separate struct and
call a function on that directly.

No functional changes.
2023-07-31 13:58:43 +02:00
Sybren A. Stüvel
ef68f71d54 Manager: actually include manager name in version API call
The API call `GetVersion` should return the Manager name, but it returned
the hard-coded application name `"Flamenco"` instead.
2023-07-21 17:08:10 +02:00
Sybren A. Stüvel
5eb57427fc Manager: better logging of schedule changes
Log more details of schedule changes, from within the sleep scheduler
(instead of the API implementation).
2023-07-18 15:55:51 +02:00
Eveline Anderson
830c3fe794 Rename worker 'clusters' to 'tags'
As it was decided that the name "tags" would be better for the clarity
of the feature, all files and code named "cluster" or "worker cluster"
have been removed and replaced with "tag" and "worker tag". This is only
a name change, no other features were touched.

This addresses part of #104204.

Reviewed-on: https://projects.blender.org/studio/flamenco/pulls/104223

As a note to anyone who already ran a pre-release version of Flamenco
and configured some worker clusters, with the help of an SQLite client
you can migrate the clusters to tags. First build Flamenco Manager and
start it, to create the new database schema. Then run these SQL queries
via an sqlite commandline client:

```sql
insert into worker_tags
    (id, created_at, updated_at, uuid, name, description)
  select id, created_at, updated_at, uuid, name, description
  from worker_clusters;

insert into worker_tag_membership (worker_tag_id, worker_id)
  select worker_cluster_id, worker_id from worker_cluster_membership;
```
2023-07-10 11:11:03 +02:00
Michael Cook
b20ede97ea Shaman: fail unit test when running as root user
If the mock tests are run by the root user, the test for an inaccessible
path fails, because root can write files anywhere on the filesystem. When
that happens, it is not clear that the Flamenco Manager test
`TestCheckSharedStoragePath` is checking inaccessible file locations, nor
that it should be run by an unprivileged user.

The fix is to explicitly fail the permission test if the tests are run as
the root user.
2023-07-07 16:05:43 +02:00
Sybren A. Stüvel
6a30f844eb Manager: Better reporting of version via API call
Before: `3.3-alpha0-v3.2-76-gdd34d538-dirty`
After : `3.3-alpha0 (v3.2-76-gdd34d538-dirty)`

Also include the new `git` property to always have the Git hash (the part
between parentheses).
2023-07-06 12:21:47 +02:00
Sybren A. Stüvel
77db55bb14 Manager: when worker signs off, only remember specific statuses
Limit which worker statuses are remembered (when they go offline) to
those that we want to restore when they come back online. This is now
set to `awake` and `asleep`. This prevents workers from being told to go
to states that they cannot handle, such as `error` or `starting`.
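
As a sketch of the rule (simplified: this stores the status to restore at sign-off, and the `Worker` type and `signOff` function are hypothetical):

```go
package main

import "fmt"

type Worker struct {
	Status          string
	StatusRequested string
}

// rememberedStatuses lists the statuses worth restoring after a sign-off.
var rememberedStatuses = map[string]bool{
	"awake":  true,
	"asleep": true,
}

// signOff marks the worker offline and remembers its previous status only
// when that status is one a worker can be asked to go to.
func signOff(w *Worker) {
	previous := w.Status
	w.Status = "offline"
	if rememberedStatuses[previous] {
		w.StatusRequested = previous
	} else {
		w.StatusRequested = ""
	}
}

func main() {
	sleeping := &Worker{Status: "asleep"}
	signOff(sleeping)
	fmt.Printf("%+v\n", *sleeping)

	broken := &Worker{Status: "error"}
	signOff(broken)
	fmt.Printf("%+v\n", *broken)
}
```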
2023-06-23 11:38:37 +02:00
Eveline Anderson
4d2200bb0c Fix #99549: Remember Previous Status (#104217)
Fix #99549: When sending Workers offline, remember their previous status

When a worker goes offline, the Manager now remembers the worker's previous status, so that it can be restored once the worker comes back online. For example, when the Worker makes the status change `X → offline`, the Manager sets `StatusRequested = "X"` as soon as the Worker is online again.

Reviewed-on: https://projects.blender.org/studio/flamenco/pulls/104217
2023-06-02 22:50:07 +02:00
Nitin-Rawat-1
752597b8e1 Check for number of workers before soft failing the task. (#104195)
Manager: fix issue #104190, where a job got stuck when there were fewer
workers than the soft-fail threshold. Before soft-failing a task, check the
number of workers to decide whether the job should be failed or not.
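
The check can be pictured as follows. This is an interpretation, not the actual fix: it assumes a task is failed for real once a threshold number of distinct workers have failed it, and `shouldHardFail` and its parameters are hypothetical.

```go
package main

import "fmt"

// shouldHardFail decides whether a failure should count as final.
// Without the second check, a farm with fewer workers than the threshold
// could never reach it, and the task would stay soft-failed forever.
func shouldHardFail(failedWorkers, softFailThreshold, availableWorkers int) bool {
	if failedWorkers >= softFailThreshold {
		return true
	}
	return failedWorkers >= availableWorkers
}

func main() {
	// Threshold of 3, but only 2 workers on the farm, both failed the task.
	fmt.Println(shouldHardFail(2, 3, 2))
	// Same threshold, 5 workers available: still worth retrying elsewhere.
	fmt.Println(shouldHardFail(2, 3, 5))
}
```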

Reviewed-on: https://projects.blender.org/studio/flamenco/pulls/104195
2023-04-20 11:53:41 +02:00
Sybren A. Stüvel
10d7e7e203 Manager: allow creation of worker clusters without UUID
2023-04-04 13:19:11 +02:00
Sybren A. Stüvel
8408d28a6b Manager: add support for worker clusters
2023-04-04 12:18:35 +02:00
Sybren A. Stüvel
28cc7b7a3f Manager: improve logging when workers register
The info message that a worker registered now also includes its UUID.
Any failure hashing the password will now also log the worker name + UUID.
2023-04-04 12:13:21 +02:00
Sybren A. Stüvel
159ce5b34a Manager: avoid starting error messages with 'error'
No real functional changes, just server-side logging.
2023-04-03 16:58:48 +02:00
MKRelax
7963ab5efd Manager: fixed copy/paste typo in CheckBlenderExePath() (#104192)
The `toCheck` variable in `CheckBlenderExePath()` was initialized to `CheckSharedStoragePathJSONBody`, should be `CheckBlenderExePathJSONBody`.

Reviewed-on: https://projects.blender.org/studio/flamenco/pulls/104192
2023-03-06 12:55:53 +01:00
Sybren A. Stüvel
ef3cab9745 Webapp: handle job deletions properly
- Add a little confirmation overlay before deleting a job. This overlay
  also shows information about whether the Shaman checkout directory
  will be deleted or not.
- Send job updates to the web frontend when jobs are marked for
  deletion, and when they are actually deleted.
- Respond to those updates, and handle some corner cases where job info
  is missing (because it just got deleted).

This closes T99401.
2023-02-03 16:59:15 +01:00
Sybren A. Stüvel
c21cc7d316 OAPI: regenerate code
2023-02-03 16:44:55 +01:00
Sybren A. Stüvel
a97a4e6e67 Manager: show delete-requested jobs in the web interface
Show jobs that have been marked for deletion with a red strike-through
line in the jobs table, and show the deletion-request timestamp in the
job details.
2023-01-08 13:52:27 +01:00
Sybren A. Stüvel
416138fd70 Manager: add test for QueryJobs() API function
No functional changes.
2023-01-08 13:15:30 +01:00
Sybren A. Stüvel
791d877ff1 Manager: implement API endpoint for deleting jobs
Implement the `deleteJob` API endpoint. Calling this endpoint will mark
the job as "deletion requested", after which it's queued for actual
deletion. This makes the API response fast, even when there is a lot of
work to do in the background.

A new background service "job deleter" keeps track of the queue of such
jobs, and performs the actual deletion. It removes:

- Shaman checkout for the job (but see below)
- Manager-local files of the job (task logs, last-rendered images)
- The job itself

The removal is done in the above order, so the job is only removed from the
database if the rest of the removal was successful.

Shaman checkouts are only removed if the job was submitted with Flamenco
version 3.2. Earlier versions did not record enough information to reliably
do this.
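
The ordering described above can be sketched as a list of steps that stops at the first failure. `jobDeleter`, its fields and `errStepFailed` are hypothetical names for this example.

```go
package main

import (
	"errors"
	"fmt"
)

var errStepFailed = errors.New("step failed")

// jobDeleter holds one function per removal step.
type jobDeleter struct {
	removeCheckout   func(jobUUID string) error
	removeLocalFiles func(jobUUID string) error
	removeFromDB     func(jobUUID string) error
}

// deleteJob runs the steps in order. The database record goes last, so a
// failure earlier on leaves the job in place for another attempt.
func (d jobDeleter) deleteJob(jobUUID string) error {
	steps := []struct {
		name string
		run  func(string) error
	}{
		{"Shaman checkout", d.removeCheckout},
		{"Manager-local files", d.removeLocalFiles},
		{"job itself", d.removeFromDB},
	}
	for _, step := range steps {
		if err := step.run(jobUUID); err != nil {
			return fmt.Errorf("removing %s: %w", step.name, err)
		}
	}
	return nil
}

func main() {
	ok := func(string) error { return nil }
	deleter := jobDeleter{
		removeCheckout:   ok,
		removeLocalFiles: func(string) error { return errStepFailed },
		removeFromDB:     ok,
	}
	fmt.Println(deleter.deleteJob("some-job-uuid"))
}
```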
2023-01-04 01:18:21 +01:00
Sybren A. Stüvel
f413a40f4e Store Shaman checkout ID when submitting a job
If Shaman is used to submit the job files, store the job's checkout ID
(i.e. the path relative to the checkout root) in the database. This will
make it possible in the future to remove the Shaman checkout along with
the job itself.
2023-01-04 01:18:21 +01:00
Sybren A. Stüvel
9bda21648e Manager: add timeout when fetching job
Add a timeout when fetching a job from the persistence layers.

It's my intention to add more timeouts, so this also introduces some code
to make it easier to test that a context has a deadline set.
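
A sketch of both halves, the timeout and the test helper. The names `fetchJob`, `hasDeadline`, `deadlineSeenByFetch` and the one-second value are assumptions made for the example.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// fetchJobTimeout is a made-up value for this sketch.
const fetchJobTimeout = 1 * time.Second

// fetchJob wraps the persistence call in a context with a deadline.
func fetchJob(ctx context.Context, fetch func(context.Context) error) error {
	ctx, cancel := context.WithTimeout(ctx, fetchJobTimeout)
	defer cancel()
	return fetch(ctx)
}

// hasDeadline is the kind of check a test can use on the context that a
// mocked persistence layer receives.
func hasDeadline(ctx context.Context) bool {
	_, ok := ctx.Deadline()
	return ok
}

// deadlineSeenByFetch plays the role of the mock: it records whether the
// context it was handed has a deadline.
func deadlineSeenByFetch() bool {
	seen := false
	_ = fetchJob(context.Background(), func(ctx context.Context) error {
		seen = hasDeadline(ctx)
		return nil
	})
	return seen
}

func main() {
	fmt.Println("plain context has deadline:", hasDeadline(context.Background()))
	fmt.Println("fetch context has deadline:", deadlineSeenByFetch())
}
```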
2022-12-14 13:02:59 +01:00
Sybren A. Stüvel
c16c1f4b15 Refactor: deduplicate job fetching code
Deduplicate API implementation code to fetch a job from the persistence
service.

Almost no functional changes. Checking that the requested job UUID is
actually a valid UUID is now consistently done on all fetches. This is
not a functional change in normal Flamenco operations, where only valid
UUIDs are used anyway.
2022-12-14 13:02:59 +01:00
Sybren A. Stüvel
7a60bb70e0 Manager: implement job check endpoint
2022-10-20 13:13:35 +02:00
Sybren A. Stüvel
73dd8c7d78 Cleanup: pass submittedJob as pointer to two-way variable replacer
The two-way variable replacement function changes the submitted job. To
clarify that this happens, pass the pointer `&submittedJob`.

Both pass-by-pointer and pass-by-value work, because the variable
replacement typically works on maps/slices, which are passed by reference
anyway. Better to be explicit about this, though.
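
A small, self-contained demonstration of why both variants appeared to work. `SubmittedJob` here is a stripped-down, hypothetical stand-in for the real type.

```go
package main

import "fmt"

type SubmittedJob struct {
	Name     string
	Settings map[string]string
}

// replaceByValue receives a copy of the struct. The copy shares the same
// underlying map, so map changes are visible to the caller, but the change
// to Name is not.
func replaceByValue(job SubmittedJob) {
	job.Name = "changed"
	job.Settings["filepath"] = "{shared}/scene.blend"
}

// replaceByPointer makes it explicit that the job is modified.
func replaceByPointer(job *SubmittedJob) {
	job.Name = "changed"
	job.Settings["filepath"] = "{shared}/scene.blend"
}

func main() {
	byValue := SubmittedJob{Name: "original", Settings: map[string]string{"filepath": "/mnt/scene.blend"}}
	replaceByValue(byValue)
	fmt.Println(byValue.Name, byValue.Settings["filepath"])

	byPointer := SubmittedJob{Name: "original", Settings: map[string]string{"filepath": "/mnt/scene.blend"}}
	replaceByPointer(&byPointer)
	fmt.Println(byPointer.Name, byPointer.Settings["filepath"])
}
```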

No functional changes.
2022-10-20 12:55:01 +02:00
Sybren A. Stüvel
e77bd9b841 Fix workers immediately switching state on a lazy request
Fix an issue where workers would switch immediately on a state change
request, even if it was of the "after task is finished" kind.

The "may I keep running" endpoint wasn't checking the laziness flag, and
thus any state change, lazy or otherwise, would interrupt the worker's
current task.
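
In code, the corrected decision could look something like this. The `Worker` fields and `mayKeepRunning` are hypothetical and only model the rule described above.

```go
package main

import "fmt"

type Worker struct {
	StatusRequested   string
	LazyStatusRequest bool // true for "after task is finished" requests
}

// mayKeepRunning answers the worker's question while it runs a task.
// Only an immediate (non-lazy) status change request interrupts the task.
func mayKeepRunning(w Worker) bool {
	if w.StatusRequested == "" {
		return true
	}
	return w.LazyStatusRequest
}

func main() {
	fmt.Println(mayKeepRunning(Worker{}))
	fmt.Println(mayKeepRunning(Worker{StatusRequested: "asleep", LazyStatusRequest: true}))
	fmt.Println(mayKeepRunning(Worker{StatusRequested: "asleep", LazyStatusRequest: false}))
}
```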
2022-10-20 12:30:37 +02:00
Sybren A. Stüvel
85d53de1f9 Manager: implement API endpoint for changing job priority
The priority of an existing job can now be changed. It will be taken into
account when assigning tasks to workers, but it will not reassign tasks
that are already active.
2022-09-30 16:30:03 +02:00
Sybren A. Stüvel
759a94e49b Blender finder: also handle exec.ErrNotFound as "expected"
Blender not being found can be reported via various errors (this should be
reworked in the 'blender finder API' at some point). `exec.ErrNotFound` is
returned when Blender cannot be found on `$PATH`, which is something that's
absolutely fine. This is now logged less dramatically.
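
Recognising this case comes down to an `errors.Is` check on the error from `exec.LookPath`. The helper names and the program name below are invented for the example.

```go
package main

import (
	"errors"
	"fmt"
	"os/exec"
)

// isExpectedNotFound reports whether err just means "not on $PATH".
func isExpectedNotFound(err error) bool {
	return errors.Is(err, exec.ErrNotFound)
}

// lookupMissingProgram searches $PATH for a program name that is assumed
// not to exist on any system.
func lookupMissingProgram() error {
	_, err := exec.LookPath("blender-that-surely-does-not-exist")
	return err
}

func main() {
	err := lookupMissingProgram()
	switch {
	case err == nil:
		fmt.Println("found")
	case isExpectedNotFound(err):
		fmt.Println("info: Blender could not be found on $PATH")
	default:
		fmt.Println("warning: unexpected problem finding Blender:", err)
	}
}
```

`exec.LookPath` wraps `exec.ErrNotFound` in an `*exec.Error`, which is why `errors.Is` is used instead of a direct comparison.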
2022-09-22 12:39:40 +02:00
Sybren A. Stüvel
161a7f7cb3 Less dramatic logging when Blender cannot be found
Avoid the word "error" in logging when Blender cannot be found. Typically
these are warnings, and having the word "error" there makes people think
otherwise.
2022-09-22 12:37:46 +02:00