71 Commits

Author SHA1 Message Date
Sybren A. Stüvel
84f93e7502 Transition from ex-GORM structs to sqlc structs (2/5)
Replace old used-to-be-GORM datastructures (#104305) with sqlc-generated
structs. This also makes it possible to use more specific structs that
are more taylored to the specific queries, increasing efficiency.

This commit mostly deals with workers, including the sleep schedule and
task scheduler.

Functional changes are kept to a minimum, as the API still serves the
same data.

Because this work covers so much of Flamenco's code, it's been split up
into different commits. Each commit brings Flamenco to a state where it
compiles and unit tests pass. Only the result of the final commit has
actually been tested properly.

Ref: #104343
2024-12-04 14:00:13 +01:00
Sybren A. Stüvel
94d71bc3c9 Transition from ex-GORM structs to sqlc structs (1/5)
Replace old used-to-be-GORM datastructures (#104305) with sqlc-generated
structs. This also makes it possible to use more specific structs that
are more taylored to the specific queries, increasing efficiency.

This commit covers job blocklists and last-rendered images.

Functional changes are kept to a minimum, as the API still serves the
same data.

Because this work covers so much of Flamenco's code, it's been split up
into different commits. Each commit brings Flamenco to a state where it
compiles and unit tests pass. Only the result of the final commit has
actually been tested properly.

Ref: #104343
2024-12-04 14:00:07 +01:00
Sybren A. Stüvel
cda0b916fb Manager: replace queryJobs with fetchJobs operation
See the previous two commits for the motivation.
2024-09-18 14:29:15 +02:00
Sybren A. Stüvel
c1cdff567e Manager: Convert FetchTask to sqlc
This is a bit more work than other queries, as it also breaks apart the
fetching of the job and the worker into separate ones. In other words,
internally the persistence layer API changes.
2024-05-28 14:46:42 +02:00
Sybren A. Stüvel
dc893dcad4 Manager: regenerate Go mock after removal of SaveTask
Regenerate the Go mock implementation after the removal of the SaveTask
function from the mocked interface.

See 097d5abb7c13e6eff1facea12f89f24c144194c0
2024-05-28 14:45:34 +02:00
Sybren A. Stüvel
61cc8ff04d Manager: implement API operation to get the farm status
Add a new API operation to get the overall farm status. This is based on
the jobs and workers, and their status.

The statuses are:

- `active`: Actively working on jobs.
- `idle`: Farm could be active, but has no work to do.
- `waiting`: Work has been queued, but all workers are asleep.
- `asleep`: Farm is idle, and all workers are asleep.
- `inoperative`: Cannot work: no workers, or all are offline/error.
- `starting`: Farm is starting up.
- `unknown`: Unexpected configuration of worker and job statuses.
2024-02-29 20:42:28 +01:00
Sybren A. Stüvel
e7c4285ac6 Manager: Adjust code for renaming SocketIO... types to Event...
No functional changes, just adjusting to the OpenAPI renames.
2024-02-05 09:25:43 +01:00
Sybren A. Stüvel
246916475f Manager: Implement mass mark-for-deletion of jobs
Implement the API function to mass-mark jobs for deletion, based on
their 'updated_at' timestamp.

Note that the `last_updated_max` parameter is rounded up to entire
seconds. This may mark more jobs for deletion than you expect, if their
`updated_at` timestamps differ by less than a second.
2023-12-16 23:05:52 +01:00
Sybren A. Stüvel
ef726da17b SocketIO broadcasting for worker tags CUD operations
Broadcast create/update/delete operations on worker tags via SocketIO.

Ref: #104204
2023-08-23 13:54:02 +00:00
Sybren A. Stüvel
02fac6a4df Change Go package name from git.blender.org to projects.blender.org
Change the package base name of the Go code, from
`git.blender.org/flamenco` to `projects.blender.org/studio/flamenco`.

The old location, `git.blender.org`, has no longer been use since the
[migration to Gitea][1]. The new package names now reflect the actual
location where Flamenco is hosted.

[1]: https://code.blender.org/2023/02/new-blender-development-infrastructure/
2023-08-01 12:42:31 +02:00
Sybren A. Stüvel
7dc3def1d5 Manager: simplify variable expansion
Simplify the variable expansion code. Instead of using a separate goroutine
and two channels, use a struct + a simple function call.

No functional changes.
2023-07-31 15:15:20 +02:00
Sybren A. Stüvel
7d1ce8131a Manager: simplify value-to-variable replacement
Simplify the code for the two-way variables' value-to-variable replacement.

Instead of using a goroutine and two channels, use a separate struct and
call a function on that directly.

No functional changes.
2023-07-31 13:58:43 +02:00
Eveline Anderson
830c3fe794 Rename worker 'clusters' to 'tags'
As it was decided that the name "tags" would be better for the clarity
of the feature, all files and code named "cluster" or "worker cluster"
have been removed and replaced with "tag" and "worker tag". This is only
a name change, no other features were touched.

This addresses part of #104204.

Reviewed-on: https://projects.blender.org/studio/flamenco/pulls/104223

As a note to anyone who already ran a pre-release version of Flamenco
and configured some worker clusters, with the help of an SQLite client
you can migrate the clusters to tags. First build Flamenco Manager and
start it, to create the new database schema. Then run these SQL queries
via an sqlite commandline client:

```sql
insert into worker_tags
    (id, created_at, updated_at, uuid, name, description)
  select id, created_at, updated_at, uuid, name, description
  from worker_clusters;

insert into worker_tag_membership (worker_tag_id, worker_id)
  select worker_cluster_id, worker_id from worker_cluster_membership;
```
2023-07-10 11:11:03 +02:00
Sybren A. Stüvel
8408d28a6b Manager: add support for worker clusters 2023-04-04 12:18:35 +02:00
Sybren A. Stüvel
c21cc7d316 OAPI: regenerate code 2023-02-03 16:44:55 +01:00
Sybren A. Stüvel
791d877ff1 Manager: implement API endpoint for deleting jobs
Implement the `deleteJob` API endpoint. Calling this endpoint will mark
the job as "deletion requested", after which it's queued for actual
deletion. This makes the API response fast, even when there is a lot of
work to do in the background.

A new background service "job deleter" keeps track of the queue of such
jobs, and performs the actual deletion. It removes:

- Shaman checkout for the job (but see below)
- Manager-local files of the job (task logs, last-rendered images)
- The job itself

The removal is done in the above order, so the job is only removed from the
database if the rest of the removal was succesful.

Shaman checkouts are only removed if the job was submitted with Flamenco
version 3.2. Earlier versions did not record enough information to reliably
do this.
2023-01-04 01:18:21 +01:00
Sybren A. Stüvel
85d53de1f9 Manager: implement API endpoint for changing job priority
The priority of an existing can now be changed. It will be taken into
account when assigning tasks to workers, but it will not reassign tasks
that are already active.
2022-09-30 16:30:03 +02:00
Sybren A. Stüvel
2a345a3d2c API for deleting workers
Workers can now be soft-deleted. Tasks assigned to the worker will remain
associated with that Worker. Active tasks will be re-queued so other
workers can pick them up.
2022-08-11 16:59:53 -07:00
Sybren A. Stüvel
736ca103c3 Manager: show current/last task in worker details
The Task details component already linked to the Worker it was assigned
to last, and now the Worker links back to the task.

There's only one task shown in the Worker details. If the Worker is
actively working on a task, that one's shown. Otherwise it's the
last-updated task that was assigned to the worker.
2022-07-26 10:36:02 +02:00
Sybren A. Stüvel
11a352968a Fix T99434: Two-way Variables
Two-way variable implementation in the job submission end-point. Where
Flamenco v2 did the variable replacement in the add-on, this has now
been moved to the Manager itself. The only thing the add-on needs to
pass is its platform, so that the right values can be recognised.

This also implements two-way replacement when tasks are handed out, such
that the `{jobs}` value gets replaced to a value suitable for the
Worker's platform as well.
2022-07-22 11:58:35 +02:00
Sybren A. Stüvel
bfd6746f78 Manager: consult the sleep schedule on worker sign-on
If there is no status change queued for the Worker, the sleep schedule
should determine its initial status.
2022-07-18 18:25:24 +02:00
Sybren A. Stüvel
d7b164133a Sleep Scheduler implementation for the Manager
The Manager now has a sleep scheduler for Workers. The API and background
service work, but there is no web interface yet.

Manifest Task: T99397
2022-07-17 17:27:32 +02:00
Sybren A. Stüvel
627996525e Manager: implement operations for getting & setting worker sleep schedule
This is just the API, no web interface yet.

Manifest Task: T99397
2022-07-16 16:00:25 +02:00
Sybren A. Stüvel
726129446d T99730: Allow access to full task log
The web interface has a button that opens the task log in a new window.
This might need some restyling ;-)
2022-07-16 12:55:41 +02:00
Sybren A. Stüvel
686295090b Manager: implement endpoint for getting the full task log
Previously only the log tail was available, which is fine for many cases,
but for serious debugging the entire log is needed.

Manifest task: T99730
2022-07-16 11:13:31 +02:00
Sybren A. Stüvel
10f56148d4 Allow saving configuration from the first-time wizard
This just updates the config and saves it to `flamenco-manager.yaml`.

Saving the configuration doesn't restart the Manager yet, that's for
another commit.
2022-07-14 17:27:17 +02:00
Sybren A. Stüvel
aec5ee49e0 First-Time Wizard: allow selecting Blender executables
The wizard now finds Blender in various ways, and lets the user select
which one to use.

Doesn't save anything yet, though.
2022-07-14 12:22:56 +02:00
Sybren A. Stüvel
aa9837b5f0 First incarnation of the first-time wizard
This adds a `-wizard` CLI option to the Manager, which opens a webbrowser
and shows the First-Time Wizard to aid in configuration of Flamenco.

This is work in progress. The wizard is just one page, and doesn't save
anything yet to the configuration.
2022-07-14 11:17:03 +02:00
Sybren A. Stüvel
6b5f9317cb Manager: clear job's blocklist when requeueing the job
Requeueing a job means that the issues that caused workers to get blocked
might be resolved, so it should be run with a clean slate.
2022-07-14 11:03:11 +02:00
Sybren A. Stüvel
0ff8ed7585 Manager: implement the getVariables OpenAPI operation 2022-07-08 11:36:00 +02:00
Sybren A. Stüvel
d25151184d Add a "Last Rendered" view
Add a "Last Rendered" view to the webapp.

The Manager now stores (in the database) which job was the last
recipient of a rendered image, and serves that to the appropriate
OpenAPI endpoint.

A new SocketIO subscription + accompanying room makes it possible for
the web interface to receive all rendered images (if they survive the
queue, which discards images when it gets too full).
2022-07-01 12:34:40 +02:00
Sybren A. Stüvel
2457a63518 Manager: Show "nothing rendered yet" image in job details
Show a "nothing rendered yet" image in the job details when there is no
last-rendered image yet.
2022-06-30 19:20:19 +02:00
Sybren A. Stüvel
0fc5ba0bc6 Manager: broadcast last-rendered image info via SocketIO
After processing an image in the "last-rendered" processor, a SocketIO
object is sent to clients to indicate the last-rendered image needs to
be (re)loaded.

This also moves the previously existing "done callback" from a single
function to a per-image callback, so that it can be called with the
right information in there, and only when that particular image is
actually done processing.

The notification message sent via SocketIO also contains the necessary
info to render the image, so that the web client doesn't have to call
the `fetchJobLastRenderedInfo` operation.
2022-06-30 18:36:24 +02:00
Sybren A. Stüvel
6efd67b05c Manager: implement FetchJobLastRenderedInfo() API operation
Allow querying for the URL & available versions of a job's last-rendered
image.
2022-06-28 17:08:00 +02:00
Sybren A. Stüvel
64512c81ba Manager: implement OAPI operations to fetch blocklist & delete items 2022-06-27 11:32:35 +02:00
Sybren A. Stüvel
e687c95e5d Manager: add "last rendered image" processing pipeline
Add a handler for the OpenAPI `taskOutputProduced` operation, and an
image thumbnailing goroutine.

The queue of images to process + the function to handle queued images
is managed by `last_rendered.LastRenderedProcessor`. This queue currently
simply allows 3 requests; this should be improved such that it keeps
track of the job IDs as well, as with the current approach a spammy job
can starve the updates from a more calm job.
2022-06-24 16:51:11 +02:00
Sybren A. Stüvel
046853932d Manager: re-queue previously failed tasks of worker when blocklisting
When a Worker is blocked from a job, re-queue its previously failed tasks
so that other workers can give them a try.
2022-06-17 15:49:16 +02:00
Sybren A. Stüvel
b95bed1f96 Refactor: rename RequeueTasksOfWorker to RequeueActiveTasksOfWorker
Soon there will be another function to requeue tasks of workers by other
criteria, so being clear in the name helps.

No functional changes.
2022-06-17 15:49:16 +02:00
Sybren A. Stüvel
fd31a85bcd Manager: add blocking of workers when they fail certain tasks too much
When a worker fails too many tasks, of the same task type, on the same job,
it'll get blocked from doing those.
2022-06-17 15:49:16 +02:00
Sybren A. Stüvel
81f81d0e0a Show task failure list in the web frontend
Show the task failure list in the web frontend's `TaskDetails` component.
2022-06-17 11:37:56 +02:00
Sybren A. Stüvel
0b5140fc5f Manager: clear task failure list on requeueing of jobs & tasks
When a job or task gets requeued from the web interface, its task
failure lists (i.e. the list of workers that previously failed this
task) will be cleared.

This clearing doesn't happen in other situations, e.g. when a worker
signs off and its task gets requeued, the task's failure list will
remain as-is.
2022-06-17 11:37:28 +02:00
Sybren A. Stüvel
6e12a2fb25 Manager: keep track of which worker failed which task
When a Worker indicates a task failed, mark it as `soft-failed` until
enough workers have tried & failed at the same task.

This is the first step in a blocklisting system, where tasks of an
often-failing worker will be requeued to be retried by others.

NOTE: currently the failure list of a task is NOT reset whenever it is
requeued! This will be implemented in a future commit, and is tracked in
`FEATURES.md`.
2022-06-13 18:41:38 +02:00
Sybren A. Stüvel
5dac3c2dc0 Manager: mark workers as 'seen' when they send updates
Update the 'last seen at' timestamp of workers when they:
- sign on
- sign off
- get a task assigned
- send a task update
- check whether they can keep running their task

Note that this commit is necessary to not have the workers time out
immediately ;-)
2022-06-13 12:47:07 +02:00
Sybren A. Stüvel
c3525c3b1a Manager: move task requeueing to TaskStateMachine
Requeueing the tasks of a specific worker is now done in the
`TaskStateMachine`, such that it can be called from other services as
well in future commits.

This also makes the `LogStorage` service a dependency of the
`TaskStateMachine`, as it needs to write "this task was requeued" kind
of messages to the task logs.
2022-06-13 12:33:01 +02:00
Sybren A. Stüvel
24204084c1 Manager: move timestamping of log messages to task_logs package
In the future different services will write to the task log, and thus
it makes sense to move the responsibility of prepending the timestamps
to the log storage service.
2022-06-09 17:00:38 +02:00
Sybren A. Stüvel
819cad1d18 Manager: move broadcasting of task logs via SocketIO to task log service
To ensure all task logs also get broadcast via SocketIO, the responsibility
has moved from the `api_impl` to the `task_logs` package.
2022-06-09 16:49:48 +02:00
Sybren A. Stüvel
b4d2fc4231 Manager: keep track of when a Worker last worked on a task
This will be used for keeping track of stuck tasks.
2022-06-03 16:33:50 +02:00
Sybren A. Stüvel
2e11c1c240 Manager: Implement SocketIO worker updates 2022-05-31 15:19:12 +02:00
Sybren A. Stüvel
08676f48f4 Manager: implement fetchWorkers OpenAPI operation 2022-05-30 18:52:02 +02:00
Sybren A. Stüvel
9bb4dd49dd Manager: add endpoint to fetch task log tail
It returns 2048 bytes at most. It'll likely be less than that, as it will
ignore the first bytes until the very first newline (to avoid returning
cut-off lines). If the log file itself is 2048 bytes or smaller, return the
entire file.
2022-05-20 16:34:13 +02:00