40 Commits

Author SHA1 Message Date
Sybren A. Stüvel
76a24243f0 Manager: Introduce event bus system
Introduce an "event bus"-like system. It's more like a fan-out
broadcaster for certain events. Instead of directly sending events to
SocketIO, they are now sent to the broker, which in turn sends it to any
registered "forwarder". Currently there is ony one forwarder, for
SocketIO.

This opens the door for a proper MQTT client that sends the same events
to an MQTT server.
2024-02-03 22:55:23 +01:00
Sybren A. Stüvel
246916475f Manager: Implement mass mark-for-deletion of jobs
Implement the API function to mass-mark jobs for deletion, based on
their 'updated_at' timestamp.

Note that the `last_updated_max` parameter is rounded up to entire
seconds. This may mark more jobs for deletion than you expect, if their
`updated_at` timestamps differ by less than a second.
2023-12-16 23:05:52 +01:00
Sybren A. Stüvel
ef726da17b SocketIO broadcasting for worker tags CUD operations
Broadcast create/update/delete operations on worker tags via SocketIO.

Ref: #104204
2023-08-23 13:54:02 +00:00
Sybren A. Stüvel
02fac6a4df Change Go package name from git.blender.org to projects.blender.org
Change the package base name of the Go code, from
`git.blender.org/flamenco` to `projects.blender.org/studio/flamenco`.

The old location, `git.blender.org`, has no longer been use since the
[migration to Gitea][1]. The new package names now reflect the actual
location where Flamenco is hosted.

[1]: https://code.blender.org/2023/02/new-blender-development-infrastructure/
2023-08-01 12:42:31 +02:00
Eveline Anderson
830c3fe794 Rename worker 'clusters' to 'tags'
As it was decided that the name "tags" would be better for the clarity
of the feature, all files and code named "cluster" or "worker cluster"
have been removed and replaced with "tag" and "worker tag". This is only
a name change, no other features were touched.

This addresses part of #104204.

Reviewed-on: https://projects.blender.org/studio/flamenco/pulls/104223

As a note to anyone who already ran a pre-release version of Flamenco
and configured some worker clusters, with the help of an SQLite client
you can migrate the clusters to tags. First build Flamenco Manager and
start it, to create the new database schema. Then run these SQL queries
via an sqlite commandline client:

```sql
insert into worker_tags
    (id, created_at, updated_at, uuid, name, description)
  select id, created_at, updated_at, uuid, name, description
  from worker_clusters;

insert into worker_tag_membership (worker_tag_id, worker_id)
  select worker_cluster_id, worker_id from worker_cluster_membership;
```
2023-07-10 11:11:03 +02:00
Sybren A. Stüvel
8408d28a6b Manager: add support for worker clusters 2023-04-04 12:18:35 +02:00
Sybren A. Stüvel
ef3cab9745 Webapp: handle job deletions properly
- Add a little confirmation overlay before deleting a job. This overlay
  also shows information about whether the Shaman checkout directory
  will be deleted or not.
- Send job updates to the web frontend when jobs are marked for
  deletion, and when they are actually deleted.
- Respond to those updates, and handle some corner cases where job info
  is missing (because it just got deleted).

This closes T99401.
2023-02-03 16:59:15 +01:00
Sybren A. Stüvel
791d877ff1 Manager: implement API endpoint for deleting jobs
Implement the `deleteJob` API endpoint. Calling this endpoint will mark
the job as "deletion requested", after which it's queued for actual
deletion. This makes the API response fast, even when there is a lot of
work to do in the background.

A new background service "job deleter" keeps track of the queue of such
jobs, and performs the actual deletion. It removes:

- Shaman checkout for the job (but see below)
- Manager-local files of the job (task logs, last-rendered images)
- The job itself

The removal is done in the above order, so the job is only removed from the
database if the rest of the removal was succesful.

Shaman checkouts are only removed if the job was submitted with Flamenco
version 3.2. Earlier versions did not record enough information to reliably
do this.
2023-01-04 01:18:21 +01:00
Sybren A. Stüvel
85d53de1f9 Manager: implement API endpoint for changing job priority
The priority of an existing can now be changed. It will be taken into
account when assigning tasks to workers, but it will not reassign tasks
that are already active.
2022-09-30 16:30:03 +02:00
Sybren A. Stüvel
2a345a3d2c API for deleting workers
Workers can now be soft-deleted. Tasks assigned to the worker will remain
associated with that Worker. Active tasks will be re-queued so other
workers can pick them up.
2022-08-11 16:59:53 -07:00
Sybren A. Stüvel
736ca103c3 Manager: show current/last task in worker details
The Task details component already linked to the Worker it was assigned
to last, and now the Worker links back to the task.

There's only one task shown in the Worker details. If the Worker is
actively working on a task, that one's shown. Otherwise it's the
last-updated task that was assigned to the worker.
2022-07-26 10:36:02 +02:00
Francesco Siddi
9948fdab71 Rename First Time Wizard to Setup Assistant
This commit does not introduce functional changes, besides renaming
every mention of 'wizard' with 'setup assistant'. In order to run the
manager setup assistant use:

./flamenco-manager -setup-assistant

The change was introduced to favor more neutral and descriptive working
for this functionality. Thanks to Sybren for helping to get this done!
2022-07-25 17:17:04 +02:00
Sybren A. Stüvel
bfd6746f78 Manager: consult the sleep schedule on worker sign-on
If there is no status change queued for the Worker, the sleep schedule
should determine its initial status.
2022-07-18 18:25:24 +02:00
Sybren A. Stüvel
d7b164133a Sleep Scheduler implementation for the Manager
The Manager now has a sleep scheduler for Workers. The API and background
service work, but there is no web interface yet.

Manifest Task: T99397
2022-07-17 17:27:32 +02:00
Sybren A. Stüvel
627996525e Manager: implement operations for getting & setting worker sleep schedule
This is just the API, no web interface yet.

Manifest Task: T99397
2022-07-16 16:00:25 +02:00
Sybren A. Stüvel
726129446d T99730: Allow access to full task log
The web interface has a button that opens the task log in a new window.
This might need some restyling ;-)
2022-07-16 12:55:41 +02:00
Sybren A. Stüvel
686295090b Manager: implement endpoint for getting the full task log
Previously only the log tail was available, which is fine for many cases,
but for serious debugging the entire log is needed.

Manifest task: T99730
2022-07-16 11:13:31 +02:00
Sybren A. Stüvel
10f56148d4 Allow saving configuration from the first-time wizard
This just updates the config and saves it to `flamenco-manager.yaml`.

Saving the configuration doesn't restart the Manager yet, that's for
another commit.
2022-07-14 17:27:17 +02:00
Sybren A. Stüvel
aec5ee49e0 First-Time Wizard: allow selecting Blender executables
The wizard now finds Blender in various ways, and lets the user select
which one to use.

Doesn't save anything yet, though.
2022-07-14 12:22:56 +02:00
Sybren A. Stüvel
aa9837b5f0 First incarnation of the first-time wizard
This adds a `-wizard` CLI option to the Manager, which opens a webbrowser
and shows the First-Time Wizard to aid in configuration of Flamenco.

This is work in progress. The wizard is just one page, and doesn't save
anything yet to the configuration.
2022-07-14 11:17:03 +02:00
Sybren A. Stüvel
6b5f9317cb Manager: clear job's blocklist when requeueing the job
Requeueing a job means that the issues that caused workers to get blocked
might be resolved, so it should be run with a clean slate.
2022-07-14 11:03:11 +02:00
Sybren A. Stüvel
d25151184d Add a "Last Rendered" view
Add a "Last Rendered" view to the webapp.

The Manager now stores (in the database) which job was the last
recipient of a rendered image, and serves that to the appropriate
OpenAPI endpoint.

A new SocketIO subscription + accompanying room makes it possible for
the web interface to receive all rendered images (if they survive the
queue, which discards images when it gets too full).
2022-07-01 12:34:40 +02:00
Sybren A. Stüvel
2457a63518 Manager: Show "nothing rendered yet" image in job details
Show a "nothing rendered yet" image in the job details when there is no
last-rendered image yet.
2022-06-30 19:20:19 +02:00
Sybren A. Stüvel
0fc5ba0bc6 Manager: broadcast last-rendered image info via SocketIO
After processing an image in the "last-rendered" processor, a SocketIO
object is sent to clients to indicate the last-rendered image needs to
be (re)loaded.

This also moves the previously existing "done callback" from a single
function to a per-image callback, so that it can be called with the
right information in there, and only when that particular image is
actually done processing.

The notification message sent via SocketIO also contains the necessary
info to render the image, so that the web client doesn't have to call
the `fetchJobLastRenderedInfo` operation.
2022-06-30 18:36:24 +02:00
Sybren A. Stüvel
6efd67b05c Manager: implement FetchJobLastRenderedInfo() API operation
Allow querying for the URL & available versions of a job's last-rendered
image.
2022-06-28 17:08:00 +02:00
Sybren A. Stüvel
64512c81ba Manager: implement OAPI operations to fetch blocklist & delete items 2022-06-27 11:32:35 +02:00
Sybren A. Stüvel
e687c95e5d Manager: add "last rendered image" processing pipeline
Add a handler for the OpenAPI `taskOutputProduced` operation, and an
image thumbnailing goroutine.

The queue of images to process + the function to handle queued images
is managed by `last_rendered.LastRenderedProcessor`. This queue currently
simply allows 3 requests; this should be improved such that it keeps
track of the job IDs as well, as with the current approach a spammy job
can starve the updates from a more calm job.
2022-06-24 16:51:11 +02:00
Sybren A. Stüvel
046853932d Manager: re-queue previously failed tasks of worker when blocklisting
When a Worker is blocked from a job, re-queue its previously failed tasks
so that other workers can give them a try.
2022-06-17 15:49:16 +02:00
Sybren A. Stüvel
b95bed1f96 Refactor: rename RequeueTasksOfWorker to RequeueActiveTasksOfWorker
Soon there will be another function to requeue tasks of workers by other
criteria, so being clear in the name helps.

No functional changes.
2022-06-17 15:49:16 +02:00
Sybren A. Stüvel
fd31a85bcd Manager: add blocking of workers when they fail certain tasks too much
When a worker fails too many tasks, of the same task type, on the same job,
it'll get blocked from doing those.
2022-06-17 15:49:16 +02:00
Sybren A. Stüvel
81f81d0e0a Show task failure list in the web frontend
Show the task failure list in the web frontend's `TaskDetails` component.
2022-06-17 11:37:56 +02:00
Sybren A. Stüvel
0b5140fc5f Manager: clear task failure list on requeueing of jobs & tasks
When a job or task gets requeued from the web interface, its task
failure lists (i.e. the list of workers that previously failed this
task) will be cleared.

This clearing doesn't happen in other situations, e.g. when a worker
signs off and its task gets requeued, the task's failure list will
remain as-is.
2022-06-17 11:37:28 +02:00
Sybren A. Stüvel
6e12a2fb25 Manager: keep track of which worker failed which task
When a Worker indicates a task failed, mark it as `soft-failed` until
enough workers have tried & failed at the same task.

This is the first step in a blocklisting system, where tasks of an
often-failing worker will be requeued to be retried by others.

NOTE: currently the failure list of a task is NOT reset whenever it is
requeued! This will be implemented in a future commit, and is tracked in
`FEATURES.md`.
2022-06-13 18:41:38 +02:00
Sybren A. Stüvel
5dac3c2dc0 Manager: mark workers as 'seen' when they send updates
Update the 'last seen at' timestamp of workers when they:
- sign on
- sign off
- get a task assigned
- send a task update
- check whether they can keep running their task

Note that this commit is necessary to not have the workers time out
immediately ;-)
2022-06-13 12:47:07 +02:00
Sybren A. Stüvel
c3525c3b1a Manager: move task requeueing to TaskStateMachine
Requeueing the tasks of a specific worker is now done in the
`TaskStateMachine`, such that it can be called from other services as
well in future commits.

This also makes the `LogStorage` service a dependency of the
`TaskStateMachine`, as it needs to write "this task was requeued" kind
of messages to the task logs.
2022-06-13 12:33:01 +02:00
Sybren A. Stüvel
24204084c1 Manager: move timestamping of log messages to task_logs package
In the future different services will write to the task log, and thus
it makes sense to move the responsibility of prepending the timestamps
to the log storage service.
2022-06-09 17:00:38 +02:00
Sybren A. Stüvel
819cad1d18 Manager: move broadcasting of task logs via SocketIO to task log service
To ensure all task logs also get broadcast via SocketIO, the responsibility
has moved from the `api_impl` to the `task_logs` package.
2022-06-09 16:49:48 +02:00
Sybren A. Stüvel
75903a2da3 Manager: prepend timestamp to "task assigned to worker" task log entries
Add a new `clock` service to the Flamenco struct, which allows us to mock
the passing of time, and thus test for timestamps in a stable fashion.
2022-06-09 11:24:02 +02:00
Sybren A. Stüvel
b4d2fc4231 Manager: keep track of when a Worker last worked on a task
This will be used for keeping track of stuck tasks.
2022-06-03 16:33:50 +02:00
Sybren A. Stüvel
0be1ca30dd Cleanup: manager, move api_impl interfaces to interfaces.go
The number of interfaces declared by the `api_impl` package is getting
large, so they deserve their own file.

No functional changes.
2022-06-03 15:52:07 +02:00