52 Commits

Author SHA1 Message Date
Sybren A. Stüvel
d10bb0a9d7 Manager: at startup check for stuck pause-requested jobs
When the Manager starts up, it now also checks whether `pause-requested`
jobs can actually go to `paused`.
2025-02-28 12:32:36 +01:00
Sybren A. Stüvel
ff29d2ef26 Manager: fix jobs getting stuck in pause-requested status
When a Worker would sign off while working on a task, and that task's
job would be in `pause-requested` state, it would always re-queue the
task. This in turn would not get detected, which caused the job to get
stuck.

Now tasks correctly go to `paused` when a Worker signs off, and
subsequently the job will be re-checked and go to `paused` when possible
as well.

Note that this does not handle already-stuck jobs. That'll be for the
following commit.
2025-02-28 12:26:01 +01:00
Sybren A. Stüvel
b845189dfc Manager: protect task state machine with a mutex
Protect the public functions of the task state machine with a mutex, so
that only one task/job state change is handled at a time.

This should avoid race conditions.
2025-01-09 15:28:06 +01:00
Sybren A. Stüvel
531a0184f7 Transition from ex-GORM structs to sqlc structs (5/5)
Replace old used-to-be-GORM datastructures (#104305) with sqlc-generated
structs. This also makes it possible to use more specific structs that
are more taylored to the specific queries, increasing efficiency.

This commit deals with the remaining areas, like the job deleter, task
timeout checker, and task state machine. And anything else to get things
running again.

Functional changes are kept to a minimum, as the API still serves the
same data.

Because this work covers so much of Flamenco's code, it's been split up
into different commits. Each commit brings Flamenco to a state where it
compiles and unit tests pass. Only the result of the final commit has
actually been tested properly.

Ref: #104343
2024-12-04 14:00:22 +01:00
Sybren A. Stüvel
84f93e7502 Transition from ex-GORM structs to sqlc structs (2/5)
Replace old used-to-be-GORM datastructures (#104305) with sqlc-generated
structs. This also makes it possible to use more specific structs that
are more taylored to the specific queries, increasing efficiency.

This commit mostly deals with workers, including the sleep schedule and
task scheduler.

Functional changes are kept to a minimum, as the API still serves the
same data.

Because this work covers so much of Flamenco's code, it's been split up
into different commits. Each commit brings Flamenco to a state where it
compiles and unit tests pass. Only the result of the final commit has
actually been tested properly.

Ref: #104343
2024-12-04 14:00:13 +01:00
Sybren A. Stüvel
a9bec98fcd Fix linter warnings
Fix most linter warnings reported by 'staticcheck'. This doesn't fix all
of them, some unused functions are still there, and some generated code
also still triggers some warnings. Most issues are fixed, though.

No functional changes, except for the captialisation of some error
messages.
2024-12-01 14:49:25 +01:00
David Zhang
aac55e7e3c Manager: Support pausing jobs
A job first goes to `pause-requested` status, during which any `active` task
gets a chance to be completed. Once there are no more active tasks, the job
goes to `paused` state (or `failed`, if that is applicable).

Pull request: https://projects.blender.org/studio/flamenco/pulls/104313
2024-07-01 10:59:37 -04:00
Sybren A. Stüvel
1ec65e8c29 Cleanup: fix comment
Fix a comment in a unit test. No functional changes.
2024-06-08 20:55:54 +02:00
Sybren A. Stüvel
b66490831c Manager: fetch jobs of tasks in FetchTasksOfWorkerInStatus()
The task state machine expects that `task.Job` is set correctly. Since
SQLC does not automatically fill this field (and rightfully so), I've added
a bit of Go code that fetches the job in a separate query.

A TODO is added as a reminder that it would be better for the task state
machine itself to fetch the job when needed.
2024-05-28 16:07:17 +02:00
Sybren A. Stüvel
3f4a9025fe Manager tests: replace assert.NoError() with require.NoError()
Back in the days when I wrote the code, I didn't know about the
`require` package yet. Using `require.NoError()` makes the test code
more straight-forward.

No functional changes, except that when tests fail, they now fail
without panicking.
2024-03-16 11:09:18 +01:00
Sybren A. Stüvel
e7c4285ac6 Manager: Adjust code for renaming SocketIO... types to Event...
No functional changes, just adjusting to the OpenAPI renames.
2024-02-05 09:25:43 +01:00
Sybren A. Stüvel
76a24243f0 Manager: Introduce event bus system
Introduce an "event bus"-like system. It's more like a fan-out
broadcaster for certain events. Instead of directly sending events to
SocketIO, they are now sent to the broker, which in turn sends it to any
registered "forwarder". Currently there is ony one forwarder, for
SocketIO.

This opens the door for a proper MQTT client that sends the same events
to an MQTT server.
2024-02-03 22:55:23 +01:00
Sybren A. Stüvel
02fac6a4df Change Go package name from git.blender.org to projects.blender.org
Change the package base name of the Go code, from
`git.blender.org/flamenco` to `projects.blender.org/studio/flamenco`.

The old location, `git.blender.org`, has no longer been use since the
[migration to Gitea][1]. The new package names now reflect the actual
location where Flamenco is hosted.

[1]: https://code.blender.org/2023/02/new-blender-development-infrastructure/
2023-08-01 12:42:31 +02:00
Sybren A. Stüvel
c42665322b Cleanup: add a comment
Just a comment that explains why an error is ignored.

No functional changes.
2022-07-28 14:28:02 +02:00
Sybren A. Stüvel
ab8ecc24cc Cleanup: Add missing license specifiers
Add license specifiers to Go files that were missing them:

```
// SPDX-License-Identifier: GPL-3.0-or-later
```

No functional changes.
2022-07-25 16:08:07 +02:00
Sybren A. Stüvel
d929885b06 Manager: only log task status change if there is an actual change
Don't log "changes" from, say, `active` -> `active`.
2022-07-19 17:47:43 +02:00
Sybren A. Stüvel
ac3236786b Manager: add entry to task log whenever task changes status
Add a line to the task log whenever task changes status. This only applies
to directly-changed tasks, and not to mass-updates (like all tasks going
from 'completed' to 'queued' on a job requeue).
2022-07-19 17:23:13 +02:00
Sybren A. Stüvel
1fceae3604 Manager: more efficient database queries
Be more selective in what's saved to the database to speed some things up.
Most importantly, this avoids saving the entire job when a task status is
updated or a task is assigned.
2022-07-15 15:08:00 +02:00
Sybren A. Stüvel
046853932d Manager: re-queue previously failed tasks of worker when blocklisting
When a Worker is blocked from a job, re-queue its previously failed tasks
so that other workers can give them a try.
2022-06-17 15:49:16 +02:00
Sybren A. Stüvel
b95bed1f96 Refactor: rename RequeueTasksOfWorker to RequeueActiveTasksOfWorker
Soon there will be another function to requeue tasks of workers by other
criteria, so being clear in the name helps.

No functional changes.
2022-06-17 15:49:16 +02:00
Sybren A. Stüvel
b991e5f446 Cleanup: Manager, clarify some function names of the task state machine
Rename functions `onTaskStatusX` to `updateJobOnTaskStatusX` to clarify
their responsibility is to update the job in reaction to a task status
change.

No functional changes.
2022-06-17 11:01:41 +02:00
Sybren A. Stüvel
7d7c2b1bd6 Cleanup: blacklist → blocklist
Change "blacklist" to "blocklist", because that makes people happier.

No functional changes.
2022-06-16 10:36:36 +02:00
Sybren A. Stüvel
02bc03ae2b Manager: replace gorm.Model with our own persistence.Model struct
`persistence.Model` contains the common database fields for most model
structs. It is a copy of `gorm.Model`, but without the `DeletedAt`
field (which triggers Gorm's soft deletion).

Soft deletion is not used by Flamenco. If it ever becomes necessary to
support soft-deletion, see https://gorm.io/docs/delete.html#Soft-Delete
2022-06-13 18:40:42 +02:00
Sybren A. Stüvel
c3525c3b1a Manager: move task requeueing to TaskStateMachine
Requeueing the tasks of a specific worker is now done in the
`TaskStateMachine`, such that it can be called from other services as
well in future commits.

This also makes the `LogStorage` service a dependency of the
`TaskStateMachine`, as it needs to write "this task was requeued" kind
of messages to the task logs.
2022-06-13 12:33:01 +02:00
Sybren A. Stüvel
e06bc484f4 Cleanup: manager, move task state machine interfaces to their own file
No functional changes.
2022-06-13 12:32:18 +02:00
Sybren A. Stüvel
7b664475ca Rename job status requeued to requeueing 2022-05-19 17:25:53 +02:00
Sybren A. Stüvel
cc62cab1d6 Update code to handle the JobUpdate to SocketIOJobUpdate rename
No functional changes.
2022-05-19 15:18:06 +02:00
Sybren A. Stüvel
b928896066 OAPI: regenerate code 2022-05-19 15:17:19 +02:00
Sybren A. Stüvel
530520b1c7 Implement mass updating of tasks when JobUpdate.refresh_tasks = true
Send & handle `JobUpdate.refresh_tasks = true` when many tasks are
updated simultaneously. This applies to things like cancelling &
requeueing an entire job.

This partially rolls back 67bf77de13d99b1bc5d7344951068822c4fadd88, as
it was too slow when 1000+ tasks were being updated all at once.
2022-05-17 14:48:50 +02:00
Sybren A. Stüvel
0b39f229a1 Implement may-I-keep-running protocol
Worker and Manager implementation of the "may-I-kee-running" protocol.

While running tasks, the Worker will ask the Manager periodically
whether it's still allowed to keep running that task. This allows the
Manager to abort commands on Workers when:

- the Worker should go to another state (typically 'asleep' or
  'shutdown'),
- the task changed status from 'active' to something non-runnable
  (typically 'canceled' when the job as a whole is canceled).
- the task has been assigned to a different Worker. This can happen when
  a Worker loses its connection to its Manager, resulting in a task
  timeout (not yet implemented) after which the task can be assigned to
  another Worker. If then the connectivity is restored, the first Worker
  should abort (last-assigned Worker wins).
2022-05-12 15:06:05 +02:00
Sybren A. Stüvel
d673da7a0c Manager: check for stuck jobs at startup
Check for jobs in 'cancel-requested' or 'requeued' statuses, and ensure
they transition to the right status. This happens at startup, before
even starting the web interface, so that a consistent state is presented.
2022-05-06 16:07:27 +02:00
Sybren A. Stüvel
d008991bf4 Revert "Manager: broadcast job/task updates in a separate goroutine"
This reverts commit cd28ef552e2476dda68ba671436b805d7b32a655, as it
breaks the unit tests and I don't want to spend the time to fix those.
2022-05-06 14:48:16 +02:00
Sybren A. Stüvel
cd28ef552e Manager: broadcast job/task updates in a separate goroutine 2022-05-06 12:27:10 +02:00
Sybren A. Stüvel
ba34652cd1 Implement task status changes from web interface
This also reworks some of the logic due to the recently-removed
`cancel-requested` task status.
2022-05-05 16:44:09 +02:00
Sybren A. Stüvel
23680c27bf OAPI: regenerate code 2022-05-05 16:36:38 +02:00
Sybren A. Stüvel
67bf77de13 Manager: rework mass updates to task statuses
When the job status changes, it impacts the task statuses as well. These
status changes are now no longer done with a single database query, but
instead each affected task is fetched, changed, and saved. This unifies
the regular & mass updates to the tasks, and causes the resulting task
changes to be broadcast to SocketIO clients.
2022-05-03 16:13:44 +02:00
Sybren A. Stüvel
50c8cd39f2 Task update notifications via SocketIO
Manager now sends out task updates via SocketIO, and the web interface
handles those.

Note that there is a `BroadcastTaskUpdate()` function, but not a
`BroadcastNewTask`. The 'new job' broadcast is sent after the job's
tasks have been created, and thus there is no need for a separate
broadcast per task.
2022-05-03 11:26:24 +02:00
Sybren A. Stüvel
d79fde17f3 Manager: keep track of the reason of job status changes
To prepare for job status changes being requestable from the API, store
the reason for any status change on the job itself.

Not yet part of the API, just on the persistence layer.
2022-04-21 12:32:07 +02:00
Sybren A. Stüvel
89e130c04f Manager: update tests for inclusion of job name in job updates 2022-04-08 11:59:30 +02:00
Sybren A. Stüvel
df3f7b44b9 Hook up web interface to job updates 2022-04-07 18:46:09 +02:00
Sybren A. Stüvel
0c0df41f5d Job status change system for SocketIO broadcasts
Not fully tested yet.
2022-04-05 15:52:55 +02:00
Sybren A. Stüvel
9f5e4cc0cc License: license all code under "GPL-3.0-or-later"
The add-on code was copy-pasted from other addons and used the GPL v2
license, whereas by accident the LICENSE text file had the GNU "Affero" GPL
license v3 (instead of regular GPL v3).

This is now all streamlined, and all code is licensed as "GPL v3 or later".

Furthermore, the code comments just show a SPDX License Identifier
instead of an entire license block.
2022-03-07 15:26:46 +01:00
Sybren A. Stüvel
cd2fe8170e Errors: remove "error" prefix from message
Instead of returning an error "error doing X", just return "doing X". The
fact that it's returned as an error object says enough about that it's
an error.

This also makes it easier to chain error messages, without seeing the
word "error" in every part of the chain.
2022-03-04 11:30:31 +01:00
Sybren A. Stüvel
47e36c927c Change package URL to the blender.org repository 2022-03-01 20:45:09 +01:00
Sybren A. Stüvel
7e5a631f33 Cleanup: refactor updateJobAfterTaskStatusChange()
Break up a complex function into smaller functions.
2022-02-28 12:50:34 +01:00
Sybren A. Stüvel
3d854078ba Manager: integrate task state machine into API implementation 2022-02-25 16:30:27 +01:00
Sybren A. Stüvel
9a5bbb4131 Manager: implement persistence layer interface for task status machine
Implement the functions used by the task status machine in the DB
persistence layer.
2022-02-25 14:34:29 +01:00
Sybren A. Stüvel
7279f2e35f Manager: task state machine, handle job status -> task status changes
Direct copy of the Flamenco Server Python code, for handling the change
of a job's status to trigger status changes on its tasks.

Not yet connected to the rest of the Manager logic.
2022-02-25 12:30:40 +01:00
Sybren A. Stüvel
e8707171b4 Manager: add test for saving task status change with error on job save 2022-02-25 11:07:06 +01:00
Sybren A. Stüvel
d2f1cf3614 Cleanup: move test helper functions down in the file
This way it starts with the actual tests.
2022-02-25 11:02:47 +01:00