45 Commits

Author SHA1 Message Date
Sybren A. Stüvel
2a345a3d2c API for deleting workers
Workers can now be soft-deleted. Tasks assigned to the worker will remain
associated with that Worker. Active tasks will be re-queued so other
workers can pick them up.
2022-08-11 16:59:53 -07:00
Sybren A. Stüvel
859a261b05 Manager: on deletion of a worker, do not cascade to deletion of its tasks
Fix an issue where deleting a Worker would also delete the tasks it was
assigned to.
2022-07-15 17:00:25 +02:00
Sybren A. Stüvel
1fceae3604 Manager: more efficient database queries
Be more selective in what's saved to the database to speed some things up.
Most importantly, this avoids saving the entire job when a task status is
updated or a task is assigned.
2022-07-15 15:08:00 +02:00
Sybren A. Stüvel
1055aabee2 Manager: optimise db.SaveActivity() query
Use an explicit `Select()` GORM call to avoid saving related objects.
2022-07-15 15:08:00 +02:00
Sybren A. Stüvel
6e28271c93 Manager: prevent saving related job & worker when "touching" task 2022-07-15 15:08:00 +02:00
Sybren A. Stüvel
64c8fa851d Show assigned worker in task details
Show the worker assigned to the task in the task details view, as link
to the worker itself.
2022-06-17 16:36:55 +02:00
Sybren A. Stüvel
046853932d Manager: re-queue previously failed tasks of worker when blocklisting
When a Worker is blocked from a job, re-queue its previously failed tasks
so that other workers can give them a try.
2022-06-17 15:49:16 +02:00
Sybren A. Stüvel
81f81d0e0a Show task failure list in the web frontend
Show the task failure list in the web frontend's `TaskDetails` component.
2022-06-17 11:37:56 +02:00
Sybren A. Stüvel
0b5140fc5f Manager: clear task failure list on requeueing of jobs & tasks
When a job or task gets requeued from the web interface, its task
failure lists (i.e. the list of workers that previously failed this
task) will be cleared.

This clearing doesn't happen in other situations, e.g. when a worker
signs off and its task gets requeued, the task's failure list will
remain as-is.
2022-06-17 11:37:28 +02:00
Sybren A. Stüvel
c5debdeb70 Manager: add 'task failure list' to record workers failing tasks
The persistence layer can now store which worker failed which task, as
preparation for a blocklisting system. Such a system should be able to
determine whether there are still any workers left to do the work.
2022-06-13 18:41:30 +02:00
Sybren A. Stüvel
e35911d106 Manager: add ability to delete jobs
This is needed for a future unit test, and exposed the fact that SQLite
didn't enforce foreign key constraints (and thus also didn't handle
on-delete-cascade attributes). This has been fixed in the previous commit.
2022-06-13 18:41:19 +02:00
Sybren A. Stüvel
6ec493d944 Manager, more efficiently create tasks
When creating tasks the inter-task dependencies are saved as a 2nd pass,by
updating the tasks in the database. This now only saves those dependencies,
and no longer saves the entire task again.
2022-06-13 18:40:42 +02:00
Sybren A. Stüvel
02bc03ae2b Manager: replace gorm.Model with our own persistence.Model struct
`persistence.Model` contains the common database fields for most model
structs. It is a copy of `gorm.Model`, but without the `DeletedAt`
field (which triggers Gorm's soft deletion).

Soft deletion is not used by Flamenco. If it ever becomes necessary to
support soft-deletion, see https://gorm.io/docs/delete.html#Soft-Delete
2022-06-13 18:40:42 +02:00
Sybren A. Stüvel
67562856d3 Manager: let Gorm create an index on Task.LastTouchedAt
It's used in timeout queries, and there could be tens or hundreds of
thousands of tasks in the database.
2022-06-13 12:33:05 +02:00
Sybren A. Stüvel
d90a8b987d Manager: Task Timeout Checker
Tasks that are in state `active` but haven't been 'touched' by a Worker
for 10 minutes or longer will transition to state `failed`.

In the future, it might be better to move the decision about which state
is suitable to the Task State Machine service, so that it can be smarter
and take the history of the task into account. Going to `soft-failed`
first might be a nice touch.
2022-06-10 14:32:02 +02:00
Sybren A. Stüvel
b4d2fc4231 Manager: keep track of when a Worker last worked on a task
This will be used for keeping track of stuck tasks.
2022-06-03 16:33:50 +02:00
Sybren A. Stüvel
530520b1c7 Implement mass updating of tasks when JobUpdate.refresh_tasks = true
Send & handle `JobUpdate.refresh_tasks = true` when many tasks are
updated simultaneously. This applies to things like cancelling &
requeueing an entire job.

This partially rolls back 67bf77de13d99b1bc5d7344951068822c4fadd88, as
it was too slow when 1000+ tasks were being updated all at once.
2022-05-17 14:48:50 +02:00
Sybren A. Stüvel
d673da7a0c Manager: check for stuck jobs at startup
Check for jobs in 'cancel-requested' or 'requeued' statuses, and ensure
they transition to the right status. This happens at startup, before
even starting the web interface, so that a consistent state is presented.
2022-05-06 16:07:27 +02:00
Sybren A. Stüvel
ba34652cd1 Implement task status changes from web interface
This also reworks some of the logic due to the recently-removed
`cancel-requested` task status.
2022-05-05 16:44:09 +02:00
Sybren A. Stüvel
67bf77de13 Manager: rework mass updates to task statuses
When the job status changes, it impacts the task statuses as well. These
status changes are now no longer done with a single database query, but
instead each affected task is fetched, changed, and saved. This unifies
the regular & mass updates to the tasks, and causes the resulting task
changes to be broadcast to SocketIO clients.
2022-05-03 16:13:44 +02:00
Sybren A. Stüvel
bb68488c5e Cleanup: Manager, add bit of documentation 2022-05-03 10:39:44 +02:00
Sybren A. Stüvel
d79fde17f3 Manager: keep track of the reason of job status changes
To prepare for job status changes being requestable from the API, store
the reason for any status change on the job itself.

Not yet part of the API, just on the persistence layer.
2022-04-21 12:32:07 +02:00
Sybren A. Stüvel
c3b694ab2a Manager: wrap job/task errors in persistence layer
Avoid users of the persistence layer to have to test against Gorm errors,
by wrapping job/task errors in a new `PersistenceError` struct.

Instead of testing for `gorm.ErrRecordNotFound`, code can now test for
`persistence.ErrJobNotFound` or `persistence.ErrTaskNotFound`.
2022-04-21 11:54:59 +02:00
Sybren A. Stüvel
9f5e4cc0cc License: license all code under "GPL-3.0-or-later"
The add-on code was copy-pasted from other addons and used the GPL v2
license, whereas by accident the LICENSE text file had the GNU "Affero" GPL
license v3 (instead of regular GPL v3).

This is now all streamlined, and all code is licensed as "GPL v3 or later".

Furthermore, the code comments just show a SPDX License Identifier
instead of an entire license block.
2022-03-07 15:26:46 +01:00
Sybren A. Stüvel
cd2fe8170e Errors: remove "error" prefix from message
Instead of returning an error "error doing X", just return "doing X". The
fact that it's returned as an error object says enough about that it's
an error.

This also makes it easier to chain error messages, without seeing the
word "error" in every part of the chain.
2022-03-04 11:30:31 +01:00
Sybren A. Stüvel
2b04623e00 Manager: fix DB transaction isolation issue in task scheduler
The created transaction wasn't actually used for the should-be-in-the-
transaction queries. That's now resolved.
2022-03-03 13:46:27 +01:00
Sybren A. Stüvel
9643bf768e Manager: Fix DB migration error of not-null columns
Where the PostgreSQL DB migration code could handle `NOT NULL` columns just
fine, SQLite has less table-altering functionality. As a result, migrations
have to copy entire database tables, which doesn't play well with
not-nullable columns.
2022-03-03 12:10:13 +01:00
Sybren A. Stüvel
47e36c927c Change package URL to the blender.org repository 2022-03-01 20:45:09 +01:00
Sybren A. Stüvel
7689a988b1 Manager: re-queue tasks of worker when signing off 2022-02-28 12:06:50 +01:00
Sybren A. Stüvel
32af1ffaef Manager: actually pass context to Gorm queries 2022-02-28 11:53:31 +01:00
Sybren A. Stüvel
3d854078ba Manager: integrate task state machine into API implementation 2022-02-25 16:30:27 +01:00
Sybren A. Stüvel
9a5bbb4131 Manager: implement persistence layer interface for task status machine
Implement the functions used by the task status machine in the DB
persistence layer.
2022-02-25 14:34:29 +01:00
Sybren A. Stüvel
7420177209 Manager: use api.JobStatus in persistence layer as well 2022-02-24 11:54:35 +01:00
Sybren A. Stüvel
7e776167bb Manager: use api.TaskStatus in persistence layer as well 2022-02-24 11:53:05 +01:00
Sybren A. Stüvel
90a2140b8c Manager: store task logs to disk 2022-02-21 19:47:07 +01:00
Sybren A. Stüvel
ef2bbd2845 Unified Command field names
Some parts of Flamenco had a Command consist of "name + settings", and
other parts used "type + parameters" (with the same semantics). This is
now unified to "name + parameters".
2022-02-21 18:03:51 +01:00
Sybren A. Stüvel
399c8af750 Correctly handle workers assigned to tasks + simple task updates 2022-02-17 17:30:52 +01:00
Sybren A. Stüvel
c4df62d5d4 Start of sending task updates to Manager
This includes a mocking framework for unittests.
2022-02-15 15:58:24 +01:00
Sybren A. Stüvel
9543d6221b Wrap error with %w 2022-02-15 15:56:54 +01:00
Sybren A. Stüvel
4aafb782ac Scheduler: Assign task to worker 2022-02-14 17:47:26 +01:00
Sybren A. Stüvel
8e01c069d1 DB: task dependencies should be cascade-deleted with their tasks 2022-02-14 15:07:14 +01:00
Sybren A. Stüvel
97ab93d996 Initial task scheduler implementation 2022-02-01 17:17:19 +01:00
Sybren A. Stüvel
fad2dc3042 Job authoring: handling of task dependencies + some bugfixes 2022-02-01 17:17:19 +01:00
Sybren A. Stüvel
e598397ba4 Persistence: move some DB to API struct conversion to API implementation 2022-02-01 16:22:10 +01:00
Sybren A. Stüvel
931fd1a24c Move db/jobs tests to persistence/jobs.go 2022-01-28 14:53:02 +01:00