Due to an issue (which has been fixed in the previous commit), all tasks
in the database were deleted when starting Flamenco. This tool attempts
to recompile the job and recreate its tasks.
The statuses of the tasks are set based on the job status. Basically:
- job active → tasks queued
- job completed → tasks completed
- job cancelled / failed → tasks cancelled
- otherwise → tasks queued
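A minimal sketch of that mapping (plain strings for brevity; Flamenco's
actual code uses typed status constants from its OpenAPI package):

```go
// Map a job status to the status its recreated tasks should get.
func taskStatusForJob(jobStatus string) string {
	switch jobStatus {
	case "completed":
		return "completed"
	case "cancelled", "failed":
		return "cancelled"
	default: // "active" and anything else
		return "queued"
	}
}
```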
To ensure that the tool is only used to create tasks from scratch, it
refuses to work on a job that still has tasks in the database.
There is an issue with the GORM auto-migration, in that it doesn't
always disable foreign key constraints when it should. Due to
limitations of SQLite, not all 'alter table' commands you'd want to use
are available. As a workaround, these steps are performed:
1. create a new table with the desired schema,
2. copy the data over,
3. drop the old table,
4. rename the new table to the old name.
Step #3 will wreak havoc with the database when foreign key constraint
checks are active, so we temporarily deactivate them while performing
the database migration.
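As an illustration, the rebuild looks roughly like this. The `things`
table and its columns are made up for the example; note that `PRAGMA
foreign_keys` is a no-op inside a transaction, so it has to run outside
of one:

```sql
PRAGMA foreign_keys = OFF;

-- 1. create a new table with the desired schema
CREATE TABLE things_new (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL DEFAULT ''
);
-- 2. copy the data over
INSERT INTO things_new (id, name) SELECT id, name FROM things;
-- 3. drop the old table (havoc if foreign key checks were active)
DROP TABLE things;
-- 4. rename the new table to the old name
ALTER TABLE things_new RENAME TO things;

PRAGMA foreign_keys = ON;
```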
Mark the default value of `Worker.LazyStatusRequest` as `false`. The
previous default was configured as `0`, which was different enough to
always trigger a database migration of that column. However, since these
values map to each other, the migration effectively changed nothing, and
would simply be triggered again at the next startup.
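In GORM terms the fix boils down to a struct tag; this is a simplified
sketch, not the actual model:

```go
// Only the tag on LazyStatusRequest matters here. Declaring the
// default as `false` matches what GORM reads back from the SQLite
// schema, so auto-migration no longer sees a difference.
type Worker struct {
	// ... other fields omitted ...
	LazyStatusRequest bool `gorm:"default:false"`
}
```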
As it was decided that the name "tags" would be better for the clarity
of the feature, all files and code named "cluster" or "worker cluster"
have been removed and replaced with "tag" and "worker tag". This is only
a name change, no other features were touched.
This addresses part of #104204.
Reviewed-on: https://projects.blender.org/studio/flamenco/pulls/104223
As a note to anyone who already ran a pre-release version of Flamenco
and configured some worker clusters, with the help of an SQLite client
you can migrate the clusters to tags. First build Flamenco Manager and
start it, to create the new database schema. Then run these SQL queries
via an SQLite command-line client:
```sql
insert into worker_tags
(id, created_at, updated_at, uuid, name, description)
select id, created_at, updated_at, uuid, name, description
from worker_clusters;
insert into worker_tag_membership (worker_tag_id, worker_id)
select worker_cluster_id, worker_id from worker_cluster_membership;
```
Perform these two SQL calls and check their results:
- `PRAGMA integrity_check`
- `PRAGMA foreign_key_check`
See https://www.sqlite.org/pragma.html for more info on these.
This also removes the unused `PeriodicMaintenanceLoop()` function.
Periodic checking while Flamenco Manager is running might be introduced
in a future commit, after the startup-time checks have been shown to not
get in the way.
When an error is detected at startup, close the database connection.
Previously, the connection could remain open even when an error was
returned, leaving the write-ahead log files behind. These are now
properly integrated into the main database file before exiting.
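A minimal sketch of the pattern, with hypothetical helper names:

```go
// openDB and performIntegrityCheck are illustrative names; the point
// is closing the connection on the error path, which checkpoints the
// -wal/-shm files back into the main database file.
db, err := openDB(ctx, dsn)
if err != nil {
	return nil, err
}
if err := performIntegrityCheck(ctx, db); err != nil {
	db.Close() // previously skipped, leaving WAL files behind
	return nil, err
}
return db, nil
```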
Upgrade:
- `gorm.io/gorm` v1.23.8 → v1.25.2
- `github.com/glebarez/go-sqlite` v1.17.3 → v1.8.0
- `github.com/glebarez/sqlite` v1.4.6 → v1.8.0
along with some indirect dependencies.
This is in the hope that some weird cases at Blender Studio get resolved.
It appears that sometimes, for some unknown reason, when deleting a job,
its tasks get reassigned to another job (instead of also getting deleted).
Since there is no code in Flamenco itself to do this task deletion (it's
all depending on SQLite following the foreign keys and cascading to tasks),
I hope it was a bug in either GORM or SQLite that got fixed at some point.
Add an extra job to the database before deleting one, to ensure that job
deletion doesn't affect other jobs (and their tasks).
No functional changes to Flamenco itself.
Workers can be soft-deleted, which means that they stay in the database.
As such, foreign key constraints `ON DELETE CASCADE` do not trigger, and
thus their sleep schedule can still be active. This is now detected and
handled gracefully.
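A sketch of the detection, assuming GORM's soft-delete field; the
surrounding names are made up:

```go
// A soft-deleted Worker keeps its database row, so the sleep
// scheduler has to skip it explicitly instead of relying on
// ON DELETE CASCADE having removed the schedule.
for _, worker := range workersWithSleepSchedule {
	if worker.DeletedAt.Valid { // gorm.DeletedAt: soft-deleted
		continue
	}
	applySleepSchedule(ctx, &worker)
}
```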
The Shaman Checkout ID setter shouldn't update a job's "updated at"
timestamp. Its goal is to fake that the job was submitted with a new
enough Flamenco version, and thus should not touch the timestamps.
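A minimal sketch, assuming GORM: `UpdateColumn` skips hooks and does
not track the `UpdatedAt` timestamp, unlike `Save` or `Updates`:

```go
// Set the Shaman checkout ID without touching the job's timestamps.
// The Job model and column name are assumptions for this sketch.
func setShamanCheckoutID(db *gorm.DB, jobUUID, checkoutID string) error {
	tx := db.Model(&Job{}).
		Where("uuid = ?", jobUUID).
		UpdateColumn("shaman_checkout_id", checkoutID)
	return tx.Error
}
```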
This is a command that can be run to retroactively set the Shaman
Checkout ID of jobs, allowing the job deletion to also remove the job's
Shaman checkout directory.
This is highly experimental, and not built by default or shipped with
Flamenco releases. It's only been used once at Blender Animation Studio
to help clean up. Run at your own risk. Make backups first.
- Add a little confirmation overlay before deleting a job. This overlay
also shows information about whether the Shaman checkout directory
will be deleted or not.
- Send job updates to the web frontend when jobs are marked for
deletion, and when they are actually deleted.
- Respond to those updates, and handle some corner cases where job info
is missing (because it just got deleted).
This closes T99401.
Implement the `deleteJob` API endpoint. Calling this endpoint will mark
the job as "deletion requested", after which it's queued for actual
deletion. This makes the API response fast, even when there is a lot of
work to do in the background.
A new background service "job deleter" keeps track of the queue of such
jobs, and performs the actual deletion. It removes:
- Shaman checkout for the job (but see below)
- Manager-local files of the job (task logs, last-rendered images)
- The job itself
The removal is done in the above order, so the job is only removed from
the database if the rest of the removal was successful.
Shaman checkouts are only removed if the job was submitted with Flamenco
version 3.2. Earlier versions did not record enough information to reliably
do this.
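The ordering can be sketched like this; the function names are
illustrative, not Flamenco's actual API:

```go
// Delete external data first; if any step fails, the job stays in
// the database and the deletion can be retried later.
func deleteJob(ctx context.Context, job *Job) error {
	if job.ShamanCheckoutID != "" { // only recorded by version 3.2+
		if err := removeShamanCheckout(ctx, job); err != nil {
			return err
		}
	}
	// Task logs, last-rendered images.
	if err := removeLocalFiles(ctx, job); err != nil {
		return err
	}
	return removeFromDatabase(ctx, job)
}
```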
If Shaman is used to submit the job files, store the job's checkout ID
(i.e. the path relative to the checkout root) in the database. This will
make it possible in the future to remove the Shaman checkout along with
the job itself.
Run `PRAGMA journal_mode = WAL` and `PRAGMA synchronous = normal` when
connecting to the SQLite database. This enables the write-ahead-log journal
mode, which makes it safe to enable "normal" synchronisation (instead of
the default "full" synchronisation).
The priority of an existing job can now be changed. The new priority is
taken into account when assigning tasks to workers, but tasks that are
already active will not be reassigned.
When the Manager was shutting down while the sleep scheduler was running, it
could cause a null pointer dereference. This is now doubly solved:
- `worker.Identifier()` is now nil-safe, meaning `worker` can be `nil`
  and it will still return a sensible string (see the sketch below).
- failure to apply the sleep schedule due to the context closing is no
  longer logged as an error.
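The nil-safe method boils down to something like this sketch (the
`Worker` fields and the fallback string are made up, and `fmt` is
assumed to be imported):

```go
// Identifier is nil-safe: calling it on a nil *Worker returns a
// sensible string instead of panicking.
func (w *Worker) Identifier() string {
	if w == nil {
		return "-no worker-"
	}
	return fmt.Sprintf("%s (%s)", w.Name, w.UUID)
}
```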
Workers can now be soft-deleted. Tasks assigned to the worker will remain
associated with that Worker. Active tasks will be re-queued so other
workers can pick them up.
The Task details component already linked to the Worker it was assigned
to last, and now the Worker links back to the task.
There's only one task shown in the Worker details. If the Worker is
actively working on a task, that one's shown. Otherwise it's the
last-updated task that was assigned to the worker.
Be more selective in what's saved to the database to speed some things up.
Most importantly, this avoids saving the entire job when a task status is
updated or a task is assigned.
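For example, a task status change can be written as a narrow update
instead of saving the whole struct; a sketch, assuming GORM:

```go
// Select limits the UPDATE statement to the listed columns, so the
// rest of the Task (and its Job) is left untouched.
func saveTaskStatus(db *gorm.DB, task *Task) error {
	tx := db.Model(task).
		Select("status", "updated_at").
		Updates(task)
	return tx.Error
}
```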
Add a "Last Rendered" view to the webapp.
The Manager now stores (in the database) which job was the last
recipient of a rendered image, and serves that to the appropriate
OpenAPI endpoint.
A new SocketIO subscription + accompanying room makes it possible for
the web interface to receive all rendered images (if they survive the
queue, which discards images when it gets too full).
When a job or task gets requeued from the web interface, its task
failure lists (i.e. the list of workers that previously failed this
task) will be cleared.
This clearing doesn't happen in other situations, e.g. when a worker
signs off and its task gets requeued, the task's failure list will
remain as-is.
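A sketch of the clearing step, with assumed model and column names:

```go
// Remove all failure records of the task being requeued. This only
// runs for requeues triggered from the web interface.
func clearFailureList(db *gorm.DB, task *Task) error {
	tx := db.Where("task_id = ?", task.ID).Delete(&TaskFailure{})
	return tx.Error
}
```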