Various improvements to the logging of the job deletion:
- Reduce the log level of the "removing logs" and "removing job from
database" lines from INFO to DEBUG, so that only one line of INFO is
logged per deleted job
- Show size of the queue and the check interval in the "job deletion
queue is full" log message.
When queueing up jobs to be deleted, log how many deletions remain to be
picked up later. Once a minute the database is checked for such deletion
requests, so the next batch will be scheduled in a minute.
Workers can be soft-deleted, which means that they stay in the database.
As such, foreign key constraints `ON DELETE CASCADE` do not trigger, and
thus their sleep schedule can still be active. This is now detected and
handled gracefully.
The Shaman Checkout ID setter shouldn't update a job's "updated at"
timestamp. Its goal is to fake that the job was submitted with a new
enough Flamenco version, and thus should not touch the timestamps.
This is a command that can be run to retroactively set the Shaman
Checkout ID of jobs, allowing the job deletion to also remove the job's
Shaman checkout directory.
This is highly experimental, and not built by default or shipped with
Flamenco releases. It's only been used once at Blender Animation Studio
to help cleaning up. Run at your own risk. Make backups first.
- Add a little confirmation overlay before deleting a job. This overlay
also shows information about whether the Shaman checkout directory
will be deleted or not.
- Send job updates to the web frontend when jobs are marked for
deletion, and when they are actually deleted.
- Respond to those updates, and handle some corner cases where job info
is missing (because it just got deleted).
This closes T99401.
Show jobs that have been marked for deletion with a red strike-through
line in the jobs table, and show the deletion-request timestamp in the
job details.
Implement the `deleteJob` API endpoint. Calling this endpoint will mark
the job as "deletion requested", after which it's queued for actual
deletion. This makes the API response fast, even when there is a lot of
work to do in the background.
A new background service "job deleter" keeps track of the queue of such
jobs, and performs the actual deletion. It removes:
- Shaman checkout for the job (but see below)
- Manager-local files of the job (task logs, last-rendered images)
- The job itself
The removal is done in the above order, so the job is only removed from the
database if the rest of the removal was succesful.
Shaman checkouts are only removed if the job was submitted with Flamenco
version 3.2. Earlier versions did not record enough information to reliably
do this.
If Shaman is used to submit the job files, store the job's checkout ID
(i.e. the path relative to the checkout root) in the database. This will
make it possible in the future to remove the Shaman checkout along with
the job itself.
Manager had a limit of 10 MB, but the Worker can produce images that are
larger than that (even after down-scaling the image). I've bumped the
limit to 25 MB, which should be enough (it's 2x the bug reporter's file
size).
Add a timeout when fetching a job from the persistence layers.
It's my intention to add more timeouts, so this also introduces some code
to make it easier to test that a context has a deadline set.
Deduplicate API implementation code to fetch a job from the persistence
service.
Almost no functional changes. Checking that the requested job UUID is
actually a valid UUID is now consistently done on all fetches. This is
not a functional change in normal Flamenco operations, where only valid
UUIDs are used anyway.
Run `PRAGMA journal_mode = WAL` and `PRAGMA synchronous = normal` when
connecting to the SQLite database. This enables the write-ahead-log journal
mode, which makes it safe to enable "normal" synchronisation (instead of
the default "full" synchronisation).
The preview video task would attempt to use JPEG preview files when the
"Preview" checkbox is checked, even when this checkbox is not shown in
Blender's UI (when the output format is not EXR). Blender still stores
this property, even when it's unused, and that's what tripped up the job
compiler.
The "Simple Blender Render" job type now first checks whether the previews
are necessary at all, and only then uses them.
The two-way variable replacement function changes the submitted job. To
clarify that this happens, pass the pointer `&submittedJob`.
Both pass-by-pointer and pass-by-value work, because the variable
replacement typically works on maps/slices, which are passed by reference
anyway. Better to be explicit about this, though.
No functional changes.
Fix an issue where workers would switch immediately on a state change
request, even if it was of the "after task is finished" kind.
The "may I keep running" endpoint wasn't checking the lazyness flag, and
thus any state change, lazy or otherwise, would interrupt the worker's
current task.
The priority of an existing can now be changed. It will be taken into
account when assigning tasks to workers, but it will not reassign tasks
that are already active.
When the Manager was shutting down while the sleep scheduler was running, it
could cause a null pointer dereference. This is now doubly solved:
- `worker.Identifier()` is now nil-safe, as in, `worker` can be `nil` and
it will still return a sensible string.
- failure to apply the sleep schedule due to the context closing is not
logged as error any more.
Blender not being found can be reported via various errors (this should be
reworked in the 'blender finder API' at some point). `exec.ErrNotFound` is
returned when Blender cannot be found on `$PATH`, which is something that's
absolutely fine. This is now logged less dramatically.
Avoid the word "error" in logging when Blender cannot be found. Typically
these are warnings, and having the word "error" there makes people think
otherwise.
When doing two-way variable replacement, if the variable has a Windows
path (i.e. backslashes) also do a match for the value with forward slashes.
In other words, if a path `Y:/shared/...` comes in, and the variable value
is (correctly) `Y:\shared\...`, it will be seen as a match.
When a submitted job is refused because of a mismatched etag, there is
now a more explanatory error logged on the Manager. The website also has
an entry in the FAQ for this, as I expect more people to run into this
issue when they upgrade Flamenco.
The `require.XXX` functions are exactly the same as `assert.XXX`
functions + directly failing the test, so this refactor simplifies the
code quite a bit. Can be done in more areas than this.
No functional changes.
Simple Blender Render now no longer renders to an intermediate directory.
This not only simplifies the script, but it also opens the door for
selective re-running of individual tasks.
In the old situation, where the intermediate directory was renamed to
the desired name in the last task, rerunning tasks would fail because the
directory they expect to exist no longer exists. This is now resolved.
The original idea behind this job type was that it would work equally
well for videos as for images, but that was never really well tested.
It's currently broken, so this commit removes video support altogether.
Remove the `blender_cmd` setting, and just hard-code it to `{blender}`.
The Blender add-on was already passing this string, and it's very unlikely
that people are already writing custom add-ons to pass something different.
It provided flexibility that was untested, so it's better to simplify
things.
Implement the `getSharedStorage` operation in the Manager, and use it in
the add-on to get the shared storage location in a way that makes sense
for the platform of the user.
Manifest task: T100196
This variable is used in tests to mock the current OS, but wasn't set
during normal operation of the Manager. This caused issues with the
two-way variable system.