Worker Clusters can be managed via the API. Workers can be assigned to
any number of clusters (if not assigned to any, they'll pick up any task).
Jobs can be submitted with a cluster ID, in which case only workers that
are in that cluster or are clusterless will pick up its tasks.
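The cluster rule above can be sketched as a small predicate. This is only an illustration, not Flamenco's actual scheduler code: `canRunJob`, `workerClusters` and `jobCluster` are hypothetical names, and it assumes that a job submitted without a cluster ID can be picked up by any worker.

```go
package main

import "fmt"

// canRunJob reports whether a worker may pick up tasks of a job.
// workerClusters lists the cluster IDs the worker is assigned to;
// jobCluster is the cluster ID the job was submitted with.
// Assumption: an empty jobCluster means "no cluster", runnable by any worker.
func canRunJob(workerClusters []string, jobCluster string) bool {
	// Clusterless workers pick up any task; clusterless jobs go to any worker.
	if jobCluster == "" || len(workerClusters) == 0 {
		return true
	}
	for _, c := range workerClusters {
		if c == jobCluster {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(canRunJob(nil, "gpu"))                    // clusterless worker
	fmt.Println(canRunJob([]string{"cpu"}, "gpu"))        // worker in another cluster
	fmt.Println(canRunJob([]string{"cpu", "gpu"}, "gpu")) // worker in the job's cluster
}
```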
Add a "what-would-delete-do" operation, to query the Manager about what
the deletion of a specific job would entail. For some jobs the job files
will also be deleted (if they were created with a new enough Flamenco),
otherwise they will remain untouched.
Also expand the `SocketIOJobUpdate` schema to include info about job
deletion.
Implement the `deleteJob` API endpoint. Calling this endpoint will mark
the job as "deletion requested", after which it's queued for actual
deletion. This makes the API response fast, even when there is a lot of
work to do in the background.
A new background service "job deleter" keeps track of the queue of such
jobs, and performs the actual deletion. It removes:
- Shaman checkout for the job (but see below)
- Manager-local files of the job (task logs, last-rendered images)
- The job itself
The removal is done in the above order, so the job is only removed from the
database if the rest of the removal was successful.
Shaman checkouts are only removed if the job was submitted with Flamenco
version 3.2 or newer. Earlier versions did not record enough information to
reliably do this.
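The ordering guarantee described above can be illustrated with a minimal sketch. It is not the job deleter's real code: the `step` type, `runInOrder` and the step names are made up for illustration. The point is that the steps run in sequence and the first failure stops the sequence, so the database record (last step) survives a failed file removal.

```go
package main

import (
	"errors"
	"fmt"
)

// errDemo simulates a failing removal step.
var errDemo = errors.New("simulated failure")

// step is one stage of a job deletion (hypothetical type).
type step struct {
	name string
	run  func() error
}

// runInOrder executes the steps in order and stops at the first error.
// It returns the names of the steps that completed.
func runInOrder(steps []step) ([]string, error) {
	var done []string
	for _, s := range steps {
		if err := s.run(); err != nil {
			return done, fmt.Errorf("%s: %w", s.name, err)
		}
		done = append(done, s.name)
	}
	return done, nil
}

func main() {
	steps := []step{
		{"shaman checkout", func() error { return nil }},
		{"manager-local files", func() error { return errDemo }},
		{"database record", func() error { return nil }},
	}
	done, err := runInOrder(steps)
	fmt.Println("completed:", done)
	fmt.Println("error:", err)
}
```

Because the job record is still in the database after a failure, the deletion can be retried later.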
If Shaman is used to submit the job files, store the job's checkout ID
(i.e. the path relative to the checkout root) in the database. This will
make it possible in the future to remove the Shaman checkout along with
the job itself.
Add fields to the job schemas (`SubmittedJob` and `Job`) to allow
storing the shaman checkout ID (so the Shaman checkout can be deleted
along with the job later).
Add a timeout when fetching a job from the persistence layers.
It's my intention to add more timeouts, so this also introduces some code
to make it easier to test that a context has a deadline set.
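As a rough idea of what such a test helper could look like: the standard `context.Context.Deadline()` method reports whether a deadline is set. The names `hasDeadline`, `fetchWithTimeout` and `fetchSawDeadline` below are hypothetical, and the five-second timeout is an arbitrary value chosen for the example.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// hasDeadline reports whether ctx carries a deadline.
func hasDeadline(ctx context.Context) bool {
	_, ok := ctx.Deadline()
	return ok
}

// fetchWithTimeout wraps a fetch function so that it always receives a
// context with a timeout (hypothetical stand-in for the persistence call).
func fetchWithTimeout(ctx context.Context, timeout time.Duration, fetch func(context.Context) error) error {
	ctx, cancel := context.WithTimeout(ctx, timeout)
	defer cancel()
	return fetch(ctx)
}

// fetchSawDeadline runs a fake fetch and reports whether the context it
// received had a deadline set. A unit test could assert on this.
func fetchSawDeadline() bool {
	saw := false
	_ = fetchWithTimeout(context.Background(), 5*time.Second, func(ctx context.Context) error {
		saw = hasDeadline(ctx)
		return nil
	})
	return saw
}

func main() {
	fmt.Println("deadline set:", fetchSawDeadline())
}
```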
Add an endpoint that mimics the job submission endpoint, to see whether
the job survives the job compiler script. This can be used to fail early,
before actually sending files to the farm.
Add an operation `getSharedStorage` that can return the shared storage
location, adjusted for the given audience & platform. This uses the
two-way variables system to adjust the Manager's configuration.
Include a `shortversion` property in the `FlamencoVersion` schema, which
will just be the version number with the release phase (and not the git
hash, the number of commits since the last tag, and the `-dirty` suffix).
Two-way variable replacement now also changes the path separators. Since
the two-way replacement is made for paths, it makes sense to also clean up
the path for the target platform.
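A minimal sketch of such a separator clean-up, assuming that Windows targets get backslashes and every other platform gets forward slashes. The function name `toPlatformPath` and the `"windows"` platform string are assumptions made for this example, not names taken from the Flamenco code.

```go
package main

import (
	"fmt"
	"strings"
)

// toPlatformPath rewrites the path separators for the target platform.
// Assumption: "windows" uses backslashes, anything else forward slashes.
func toPlatformPath(path, platform string) string {
	if platform == "windows" {
		return strings.ReplaceAll(path, "/", `\`)
	}
	return strings.ReplaceAll(path, `\`, "/")
}

func main() {
	fmt.Println(toPlatformPath(`R:\frames/shot123`, "windows"))
	fmt.Println(toPlatformPath(`/shared/frames\shot123`, "linux"))
}
```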
This will remove a worker by soft-deletion. Any task still assigned to
the worker will be requeued.
Note that this removal should only happen when the worker is offline, or
it will cause errors on the worker as its credentials will not be
accepted any more.
Instead of erroring out when a symlink already exists, investigate it. If
the linked file is the one that's intended, just use it.
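The idea can be sketched with the standard library as follows. This is an illustration under assumptions, not Shaman's actual implementation: `ensureSymlink` and `demo` are hypothetical names, and the comparison is a plain string match on the link target.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// ensureSymlink creates linkPath pointing at target. If linkPath already
// exists, it is accepted only when it is a symlink to the same target.
func ensureSymlink(target, linkPath string) error {
	err := os.Symlink(target, linkPath)
	if err == nil {
		return nil
	}
	if !errors.Is(err, fs.ErrExist) {
		return err
	}
	// Something is already there: investigate instead of failing outright.
	existing, readErr := os.Readlink(linkPath)
	if readErr != nil {
		return fmt.Errorf("%s exists but is not a readable symlink: %w", linkPath, readErr)
	}
	if existing != target {
		return fmt.Errorf("%s already links to %s, not %s", linkPath, existing, target)
	}
	return nil
}

// demo exercises ensureSymlink in a temporary directory.
func demo() error {
	dir, err := os.MkdirTemp("", "symlink-demo")
	if err != nil {
		return err
	}
	defer os.RemoveAll(dir)

	target := filepath.Join(dir, "file.blend")
	if err := os.WriteFile(target, []byte("data"), 0o644); err != nil {
		return err
	}
	link := filepath.Join(dir, "link.blend")
	if err := ensureSymlink(target, link); err != nil {
		return err
	}
	// Second call: the link exists and points at the intended file.
	if err := ensureSymlink(target, link); err != nil {
		return err
	}
	// A link to a different file must still be reported as an error.
	if err := ensureSymlink(filepath.Join(dir, "other.blend"), link); err == nil {
		return errors.New("expected an error for a mismatched link")
	}
	return nil
}

func main() {
	fmt.Println("demo error:", demo())
}
```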
For some reason, BAT and/or the Flamenco add-on include some files twice
in the checkout request to Shaman. This is now handled gracefully.
The etag prevents job submissions with old settings, when the job
compiler script has been edited. The etag is the SHA1 hash of the
`JOB_TYPE` dictionary (as defined by the JavaScript file). The hash is
computed in a way that's independent of the exact formatting in the
JavaScript file. The actual JS code is also irrelevant; only the
`JOB_TYPE` dictionary is used.
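One way to get a formatting-independent hash is to decode the dictionary into a data structure and hash a canonical re-encoding of it. The sketch below does that with Go's `encoding/json`, which writes map keys in sorted order. It assumes the dictionary is available as JSON text; how Flamenco actually serialises `JOB_TYPE` before hashing is not shown here, and `etagFromJSON` is a hypothetical name.

```go
package main

import (
	"crypto/sha1"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// etagFromJSON returns the hex SHA1 of a canonical re-encoding of src.
// Whitespace and key order in src do not influence the result, because
// json.Marshal emits map keys sorted and without extra whitespace.
func etagFromJSON(src string) (string, error) {
	var jobType map[string]any
	if err := json.Unmarshal([]byte(src), &jobType); err != nil {
		return "", err
	}
	canonical, err := json.Marshal(jobType)
	if err != nil {
		return "", err
	}
	sum := sha1.Sum(canonical)
	return hex.EncodeToString(sum[:]), nil
}

func main() {
	compact := `{"label":"Simple Render","settings":[{"key":"frames"}]}`
	spread := "{\n  \"settings\": [ {\"key\": \"frames\"} ],\n  \"label\": \"Simple Render\"\n}"

	a, errA := etagFromJSON(compact)
	b, errB := etagFromJSON(spread)
	fmt.Println(errA, errB, a == b)
}
```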
Shaman cannot handle cases where the storage path is a symlink (i.e. cases
where `filepath.EvalSymlinks(storagePath)` does not return `storagePath`).
This caused macOS devices to fail the unit tests, because macOS uses a
symlinked path for temporary files.
This commit changes the unit tests, to always use the real path instead of
the OS-provided symlink. This does *not* fix the actual issue in Shaman;
for that, see T99965.
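For illustration, a test helper along these lines could resolve the temporary directory before handing it to Shaman. The names `realTempDir` and `isRealPath` are hypothetical; `os.MkdirTemp` and `filepath.EvalSymlinks` are standard library calls.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// realTempDir creates a temporary directory and returns its path with all
// symlinks resolved, so that EvalSymlinks(path) == path holds for it.
func realTempDir() (string, error) {
	dir, err := os.MkdirTemp("", "shaman-test")
	if err != nil {
		return "", err
	}
	realDir, err := filepath.EvalSymlinks(dir)
	if err != nil {
		os.RemoveAll(dir)
		return "", err
	}
	return realDir, nil
}

// isRealPath reports whether path contains no symlinks.
func isRealPath(path string) bool {
	resolved, err := filepath.EvalSymlinks(path)
	return err == nil && resolved == path
}

func main() {
	dir, err := realTempDir()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	defer os.RemoveAll(dir)
	fmt.Println("symlink-free:", isRealPath(dir))
}
```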
Increase verbosity (debug → info) when checkout dir traversal fails, and
add a trace-level log for each file that is still in use. There were some
issues with symlinks, where the wrong paths were compared (see T99965) and
this log made it visible what was going wrong.
The submitter's platform is used to perform two-way variable
replacement. The variables of that submitter's platform are looked up,
and their values are replaced with the variable names. This only applies
to the job's settings and metadata, and is only performed on prefixes.
For example, if the submitter's platform has a variable
`render = /shared/frames`, a job setting
`output = "/shared/frames/shot123"` will be stored as
`output = "{render}/shot123"`.
When a Worker gets a task of this job, `{render}` will be expanded to
the value appropriate for their platform, hence the "two-way" name.
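The round trip in the example above can be sketched like this. It is a simplified illustration, not the Manager's implementation: `toVariables` and `fromVariables` are hypothetical names, choosing the longest matching prefix is an assumption, the worker-side value `R:\frames` is invented for the example, and the path separator clean-up is left out.

```go
package main

import (
	"fmt"
	"strings"
)

// toVariables replaces a leading variable value with "{name}", using the
// submitter's variables. Only prefixes are replaced. When several values
// match, the longest one wins (an assumption made for this sketch).
func toVariables(value string, vars map[string]string) string {
	bestName, bestLen := "", 0
	for name, prefix := range vars {
		if prefix != "" && strings.HasPrefix(value, prefix) && len(prefix) > bestLen {
			bestName, bestLen = name, len(prefix)
		}
	}
	if bestLen == 0 {
		return value
	}
	return "{" + bestName + "}" + value[bestLen:]
}

// fromVariables expands a leading "{name}" with the worker's variables.
func fromVariables(value string, vars map[string]string) string {
	for name, replacement := range vars {
		placeholder := "{" + name + "}"
		if strings.HasPrefix(value, placeholder) {
			return replacement + value[len(placeholder):]
		}
	}
	return value
}

func main() {
	submitter := map[string]string{"render": "/shared/frames"}
	worker := map[string]string{"render": `R:\frames`}

	stored := toVariables("/shared/frames/shot123", submitter)
	fmt.Println(stored)
	fmt.Println(fromVariables(stored, worker))
}
```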
Add examples to the `WorkerSignOn` and `WorkerStateChanged` schemas.
These will make it easier to test with SwaggerUI, as they reflect a worker
signing on with the default task types.
Change the `FlamencoVersion` schema definition so that it follows the style
of the other schema definitions:
- List properties before mentioning which are required.
- Put quotes around the property names, so that they stand out from the
other YAML keys.
The task log API endpoint was loading the entire log into RAM, then sending
it as the response. This also made it harder to display the log in a browser.
The API endpoint now returns some JSON with info about the task log,
including its size and which URL can be used to download it.
Manifest task: T99730