In the future different services will write to the task log, and thus
it makes sense to move the responsibility of prepending the timestamps
to the log storage service.
The requeue-task-on-worker-signoff operation also needs to log a timestamp.
The code for this, and the recently added code for timestamping the
"task assigned to worker" message, are now unified.
A worker state change request can now be cancelled by requesting the worker
to go to its current state. In other words, a previously requested change
`A → B` can be cancelled by requesting the worker goes to state `A`.
Previously this would simply overwrite the last request, resulting in a
requested state change `A → A`. Having this non-lazy would even interrupt
the currently running task.
rFcfb17b178da2055ef12b2aa2ad8f7f778a952bc3 changed the semantics of
`SocketIOWorkerUpdate`, in the sense that any update that doesn't change
the worker status can omit `previous_status`. This commit adjusts the
unit test for this.
A task can exist in the database but not have any log stored on disk yet.
This is now returned as `204 No Content` instead of an internal server
error.
The web interface is also adjusted to cope with this.
It returns 2048 bytes at most. It'll likely be less than that, as it will
ignore the first bytes until the very first newline (to avoid returning
cut-off lines). If the log file itself is 2048 bytes or smaller, return the
entire file.
Add a small wrapper around github.com/google/uuid. That way it's clearer
which functionality is used by Flamenco, doesn't link most of the code to
any specific UUID library, and allows a bit of customisation.
The only customisation now is that Flamenco is a bit stricter in the
formats it accepts; only the `xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx` is
accepted. This makes things a little bit stricter, with the advantage
that we don't need to do any normalisation of received UUID strings.
Implement task log broadcasting via SocketIO. The logs aren't shown in the
web interface yet, but do arrive there in a Pinia store. That store is
capped at 1000 lines to keep memory requirements low-ish.
Rename `pkg/api/flamenco-manager.yaml` to `flamenco-openapi.yaml`, to
distinguish the OpenAPI definition file from the Flamenco Manager
configuration file of the same name (but in a different directory).
No functional changes.
This also changes the order in which the task is updated; the activity is
now saved first, so that it can be included in the task status change
notification sent to SocketIO clients.
Worker and Manager implementation of the "may-I-kee-running" protocol.
While running tasks, the Worker will ask the Manager periodically
whether it's still allowed to keep running that task. This allows the
Manager to abort commands on Workers when:
- the Worker should go to another state (typically 'asleep' or
'shutdown'),
- the task changed status from 'active' to something non-runnable
(typically 'canceled' when the job as a whole is canceled).
- the task has been assigned to a different Worker. This can happen when
a Worker loses its connection to its Manager, resulting in a task
timeout (not yet implemented) after which the task can be assigned to
another Worker. If then the connectivity is restored, the first Worker
should abort (last-assigned Worker wins).
Manager now sends out task updates via SocketIO, and the web interface
handles those.
Note that there is a `BroadcastTaskUpdate()` function, but not a
`BroadcastNewTask`. The 'new job' broadcast is sent after the job's
tasks have been created, and thus there is no need for a separate
broadcast per task.
Add `fetchJobTasks` operation to the Jobs API. This returns a summary of
each of the job's tasks, suitable for display in a task list view.
The actually used fields may need tweaking once we actually have a task
list view, but at least the functionality is there.
The password check of worker API calls was 2 orders of magnitude slower
than actually handling the API call itself. Since the Worker authentication
is not that important (it's all on the same network anyway, and Worker
account registration is automatic too), lowering the BCrypt cost to the
minimum helps.
On my machine, this reduces the time for password checks from 50 to 2 ms.