235 Commits

Author SHA1 Message Date
Sybren A. Stüvel
5dac3c2dc0 Manager: mark workers as 'seen' when they send updates
Update the 'last seen at' timestamp of workers when they:
- sign on
- sign off
- get a task assigned
- send a task update
- check whether they can keep running their task

Note that this commit is necessary to not have the workers time out
immediately ;-)
2022-06-13 12:47:07 +02:00
Sybren A. Stüvel
c3525c3b1a Manager: move task requeueing to TaskStateMachine
Requeueing the tasks of a specific worker is now done in the
`TaskStateMachine`, such that it can be called from other services as
well in future commits.

This also makes the `LogStorage` service a dependency of the
`TaskStateMachine`, as it needs to write "this task was requeued" kind
of messages to the task logs.
2022-06-13 12:33:01 +02:00
Sybren A. Stüvel
24204084c1 Manager: move timestamping of log messages to task_logs package
In the future different services will write to the task log, and thus
it makes sense to move the responsibility of prepending the timestamps
to the log storage service.
2022-06-09 17:00:38 +02:00
Sybren A. Stüvel
819cad1d18 Manager: move broadcasting of task logs via SocketIO to task log service
To ensure all task logs also get broadcast via SocketIO, the responsibility
has moved from the `api_impl` to the `task_logs` package.
2022-06-09 16:49:48 +02:00
Sybren A. Stüvel
92d6693871 Show Task's "last touched" in the web interface 2022-06-09 11:59:43 +02:00
Sybren A. Stüvel
354fd29f9e Manager: Start timeout counting as soon as Worker gets task assigned
Set the task's "last touched" field in the database to "now" as soon as
the task is assigned to a worker.
2022-06-09 11:58:30 +02:00
Sybren A. Stüvel
87bce6be36 Manager: unify logging of task assignment and requeue-on-signoff
The requeue-task-on-worker-signoff operation also needs to log a timestamp.
The code for this, and the recently added code for timestamping the
"task assigned to worker" message, are now unified.
2022-06-09 11:30:46 +02:00
Sybren A. Stüvel
75903a2da3 Manager: prepend timestamp to "task assigned to worker" task log entries
Add a new `clock` service to the Flamenco struct, which allows us to mock
the passing of time, and thus test for timestamps in a stable fashion.
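
A minimal sketch of how an injectable clock makes timestamped log lines testable. The `Clock` interface, `fixedClock`, `timestamped()`, and the RFC 3339 format are illustrative assumptions, not the actual `clock` service added by this commit.

```go
package main

import (
	"fmt"
	"time"
)

// Clock is a hypothetical interface; the real `clock` service may look different.
type Clock interface {
	Now() time.Time
}

// realClock delegates to the system clock.
type realClock struct{}

func (realClock) Now() time.Time { return time.Now() }

// fixedClock always returns the same moment, which keeps tests stable.
type fixedClock struct{ t time.Time }

func (c fixedClock) Now() time.Time { return c.t }

// demoTime is an arbitrary fixed moment used by the example.
var demoTime = time.Date(2022, 6, 9, 11, 24, 2, 0, time.UTC)

// timestamped prepends the clock's current time to a log message.
// The RFC 3339 format is an assumption made for this sketch.
func timestamped(c Clock, msg string) string {
	return c.Now().Format(time.RFC3339) + " " + msg
}

func main() {
	fmt.Println(timestamped(fixedClock{t: demoTime}, "Task assigned to worker"))
	fmt.Println(timestamped(realClock{}, "Task assigned to worker"))
}
```

Because the code under test only sees the interface, a test can pass in `fixedClock` and compare against an exact string.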
2022-06-09 11:24:02 +02:00
Sybren A. Stüvel
b186ea1828 Manager: write to task log when assigning it to a worker 2022-06-09 10:59:44 +02:00
Sybren A. Stüvel
b4d2fc4231 Manager: keep track of when a Worker last worked on a task
This will be used for keeping track of stuck tasks.
2022-06-03 16:33:50 +02:00
Sybren A. Stüvel
0be1ca30dd Cleanup: manager, move api_impl interfaces to interfaces.go
The number of interfaces declared by the `api_impl` package is getting
large, so they deserve their own file.

No functional changes.
2022-06-03 15:52:07 +02:00
Sybren A. Stüvel
8e7f1e2868 Manager: some extra unit tests for worker signoff behaviour 2022-06-02 16:37:29 +02:00
Sybren A. Stüvel
6cf82e5d43 Manager: cleanup, refactor Worker state change request persistence code
Move the setting & clearing of worker state change requests into separate
functions.

No functional changes.
2022-06-02 16:36:06 +02:00
Sybren A. Stüvel
132ce8f2ec Merge 'shutdown' and 'offline' states
Move the 'shutdown' state code to the 'offline' state, to match the
removal of the 'shutdown' state from the OpenAPI definition.
2022-06-02 16:35:07 +02:00
Sybren A. Stüvel
678308fb6d Manager: allow cancelling worker state change requests
A worker state change request can now be cancelled by requesting the worker
to go to its current state. In other words, a previously requested change
`A → B` can be cancelled by requesting that the worker go to state `A`.

Previously this would simply overwrite the last request, resulting in a
requested state change `A → A`. Having this non-lazy would even interrupt
the currently running task.
2022-06-02 12:43:16 +02:00
Sybren A. Stüvel
9ed6b6d931 Manager: adjust code for WorkerStatusChangeRequest extraction
See preceding OpenAPI change.
2022-06-02 12:17:54 +02:00
Sybren A. Stüvel
ae6831ce6e Manager: fix unit test
rFcfb17b178da2055ef12b2aa2ad8f7f778a952bc3 changed the semantics of
`SocketIOWorkerUpdate`, in the sense that any update that doesn't change
the worker status can omit `previous_status`. This commit adjusts the
unit test for this.
2022-06-02 12:13:25 +02:00
Sybren A. Stüvel
487a31624f Cleanup: manager, make workerDBtoAPI(w) use workerSummary(w)
This makes the `workerDBtoAPI(w)` and `workerSummary(w)` functions
consistent, and makes the former use the latter.
2022-06-02 12:10:53 +02:00
Sybren A. Stüvel
f97f0a34c3 Manager: implement worker status change requests
Implement the OpenAPI `RequestWorkerStatusChange` operation, and handle
these changes in the web interface.
2022-05-31 17:22:03 +02:00
Sybren A. Stüvel
dd3f99ebaa Manager: Fix unit test 2022-05-31 16:12:28 +02:00
Sybren A. Stüvel
f6dff086ef Manager: show worker version in the workers table 2022-05-31 15:47:26 +02:00
Sybren A. Stüvel
3063e1fe6d Manager: construct api.Worker from api.WorkerSummary + extra fields 2022-05-31 15:30:46 +02:00
Sybren A. Stüvel
2e11c1c240 Manager: Implement SocketIO worker updates 2022-05-31 15:19:12 +02:00
Sybren A. Stüvel
ec02247973 Manager: logging in the FetchWorkers API endpoint 2022-05-31 15:17:39 +02:00
Sybren A. Stüvel
90b567f97c Manager: store software version on worker sign-on 2022-05-31 12:29:25 +02:00
Sybren A. Stüvel
8e247b9dfc Manager: implement fetchWorker API endpoint 2022-05-31 11:21:55 +02:00
Sybren A. Stüvel
6e3667225a Manager: fix bug in sendAPIError() formatting code
Formatting parameters weren't passed to `fmt.Sprintf()` correctly.
2022-05-31 11:10:49 +02:00
Sybren A. Stüvel
19db947eb4 Manager: remove Worker.LastActivity
This removes the field both from the OpenAPI interface and the database.
2022-05-31 10:46:27 +02:00
Sybren A. Stüvel
ce07a46455 Fix error fetching non-existing log tail
A task can exist in the database but not have any log stored on disk yet.
This is now returned as `204 No Content` instead of an internal server
error.

The web interface is also adjusted to cope with this.
2022-05-30 19:23:10 +02:00
Sybren A. Stüvel
08676f48f4 Manager: implement fetchWorkers OpenAPI operation 2022-05-30 18:52:02 +02:00
Sybren A. Stüvel
9bb4dd49dd Manager: add endpoint to fetch task log tail
It returns 2048 bytes at most. It'll likely be less than that, as it will
ignore the first bytes until the very first newline (to avoid returning
cut-off lines). If the log file itself is 2048 bytes or smaller, return the
entire file.
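
A rough sketch of the tail rule described above, operating on an in-memory byte slice rather than a file on disk. The function name `logTail` and the fallback for a tail without any newline are assumptions; the commit message does not specify that case.

```go
package main

import (
	"bytes"
	"fmt"
)

// tailSize is the maximum number of bytes returned, as stated in the commit message.
const tailSize = 2048

// logTail returns at most maxBytes from the end of contents. When the
// contents are larger than maxBytes, everything up to and including the
// first newline of the tail is dropped, to avoid returning a cut-off line.
func logTail(contents []byte, maxBytes int) []byte {
	if len(contents) <= maxBytes {
		return contents // small enough: return the entire log
	}
	tail := contents[len(contents)-maxBytes:]
	if idx := bytes.IndexByte(tail, '\n'); idx >= 0 {
		return tail[idx+1:]
	}
	// Assumption: without any newline in the tail, return it as-is.
	return tail
}

func main() {
	contents := []byte("line one\nline two\nline three\n")
	fmt.Printf("%q\n", logTail(contents, 16))       // small limit to show the cut-off handling
	fmt.Printf("%q\n", logTail(contents, tailSize)) // fits entirely
}
```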
2022-05-20 16:34:13 +02:00
Sybren A. Stüvel
23a5e9df4c Manager: cleanup, reorder some imports 2022-05-20 15:36:05 +02:00
Sybren A. Stüvel
f77b11d85e Manager: add a small wrapper around Google's UUID library
Add a small wrapper around github.com/google/uuid. That way it's clearer
which functionality is used by Flamenco, doesn't link most of the code to
any specific UUID library, and allows a bit of customisation.

The only customisation now is that Flamenco is a bit stricter in the
formats it accepts; only the `xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx` format is
accepted. This makes things a little bit stricter, with the advantage
that we don't need to do any normalisation of received UUID strings.
2022-05-20 15:35:51 +02:00
Sybren A. Stüvel
792b4ab141 Manager: on worker signoff, add a note to any requeued task logs
When a worker signs off, its tasks get requeued. This is now also saved in
the task log, and broadcast via SocketIO as a task log chunk.
2022-05-20 14:17:17 +02:00
Sybren A. Stüvel
64e9f7cbbe Manager: fix unit test
It was missing the task log broadcasting.
2022-05-20 13:57:42 +02:00
Sybren A. Stüvel
3e5f681321 Task log broadcasting via SocketIO
Implement task log broadcasting via SocketIO. The logs aren't shown in the
web interface yet, but do arrive there in a Pinia store. That store is
capped at 1000 lines to keep memory requirements low-ish.
2022-05-20 13:03:41 +02:00
Sybren A. Stüvel
744fabea78 OAPI: rename pkg/api/flamenco-manager.yaml to flamenco-openapi.yaml
Rename `pkg/api/flamenco-manager.yaml` to `flamenco-openapi.yaml`, to
distinguish the OpenAPI definition file from the Flamenco Manager
configuration file of the same name (but in a different directory).

No functional changes.
2022-05-19 15:22:37 +02:00
Sybren A. Stüvel
cc62cab1d6 Update code to handle the JobUpdate to SocketIOJobUpdate rename
No functional changes.
2022-05-19 15:18:06 +02:00
Sybren A. Stüvel
3c622264a4 Manager: include 'activity' in SocketIO task updates
This also changes the order in which the task is updated; the activity is
now saved first, so that it can be included in the task status change
notification sent to SocketIO clients.
2022-05-19 14:27:42 +02:00
Sybren A. Stüvel
797dea85ed Cleanup: manager, document two functions 2022-05-19 14:20:17 +02:00
Sybren A. Stüvel
43f244ecab Manager: move TaskUpdate API function from jobs.go to workers.go
The OpenAPI spec tags this operation as `workers`, so it should be in
`workers.go`.

No functional changes.
2022-05-19 14:20:02 +02:00
Sybren A. Stüvel
0b39f229a1 Implement may-I-keep-running protocol
Worker and Manager implementation of the "may-I-keep-running" protocol.

While running tasks, the Worker will ask the Manager periodically
whether it's still allowed to keep running that task. This allows the
Manager to abort commands on Workers when:

- the Worker should go to another state (typically 'asleep' or
  'shutdown'),
- the task changed status from 'active' to something non-runnable
  (typically 'canceled' when the job as a whole is canceled),
- the task has been assigned to a different Worker. This can happen when
  a Worker loses its connection to its Manager, resulting in a task
  timeout (not yet implemented) after which the task can be assigned to
  another Worker. If then the connectivity is restored, the first Worker
  should abort (last-assigned Worker wins).
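
The three abort conditions listed above could be checked on the Manager side roughly like this. The `Worker` and `Task` structs, their fields, and `mayKeepRunning()` are hypothetical simplifications, not the actual implementation.

```go
package main

import "fmt"

// Worker and Task are hypothetical, simplified models for this sketch.
type Worker struct {
	ID              string
	RequestedStatus string // empty means "no status change requested"
}

type Task struct {
	Status   string
	WorkerID string // ID of the worker the task was last assigned to
}

// mayKeepRunning reports whether the worker may continue running the task,
// together with a human-readable reason when it may not.
func mayKeepRunning(w Worker, t Task) (bool, string) {
	switch {
	case w.RequestedStatus != "":
		return false, "worker should go to status " + w.RequestedStatus
	case t.Status != "active":
		return false, "task status is " + t.Status + ", not active"
	case t.WorkerID != w.ID:
		return false, "task was assigned to another worker" // last-assigned Worker wins
	}
	return true, ""
}

func main() {
	w := Worker{ID: "worker-1"}

	ok, reason := mayKeepRunning(w, Task{Status: "active", WorkerID: "worker-1"})
	fmt.Println(ok, reason)

	ok, reason = mayKeepRunning(w, Task{Status: "canceled", WorkerID: "worker-1"})
	fmt.Println(ok, reason)

	ok, reason = mayKeepRunning(w, Task{Status: "active", WorkerID: "worker-2"})
	fmt.Println(ok, reason)
}
```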
2022-05-12 15:06:05 +02:00
Sybren A. Stüvel
ba34652cd1 Implement task status changes from web interface
This also reworks some of the logic due to the recently-removed
`cancel-requested` task status.
2022-05-05 16:44:09 +02:00
Sybren A. Stüvel
18891dda91 Manager: implement FetchTask OAPI endpoint 2022-05-03 13:04:28 +02:00
Sybren A. Stüvel
50c8cd39f2 Task update notifications via SocketIO
Manager now sends out task updates via SocketIO, and the web interface
handles those.

Note that there is a `BroadcastTaskUpdate()` function, but not a
`BroadcastNewTask`. The 'new job' broadcast is sent after the job's
tasks have been created, and thus there is no need for a separate
broadcast per task.
2022-05-03 11:26:24 +02:00
Sybren A. Stüvel
bb68488c5e Cleanup: Manager, add bit of documentation 2022-05-03 10:39:44 +02:00
Sybren A. Stüvel
992fc38604 OAPI: add endpoint for fetching the tasks of a job
Add `fetchJobTasks` operation to the Jobs API. This returns a summary of
each of the job's tasks, suitable for display in a task list view.

The actually used fields may need tweaking once we actually have a task
list view, but at least the functionality is there.
2022-04-22 12:52:57 +02:00
Sybren A. Stüvel
e399b14e66 Manager: cleanup, rename jobId to jobID
No functional changes.
2022-04-22 12:16:11 +02:00
Sybren A. Stüvel
0cd478a409 Manager: move FetchJob function into jobs_query.go
I want to put more of the "get stuff" code into `jobs_query.go`, keeping
`jobs.go` for creation & manipulation.
2022-04-22 11:51:02 +02:00
Sybren A. Stüvel
6bdc198301 Manager: more graceful errors when receiving task update of unknown task
Return a 404 Not Found when the task can't be found, and a 500 on other
errors.
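
A sketch of the error-to-status mapping described above, using a sentinel error and `errors.Is`. The names `ErrTaskNotFound`, `errDatabase`, and `statusForTaskError()` are assumptions for illustration; the Manager's actual error types may differ.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

// ErrTaskNotFound is a hypothetical sentinel error for a missing task.
var ErrTaskNotFound = errors.New("task not found")

// errDatabase stands in for any other failure, for demonstration only.
var errDatabase = errors.New("database unavailable")

// statusForTaskError maps a non-nil error from a task lookup to an HTTP
// status code: 404 when the task does not exist, 500 for anything else.
func statusForTaskError(err error) int {
	if errors.Is(err, ErrTaskNotFound) {
		return http.StatusNotFound
	}
	return http.StatusInternalServerError
}

func main() {
	wrapped := fmt.Errorf("handling task update: %w", ErrTaskNotFound)
	fmt.Println(statusForTaskError(wrapped))     // wrapped errors are still recognised
	fmt.Println(statusForTaskError(errDatabase)) // any other error
}
```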
2022-04-21 19:06:18 +02:00