311 Commits

Author SHA1 Message Date
Sybren A. Stüvel
792b4ab141 Manager: on worker signoff, add a note to any requeued task logs
When a worker signs off, its tasks get requeued. This is now also saved in
the task log, and broadcast via SocketIO as task log chunk.
2022-05-20 14:17:17 +02:00
Sybren A. Stüvel
64e9f7cbbe Manager: fix unit test
It was missing the task log broadcasting.
2022-05-20 13:57:42 +02:00
Sybren A. Stüvel
c4cda79ec0 Worker: chunk logs at 10kB instead of 1kB
Send logs in bigger chunks, otherwise a Blender render can cause too many
individual requests.
2022-05-20 13:36:16 +02:00
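
The effect of the larger chunk size in c4cda79ec0 can be pictured with a small buffering sketch. This is not the Worker's actual code: the `logChunker` type, its `send` callback, and the interpretation of 10 kB as 10 × 1024 bytes are all assumptions made for illustration. It only shows why a larger threshold means fewer requests for the same amount of log output.

```go
package main

import (
	"fmt"
	"strings"
)

// logChunker buffers log lines and sends them in one request once the
// buffer reaches maxSize bytes. Hypothetical type, for illustration only.
type logChunker struct {
	maxSize int
	buf     strings.Builder
	send    func(chunk string) // stand-in for the request to the Manager
}

// Append adds one log line and flushes when the buffer is full enough.
func (c *logChunker) Append(line string) {
	c.buf.WriteString(line)
	c.buf.WriteByte('\n')
	if c.buf.Len() >= c.maxSize {
		c.Flush()
	}
}

// Flush sends whatever is buffered, if anything.
func (c *logChunker) Flush() {
	if c.buf.Len() == 0 {
		return
	}
	c.send(c.buf.String())
	c.buf.Reset()
}

// countRequests reports how many requests it takes to send numLines log
// lines of 100 characters each with the given chunk size.
func countRequests(maxSize, numLines int) int {
	requests := 0
	chunker := &logChunker{maxSize: maxSize, send: func(string) { requests++ }}
	line := strings.Repeat("x", 100)
	for i := 0; i < numLines; i++ {
		chunker.Append(line)
	}
	chunker.Flush() // send the remainder
	return requests
}

func main() {
	fmt.Println("requests at 1 kB: ", countRequests(1024, 1000))
	fmt.Println("requests at 10 kB:", countRequests(10*1024, 1000))
}
```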
Sybren A. Stüvel
3e5f681321 Task log broadcasting via SocketIO
Implement task log broadcasting via SocketIO. The logs aren't shown in the
web interface yet, but do arrive there in a Pinia store. That store is
capped at 1000 lines to keep memory requirements low-ish.
2022-05-20 13:03:41 +02:00
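
The 1000-line cap mentioned in 3e5f681321 lives in a Pinia store in the web interface, so the real code is JavaScript. The capping idea itself is language-independent; below is a minimal sketch of it in Go, with a hypothetical `cappedLog` type that drops the oldest lines once the limit is exceeded.

```go
package main

import "fmt"

// cappedLog keeps at most max log lines, discarding the oldest first.
// Hypothetical type; the real store is a Pinia store in the web interface.
type cappedLog struct {
	max   int
	lines []string
}

// add appends new lines and trims the front when the cap is exceeded.
func (c *cappedLog) add(newLines ...string) {
	c.lines = append(c.lines, newLines...)
	if extra := len(c.lines) - c.max; extra > 0 {
		// Copy into a fresh slice so the dropped lines can be garbage-collected.
		c.lines = append([]string(nil), c.lines[extra:]...)
	}
}

// demoCapped adds 1500 lines to a store capped at 1000 and returns the
// resulting line count and the oldest surviving line.
func demoCapped() (count int, oldest string) {
	store := &cappedLog{max: 1000}
	for i := 0; i < 1500; i++ {
		store.add(fmt.Sprintf("line %d", i))
	}
	return len(store.lines), store.lines[0]
}

func main() {
	count, oldest := demoCapped()
	fmt.Println(count, oldest)
}
```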
Sybren A. Stüvel
d9a955beee Worker: only call may-I-keep-running endpoint every 10 seconds
For debugging it was nice to have this called every second, but for
production use that's a bit too frequent.
2022-05-20 12:57:27 +02:00
Sybren A. Stüvel
8730157c1c Manager: friendlier warning when unknown SocketIO subscription type is used
Replace "invalid subscription type" with "unknown subscription type", as
that's a bit friendlier.
2022-05-20 11:42:09 +02:00
Sybren A. Stüvel
4c18a19786 Cleanup, Manager: move some SocketIO room handling code and add docs
Add some clarifications and move the `roomXXX()` functions into
`sio_rooms.go`.
2022-05-20 11:41:06 +02:00
Sybren A. Stüvel
7b664475ca Rename job status requeued to requeueing 2022-05-19 17:25:53 +02:00
Sybren A. Stüvel
dbd32e56cd Worker: fix FFmpeg issues on Windows
Fix the FFmpeg unit test on Windows, by:
- having actual input files (otherwise the input-glob-creation-function
  errors out),
- ensuring the cleanup function is always run, and
- testing for the right CLI arguments.
2022-05-19 16:42:40 +02:00
Sybren A. Stüvel
fd0ff82352 Use new job setting visibility rules
Update the Blender add-on, web interface, and job compiler script to use
the new visibility settings of job settings.
2022-05-19 16:15:13 +02:00
Sybren A. Stüvel
744fabea78 OAPI: rename pkg/api/flamenco-manager.yaml to flamenco-openapi.yaml
Rename `pkg/api/flamenco-manager.yaml` to `flamenco-openapi.yaml`, to
distinguish the OpenAPI definition file from the Flamenco Manager
configuration file of the same name (but in a different directory).

No functional changes.
2022-05-19 15:22:37 +02:00
Sybren A. Stüvel
cc62cab1d6 Update code to handle the JobUpdate to SocketIOJobUpdate rename
No functional changes.
2022-05-19 15:18:06 +02:00
Sybren A. Stüvel
b928896066 OAPI: regenerate code 2022-05-19 15:17:19 +02:00
Sybren A. Stüvel
2c79a10650 Worker: don't log error if may-i-keep-running is shut down
Don't log an error if a worker shutdown (indicated by the context closing)
interrupts a may-i-keep-running call. Instead, log at debug level and just
return "yes, keep running"; we want the Worker to stop the task because it
is shutting down, and not because the Manager told us so.
2022-05-19 15:00:03 +02:00
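
A minimal sketch of the behaviour described in 2c79a10650, assuming the shutdown is signalled by a cancelled `context.Context`. The `askFunc` and `logFunc` types are hypothetical stand-ins for the API call and the logger. What happens on other errors is not stated in the commit message; returning "keep running" there is an assumption of this sketch.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// askFunc is a hypothetical stand-in for the may-I-keep-running API call.
type askFunc func(ctx context.Context) (bool, error)

// logFunc is a hypothetical stand-in for a levelled logger.
type logFunc func(level, msg string)

// mayIKeepRunning asks the Manager and decides whether the task may continue.
func mayIKeepRunning(ctx context.Context, ask askFunc, log logFunc) bool {
	allowed, err := ask(ctx)
	if err == nil {
		return allowed
	}
	if ctx.Err() != nil {
		// The Worker is shutting down; the interrupted call is expected.
		// The shutdown itself stops the task, not this answer.
		log("debug", "may-I-keep-running interrupted by shutdown")
		return true
	}
	// Assumption: a failed call is logged as an error and does not abort the task.
	log("error", "may-I-keep-running failed: "+err.Error())
	return true
}

// demo runs one check, optionally with the context already cancelled, and
// returns the decision plus the log level that was used.
func demo(shutdown bool) (keep bool, level string) {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	ask := func(ctx context.Context) (bool, error) {
		if err := ctx.Err(); err != nil {
			return false, err
		}
		return false, errors.New("connection refused")
	}
	if shutdown {
		cancel()
	}
	keep = mayIKeepRunning(ctx, ask, func(lvl, _ string) { level = lvl })
	return keep, level
}

func main() {
	fmt.Println(demo(true))
	fmt.Println(demo(false))
}
```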
Sybren A. Stüvel
3c622264a4 Manager: include 'activity' in SocketIO task updates
This also changes the order in which the task is updated; the activity is
now saved first, so that it can be included in the task status change
notification sent to SocketIO clients.
2022-05-19 14:27:42 +02:00
Sybren A. Stüvel
797dea85ed Cleanup: manager, document two functions 2022-05-19 14:20:17 +02:00
Sybren A. Stüvel
43f244ecab Manager: move TaskUpdate API function from jobs.go to workers.go
The OpenAPI spec tags this operation as `workers`, so it should be in
`workers.go`.

No functional changes.
2022-05-19 14:20:02 +02:00
Sybren A. Stüvel
530520b1c7 Implement mass updating of tasks when JobUpdate.refresh_tasks = true
Send & handle `JobUpdate.refresh_tasks = true` when many tasks are
updated simultaneously. This applies to things like cancelling &
requeueing an entire job.

This partially rolls back 67bf77de13d99b1bc5d7344951068822c4fadd88, as
it was too slow when 1000+ tasks were being updated all at once.
2022-05-17 14:48:50 +02:00
Sybren A. Stüvel
0b39f229a1 Implement may-I-keep-running protocol
Worker and Manager implementation of the "may-I-keep-running" protocol.

While running tasks, the Worker will ask the Manager periodically
whether it's still allowed to keep running that task. This allows the
Manager to abort commands on Workers when:

- the Worker should go to another state (typically 'asleep' or
  'shutdown'),
- the task changed status from 'active' to something non-runnable
  (typically 'canceled' when the job as a whole is canceled), or
- the task has been assigned to a different Worker. This can happen when
  a Worker loses its connection to its Manager, resulting in a task
  timeout (not yet implemented) after which the task can be assigned to
  another Worker. If connectivity is then restored, the first Worker
  should abort (last-assigned Worker wins).
2022-05-12 15:06:05 +02:00
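
The three abort conditions listed in 0b39f229a1 can be sketched as a single Manager-side check. The `worker` and `task` types and their field names are hypothetical and much simpler than the real data model; only the three conditions themselves come from the commit message.

```go
package main

import "fmt"

// worker and task are hypothetical, simplified versions of the real models.
type worker struct {
	ID              string
	StatusRequested string // empty when no status change is requested
}

type task struct {
	Status   string
	WorkerID string // the Worker this task was last assigned to
}

// mayKeepRunning reports whether the Worker may keep running the task,
// and if not, why.
func mayKeepRunning(w worker, t task) (bool, string) {
	switch {
	case w.StatusRequested != "":
		return false, "worker should go to status " + w.StatusRequested
	case t.Status != "active":
		return false, "task status is " + t.Status
	case t.WorkerID != w.ID:
		// Last-assigned Worker wins.
		return false, "task is assigned to another worker"
	}
	return true, ""
}

func main() {
	w := worker{ID: "w1"}
	fmt.Println(mayKeepRunning(w, task{Status: "active", WorkerID: "w1"}))
	fmt.Println(mayKeepRunning(w, task{Status: "canceled", WorkerID: "w1"}))
	fmt.Println(mayKeepRunning(w, task{Status: "active", WorkerID: "w2"}))
}
```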
Sybren A. Stüvel
fd16f7939e OAPI: regenerate code 2022-05-12 15:06:05 +02:00
Sybren A. Stüvel
bedf10e435 Worker: clarify message when sleep command is aborted
Instead of logging "sleep aborted", the message is now "sleep command
aborted", to make it clear that it's about the sleep command, and not the
"asleep" worker state.
2022-05-12 14:59:10 +02:00
Sybren A. Stüvel
d35ca9d98f Manager: limit database connections
Limit the database connection pool to only a single connection. I hope that
this will solve the intermittent `SQLITE_BUSY` errors I've been seeing.
2022-05-12 13:58:15 +02:00
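
With Go's standard `database/sql` pool, limiting the pool to one connection as in d35ca9d98f is a single call to `SetMaxOpenConns(1)`. The sketch below assumes that pool is what the Manager ultimately uses; the `stubConnector` exists only so the example runs without a real SQLite driver and is not part of any real code.

```go
package main

import (
	"context"
	"database/sql"
	"database/sql/driver"
	"errors"
	"fmt"
)

// stubConnector lets the example run without a real database driver.
// It never produces a usable connection.
type stubConnector struct{}

func (stubConnector) Connect(context.Context) (driver.Conn, error) {
	return nil, errors.New("stub: no real database")
}

func (stubConnector) Driver() driver.Driver { return nil }

// limitPool restricts the pool to a single open connection, so that all
// queries are serialised over that one connection.
func limitPool(db *sql.DB) {
	db.SetMaxOpenConns(1)
}

// maxOpenAfterLimit applies the limit and reports the resulting pool setting.
func maxOpenAfterLimit() int {
	db := sql.OpenDB(stubConnector{})
	defer db.Close()
	limitPool(db)
	return db.Stats().MaxOpenConnections
}

func main() {
	fmt.Println("max open connections:", maxOpenAfterLimit())
}
```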
Sybren A. Stüvel
3d606a3fa0 Manager: task scheduler, fix handling of worker assignment of tasks
Improve how the task scheduler deals with tasks that already have a
worker assigned to them:

- When a Worker asks for a task, and there is already an active task
  assigned to it, always return that task.
- Otherwise, never allow scheduling of active tasks, as those are
  already being run by another worker. If this is not the case, their
  status should change to queued/failed, instead of handling the
  situation in the task scheduler.
- Apart from the assigned-and-active case above, ignore the task's worker ID
  when scheduling tasks. If the status is 'queued' or 'soft-failed', the
  task's worker ID just indicates who ran the task last.
2022-05-12 13:52:16 +02:00
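
The three scheduling rules from 3d606a3fa0 can be sketched as follows. This deliberately ignores everything else a real scheduler would consider (priorities, task dependencies, task types); the `task` type and `pickTask` function are hypothetical.

```go
package main

import "fmt"

// task is a hypothetical, simplified task record.
type task struct {
	ID       string
	Status   string
	WorkerID string // who ran (or is running) the task last
}

// pickTask returns the task to hand to the given Worker, or nil if none.
func pickTask(workerID string, tasks []task) *task {
	// Rule 1: an active task already assigned to this Worker is always returned.
	for i := range tasks {
		if tasks[i].Status == "active" && tasks[i].WorkerID == workerID {
			return &tasks[i]
		}
	}
	// Rules 2 and 3: other active tasks are never scheduled; for runnable
	// tasks the worker ID is ignored, as it only says who ran the task last.
	for i := range tasks {
		switch tasks[i].Status {
		case "queued", "soft-failed":
			return &tasks[i]
		}
	}
	return nil
}

func main() {
	tasks := []task{
		{ID: "t1", Status: "active", WorkerID: "w2"},
		{ID: "t2", Status: "soft-failed", WorkerID: "w2"},
		{ID: "t3", Status: "active", WorkerID: "w1"},
	}
	fmt.Println(pickTask("w1", tasks).ID) // its own active task
	fmt.Println(pickTask("w3", tasks).ID) // a runnable task last run by w2
}
```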
Sybren A. Stüvel
9dbc952c09 Worker: move wait time into variable
No functional changes.
2022-05-12 12:44:50 +02:00
Sybren A. Stüvel
d3e2638f84 Cleanup: rename uri to dsn
"DSN" (Data Source Name) is used to indicate which database to open, and
was intermixed with "URI". This is now consistent.

No functional changes.
2022-05-12 11:08:54 +02:00
Sybren A. Stüvel
d673da7a0c Manager: check for stuck jobs at startup
Check for jobs in 'cancel-requested' or 'requeued' statuses, and ensure
they transition to the right status. This happens at startup, before
even starting the web interface, so that a consistent state is presented.
2022-05-06 16:07:27 +02:00
Sybren A. Stüvel
d008991bf4 Revert "Manager: broadcast job/task updates in a separate goroutine"
This reverts commit cd28ef552e2476dda68ba671436b805d7b32a655, as it
breaks the unit tests and I don't want to spend the time to fix those.
2022-05-06 14:48:16 +02:00
Sybren A. Stüvel
98da20f1a9 Manager: vacuum the database at startup 2022-05-06 14:35:34 +02:00
Sybren A. Stüvel
1fc71ccf92 Manager: reduce log level 2022-05-06 14:35:27 +02:00
Sybren A. Stüvel
cd28ef552e Manager: broadcast job/task updates in a separate goroutine 2022-05-06 12:27:10 +02:00
Sybren A. Stüvel
ba34652cd1 Implement task status changes from web interface
This also reworks some of the logic due to the recently-removed
`cancel-requested` task status.
2022-05-05 16:44:09 +02:00
Sybren A. Stüvel
23680c27bf OAPI: regenerate code 2022-05-05 16:36:38 +02:00
Sybren A. Stüvel
7b1b6030d3 OAPI: regenerate code 2022-05-05 16:04:45 +02:00
Sybren A. Stüvel
67bf77de13 Manager: rework mass updates to task statuses
When the job status changes, it impacts the task statuses as well. These
status changes are now no longer done with a single database query, but
instead each affected task is fetched, changed, and saved. This unifies
the regular & mass updates to the tasks, and causes the resulting task
changes to be broadcast to SocketIO clients.
2022-05-03 16:13:44 +02:00
Sybren A. Stüvel
b3e1d1c6de Cleanup: manager, typo fix 2022-05-03 13:05:30 +02:00
Sybren A. Stüvel
18891dda91 Manager: implement FetchTask OAPI endpoint 2022-05-03 13:04:28 +02:00
Sybren A. Stüvel
4da7f67105 OAPI: generate code 2022-05-03 13:03:59 +02:00
Sybren A. Stüvel
891e791853 Manager: reduce log level of socketIO subscription changes 2022-05-03 12:04:27 +02:00
Sybren A. Stüvel
50c8cd39f2 Task update notifications via SocketIO
Manager now sends out task updates via SocketIO, and the web interface
handles those.

Note that there is a `BroadcastTaskUpdate()` function, but not a
`BroadcastNewTask`. The 'new job' broadcast is sent after the job's
tasks have been created, and thus there is no need for a separate
broadcast per task.
2022-05-03 11:26:24 +02:00
Sybren A. Stüvel
bb68488c5e Cleanup: Manager, add bit of documentation 2022-05-03 10:39:44 +02:00
Sybren A. Stüvel
9b330280b7 Add SocketIO subscription system for job-related updates
SocketIO clients can now send a message with `/subscription` event type
in order to subscribe to or unsubscribe from job-related updates.

These job-related updates themselves aren't sent yet, so this is a change
that's impossible to really test. The socketIO code for joining/leaving
rooms is called, though.
2022-05-02 18:36:14 +02:00
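
A rough sketch of how such a subscription message could map onto rooms, without using any real SocketIO library. Everything here is hypothetical: the `subscription` fields, the `job-` room naming, and the in-memory `rooms` map are assumptions chosen only to illustrate the subscribe/unsubscribe flow described in 9b330280b7.

```go
package main

import (
	"errors"
	"fmt"
)

var errUnknownSubscriptionType = errors.New("unknown subscription type")

// subscription is a hypothetical payload of a `/subscription` message.
type subscription struct {
	Op   string // "subscribe" or "unsubscribe"
	Type string // only "job" is known in this sketch
	UUID string
}

// rooms maps a room name to the set of client IDs in that room.
type rooms map[string]map[string]bool

// roomForJob returns the (assumed) room name for updates about one job.
func roomForJob(jobUUID string) string { return "job-" + jobUUID }

// handle lets the client join or leave the room the subscription refers to.
func (r rooms) handle(clientID string, sub subscription) error {
	var room string
	switch sub.Type {
	case "job":
		room = roomForJob(sub.UUID)
	default:
		return fmt.Errorf("%w: %q", errUnknownSubscriptionType, sub.Type)
	}

	switch sub.Op {
	case "subscribe":
		if r[room] == nil {
			r[room] = map[string]bool{}
		}
		r[room][clientID] = true
	case "unsubscribe":
		delete(r[room], clientID)
	default:
		return fmt.Errorf("unknown operation %q", sub.Op)
	}
	return nil
}

func main() {
	r := rooms{}
	fmt.Println(r.handle("client-1", subscription{Op: "subscribe", Type: "job", UUID: "abc"}))
	fmt.Println(r["job-abc"]["client-1"])
	fmt.Println(r.handle("client-1", subscription{Op: "subscribe", Type: "banana", UUID: "abc"}))
}
```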
Sybren A. Stüvel
8d69bfe069 Cleanup: Manager, reorganise the socketio code a bit 2022-04-29 16:58:48 +02:00
Sybren A. Stüvel
629c073ed7 Manager: fix query for job tasks 2022-04-29 12:26:53 +02:00
Sybren A. Stüvel
992fc38604 OAPI: add endpoint for fetching the tasks of a job
Add `fetchJobTasks` operation to the Jobs API. This returns a summary of
each of the job's tasks, suitable for display in a task list view.

The exact set of fields may need tweaking once there is a task list
view, but at least the functionality is there.
2022-04-22 12:52:57 +02:00
Sybren A. Stüvel
e399b14e66 Manager: cleanup, rename jobId to jobID
No functional changes.
2022-04-22 12:16:11 +02:00
Sybren A. Stüvel
0cd478a409 Manager: move FetchJob function into jobs_query.go
I want to put more of the "get stuff" code into `jobs_query.go`, keeping
`jobs.go` for creation & manipulation.
2022-04-22 11:51:02 +02:00
Sybren A. Stüvel
e34a0ba6ea Worker: more granular locking when flushing upstream buffer
Only lock the database mutex when actual queries are performed, but not
during the entire flush loop.
2022-04-21 19:19:01 +02:00
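
The locking change in e34a0ba6ea can be illustrated with an in-memory stand-in for the buffer. The `upstreamBuffer` type below is hypothetical and uses a slice instead of a database. The sketch assumes a single flusher at a time. What it shows is that the mutex is held only around each individual buffer access, not around the (potentially slow) send, so new updates can be queued while a flush is in progress.

```go
package main

import (
	"fmt"
	"sync"
)

// upstreamBuffer is a hypothetical in-memory stand-in for the Worker's
// database-backed buffer of updates that still have to reach the Manager.
type upstreamBuffer struct {
	mu    sync.Mutex
	queue []string
}

// add queues an update; the lock is held only for the append.
func (b *upstreamBuffer) add(update string) {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.queue = append(b.queue, update)
}

// first returns the oldest queued update, if any.
func (b *upstreamBuffer) first() (string, bool) {
	b.mu.Lock()
	defer b.mu.Unlock()
	if len(b.queue) == 0 {
		return "", false
	}
	return b.queue[0], true
}

// discardFirst removes the oldest queued update.
func (b *upstreamBuffer) discardFirst() {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.queue = b.queue[1:]
}

// flush sends all queued updates. The mutex is NOT held while sending.
func (b *upstreamBuffer) flush(send func(string) error) error {
	for {
		update, ok := b.first()
		if !ok {
			return nil
		}
		if err := send(update); err != nil {
			return err // keep the update queued for a later attempt
		}
		b.discardFirst()
	}
}

// demoFlush queues a new update from inside the send callback. This would
// deadlock if flush held the mutex for the entire loop.
func demoFlush() []string {
	b := &upstreamBuffer{}
	b.add("update 1")
	b.add("update 2")
	var sent []string
	_ = b.flush(func(u string) error {
		sent = append(sent, u)
		if u == "update 1" {
			b.add("update 3")
		}
		return nil
	})
	return sent
}

func main() {
	fmt.Println(demoFlush())
}
```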
Sybren A. Stüvel
8937a6f06f Cleanup: worker, remove debug timers
Remove accidentally committed debug timing code.
2022-04-21 19:14:09 +02:00
Sybren A. Stüvel
bcbacf6c42 Worker: fix race condition getting logger with worker status 2022-04-21 19:12:53 +02:00
Sybren A. Stüvel
6bdc198301 Manager: more graceful errors when receiving task update of unknown task
Return a 404 Not Found when the task can't be found, and a 500 on other
errors.
2022-04-21 19:06:18 +02:00
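
The error mapping described in 6bdc198301 might look roughly like this. The sentinel errors and the `statusForError` function are hypothetical, and the 204 status for the success case is an assumption; only the 404 and 500 cases come from the commit message.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

// Hypothetical sentinel errors standing in for the persistence layer's errors.
var (
	errTaskNotFound = errors.New("task not found")
	errDatabase     = errors.New("database error")
)

// statusForError maps the result of fetching a task to an HTTP status code.
func statusForError(err error) int {
	switch {
	case err == nil:
		return http.StatusNoContent // assumed success status
	case errors.Is(err, errTaskNotFound):
		return http.StatusNotFound
	default:
		return http.StatusInternalServerError
	}
}

// wrappedNotFound shows that wrapped errors are still recognised.
func wrappedNotFound() error {
	return fmt.Errorf("fetching task for update: %w", errTaskNotFound)
}

func main() {
	fmt.Println(statusForError(nil))
	fmt.Println(statusForError(wrappedNotFound()))
	fmt.Println(statusForError(errDatabase))
}
```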