A message-driven orchestration framework designed from the ground up for Human-in-the-Loop workflows. Think accelerated, distributed/federated machine learning where fast iterations and continuous fine-tuning are front and center; where you want humans validating, correcting, and steering the data pipelines rather than just fire-and-forget inference or bulk-data -> bulk-model training.
The architecture is deliberately minimal: a ZeroMQ-based broker coordinating worker nodes through a rather spartan protocol that extends MajorDomo. Messages carry UUIDs for correlation, sender/receiver routing, type codes for context-dependent semantics, and optional (but heavily used) payloads. Pipeline definitions live in YAML files (as do worker and client configs), describing multi-step workflows with conditional routing, parallel execution, and wait conditions based on worker responses. The logic itself is written in Python.
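To make the message anatomy concrete, here is a minimal sketch of how such a message could be modeled in Python. The field names and the frozen dataclass are my illustrative assumptions, not the actual wire format:

```python
import uuid
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Message:
    """Illustrative message shape: UUID correlation, routing, type code, payload."""
    sender: str                      # logical id of the sending node
    receiver: str                    # logical id of the intended recipient
    msg_type: int                    # type code; semantics depend on context
    payload: bytes = b""             # optional, but used by most messages
    correlation_id: str = field(default_factory=lambda: str(uuid.uuid4()))
```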
I am trying to follow the "functional core, imperative shell" philosophy, where each message is essentially an immutable, auditable block in a temporal chain of state transformations. This should enable audit trails, event sourcing, and potentially no-loss crash recovery. Built-in blockchain-like verification is something I'm currently researching and could add to the whole pipeline processing.
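As an illustration of the kind of verification I have in mind, a minimal hash chain over messages could look like the sketch below; the hashing scheme and field names are assumptions, not something already in the framework:

```python
import hashlib
import json

def chain_hash(prev_hash: str, message: dict) -> str:
    """Hash the previous link together with the message content, so tampering
    with any earlier message invalidates every later link in the chain."""
    blob = json.dumps(message, sort_keys=True).encode() + prev_hash.encode()
    return hashlib.sha256(blob).hexdigest()

# Example: each processed message extends the chain.
h0 = "0" * 64                                           # genesis value
h1 = chain_hash(h0, {"type": "infer", "payload": "..."})
h2 = chain_hash(h1, {"type": "validate", "payload": "..."})
```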
The hook system provides composable extensibility for all main user-facing "submodules" through mixin classes, so you only add complexity for features you actually need. The main pillars of functionality, the broker, the worker, and the client, as well as some others, are designed as self-contained, monolithic classes (often breaking the DRY principle...). Their additional functionality is composed rather than inherited, via mixins that add behaviour while minimizing the amount of added "state capital" (the accent is on behaviour rather than state management). The user-definable @hook("process_message"), @hook("async_init"), @hook("cleanup") etc. cut across the lifecycle of each submodule and allow for simple functionality extension.
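A rough sketch of how such a hook decorator and its dispatch could work; the registry and method names here are my own illustrative assumptions rather than the framework's actual API:

```python
from collections import defaultdict

def hook(event: str):
    """Mark a mixin method as a handler for a lifecycle event."""
    def decorator(fn):
        fn._hook_event = event
        return fn
    return decorator

class HookMixin:
    """Collects @hook-decorated methods and runs them at lifecycle points."""
    def _collect_hooks(self):
        registry = defaultdict(list)
        for name in dir(self):
            fn = getattr(self, name)
            event = getattr(fn, "_hook_event", None)
            if event:
                registry[event].append(fn)
        return registry

    def run_hooks(self, event: str, *args, **kwargs):
        for fn in self._collect_hooks()[event]:
            fn(*args, **kwargs)

class LoggingMixin(HookMixin):
    @hook("process_message")
    def log_message(self, msg):
        print(f"processing: {msg}")
```

The idea being that a worker class just mixes in LoggingMixin and calls run_hooks("process_message", msg) at the right point in its lifecycle, with no extra state carried along.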
I'm also implementing a very simple distributed virtual file system with Unix-like command patterns (ls, cd, cp, mv, etc.) supporting multiple backends for storage and transfer; i.e. you can simply have your data worker store the files it subscribes to in a local folder and serve them on demand through its SSH, HTTPS, or FTPS backend. The data transfers use per-file-operation ephemeral credentials; the broker only orchestrates the metadata message flow between sender and receiver of the file(s), while the transfer itself happens directly between the nodes. The broker is the ultimate and only source of truth when it comes to keeping tabs on the file tables; the rest sync, in part or in toto, the actual, physical files themselves. The VFS also features rather rudimentary permission control.
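To sketch the broker's role in a transfer: it never touches the file, it only hands out a short-lived credential plus the metadata the receiver needs to pull directly from the sender. The function and field names below are hypothetical, just to illustrate the flow:

```python
import secrets
import time

def grant_transfer(file_id: str, sender: str, receiver: str, backend: str = "https"):
    """Broker-side sketch: issue an ephemeral, single-operation transfer grant."""
    grant = {
        "file_id": file_id,
        "backend": backend,                  # "ssh", "https", or "ftps"
        "source_node": sender,
        "target_node": receiver,
        "token": secrets.token_urlsafe(32),  # ephemeral credential for this one operation
        "expires_at": time.time() + 60,      # short-lived by design
    }
    return grant                             # sent to both parties as metadata messages
```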
So where's the ML part, you might ask? The framework treats ML models as workers that consume messages and produce outputs, making it trivial to chain preprocessing, inference, postprocessing, fine-tuning, and validation steps into declarative YAML pipelines with human checkpoints at critical decision points. Each pipeline can be client-controlled to run continuously, step by step, or be interrupted at any point of its lifecycle. Each step, or rather each message, is client-verifiable; clients can modify a message and propagate the pipeline with the corrected content, and pipelines can define "on_correction", "on_rejection", and "on_abort" steps for each step along the way, where the endpoints are all "services" that workers need to register. The workers provide services like "whisper_cpp_infer", "bert_foo_finetune_lora", "clean_whitespaces", "openeye_gpt5_validate_local_model_summary", etc.; the broker makes sure the messages flow to the right workers, the workers make sure the message content is correctly processed, and the client (can) make(s) sure the workers did a good job.
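A toy sketch of the worker side, under assumed names (the Worker class and its register method are placeholders, not the real API): a service is just a callable that takes message content and returns new content, announced to the broker under its service name.

```python
class Worker:
    """Hypothetical worker skeleton: holds a broker address and a service table."""
    def __init__(self, broker_addr: str):
        self.broker_addr = broker_addr
        self.services = {}

    def register(self, name: str):
        def decorator(fn):
            self.services[name] = fn          # announced to the broker on connect
            return fn
        return decorator

worker = Worker("tcp://broker:5555")

@worker.register("clean_whitespaces")
def clean_whitespaces(text: str) -> str:
    """Normalize whitespace before the text moves on to inference/validation."""
    return " ".join(text.split())
```

A client reviewing the output of such a step could then push a corrected message back, and the pipeline's "on_correction" endpoint (itself just another registered service) would pick it up from there.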
Sorry for the wall of text, and a disclaimer: I'm not a dev, I'm an MD who does a little programming as a hobby (thanks to gen-AI it's easier than ever to build software).