Thanks for sharing your experience! I know there are many of us out there dabbling with LLMs, and some solid businesses built on Ruby, lurking in the background without publishing much.
Your single-tool approach is a solid starting point. As it grows, you may hit context window limits and find the prompt getting unwieldy, with questions like: why is this prompt choking on 1.5MB of JSON coming back from some other API or tool?
When you look at systems like Codex CLI, they run at least four separate LLM subsystems: (1) the main agent prompt, (2) a summarizer model that watches the reasoning trace and produces user-facing updates like "Searching for test files...", (3) a compaction step, and (4) a reviewer agent. Each one only sees the context it needs, like a function with its own inputs and outputs. Total tokens stay similar, but signal density per prompt goes up.
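To make that concrete, here's a rough dspy.rb sketch of two such subsystems, each with its own narrow contract. Signature names and fields are mine, going off the Signature/Predict API in the project README:

    require 'dspy'

    # Each subsystem gets its own typed contract; neither sees the other's context.
    class StatusUpdate < DSPy::Signature
      description "Summarize the agent's current step as a one-line user-facing update."

      input do
        const :recent_trace, String # only the last few reasoning steps, not the full transcript
      end

      output do
        const :status_line, String # e.g. "Searching for test files..."
      end
    end

    class AnswerTask < DSPy::Signature
      description "Solve the user's task using the provided tool results."

      input do
        const :task, String
        const :tool_results, String # pre-compacted, not the raw 1.5MB of JSON
      end

      output do
        const :answer, String
      end
    end

    DSPy.configure do |config|
      config.lm = DSPy::LM.new('openai/gpt-4o-mini', api_key: ENV['OPENAI_API_KEY'])
    end

    status_for = DSPy::Predict.new(StatusUpdate)      # cheap, narrow prompt
    answer_for = DSPy::ChainOfThought.new(AnswerTask) # the "main agent" concern

The summarizer never sees the tool payloads and the agent never sees the status chatter, which is where the signal-density win comes from.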
DSPy.rb[0] enables this pattern in Ruby: define typed Signatures for each concern, compose them as Modules/prompting techniques (simple predictor, CoT, ReAct, CodeAct, your own, ...), and let each maintain its own memory scope. Three articles that show this, with a rough sketch after the list:
- "Ephemeral Memory Chat"[1] — the Two-Struct pattern (rich storage vs. lean prompt context) plus cost-based routing between cheap and expensive models.
- "Evaluator Loops"[2] — decompose generation from evaluation: a cheap model drafts, a smarter model critiques, each with its own focused signature.
- "Workflow Router"[3] — route requests to the right model based on complexity, only escalate to expensive LLMs when needed.
And since you're already using RubyLLM, the dspy-ruby_llm adapter lets you keep your provider setup while gaining the decomposition benefits.
Thanks for coming to my TED talk. Let me know if you need someone to bounce ideas off.
[0] https://github.com/vicentereig/dspy.rb
[1] https://oss.vicente.services/dspy.rb/blog/articles/ephemeral...
[2] https://oss.vicente.services/dspy.rb/blog/articles/evaluator...
[3] https://oss.vicente.services/dspy.rb/blog/articles/workflow-...