Here's how to use Walkie-Talkie to let two AI sessions argue their way to an answer.
Livery has a feature called Walkie-Talkie. It gives two AI sessions one markdown file and asks them to argue their way to an answer.
If you read "Letting Two AIs Argue Without You", that was the why. This is the how.
The short version: start with a briefing, pick two peers, let them take turns, and stop when both have signed. The file is the artifact. Livery has the machinery for making the turns happen.
Livery is a local command-line framework for running AI agents in a workspace. A workspace is just a directory with tickets, hired agents, conventions, and state. Walkie-Talkie lives inside that workspace.
There are two commands to know:
livery walkie new, which creates the shared file so you can pass it between two already-running AI sessions.
livery walkie auto, which runs the controller and dispatches two hired agents until the debate locks or stops.
Use new when you want to stay in the loop. Use auto when the mechanical part should go away.
Use livery walkie new when you have two sessions already running. Claude Code in one terminal, Codex in another, for example. You want a structured debate, but you still want to drive who reads what and when.
$ livery walkie new "rate-limiter-design" \
--with codex \
--as claude-code \
--opener "Should we use token bucket or leaky bucket here? Constraints: ..."
That creates a file in the workspace:
walkie-talkie/rate-limiter-design.md
The bottom of the file contains the protocol. The important rules are simple: read the whole file before every turn, append one turn above the protocol marker, push back if the other side is wrong, and sign only when the answer is actually settled.
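The "append one turn above the protocol marker" rule can be sketched in Python. This is only an illustration of the idea: the marker string `<!-- PROTOCOL -->`, the `## Turn N` heading format, and the `append_turn` helper are all assumptions, not Livery's actual file layout or code.

```python
# Hypothetical marker and heading format; the real walkie file may differ.
PROTOCOL_MARKER = "<!-- PROTOCOL -->"


def append_turn(text: str, turn_number: int, author: str, body: str) -> str:
    """Return the file text with one new turn inserted above the protocol marker."""
    head, marker, tail = text.partition(PROTOCOL_MARKER)
    if not marker:
        raise ValueError("protocol marker not found; refusing to guess where the turn goes")
    turn = f"## Turn {turn_number} ({author})\n\n{body.strip()}\n\n"
    # Keep everything already written, add the turn, then restore marker and protocol.
    return head.rstrip("\n") + "\n\n" + turn + marker + tail


example = "# rate-limiter-design\n\n" + PROTOCOL_MARKER + "\nRead the whole file before every turn.\n"
updated = append_turn(example, 1, "claude-code", "Token bucket handles bursts better.")
```

The point of the sketch is that a turn never overwrites earlier turns and never lands below the protocol, so the rules stay at the bottom where the next reader expects them.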
If you pass --opener, Turn 1 is already there. If you do not, the file starts with the protocol and the first AI writes Turn 1.
Then you hand it to the other session:
Open walkie-talkie/rate-limiter-design.md, read the whole file, follow the protocol at the bottom, and take your turn.
The peer appends Turn 2. You tell the first session to read the file again and take Turn 3. Repeat until both sides sign.
You can check the state from the workspace:
$ livery walkie list
And if you need to inspect one without opening the file directly:
$ livery walkie show rate-limiter-design
livery walkie auto does all this without you babysitting the turns. You need two hired agents in the workspace first:
$ livery hire proposer
$ livery hire critic
Hiring is where an agent gets its identity, runtime, model, working directory, and system prompt. You only do it once per agent. After that, the agent is part of the workspace.
Then run the controller:
$ livery walkie auto "rate-limiter-design" \
--peer-a proposer \
--peer-b critic \
--briefing "We need to pick between token bucket and leaky bucket. Constraints: ..."
Livery creates the walkie file, dispatches proposer for Turn 1, waits for that turn to land, dispatches critic for Turn 2, and keeps alternating. The loop stops when both peers sign, when a runtime fails, when a peer runs but does not append a turn, when a turn times out, or when the debate hits --max-turns. The default ceiling is 20 turns.
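The stop conditions above can be sketched as a small loop. This is a simplified model, not Livery's controller: `TurnResult`, `run_debate`, and the reason strings are hypothetical names, and the sketch assumes a signature stays in place once a peer has signed.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class TurnResult:
    exited_cleanly: bool   # False models a runtime failure
    timed_out: bool        # True models a turn that ran past its time limit
    appended_turn: bool    # False models a peer that ran but wrote nothing
    signed: bool           # True if this peer's signature is now in the file


def run_debate(dispatch: Callable[[str, int], TurnResult],
               peer_a: str, peer_b: str, max_turns: int = 20) -> str:
    """Alternate peers until a stop condition is hit; return the reason."""
    signed = set()
    for turn in range(1, max_turns + 1):
        # Odd turns go to peer A, even turns to peer B.
        peer = peer_a if turn % 2 == 1 else peer_b
        result = dispatch(peer, turn)
        if result.timed_out:
            return "timeout"
        if not result.exited_cleanly:
            return "runtime-failed"
        if not result.appended_turn:
            return "no-turn-appended"
        if result.signed:
            signed.add(peer)
        if signed == {peer_a, peer_b}:
            return "locked"
    return "max-turns"


def fake_dispatch(peer: str, turn: int) -> TurnResult:
    # Stand-in for a real agent run: both peers start signing from turn 3.
    return TurnResult(True, False, True, signed=turn >= 3)


outcome = run_debate(fake_dispatch, "proposer", "critic")
```

Every exit path returns a reason, which mirrors why the real controller is easier to debug than a hand-driven walkie: when it stops, you know which condition stopped it.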
If you already have the briefing in a file, pass it by path:
$ livery walkie auto "rate-limiter-design" \
--peer-a proposer \
--peer-b critic \
--briefing @briefings/rate-limiter.md
If the topic in question already lives in a Livery ticket, attach the ticket too:
$ livery walkie auto "rate-limiter-design" \
--peer-a proposer \
--peer-b critic \
--ticket LIV-123 \
--briefing @briefings/rate-limiter.md
The briefing and the ticket content become constant context for each turn. The walkie file is the evolving transcript.
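One way to picture the split between constant context and evolving transcript is a prompt builder like the one below. The function name and the section labels are assumptions made for illustration; how Livery actually assembles a turn's input is not described here.

```python
from typing import Optional


def build_turn_prompt(briefing: str, transcript: str, ticket: Optional[str] = None) -> str:
    """Combine the fixed context with the current transcript for one turn."""
    parts = ["BRIEFING:\n" + briefing.strip()]
    if ticket:
        parts.append("TICKET:\n" + ticket.strip())
    # Only this last part changes between turns.
    parts.append("TRANSCRIPT SO FAR:\n" + transcript.strip())
    return "\n\n".join(parts)


briefing = "Pick between token bucket and leaky bucket."
turn_1 = build_turn_prompt(briefing, "")
turn_2 = build_turn_prompt(briefing, "Turn 1: token bucket, because bursts matter.")
```

Both prompts open with the same briefing text, so a peer on Turn 12 is anchored to the same question as a peer on Turn 1.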
If the controller stops and the file is still usable, resume it:
$ livery walkie auto "rate-limiter-design" \
--peer-a proposer \
--peer-b critic \
--resume
--resume picks up the existing file and continues from the next turn.
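"Continues from the next turn" amounts to counting the turns already in the file and working out whose move it is. A minimal sketch, assuming turns are marked with a `## Turn N` heading (a hypothetical format) and that odd turns belong to peer A, as in the proposer-then-critic order described earlier:

```python
import re

# Hypothetical heading format; the real walkie file may mark turns differently.
TURN_HEADING = re.compile(r"^## Turn (\d+)\b", re.MULTILINE)


def next_turn(text: str, peer_a: str, peer_b: str):
    """Return (turn number, peer name) for the next turn in an existing file."""
    turns = [int(n) for n in TURN_HEADING.findall(text)]
    number = max(turns, default=0) + 1
    # Odd turns go to peer A, even turns to peer B.
    peer = peer_a if number % 2 == 1 else peer_b
    return number, peer


existing = "## Turn 1 (proposer)\nToken bucket.\n\n## Turn 2 (critic)\nWhat about bursts?\n"
resume_point = next_turn(existing, "proposer", "critic")
```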
The briefing is the part you want to spend a bit of time on. It is the distilled statement of what the debate is about: the question, the options, the constraints, and the things you have already decided.
Keep it short. Three or four hundred words is usually enough. Long enough to keep the AIs from drifting, short enough that both peers can re-read it every turn without swimming through a ton of context (which can dump you into the realm of context rot).
A useful briefing usually covers those four things: the question, the options, the constraints, and what you have already decided.
Skipping the briefing is how you get a very confident debate that might have absolutely nothing to do with the question.
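The length guideline is easy to check before you start a debate. The helper below is a hypothetical convenience, not a Livery command, and the 400-word soft limit is simply the upper end of the "three or four hundred words" suggestion.

```python
def check_briefing_length(text: str, soft_limit: int = 400) -> str:
    """Report the word count and flag briefings that exceed the soft limit."""
    words = len(text.split())
    if words == 0:
        return "empty briefing: the peers have nothing to anchor on"
    if words > soft_limit:
        return f"{words} words: consider trimming toward {soft_limit}"
    return f"{words} words: within the guideline"


short = "We need to pick between token bucket and leaky bucket."
report = check_briefing_length(short)
```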
A walkie locks when both peers sign. The line looks like this:
SIGNED: critic @ 2026-05-14T18:42:01Z
A signed file is the record of the argument: what was proposed, what got challenged, what changed, and where the peers finally converged.
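If you want to check lock state yourself, the signature line is regular enough to parse. This sketch assumes every signature follows the `SIGNED: <peer> @ <UTC timestamp>` shape shown above; the function names are hypothetical, and the proposer line in the sample is made up for illustration.

```python
import re
from datetime import datetime, timezone
from typing import Dict

SIGNED_RE = re.compile(
    r"^SIGNED:[ \t]*(?P<peer>\S+)[ \t]*@[ \t]*(?P<ts>\S+)[ \t]*$",
    re.MULTILINE,
)


def signatures(text: str) -> Dict[str, datetime]:
    """Map each signing peer to the timestamp of its latest signature."""
    found = {}
    for match in SIGNED_RE.finditer(text):
        stamp = datetime.strptime(match.group("ts"), "%Y-%m-%dT%H:%M:%SZ")
        found[match.group("peer")] = stamp.replace(tzinfo=timezone.utc)
    return found


def is_locked(text: str, peer_a: str, peer_b: str) -> bool:
    """A walkie is locked only when both peers have signed."""
    signed = signatures(text)
    return peer_a in signed and peer_b in signed


sample = (
    "Turn 4: agreed on token bucket.\n"
    "SIGNED: proposer @ 2026-05-14T18:40:12Z\n"  # illustrative line, not from a real file
    "SIGNED: critic @ 2026-05-14T18:42:01Z\n"
)
locked = is_locked(sample, "proposer", "critic")
```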
When the controller runs the debate, each turn is also a dispatch attempt with its own audit record under .livery/dispatch/attempts/. If (when?) something goes sideways, you can see which peer ran, when it ran, whether it exited cleanly, and what it printed.
The first common failure usually looks like drift. The peers debate something close to your question but not quite your question. That is usually a briefing problem.
The second failure is role collapse. If both peers are trying to be helpful in the same way, they will usually converge too early. Two generalist assistants will often summarize each other, smooth over uncertainty, and call the plan good before the hard objections have been tested.
The fix is to give the peers different jobs. One can be responsible for making the strongest practical proposal; the other can be responsible for finding the places it breaks. Or one can argue from product value while the other argues from security, operations, cost, data integrity, or long-term maintenance. Each hired agent has its own prompt files in the workspace, and those prompts define how it should behave when Livery dispatches it. If you want a useful Walkie-Talkie pair, do not hire two generic assistants. Hire two agents with different charters.
$ livery hire proposer
$ livery hire critic
Then make the distinction explicit in each agent’s instructions. The proposer’s prompt might say: your job is to produce the strongest workable plan, make tradeoffs explicit, and avoid getting stuck in abstract objections. The critic’s prompt might say: your job is to look for failure modes, hidden assumptions, operational risk, security risk, and places where the proposal is too vague to implement.
You can do the same with domain roles. A product-minded peer should ask whether the decision solves the right user problem. A security-minded peer should ask how it can be abused. An operations-minded peer should ask how it fails at 2 a.m. A database-minded peer should ask what happens to consistency, migration, and recovery. Walkie-Talkie does not create that tension by itself. It preserves and runs the tension you put into the agents.
Good pairings are usually asymmetric: proposer and critic, builder and auditor, product and security, database and operations, senior and junior. The junior role is useful because it can ask naive questions without trying to sound authoritative. The auditor role is useful because it is allowed to be annoying. The names matter less than the pressure they create.
The third failure is babysitting a hand-driven walkie. If you are relying on long-running sessions to notice file changes on their own, one side will eventually miss an update and sit there doing nothing. That is why livery walkie auto exists. It turns each turn into a bounded dispatch, then shuts the peer down until the next one.
And if you hit the 20-turn ceiling, read the file before you keep going. The peers are probably talking past each other. Tighten the briefing, clarify the question, and start a cleaner walkie.
That's it. And I think it's fitting to end this with Over and out.