Baseline: distribution workspace before observability redesign
149
.agents/ORCHESTRATION.md
Normal file
@@ -0,0 +1,149 @@
# Distribution Platform Agent Orchestration

This workspace uses a sequential, branch-based agent workflow.

## Roles

Codex manager:

- Owns orchestration, routing, validation, and reporting.
- Writes run metadata and decision records.
- Does not implement application code during Opus phases.

Snarky:

- Writes product proposals and acceptance criteria.
- Reviews whether Opus satisfies the product requirements.
- Has final say on product scope and required behavior.
- Challenges Opus concerns on their merits.

Opus:

- Owns layout and visual design decisions.
- Is consulted for technical problems until the manager can drive consensus.
- Is the only role allowed to write application code.
- Implements on git branches, not worktrees.

## Branch Rules

- No parallel implementation agents.
- No worktrees.
- Every implementation run uses a named branch, for example:
  - `opus/redesign-distribution-mock`
  - `opus/install-flow-detail`
  - `opus/release-console-ui`
- The working tree must be clean before Opus implementation starts.
- Opus implementation receives an explicit `allowedPaths` list.
- Manager validates the final diff before Snarky approval.
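
The `allowedPaths` check in the rules above can be sketched as a pure function. This is only an illustration, not the manager's actual validator: the name `findDisallowedPaths`, the prefix-matching rule, and the sample paths are all assumptions. The list of changed files could come from something like `git diff --name-only`.

```javascript
// Hypothetical helper: return changed files that fall outside allowedPaths.
// Assumption: an entry ending in "/" allows everything beneath that directory;
// any other entry must match a file path exactly.
function findDisallowedPaths(changedFiles, allowedPaths) {
  return changedFiles.filter((file) => {
    const allowed = allowedPaths.some((entry) =>
      entry.endsWith("/") ? file.startsWith(entry) : file === entry
    );
    return !allowed;
  });
}

// Illustrative input only; these paths are not from the real workspace.
const changedFiles = ["src/console/ReleaseList.js", "package.json"];
const allowedPaths = ["src/console/"];

// Any non-empty result means the diff touched files outside the allowed set.
console.log(findDisallowedPaths(changedFiles, allowedPaths));
```

A non-empty result would be a reason for the manager to reject the diff before it reaches Snarky approval.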

## Run Directory Shape

Each task gets a durable run directory:

```txt
.agents/runs/<run-id>/
  state.json
  agents.json
  brief.md
  opus-proposal.md
  concerns.json
  snarky-review.md
  decisions.json
  implementation-instructions.md
  progress.log
  implementation-log.md
  validation.md
  final-approval.md
```

Planning and orchestration work also gets a run directory when it should be
visible in the dashboard. If the manager is only editing files without creating
`.agents/runs/<run-id>/state.json`, the dashboard cannot show that session.

## Agent Recovery

Agents are treated as disposable workers. The run directory is the durable
source of truth.

Each active run tracks:

- `state.json`: current phase, expected artifact, input hash, retry count, and
  next action.
- `agents.json`: active role leases, heartbeat timestamps, process/thread ids
when available, and last known status.
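
For orientation, the two tracking files might hold objects shaped roughly like the ones below. The source lists what is tracked but not the schema, so every field name and value here is a hypothetical placeholder.

```javascript
// Hypothetical shape for state.json. Field names are assumptions.
const state = {
  runId: "example-run",
  phase: "opus-proposal",
  expectedArtifact: "opus-proposal.md",
  inputHash: "<hash of the phase input>",
  retryCount: 0,
  nextAction: "wait-for-opus",
};

// Hypothetical shape for agents.json, keyed by role.
const agents = {
  opus: {
    status: "running",
    leaseExpiresAt: "2025-01-01T10:30:00Z",
    lastHeartbeatAt: "2025-01-01T10:02:00Z",
    pid: null, // process/thread id, recorded only when available
  },
};

// Both objects are plain JSON, so they serialize without custom handling.
console.log(JSON.stringify({ state, agents }, null, 2));
```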

Manager responsibilities:

- Start one agent phase at a time.
- Write the phase input before invoking the agent.
- Record the expected output path.
- Record a lease timeout for the active role.
- Poll or wait for completion.
- Mark the agent dead if the lease expires or the process/tool call fails.
- Restart the same role with the same input artifact.
- Refuse to advance phases until the expected output exists and validates.
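
The polling decision in the list above can be sketched as follows. The lease record shape (`startedAt`, `timeoutMs` in epoch milliseconds) and the step names are assumptions made for the example, not the manager's real data model.

```javascript
// Hypothetical lease record: { role, startedAt, timeoutMs }, times in epoch ms.
function isLeaseExpired(lease, nowMs) {
  return nowMs - lease.startedAt > lease.timeoutMs;
}

// Decide the manager's next step for the active phase.
// outputValid: the expected output exists and passed validation.
// callFailed: the process/tool call failed.
function nextManagerStep(lease, nowMs, outputValid, callFailed) {
  if (outputValid) return "advance";
  // A failed call or an expired lease both mean the agent is treated as dead.
  if (callFailed || isLeaseExpired(lease, nowMs)) return "restart-same-role";
  return "wait";
}

const MINUTE_MS = 60 * 1000;
const lease = { role: "snarky", startedAt: 0, timeoutMs: 10 * MINUTE_MS };

console.log(nextManagerStep(lease, 5 * MINUTE_MS, false, false));  // "wait"
console.log(nextManagerStep(lease, 11 * MINUTE_MS, false, false)); // "restart-same-role"
console.log(nextManagerStep(lease, 11 * MINUTE_MS, true, false));  // "advance"
```

The function never returns "advance" without a valid output, which mirrors the rule that phases do not advance until the expected output exists and validates.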

Recovery rules:

- Snarky can be recreated from the run brief, decisions, and checkpoint files.
- Opus proposal/review calls can be retried because they do not edit files.
- Opus implementation can only be retried after checking branch and dirty state.
- If an Opus implementation dies with a dirty branch, manager must inspect the
  diff before retrying. Do not blindly overwrite partial code.
- Retry count is capped. After repeated failure, manager records a blocker and
asks the user.
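
One way to read the recovery rules is as a small decision function. The cap of 3 is an assumption (the source says retries are capped but gives no number), and the outcome names are hypothetical.

```javascript
// Hypothetical cap; the source does not state the actual limit.
const MAX_RETRIES = 3;

// phaseEditsFiles: true only for Opus implementation phases.
// branchDirty: result of checking the branch for uncommitted changes.
function decideRetry({ phaseEditsFiles, branchDirty, retryCount }) {
  if (retryCount >= MAX_RETRIES) return "record-blocker-and-ask-user";
  // Partial code must be inspected by the manager, never blindly overwritten.
  if (phaseEditsFiles && branchDirty) return "inspect-diff-before-retry";
  return "retry";
}

console.log(decideRetry({ phaseEditsFiles: false, branchDirty: false, retryCount: 1 })); // "retry"
console.log(decideRetry({ phaseEditsFiles: true, branchDirty: true, retryCount: 1 }));   // "inspect-diff-before-retry"
console.log(decideRetry({ phaseEditsFiles: true, branchDirty: false, retryCount: 3 }));  // "record-blocker-and-ask-user"
```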

Recommended lease defaults:

- Snarky brief/review: 10 minutes.
- Opus proposal/review/technical consult: 30 minutes minimum.
- Opus implementation: 45 minutes minimum.

Opus CLI/tool calls must never use a timeout below 30 minutes, even for
read-only design proposals. If a shorter timeout is accidentally used, record
the attempt, update the lease, and retry with at least 30 minutes.
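
The minimums above could be enforced with a small clamp before any Opus call. This is a sketch under assumptions: the phase keys and the `clampOpusTimeout` name are hypothetical, and only the minute values come from the text.

```javascript
const MINUTE_MS = 60 * 1000;

// Minimum lease per Opus phase, from the recommended defaults.
// The phase keys are hypothetical names.
const OPUS_MIN_LEASE_MS = new Map([
  ["opus-proposal", 30 * MINUTE_MS],
  ["opus-review", 30 * MINUTE_MS],
  ["opus-technical-consult", 30 * MINUTE_MS],
  ["opus-implementation", 45 * MINUTE_MS],
]);

// Raise a requested timeout to the phase minimum and report whether it was
// raised, so the manager can record the too-short attempt.
function clampOpusTimeout(phase, requestedMs) {
  const floor = OPUS_MIN_LEASE_MS.get(phase);
  if (floor === undefined) throw new Error(`not an Opus phase: ${phase}`);
  const timeoutMs = Math.max(requestedMs, floor);
  return { timeoutMs, raised: timeoutMs !== requestedMs };
}

// A 5-minute request for a proposal is raised to 30 minutes.
console.log(clampOpusTimeout("opus-proposal", 5 * MINUTE_MS));
```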

## Status Command

From the workspace root:

```sh
node .agents/scripts/status.mjs --tail 40
```

Browser dashboard:

```sh
node .agents/scripts/observe.mjs --port 4317
```

The status command prints every recorded run, every recorded agent, lease
expiry state, blockers, next action, and the tail of the active run progress
file. The browser dashboard shows the same state and refreshes automatically.

Managers should append progress events to `.agents/runs/<run-id>/progress.log`.
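
The source does not define the format of a progress event. As one possibility, each event could be a single timestamped line, which keeps tailing predictable. The format and the `formatProgressEvent` name below are assumptions.

```javascript
// Hypothetical line format: ISO timestamp, role in brackets, then the message.
function formatProgressEvent(date, role, message) {
  // Collapse line breaks so each event stays on exactly one line.
  const oneLine = String(message).replace(/\s*\n\s*/g, " ");
  return `${date.toISOString()} [${role}] ${oneLine}\n`;
}

const line = formatProgressEvent(new Date(0), "manager", "started phase\nopus-proposal");
process.stdout.write(line);
```

A line built this way could then be appended to the run's `progress.log` with Node's `fs.appendFileSync`.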

## Sequential Flow

1. Manager creates `.agents/runs/<run-id>/`.
2. Snarky writes `brief.md`.
3. Manager calls Opus for `opus-proposal.md`.
4. Snarky reviews product fit and challenges concerns.
5. Opus responds to accepted concern challenges.
6. Manager resolves consensus for technical issues.
7. Snarky writes `implementation-instructions.md`.
8. Manager calls Opus implementation on a named branch.
9. Manager validates the branch diff and checks.
10. Snarky approves or requests another Opus pass.
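
The ten steps can be modeled as an ordered list of phases with a single loop back from step 10 to step 8. The phase identifiers below are hypothetical labels, one per numbered step, and not names used by the real scripts.

```javascript
// Hypothetical phase ids, one per numbered step in the flow.
const PHASES = [
  "create-run",
  "snarky-brief",
  "opus-proposal",
  "snarky-review",
  "opus-concern-response",
  "manager-consensus",
  "snarky-instructions",
  "opus-implementation",
  "manager-validation",
  "snarky-approval",
];

// Return the phase that follows `current`. Only the final step branches:
// Snarky either approves or requests another Opus implementation pass.
function nextPhase(current, approved = true) {
  const index = PHASES.indexOf(current);
  if (index === -1) throw new Error(`unknown phase: ${current}`);
  if (current === "snarky-approval") {
    return approved ? "done" : "opus-implementation";
  }
  return PHASES[index + 1];
}

console.log(nextPhase("create-run"));             // "snarky-brief"
console.log(nextPhase("snarky-approval", false)); // "opus-implementation"
```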

## Opus Runner Tools

The local `opus-runner` plugin exposes:

- `opus_branch_status`
- `opus_propose_design`
- `opus_review_concern`
- `opus_solve_technical_issue`
- `opus_implement_on_branch`

Only `opus_implement_on_branch` may enable file editing.
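
A guard for this rule might look like the sketch below. The tool names come from the list above. The `checkToolCall` function and its `enableEditing` parameter are hypothetical and do not describe the plugin's real interface.

```javascript
// Tool names as listed for the opus-runner plugin.
const OPUS_TOOLS = new Set([
  "opus_branch_status",
  "opus_propose_design",
  "opus_review_concern",
  "opus_solve_technical_issue",
  "opus_implement_on_branch",
]);

// The only tool permitted to enable file editing.
const EDITING_TOOLS = new Set(["opus_implement_on_branch"]);

// Hypothetical guard: throw if a call names an unknown tool or asks for
// file editing on a read-only tool.
function checkToolCall(toolName, enableEditing) {
  if (!OPUS_TOOLS.has(toolName)) {
    throw new Error(`unknown tool: ${toolName}`);
  }
  if (enableEditing && !EDITING_TOOLS.has(toolName)) {
    throw new Error(`${toolName} must not enable file editing`);
  }
  return true;
}

console.log(checkToolCall("opus_implement_on_branch", true)); // true
```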