Why Your AI Collaboration Keeps Going Off Track: Redesigning AI Tasks with State Machines
Have you encountered these situations?
Scenario 1: Context Loss
You: Help me refactor this login module, remember to maintain backward compatibility
AI: (30 minutes later) Done, I've refactored it
You: What's the test coverage?
AI: What test coverage? You never mentioned that before
Scenario 2: Goal Drift
You: Fix this bug, and optimize performance while you're at it
AI: (starts refactoring the entire architecture)
You: Wait! I just wanted to fix a bug!
AI: But this is better...
Scenario 3: Unverifiable Completion
You: Is the task complete?
AI: Yes
You: Really complete? Did tests pass? Is documentation updated?
AI: Um... what do you mean by "complete"...
What's the root cause of these three problems?
AI has no "working memory"—every conversation is a fresh start. Traditional "chat-style collaboration" makes AI tasks a tangled mess:
- Context is implicit in conversation history (AI easily loses it)
- Goals are vague in natural language (AI easily misunderstands)
- Completion criteria are in your head (AI has no idea)
The Solution: Redesign AI Tasks with State Machine Thinking
What Is a State Machine?
Don't be intimidated by "state machine"—you use them every day:
Traffic lights are state machines
Current state: Red light
Input condition: Wait 30 seconds
Next state: Green light
Verification: Light turned green
Vending machines are state machines
Current state: Waiting for coins
Input condition: Insert a 5-yuan coin
Next state: Can select product
Verification: Screen shows "Please select product"
Core ideas of state machines:
- Clear "where we are now" (current state)
- Clear "where we're going" (target state)
- Clear "how to get there" (transition conditions)
- Clear "have we arrived" (verification criteria)
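The traffic-light example above can be sketched in a few lines of Python (the event name `timer_expired` is invented for illustration):

```python
from enum import Enum

class Light(Enum):
    RED = "red"
    GREEN = "green"
    YELLOW = "yellow"

# Transition table: (current state, input condition) -> next state.
TRANSITIONS = {
    (Light.RED, "timer_expired"): Light.GREEN,
    (Light.GREEN, "timer_expired"): Light.YELLOW,
    (Light.YELLOW, "timer_expired"): Light.RED,
}

def step(state: Light, event: str) -> Light:
    """Apply one transition; an unknown event leaves the state unchanged."""
    return TRANSITIONS.get((state, event), state)

state = step(Light.RED, "timer_expired")
assert state is Light.GREEN  # verification: the light turned green
```

All four core ideas are visible here: the current state, the target state, the transition condition, and a check that we actually arrived.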
AI Task State Machine: 5 Elements
We map state machine thinking to AI task management with 5 core elements:
1. Initial State (q₀): What the system looks like when the task begins
Not: what to do (that's the task requirement)
But: what it currently looks like
Example: Refactoring login module
Initial State:
Code:
- auth.js has 350 lines
- Contains login/logout/validateToken functions
Tests:
- 15 test cases
- 60% coverage
- All passing
Documentation:
- README has basic description
- Missing API docs
Known Issues:
- Password reset has a bug
- Session timeout too short
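The same initial state can be captured as plain data instead of prose. This is an illustrative sketch; the field names are invented for this example, not a standard schema:

```python
# q0: a snapshot of the system at task start, as checkable data.
initial_state = {
    "code": {
        "file": "auth.js",
        "lines": 350,
        "functions": ["login", "logout", "validateToken"],
    },
    "tests": {"cases": 15, "coverage": 0.60, "all_passing": True},
    "docs": {"readme": "basic description", "api_docs": None},
    "known_issues": ["password reset bug", "session timeout too short"],
}

assert initial_state["tests"]["coverage"] == 0.60
```

Once q₀ is data rather than conversation history, it can be attached to every task hand-off instead of being lost between messages.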
2. Target State (F): What the system should look like when complete
Key: This is a set, not a single state (multiple acceptable completion states allowed)
Example:
Perfect Completion:
Code: Refactoring complete, complexity reduced, 20% performance improvement
Tests: Coverage > 90%, all tests passing
Docs: Complete API documentation + migration guide
Issues: All known bugs fixed
Acceptable Completion:
Code: Refactoring complete, backward compatible
Tests: Coverage > 80%, main tests passing
Docs: API documentation complete
Minimum Deliverable:
Code: Core functionality refactored
Tests: Main path tests passing
Docs: README updated
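Because F is a set, one way to encode it is as a list of tiered predicates: reaching any tier counts as done, and the best satisfied tier is the grade. A minimal sketch (the state keys are hypothetical):

```python
# Each completion tier is a predicate over the final production state;
# F is the union of all tiers, checked best-first.
def perfect(s):
    return s["coverage"] > 0.90 and s["all_tests_pass"] and s["migration_guide"]

def acceptable(s):
    return s["coverage"] > 0.80 and s["backward_compatible"]

def minimum(s):
    return s["main_path_tests_pass"] and s["readme_updated"]

TIERS = [("perfect", perfect), ("acceptable", acceptable), ("minimum", minimum)]

def grade(state: dict) -> str:
    """Return the best tier the state satisfies, or 'failed'."""
    for name, check in TIERS:
        if check(state):
            return name
    return "failed"
```

A state with 85% coverage and backward compatibility would grade as "acceptable": it misses the perfect tier but clears the next one.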
3. Context Space (Σ): All background information needed for task execution
Key: This information doesn't change during task execution (or changes slowly)
Four layers:
| Layer | Content | Example |
|---|---|---|
| Information | Existing code, design docs, API specs | Current auth.js source, technical spec |
| Constraints | Tech stack, performance requirements, compatibility | Must use TypeScript, support Chrome 90+ |
| Standards | Code style, test requirements, commit conventions | ESLint config, coverage > 80% |
| Dependencies | Dependent tasks, subsequent tasks, parallel tasks | User module complete, payment module waiting |
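The four layers map naturally onto an immutable data structure; freezing it enforces the key property that context does not change during execution. A sketch:

```python
from dataclasses import dataclass

@dataclass(frozen=True)  # frozen: the context must not change mid-task
class Context:
    information: tuple = ()   # existing code, design docs, API specs
    constraints: tuple = ()   # tech stack, performance, compatibility
    standards: tuple = ()     # code style, test requirements, conventions
    dependencies: tuple = ()  # upstream, downstream, and parallel tasks

ctx = Context(
    constraints=("must use TypeScript", "support Chrome 90+"),
    standards=("ESLint config", "coverage > 80%"),
)
```

Attempting to mutate `ctx.constraints` after construction raises an error, which is exactly the guarantee the Σ layer needs.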
4. Transition Function (δ): AI's execution capabilities
This is the AI's (or agent's) job:
AI Capabilities:
Understand: Understand initial and target states
Plan: Autonomously decide execution path
Execute: Call tools to complete work
Verify: Confirm whether target state is reached
Key characteristics:
- We don't control how AI thinks internally (black box)
- We only control input and expected output
- Same input may produce different paths (non-deterministic)
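These characteristics suggest a wrapper that treats δ as opaque: we supply (q₀, Σ), ignore the internal path, and only check the output against F. A sketch with a toy δ standing in for the agent:

```python
from typing import Callable

def run_task(delta: Callable[[dict, dict], dict],
             q0: dict, sigma: dict,
             reached_f: Callable[[dict], bool]) -> dict:
    """Run a black-box transition function and verify its output against F."""
    result = delta(q0, sigma)        # the internal path is opaque to us
    if not reached_f(result):
        raise RuntimeError("target state F not reached; task failed")
    return result

# Toy delta: pretends the agent raised coverage to the required level.
toy_delta = lambda q0, sigma: {**q0, "coverage": 0.85}

final = run_task(toy_delta, {"coverage": 0.60}, {},
                 lambda s: s["coverage"] > 0.80)
assert final["coverage"] == 0.85
```

Note that nothing in `run_task` depends on how `delta` reached the result, only on whether the result satisfies F.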
5. Production State Space (Q): All possible work output states during execution
Important distinction:
- ❌ Runtime state: AI is reading files, thinking (we don't care)
- ✅ Production state: How much code is written, how many tests pass (we care)
Two Core Guarantees: Atomicity + Consistency
Atomicity: Tasks either complete fully or don't happen at all
Traditional approach problems:
You: Help me implement user registration
AI: (writes half the code)
You: Wait, I need to add captcha
AI: (continues writing)
You: No, email verification too
AI: (code gets messier)
Result: Half-finished product, unusable
State machine approach:
Initial State: User module has no registration feature
Target State: Registration feature complete
- Includes form, validation, email confirmation
Verification Criteria:
- User can register
- Receives confirmation email
- Clicks link to activate account
- All tests pass
Result: Either reach this state or return to initial state
Intermediate states are not deliverable
How to guarantee:
- Clearly define "complete" criteria (target state F)
- Verify whether F is reached when task ends
- Not reaching F = task failed, needs re-execution or rollback
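The "reach F or return to the initial state" rule is the same commit-or-rollback pattern databases use. A minimal sketch, using a deep copy as the rollback snapshot:

```python
import copy

def run_atomically(state: dict, task, reached_f) -> dict:
    """Either return a state satisfying F, or the untouched initial state."""
    snapshot = copy.deepcopy(state)          # kept for rollback
    try:
        result = task(copy.deepcopy(state))  # task works on its own copy
        if reached_f(result):
            return result                    # committed: F reached
    except Exception:
        pass                                 # any failure falls through
    return snapshot                          # rolled back: nothing happened

# A task that crashes halfway leaves the initial state intact.
def crashing_task(s):
    s["registration"] = "half done"
    raise RuntimeError("ran out of budget")

q0 = {"registration": None}
result = run_atomically(q0, crashing_task,
                        lambda s: s["registration"] == "done")
assert result == {"registration": None}  # no half-finished product delivered
```

In a real system the "snapshot" would be a git branch or database transaction rather than an in-memory copy, but the contract is the same.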
Consistency: Same input should produce same result
Traditional approach problems:
First time:
You: Help me write a user authentication module
AI: (uses JWT)
Second time (same words):
You: Help me write a user authentication module
AI: (uses Session)
You: ??? Why different?
State machine approach:
Fixed Input:
Initial State: {codebase status, tech stack, standards...}
Context Space: {must use JWT, reference existing auth module...}
Target State: {auth feature complete, tests passing...}
Guarantee: As long as these three inputs are the same,
AI output should be consistent
(even if paths differ, final result meets target state)
How to guarantee:
- Explicitly define all constraints (context space Σ)
- Include tech choices, reference implementations, design patterns
- Don't rely on "AI's own judgment"
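One practical way to make "same three inputs" checkable is to fingerprint them. This is an illustrative sketch, not a prescribed mechanism: two runs with the same fingerprint should yield interchangeable results, because both must satisfy the same F even if the paths differ.

```python
import hashlib
import json

def task_fingerprint(q0: dict, sigma: dict, f_spec: dict) -> str:
    """Hash the three task inputs into a short, comparable identifier."""
    blob = json.dumps([q0, sigma, f_spec], sort_keys=True)
    return hashlib.sha256(blob.encode()).hexdigest()[:12]

a = task_fingerprint({"coverage": 0.6}, {"auth": "JWT"}, {"coverage": ">0.8"})
b = task_fingerprint({"coverage": 0.6}, {"auth": "JWT"}, {"coverage": ">0.8"})
assert a == b  # same inputs, same fingerprint: results must be interchangeable
```

If the fingerprint changes between runs, some input silently changed, which is usually where "why is it different this time?" comes from.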
Real Case: Defining a Real Task with State Machine
Requirement: Add comment feature to blog system
Traditional Approach (Chat-style)
You: Help me add comments
AI: OK (starts writing code)
You: Need to support replies
AI: OK (adds reply feature)
You: Also need likes
AI: Sure (keeps adding)
You: Should we add reporting?
AI: Could do (writes more and more)
...(3 hours later, a mess)
State Machine Approach
1. Initial State (q₀)
Code Status:
- Have Post model (title, content, author)
- Have user authentication system
- Database: PostgreSQL
Test Status:
- Post-related test coverage 85%
Tech Stack:
- Backend: Node.js + Express + TypeORM
- Frontend: React + TypeScript
2. Context Space (Σ)
| Layer | Specifics |
|---|---|
| Information | Existing Post model code, user auth implementation, database schema |
| Constraints | Must use TypeORM, comments max 1000 chars, need XSS protection |
| Standards | RESTful API design, test coverage > 80%, code must pass ESLint |
| Dependencies | Depends on: User auth module (complete), Parallel: Frontend UI redesign (in progress) |
3. Target State (F)
Full Completion:
Features:
✓ Users can post comments (login required)
✓ Support two-level replies (reply to comments)
✓ Support liking comments
✓ Comment authors can edit/delete their comments
✓ Admins can delete any comment
Technical:
✓ Comment model created (id, postId, userId, content, parentId, createdAt)
✓ RESTful API implemented (POST /comments, GET /posts/:id/comments, DELETE /comments/:id)
✓ Frontend components complete (CommentList, CommentForm, CommentItem)
Quality:
✓ Test coverage > 85%
✓ XSS protection implemented
✓ API docs updated
Minimum Deliverable:
Features:
✓ Users can post comments
✓ Can view comment list
Technical:
✓ Comment model created
✓ Basic API implemented
Quality:
✓ Main path tests passing
4. Verification Checklist
Check each item when task ends:
Feature Verification:
[ ] Logged-in user can post comment under article
[ ] Comment appears immediately in comment list
[ ] Can reply to comments (two-level replies)
[ ] Can like comments
[ ] Author can edit their comment
[ ] Author can delete their comment
Technical Verification:
[ ] npm test all passing
[ ] npm run lint no errors
[ ] API follows RESTful conventions
Security Verification:
[ ] Comment content XSS filtered
[ ] Non-logged-in users cannot post comments
[ ] Users can only edit/delete their own comments
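A checklist like this can be executed, not just read. A sketch with a few of the items above as named predicates over the final production state (the state keys are illustrative, not tied to any real tooling):

```python
# Part of the verification checklist as executable checks.
CHECKLIST = {
    "npm test all passing":  lambda s: s["npm_test"] == "pass",
    "lint has no errors":    lambda s: s["lint_errors"] == 0,
    "XSS filter in place":   lambda s: s["xss_filtered"],
    "anonymous cannot post": lambda s: not s["anonymous_can_post"],
}

def verify(state: dict) -> list:
    """Return the names of failed checks; an empty list means F is reached."""
    return [name for name, check in CHECKLIST.items() if not check(state)]

state = {"npm_test": "pass", "lint_errors": 0,
         "xss_filtered": True, "anonymous_can_post": False}
assert verify(state) == []  # every box checked: the task is complete
```

The returned list of failed names doubles as the re-execution brief: it says exactly which part of F is still unmet.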
5. Execution
AI autonomously decides execution path based on (q₀, Σ, F):
- Create database migration
- Implement Comment model
- Develop API endpoints
- Write frontend components
- Write test cases
- Update documentation
Key: AI's execution path may vary, but must satisfy verification checklist (reach F)
Traditional vs State Machine Comparison
| Dimension | Traditional Chat-style | State Machine Style |
|---|---|---|
| Context Management | Implicit in chat history, easily lost | Explicitly defined in Σ, always visible |
| Goal Definition | Vague natural language, "make a feature" | Clear state description, verifiable checklist |
| Completion Judgment | Subjective feeling, "looks about done" | Objective verification, all boxes checked |
| Predictability | Different results each time, luck-based | Same input, consistent result, reproducible |
| Composability | Hard to chain tasks, context breaks | Clear state transfer between tasks, can form workflows |
| Atomicity | Often abandoned halfway, leaves half-products | Either complete or fail, no half-products |
| Parallel Collaboration | Multiple AIs conflict | Clear states, can parallelize, collaborate via state interfaces |
Core Value of This Approach
1. Make AI Tasks Controllable
Three dimensions of control:
- Predictable: Same input → same output
- Verifiable: Clear completion criteria
- Replaceable: Switch AI, same result
2. Make AI Collaboration Reliable
Reliability guarantees:
- Atomicity: Tasks either succeed or fail, no half-products
- Consistency: Multiple executions yield consistent results
- Isolation: Tasks don't interfere with each other
- Durability: State transitions are traceable
3. Make AI Workflows Composable
Task Network:
Task A's target state → Task B's initial state
Task B's target state → Task C's initial state
...
Forms composable workflow
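The chaining rule above can be sketched directly: each stage's verified output state becomes the next stage's initial state, and the hand-off is rejected if it doesn't satisfy the previous stage's F.

```python
def compose(*stages):
    """Chain tasks into a workflow. Each stage is a (task, reached_f) pair;
    a stage's verified output state is the next stage's initial state."""
    def pipeline(state: dict) -> dict:
        for task, reached_f in stages:
            state = task(state)
            if not reached_f(state):
                raise RuntimeError("handoff state does not satisfy F")
        return state
    return pipeline

# Toy workflow: task A produces the model, task B builds the API on top.
task_a = (lambda s: {**s, "model": "done"}, lambda s: s["model"] == "done")
task_b = (lambda s: {**s, "api": "done"},   lambda s: s["api"] == "done")

workflow = compose(task_a, task_b)
assert workflow({}) == {"model": "done", "api": "done"}
```

The verification at each hand-off is what makes the chain safe: task B never starts from a state that task A only half-produced.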
Next Steps: How to Practice
For Product/Project Managers
When defining tasks, ask yourself 5 questions:
- What's the initial state? (What does it look like now)
- What's the target state? (What should it become)
- What context is needed? (What does AI need to know)
- How to verify completion? (What are the completion criteria)
- What's the minimum acceptable completion? (What's the MVP)
For Technical Leaders
When implementing, do three things:
- Establish task templates: Standardize task definition format
- Build context library: Maintain project tech stack, standards, design docs
- Design verification process: Automated tests + manual checklists
For AI Tool Developers
When building, consider three layers:
- Task engine: Parse state machine definition, drive AI execution
- State management: Record state transition traces, support rollback
- Verification framework: Automatically check if target state is reached
Summary
Core Idea:
- Redesign AI tasks with state machine thinking
- Focus on "production state" (what's produced), not "runtime state" (what's happening)
- Explicitly define initial state, target state, context, verification criteria
Core Value:
- Guarantee atomicity: Tasks either complete fully or don't happen
- Guarantee consistency: Same input produces same result
- Improve controllability: AI tasks go from "luck-based" to "predictable"
Where to Start:
- Next time you assign a task to AI, think through these 5 elements first
- Don't say "help me make a feature"
- Say "from state A to state B, must satisfy condition C"
The future of AI collaboration isn't smarter AI—it's clearer task definitions.