Engineering Practice

Why Your AI Collaboration Keeps Going Off Track: Redesigning AI Tasks with State Machines

Sean · 6 min read


Have you encountered these situations?

Scenario 1: Context Loss

You: Help me refactor this login module, remember to maintain backward compatibility
AI: Done, I refactored it (30 minutes later)
You: What's the test coverage?
AI: What test coverage? You never mentioned that before

Scenario 2: Goal Drift

You: Fix this bug, and optimize performance while you're at it
AI: (starts refactoring the entire architecture)
You: Wait! I just wanted to fix a bug!
AI: But this is better...

Scenario 3: Unverifiable Completion

You: Is the task complete?
AI: Yes
You: Really complete? Did tests pass? Is documentation updated?
AI: Um... what do you mean by "complete"...

What's the root cause of these three problems?

AI has no "working memory"—every conversation is a fresh start. Traditional "chat-style collaboration" makes AI tasks a tangled mess:

  • Context is implicit in conversation history (AI easily loses it)
  • Goals are vague in natural language (AI easily misunderstands)
  • Completion criteria are in your head (AI has no idea)

The Solution: Redesign AI Tasks with State Machine Thinking

What Is a State Machine?

Don't be intimidated by "state machine"—you use them every day:

Traffic lights are state machines

Current state: Red light
Input condition: Wait 30 seconds
Next state: Green light
Verification: Light turned green

Vending machines are state machines

Current state: Waiting for coins
Input condition: Insert a 5-yuan coin
Next state: Can select product
Verification: Screen shows "Please select product"

Core ideas of state machines:

  1. Clear "where we are now" (current state)
  2. Clear "where we're going" (target state)
  3. Clear "how to get there" (transition conditions)
  4. Clear "have we arrived" (verification criteria)
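These four ideas map directly onto a tiny data structure. Here is a minimal TypeScript sketch of the traffic-light example (the names `TrafficState`, `Transition`, and `step` are illustrative, not from any library):

```typescript
// The four core ideas of a state machine, as data and one function.
type TrafficState = "red" | "green" | "yellow";

interface Transition {
  from: TrafficState;                    // 1. where we are now
  to: TrafficState;                      // 2. where we're going
  condition: string;                     // 3. how to get there
  verify: (s: TrafficState) => boolean;  // 4. have we arrived
}

const redToGreen: Transition = {
  from: "red",
  to: "green",
  condition: "wait 30 seconds",
  verify: (s) => s === "green",
};

// Apply a transition only if the current state matches its source,
// and refuse to report success unless verification passes.
function step(current: TrafficState, t: Transition): TrafficState {
  if (current !== t.from) return current; // precondition not met: no-op
  if (!t.verify(t.to)) throw new Error("verification failed");
  return t.to;
}
```

Note that `verify` is separate from `to`: declaring the target state and checking that it was actually reached are two different things, which is exactly the distinction the rest of this article builds on.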

AI Task State Machine: 5 Elements

We map state machine thinking to AI task management with 5 core elements:

1. Initial State (q₀): What the system looks like when the task begins

Not: what to do (that's the task requirement)
But: what it currently looks like

Example: Refactoring login module

Initial State:
  Code:
    - auth.js has 350 lines
    - Contains login/logout/validateToken functions
  Tests:
    - 15 test cases
    - 60% coverage
    - All passing
  Documentation:
    - README has basic description
    - Missing API docs
  Known Issues:
    - Password reset has a bug
    - Session timeout too short

2. Target State (F): What the system should look like when complete

Key: This is a set, not a single state (multiple acceptable completion states allowed)

Example:

Perfect Completion:
  Code: Refactoring complete, complexity reduced, 20% performance improvement
  Tests: Coverage > 90%, all tests passing
  Docs: Complete API documentation + migration guide
  Issues: All known bugs fixed

Acceptable Completion:
  Code: Refactoring complete, backward compatible
  Tests: Coverage > 80%, main tests passing
  Docs: API documentation complete

Minimum Deliverable:
  Code: Core functionality refactored
  Tests: Main path tests passing
  Docs: README updated
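Because F is a set, "did we finish?" becomes "which tier did we reach?". A sketch of the tiered target state as acceptance predicates (the `Snapshot` fields and thresholds below are illustrative, taken loosely from the refactoring example):

```typescript
// A target state F as tiers of acceptance predicates over a work snapshot.
interface Snapshot {
  refactored: boolean;
  backwardCompatible: boolean;
  coverage: number;        // 0..1
  apiDocsDone: boolean;
  readmeUpdated: boolean;
}

type Tier = { name: string; accepts: (s: Snapshot) => boolean };

// Ordered best-first; the first tier that accepts wins.
const tiers: Tier[] = [
  { name: "perfect",    accepts: (s) => s.refactored && s.coverage > 0.9 && s.apiDocsDone },
  { name: "acceptable", accepts: (s) => s.refactored && s.backwardCompatible && s.coverage > 0.8 && s.apiDocsDone },
  { name: "minimum",    accepts: (s) => s.refactored && s.readmeUpdated },
];

// Return the best tier the snapshot satisfies, or null if the task failed.
function grade(s: Snapshot): string | null {
  for (const t of tiers) if (t.accepts(s)) return t.name;
  return null;
}
```

A snapshot with 85% coverage and complete API docs would grade as "acceptable": it misses the perfect tier's 90% bar but clears everything else.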

3. Context Space (Σ): All background information needed for task execution

Key: This information doesn't change during task execution (or changes slowly)

Four layers:

  • Information: existing code, design docs, API specs (e.g., current auth.js source, technical spec)
  • Constraints: tech stack, performance requirements, compatibility (e.g., must use TypeScript, support Chrome 90+)
  • Standards: code style, test requirements, commit conventions (e.g., ESLint config, coverage > 80%)
  • Dependencies: dependent tasks, subsequent tasks, parallel tasks (e.g., user module complete, payment module waiting)
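The point of Σ is that it is written down, not implied. A sketch of the four layers as one explicit object (field names and contents are illustrative, drawn from the refactoring example):

```typescript
// The four layers of the context space Σ as one explicit object
// that travels with the task instead of living in chat history.
interface ContextSpace {
  information: string[];  // existing code, design docs, API specs
  constraints: string[];  // tech stack, performance, compatibility
  standards: string[];    // code style, test requirements, conventions
  dependencies: string[]; // upstream, downstream, and parallel tasks
}

const sigma: ContextSpace = {
  information: ["current auth.js source", "technical spec"],
  constraints: ["must use TypeScript", "support Chrome 90+"],
  standards: ["ESLint config", "coverage > 80%"],
  dependencies: ["user module complete", "payment module waiting"],
};
```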

4. Transition Function (δ): AI's execution capabilities

This is the AI's (or agent's) job:

AI Capabilities:
  Understand: Understand initial and target states
  Plan: Autonomously decide execution path
  Execute: Call tools to complete work
  Verify: Confirm whether target state is reached

Key characteristics:

  • We don't control how AI thinks internally (black box)
  • We only control input and expected output
  • Same input may produce different paths (non-deterministic)

5. Production State Space (Q): All possible work output states during execution

Important distinction:

  • ❌ Runtime state: AI is reading files, thinking (we don't care)
  • ✅ Production state: How much code is written, how many tests pass (we care)

Two Core Guarantees: Atomicity + Consistency

Atomicity: Tasks either complete fully or don't happen at all

Traditional approach problems:

You: Help me implement user registration
AI: (writes half the code)
You: Wait, I need to add captcha
AI: (continues writing)
You: No, email verification too
AI: (code gets messier)
Result: Half-finished product, unusable

State machine approach:

Initial State: User module has no registration feature

Target State: Registration feature complete
  - Includes form, validation, email confirmation

Verification Criteria:
  - User can register
  - Receives confirmation email
  - Clicks link to activate account
  - All tests pass

Result: Either reach this state or return to initial state
        Intermediate states are not deliverable

How to guarantee:

  • Clearly define "complete" criteria (target state F)
  • Verify whether F is reached when task ends
  • Not reaching F = task failed, needs re-execution or rollback
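The three guarantees above can be sketched as a single wrapper: execute, verify against F, and roll back on failure. `execute` and `rollback` here are placeholders for the real AI and tooling calls, not an actual API:

```typescript
// Atomicity wrapper: either the target state F is verified,
// or the system is rolled back to the initial state.
type Result = { reached: boolean; detail: string };

function runAtomically(
  execute: () => Result,            // placeholder for the AI doing the work
  verifyF: (r: Result) => boolean,  // checks the target state F
  rollback: () => void,             // placeholder for git revert, etc.
): Result {
  const result = execute();
  if (!verifyF(result)) {
    rollback(); // not reaching F = task failed, no half-finished product survives
    return { reached: false, detail: "rolled back to initial state" };
  }
  return result;
}
```

The intermediate states still exist while the task runs; the wrapper simply refuses to let them become deliverables.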

Consistency: Same input should produce same result

Traditional approach problems:

First time:
You: Help me write a user authentication module
AI: (uses JWT)

Second time (same words):
You: Help me write a user authentication module
AI: (uses Session)

You: ??? Why different?

State machine approach:

Fixed Input:
  Initial State: {codebase status, tech stack, standards...}
  Context Space: {must use JWT, reference existing auth module...}
  Target State: {auth feature complete, tests passing...}

Guarantee: As long as these three inputs are the same,
           AI output should be consistent
           (even if paths differ, final result meets target state)

How to guarantee:

  • Explicitly define all constraints (context space Σ)
  • Include tech choices, reference implementations, design patterns
  • Don't rely on "AI's own judgment"
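Consistency follows from fixing the input triple. A sketch of a task fully specified by (q₀, Σ, F), with a naive canonical serialization as its identity (the `TaskSpec` shape is illustrative, not a standard):

```typescript
// A task is fully specified by the triple (q0, Σ, F).
// If two tasks serialize to the same key, they are the same input,
// and their outputs should satisfy the same target state.
interface TaskSpec {
  initialState: Record<string, string>; // q0
  context: string[];                    // Σ: explicit constraints
  targetState: string[];                // F: verifiable criteria
}

const authTask: TaskSpec = {
  initialState: { codebase: "no auth module", stack: "Node.js + Express" },
  context: ["must use JWT", "reference existing auth module"],
  targetState: ["auth feature complete", "tests passing"],
};

// Naive canonical form: good enough to show the idea.
function taskKey(spec: TaskSpec): string {
  return JSON.stringify(spec);
}
```

With "must use JWT" written into Σ, the JWT-vs-Session coin flip from the dialogue above simply cannot happen: it is no longer the AI's decision to make.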

Real Case: Defining a Real Task with State Machine

Requirement: Add comment feature to blog system

Traditional Approach (Chat-style)

You: Help me add comments
AI: OK (starts writing code)
You: Need to support replies
AI: OK (adds reply feature)
You: Also need likes
AI: Sure (keeps adding)
You: Should we add reporting?
AI: Could do (writes more and more)
...(3 hours later, a mess)

State Machine Approach

1. Initial State (q₀)

Code Status:
  - Have Post model (title, content, author)
  - Have user authentication system
  - Database: PostgreSQL

Test Status:
  - Post-related test coverage 85%

Tech Stack:
  - Backend: Node.js + Express + TypeORM
  - Frontend: React + TypeScript

2. Context Space (Σ)

  • Information: existing Post model code, user auth implementation, database schema
  • Constraints: must use TypeORM, comments max 1000 chars, need XSS protection
  • Standards: RESTful API design, test coverage > 80%, code must pass ESLint
  • Dependencies: depends on user auth module (complete); parallel with frontend UI redesign (in progress)

3. Target State (F)

Full Completion:

Features:
  ✓ Users can post comments (login required)
  ✓ Support two-level replies (reply to comments)
  ✓ Support liking comments
  ✓ Comment authors can edit/delete their comments
  ✓ Admins can delete any comment

Technical:
  ✓ Comment model created (id, postId, userId, content, parentId, createdAt)
  ✓ RESTful API implemented (POST /comments, GET /posts/:id/comments, DELETE /comments/:id)
  ✓ Frontend components complete (CommentList, CommentForm, CommentItem)

Quality:
  ✓ Test coverage > 85%
  ✓ XSS protection implemented
  ✓ API docs updated

Minimum Deliverable:

Features:
  ✓ Users can post comments
  ✓ Can view comment list

Technical:
  ✓ Comment model created
  ✓ Basic API implemented

Quality:
  ✓ Main path tests passing

4. Verification Checklist

Check each item when task ends:

Feature Verification:
  [ ] Logged-in user can post comment under article
  [ ] Comment appears immediately in comment list
  [ ] Can reply to comments (two-level replies)
  [ ] Can like comments
  [ ] Author can edit their comment
  [ ] Author can delete their comment

Technical Verification:
  [ ] npm test all passing
  [ ] npm run lint no errors
  [ ] API follows RESTful conventions

Security Verification:
  [ ] Comment content XSS filtered
  [ ] Non-logged-in users cannot post comments
  [ ] Users can only edit/delete their own comments

5. Execution

AI autonomously decides execution path based on (q₀, Σ, F):

  • Create database migration
  • Implement Comment model
  • Develop API endpoints
  • Write frontend components
  • Write test cases
  • Update documentation

Key: AI's execution path may vary, but must satisfy verification checklist (reach F)

Traditional vs State Machine Comparison

For each dimension, traditional chat-style → state machine style:

  • Context management: implicit in chat history, easily lost → explicitly defined in Σ, always visible
  • Goal definition: vague natural language ("make a feature") → clear state description, verifiable checklist
  • Completion judgment: subjective feeling ("looks about done") → objective verification, all boxes checked
  • Predictability: different results each time, luck-based → same input, consistent result, reproducible
  • Composability: hard to chain tasks, context breaks → clear state transfer between tasks, can form workflows
  • Atomicity: often abandoned halfway, leaves half-products → either complete or fail, no half-products
  • Parallel collaboration: multiple AIs conflict → clear states, can parallelize, collaborate via state interfaces

Core Value of This Approach

1. Make AI Tasks Controllable

Three dimensions of control:

  • Predictable: Same input → same output
  • Verifiable: Clear completion criteria
  • Replaceable: Switch AI, same result

2. Make AI Collaboration Reliable

Reliability guarantees:

  • Atomicity: Tasks either succeed or fail, no half-products
  • Consistency: Multiple executions yield consistent results
  • Isolation: Tasks don't interfere with each other
  • Durability: State transitions are traceable

3. Make AI Workflows Composable

Task Network:

Task A's target state → Task B's initial state
Task B's target state → Task C's initial state
...
Forms composable workflow
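The chaining above can be sketched as function composition over states: each task maps an initial state to a target state, and the next task starts from it. The task bodies below are illustrative stand-ins for real AI-executed tasks:

```typescript
// Composable workflow: task B's initial state is task A's target state.
type State = Record<string, boolean>;
type Task = (q0: State) => State;

const taskA: Task = (q0) => ({ ...q0, commentModel: true });
const taskB: Task = (q0) => ({ ...q0, commentApi: q0.commentModel === true });
const taskC: Task = (q0) => ({ ...q0, commentUi: q0.commentApi === true });

// Chain tasks so state flows A → B → C.
function workflow(q0: State, tasks: Task[]): State {
  return tasks.reduce((state, task) => task(state), q0);
}

const final = workflow({}, [taskA, taskB, taskC]);
```

Because every task's interface is a state, not a conversation, reordering or swapping a task is a type-level question rather than a "did the AI remember?" question.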

Next Steps: How to Practice

For Product/Project Managers

When defining tasks, ask yourself 5 questions:

  1. What's the initial state? (What does it look like now)
  2. What's the target state? (What should it become)
  3. What context is needed? (What does AI need to know)
  4. How to verify completion? (What are the completion criteria)
  5. What's the minimum acceptable completion? (What's the MVP)

For Technical Leaders

When implementing, do three things:

  1. Establish task templates: Standardize task definition format
  2. Build context library: Maintain project tech stack, standards, design docs
  3. Design verification process: Automated tests + manual checklists

For AI Tool Developers

When building, consider three layers:

  1. Task engine: Parse state machine definition, drive AI execution
  2. State management: Record state transition traces, support rollback
  3. Verification framework: Automatically check if target state is reached

Summary

Core Idea:

  • Redesign AI tasks with state machine thinking
  • Focus on "production state" (what's produced), not "runtime state" (what's happening)
  • Explicitly define initial state, target state, context, verification criteria

Core Value:

  • Guarantee atomicity: Tasks either complete fully or don't happen
  • Guarantee consistency: Same input produces same result
  • Improve controllability: AI tasks go from "luck-based" to "predictable"

Where to Start:

  • Next time you assign a task to AI, think through these 5 elements first
  • Don't say "help me make a feature"
  • Say "from state A to state B, must satisfy condition C"

The future of AI collaboration isn't smarter AI—it's clearer task definitions.