Workflows for High Quality AI Generated Code

Copilot has freed me to create powerful GUI tools that rapidly speed up embedded development.

One of the big challenges in embedded development is knowing what is happening in the code execution in real-time. Just a few years ago if you asked me to develop a GUI that connects to my embedded device in real-time and produces graphical traces of each sub-system, I would have said: “give me a team of 4 people and a year, we will get this done.”

Now, I say: “let me carve out 4 hours a day for the next couple of days and I will get this banged out.”

These custom, powerful GUI tools have dramatically improved productivity and code quality in embedded development. However, I’ve recently run into a recurring issue: the GUIs degrade in quality surprisingly quickly. To stay responsive while processing high-throughput real-time data, they must strictly enforce threading discipline. Managing complex data streams from multiple subsystems demands robust object-oriented ownership patterns. And perhaps most critically, a clean Model-View architecture is essential to properly separate business logic, state, and presentation.
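To make the threading rule concrete, here is a minimal sketch (illustrative only, not from my actual tools): worker threads never touch the view directly; they push samples through a queue that the GUI thread drains, and only the view renders model state.

```python
import queue
import threading
import time

class TelemetryModel:
    """Model: owns the data; knows nothing about the view."""
    def __init__(self):
        self.samples = []

    def add_sample(self, value):
        self.samples.append(value)

class ConsoleView:
    """View: renders model state; never mutates it."""
    def render(self, model):
        print(f"{len(model.samples)} samples, latest={model.samples[-1]}")

def reader_thread(q, n):
    # Worker: produces data; never touches the model or view directly.
    for i in range(n):
        q.put(i * 0.5)          # stand-in for a sensor reading
        time.sleep(0.01)

def main():
    q = queue.Queue()
    model, view = TelemetryModel(), ConsoleView()
    t = threading.Thread(target=reader_thread, args=(q, 5), daemon=True)
    t.start()
    t.join()
    # The GUI thread drains the queue (in a real GUI, on a timer tick).
    while not q.empty():
        model.add_sample(q.get())
    view.render(model)
    return model

if __name__ == "__main__":
    main()
```

In a real Qt or tkinter app the drain loop runs on the event loop's timer, but the ownership rule is the same: one thread mutates the model, and the view only reads it.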

If you ask Copilot to code up a feature, it will take the fastest path and violate all of these principles. Over the past couple of months I have developed a workflow pattern that reliably avoids these problems.

The most crucial element: ask Copilot to constantly review the code against the design spec and implementation plan.

Copilot Workflow — Iterative Co-Development

This document describes the iterative workflow used between a human developer and GitHub Copilot (agent mode) for designing, building, and maintaining a software project.


Overview

The workflow is a repeating cycle of Design → Implement → Review → Fix → Reconcile. Each cycle produces working, buildable code and keeps documentation in sync with reality.

    ┌──────────┐
    │  Design  │◄──────────────────────┐
    └────┬─────┘                       │
         ▼                             │
    ┌──────────┐                       │
    │Implement │                       │
    └────┬─────┘                       │
         ▼                             │
    ┌─────────┐     ┌──────────────┐   │
    │ Review  │────►│Fix / Refactor│───┤
    └─────────┘     └──────────────┘   │
                                       │
                    ┌──────────────┐   │
                    │Reconcile Docs│───┘
                    └──────────────┘

Phase 1 — Design

Participants: Human + Copilot
Artifacts: design.md, development_plan.md

  1. Human describes the goal, constraints, and reference material
    (existing codebase, Python prototype, protocol specs, etc.).
  2. Copilot drafts a design document covering architecture, layer
    boundaries, file inventory, and data flow.
  3. Human reviews, asks questions, requests changes.
  4. Together they iterate until the design is agreed.
  5. Copilot drafts a development plan: numbered milestones, each with
    concrete tasks, file lists, and a test checkpoint.
  6. Human approves or reorders milestones.

Key rules:

  • Every milestone must produce a buildable, runnable, testable
    application — no “big bang” integration.
  • Architectural constraints (layer boundaries, dependency direction)
    are documented up front and enforced in reviews.
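As an illustration, a single milestone entry in development_plan.md might look like this (the file names and checkpoint are hypothetical, not from an actual plan):

```
Milestone 3 — Live plot view
  Tasks:
    - Add PlotModel with a ring buffer for incoming samples
    - Add PlotView widget; subscribe to model updates
  Files: src/model/plot_model.py, src/view/plot_view.py
  Test checkpoint: stream recorded data; plot updates at 10 Hz
  Status: [ ] not started
```

The test checkpoint is what makes the milestone "done" — it is the thing the human actually runs in Phase 2.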

Phase 2 — Implement

Participants: Human directs, Copilot implements
Artifacts: Source code, build output

  1. Human selects the next milestone (or a specific task within it).
  2. Copilot reads relevant context (existing files, design doc,
    reference code) and implements the changes.
  3. Copilot builds the project and fixes compile errors.
  4. Human tests against real hardware or a test harness.
  5. If bugs are found, human reports symptoms (logs, screenshots).
    Copilot diagnoses root cause, implements a fix, and rebuilds.
  6. Milestone is marked complete when the test checkpoint passes.

Key rules:

  • Copilot always builds after making changes — no “it should work”
    hand-offs.
  • One milestone at a time. Don’t skip ahead.
  • When the human reports a bug, Copilot gathers context first
    (reads files, searches code) before proposing a fix.
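The "Copilot always builds" rule can even be enforced mechanically. A minimal sketch of such a gate (the build command is a placeholder; substitute your project's actual invocation):

```python
import subprocess
import sys

def verified_build(cmd=("make", "-j4")):
    """Run the project build and refuse to proceed on any error.

    The default command is an assumption; replace it with your real
    build step (cmake --build, cargo build, etc.).
    """
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        print(result.stderr, file=sys.stderr)
        raise SystemExit("Build failed; changes are not done yet.")
    return result

if __name__ == "__main__":
    verified_build()
```

Running this at the end of every Copilot change turns "it should work" into a hard pass/fail signal.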

Phase 3 — Review

Participants: Human requests, Copilot performs
Artifacts: Review findings (in chat)

  1. After a set of milestones, human requests a review comparing the
    implementation against the design document.
  2. Copilot systematically reads every source file and cross-references
    against the spec. It may use sub-agents for parallel investigation.
  3. Findings are categorized:
  • ✅ Compliant — matches the spec
  • 🔴 Violation — breaks an architectural rule or misses a
    required feature
  • 🟡 Deviation — works but differs from the spec (layout,
    naming, missing polish)
  • 🟡 Spec gap — implementation added something not in the spec,
    or spec is ambiguous
  4. Human prioritizes which findings to address.

Key rules:

  • Reviews compare code to the written spec, not to assumptions.
  • Each finding must cite the specific file/line and the specific spec
    section it violates.
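One way to keep the citation rule honest is to make the file, line, and spec-section fields mandatory in whatever structure records the findings. An illustrative sketch (not part of any actual tooling):

```python
from dataclasses import dataclass
from enum import Enum

class Category(Enum):
    COMPLIANT = "compliant"
    VIOLATION = "violation"      # breaks an architectural rule
    DEVIATION = "deviation"      # works, but differs from the spec
    SPEC_GAP = "spec_gap"        # spec is silent or ambiguous

@dataclass
class Finding:
    category: Category
    file: str          # e.g. "src/view/plot_view.cpp" (hypothetical)
    line: int
    spec_section: str  # e.g. "design.md, Layer boundaries"
    summary: str

    def __post_init__(self):
        # Enforce the key rule: every finding cites file/line and spec.
        if not (self.file and self.line > 0 and self.spec_section):
            raise ValueError("finding must cite file, line, and spec section")
```

A review that cannot fill in these fields is an opinion, not a finding.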

Phase 4 — Fix / Refactor

Participants: Human directs, Copilot implements
Artifacts: Code changes, build verification

  1. Human says which category of findings to fix (e.g., “fix the
    architectural violations”).
  2. Copilot reads all affected files, forms a plan, then implements
    all fixes.
  3. Copilot builds and verifies zero errors.
  4. Human tests if needed.

Key rules:

  • Fixes are done in batches by category, not one-at-a-time.
  • Copilot must build after all fixes — a fix that breaks the build
    is not a fix.

Phase 5 — Reconcile Documentation

Participants: Human requests, Copilot updates
Artifacts: Updated design.md, development_plan.md

  1. After fixes/refactors change the architecture or add features not
    in the original plan, the docs are stale.
  2. Human asks Copilot to review the docs for accuracy.
  3. Copilot identifies every discrepancy between the docs and the
    actual code.
  4. Copilot updates the docs to reflect reality.
  5. The cycle restarts: the updated docs become the baseline for the
    next implementation or review cycle.

Key rules:

  • Docs describe what is, not what was planned. If the
    implementation diverged for good reason, the doc is updated — not
    the code.
  • New architectural decisions discovered during implementation (e.g.,
    extracting a utility to a different layer) are added to the design
    doc so future reviews enforce them.

Session Continuity

Copilot does not have persistent memory across chat sessions. To
bootstrap a new session effectively:

  1. Workspace files are the source of truth. Design docs,
    development plans, and the code itself carry forward automatically.
  2. Conversation summary is generated at the end of long sessions
    and carried into the next one. It includes: current milestone,
    recent changes, known issues, and build state.
  3. Resumption prompt: When starting a new session, point Copilot
    at the design doc and development plan first. Example:

Read doc/design.md and doc/development_plan.md, then tell me
what milestone we’re on and what’s next.

This ensures Copilot can reconstruct context from the repo rather
than relying on chat history.


Anti-Patterns to Avoid

  • Implementing without a design doc: no baseline for review; drift
    is invisible. Instead, write the spec first, even if brief.
  • Skipping the build step: broken code shipped to the next phase.
    Instead, always build after every change.
  • Reviewing against assumptions: findings are subjective and argued.
    Instead, review against the written spec.
  • Fixing docs before fixing code: hides real violations. Instead,
    fix code first, then reconcile docs.
  • Giant milestones: impossible to test incrementally. Instead, keep
    milestones to 10–20 minute chunks.
  • Asking Copilot to “just do it all”: context overload, poor
    quality. Instead, one milestone or one review at a time.

Typical Session Flow

Human: "Read the design doc and dev plan. Where are we?"
Copilot: [reads files] "Milestones 0–N are complete. Next is M(N+1)."

Human: "There's a bug — here's the syslog output."
Copilot: [diagnoses, fixes, rebuilds] "Root cause was X. Fixed."

Human: "Review the code against the design doc."
Copilot: [reads all files] "Found 4 violations, 6 deviations, ..."

Human: "Fix the violations."
Copilot: [implements fixes, builds] "All 4 fixed. Build clean."

Human: "Is the dev plan up to date?"
Copilot: [compares docs to code] "7 discrepancies found."

Human: "Update it."
Copilot: [updates docs] "Done. Cycle complete."
