AI-Assisted Development, Fully Under Control

A capable AI can write a week of code in an afternoon. It can also write a week of subtle, plausible-looking bugs just as fast. The difference between those two outcomes is almost never the model. It is the setup around it.

We have written before about the raw speed AI unlocks on a greenfield build, shipping a full production app in a day. This post is the harder half of that story: how you keep the speed under control on a live system, where a confident wrong answer costs real customers rather than a wasted afternoon.

Our approach is simple to state and demanding to do: give the AI maximum leverage, and keep the engineer in total control at every step. Those two goals only sound contradictory if you think control means using AI less. It does not. Control means feeding the model the truth, encoding the rules it has to follow, gating everything it produces, and keeping a human deciding what good looks like. Here is the toolchain and the discipline we wired around a live Laravel and Vue rebuild to make both true at once.

#Control is not less AI

Left to its own devices, a language model guesses. It invents a method that does not exist in your version of a framework, ignores the conventions the rest of the codebase follows, and produces code that compiles, reads well, and is quietly wrong. The naive response is to use it less. The better response is to remove the reasons it guesses in the first place.

We think about that in four layers: give the model accurate context, encode the project's rules, keep a human in front of every decision, and gate every output through automated checks. Each layer makes the AI more capable and the human more in control at the same time. The rest of this post is those four layers and the specific tools and commands we use for each on this stack.

#Give the model the truth, not its training data

Most bad AI output traces back to one thing: the model answered from stale training data instead of your actual project. So the first layer is context, delivered through a set of MCP servers wired into Claude Code.

{
  "mcpServers": {
    "laravel-boost": {
      "command": "docker",
      "args": ["exec", "-i", "php", "php", "artisan", "boost:mcp"]
    },
    "stripe": {
      "type": "http",
      "url": "https://mcp.stripe.com"
    },
    "context7": {
      "command": "npx",
      "args": ["-y", "@upstash/context7-mcp"]
    }
  }
}

context7 serves version-accurate documentation for the front-end ecosystem (Vue, Pinia, TanStack Query, Tailwind, reka-ui), so the model writes against the APIs we actually have rather than the ones it half-remembers. Laravel Boost does the same for the back end and goes further: it exposes version-specific framework docs, the live database schema, read-only queries, a tinker bridge, and Artisan introspection. That means the AI can check the real shape of a table before writing a migration against it, instead of guessing the columns:

# Inspect the real schema before touching it, rather than trusting memory:
docker exec php php artisan db:table products

Beyond the three servers in the config above, serena adds semantic navigation of the actual codebase, so edits are grounded in symbols that exist instead of a guess at the file layout, and Sentry brings real production errors and stack traces into the loop. Stripe's documentation and resource lookups keep payment code honest. A model that can look up the exact answer stops inventing one, and that single shift removes most of the hallucinations people blame on AI.

#Encode the rules so the work matches the project

Accurate context stops the model guessing about APIs. It does not stop it writing code that ignores how this project is built. That is the job of a written constitution, a CLAUDE.md at the root of the repo that the AI reads on every task.

## Principles
- Fix the root cause, never the symptom. No band-aid patches, no silenced
  warnings, no version pinned just to dodge a bug.
- Do not know an API? Look it up with search-docs or context7 before writing it.
- Match existing patterns: read two or three sibling files before inventing structure.

## Architecture (enforced by tests/Arch)
- Route -> single-action controller -> service -> repository -> model.
- Every endpoint input and output is a typed Data object. No raw arrays on the wire.
- declare(strict_types=1) at the top of every PHP file; explicit return types.

## Common tasks (the canonical answers, so nobody guesses)
- Rebuild the database:  php artisan app:provision --fresh
- Run one test:          php artisan test --filter=Name
- Regenerate TS types:   php artisan typescript:transform

## Definition of done
- A new or updated test for every change.
- ./check.sh is green: Pint, PHPStan level 10, Pest, Prettier, ESLint.

These are not suggestions. The architecture rules map to tests that fail the build, and the definition of done is a single command. But the most valuable lines are the principles, because they teach judgment rather than syntax. 'Fix the root cause, never the symptom' is the whole difference between an engineer and a code generator, and stating it plainly keeps the AI from reaching for the quick patch that silences a warning instead of solving the problem.

The same file indexes the common tasks and their canonical answers and points to the deeper docs for anything that needs more than a line, so neither the model nor a new teammate has to guess how we rebuild the database, regenerate types, or find the migration guide. It is documentation the AI reads on every task and a human can read on their first day.

On top of the rules sit task-specific skills: a Stripe best-practices skill so payment flows follow current guidance rather than a tutorial from 2019, and ui-ux-pro-max and frontend-design skills so generated interfaces use our shadcn-vue and Tailwind system instead of generic AI styling. The generated TypeScript from our typed data contracts belongs to this layer too: the AI cannot reference a field the back end does not expose, because the type will not compile. When a contract changes, one command keeps both languages in lockstep:

# Regenerate the TypeScript types from the PHP data classes:
docker exec php php artisan typescript:transform
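To make the compile-time guarantee concrete, here is a minimal sketch. The ProductData type and its fields are hypothetical stand-ins for whatever the transform command emits from the PHP data classes, and formatPrice is an invented helper; the real generated output will look different.

```typescript
// Hypothetical stand-in for a generated type. In the real project this
// would be emitted from a PHP data class, not written by hand.
type ProductData = {
  id: number;
  name: string;
  priceCents: number;
};

// Front-end code can only read fields that the contract exposes.
function formatPrice(product: ProductData): string {
  return `${product.name}: ${(product.priceCents / 100).toFixed(2)}`;
}

const product: ProductData = { id: 1, name: "Widget", priceCents: 1250 };
console.log(formatPrice(product)); // Widget: 12.50

// costCents is not part of the contract, so the access below is a type error.
// The directive asserts exactly that: the type checker complains if the error
// ever disappears, for example because the field was added to the contract.
// @ts-expect-error Property 'costCents' does not exist on type 'ProductData'.
void product.costCents;
```

The point of the last line is that the mistake is caught by the type checker, before any test runs or any request is made.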

The rules make the AI's output look like the rest of the codebase, because it is held to the same standard everyone else is.

#Keep the human in front

Tools and rules make the AI capable and consistent. They do not make the decisions, and that is deliberate. It is also the part most teams skip.

Before any non-trivial work, we brainstorm the intent and write a plan, and that plan is reviewed and approved before a line of implementation exists. Features and fixes go through test-driven development, so the specification is written before the code. In that loop the inversion is concrete: the engineer writes the spec, and the AI's job is to make it green.

// tests/Feature/Pricing/QuoteTest.php (the spec, written before the code)
it('applies the volume discount above the threshold', function () {
    $quote = (new QuoteService())->total(units: 120, unitPriceCents: 500);

    expect($quote->discountCents)->toBe(6_000)        // 10% off once over 100 units
        ->and($quote->totalCents)->toBe(54_000);
});

docker exec php php artisan test --filter=QuoteTest
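The spec fixes the behaviour and leaves the implementation open. QuoteService itself is not shown here, so as a hedged illustration, this is the same rule written as a small TypeScript function of the kind a front-end price preview might use. The names quoteTotal, VOLUME_THRESHOLD_UNITS, and VOLUME_DISCOUNT_PERCENT are hypothetical, and the sketch assumes the 10% applies to the whole order only when the unit count is strictly above 100, which is what the comment in the spec suggests.

```typescript
interface Quote {
  subtotalCents: number;
  discountCents: number;
  totalCents: number;
}

// Assumed values, taken from the comment in the spec above.
const VOLUME_THRESHOLD_UNITS = 100;
const VOLUME_DISCOUNT_PERCENT = 10;

function quoteTotal(units: number, unitPriceCents: number): Quote {
  // Money stays in integer cents so no rounding error creeps in.
  if (
    !Number.isInteger(units) ||
    !Number.isInteger(unitPriceCents) ||
    units < 0 ||
    unitPriceCents < 0
  ) {
    throw new RangeError("units and unitPriceCents must be non-negative integers");
  }

  const subtotalCents = units * unitPriceCents;
  // The discount applies to the whole order, and only strictly above the threshold.
  const discountCents =
    units > VOLUME_THRESHOLD_UNITS
      ? Math.floor((subtotalCents * VOLUME_DISCOUNT_PERCENT) / 100)
      : 0;

  return { subtotalCents, discountCents, totalCents: subtotalCents - discountCents };
}

const quote = quoteTotal(120, 500);
console.log(quote.discountCents, quote.totalCents); // 6000 54000
```

Whether an order of exactly 100 units earns the discount is the sort of edge the engineer settles in the spec, not something the implementation should decide on its own.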

Bugs go through systematic debugging rather than guess-and-check, and nothing is called done until it is verified against reality instead of asserted to be working. The AI proposes, drafts, and accelerates. The engineer scopes, plans, reviews, and decides. That is maximum leverage on the mechanical work and full human judgment on everything that actually matters.

#Gate everything, trust nothing unverified

Speed is only safe if the floor is solid. Every change, whoever or whatever wrote it, clears the same gate before it counts.

# One command runs every gate the AI's output must clear:
./check.sh        # Pint, PHPStan (level 10), Pest, Prettier, ESLint

The same gate runs in continuous integration on every push, so unverified code never reaches the main branch:

# .github/workflows/ci.yml
name: CI
on: [push]
jobs:
  verify:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Runtime and dependency setup steps are not shown here.
      - run: ./check.sh --no-fix
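The workflow calls the script with --no-fix while the local command does not, and the script itself is not shown. As a rough sketch of how the two modes might differ, the function below lists one plausible command per gate. It is written in TypeScript only for illustration; gateCommands and the Gate type are hypothetical, the paths assume a standard Composer and npm layout, and the flags are each tool's own check-only and fix options.

```typescript
// One entry per gate named in the definition of done.
type Gate = { name: string; command: string };

function gateCommands(fix: boolean): Gate[] {
  return [
    // Pint rewrites files by default; --test only reports style violations.
    { name: "Pint", command: fix ? "vendor/bin/pint" : "vendor/bin/pint --test" },
    // Static analysis never modifies code. Level 10 is assumed to be set in phpstan.neon.
    { name: "PHPStan", command: "vendor/bin/phpstan analyse" },
    // The test suite, including the architecture tests.
    { name: "Pest", command: "vendor/bin/pest" },
    // Prettier either rewrites (--write) or just verifies (--check).
    { name: "Prettier", command: fix ? "npx prettier --write ." : "npx prettier --check ." },
    // ESLint only applies automatic fixes when asked to.
    { name: "ESLint", command: fix ? "npx eslint --fix ." : "npx eslint ." },
  ];
}

// What a check-only run, as in CI, would execute under these assumptions.
for (const gate of gateCommands(false)) {
  console.log(`${gate.name}: ${gate.command}`);
}
```

Under that reading, a developer's local run tidies what it can and fails on the rest, while the CI run changes nothing and simply fails on anything that is not already clean.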

Architecture tests fail the build when a controller grows a second responsibility or a contract skips its base class, which means the conventions from the rules layer are enforced, not merely documented.

For anything with a runtime surface, we verify it running, and payments are the clearest example of the whole loop. We built the integration with the Stripe MCP for live API references and the Stripe best-practices skill for current patterns: idempotency keys, verified webhook signatures, the right primitive for the job instead of a copied snippet. Then we proved it end to end with the Stripe CLI before it touched real money:

# Forward Stripe's test events to local code, then fire a realistic one:
stripe listen --forward-to https://api.localhost/api/stripe/webhook
stripe trigger payment_intent.succeeded
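For readers who have not looked inside a verified webhook signature, the sketch below shows what the check involves, following Stripe's documented scheme: the Stripe-Signature header carries a timestamp (t) and one or more v1 signatures, each an HMAC-SHA256 of the timestamp, a dot, and the raw request body, keyed with the endpoint secret. This is an illustration, not our handler. In a Laravel app the check would normally be left to Stripe's official PHP library (Webhook::constructEvent). The function names, the test secret, and the 300-second tolerance here are assumptions for the example.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Hex HMAC-SHA256 over "<timestamp>.<rawBody>", keyed with the endpoint secret.
function sign(secret: string, timestamp: number, rawBody: string): string {
  return createHmac("sha256", secret)
    .update(`${timestamp}.${rawBody}`, "utf8")
    .digest("hex");
}

function verifyStripeSignature(
  rawBody: string,
  header: string,
  secret: string,
  nowSeconds: number,
  toleranceSeconds = 300, // assumed tolerance for this sketch
): boolean {
  let timestamp = NaN;
  const candidates: string[] = [];

  // The header looks like "t=1700000000,v1=<hex>[,v1=<hex>...]".
  for (const part of header.split(",")) {
    const idx = part.indexOf("=");
    if (idx === -1) continue;
    const key = part.slice(0, idx).trim();
    const value = part.slice(idx + 1).trim();
    if (key === "t") timestamp = Number(value);
    if (key === "v1") candidates.push(value);
  }
  if (!Number.isInteger(timestamp) || candidates.length === 0) return false;

  // Reject events outside the tolerance window to limit replay attacks.
  if (Math.abs(nowSeconds - timestamp) > toleranceSeconds) return false;

  const expected = Buffer.from(sign(secret, timestamp, rawBody), "utf8");
  return candidates.some((candidate) => {
    const given = Buffer.from(candidate, "utf8");
    // timingSafeEqual throws on unequal lengths, so compare lengths first.
    return given.length === expected.length && timingSafeEqual(given, expected);
  });
}

// Usage with a made-up secret and payload.
const secret = "whsec_hypothetical_test_secret";
const body = '{"type":"payment_intent.succeeded"}';
const t = 1700000000;
const header = `t=${t},v1=${sign(secret, t, body)}`;

console.log(verifyStripeSignature(body, header, secret, t + 10)); // true
console.log(verifyStripeSignature(body + " ", header, secret, t + 10)); // false
```

The signature covers the raw body, so verification has to happen on the exact bytes received, before any JSON parsing or re-serialization.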

Playwright does the equivalent for the interface, driving the Vue app through real user flows, and sentry-cli uploads sourcemaps so that the errors that still slip through in production arrive with readable stack traces:

sentry-cli sourcemaps upload --release "$APP_VERSION" ./public/build

The gates are what let us move fast without flinching. The AI can take a big swing precisely because the net underneath it is real.

#Tuned to this stack, not a generic kit

None of this is a one-size-fits-all AI starter pack. The toolchain is chosen for exactly this project: Laravel Boost because the back end is Laravel, context7 for the specific front-end libraries we run, the Stripe tooling because the product takes payments, Sentry because the system is live and we need real errors quickly, and Playwright because there is a genuine single-page app to exercise. A static marketing site would not earn half of these. A data pipeline would want a different set entirely.

That is the same principle we apply to the stack itself, fitting the tools to the project rather than to our habits. The right AI setup is not the longest list of integrations. It is the smallest set that gives the model the context this project needs and gates the output this project demands.

#How Submitit got built

This is exactly how we rebuilt the Submitit platform: a live Laravel and Vue application, modernized at a pace that would not have been possible a few years ago, without trading away the discipline that keeps a live system safe. The case for that speed, and the guardrails that made it responsible, is in the Submitit case study.

#Leverage, not autopilot

AI-assisted development is not autopilot, and selling it that way is how teams ship confident nonsense. Done well, it is a force multiplier pointed by expertise and bounded by guardrails. Feed the model the truth. Encode your rules. Keep a human in front of every decision. Gate everything. Do that, and you get the speed without the regret.

If you want AI woven into your build with that kind of control rather than crossed fingers, let's talk.