
Stop Letting AI Rewrite Your Entire Codebase (And Start Coding Again)

How I went from burning $400/month on AI chaos to actually shipping quality code with LLMs as my assistant, not my replacement

The $400 Wake-Up Call

I used to have Cursor’s highest tier subscription—$200 a month with $400 in monthly credits. Sounds like plenty, right?

Two weeks. That’s how long it took me to burn through the entire quota.

I found myself rationing my coding time based on API credits. Think about that for a moment—I love coding. I want to work all day. But here I was, checking my usage dashboard like someone monitoring a dwindling bank account, wondering if I could afford one more refactoring session before the month reset.

Something was deeply wrong.

But the cost wasn’t even the worst part. The worst part was watching my carefully planned code dissolve into something I couldn’t recognize anymore.

When Good Plans Meet Eager AI

Picture this: You’ve architected something beautiful. You know exactly what you want. You’ve thought through the design patterns, planned the separation of concerns, mapped out the type system. You’re not just playing around—you’re building something real, something that needs to scale, something that actual users will depend on.

So you explain your plan to your LLM assistant. The plan is solid. You’re feeling good.

And then it generates 500 lines of code.

At first glance, it looks… fine? It runs. But as you read through it:

  • The Python typing is sloppy—bare dict and list types everywhere when you need strict typing
  • Data is being passed around as raw JSON objects, making everything impossible to trace (a sketch of both problems follows this list)
  • There’s zero separation of concerns—just one massive function doing everything
  • It’s like the LLM threw the entire architecture into a blender and hit “frappe”
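
To make those first two complaints concrete, here's the kind of before-and-after I mean. This is a hypothetical sketch (the cat-rating names are mine, for illustration), assuming Pydantic for the models:

# A hypothetical before-and-after, assuming Pydantic for the models.
from pydantic import BaseModel

# What the eager LLM produces: bare dicts in, bare dicts out.
# Nothing says which keys exist, so nothing can be traced.
def rate_cat(data: dict) -> dict:
    score = data["cuteness"] * 0.6 + data["whisker_symmetry"] * 0.4
    return {"id": data["id"], "score": score}

# What strict typing looks like: every data structure has a model.
class CatPhoto(BaseModel):
    id: int
    cuteness: float
    whisker_symmetry: float

class RatingScore(BaseModel):
    photo_id: int
    score: float

def rate_cat_typed(photo: CatPhoto) -> RatingScore:
    score = photo.cuteness * 0.6 + photo.whisker_symmetry * 0.4
    return RatingScore(photo_id=photo.id, score=score)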

“No problem,” you think. “I’ll just ask it to fix these issues.”

So it rewrites the entire thing.

Now the typing is overly specific. It’s created seventeen helper functions where you needed three. The original logic that actually worked is gone, replaced with something that breaks in subtle ways you won’t discover until production. And you can’t even remember what the original code looked like because it’s been through four complete rewrites in twenty minutes.

You’re sitting there, staring at your screen, thinking: “I had a plan. What happened to my plan?”

If you’ve experienced this sinking feeling, you’re not alone. And if you’ve sworn off AI coding assistants because of it, I don’t blame you. I nearly did too.

The Real Problem (And It’s Not the AI)

Here’s what I eventually realized: The LLM isn’t bad at coding. It’s actually pretty good—when it understands what you want and operates within clear boundaries.

The problem was me letting it jump straight to implementation.

Think of it like this: Imagine hiring a new developer who, the moment you mention a problem, immediately opens their editor and starts frantically rewriting your entire authentication system before you’ve even finished explaining what’s wrong. You’d stop them, right? You’d say, “Hold on, let’s talk through this first. What are the trade-offs? How does this fit with our existing architecture?”

But with LLMs, we often skip that conversation. We describe a problem and—because the tools make it so easy—let them immediately start generating code. No discussion. No design review. No “have we thought through the implications of this?”

The result? Cascading failures:

  • File A has a subtle architectural inconsistency
  • File B depends on File A, amplifying the problem
  • File C builds on File B, creating something that looks functional but is fundamentally fragile
  • You’ve burned through thousands of tokens generating code you’ll have to throw away

The Three-Prompt Rule That Changed Everything

After months of frustration (and drained API credits), I stumbled into a pattern that actually works. It’s almost embarrassingly simple:

Discuss, decide, then deploy.

Or more specifically: Make the LLM talk through the solution before it touches a single line of code.

This breaks down into three distinct prompts—three phases of conversation that keep you in control while letting the AI handle the heavy lifting.

Phase One: “Here’s How Things Work Around Here”

The first prompt isn’t about solving a problem. It’s about orientation—making sure the LLM understands your project’s architecture, design patterns, and philosophies before it offers any solutions.

Here’s what I actually send:

Need your help with optimizing the code with proper typing and more elegant solutions. 
We'll do things iteratively—meaning we'll discuss and pin down a solution for a problem, 
then I'll say "let's apply this to the code," and only then can you edit anything. 
No code editing during discussion—you need explicit permission.

Now here's the overall project architecture: @project_big_picture.md

Your task right now: understand the project as a whole. We'll discuss specific 
problems later.

The magic ingredient here is project_big_picture.md—a document I maintain that’s part architecture diagram, part philosophy statement, part annotated table of contents.

What goes in this document?

Think of it as the thing you’d give to a new team member on their first day. Not a line-by-line code walkthrough, but the high-level understanding they need:

  • Architecture overview: How the pieces fit together (I literally draw ASCII diagrams)
  • Design patterns we use: “All database access goes through repositories,” “We favor immutability,” etc.
  • File directory map: What each file/module does and how they relate
  • Design philosophies: The why behind our decisions
  • Non-negotiables: “We require strict typing,” “Error handling must be explicit,” that sort of thing

Here’s a real example from one of my projects (a cat photo rating API, because why not):

## Core Architecture

CatRater follows a layered architecture:

- API Layer (FastAPI) → handles HTTP, validation, serialization
- Service Layer → business logic, rating algorithms  
- Repository Layer → all database access
- Models Layer → Pydantic models with strict typing (no bare dicts!)

## Design Principles

1. Explicit over implicit: We don't pass around JSON blobs. Every data structure 
   has a defined Pydantic model.
2. Separation of concerns: Rating logic stays in services. Database access stays 
   in repositories. No mixing.
3. Type safety: Python 3.11+ with full type hints. mypy must pass strict mode.

## File Map

- `api/routes/cats.py` - Cat photo submission and retrieval endpoints
- `services/rating_service.py` - Core rating algorithm (cuteness factors, whisker symmetry, etc.)
- `repositories/cat_repository.py` - Database operations for cat photos
- `models/cat_models.py` - CatPhoto, RatingScore, etc.

Is it extra work to maintain this document? Yes. Does it save me from re-explaining my architecture every single time I start a new chat? Absolutely.

Why this works:

When the LLM has this context up front, it can explore your codebase intelligently. It reads the relevant files. It understands how pieces connect. Most importantly, it learns what you care about—and it won’t suggest solutions that violate your principles.

Plus, you write this prompt once. Then you reuse it across coding sessions until your architecture meaningfully changes. The time investment pays for itself immediately.

Phase Two: “I’ve Got a Problem. How Would You Solve It?”

Now that the LLM understands your project, you can actually discuss the problem at hand.

The key here: Ask for proposals, not implementations.

Good second prompt:

I've noticed our cat rating logic is tightly coupled to the database repository, 
making it impossible to test without spinning up a full database. How would you 
propose refactoring this to align with our dependency injection patterns?

Bad second prompt:

Make the cat rating code testable.

See the difference? The first invites discussion. The second invites the LLM to immediately start rewriting code based on assumptions.

What happens next is what I call the “discussion chain”—a back-and-forth conversation where the LLM proposes solutions and you poke holes in them:

LLM: “I’d suggest extracting an interface for the repository and injecting it into the rating service…”

You: “That makes sense, but what about the rating cache? It’s currently stored in the repository layer.”

LLM: “Good point. We could move caching to the service layer, or introduce a dedicated caching layer…”

This might go on for 3-10 messages. You’re refining the approach together. You’re asking “what if” questions. You’re making sure the solution actually fits your architecture.

And crucially: No code changes happen during this phase.

The LLM is your thinking partner, not your typing monkey. It’s suggesting, explaining trade-offs, showing you where problems might arise. You’re maintaining full control over the design decisions.

I can’t overstate how much stress this eliminates. There’s no frantic “wait, stop, that’s not what I meant!” scrambling. There’s no throwing away thousands of lines of generated code. You’re designing the solution together, in plain English, before a single file gets touched.
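
So that this doesn't stay abstract: by the end of a chain like the one above, the design you've pinned down might look something like this. (A hypothetical sketch with names I made up for this post, not anything the LLM has generated yet.)

from typing import Protocol

from pydantic import BaseModel

# Same illustrative models as earlier.
class CatPhoto(BaseModel):
    id: int
    cuteness: float
    whisker_symmetry: float

class RatingScore(BaseModel):
    photo_id: int
    score: float

# The interface the discussion converged on: services depend on this,
# never on a concrete database class.
class CatRepository(Protocol):
    def get_photo(self, photo_id: int) -> CatPhoto: ...
    def save_score(self, score: RatingScore) -> None: ...

class RatingService:
    def __init__(self, repo: CatRepository) -> None:
        # Injected dependency: production passes the real repository,
        # tests pass an in-memory fake.
        self._repo = repo

    def rate(self, photo_id: int) -> RatingScore:
        photo = self._repo.get_photo(photo_id)
        score = RatingScore(
            photo_id=photo.id,
            score=photo.cuteness * 0.6 + photo.whisker_symmetry * 0.4,
        )
        self._repo.save_score(score)
        return score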

Phase Three: “Okay, Let’s Do It”

Only after you’ve fully discussed the approach, considered the trade-offs, and pinned down exactly what you want—only then do you give permission:

This approach looks good. Let's apply these changes to the code.

Now the LLM implements.

And because you’ve thoroughly discussed the solution, the changes will:

  • Actually align with your architecture
  • Follow your established patterns
  • Include the right level of typing
  • Solve the problem without creating three new ones

But here’s the beautiful part: You’re not micromanaging the implementation. You’re not specifying every variable name or worrying about which files need imports. The LLM handles those details.

You designed it. The LLM built it.
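
And the payoff shows up the moment you write a test. Continuing the same hypothetical sketch (assume the classes above live in a module I'm calling rating_service), the service now runs against an in-memory fake, no database required:

# rating_service is the hypothetical module holding the sketch above.
from rating_service import CatPhoto, RatingScore, RatingService

class InMemoryCatRepository:
    # Satisfies the CatRepository protocol structurally:
    # no subclassing, no mocking framework.
    def __init__(self, photos: dict[int, CatPhoto]) -> None:
        self._photos = photos
        self.saved: list[RatingScore] = []

    def get_photo(self, photo_id: int) -> CatPhoto:
        return self._photos[photo_id]

    def save_score(self, score: RatingScore) -> None:
        self.saved.append(score)

def test_rating_weights_cuteness_over_whiskers() -> None:
    repo = InMemoryCatRepository(
        {1: CatPhoto(id=1, cuteness=10.0, whisker_symmetry=5.0)}
    )
    service = RatingService(repo)

    result = service.rate(photo_id=1)

    assert result.score == 10.0 * 0.6 + 5.0 * 0.4
    assert repo.saved == [result]

No spinning up Postgres, no patching imports. That's exactly the testability problem the second prompt asked about, solved the way we agreed to solve it.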

The Complete Development Loop

Here’s what this looks like in practice:

Session 1:
├─ Prompt 1: "Understand my architecture" (@project_big_picture.md)
├─ Prompt 2: "Cat rating is coupled to database. Proposals?"
├─ Discussion: (4-5 messages refining the solution)
├─ Prompt 3: "Let's apply it"
└─ Implementation: (LLM makes the changes)

Session 2: (Same architecture, different problem)
├─ Prompt 1: "Understand my architecture" (same @project_big_picture.md, reused!)
├─ Prompt 2: "Image upload validation is inconsistent. Proposals?"  
├─ Discussion: (6 messages)
├─ Prompt 3: "Apply it"
└─ Implementation

Session 3: (Major architectural change)
├─ Implementation revealed we needed a caching layer
├─ Update @project_big_picture.md with new architecture
└─ Next session: Start fresh with updated Prompt 1

When do you update project_big_picture.md?

Update when you’ve made a change that meaningfully alters your architecture:

  • Added a new layer (like that caching layer)
  • Changed a fundamental pattern (REST to GraphQL)
  • Refactored a core abstraction

Don’t update for:

  • Bug fixes
  • New features that follow existing patterns
  • Refactoring that doesn’t change public interfaces

When I do update it, I typically ask the LLM to do the work:

Based on our conversation and changes, update project_big_picture.md. 
Keep the style consistent—generic enough to serve as guidance, specific 
enough to capture our design decisions. Focus on how the architecture 
evolved, not implementation details.

What Actually Changed (Besides My Sanity)

After adopting this approach, here’s what shifted:

Cost: I haven’t hit my $400 limit in months. Seriously. By scoping conversations to one problem at a time and discussing before implementing, I’ve cut my token usage by something like 70%. Sometimes I even copy a proposed solution from one chat and paste it into a new conversation to get a fresh perspective—sounds paranoid, but it saves a ton of tokens.

Code Quality: My code actually follows my architecture now. There’s no drift where the LLM subtly introduces patterns I don’t want. Type safety is consistent. Error handling is explicit. Everything feels intentional.

Debugging: When something breaks in production, I know exactly where to look—because I made the design decisions. The LLM just implemented them. The code structure makes sense to me because I architected it.

Mental Load: This is the surprising one. I thought adding all this structure would make coding feel more bureaucratic. Instead, it’s the opposite. I’m not constantly context-switching between “what do I want” and “what did the LLM just do?” I think at the design level. The LLM handles the implementation details. It feels like pair programming with someone who types really fast and never gets tired.

Speed: Yes, I’m actually faster overall. The time I spend in discussion is dwarfed by the time I save not debugging cascading failures or rewriting entire modules.

For the Skeptics (I Was One of You)

Look, if you’ve tried AI coding assistants and decided they write garbage code, you’re not wrong about what you experienced. I’ve been there. I’ve seen the sloppy typing, the tangled messes, the solutions that technically work but are architectural nightmares.

But here’s the thing: The LLM isn’t the architect. You are.

When you let it jump straight to code, you’re handing over design authority to something that doesn’t understand your project’s constraints, your team’s conventions, or your future maintenance burden. Of course it produces code you’d never ship.

But when you keep it in the discussion phase—when you use it as a thinking partner before giving it implementation authority—it becomes remarkably useful.

Think of it this way: You wouldn’t hire a developer and immediately give them commit access to main before they understand your codebase, right? You’d have them read the architecture docs, discuss approaches, get code reviewed.

Do the same with your LLM. Orientation, discussion, then implementation. In that order. Every time.

The Golden Rule

Discuss, decide, then deploy. Never let the LLM skip straight to implementation.

The moment you see it generating code during what should be a design discussion, you’ve lost control. Pull it back. “Let’s talk through the approach first.”

Your job: Architecture, design decisions, quality standards.

The LLM’s job: Remembering where everything is, handling boilerplate, implementing the approach you designed.

When you maintain that boundary, AI-assisted coding stops being chaos and starts being productive. You get to code faster without sacrificing quality. You get to work all day without burning through API credits. You get to ship code you actually understand and can debug when things go wrong.

And honestly? It makes coding fun again. I’m not fighting with an overeager assistant that keeps rewriting my work. I’m designing systems and having them built to spec. That’s what I wanted all along.


(Written by Human, improved using AI where applicable.)