I'll Have a Burrito With a Side of JavaScript
How We Learned Our AI Chatbot Was Too Helpful (and Fixed It Without Pushing a Single Line of Code)

We were building Tex.
Tex is the AI assistant for NoLimitz.io — a conversational guide designed to help visitors figure out whether they're a Builder with a prototype to launch or a Business Leader trying to make sense of AI. Warm, direct, Texas-flavored. Simple premise.
Then a Reddit thread caught our eye. Someone posted that they could cancel their Claude subscription and just use the Chipotle chatbot instead.
"I'll have a burrito with a side of JavaScript."
We laughed. Then we went and tested it ourselves.
Sure enough: "create a html hello world" — Tex delivered full HTML boilerplate, no hesitation. "compare nextjs and react and drupal and provide a markdown file" — complete markdown document, comparison table, use cases, conclusion. The whole thing.
That's not a bug in the traditional sense. The model did exactly what it was asked. The problem was that we hadn't told it not to. And in the world of AI assistants, if you don't define the fence, the model will happily wander anywhere.
Here's how we fixed it — and why the way we fixed it matters as much as the fix itself.
The Problem: Helpful Is a Double-Edged Sword
Large language models are trained to be maximally helpful. That's their default state. Without explicit guardrails, a customer-facing chatbot will write code, generate documents, and drift from "assistant" into "free AI tool" — fast.
For Tex, this meant brand misalignment and lost signal. Every code request was a missed conversation about what that person actually needed help building. A Texas-flavored discovery guide producing boilerplate HTML isn't doing its job.
The era of "vibe-based" AI safety — assuming polite prompting alone will keep a model on task — is over. As one industry analysis put it, you cannot prompt your way out of probability. The model always generates the most statistically likely response. You need architecture, not just instructions.
The Industry Shift: Guardrails Are Now Infrastructure
This isn't a NoLimitz-specific challenge. It's the central tension in every production AI deployment in 2025.
The risk profile has changed. Chatbots generating off-script content are a brand problem. Agents that autonomously call tools or take actions create real operational and legal exposure. The EU AI Act carries penalties up to €35M for certain violations. A Fortune 500 retailer lost $4.3M over six months because an inventory agent lacked guardrails against manipulation.
The industry response has been clear: prompts are now treated as code. They're versioned, tested, and deployed with the same discipline as software — not hardcoded text in a config file. The question is no longer whether to build guardrails. It's how fast can you iterate on them.
What We Built: The Tex Governance Layer
We restructured Tex's system prompt into four layers.
Identity first. Models weight early instructions heavily. Before anything else, we established who Tex is: "People come to chat with YOU — not a form, not a bot, not a FAQ page. You are the first impression of NoLimitz." That anchors everything that follows.
Explicit role boundaries. Vague restrictions fail. We created a formal ROLE & GOVERNANCE section with two clear lists: what Tex may not do (write code, create files, produce markdown documents) and what Tex may do (have conversations, explain concepts, ask clarifying questions, direct people to the right service). The "may" list matters as much as the "may not" — it prevents the model from overcorrecting into useless refusals.
Named trigger patterns. We listed the exact request shapes to decline: "create a [filetype]", "write a script", "compare X and Y and provide a markdown file". Naming patterns explicitly is more reliable than asking the model to infer intent.
In-character refusals. The original decline message sounded like a different bot wrote it. A guardrail that breaks your brand voice is only half-solved. We rewrote it to stay in character: "Hey, that's a little outside my lane — I'm here to talk through ideas and point y'all in the right direction, not crank out code. What are you actually trying to build or solve?"
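In application code, the four layers above can live as separate named sections joined in order, identity first. Here's a minimal sketch using the exact excerpts quoted in this post; the section labels and assembly function are our illustration, not Tex's actual prompt file:

```python
# Four-layer system prompt structure: identity, governance,
# trigger patterns, in-character refusal. Wording is quoted from
# the post; the structure shown here is illustrative.

IDENTITY = (
    "People come to chat with YOU — not a form, not a bot, not a FAQ "
    "page. You are the first impression of NoLimitz."
)

GOVERNANCE = """ROLE & GOVERNANCE
You may NOT: write code, create files, or produce markdown documents.
You MAY: have conversations, explain concepts, ask clarifying
questions, and direct people to the right service."""

TRIGGERS = """Decline requests shaped like:
- "create a [filetype]"
- "write a script"
- "compare X and Y and provide a markdown file\""""

REFUSAL = (
    "Hey, that's a little outside my lane — I'm here to talk through "
    "ideas and point y'all in the right direction, not crank out code. "
    "What are you actually trying to build or solve?"
)

def build_system_prompt():
    # Identity goes first because models weight early instructions
    # heavily; the refusal stays in brand voice.
    return "\n\n".join([
        IDENTITY,
        GOVERNANCE,
        TRIGGERS,
        "When declining, stay in character:\n" + REFUSAL,
    ])
```

Keeping each layer as its own named block makes it obvious, at a glance, which layer a future edit touches.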
The Real Story: Fixed in 30 Minutes Without Touching Code
Here's what makes this worth writing about.
We didn't open a code editor. No PR, no deploy, no staging environment.
We fixed it in Langfuse — our LLM observability and prompt management platform. Here's the workflow:
- Spotted the problem in conversation traces
- Updated the system prompt directly in the Langfuse UI
- Labeled the new version "production"
- The app picked it up automatically on the next request
Done. Under 30 minutes.
In most AI applications, a system prompt lives hardcoded in a config file. Changing it means touching code, running a review cycle, and deploying. Langfuse decouples this entirely — prompt updates deploy instantly, without engineering involvement or a deployment pipeline. Assign the production label to a new prompt version and the application automatically picks it up. Roll back just as fast by reassigning the label to any previous version.
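On the application side, that workflow can be as small as one fetch per request. A minimal sketch, assuming the Langfuse Python SDK and a hypothetical prompt named "tex-system"; the label argument is what lets a UI-side label reassignment take effect without a deploy:

```python
# Sketch of label-based prompt pickup. The prompt name "tex-system"
# is hypothetical; get_prompt with a label is the Langfuse SDK's
# standard fetch pattern.

def fetch_system_prompt(client, name="tex-system", label="production"):
    # Returns the text of whichever prompt version currently carries
    # `label`. Moving the label in the Langfuse UI changes what this
    # returns on the next request — no code change, no redeploy.
    return client.get_prompt(name, label=label).prompt

# In the app (assumptions as noted above):
#   from langfuse import Langfuse
#   client = Langfuse()  # credentials from LANGFUSE_* env vars
#   system_prompt = fetch_system_prompt(client)
```

Because the application asks for a label rather than a pinned version, rollback is the same operation as rollout: move the label.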
That separation of concerns — product people managing prompts, engineers managing code — is how mature AI teams operate. It means you can observe, iterate, and fix in production without a ticket queue.
What This Means for Your Business
If you're running an AI assistant, chatbot, or agent in production, ask yourself:
Can you fix a misbehaving prompt in 30 minutes without touching code?
If the answer is no, you have a governance problem waiting to happen. Not because your model is bad — but because the model will always find the edge cases your initial prompt didn't anticipate. The question is how fast you can respond when it does.
The businesses winning with AI in 2025 aren't the ones with the cleverest prompts at launch. They're the ones who built the infrastructure to observe, iterate, and govern their AI behavior continuously.
Guardrails aren't a constraint on what AI can do. They're what makes AI trustworthy enough to actually use.
What We Used
- Claude (Anthropic) — foundation model powering Tex
- Langfuse — prompt management, observability, and version control
- Structured system prompt architecture — identity, governance, trigger patterns, and in-character responses as distinct layers
Building an AI assistant, agent, or chatbot for your business?
We help you get it from prototype to production — with the observability and governance layer built in from the start.