Every product team is being asked the same question right now: "where's our AI?" The honest answer is that most AI features fail not because the model is wrong, but because the product around it is wrong. A chatbot in a sidebar is not an AI strategy. A "generate with AI" button on a form field rarely survives the second week after launch.

We've shipped LLM features into a dozen production apps over the last eighteen months — from document-heavy enterprise tools to consumer products at meaningful scale. The stack and the patterns have stabilized. Here's what we actually build with, and the rules we've learned not to break.

The stack

Start with the smallest surface area that can still stream tokens and handle failure cleanly.

  • Next.js App Router for streaming UI directly from React Server Components.
  • Vercel AI SDK for unified provider access (OpenAI, Anthropic, Groq) and ready-made React streaming hooks.
  • Edge runtime for low-latency token streaming close to the user, especially for chat-style surfaces.
  • Postgres + pgvector for embeddings and retrieval — no need for a separate vector DB until you have millions of rows.
  • Upstash Redis for rate limiting, response caching, and cost controls.

This stack gives you a production AI app in a single Next.js codebase without any infrastructure team. That's the real unlock of 2025–2026: the scaffolding for AI features is now as simple as adding an API route.
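That "API route" can be as small as a handler that pipes tokens into a ReadableStream. In practice the Vercel AI SDK's streaming helpers wrap this machinery for you; the sketch below uses only Web-standard APIs, and `generateTokens` is a hypothetical stand-in for a real model call:

```typescript
// Minimal token-streaming route handler using only Web-standard APIs.
// The Vercel AI SDK wraps this same machinery; `generateTokens` below is
// a hypothetical stand-in for an actual model call.
export function streamingResponse(tokens: AsyncIterable<string>): Response {
  const encoder = new TextEncoder();
  const stream = new ReadableStream<Uint8Array>({
    async start(controller) {
      try {
        for await (const token of tokens) {
          controller.enqueue(encoder.encode(token)); // flush each token as it arrives
        }
      } catch {
        controller.enqueue(encoder.encode("\n[stream interrupted]")); // degrade, don't crash
      } finally {
        controller.close();
      }
    },
  });
  return new Response(stream, {
    headers: { "Content-Type": "text/plain; charset=utf-8" },
  });
}

// Example route-handler shape (Next.js App Router convention).
export async function POST(_req: Request): Promise<Response> {
  async function* generateTokens() {
    // placeholder: a real handler would stream tokens from the model here
    yield "Hello, ";
    yield "world.";
  }
  return streamingResponse(generateTokens());
}
```

The error path matters as much as the happy path: a stream that dies mid-response should still close cleanly with something the UI can show.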

Three rules we don't break

1. Stream everything.

A four-second wait for a complete response feels broken. The same response streamed token-by-token feels intelligent. This isn't a matter of taste — it's the difference between users trusting your feature and users abandoning it.

The Vercel AI SDK's useChat and useCompletion hooks make this trivial. If you're still doing await response.json() on an LLM call, that's the first thing to fix.
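On the consumer side, the fix is to read the body incrementally instead of waiting for it all. This is roughly the loop that useChat and useCompletion run for you (plus message state and error handling); a dependency-free sketch:

```typescript
// Read a streamed response chunk-by-chunk instead of awaiting the whole
// body. `onToken` fires for every chunk so the UI can render as text
// arrives; the SDK's hooks wrap this same loop.
export async function consumeStream(
  response: Response,
  onToken: (chunk: string) => void,
): Promise<string> {
  if (!response.body) throw new Error("response has no body to stream");
  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let full = "";
  for (;;) {
    const { done, value } = await reader.read();
    if (done || value === undefined) break;
    const chunk = decoder.decode(value, { stream: true }); // handles split UTF-8 sequences
    full += chunk;
    onToken(chunk);
  }
  return full;
}
```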


2. Always ground the model.

Pure LLM output is a liability. It hallucinates, it repeats training data, and it has no idea what happened last Tuesday in your app. Every serious AI feature retrieves from your own data first, then lets the model reason over the retrieved context.

This is RAG — retrieval-augmented generation — and it's no longer optional. The pattern is:

  1. Embed your own data (docs, emails, product entries) once, store vectors in pgvector.
  2. On user query, embed the query and retrieve the top N most similar chunks.
  3. Pass those chunks as context to the model along with the user prompt.
  4. Stream the answer with citations back to the UI.
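Step 2 is just a nearest-neighbor query. With pgvector it's a single SQL statement (ordering by the cosine-distance operator with a LIMIT); the ranking it performs looks like this in-memory sketch, where embeddings are plain number arrays:

```typescript
interface Chunk {
  text: string;
  embedding: number[]; // produced once by your embedding model (step 1)
}

// Cosine similarity between two equal-length vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Retrieve the top-N chunks most similar to the query embedding.
// pgvector does exactly this server-side, with an index instead of a scan.
export function retrieve(query: number[], chunks: Chunk[], n: number): Chunk[] {
  return [...chunks]
    .sort((a, b) => cosine(query, b.embedding) - cosine(query, a.embedding))
    .slice(0, n);
}
```

The retrieved chunks then become the grounding context for step 3, ideally with their source identifiers attached so the model can cite them in step 4.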

Without grounding, you're shipping a product that lies to your users. With grounding, you're shipping a product that knows things.

3. Design for the failure case.

Rate limits, timeouts, content filters, empty responses, hallucinations — they will happen. The question is whether your UI degrades gracefully or crashes.

Every AI surface in our products has four explicit states: idle, streaming, success, error. The error state is not a red toast. It's a considered message that explains what happened and offers a path forward. Users forgive failure. They don't forgive silence.
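Those four states are worth encoding explicitly rather than juggling booleans; a discriminated union makes impossible combinations (loading *and* errored) unrepresentable. A minimal sketch, with state names from the text and transition rules that are our convention:

```typescript
// The four explicit states of an AI surface, as a discriminated union.
type AiSurfaceState =
  | { kind: "idle" }
  | { kind: "streaming"; partial: string }
  | { kind: "success"; text: string }
  | { kind: "error"; message: string; retryable: boolean };

type AiEvent =
  | { type: "start" }
  | { type: "token"; token: string }
  | { type: "done" }
  | { type: "fail"; message: string; retryable: boolean };

// Reducer: every event maps to exactly one next state, so the UI can
// never be "loading" and "errored" at the same time.
export function reduce(state: AiSurfaceState, event: AiEvent): AiSurfaceState {
  switch (event.type) {
    case "start":
      return { kind: "streaming", partial: "" };
    case "token":
      return state.kind === "streaming"
        ? { kind: "streaming", partial: state.partial + event.token }
        : state; // ignore stray tokens outside an active stream
    case "done":
      return state.kind === "streaming"
        ? { kind: "success", text: state.partial }
        : state;
    case "fail":
      return { kind: "error", message: event.message, retryable: event.retryable };
  }
}
```

The error state carries enough data (`message`, `retryable`) for the UI to explain what happened and offer a retry, rather than flashing a generic toast.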

What we learned the hard way

Cost control is a feature. Without per-user rate limits, one enthusiast can burn a week's API budget in an afternoon. Add a quota layer from day one.
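A quota layer can start as a fixed-window counter keyed by user id. In production we back this with Upstash Redis so limits survive restarts and scale across instances; the in-memory stand-in below shows the shape:

```typescript
// Fixed-window rate limiter, keyed per user. An in-memory stand-in for a
// Redis-backed limiter; the counter resets every `windowMs` milliseconds.
export class RateLimiter {
  private counts = new Map<string, { count: number; windowStart: number }>();

  constructor(
    private readonly limit: number, // max requests per window
    private readonly windowMs: number, // window length in milliseconds
    private readonly now: () => number = Date.now, // injectable clock for tests
  ) {}

  allow(userId: string): boolean {
    const t = this.now();
    const entry = this.counts.get(userId);
    if (!entry || t - entry.windowStart >= this.windowMs) {
      this.counts.set(userId, { count: 1, windowStart: t }); // fresh window
      return true;
    }
    if (entry.count >= this.limit) return false; // over quota: reject before the model call
    entry.count++;
    return true;
  }
}
```

Check `allow()` before every model call and return a 429 with a clear message when it fails; the cheapest LLM request is the one you never make.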

Prompts are code. Version them, test them, diff them. The second you treat prompts as configuration files that anyone can edit, quality collapses.
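Treating prompts as code can be as simple as exporting them as typed, versioned constants so they are reviewed and diffed like any other module. A sketch of the convention (the names and version scheme here are ours, not a library's):

```typescript
// A prompt as a versioned, typed module: the version travels with every
// request log, so a quality regression can be traced to a prompt change.
interface PromptTemplate<Vars> {
  id: string;
  version: number;
  render: (vars: Vars) => string;
}

export const summarizeDoc: PromptTemplate<{ title: string; body: string }> = {
  id: "summarize-doc",
  version: 3,
  render: ({ title, body }) =>
    [
      "You are a careful technical summarizer.",
      `Summarize the document "${title}" in three bullet points.`,
      "Only use facts present in the document below.",
      "---",
      body,
    ].join("\n"),
};
```

Because the template is typed, forgetting a variable is a compile error rather than a silent `undefined` in production output.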

Observability is non-negotiable. Log every prompt, every completion, every cost. You cannot improve what you cannot see. We use LangFuse for this; several alternatives work.
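Until a tracing tool is wired in, even a thin wrapper that records every call gets you most of the value. A sketch of that wrapper; `callModel` is a hypothetical model function, and the pricing defaults are placeholders, not real rates:

```typescript
export interface CallRecord {
  prompt: string;
  completion: string;
  promptTokens: number;
  completionTokens: number;
  costUsd: number;
  latencyMs: number;
}

// Wrap any model call so every prompt, completion, cost, and latency is
// recorded. Token counts come from the model response; pricing is per
// million tokens (placeholder values, set per model).
export async function observed(
  prompt: string,
  callModel: (p: string) => Promise<{
    text: string;
    promptTokens: number;
    completionTokens: number;
  }>,
  log: (r: CallRecord) => void,
  pricing = { inPerMTok: 3, outPerMTok: 15 },
): Promise<string> {
  const start = Date.now();
  const res = await callModel(prompt);
  log({
    prompt,
    completion: res.text,
    promptTokens: res.promptTokens,
    completionTokens: res.completionTokens,
    costUsd:
      (res.promptTokens / 1e6) * pricing.inPerMTok +
      (res.completionTokens / 1e6) * pricing.outPerMTok,
    latencyMs: Date.now() - start,
  });
  return res.text;
}
```

Swapping `log` for a real sink (LangFuse, a Postgres table, stdout) changes nothing upstream, which is the point: instrument once, at the boundary.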

What's next

The ceiling of what a single AI feature can do is rising faster than most product roadmaps. Structured output, tool calling, and long-context retrieval are opening up workflows that weren't possible six months ago. Teams that build the stack above can move fast when the capability lands — teams still wiring up a chatbot will be rewriting.

AI isn't a feature you add. It's a layer that changes how the whole product behaves. Treat it that way and you'll ship things people actually use.