
Context engineering vs. prompt engineering: what data teams actually need

Prompt tweaks helped early on. Then the CFO got a revenue number that was off by 15% — and everything the AI had written was syntactically perfect.


Your AI analytics tool just gave the CFO a revenue number that's off by 15%. The SQL ran without errors and produced a good-looking chart. But the AI calculated revenue as SUM(amount) from the transactions table instead of SUM(order_total) WHERE status = 'completed' AND refunded = FALSE from the orders table, because nobody told it the difference.
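To make the failure mode concrete, here's a minimal sketch against an in-memory SQLite database. The tables and numbers are invented for illustration (and the gap is exaggerated relative to the 15% above), but the shape of the bug is the same: both queries run cleanly, and only one is right.

```python
import sqlite3

# Toy schema mirroring the scenario above; all values are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE transactions (id INTEGER, amount REAL);
CREATE TABLE orders (id INTEGER, order_total REAL, status TEXT, refunded INTEGER);
INSERT INTO transactions VALUES (1, 100), (2, 50), (3, 75);
INSERT INTO orders VALUES
    (1, 100, 'completed', 0),
    (2, 50,  'completed', 1),   -- refunded, should be excluded
    (3, 75,  'pending',   0);   -- not completed, should be excluded
""")

# What the AI did: sum every row in the transactions table.
naive = conn.execute("SELECT SUM(amount) FROM transactions").fetchone()[0]

# What the business means by "revenue": completed, non-refunded orders.
correct = conn.execute(
    "SELECT SUM(order_total) FROM orders "
    "WHERE status = 'completed' AND refunded = 0"
).fetchone()[0]

print(naive, correct)  # 225.0 100.0 -- both queries succeed, one is wrong
```

Nothing in the naive query errors out, which is exactly why the mistake reaches the CFO.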

This is the kind of failure that keeps data teams up at night. Not the obvious errors where the query breaks, but the quiet ones where everything looks right and still isn't. And it's this kind of failure that separates prompt engineering from context engineering.

For general-purpose AI tasks like drafting emails or summarizing documents, the distinction is mostly academic. But when the task is "query our revenue data accurately across 40 endorsed tables," the difference is between a number you can defend in a board meeting and one that quietly erodes trust in your entire data operation.

What prompt engineering and context engineering actually mean

Prompt engineering is the practice of refining how you communicate with an AI model to get better outputs. You choose the right words, structure your instructions clearly, and iterate on how you ask. It happens inside the context window: a specific instruction for a specific task.

Context engineering is the discipline of designing the full information environment the AI operates in: schema metadata, metric definitions, governance rules, conversation history, and observability signals. Anthropic's engineering team has described context engineering as the "natural progression" of prompt engineering, covering strategies for curating all the information that reaches the model, not just the instructions you write.

Prompt engineering gets you the first good output. Context engineering keeps quality high as usage scales.

In analytics, that distinction is critical. A well-crafted prompt can summarize a dataset or reformat a query. But when the AI needs to choose the right table from dozens of options, apply the correct metric definition, follow the right join path, and produce a number someone can stake a decision on, you've moved beyond what any prompt can handle alone. You need infrastructure.

Why good prompts aren't enough for trusted analytical answers

Real enterprise databases break AI in ways that prompt engineering can't fix. Models that perform well on small, clean schemas often struggle on real enterprise warehouses: more tables, more columns, more ambiguity. No amount of rephrasing your question closes that gap. The failures are structural, and they show up in predictable ways.

Hallucinated joins

Enterprise databases often have multiple valid paths between tables. An orders table might connect to customers through customer_id, billing_address_id, or shipping_address_id. The AI picks whichever path seems logical based on pattern matching, and can be confidently wrong. The query runs, the numbers look plausible, and your regional revenue analysis is off.
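A tiny sketch of the same problem, with invented tables: both join paths are syntactically valid, and they attribute the same order to different regions.

```python
import sqlite3

# Hypothetical miniature schema: one order, two plausible join paths to a region.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER, region TEXT);
CREATE TABLE addresses (id INTEGER, region TEXT);
CREATE TABLE orders (id INTEGER, customer_id INTEGER,
                     shipping_address_id INTEGER, order_total REAL);
INSERT INTO customers VALUES (1, 'EMEA');
INSERT INTO addresses VALUES (10, 'APAC');  -- shipped somewhere else
INSERT INTO orders VALUES (1, 1, 10, 500);
""")

# Join path 1: attribute revenue to the customer's home region.
by_customer = conn.execute("""
    SELECT c.region, SUM(o.order_total) FROM orders o
    JOIN customers c ON o.customer_id = c.id GROUP BY c.region
""").fetchall()

# Join path 2: attribute revenue to the shipping address region.
by_shipping = conn.execute("""
    SELECT a.region, SUM(o.order_total) FROM orders o
    JOIN addresses a ON o.shipping_address_id = a.id GROUP BY a.region
""").fetchall()

print(by_customer)  # [('EMEA', 500.0)]
print(by_shipping)  # [('APAC', 500.0)] -- same $500, different region
```

Which path is "correct" is a business decision the schema alone can't answer, which is why the AI needs it stated explicitly.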

Inconsistent metric definitions

When someone asks "what's our revenue?" the AI has to decide how to calculate it. Without a canonical definition, it guesses — maybe SUM(amount) from transactions instead of the business logic that excludes refunds and incomplete orders. If your own teams struggle to agree on what "revenue" means across tools, the AI won't infer the right answer from a prompt.

Confident answers built on the wrong tables

At enterprise scale, column names are often cryptic (cust_addr_typ, ord_ln_itm_stat), and legacy tables sit alongside production ones. The AI can't tell which tables your team endorses for analysis and which are deprecated artifacts. Mismatches between the table the AI selects and the table that would actually produce correct results are common — and they're silent.

Results nobody can audit

Even when the AI produces correct numbers, there's no trail showing which metric definition it used, which tables it accessed, or what filters it applied. For any team subject to compliance requirements, or a skeptical CFO, that's a nonstarter.

These aren't edge cases. They're the default behavior when AI operates without governed context.

What context is made of

Most of these failures have a direct structural fix. Each layer of context you add closes a specific gap, and you don't need all of them before you start seeing improvement.

Endorsed tables

Endorsed tables tell the AI which tables are production-ready and which are experimental, deprecated, or just not meant for ad-hoc analysis. This is the lightest thing you can do, and it directly addresses the wrong-table problem.
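The mechanics can be as simple as filtering catalog metadata before it ever reaches the model. A sketch, with invented table names and a hypothetical metadata shape:

```python
# Hypothetical catalog metadata; names and fields are invented for illustration.
catalog = {
    "orders":             {"endorsed": True,  "note": "canonical revenue source"},
    "orders_backup_2021": {"endorsed": False, "note": "deprecated snapshot"},
    "transactions":       {"endorsed": False, "note": "raw payment events, includes refunds"},
}

def tables_for_agent(catalog):
    """Only endorsed tables are surfaced in the AI's context window."""
    return [name for name, meta in catalog.items() if meta["endorsed"]]

print(tables_for_agent(catalog))  # ['orders']
```

If the deprecated snapshot never enters the context window, the AI can't pick it.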

Warehouse schema and table descriptions

These go one level deeper: what each column actually contains, and what the cryptic historical names really mean. Without them, the AI is pattern-matching on column names and guessing at the rest.

For an AI agent, these descriptions should read like onboarding instructions for a new junior analyst joining the team, helping them understand what data exists in each column.
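One way to think about it: the descriptions get rendered into plain-language context the model reads before writing any SQL. A sketch, with hypothetical column names and descriptions:

```python
# Invented column names and descriptions, rendered as onboarding-style context.
schema = {
    "orders.ord_ln_itm_stat": "Line-item status; 'C' = completed, 'X' = cancelled.",
    "orders.order_total": "Order value in USD after discounts, before refunds.",
}

def render_schema_context(schema):
    """Turn column metadata into the kind of briefing a new analyst would get."""
    lines = ["You are querying the following columns:"]
    for column, description in sorted(schema.items()):
        lines.append(f"- {column}: {description}")
    return "\n".join(lines)

print(render_schema_context(schema))
```

Without that briefing, ord_ln_itm_stat is just a string the model pattern-matches against.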

Workspace rules

Workspace rules encode the institutional knowledge that doesn't fit neatly into a schema: which tables to prefer for which questions, what filters to always apply, how to handle the edge cases that trip up even experienced analysts. Think of it as more onboarding for the agent, focused on how to use the data rather than what the data is.

Semantic models

Semantic models are where you actually enforce metric consistency. "Revenue" stops being an open question and becomes SUM(order_total) WHERE status = 'completed' AND refunded = FALSE. They also define which dimensions can slice which metrics and which joins are valid. If you've already defined these in dbt MetricFlow, Cube, or Snowflake, you can sync them in rather than rebuilding from scratch. This is where the largest accuracy improvements show up, because the AI stops guessing at definitions and uses the real ones.
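Real semantic layers like dbt MetricFlow or Cube use much richer definition formats, but the core idea is that a metric becomes one canonical, machine-readable object instead of an improvisation. A minimal sketch, with the structure invented for illustration:

```python
from dataclasses import dataclass

# Toy stand-in for a semantic-model metric definition; field names are invented.
@dataclass(frozen=True)
class Metric:
    name: str
    expression: str
    table: str
    filters: tuple

    def to_sql(self):
        """Compile the canonical definition to SQL, the same way every time."""
        where = " AND ".join(self.filters)
        return f"SELECT {self.expression} FROM {self.table} WHERE {where}"

revenue = Metric(
    name="revenue",
    expression="SUM(order_total)",
    table="orders",
    filters=("status = 'completed'", "refunded = FALSE"),
)

print(revenue.to_sql())
# SELECT SUM(order_total) FROM orders WHERE status = 'completed' AND refunded = FALSE
```

Every question about revenue now compiles from the same definition, so two people asking on different days get the same number.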

Conversation history

Conversation history preserves context across turns. When someone asks "now break that down by region," the AI needs to know what "that" refers to. Without it, every follow-up is a fresh start.
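Mechanically, this just means prior turns travel with the new question. A sketch with an invented message shape (real chat APIs differ in detail):

```python
# Hypothetical conversation log; message format is invented for illustration.
history = [
    {"role": "user", "content": "What was Q3 revenue?"},
    {"role": "assistant", "content": "Q3 revenue was $4.2M."},
]
follow_up = {"role": "user", "content": "Now break that down by region."}

def build_context(history, follow_up):
    """Send prior turns along with the follow-up so 'that' is resolvable."""
    return history + [follow_up]

context = build_context(history, follow_up)
# The model now sees that "that" refers to Q3 revenue, not a fresh question.
```

Drop the history and "break that down by region" has no referent at all.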

Observability signals

Observability signals close the loop: tracking which questions are being asked, where answers are failing, and which parts of your data have thin context. This is what turns context engineering from a one-time setup into something that actually gets better over time.
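In practice this can start as simple aggregation over an answer log. A sketch with invented log entries, showing how failures cluster around the tables with the thinnest context:

```python
from collections import Counter

# Hypothetical answer log; a real loop would pull this from telemetry.
answer_log = [
    {"question": "revenue by region", "table": "orders",       "flagged_wrong": False},
    {"question": "churn rate",        "table": "users_legacy", "flagged_wrong": True},
    {"question": "churn rate",        "table": "users_legacy", "flagged_wrong": True},
]

# Count flagged answers per table to see where context needs investment next.
failures_by_table = Counter(
    row["table"] for row in answer_log if row["flagged_wrong"]
)

print(failures_by_table.most_common(1))  # [('users_legacy', 2)]
```

A table that keeps producing flagged answers is a table that needs descriptions, rules, or an endorsement decision.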

When to use prompt engineering vs. context engineering

Prompt engineering and context engineering aren't competing approaches. They operate at different scales of the problem, and knowing which one you need is mostly a question of what you're trying to guarantee.

Prompt engineering works well for bounded, single-turn work: summarizing a dataset you've already pulled, reformatting a SQL query, generating exploratory analysis on a familiar schema, translating statistical output into plain language for a stakeholder. The schema is known, the scope is contained, and you're reviewing the result before anyone acts on it.

Context engineering takes over when answers need to be consistent across many people, questions, and time. Choosing the right table from dozens of options. Applying the canonical metric definition, not an improvised one. Constructing joins that follow actual business relationships. Producing a number a CFO can audit six months from now. These aren't things you solve by tweaking the prompt. The infrastructure needs to be there before anyone asks the question.

One useful rule of thumb: one analyst running a one-off analysis and reviewing their own output can probably get by with prompt engineering alone. Dozens of people across the organization asking AI questions about your data every day, needing consistent and defensible answers? That's context engineering territory.

AI shifts the bottleneck from execution to alignment. Writing SQL, building charts, running queries: all of that gets faster. But whether the AI is using the right data, the right definition, and the right logic becomes more important, not less. Prompt engineering sharpens the questions you ask. Context engineering makes the answers worth trusting.

How shadow AI signals the gap

You've probably seen this. The request queue is months deep, so business users start uploading CSVs to outside AI tools. They get answers in seconds, but with no shared metric definitions, no audit trail, and no way to verify which table or calculation the AI used.

Making the governed path faster and easier than the ungoverned one is what fixes it. When anyone can ask a question in natural language and get self-serve answers grounded in canonical metric definitions, with the SQL visible and inspectable, the incentive to take data outside drops considerably.

PandaDoc's team had 80% of analyst time going to repetitive requests, with business users turning to AI chatbots that returned inconsistent numbers. By pairing Hex's Threads with Cube semantic models via semantic model sync, the team cut time on tasks like click-through rate analysis by 75%, with analysts able to inspect the underlying queries and semantic models to verify accuracy. When the governed path is that fast, people use it.

Building context: a practical path

Context engineering works best as a progression, not a project. Here's roughly how teams move through it.

Days 1–5: endorse tables and add descriptions. Start with the 20 tables that account for most of your team's queries. Mark them as production-ready. Add table-level and column-level descriptions that clarify what each one contains and what it's actually for. This takes hours, not months, and you'll see immediate improvement in which tables the AI reaches for.

Weeks 2–3: add workspace rules and semantic models. Workspace rules are usually a few paragraphs: which tables to prefer for which questions, what filters always apply, how to handle the edge cases that trip people up. Semantic models go deeper: your core metrics in code, the joins that are valid, the dimensions that slice each metric correctly. If you've already done this work in dbt, Cube, or Snowflake, sync it in rather than starting over. This is where the accuracy improvements compound, because the AI stops guessing at what your business terms mean.

Weeks 3–5: build observability loops. Start tracking which questions are coming in, where answers are falling short, and which areas of your data have thin context. Context Studio surfaces these patterns so you can see exactly where to invest next rather than guessing. This is the shift from context engineering as setup to context engineering as ongoing practice.

Ongoing: distribute stewardship. This is the part most teams underestimate. Context engineering breaks down when a single team owns all of it, because it becomes a bottleneck, definitions go stale, and the people closest to the data stop trusting what they see.

The data team should own the infrastructure and governance framework: the endorsement structure, the workspace rules architecture, the observability tooling. But the content of that context should belong to the people who actually know it. Finance owns the revenue metric definitions. Product owns the feature adoption metrics. Marketing owns campaign attribution logic. Those domain experts are the ones who catch edge cases, flag outdated rules, and know when a definition needs updating. Build the infrastructure to make that easy for them and the context gets sharper over time instead of drifting.

According to Hex's State of Data Teams 2026 report, 31% of data leaders name trust as their top concern around AI adoption, cited nearly twice as much as any other concern. Distributed stewardship is how that trust stays grounded in the current state of your data, rather than drifting into infrastructure nobody maintains.

Prompt engineering now, context engineering as you scale

The short answer is that you'll use both, and the right one depends on where you are.

Early on, when you're exploring and the output is mostly for you, prompt engineering is enough. As AI analytics spreads across your organization, with more people asking questions and more decisions depending on consistent answers, context engineering is what makes it work. Every table description you add, every metric you define in code, and every workspace rule you encode makes the next AI interaction more reliable than the last. The context compounds.

When anyone in your organization can ask a question and get an answer grounded in the same definitions your data team would use, with full transparency into how that answer was built, that's when AI analytics starts earning real trust.

Hex builds this context layer into the core of its AI analytics platform, so the governed path is the fastest path. Request a demo to see how it works in practice.

Frequently Asked Questions

Do I need a fully built semantic layer before I can start getting value from context engineering?

No. Context engineering is a progression, and the lightest actions make a real difference on day one. Endorsing your most-queried tables and adding column descriptions prevents the most common class of wrong-table errors before you've written a single semantic model. Most teams see meaningful improvement from endorsements and workspace rules first, and they build confidence about where to invest the deeper work. Semantic models are where the largest accuracy gains tend to show up, but they're a destination you work toward, not a prerequisite for getting started.

How does context engineering interact with semantic layer tools we already have, like dbt or Cube?

It builds on them rather than replacing them. If you've already defined metrics in dbt MetricFlow or Cube, those definitions become a core part of the context that powers AI answers. You're not rebuilding what you've already invested in. Hex supports semantic model sync with dbt MetricFlow, Cube, Snowflake, and Databricks, so the work you've done in those tools flows in directly. Workspace rules, endorsed tables, and observability signals then wrap around those semantic definitions, giving the AI what it needs to select the right metric, apply the right joins, and produce results you can actually audit.

How do you know if your context layer is actually working?

Watch the behavioral signals. Do people stop rephrasing the same questions to get useful answers? Do the questions that caused quality issues 30 days ago still cause problems today? Are analysts spending less time correcting AI output before sharing it with stakeholders? Those patterns are reliable indicators. Qualitative signals matter too: if business users start acting on AI answers without looping in the data team to verify them, that's usually a sign the context layer is doing its job. Context Studio surfaces these patterns systematically, but you'll often feel the shift before you measure it.

Get "The Data Leader’s Guide to Agentic Analytics"  — a practical roadmap for understanding and implementing AI to accelerate your data team.