Curating your data for Explore makes everyone more productive
More and more people outside of the data team — from product, finance, ops, and marketing — are relying on Hex to get clearer and more specific answers on what’s happening in their world. This is exciting! It means data teams are producing powerful work in Hex that business partners want to dive into further.
We wanted these business partners (and data team members, too!) to have a first-class experience working with data in a way that didn’t require code. So we built a new no-code experience, called Explore. It lets business partners get their hands on the data directly using visual data exploration to answer their follow-up questions in Hex. No waiting on the data team or learning code.
If your business users are already using Hex or you have a hunch they’ll start to soon, we recommend doing just a bit of curation at the data warehouse level. This will ensure that the data they explore and summon with Magic AI can be trusted and is relevant to them.
To start this process, head over to the Data browser and complete the following steps.
Chances are your business users don't need to wade through every nook and cranny of your data warehouse (cough, cough, looking at you… dev_user_42_test_table 👀). To give them access to only the data they need, set up a new data connection that houses just the databases, schemas, and tables relevant to them. Not only will your stakeholders feel right at home, but Magic will also serve up data insights tailored just for them. From there you can curate relevant data.
Data connection tips:
Use a clear, consistent, and descriptive name for your connection to make it easily identifiable to team members.
Clearly document the purpose of this data connection, its internal owner, and notes about special configurations or limitations.
Business users want data that can give them trustworthy information, so it’s best to prune anything that could be inaccurate, out of date, sensitive, or just too raw. Think of this connection as the “Gold layer” from Databricks’ medallion architecture concept: “organized in consumption-ready, project-specific databases.”
To create a smooth Explore experience, there are three ways to curate what's visible within your data connection and hide anything that isn't relevant to business Explorers.
Schema filtering (in the Data browser)
Using the Data browser, admins can easily use schema filtering to include or exclude specific databases, schemas, or tables from your data connection. On the next refresh, only your selected assets will be synced. We recommend filtering out STAGING/DEV/RAW schemas to start. Any excluded objects can still be queried; they just won’t appear in the Data browser, autocomplete, or Magic AI responses. To fully remove access to certain objects, you’ll want to set up role permissions in the actual warehouse.
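Before you open the Data browser, it can help to draft the exclude list against your real schema names. The snippet below is a minimal planning sketch, not a Hex API call: the patterns and schema names are hypothetical, and the filtering itself still happens in the Data browser.

```python
from fnmatch import fnmatchcase

# Hypothetical exclusion patterns; adjust to your own naming conventions.
EXCLUDE_PATTERNS = ["STAGING*", "DEV*", "RAW*"]


def split_schemas(schemas, patterns=EXCLUDE_PATTERNS):
    """Return (kept, excluded) lists, matching schema names case-insensitively."""
    kept, excluded = [], []
    for name in schemas:
        # Upper-case both sides so "staging_orders" matches "STAGING*".
        if any(fnmatchcase(name.upper(), pattern.upper()) for pattern in patterns):
            excluded.append(name)
        else:
            kept.append(name)
    return kept, excluded


# Hypothetical schema names, for illustration only.
kept, excluded = split_schemas(
    ["ANALYTICS", "staging_orders", "DEV_USER_42", "RAW_STRIPE", "FINANCE"]
)
print("Keep:", kept)
print("Exclude:", excluded)
```

Reviewing the two lists with a business partner first is a cheap way to catch a schema that looks raw by name but is actually in regular use.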
Magic - Include/Exclude toggles and Endorsements (in the Data browser)
Think of Magic AI like a data exploration sidekick — ready to assist any Explore users who might not speak fluent SQL or Python. Adding an endorsed status to databases, schemas, or tables is the easiest way to quickly tell Magic (and your eager end users) which data is "Approved" or "Trusted" by the data team. And now you can get endorsement suggestions from Magic itself…
NEW: Magic Curation Suggestions 🪄 Magic will now suggest tables for Admins to endorse! In the Data browser, Magic automatically surfaces popular tables and datasets to endorse, and you can accept or dismiss each suggestion. Magic will then prioritize any endorsed tables when answering questions and generating suggested prompts in Ask Magic. This helps your non-data team users ask the right questions and explore the right data.
If you want to maintain access in the Data browser to certain databases/tables/schemas but never want Magic to use them, you can toggle them via the “Include/Exclude for Magic” setting.
Warehouse permissioning (in your data warehouse, not in Hex)
If you don’t want folks in your workspace to be able to access specific tables at all — like ultra sensitive data or raw warehouse data — configure user permissions in your warehouse and your data connection to prevent business partners from querying or viewing the data.
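As a rough illustration of what that warehouse-side step might look like, the sketch below generates REVOKE statements for a role. Everything here is an assumption: the role and schema names are hypothetical, the syntax is Snowflake-style and will differ in other warehouses, and it presumes your Hex data connection authenticates as that role. Review the output with your warehouse admin before running anything.

```python
# Hypothetical names: replace with your own role and schemas.
BUSINESS_ROLE = "HEX_BUSINESS_USERS"
RESTRICTED_SCHEMAS = ["ANALYTICS_DB.RAW", "ANALYTICS_DB.STAGING"]


def revoke_statements(role, schemas):
    """Build one REVOKE statement per schema (Snowflake-style syntax assumed)."""
    return [
        f"REVOKE SELECT ON ALL TABLES IN SCHEMA {schema} FROM ROLE {role};"
        for schema in schemas
    ]


statements = revoke_statements(BUSINESS_ROLE, RESTRICTED_SCHEMAS)
for statement in statements:
    # Print only; nothing is executed against the warehouse here.
    print(statement)
```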
When anyone asks Magic a question, it first uses the metadata from the Data browser to perform a semantic similarity search for tables and columns that might answer that question. You can add descriptions to any database, schema, or table. The more information you add to the Data browser, the more likely Magic is to match the right tables and columns.
What should you include in metadata? It’s best to include information about what can be calculated from a table and what it should be used for. If there is company jargon or there are synonyms, explain what they mean or refer to. Dig into more metadata tips that are useful for Magic.
To reduce any potential string hallucinations:
Add enumerations - For low cardinality string columns that are often filtered on or used in case statements, try explicitly enumerating or describing options for these fields in the Data browser (this can reduce hallucination rates to near 0). This could be useful for a question like: “How many orders have shipped but not yet been delivered?”
Try explaining the pattern in natural language - For high cardinality string columns that have too many options to list but follow a consistent pattern (like a City / State combo), calling out the pattern in natural language like “City State pairs, like 'Memphis TN'” can help Magic understand.
Add custom metadata - Try using natural language to tell Magic when a table should and shouldn’t be referenced. For example, you could write: “Only use this table if the prompt explicitly requires raw Stripe data; otherwise use fct_orders.”
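To make the enumeration tip concrete, here is a minimal sketch of how you might draft such a description before pasting it into the Data browser. The column, its values, and the helper function are all hypothetical; the sketch only formats text and does not talk to Hex.

```python
def describe_enum_column(summary, values):
    """Build a column description that lists every allowed value with its meaning."""
    options = "; ".join(f"'{value}' = {meaning}" for value, meaning in values.items())
    return f"{summary} Allowed values: {options}."


# Hypothetical order_status column, for illustration only.
order_status_description = describe_enum_column(
    "Current fulfillment state of the order.",
    {
        "placed": "order received, not yet shipped",
        "shipped": "left the warehouse, not yet delivered",
        "delivered": "confirmed received by the customer",
    },
)
print(order_status_description)
```

With a description like this in place, a question such as “How many orders have shipped but not yet been delivered?” gives Magic an exact value to filter on instead of a guess.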
Pro tip: If you want to prototype descriptions and see how Magic does with them, feel free to directly edit/update them in Hex in our Data browser UI! You can then ask Magic your question and see how it does.
If you use dbt Cloud, you can use metadata from your dbt project to enrich the Data browser, making the Explore experience with Magic even more useful. When you use our dbt integration with your data connection, Hex will pull in metadata for Magic to reference, such as model, source, and column descriptions and tests; when the model was last updated; source freshness tests; and more. Explore users will also be able to see these descriptions in Hex.
Our last suggestion is to create a semantic model that abstracts away a lot of the complexity and predefines the metrics your business users care most about. We have more coming here soon, so stay tuned!
Congrats on making it through our data curation crash course! By carving out just a bit of time to spruce up your Hex workspace, you're giving your stakeholders a reliable and relevant data exploration experience to drive even more value from your data.