AI & Cubes

Why pointing Athena at a cube is safer, more accurate, and more privacy-preserving than letting AI write raw SQL against your database.


Table of Contents

Why Cubes Make AI Better

When you ask an AI assistant "what was our revenue last quarter?" against a raw database, the model has to:

  1. Guess which table holds revenue (orders? invoices? line_items? payments?)
  2. Guess which column is the amount (amount? total? gross_total? net_total?)
  3. Guess whether to include refunds, taxes, currency conversion
  4. Hope all those guesses match how your business actually defines revenue

Even with a perfect schema, this gets wrong answers more often than people admit. And you don't see that it's wrong — you just see a number.

A cube fixes this by giving the AI a vocabulary that's already correct. When revenue is a named measure with one definition, the AI doesn't guess — it picks revenue from the cube and the SQL is generated from your reviewed, named, versioned definition. Same number every time, same number across every consumer.

Privacy: What the AI Sees

DataPallas's data-privacy stance is that the AI never sees your data — only your schema and metadata. Cubes strengthen this:

What the AI seesWhen raw SQLWhen using a cube
Full database schema (every table, column, type)YesNo — only the cube's declared dimensions and measures
Joined-table column listsYesNo — only the cube's joined-table aliases
Free-form column names that may hint at PIIYesNo — the cube's title and description fields are what the AI reads
Row dataNoNo

A cube is effectively a redacted view of your schema specifically for AI. You decide what's in it — exclude sensitive tables, rename columns to user-friendly titles, write descriptions in business language — and the AI works only with that surface.

Asking Questions Through a Cube

In the Chat2DB AI panel, after you've created cubes, you'll see them listed alongside raw database connections. Pick a cube as the context and ask questions in plain English:

"Show monthly revenue trend for the last 6 months, broken down by country."

The AI translates this into a cube selection — dimensions: order_date, country, measures: revenue, segments: recent_6_months — and the cube generates the SQL. You see both the question, the cube selection, and the resulting chart side by side. If the answer surprises you, you can drill into the cube selection itself rather than re-reading 40 lines of SQL.

📷 Screenshot placeholder: 200_40_cube-in-chat.png — Athena chat answering a natural-language question by picking fields on a cube, with the result chart and the cube selection visible.

Writing Cube DSL With AI

Cubes are not only consumed by AI — they're also created with AI. Inside the Cube editor, the Hey AI, Help Me… button:

  1. Reads the live database schema for the selected connection
  2. Lets you describe the cube you want in natural language ("a cube on orders with revenue by region and time-based segments")
  3. Drafts the Groovy DSL for you, ready to review and tweak

This is by far the fastest way to get the first draft of a cube right. Review the generated DSL against the DSL Reference, adjust titles and descriptions to match your business language, and save.

📷 Screenshot placeholder: 200_45_cube-ai-help.png — The "Hey AI, Help Me…" dialog showing a natural-language prompt and the generated cube DSL.

Best Practices

  • Write description fields like you're onboarding a new analyst. The AI reads them. "Total Revenue" is fine; "Sum of completed orders' net amount in USD, after refunds" is much better — and removes a class of wrong answers.
  • Hide what you don't want asked about. A cube is a curated surface. If a table contains PII or pre-launch product data, don't expose it as a dimension. The AI can only ask about what you put in the cube.
  • One cube per business domain, not per table. A good cube spans the fact table plus its commonly-joined dimension tables (orders + customers + products). Don't make one cube per database table — that's just a thin wrapper around the schema.
  • Start broad, narrow over time. Ship a cube with the obvious dimensions and measures, watch what users actually ask, then add segments and hierarchies for the patterns that emerge.
  • Cubes are versioned with your code. They live in config/cubes/ as XML files. Commit them. Code-review them. Treat changes to a measure definition the same way you'd treat a change to a critical SQL view.