Confidently Wrong by Orders of Magnitude
With 70,000 datasets across 15 tools, analysts faced a haystack of similar-sounding tables with subtle, consequential differences. Some tables had encrypted IDs, some didn't. Some adjusted for fraud, some didn't. Some pre-filtered for feedback, some didn't. Missing one nuance (a single column difference, a missing filter) could produce answers off by orders of magnitude. In a real-life example, when asked how many ChatGPT users there are, an AI agent incorrectly answered 5,062,338. The real number: 800 million, roughly 158 times larger.
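To make the "missing filter" failure concrete, here is a minimal sketch in Python. The rows, the is_fraud flag, and the count_users helper are all invented for illustration and do not reflect any real table; the point is only that the same data yields three different "user counts" depending on which nuance the query respects.

```python
# Toy rows, invented for illustration only: (user_id, is_fraud)
events = [
    ("u1", False),
    ("u2", False),
    ("u2", False),  # same user, second event
    ("u3", True),   # flagged as fraudulent
]


def count_users(rows, exclude_fraud=False):
    """Count distinct user IDs, optionally dropping fraud-flagged rows."""
    users = {
        user_id
        for user_id, is_fraud in rows
        if not (exclude_fraud and is_fraud)
    }
    return len(users)


row_count = len(events)                                # counts events, not users
distinct_users = count_users(events)                   # deduplicated, fraud included
adjusted_users = count_users(events, exclude_fraud=True)  # deduplicated, fraud removed
```

On four toy rows the gap is small, but each omitted step changes the answer, and on production-scale tables the same kind of omission is what separates a plausible-looking number from the correct one.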
Simple Questions, Expensive Answers
A question that should take minutes was costing days of work across teams. A business lead would ask how many ChatGPT Pro users were in Italy. The data scientist would consult another engineer, who would consult another, who would hunt through tables. By the time the answer landed, it had taken three code deep-dives, two quick meetings, and five Slack threads. With 3,500 internal users across 15 tools, the cost of every question compounded across the whole company.
Context Trapped in Tribal Knowledge
A table's name and schema only tell part of the story. Why was it created? What filter did the analytics team apply at ingestion? What does this column mean in this team's vocabulary? That context lived in Notion docs, Slack incident channels, code review comments, and individuals' memories, invisible to any agent looking only at metadata. Without that layer, an AI agent could read the schema correctly and still give a wildly wrong answer.
No Memory Across Conversations
Without a memory layer, every correction had to be relearned. If an analyst told the agent that a particular product type was filtered by a specific value, the next conversation still started from scratch. Repeat questions cost as much time as first-time questions. Team-specific definitions, edge-case filters, and learned corrections disappeared the moment a session ended.
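One way to picture the missing piece is a small store of corrections that outlives a session. The sketch below is an assumption-laden illustration, not a description of how any real agent stores memory: the CorrectionMemory class, its JSON file format, and the table and note strings a caller would pass in are all hypothetical.

```python
import json
from pathlib import Path


class CorrectionMemory:
    """Hypothetical per-table store of analyst corrections, persisted as JSON."""

    def __init__(self, path):
        self.path = Path(path)
        self.notes = {}
        # Reload whatever earlier sessions saved, if anything.
        if self.path.exists():
            self.notes = json.loads(self.path.read_text())

    def remember(self, table, note):
        """Record a correction once and write it to disk immediately."""
        table_notes = self.notes.setdefault(table, [])
        if note not in table_notes:
            table_notes.append(note)
            self.path.write_text(json.dumps(self.notes, indent=2))

    def recall(self, table):
        """Return the saved corrections for a table (empty if none)."""
        return list(self.notes.get(table, []))
```

A new session that constructs CorrectionMemory with the same path sees every note saved earlier, so a correction given once can be placed in front of the agent before it writes the next query instead of being relearned.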
Sensitive Data, Permissions Needed Beyond the Table
OpenAI's data includes information that not every user should be able to see. An agent ranging across all of it had to respect who could access what, without leaking sensitive context through query history or shared chats. Permission checks needed to extend to retrieval, sanitized queries, and chain-of-thought, not just row-level table access.
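A rough sketch of what "permissions beyond the table" could look like at the retrieval step follows. Everything here is hypothetical: the ContextItem type, the group names, and the visible_context function are illustrative assumptions, and a real system would take access rules from its own access-control service rather than from a field on each item.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContextItem:
    """A piece of retrieved context plus the groups allowed to see it."""
    source: str                # e.g. "schema", "query_history", "doc"
    text: str
    allowed_groups: frozenset


def visible_context(items, user_groups):
    """Keep only the items that share at least one group with the user."""
    groups = frozenset(user_groups)
    return [item for item in items if item.allowed_groups & groups]


# Invented sample data for illustration.
items = [
    ContextItem("schema", "orders(id, country)", frozenset({"analytics", "finance"})),
    ContextItem("query_history", "SELECT ... FROM payroll", frozenset({"finance"})),
]

analyst_view = visible_context(items, ["analytics"])
finance_view = visible_context(items, ["finance"])
```

Filtering before the context reaches the agent means restricted query history or documents cannot leak into an answer, a shared chat, or intermediate reasoning, which a check applied only at table-read time would not guarantee.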