FutrixData - Data Sensitivity Classification

AI agents are becoming a new entry point for enterprise data. They read databases, invoke tools, generate SQL, and analyze results — and they pass those results onward to large models for summarization, reasoning, and visualization.

This raises a new safety question. Until recently, only humans saw query results; now, query results enter agent context. Once data is in context, it may continue to flow into external models, plugins, log systems, debugging tools, or token relay services. What enterprises actually need is no longer "can we connect to the database," but "which fields can an agent see, and which must be processed before they ever leave the data source."

FutrixData's sensitivity classification is built for this question. It assigns a sensitivity level to each field, and replaces sensitive values automatically at the data egress according to the agent's allowed access level. The agent can still complete its analysis, but it does not need to see raw email addresses, phone numbers, addresses, payment details, or credentials.

Why Sensitivity Classification Belongs in the AI Agent Era#

1. External LLMs Should Not See Private Data by Default#

When an agent forwards query results to an external LLM, the enterprise hands part of its data control over to an external system. Different model providers, account tiers, and gateways apply different retention and training policies. Even where enterprise services pledge "no training use," organizations still need to prevent employees and automation from sending raw personal data into model entry points that lie outside central governance.

The value of sensitivity classification is that it converts "which fields cannot leave the local environment" into a configurable policy — rather than relying on every agent, every prompt, and every developer to make that judgment in the moment.

2. Token Relay Services Expand the Leakage Surface#

To unify model invocation, many teams deploy token relays, proxy gateways, or third-party invocation platforms. These systems can see the full request and response body, and may retain request logs, error logs, or debug samples.

If query results contain phone numbers, email addresses, addresses, card numbers, or API secrets, the relay becomes a new data-handling party. Sensitivity classification combined with egress masking lets FutrixData process data before it leaves the local environment, so raw values never reach unnecessary intermediaries.

Core GDPR principles include data minimization, security, and confidentiality. The European Commission's interpretation of GDPR principles makes clear that organizations should process only the data necessary for the stated purpose, and apply appropriate technical and organizational measures to protect personal data.

In an AI agent context, this means an enterprise should not deliver customer names, emails, addresses, and phone numbers to a model simply because the agent needs to "analyze order trends." A more defensible approach preserves the structure, statistical relationships, and equivalence relationships the agent needs to do its job — while removing unnecessary raw personal information.

It is worth noting that hashed or pseudonymized data is still typically managed as personal data. The European Data Protection Board has clarified that pseudonymized data remains personal data when it can still be linked to an individual through additional information. FutrixData does not aim to "make compliance disappear" — it aims to reduce the leakage surface and make data egress easier to audit and govern.

How FutrixData Meets Sensitivity Classification Requirements#

FutrixData provides two classification paths: one for external agents, one for the product's built-in AI. Both rely on the same classification store and the same egress policy.

External Agents and Classification Policy#

External agents are agents that connect to FutrixData via MCP, Skill, or CLI tool, such as Claude Code, Cursor, Codex, OpenCode, Windsurf, and custom agents.

Their typical workflow:

1. The agent reads the data-source list, table list, and field structure through tools exposed by FutrixData.
2. The agent reads the current sensitivity configuration, including level definitions, custom rules, and the agent's accessible range.
3. Based on field names and field types, the agent assigns classifications, optionally adjusted by enterprise rules.
4. The agent writes the full classification result back to FutrixData.
5. On subsequent queries via MCP or Skill, FutrixData automatically processes any field outside the agent's accessible range before returning results.
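
The classification step in this workflow can be pictured as a set of name-based rules. The following is a minimal sketch, not FutrixData's actual logic: the patterns reuse the example field names from the default levels, while `classify_field`, `DEFAULT_RULES`, the `custom_rules` parameter, and the fallback to L1 for unmatched names are all assumptions made for illustration.

```python
import re

# Hypothetical default rules, checked in order from most to least sensitive.
DEFAULT_RULES = [
    (r"password|credit_card|api_secret|home_address", 5),
    (r"email|phone|salary|date_of_birth", 4),
    (r"ip_address|user_agent|device_id", 3),
    (r"user_id|session_id|request_id", 2),
]


def classify_field(name, custom_rules=()):
    """Return a level from 1 to 5 for a field name.

    Enterprise rules in custom_rules are checked before the defaults.
    Unmatched names fall back to level 1, which is an assumption of this sketch.
    """
    lowered = name.lower()
    for pattern, level in list(custom_rules) + DEFAULT_RULES:
        if re.search(pattern, lowered):
            return level
    return 1


# An enterprise rule that raises a business-specific field to L3.
enterprise_rules = [(r"loyalty", 3)]
levels = {f: classify_field(f, enterprise_rules)
          for f in ["id", "Customer_Email", "loyalty_tier", "password_hash"]}
```

A real agent would also weigh field types and surrounding context. The point here is only that enterprise rules take precedence over the defaults.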

The current default classification spans L1 through L5:

| Level | Default Meaning | Examples |
|---|---|---|
| L1 Public | Non-sensitive business data | `id`, `status`, `created_at` |
| L2 Internal | Internal identifiers and metadata | `user_id`, `session_id`, `request_id` |
| L3 Confidential | Indirect personal info, behavior, location data | `ip_address`, `user_agent`, `device_id` |
| L4 Sensitive | Direct personal info, financial, medical data | `email`, `phone`, `salary`, `date_of_birth` |
| L5 Critical | Credentials, payment instruments, high-sensitivity personal data | `password`, `credit_card`, `api_secret`, `home_address` |

By default, agents can access fields classified L1 through L3. Values in L4 and L5 fields are replaced with hashes before they reach the agent. Enterprises can also customize level names, descriptions, example fields, and the agent-accessible range from the settings page.
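
To make the default range concrete, here is a small sketch of the five levels and the accessibility check. The names `Level`, `EXAMPLE_LEVELS`, and `agent_may_see` are hypothetical, and treating an unclassified field as not visible is an assumption of this sketch rather than documented behavior.

```python
from enum import IntEnum


class Level(IntEnum):
    L1_PUBLIC = 1
    L2_INTERNAL = 2
    L3_CONFIDENTIAL = 3
    L4_SENSITIVE = 4
    L5_CRITICAL = 5


# Default agent-accessible range described in the text: L1 through L3.
DEFAULT_AGENT_MAX_LEVEL = Level.L3_CONFIDENTIAL

# A few example fields from the default level definitions.
EXAMPLE_LEVELS = {
    "id": Level.L1_PUBLIC,
    "user_id": Level.L2_INTERNAL,
    "ip_address": Level.L3_CONFIDENTIAL,
    "email": Level.L4_SENSITIVE,
    "credit_card": Level.L5_CRITICAL,
}


def agent_may_see(field, max_level=DEFAULT_AGENT_MAX_LEVEL):
    """Return True if the field's level is inside the agent-accessible range.

    Fields with no saved level are treated as not visible (assumption).
    """
    level = EXAMPLE_LEVELS.get(field)
    return level is not None and level <= max_level
```

Raising `max_level` to `Level.L4_SENSITIVE` models an enterprise that widens the accessible range from the settings page.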

External agents are not anonymous. FutrixData generates an independent access key for each Skill or MCP installation and records the agent's origin. This means the enterprise can identify which agent issued a given tool call and which channel it used (MCP or Skill), and can revoke a particular agent's access when necessary.
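
One way to picture per-installation keys is an in-memory registry that maps each key to its agent and channel. This is an illustrative sketch only: `AgentKeyRegistry` and its methods are hypothetical names, and a production system would persist keys securely rather than hold them in a plain dictionary.

```python
import secrets
from dataclasses import dataclass, field


@dataclass
class AgentKeyRegistry:
    # Maps access key -> (agent name, channel). Hypothetical in-memory store.
    _keys: dict = field(default_factory=dict)

    def issue(self, agent, channel):
        """Create an independent key for one MCP or Skill installation."""
        if channel not in ("mcp", "skill"):
            raise ValueError("channel must be 'mcp' or 'skill'")
        key = secrets.token_urlsafe(32)
        self._keys[key] = (agent, channel)
        return key

    def identify(self, key):
        """Return (agent, channel) for a key, or None if unknown or revoked."""
        return self._keys.get(key)

    def revoke(self, key):
        """Remove a key; return True if it existed."""
        return self._keys.pop(key, None) is not None


registry = AgentKeyRegistry()
cursor_key = registry.issue("cursor", "mcp")
codex_key = registry.issue("codex", "skill")
registry.revoke(cursor_key)  # the codex key keeps working
```

Because every installation holds its own key, revoking one agent does not disturb the others.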

Built-in AI Agent and Classification Policy#

FutrixData also provides a built-in AI Chat. It uses the same sensitivity policy as external agents but interacts differently.

The built-in sensitivity scan sends only schema information (entity name, field name, and field type) to the model. No real row data is transmitted. The model assigns field levels, and users can review the result and apply manual corrections in the UI. Manual corrections and writes from external agents are preserved; subsequent rescans do not overwrite them automatically.
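
The two ideas in this paragraph, a schema-only payload and rescans that leave human or external-agent decisions alone, can be sketched as follows. The function names, the dictionary shapes, and the source labels `manual` and `external_agent` are assumptions for illustration, and the sketch simply never overwrites protected entries.

```python
PROTECTED_SOURCES = {"manual", "external_agent"}  # hypothetical source labels


def schema_payload(schema):
    """Keep only entity name, field name, and field type; drop everything else."""
    return [
        {"entity": entity, "field": f["name"], "type": f["type"]}
        for entity, fields in schema.items()
        for f in fields
    ]


def merge_rescan(existing, rescan):
    """Apply rescan results without replacing manual or external-agent entries."""
    merged = dict(existing)
    for key, entry in rescan.items():
        current = merged.get(key)
        if current is not None and current["source"] in PROTECTED_SOURCES:
            continue  # preserve the earlier human or external-agent decision
        merged[key] = entry
    return merged


# The "sample" value is dropped before anything is sent to a model.
schema = {"users": [{"name": "email", "type": "varchar", "sample": "ann@example.com"}]}
payload = schema_payload(schema)

existing = {
    ("users", "email"): {"level": 5, "source": "manual"},
    ("users", "id"): {"level": 2, "source": "builtin_ai"},
}
rescan = {
    ("users", "email"): {"level": 4, "source": "builtin_ai"},
    ("users", "id"): {"level": 1, "source": "builtin_ai"},
}
merged = merge_rescan(existing, rescan)
```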

When the built-in AI Chat queries data, FutrixData maintains two views:

The Console panel shows raw query results to humans for verification.
The agent view, used by AI for analysis, summarization, and visualization, is the masked one.

In other words, the Console is for humans and does not perform field-level masking; the agent egress is for models and automation, and applies the classification policy to sensitive fields.

How Classification Works Technically#

The classification system has three parts.

The first is classification configuration. The system stores level definitions, agent-accessible range, custom rules, and the per-data-source field classification report. The report records each field's level, category, reason, and source — where source can be the built-in AI, manual correction, or an external agent.
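
A report entry with the four recorded attributes might look like the sketch below. The class name, the extra `entity` and `field` keys, the source labels, and the surrounding report layout are hypothetical; only level, category, reason, and source come from the description above.

```python
from dataclasses import dataclass, asdict

VALID_SOURCES = {"builtin_ai", "manual", "external_agent"}  # hypothetical labels


@dataclass(frozen=True)
class FieldClassification:
    entity: str
    field: str
    level: int      # 1 through 5 under the default level definitions
    category: str
    reason: str
    source: str     # which path produced this entry

    def __post_init__(self):
        if not 1 <= self.level <= 5:
            raise ValueError(f"level out of range: {self.level}")
        if self.source not in VALID_SOURCES:
            raise ValueError(f"unknown source: {self.source}")


entry = FieldClassification(
    entity="users",
    field="email",
    level=4,
    category="contact",
    reason="direct personal identifier",
    source="manual",
)
# One report per data source; "orders_db" is an invented name.
report = {"data_source": "orders_db", "fields": [asdict(entry)]}
```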

The second is classification generation. The built-in scan reads the data source schema and submits field names and types to the configured AI model for classification. External agents read the schema via MCP or Skill, classify on their own, and write the report back through tool calls. Both paths write to the same classification store.

The third is data egress processing. When query results are about to be returned to an agent, FutrixData inspects each field's source table, field name, and saved classification. If a field's level is outside the agent's allowed range, the raw value is replaced by a stable hash. The result also records which columns were processed, supporting debugging and audit.
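
The egress step can be sketched as a function that looks up each result column by source table and field name, masks the columns outside the allowed range, and reports which columns it touched. Everything named here is hypothetical: `apply_egress_policy`, the `(table, field)` column tuples, and the `mask` callable standing in for the real hash function. Masking columns that have no saved classification is an assumption of this sketch.

```python
def apply_egress_policy(rows, columns, levels, max_level, mask):
    """Mask result columns whose saved level is outside the agent's range.

    rows:      list of row lists, in column order
    columns:   list of (source_table, field_name), one per result column
    levels:    {(source_table, field_name): level} from the classification store
    max_level: highest level the agent may see
    mask:      callable that turns a raw value into its masked token
    Returns (masked_rows, processed_columns).
    """
    to_mask = {
        i for i, col in enumerate(columns)
        # Columns with no saved level are masked too (assumption).
        if levels.get(col, max_level + 1) > max_level
    }
    masked_rows = [
        [mask(value) if i in to_mask else value for i, value in enumerate(row)]
        for row in rows
    ]
    processed = [f"{columns[i][0]}.{columns[i][1]}" for i in sorted(to_mask)]
    return masked_rows, processed


rows = [[1, "ann@example.com"], [2, "bob@example.com"]]
columns = [("users", "id"), ("users", "email")]
levels = {("users", "id"): 1, ("users", "email"): 4}
masked_rows, processed = apply_egress_policy(
    rows, columns, levels, max_level=3, mask=lambda v: "masked:<token>"
)
```

The `processed` list plays the role of the record of processed columns that supports debugging and audit.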

For SQL queries, FutrixData uses column-source information to determine the originating table. For non-relational results from MongoDB, Elasticsearch, and DynamoDB, the system processes leaf fields based on entity hints, field paths, and nesting paths. When source attribution is uncertain, the system applies a more conservative policy and avoids handing known sensitive fields directly to agents.
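
For nested documents, leaf-level processing by field path might look like the following sketch. The dotted-path convention, the function names, and the fallback rule (when the exact entity and path are unknown, use the highest level saved for any field with the same leaf name) are assumptions chosen to illustrate a conservative policy, not FutrixData's documented behavior.

```python
def resolve_level(entity, path, levels):
    """Find a level for a leaf field, falling back conservatively on its name."""
    exact = levels.get((entity, path))
    if exact is not None:
        return exact
    leaf = path.rsplit(".", 1)[-1]
    # Uncertain attribution: take the highest level known for this leaf name.
    candidates = [lvl for (_, p), lvl in levels.items()
                  if p.rsplit(".", 1)[-1] == leaf]
    return max(candidates, default=None)


def mask_document(doc, entity, levels, max_level, mask, path=""):
    """Walk dicts and lists recursively, masking leaf values outside the range."""
    if isinstance(doc, dict):
        return {
            key: mask_document(value, entity, levels, max_level, mask,
                               f"{path}.{key}" if path else key)
            for key, value in doc.items()
        }
    if isinstance(doc, list):
        # List items share the path of the field that holds the list.
        return [mask_document(item, entity, levels, max_level, mask, path)
                for item in doc]
    level = resolve_level(entity, path, levels)
    return mask(doc) if level is not None and level > max_level else doc


levels = {("users", "contact.email"): 4, ("users", "name"): 1}
doc = {"name": "Ann", "contact": {"email": "ann@example.com", "tags": ["vip"]}}
masked = mask_document(doc, "users", levels, 3, lambda v: "masked:<token>")
# The entity hint is missing here, but the leaf name "email" is known to be L4.
uncertain = mask_document({"email": "ann@example.com"}, "orders", levels, 3,
                          lambda v: "masked:<token>")
```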

The Hash Algorithm FutrixData Uses#

FutrixData uses a secret-backed deterministic HMAC-SHA256 to produce a stable hash of sensitive values:

1. Convert the raw field value into a string.
2. Derive the HMAC key from a local root secret stored in OS-level secret storage (Keychain on macOS, Credential Manager on Windows, Secret Service on Linux). The root secret is generated at first launch and never leaves the host. If the OS keyring is unavailable, the daemon surfaces an explicit plaintext-fallback warning rather than silently downgrading.
3. Compute HMAC-SHA256 over the string value using the derived key.
4. Take the first 16 hexadecimal characters of the digest.
5. Prepend `masked:` and return the result to the agent.

Example output:

masked:8f3a1c9b72e04d11
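
The steps above can be sketched with Python's standard `hmac` and `hashlib` modules. The source does not say how the HMAC key is derived from the root secret, so `derive_key` and its context label are assumptions, and the random `root_secret` stands in for a value that would really be loaded from the OS keyring.

```python
import hashlib
import hmac
import secrets


def derive_key(root_secret):
    """Derive a masking key from the root secret.

    Assumption: the derivation method is not documented, so this sketch uses
    HMAC-SHA256 over a fixed, made-up context label.
    """
    return hmac.new(root_secret, b"sensitivity-mask-v1", hashlib.sha256).digest()


def mask_value(value, key):
    """Return 'masked:' plus the first 16 hex characters of HMAC-SHA256."""
    text = str(value)  # convert the raw value into a string
    digest = hmac.new(key, text.encode("utf-8"), hashlib.sha256).hexdigest()
    return "masked:" + digest[:16]


root_secret = secrets.token_bytes(32)  # stand-in for the keyring-stored secret
key = derive_key(root_secret)
token = mask_value("ann@example.com", key)
```

Truncating to 16 hexadecimal characters keeps 64 bits of the digest, which keeps tokens short while making accidental collisions between distinct values unlikely at typical table sizes.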

The design serves three purposes.

First, the agent cannot recover the original from the output. Hashing is not encryption; the system has no decryption step, and what is returned to the agent is not reversible ciphertext. The local root secret never reaches agent context — only the masked output does.

Second, the same raw value yields the same hash within a single install. Although the agent does not see the real email or phone number, it can still tell whether two rows share the same value, and continue to deduplicate, group, count, and join on equality.

Third, different installs yield different hashes. Because each FutrixData install generates its own root secret in the OS keyring, the same raw value does not share a masked output across installs. An attacker who scrapes masked tokens from one install cannot use them to look up the same value in another install's logs.
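
The second and third properties can be checked with a short experiment. This sketch uses a compact stand-in `mask` helper and two random keys to represent two installs; the helper name and the sample values are illustrative assumptions.

```python
import hashlib
import hmac
import secrets
from collections import Counter


def mask(value, key):
    # Compact stand-in for the masking function: keyed, deterministic, truncated.
    digest = hmac.new(key, str(value).encode("utf-8"), hashlib.sha256).hexdigest()
    return "masked:" + digest[:16]


install_a = secrets.token_bytes(32)  # key material of one install
install_b = secrets.token_bytes(32)  # key material of a different install

emails = ["ann@example.com", "bob@example.com", "ann@example.com"]
masked_a = [mask(e, install_a) for e in emails]

# Equality survives masking, so grouping and counting still work.
counts = Counter(masked_a)
distinct_customers = len(counts)

# The same raw value maps to different tokens under different installs.
same_value_other_install = mask("ann@example.com", install_b)
```

Here `distinct_customers` is 2 even though the agent never sees an address, and `same_value_other_install` does not match any token in `masked_a`.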

This is not an anonymization guarantee. For low-cardinality or small-domain data such as phone numbers, postal codes, booleans, or short identifiers, no hash scheme alone constitutes full anonymization, because equal values still map to equal tokens and value frequencies remain visible. The design intent is narrower: raw values do not appear at the agent egress; the hash output cannot be decrypted; and the enterprise retains the analytical capability it needs. From a compliance standpoint, masking should still be treated as one part of a broader set of security and minimization measures.

Where Sensitivity Classification Applies#

Today, sensitivity classification primarily applies at the agent data egress.

MCP Connections#

When an agent calls FutrixData tools through an MCP Server, the request enters the unified tool layer. After execution, results are processed according to the sensitivity policy before reaching the MCP client. This applies to Cursor, Claude Desktop, OpenCode, Codex, and any other MCP-compatible client.

Skill Connections#

When agents invoke CLI tools via FutrixData Skill, they hold an independent access key for the tool layer. Results are processed in the same way before being returned to the Skill caller. The Skill template also embeds the classification workflow, guiding the agent to read configuration, then schema, then save the classification report.

Built-in AI Chat#

When the built-in AI Chat queries data, the Console-side result for human review keeps raw values, while the result used by AI for analysis, summarization, and visualization is masked. Users see the real data; the model never touches the real sensitive values.

The Console Does Not Mask at Field Level#

The FutrixData Console is the human operating surface. When a user queries a data source directly, the Console preserves raw results and does not apply field-level replacement. This is intentional: the Console exists for authorized humans to inspect real data; MCP, Skill, and the AI analysis chain are the automated egress paths that need stronger protection.

Classification Is Part of the Agent Data Gateway#

Sensitivity classification answers a single question: which fields may an agent see, and which must be processed at egress. FutrixData places it alongside access keys, audit records, risk rules, and dangerous-operation approval to form a complete pipeline.

The end-to-end flow:

1. The agent enters FutrixData via MCP or Skill.
2. FutrixData identifies the agent and access channel.
3. Risk rules and approval logic run before execution.
4. After execution, results are processed by sensitivity policy.
5. The audit record retains the tool name, data source, agent origin, execution status, and risk information.
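
The flow above can be condensed into one handler. This is a rough sketch under stated assumptions: `handle_call`, `AuditRecord`, the status strings, and the prefix-based risk rule are all hypothetical, and `execute` and `mask` are stand-ins for the real query engine and sensitivity policy.

```python
from dataclasses import dataclass

# Hypothetical risk rule: these statement types require approval first.
DANGEROUS_PREFIXES = ("drop", "truncate", "delete")


@dataclass
class AuditRecord:
    tool: str
    data_source: str
    agent_origin: str
    status: str
    risk: str


def handle_call(access_key, tool, data_source, sql, agents, execute, mask, audit):
    """Identify the caller, apply risk rules, execute, mask, and audit."""
    origin = agents.get(access_key)  # identify the agent and access channel
    if origin is None:
        audit.append(AuditRecord(tool, data_source, "unknown", "rejected", "unknown key"))
        return None
    if sql.strip().lower().startswith(DANGEROUS_PREFIXES):
        # Risk rules and approval logic run before execution.
        audit.append(AuditRecord(tool, data_source, origin, "needs_approval",
                                 "dangerous statement"))
        return None
    rows = mask(execute(sql))  # results pass through the sensitivity policy
    audit.append(AuditRecord(tool, data_source, origin, "ok", "none"))
    return rows


audit = []
agents = {"key-1": "cursor/mcp"}
run = lambda sql: [["ann@example.com"]]            # stand-in query engine
mask_all = lambda rows: [["masked:<token>"] for _ in rows]  # stand-in policy

result = handle_call("key-1", "query", "orders_db", "SELECT email FROM users",
                     agents, run, mask_all, audit)
blocked = handle_call("key-1", "query", "orders_db", "DROP TABLE users",
                      agents, run, mask_all, audit)
```

Every path through the handler appends an audit record, including rejected and blocked calls.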

These layers together turn FutrixData from a database connector into a security gateway between enterprise data sources and AI agents.

PII Masking for AI Agents — How sensitivity classification turns into runtime field-level redaction on every agent egress.
Trust & Security — Where credentials live, the local hash-chain audit log, and the audit verify CLI.
Limitations & Threat Model — What deterministic masking does not guarantee, and the scope of the local-only audit.
