The behavior of Omni’s Agent on a model is controlled by the ai_settings parameter. Several of its sub-parameters compose into a spectrum between “cheap and fast” and “accurate but expensive” — which means you can deliberately tune a model for the workloads it actually serves, rather than accepting defaults everywhere. This guide maps those knobs to three starting profiles — cost-optimized, balanced, and max-quality — that you can adopt directly or adapt to your organization.

Requirements

To follow this guide, you’ll need:
  • Connection Admin or Modeler permissions on the model you want to work with
  • An understanding of the parameters documented on the ai_settings reference

How AI settings affect cost

Every AI turn in Omni consumes LLM tokens. Different ai_settings parameters affect token usage in different ways, and some affect cost far more than others.
Parameters and their cost impact, from highest to lowest:
  • analyze_configuration.model (High): smartest is multiple times more expensive per LLM token than standard.
  • analyze_configuration.thinking (High): Each level adds reasoning LLM tokens to every turn; high on analyze compounds fastest.
  • validate_analysis (High): Adds additional model turns per analytical turn, each at the analyze tier and thinking level.
  • conversation_prune_length (Medium): Claude/Anthropic models only. Higher thresholds mean more LLM tokens billed per turn on long conversations.
  • query_all_views_and_fields (Low–medium): A larger search surface means more views considered per query, which costs LLM tokens.
  • build_configuration.* (Low in aggregate): Model-building tasks run less often than analysis.
  • simple_summarize_configuration.* (Low per call, but high volume): Keep at fastest / none unless you have a specific reason to upgrade.
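To make the cost ranking concrete: if you only want to cap the biggest cost drivers and leave everything else at its default, a minimal sketch (using only parameters and values from the profiles in this guide) would be:
ai_settings:
  validate_analysis: disabled
  analyze_configuration:
    model: standard
    thinking: none
This trims the three highest-impact knobs — validation turns, analyze model tier, and analyze reasoning — without touching lower-impact settings.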

Tuning profiles

Each profile below is a recommended starting point — not a rigid recipe. You can mix parameters across profiles and override settings on specific models when individual use cases warrant it. See Mix and override profiles below.
  • Cost-optimized. Best for high-volume AI usage and recurring questions against a well-curated topic set. Tradeoffs: smaller query surface, no query validation, earlier context pruning, no extended reasoning.
  • Balanced. Best for most organizations, most of the time. Mirrors Omni’s default behavior.
  • Max quality. Best for highly complex questions and datasets, optimizing for depth over speed, or debugging accuracy issues seen with default settings. Tradeoffs: potential for higher costs due to query validation and use of smartest models with high thinking.
The conversation_prune_length parameter is only supported for Claude/Anthropic models.

Cost-optimized

This profile is best for high-volume AI usage where per-query cost matters more than marginal accuracy, or for models where end users do bounded, familiar analysis, such as recurring questions over a well-curated topic set.
Cost-optimized ai_settings profile
ai_settings:
  query_all_views_and_fields: disabled
  validate_analysis: disabled
  conversation_prune_length: short
  analyze_configuration:
    model: standard
    thinking: none
  build_configuration:
    model: standard
    thinking: none
  simple_summarize_configuration:
    model: standard
    thinking: none
The tradeoffs of this approach are:
  • query_all_views_and_fields: disabled restricts the Agent to a smaller query surface
  • validate_analysis: disabled removes query validation and self-correction
  • conversation_prune_length: short prunes conversation context earlier in long chats
  • thinking: none disables extended reasoning across all configurations

Balanced (default)

This profile is best for most organizations, most of the time. This profile mirrors Omni’s out-of-the-box behavior — it’s what you get if you don’t set ai_settings at all. You can copy it into your model to make the intent explicit, or omit ai_settings entirely.
Balanced (default) ai_settings profile
ai_settings:
  query_all_views_and_fields: enabled
  validate_analysis: disabled
  conversation_prune_length: long
  analyze_configuration:
    model: standard
    thinking: none
  build_configuration:
    model: smartest
    thinking: none
  simple_summarize_configuration:
    model: fastest
    thinking: none
This approach is the default because:
  • analyze_configuration.model: standard is strong enough for most analytical tasks without smartest-tier pricing
  • simple_summarize_configuration.model: fastest is almost always the right call for short summarization work
  • conversation_prune_length: long preserves context while opting you into future >200k context sizes as they become available
If your instance was created on or before March 5, 2026, query_all_views_and_fields defaults to disabled rather than enabled. Similarly, if your instance was created on or before April 23, 2026, build_configuration.model defaults to standard rather than smartest. In both cases, setting the parameter explicitly in your model overrides the instance default.
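If your instance predates either cutoff, you can pin today’s defaults explicitly rather than relying on instance-level behavior. A minimal sketch, using only the two parameters affected by the cutoffs above:
ai_settings:
  query_all_views_and_fields: enabled
  build_configuration:
    model: smartest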

Max quality

This profile is best for highly complex questions and datasets, and when you want to optimize for exploration depth and answer quality over speed. It can also be useful when you’ve seen accuracy issues with the defaults and want to raise the floor before debugging further.
Max quality ai_settings profile
ai_settings:
  query_all_views_and_fields: enabled
  validate_analysis: enabled
  conversation_prune_length: long
  analyze_configuration:
    model: smartest
    thinking: high
  build_configuration:
    model: smartest
    thinking: medium
  simple_summarize_configuration:
    model: standard
    thinking: low
While this approach can yield higher quality results, there are a few tradeoffs:
  • validate_analysis: enabled adds additional model turns per analytical turn
  • smartest + thinking: high on analyze_configuration is the single biggest cost multiplier — only use it if the accuracy lift is worth it for your workload
  • Summarization is intentionally kept lighter (standard / low) because it’s high-volume and low-leverage; pushing it to smartest / high rarely changes subtitle or description output meaningfully

Mix and override profiles

The profiles above are starting points — you don’t have to adopt one in its entirety. A few common compositions:
  • Mostly default with quality insurance. Start from Balanced and flip validate_analysis to enabled. This adds self-correction without upgrading model tier or thinking level, which is often the cheapest way to raise answer quality.
    ai_settings:
      validate_analysis: enabled
    
  • Max quality analysis, default everything else. Upgrade only analyze_configuration and leave build and summarize at defaults. This concentrates spend where it matters most — user-facing analytical output — without paying for it on lower-value tasks.
    ai_settings:
      analyze_configuration:
        model: smartest
        thinking: high
    
  • Cost-optimized with fuller context. Start from Cost-optimized but keep conversation_prune_length: long if your users tend to have long chat sessions where losing earlier context would hurt more than the LLM token savings.
    ai_settings:
      query_all_views_and_fields: disabled
      validate_analysis: disabled
      conversation_prune_length: long
      analyze_configuration:
        model: standard
        thinking: none
      build_configuration:
        model: standard
        thinking: none
      simple_summarize_configuration:
        model: standard
        thinking: none
    

Apply a profile to your model

1. Navigate to the model IDE and open the model settings (model) file.
2. Add the ai_settings block from the profile you want to use to the model file. Not sure which profile to use? Start with Balanced and only adjust once you see specific cost or accuracy issues you want to address.
3. Save your changes.
4. Promote the changes to the shared model.
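For example, after adding the ai_settings block, a model file using the Balanced profile would contain the block below (any model settings you already have stay alongside it):
ai_settings:
  query_all_views_and_fields: enabled
  validate_analysis: disabled
  conversation_prune_length: long
  analyze_configuration:
    model: standard
    thinking: none
  build_configuration:
    model: smartest
    thinking: none
  simple_summarize_configuration:
    model: fastest
    thinking: none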

Monitor and iterate

After applying a profile, check the Analytics > Credit tracking dashboard periodically to see how your users are interacting with Omni AI. Look for:
  • Question volume and types — Helps you judge whether the profile still fits the actual workload
  • Turn counts per conversation — Long conversations interact with conversation_prune_length; if pruning is kicking in frequently, consider raising the threshold or investigating why conversations run long
Tuning is iterative. It’s common to start balanced, upgrade a single dimension (such as validate_analysis), observe the change, and continue from there rather than jumping straight to max quality.

Next steps