Prompt Scoring

Prompt Scoring analyses your agent’s system prompt across 11 quality dimensions and returns an overall score (0–100), a grade, and actionable feedback for each dimension. Use it to catch weak spots before you go live.

Location: Agent editor → Prompt tab → score badge at the bottom of the editor


Seeing Your Score

When you open the Prompt tab of any single-prompt agent, the editor shows a score badge at the bottom of the prompt area. A coloured label — Strong prompt, Good prompt, Weak prompt, or Poor prompt — tells you the current grade at a glance.

Figure: Weak prompt badge in the agent editor. Click 'View issues' to open the analysis panel.

Click View issues to open the Prompt Analysis side panel.


The Analysis Panel

The panel breaks down your score into individual dimension cards. Each card shows the dimension name, its quality level, and a quoted excerpt from your prompt as evidence.

Figure: Prompt analysis panel showing the overall score, latency estimate, and per-dimension findings.

| Panel field | What it means |
| --- | --- |
| Overall grade | Numeric score (0–100) and label: Excellent / Good / Needs Work / Poor |
| First token latency | Estimated TTFT overhead added by your prompt length |
| Weak prompt / Strong prompt | Summary label derived from the overall score |
| Low / Normal / High latency | Token-density band for the current prompt |

How Scoring Works

Each time you request a score, Atoms sends your prompt through two sequential Gemini passes — a Platform Analyst pass and a Rubric Judge pass — and returns scores for 11 dimensions grouped into three priority tiers.

Tiers and dimensions

Tier 1 — Core (highest priority)
Tier 2 — Quality
Tier 3 — Integrity (gating)
| Dimension | What is checked |
| --- | --- |
| Role & Objective | Is the agent’s purpose and scope clearly defined? |
| Personality & Voice | Are tone and style specified with concrete guidance, not just adjectives? |
| Conversation Structure | Are there explicit flow phases and exit criteria? |
| Tool Integration | Are all referenced tools declared, with failure paths covered? |
| Constraints & Safety | Are safety rules, escalation paths, and hard limits spelled out? |

Each dimension is rated Strong, Adequate, Weak, Missing, or Not Applicable.
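
If you track ratings in your own tooling, a small filter can highlight which dimensions to revise first. The sketch below is illustrative only: the function name `dimensions_to_fix`, the plain name-to-rating mapping, and the choice to treat Weak and Missing as the ratings that need attention are all assumptions, not part of the Atoms API.

```python
# The five rating labels listed above.
VALID_RATINGS = {"Strong", "Adequate", "Weak", "Missing", "Not Applicable"}

# Assumption: Weak and Missing are the ratings worth acting on first.
NEEDS_ATTENTION = {"Weak", "Missing"}


def dimensions_to_fix(ratings: dict) -> list:
    """Return dimension names rated Weak or Missing, in input order.

    `ratings` is a hypothetical mapping of dimension name -> rating label.
    """
    unknown = set(ratings.values()) - VALID_RATINGS
    if unknown:
        raise ValueError(f"unrecognised rating(s): {sorted(unknown)}")
    return [name for name, rating in ratings.items() if rating in NEEDS_ATTENTION]


# Illustrative ratings, not real output from the platform.
example = {
    "Role & Objective": "Strong",
    "Personality & Voice": "Weak",
    "Tool Integration": "Not Applicable",
    "Constraints & Safety": "Missing",
}
print(dimensions_to_fix(example))
```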


Quality Levels and Token Bands

Grade thresholds

| Score | Grade |
| --- | --- |
| 90–100 | Excellent |
| 75–89 | Good |
| 50–74 | Needs Work |
| 0–49 | Poor |
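
The thresholds above can be written as a small lookup, for example to label scores in your own dashboards. This is a minimal sketch, not the platform's implementation: the function name `grade_for_score` is hypothetical, and it assumes any fractional score falls into the grade of the lower bound it has reached (so 89.5 would be Good).

```python
def grade_for_score(score: float) -> str:
    """Map an overall score (0-100) to its grade label from the threshold table."""
    if not 0 <= score <= 100:
        raise ValueError(f"score must be between 0 and 100, got {score}")
    if score >= 90:
        return "Excellent"
    if score >= 75:
        return "Good"
    if score >= 50:
        return "Needs Work"
    return "Poor"


for sample in (95, 75, 74, 49):
    print(sample, grade_for_score(sample))
```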

Token density bands

The First token latency estimate and the density band are derived from your prompt’s token count.

| Band | Token range | Latency impact |
| --- | --- | --- |
| Lean | Fewer than 4K tokens | Very low |
| Normal | 4K–9.9K tokens | Low |
| Heavy | 10K–14.9K tokens | Moderate |
| Overweight | 15K or more tokens | High |

For voice agents, aim for the Normal band. Heavy and Overweight prompts increase first-token latency, which makes responses feel slower to callers.
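
To check which band a prompt would fall into before requesting a score, the band table can be sketched as a lookup on token count. This is an approximation under stated assumptions: the function name `band_for_tokens` is hypothetical, "K" is taken to mean 1,000 tokens, and the token count must come from your own tokenizer, which may not match the one Atoms uses.

```python
# (exclusive upper bound in tokens, band name, latency impact), per the table above.
# Assumption: 4K means 4,000 tokens, 10K means 10,000, and so on.
BANDS = [
    (4_000, "Lean", "Very low"),
    (10_000, "Normal", "Low"),
    (15_000, "Heavy", "Moderate"),
]


def band_for_tokens(token_count: int) -> tuple:
    """Return (band, latency_impact) for a prompt's token count."""
    if token_count < 0:
        raise ValueError("token_count must be non-negative")
    for upper_bound, band, impact in BANDS:
        if token_count < upper_bound:
            return band, impact
    # Anything at or above the last bound is in the top band.
    return "Overweight", "High"


print(band_for_tokens(6_500))
```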


Scoring via API

You can trigger scoring programmatically against a published version or a draft. Each call deducts 1 credit. Re-submitting an unchanged prompt returns a 400 — retrieve the cached result via GET /agent/{id} instead.

```bash
# Score a published version
curl -X POST https://api.smallest.ai/atoms/v1/prompt-scoring/score \
  -H "Authorization: Bearer $SMALLEST_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"versionId": "6a1589b75e048394eb37bc47"}'

# Score a draft
curl -X POST https://api.smallest.ai/atoms/v1/prompt-scoring/score \
  -H "Authorization: Bearer $SMALLEST_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"draftId": "6a1589b75e048394eb37bc48"}'
```

The response includes overall_score, overall_grade, band, estimated_ttft_overhead_ms, and a dimensions array with one entry per scored dimension.

Prompt scoring is only supported for single-prompt agents. Conversational flow (workflow-graph) agents return a 400.
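
The same request can be made from Python using only the standard library. The sketch below mirrors the curl examples; the helper names `build_score_request` and `score_prompt` are hypothetical, and two things are assumptions rather than documented behaviour: that a request carries exactly one of `versionId` or `draftId`, and that the response fields listed above are top-level keys of a JSON object.

```python
import json
import urllib.error
import urllib.request
from typing import Optional

SCORE_URL = "https://api.smallest.ai/atoms/v1/prompt-scoring/score"


def build_score_request(api_key: str, *, version_id: Optional[str] = None,
                        draft_id: Optional[str] = None) -> urllib.request.Request:
    """Build the POST request for one published version or one draft."""
    # Assumption: a request targets exactly one of the two identifiers.
    if (version_id is None) == (draft_id is None):
        raise ValueError("pass exactly one of version_id or draft_id")
    payload = {"versionId": version_id} if version_id is not None else {"draftId": draft_id}
    return urllib.request.Request(
        SCORE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def score_prompt(api_key: str, **ids: str) -> dict:
    """Send the scoring request and return the decoded JSON response."""
    request = build_score_request(api_key, **ids)
    try:
        with urllib.request.urlopen(request, timeout=30) as response:
            return json.loads(response.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        if err.code == 400:
            # Per the text above, a 400 can mean the prompt is unchanged since
            # the last score, or the agent is not a single-prompt agent.
            raise RuntimeError("scoring request rejected with HTTP 400") from err
        raise


# Usage (makes a network call and deducts 1 credit):
#   import os
#   result = score_prompt(os.environ["SMALLEST_API_KEY"],
#                         version_id="6a1589b75e048394eb37bc47")
#   print(result["overall_score"], result["overall_grade"], result["band"])
```

Keeping request construction separate from sending makes it possible to inspect or test the payload without spending a credit.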

→ Full API reference


Related

Writing Prompts

Best practices for structuring your system prompt

Agent Versioning

Publish and manage agent versions safely