LLM Settings


The OpenAIClient is the unified interface for calling LLMs. It works with OpenAI by default, but supports any OpenAI-compatible endpoint by changing the base_url.

Basic Configuration

```python
import os

from smallestai.atoms.agent.clients.openai import OpenAIClient

self.llm = OpenAIClient(
    model="gpt-4o-mini",
    temperature=0.7,
    api_key=os.getenv("OPENAI_API_KEY"),
)
```

Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| `model` | str | (required) | The model identifier (e.g., `gpt-4o-mini`, `llama-3.1-70b-versatile`) |
| `api_key` | str | `OPENAI_API_KEY` env | Your provider's API key |
| `base_url` | str | OpenAI's endpoint | Custom endpoint URL for other providers |
| `temperature` | float | 0.7 | Controls randomness. Lower = consistent, higher = creative |
| `max_tokens` | int | 1024 | Maximum tokens in the response |

Temperature

Temperature controls how creative versus predictable the model's output is:

  • 0.0–0.3: Consistent, factual. Best for support, FAQ bots.
  • 0.4–0.6: Balanced. Good for general conversation.
  • 0.7–1.0: Creative, varied. Better for sales, engagement.

```python
# Consistent support agent
self.llm = OpenAIClient(model="gpt-4o-mini", temperature=0.3)

# Engaging sales agent
self.llm = OpenAIClient(model="gpt-4o-mini", temperature=0.8)
```

Using Other Providers

Any provider with an OpenAI-compatible API works by setting base_url. This includes Groq, Together.ai, Fireworks, Anyscale, OpenRouter, and Azure OpenAI.

```python
import os

from smallestai.atoms.agent.clients.openai import OpenAIClient

# Example: Using Groq
self.llm = OpenAIClient(
    model="llama-3.1-70b-versatile",
    base_url="https://api.groq.com/openai/v1",
    api_key=os.getenv("GROQ_API_KEY"),
)
```

Just swap the base_url and api_key—your agent code stays the same.

Streaming

Voice agents must use streaming. Without it, users wait for the entire response before hearing anything.

```python
async def generate_response(self):
    response = await self.llm.chat(
        messages=self.context.messages,
        stream=True,
    )

    async for chunk in response:
        if chunk.content:
            yield chunk.content
```
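
Since the exact shape of `OpenAIClient`'s streamed response isn't documented here, this self-contained sketch substitutes a stub async generator for `self.llm.chat` to show the consumption pattern in isolation:

```python
import asyncio

# Stub async stream standing in for the client's streamed response,
# so the consumption pattern can be run without any SDK installed.
async def fake_stream():
    for piece in ["Hel", "lo", "!"]:
        yield piece

async def speak():
    parts = []
    async for chunk in fake_stream():
        parts.append(chunk)  # in a real agent, each chunk would be sent to TTS
    return "".join(parts)

print(asyncio.run(speak()))  # Hello!
```

Because chunks arrive incrementally, the TTS engine can begin speaking after the first chunk rather than waiting for the full response.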

Tool Calling

To enable function calling, pass tool schemas:

```python
response = await self.llm.chat(
    messages=self.context.messages,
    stream=True,
    tools=self.tool_registry.get_schemas(),
)
```
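
The exact format returned by `tool_registry.get_schemas()` isn't shown here; assuming it follows the standard OpenAI function-calling format, each entry would look roughly like this (the `get_weather` tool is purely illustrative):

```python
# Hypothetical example of an OpenAI-style function-calling schema --
# illustrative only, not the actual output of tool_registry.get_schemas().
get_weather_schema = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
            },
            "required": ["city"],
        },
    },
}
```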

Error Handling

```python
import logging

from openai import RateLimitError, APIError

logger = logging.getLogger(__name__)

async def generate_response(self):
    try:
        response = await self.llm.chat(
            messages=self.context.messages,
            stream=True,
        )
        async for chunk in response:
            if chunk.content:
                yield chunk.content

    except RateLimitError:
        yield "I'm experiencing high demand. Please try again."

    except APIError as e:
        logger.error(f"LLM error: {e}")
        yield "I'm having trouble. Let me try again."
```
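
Beyond user-facing error messages, you may want to retry transient failures. A minimal, SDK-independent sketch of capped exponential backoff that could wrap the `chat` call:

```python
# Capped exponential backoff: a pure helper you could use between
# retries of self.llm.chat(). The base and cap values are illustrative.
def backoff_delay(attempt: int, base: float = 0.5, cap: float = 8.0) -> float:
    """Delay in seconds before retry number `attempt` (0-indexed)."""
    return min(cap, base * (2 ** attempt))
```

For example, attempts 0, 1, 2 wait 0.5s, 1s, 2s, and the delay never exceeds the cap regardless of attempt count.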

Voice Configuration

Agents also need a voice for text-to-speech. Waves is our recommended TTS engine—ultra-low latency, optimized for real-time telephony.

Basic Voice Setup

```python
from smallestai.atoms.models import CreateAgentRequest

agent = CreateAgentRequest(
    name="SupportAgent",
    synthesizer={
        "provider": "smallest",  # Use Waves
        "model": "waves_lightning_v2",
        "voice_id": "zorin",  # Male, professional
        "speed": 1.0,
    },
    # ... other config
)
```

Waves Voice IDs

| Voice ID | Description |
|---|---|
| `zorin` | Male, professional (recommended) |
| `emily` | Female, warm |
| `raj` | Male, Indian English |
| `aria` | Female, neutral |

For the complete Waves voice library with audio samples: → Waves Voice Models

Voice Cloning

Create custom voices from audio samples: → Waves Voice Cloning Guide

Third-Party Providers

OpenAI and ElevenLabs voices are also supported. Set provider to openai or elevenlabs and use their respective voice IDs.
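
Assuming the synthesizer config takes the same shape as the Waves example above, a third-party setup might look like this (the `voice_id` value is a placeholder, not a real ElevenLabs ID):

```python
# Hypothetical third-party synthesizer config -- field names mirror the
# Waves example; substitute a real voice ID from your ElevenLabs account.
synthesizer = {
    "provider": "elevenlabs",
    "voice_id": "YOUR_ELEVENLABS_VOICE_ID",
    "speed": 1.0,
}
```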


Tips

Set max_tokens to 100-200. Shorter responses mean faster audio playback. Guide conciseness in your prompt too.

The first LLM call has connection overhead. Send a tiny request in start() to warm up before the user speaks.

Configure a backup provider (e.g., Groq primary, OpenAI fallback) to handle rate limits or outages gracefully.
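
One way to sketch that fallback: keep an ordered provider list and build the client from the first provider that hasn't failed. Field names and the selection helper here are illustrative, not SDK API:

```python
# Ordered provider preference: Groq primary, OpenAI fallback.
# Each entry holds the kwargs you'd pass to OpenAIClient.
PROVIDERS = [
    {"name": "groq", "model": "llama-3.1-70b-versatile",
     "base_url": "https://api.groq.com/openai/v1"},
    {"name": "openai", "model": "gpt-4o-mini", "base_url": None},
]

def next_provider(failed):
    """Return the first provider whose name isn't in the `failed` set."""
    for provider in PROVIDERS:
        if provider["name"] not in failed:
            return provider
    return None
```

On a `RateLimitError` from the primary, add its name to the failed set and rebuild the client from `next_provider`; since every provider speaks the OpenAI-compatible API, the rest of the agent code is unchanged.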