---
title: LLM Settings
description: 'Configure models, parameters, and providers.'
---

The `OpenAIClient` is the unified interface for calling LLMs. It works with OpenAI by default, but supports any OpenAI-compatible endpoint by changing the `base_url`.

## Basic Configuration

```python
import os

from smallestai.atoms.agent.clients.openai import OpenAIClient

self.llm = OpenAIClient(
    model="gpt-4o-mini",
    temperature=0.7,
    api_key=os.getenv("OPENAI_API_KEY")
)
```

## Parameters

| Parameter     | Type  | Default              | Description                                                           |
| ------------- | ----- | -------------------- | --------------------------------------------------------------------- |
| `model`       | str   | —                    | The model identifier (e.g., `gpt-4o-mini`, `llama-3.1-70b-versatile`) |
| `api_key`     | str   | `OPENAI_API_KEY` env | Your provider's API key                                               |
| `base_url`    | str   | OpenAI               | Custom endpoint URL for other providers                               |
| `temperature` | float | `0.7`                | Controls randomness: lower is more consistent, higher more creative   |
| `max_tokens`  | int   | `1024`               | Maximum tokens in the response                                        |

### Temperature

Controls how "creative" versus "predictable" the model behaves:

* **0.0–0.3**: Consistent, factual. Best for support and FAQ bots.
* **0.4–0.6**: Balanced. Good for general conversation.
* **0.7–1.0**: Creative, varied. Better for sales and engagement.

```python
# Consistent support agent
self.llm = OpenAIClient(model="gpt-4o-mini", temperature=0.3)

# Engaging sales agent
self.llm = OpenAIClient(model="gpt-4o-mini", temperature=0.8)
```

## Using Other Providers

Any provider with an OpenAI-compatible API works by setting `base_url`. This includes Groq, Together.ai, Fireworks, Anyscale, OpenRouter, and Azure OpenAI.
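Since only `base_url` and the API key change between providers, one convenient pattern is to keep provider settings in a small lookup table and build the client kwargs from it. This is a sketch, not part of the SDK: the helper name `llm_kwargs` and the `PROVIDERS` table are hypothetical, and the Groq and Together.ai URLs are their documented OpenAI-compatible endpoints.

```python
import os

# Hypothetical provider table: only base_url and the API-key env var differ.
PROVIDERS = {
    "openai":   {"base_url": None, "key_env": "OPENAI_API_KEY"},
    "groq":     {"base_url": "https://api.groq.com/openai/v1", "key_env": "GROQ_API_KEY"},
    "together": {"base_url": "https://api.together.xyz/v1", "key_env": "TOGETHER_API_KEY"},
}

def llm_kwargs(provider: str, model: str, **overrides):
    """Build keyword arguments for OpenAIClient for the given provider."""
    cfg = PROVIDERS[provider]
    kwargs = {"model": model, "api_key": os.getenv(cfg["key_env"])}
    if cfg["base_url"]:
        kwargs["base_url"] = cfg["base_url"]
    kwargs.update(overrides)  # e.g. temperature, max_tokens
    return kwargs

# Usage (sketch): self.llm = OpenAIClient(**llm_kwargs("groq", "llama-3.1-70b-versatile"))
```

Switching providers then becomes a one-word config change; the agent code itself never mentions a specific vendor.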
```python
import os

from smallestai.atoms.agent.clients.openai import OpenAIClient

# Example: using Groq
self.llm = OpenAIClient(
    model="llama-3.1-70b-versatile",
    base_url="https://api.groq.com/openai/v1",
    api_key=os.getenv("GROQ_API_KEY")
)
```

Just swap the `base_url` and `api_key`; your agent code stays the same.

## Streaming

Voice agents **must** use streaming. Without it, users wait for the entire response before hearing anything.

```python
async def generate_response(self):
    response = await self.llm.chat(
        messages=self.context.messages,
        stream=True
    )
    async for chunk in response:
        if chunk.content:
            yield chunk.content
```

## Tool Calling

To enable function calling, pass tool schemas:

```python
response = await self.llm.chat(
    messages=self.context.messages,
    stream=True,
    tools=self.tool_registry.get_schemas()
)
```

## Error Handling

```python
import logging

from openai import APIError, RateLimitError

logger = logging.getLogger(__name__)

async def generate_response(self):
    try:
        response = await self.llm.chat(
            messages=self.context.messages,
            stream=True
        )
        async for chunk in response:
            if chunk.content:
                yield chunk.content
    except RateLimitError:
        yield "I'm experiencing high demand. Please try again."
    except APIError as e:
        logger.error(f"LLM error: {e}")
        yield "I'm having trouble. Let me try again."
```

***

## Voice Configuration

Agents also need a voice for text-to-speech. **Waves** is our recommended TTS engine: ultra-low latency, optimized for real-time telephony.

### Basic Voice Setup

```python
from smallestai.atoms.models import CreateAgentRequest

agent = CreateAgentRequest(
    name="SupportAgent",
    synthesizer={
        "provider": "smallest",        # Use Waves
        "model": "waves_lightning_v2",
        "voice_id": "zorin",           # Male, professional
        "speed": 1.0
    },
    # ... other config
)
```

### Waves Voice IDs

| Voice ID | Description                      |
| -------- | -------------------------------- |
| `zorin`  | Male, professional (recommended) |
| `emily`  | Female, warm                     |
| `raj`    | Male, Indian English             |
| `aria`   | Female, neutral                  |

For the complete Waves voice library with audio samples: → [Waves Voice Models](https://waves-docs.smallest.ai/content/getting-started/models)

### Voice Cloning

Create custom voices from audio samples: → [Waves Voice Cloning Guide](https://waves-docs.smallest.ai/content/voice-cloning/types-of-clone)

### Third-Party Providers

OpenAI and ElevenLabs voices are also supported. Set `provider` to `openai` or `elevenlabs` and use their respective voice IDs.

***

## Tips

* Set `max_tokens` to 100–200: shorter responses mean faster audio playback. Guide conciseness in your prompt too.
* The first LLM call has connection overhead. Send a tiny request in `start()` to warm up before the user speaks.
* Configure a backup provider (e.g., Groq primary, OpenAI fallback) to handle rate limits or outages gracefully.
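The fallback tip can be sketched independently of any particular client. Assuming each provider is wrapped in an async callable that raises on failure (the names `chat_with_fallback`, `primary`, and `backup` are illustrative, not part of the SDK):

```python
import asyncio

async def chat_with_fallback(messages, primary, backup):
    """Try the primary provider; on any error, retry once with the backup.

    `primary` and `backup` are async callables, e.g. thin wrappers around
    OpenAIClient.chat for Groq and OpenAI respectively.
    """
    try:
        return await primary(messages)
    except Exception:
        # Primary is rate-limited or down; degrade gracefully to the backup.
        return await backup(messages)

# Illustrative stand-ins for two providers:
async def groq_llm(messages):
    raise RuntimeError("rate limited")

async def openai_llm(messages):
    return "fallback reply"

print(asyncio.run(chat_with_fallback([], groq_llm, openai_llm)))  # fallback reply
```

In production you would likely catch only `RateLimitError`/`APIError` rather than bare `Exception`, and log the failure before switching providers.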