Bring Your Own Model
For privacy, cost control, or specialized models, you can run LLMs locally or on your own servers. The OpenAIClient works with any endpoint that implements the OpenAI Chat Completions API.
Complete Example
Here’s a full agent using a local Ollama model:
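The sketch below uses the standard openai Python package as the client, pointed at Ollama's OpenAI-compatible endpoint. The model name (llama3.1), the toy get_weather tool, and the single tool-call round trip are illustrative assumptions; the same base-URL and API-key settings carry over to OpenAIClient.

```python
import json
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",  # Ollama's OpenAI-compatible endpoint
    api_key="ollama",  # required by the SDK, ignored by Ollama
)

def get_weather(city: str) -> str:
    """Toy tool; replace with a real lookup."""
    return f"It is sunny in {city}."

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

messages = [{"role": "user", "content": "What's the weather in Paris?"}]
response = client.chat.completions.create(
    model="llama3.1", messages=messages, tools=tools
)
msg = response.choices[0].message

# If the model requested a tool, run it and send the result back.
if msg.tool_calls:
    messages.append(msg)
    for call in msg.tool_calls:
        # Only one tool here, so dispatch is trivial.
        args = json.loads(call.function.arguments)
        messages.append({
            "role": "tool",
            "tool_call_id": call.id,
            "content": get_weather(**args),
        })
    response = client.chat.completions.create(
        model="llama3.1", messages=messages, tools=tools
    )

print(response.choices[0].message.content)
```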
Requirements
Your model server must implement the OpenAI Chat Completions API:
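In practice, this means:

- POST /v1/chat/completions accepting the standard request body (model, messages, and optionally tools and stream)
- the standard response shape (choices[0].message, or streamed chunks when stream is true)
- function/tool calling, if your agent uses tools

A quick way to verify compatibility is a one-off request with the openai package; a sketch, where the base URL and model name are whatever your server actually exposes:

```python
from openai import OpenAI

# Point the SDK at your server; most local servers ignore the API key.
client = OpenAI(base_url="http://localhost:11434/v1", api_key="unused")
resp = client.chat.completions.create(
    model="llama3.1",  # assumption: replace with a model your server serves
    messages=[{"role": "user", "content": "ping"}],
)
print(resp.choices[0].message.content)
```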
Custom Endpoints
Connect to any custom model server:
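A sketch, assuming the usual base-URL and API-key pair. The URL below is a hypothetical placeholder, and the parameter names follow the openai package; OpenAIClient should take the equivalent settings:

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://models.internal.example.com/v1",  # hypothetical server URL
    api_key="your-server-key",  # whatever auth your server expects
)
resp = client.chat.completions.create(
    model="my-model",  # the name your server registers the model under
    messages=[{"role": "user", "content": "Hello!"}],
)
print(resp.choices[0].message.content)
```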
Ollama
Ollama is the easiest way to run models locally. It handles model downloads and serving automatically.
Setup
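- Install Ollama from ollama.com
- Pull a model: `ollama pull llama3.1` (any model from the Ollama library works)
- The server listens on `http://localhost:11434`; run `ollama serve` if it isn't already running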
Usage
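Ollama exposes the OpenAI-compatible API under /v1, so the client only needs the base URL. A minimal sketch; the API key is required by the SDK but ignored by Ollama, and the model name is whichever one you pulled:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")
resp = client.chat.completions.create(
    model="llama3.1",  # the model you pulled in Setup
    messages=[{"role": "user", "content": "Hello!"}],
)
print(resp.choices[0].message.content)
```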
vLLM
vLLM is a high-performance inference server for production workloads.
Setup
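- Install vLLM (a CUDA-capable GPU is the typical target): `pip install vllm`
- Serve a model: `vllm serve meta-llama/Llama-3.1-8B-Instruct`
- By default the server exposes the OpenAI-compatible API at `http://localhost:8000/v1`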
Usage
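Same pattern as with Ollama, using vLLM's default port. vLLM's examples conventionally pass "EMPTY" as the API key when no auth is configured, and the model name must match the one you served:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
resp = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",  # must match the served model
    messages=[{"role": "user", "content": "Hello!"}],
)
print(resp.choices[0].message.content)
```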
LM Studio
LM Studio provides a desktop UI for running models locally.
- Download from lmstudio.ai
- Load a model
- Start the local server (Settings → Local Server)
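LM Studio's local server speaks the OpenAI API on port 1234 by default. A sketch; the model name is the identifier LM Studio shows for the loaded model:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")
resp = client.chat.completions.create(
    model="loaded-model-name",  # assumption: use the identifier shown in LM Studio
    messages=[{"role": "user", "content": "Hello!"}],
)
print(resp.choices[0].message.content)
```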
Troubleshooting
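Common issues when pointing the client at a local server:

- Connection refused: the server isn't running, or the base URL is wrong; remember the `/v1` suffix.
- Model not found: the model name in the request must match one the server has pulled or loaded.
- Broken or missing tool calls: the model may not support function calling; see the tips below.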
Tips
- Use local LLMs for development and cloud/managed providers for production. Ollama is great for local development; for production, consider vLLM or a managed cloud provider for reliability.
- Set up a fallback LLM. Local models can fail, so configure a cloud fallback (e.g., OpenAI) to catch errors and keep the conversation going (a sketch of this pattern follows the list).
- Check tool-calling support. Not all local models support function calling. Test your tools, or use a model known to support them.
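A sketch of the fallback pattern, assuming the openai package for both clients; the model names and the blanket except are illustrative:

```python
from openai import OpenAI

local = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")
cloud = OpenAI()  # reads OPENAI_API_KEY from the environment

def chat(messages):
    try:
        resp = local.chat.completions.create(model="llama3.1", messages=messages)
    except Exception:
        # Local server down or misbehaving: retry against the cloud model.
        resp = cloud.chat.completions.create(model="gpt-4o-mini", messages=messages)
    return resp.choices[0].message.content

print(chat([{"role": "user", "content": "Hello!"}]))
```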

