Pronunciation Dictionaries | Smallest AI Docs

Pronunciation dictionaries allow you to customize how specific words are pronounced in your text-to-speech synthesis. This is particularly useful for:

Brand names, product names, or proper nouns
Technical terms or acronyms
Words that should be pronounced differently than their standard pronunciation
Non-English words in English text (or vice versa)

How Pronunciation Dictionaries Work

A pronunciation dictionary is a collection of word-pronunciation pairs that you create and manage through the Waves API. Each dictionary has a unique ID that you can reference in your TTS requests to ensure consistent pronunciation across your applications.

Key Concepts

Word: The text that appears in your input
Pronunciation: A plain-text respelling that shows how the word should sound, written with ordinary letters (not IPA)
Dictionary ID: A unique identifier for your pronunciation dictionary that you use in TTS requests
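As a minimal sketch of these concepts, the helper below turns a plain mapping of words to respellings into the `items` structure used by the creation request later on this page. The function name `build_dictionary_payload` and its validation rules are illustrative assumptions, not part of the Waves API.

```python
def build_dictionary_payload(pairs):
    """Turn a {word: pronunciation} mapping into a dictionary request body."""
    items = []
    for word, pronunciation in pairs.items():
        # Local sanity checks only; the API may apply its own validation.
        if not isinstance(word, str) or not word.strip():
            raise ValueError("each word must be a non-empty string")
        if not isinstance(pronunciation, str) or not pronunciation.strip():
            raise ValueError(f"pronunciation for {word!r} must be a non-empty string")
        items.append({"word": word, "pronunciation": pronunciation})
    return {"items": items}


payload = build_dictionary_payload({"API": "ay-pee-eye", "SQL": "sequel"})
```

Validating locally catches an empty or missing pronunciation before the request is sent.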

Creating a Pronunciation Dictionary

Step 1: Create Your Dictionary

First, create a pronunciation dictionary with your custom word-pronunciation pairs:

curl -X POST "https://waves-api.smallest.ai/api/v1/pronunciation-dicts" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "items": [
      {
        "word": "API",
        "pronunciation": "ay-pee-eye"
      },
      {
        "word": "GitHub",
        "pronunciation": "git-hub"
      },
      {
        "word": "SQL",
        "pronunciation": "sequel"
      }
    ]
  }'

Response:

{
  "id": "64f1234567890abcdef12345",
  "items": [
    {
      "word": "API",
      "pronunciation": "ay-pee-eye"
    },
    {
      "word": "GitHub",
      "pronunciation": "git-hub"
    },
    {
      "word": "SQL",
      "pronunciation": "sequel"
    }
  ],
  "createdAt": "2023-09-01T12:00:00.000Z"
}

Step 2: Save the Dictionary ID

Important: Save the returned id from the response. You’ll need this ID to reference your pronunciation dictionary in TTS requests and for future updates or deletions.

const dictionaryId = "64f1234567890abcdef12345"; // Save this!
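One way to keep the ID across runs is to write it to a small local JSON file. The sketch below assumes a file-based store keyed by a name you choose; the helper names and the file layout are hypothetical, and a database or configuration service would work equally well.

```python
import json
from pathlib import Path


def save_dictionary_id(store_path, name, dictionary_id):
    """Record a dictionary ID under a human-readable name in a JSON file."""
    path = Path(store_path)
    ids = json.loads(path.read_text()) if path.exists() else {}
    ids[name] = dictionary_id
    path.write_text(json.dumps(ids, indent=2))


def load_dictionary_id(store_path, name):
    """Return the stored ID for `name`, or None if nothing was saved."""
    path = Path(store_path)
    if not path.exists():
        return None
    return json.loads(path.read_text()).get(name)
```

For example, call `save_dictionary_id("dictionaries.json", "tech-terms", dictionary_id)` right after creation, then `load_dictionary_id("dictionaries.json", "tech-terms")` wherever a TTS request is built.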

Managing Your Pronunciation Dictionaries

List All Dictionaries

Retrieve all your pronunciation dictionaries:

curl -X GET "https://waves-api.smallest.ai/api/v1/pronunciation-dicts" \
  -H "Authorization: Bearer YOUR_API_KEY"

Update a Dictionary

Modify an existing pronunciation dictionary:

curl -X PUT "https://waves-api.smallest.ai/api/v1/pronunciation-dicts" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "id": "64f1234567890abcdef12345",
    "items": [
      {
        "word": "OpenAI",
        "pronunciation": "open ay eye"
      }
    ]
  }'
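This page does not state whether an update replaces the whole item list or only the listed words. If it replaces the list (an unconfirmed assumption), merging your changes into the existing items locally before sending avoids dropping entries. The helpers `merge_items` and `build_update_payload` below are hypothetical names used for illustration.

```python
def merge_items(existing, changes):
    """Return existing items with changes applied; a changed word overrides the old entry."""
    merged = {item["word"]: item["pronunciation"] for item in existing}
    for item in changes:
        merged[item["word"]] = item["pronunciation"]
    return [{"word": w, "pronunciation": p} for w, p in merged.items()]


def build_update_payload(dictionary_id, existing, changes):
    """Assemble the body for the update request from the merged item list."""
    return {"id": dictionary_id, "items": merge_items(existing, changes)}


update_body = build_update_payload(
    "64f1234567890abcdef12345",
    existing=[{"word": "API", "pronunciation": "ay-pee-eye"}],
    changes=[{"word": "OpenAI", "pronunciation": "open ay eye"}],
)
```

If the API turns out to merge items on its own, sending the full list is redundant but harmless.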

Delete a Dictionary

Remove a pronunciation dictionary:

curl -X DELETE "https://waves-api.smallest.ai/api/v1/pronunciation-dicts" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "id": "64f1234567890abcdef12345"
  }'
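The list, update, and delete calls above share one URL and differ only in HTTP method and body. The sketch below prepares such requests with Python's standard library, without sending them. The helper name `build_dict_request` is an assumption for illustration, while the URL, headers, and bodies mirror the curl examples.

```python
import json
import urllib.request

DICTS_URL = "https://waves-api.smallest.ai/api/v1/pronunciation-dicts"


def build_dict_request(method, api_key, body=None):
    """Prepare (but do not send) a request to the pronunciation-dicts endpoint."""
    headers = {"Authorization": f"Bearer {api_key}"}
    data = None
    if body is not None:
        # Only requests with a JSON body need the Content-Type header.
        headers["Content-Type"] = "application/json"
        data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(DICTS_URL, data=data, headers=headers, method=method)


list_request = build_dict_request("GET", "YOUR_API_KEY")
delete_request = build_dict_request(
    "DELETE", "YOUR_API_KEY", {"id": "64f1234567890abcdef12345"}
)
```

Passing a prepared request to `urllib.request.urlopen` sends it. Keeping construction separate from sending makes the calls easy to inspect and test.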

Using Pronunciation Dictionaries in TTS Requests

Once you have created a pronunciation dictionary and obtained its ID, you can use it in your TTS requests by including the pronunciation_dicts parameter. This parameter accepts an array of dictionary IDs, allowing you to use multiple pronunciation dictionaries in a single request:

Example

curl -X POST "https://waves-api.smallest.ai/api/v1/lightning-v3.1/get_speech" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Welcome to Waves API! Our TTS service integrates with GitHub.",
    "voice_id": "your_voice_id",
    "pronunciation_dicts": ["64f1234567890abcdef12345"],
    "sample_rate": 24000,
    "speed": 1.0,
    "language": "en"
  }'

Using Multiple Dictionaries

To apply several dictionaries in one request, list each dictionary ID in the pronunciation_dicts array:

curl -X POST "https://waves-api.smallest.ai/api/v1/lightning-v3.1/get_speech" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Our API uses PostgreSQL and integrates with GitHub for CI/CD.",
    "voice_id": "your_voice_id",
    "pronunciation_dicts": [
      "64f1234567890abcdef12345",
      "64f9876543210fedcba09876"
    ],
    "sample_rate": 24000,
    "speed": 1.0,
    "language": "en",
    "output_format": "wav"
  }'
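The same request body can be assembled in Python. The sketch below uses a hypothetical helper, `build_tts_payload`, that accepts either one dictionary ID or several and always emits the array form that the pronunciation_dicts parameter expects. The field names come from the curl examples above; the helper itself is an illustration, not part of any SDK.

```python
def build_tts_payload(text, voice_id, dictionary_ids, **options):
    """Assemble a get_speech request body that applies one or more dictionaries."""
    if isinstance(dictionary_ids, str):
        dictionary_ids = [dictionary_ids]  # accept a single ID for convenience
    payload = {
        "text": text,
        "voice_id": voice_id,
        "pronunciation_dicts": list(dictionary_ids),
    }
    payload.update(options)  # e.g. sample_rate, speed, language, output_format
    return payload


tts_body = build_tts_payload(
    "Our API uses PostgreSQL and integrates with GitHub for CI/CD.",
    "your_voice_id",
    ["64f1234567890abcdef12345", "64f9876543210fedcba09876"],
    sample_rate=24000,
    language="en",
)
```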

Complete Workflow Example

Here’s a complete example showing the full workflow from creating a dictionary to using it in synthesis:

import requests

# Your API configuration
API_KEY = "your_api_key_here"
BASE_URL = "https://waves-api.smallest.ai/api/v1"
headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json"
}

# Step 1: Create pronunciation dictionary
pronunciation_data = {
    "items": [
        {"word": "PostgreSQL", "pronunciation": "post-gres"},
        {"word": "Redis", "pronunciation": "red-iss"},
        {"word": "Kubernetes", "pronunciation": "koo-ber-net-ees"},
        {"word": "nginx", "pronunciation": "engine-x"}
    ]
}

# Create the dictionary
response = requests.post(
    f"{BASE_URL}/pronunciation-dicts",
    headers=headers,
    json=pronunciation_data
)
response.raise_for_status()  # Stop here if the dictionary was not created

dict_data = response.json()
dictionary_id = dict_data["id"]
print(f"Created pronunciation dictionary with ID: {dictionary_id}")

# Step 2: Use the dictionary in TTS synthesis
tts_request = {
    "text": "Our infrastructure uses PostgreSQL, Redis, Kubernetes, and nginx.",
    "voice_id": "your_voice_id",
    "pronunciation_dicts": [dictionary_id],  # Use the dictionary ID here
    "sample_rate": 24000,
    "speed": 1.0,
    "language": "en",
    "output_format": "wav"
}

# Generate speech with custom pronunciations
audio_response = requests.post(
    f"{BASE_URL}/lightning-v3.1/get_speech",
    headers=headers,
    json=tts_request
)
audio_response.raise_for_status()  # Avoid writing an error body to the .wav file

# Save the audio file
with open("speech_with_custom_pronunciations.wav", "wb") as f:
    f.write(audio_response.content)

print("Speech generated with custom pronunciations!")

Tips for Creating Pronunciations

Break down complex words: For multi-syllable words, separate syllables with hyphens
- “Kubernetes” → “koo-ber-net-ees”
Spell it how it sounds: Write words the way you want them spoken, even if it’s not standard spelling
- “SQL” → “sequel”
- “API” → “ay-pee-eye”
Stay consistent: Use the same style across your dictionary (e.g., always use hyphens for syllables).
Test and refine: Start with a small dictionary, listen to the generated audio, and adjust the spellings until they sound natural.
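The consistency tip above can be checked automatically. The sketch below assumes one possible house style (lowercase ASCII letters, with syllables or words joined by single hyphens or spaces) and lists the entries that deviate from it. The pattern is an illustrative choice, not an API requirement, so adjust it to whatever style you settle on.

```python
import re

# Hypothetical house style: lowercase letters, joined by single hyphens or spaces.
STYLE = re.compile(r"[a-z]+(?:[- ][a-z]+)*")


def style_problems(items):
    """Return the words whose pronunciation does not follow the chosen style."""
    return [
        item["word"]
        for item in items
        if not STYLE.fullmatch(item["pronunciation"])
    ]


entries = [
    {"word": "Kubernetes", "pronunciation": "koo-ber-net-ees"},
    {"word": "SQL", "pronunciation": "sequel"},
    {"word": "data", "pronunciation": "ˈdeɪtə"},      # IPA, not plain text
    {"word": "nginx", "pronunciation": "Engine-X"},   # inconsistent casing
]
flagged = style_problems(entries)
```

Here `flagged` holds the "data" and "nginx" entries, which you would respell before uploading the dictionary.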

Best Practices

Dictionary Management

Keep dictionaries focused: Create separate dictionaries for different domains (e.g., one for technical terms, another for product names).
Combine multiple dictionaries: Use the array format to apply multiple pronunciation dictionaries in a single TTS request.
Update regularly: Add or refine pronunciations as your vocabulary grows.

Pronunciation Quality

Verify pronunciations: Listen to the output to confirm it matches expectations.
Consider context: Some words may have multiple valid pronunciations—pick the one that makes sense for your use case.
Language consistency: Ensure pronunciations match the language setting of your TTS requests.

Performance Considerations

Cache dictionary IDs: Store dictionary IDs in your application to avoid repeated API calls.
Batch updates: When possible, update multiple pronunciations in a single API call.
Monitor usage: Track which dictionaries are actively used in production.

Troubleshooting

Common Issues

Dictionary not found

Make sure you’re using the correct dictionary ID and that the dictionary hasn’t been deleted.

Pronunciations not applied

Verify that the dictionary ID is included in your TTS request.
Ensure the words in your text match your dictionary entries exactly; matching is case-sensitive.
Confirm the pronunciation is written in plain text (not IPA).
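Because matching is case-sensitive, a quick local check can reveal dictionary words that appear in your text with different casing. The sketch below is a rough diagnostic: the helper name `case_mismatches` is hypothetical, and it uses simple word-boundary matching, which may differ from how the service tokenizes text.

```python
import re


def case_mismatches(text, items):
    """Find dictionary words that occur in text only with different casing."""
    mismatches = []
    for item in items:
        word = item["word"]
        pattern = rf"\b{re.escape(word)}\b"
        if re.search(pattern, text):
            continue  # an exact-case match exists, so nothing to report
        found = re.search(pattern, text, flags=re.IGNORECASE)
        if found:
            mismatches.append((word, found.group(0)))
    return mismatches


problems = case_mismatches(
    "We host code on Github and query it with SQL.",
    [
        {"word": "GitHub", "pronunciation": "git-hub"},
        {"word": "SQL", "pronunciation": "sequel"},
    ],
)
```

Each tuple pairs the dictionary spelling with the spelling found in the text, so you can fix either the input or the dictionary entry.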

Unexpected pronunciations

Simplify the pronunciation spelling.
Test with shorter words first and adjust gradually.

Error Responses

The API returns specific error messages for common issues. For example, a request whose first item is missing the pronunciation field returns:

{
  "error": "Invalid request body",
  "details": [
    {
      "code": "invalid_type",
      "expected": "string",
      "received": "undefined",
      "path": ["items", 0, "pronunciation"],
      "message": "Required"
    }
  ]
}
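An error body like the one above can be turned into short, readable messages for logs. The sketch below assumes every error follows the same shape as this example (a `details` list whose entries carry `path` and `message`). Other error types may look different, so missing fields are handled defensively.

```python
def format_validation_errors(error_body):
    """Render each entry in details as 'location: message'."""
    lines = []
    for detail in error_body.get("details", []):
        location = ""
        for part in detail.get("path", []):
            if isinstance(part, int):
                location += f"[{part}]"      # list index, e.g. items[0]
            elif location:
                location += f".{part}"       # nested field
            else:
                location = str(part)         # first path segment
        lines.append(f"{location}: {detail.get('message', 'unknown error')}")
    return lines


sample_error = {
    "error": "Invalid request body",
    "details": [
        {
            "code": "invalid_type",
            "expected": "string",
            "received": "undefined",
            "path": ["items", 0, "pronunciation"],
            "message": "Required",
        }
    ],
}
messages = format_validation_errors(sample_error)
# messages == ["items[0].pronunciation: Required"]
```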

Next Steps

Explore the API Reference for detailed parameter information
Check out TTS Best Practices for optimization tips
Learn about Voice Cloning to create custom voices