Pronunciation Dictionaries


Pronunciation dictionaries allow you to customize how specific words are pronounced in your text-to-speech synthesis. This is particularly useful for:

  • Brand names, product names, or proper nouns
  • Technical terms or acronyms
  • Words that should be pronounced differently than their standard pronunciation
  • Non-English words in English text (or vice versa)

How Pronunciation Dictionaries Work

A pronunciation dictionary is a collection of word-pronunciation pairs that you create and manage through the Waves API. Each dictionary has a unique ID that you can reference in your TTS requests to ensure consistent pronunciation across your applications.

Key Concepts

  • Word: The text that appears in your input
  • Pronunciation: A plain-text respelling that shows how the word should sound, such as “sequel” for “SQL” (IPA is not used)
  • Dictionary ID: A unique identifier for your pronunciation dictionary that you use in TTS requests
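As a minimal sketch of how these concepts map onto a request body, the helper below turns a plain word-to-pronunciation mapping into the items list used in the requests on this page. The function name build_items is hypothetical and exists only for illustration; the word and pronunciation field names are the ones shown in the API examples.

```python
def build_items(pronunciations):
    """Turn a {word: pronunciation} mapping into the `items` list format."""
    items = []
    for word, pronunciation in pronunciations.items():
        # Reject blank entries before they reach the API.
        if not word.strip() or not pronunciation.strip():
            raise ValueError(f"Empty word or pronunciation for {word!r}")
        items.append({"word": word, "pronunciation": pronunciation})
    return items


# The resulting dict can be sent as the JSON body of a create request.
payload = {"items": build_items({"SQL": "sequel", "GitHub": "git-hub"})}
print(payload)
```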

Creating a Pronunciation Dictionary

Step 1: Create Your Dictionary

First, create a pronunciation dictionary with your custom word-pronunciation pairs:

curl -X POST "https://waves-api.smallest.ai/api/v1/pronunciation-dicts" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "items": [
      {
        "word": "API",
        "pronunciation": "ay-pee-eye"
      },
      {
        "word": "GitHub",
        "pronunciation": "git-hub"
      },
      {
        "word": "SQL",
        "pronunciation": "sequel"
      }
    ]
  }'

Response:

{
  "id": "64f1234567890abcdef12345",
  "items": [
    {
      "word": "API",
      "pronunciation": "ay-pee-eye"
    },
    {
      "word": "GitHub",
      "pronunciation": "git-hub"
    },
    {
      "word": "SQL",
      "pronunciation": "sequel"
    }
  ],
  "createdAt": "2023-09-01T12:00:00.000Z"
}

Step 2: Save the Dictionary ID

Important: Save the returned id from the response. You’ll need this ID to reference your pronunciation dictionary in TTS requests and for future updates or deletions.

const dictionaryId = "64f1234567890abcdef12345"; // Save this!
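One way to keep the ID available between runs is to write it to a small local file. The following is only a sketch: the helper names, the JSON file layout, and the idea of labelling each dictionary with a local name are assumptions for illustration and are not part of the Waves API.

```python
import json
from pathlib import Path


def save_dictionary_id(path, name, dictionary_id):
    """Record a dictionary ID under a local, human-readable name."""
    path = Path(path)
    known = json.loads(path.read_text()) if path.exists() else {}
    known[name] = dictionary_id
    path.write_text(json.dumps(known, indent=2))


def load_dictionary_id(path, name):
    """Return the stored ID for `name`, or None if it was never saved."""
    path = Path(path)
    if not path.exists():
        return None
    return json.loads(path.read_text()).get(name)
```

Calling save_dictionary_id("dictionaries.json", "tech-terms", dictionary_id) right after the create request, and load_dictionary_id before each TTS request, avoids hard-coding the ID in application code.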

Managing Your Pronunciation Dictionaries

List All Dictionaries

Retrieve all your pronunciation dictionaries:

curl -X GET "https://waves-api.smallest.ai/api/v1/pronunciation-dicts" \
  -H "Authorization: Bearer YOUR_API_KEY"

Update a Dictionary

Modify an existing pronunciation dictionary:

curl -X PUT "https://waves-api.smallest.ai/api/v1/pronunciation-dicts" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "id": "64f1234567890abcdef12345",
    "items": [
      {
        "word": "OpenAI",
        "pronunciation": "open ay eye"
      }
    ]
  }'
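Hand-written JSON bodies are easy to get subtly wrong (a stray trailing comma is enough to make the body invalid), so it can help to build the update body programmatically. The sketch below assumes only the id and items fields shown in the request above; the function name build_update_body is hypothetical, and this page does not state whether an update replaces the existing items or merges into them.

```python
import json


def build_update_body(dictionary_id, items):
    """Serialize an update body containing the `id` and `items` fields."""
    for index, item in enumerate(items):
        # Catch incomplete entries locally instead of waiting for an API error.
        missing = {"word", "pronunciation"} - set(item)
        if missing:
            raise ValueError(f"items[{index}] is missing {sorted(missing)}")
    return json.dumps({"id": dictionary_id, "items": items})


body = build_update_body(
    "64f1234567890abcdef12345",
    [{"word": "OpenAI", "pronunciation": "open ay eye"}],
)
print(body)
```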

Delete a Dictionary

Remove a pronunciation dictionary:

curl -X DELETE "https://waves-api.smallest.ai/api/v1/pronunciation-dicts" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "id": "64f1234567890abcdef12345"
  }'

Using Pronunciation Dictionaries in TTS Requests

Once you have created a pronunciation dictionary and obtained its ID, you can use it in your TTS requests by including the pronunciation_dicts parameter. This parameter accepts an array of dictionary IDs, allowing you to use multiple pronunciation dictionaries in a single request:

Example

curl -X POST "https://waves-api.smallest.ai/api/v1/lightning-v3.1/get_speech" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Welcome to Waves API! Our TTS service integrates with GitHub.",
    "voice_id": "your_voice_id",
    "pronunciation_dicts": ["64f1234567890abcdef12345"],
    "sample_rate": 24000,
    "speed": 1.0,
    "language": "en"
  }'

Using Multiple Dictionaries

To apply several dictionaries at once, list each dictionary ID in the pronunciation_dicts array:

curl -X POST "https://waves-api.smallest.ai/api/v1/lightning-v3.1/get_speech" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Our API uses PostgreSQL and integrates with GitHub for CI/CD.",
    "voice_id": "your_voice_id",
    "pronunciation_dicts": [
      "64f1234567890abcdef12345",
      "64f9876543210fedcba09876"
    ],
    "sample_rate": 24000,
    "speed": 1.0,
    "language": "en",
    "output_format": "wav"
  }'
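The same request body can be assembled in Python. This is a sketch only: build_tts_payload is a hypothetical helper, and removing duplicate IDs is a convenience added here, not a documented requirement of the API. The field names are the ones used in the curl examples above.

```python
def build_tts_payload(text, voice_id, dictionary_ids, **options):
    """Build a get_speech request body that applies several dictionaries."""
    # Drop duplicate IDs while keeping the caller's order.
    unique_ids = list(dict.fromkeys(dictionary_ids))
    payload = {
        "text": text,
        "voice_id": voice_id,
        "pronunciation_dicts": unique_ids,
    }
    # Optional fields such as sample_rate, speed, language, output_format.
    payload.update(options)
    return payload


payload = build_tts_payload(
    "Our API uses PostgreSQL and integrates with GitHub for CI/CD.",
    "your_voice_id",
    ["64f1234567890abcdef12345", "64f9876543210fedcba09876"],
    sample_rate=24000,
    output_format="wav",
)
```

The resulting dict can be passed as the json argument of requests.post, as in the complete workflow example.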

Complete Workflow Example

Here’s a complete example showing the full workflow from creating a dictionary to using it in synthesis:

import requests

# Your API configuration
API_KEY = "your_api_key_here"
BASE_URL = "https://waves-api.smallest.ai/api/v1"
headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json",
}

# Step 1: Create pronunciation dictionary
pronunciation_data = {
    "items": [
        {"word": "PostgreSQL", "pronunciation": "post-gres"},
        {"word": "Redis", "pronunciation": "red-iss"},
        {"word": "Kubernetes", "pronunciation": "koo-ber-net-ees"},
        {"word": "nginx", "pronunciation": "engine-x"},
    ]
}

# Create the dictionary and stop early if the API reports an error
response = requests.post(
    f"{BASE_URL}/pronunciation-dicts",
    headers=headers,
    json=pronunciation_data,
)
response.raise_for_status()

dict_data = response.json()
dictionary_id = dict_data["id"]
print(f"Created pronunciation dictionary with ID: {dictionary_id}")

# Step 2: Use the dictionary in TTS synthesis
tts_request = {
    "text": "Our infrastructure uses PostgreSQL, Redis, Kubernetes, and nginx.",
    "voice_id": "your_voice_id",
    "pronunciation_dicts": [dictionary_id],  # Use the dictionary ID here
    "sample_rate": 24000,
    "speed": 1.0,
    "language": "en",
    "output_format": "wav",
}

# Generate speech with custom pronunciations
audio_response = requests.post(
    f"{BASE_URL}/lightning-v3.1/get_speech",
    headers=headers,
    json=tts_request,
)
audio_response.raise_for_status()  # Avoid writing an error body to the .wav file

# Save the audio file
with open("speech_with_custom_pronunciations.wav", "wb") as f:
    f.write(audio_response.content)

print("Speech generated with custom pronunciations!")

Tips for Creating Pronunciations

  1. Break down complex words: For multi-syllable words, separate syllables with hyphens

    • “Kubernetes” → “koo-ber-net-ees”
  2. Spell it how it sounds: Write words the way you want them spoken, even if it’s not standard spelling

    • “SQL” → “sequel”
    • “API” → “ay-pee-eye”
  3. Stay consistent: Use the same style across your dictionary (e.g., always use hyphens for syllables).

  4. Test and refine: Generate a small dictionary first, test the pronunciations, and adjust until they sound natural.
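The tips above can be partly checked before a dictionary is uploaded. The sketch below flags pronunciations that are not written as plain respellings. The rule it applies (ASCII letters separated by single hyphens or spaces) is an assumption chosen to match the examples on this page, not a documented API constraint, and check_items is a hypothetical helper.

```python
import re

# Assumption: a plain respelling uses only ASCII letters, hyphens, and spaces.
PLAIN_RESPELLING = re.compile(r"[A-Za-z]+(?:[- ][A-Za-z]+)*")


def check_items(items):
    """Return human-readable problems found in a list of dictionary items."""
    problems = []
    for item in items:
        pronunciation = item["pronunciation"]
        if not PLAIN_RESPELLING.fullmatch(pronunciation):
            problems.append(
                f"{item['word']}: {pronunciation!r} is not a plain respelling"
            )
    return problems


print(check_items([
    {"word": "Kubernetes", "pronunciation": "koo-ber-net-ees"},
    {"word": "SQL", "pronunciation": "ˈsiːkwəl"},  # IPA, so it is flagged
]))
```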


Best Practices

Dictionary Management

  • Keep dictionaries focused: Create separate dictionaries for different domains (e.g., one for technical terms, another for product names).
  • Combine multiple dictionaries: Use the array format to apply multiple pronunciation dictionaries in a single TTS request.
  • Update regularly: Add or refine pronunciations as your vocabulary grows.
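When several focused dictionaries are combined in one request, the same word may appear in more than one of them. This page does not say which entry takes precedence in that case, so a local check such as the sketch below can surface overlaps before they cause surprises. The find_conflicts helper and the labels used as keys are hypothetical and exist only on the client side.

```python
from collections import defaultdict


def find_conflicts(dictionaries):
    """Return words that have different pronunciations across dictionaries.

    `dictionaries` maps a local label (not an API field) to a list of items.
    """
    seen = defaultdict(dict)
    for label, items in dictionaries.items():
        for item in items:
            seen[item["word"]][label] = item["pronunciation"]
    return {
        word: by_label
        for word, by_label in seen.items()
        if len(set(by_label.values())) > 1
    }


conflicts = find_conflicts({
    "tech": [{"word": "SQL", "pronunciation": "sequel"}],
    "databases": [
        {"word": "SQL", "pronunciation": "ess-cue-ell"},
        {"word": "Redis", "pronunciation": "red-iss"},
    ],
})
print(conflicts)
```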

Pronunciation Quality

  • Verify pronunciations: Listen to the output to confirm it matches expectations.
  • Consider context: Some words may have multiple valid pronunciations—pick the one that makes sense for your use case.
  • Language consistency: Ensure pronunciations match the language setting of your TTS requests.

Performance Considerations

  • Cache dictionary IDs: Store dictionary IDs in your application to avoid repeated API calls.
  • Batch updates: When possible, update multiple pronunciations in a single API call.
  • Monitor usage: Track which dictionaries are actively used in production.
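To batch updates, pending changes can be collected locally and sent in a single update call. The sketch below merges new or changed entries into an existing items list, with later entries winning for the same word. The merge_items helper is hypothetical, and whether the update endpoint replaces or merges items on the server is not stated on this page, so sending the full merged list is an assumption made here for safety.

```python
def merge_items(existing, changes):
    """Combine two items lists; entries in `changes` win for the same word."""
    merged = {item["word"]: item["pronunciation"] for item in existing}
    for item in changes:
        merged[item["word"]] = item["pronunciation"]
    return [
        {"word": word, "pronunciation": pronunciation}
        for word, pronunciation in merged.items()
    ]


items = merge_items(
    [{"word": "SQL", "pronunciation": "sequel"}],
    [
        {"word": "SQL", "pronunciation": "ess-cue-ell"},
        {"word": "nginx", "pronunciation": "engine-x"},
    ],
)
print(items)
```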

Troubleshooting

Common Issues

Dictionary not found

  • Make sure you’re using the correct dictionary ID and that the dictionary hasn’t been deleted.

Pronunciations not applied

  • Verify that the dictionary ID is included in your TTS request.
  • Ensure the words in your text match your dictionary entries exactly. Matching is case-sensitive, so “github” does not match an entry for “GitHub”.
  • Confirm the pronunciation is written in plain text (not IPA).
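A quick local check can show whether a dictionary word appears in the input text exactly, only with different casing, or not at all. This is a sketch under an assumption: it treats a match as a whole word bounded by non-word characters, which may differ from how the service itself tokenizes text. The diagnose helper is hypothetical.

```python
import re


def diagnose(text, items):
    """Classify each dictionary word as exact, case mismatch, or not found."""
    report = {}
    for item in items:
        word = item["word"]
        # Match the word only when it is not embedded in a longer word.
        pattern = r"(?<!\w)" + re.escape(word) + r"(?!\w)"
        if re.search(pattern, text):
            report[word] = "exact"
        elif re.search(pattern, text, flags=re.IGNORECASE):
            report[word] = "case mismatch"
        else:
            report[word] = "not found"
    return report


report = diagnose(
    "We host the code on github and query it with SQL.",
    [
        {"word": "GitHub", "pronunciation": "git-hub"},
        {"word": "SQL", "pronunciation": "sequel"},
        {"word": "Redis", "pronunciation": "red-iss"},
    ],
)
print(report)
```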

Unexpected pronunciations

  • Simplify your spelling.
  • Test with shorter words first and adjust gradually.

Error Responses

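When a request fails validation, the API returns an error that identifies the problem. For example, the response below reports that the first item is missing its pronunciation field: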

{
  "error": "Invalid request body",
  "details": [
    {
      "code": "invalid_type",
      "expected": "string",
      "received": "undefined",
      "path": ["items", 0, "pronunciation"],
      "message": "Required"
    }
  ]
}
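An error body shaped like the example above can be turned into short, readable messages. The sketch below assumes every error follows that shape (a details list whose entries carry path and message); other error formats may exist and are not covered here. The format_validation_errors helper is hypothetical.

```python
def format_validation_errors(error_body):
    """Turn an error body shaped like the example into readable lines."""
    lines = []
    for detail in error_body.get("details", []):
        location = ""
        for part in detail.get("path", []):
            # Integers index into arrays; strings name object fields.
            if isinstance(part, int):
                location += f"[{part}]"
            else:
                location += f".{part}" if location else str(part)
        lines.append(f"{location}: {detail.get('message', 'unknown error')}")
    return lines


example = {
    "error": "Invalid request body",
    "details": [
        {
            "code": "invalid_type",
            "expected": "string",
            "received": "undefined",
            "path": ["items", 0, "pronunciation"],
            "message": "Required",
        }
    ],
}
print(format_validation_errors(example))
```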

Next Steps