
Overview

AI Enhancement uses large language models (LLMs) to intelligently post-process your transcriptions, cleaning up grammar, fixing punctuation, and transforming raw voice input into polished, professional text.
AI Enhancement is optional and requires an internet connection. Local transcription always works offline.

How It Works

1. Local Transcription

VoiceTypr transcribes your voice locally using Whisper or Parakeet models.

2. AI Processing

If enhancement is enabled, the raw transcription is sent to your selected AI provider.

3. Smart Enhancement

The LLM applies intelligent corrections:
  • Grammar and spelling fixes
  • Punctuation and capitalization
  • Semantic improvements
  • Format transformation (based on preset)

4. Automatic Insertion

The enhanced text is automatically inserted at your cursor position.
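
The flow above can be sketched in a few lines of TypeScript. This is a minimal illustration, not VoiceTypr's actual code: `Enhancer`, `PipelineOptions`, and `processTranscription` are hypothetical names, and the provider call is passed in as a function so the sketch stays self-contained.

```typescript
// Hypothetical types: none of these names come from VoiceTypr's source.
type Enhancer = (rawText: string) => Promise<string>;

interface PipelineOptions {
  enhancementEnabled: boolean;
  enhance?: Enhancer; // call to the selected AI provider (needs network)
}

// Steps 2-4: optionally enhance the local transcription, then return
// the text that would be inserted at the cursor.
async function processTranscription(
  rawText: string,
  options: PipelineOptions
): Promise<string> {
  if (!options.enhancementEnabled || !options.enhance) {
    // Enhancement is off: the raw local transcription is used as-is.
    return rawText;
  }
  return options.enhance(rawText);
}
```

With enhancement disabled, the function never touches the network, which matches the note that local transcription always works offline.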

Supported Providers

VoiceTypr supports multiple AI providers to give you flexibility and choice:

OpenAI

GPT-4o, GPT-4o Mini, and GPT-4 Turbo models for high-quality enhancement.

Google Gemini

Gemini 2.0 Flash and Gemini 1.5 Flash for fast, accurate processing.

Anthropic

Claude Sonnet, Haiku, and Opus models (configured via custom provider).

Groq

Ultra-fast inference with Llama and Mixtral models (configured via custom provider).

Provider Configuration

Each provider is defined with:
interface AIProviderConfig {
  id: string;          // Provider identifier
  name: string;        // Display name
  color: string;       // UI theme color
  apiKeyUrl: string;   // Where to get API key
  isCustom?: boolean;  // OpenAI-compatible custom provider
}
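
As an illustration, a provider list built on this interface might look like the following. The `id` values, colors, and the `findProvider` helper are assumptions made for the example, not values taken from VoiceTypr.

```typescript
interface AIProviderConfig {
  id: string;          // Provider identifier
  name: string;        // Display name
  color: string;       // UI theme color
  apiKeyUrl: string;   // Where to get API key
  isCustom?: boolean;  // OpenAI-compatible custom provider
}

// Illustrative entries only: ids, colors, and URLs are assumptions.
const providers: AIProviderConfig[] = [
  {
    id: "openai",
    name: "OpenAI",
    color: "#10a37f",
    apiKeyUrl: "https://platform.openai.com/api-keys",
  },
  {
    id: "gemini",
    name: "Google Gemini",
    color: "#4285f4",
    apiKeyUrl: "https://aistudio.google.com/app/apikey",
  },
  {
    id: "custom",
    name: "Custom (OpenAI-compatible)",
    color: "#888888",
    apiKeyUrl: "", // depends on the endpoint the user configures
    isCustom: true,
  },
];

// Hypothetical helper: look up a provider by its identifier.
function findProvider(id: string): AIProviderConfig | undefined {
  return providers.find((p) => p.id === id);
}
```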

Enhancement Presets

VoiceTypr includes four enhancement modes for different use cases:

Default Mode

Clean, natural text with grammar and punctuation fixes.
What it does:
  • Removes fillers and false starts
  • Fixes grammar, spelling, and punctuation
  • Normalizes capitalization and spacing
  • Resolves self-corrections (“last-intent wins”)
  • Handles dictation commands when explicitly said
Example:
Input:
um so I was thinking we should maybe uh schedule the meeting for 
tuesday no wait actually wednesday would be better
Output:
I was thinking we should schedule the meeting for Wednesday.
Best for: General dictation, notes, messages

Enhancement Implementation

Enhancement presets are defined in the backend:
// src-tauri/src/ai/prompts.rs:103
pub enum EnhancementPreset {
    Default,
    Prompts,
    Email,
    Commit,
}

pub struct EnhancementOptions {
    pub preset: EnhancementPreset,
}
Each preset combines a base prompt (with grammar/semantic fixes) with a mode-specific transformation layer.
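
The layering idea can be sketched as follows. The prompt strings are placeholders, since the real prompt text in `prompts.rs` is not shown here, and treating Default as "base prompt only" is an assumption.

```typescript
type EnhancementPreset = "Default" | "Prompts" | "Email" | "Commit";

// Placeholder text: the real base prompt is not reproduced in this document.
const BASE_PROMPT = "Fix grammar, spelling, and punctuation.";

// Placeholder mode-specific layers, one per preset.
const PRESET_LAYERS: Record<EnhancementPreset, string> = {
  Default: "", // assumption: Default adds nothing on top of the base prompt
  Prompts: "Rewrite the result as a clear prompt for an AI assistant.",
  Email: "Format the result as an email.",
  Commit: "Format the result as a commit message.",
};

// Combine the shared base prompt with the preset's transformation layer.
function buildPrompt(preset: EnhancementPreset): string {
  const layer = PRESET_LAYERS[preset];
  return layer ? `${BASE_PROMPT}\n\n${layer}` : BASE_PROMPT;
}
```

Keeping the grammar and semantic fixes in one shared base means every preset gets the same cleanup, and only the final formatting step differs.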

Setting Up AI Enhancement

1. Navigate to Enhancements

Open VoiceTypr and go to the Enhancements tab.

2. Choose a Provider

Browse available providers:
  • OpenAI
  • Google Gemini
  • Custom (OpenAI-compatible)

3. Add API Key

Click Connect on your chosen provider and enter your API key.

4. Select a Model

Choose which model to use from your provider’s available models.

5. Enable Enhancement

Toggle AI Enhancement ON to activate it for all transcriptions.

6. Choose Preset

Select your preferred enhancement preset (Default, Prompts, Email, or Commit).

API Key Management

Secure Storage

API keys are stored securely using the system keyring:
  • macOS: Keychain
  • Windows: Credential Manager
// API key utilities (src/utils/keyring.ts)
saveApiKey(keyId: string, apiKey: string)  // Save to system keyring
getApiKey(keyId: string)                    // Retrieve from keyring  
removeApiKey(keyId: string)                 // Delete from keyring
hasApiKey(keyId: string)                    // Check if key exists
Never commit API keys to version control or share them publicly. VoiceTypr stores them securely in your system’s credential store.
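
To show how these four utilities fit together, here is a small sketch that uses an in-memory `Map` as a stand-in for the system keyring. The function names match the list above, but the bodies, return types, and synchronous signatures are assumptions; the real utilities talk to the OS credential store and may well be asynchronous.

```typescript
// In-memory stand-in for the system keyring, for illustration only.
const store = new Map<string, string>();

function saveApiKey(keyId: string, apiKey: string): void {
  store.set(keyId, apiKey); // overwrites any existing key for this id
}

function getApiKey(keyId: string): string | null {
  return store.get(keyId) ?? null;
}

function removeApiKey(keyId: string): void {
  store.delete(keyId);
}

function hasApiKey(keyId: string): boolean {
  return store.has(keyId);
}

// Typical lifecycle of a provider key (the key values are fake):
saveApiKey("openai", "sk-example-1"); // connect
saveApiKey("openai", "sk-example-2"); // update replaces the old key
const current = getApiKey("openai");  // "sk-example-2"
removeApiKey("openai");               // disconnect
```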

Managing Keys

You can update or remove API keys at any time:
  • Update: Click “Update Key” on a connected provider
  • Remove: Click “Disconnect” to remove the API key

Custom Provider (OpenAI-Compatible)

The Custom provider option allows you to use any OpenAI-compatible API:
Groq provides ultra-fast inference with open source models.
Configuration:
  • Base URL: https://api.groq.com/openai/v1
  • API Key: Get from console.groq.com
  • Models: llama-3.3-70b-versatile, mixtral-8x7b-32768, etc.
Benefits:
  • Extremely fast inference (~500 tokens/sec)
  • Cost-effective
  • Open source models

Custom Configuration

// OpenAI-compatible config
interface OpenAIConfig {
  baseUrl: string;  // API endpoint
}

// Example: Groq configuration
await invoke('save_openai_config', {
  args: {
    baseUrl: 'https://api.groq.com/openai/v1'
  }
});
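
For context, "OpenAI-compatible" generally means the endpoint accepts chat completion requests at `{baseUrl}/chat/completions` with a bearer token. The sketch below builds such a request without sending it. `buildChatRequest` and `ChatRequest` are hypothetical names, and how VoiceTypr assembles its requests internally is not shown in this document.

```typescript
interface ChatRequest {
  url: string;
  headers: Record<string, string>;
  body: string;
}

// Build (but do not send) a chat completion request for an
// OpenAI-compatible API. All parameter names are illustrative.
function buildChatRequest(
  baseUrl: string,
  apiKey: string,
  model: string,
  systemPrompt: string,
  text: string
): ChatRequest {
  return {
    // Strip trailing slashes so the path is joined cleanly.
    url: `${baseUrl.replace(/\/+$/, "")}/chat/completions`,
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify({
      model,
      messages: [
        { role: "system", content: systemPrompt },
        { role: "user", content: text },
      ],
    }),
  };
}

// Example with the Groq base URL and a model from the list above.
const groqRequest = buildChatRequest(
  "https://api.groq.com/openai/v1",
  "example-key",
  "llama-3.3-70b-versatile",
  "Fix grammar and punctuation.",
  "um so lets meet wednesday"
);
```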

Language Support

AI Enhancement respects your selected transcription language:
language: 'en'  // ISO 639-1 code
The enhancement prompt automatically adapts to output in the correct language:
// src-tauri/src/ai/prompts.rs:97
fn build_base_prompt(language: Option<&str>) -> String {
    let lang_name = language.map(get_language_name).unwrap_or("English");
    BASE_PROMPT_TEMPLATE.replace("{language}", lang_name)
}
Supported languages include English, Spanish, French, German, Japanese, Chinese, and 80+ more.
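
The same substitution can be mirrored in TypeScript to make the behavior easier to follow. The template text and the six-entry language table are placeholders, and falling back to English for an unknown code is an assumption about `get_language_name`, not something confirmed by the snippet above.

```typescript
// Small illustrative subset; the real table covers many more languages.
const LANGUAGE_NAMES: Record<string, string> = {
  en: "English",
  es: "Spanish",
  fr: "French",
  de: "German",
  ja: "Japanese",
  zh: "Chinese",
};

// Placeholder template: the real BASE_PROMPT_TEMPLATE is not shown here.
const BASE_PROMPT_TEMPLATE = "Clean up the text and reply in {language}.";

// Assumption: unknown codes fall back to English.
function getLanguageName(code: string): string {
  return LANGUAGE_NAMES[code] ?? "English";
}

// Mirrors build_base_prompt: no language selected means English.
function buildBasePrompt(language?: string): string {
  const langName = language ? getLanguageName(language) : "English";
  return BASE_PROMPT_TEMPLATE.replace("{language}", langName);
}
```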

Troubleshooting

Authentication Errors

If you see “AI authentication error”:
  1. Verify your API key is correct
  2. Check the API key has sufficient credits/quota
  3. Update or re-enter the API key in Enhancements tab
  4. Ensure your API key has the necessary permissions

Enhancement Errors

If enhancement fails:
  1. Check your internet connection
  2. Verify the selected model is available
  3. Try a different model or provider
  4. Check provider status page for outages
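
If you script around enhancement yourself, the "try a different provider" advice could be automated along these lines. This is a sketch under assumptions: `EnhanceFn` and `enhanceWithFallback` are hypothetical names, and nothing here implies that VoiceTypr performs automatic fallback.

```typescript
// Hypothetical: one function per configured provider or model.
type EnhanceFn = (rawText: string) => Promise<string>;

interface EnhanceResult {
  text: string;
  enhanced: boolean; // false means the raw transcription was returned
}

// Try each enhancer in order; if all fail, keep the raw transcription.
async function enhanceWithFallback(
  rawText: string,
  enhancers: EnhanceFn[]
): Promise<EnhanceResult> {
  for (const enhance of enhancers) {
    try {
      return { text: await enhance(rawText), enhanced: true };
    } catch {
      // Network error, auth error, or unavailable model: try the next one.
    }
  }
  return { text: rawText, enhanced: false };
}
```

Returning the raw text as the last resort means a provider outage never costs you the dictation itself.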

Slow Enhancement

If enhancement takes too long:
  • Try a faster model (e.g., GPT-4o Mini, Gemini Flash)
  • Use Groq for ultra-fast inference
  • Switch to a smaller enhancement preset
  • Check your network latency

Unexpected Output

If enhanced text doesn’t match expectations:
  • Try a different enhancement preset
  • Speak more clearly and with better grammar
  • Use a different AI model
  • Disable enhancement and use raw transcription

Privacy Considerations

When AI Enhancement is enabled, your transcribed text is sent to the selected AI provider over the internet. Your voice audio never leaves your device; only the text transcription is sent.
What is sent:
  • Transcribed text only
  • Language setting
  • Enhancement preset selection
What is NOT sent:
  • Voice audio recordings
  • Transcription history
  • Personal settings
Data retention varies by provider.

Best Practices

Tip: Start with the Default preset and only switch to specialized presets (Email, Commit, Prompts) when you specifically need that format.
  1. Choose the right preset for your workflow
  2. Speak naturally - the AI handles grammar fixes
  3. Use faster models (Flash, Mini) for everyday enhancement
  4. Use advanced models (GPT-4o, Gemini Pro) for complex content
  5. Test different providers to find the best fit for your needs
  6. Monitor API costs if using paid providers
  7. Keep API keys secure - never share them

Cost Considerations

Most AI providers charge per token:
  • OpenAI: ~$0.15-2.50 per 1M tokens (varies by model)
  • Google Gemini: Free tier available, then pay-as-you-go
  • Groq: Generous free tier, very cost-effective
  • Local LLMs: Free after setup (requires capable hardware)
A typical voice transcription enhancement uses 100-500 tokens, costing less than $0.01 with most providers.
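
The arithmetic behind that estimate is simple enough to check yourself. The helper below is a hypothetical convenience function; the prices passed in are the approximate figures quoted above, not live pricing.

```typescript
// Cost in USD for a request, given a per-million-token price.
function estimateCostUsd(tokens: number, pricePerMillionUsd: number): number {
  return (tokens / 1_000_000) * pricePerMillionUsd;
}

// Upper end of the ranges above: 500 tokens at $2.50 per 1M tokens.
const highEstimate = estimateCostUsd(500, 2.5); // about $0.00125

// Lower end: 100 tokens at $0.15 per 1M tokens.
const lowEstimate = estimateCostUsd(100, 0.15); // about $0.000015
```

Even at the high end, a single enhancement costs a small fraction of a cent, so cost mostly matters at high volume.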