Overview
AI Enhancement uses large language models (LLMs) to intelligently post-process your transcriptions, cleaning up grammar, fixing punctuation, and transforming raw voice input into polished, professional text.AI Enhancement is optional and requires an internet connection. Local transcription always works offline.
How It Works
AI Processing
If enhancement is enabled, the raw transcription is sent to your selected AI provider.
Smart Enhancement
The LLM applies intelligent corrections:
- Grammar and spelling fixes
- Punctuation and capitalization
- Semantic improvements
- Format transformation (based on preset)
Supported Providers
VoiceTypr supports multiple AI providers to give you flexibility and choice:OpenAI
GPT-4o, GPT-4o Mini, and GPT-4 Turbo models for high-quality enhancement.
Google Gemini
Gemini 2.0 Flash and Gemini 1.5 Flash for fast, accurate processing.
Anthropic
Claude Sonnet, Haiku, and Opus models (configured via custom provider).
Groq
Ultra-fast inference with Llama and Mixtral models (configured via custom provider).
Provider Configuration
Each provider is defined with:Enhancement Presets
VoiceTypr includes four enhancement modes for different use cases:- Default
- Prompts
- Email
- Commit
Default Mode
Clean, natural text with grammar and punctuation fixes.What it does:- Removes fillers and false starts
- Fixes grammar, spelling, and punctuation
- Normalizes capitalization and spacing
- Resolves self-corrections (“last-intent wins”)
- Handles dictation commands when explicitly said
Enhancement Implementation
Enhancement presets are defined in the backend:Setting Up AI Enhancement
Add API Key
Click Connect on your chosen provider and enter your API key.
Where to get API keys
Where to get API keys
- OpenAI: platform.openai.com/api-keys
- Google Gemini: aistudio.google.com/apikey
- Custom: Depends on your provider (Groq, Anthropic, etc.)
API Key Management
Secure Storage
API keys are stored securely using the system keyring:- macOS: Keychain
- Windows: Credential Manager
Managing Keys
You can update or remove API keys at any time:- Update: Click “Update Key” on a connected provider
- Remove: Click “Disconnect” to remove the API key
Custom Provider (OpenAI-Compatible)
The Custom provider option allows you to use any OpenAI-compatible API:- Groq
- Anthropic
- Local LLMs
Groq provides ultra-fast inference with open source models.Configuration:
- Base URL:
https://api.groq.com/openai/v1 - API Key: Get from console.groq.com
- Models:
llama-3.3-70b-versatile,mixtral-8x7b-32768, etc.
- Extremely fast inference (~500 tokens/sec)
- Cost-effective
- Open source models
Custom Configuration
Language Support
AI Enhancement respects your selected transcription language:Troubleshooting
Authentication Errors
If you see “AI authentication error”:- Verify your API key is correct
- Check the API key has sufficient credits/quota
- Update or re-enter the API key in Enhancements tab
- Ensure your API key has the necessary permissions
Enhancement Errors
If enhancement fails:- Check your internet connection
- Verify the selected model is available
- Try a different model or provider
- Check provider status page for outages
Slow Enhancement
If enhancement takes too long:- Try a faster model (e.g., GPT-4o Mini, Gemini Flash)
- Use Groq for ultra-fast inference
- Switch to a smaller enhancement preset
- Check your network latency
Unexpected Output
If enhanced text doesn’t match expectations:- Try a different enhancement preset
- Speak more clearly and with better grammar
- Use a different AI model
- Disable enhancement and use raw transcription
Privacy Considerations
What is sent:- Transcribed text only
- Language setting
- Enhancement preset selection
- Voice audio recordings
- Transcription history
- Personal settings
- OpenAI: 30 days via API
- Google: Check Gemini API terms
- Custom: Depends on your provider
Best Practices
- Choose the right preset for your workflow
- Speak naturally - the AI handles grammar fixes
- Use faster models (Flash, Mini) for everyday enhancement
- Use advanced models (GPT-4o, Gemini Pro) for complex content
- Test different providers to find the best fit for your needs
- Monitor API costs if using paid providers
- Keep API keys secure - never share them
Cost Considerations
Most AI providers charge per token:- OpenAI: ~$0.15-2.50 per 1M tokens (varies by model)
- Google Gemini: Free tier available, then pay-as-you-go
- Groq: Generous free tier, very cost-effective
- Local LLMs: Free after setup (requires capable hardware)
A typical voice transcription enhancement uses 100-500 tokens, costing less than $0.01 with most providers.