AI Enhancement

Overview

AI Enhancement uses large language models (LLMs) to intelligently post-process your transcriptions, cleaning up grammar, fixing punctuation, and transforming raw voice input into polished, professional text.

AI Enhancement is optional and requires an internet connection. Local transcription always works offline.

How It Works

Local Transcription

VoiceTypr transcribes your voice locally using Whisper or Parakeet models.

AI Processing

If enhancement is enabled, the raw transcription is sent to your selected AI provider.

Smart Enhancement

The LLM applies intelligent corrections:

Grammar and spelling fixes
Punctuation and capitalization
Semantic improvements
Format transformation (based on preset)

Automatic Insertion

The enhanced text is automatically inserted at your cursor position.
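
The four steps above can be sketched as a small pipeline. This is an illustrative outline only, not VoiceTypr's actual code: `DictationSteps`, `processDictation`, and the three injected functions are hypothetical names chosen for this example.

```typescript
// Hypothetical sketch of the four-step flow; none of these names come from VoiceTypr.
interface DictationSteps {
  transcribeLocally: (audio: Uint8Array) => Promise<string>; // step 1: on-device model
  enhanceWithProvider: (rawText: string) => Promise<string>; // steps 2-3: LLM call over the network
  insertAtCursor: (text: string) => Promise<void>;           // step 4
}

async function processDictation(
  audio: Uint8Array,
  enhancementEnabled: boolean,
  steps: DictationSteps,
): Promise<string> {
  const rawText = await steps.transcribeLocally(audio);
  // Only text is passed to the provider, and only when enhancement is on.
  const finalText = enhancementEnabled
    ? await steps.enhanceWithProvider(rawText)
    : rawText;
  await steps.insertAtCursor(finalText);
  return finalText;
}
```

Passing the steps in as functions keeps the network-dependent part isolated, which mirrors the point that local transcription keeps working when enhancement is off or unavailable.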

Supported Providers

VoiceTypr supports multiple AI providers to give you flexibility and choice:

OpenAI

GPT-4o, GPT-4o Mini, and GPT-4 Turbo models for high-quality enhancement.

Google Gemini

Gemini 2.0 Flash and Gemini 1.5 Flash for fast, accurate processing.

Anthropic

Claude Sonnet, Haiku, and Opus models (configured via custom provider).

Groq

Ultra-fast inference with Llama and Mixtral models (configured via custom provider).

Provider Configuration

Each provider is defined with:

interface AIProviderConfig {
  id: string;          // Provider identifier
  name: string;        // Display name
  color: string;       // UI theme color
  apiKeyUrl: string;   // Where to get API key
  isCustom?: boolean;  // OpenAI-compatible custom provider
}
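
For illustration, entries conforming to this interface might look like the following. The `apiKeyUrl` values match the API key links listed in the setup section of this page; the `id` values, colors, and the `findProvider` helper are assumptions made for this sketch, not values taken from VoiceTypr's source.

```typescript
interface AIProviderConfig {
  id: string;
  name: string;
  color: string;
  apiKeyUrl: string;
  isCustom?: boolean;
}

// Example entries; ids and colors are placeholders, not the app's real values.
const exampleProviders: AIProviderConfig[] = [
  { id: "openai", name: "OpenAI", color: "#10a37f", apiKeyUrl: "https://platform.openai.com/api-keys" },
  { id: "gemini", name: "Google Gemini", color: "#4285f4", apiKeyUrl: "https://aistudio.google.com/apikey" },
  // Custom providers have no fixed key page, so the URL is left empty here.
  { id: "custom", name: "Custom (OpenAI-compatible)", color: "#6b7280", apiKeyUrl: "", isCustom: true },
];

// Hypothetical lookup helper: returns undefined for unknown ids.
function findProvider(id: string): AIProviderConfig | undefined {
  return exampleProviders.find((provider) => provider.id === id);
}
```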

Enhancement Presets

VoiceTypr includes four enhancement modes for different use cases:

Default
Prompts
Email
Commit

Default Mode

Clean, natural text with grammar and punctuation fixes.

What it does:

Removes fillers and false starts
Fixes grammar, spelling, and punctuation
Normalizes capitalization and spacing
Resolves self-corrections (“last-intent wins”)
Handles dictation commands when explicitly said

Example:

Input:

um so I was thinking we should maybe uh schedule the meeting for 
tuesday no wait actually wednesday would be better

Output:

I was thinking we should schedule the meeting for Wednesday.

Best for: General dictation, notes, messages

Prompts Mode

Transform speech into clear, actionable AI prompts.

What it does:

Classifies as Request, Question, or Task
Adds essential missing context (what/how/why)
Includes constraints and success criteria
Specifies output format when helpful
Preserves all technical details

Example:

Input:

write a function that sorts an array

Output:

Write a JavaScript function that sorts an array of numbers in ascending 
order. Return a new sorted array without modifying the original.

Best for: Creating prompts for ChatGPT, Claude, or other AI assistants

Email Mode

Format dictation as professional email messages.

What it does:

Generates specific, action-oriented subject line
Adds appropriate greeting (Hi/Dear/Hello)
Structures body into short paragraphs
Leads with key information or request
Includes action items and deadlines when present
Matches tone (formal/casual) to input
Adds appropriate closing and signature placeholder

Example:

Input:

hey john just wanted to check if you got those files I sent 
yesterday we need them for the meeting on friday

Output:

Subject: Following up on files for Friday meeting

Hi John,

I wanted to check if you received the files I sent yesterday. 
We'll need them for our meeting on Friday.

Thanks,
[Your Name]

Best for: Drafting emails, messages, correspondence

Commit Mode

Generate conventional commit messages for Git.

What it does:

Converts speech to conventional commit format
Selects appropriate type (feat, fix, docs, etc.)
Uses the imperative present tense ("add", not "added") with no trailing period
Keeps under 72 characters
Adds ! for breaking changes

Example:

Input:

I added a new dark mode feature to the settings page

Output:

feat(settings): add dark mode toggle

Available types: feat, fix, docs, style, refactor, perf, test, chore, build, ci

Best for: Creating Git commit messages from voice
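
The rules listed for Commit Mode can be expressed as a small checker, which may be useful for sanity-checking enhanced output before committing. This is a sketch under stated assumptions: `checkCommitHeader` is a hypothetical helper, not part of VoiceTypr, and it only checks what is mechanically checkable (type, optional scope, `!`, length, trailing period), not tense.

```typescript
const COMMIT_TYPES = ["feat", "fix", "docs", "style", "refactor", "perf", "test", "chore", "build", "ci"] as const;

// type, optional (scope), optional "!", then ": " and a description
const HEADER_PATTERN = new RegExp(`^(${COMMIT_TYPES.join("|")})(\\([^()\\s]+\\))?(!)?: (.+)$`);

interface CommitCheck {
  ok: boolean;
  breaking: boolean;
  problems: string[];
}

function checkCommitHeader(header: string): CommitCheck {
  const problems: string[] = [];
  const match = HEADER_PATTERN.exec(header);
  if (match === null) problems.push("does not match type(scope)!: description");
  // "Under 72 characters" is read literally here: 72 or more is flagged.
  if (header.length >= 72) problems.push("must be under 72 characters");
  if (header.endsWith(".")) problems.push("must not end with a period");
  return {
    ok: problems.length === 0,
    breaking: match !== null && match[3] === "!",
    problems,
  };
}
```

With this sketch, `checkCommitHeader("feat(settings): add dark mode toggle")` reports no problems, while a header with a trailing period or an unknown type is flagged.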

Enhancement Implementation

Enhancement presets are defined in the backend:

// src-tauri/src/ai/prompts.rs:103
pub enum EnhancementPreset {
    Default,
    Prompts,
    Email,
    Commit,
}

pub struct EnhancementOptions {
    pub preset: EnhancementPreset,
}

Each preset combines a base prompt (with grammar/semantic fixes) with a mode-specific transformation layer.
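
The layering idea can be sketched in TypeScript as follows. The prompt strings below are placeholders written for this example (the real prompts live in the Rust backend), and treating Default as "base prompt only" is an assumption rather than something the source states.

```typescript
type EnhancementPreset = "Default" | "Prompts" | "Email" | "Commit";

// Placeholder text standing in for the shared grammar/semantic instructions.
const BASE_PROMPT = "Fix grammar, spelling, and punctuation. Remove fillers and false starts.";

// Placeholder mode-specific layers; Default is assumed to add nothing.
const MODE_LAYERS: Record<EnhancementPreset, string> = {
  Default: "",
  Prompts: "Rewrite the text as a clear, actionable AI prompt.",
  Email: "Format the text as an email with subject, greeting, body, and closing.",
  Commit: "Rewrite the text as a conventional commit message.",
};

// Base prompt first, then the preset's transformation layer (if any).
function buildPrompt(preset: EnhancementPreset): string {
  const layer = MODE_LAYERS[preset];
  return layer.length > 0 ? `${BASE_PROMPT}\n\n${layer}` : BASE_PROMPT;
}
```

Keeping the base prompt shared means a fix to the grammar rules applies to every preset, while each mode only has to describe its own formatting.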

Setting Up AI Enhancement

Navigate to Enhancements

Open VoiceTypr and go to the Enhancements tab.

Choose a Provider

Browse available providers:

OpenAI
Google Gemini
Custom (OpenAI-compatible)

Add API Key

Click Connect on your chosen provider and enter your API key.

Where to get API keys

OpenAI: platform.openai.com/api-keys
Google Gemini: aistudio.google.com/apikey
Custom: Depends on your provider (Groq, Anthropic, etc.)

Select a Model

Choose which model to use from your provider’s available models.

Enable Enhancement

Toggle AI Enhancement ON to activate it for all transcriptions.

Choose Preset

Select your preferred enhancement preset (Default, Prompts, Email, or Commit).

API Key Management

Secure Storage

API keys are stored securely using the system keyring:

macOS: Keychain
Windows: Credential Manager

// API key utilities (src/utils/keyring.ts)
saveApiKey(keyId: string, apiKey: string)  // Save to system keyring
getApiKey(keyId: string)                    // Retrieve from keyring  
removeApiKey(keyId: string)                 // Delete from keyring
hasApiKey(keyId: string)                    // Check if key exists

Never commit API keys to version control or share them publicly. VoiceTypr stores them securely in your system’s credential store.
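
As a rough sketch of how the four utilities might be used together in a connect flow: the `KeyStore` interface below assumes the utilities return Promises (the listing above does not show return types), and `createMemoryKeyStore` and `connectProvider` are hypothetical helpers. The in-memory store exists only so the example runs without a system keyring.

```typescript
// Assumed shape of the four utilities; the Promise return types are guesses.
interface KeyStore {
  saveApiKey(keyId: string, apiKey: string): Promise<void>;
  getApiKey(keyId: string): Promise<string | null>;
  removeApiKey(keyId: string): Promise<void>;
  hasApiKey(keyId: string): Promise<boolean>;
}

// In-memory stand-in for the system keyring, for illustration only.
function createMemoryKeyStore(): KeyStore {
  const keys = new Map<string, string>();
  return {
    async saveApiKey(keyId, apiKey) { keys.set(keyId, apiKey); },
    async getApiKey(keyId) { return keys.get(keyId) ?? null; },
    async removeApiKey(keyId) { keys.delete(keyId); },
    async hasApiKey(keyId) { return keys.has(keyId); },
  };
}

// Connect flow: trim pasted input, reject empty keys, store, then confirm.
async function connectProvider(store: KeyStore, keyId: string, pastedKey: string): Promise<boolean> {
  const apiKey = pastedKey.trim();
  if (apiKey.length === 0) return false;
  await store.saveApiKey(keyId, apiKey);
  return store.hasApiKey(keyId);
}
```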

Managing Keys

You can update or remove API keys at any time:

Update: Click “Update Key” on a connected provider
Remove: Click “Disconnect” to remove the API key

Custom Provider (OpenAI-Compatible)

The Custom provider option allows you to use any OpenAI-compatible API:

Groq
Anthropic
Local LLMs

Groq provides ultra-fast inference with open source models.

Configuration:

Base URL: https://api.groq.com/openai/v1
API Key: Get from console.groq.com
Models: llama-3.3-70b-versatile, mixtral-8x7b-32768, etc.

Benefits:

Extremely fast inference (~500 tokens/sec)
Cost-effective
Open source models

Anthropic Claude via OpenAI-compatible endpoint.

Configuration:

Base URL: Varies by integration (use proxy or wrapper)
API Key: Your Anthropic API key
Models: claude-3-5-sonnet, claude-3-haiku, etc.

Anthropic doesn’t natively support OpenAI format. Use a proxy service or wait for native Anthropic support in a future update.

Local LLM servers like LM Studio, Ollama, or text-generation-webui.

Configuration:

Base URL: http://localhost:1234/v1 (or your server’s URL)
API Key: Usually not required (use dummy value)
Models: Depends on what you have loaded

Benefits:

100% private and offline (after transcription)
No API costs
Full control over models

Custom Configuration

// OpenAI-compatible config
interface OpenAIConfig {
  baseUrl: string;  // API endpoint
}

// Example: Groq configuration
await invoke('save_openai_config', {
  args: {
    baseUrl: 'https://api.groq.com/openai/v1'
  }
});
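
To make "OpenAI-compatible" concrete: such servers accept a POST to `{baseUrl}/chat/completions` with a bearer token and a JSON body containing `model` and `messages`. The sketch below only builds that request and does not send it. `buildChatRequest` and the system prompt text are hypothetical, and this is not necessarily how VoiceTypr issues the call internally.

```typescript
interface ChatRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

function buildChatRequest(baseUrl: string, apiKey: string, model: string, rawText: string): ChatRequest {
  // Tolerate a trailing slash in the configured base URL.
  const url = `${baseUrl.replace(/\/+$/, "")}/chat/completions`;
  return {
    url,
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify({
      model,
      messages: [
        { role: "system", content: "Clean up this dictated text." }, // placeholder prompt
        { role: "user", content: rawText },
      ],
    }),
  };
}

// Example with the Groq base URL and model name from this page.
const groqRequest = buildChatRequest(
  "https://api.groq.com/openai/v1",
  "YOUR_API_KEY",
  "llama-3.3-70b-versatile",
  "um hello world",
);
```

The same function works for a local server by swapping the base URL (for example `http://localhost:1234/v1`) and passing a dummy key.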

Language Support

AI Enhancement respects your selected transcription language:

language: 'en'  // ISO 639-1 code

The enhancement prompt automatically adapts to output in the correct language:

// src-tauri/src/ai/prompts.rs:97
fn build_base_prompt(language: Option<&str>) -> String {
    let lang_name = language.map(get_language_name).unwrap_or("English");
    BASE_PROMPT_TEMPLATE.replace("{language}", lang_name)
}

Supported languages include English, Spanish, French, German, Japanese, Chinese, and 80+ more.
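
A TypeScript rendering of the same idea, for readers less familiar with Rust. The language table here is a small illustrative subset, the template text is a placeholder, and falling back to English for unknown codes is an assumption (the Rust snippet above only shows the fallback for a missing language).

```typescript
// Illustrative subset of ISO 639-1 codes; the real table covers many more.
const LANGUAGE_NAMES: Record<string, string> = {
  en: "English",
  es: "Spanish",
  fr: "French",
  de: "German",
  ja: "Japanese",
  zh: "Chinese",
};

// Placeholder template; the real one lives in the Rust backend.
const BASE_PROMPT_TEMPLATE = "Clean up the transcription. Respond in {language}.";

function buildBasePrompt(language?: string): string {
  // Missing or unrecognized codes fall back to English in this sketch.
  const known = language !== undefined ? LANGUAGE_NAMES[language] : undefined;
  const langName = known ?? "English";
  return BASE_PROMPT_TEMPLATE.replace("{language}", langName);
}
```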

Troubleshooting

Authentication Errors

If you see “AI authentication error”:

Verify your API key is correct
Check the API key has sufficient credits/quota
Update or re-enter the API key in Enhancements tab
Ensure your API key has the necessary permissions

Enhancement Errors

If enhancement fails:

Check your internet connection
Verify the selected model is available
Try a different model or provider
Check provider status page for outages

Slow Enhancement

If enhancement takes too long:

Try a faster model (e.g., GPT-4o Mini, Gemini Flash)
Use Groq for ultra-fast inference
Switch to a simpler enhancement preset, such as Default
Check your network latency

Unexpected Output

If enhanced text doesn’t match expectations:

Try a different enhancement preset
Speak more clearly and in complete sentences
Use a different AI model
Disable enhancement and use raw transcription

Privacy Considerations

When AI Enhancement is enabled, your transcribed text is sent to the selected AI provider over the internet. Your voice audio never leaves your device; only the text transcription is sent.

What is sent:

Transcribed text only
Language setting
Enhancement preset selection

What is NOT sent:

Voice audio recordings
Transcription history
Personal settings

Data retention varies by provider:

OpenAI: 30 days via API
Google: Check Gemini API terms
Custom: Depends on your provider

Best Practices

Tip: Start with the Default preset and only switch to specialized presets (Email, Commit, Prompts) when you specifically need that format.

Choose the right preset for your workflow
Speak naturally - the AI handles grammar fixes
Use faster models (Flash, Mini) for everyday enhancement
Use advanced models (GPT-4o, Gemini Pro) for complex content
Test different providers to find the best fit for your needs
Monitor API costs if using paid providers
Keep API keys secure - never share them

Cost Considerations

Most AI providers charge per token:

OpenAI: ~$0.15-2.50 per 1M tokens (varies by model)
Google Gemini: Free tier available, then pay-as-you-go
Groq: Generous free tier, very cost-effective
Local LLMs: Free after setup (requires capable hardware)

A typical voice transcription enhancement uses 100-500 tokens, costing less than $0.01 with most providers.
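
The arithmetic behind that estimate is simply tokens multiplied by the per-million price. The helper below is a hypothetical sketch using the price range quoted above; it ignores the fact that many providers price input and output tokens differently.

```typescript
// Cost in USD for one request, given a price quoted per one million tokens.
function estimateCostUsd(tokens: number, pricePerMillionUsd: number): number {
  return (tokens * pricePerMillionUsd) / 1_000_000;
}

// Upper end of the ranges above: 500 tokens at $2.50 per 1M tokens.
const upperEstimate = estimateCostUsd(500, 2.5); // 0.00125 USD

// Lower end: 100 tokens at $0.15 per 1M tokens.
const lowerEstimate = estimateCostUsd(100, 0.15); // 0.000015 USD
```

Even at the upper end, a single enhancement costs roughly a tenth of a cent, so about 800 such requests would cost one dollar.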
