owlburtoe/openwhisp

Voice to text, entirely on your machine. Hold Fn, speak, release — transcribed, polished, and pasted. No cloud, no account, no latency.

dictation electron local-ai macos ollama speech-to-text whisper

TypeScript 73.5%
CSS 19.7%
Swift 6.3%
HTML 0.4%

Latest commit: b6be778334 by GMoonblocks, 2026-04-11 21:45:43 +02:00. Overlay toggle, permission panel, fresh permission checks, tray icon. Always-visible overlay (off by default) with compact/active states. Show overlay toggle in Preferences. Permission status panel with Manage in System Settings link. Fresh bootstrap on every Fn press to avoid stale permission cache. Menu bar tray icon with context menu. Sidebar credits with @GiusMarci and raelume.ai links. Microphone grant polling to handle macOS TCC timing delay.
assets	Add cover image, multilingual Whisper, and scaled prompt levels	2026-04-11 16:56:17 +02:00
build/icons	Add menu bar tray icon with context menu	2026-04-11 21:07:21 +02:00
scripts	Initial commit: OpenWhisp local dictation app	2026-04-11 10:43:51 +02:00
src	Overlay toggle, permission panel, fresh permission checks, tray icon	2026-04-11 21:45:43 +02:00
swift	Initial commit: OpenWhisp local dictation app	2026-04-11 10:43:51 +02:00
.gitignore	Update gitignore and README for public release	2026-04-11 14:28:44 +02:00
electron.vite.config.ts	Initial commit: OpenWhisp local dictation app	2026-04-11 10:43:51 +02:00
index.html	Landscape layout, Instrument Serif typography, audio-reactive overlay	2026-04-11 11:18:29 +02:00
package-lock.json	Flow-style color scheme with hugeicons and floating content card	2026-04-11 14:00:35 +02:00
package.json	Flow-style color scheme with hugeicons and floating content card	2026-04-11 14:00:35 +02:00
README.md	Update getting started with Ollama setup before app launch	2026-04-11 21:03:57 +02:00
tsconfig.json	Initial commit: OpenWhisp local dictation app	2026-04-11 10:43:51 +02:00

README.md

OpenWhisp

Voice to text, entirely on your machine. Hold Fn, speak, release: your words are transcribed, polished, and pasted right where you need them. No cloud, no account, no network round trip.

Built in a weekend because I kept getting ads for Wispr Flow and thought — why not build it myself?

How it works

1. Hold Fn: OpenWhisp starts listening
2. Speak: your voice is captured locally
3. Release Fn: Whisper transcribes your speech, a local LLM polishes the text, and the result is pasted into whatever app you were using

The entire pipeline runs locally via Whisper (speech-to-text) and Ollama (text enhancement).
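
The three steps above can be sketched as a small orchestration function. This is only an illustration: the `Stages` interface and `runDictation` are hypothetical names, and the real pipeline in `src/main/dictation.ts` may be organised differently. The stages are injected so the sketch runs without Whisper, Ollama, or the native helper.

```typescript
// Hypothetical stage signatures (assumptions, not the project's actual API).
interface Stages {
  transcribe: (audio: Float32Array) => Promise<string>; // e.g. local Whisper
  polish: (text: string) => Promise<string>;            // e.g. a local LLM served by Ollama
  paste: (text: string) => Promise<void>;               // e.g. the native macOS helper
}

// Runs the three stages in order once Fn is released.
async function runDictation(audio: Float32Array, stages: Stages): Promise<string> {
  const raw = (await stages.transcribe(audio)).trim();
  if (raw.length === 0) return ""; // nothing was said, so skip polishing and pasting
  const polished = await stages.polish(raw);
  await stages.paste(polished);
  return polished;
}
```

Keeping the stages behind plain function types makes it easy to swap a model or test the flow with stubs.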

Features

Fully local — no data leaves your Mac
Styles — switch between Conversation and Vibe Coding modes depending on context
Enhancement levels — from raw transcription (No Filter) to professional polish (High)
Intent resolution — if you change your mind mid-sentence ("make it white... actually, black"), OpenWhisp resolves to your final intent
Auto-paste — refined text is pasted directly into the active app
Auto-launch Ollama — if Ollama is installed, OpenWhisp starts it automatically
Setup wizard — guided first-launch experience for permissions, models, and configuration
Minimal overlay — a small audio-reactive grid appears at the bottom of your screen during dictation
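
An audio-reactive overlay needs some measure of loudness per audio frame. One common choice is the root-mean-square (RMS) level, sketched below. The function names and the `gain` parameter are assumptions made for illustration; the actual metering lives in `src/renderer/audio-recorder.ts` and may use a different measure.

```typescript
// RMS level of one audio frame; samples are expected in the range [-1, 1].
function rmsLevel(samples: Float32Array): number {
  if (samples.length === 0) return 0;
  const sumSquares = samples.reduce((acc, s) => acc + s * s, 0);
  return Math.sqrt(sumSquares / samples.length);
}

// Maps a level to how many of `cells` grid cells to light up.
// `gain` is a made-up tuning knob, since speech RMS is usually well below 1.
function activeCells(level: number, cells: number, gain = 4): number {
  const scaled = Math.min(1, Math.max(0, level * gain));
  return Math.round(scaled * cells);
}
```

In a Web Audio setup, `samples` could come from `AnalyserNode.getFloatTimeDomainData`, called once per animation frame.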

Styles

Style	Use case
Conversation	Messages, emails, notes, everyday writing
Vibe Coding	Developer communication — translates casual speech into proper engineering language

Each style has four enhancement levels: No Filter, Soft, Medium, and High.
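
A style-by-level matrix like this can be represented as lookup tables that are combined into one system prompt. The sketch below is a guess at the general shape only: every prompt string is placeholder wording, the identifiers are hypothetical, and it assumes that No Filter skips the LLM entirely. The project's real prompts are in `src/main/prompts.ts`.

```typescript
type Style = "conversation" | "vibe-coding";
type Level = "no-filter" | "soft" | "medium" | "high";

// Placeholder wording throughout (assumption, not the project's prompts).
const GLOBAL_RULES = "Return only the rewritten text. Keep the speaker's final intent.";

const STYLE_RULES: Record<Style, string> = {
  conversation: "Write naturally, as for a message, email, or note.",
  "vibe-coding": "Use precise engineering terminology.",
};

const LEVEL_RULES: Record<Exclude<Level, "no-filter">, string> = {
  soft: "Fix punctuation and remove filler words only.",
  medium: "Also tighten sentences and fix grammar.",
  high: "Rewrite for a polished, professional tone.",
};

// Returns null for "no-filter", assuming that level bypasses the rewrite step.
function buildSystemPrompt(style: Style, level: Level): string | null {
  if (level === "no-filter") return null;
  return [GLOBAL_RULES, STYLE_RULES[style], LEVEL_RULES[level]].join("\n");
}
```

Using `Record` types means the compiler flags any style or level that is added without a matching prompt.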

Requirements

macOS (Apple Silicon recommended)
Ollama — OpenWhisp auto-launches it if installed
~10 GB disk space for models (~9.6 GB for the text model pulled through Ollama, ~150 MB for the speech model downloaded on first launch)

Getting started

1. Install Ollama and download the text model first:

# Install Ollama from https://ollama.com/download/mac, then:
ollama serve

# In a new terminal, pull the text enhancement model (~9.6 GB)
ollama pull gemma4:e4b

2. Clone and run OpenWhisp:

git clone https://github.com/giusmarci/openwhisp.git
cd openwhisp
npm install
npm run build:native
npm run dev

On first launch, the setup wizard will walk you through:

Ollama — verifies the connection. If Ollama is running, it connects automatically.
Speech model — downloads Whisper Base Multilingual (~150 MB) automatically.
Text model — detects the Gemma 4 model you already pulled.
Permissions — microphone access for recording, plus Accessibility and Input Monitoring for Fn key listening and auto-paste.
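
The Ollama and text-model checks can be approximated with Ollama's `GET /api/tags` endpoint, which lists locally available models. The following is a minimal sketch, not the wizard's actual code: `checkOllama`, `FetchLike`, and the status names are hypothetical, and the base URL assumes Ollama's default port 11434. The fetch function is injected so the logic can be exercised without a running server.

```typescript
// Only the field of the /api/tags response that this sketch reads.
interface TagsResponse {
  models?: { name: string }[];
}

// Minimal fetch-like signature; the global `fetch` satisfies it.
type FetchLike = (url: string) => Promise<{ ok: boolean; json(): Promise<unknown> }>;

type OllamaStatus = "unreachable" | "model-missing" | "ready";

async function checkOllama(
  fetchFn: FetchLike,
  model: string,
  baseUrl = "http://localhost:11434", // Ollama's default address (assumed unchanged)
): Promise<OllamaStatus> {
  try {
    const res = await fetchFn(`${baseUrl}/api/tags`);
    if (!res.ok) return "unreachable";
    const body = (await res.json()) as TagsResponse;
    const found = (body.models ?? []).some((m) => m.name === model);
    return found ? "ready" : "model-missing";
  } catch {
    return "unreachable"; // e.g. connection refused because Ollama is not running
  }
}
```

Calling `checkOllama(fetch, "gemma4:e4b")` would then distinguish "start Ollama" from "pull the model" in a setup screen.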

After setup, click into the text field where you want the text to go (an email, chat, code editor, etc.), then hold Fn and speak. When you release, the transcribed and enhanced text is automatically pasted into that field. If you move away or no text field is selected, the text is still copied to your clipboard — just use Cmd+V to paste it wherever you need.
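
The paste-or-clipboard behaviour described above can be sketched as follows. This assumes auto-paste works by writing to the clipboard and then simulating a paste keystroke, which the README does not confirm. The `Platform` interface and `deliver` function are hypothetical stand-ins for Electron's clipboard and the Swift helper.

```typescript
// Hypothetical platform hooks (assumptions, not the app's real interfaces).
interface Platform {
  writeClipboard(text: string): void;
  isTextFieldFocused(): boolean;
  simulatePaste(): void; // e.g. a synthesized Cmd+V
}

type Delivery = "pasted" | "clipboard-only";

function deliver(text: string, platform: Platform): Delivery {
  platform.writeClipboard(text); // always, so a manual Cmd+V works as a fallback
  if (!platform.isTextFieldFocused()) return "clipboard-only";
  platform.simulatePaste();
  return "pasted";
}
```

Writing to the clipboard first means the text is never lost, even when focus changes between speaking and pasting.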

Default models

Purpose	Model	Size
Speech-to-text	`onnx-community/whisper-base`	~150 MB
Text enhancement	`gemma4:e4b`	~9.6 GB

You can switch to any Ollama-compatible model from the Models page.
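
Switching models is straightforward because Ollama takes the model name as a field in each request. The sketch below builds a non-streaming body for Ollama's `POST /api/generate` endpoint and reads the `response` field of the reply. Whether OpenWhisp uses this endpoint or `/api/chat` is not stated in the README, so treat this as an assumption; the function names are hypothetical.

```typescript
interface GenerateRequest {
  model: string;  // any model name known to the local Ollama instance
  system: string; // rewrite instructions
  prompt: string; // the raw transcript
  stream: false;  // ask for a single JSON reply instead of a stream
}

// Builds the JSON body for POST /api/generate.
function buildGenerateRequest(model: string, system: string, transcript: string): GenerateRequest {
  return { model, system, prompt: transcript, stream: false };
}

// Extracts the generated text from a non-streaming /api/generate reply.
function readGenerateResponse(body: unknown): string {
  const response = (body as { response?: unknown } | null)?.response;
  if (typeof response !== "string") throw new Error("Unexpected Ollama response shape");
  return response.trim();
}
```

The body would be sent as `JSON.stringify(buildGenerateRequest(...))` to `http://localhost:11434/api/generate`, and the parsed JSON reply passed to `readGenerateResponse`.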

Tech stack

Electron + React + TypeScript — desktop shell and UI
@huggingface/transformers — local Whisper inference
Ollama — local LLM inference via API
Swift — native macOS helper for Fn key listening, focus detection, and paste simulation
electron-vite — build tooling
Hugeicons — UI icons

Building for distribution

npm run package

Builds the Electron app, compiles the Swift helper, and packages everything into a .dmg and .zip in the release/ directory.

Project structure

src/
  main/           # Electron main process
    dictation.ts    # Transcription + rewrite pipeline
    ollama.ts       # Ollama API client + auto-launch
    prompts.ts      # Global rules + style + level prompt matrix
    settings.ts     # Settings persistence
    windows.ts      # Window creation and positioning
  renderer/       # React UI
    App.tsx         # Sidebar layout, pages, setup wizard, overlay
    styles.css      # Complete styling
    audio-recorder.ts # Web Audio recorder with level metering
  preload/        # Electron preload bridge
  shared/         # Shared types and constants
swift/
  OpenWhispHelper.swift  # Native macOS helper

License

MIT

Made by Raelume