Give your agent the ability to speak to you in real time. Local text-to-speech, voice cloning, and audio generation on Apple Silicon.

## Requirement Check
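| Check | Expected / Fix |
|---|---|
| `uname -m` | `arm64` (Apple Silicon) |
| `sw_vers` | Shows the macOS version |
| `which sox` | If missing: `brew install sox` |
| `which ffmpeg` | If missing: `brew install ffmpeg` |
| `which pdftotext` | If missing: `brew install poppler` |

## Input

```bash
speak article.txt
speak doc.md
speak "Hello"
pbpaste | speak
cat file.txt | speak
lynx -dump -nolist "https://example.com/article" | speak --output article.wav
```

Convert other formats to plain text first:

```bash
pdftotext doc.pdf doc.txt                     # PDF
textutil -convert txt doc.docx                # Word (writes doc.txt)
pandoc -f html -t plain doc.html > doc.txt    # HTML
```

## Output

```bash
speak text.txt --output file.wav
speak text.txt --stream
speak text.txt --play
speak text.txt --stream --output file.wav
```

With no flags, audio is saved to `~/Audio/speak/` without playback:

```bash
speak article.txt   # → ~/Audio/speak/article.wav (no playback)
speak "Hello"       # → ~/Audio/speak/speak_<timestamp>.wav
```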
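The requirement checks can be bundled into one preflight script. This is a minimal sketch: `check_tool` and `preflight` are hypothetical helper names (not `speak` commands), and `command -v` stands in for `which` because it is the POSIX way to test whether a command exists.

```bash
# Preflight sketch for the requirement checks listed above.
# check_tool and preflight are hypothetical helpers, not speak commands.

# Print "ok: <tool>" if the tool is on PATH, otherwise the brew install hint.
check_tool() {
  local tool="$1" formula="$2"
  if command -v "$tool" >/dev/null 2>&1; then
    echo "ok: $tool"
  else
    echo "missing: $tool (brew install $formula)"
  fi
}

preflight() {
  local arch
  arch=$(uname -m)
  # Apple Silicon reports arm64
  if [ "$arch" = "arm64" ]; then
    echo "ok: arm64"
  else
    echo "not Apple Silicon: uname -m reports $arch"
  fi
  # sw_vers only exists on macOS
  echo "macOS: $(sw_vers -productVersion 2>/dev/null || echo unknown)"
  check_tool sox sox
  check_tool ffmpeg ffmpeg
  check_tool pdftotext poppler
}
```

Run `preflight` once after sourcing the file; each `missing:` line carries the install command from the table.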
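## Directories

| Path | Contents |
|---|---|
| `~/Audio/speak/` | Default output |
| `~/.chatter/voices/` | Voice samples |

Create directories before writing to them:

```bash
mkdir -p ~/.chatter/voices/
mkdir -p ~/Audio/custom/
```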
## Voice Cloning

### Recording a Sample

```bash
# -d = use default microphone
# Recording starts immediately and stops after 25 seconds
sox -d -r 24000 -c 1 ~/.chatter/voices/my_voice.wav trim 0 25
```
### Converting Existing Audio

```bash
# From MP3
ffmpeg -i voice.mp3 -ar 24000 -ac 1 voice.wav

# From M4A (QuickTime)
ffmpeg -i voice.m4a -ar 24000 -ac 1 voice.wav

# Trim to 25 seconds
ffmpeg -i long.wav -t 25 -ar 24000 -ac 1 trimmed.wav

# Check sample properties
ffprobe -i voice.wav 2>&1 | grep -E "Duration|Stream"
# Should show: Duration ~15-25s, 24000 Hz, mono
```

### Using Your Voice

```bash
# Create directory
mkdir -p ~/.chatter/voices/

# Move sample
mv voice.wav ~/.chatter/voices/my_voice.wav

# Test
speak "Testing my voice" --voice ~/.chatter/voices/my_voice.wav --stream

# Use for content
speak notes.txt --voice ~/.chatter/voices/my_voice.wav --output presentation.wav
```
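Before passing a sample to `--voice`, a script can check it against the targets given here (24000 Hz, mono, roughly 15-25 s) and reject relative paths, which the voice path table lists as not working. A sketch under those assumptions: `voice_props_ok` and `check_voice_sample` are hypothetical helpers, and the 15-25 s bounds simply mirror the "~15-25s" guidance.

```bash
# Sketch: sanity-check a voice sample before using it with --voice.
# voice_props_ok and check_voice_sample are hypothetical helpers.

# Pure check on values already read from the file.
# Args: path, sample rate (Hz), channel count, duration (seconds)
voice_props_ok() {
  local path="$1" rate="$2" channels="$3" duration="$4" ok=1
  case "$path" in
    /*) ;;  # absolute; an unquoted ~ path is already expanded by the shell
    *) echo "relative path: $path"; ok=0 ;;
  esac
  if [ "$rate" != "24000" ]; then echo "sample rate $rate, expected 24000"; ok=0; fi
  if [ "$channels" != "1" ]; then echo "$channels channels, expected 1 (mono)"; ok=0; fi
  # awk compares the fractional duration that ffprobe reports
  if ! awk -v d="$duration" 'BEGIN { exit !(d >= 15 && d <= 25) }'; then
    echo "duration ${duration}s, expected about 15-25s"; ok=0
  fi
  if [ "$ok" -eq 1 ]; then echo "ok"; return 0; fi
  return 1
}

# Read the properties with ffprobe, then run the check above.
check_voice_sample() {
  local f="$1" rate channels duration
  [ -f "$f" ] || { echo "not found: $f"; return 1; }
  rate=$(ffprobe -v error -select_streams a:0 -show_entries stream=sample_rate \
    -of default=noprint_wrappers=1:nokey=1 "$f")
  channels=$(ffprobe -v error -select_streams a:0 -show_entries stream=channels \
    -of default=noprint_wrappers=1:nokey=1 "$f")
  duration=$(ffprobe -v error -show_entries format=duration \
    -of default=noprint_wrappers=1:nokey=1 "$f")
  voice_props_ok "$f" "$rate" "$channels" "$duration"
}
```

Usage: `check_voice_sample ~/.chatter/voices/my_voice.wav`. Splitting the pure check from the `ffprobe` calls keeps the rules testable without an audio file.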
Voice paths passed to `--voice`:

| Path | Works? |
|---|---|
| `~/.chatter/voices/my_voice.wav` | Yes (tilde expanded by shell) |
| `/Users/name/.chatter/voices/my_voice.wav` | Yes |
| `my_voice.wav` | No (relative path) |
| `./voices/my_voice.wav` | No (relative path) |

If `--voice` is omitted, a built-in default voice is used:

```bash
speak "Hello world" --stream   # Uses default voice
```

### Sound Tags

```bash
speak "[sigh] Monday again." --stream
# Output: (sigh sound) "Monday again."
```
Supported tags: `[laugh]`, `[chuckle]`, `[sigh]`, `[gasp]`, `[groan]`, `[clear throat]`, `[cough]`, `[crying]`, `[singing]`.

Not supported: `[pause]` and `[whisper]` are ignored. For a pause, use punctuation instead: `"Wait... let me think."`

## Batch Processing

```bash
mkdir -p ~/Audio/book/
speak ch01.txt ch02.txt ch03.txt --output-dir ~/Audio/book/
# Creates: ch01.wav, ch02.wav, ch03.wav

# With auto-chunking (for long files)
speak chapters/*.txt --output-dir ~/Audio/book/ --auto-chunk

# Skip completed files
speak chapters/*.txt --output-dir ~/Audio/book/ --skip-existing
```
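To preview what a re-run with `--skip-existing` would still generate, the same existence check can be done in the shell. A sketch assuming, as in the example above, that each input `chNN.txt` produces `<output-dir>/chNN.wav`; `pending_inputs` is a hypothetical helper, and `speak` implements the real flag internally.

```bash
# Sketch: list inputs that have no output file yet, i.e. the files a run
# with --skip-existing would still process. pending_inputs is hypothetical
# and assumes chNN.txt -> <output-dir>/chNN.wav.
pending_inputs() {
  local outdir="$1"; shift
  local f base
  for f in "$@"; do
    base=$(basename "$f" .txt)    # chapters/ch02.txt -> ch02
    if [ ! -e "$outdir/$base.wav" ]; then
      echo "$f"
    fi
  done
}
```

Usage: `pending_inputs ~/Audio/book/ chapters/*.txt` prints one remaining input per line.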
Using `--auto-chunk` with batch processing:

- Output is one `.wav` per input file (e.g., `ch01.wav`)
- Intermediate chunk files are kept only with `--keep-chunks`

## Concatenation

```bash
# Explicit order (recommended)
speak concat ch01.wav ch02.wav ch03.wav --output book.wav

# Glob pattern (REQUIRES zero-padded filenames)
speak concat audiobook/*.wav --output book.wav
```
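A glob only sorts numbered files correctly when the numbers are zero-padded, so existing unpadded files can be renamed before running `speak concat` with a glob. A sketch: `zero_pad_name` and `pad_files` are hypothetical helpers that pad the first number in a bare file name (no directory part) to a given width.

```bash
# Sketch: rename ch1.wav -> ch01.wav so a glob sorts files numerically.
# zero_pad_name and pad_files are hypothetical helpers.

# Pad the first run of digits in a bare file name to <width> (default 2).
zero_pad_name() {
  local name="$1" width="${2:-2}"
  local re='^([^0-9]*)([0-9]+)(.*)$'
  if [[ $name =~ $re ]]; then
    # 10# forces base 10 so names like "08" are not read as octal
    printf "%s%0${width}d%s\n" "${BASH_REMATCH[1]}" "$((10#${BASH_REMATCH[2]}))" "${BASH_REMATCH[3]}"
  else
    printf '%s\n' "$name"    # no number: leave unchanged
  fi
}

# Usage: pad_files <width> <files...>  (run inside the directory)
pad_files() {
  local width="$1"; shift
  local f new
  for f in "$@"; do
    new=$(zero_pad_name "$f" "$width")
    # -n: never overwrite an existing file
    if [ "$f" != "$new" ]; then mv -n -v -- "$f" "$new"; fi
  done
}
```

For example, `cd audiobook && pad_files 2 ch*.wav`; use width 3 for 100 or more files.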
### Zero-Padding

| Files | Zero-padded names | Unpadded names sort as |
|---|---|---|
| Up to 9 | `01, 02, ..., 09` | `1, 2, ..., 9` |
| Up to 99 | `01, 02, ..., 99` | `1, 10, 2, ...` |
| Up to 999 | `001, 002, ..., 999` | `1, 100, 2, ...` |

Globs sort alphabetically, not numerically: `1, 10, 2` vs `01, 02, 10`.

## PDF to Audiobook

### Step 1: Find Chapters

```bash
# Preview table of contents
pdftotext -f 1 -l 5 textbook.pdf toc.txt
cat toc.txt   # Note chapter page numbers

# Or search for "Chapter" markers
pdftotext textbook.pdf - | grep -n "Chapter"
```

### Step 2: Extract Chapters (Zero-Padded!)

```bash
# For 100-page book with ~10 chapters
pdftotext -f 1 -l 12 -layout textbook.pdf ch01.txt
pdftotext -f 13 -l 25 -layout textbook.pdf ch02.txt
pdftotext -f 26 -l 38 -layout textbook.pdf ch03.txt
# ... continue for all chapters
```

### Step 3: Estimate Time

```bash
speak --estimate ch*.txt
# Shows: total audio duration, generation time, storage needed

# Quick estimates:
# 1 page ≈ 2 min audio ≈ 1 min generation
# 100 pages ≈ 200 min audio ≈ 100 min generation ≈ 500 MB
```

### Step 4: Generate Audio

```bash
mkdir -p audiobook/
speak ch01.txt ch02.txt ch03.txt --output-dir audiobook/ --auto-chunk
# Creates: audiobook/ch01.wav, audiobook/ch02.wav, audiobook/ch03.wav
```

### Step 5: Concatenate

```bash
speak concat audiobook/ch01.wav audiobook/ch02.wav audiobook/ch03.wav --output complete_audiobook.wav

# Or with glob (only if zero-padded):
speak concat audiobook/ch*.wav --output complete_audiobook.wav
```
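Step 2 can be driven from a list of page ranges so the zero-padded `chNN.txt` names are generated rather than typed. A sketch: `extract_chapters` and its `DRY_RUN` switch are hypothetical; the `pdftotext` flags are the ones used in Step 2.

```bash
# Sketch: one pdftotext call per "first-last" page range, writing
# ch01.txt, ch02.txt, ... in order. extract_chapters is hypothetical;
# DRY_RUN=1 prints the commands instead of running them.
extract_chapters() {
  local pdf="$1"; shift
  local i=1 range first last out
  for range in "$@"; do
    first=${range%-*}    # "13-25" -> 13
    last=${range#*-}     # "13-25" -> 25
    out=$(printf 'ch%02d.txt' "$i")
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "pdftotext -f $first -l $last -layout $pdf $out"
    else
      pdftotext -f "$first" -l "$last" -layout "$pdf" "$out"
    fi
    i=$((i + 1))
  done
}
```

For the example book: `DRY_RUN=1 extract_chapters textbook.pdf 1-12 13-25 26-38` to check the commands, then run it again without `DRY_RUN`.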
### PDF Extraction Tips

- Scanned PDFs (no text layer) need OCR: `brew install tesseract`
- Encoding problems: `pdftotext -enc UTF-8 doc.pdf`
- Verify extraction: `pdftotext doc.pdf - | wc -w` (should be >100)

## Multi-Voice Podcast

```bash
mkdir -p podcast/scripts podcast/wav
echo "Welcome to the show." > podcast/scripts/01_host.txt
echo "Thanks for having me." > podcast/scripts/02_guest.txt

speak podcast/scripts/01_host.txt --voice ~/.chatter/voices/host.wav --output podcast/wav/01.wav
speak podcast/scripts/02_guest.txt --voice ~/.chatter/voices/guest.wav --output podcast/wav/02.wav

speak concat podcast/wav/01.wav podcast/wav/02.wav --output podcast.wav
```
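With more than two turns, the per-file commands can be generated from the script names. A sketch assuming the `NN_<speaker>.txt` naming from the podcast example and one sample per speaker at `~/.chatter/voices/<speaker>.wav`; `build_podcast` is a hypothetical helper.

```bash
# Sketch: generate every script in <dir>/scripts/ with its speaker's
# voice, then join them in order. build_podcast is hypothetical and
# assumes NN_<speaker>.txt names and ~/.chatter/voices/<speaker>.wav.
build_podcast() {
  local dir="${1:-podcast}" f name num speaker
  local -a wavs
  wavs=()
  mkdir -p "$dir/wav"
  for f in "$dir"/scripts/*.txt; do     # glob order follows the NN prefix
    [ -e "$f" ] || continue            # no match: glob stayed literal
    name=$(basename "$f" .txt)          # 01_host
    num=${name%%_*}                     # 01
    speaker=${name#*_}                  # host
    speak "$f" --voice "$HOME/.chatter/voices/$speaker.wav" \
      --output "$dir/wav/$num.wav" || return 1
    wavs+=("$dir/wav/$num.wav")
  done
  [ "${#wavs[@]}" -gt 0 ] || { echo "no scripts in $dir/scripts" >&2; return 1; }
  speak concat "${wavs[@]}" --output "$dir.wav"
}
```

Running `build_podcast podcast` reproduces the commands above and writes `podcast.wav`. Because the concat list is built explicitly, order does not depend on glob sorting at the concat step.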
## Reference

Options: `--stream`, `--play`, `--output <path>`, `--output-dir <dir>`, `--voice <path>`, `--timeout <sec>`, `--auto-chunk`, `--chunk-size <n>`, `--resume <file>`, `--keep-chunks`, `--skip-existing`, `--estimate`, `--dry-run`, `--quiet`

Commands: `speak setup`, `speak health`, `speak models`, `speak concat`, `speak daemon kill`, `speak config`

### Resuming Interrupted Jobs

```bash
# Single file with auto-chunk: use --resume
speak long.txt --auto-chunk --output book.wav
# If interrupted, manifest saved at ~/Audio/speak/manifest.json
speak --resume ~/Audio/speak/manifest.json

# Batch processing: use --skip-existing
speak ch*.txt --output-dir audiobook/ --auto-chunk
# If interrupted, re-run the same command:
speak ch*.txt --output-dir audiobook/ --auto-chunk --skip-existing
```
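The batch recovery pattern can be wrapped so the re-run happens automatically. A sketch assuming `speak` exits non-zero when a batch is interrupted or fails (not confirmed here); `speak_batch_retry` and its attempt limit are hypothetical.

```bash
# Sketch of the batch recovery pattern: run once, and on failure re-run
# the same arguments with --skip-existing so finished files are kept.
# Assumes speak exits non-zero on failure; speak_batch_retry is hypothetical.
speak_batch_retry() {
  local attempts="$1"; shift
  local n=1
  speak "$@" && return 0
  while [ "$n" -lt "$attempts" ]; do
    n=$((n + 1))
    echo "attempt $n: re-running with --skip-existing" >&2
    speak "$@" --skip-existing && return 0
  done
  return 1
}
```

Usage: `speak_batch_retry 3 ch*.txt --output-dir audiobook/ --auto-chunk`. Single-file jobs still need `--resume` with the manifest instead.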
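## Troubleshooting

| Problem | Fix |
|---|---|
| Voice file not found | Use `~/.chatter/voices/x.wav`, not a relative path |
| Voice sample in the wrong format | `ffmpeg -i in.wav -ar 24000 -ac 1 out.wav` |
| Output directory missing | `mkdir -p dirname/` |
| `sox` not found | `brew install sox` |
| Concatenated files out of order | Zero-pad names: `01, 02`, not `1, 2` |
| Timeout on long text | `--auto-chunk` or `--timeout 600` |
| Daemon unresponsive | `speak daemon kill && speak health` |

## Setup

```bash
speak "test"    # Auto-setup on first run (downloads model ~500MB)
speak setup     # Or manual setup
speak health    # Verify everything works
```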
### Daemon

```bash
speak health        # Check status
speak daemon kill   # Stop manually
```