Speech synthesis
Links
- Deepvoice3 PyTorch - PyTorch implementation of convolutional neural networks-based text-to-speech synthesis models.
- WaveNet vocoder - Can generate high quality raw speech samples conditioned on linguistic or acoustic features.
- Papercup - Translate your content into other languages with a voice that sounds like yours.
- WaveNet implementation in Keras
- nv-wavenet - CUDA reference implementation of autoregressive WaveNet inference.
- PyTorch implementation of Tacotron speech synthesis model
- Yet another WaveNet implementation in PyTorch
- Flowtron - Auto-regressive flow-based generative network for text to speech synthesis.
- A highly efficient, real-time text-to-speech system deployed on CPUs (2020) (HN)
- Sonatic - Emotionally Expressive Text to Speech.
- GAN-based Mel-Spectrogram Inversion Network for Text-to-Speech Synthesis
- Ask HN: My wife might lose the ability to speak in 3 weeks – how to prepare? (2020)
- DiffWave - Fast, high-quality neural vocoder and waveform synthesizer.
- Voice Conversion with Non-Parallel Data
- Speech Synthesis Papers
- VoiceFilter - Unofficial PyTorch implementation of Google AI's VoiceFilter system. (Web)
- ForwardTacotron - Generating speech in a single forward pass without any attention. (Web)
- HiFi-GAN - Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis.
- Parakeet - Text-to-speech toolkit (supporting WaveFlow, ClariNet, WaveNet, Deep Voice 3, Transformer TTS and FastSpeech).
- pyttsx3 - Offline text-to-speech synthesis for Python.
- SOVA TTS - Speech synthesis solution based on Tacotron 2 architecture.
- eSpeak NG - Open source speech synthesizer that supports more than a hundred languages and accents.
- PRiSM SampleRNN - Neural sound synthesis with TensorFlow 2.
- Flite - Small fast portable speech synthesis system.
- FastSpeech 2: Fast and High-Quality End-to-End Text to Speech (2020) (Code) (Code)
- Neural Granular Sound Synthesis (Code)
- CLEESE - Combinatorial Expressive Speech Engine.
- LVCNet: Efficient Condition-Dependent Modeling Network for Waveform Generation
- LightSpeech: Lightweight and Fast Text to Speech with Neural Architecture Search (2021) (Code)
- A Survey on Neural Speech Synthesis (2021) (Code)
- Binaural Speech Synthesis - Code to train a mono-to-binaural neural sound renderer.
- NN-SVS - Neural network-based singing voice synthesis library for research.
- Larynx - End-to-end text-to-speech system using gruut and onnx, 50 voices, 9 languages.
- WellSaid Labs - Voice Narration. Simplified.
- Neural Waveshaping Synthesis - Efficient neural audio synthesis in the waveform domain. (Article)
- Catch-A-Waveform: Learning to Generate Audio from a Single Short Example (Code)
- TFGAN: Time and Frequency Domain Based Generative Adversarial Network for High-fidelity Speech Synthesis (2020) (Code)
- EdiTTS: Score-based Editing for Controllable Text-to-Speech
- PortaSpeech: Portable and High-Quality Generative Text-to-Speech (2021) (Code)
- Speech Resynthesis from Discrete Disentangled Self-Supervised Representations (2021) (Code)
- Neural Lexicon Reader: Reduce Pronunciation Errors in End-to-end TTS by Leveraging External Textual Knowledge (2021) (Code)
- Grail-rs - Rust speech synth.
- RAVE: A variational autoencoder for fast and high-quality neural audio synthesis (2021) (Code)
- WaveFlow: A Compact Flow-based Model for Raw Audio (2020) (Code)
- VoiceFixer - Framework for general speech restoration.
- TTS-RS - High-level Text-To-Speech (TTS) interface supporting various backends.
- Speech synthesis using AVSpeechSynthesizer (2021)
- Towards Lightweight Controllable Audio Synthesis with Conditional Implicit Neural Representations (2021) (Code)
- TTS - Library for advanced Text-to-Speech generation. (Web) (HN)
- YourTTS: Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for everyone
- SubSync - Subtitle Speech Synchronizer. (Overview) (HN)
- Meta-StyleSpeech: Multi-Speaker Adaptive Text-to-Speech Generation (2021) (Code)
- NATSpeech - Non-Autoregressive Text-to-Speech Framework.
- VocBench: A Neural Vocoder Benchmark for Speech Synthesis (2021) (Code)
- TransformerTTS - Text-to-Speech Transformer in TensorFlow 2.
- Awesome Speech Recognition Speech Synthesis Papers
- Neural Instrument Cloning from very few samples (2022) (Code)
- MLP Singer: Towards Rapid Parallel Korean Singing Voice Synthesis (2021) (Code)
- IMS Toucan - Toolkit to train state-of-the-art Speech Synthesis models.
- BDDM: Bilateral Denoising Diffusion Models for Fast and High-quality Speech Synthesis (2022)
- Deep Learning for Emotional Text-to-speech - Summary of our attempts at using Deep Learning approaches for Emotional Text to Speech.
- Nix-TTS - Incredibly Lightweight End-to-End Text-to-Speech Model via Non End-to-End Distillation.
- xVA Synth - Machine learning based speech synthesis Electron app, with voices from specific characters from video games.
- Bandwidth Extension is All You Need (2021) (Code)
- TorToiSe - Multi-voice TTS system trained with an emphasis on quality. (Demos)
- Multi-band MelGAN: Faster Waveform Generation for High-Quality Text-to-Speech (2020) (Code)
- UnivNet: A Neural Vocoder with Multi-Resolution Spectrogram Discriminators for High-Fidelity Waveform Generation (2021) (Code)
- TikTok TTS - Generate the funny TikTok lady voice (& more) in your browser. (Code)
- TikTok Text-to-speech API - Simple Python script to interact with the TikTok TTS API.
- Unreal Speech - Text-to-Speech API. Better & 8x Cheaper than AWS.
- 15.ai - Natural TTS with minimal viable data. (HN)
- JDC-PitchExtractor - Deep Neural Pitch Extractor for Voice Conversion and TTS Training.
- Grad-TTS: A Diffusion Probabilistic Model for Text-to-Speech (2021) (Code)
- Publicly Available Emotional Speech Dataset (ESD) for Speech Synthesis and Voice Conversion
- Mimic 3 - Fast local neural text to speech engine for Mycroft. (Intro) (HN)
- DiffWave: A Versatile Diffusion Model for Audio Synthesis (2021) (Code)
- FastDiff: A Fast Conditional Diffusion Model for High-Quality Speech Synthesis
- HiFi-GAN - Training and inference scripts for the vocoder models in A Comparison of Discrete and Soft Speech Units for Improved Voice Conversion.
- Acoustic-Model - Training and inference scripts for the acoustic models in A Comparison of Discrete and Soft Speech Units for Improved Voice Conversion.
- HuBERT - Training and inference scripts for the HuBERT content encoders in A Comparison of Discrete and Soft Speech Units for Improved Voice Conversion.
- Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech (2021) (Code)
- Diffsound: Discrete Diffusion Model for Text-to-sound Generation (Code)
- DDSP-based Singing Vocoders: A New Subtractive-based Synthesizer and A Comprehensive Evaluation (2022) (Code)
- ProDiff: Progressive Fast Diffusion Model For High-Quality Text-to-Speech
- AudioLM: Language Modeling Approach to Audio Generation (Code)
- Awesome Singing Voice Synthesis and Singing Voice Conversion
- LPCNet - Efficient neural speech synthesis.
- AudioGen: Textually Guided Audio Generation (HN)
- Ask HN: Best free text-to-speech plugins for browsers? (2022)
- Neural Speech Synthesis Tutorial (2022)
- PhaseAug: Differentiable Augmentation for Speech Synthesis to Simulate One-to-Many Mapping (2022)
- VIC-20 text-to-speech synthesizer using the iconic voice of SAM (2021) (Article)
- PyTorch implementation of the Perceptual Evaluation of Speech Quality
- Lightweight and High-Fidelity End-to-End Text-to-Speech with Multi-Band Generation and Inverse Short-Time Fourier Transform (2022)
- GenerSpeech: Towards Style Transfer for Generalizable Out-Of-Domain Text-to-Speech (2022) (Code)
- AERO - Audio Super Resolution in the Spectral Domain.
- Enhance Speech from Adobe - Free AI filter for cleaning up spoken audio. (HN)
- Incorporating AutoVocoder to MB-iSTFT-VITS
- Automatic Prosody Annotation with Pre-Trained Text-Speech Model
- Ask HN: Are there any good open source text-to-speech tools? (2023)
- Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers (2023) (Web) (HN) (HN) (Code) (Code)
- This Voice Doesn't Exist – Generative Voice AI (2023) (HN)
- Autotone - Vocal pitch correction web application, like Autotune. (HN)
- Voice Cloning Model with Zero-Shot Attention-Based TTS
- ElevenLabs | Speech Synthesis
- Praat - Speech analysis tool used for doing phonetics by computer. (Web)
- Audio AI Timeline - Timeline of the latest AI models for audio generation.
- AudioLDM - Text-to-Audio Generation with Latent Diffusion Models.
- Speaking Style Conversion With Discrete Self-Supervised Units (2022) (Code)
- StoRM: A Diffusion-based Stochastic Regeneration Model for Speech Enhancement and Dereverberation (2022) (Code)
- TAPLoss: A Temporal Acoustic Parameter Loss for Speech Enhancement (2023) (Code)
- StyleTTS-VC: One-Shot Voice Conversion by Knowledge Transfer from Style-Based TTS Models (2022) (Code)
- PITS: Variational Pitch Inference without Fundamental Frequency for End-to-End Pitch-controllable TTS
- Improving Few-shot Learning for Talking Face System with TTS Data Augmentation (2023)
- Conditioning and Sampling in Variational Diffusion Models for Speech Super-Resolution (2022) (Code)
- Play.ht - Generate and clone voices from 20 seconds of audio. (HN)
- NU-Wave 2: A General Neural Audio Upsampling Model for Various Sampling Rates
- Bark - Text-prompted Generative Audio Model. (HN)
- piper - Fast, local neural text to speech system.
- SoftVC VITS Singing Voice Conversion Fork
- Kesha - Voice Assistant made as an experiment using Silero TTS + Vosk STT + Picovoice Porcupine + ChatGPT.
- Bark...but with the ability to use voice cloning on custom audio/text pairs
- SNAC: Speaker-normalized Affine Coupling Layer in Flow-based Architecture for Zero-Shot Multi-Speaker Text-to-Speech
- SoundStorm: Efficient Parallel Audio Generation
- DeepFilterNet - Low Complexity Speech Enhancement Framework for Full-Band Audio (48kHz) based on Deep Filtering.
- Voicebox: Generative AI model for speech that generalizes across tasks (2023) (HN)
- Build a conversational engine so we can talk to our computers
- SoftVC VITS Singing Voice Conversion
- Google SoundStorm: Efficient Parallel Audio Generation (HN)
- Voder Speech Synthesizer (HN)
- ElevenLabs Python - Official Python API for ElevenLabs text-to-speech.
- UnitSpeech: Speaker-adaptive Speech Synthesis with Untranscribed Data (2023)
- VALL-E X: Multilingual Text-to-Speech Synthesis and Voice Cloning