Speech synthesis

TorToiSe & 15.ai are nice.

Links

Deepvoice3 PyTorch - PyTorch implementation of convolutional neural networks-based text-to-speech synthesis models.
WaveNet vocoder - Can generate high quality raw speech samples conditioned on linguistic or acoustic features.
Papercup - Translate your content into other languages with a voice that sounds like yours.
WaveNet implementation in Keras
nv-wavenet - CUDA reference implementation of autoregressive WaveNet inference.
PyTorch implementation of Tacotron speech synthesis model
Yet another WaveNet implementation in PyTorch
Flowtron - Auto-regressive flow-based generative network for text to speech synthesis.
A highly efficient, real-time text-to-speech system deployed on CPUs (2020) (HN)
Sonatic - Emotionally Expressive Text to Speech.
GAN-based Mel-Spectrogram Inversion Network for Text-to-Speech Synthesis
Ask HN: My wife might lose the ability to speak in 3 weeks – how to prepare? (2020)
DiffWave - Fast, high-quality neural vocoder and waveform synthesizer.
Voice Conversion with Non-Parallel Data
Speech Synthesis Papers
VoiceFilter - Unofficial PyTorch implementation of Google AI's VoiceFilter system. (Web)
ForwardTacotron - Generating speech in a single forward pass without any attention. (Web)
HiFi-GAN - Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis.
Parakeet - Text-to-speech toolKIT (supporting WaveFlow, ClariNet, WaveNet, Deep Voice 3, Transformer TTS and FastSpeech).
pyttsx3 - Offline Text To Speech synthesis for Python.
SOVA TTS - Speech synthesis solution based on Tacotron 2 architecture.
eSpeak NG - Open source speech synthesizer that supports more than a hundred languages and accents.
PRiSM SampleRNN - Neural sound synthesis with TensorFlow 2.
Flite - Small fast portable speech synthesis system.
FastSpeech 2: Fast and High-Quality End-to-End Text to Speech (2020) (Code) (Code)
Neural Granular Sound Synthesis (Code)
CLEESE - Combinatorial Expressive Speech Engine.
LVCNet: Efficient Condition-Dependent Modeling Network for Waveform Generation
LightSpeech: Lightweight and Fast Text to Speech with Neural Architecture Search (2021) (Code)
A Survey on Neural Speech Synthesis (2021) (Code)
Binaural Speech Synthesis - Code to train a mono-to-binaural neural sound renderer.
NN-SVS - Neural network-based singing voice synthesis library for research.
Larynx - End to end text to speech system using gruut and onnx, 50 voices, 9 languages.
WellSaid Labs - Voice Narration. Simplified.
Neural Waveshaping Synthesis - Efficient neural audio synthesis in the waveform domain. (Article)
Catch-A-Waveform: Learning to Generate Audio from a Single Short Example (Code)
TFGAN: Time and Frequency Domain Based Generative Adversarial Network for High-fidelity Speech Synthesis (2020) (Code)
EdiTTS: Score-based Editing for Controllable Text-to-Speech
PortaSpeech: Portable and High-Quality Generative Text-to-Speech (2021) (Code)
Speech Resynthesis from Discrete Disentangled Self-Supervised Representations (2021) (Code)
Neural Lexicon Reader: Reduce Pronunciation Errors in End-to-end TTS by Leveraging External Textual Knowledge (2021) (Code)
Grail-rs - Rust speech synth.
RAVE: A variational autoencoder for fast and high-quality neural audio synthesis (2021) (Code)
WaveFlow: A Compact Flow-based Model for Raw Audio (2020) (Code)
VoiceFixer - Framework for general speech restoration.
TTS-RS - High-level Text-To-Speech (TTS) interface supporting various backends.
Speech synthesis using AVSpeechSynthesizer (2021)
Towards Lightweight Controllable Audio Synthesis with Conditional Implicit Neural Representations (2021) (Code)
TTS - Library for advanced Text-to-Speech generation. (Web) (HN)
YourTTS: Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for everyone
SubSync - Subtitle Speech Synchronizer. (Overview) (HN)
Meta-StyleSpeech: Multi-Speaker Adaptive Text-to-Speech Generation (2021) (Code)
NATSpeech - Non-Autoregressive Text-to-Speech Framework.
VocBench: A Neural Vocoder Benchmark for Speech Synthesis (2021) (Code)
TransformerTTS - Text-to-Speech Transformer in TensorFlow 2.
Awesome Speech Recognition Speech Synthesis Papers
Neural Instrument Cloning from very few samples (2022) (Code)
MLP Singer: Towards Rapid Parallel Korean Singing Voice Synthesis (2021) (Code)
IMS Toucan - Toolkit to train state-of-the-art Speech Synthesis models.
BDDM: Bilateral Denoising Diffusion Models for Fast and High-quality Speech Synthesis (2022)
Deep Learning for Emotional Text-to-speech - Summary on our attempts at using Deep Learning approaches for Emotional Text to Speech.
Nix-TTS - Incredibly Lightweight End-to-End Text-to-Speech Model via Non End-to-End Distillation.
xVA Synth - Machine learning based speech synthesis Electron app, with voices from specific characters from video games.
Bandwidth Extension is All You Need (2021) (Code)
TorToiSe - Multi-voice TTS system trained with an emphasis on quality. (Demos)
Multi-band MelGAN: Faster Waveform Generation for High-Quality Text-to-Speech (2020) (Code)
UnivNet: A Neural Vocoder with Multi-Resolution Spectrogram Discriminators for High-Fidelity Waveform Generation (2021) (Code)
TikTok TTS - Generate the funny TikTok lady voice (& more) in your browser. (Code)
TikTok Text-to-speech API - Simple Python script to interact with the TikTok TTS API.
Unreal Speech - Text-to-Speech API. Better & 8x Cheaper than AWS.
15.ai - Natural TTS with minimal viable data. (HN)
JDC-PitchExtractor - Deep Neural Pitch Extractor for Voice Conversion and TTS Training.
Grad-TTS: A Diffusion Probabilistic Model for Text-to-Speech (2021) (Code)
Publicly Available Emotional Speech Dataset (ESD) for Speech Synthesis and Voice Conversion
Mimic 3 - Fast local neural text to speech engine for Mycroft. (Intro) (HN)
DiffWave: A Versatile Diffusion Model for Audio Synthesis (2021) (Code)
FastDiff: A Fast Conditional Diffusion Model for High-Quality Speech Synthesis
HiFi-GAN - Training and inference scripts for the vocoder models in A Comparison of Discrete and Soft Speech Units for Improved Voice Conversion.
Acoustic-Model - Training and inference scripts for the acoustic models in A Comparison of Discrete and Soft Speech Units for Improved Voice Conversion.
HuBERT - Training and inference scripts for the HuBERT content encoders in A Comparison of Discrete and Soft Speech Units for Improved Voice Conversion.
Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech (2021) (Code)
Diffsound: Discrete Diffusion Model for Text-to-sound Generation (Code)
DDSP-based Singing Vocoders: A New Subtractive-based Synthesizer and A Comprehensive Evaluation (2022) (Code)
ProDiff: Progressive Fast Diffusion Model For High-Quality Text-to-Speech
AudioLM: Language Modeling Approach to Audio Generation (Code)
Awesome Singing Voice Synthesis and Singing Voice Conversion
LPCNet - Efficient neural speech synthesis.
AudioGen: Textually Guided Audio Generation (HN)
Ask HN: Best free text-to-speech plugins for browsers? (2022)
Neural Speech Synthesis Tutorial (2022)
PhaseAug: Differentiable Augmentation for Speech Synthesis to Simulate One-to-Many Mapping (2022)
VIC-20 text-to-speech synthesizer using the iconic voice of SAM (2021) (Article)
PyTorch implementation of the Perceptual Evaluation of Speech Quality
Lightweight and High-Fidelity End-to-End Text-to-Speech with Multi-Band Generation and Inverse Short-Time Fourier Transform (2022)
GenerSpeech: Towards Style Transfer for Generalizable Out-Of-Domain Text-to-Speech (2022) (Code)
AERO - Audio Super Resolution in the Spectral Domain.
Enhance Speech from Adobe - Free AI filter for cleaning up spoken audio. (HN)
Incorporating AutoVocoder to MB-iSTFT-VITS
Automatic Prosody Annotation with Pre-Trained Text-Speech Model
Ask HN: Are there any good open source text-to-speech tools? (2023)
Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers (2023) (Web) (HN) (HN) (Code) (Code)
This Voice Doesn't Exist – Generative Voice AI (2023) (HN)
Autotone - Vocal pitch correction web application, like Autotune. (HN)
Voice Cloning Model with Zero-Shot Attention-Based TTS
ElevenLabs | Speech Synthesis
Praat - Speech analysis tool used for doing phonetics by computer. (Web)
Audio AI Timeline - Timeline of the latest AI models for audio generation.
AudioLDM - Text-to-Audio Generation with Latent Diffusion Models.
Speaking Style Conversion With Discrete Self-Supervised Units (2022) (Code)
StoRM: A Diffusion-based Stochastic Regeneration Model for Speech Enhancement and Dereverberation (2022) (Code)
TAPLoss: A Temporal Acoustic Parameter Loss for Speech Enhancement (2023) (Code)
StyleTTS-VC: One-Shot Voice Conversion by Knowledge Transfer from Style-Based TTS Models (2022) (Code)
PITS: Variational Pitch Inference without Fundamental Frequency for End-to-End Pitch-controllable TTS
Improving Few-shot Learning for Talking Face System with TTS Data Augmentation (2023)
Conditioning and Sampling in Variational Diffusion Models for Speech Super-Resolution (2022) (Code)
Play.ht - Generate and clone voices from 20 seconds of audio. (HN)
NU-Wave 2: A General Neural Audio Upsampling Model for Various Sampling Rates
Bark - Text-prompted Generative Audio Model. (HN)
piper - Fast, local neural text to speech system.
SoftVC VITS Singing Voice Conversion Fork
Kesha - Voice Assistant made as an experiment using Silero TTS + Vosk STT + Picovoice Porcupine + ChatGPT.
Bark...but with the ability to use voice cloning on custom audio/text pairs
SNAC: Speaker-normalized Affine Coupling Layer in Flow-based Architecture for Zero-Shot Multi-Speaker Text-to-Speech
SoundStorm: Efficient Parallel Audio Generation
DeepFilterNet - Low Complexity Speech Enhancement Framework for Full-Band Audio (48kHz) based on Deep Filtering.
Voicebox: Generative AI model for speech that generalizes across tasks (2023) (HN)
Build a conversational engine so we can talk to our computers
SoftVC VITS Singing Voice Conversion
Google SoundStorm: Efficient Parallel Audio Generation (HN)
Voder Speech Synthesizer (HN)
ElevenLabs Python - Official Python API for ElevenLabs text-to-speech.
UnitSpeech: Speaker-adaptive Speech Synthesis with Untranscribed Data (2023)
VALL-E X: Multilingual Text-to-Speech Synthesis and Voice Cloning
