A Notion-style talking avatar and a pure-CSS lip-sync mouth. Build a face, feed it any audio (ElevenLabs, a recording, the mic), and the mouth follows every phoneme. Then drop it into your app.
That’s a phonetic pangram — every English sound. “Sample” needs no key; “Make it talk” uses ElevenLabs via /api/tts.
import { Avatar } from "@lipzink/avatar"
import "@lipzink/avatar/styles.css"
// "spec" is the JSON you build & copy below
export function Hi({ spec }) {
  return <Avatar spec={spec} size={96} />
}

Click through the parts, or roll the dice. The avatar above updates live. When you like it, copy the spec: a tiny JSON blob you pass to <Avatar spec />.
The avatar is voice-agnostic — you bring the sound. With ElevenLabs timestamps the mouth lands on every phoneme on time; with any other audio it follows along live. Pick your lane:
import { useRef } from "react"
import { Avatar, type AvatarVoiceHandle } from "@lipzink/avatar"
import { fetchElevenLabsSpeech } from "@lipzink/mouth"
import "@lipzink/avatar/styles.css"
import "@lipzink/mouth/styles.css"
function Talking({ spec }) {
  const voice = useRef<AvatarVoiceHandle>(null)

  async function say(text: string) {
    // /api/tts proxies ElevenLabs' /with-timestamps (key stays server-side).
    const { audio, cues } = await fetchElevenLabsSpeech("/api/tts", {
      text,
      voiceId: "21m00Tcm4TlvDq8ikWAM",
    })
    // Scheduled against the audio clock → lands on every phoneme, on time.
    voice.current?.playCues(audio, cues)
  }

  return <Avatar spec={spec} ref={voice} onClick={() => say("Hello there!")} />
}

Whatever drives the mouth (timestamps, live audio, the mic) resolves to one of these mouth shapes. Tap to preview.
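To make "scheduled against the audio clock" concrete, here is a minimal sketch of the idea, not the library's actual implementation. The `Alignment` fields follow ElevenLabs' with-timestamps response as we understand it; the `Cue` type, the shape names, and the character-to-shape table are hypothetical stand-ins for whatever `@lipzink/mouth` uses internally.

```typescript
// Hypothetical shape names; the real set in @lipzink/mouth may differ.
type Shape = "rest" | "closed" | "open" | "wide" | "round";

// Hypothetical cue format: a shape that becomes active at `time` seconds.
interface Cue {
  time: number;
  shape: Shape;
}

// Assumed layout of the `alignment` object from ElevenLabs' with-timestamps endpoint.
interface Alignment {
  characters: string[];
  character_start_times_seconds: number[];
  character_end_times_seconds: number[];
}

// Illustrative character-to-shape table; a real mapping would be phoneme-aware.
function shapeFor(char: string): Shape {
  switch (char.toLowerCase()) {
    case "a":
    case "e":
      return "open";
    case "i":
      return "wide";
    case "o":
    case "u":
    case "w":
      return "round";
    case "b":
    case "m":
    case "p":
      return "closed";
    default:
      return "rest";
  }
}

// Turn per-character timings into cues, collapsing consecutive identical shapes.
function alignmentToCues(a: Alignment): Cue[] {
  const cues: Cue[] = [];
  a.characters.forEach((char, i) => {
    const shape = shapeFor(char);
    const last = cues[cues.length - 1];
    if (!last || last.shape !== shape) {
      cues.push({ time: a.character_start_times_seconds[i], shape });
    }
  });
  return cues;
}

// Binary search for the last cue whose start time is <= t.
function shapeAt(cues: Cue[], t: number): Shape {
  let lo = 0;
  let hi = cues.length - 1;
  let found = -1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    if (cues[mid].time <= t) {
      found = mid;
      lo = mid + 1;
    } else {
      hi = mid - 1;
    }
  }
  return found >= 0 ? cues[found].shape : "rest";
}
```

In a browser you would call `shapeAt(cues, audio.currentTime)` from a `requestAnimationFrame` loop. Reading the time from the audio element itself, rather than from a separate timer, is what keeps the mouth from drifting when playback stalls or seeks.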
Don’t need the whole character? @lipzink/mouth is a standalone, pure-CSS mouth — no avatar, no assets. Position it over any illustration and tint it to match.
import { TalkingMouth } from "@lipzink/mouth"
import "@lipzink/mouth/styles.css"
// Just the mouth — no avatar, no bundled art. Overlay it on your illustration:
function MyCharacter() {
  return (
    <div style={{ position: "relative" }}>
      <img src="/character.png" alt="" />
      <div style={{ position: "absolute", left: "50%", top: "62%",
                    transform: "translate(-50%,-50%)" }}>
        <TalkingMouth audio="/hello.mp3" scale={1.4} />
      </div>
    </div>
  )
}

The mouth is just a positioned CSS element, so it works on a photo, an illustration, even a Renaissance masterpiece. Here she is, finally able to answer the question everyone asks.
import { useRef } from "react"
import { TalkingMouth, type TalkingMouthHandle } from "@lipzink/mouth"
import "@lipzink/mouth/styles.css"
function Portrait() {
  const mouth = useRef<TalkingMouthHandle>(null)

  return (
    <div style={{ position: "relative" }}>
      <img src="/mona-lisa.png" alt="Mona Lisa" />
      {/* Position + tint the mouth to match YOUR art */}
      <div style={{ position: "absolute", left: "50%", top: "56%",
                    transform: "translate(-50%,-50%)" }}>
        <TalkingMouth ref={mouth} scale={0.7} cavity="#7a1f1f" tongue="#c75c5c" />
      </div>
      <button onClick={() => mouth.current?.play("/hello.mp3")}>Speak</button>
    </div>
  )
}

Small, typed, and unopinionated about your stack.
Bring any audio — ElevenLabs, OpenAI TTS, a recording, or the live microphone. No TTS is baked in.
ElevenLabs timestamps schedule visemes against the audio clock, so the mouth lands on each phoneme instead of trailing it.
Drive the mouth with scheduled cues or live audio analysis, or swap in your own driver. One headless useLipsync hook sits behind them all.
Every phoneme maps to a CSS mouth shape. Mappings ship for ElevenLabs, Azure visemes, plain text, and live audio.
No canvas, no WebGL. The mouth is a positioned, tintable CSS element you can drop over any illustration.
Hundreds of bundled SVG parts, a randomizer, and a copy-paste spec — the whole character travels as JSON.
React 19, full TypeScript, no heavy dependencies. Two packages you can adopt together or apart.
useLipsync() hands you { shape, amplitude, status } so you can build entirely custom visuals on top.
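For the live-audio path, the sketch below shows one way an amplitude value could be derived from raw samples and turned into a mouth shape. It is an illustration under stated assumptions, not how `useLipsync` is implemented: the function names, the shape names, and the thresholds are all hypothetical.

```typescript
// Hypothetical shape names for amplitude-driven lip sync.
type LiveShape = "closed" | "half" | "open";

// Root-mean-square level of one block of time-domain samples in [-1, 1].
function rms(samples: Float32Array): number {
  if (samples.length === 0) return 0;
  let sum = 0;
  for (let i = 0; i < samples.length; i++) {
    sum += samples[i] * samples[i];
  }
  return Math.sqrt(sum / samples.length);
}

// Exponential smoothing with a fast attack and a slower release,
// so the mouth opens quickly but does not flutter shut between syllables.
function smooth(prev: number, next: number, attack = 0.6, release = 0.2): number {
  const k = next > prev ? attack : release;
  return prev + (next - prev) * k;
}

// Map a smoothed amplitude to a shape. Thresholds are illustrative guesses
// and would need tuning per microphone and voice.
function shapeForAmplitude(amplitude: number, gate = 0.02, wide = 0.15): LiveShape {
  if (amplitude < gate) return "closed";
  return amplitude < wide ? "half" : "open";
}
```

In a browser, the samples would typically come from a Web Audio `AnalyserNode` via `getFloatTimeDomainData`, read once per animation frame. Because this path reacts to loudness rather than to known phoneme timings, it follows the audio instead of anticipating it.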