/v1/stt/stream
POST /v1/tts · cache: — (repeat the same text → HIT, no Google call)
/v1/tts
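The cache behaviour noted for /v1/tts (repeat the same text and get a HIT with no upstream call) can be illustrated with a minimal sketch. Everything here is an assumption for illustration, not the service's actual implementation: the `TtsCache` class, the SHA-256 key, the in-memory dict, and the `fake_synthesize` stand-in for the Google call are all hypothetical.

```python
import hashlib
from typing import Callable, Dict, List, Tuple


class TtsCache:
    """Illustrative in-memory cache keyed by a hash of the request text."""

    def __init__(self, synthesize: Callable[[str], bytes]) -> None:
        # `synthesize` is a hypothetical stand-in for the upstream TTS call.
        self._synthesize = synthesize
        self._store: Dict[str, bytes] = {}

    def get_audio(self, text: str) -> Tuple[bytes, str]:
        """Return (audio, status), where status is "HIT" or "MISS"."""
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if key in self._store:
            # Same text seen before: serve stored audio, skip the upstream call.
            return self._store[key], "HIT"
        audio = self._synthesize(text)
        self._store[key] = audio
        return audio, "MISS"


# Usage: count how often the (fake) upstream synthesizer is invoked.
upstream_calls: List[str] = []


def fake_synthesize(text: str) -> bytes:
    upstream_calls.append(text)
    return b"audio:" + text.encode("utf-8")


cache = TtsCache(fake_synthesize)
_, first_status = cache.get_audio("hello")
_, second_status = cache.get_audio("hello")
print(first_status, second_status, len(upstream_calls))  # MISS HIT 1
```

A real deployment would probably also fold voice and language settings into the cache key; the source only mentions the text, so this sketch hashes the text alone.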
Record a short word → POST raw 16 kHz PCM to /v1/pronunciation/analyze → get pitch-accent + timing analysis
/v1/pronunciation/analyze
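The record-and-POST flow above could look roughly like the following client-side sketch. It only builds the request and does not send it. Apart from the endpoint path and the 16 kHz sample rate, everything is an assumption: 16-bit signed little-endian mono samples, the `application/octet-stream` content type, the `http://localhost:8000` base URL, and the helper names `tone_pcm16` and `build_analyze_request`. A synthetic tone stands in for a microphone recording.

```python
import math
import struct
import urllib.request

SAMPLE_RATE = 16_000  # 16 kHz, as stated in the text


def tone_pcm16(freq_hz: float, seconds: float) -> bytes:
    """Generate a sine tone as raw 16-bit little-endian mono PCM (assumed format)."""
    n = int(SAMPLE_RATE * seconds)
    samples = (
        int(0.3 * 32767 * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE))
        for i in range(n)
    )
    return struct.pack(f"<{n}h", *samples)


def build_analyze_request(base_url: str, pcm: bytes) -> urllib.request.Request:
    """Build (but do not send) a POST carrying the raw PCM bytes as the body."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/pronunciation/analyze",
        data=pcm,
        method="POST",
        # Content type is an assumption; the source does not specify one.
        headers={"Content-Type": "application/octet-stream"},
    )


pcm = tone_pcm16(220.0, 0.5)  # half a second of audio in place of a spoken word
req = build_analyze_request("http://localhost:8000", pcm)  # hypothetical host
print(req.get_method(), req.full_url, len(pcm))
```

Passing `req` to `urllib.request.urlopen` would send it. The response format is not described in the source beyond "pitch-accent + timing", so parsing is left out.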