Notes from building call-gpt - a generative AI phone calling system.

    ┌─────────────┐     ┌─────────────┐     ┌─────────────┐
    │   Caller    │────▶│   Twilio    │────▶│  Your App   │
    │   (Phone)   │◀────│   Media     │◀────│  (Node.js)  │
    └─────────────┘     │   Streams   │     └──────┬──────┘
                        └─────────────┘            │
                                                   ▼
                        ┌──────────────────────────┴──────────────────────────┐
                        │                                                     │
                  ┌─────▼─────┐     ┌─────────────┐     ┌─────────────────┐
                  │    STT    │     │     LLM     │     │       TTS       │
                  │ Deepgram  │────▶│   OpenAI/   │────▶│   Deepgram/     │
                  │           │     │   Claude    │     │   ElevenLabs    │
                  └───────────┘     └─────────────┘     └─────────────────┘

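The STT → LLM → TTS chain in the diagram can be sketched with async generators, where each stage consumes the previous stage's output as it arrives instead of waiting for a complete result. This is only an illustration of the data flow, not the project's actual code: `transcribe`, `respond`, `synthesize`, `buildPipeline`, and `collect` are hypothetical names, and the stage bodies are stubs standing in for real calls to Deepgram, OpenAI/Claude, and a TTS provider.

```javascript
// Hypothetical stage stubs. Real versions would wrap Deepgram (STT),
// OpenAI or Claude (LLM), and Deepgram or ElevenLabs (TTS).

// STT: turns incoming audio chunks into transcript strings.
async function* transcribe(audioChunks) {
  for await (const chunk of audioChunks) {
    yield `transcript(${chunk})`;
  }
}

// LLM: yields a reply for each transcript as it arrives.
async function* respond(transcripts) {
  for await (const text of transcripts) {
    yield `reply to ${text}`;
  }
}

// TTS: turns each reply into an "audio" payload.
async function* synthesize(replies) {
  for await (const reply of replies) {
    yield `audio<${reply}>`;
  }
}

// Wire the stages together. Each stage pulls lazily from the one before it,
// so the first audio payload is available before later input is processed.
function buildPipeline(audioChunks) {
  return synthesize(respond(transcribe(audioChunks)));
}

// Helper for demos and tests: drain a stream into an array.
async function collect(stream) {
  const out = [];
  for await (const item of stream) out.push(item);
  return out;
}
```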
| Feature | Implementation |
|---|---|
| Low latency (~1s) | Streaming at every stage |
| Interruption handling | Detect speech, cancel current response |
| Conversation history | Maintain context with LLM |
| Function calling | LLM can trigger external tools |
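
The interruption-handling and conversation-history rows can be illustrated together. The sketch below is a minimal version under stated assumptions: `CallSession`, `handleUtterance`, and `interrupt` are hypothetical names, and `generateReply` stands in for whatever streaming LLM call the app uses (any function that takes the history and returns an async iterable of text tokens). Cancellation uses the standard `AbortController`, which is a global in recent Node.js versions.

```javascript
class CallSession {
  constructor(systemPrompt) {
    // Conversation history in the common chat-message shape.
    this.history = [{ role: 'system', content: systemPrompt }];
    // AbortController for the response currently being generated, if any.
    this.current = null;
  }

  // Called when STT produces a transcript for the caller's turn.
  // generateReply is an assumed function: history -> async iterable of tokens.
  async handleUtterance(text, generateReply) {
    this.interrupt(); // cancel any response still in flight
    const controller = new AbortController();
    this.current = controller;
    this.history.push({ role: 'user', content: text });

    let spoken = '';
    for await (const token of generateReply(this.history)) {
      if (controller.signal.aborted) break; // caller started talking
      spoken += token; // a real app would forward this token to TTS here
    }

    // Record only what was produced before any interruption, so the LLM's
    // context matches what the caller could actually have heard.
    if (spoken) this.history.push({ role: 'assistant', content: spoken });
    if (this.current === controller) this.current = null;
    return spoken;
  }

  // Called as soon as new caller speech is detected.
  interrupt() {
    if (this.current) this.current.abort();
  }
}
```

Storing the truncated reply rather than the full intended one is a design choice in this sketch: it keeps later turns consistent with the audio that was actually played before the caller cut in.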