Technology AI

AI Customer Support

A support chatbot that actually works — with a human handoff that doesn't feel like failure.


Client: Confidential

Year: 2024

Duration: 3 months

The challenge

A SaaS company was drowning in tier-1 support tickets — password resets, billing questions, "how do I" inquiries. Their small support team spent most of its time on issues their documentation already answered. Meanwhile, the genuinely complex tickets waited.

They wanted an AI layer that would resolve the easy stuff automatically and hand off the hard stuff to humans cleanly — without the frustration that usually comes with chatbot experiences.

What we built

  • RAG pipeline: their entire knowledge base, help docs, and support history indexed into pgvector, with hybrid retrieval (dense + BM25) and cross-encoder reranking.
  • Conversation layer: GPT-4 with a tightly scoped system prompt, tool use for common actions (password reset, plan change), and a clear "I don't know, let me get you to someone who does" escape hatch.
  • Handoff UX: when the bot escalates, the full conversation transcript goes to the human agent, so the user never has to re-explain.
  • Analytics: every conversation tagged by outcome (resolved / escalated / abandoned) and topic, so the team can see where the bot succeeds and where it fails.
  • Feedback loop: agents can flag bad bot responses, which feed back into prompt tuning and knowledge-base improvements.
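The hybrid retrieval step has to merge two ranked lists, one from dense vector search and one from BM25, before the cross-encoder reranks the survivors. The case study doesn't say how that merge was done. The sketch below assumes reciprocal rank fusion (RRF), a common choice because it needs only ranks, not comparable scores. The document IDs are made up, and the cross-encoder step is not shown.

```python
from collections import defaultdict


def reciprocal_rank_fusion(
    rankings: list[list[str]], k: int = 60
) -> list[tuple[str, float]]:
    """Fuse several ranked lists of document IDs into one ranking.

    A document's fused score is the sum of 1 / (k + rank) over every list it
    appears in (rank starts at 1). k = 60 is the commonly used default; it
    damps the advantage of the very top positions.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc ID so output is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical top-3 results from each retriever for one billing question.
dense_hits = ["billing-faq", "refund-policy", "invoice-pdf"]
bm25_hits = ["invoice-pdf", "billing-faq", "plan-change"]

fused = reciprocal_rank_fusion([dense_hits, bm25_hits])
print([doc_id for doc_id, _ in fused])
# ['billing-faq', 'invoice-pdf', 'refund-policy', 'plan-change']
```

Documents that both retrievers return (`billing-faq`, `invoice-pdf`) rise above those only one retriever found. In a pipeline like this one, the top few fused IDs would then go to the cross-encoder for final ordering.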

Results

  • 42% of tier-1 tickets resolved without human involvement.
  • Median response time for bot-resolved tickets dropped from 4 hours to 8 seconds.
  • CSAT held steady: no drop in customer satisfaction despite the larger share of bot-handled conversations.
  • Support team reallocated to tier-2/3 work where they add the most value.

What we learned

The single biggest quality lever is retrieval, not the LLM. Swapping GPT-4 for a cheaper model changes cost a lot but correctness comparatively little; improving retrieval changes correctness a lot. Most AI-support products we've seen under-invest in retrieval and over-invest in prompt engineering.
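One way to act on this is to score retrieval on its own, before any LLM is involved: take past questions whose correct help article is known and check how often that article shows up in the top k results. This is a minimal sketch; the `retrieve` callable, the hand-labeled question set, and the toy index standing in for the real pipeline are all hypothetical.

```python
from typing import Callable, Sequence


def hit_rate_at_k(
    labeled: Sequence[tuple[str, str]],
    retrieve: Callable[[str], list[str]],
    k: int = 5,
) -> float:
    """Fraction of labeled questions whose known-correct doc ID is in the top k."""
    if not labeled:
        raise ValueError("need at least one labeled question")
    hits = sum(
        1 for question, gold_doc in labeled if gold_doc in retrieve(question)[:k]
    )
    return hits / len(labeled)


# Toy stand-in for the real retrieval pipeline: question -> ranked doc IDs.
toy_index = {
    "how do I reset my password": ["reset-password", "sso-setup"],
    "where is my invoice": ["billing-faq", "invoice-pdf"],
    "can I export my data": ["api-keys", "billing-faq"],
}
labeled = [
    ("how do I reset my password", "reset-password"),
    ("where is my invoice", "invoice-pdf"),
    ("can I export my data", "data-export"),
]

score = hit_rate_at_k(labeled, lambda q: toy_index.get(q, []), k=2)
print(f"hit rate@2: {score:.2f}")  # hit rate@2: 0.67
```

Because this number does not depend on which LLM generates the answer, it isolates whether a retrieval change (different chunking, adding BM25, adding a reranker) actually helped.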

The second biggest lever is the handoff. A chatbot that confidently fails is worse than no chatbot. A chatbot that says "I'm not sure, let me get you a human" within a few seconds actually builds trust.
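A minimal sketch of that escalation decision, assuming three triggers: the user asks for a person, the best reranked passage scores below a threshold, or the model itself signals it is unsure. The threshold, the keyword check, and every name here are hypothetical; a cross-encoder's score scale depends on the model, so any real cutoff would need tuning on labeled conversations. The handoff carries the full transcript, matching the goal of never making the user re-explain.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical cutoff; the right value depends on the reranker's score scale
# and would be tuned on labeled conversations.
MIN_RERANK_SCORE = 0.5


@dataclass
class Turn:
    role: str  # "user" or "bot"
    text: str


@dataclass
class Handoff:
    reason: str
    transcript: list[Turn]  # full conversation, so the user never re-explains


def decide_handoff(
    transcript: list[Turn],
    top_rerank_score: Optional[float],
    model_says_unsure: bool,
) -> Optional[Handoff]:
    """Return a Handoff if the bot should escalate, or None to keep answering."""
    last_user = next(
        (t.text.lower() for t in reversed(transcript) if t.role == "user"), ""
    )
    # Naive keyword check; a placeholder for a real intent classifier.
    if "human" in last_user or "agent" in last_user:
        reason = "user_requested"
    elif top_rerank_score is None or top_rerank_score < MIN_RERANK_SCORE:
        reason = "low_retrieval_confidence"  # nothing relevant was retrieved
    elif model_says_unsure:
        reason = "model_unsure"
    else:
        return None
    return Handoff(reason=reason, transcript=list(transcript))


convo = [
    Turn("user", "My SSO login loops forever"),
    Turn("bot", "Are you signing in through Okta or Azure AD?"),
    Turn("user", "Okta, and I already cleared my cookies"),
]
handoff = decide_handoff(convo, top_rerank_score=0.31, model_says_unsure=False)
print(handoff.reason, len(handoff.transcript))  # low_retrieval_confidence 3
```

Checking retrieval confidence before generation lets the bot escalate in seconds instead of producing a confident wrong answer, and the `reason` field doubles as an outcome tag for analytics.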

Technology

Python, OpenAI GPT-4, FastAPI, React, pgvector, Redis

Working on something similar?

Send the shape of the problem. We'll tell you what it would take to solve.
