CASE FILE #02 · LLM · Agentic · Product Ops

CRAFT (Customer Response Analysis & Feature Translator)

Paytm

Users tell us the moment something breaks. The slow part is everything after — making sense of that response, and turning it into a fix. What if the gap between a user's reaction and a shipped change closed to almost nothing?

Context

It started with the CST Dashboard. The team was working off thousands of weekly support signals with no way to see what was actually hurting users, so we built a tool that ingested live consumer conversations, clustered them by issue, and ranked severity. For the first time we could see what was breaking in real time. But awareness was only the first step. The harder, slower part was everything after: making sense of why users were reacting, diagnosing the leaking funnel behind it, deciding what to build, and turning that into work engineers could pick up.

The Problem

The CST Dashboard solved 'what's wrong'. It didn't touch the rest of the PM cycle — and that cycle was where the time went. Reading a user reaction back to its root cause meant manually digging through funnels, the data warehouse, and the codebase. Turning insight into a Jira ticket meant 30–45 minutes of writing from memory, and the tickets that came out were often too vague to act on. The bet was that a single workspace could carry a PM from a user's response all the way to an engineer-ready change — not just automate the typing at the end.

What We Did

01. Built the CST Dashboard as the starting point: it ingests live consumer conversations, clusters them by issue type, and ranks severity with LLMs, giving the team real-time visibility into what's actually hurting users.
02. Designed CRAFT on top as a PM workspace, not a ticket tool: scan KPIs and funnels, surface growth ideas straight from the signals coming through, and initiate Jira in one place. The point was to cover the whole loop, from spotting a reaction to shipping a fix.
03. Built the agentic chat at the centre of CRAFT. A PM chats in plain English and can either create a Jira ticket on the spot or kick off a PRD: a concise use case in a standardised format. Because the chat reads the repo, it understands what a given change actually touches, and comes back with the right questions and edge cases before anything gets written.
04. Structured the Jira output into a real framework (context, detailed stories, and broken-down engineering tasks with acceptance criteria) so tickets land ready to execute and engineers spend their time building, not decoding.
05. Once the internal tooling for running this workflow from the file system matured, we extended it into an IDE-driven setup: multiple commands that orchestrate several agents, each directed through markdown files, so a PM can run the entire cycle from an IDE using Claude Code.
06. Built the knowledge base, the piece the older setup was missing. The earlier version reached for funnels, called the repo directly to find gaps, and queried the data warehouse live every time. We grounded it instead: schemas attached, funnel logic explained step by step, and scripts to pull repo details, so the agents diagnose against the real product, not a guess.
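To make the ticket framework from step 04 concrete, here is a minimal Python sketch of how such a structure could be modelled. It is not CRAFT's actual schema: the class names (`Ticket`, `Story`, `EngineeringTask`), the `is_ready` rule, and the example tasks are all assumptions made up for illustration. Only the three layers (context, stories, tasks with acceptance criteria) come from the description above.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class EngineeringTask:
    # Hypothetical shape: one unit of engineering work plus its acceptance criteria.
    title: str
    acceptance_criteria: list[str]


@dataclass
class Story:
    summary: str
    tasks: list[EngineeringTask] = field(default_factory=list)


@dataclass
class Ticket:
    context: str
    stories: list[Story] = field(default_factory=list)

    def missing_acceptance_criteria(self) -> list[str]:
        """Return the titles of tasks that have no acceptance criteria."""
        return [
            task.title
            for story in self.stories
            for task in story.tasks
            if not task.acceptance_criteria
        ]

    def is_ready(self) -> bool:
        """Assumed rule: context is present, at least one task exists,
        and every task carries acceptance criteria."""
        has_tasks = any(story.tasks for story in self.stories)
        return (
            bool(self.context.strip())
            and has_tasks
            and not self.missing_acceptance_criteria()
        )


# Illustrative ticket; the task titles and criteria are invented for this sketch.
ticket = Ticket(
    context="Users hit PAN-Aadhaar mismatch failures during verification.",
    stories=[
        Story(
            summary="Warn the user about a likely mismatch before submission.",
            tasks=[
                EngineeringTask(
                    title="Add pre-check nudge",
                    acceptance_criteria=["Nudge appears before the user submits."],
                ),
                EngineeringTask(title="Log mismatch reasons", acceptance_criteria=[]),
            ],
        )
    ],
)

print(ticket.missing_acceptance_criteria())
print(ticket.is_ready())
```

A check like `is_ready` is one way a workspace could refuse to file a ticket that is still too vague to act on, which was the failure mode of the hand-written tickets.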

Outcome

A PM now has two ways in: CRAFT, to read KPIs, run funnel analysis, and chat to draft a ticket or a repo-aware PRD; or the file-system setup in an IDE, driving the same cycle through Claude Code. Funnel analysis that used to be a manual dig is grounded in real schemas and data, so a drop gets explained, not just spotted. Ticket-writing collapsed from 30–45 minutes of vague notes to minutes of structured, engineer-ready work. A concrete example: CRAFT surfaced a spike in PAN–Aadhaar mismatch failures, traced it to the root cause, and drove a pre-check nudge that resolved it. That is exactly the kind of leak that used to compound silently for days before anyone connected the dots. The downstream effect is the one that matters: user reactions get diagnosed sooner and fixes ship faster, which users feel as fewer broken flows.

What We Took Away

The leverage wasn't the chat or the ticket formatting — it was the knowledge base. An agent that doesn't understand the funnel, the schema, or the codebase gives you confident, generic answers. Once it was grounded in how the product actually works, the same model went from 'note-taker' to something that could genuinely diagnose. Grounding beat cleverness every time.
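As a rough illustration of what grounding means in practice, the sketch below prepends knowledge-base material to a question before it would reach a model. Everything here is assumed for the example: the function `build_grounded_prompt`, the section names, the `kyc_attempts` table, and the funnel steps are hypothetical and do not describe CRAFT's real knowledge base or prompts.

```python
from __future__ import annotations


def build_grounded_prompt(question: str, knowledge_base: dict[str, str]) -> str:
    """Prepend every knowledge-base section to the question, in sorted order."""
    sections = [
        f"## {name}\n{body.strip()}"
        for name, body in sorted(knowledge_base.items())
    ]
    return "\n\n".join(sections + [f"## Question\n{question.strip()}"])


# Hypothetical knowledge-base entries, invented for this sketch.
knowledge_base = {
    "schema": "kyc_attempts(user_id, step, status, created_at)",
    "funnel": "1. start_kyc -> 2. enter_pan -> 3. verify_aadhaar -> 4. done",
}

prompt = build_grounded_prompt("Why did verification failures spike?", knowledge_base)
print(prompt)
```

The design point is that the model sees the schema and the funnel definition every time, rather than guessing at them or rediscovering them through live queries.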

The Honest Take

What we'd push on next is a feedback loop: let engineers rate each generated ticket so the knowledge base learns what 'good' means for this codebase and the diagnosis gets sharper over time.
