Connect Claude to Snowflake and ask, "What are the top three reasons customers are churning?"
You'll get a confident answer that’s, on its face, incredibly compelling, with beautiful graphs and formatting.
But read the fine print. For example, Claude may say: "The average transcript is ~34K characters. If I pulled the whole April/May window into context, that's ~340M chars - totally infeasible to read directly. So I need a sampling strategy." Or you'll find lines like these:
- I randomly sampled 50 conversations.
- I used keyword analysis to select the sample.
- I read the first comment of each support case.
- I will select one call per account.
Why 50? Why keyword search? Why the first comment?
Why not 100% of all 10,000 conversations?
It's not the model. It's the tools.
Same brain, different tools
The model is the brain, the tools are the arms and legs.
Claude and Rippit share the same brain.
So does Claude connected to any data source - Snowflake, a helpdesk, a call recorder, a chat platform, a CRM.
What's different is how Rippit can orchestrate the brain.
What happens when Claude talks to Snowflake
Claude writes SQL and Snowflake runs the query. Rows come back - including raw transcripts.
Claude has to load the transcripts into its context window, read them, and reason over them at query time.
That's the bottleneck.
A transcript can run 2,000 to 10,000 tokens, maybe more. Reading 50,000 of them to answer one question is technically possible, across thousands of separate model calls, and economically insane.
Neither Claude nor any other LLM can load 100% of the transcripts into a single context window, so it has to sample them somehow.
The model summarizes the sample and reports back. It takes shortcuts because it has to.
Again, it will give you what seems to be an amazing answer, one that may look totally defensible. But when you dig into what Claude actually did, you find all the shortcuts it took.
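To put rough numbers on the bottleneck, here is a back-of-the-envelope sketch. The corpus size and per-transcript token range come from the text above; the 200,000-token context window is an assumption chosen for illustration, not a figure for any specific model.

```python
# Back-of-the-envelope check. Every constant is illustrative.
TRANSCRIPTS = 50_000                      # corpus size used in the text
TOKENS_LOW, TOKENS_HIGH = 2_000, 10_000   # per-transcript range from the text
CONTEXT_WINDOW = 200_000                  # ASSUMED context window, in tokens


def corpus_tokens(n_transcripts: int, tokens_each: int) -> int:
    """Total tokens needed to read every transcript once."""
    return n_transcripts * tokens_each


def transcripts_per_window(window: int, tokens_each: int) -> int:
    """How many whole transcripts fit in a single context window."""
    return window // tokens_each


low = corpus_tokens(TRANSCRIPTS, TOKENS_LOW)
high = corpus_tokens(TRANSCRIPTS, TOKENS_HIGH)
fit_best = transcripts_per_window(CONTEXT_WINDOW, TOKENS_LOW)
fit_worst = transcripts_per_window(CONTEXT_WINDOW, TOKENS_HIGH)

print(f"Corpus: {low:,} to {high:,} tokens")
print(f"One window holds {fit_worst} to {fit_best} transcripts")
print(f"Coverage per window: {fit_worst / TRANSCRIPTS:.2%} to {fit_best / TRANSCRIPTS:.2%}")
```

Under these assumptions a single window sees well under 1% of the corpus, which is why the model reaches for a sample.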
The same thing happens with any conversation source
Different tool, same shape.
Connect Claude to Zendesk. Connect it to Gong. Connect it to Intercom, Salesforce Service Cloud, Front, Dialpad, Slack, Teams, your in-product chats, your agent-to-agent messages. Pick any conversation source.
Zendesk and Gong aren't data platforms like Snowflake, so they can't run analysis over 100% of conversations on the fly. Claude has to sample from those sources and fit what it can into the context window, which is a tiny percentage of the total conversation volume.
Snowflake can enrich 100% of conversations, but it isn't designed for this, so it's too slow to be practical and requires more complex internal engineering.
What Rippit does differently — part one, pre-enrichment
Every conversation that lands in Rippit gets processed once, at ingestion. Topics. Intents. Sentiment. Escalations. Outcomes. Themes. Entities.
Drawing on what we've learned over the past 10 years, we've identified the data points most people need, and we pre-enrich every conversation with them.
We also let customers build custom pre-enrichment prompts. Every business is unique, and so are the conversations it has, so you should be able to customize which insights you get out of your data.
You define the questions you actually care about - "did the customer mention a competitor by name," "did the agent quote pricing," "was a regulatory term invoked," "did this conversation contain a renewal objection," "what were our customers' thoughts on X new product" - and we run those against every single conversation at ingestion.
A custom question becomes a custom dimension. Asked once, answered across 100% of the corpus.
Those become structured dimensions and measures that Rippit can use when someone asks a question, which makes each question cheaper and faster.
This is why Rippit is better than the "Claude + warehouse" setup. The generic setup gives you raw text and asks the model to figure out your business at query time, every time, over a sample. Rippit lets you encode your business once and run it over everything.
The per-conversation model cost was paid once, asynchronously, at ingestion. Not on every analytical pass.
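As a minimal sketch of the "pay once at ingestion, aggregate at query time" idea: the code below is not Rippit's implementation. The `classify` function is a hypothetical stand-in for a single LLM call (a keyword check keeps it runnable offline), and the dimension names are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Conversation:
    conversation_id: str
    transcript: str
    dimensions: dict = field(default_factory=dict)  # filled once, at ingestion


def classify(transcript: str, question: str) -> bool:
    """HYPOTHETICAL stand-in for one LLM call answering a yes/no question.

    A real pipeline would call a model here; a keyword check keeps the
    sketch runnable without network access.
    """
    keywords = {
        "mentions_competitor": ("competitor", "switching to"),
        "pricing_quoted": ("per seat", "pricing"),
    }
    text = transcript.lower()
    return any(k in text for k in keywords[question])


CUSTOM_QUESTIONS = ("mentions_competitor", "pricing_quoted")  # illustrative names


def ingest(conv: Conversation) -> Conversation:
    """Pay the per-conversation cost once and store the answers."""
    for q in CUSTOM_QUESTIONS:
        conv.dimensions[q] = classify(conv.transcript, q)
    return conv


def count_where(corpus: list, dimension: str) -> int:
    """Query time: aggregate stored answers. No transcript is re-read."""
    return sum(1 for c in corpus if c.dimensions.get(dimension))


corpus = [
    ingest(Conversation("c1", "We are switching to a competitor next quarter.")),
    ingest(Conversation("c2", "Can you confirm the pricing per seat?")),
    ingest(Conversation("c3", "Thanks, the export works now.")),
]
print(count_where(corpus, "mentions_competitor"))
print(count_where(corpus, "pricing_quoted"))
```

The design point is that `count_where` touches only stored answers, so an analytical question over the full corpus costs a scan, not a model call per transcript.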
If you do take the time to build LLM pipelines yourself, in Snowflake for example, you'll find the cost very different from what it would be in Rippit. We pass through what the model providers charge us, 1:1, whereas Snowflake marks up these tokens and recognizes that markup as revenue. That puts your LLM token costs 10-25% higher than with Rippit. Once you get hooked and want more and more insight out of your data, that 10-25% turns into a much higher annual cost.
What Rippit does differently — part two, on-the-fly enrichment
Pre-enrichment covers only some of the use cases customers ask about. You also need on-the-fly enrichment.
For some questions you ask Rippit, the model needs to read the raw transcripts to figure out the answer.
Rippit runs query-time, question-specific LLM calls over every conversation that has been selected for the analysis.
Pre-enrichment gives you a better baseline than what exists in your CRM or phone system, and on-the-fly enrichment is what gives you depth and 100% question coverage.
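The shape of on-the-fly enrichment can be sketched as a map over every selected conversation followed by an aggregation. This is an illustration only, not Rippit's code: `ask_model` is a hypothetical placeholder for a question-specific LLM call, and the labels it returns are invented.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def ask_model(transcript: str, question: str) -> str:
    """HYPOTHETICAL stand-in for one question-specific LLM call.

    It ignores `question` and uses keywords so the sketch runs offline.
    """
    text = transcript.lower()
    if "price" in text or "expensive" in text:
        return "pricing"
    if "bug" in text or "broken" in text:
        return "reliability"
    return "other"


def enrich_on_the_fly(transcripts, question, max_workers=8):
    """Run the question over EVERY selected transcript, then aggregate."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        labels = list(pool.map(lambda t: ask_model(t, question), transcripts))
    return Counter(labels)


selected = [
    "The renewal is too expensive for us.",
    "The integration has been broken for a week.",
    "Your price went up again.",
    "Thanks for the walkthrough.",
]
themes = enrich_on_the_fly(selected, "What is the main complaint?")
print(themes.most_common())
```

Each transcript gets its own call, so no single context window has to hold the corpus, and the counts cover every selected conversation instead of a sample.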
Sampling gives you an answer. It doesn't give you the right answer.
This is the part that should bother you most.
When the model samples 50 conversations and tells you the top three reasons customers are churning, it isn't lying. It found patterns. The patterns are real in those 50 conversations.
But it’s not verifiably accurate.
It's anecdotal evidence confidently articulated as analytics.
An LLM with a confident voice and three bullet points feels like a research report. It isn't. It's the equivalent of asking a consultant to talk to a couple of customers at random and come back with a strategy memo. The summary will sound smart. The conclusions might even be directionally correct. They also might be completely wrong, and you have no way to know which.
You're going to take this answer into a board meeting. Or a roadmap review. Or a renewal conversation. Decisions get made on it. And the underlying evidence base is fifty conversations out of fifty thousand, picked by a heuristic nobody audited.
Here are three examples of where the anecdote-vs-statistics gap actually bites:
Rare events. If 2% of conversations contain a churn signal, a 50-conversation sample catches one of them, maybe. You'll never see the pattern. Compliance violations, executive escalations, regulatory disclosures - exactly the things you can't afford to miss, exactly the things sampling is most likely to miss.
Trend detection. Comparing two periods requires comparable coverage. Sample 50 last quarter and 50 this quarter, and the noise swamps any signal smaller than 20 points. With 100%, a 2-point shift in resolution rate is real.
Segmented analysis. "What are enterprise customers saying about pricing?" needs volume inside the segment. Sample globally and filter, and you've got three conversations from your top tier. Sample inside the segment, and you had to know the segment mattered before you sampled — which means you've already lost the ability to ask unanticipated questions.
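The rare-event and trend numbers above can be checked with standard sampling arithmetic. The sketch below assumes simple random sampling and a binomial model. The threshold of three hits to call something a pattern and the 25,000-conversations-per-quarter figure are assumptions for illustration.

```python
import math


def p_at_least(k: int, n: int, rate: float) -> float:
    """P(a random sample of n contains at least k events), binomial model."""
    p_fewer = sum(
        math.comb(n, i) * rate**i * (1 - rate) ** (n - i) for i in range(k)
    )
    return 1 - p_fewer


def diff_margin(n1: int, n2: int, p: float = 0.5, z: float = 1.96) -> float:
    """Approximate 95% margin of error on the difference of two proportions.

    p = 0.5 is the worst case (widest margin).
    """
    return z * math.sqrt(p * (1 - p) / n1 + p * (1 - p) / n2)


# Rare events: 2% base rate, sample of 50.
print(f"at least 1 hit:  {p_at_least(1, 50, 0.02):.0%}")
print(f"at least 3 hits: {p_at_least(3, 50, 0.02):.0%}")

# Trend detection: 50 per quarter vs. an ASSUMED 25,000 per quarter.
print(f"margin, n=50 each:     {diff_margin(50, 50):.1%}")
print(f"margin, n=25,000 each: {diff_margin(25_000, 25_000):.1%}")
```

With 50 conversations per period the margin on a period-over-period difference is close to 20 points, which is where the figure in the trend example comes from. At the assumed full-corpus volume the margin drops below one point, so a 2-point shift stands out.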
The tradeoff is real
Nothing comes for free. 100% coverage costs more. We're not talking about what we charge; it literally costs more in AI tokens.
But you get an answer you can trust and can make better decisions with.
The summary
Claude + Snowflake is a brain reaching for a tool that doesn't understand conversations.
Claude + Rippit is a brain reaching for a tool that already does.
Same model. Different tools. Different answers.
Snowflake Example from Claude

Prompt:
Can you use snowflake to tell me how customers are talking about Rippit and how that's changed over the last month or two?
Claude states that it has to sample and is just trying to determine the best approach.
So the windows look like:
- Last ~2 months (focus): April + May 2026 = 1,099 calls across ~340 distinct accounts
- Prior comparable window: Feb + March 2026 = 1,318 calls across ~370 accounts
Average transcript is ~34K characters. If I pulled the whole April/May window into context, that's ~37M chars — totally infeasible to read directly. So I need a sampling strategy.
Where I'd actually start. Cheap signals first to find which calls are even worth opening, then read only those. Roughly:
- Stratify by account, not by call. A single big account can produce 30 calls in a month and dominate any naive random sample. I'd partition into buckets — e.g. (a) top-ARR accounts, (b) churn-cohort accounts (which I do happen to know from Salesforce), (c) new logos, (d) everyone else — and sample within each.
- Use cheap lexical filters on the raw transcript to find calls that are likely to be about Rippit-as-a-product (vs. random discovery calls, demos, internal syncs that got recorded). Keywords like "Rippit", "MaestroQA", "the product", "feature", "broken", "doesn't work", "love", "cancel", "renewal", "competitor", etc. That gets you from ~1,099 → maybe 300-500 candidate calls.
- From candidates, take a stratified random sample small enough to actually summarize — maybe 30-50 calls per window. Pull the first ~5K chars of each, dump into a single context, and synthesize themes by hand (or with one LLM call).
The methodology it chose
Step 1: Scope the windows and drop unattributable calls. April-May 2026 = 1,099 calls, but ~250 of those have no ACCOUNT_ID (internal syncs, recordings that never got matched to a Salesforce account). Drop them — you can't talk about "customers" without knowing which customer. That leaves 852 attributed calls.
Step 2: Pick a role lens. CSM calls are where existing customers talk about the product. SDR/AE calls are pitch motion — different signal, separate analysis. For "what are customers saying about Rippit," restrict to CSM + Head of CS. That's 508 calls in Apr-May.
Step 3: Lexical filters to find candidate calls. Cheap and noisy, but better than random. The big lesson: most sentiment keywords are useless. "Love"/"great"/"amazing" matched 92% of all calls — they're filler. The keywords that do discriminate are concrete: product/brand mentions, named competitors, integration names (Gladly, Kustomer, Zendesk), specific surfaces (auto-QA, coaching, calibration, screen capture).
Step 4: Stratified one-call-per-account sampling. A naive random sample over-weights chatty accounts (Mindbody had 8 product-mention calls in 2 months). Take the most recent call per account so each customer voice gets weighted once.
Step 5: Snippet extraction, not full transcripts. Pull ±400 chars around the first brand mention. ~900 chars per call × 50 calls = 45K chars total — fits in one context window for thematic coding.
Step 6: Track keyword rates month-over-month. Raw counts are misleading because call volumes swing 30%+ month-over-month. Normalize as percent of CSM calls.
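To make Steps 3 to 5 concrete, here is a rough Python sketch of that pipeline. It is an illustration, not the code or SQL Claude actually ran: the `Call` record, the account names, and the sample transcripts are made up, and only the two brand terms and the ±400-character radius come from the steps above.

```python
import re
from dataclasses import dataclass
from datetime import date


@dataclass
class Call:
    account_id: str
    call_date: date
    transcript: str


BRAND = re.compile(r"rippit|maestroqa", re.IGNORECASE)  # brand terms from Step 3


def candidates(calls):
    """Step 3: keep calls whose transcript mentions the brand at all."""
    return [c for c in calls if BRAND.search(c.transcript)]


def latest_per_account(calls):
    """Step 4: keep only the most recent call for each account."""
    latest = {}
    for c in calls:
        if c.account_id not in latest or c.call_date > latest[c.account_id].call_date:
            latest[c.account_id] = c
    return list(latest.values())


def snippet(transcript, radius=400):
    """Step 5: text within `radius` chars of the FIRST brand mention."""
    m = BRAND.search(transcript)
    if m is None:
        return ""
    return transcript[max(0, m.start() - radius): m.end() + radius]


# Made-up data: two calls from one account, one call with no brand mention.
calls = [
    Call("acme", date(2026, 4, 2), "x" * 1000 + " Rippit is slow " + "y" * 1000),
    Call("acme", date(2026, 5, 9), "We like Rippit coaching."),
    Call("globex", date(2026, 5, 1), "Quarterly sync, no product talk."),
]
sample = latest_per_account(candidates(calls))
snippets = [snippet(c.transcript) for c in sample]
chars_read = sum(len(s) for s in snippets)
chars_total = sum(len(c.transcript) for c in calls)
print(f"read {chars_read} of {chars_total} characters")
```

In this toy run the earlier call that says the product is slow is discarded by the one-call-per-account rule, and everything outside the snippet radius is never read. That is the tradeoff the methodology makes to fit in one context window.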
What the methodology misses (in Claude's own words):
- Sentiment direction. Frustrated/confused word counts are flat, but that doesn't mean sentiment is flat — those words are too generic. A keyword can’t capture the target of the frustration.
- Defect categorization. I can count "Integration Partner X" mentions but not split "X integration broken" vs "evaluating X as a competitor" vs "X is fine, talking about something else."
- Quote extraction. Surfacing the actual damning customer line requires reading the transcript manually for every flagged call.
- Aggregation across accounts. "What are top 5 themes driving churn risk" is a one-line SQL query, but how do you determine what counts as a churn risk in a conversation? Without running an LLM over the entire transcript, you're just using some keyword search heuristic.
Realistic conclusion: the lexical approach gets you the integration health questions (which CRM/help desk is the friction) and the vocabulary tracking questions (rebrand, named competitors) cheaply. It can't replace enrichment for sentiment, defect taxonomy, or churn-driver attribution - those genuinely need the LLM pass.





