Why Rippit Reads 100% of Conversations and Claude + Snowflake Doesn't

Mike Nucci

May 28, 2026

Connect Claude to Snowflake and ask, "What are the top three reasons customers are churning?"

‍

You'll get a confident answer that’s, on its face, incredibly compelling, with beautiful graphs and formatting.

‍

But read the fine print. For example, Claude may say: "The average transcript is ~34K characters. If I pulled the whole April/May window into context, that's ~340M chars - totally infeasible to read directly. So I need a sampling strategy."

Or you'll see lines like these:

I randomly sampled 50 conversations.
I used keyword analysis to select the sample.
I read the first comment of each support case.
I will select one call per account.

‍

Why 50? Why keyword search? Why the first comment?

‍

Why not 100% of all 10,000 conversations?

‍

It's not the model. It's the tools.

‍

Same brain, different tools

‍

The model is the brain; the tools are the arms and legs.

‍

Claude and Rippit share the same brain.

‍

So does Claude connected to any data source - Snowflake, a helpdesk, a call recorder, a chat platform, a CRM.

‍

What's different is how Rippit can orchestrate the brain.

‍

What happens when Claude talks to Snowflake

‍

Claude writes SQL and Snowflake runs the query. Rows come back - including raw transcripts.

‍

Claude has to load and read the transcripts into its context window and reason over them at query time.

‍

That's the bottleneck.

‍

A transcript could be 2,000-10,000 tokens, maybe more. Reading 50,000 of them to answer one question is technically possible, batch by batch, but economically insane.

‍

No LLM, Claude included, can load 100% of the transcripts into its context window at once, so it has to sample them somehow.
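To put rough numbers on that, here is a back-of-the-envelope sketch using the 2,000-10,000 token range and the 50,000-transcript corpus mentioned above. The 200,000-token context window is an assumed, illustrative figure, not a spec for any particular model.

```python
# Back-of-the-envelope token budget for reading every transcript at query time.
TRANSCRIPTS = 50_000
TOKENS_PER_TRANSCRIPT_LOW = 2_000
TOKENS_PER_TRANSCRIPT_HIGH = 10_000
CONTEXT_WINDOW_TOKENS = 200_000  # assumption for illustration only


def corpus_tokens(transcripts: int, tokens_each: int) -> int:
    """Total tokens needed to read every transcript once."""
    return transcripts * tokens_each


def transcripts_per_window(window_tokens: int, tokens_each: int) -> int:
    """How many whole transcripts fit in one context window."""
    return window_tokens // tokens_each


corpus_low = corpus_tokens(TRANSCRIPTS, TOKENS_PER_TRANSCRIPT_LOW)
corpus_high = corpus_tokens(TRANSCRIPTS, TOKENS_PER_TRANSCRIPT_HIGH)
fit_best = transcripts_per_window(CONTEXT_WINDOW_TOKENS, TOKENS_PER_TRANSCRIPT_LOW)
fit_worst = transcripts_per_window(CONTEXT_WINDOW_TOKENS, TOKENS_PER_TRANSCRIPT_HIGH)

print(f"Corpus size: {corpus_low:,} to {corpus_high:,} tokens")
print(f"Transcripts per window: {fit_worst} to {fit_best}")
print(f"Best-case coverage in one window: {fit_best / TRANSCRIPTS:.2%}")
```

Under these assumptions a single window holds between 20 and 100 transcripts, which is why a sample of about 50 keeps showing up.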

‍

The model summarizes the sample and reports back. It takes shortcuts because it has to.

‍

Again, it will give you what seems to be an amazing answer, one that may look totally defensible. But dig into what Claude actually did and you'll find all the shortcuts it took.

‍

The same thing happens with any conversation source

‍

Different tool, same shape.

‍

Connect Claude to Zendesk. Connect it to Gong. Connect it to Intercom, Salesforce Service Cloud, Front, Dialpad, Slack, Teams, your in-product chats, your agent-to-agent messages. Pick any conversation source.

‍

Zendesk and Gong aren't data platforms like Snowflake; they can't run analysis over 100% of conversations on the fly. So Claude has to sample from those sources and fit what it can into the context window, which is a tiny percentage of the total conversation volume.

‍

Snowflake can enrich 100% of conversations, but it isn't designed for this, so it's too slow to be feasible and requires more complex internal building.

‍

What Rippit does differently — part one, pre-enrichment

‍

Every conversation that lands in Rippit gets processed once, at ingestion. Topics. Intents. Sentiment. Escalations. Outcomes. Themes. Entities.

‍

We've identified the data points most people will need, based on what we've learned over the past 10 years, and we pre-enrich every conversation with them.

‍

We also give customers the ability to build customized pre-enrichment prompts. Every business is unique, and so are the conversations it has, so you should be able to customize which insights you pull out of your data.

‍

You define the questions you actually care about - "did the customer mention a competitor by name," "did the agent quote pricing," "was a regulatory term invoked," "did this conversation contain a renewal objection," "what did our customers think of new product X" - and we run those against every single conversation at ingestion.

‍

A custom question becomes a custom dimension. Asked once, answered across 100% of the corpus.

‍

These answers become structured dimensions and measures that Rippit can draw on when someone asks a question, which makes each question cheaper and faster to answer.
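A minimal sketch of the idea, not Rippit's actual implementation: each custom question is evaluated once per conversation at ingestion and stored as a named dimension, so a later analytical question is a cheap aggregation over stored values. The `Conversation` class, the question names, and the keyword-based callables are all hypothetical; in a real pipeline each callable would be a model prompt.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Conversation:
    conv_id: str
    transcript: str
    # Filled in once, at ingestion.
    dimensions: Dict[str, bool] = field(default_factory=dict)


# Hypothetical custom questions. Each callable stands in for one model prompt.
CUSTOM_QUESTIONS: Dict[str, Callable[[str], bool]] = {
    "mentions_competitor": lambda t: "competitor" in t.lower(),
    "renewal_objection": lambda t: "renewal" in t.lower() and "expensive" in t.lower(),
}


def enrich_at_ingestion(conv: Conversation) -> Conversation:
    """Answer every custom question once and store the result as a dimension."""
    for name, ask in CUSTOM_QUESTIONS.items():
        conv.dimensions[name] = ask(conv.transcript)
    return conv


def dimension_rate(convs: List[Conversation], name: str) -> float:
    """Query time: aggregate the stored dimension over 100% of conversations."""
    return sum(1 for c in convs if c.dimensions[name]) / len(convs)


transcripts = [
    "We are also evaluating a competitor.",
    "The renewal quote is too expensive for us.",
    "How do I add a new user?",
    "A competitor offered us a cheaper renewal.",
]
corpus = [enrich_at_ingestion(Conversation(str(i), t)) for i, t in enumerate(transcripts)]

print(dimension_rate(corpus, "mentions_competitor"))
print(dimension_rate(corpus, "renewal_objection"))
```

The query-time function never touches a transcript; it only reads the dimensions that ingestion already paid for.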

‍

This is why Rippit is better than the "Claude + warehouse" setup. The generic setup gives you raw text and asks the model to figure out your business at query time, every time, over a sample. Rippit lets you encode your business once and run it over everything.

‍

The per-conversation model cost was paid once, asynchronously, at ingestion. Not on every analytical pass.

‍

If you do take the time to build LLM pipelines, in Snowflake for example, you'll find the cost is very different from what it would be in Rippit. We pass through what the model providers charge us 1:1, whereas Snowflake marks up these tokens and recognizes that markup as revenue. As a result, your LLM token costs are always 10-25% higher than they would be with Rippit. And once you get hooked and want more and more insight out of your data, that 10-25% turns into a much higher annual cost.

‍

What Rippit does differently — part two, on-the-fly enrichment

‍

Pre-enrichment only covers some of the use cases customers ask about. You also need on-the-fly enrichment.

‍

For some questions, the model needs to read the raw transcripts to work out the answer.

‍

Rippit runs query-time, question-specific LLM calls over every conversation selected for the analysis.
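The shape of that fan-out can be sketched as a map over every selected conversation followed by a reduce. This is an illustration under stated assumptions, not Rippit's code: `stub_ask` is a hypothetical keyword stand-in for one question-specific LLM call, and `enrich_on_the_fly` is a name invented for this example.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable


def stub_ask(transcript: str) -> str:
    """Hypothetical stand-in for one question-specific LLM call."""
    text = transcript.lower()
    if "price" in text or "pricing" in text:
        return "pricing"
    if "bug" in text or "broken" in text:
        return "defect"
    return "other"


def enrich_on_the_fly(
    transcripts: Iterable[str],
    ask: Callable[[str], str],
    max_workers: int = 8,
) -> Counter:
    """Run `ask` over every selected transcript (no sampling), then tally the labels."""
    items = list(transcripts)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        labels = list(pool.map(ask, items))  # map: one call per conversation
    return Counter(labels)  # reduce: aggregate across the whole selection


selected = [
    "The price went up again at renewal.",
    "The export button is broken.",
    "Just checking in on onboarding.",
    "Can you explain the new pricing tiers?",
]
themes = enrich_on_the_fly(selected, stub_ask)
print(themes.most_common())
```

The point of the design is that each call sees one transcript, so no single context window ever has to hold the corpus.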

‍

Pre-enrichment gives you a better baseline than what exists in your CRM or phone system, and on-the-fly enrichment gives you depth and 100% question coverage.

‍

Sampling gives you an answer. It doesn't give you the right answer.

‍

This is the part that should bother you most.

‍

When the model samples 50 conversations and tells you the top three reasons customers are churning, it isn't lying. It found patterns. The patterns are real in those 50 conversations.

‍

But it’s not verifiably accurate.

‍

It's anecdotal evidence confidently articulated as analytics.

‍

An LLM with a confident voice and three bullet points feels like a research report. It isn't. It's the equivalent of asking a consultant to talk to a couple of customers at random and come back with a strategy memo. The summary will sound smart. The conclusions might even be directionally correct. They also might be completely wrong, and you have no way to know which.

‍

You're going to take this answer into a board meeting. Or a roadmap review. Or a renewal conversation. Decisions get made on it. And the underlying evidence base is fifty conversations out of fifty thousand, picked by a heuristic nobody audited.

‍

Here are just three examples of where the anecdote-vs-statistics gap actually bites:

‍

Rare events. If 2% of conversations contain a churn signal, a 50-conversation sample catches one of them on average, and about a third of the time it catches none. One conversation is not a pattern, so you'll never see it. Compliance violations, executive escalations, regulatory disclosures - exactly the things you can't afford to miss, exactly the things sampling is most likely to miss.

‍

Trend detection. Comparing two periods requires comparable coverage. Sample 50 last quarter and 50 this quarter, and the noise swamps any signal smaller than 20 points. With 100%, a 2-point shift in resolution rate is real.

‍

Segmented analysis. "What are enterprise customers saying about pricing?" needs volume inside the segment. Sample globally and filter, and you've got three conversations from your top tier. Sample inside the segment, and you had to know the segment mattered before you sampled — which means you've already lost the ability to ask unanticipated questions.
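The numbers behind the first two examples can be checked with a few lines of arithmetic. This sketch assumes simple random sampling, treats each draw as independent (a binomial approximation), and uses the worst-case proportion of 50% for the margin of error; the function names are illustrative.

```python
import math


def prob_sample_misses(event_rate: float, sample_size: int) -> float:
    """Probability that a random sample contains zero conversations with the event."""
    return (1 - event_rate) ** sample_size


def expected_hits(event_rate: float, sample_size: int) -> float:
    """Expected number of sampled conversations that contain the event."""
    return event_rate * sample_size


def diff_margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Approximate 95% margin for the difference between two sample proportions, each of size n."""
    return z * math.sqrt(2 * p * (1 - p) / n)


# Rare events: a 2% signal in a 50-conversation sample.
miss = prob_sample_misses(0.02, 50)
hits = expected_hits(0.02, 50)
print(f"Expected hits: {hits:.1f}, chance of seeing none: {miss:.1%}")

# Trend detection: 50 conversations last quarter vs 50 this quarter.
margin = diff_margin_of_error(0.5, 50)
print(f"Margin on the quarter-over-quarter difference: {margin * 100:.1f} points")
```

With a sample of 50 per period, a shift has to approach 20 points before it clears the noise, and a 2% signal is absent from roughly one sample in three.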

‍

The tradeoff is real

‍

Nothing comes for free. 100% coverage costs more. We're not talking about what we charge; it literally costs more in AI tokens.

‍

But you get an answer you can trust and make better decisions with.

‍

The summary

‍

Claude + Snowflake is a brain reaching for a tool that doesn't understand conversations.

‍

Claude + Rippit is a brain reaching for a tool that already does.

‍

Same model. Different tools. Different answers.

‍

Snowflake Example from Claude

‍

‍

Prompt:

Can you use snowflake to tell me how customers are talking about Rippit and how that's changed over the last month or two?

‍

Claude states that it has to sample and is just trying to determine the best approach:

‍

So the windows look like:

‍

Last ~2 months (focus): April + May 2026 = 1,099 calls across ~340 distinct accounts
Prior comparable window: Feb + March 2026 = 1,318 calls across ~370 accounts

‍

Average transcript is ~34K characters. If I pulled the whole April/May window into context, that's ~37M chars — totally infeasible to read directly. So I need a sampling strategy.

‍

Where I'd actually start. Cheap signals first to find which calls are even worth opening, then read only those. Roughly:

‍

Stratify by account, not by call. A single big account can produce 30 calls in a month and dominate any naive random sample. I'd partition into buckets — e.g. (a) top-ARR accounts, (b) churn-cohort accounts (which I do happen to know from Salesforce), (c) new logos, (d) everyone else — and sample within each.
Use cheap lexical filters on the raw transcript to find calls that are likely to be about Rippit-as-a-product (vs. random discovery calls, demos, internal syncs that got recorded). Keywords like "Rippit", "MaestroQA", "the product", "feature", "broken", "doesn't work", "love", "cancel", "renewal", "competitor", etc. That gets you from ~1,099 → maybe 300-500 candidate calls.
From candidates, take a stratified random sample small enough to actually summarize — maybe 30-50 calls per window. Pull the first ~5K chars of each, dump into a single context, and synthesize themes by hand (or with one LLM call).

‍

The methodology it chose

‍

Step 1: Scope the windows and drop unattributable calls. April-May 2026 = 1,099 calls, but ~250 of those have no ACCOUNT_ID (internal syncs, recordings that never got matched to a Salesforce account). Drop them — you can't talk about "customers" without knowing which customer. That leaves 852 attributed calls.

‍

Step 2: Pick a role lens. CSM calls are where existing customers talk about the product. SDR/AE calls are pitch motion — different signal, separate analysis. For "what are customers saying about Rippit," restrict to CSM + Head of CS. That's 508 calls in Apr-May.

‍

Step 3: Lexical filters to find candidate calls. Cheap and noisy, but better than random. The big lesson: most sentiment keywords are useless. "Love"/"great"/"amazing" matched 92% of all calls — they're filler. The keywords that do discriminate are concrete: product/brand mentions, named competitors, integration names (Gladly, Kustomer, Zendesk), specific surfaces (auto-QA, coaching, calibration, screen capture).

‍

Step 4: Stratified one-call-per-account sampling. A naive random sample over-weights chatty accounts (Mindbody had 8 product-mention calls in 2 months). Take the most recent call per account so each customer voice gets weighted once.

‍

Step 5: Snippet extraction, not full transcripts. Pull ±400 chars around the first brand mention. ~900 chars per call × 50 calls = 45K chars total — fits in one context window for thematic coding.

‍

Step 6: Track keyword rates month-over-month. Raw counts are misleading because call volumes swing 30%+ month-over-month. Normalize as percent of CSM calls.
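Steps 1, 3, 4, and 5 above can be sketched in a few lines, which also makes the shortcut visible: the model ends up reading about 800 characters from one call per account. This is an illustrative reconstruction, not the code Claude ran; the `Call` class, the field names, and the brand pattern are assumptions.

```python
import re
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass
class Call:
    account_id: Optional[str]  # None when the call was never matched to an account
    call_date: str             # ISO date, so string comparison orders by time
    transcript: str


# Assumed lexical filter: brand mentions only.
BRAND_PATTERN = re.compile(r"rippit|maestroqa", re.IGNORECASE)


def snippet(transcript: str, window: int = 400) -> Optional[str]:
    """Step 5: return +/- `window` chars around the first brand mention, or None."""
    match = BRAND_PATTERN.search(transcript)
    if match is None:
        return None
    start = max(0, match.start() - window)
    end = min(len(transcript), match.end() + window)
    return transcript[start:end]


def sample_one_per_account(calls: Iterable[Call]) -> Dict[str, Call]:
    """Steps 1, 3, 4: drop unattributed calls, keep brand mentions, take the latest per account."""
    latest: Dict[str, Call] = {}
    for call in calls:
        if call.account_id is None:
            continue  # Step 1: unattributable
        if BRAND_PATTERN.search(call.transcript) is None:
            continue  # Step 3: fails the lexical filter
        current = latest.get(call.account_id)
        if current is None or call.call_date > current.call_date:
            latest[call.account_id] = call  # Step 4: most recent call wins
    return latest


calls = [
    Call("a1", "2026-04-02", "x" * 1000 + "Rippit" + "y" * 1000),
    Call("a1", "2026-05-10", "We love Rippit now"),
    Call(None, "2026-05-11", "Rippit internal sync"),
    Call("a2", "2026-05-01", "no brand here"),
]
picked = sample_one_per_account(calls)
print({account: snippet(call.transcript) for account, call in picked.items()})
```

Everything the filter rejects, and everything outside the snippet window, never reaches the model.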

‍

What the methodology misses (in Claude's own words):

‍

Sentiment direction. Frustrated/confused word counts are flat, but that doesn't mean sentiment is flat — those words are too generic. A keyword can’t capture the target of the frustration.
Defect categorization. I can count "Integration Partner X" mentions but not split "X integration broken" vs "evaluating X as a competitor" vs "X is fine, talking about something else."
Quote extraction. Surfacing the actual damning customer line requires reading the transcript manually for every flagged call.
Aggregation across accounts. "What are top 5 themes driving churn risk" is a one-line SQL query, but how do you determine what counts as a churn risk in a conversation? Without running an LLM over the entire transcript, you're just using some keyword search heuristic.

Realistic conclusion: the lexical approach gets you the integration health questions (which CRM/help desk is the friction) and the vocabulary tracking questions (rebrand, named competitors) cheaply. It can't replace enrichment for sentiment, defect taxonomy, or churn-driver attribution - those genuinely need the LLM pass.

‍

Check out the video of the second attempt of the prompt using the Snowflake MCP – the outputs still highlight the limitations outlined above:

Great AI Products

Great AI products shouldn't require a services team to set up—if you can’t learn it yourself in 10 minutes, the product is failing you.

Every software company is building AI into their product. As we embark on the Rippit journey, something we have spent a lot of time thinking about over the last 2-3 years is: what are the attributes of great products?

‍

In the technology industry, there has been a lot of talk about Forward-Deployed Engineers (FDE). These are people who work with the customers to customize the product to a customer's needs.

‍

What they are primarily doing is building prompts for LLMs on behalf of customers.

‍

This sounds like a great deal for the customer, and it often leads to customers giving their business to the company that offers more services and help.

‍

Our pre-Rippit experience suggests that this is a death trap for the company and customer within 24-36 months.

‍

When a software company is small, early customers get the best employees as Forward-Deployed Engineers, and because the company is desperate for early customers, they also under-charge for the human resources they provide. Both of these things have to change for the software company to succeed over time.

‍

The average customer loses the best FDE involved in their success. The average customer has to pay more for people resources.

‍

Because the company has committed to a strategy where humans fill in product gaps, the product gets harder and harder to use over time.

‍

What ends up happening is only the most important customers have good customer experiences.

‍

I've lived this. I don't think you can use the FDE model to deliver a high-quality customer experience unless the customer is paying $500,000/year (±$250,000). It's very hard to hire enough high-quality Forward-Deployed Engineers to match your growing customer count.

‍

This is why I think Great AI Products have to be easy enough to use that a customer could learn them on their own. It is the most important criterion when judging a Great AI Product.

‍

That’s why we’ve committed to this strategy at Rippit — we’re not where we want to be yet, but we are making progress.

‍

This also seems to be the defining attribute of the best software products we use internally, like Figma, ChatGPT, Claude, Ramp, Cursor, Snowflake, AWS, and more.

‍

We felt even higher conviction when we evaluated the alternatives for Voice of Customer software, Quality Assurance software, Experience Management software, and Conversation Intelligence software. All of them require FDEs — the signal was that they all had minimum price points of $25,000 to $50,000. That is way too much friction for many potential customers.

‍

I’m not sure a single one lets you sign up for the product without talking to a human, which is often another signal of a complex product. The ones I researched were Qualtrics, Medallia, Enterpret, Chattermill, Unwrap, Loris, Level AI, Observe.AI, Cresta, MaestroQA, Balto, and unitQ.

‍

I think Great AI Products have to be so easy to use that you can sign up and learn them yourself within 10 minutes.

‍

Not everyone in the industry agrees with me — in fact, I think most won’t.

‍

Some will argue it’s different for products selling to B2B or certain industries.

‍

Some will argue that companies with FDEs will use the learnings from customers to build an easier product. The reality is it’s very hard to keep the plane flying just as fast while building an easier-to-use product — and it requires a different product culture. It requires a founder to potentially slow growth down and rearchitect everything, and only the exceptions have the courage to do that.

‍

Everyone will admit that deploying AI in all situations will get easier over time — similar to how making a website got easier from the 1990s to 2020.

‍

There might be an exception but I’m not letting the exception be the rule.

‍

Lastly, similar to how it’s harder to write fewer words than many words to get a point across, it’s harder to make easy-to-use software than hard-to-use software. I think people who argue for complexity are often scared to step up to the engineering challenge.
