Blog
6 behavioral segmentation examples for data teams
Most behavioral segmentation content is written for marketers picking traits from a dropdown. This is for the people writing the SQL.

Most articles about behavioral segmentation start with a marketing team picking audience traits from a dropdown menu. But if you're the person writing the SQL that defines those segments, maintaining the dbt models that power them, or fielding Slack messages asking why two dashboards show different numbers for "active users," the work looks very different.
Behavioral segmentation groups people by what they do: purchase patterns, login frequency, feature adoption, lifecycle stage. These signals live in your warehouse as event tables, transaction logs, and session records. Turning that raw data into reliable, queryable segments is where data teams spend their time.
Here are six patterns you can use, from RFM scoring in SQL to churn detection models deployed as data apps, plus the governance work that keeps those segments trustworthy as your organization scales.
What behavioral segmentation looks like from the data team's side
In practice, behavioral segmentation means defining cohorts in SQL and Python. You're writing common table expressions (CTEs) with window functions, building dbt models that score users on activity patterns, and keeping metric definitions consistent when five different teams query them.
Most implementations follow a multi-stage CTE pattern: you extract the base data, join according to time, calculate behavioral metrics with window functions, assign scores, and classify users. Whether you're building RFM models in BigQuery or cohort retention in Snowflake, this five-to-seven-stage CTE structure shows up consistently across well-maintained segmentation pipelines.
All six patterns below assume batch processing: dbt models running on a schedule, calculating segments from warehouse data at rest. Real-time segmentation on streaming events is a different set of trade-offs entirely.
The work that tends to bite teams later is keeping "engaged user" consistent across every notebook, dashboard, and data app in your organization three months from now.
1. Purchase behavior segmentation with RFM analysis
RFM analysis (recency, frequency, monetary value) is probably the most widely implemented behavioral segmentation pattern in data warehouses. You're scoring customers based on how recently they bought, how often they buy, and how much they spend.
You'll rely on window functions for metric calculation and either percentile-based or fixed-threshold scoring. Here's the percentile approach, which adapts automatically as your customer base grows:
rfm_percentiles AS (SELECT user_id, date_month, recency, frequency, monetary,PERCENT_RANK() OVER (ORDER BY recency DESC) AS recency_percentile,PERCENT_RANK() OVER (ORDER BY frequency ASC) AS frequency_percentile,PERCENT_RANK() OVER (ORDER BY monetary ASC) AS monetary_percentileFROM rfm_values),rfm_scores AS (SELECT *,CASE WHEN recency_percentile >= 0.8 THEN 5WHEN recency_percentile >= 0.6 THEN 4WHEN recency_percentile >= 0.4 THEN 3WHEN recency_percentile >= 0.2 THEN 2ELSE 1END AS recency_scoreFROM rfm_percentiles)From there, you map score combinations to business segments like Champions, Loyal Customers, At Risk, and Hibernating. You can either embed this mapping in nested CASE statements (which gets messy fast) or use a lookup table that separates segment definitions from scoring logic. The lookup table makes adjustments far easier when the business wants to tweak thresholds without rewriting the core pipeline.
A common mistake is calculating RFM scores only for the current snapshot. Historical tracking — calculating scores for every month — is what reveals the real value. As dbt Labs describes in their historical segmentation docs, tracking how users move between segments over time surfaces early warning signs of churn or opportunities for upsell. That temporal dimension turns a static classification into a lifecycle analysis tool.
2. Cohort retention analysis for usage frequency
Cohort retention tracks groups of users who share a starting condition, usually signup month, and measures how their activity changes over subsequent periods. This pattern answers a question most aggregate metrics obscure: are we getting better at keeping users engaged, or worse?
The SQL pattern groups users into cohorts, then calculates the percentage still active in each subsequent period:
WITH cohorts AS (SELECT DISTINCT user_id,DATE_TRUNC(event_timestamp, MONTH) AS activity_monthFROM `your_project.your_dataset.your_events_table`WHERE event_name = 'purchase'),retention_data AS (SELECT c.cohort_month,c.month_number,COUNT(DISTINCT c.user_id) AS retained_usersFROM cohorts cGROUP BY 1, 2)On the Python side, pandas excels at the pivot and percentage calculations that turn raw counts into retention matrices. The transform method is useful here: it applies an aggregation function but returns results with the same shape as the original DataFrame, which avoids expensive self-joins when adding cohort assignments:
data['cohort'] = data.groupby('CustomerID')['InvoiceDate'].transform('min').dt.to_period('M')The output is a retention matrix that reveals patterns invisible in aggregate metrics. You might find that a specific signup cohort drops off at twice the rate of others, exactly the kind of signal that disappears when you average everyone together.
3. Engagement scoring with statistical weighting
Engagement scoring takes multiple behavioral signals (page views, feature interactions, session duration, visit frequency) and collapses them into a single composite score. The temptation is to assign weights based on gut feeling. A statistically grounded alternative is to use correlation analysis with actual business outcomes to derive justified weights.
Start by calculating correlation coefficients between each user metric and your target outcome (conversion, retention, expansion revenue). Keep only the metrics that correlate at or above the average, then use those coefficients as weights:
Engagement Score = Σ(metric_value × correlation_coefficient)This approach means the engagement score reflects actual business impact rather than arbitrary assumptions about which behaviors matter. Compare "we think feature clicks are worth 10 points" with "feature clicks correlate with 90-day retention, so they're weighted accordingly."
This pattern typically lives in a Python notebook where you can iterate on feature selection and validate the model before deploying the scoring logic as a dbt model in your warehouse. The iteration cycle matters: you'll want to recalculate correlations periodically as your product evolves, since the behaviors that predict retention shift as onboarding flows and features change.
4. Lifecycle stage segmentation from event data
With lifecycle segmentation, you assign users to stages (activated, engaged, at risk, churned) based on temporal patterns in their event data. Each stage needs distinct SQL patterns.
Activation uses time-windowed event completion: did the user complete key onboarding steps within a defined window?
SELECTuser_id,CASE WHEN profile_completed_date IS NOT NULLAND first_action_date IS NOT NULLAND first_action_date <= signup_date + INTERVAL '7 days'THEN 'Activated'ELSE 'Not Activated'END AS activation_statusFROM activation_eventsEngagement tiers look at sustained patterns over time: weeks active in the last 90 days, average weekly actions, and consistency of usage.
Churn risk combines recency (time since last activity) with frequency trends. A user whose activity dropped to half their historical average shows up as medium risk even if they logged in yesterday.
The layered approach that most mature teams use works well here: clean and standardize raw events in staging models, apply business logic in intermediate models to create behavioral metrics, and produce the final lifecycle tables in your marts. This separation keeps lifecycle definitions testable and your staging layer reusable across different segmentation efforts.
5. Product-qualified lead scoring
Product-Qualified Leads (PQLs) use behavioral signals to flag which free or trial users are ready for a sales conversation.
Binary action-based thresholds define specific qualifying actions: completed onboarding + invited team member + created 3+ projects = PQL. This is straightforward to implement in SQL and easy for sales to understand.
Point-based scoring distributes values across multiple events (last login date, time spent in application, feature usage breadth) and sums them to a composite score. Users crossing a threshold (say, ≥80 points) qualify as PQLs.
The data team owns the scoring logic and thresholds. When sales says "we're getting too many unqualified leads," that's a signal to revisit the correlation between your scoring weights and actual conversion outcomes. This feedback loop — sales flags quality issues, data team adjusts the model — is where PQL scoring either earns trust or loses it. And because PQL models sit at the intersection of product changes and go-to-market strategy, they tend to drift faster than other segments. A new onboarding flow can invalidate your activation signals overnight.
6. Churn risk detection through behavioral decline
The goal here is catching users whose engagement is declining before they fully churn. The SQL combines recency thresholds with historical baselines:
SELECTl.user_id,CASE WHEN CURRENT_DATE - l.last_active_date >= 14 THEN 'High Churn Risk'WHEN l.total_events_last_30d < h.avg_daily_events_historical * 0.5 THEN 'Medium Churn Risk'WHEN CURRENT_DATE - l.last_active_date >= 7 THEN 'Low Churn Risk'ELSE 'Healthy'END AS churn_risk_tierFROM latest_activity lLEFT JOIN historical_activity h ON l.user_id = h.user_idThis played out at ClickUp, where the data science team built a churn prediction model and turned it into a data app that business teams could explore directly. That access saved more than $1M in churn as marketing, lifecycle, and customer success teams each used the segments differently. The key was making the model's output explorable, not locking it behind a static dashboard.
The broader pattern holds regardless of tooling: churn risk segments are most useful when they combine recency thresholds with historical baselines and reach the teams who can act on them.
When clustering makes more sense than hand-coded thresholds
All six patterns above use hand-coded rules: you define the thresholds, score the users, and classify them into segments you've already named. That works well when you know what you're looking for. But behavioral data sometimes contains structure you wouldn't think to code for, and that's where unsupervised learning earns its keep.
K-Means, DBSCAN, and hierarchical clustering are the most common choices in production, each with different strengths around cluster shape, noise handling, and granularity. Running multiple algorithms in parallel and comparing results via silhouette scores tends to produce more reliable segments than committing to one approach upfront.
The practical challenge with clustering is validation. Unlike rule-based segments where the business logic is explicit, cluster-based segments need to prove they map to meaningful differences in outcomes. Run your clusters against retention, conversion, or revenue metrics and verify that membership in cluster A actually predicts different behavior than membership in cluster B. If the clusters don't separate on the outcomes you care about, they're statistically tidy but operationally useless.
Where behavioral segmentation breaks down
The SQL patterns are the easy part. What actually breaks behavioral segmentation is the governance work that accumulates around them.
Inconsistent metric definitions are the most common failure mode. Marketing defines "active user" as anyone who logged in this month. Product defines it as anyone who completed a core action. Both teams build segments using their own definition, and leadership is looking at two conflicting dashboards wondering which one is right. The resulting distrust undermines everything built on top.
Stale segments are the second killer. Someone writes a great segmentation query for a one-time analysis. Other teams reference it, reuse it, copy it. Six months later, the underlying event schema has changed, but nobody updated the query. The segment now silently returns wrong results. This is especially common with models that depend on product behavior. A new onboarding flow or pricing change can invalidate your activation signals, and if the segment definitions aren't version-controlled alongside the code that changed the product, nobody catches it until sales asks why lead quality tanked.
Ungoverned ad hoc queries create a third problem. In Hex's State of Data Teams, 31% of data leaders cite trust as their top concern with AI — nearly twice any other concern. When every analyst creates their own version of "engaged customer," local definitions fragment segmentation logic across the organization.
Building segments that stay trustworthy at scale
The fix is treating segment definitions like code: version-controlled, tested, and centrally governed.
A semantic layer gives you this consistency: you define a metric once, and it generates the same SQL regardless of where someone queries it. When you define "active user" in a semantic model, that definition resolves to the same SQL whether it's queried through a notebook, a data app, or a conversational analytics interface. As dbt Labs explains, the approach treats data definitions like software code. That opens up collaboration and change tracking.
In practice, this means your behavioral segment definitions live as YAML alongside your dbt models, with Git tracking every change. When someone asks why "at-risk users" grew 15% last week, you can trace the definition to its source, check the commit history, and verify nothing changed in the underlying logic.
Matching governance to where your team is
The level of governance you need depends on where your team is. Some teams start by endorsing specific tables and adding warehouse descriptions so AI tools know which data to use. Others add workspace rules that shape how people write queries. As definitions mature, semantic models (authored directly or synced from dbt, Cube, or Snowflake) formalize those definitions. If you're working in Hex, your definitions from dbt MetricFlow, Cube, or Snowflake sync in automatically, so the same governed logic resolves whether you're querying in a notebook or a stakeholder is asking a question through Threads.
Before you formalize anything, you're usually testing dozens of behavioral hypotheses. Instead of writing each one by hand, Hex's Notebook Agent generates SQL and Python as part of a multi-step analysis: proposing segment boundaries, iterating on definitions, and producing work you can inspect, edit, and validate at every step.
What tends to work well is two layers: a global layer where you centralize and version-control core segment definitions, and a local layer where analysts can do exploratory work without those explorations becoming shadow sources of truth.
Making segments actionable beyond the data team
The best behavioral segmentation model is useless if nobody outside the data team can work with it.
Static dashboards showing "Top 5 Customer Segments" tend to generate more tickets than they resolve. Marketers want to test hypotheses: "What if we defined engaged users as 5 logins instead of 3?" Product managers want to filter by acquisition channel. Without exploratory interfaces, each variation becomes a new request.
With data apps, you turn your segments into interactive tools where stakeholders can adjust thresholds, filter dimensions, and export cohorts, all backed by governed data. You maintain control over the underlying definitions and logic; they get self-serve access without filing a ticket.
You can reduce the back-and-forth further with conversational analytics. Instead of building a custom interface for every segmentation question, a stakeholder asks in plain language ("Show me users in the at-risk segment who signed up in Q1") and gets results grounded in the same definitions you already maintain.
Start with definitions so the dashboards stay consistent
Behavioral segmentation is a governance problem dressed up as an analytics one. The SQL patterns are well-documented, and the clustering algorithms are mature. What separates teams that get value from segmentation and teams that create metric chaos is whether they invest in governed definitions first.
Start by defining your most important behavioral segments as code and tracking changes in version control. From there, build interfaces so stakeholders can explore segments without filing tickets, and connect your semantic layer so the same definitions hold whether someone's querying in a notebook or asking a question in plain language.
Sign up for free or request a demo to see how it works.
Frequently Asked Questions
How do you choose between percentile-based and fixed-threshold scoring for behavioral segments?
Percentile-based scoring adapts to your data distribution automatically. As your customer base grows or shifts, segment sizes remain proportional. This works well when you want statistical consistency and don't need business users to intuitively understand the cutoffs. Fixed thresholds ("users who log in 3+ times per week") are more interpretable and can be adjusted in a business intelligence (BI) layer without touching SQL. Most teams start with fixed thresholds because they're easier to explain to stakeholders, then graduate to percentile-based scoring as their segmentation matures and the data team needs segments that hold up over time without manual recalibration.
How often should behavioral segments be refreshed?
Refresh frequency depends on how fast your users' behavior changes and how quickly your business needs to act on those changes. Most teams run daily batch refreshes for core segments like lifecycle stage and churn risk, with weekly or monthly recalculations for slower-moving segments like RFM scores. Segments that go stale pose a much bigger risk than low refresh frequency, especially when someone bookmarks a point-in-time analysis and still treats it as current months later. Building segments as incremental dbt models with scheduled runs can reduce computation costs, but doesn't by itself prevent drift; historical or corrected data may still need full refreshes or custom logic to keep segments reflecting current behavior rather than an outdated snapshot.
What's the minimum data infrastructure needed to start with behavioral segmentation?
You need a data warehouse (Snowflake, BigQuery, Redshift, or similar) with event-level data flowing in reliably, and a transformation layer (ideally dbt) to manage the SQL models that define your segments. The most common failure stems from organizational misalignment, such as two teams defining "active user" differently because nobody wrote down the canonical version, rather than missing infrastructure. Start with one well-defined segment, build it as a tested dbt model, document the business logic inline, and expand from there. The tooling scales more easily than the organizational alignment.