Cohort Analysis: A Detailed Guide

20250102.1

May 22, 2025

Fair warning: this one might be a longer article. I wanted to spend some time on cohort analysis, which is an issue a lot of my startups have run into recently.

First, we need to establish what cohort analysis is for. We typically use it to measure retention, so the two terms often get used interchangeably. However, cohort analysis and retention analysis are separate things: cohort analysis is a tool we can use to analyze retention.

Retention matters because, generally speaking, most businesses spend a significant amount of money acquiring new customers. Every additional period a customer stays is a period in which you earn from them without paying to acquire them again. For example, suppose a customer spends $100 a month and it costs me $100 to acquire them. In the first month that customer is with me, I make no money, because the acquisition cost consumes all of the revenue. Only from the second month, when I no longer have to spend $100 to acquire them, does that customer start contributing profit. This is the fundamental unit economics underpinning retention, and it underpins a crucial logic every startup needs to address: acquisition spend only pays off if customers stay long enough to earn it back.
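To make the arithmetic concrete, here is a minimal Python sketch of the payback logic in the example above. It assumes a flat monthly spend, a one-off acquisition cost paid in the first month, and no cost of serving the customer. The function name `cumulative_profit` is purely illustrative.

```python
def cumulative_profit(monthly_spend: float, acquisition_cost: float, months: int) -> list[float]:
    """Running profit from one customer, month by month.

    Assumes a flat monthly spend, a one-off acquisition cost incurred in
    month 1, and ignores the cost of serving the customer.
    """
    running = -acquisition_cost
    totals = []
    for _ in range(months):
        running += monthly_spend
        totals.append(running)
    return totals

# A customer who spends $100/month and cost $100 to acquire.
print(cumulative_profit(100, 100, 4))  # [0, 100, 200, 300]
```

Month 1 nets to zero. Every month after that adds the full $100, which is why keeping a customer longer matters as much as winning them in the first place.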

Now that we've established this foundation, we need to think about how to calculate the retention rate accurately. Recently, I've seen a lot of startups do it the quick (dare I say, lazy) way. They take the number of active customers in one period, compare it with the next period's number, and treat the change between the two as retention. This is a blunt instrument and not particularly accurate. What it actually measures is the net change in active customers over a period, not retention. It also gives us very little information about how to improve the number: it ignores customers who reactivate, and it hides customers who churn out but are offset by new customers coming in. As a result, you don't get an accurate picture of what your retention actually looks like.
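A toy example shows how the net change in active customers can hide churn. The user IDs and the two function names are made up for illustration. The point is that tracking *who* is active, not just *how many*, reveals what the headline number hides.

```python
# Hypothetical active-user sets for two consecutive months.
march = {"a", "b", "c", "d", "e"}
april = {"a", "f", "g", "h", "i"}  # four customers churned, four new ones joined

def naive_retention(prev: set[str], curr: set[str]) -> float:
    """The 'lazy' measure: ratio of active counts between two periods."""
    return len(curr) / len(prev)

def user_retention(prev: set[str], curr: set[str]) -> float:
    """Share of last period's users who are still active this period."""
    return len(prev & curr) / len(prev)

print(naive_retention(march, april))  # 1.0
print(user_retention(march, april))   # 0.2
```

The naive figure says nothing changed. In fact, only one of the five March customers is still around.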

I also see a follow-on version of this analysis, where the company splits the calculation into new versus existing customers. Again, it is a simple metric and easy to calculate, but it is overly simplified for what we are trying to do.

The more accurate approach is a cohort analysis. A cohort analysis typically groups users by the time period in which they joined and analyzes each group separately from the others. What you'll typically see is some kind of waterfall chart, such as the one shown.

This is a really nice cohort analysis table by monthly intervals, from one of my portfolio companies. If you look closely, you can see some of the pattern characteristics discussed below.

The benefit of this kind of chart is that you can see how a cohort that joined in a given period changes over time. Cohorts typically decline over time and then level off at a plateau. The usual layout starts at the month the cohort entered the system and adds each subsequent month as a new column. So what this chart actually tells us is how many months each cohort retains its users.
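Here is a minimal sketch of how such an elapsed-time table can be built. It assumes activity is available as (user, period) pairs, where the period is an integer index such as months since launch. That input format and the name `cohort_table` are hypothetical. Each user's cohort is their first active period, and each cell is the share of that cohort active a given number of periods later.

```python
from collections import defaultdict

def cohort_table(events: list[tuple[str, int]]) -> dict[int, list[float]]:
    """Elapsed-time retention triangle.

    events: (user_id, period) pairs, where period is an integer index such as
    months since launch. A user's cohort is the first period they were active.
    Returns {cohort_start: [share of cohort active at age 0, 1, 2, ...]}.
    """
    active: dict[int, set[str]] = defaultdict(set)  # period -> active users
    first_seen: dict[str, int] = {}
    for user, period in events:
        active[period].add(user)
        first_seen[user] = min(period, first_seen.get(user, period))

    cohorts: dict[int, set[str]] = defaultdict(set)  # start period -> members
    for user, start in first_seen.items():
        cohorts[start].add(user)

    last_period = max(active)
    return {
        start: [
            len(members & active[start + age]) / len(members)
            for age in range(last_period - start + 1)
        ]
        for start, members in sorted(cohorts.items())
    }

# Hypothetical activity: user "b" skips period 1 and comes back in period 2.
events = [("a", 0), ("b", 0), ("a", 1), ("c", 1), ("a", 2), ("b", 2), ("c", 2)]
print(cohort_table(events))  # {0: [1.0, 0.5, 1.0], 1: [1.0, 1.0]}
```

Unlike the net-change measure, the reactivation of "b" shows up explicitly in cohort 0. The function only needs (user, period) pairs, so the same code works whether "active" means a login, a transaction, or any other event.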

Here's a simple cohort analysis shared by a friend of mine on LinkedIn. As you can see, it doesn't need to be especially detailed, just detailed enough that we can understand it and take action. The lesson: if this level of detail is sufficient for you to make decisions, the additional detail below may be excessive.

In the past, I have also built a different version of this chart. In it, each row represents a cohort and the columns are calendar months rather than months since joining. This version lets us account for seasonality that shows up across multiple cohorts at the same time, such as a Christmas spike or a New Year's spike.
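A sketch of the calendar-month variant, under the same hypothetical assumption that activity arrives as (user, period) pairs with integer period indices. The only real change from an elapsed-time table is the column key. It is the calendar period itself, not the number of periods since the cohort started, so a December spike lines up in one column for every cohort.

```python
from collections import defaultdict

def calendar_cohort_table(events: list[tuple[str, int]]) -> dict[int, dict[int, float]]:
    """Rows are cohorts (first active period); columns are calendar periods.

    events: (user_id, period) pairs with integer period indices.
    Returns {cohort_start: {calendar_period: share of cohort active then}}.
    """
    active: dict[int, set[str]] = defaultdict(set)  # period -> active users
    first_seen: dict[str, int] = {}
    for user, period in events:
        active[period].add(user)
        first_seen[user] = min(period, first_seen.get(user, period))

    cohorts: dict[int, set[str]] = defaultdict(set)  # start period -> members
    for user, start in first_seen.items():
        cohorts[start].add(user)

    # Include every calendar period in range, so quiet periods show up as 0.0.
    periods = range(min(active), max(active) + 1)
    return {
        start: {p: len(members & active[p]) / len(members) for p in periods if p >= start}
        for start, members in sorted(cohorts.items())
    }

# Hypothetical data: if period 11 were December, cohort 9 rebounds there.
events = [
    ("a", 9), ("b", 9), ("c", 9), ("d", 9),
    ("a", 10), ("e", 10), ("f", 10),
    ("a", 11), ("b", 11), ("c", 11), ("e", 11),
]
print(calendar_cohort_table(events))
# {9: {9: 1.0, 10: 0.25, 11: 0.75}, 10: {10: 1.0, 11: 0.5}}
```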

Cohort analysis often measures the level of engagement over a period, but we can apply the same logic to transactions, interactions, logins, or any other metric. So we can look not only at whether users are retained in terms of engagement, but also at whether they are retained in terms of activity, such as purchases. That means we can run multiple cohort analyses, covering both retention and depth of engagement. In other words, this lets us understand the conversion rate of different cohorts for different actions over time.

The first thing I think is often ignored is the time frame. These cohort analyses are typically done monthly, but there's more nuance to this. What if you were doing cohort analysis on mortgages or cars, something with a very long purchase cycle? It would not make sense to measure this monthly, because the purchase behaviour doesn't happen monthly. If you did, your churn rate would look close to 100%.

Here's an example of the timeframe's impact. In it, weekly and bi-weekly purchasers make up only 20% of customers, and monthly and bi-monthly purchasers account for roughly 35%. If we set the period to monthly or quarterly, our evaluations would be inaccurate. We need to establish what the ideal period should be first, then use that timeframe.

What makes more practical sense is to first understand your customers' natural retention cadence and measure on that basis. As a practical example, one of our companies does its churn analysis quarterly instead of the typical monthly, simply because that is the rhythm its business follows.
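One way to establish that natural cadence before choosing a timeframe is to look at how long customers typically wait between purchases. The sketch below assumes purchase dates are available as day numbers per customer. The median gap and the cut-off thresholds in `suggest_period` are illustrative assumptions, not a standard rule.

```python
from statistics import median
from typing import Optional

def typical_repeat_gap(purchase_days: dict[str, list[int]]) -> Optional[float]:
    """Median days between consecutive purchases, pooled across customers.

    purchase_days maps a customer ID to purchase dates as day numbers.
    Single-purchase customers contribute no gaps; returns None if nobody repeats.
    """
    gaps = []
    for days in purchase_days.values():
        ordered = sorted(days)
        gaps.extend(later - earlier for earlier, later in zip(ordered, ordered[1:]))
    return median(gaps) if gaps else None

def suggest_period(gap_days: float) -> str:
    """Pick a cohort period from a typical repeat gap (cut-offs are arbitrary)."""
    if gap_days <= 10:
        return "weekly"
    if gap_days <= 45:
        return "monthly"
    if gap_days <= 120:
        return "quarterly"
    return "yearly"

# Hypothetical purchase histories, in days since some start date.
purchases = {"a": [0, 30, 61], "b": [5, 100], "c": [10, 95, 180]}
gap = typical_repeat_gap(purchases)
print(gap, suggest_period(gap))  # 85 quarterly
```

A single median can hide a mixed customer base, so it is worth looking at the full distribution of gaps as well before settling on a period.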

Next, we need to think about layering in different types of users. It might be convenient to treat all customers as a single group and run the cohort analysis on that basis. However, many startups have distinct customer profiles, and the behaviour of each profile changes over time in its own way. So it makes more sense to run separate cohort analyses for each customer profile. You might find that small and medium enterprise (SME) cohorts show different patterns from enterprise customers, and both differ from solo entrepreneurs.
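A sketch of segmenting before building cohorts follows. It assumes each activity record carries a segment label, and that each user belongs to exactly one segment. The labels "sme" and "enterprise" are hypothetical. The cohort logic is identical for every segment; only the input is split.

```python
from collections import defaultdict

def segmented_cohorts(events: list[tuple[str, str, int]]) -> dict[str, dict[int, list[float]]]:
    """Separate elapsed-time cohort tables per customer segment.

    events: (segment, user_id, period) triples. Assumes each user belongs to
    exactly one segment. All tables run to the same last period, so a segment
    whose users have all churned shows zeros rather than a shorter row.
    """
    last = max(period for _, _, period in events)
    by_segment: dict[str, list[tuple[str, int]]] = defaultdict(list)
    for segment, user, period in events:
        by_segment[segment].append((user, period))

    tables = {}
    for segment, seg_events in by_segment.items():
        active: dict[int, set[str]] = defaultdict(set)  # period -> active users
        first_seen: dict[str, int] = {}
        for user, period in seg_events:
            active[period].add(user)
            first_seen[user] = min(period, first_seen.get(user, period))
        cohorts: dict[int, set[str]] = defaultdict(set)  # start -> members
        for user, start in first_seen.items():
            cohorts[start].add(user)
        tables[segment] = {
            start: [len(members & active[start + age]) / len(members)
                    for age in range(last - start + 1)]
            for start, members in sorted(cohorts.items())
        }
    return tables

events = [
    ("sme", "a", 0), ("sme", "b", 0), ("sme", "a", 1),
    ("enterprise", "x", 0), ("enterprise", "y", 0),
    ("enterprise", "x", 1), ("enterprise", "y", 1),
]
print(segmented_cohorts(events))
# {'sme': {0: [1.0, 0.5]}, 'enterprise': {0: [1.0, 1.0]}}
```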

Less commonly, but still relevant, you might also find that cohort patterns differ by product type.

Once you've segmented the cohorts by profile and chosen the time period to analyze, the next step is to look at the chart itself. What you'll typically see is a triangle overlaid with a heat map. You want to look for four types of pattern: vertical stripes, horizontal stripes, diagonal stripes, and outlier blocks.

Each of these gives us a different signal about what might be happening and where to investigate further. A vertical stripe on an elapsed-time cohort analysis tells us that something happens at the same elapsed time in every cohort. For example, if you offer a two-week free trial, you might see every cohort drop off right after those two weeks. You can also see a vertical stripe on a calendar cohort analysis. In that case, you might find that every December produces a spike regardless of cohort, indicating a seasonal pattern.

When analyzing a vertical stripe on an elapsed-time chart, keep in mind that each row is a cohort. If you improve that particular stage, the column should show an improving conversion rate as you move down to later cohorts. For example, the earliest cohort might show a 20% conversion rate in that column, with each subsequent cohort improving on it, until the most recent cohort shows 50% in the same column.

When analyzing a vertical stripe on a calendar chart, comparing the calendar columns over time would reveal a similar change in conversion rate.
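Below is a rough sketch of how to locate a vertical stripe numerically and then check whether it is improving. It assumes the elapsed-time table is a list of rows of retention shares, oldest cohort first, with newer cohorts having shorter rows. The helper names and the numbers are made up; the numbers mirror the 20% to 50% example.

```python
def steepest_drop(table: list[list[float]]) -> int:
    """Age k at which retention falls most, on average, from age k - 1.

    table: elapsed-time rows, oldest cohort first; newer rows may be shorter.
    Needs at least one row with two or more columns. A drop that recurs at
    the same age in every cohort is what appears as a vertical stripe.
    """
    width = max(len(row) for row in table)
    best_age, best_drop = 1, float("-inf")
    for age in range(1, width):
        drops = [row[age - 1] - row[age] for row in table if len(row) > age]
        average = sum(drops) / len(drops)
        if average > best_drop:
            best_age, best_drop = age, average
    return best_age

def column(table: list[list[float]], age: int) -> list[float]:
    """One elapsed-time column, oldest cohort first, to see if a stage improves."""
    return [row[age] for row in table if len(row) > age]

# Made-up shares mirroring the 20% -> 50% example.
table = [
    [1.0, 0.20, 0.18, 0.17],
    [1.0, 0.35, 0.30],
    [1.0, 0.50],
]
print(steepest_drop(table))  # 1
print(column(table, 1))      # [0.2, 0.35, 0.5]
```

The steepest average drop is at age 1, and reading down that column shows the stage improving from 20% to 50% across cohorts.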

Turning now to the horizontal stripe: here we analyze how a single cohort changes over elapsed time or across calendar months. This lets us see whether that cohort behaves similarly to or differently from other cohorts. An outlier cohort with a different pattern may indicate that something was different in the way we acquired or engaged with those customers. It's less common to look for horizontal stripes in a calendar cohort analysis. Doing so, however, may help us understand how seasonality affects that cohort relative to other cohorts.
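To spot a horizontal stripe, one simple approach is to compare each cohort with the average of all cohorts at the same ages. This is a minimal sketch using a hypothetical list-of-rows layout (oldest cohort first), and the function name is illustrative. The column averages include the cohort being scored, which slightly dampens extreme rows.

```python
def cohort_deviation(table: list[list[float]]) -> list[float]:
    """Average gap between each cohort's row and the column means.

    Positive values: the cohort retains better than typical; negative: worse.
    Column means include the cohort itself, which slightly dampens outliers.
    Assumes every row is non-empty.
    """
    width = max(len(row) for row in table)
    col_means = []
    for age in range(width):
        values = [row[age] for row in table if len(row) > age]
        col_means.append(sum(values) / len(values))
    return [
        sum(value - col_means[age] for age, value in enumerate(row)) / len(row)
        for row in table
    ]

# Made-up data: the third cohort retains far worse than the first two.
table = [
    [1.0, 0.40, 0.30],
    [1.0, 0.40, 0.30],
    [1.0, 0.10, 0.05],
]
deviations = cohort_deviation(table)
worst = deviations.index(min(deviations))
print(worst)  # 2
```

A strongly negative (or positive) score is a prompt to ask what was different about how that cohort was acquired or engaged.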

The diagonal stripe is a pattern we see in the elapsed-time cohort analysis but not in the calendar analysis. The reason is that a diagonal in the elapsed-time chart is a calendar period. Each new cohort starts one period later than the one before, so a given calendar month sits one column further left on each successive row, and the cells it touches form a diagonal. So you can typically skip the calendar analysis and rely on diagonal stripes in the elapsed-time analysis. The exception is when the data is too noisy and the diagonal pattern is obscured; in that case, the calendar analysis gives you a different view of the same information. It's important to note that diagonal stripes can be caused by seasonal events, but also by company events such as the release of a new feature, a change in a marketing campaign, or the introduction of a paywall.
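The diagonal-to-calendar relationship can be made concrete with a small indexing sketch. It assumes, hypothetically, one cohort per period with no gaps, so row i is the cohort that started in period i. Under that assumption, the cell at (row i, age k) happened in calendar period i + k. The function name and the data are illustrative.

```python
def calendar_diagonal(table: list[list[float]], calendar_period: int) -> list[float]:
    """Cells of an elapsed-time table that fall in one calendar period.

    Assumes row i is the cohort that started in period i (no gaps), so the
    cell at (row i, age k) happened in calendar period i + k. Holding i + k
    fixed steps one column left per row down, which traces a diagonal.
    """
    return [
        row[calendar_period - i]
        for i, row in enumerate(table)
        if 0 <= calendar_period - i < len(row)
    ]

# Made-up data with a dip in calendar period 2 (e.g. a paywall launch).
table = [
    [1.0, 0.60, 0.30, 0.50],
    [1.0, 0.30, 0.50],
    [1.0, 0.55],
]
print(calendar_diagonal(table, 2))  # [0.3, 0.3, 1.0]
```

Cohorts 0 and 1 both dip in calendar period 2, at different ages (2 and 1). That is the signature of a calendar event rather than a lifecycle stage. The last value is cohort 2's starting month, which is always 100%.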

Outlier blocks, the last pattern to look for, are unusual anomalies that do not persist across multiple cohorts. They often reflect special campaigns or one-off tactical activities. This is why it's important to keep an accurate log of changes and which cohorts they affected, so that when you spot one of these blocks, you're better able to understand what caused it. It's also worth bearing in mind that, in an elapsed-time analysis, these blocks may be seasonality in disguise. Switching to a calendar analysis is an effective way to check.
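A minimal way to surface candidate outlier blocks is to flag cells that sit far from their column's median. This sketch assumes the same hypothetical list-of-rows layout as a plain elapsed-time table. The 0.15 threshold is an arbitrary cut-off chosen for illustration, not a recommended value.

```python
from statistics import median

def outlier_cells(table: list[list[float]], threshold: float = 0.15) -> list[tuple[int, int]]:
    """(cohort_row, age) cells that differ from their column's median by
    more than `threshold` (an arbitrary cut-off chosen for illustration).
    """
    flagged = []
    width = max(len(row) for row in table)
    for age in range(width):
        values = [row[age] for row in table if len(row) > age]
        mid = median(values)
        for i, row in enumerate(table):
            if len(row) > age and abs(row[age] - mid) > threshold:
                flagged.append((i, age))
    return flagged

# Made-up data: cohort 1 spikes at age 1 but looks normal afterwards.
table = [
    [1.0, 0.40, 0.35],
    [1.0, 0.70, 0.36],
    [1.0, 0.42, 0.34],
    [1.0, 0.41],
]
print(outlier_cells(table))  # [(1, 1)]
```

A flagged cell that doesn't persist along its row or column is a candidate outlier block. The next step is to check it against your change log. Under the one-cohort-per-period assumption, the cell's calendar period is row + age, which makes it easy to cross-check in the calendar view.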

Lastly, I want to highlight that much of this article was heavily influenced by this Reforge guide that I came across: https://www.reforge.com/guides/basics-of-cohort-analysis-user-engagement-and-churn.

In conclusion, cohort analysis is a valuable tool for measuring and understanding user retention and engagement. By segmenting users into cohorts and tracking their behaviour over time, businesses can identify trends, patterns, and areas for improvement. To get the most out of it, though, tailor the analysis to your specific business and user base, and interpret the results carefully. That means choosing a timeframe that matches your natural purchase cycle, segmenting by customer profile where behaviour differs, and reading the stripe patterns with your change log in hand. Done this way, cohort analysis lets businesses make data-driven decisions to improve user retention, optimize marketing strategies, and ultimately drive growth.

Founders Curve
