As an analyst, I’ve seen the devastating impact of customer churn first-hand.
It’s like standing on a sinking ship, desperately trying to patch the holes as the water rushes in, only to see more leaks springing up.
And trust me, it’s far from pleasant.
You’re constantly grappling with:
- Understanding why customers are leaving
- Identifying trends and patterns in churn
- Generating actionable insights from complex data
It’s a relentless, nerve-wracking task.
So relentless, in fact, that I almost gave up on data analysis altogether.
My boss at the time was pretty demanding. And this was for a large company with thousands of employees. It was high pressure and Excel just wasn’t cutting it because there was simply too much data.
But then, I realized the power of SQL for customer churn analysis.
And so, I decided to harness it.
Once I dove into SQL, I began to see my job in a whole new light.
No more late nights spent futzing with Excel formulas. No more futile attempts at piecing together disparate data. No more feeling helpless in the face of growing data problems. No more asking more technical co-workers to help me out (yet again).
Instead, I started using SQL to:
- Identify trends and patterns in customer behavior
- Understand the factors leading to churn
- Create actionable strategies to retain customers
- Impress my boss and my co-workers
- Increase my earning potential
You’re no longer left scrambling, trying to understand why your customers are leaving. You’re not guessing anymore. With SQL, you can query your databases, slice and dice your customer data, and zero in on the root cause of churn.
This is what I mean when I say “SQL is a superpower”.
To help you take control of your customer churn, I’ve created this step-by-step guide to Customer Churn Analysis using SQL.
It’s a skill anyone can learn and I’m going to show you the ropes.
Let’s dive in:
Step 1: Setting up your database
Start by creating your tables in PostgreSQL.
The data you need for customer churn analysis typically includes user data and transaction data. Remember, getting this step right is critical, as it lays the foundation for everything that follows. The right data sets the stage for accurate churn analysis.
Pro tip: Ensure that your data is clean and consistent. Bad data = bad insights.
Fortunately, you can create a well-organized database that will deliver the precise insights you need.
Here’s the code to CREATE our tables for this analysis (interactive online SQL code version here)
CREATE TABLE customers ( id SERIAL PRIMARY KEY, signup_date DATE, churn_date DATE ); CREATE TABLE transactions ( id SERIAL PRIMARY KEY, customer_id INT REFERENCES customers(id), transaction_date DATE, transaction_amount NUMERIC );
And here’s the code to populate the tables:
-- Inserting into the customers table INSERT INTO customers (signup_date) VALUES ('2022-10-01'), ('2022-10-15'), ('2022-10-30'), ('2022-11-01'), ('2022-11-15'), ('2022-11-30'), ('2022-12-01'), ('2022-12-15'), ('2022-12-30'), ('2022-12-31'); -- Inserting into the transactions table INSERT INTO transactions (customer_id, transaction_date, transaction_amount) VALUES (1, '2022-10-05', 150.50), (1, '2022-11-07', 120.75), (2, '2022-10-16', 200.00), (3, '2022-10-30', 95.50), (3, '2022-11-20', 110.25), (3, '2022-12-20', 90.00), (4, '2022-11-01', 300.00), (4, '2022-12-01', 250.00), (5, '2022-11-20', 150.00), (5, '2022-12-15', 175.50), (6, '2022-11-30', 85.00), (6, '2022-12-30', 75.00), (7, '2022-12-05', 225.50), (7, '2022-12-10', 210.00), (7, '2022-12-15', 200.50), (8, '2022-12-15', 90.50), (8, '2022-12-20', 80.00), (9, '2022-12-30', 105.50), (9, '2022-12-31', 95.00), (10, '2022-12-31', 300.00), (2, '2022-11-16', 150.00), (2, '2022-12-16', 200.00), (1, '2022-12-05', 180.75), (6, '2022-12-10', 75.00), (4, '2022-11-30', 275.00), (5, '2022-12-25', 200.50), (10, '2022-12-01', 350.00), (8, '2022-12-25', 100.00), (9, '2022-12-10', 120.50);
Step 2: Create a definition of “churn”
You’ll need to clearly define what “churn” means in the context of your business. Is it 30 days without a transaction? 60 days? More?
A well-defined churn metric makes your analysis relevant and actionable.
This is something you’ll need to check in with your boss or your client to make sure you’re on the same page.
Don’t pick a random churn definition. It should align with your business model and customer behavior.
With a clear churn definition, you’re one step closer to the truth.
Other ways to think about customer churn (including some “leading indicators”)
- Subscription Cancellations: canceled subscriptions (which happen over time) are slightly different than transactions (which are one-time events).
- Login Inactivity: if customer logins and engagement are important to your business, then look into how often they are logging in. Low logins = customer churn here.
- Not Using Key Features: This could be a leading indicator (something that happens BEFORE the customer churns). For example, if a customer isn’t using a digital billing feature, that could mean they aren’t really interested in the service and are going to cancel soon.
- Decreased Interaction with Emails/Notifications: If a customer goes silent and stops responding to emails or other communications, that’s another leading indicator of customer churn.
- Negative Feedback or Ratings: Customers who leave negative reviews or give low ratings could be categorized as churned if they do not engage further with your product or service. While not always the case, such customers are likely to churn and might already be looking at alternatives.
But, for our example, we’re going to define churn as 30 days without a transaction.
Step 3: Write SQL queries to identify customers that have churned
Write SQL queries to identify customers who have churned based on your definition.
This step will directly identify the size of the problem.
Pro Tip: Be careful with date calculations. Make sure you’re considering all necessary factors.
Here’s where your SQL skills really start to shine!
SELECT c.id, c.signup_date, MAX(t.transaction_date) AS last_transaction_date FROM customers c LEFT JOIN transactions t ON t.customer_id = c.id GROUP BY c.id HAVING -- Defining churn as 30 days without transaction ('2023-01-15' - MAX(t.transaction_date)) > 30;
Here’s a breakdown of the code:
The SELECT clause defines the columns to be included in the result set.
c.id– This fetches the customer ID from the customers table.
c.signup_date– This fetches the signup date of the customer.
MAX(t.transaction_date) AS last_transaction_date– This gets the most recent transaction date for each customer.
The FROM clause specifies the table (
customers) from which to retrieve data, and assigns it an alias (
c) for easier referencing later in the query.
LEFT JOIN clause:
The LEFT JOIN clause joins the
transactions table (aliased as
t) with the
customers table. The condition for the join is that the
customer_id in the
transactions table should match the
id in the
GROUP BY clause:
The GROUP BY clause groups the results by
customer id, allowing aggregate functions like
MAX to be applied to each group of data.
The HAVING clause is used to filter the results of the query based on a condition applied to the result of an aggregate function. Here it is used to only include customers whose most recent transaction is older than 30 days from the current date.
This essentially identifies the customers who could be classified as “churned” based on the definition of not having any transactions in the last 30 days.
Step 4: Analyze customer churn using SQL
Now you can use SQL to figure out problems with customer churn rates.
You can look for trends and hotspots before they get too out of hand. Don’t overlook small fluctuations! This is the step where you dig into the details to answer questions that your boss never even thought to ask.
Here are a few example analyses you can run, plus the code for each one:
Insight 1: Identify High-Value Churning Customers
You can generate an insight into which customers have the highest transaction amounts but haven’t made a purchase recently. This would help you identify high-value customers at risk of churning.
-- High Value Churning Customers SELECT customer_id, SUM(transaction_amount) as total_spent, MAX(transaction_date) as last_transaction_date FROM transactions GROUP BY customer_id HAVING ('2023-01-15' - MAX(transaction_date)) > 30 ORDER BY total_spent DESC;
Insight 2: Identify Customers With Decreasing Transaction Frequency
Analyze the frequency of transactions for each customer and see if there’s a decreasing trend. Customers whose purchase frequency is declining may be at risk of churning.
SELECT customer_id, COUNT(*) as transaction_count, MAX(transaction_date) as last_transaction_date FROM transactions GROUP BY customer_id HAVING COUNT(*) < ( SELECT AVG(transaction_count) FROM ( SELECT COUNT(*) as transaction_count FROM transactions GROUP BY customer_id ) as txn );
Step 5: Dive deeper with cohort analysis
Use cohort analysis to dive deeper into the data. Break down churn by sign-up cohorts, customer segments, etc.
Cohort analysis offers a granular understanding of churn.
- E-commerce customers can be analyzed based on the month of first purchase to look for trends.
- Course enrollment cohort analysis in higher education. Find when students enrolled and analyze long-term learners.
- Account creation cohort analysis for banks or investment firms. Analyze when customers first signed up
- Patient cohort analysis for doctors and hospitals. This will help your client (or your boss) understand churn trends based on the date of the first appointment.
Keep your cohorts logical and meaningful. Random or too-small cohorts can be meaningless.
Fortunately, you now have the framework to solve all sorts of customer churn problems.
The 5-Steps To Building A Customer Churn Analysis From Scratch in SQL
- Set up your database
- Create a definition of “churn”
- Write SQL queries to identify customers that have churned
- Analyze customer churn using SQL
- Dive deeper with cohort analysis
Remember, you’re not just running some SQL queries. You’re embarking on a journey to improve your business’s customer retention and overall health. So get started, and let SQL guide you to success.
Got an interesting use case that I haven’t mentioned? Let me know!
There are two types of data professionals: action takers and side-liners. Don’t sit on the side lines waiting for an opportunity to fall in your lap. Take action TODAY by following these next steps:
- Run through the code and analysis
- Make some tweaks to see how things change
- Think of other ways you could apply this to your work
- Build a project and add it to your portfolio