Hello ,
Here’s a question that sounds simple but isn’t:
“Which customers have booked the same type of expedition more than once?”
You can’t answer this with a regular JOIN between two different tables — the information lives in one table. You need to compare rows within that table to find patterns.
That’s what a self-JOIN does. It joins a table to itself.
It sounds unusual, but once you see the pattern, you’ll recognize situations where it’s exactly what you need.
The Business Problem
Marcus, your VP of Operations, wants to understand customer loyalty at a deeper level than just “repeat customers.” He wants to know: which customers are so loyal to a specific expedition type that they’ve booked it three or more times?
This matters because a customer who booked three photography tours isn’t just a repeat customer — they’re a brand advocate for that category. They might respond to premium photography packages, refer friends, or provide testimonials.
The Query: Finding Category Loyalists
Here’s what I ran against the Summit Adventures database (the fake adventure tourism company I created to help people learn business analytics):
-- Find customers who booked the same expedition type 3+ times
SELECT
c.first_name || ' ' || c.last_name AS customer_name,
e.expedition_type,
COUNT(DISTINCT b.booking_id) AS times_booked,
SUM(p.amount) AS total_spent_on_type
FROM customers c
INNER JOIN bookings b ON c.customer_id = b.customer_id
INNER JOIN expedition_instances ei ON b.instance_id = ei.instance_id
INNER JOIN expeditions e ON ei.expedition_id = e.expedition_id
INNER JOIN payments p ON b.booking_id = p.booking_id
WHERE b.status IN ('completed', 'confirmed')
AND p.payment_status = 'completed'
GROUP BY c.customer_id, c.first_name, c.last_name, e.expedition_type
HAVING COUNT(DISTINCT b.booking_id) >= 3
ORDER BY times_booked DESC, total_spent_on_type DESC;
Results:
| Customer | Expedition Type | Times Booked | Total Spent |
|———-|—————-|————-|————-|
| Alexandrie Ullrich | Hiking | 3 | $18,643 |
| Sébastien Marie | Photography | 3 | $8,707 |
| Sylvie Seguin | Cultural | 3 | $7,535 |
| Agathe Molina | Safari | 3 | $7,398 |
| David Gutknecht | Cultural | 3 | $6,584 |
| Georges Jones | Photography | 3 | $4,753 |
| Anthony Warmer | Photography | 3 | $4,344 |
| Ilse Lévy | Hiking | 3 | $1,778 |
Eight customers have booked the same expedition type three or more times. That’s real loyalty — and real business intelligence.
Alexandrie Ullrich has spent over $18,600 on hiking alone. That’s a customer who should be getting early access to every new hiking expedition, a direct email from a guide, and perhaps an invitation to a premium experience.
Now: The Self-JOIN Pattern
The query above uses GROUP BY and HAVING — powerful, but it doesn’t compare individual records to each other. Let’s look at where self-JOINs shine: comparing rows within the same table.
Business question: Which guides have worked together on the same expedition instance? This tells operations which guide pairings work well and which guides have experience collaborating.
-- Find guide pairs who've been assigned to the same trip
SELECT
g1.first_name || ' ' || g1.last_name AS guide_1,
g2.first_name || ' ' || g2.last_name AS guide_2,
COUNT(DISTINCT ga1.instance_id) AS instances_together
FROM guide_assignments ga1
INNER JOIN guide_assignments ga2
ON ga1.instance_id = ga2.instance_id
AND ga1.guide_id < ga2.guide_id
INNER JOIN guides g1 ON ga1.guide_id = g1.guide_id
INNER JOIN guides g2 ON ga2.guide_id = g2.guide_id
GROUP BY g1.guide_id, g1.first_name, g1.last_name,
g2.guide_id, g2.first_name, g2.last_name
ORDER BY instances_together DESC
LIMIT 10;
Results:
| Guide 1 | Guide 2 | Trips Together |
|———|———|—————|
| Lewis Morris | Aenne Hahn | 2 |
| Laurent Neveu | Margot Ramirez | 2 |
| Lewis Morris | Rosalie Church | 2 |
| Lewis Morris | Katy Losekann | 2 |
| Kristina Cook | Joyce Holmes | 2 |
| Laurent Neveu | Duncan Conradi | 2 |
Lewis Morris has worked with several different guides across multiple trips. Laurent Neveu is another frequent collaborator. These are the guides you’d pair with newcomers — they have experience working in teams.
How the Self-JOIN Works
The key is this part:
FROM guide_assignments ga1
INNER JOIN guide_assignments ga2
ON ga1.instance_id = ga2.instance_id -- Same trip
AND ga1.guide_id < ga2.guide_id -- Different guides
We’re joining `guide_assignments` to itself. Same table, two aliases (`ga1` and `ga2`).
`ga1.instance_id = ga2.instance_id` — matches guides assigned to the same expedition instance
`ga1.guide_id < ga2.guide_id` — ensures we don't count a guide paired with themselves, and avoids duplicates (Lewis-Aenne only appears once, not also as Aenne-Lewis)
That `<` instead of `!=` is a subtle but important detail. Using `!=` would give you both (Lewis, Aenne) and (Aenne, Lewis). Using `<` gives you each pair exactly once.
The Self-JOIN Pattern
Here's the general structure:
-- Compare rows within the same table
SELECT
a.column_value AS record_1,
b.column_value AS record_2,
-- comparison metrics
FROM table_name a
INNER JOIN table_name b
ON a.shared_attribute = b.shared_attribute
AND a.id < b.id -- Avoid duplicates and self-matches
Three components:
1. Same table, two aliases — give the table two names (a, b) so SQL can tell them apart
2. Shared attribute — what connects the rows you want to compare (same city, same trip, same department)
3. De-duplication — `a.id < b.id` prevents self-matches and duplicate pairs
When to Use Self-JOINs in Business
Finding customers in the same city:
“Which of our high-value customers live in the same city?” → Useful for local events and meetups.
Comparing employees:
“Which team members started in the same month?” → Useful for cohort mentoring programs.
Product pairings:
“Which products are frequently bought together?” → Useful for bundle pricing.
Sequential events:
“Which bookings happened within 30 days of each other for the same customer?” → Useful for understanding purchase velocity.
Common Mistakes to Avoid
Mistake 1: Forgetting the de-duplication condition
Without `a.id < b.id`, you'll get every pair twice and every row paired with itself. Your result set explodes in size and your numbers are wrong. Mistake 2: Using a self-JOIN when GROUP BY would work
If you’re counting “how many times did X happen?” — that’s GROUP BY + HAVING. Self-JOINs are for comparing one row to another row. Use the right tool.
Mistake 3: Not limiting results
Self-JOINs can produce enormous result sets (every possible pair of rows). Always add filters and LIMIT while developing your query.
Try This at Your Job
Think about comparisons within a single dataset:
HR: Which employees in the same department have overlapping job titles? (Role duplication audit)
Sales: Which deals in the same quarter had the same sales rep? (Workload analysis)
Support: Which tickets from the same customer came in within 24 hours? (Escalation patterns)
The self-JOIN pattern works whenever you need to find relationships between rows in the same table.
Until next time,
Brian ([say hi on twitter!](https://twitter.com/briangraves))
—
P.S. Multi-table analysis is the core of Module 4 (The FBI Evidence Board Blueprint) in SQL for Business Impact. The course builds from simple JOINs to complex self-JOINs, giving you the full toolkit for connecting data across any source. Check it out at [sqlforbusinessimpact.com](https://sqlforbusinessimpact.com).
P.P.S. What kind of “same-table comparison” would be most useful at your company? Hit reply and let me know — these are fun puzzles to solve. I read every response.
Want More SQL Insights Like This?
Join thousands of analysts getting weekly tips on turning data into business impact.
