Fraud is a multi-billion-dollar business problem.
And fraud analytics is something I’ve dealt with in my 15+ years of working in data my fair share of times.
But one project really sticks out in my mind.
Business leaders were constantly dealing with hackers exploiting weaknesses in our security system. It felt like we were at war with invisible enemies, and we were losing. Money walking right out the back door. And I was on the hook to get to the bottom of it.
It was not JUST the monetary loss from these fraud incidents that was a concern.
The larger issue was the strain it was putting on our team:
- The ever-growing list of fraudulent transactions we had to keep track of
- The stress from having to constantly look over our shoulders for the next attack
- The sinking morale of the team, knowing that they were always one step behind the fraudsters
The workload was just too much to bear. We were fighting an uphill battle.
The situation was so dire, it felt like we were on the brink of throwing in the towel and letting the fraudsters have their way. It was an unbearable thought but one we found ourselves considering more and more each day.
But then, I stumbled upon the potential of SQL in detecting and preventing fraud.
And the results? Game-changing.
With SQL, we could:
- Analyze massive datasets with ease
- Easily spot patterns and trends
- Proactively identify potential fraud before it occurred
Suddenly, the battle against fraud didn’t seem so unwinnable.
No more drowning in an endless sea of data. No more sleepless nights worrying about the next attack. No more defeated conversations in the break room.
Instead, we had a powerful tool that transformed our approach:
- Proactive, not reactive
- Data-driven, not guesswork
- Empowered, not overwhelmed
So I’ve put together a comprehensive guide on how to leverage SQL for fraud detection
I’m sharing the exact methods we used to turn the tide in our battle against fraud.
(by the way, this newsletter was inspired by a survey response from a subscriber like you. Can you take 5 minutes to add your voice? Thanks so much!)
For this case study, pretend you are a data analyst at the ultra-popular automobile company, Drezzla Autos.
Despite having cutting-edge electric vehicles, Drezzla Autos is plagued with fraudulent transactions on its e-commerce platform. Your manager wants you to take charge, find out if there are recurring patterns in these fraudulent transactions, and nip this problem in the bud.
SQL to the rescue.
Step 1: Database Setup
Before you begin querying, you need a database. Let’s create four tables – users, products, transactions, and fraud_report.
Follow along using the sample code for this case study
-- Creating Tables -- Creating users table CREATE TABLE users ( user_id SERIAL PRIMARY KEY, user_name VARCHAR(100), account_created TIMESTAMP, location VARCHAR(100) ); -- Creating products table CREATE TABLE products ( product_id SERIAL PRIMARY KEY, product_name VARCHAR(100), price DECIMAL(8,2), category VARCHAR(50) ); -- Creating transactions table CREATE TABLE transactions ( transaction_id SERIAL PRIMARY KEY, user_id INT REFERENCES users(user_id), product_id INT REFERENCES products(product_id), transaction_date TIMESTAMP, transaction_amount DECIMAL(10,2), is_fraud BOOLEAN ); -- Creating fraud_report table CREATE TABLE fraud_report ( report_id SERIAL PRIMARY KEY, user_id INT REFERENCES users(user_id), transaction_id INT REFERENCES transactions(transaction_id), report_date TIMESTAMP, reason VARCHAR(200) );
Here’s what the tables look like:
And here’s the code to populate the tables:
-- Inserting Sample Data -- Inserting records into users INSERT INTO users(user_name, account_created, location) VALUES ('Lionel Mersey', '2022-06-20 08:35:44', 'Buenos Aires, Argentina'), ('Jenson Mutton', '2022-03-14 09:17:22', 'London, United Kingdom'), ('Machael Jaxon', '2021-12-11 10:12:01', 'Los Angeles, USA'), ('Bretony Spares', '2022-09-30 07:28:19', 'Las Vegas, USA'), ('Nelson Dandela', '2023-01-22 06:50:13', 'Johannesburg, South Africa'), ('Malcom Ex', '2021-10-27 14:20:45', 'New York, USA'), ('Cristiano Renaldo', '2023-02-17 11:07:33', 'Madrid, Spain'), ('Serena Walliams', '2022-08-11 12:58:42', 'Paris, France'), ('Usane Colt', '2023-03-28 15:30:56', 'Kingston, Jamaica'), ('Simone Bails', '2023-05-13 08:00:00', 'Texas, USA'); -- Inserting records into products INSERT INTO products(product_name, price, category) VALUES ('Galactic Glider', 5000.00, 'Space Tourism'), ('Nebula Navigator', 10000.00, 'Space Tourism'), ('Star Sprinter', 8000.00, 'Space Tourism'), ('Ocean Explorer', 2000.00, 'Adventure Sports'), ('Summit Conqueror', 1500.00, 'Adventure Sports'), ('Tidal Thrasher', 1200.00, 'Adventure Sports'), ('Velocity Viper', 75000.00, 'Car Racing'), ('Thunderbolt Torpedo', 85000.00, 'Car Racing'), ('Blitz Bruiser', 60000.00, 'Car Racing'), ('Mach Mayhem', 95000.00, 'Car Racing'); -- Inserting records into transactions INSERT INTO transactions(user_id, product_id, transaction_date, transaction_amount, is_fraud) VALUES (1, 7, '2023-01-02 15:30:00', 75000.00, FALSE), (2, 1, '2023-01-10 16:45:00', 5000.00, FALSE), (3, 9, '2023-01-25 17:00:00', 60000.00, TRUE), (4, 2, '2023-02-01 08:00:00', 10000.00, TRUE), (5, 3, '2023-02-15 09:15:00', 8000.00, FALSE), (6, 4, '2023-03-03 10:30:00', 2000.00, FALSE), (7, 5, '2023-03-18 11:45:00', 1500.00, FALSE), (8, 6, '2023-04-02 13:00:00', 1200.00, FALSE), (9, 8, '2023-04-20 14:15:00', 85000.00, TRUE), (10, 10, '2023-05-01 17:45:00', 95000.00, FALSE); -- Inserting records into fraud_report INSERT INTO fraud_report(user_id, transaction_id, report_date, reason) VALUES (3, 3, '2023-02-02 10:00:00', 'Transaction from unknown device'), (4, 4, '2023-04-21 11:15:00', 'Multiple transactions in short duration'), (9, 9, '2023-05-17 09:15:00', 'Multiple payment decline');
The rest of the steps are analyzing the transactional data to discover insights for the company.
Step 2: When Do Fraudulent Transactions Occur The Most?
A good place to start with business fraud analysis is WHEN fraudulent transactions occur.
Are they more likely to occur at a particular time of the day or week, or perhaps during a specific month or season?
By answering the WHEN question, you can solve a bunch of gnarly problems:
- Determine if fraud is more likely during certain hours of the day.
- Focus more attention on fraud monitoring during those peak times.
- Uncover ideas for further investigation on why fraud is more prevalent during these hours.
Follow along using the sample code for this case study
SELECT EXTRACT(HOUR FROM transaction_date) AS transaction_hour, COUNT(*) as fraud_count FROM transactions WHERE is_fraud = TRUE GROUP BY transaction_hour ORDER BY fraud_count DESC;
This query groups all fraudulent transactions by the hour of the day, and then counts the number of fraudulent transactions for each hour.
The result will tell us during which hours of the day fraudulent transactions are most prevalent.
Results
transaction_hour | fraud_count |
---|---|
14 | 1 |
8 | 1 |
17 | 1 |
Nice work!! Let’s keep going.
Step 3: What Products Are Most Often Associated With Fraudulent Transactions?
Next, we can turn our attention to our products.
Let’s find out if there are specific items that fraudsters often target.
Follow along using the sample code for this case study
SELECT p.product_name, COUNT(*) as fraud_count FROM transactions t JOIN products p ON t.product_id = p.product_id WHERE t.is_fraud = TRUE GROUP BY p.product_name ORDER BY fraud_count DESC;
This query joins the transactions table with the products table to get the product names and then groups the fraudulent transactions by product.
It counts the number of fraudulent transactions associated with each product.
The result will tell us which products are most often associated with fraudulent transactions.
Results
product_name | fraud_count |
---|---|
Nebula Navigator | 1 |
Blitz Bruiser | 1 |
Thunderbolt Torpedo | 1 |
From here, we can take it one step further to:
- Identify which products are most targeted by fraudsters.
- Find out how much we’re losing to fraudulent transactions.
- Help our team to tighten security measures for those products.
- Provide insights for further investigation on why these specific products are more attractive to fraudsters.
Step 4: Are There Specific Users Who Are Frequently Associated with Fraudulent Transactions?
Now comes the challenging part.
You can now dig deeper and explore the possibility that certain users might be repeatedly involved in fraudulent transactions.
Let’s build some SQL to find any recurring patterns in user behavior that might indicate fraudulent activity.
Follow along using the sample code for this case study
SELECT u.user_name, COUNT(*) as fraud_count FROM transactions t JOIN users u ON t.user_id = u.user_id WHERE t.is_fraud = TRUE GROUP BY u.user_name HAVING COUNT(*) > 1 ORDER BY fraud_count DESC;
This query joins the transactions table with the users table to get the usernames and then groups the fraudulent transactions by user.
It counts the number of fraudulent transactions associated with each user.
The HAVING clause ensures we only consider users who have more than one fraudulent transaction.
The results will tell us which users are repeatedly involved in fraudulent transactions.
user_name | fraud_count |
---|---|
Machael Jaxon | 2 |
Bingo.
Now we can dig even deeper.
- Find users who are repeatedly involved in fraudulent transactions.
- Provide evidence to take appropriate action against those users.
- Start figuring out whether these users are part of larger fraud networks.
The queries you’ve built here will help the company get to the bottom of the fraud, for sure.
Quick recap of what you learned in this guide:
- Set up your database
- Analyze when fraud happens
- Find which items are most at risk for fraud
- Take a look at potential users that might be taking advantage of vulnerabilities
By taking these steps, you help the team at Drezzla Autos establish patterns, take preventive measures, and secure their e-commerce platform, ultimately reducing fraudulent transactions and boosting their bottom line.
Well done!
Take action
There are two types of data professionals: action-takers and side-liners. Don’t sit on the sidelines waiting for an opportunity to fall in your lap. Take action TODAY by following these next steps:
- Run through the code and analysis
- Make some tweaks to see how things change
- Think of other ways you could apply this to your work
- Build a project and add it to your portfolio
Do you have more questions? Let me know by filling out the 5-minute survey!