Every time I speak to sales managers, they share the same complaint.
“I have so much sales data, but I don’t know what to do with it.”
Sound familiar?
Drowning in numbers:
- Monthly targets
- Lead conversion rates
- Quarterly revenue figures
A relentless barrage of data. Impossible to keep up with.
And even if you somehow manage, interpreting that data to make meaningful decisions? A colossal task.
You’ve considered learning programming, and diving into Python, perhaps. But even the thought of setting up complicated software, dealing with installations, and debugging code is daunting.
It’s a complete nightmare. And it doesn’t feel worth all the hassle.
Never feel overwhelmed with data again
In this guide, I break down the entire process into 3 manageable steps.
The best part?
No installations.
No fuss.
Just pure, actionable steps.
Here’s what’s waiting for you:
- Quickly generate insights from your sales data
- Guide your sales team based on real numbers
- Make proactive decisions to drive business goals
Sounds like a dream, right?
The D.A.T.A Framework
Over the past 15+ years of my data career, I’ve followed the same 4 steps for every new project. And I call it the D.A.T.A. Framework. Clever, I know. 😉
- Define: Start by defining sales objectives and the key metrics that matter.
- Acquire: Gather the data you need. This might involve exporting data from different systems or tools. You also want to gather requirements at this step.
- Transform: Convert raw data into a format suitable for analysis. Clean up inconsistencies, handle missing values, and structure your data to make your life easy.
- Analyze: Use statistical tools, charts, and graphs to study trends, compare sales figures, and deduce patterns that can inform business strategies.
Don’t overcomplicate it! Just keep that framework in mind (like I do) to keep things simple.
👉 btw if you want me to share more about this framework (and others that I use all the time) just let me know!
Okay, let’s apply the D.A.T.A. Framework to solve the sales manager’s problem with Python.
Remember: there are only two types of data professionals: action-takers and sideliners.
This is where you decide if you’re an action-taker or a sideliner.
Use my Global Superstore Sales Analysis with Python notebook so you can follow along. A free Google Colab account is required, but you can also download the ipynb file.
Step 1: Setup and Configuration
The first step is setting everything up.
Protip: Always ensure you have the latest version of libraries to avoid compatibility issues.
# Install the latest version of the plotly library !pip install plotly --upgrade # Import necessary libraries import pandas as pd # Library for data manipulation and analysis import plotly.express as px # Plotly Express for simple syntax plotting import plotly.io as pio # To access Plotly's input-output module # Set the default plotly template to 'seaborn' for consistent and visually appealing plots px.defaults.template = "seaborn"
Step 2: Load the Data and Get a Preview
Next, let’s load the Superstore Sales dataset and get a preview. Understanding your data structure is the first step in any analysis. This is the Acquire step of the D.A.T.A Framework.
Rookie mistake: Not checking the first few rows of your dataset. Always inspect the initial rows to understand your data’s structure.
# Load the Superstore Sales dataset # URL pointing to the raw data file on GitHub url = 'https://raw.githubusercontent.com/yannie28/Global-Superstore/master/Global_Superstore(CSV).csv' # Read the CSV data from the URL into a pandas DataFrame data = pd.read_csv(url) # Display the first few rows of the data to understand its structure and get a quick overview data.head()
Here’s what it should look like:
Using data.head() to view the first 5 rows of our sample data set. |
Step 3: Analyze the Data
The final step for using Python to build a sales analysis is to actually create the analysis.
Now, if you’ve been following the D.A.T.A. Framework, this is the final two parts. I’ll show you how to transform the raw CSV data into something useable and then we’ll build out three different analyses, all in Python.
Analysis 1: Sales Analysis by Category
Visualizing total sales by category provides a high-level overview of where the majority of sales are coming from.
Proper labeling in your plots is essential. It makes your charts easily understandable to anyone viewing them.
# Create a bar chart using Plotly Express fig = px.bar( # Group data by 'Category', sum the values, then reset the index for plotting data.groupby('Category').sum().reset_index(), # Set 'Category' as the x-axis variable x='Category', # Set 'Sales' as the y-axis variable y='Sales', # Provide a title for the chart title='Total Sales by Category', # Rename the 'Sales' label for clarity labels={'Sales': 'Total Sales ($)'}, # Use the 'seaborn' theme for the chart template='seaborn' ) # Update the trace to display y-values on the bars and format them as currency fig.update_traces(texttemplate='%{y:$,.0f}', textposition='outside') # Display the generated chart fig.show()
Here’s what you should see
Analysis 2: Monthly Sales Analysis
Breaking down sales on a monthly basis helps in understanding trends, seasonal variations, and anomalies.
Protip: Be sure to convert any date columns into a datetime datatype. Always ensure date-related operations are performed on columns of the correct datatype.
# Convert the 'Order Date' column from string format to datetime format data['Order Date'] = pd.to_datetime(data['Order Date']) # Extract the month and year from 'Order Date' and store it in a new column 'Order Month' data['Order Month'] = data['Order Date'].dt.to_period('M').astype(str) # Group the data by 'Order Month' and calculate the sum for each month, then reset the index for plotting monthly_sales = data.groupby('Order Month').sum().reset_index() # Create a line chart to visualize monthly sales over time fig = px.line(monthly_sales, x='Order Month', # x-axis: Month and Year y='Sales', # y-axis: Sales amount labels={'Sales': 'Monthly Sales ($)', 'Order Month': 'Month'}, title='Monthly Sales Over Time') # Update the y-axis to display values as currency (with $ prefix) fig.update_layout(yaxis_tickprefix='$') # Display the generated chart fig.show()
Here’s the sales data by month so you can easily spot trends:
Analysis 3: Monthly Sales Over Time by Category
Okay, now we’re getting somewhere! Let’s combine the insights we’ve gotten from the first two analyses into one where we can look at monthly trends by sales category.
# Group the data by both 'Order Month' and 'Category', # calculate the sum for each grouping, then reset the index for plotting monthly_sales_by_category = data.groupby(['Order Month', 'Category']).sum().reset_index() # Create a line chart to visualize monthly sales over time, segmented by product category fig = px.line(monthly_sales_by_category, x='Order Month', # x-axis: Month and Year y='Sales', # y-axis: Sales amount color='Category', # Differentiate lines by product category labels={'Sales': 'Monthly Sales ($)', 'Order Month': 'Month', 'Category': 'Product Category'}, title='Monthly Sales Over Time by Category') # Update the y-axis to display values as currency (with $ prefix) fig.update_layout(yaxis_tickprefix='$') # Display the generated chart fig.show()
And here’s what it should look like. Nice!!
Analysis 4: Profit vs Sales Analysis by Subcategory
Visualizing the relationship between profit and sales for each subcategory can reveal which products are the most lucrative.
More profit = more cash money!
A scatter plot is particularly useful for this type of analysis as it visually separates high-profit, high-sales products from the rest.
# Group the data by 'Sub-Category', # calculate the sum for each subcategory, then reset the index for plotting subcategory_data = data.groupby('Sub-Category').sum().reset_index() # Create a scatter plot to visualize the relationship between profit and sales for each subcategory fig = px.scatter(subcategory_data, x='Sales', # x-axis: Sales amount y='Profit', # y-axis: Profit amount color='Sub-Category', # Differentiate points by subcategory using color size='Sales', # Vary point size based on sales amount labels={'Sales': 'Total Sales ($)', 'Profit': 'Total Profit ($)'}, title='Profit vs Sales by Subcategory') # Update the x and y axes to display values as currency (with $ prefix) fig.update_layout(xaxis_tickprefix='$', yaxis_tickprefix='$') # Display the generated scatter plot fig.show()
And here’s the profit vs. sales scatter plot (created in Python and Plotly from CSV data!)
Great job!
If you worked through those steps, you are well on your way to analyzing all sorts of data with Python.
Here’s a quick recap of what you learned
- Getting everything set up and installed properly using Google Colab
- Loading sales data and previewing it quickly
- Creating a category bar chart and monthly sales line chart
- Combining categories and monthly sales to understand the data at a deeper level
- Understanding profit vs sales by subcategory using a scatterplot
By following the D.A.T.A. Framework, we streamlined the process of deriving actionable insights from raw sales data.
Ideas for next steps
- Expand: What other data could you include to make your analysis even more useful? Product details? Marketing data? Sales rep data?
- Predict: Python has a ton of features to help you with predictive analytics use cases. This could be a game-changer in proactive decision-making.
- Enhance: Enhance data visualization with interactive dashboards. Tools like Dash by Plotly can be integrated with Python for this purpose.
- Iterate: Implement a system to act on the insights and continuously monitor the outcomes. This iterative approach ensures the sales strategy remains dynamic and responsive.