Skip to content

How to Document Your SQL Work So Anyone Can Understand It

    Hello ,

    Quick question: Could a colleague open your most recent SQL query and understand what it does without asking you a single question?

    If the answer is “probably not,” this newsletter is for you.

    Writing SQL that works is step one. Writing SQL that someone else can read, trust, and maintain — that’s the skill that changes how people see your work.

    Why This Matters More Than You Think

    Here’s what I’ve noticed over 15+ years in analytics: the analysts who get promoted aren’t always the ones who write the most complex queries. They’re the ones whose work is so clear that other people can build on it.

    When your analysis is easy to follow:
    Colleagues trust the results without interrogating you
    Leadership can reference your work in meetings
    New team members can pick up where you left off
    You can revisit your own work 6 months later without confusion

    When it’s not: every analysis becomes a bottleneck that requires you to explain it. That limits your impact.

    Three Documentation Habits That Make a Difference

    Habit 1: Comment the “Why,” Not the “What”

    -- BAD: Comments that describe what SQL does (obvious from reading it)
    -- Select first name and last name from customers
    SELECT first_name, last_name FROM customers;
    
    -- GOOD: Comments that explain WHY and business context
    -- Marketing requested VIP list for Q2 retention campaign
    -- VIP defined as: $5K+ lifetime spend, active in last 6 months
    SELECT 
        c.first_name || ' ' || c.last_name AS customer_name,
        c.email,
        SUM(p.amount) AS lifetime_spend,
        MAX(b.booking_date)::date AS last_booking
    FROM customers c
        INNER JOIN bookings b ON c.customer_id = b.customer_id
        INNER JOIN payments p ON b.booking_id = p.booking_id
    WHERE p.payment_status = 'completed'
        AND b.status IN ('completed', 'confirmed')
    GROUP BY c.customer_id, c.first_name, c.last_name, c.email
    HAVING SUM(p.amount) >= 5000
        AND MAX(b.booking_date) >= CURRENT_DATE - INTERVAL '6 months'
    ORDER BY lifetime_spend DESC;

    The “what” is visible in the code itself. The “why” disappears unless you write it down. Six months later, you’ll need the why more than the what.

    Habit 2: Use CTEs to Tell a Story

    Compare these two versions of the same analysis:

    Version A: One big query

    SELECT e.expedition_type, COUNT(DISTINCT b.booking_id),
    SUM(p.amount), ROUND(AVG(p.amount), 2),
    COUNT(DISTINCT b.booking_id) FILTER (WHERE b.status = 'cancelled') * 100.0 / COUNT(DISTINCT b.booking_id)
    FROM expeditions e INNER JOIN expedition_instances ei
    ON e.expedition_id = ei.expedition_id INNER JOIN bookings b
    ON ei.instance_id = b.instance_id INNER JOIN payments p
    ON b.booking_id = p.booking_id WHERE p.payment_status = 'completed'
    GROUP BY e.expedition_type ORDER BY SUM(p.amount) DESC;

    Version B: CTEs with clear steps

    -- Expedition category performance analysis
    -- Requested by: Marcus (Operations)
    -- Purpose: Q2 planning - identify which categories to expand
    
    -- Step 1: Calculate revenue and volume per category
    WITH category_performance AS (
        SELECT 
            e.expedition_type,
            COUNT(DISTINCT b.booking_id) AS total_bookings,
            SUM(p.amount) AS total_revenue,
            ROUND(AVG(p.amount), 2) AS avg_booking_value
        FROM expeditions e
            INNER JOIN expedition_instances ei ON e.expedition_id = ei.expedition_id
            INNER JOIN bookings b ON ei.instance_id = b.instance_id
            INNER JOIN payments p ON b.booking_id = p.booking_id
        WHERE p.payment_status = 'completed'
        GROUP BY e.expedition_type
    ),
    
    -- Step 2: Calculate cancellation rates per category
    category_cancellations AS (
        SELECT 
            e.expedition_type,
            ROUND(
                COUNT(*) FILTER (WHERE b.status = 'cancelled') * 100.0 / COUNT(*),
                1
            ) AS cancellation_rate
        FROM expeditions e
            INNER JOIN expedition_instances ei ON e.expedition_id = ei.expedition_id
            INNER JOIN bookings b ON ei.instance_id = b.instance_id
        GROUP BY e.expedition_type
    )
    
    -- Step 3: Combine for complete picture
    SELECT 
        cp.expedition_type,
        cp.total_bookings,
        cp.total_revenue,
        cp.avg_booking_value,
        cc.cancellation_rate
    FROM category_performance cp
        INNER JOIN category_cancellations cc ON cp.expedition_type = cc.expedition_type
    ORDER BY cp.total_revenue DESC;

    Same results. Version B is readable by anyone on your team.

    Each CTE has a name that describes what it calculates. The comments label each step. A new analyst joining your team can follow the logic without asking you what it does.

    Habit 3: Name Things for Business Users

    Column names your manager won’t understand:

    SELECT 
        cnt_bid,
        sum_amt,
        avg_val,
        exp_typ
    FROM ...

    Column names anyone can read:

    SELECT 
        total_bookings,
        total_revenue,
        avg_booking_value,
        expedition_type
    FROM ...

    This takes 10 extra seconds when writing the query and saves 10 minutes every time someone reads it. Including you.

    A Template for Documented Analysis

    Here’s a pattern I use for any analysis that might be shared or revisited:

    -- ============================================
    -- Analysis: [What question this answers]
    -- Requested by: [Who asked for it]
    -- Date: [When you wrote it]
    -- Data source: [Which database/tables]
    -- Assumptions: [Any filters or business rules]
    -- ============================================
    
    -- Step 1: [Description of first step]
    WITH step_one AS (
        SELECT ...
    ),
    
    -- Step 2: [Description of second step]
    step_two AS (
        SELECT ...
    )
    
    -- Final output: [What the results show]
    SELECT ...
    FROM step_one
        JOIN step_two ON ...
    ORDER BY ...;

    It takes 2 minutes to add this header. That 2-minute investment saves hours of “wait, what does this query do?” conversations.

    The Real Impact

    A documented query isn’t just easier to read. It’s easier to trust.

    When leadership reviews your analysis, clean documentation signals: “This person is thorough. I can rely on their numbers.” Messy, uncommented queries signal the opposite — even if the results are correct.

    This isn’t about perfection. It’s about building a reputation for clear, reliable work. Over time, that reputation compounds into bigger projects, more visibility, and greater career impact.

    Start Today

    Pick your most recent SQL query and apply these three habits:

    1. Add a comment explaining why (not what) the query exists
    2. Break it into named CTEs if it has more than 2 JOINs
    3. Rename columns to plain business language

    It won’t take long. And the next time someone opens your work, they’ll notice the difference.

    Until next time,
    Brian ([say hi on twitter!](https://twitter.com/briangraves))

    P.S. Clear, well-structured queries are a running theme throughout SQL for Business Impact. Every module teaches not just the SQL pattern, but how to structure and present it clearly. It’s the difference between writing queries and doing analysis. Check it out at [sqlforbusinessimpact.com](https://sqlforbusinessimpact.com).

    P.P.S. What does your team’s SQL documentation look like? Are queries well-documented or is everything tribal knowledge? Hit reply — I’m curious how different teams handle this. I read every response.


    Want More SQL Insights Like This?

    Join thousands of analysts getting weekly tips on turning data into business impact.

    Subscribe to Analytics in Action →

    Leave a Reply

    Your email address will not be published. Required fields are marked *