The Ultimate Guide to SQL Joins and Subqueries Explained

Collapse
X
 
  • Time
  • Show
Clear All
new posts
  • MyrinNew
    Senior Member
    • Feb 2024
    • 5175

    #1

    The Ultimate Guide to SQL Joins and Subqueries Explained

    If you’re following along with my SQL Series, welcome to Part 2 🎉 🎉


    In the first part, we focused on the basics of retrieving and filtering data from a single table. Now, we’re taking the next big step: learning how to work with data spread across multiple tables.


    We’ll break down the different types of joins, look at practical use cases, and then explore how subqueries can make your queries more flexible and easier to maintain.





    What are SQL JOINS?

    In relational databases, data is often stored across multiple normalized tables. For example, employees and departments might live in two separate tables. But in real-world business queries, you almost always need combined information.


    A JOIN in SQL allows you to query data from two or more tables based on a related column between them (often a primary key and foreign key relationship).


    Syntax:






    SELECT column_list
    FROM table1
    JOIN table2
    ON table1.common_column = table2.common_column;








    When to Use JOINS

    JOINS should be used when:
    • Your data is normalized into multiple tables (e.g., Customers, Orders, Products).
    • You want to generate comprehensive reports combining multiple entities (e.g., customer details + their purchases).
    • You need data validation, ensuring only matching records across tables are retrieved.
    • You’re building reports or dashboards where relationships between entities matter.


    Different Types of SQL JOINs

    Here are the main types of joins in SQL:

    1. INNER JOIN → Returns records that have matching values in both tables
    2. LEFT (OUTER) JOIN → Returns all records from the left table, and the matched records from the right table
    3. RIGHT (OUTER) JOIN → Returns all records from the right table, and the matched records from the left table
    4. FULL (OUTER) JOIN → Returns all records when there is a match in either left or right table
    5. CROSS JOIN → represents the Cartesian product (all combinations).





    Example Dataset:


    For this guide, we’ll be working with a classic use case that most organizations deal with throughout.


    We’ll use two main tables:
    • Employees
    • Departments


    1 Alice 101 70000 2019-01-15
    2 Bob 102 55000 2020-03-12
    3 Charlie NULL 40000 2018-07-23
    4 Diana 103 62000 2017-09-10
    5 Ethan 101 72000 2021-06-19
    6 Farah 104 58000 2022-02-14
    7 George NULL 45000 2016-11-05


    101 HR New York
    102 IT San Francisco
    103 Marketing Chicago
    104 Finance Boston
    105 Operations Dallas


    Have a look at the tables and the data, were you able to catch any gaps? If not, don't worry I got you.


    INNER JOIN

    When we perform a simple JOIN our result only includes rows that match our ON condition.


    👉 Example: Find all employees who are assigned to a valid department.






    SELECT e.name, e.salary, d.department_name, d.location
    FROM employees e
    INNER JOIN departments d
    ON e.department_id = d.department_id;








    Output:


    Alice 70000 HR New York
    Bob 55000 IT San Francisco
    Diana 62000 Marketing Chicago
    Ethan 72000 HR New York
    Farah 58000 Finance Boston


    Note: INNER is the default, so JOIN is equivalent to INNER JOIN.


    LEFT JOIN

    Returns all rows from the left table (employees), even if they don’t have a matching row from the right table (department).


    If no match exists, the right-side columns return NULL.


    👉 Example: List all employees, including those without departments.






    SELECT e.name, e.salary, d.department_name
    FROM employees e
    LEFT JOIN departments d
    ON e.department_id = d.department_id;








    Output:


    Alice 70000 HR
    Bob 55000 IT
    Charlie 40000 NULL
    Diana 62000 Marketing
    Ethan 72000 HR
    Farah 58000 Finance
    George 45000 NULL


    RIGHT JOIN

    A RIGHT JOIN returns all rows from the right table(department), plus matching rows from the left(employees).


    If no match exists, the left-side columns return NULL.


    👉 Example: List all departments, including those without employees.






    SELECT e.name, d.department_name
    FROM employees e
    RIGHT JOIN departments d
    ON e.department_id = d.department_id;








    Output:


    Alice HR
    Ethan HR
    Bob IT
    Diana Marketing
    Farah Finance
    NULL Operations


    FULL OUTER JOIN

    Returns all records when there is a match in left (employees) or right (departments) table records.


    Use FULL OUTER JOIN when you want a complete picture of employees and departments, including mismatches on both sides.


    👉 Example: Show a complete view of employees and departments, including unmatched rows on both sides.






    SELECT e.name, d.department_name
    FROM employees e
    FULL OUTER JOIN departments d
    ON e.department_id = d.department_id;








    Output:


    Alice HR
    Ethan HR
    Bob IT
    Diana Marketing
    Farah Finance
    Charlie NULL
    George NULL
    NULL Operations


    Note: FULL OUTER JOIN and FULL JOIN are equivalent.


    CROSS JOIN

    Returns the Cartesian product of both tables (every employee with every department).


    It is rarely used in business queries, but useful for generating test data or combinations.


    👉 List all employee-department combinations.






    SELECT e.name, d.department_name
    FROM employees e
    CROSS JOIN departments d;








    Notice that cross joins don’t require an ON statement. You’re not really joining on any columns!


    UNION

    Sometimes we just want to stack one dataset on top of the other. Well, the UNION operator allows us to do that.


    The UNION operator automatically removes duplicate rows from the result set.


    Syntax:






    SELECT *
    FROM table1
    UNION
    SELECT *
    FROM table2;








    SQL has strict rules for appending data:
    • Tables must have the same number of columns.
    • The columns must have the same data types in the same order as the first table.


    Primary Key vs Foreign Key

    Relational databases are built on the idea of relationships between tables. These relationships are created using primary keys and foreign keys.


    Primary Key:

    1. A Primary Key is a column (or set of columns) that uniquely identifies each row in a table.
    2. No two rows can have the same primary key value.
    3. It cannot contain NULL


    In the Departments table above, department_id is the primary key because each department must have a unique ID.


    Foreign Key:

    1. A Foreign Key is a column in one table that refers to the primary key in another table.
    2. It creates the relationship between two tables.


    In the Employees table, department_id is a foreign key referencing the Departments table.


    Without primary and foreign keys, JOINS wouldn’t know how to link tables together.


    Using WITH in SQL

    Sometimes we want to combine two tables, but one of those tables isn’t available directly, instead, it’s the result of another calculation.


    For example, HR might want to know:


    👉 What is the average salary per department, and which employees belong to those departments?


    This requires two steps:

    1. First, calculate the average salary for each department.
    2. Then, join this result with the Employees table so we can see which employees belong to those departments.


    Step 1: Calculate average salary per department






    SELECT department_id,
    AVG(salary) AS avg_salary
    FROM employees
    GROUP BY department_id;








    Output:


    101 71000
    102 55000
    103 62000
    104 58000
    NULL 42500


    Step 2: Use WITH to build a reusable result


    Instead of nesting the above query directly, we wrap it with a CTE:






    WITH dept_avg AS (
    SELECT department_id,
    AVG(salary) AS avg_salary
    FROM employees
    GROUP BY department_id
    )
    SELECT e.name, e.salary, d.department_name, dept_avg.avg_salary
    FROM employees e
    JOIN departments d
    ON e.department_id = d.department_id
    JOIN dept_avg
    ON e.department_id = dept_avg.department_id;








    Output:


    Alice 70000 HR 71000
    Ethan 72000 HR 71000
    Bob 55000 IT 55000
    Diana 62000 Marketing 62000
    Farah 58000 Finance 58000


    At this point in our SQL journey we know we can query a

    database to retrieve desired results. However, what happens when we query a database but we really only need a subset of the results returned?


    Let's explore the use of something called a subquery that gives us the same functionality as a join, but with much more readability.

    Subqueries

    As the name suggests, a subquery is an internal query nested inside of an external query. They can be nested inside of SELECT, INSERT, UPDATE, or DELETE statements.


    Anytime a subquery is present, it gets executed before the external statement is run.


    Syntax:






    SELECT column1, column2, ...
    FROM table1
    WHERE column operator (SELECT column FROM table2 WHERE condition);
    • column1, column2, ...: A list of column names you want to retrieve.
    • table1: The main table from which you want to select data.
    • column: The column you want to compare or filter against in the main query.
    • operator: An operator such as =, >,
    • table2: The table from which you want to retrieve data for comparison.
    • condition: The condition that specifies which records to select from the subquery.


    When to use: When you need to filter, calculate, or aggregate before applying conditions in your main query.


    👉 Example: Find employees earning more than the average salary.






    SELECT name, salary
    FROM employees
    WHERE salary > (SELECT AVG(salary) FROM employees);








    In this statement, the subquery SELECT statement would be executed first, resulting in a list of average salaries of the employees.


    Comparison Operators

    Subqueries are often combined with operators for filtering.
    • = → Equal to
    • > → Greater than
    • → Less than
    • >= → Greater than or equal to
    • → Less than or equal to
    • or != → Not equal to


    In and Not In Clauses

    The more common ways to use subqueries is with the use of an IN or NOT IN clause


    When an IN clause is used, results retrieved from the external query must appear within the subquery results.


    Similarly, when a NOT IN clause is used, results retrieved from the external query must not appear within the subquery results.


    Example (IN): Employees in HR or IT






    SELECT name
    FROM employees
    WHERE department_id IN (101, 102);








    Result: Alice, Bob, Ethan


    Example (NOT IN): Employees not in HR or IT






    SELECT name
    FROM employees
    WHERE department_id NOT IN (101, 102);








    Result: Charlie, Diana, Farah, George


    Exists and Not Exists

    The EXISTS and NOT EXISTS clauses are used in SQL to test whether a subquery returns any rows.


    EXISTS → Returns TRUE if the subquery produces at least one row.


    NOT EXISTS → Returns TRUE if the subquery produces no rows.


    Unlike IN / NOT IN, which collect a list of values from the subquery and then compare them, EXISTS simply checks for the presence of rows.


    Example: Show all employees who are assigned to a department.






    SELECT *
    FROM employees e
    WHERE EXISTS (
    SELECT *
    FROM departments d
    WHERE d.department_id = e.department_id
    );
    • The subquery checks if a department exists with the same department_id as the employee.
    • If it finds one, EXISTS is true and the employee is included.
    • If not, that employee is skipped.


    Example: Show employees who are not assigned to any department.






    SELECT *
    FROM employees e
    WHERE NOT EXISTS (
    SELECT *
    FROM departments d
    WHERE d.department_id = e.department_id
    );
    • For each employee, the subquery checks the departments table.
    • If no department matches the employee’s department_id, then NOT EXISTS returns true, and that employee appears in the result.


    Wrapping Up

    Congratulations on taking another step in the SQL journey. 🙌


    In the next part of the series, we’ll build on this foundation to tackle even more advanced SQL concepts.


    Until then, here’s a friendly reminder that you’ve leveled up:







    More...
Working...