Cracking MySQL: Self-Joins Made Simple with Leetcode Example

Xinqian Zhai
3 min readOct 11, 2023

--

Photo by Rubaitul Azad on Unsplash

When I revisited SQL, recently, I noticed the recurring appearance of self-joins. It became evident that it’s time to consolidate the ‘why’ and ‘when’ of using self-joins in SQL queries. After some research, I’ve organized my insights into this post to demystify self-joins for my future reference and to share with anyone who may have the same curiosity about their use.

Defining Self-Joins:

A self-join is a special SQL technique where you join a table with itself (usually using the inner join or left join). It is typically done to create relationships between rows (or in the same row) within the same table. The keywords indicating the relationship often involve organization structure, multi-level marketing, or comparison of successive rows within the same table.

Identifying the Self-Join Scenario:

Self-joins are your go-to spell when you encounter scenarios like:

  1. Hierarchical Data: When your data is organized hierarchically, such as employees and managers. Self-joins unveil the lineage within your organization, allowing you to trace the chain of command.
  2. Comparing Records Within the Same Table: Self-joins are helpful when you need to compare records within the same dataset. For instance, if you want to match customers’ shipping addresses to their billing addresses, self-joins is likely to be your first consideration.

Example:

Below is a SQL question from Leetcode (Database Question 181. Employees Earning More Than Their Managers).

We have an “Employee” table, id is the primary key for this table.
Each row of this table indicates the ID of an employee, their name,
salary, and the ID of their manager, identified by a “ManagerId.”

The question is to find the employees who earn more than their managers. Here self-join becomes the magical key.

/* 
Employee table:
+----+-------+--------+-----------+
| id | name | salary | managerId |
+----+-------+--------+-----------+
| 1 | Joe | 70000 | 3 |
| 2 | Henry | 80000 | 4 |
| 3 | Sam | 60000 | Null |
| 4 | Max | 90000 | Null |
+----+-------+--------+-----------+

Output:
+----------+
| Employee |
+----------+
| Joe |
+----------+
*/

Here is my approach and explanation.

/* My solution with explanation */

-- We use a self-join to find employees
-- have higher salaries than their respective managers

-- Alias 'e' represents employees,
-- and alias 'm' represents managers.

SELECT e.name AS Employee
FROM Employee e
INNER JOIN Employee m

-- The INNER JOIN retrieves employees
-- who have managers (managers themselves are excluded).

ON e.managerId = m.id

-- Join the two tables ON
-- setting the managerId from the e (employee) table
-- equals to the id from the m (manager) table,
-- so that the information (id, name, salary, etc.)
-- of both employees and their managers shows in the same row.

WHERE e.salary > m.salary

-- WHERE filters the records/rows
-- by comparing employee salaries to their managers' salaries.

In this code snippet:

  • Employees e and Employees m provides us power over the employees and their managers within the same table.
  • The conditionm.id = e.ManagerId binds employees to their managers, creating a magical connection.
  • It combines the two tables (Employee e and Employee m), allowing successive comparisons in the WHERE statement.
  • Finally, theWHERE e.salary > m.salary clause filters records/rows where an employee’s salary surpasses their manager’s.

Summary

Mastering self-joins is all about spotting relationships within the same table, unveiling hidden insights. With SQL magic and a keen eye for relationships, you can unlock the secrets within your dataset, whether it’s hierarchical data or the need for comparison of consecutive rows. So, embrace self-joins and craft your data stories!

That’s all for now. Thanks for your reading!

Follow me and stay tuned! 😺 🍂 🐿

WRITER at MLearning.ai /AI Agents LLM / AI 3D ART / AI ART 2024

--

--

Xinqian Zhai

Graduate student at the University of Michigan and a new learner on the road to data science.