One of the common problem that we may face during joining multiple tables in a SQL is we want only one row across the entire tables even if we have duplicate records.
Consider the following tables
Table Customers
ID | NAME
1 | BHANU
Table INVOICES
ID | CUSTOMER ID | TOTAL
1 | 1 | 200
Table items
ID | INVOICE ID | ITEM NAME | PURCHASE DATE |
1 | 1 | EGGS | 20th March 2023
2 | 1 | BREADS | 20th March 2023
In a given scenario, we have customers, invoices and items table. Now we want to list down a summary view invoices with the purchase date.
-- EXPECTED OUTPUT
CUSTOMER ID | CUSTOMER NAME | INVOICE ID | TOTAL | PURCHASE DATE
To achieve this, we have to write following query
Select c.id as "Customer Id", c.name as "Customer Name", inv.id,
inv.total, items.purchase_date
from customer c inner join
invoices inv on c.id = inv.customer_id
inner join items on inv.id = items.invoice_id
-- Output
Customer Id | Customer Name | Invoice Id | Total| Purchase Date
1 | Bhanu | 1 | 200 | 20th March 2023
1 | Bhanu | 1 | 200 | 20th March 2023
Ideally we have gotten only one row, because there is only one invoice but because we have two items the rows got duplicated.
There are different ways to solve this problem.
1. Using Group By
In our use case the entire row is an exact duplicate here we can use group by to avoid duplication .
Select c.id as "Customer Id", c.name as "Customer Name", inv.id,
inv.total, items.purchase_date
from customer c inner join
invoices inv on c.id = inv.customer_id
inner join items on inv.id = items.invoice_id
group by c.id, c.name, inv.id, inv.total, items.purchase_date
2. Applying join smartly
The above solution is not scalable and not a great solution to go with, considering the requirement we only need to select only one row from the items table we apply the join accordingly
Select c.id as "Customer Id", c.name as "Customer Name", inv.id,
inv.total, items.purchase_date
from customer c inner join
invoices inv on c.id = inv.customer_id
inner join items on items.id = (Select id from items where invoice_id = inv.id limit)
Here the idea was to apply join on items id instead of invoice_id of the items table.