How to pick the first row in a join to avoid duplicate records in SQL

One of the common problem that we may face during joining multiple tables in a SQL is we want only one row across the entire tables even if we have duplicate records.

Consider the following tables

Table Customers
ID | NAME 
1  | BHANU

Table INVOICES
ID | CUSTOMER ID | TOTAL 
1  | 1           | 200

Table items 
ID | INVOICE ID | ITEM NAME | PURCHASE DATE |
1  |  1         | EGGS      | 20th March 2023
2  |  1         | BREADS    | 20th March 2023

In a given scenario, we have customers, invoices and items table. Now we want to list down a summary view invoices with the purchase date.

-- EXPECTED OUTPUT
CUSTOMER ID | CUSTOMER NAME | INVOICE ID | TOTAL | PURCHASE DATE

To achieve this, we have to write following query

Select c.id as "Customer Id", c.name as "Customer Name", inv.id,
inv.total, items.purchase_date 
from customer c inner join 
invoices inv on c.id = inv.customer_id
inner join items on inv.id = items.invoice_id



-- Output
Customer Id | Customer Name | Invoice Id | Total| Purchase Date
1           |  Bhanu        | 1          | 200  | 20th March 2023
1           |  Bhanu        | 1          | 200  | 20th March 2023

Ideally we have gotten only one row, because there is only one invoice but because we have two items the rows got duplicated.

There are different ways to solve this problem.

1. Using Group By

In our use case the entire row is an exact duplicate here we can use group by to avoid duplication .

Select c.id as "Customer Id", c.name as "Customer Name", inv.id,
inv.total, items.purchase_date 
from customer c inner join 
invoices inv on c.id = inv.customer_id
inner join items on inv.id = items.invoice_id
group by c.id, c.name, inv.id, inv.total, items.purchase_date

2. Applying join smartly

The above solution is not scalable and not a great solution to go with, considering the requirement we only need to select only one row from the items table we apply the join accordingly

Select c.id as "Customer Id", c.name as "Customer Name", inv.id,
inv.total, items.purchase_date 
from customer c inner join 
invoices inv on c.id = inv.customer_id
inner join items on items.id = (Select id from items where invoice_id = inv.id limit)

Here the idea was to apply join on items id instead of invoice_id of the items table.

Related Post

Leave a Reply

Your email address will not be published. Required fields are marked *