Using SQL DISTINCT
The SQL Distinct command can be used in the SELECT statement to ensure that the query returns only distinct (unique) rows. When the query is selecting the rows it discards any row which is a duplicate of any other row already selected by the query.
The syntax for DISTINCT is as follows:
SELECT DISTINCT [ColumnName,..]
FROM TableName
Let us take the following rows from a BookReview table.
AuthorName | Book | ReviewScore | ReviewDate |
---|---|---|---|
Fred Bloggs | Learn SQL | 8 | 01/04/2007 |
Joe Smith | Database design | 8 | 09/09/2007 |
Jackie Jones | SQL in 1 minute | 8 | 08/01/2007 |
Joe Smith | Database design | 7 | 08/04/2007 |
We want to write a query that will return a list of all book titles that have been reviewed along with the author's name. Our first attempt at this query might look like:
SELECT AuthorName, Book As BookTitle
FROM BookReview
This would return all rows from the above table and although it would partly achieve our aim it is not what we want as the row with the values 'Joe Smith', 'Database design' appears twice in our resultset.
AuthorName | BookTitle |
---|---|
Fred Bloggs | Learn SQL |
Joe Smith | Database design |
Jackie Jones | SQL in 1 minute |
Joe Smith | Database design |
We only want each authorname / booktitle combination to appear once in our resultset so we can use the DISTINCT command to tell the database to only return distinct rows.
SELECT DISTINCT AuthorName, Book As BookTitle
FROM BookReview
AuthorName | BookTitle |
---|---|
Fred Bloggs | Learn SQL |
Joe Smith | Database design |
Jackie Jones | SQL in 1 minute |
This time the query only returns one row for each authorname + booktitle combination which is what we intended.
Conclusion
In this article we have seen how we can use the DISTINCT keyword to ensure that our SELECT queries return unique rows. The Distinct command operates on all columns that are selected in the SELECT query and it can be easily used to remove duplicate rows from resultsets.