Thursday, January 3, 2013

Finding Duplicates in MySQL

NEED TO KNOW:
MySQL

NOTES:
Can be slow but heck, it's a powerful command.

Finding duplicates in MySQL isn't that hard.  For the purposes of clarity, all SQL commands are in caps.  Nuff said. 

So anyway, you have a billion record table and you want to see the duplicate values.  Use the following command from MySQL

HAVING count( Field in question) > 1

count will automatically determine any records that are duplicated.  We use > 1 to distinguish those that are unique.  Conversely you can change it to =1 to find unique records. 

In our example. we'll be using TABLE365 for the example.  The field we'll be using is InputDate for finding duplicates. 

SELECT * 
FROM TABLE365
GROUP BY InputDate
HAVING count(InputDate) >1

MySQL is annoyingly picky about count.  Don't add a space (e.g. count (InputDate)) otherwise it will WHINE back at you over your harseness and cruelty. 

NOW the problem with this query is that it won't show the duplicates, only the first appearance - not good but a start.  We can embed a query to essentially have MySQL run a query ON a query.

SELECT * 
FROM TABLE365
WHERE InputDate IN (  -> The query we just used <- ) 
ORDER BY InputDate

As you can see, the second query is buried in parantheses.  Of course you can use this to do a query on a query.  It's a way to avoid generating a table just to perform a special query on it. 


FINALLY, the entire piece of code looks like this:

SELECT *
FROM TABLE365
WHERE InputDate
IN (


SELECT InputDate
FROM TABLE365
GROUP BY InputDate
HAVING count( InputDate ) >1
)
ORDER BY InputDate

Of course you can modify this for PHP such as:
$QueryString = 'Select * from TABLE365 where InputDate in ( Select InputDate FROM...
You got the idea.