MySQL: Find records in one table that are not in another - revisedMySQL: Find records in one table that are not in another - revised

Posted May 8th, 2009 in MySql

A couple of weeks back I posted how to find records in one table that are not in another with MySQL and received an email from Greg Jorgensen with a more efficient way of approaching the same problem and therefore revise my original post with his suggestion.

Example tables

The examples below use three tables as follows:

content: contains the content pages for a website. The primary key is content_id.

tags: the "tags" that a page is tagged with. The primary key is tag_id.

content_to_tags: a table that creates a many-to-many relationship between the above two tables; a page can belong to multiple tags.

Example 1

The first example finds records in content_to_tags that have no associated record in content. This could have happened if the application deleted a record from content but didn't delete the associated records from content_to_tags:

SELECT * 
FROM content_to_tags c2t
WHERE NOT EXISTS (
	SELECT * 
	FROM content c 
	WHERE c.content_id = c2t.content_id
)

Example 2

This is the same as for the first example but comparing tags with content_to_tags. Again, this might have happened if records were deleted from tags but not their associated records from content_to_tags:

SELECT * 
FROM content_to_tags c2t
WHERE NOT EXISTS (
	SELECT * 
	FROM tags t
	WHERE t.tag_id = c2t.tag_id
)

Benchmarking Examples 1 and 2

In my orginal post I used a LEFT JOIN from content_to_tags to content. I benchmarked my original query compared to this query on a small content table with 1000 records, a content_to_tags table with 2000 records and a tags table with 100 records.

Using NOT EXISTS was just over two times faster, and I would assume that as the tables get populated with more records the difference exponential.

A note about ACID compliance and foreign keys

Note that in a properly ACID compliant database with foreign key constraints the first two examples shouldn't actually return any data, because it wouldn't be possible to delete records from tags/content if there are associated records present in the content_to_tags table.

Examples 3 and 4 are looking at something slightly different: finding content that's not tagged at all, and finding tags that are not tagged to any content.

Example 3

The next example looks for records in content where there are no associated records in content_to_tags. This is useful if you want to find any untagged posts:

SELECT * 
FROM content c
WHERE NOT EXISTS (
	SELECT * 
	FROM content_to_tags c2t
	WHERE c.content_id = c2t.content_id
)

Example 4

The final example is the same as the above but to find tags that have no associated records in content_to_tags. This is useful for finding tags that have no content, or in another context (with different table names etc) categories that have no products in them:

SELECT * 
FROM tags t
WHERE NOT EXISTS (
	SELECT * 
	FROM content_to_tags c2t
	WHERE t.tag_id = c2t.tag_id
)

Benchmarking Examples 3 and 4

The difference between my original LEFT JOIN and the NOT EXISTS queries here was minimal for examples 3 and 4 on the example tables although I would imagine as they become bigger the difference would grow. NOT EXISTS was once again faster that the LEFT JOIN syntax.

Related posts:

Comments

blog comments powered by Disqus