MySQL queries for article summaries

Posted in MySql -

Some time ago I posted a summary of my posts for 2008 which counted the number of posts for 2008, number of days on which I posted, summary of posts by section. Today I've decided to show the MySQL queries behind the statistics. While your content management system / blogging system (or "off the shelf" one) will have different database structures you should be able to use similar queries to work similar numbers out for your own website.

Total articles posted

The simplest query of the lot, simply requires a sum of entries from the appropriate table selected by date range. The table my content is stored in is called "content" and the "date_created" field is the date and time the post lists as the post date.

To get a count of the posts for 2008, simply select a count from the table and set the date range to >= 2008-01-01 and < 2009-01-01. You could also do <= 2008-12-31 but because it's a datetime field you'd also need add 23:59:59 at the end to ensure you get everything from that day as well. Easier to simply to a < the next day. Nothing from that day will be included.

SELECT COUNT(*) 
FROM content
WHERE date_created >= '2008-01-01'
AND date_created < '2009-01-01'

Total articles excluding some

I also wanted to find out how many posts I'd made excluding the weekly and monthly roundups. This is easy in my system because there's a url field in the table and these ones all start with /weekly-roundup/ and /monthly-roundup/, so it's simply a matter of excluding those ones with "NOT LIKE" conditions. If you need to exclude them based on e.g. a tag then you'd need to join to another table and the query gets a little trickier.

SELECT COUNT(*) 
FROM content
WHERE date_created >= '2008-01-01'
AND date_created < '2009-01-01'
AND fullurl NOT LIKE '/monthly-roundup/%'
AND fullurl NOT LIKE '/weekly-roundup/%'

Total number of days articles were posted

The next statistic I wanted was the total number of days on which I made posts. Some days had more than one so the total number of days posted is less than the number of articles posted. Because the date posted is a datetime field I needed to use LEFT(date_created, 10) to get just the date portion of the field. Then do a COUNT(DISTINCT()) to get a count of the unique number of those to get the number of days posted on.

SELECT COUNT(DISTINCT(LEFT(date_created, 10)))
FROM content
WHERE date_created >= '2008-01-01'
AND date_created < '2009-01-01'

Summary by section

The database tables for content behind this blog website have a content table and then a "tree" table which relates child posts with parent posts. The tree is actually only one level deep (an example "parent" is the PHP section and all the posts under that are "children" of the PHP post). I created the database back in 2003 and I guess I must have been intending having the ability to have multiple parents. Anyway...

To work out a count of posts per category, I needed to join the content table to the tree and then back to the content table again. The cc alias in the SQL below is for "category child" and cp for "category parent".

The SQL below gets the parent title and a count of posts belonging to that parent, specifying the date range of posts between 2008-01-01 and 2009-01-01 (see part 1 about the format of the date_created field).

SELECT cp.title, COUNT(*)
FROM content cc
INNER JOIN tree t ON cc.content_id = t.child_id
INNER JOIN content cp ON cp.content_id = t.parent_id
WHERE cc.date_created >= '2008-01-01'
AND cc.date_created < '2009-01-01'
GROUP BY cp.title

Complete list of posts, formatted with concatenate

Finally in the summary post, I listed all the posts for 2008 group by category. Again I had to do the same sort of join as the above table with the same date range selection. The fields selected were the parent's title (so I would know which section they belonged to) and then a concatenated field of HTML which I could simply copy and paste into the HTML source view of the content field in my content editor.

SELECT cp.title, 
  CAST(
    CONCAT('<li><a href="', cc.fullurl, '">', cc.title, '</a> (', 
      DATE_FORMAT(cc.date_created, '%d %b %Y'),
    ')</li>')
  AS CHAR)
FROM content cc
INNER JOIN tree t ON cc.content_id = t.child_id
INNER JOIN content cp ON cp.content_id = t.parent_id
WHERE cc.date_created >= '2008-01-01' 
AND cc.date_created < '2009-01-01'
ORDER BY cp.title, cc.date_created

I needed to do a CAST(... AS CHAR) because for some reason phpMyAdmin decided it was a BLOB field type when it displayed and wouldn't render it. I guess on export it might have rendered it as text but I needed to preview it in the browser.

The DATE_FORMAT converted the YYYY-MM-D HH:MM:SS datetime format into a format like e.g. 12 Dec 2008.

I then exported the data from phpMyAdmin using a tab separator for fields, opened it in Excel so I could see the category names in one column and the HTML in another, and simply copied and pasted the HTML columns into sections in the summary post.

My monthly roundup will now contain this data so I'll need to use the above SQL query in combination with a PHP script to auto-generate the data for the post. That will save time when I do each



Comments