Home / Selecting substrings with MySQL using LOCATE and SUBSTRING

Selecting substrings with MySQL using LOCATE and SUBSTRING

The MySQL function SUBSTRING allows you to extract a substring from a string. This post looks at how to use substring to extract some information from a field in a database using an example column which contains XML data. Refer also to my earlier post titled "Finding the location of a string in a string with MySQL" for more information about using the LOCATION function.

Example Report

In one of the websites I manage, all of the enquiries and orders are saved into an XML format and then sent through to the remote system where they are then saved into that database. For auditing purposes all enquiries and their associated XML are logged into the web server database should we need to run some report or other on it at a later time (or resend an order).

In the example below the table is called enquiry_log and contains, among other fields, a field called "xml" which contains the xml string. There’s an XML node called <Referrer_To_Website> which contains a code logging which partner website referred the customer to this website.

Once a month we run a report to get the partner enquiries from the database. It would be easier to have the partner code as a separate column in the table but when the system was created we didn’t need to run a report like this so for the moment it’s easier to extract it from the XML.

SUBSTRING Arguments

SUBSTRING takes two or three arguments. The first two are the string or column name and the starting position to take the substring from. The third argument is the length of the string to extract. If the third argument is left out then the rest of the string from length is returned.

Example SQL

So, in order to get the text between <Referrer_To_Website> and </Referrer_To_Website> we need to do this:

SELECT
    SUBSTRING(log.xml, 
      LOCATE('<Referrer_To_Website>', log.xml)+21, 
      LOCATE('</Referrer_To_Website>', log.xml) - LOCATE('<Referrer_To_Website>', log.xml) - 21) 
    AS Referrer_To_Website
FROM 
    enquiry_log AS log

In the above SQL query, the second parameter passed to SUBSTRING() is the column from the enquiry_log table that contains the xml.

The second parameter is
LOCATE(‘<Referrer_To_Website>’, log.xml)+21.

This returns the position of the first occurence of <Referrer_To_Website> in the "xml" column. 21 is then added to it; this is the length of the text <Referrer_To_Website> and the length needs to be added so SUBSTRING starts getting the substring from the first character after the end of the XML node name.

The third parameter is for the length of the string to extract. In plain speak, the formula to do this is (location of opening node) minus (location of the closing node) minus (the length of the opening node).

The resulting calculation for this example looks like:
LOCATE(‘</Referrer_To_Website>’, log.xml)
– LOCATE(‘<Referrer_To_Website>’, log.xml)
– 21

I hope this all makes sense. It’s a fairly specific type of example but does illustrate how you can extract strings from a MySQL column using LOCATE and SUBSTRING.