
PHP: get keywords from search engine referer url

This post shows how to use PHP to extract the keywords a visitor searched for when they found your website using a search engine. Bing, Google and Yahoo are covered here, and you can easily add others to the PHP code supplied.

Version 2 of this script

This script and post have been fully revised and updated, and the new version is posted here: "PHP: get keywords from search engine referer url – version 2". It takes into account some of the comments made on this post and also works with #fragment query strings passed from Google and others.

Original post follows…

 

PHP functions used

The code example here uses the parse_url function to split the referer URL into its parts and then the parse_str function to extract the variables in the query string into an associative array. I’ve covered those functions before in an article titled "Extract query string into an associative array with PHP".
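To illustrate what these two functions return, here is a short sketch that runs them on the Bing referer URL shown in the examples further down. The isset() check guards against URLs that have no query string, in which case parse_url does not return a 'query' key.

```php
<?php
// The Bing referer URL from the examples in this post
$url = 'http://www.bing.com/search?q=javascript+date+to+timestamp&src=IE-SearchBox&FORM=IE8SRC';

// parse_url returns an array of the URL's components; keys such as
// 'scheme', 'host', 'path' and 'query' are only present if that part exists
$parts = parse_url($url);

// parse_str splits the query string into the associative array $query,
// decoding URL encoding along the way
parse_str(isset($parts['query']) ? $parts['query'] : '', $query);

echo $parts['host'], "\n"; // www.bing.com
echo $parts['path'], "\n"; // /search
echo $query['q'], "\n";    // javascript date to timestamp
echo $query['src'], "\n";  // IE-SearchBox
```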

The referer URL is stored in the $_SERVER PHP superglobal as $_SERVER['HTTP_REFERER'], but only if it was set by the web browser. I have covered this value in some detail in the tutorial titled "Using the HTTP_REFERER variable with PHP".
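Because the browser may not send a referer at all (for example on a typed-in URL or a bookmark), it is worth checking that the value exists before using it. Here is a minimal sketch; current_referer() is a hypothetical helper name, not part of this post's code. Keep in mind the value is supplied by the client, so it can also be forged.

```php
<?php
// Hypothetical helper (not part of the original post): returns the
// referer URL if the browser sent one, otherwise an empty string.
function current_referer()
{
    return isset($_SERVER['HTTP_REFERER']) ? (string) $_SERVER['HTTP_REFERER'] : '';
}

$referer = current_referer();
if ($referer === '') {
    // No Referer header: direct visit, bookmark, browser privacy settings, etc.
    echo "No referer was sent\n";
}
```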

Referer URL examples

Here are some example referer URLs from Bing, Google and Yahoo, taken from people reaching this blog.

http://www.bing.com/search?q=javascript+date+to+timestamp&src=IE-SearchBox&FORM=IE8SRC
http://www.google.de/search?q=apache+restart&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:de:official&client=firefox-a
http://us.yhs.search.yahoo.com/avg/search?fr=yhs-avg-chrome&type=yahoo_avg_hs2-tb-web_chrome_us&p=concatenation+in+mysql

Looking at the URLs, you can see that Bing and Google store the search keywords in the "q" variable, while Yahoo uses "p".

The code

Here’s the PHP code to extract the keywords from referer URLs like the examples above:

function search_engine_query_string($url = false) {

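    if(!$url) {
        $url = isset($_SERVER['HTTP_REFERER']) ? $_SERVER['HTTP_REFERER'] : false;
    }
    if($url == false) {
        return '';
    }
    // Split the URL into its parts, then decode the query string into $query
    $parts = parse_url($url);
    parse_str(isset($parts['query']) ? $parts['query'] : '', $query);
    // Main part of each search engine's domain => variable holding the keywords
    $search_engines = array(
        'bing' => 'q',
        'google' => 'q',
        'yahoo' => 'p'
    );
    // Capture the search engine name found in the host, followed by a dot
    preg_match('/(' . implode('|', array_keys($search_engines)) . ')\./', isset($parts['host']) ? $parts['host'] : '', $matches);
    // Return the keywords if an engine matched and its variable is in the query
    return isset($matches[1]) && isset($query[$search_engines[$matches[1]]]) ? $query[$search_engines[$matches[1]]] : '';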

}

The function uses the URL passed in or, if none is passed, $_SERVER['HTTP_REFERER'], and returns an empty string if neither is available. It then splits the URL into its parts with parse_url and breaks the query string into values in an associative array with parse_str.

The $search_engines array maps the main part of each search engine's domain (e.g. the 'google' bit of www.google.com) to the name of the query string variable that holds the keywords. You can add more search engines to this array.
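If you expect to add engines often, one option (a variation, not the code from this post) is to pass the map in as a parameter so the function itself never needs editing. In the sketch below, keywords_from_referer() is a hypothetical name, and 'example-search' is a made-up engine that puts its keywords in a "query" variable, included only to show how an entry is added. preg_quote escapes any regex special characters in the names.

```php
<?php
// Hypothetical variant of search_engine_query_string(): the
// domain => variable map is passed in as a parameter.
function keywords_from_referer($url, array $engines)
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['host'], $parts['query'])) {
        return ''; // malformed URL, or no host/query string to inspect
    }
    parse_str($parts['query'], $query);

    // Build a pattern such as /(bing|google|yahoo)\./ from the array keys,
    // escaping any regex special characters in the names
    $names = array_map(function ($name) {
        return preg_quote($name, '/');
    }, array_keys($engines));
    if (!preg_match('/(' . implode('|', $names) . ')\./', $parts['host'], $m)) {
        return ''; // host does not belong to a listed search engine
    }

    $var = $engines[$m[1]];
    return isset($query[$var]) ? $query[$var] : '';
}

$engines = array(
    'bing'           => 'q',
    'google'         => 'q',
    'yahoo'          => 'p',
    'example-search' => 'query', // hypothetical engine, for illustration only
);

echo keywords_from_referer('http://www.example-search.com/find?query=php+arrays', $engines), "\n";
// php arrays
```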

Note that each array key (e.g. 'google') is matched against the search engine's host as the key followed by a period/dot. Therefore 'google' would match www.google.com, www.google.co.nz and even notgoogle.com.

The regular expression could be tightened so that the name must either come straight after a period/dot or sit at the very start of the host, but I’m personally happy to leave it as-is for the moment; you are of course free to modify the code if you need a more exact match.
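Here is a sketch of that tighter match, comparing a loose pattern (the name followed by an escaped dot, anywhere in the host) with a stricter one that also requires the name to start the host or follow a dot. The stricter pattern is one possible way to do it, not the post's code.

```php
<?php
// Engine names from the post's $search_engines array
$names = 'bing|google|yahoo';

// Loose: the name followed by a literal dot, anywhere in the host
$loose = '/(' . $names . ')\./';
// Stricter: the name must also be at the start of the host or follow a dot.
// (?:...) is non-capturing, so the engine name is still in $matches[1].
$strict = '/(?:^|\.)(' . $names . ')\./';

foreach (array('www.google.co.nz', 'google.com', 'notgoogle.com') as $host) {
    printf(
        "%-18s loose: %s  strict: %s\n",
        $host,
        preg_match($loose, $host) ? 'yes' : 'no',
        preg_match($strict, $host) ? 'yes' : 'no'
    );
}
```

Because the engine name remains capture group 1, the stricter pattern could replace the one passed to preg_match in the function without changing the return statement.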

The preg_match call captures the matching search engine name into $matches[1], and the return statement returns the keywords if the search engine domain matched and its keyword variable was found in the query string; otherwise it returns an empty string.

Note that parse_str decodes URL encoding, so e.g. "javascript+date+to+timestamp" is returned as "javascript date to timestamp".
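A quick sketch of that decoding: a plus sign becomes a space, and percent-encoded sequences such as %2B (a literal plus) or the UTF-8 bytes of an accented letter are decoded too. The query strings here are made-up examples.

```php
<?php
// '+' in a query string decodes to a space
parse_str('q=javascript+date+to+timestamp', $plain);
echo $plain['q'], "\n"; // javascript date to timestamp

// %2B decodes to a literal '+', so this is the search "c++ strings"
parse_str('q=c%2B%2B+strings', $encoded);
echo $encoded['q'], "\n"; // c++ strings

// %C3%A9 is the UTF-8 encoding of "é"; the result is a UTF-8 string
parse_str('q=caf%C3%A9', $accented);
echo $accented['q'], "\n"; // café
```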

Examples

Here are some examples of running the above function on the referer URLs from the beginning of the post:

echo search_engine_query_string('http://www.bing.com/search?q=javascript+date+to+timestamp&src=IE-SearchBox&FORM=IE8SRC');
// echoes "javascript date to timestamp"
echo search_engine_query_string('http://www.google.de/search?q=apache+restart&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:de:official&client=firefox-a');
// echoes "apache restart"
echo search_engine_query_string('http://us.yhs.search.yahoo.com/avg/search?fr=yhs-avg-chrome&type=yahoo_avg_hs2-tb-web_chrome_us&p=concatenation+in+mysql');
// echoes "concatenation in mysql"
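If you echo the keywords into a web page rather than a terminal, escape them first: they come straight from the Referer header, which the visitor's browser sends and which anyone can forge. Here is a sketch of one way to do that; keywords_message() is a hypothetical helper and the message wording is only an example.

```php
<?php
// Hypothetical helper: builds an HTML snippet from the keywords,
// escaping them with htmlspecialchars because they are user-controlled.
function keywords_message($keywords)
{
    if ($keywords === '') {
        return ''; // not from a recognised search engine, or no keywords
    }
    return 'You searched for: <strong>'
        . htmlspecialchars($keywords, ENT_QUOTES, 'UTF-8')
        . '</strong>';
}

// With the function from this post you would write:
// echo keywords_message(search_engine_query_string());
echo keywords_message('apache restart'), "\n";
```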