Parsing Google Analytics data with PHP - XML filesParsing Google Analytics data with PHP - XML files

Posted March 22nd, 2009 in PHP

This post shows how to extract data from a XML email attachment from data sent by Google Analytics with PHP. Previously I have covered how to get the data from CSV files and TSV files and now it's time for dealing with XML.

Previously, I've posted how to connect to an IMAP mail server and get the attachments for a specific Google Analytics report. It called a function called process_attachment() which was a dummy placeholder function in the example code. The code in this post would be inside that function.

View an example XML attachment here. This is an HTML formatted version of the XML file so you can easily view it in your web browser. Note that I've reformatted the content with indentation so it's possible to actually understand what's in the file.

The first thing to do is to load the attachment into a SimpleXMLElement:

$xml = new SimpleXMLElement($attachment);

Now initialise some variables. $columns stores the column heading names; $data is the array that all the data from the XML is going to be stored in; $data_index is the index of the array that we store data into for each row in the XML data.

$columns = array();
$data = array();
$data_index = 0;

This chunk of code loops through the "Table" section of the XML report and puts the column names into the $columns array. Note that strings are lower-cased to make comparisons easier later on in the code.

foreach($xml->Report->Table->Column as $row) {
    $columns[] = strtolower($row->Name);
}

Doing print_r on the $columns array for my sample data results in this:

Array
(
    [0] => pageviews
    [1] => unique pageviews
    [2] => avg. time on page
    [3] => bounce rate
    [4] => % exit
    [5] => $ index
)

Now loop through the rows in the table, and then an inner loop through the cells to get each value, saving them into the array using the column names that were saved into $columns above. The page filename is not stored as a "cell" in the data but instead in the PrimaryKey field, so it is retrieved after the inner loop.

foreach($xml->Report->Table->Row as $row) {
    $index = 0;
    foreach($row->Cell as $cell) {
        $data[$data_index][$columns[$index]] = (string)$cell->Content->Value;
        $index++;
    }
    $data[$data_index]['key'] = (string)$row->PrimaryKey;
    $data_index++;
}

The resulting data array looks like this:

Array
(
    [0] => Array
        (
            [pageviews] => 3,135
            [unique pageviews] => 2,986
            [avg. time on page] => 00:06:30
            [bounce rate] => 95.08%
            [% exit] => 94.51%
            [$ index] => $0.00
            [key] => /article/apache/restart-apache/
        )

    [1] => Array
        (
            [pageviews] => 2,997
            [unique pageviews] => 2,836
            [avg. time on page] => 00:04:11
            [bounce rate] => 93.70%
            [% exit] => 91.79%
            [$ index] => $0.00
            [key] => /article/mysql/delete-all-data-mysql/
        )
...
    [99] => Array
        (
            [pageviews] => 260
            [unique pageviews] => 239
            [avg. time on page] => 00:03:22
            [bounce rate] => 87.02%
            [% exit] => 76.15%
            [$ index] => $0.00
            [key] => /default-virtualhost-mod-ssl/
        )

)

Now that the data is loaded into an easy to deal with array you can load it into a database, send a report by email etc. Alternatively you could save it to the database etc at the end of the outer foreach() loop instead of saving the data into a large memory consuming array.

Note that this code will work for the top content report but would need to be modified slightly for other reports.

Related posts:

Comments

blog comments powered by Disqus