Parsing Google Analytics data with PHP - TSV filesParsing Google Analytics data with PHP - TSV files

Posted March 19th, 2009 in PHP

This post shows how to extract data from a TSV (tab separated values) email attachment from data sent by Google Analytics with PHP. This post is virtually identical to the previous post in this series which looked at how to do the same for CSV files, except in this case we use a \t delimiter for the str_getcsv function.

Previously, I've posted how to connect to an IMAP mail server and get the attachments for a specific Google Analytics report. It called a function called process_attachment() which was a dummy placeholder function in the example code. The code in this post would be inside that function.

The TSV (tab-separated values) data from Google Analytics varies depending on the report but generally has a summary at the top, graph data and then table data. The first row of each section contains the column names and then the data itself. View an example TSV attachment here. This is an HTML formatted version of the TSV file so you can easily view it in your web browser. Note that I've substituted the tabs for | so you can actually see where the delimiters are.

Without further ado, here's the code followed by a brief explanation of what it's doing, although there's a fair amount of comments in there already explaining it. Note that the code is identical to the CSV post with the exception for the \t delimiter when calling the str_getcsv function. I've highlighted this in red.

// extract the lines from the attachment into an array so we can loop through them
$lines = explode("\n", $attachment);
// cache the count so function doesn't keep getting called in the loop
$count = count($lines);
// this var will store the section we are currently processing
// if it's false then don't do anything with the line
$processing = false;
// initilise the data array
$data = array(
    'graph' => array(),
    'table' => array()
);
// loop through the lines
for($i = 0; $i < $count; $i++) {
    // if the line starts with graph/table then extract the fieldnames two lines down
    if($lines[$i] == '# Graph' || $lines[$i] == '# Table') {
        $processing = strtolower(str_replace('# ', '', $lines[$i]));
        $i+=2;
        $fields = str_getcsv(strtolower($lines[$i]), "\t");
        $i++;
    }
    // if it's a blank line or starts with # --- then the section has ended
    if($lines[$i] == '' || preg_match('/# ---+/', $lines[$i])) {
        $processing = false;
    }
    // if processing then load the CSV data into the array using the fieldnames as index keys
    if($processing) {
        $parts = str_getcsv($lines[$i], "\t");
        $row = array();
        for($j = 0; $j < count($parts); $j++) {
            $row[$fields[$j]] = $parts[$j];
        }
        $data[$processing][] = $row;
    }
}

The code above is fairly straight forward. We look for lines that start with # Graph and # Table and extract the fieldnames from two lines further down. If the line should be processed, the str_getcsv function is called to extact the data and then it's mapped into the $data array into either 'graph' or 'table' using the fieldnames.

Note that the str_getcsv function is only available from PHP 5.3.0 so if using an earlier version of PHP you'll need to create a custom function instead. I have provided one here earlier today.

Doing print_r on the example data would show something along these lines:

Array
(
    [graph] => Array
        (
            [0] => Array
                (
                    [day] => Friday, February 13, 2009
                    [pageviews] => 3,771
                )

            [1] => Array
                (
                    [day] => Saturday, February 14, 2009
                    [pageviews] => 2,856
                )

            ...

            [30] => Array
                (
                    [day] => Sunday, March 15, 2009
                    [pageviews] => 1,826
                )

        )

    [table] => Array
        (
            [0] => Array
                (
                    [page] => /article/apache/restart-apache/
                    [pageviews] => 2975
                    [unique pageviews] => 2836
                    [avg. time on page] => 388.93975903614455
                    [bounce rate] => 0.9520255863539445
                    [% exit] => 0.944201680672269
                    [$ index] => 0.0
                )

            [1] => Array
                (
                    [page] => /article/mysql/delete-all-data-mysql/
                    [pageviews] => 2891
                    [unique pageviews] => 2732
                    [avg. time on page] => 267.13524590163934
                    [bounce rate] => 0.9355805243445693
                    [% exit] => 0.915600138360429
                    [$ index] => 0.0
                )

            ...
               
            [99] => Array
                (
                    [page] => /view-email-message-headers-outlook-2007/
                    [pageviews] => 248
                    [unique pageviews] => 242
                    [avg. time on page] => 730.875
                    [bounce rate] => 0.9710743801652892
                    [% exit] => 0.967741935483871
                    [$ index] => 0.0
                )

        )

)

The next post in this series on Sunday will show how to do this using an XML attachment. Read about the other posts in this series here, including a full list of posts in the series.

Related posts:

Comments

blog comments powered by Disqus