Extracting attachments from an email message using PHP IMAP functions


This post is part of an ongoing series which aims to show how to extract data from Google Analytics using its scheduled email reports system. I have already looked at how to send Google Analytics data by email, and how to use the PHP IMAP functions to download email. I will also look at using other PHP libraries to download email and attachments, but for now this post looks at how to extract email attachments using the PHP IMAP functions.

PHP email class

Please note that I have written a class for extracting attachments from email messages. It can be downloaded from my more recent post: PHP email message class for extracting attachments

Getting the message structure

There are other ways to do this - the method presented here is just one of several ways of getting the attachments. I will look at other ways to get attachments from email messages in later posts.

After logging in to the IMAP or POP mail server (detailed in my last post in this series about how to use the PHP IMAP functions to download email) use the imap_fetchstructure() function to get the structure of the message.

The following code snippet uses the connection stored in $connection to download the structure of message number $message_number in the mailbox (message numbers are a 1-based index into the mailbox):

$structure = imap_fetchstructure($connection, $message_number);

Assuming all went well we now have an object containing a lot of information about the message. The following is output from print_r for a message sent from Google Analytics containing tab-separated data in an attachment:

stdClass Object
(
    [type] => 1
    [encoding] => 0
    [ifsubtype] => 1
    [subtype] => MIXED
    [ifdescription] => 0
    [ifid] => 0
    [ifdisposition] => 0
    [ifdparameters] => 0
    [ifparameters] => 1
    [parameters] => Array
        (
            [0] => stdClass Object
                (
                    [attribute] => boundary
                    [value] => 00221532c8aca27cf00462632bb7
                )
        )
    [parts] => Array
        (
            [0] => stdClass Object
                (
                    [type] => 0
                    [encoding] => 0
                    [ifsubtype] => 1
                    [subtype] => PLAIN
                    [ifdescription] => 0
                    [ifid] => 0
                    [lines] => 11
                    [bytes] => 737
                    [ifdisposition] => 0
                    [ifdparameters] => 0
                    [ifparameters] => 1
                    [parameters] => Array
                        (
                            [0] => stdClass Object
                                (
                                    [attribute] => charset
                                    [value] => ISO-8859-1
                                )

                            [1] => stdClass Object
                                (
                                    [attribute] => format
                                    [value] => flowed
                                )

                            [2] => stdClass Object
                                (
                                    [attribute] => delsp
                                    [value] => yes
                                )
                        )
                )
            [1] => stdClass Object
                (
                    [type] => 0
                    [encoding] => 3
                    [ifsubtype] => 1
                    [subtype] => TAB-SEPARATED-VALUES
                    [ifdescription] => 0
                    [ifid] => 0
                    [lines] => 111
                    [bytes] => 8674
                    [ifdisposition] => 1
                    [disposition] => attachment
                    [ifdparameters] => 1
                    [dparameters] => Array
                        (
                            [0] => stdClass Object
                                (
                                    [attribute] => filename
                                    [value] => Analytics_www.electrictoolbox.com_20090108-20090207.tsv
                                )
                        )
                    [ifparameters] => 1
                    [parameters] => Array
                        (
                            [0] => stdClass Object
                                (
                                    [attribute] => charset
                                    [value] => US-ASCII
                                )

                            [1] => stdClass Object
                                (
                                    [attribute] => name
                                    [value] => Analytics_www.electrictoolbox.com_20090108-20090207.tsv
                                )
                        )
                )
        )
)
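
The [type] and [encoding] values in the output above are integer codes. As a rough sketch, the helper below turns them into readable names using the code tables listed in the PHP manual for imap_fetchstructure(). The function name describe_part() is made up for this example, and the object is built by hand to stand in for the second part of the message, so no mail server is needed to run it.

```php
<?php
// describe_part() is a hypothetical helper, not part of the IMAP extension.
// It maps the integer codes documented for imap_fetchstructure() to names.
function describe_part($part)
{
    // Index = primary body type code
    $types = array('text', 'multipart', 'message', 'application',
                   'audio', 'image', 'video', 'model', 'other');
    // Index = transfer encoding code
    $encodings = array('7bit', '8bit', 'binary', 'base64',
                       'quoted-printable', 'other');

    $type = isset($types[$part->type]) ? $types[$part->type] : 'unknown';
    $encoding = isset($encodings[$part->encoding]) ? $encodings[$part->encoding] : 'unknown';
    $subtype = !empty($part->ifsubtype) ? strtolower($part->subtype) : '';

    return ($subtype !== '' ? $type . '/' . $subtype : $type) . ' (' . $encoding . ')';
}

// Hand-built stand-in for the attachment part in the example output.
$part = new stdClass();
$part->type = 0;
$part->encoding = 3;
$part->ifsubtype = 1;
$part->subtype = 'TAB-SEPARATED-VALUES';

echo describe_part($part), "\n"; // text/tab-separated-values (base64)
```

With a real message you would pass $structure or any element of $structure->parts instead of the hand-built object.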

Working out and getting the attachments

The structure above isn't the easiest thing to extract the information we need from. We need to loop through [parts] and check each part's [dparameters] for a filename and its [parameters] for a name. If the part has either of these, it is treated as an attachment and downloaded using imap_fetchbody(). If the part has neither, it's not an attachment.

This is achieved with the following code, which stores the information in an array called $attachments. 1 is added to $i in the call to imap_fetchbody() because the [parts] array is zero-based, whereas the IMAP functions number the message parts from one. The downloaded body is then decoded if the part's encoding is base64 or quoted-printable.

$attachments = array();
if(isset($structure->parts) && count($structure->parts)) {

	for($i = 0; $i < count($structure->parts); $i++) {

		$attachments[$i] = array(
			'is_attachment' => false,
			'filename' => '',
			'name' => '',
			'attachment' => ''
		);
		
		if($structure->parts[$i]->ifdparameters) {
			foreach($structure->parts[$i]->dparameters as $object) {
				if(strtolower($object->attribute) == 'filename') {
					$attachments[$i]['is_attachment'] = true;
					$attachments[$i]['filename'] = $object->value;
				}
			}
		}
		
		if($structure->parts[$i]->ifparameters) {
			foreach($structure->parts[$i]->parameters as $object) {
				if(strtolower($object->attribute) == 'name') {
					$attachments[$i]['is_attachment'] = true;
					$attachments[$i]['name'] = $object->value;
				}
			}
		}
		
		if($attachments[$i]['is_attachment']) {
			$attachments[$i]['attachment'] = imap_fetchbody($connection, $message_number, $i+1);
			if($structure->parts[$i]->encoding == 3) { // 3 = BASE64
				$attachments[$i]['attachment'] = base64_decode($attachments[$i]['attachment']);
			}
			elseif($structure->parts[$i]->encoding == 4) { // 4 = QUOTED-PRINTABLE
				$attachments[$i]['attachment'] = quoted_printable_decode($attachments[$i]['attachment']);
			}
		}
	}
}
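
The parameter lookups and the decoding step in the loop above can also be pulled out into small functions, which makes them easy to try without a mail server. The following is only a sketch of that idea: part_params(), attachment_filename() and decode_part_body() are names invented for this example, and the part object is built by hand rather than fetched with imap_fetchstructure().

```php
<?php
// All three functions are hypothetical helpers, not IMAP extension functions.

// Collect a part's [parameters] and [dparameters] into one array,
// keyed by the lower-cased attribute name.
function part_params($part)
{
    $params = array();
    if (!empty($part->ifparameters)) {
        foreach ($part->parameters as $object) {
            $params[strtolower($object->attribute)] = $object->value;
        }
    }
    if (!empty($part->ifdparameters)) {
        foreach ($part->dparameters as $object) {
            $params[strtolower($object->attribute)] = $object->value;
        }
    }
    return $params;
}

// Return the part's filename (or name), or null if it has neither,
// in which case it is not treated as an attachment.
function attachment_filename($part)
{
    $params = part_params($part);
    if (isset($params['filename'])) {
        return $params['filename'];
    }
    return isset($params['name']) ? $params['name'] : null;
}

// Decode a fetched body according to the part's encoding code.
function decode_part_body($body, $encoding)
{
    if ($encoding == 3) { // 3 = BASE64
        return base64_decode($body);
    }
    if ($encoding == 4) { // 4 = QUOTED-PRINTABLE
        return quoted_printable_decode($body);
    }
    return $body; // other encodings are returned unchanged
}

// Hand-built part; 'report.tsv' is made-up sample data.
$part = (object) array(
    'encoding' => 3,
    'ifparameters' => 0,
    'ifdparameters' => 1,
    'dparameters' => array(
        (object) array('attribute' => 'FILENAME', 'value' => 'report.tsv'),
    ),
);

echo attachment_filename($part), "\n";                 // report.tsv
echo decode_part_body('aGVsbG8=', $part->encoding), "\n"; // hello
```

Inside the loop you would call attachment_filename($structure->parts[$i]) and, if it does not return null, pass the result of imap_fetchbody($connection, $message_number, $i+1) to decode_part_body(). Like the loop above, this only looks at the top-level parts of the message.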

The end result of the above code on our example email is the following, with the data truncated for the actual attachment:

Array
(
    [0] => Array
        (
            [is_attachment] => 
            [filename] => 
            [name] => 
            [attachment] => 
        )

    [1] => Array
        (
            [is_attachment] => 1
            [filename] => Analytics_www.electrictoolbox.com_20090108-20090207.tsv
            [name] => Analytics_www.electrictoolbox.com_20090108-20090207.tsv
            [attachment] => ...
        )

)

You can now loop through the $attachments array, looking for the appropriate filename, and do whatever processing you need on the attachment data.
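
As an illustration of that last step, here is a minimal sketch that picks out the attachments whose filename matches a pattern. The function name attachments_matching() and the sample data are invented for this example; the array has the same shape as the $attachments array built earlier. Because the filename comes from the email itself, the sketch runs it through basename() before using it as a key, on the assumption that you may later use it as a file name on disk.

```php
<?php
// attachments_matching() is a hypothetical helper for this example.
// It returns an array of attachment data keyed by filename for every
// attachment whose filename matches the given regular expression.
function attachments_matching(array $attachments, $pattern)
{
    $matches = array();
    foreach ($attachments as $attachment) {
        if (!$attachment['is_attachment']) {
            continue; // skip parts that are not attachments
        }
        // Prefer the disposition filename, fall back to the name parameter.
        $filename = $attachment['filename'] !== ''
            ? $attachment['filename']
            : $attachment['name'];
        if (preg_match($pattern, $filename)) {
            // basename() drops any directory components from the filename.
            $matches[basename($filename)] = $attachment['attachment'];
        }
    }
    return $matches;
}

// Made-up sample data in the same shape as $attachments.
$attachments = array(
    array('is_attachment' => false, 'filename' => '', 'name' => '', 'attachment' => ''),
    array(
        'is_attachment' => true,
        'filename' => 'Analytics_example_20090108-20090207.tsv',
        'name' => 'Analytics_example_20090108-20090207.tsv',
        'attachment' => "Page\tVisits\n/index\t42\n",
    ),
);

foreach (attachments_matching($attachments, '/\.tsv$/i') as $filename => $data) {
    echo $filename, ': ', strlen($data), " bytes\n";
}
```

From here each match could be written to disk with file_put_contents() or passed straight to whatever parses the data.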

Conclusion & future posts

The PHP IMAP functions allow you to download email and, using the code examples presented in this post, extract the attachments from it reasonably easily. Now that I've covered how to download email and extract the attachments, future posts in this series will look at how to parse TSV, CSV and XML data from the Google Analytics reports using PHP.

Read about the series here, including a list of all posts in it.


