Fetch message parts into a flat array with PHP IMAP


The PHP IMAP functions imap_fetchstructure and imap_fetchbody are used to work out the structure of an email and fetch its body and attachments, but they can be fiddly to use because message parts can be nested. This post provides a function that flattens the message parts into a new array indexed by part number, so each key can be passed directly to imap_fetchbody.

Earlier post for extracting email attachments with PHP IMAP

I've written about extracting attachments from an email with PHP before, but that post didn't do any recursion into the sub parts of the email so it missed attachments, especially from emails sent using Apple's mail program (and therefore probably from iOS devices like iPhones and iPads).

The new function in this post solves this issue and makes it a lot easier to find the content in an email by providing a flatter array structure than the one returned by the native imap_fetchstructure function.

Message structure

The structure of an email message is generally something like this, with part numbers:

1 - Multipart/alternative headers
1.1 - Plain text message
1.2 - HTML version of message
2 - Inline attachment, etc

In Apple Mail it's like this instead:

1 - Plain text message
2 - Multipart/alternative headers
2.1 - HTML version of the message
2.2 - Inline attachment, etc

If the message has been forwarded, then it will look like this:

1 - Multipart/alternative headers
1.1 - Plain text message
1.2 - HTML version of message
2 - Message/RFC822
2.0 - Attached message header
2.1 - Plain text message
2.2 - HTML version of message
2.3 - Inline attachment, etc

Apple Mail does it differently again, and it's more complicated. I won't bother showing it here.

Example imap_fetchstructure output from Gmail

The following is the result of doing print_r on a message structure that was sent from one Gmail account to another, with an inline image in the email:

stdClass Object
(
    [type] => 1
    [encoding] => 0
    [ifsubtype] => 1
    [subtype] => RELATED
    [ifdescription] => 0
    [ifid] => 0
    [bytes] => 6672
    [ifdisposition] => 0
    [ifdparameters] => 0
    [ifparameters] => 1
    [parameters] => Array
        (
            [0] => stdClass Object
                (
                    [attribute] => BOUNDARY
                    [value] => 14dae9340fe98f008e04b5e7b187
                )

        )

    [parts] => Array
        (
            [0] => stdClass Object
                (
                    [type] => 1
                    [encoding] => 0
                    [ifsubtype] => 1
                    [subtype] => ALTERNATIVE
                    [ifdescription] => 0
                    [ifid] => 0
                    [bytes] => 915
                    [ifdisposition] => 0
                    [ifdparameters] => 0
                    [ifparameters] => 1
                    [parameters] => Array
                        (
                            [0] => stdClass Object
                                (
                                    [attribute] => BOUNDARY
                                    [value] => 14dae9340fe98f008b04b5e7b186
                                )

                        )

                    [parts] => Array
                        (
                            [0] => stdClass Object
                                (
                                    [type] => 0
                                    [encoding] => 0
                                    [ifsubtype] => 1
                                    [subtype] => PLAIN
                                    [ifdescription] => 0
                                    [ifid] => 0
                                    [lines] => 14
                                    [bytes] => 190
                                    [ifdisposition] => 0
                                    [ifdparameters] => 0
                                    [ifparameters] => 1
                                    [parameters] => Array
                                        (
                                            [0] => stdClass Object
                                                (
                                                    [attribute] => CHARSET
                                                    [value] => ISO-8859-1
                                                )

                                        )

                                )

                            [1] => stdClass Object
                                (
                                    [type] => 0
                                    [encoding] => 0
                                    [ifsubtype] => 1
                                    [subtype] => HTML
                                    [ifdescription] => 0
                                    [ifid] => 0
                                    [lines] => 5
                                    [bytes] => 528
                                    [ifdisposition] => 0
                                    [ifdparameters] => 0
                                    [ifparameters] => 1
                                    [parameters] => Array
                                        (
                                            [0] => stdClass Object
                                                (
                                                    [attribute] => CHARSET
                                                    [value] => ISO-8859-1
                                                )

                                        )

                                )

                        )

                )

            [1] => stdClass Object
                (
                    [type] => 5
                    [encoding] => 3
                    [ifsubtype] => 1
                    [subtype] => PNG
                    [ifdescription] => 0
                    [ifid] => 1
                    [id] => 

Flattened version of the above email

Running the function provided below on that structure gives this "flattened" version, which as you can see is a lot easier to loop through.

Array
(
    [1] => stdClass Object
        (
            [type] => 1
            [encoding] => 0
            [ifsubtype] => 1
            [subtype] => ALTERNATIVE
            [ifdescription] => 0
            [ifid] => 0
            [bytes] => 915
            [ifdisposition] => 0
            [ifdparameters] => 0
            [ifparameters] => 1
            [parameters] => Array
                (
                    [0] => stdClass Object
                        (
                            [attribute] => BOUNDARY
                            [value] => 14dae9340fe98f008b04b5e7b186
                        )

                )

        )

    [1.1] => stdClass Object
        (
            [type] => 0
            [encoding] => 0
            [ifsubtype] => 1
            [subtype] => PLAIN
            [ifdescription] => 0
            [ifid] => 0
            [lines] => 14
            [bytes] => 190
            [ifdisposition] => 0
            [ifdparameters] => 0
            [ifparameters] => 1
            [parameters] => Array
                (
                    [0] => stdClass Object
                        (
                            [attribute] => CHARSET
                            [value] => ISO-8859-1
                        )

                )

        )

    [1.2] => stdClass Object
        (
            [type] => 0
            [encoding] => 0
            [ifsubtype] => 1
            [subtype] => HTML
            [ifdescription] => 0
            [ifid] => 0
            [lines] => 5
            [bytes] => 528
            [ifdisposition] => 0
            [ifdparameters] => 0
            [ifparameters] => 1
            [parameters] => Array
                (
                    [0] => stdClass Object
                        (
                            [attribute] => CHARSET
                            [value] => ISO-8859-1
                        )

                )

        )

    [2] => stdClass Object
        (
            [type] => 5
            [encoding] => 3
            [ifsubtype] => 1
            [subtype] => PNG
            [ifdescription] => 0
            [ifid] => 1
            [id] => 
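
Because the flattened array is keyed by part number, a single foreach visits every part. As a small illustration, the sketch below turns the numeric type and the subtype into a readable MIME type for each part. The helper name mimeTypeOf is hypothetical, the type table follows the primary body types listed in the PHP manual for imap_fetchstructure, and the sample array is hand-built to mirror the dump above rather than fetched from a server.

```php
<?php
// Hypothetical helper (not part of the original post): build "type/subtype" from a part object.
function mimeTypeOf($part) {
	// Primary body types, in the order imap_fetchstructure numbers them (0 to 8).
	$types = array('text', 'multipart', 'message', 'application', 'audio', 'image', 'video', 'model', 'other');
	$type = isset($types[$part->type]) ? $types[$part->type] : 'other';
	// The subtype property is only meaningful when ifsubtype is set.
	$subtype = !empty($part->ifsubtype) ? strtolower($part->subtype) : 'unknown';
	return $type.'/'.$subtype;
}

// Hand-built stand-in for the flattened array shown above (most fields omitted).
$flattenedParts = array(
	'1'   => (object) array('type' => 1, 'ifsubtype' => 1, 'subtype' => 'ALTERNATIVE'),
	'1.1' => (object) array('type' => 0, 'ifsubtype' => 1, 'subtype' => 'PLAIN'),
	'1.2' => (object) array('type' => 0, 'ifsubtype' => 1, 'subtype' => 'HTML'),
	'2'   => (object) array('type' => 5, 'ifsubtype' => 1, 'subtype' => 'PNG'),
);

$lines = array();
foreach ($flattenedParts as $partNumber => $part) {
	// PHP stores keys like "1" and "2" as integers; concatenation turns them back into strings.
	$lines[] = $partNumber.' => '.mimeTypeOf($part);
}
echo implode("\n", $lines), "\n";
```

Note that whole-number keys such as 1 and 2 come back from the array as integers, so it may be worth casting the key to a string before passing it to imap_fetchbody.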

PHP code to flatten the IMAP structure

This is a recursive function that creates an array indexed by part number (1, 1.1, 1.2, etc.) and follows the IMAP RFC rules. I've tested it on regular emails sent from Gmail and from Apple Mail, and also on a forwarded email in Apple Mail (where the original message was forwarded as an attachment).

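// Recursively flatten the nested parts from imap_fetchstructure into one array
// keyed by IMAP part number (1, 1.1, 1.2 etc).
function flattenParts($messageParts, $flattenedParts = array(), $prefix = '', $index = 1, $fullPrefix = true) {

	foreach($messageParts as $part) {
		$flattenedParts[$prefix.$index] = $part;
		if(isset($part->parts)) {
			if($part->type == 2) {
				// Attached message (message/rfc822): number its contents from 0 under this part, e.g. 2.0
				$flattenedParts = flattenParts($part->parts, $flattenedParts, $prefix.$index.'.', 0, false);
			}
			elseif($fullPrefix) {
				// Ordinary multipart: children nest one level deeper, e.g. 1.1, 1.2
				$flattenedParts = flattenParts($part->parts, $flattenedParts, $prefix.$index.'.');
			}
			else {
				// Body of an attached message: its children keep the same prefix, e.g. 2.1, 2.2
				$flattenedParts = flattenParts($part->parts, $flattenedParts, $prefix);
			}
			// Drop the nested parts so each entry describes only itself. Objects are
			// handles, so this also removes parts from the original structure.
			unset($flattenedParts[$prefix.$index]->parts);
		}
		$index++;
	}

	return $flattenedParts;

}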
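
The recursion can be checked without an IMAP server by feeding it a hand-built structure. The sketch below repeats the function from this post (comments trimmed) so that it runs on its own, then flattens a made-up forwarded message shaped like the forwarded layout described earlier. The part() helper and the sample data are illustrative assumptions, not output from a real mailbox.

```php
<?php
// The function from this post, repeated so this sketch is runnable by itself.
function flattenParts($messageParts, $flattenedParts = array(), $prefix = '', $index = 1, $fullPrefix = true) {
	foreach ($messageParts as $part) {
		$flattenedParts[$prefix.$index] = $part;
		if (isset($part->parts)) {
			if ($part->type == 2) {
				$flattenedParts = flattenParts($part->parts, $flattenedParts, $prefix.$index.'.', 0, false);
			}
			elseif ($fullPrefix) {
				$flattenedParts = flattenParts($part->parts, $flattenedParts, $prefix.$index.'.');
			}
			else {
				$flattenedParts = flattenParts($part->parts, $flattenedParts, $prefix);
			}
			unset($flattenedParts[$prefix.$index]->parts);
		}
		$index++;
	}
	return $flattenedParts;
}

// Hypothetical helper: build a minimal part object with only the fields the function reads.
function part($type, $subtype, $parts = null) {
	$p = new stdClass();
	$p->type = $type;
	$p->subtype = $subtype;
	if ($parts !== null) {
		$p->parts = $parts;
	}
	return $p;
}

// Made-up forwarded message: a text/HTML alternative plus an attached message (type 2).
$parts = array(
	part(1, 'ALTERNATIVE', array(part(0, 'PLAIN'), part(0, 'HTML'))),
	part(2, 'RFC822', array(
		part(1, 'MIXED', array(part(0, 'PLAIN'), part(0, 'HTML'), part(5, 'PNG'))),
	)),
);

$flattened = flattenParts($parts);
// Whole-number keys are stored as integers, so normalise them to strings for display.
$partNumbers = array_map('strval', array_keys($flattened));
echo implode(', ', $partNumbers), "\n";
```

For this sample the script prints 1, 1.1, 1.2, 2, 2.0, 2.1, 2.2, 2.3, which matches the forwarded-message numbering listed in the Message structure section.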

To call the function, first connect to the IMAP server and fetch the structure of the message you want (the 1 in the imap_fetchstructure call below is the message number), then pass its parts to the function, e.g.:

$connection = imap_open($server, $login, $password);
if($connection === false) {
	die('Could not connect: '.imap_last_error());
}
$structure = imap_fetchstructure($connection, 1);
$flattenedParts = flattenParts($structure->parts);
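
One case the call above does not cover is a message that is not multipart. As far as I'm aware, imap_fetchstructure then returns an object with no parts property, and the body is fetched as part 1. A minimal sketch of a guard for that case follows; the name topLevelParts is hypothetical and the two sample structures are hand-built stand-ins for real imap_fetchstructure results.

```php
<?php
// Hypothetical helper: return the list of parts to hand to flattenParts().
function topLevelParts($structure) {
	if (isset($structure->parts)) {
		return $structure->parts;
	}
	// Single-part message: treat the structure itself as part 1.
	return array($structure);
}

// Hand-built stand-ins for imap_fetchstructure() results.
$plainOnly = (object) array('type' => 0, 'subtype' => 'PLAIN');
$multipart = (object) array('type' => 1, 'subtype' => 'MIXED', 'parts' => array(
	(object) array('type' => 0, 'subtype' => 'PLAIN'),
	(object) array('type' => 5, 'subtype' => 'PNG'),
));

echo count(topLevelParts($plainOnly)), "\n";
echo count(topLevelParts($multipart)), "\n";
```

With a guard like this the call would become flattenParts(topLevelParts($structure)), which gives a one-element array keyed 1 for a plain single-part message.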

Looping through the parts to extract the messages and attachments

The next post shows how to loop through the parts of the flattened array above and work out which ones are the plain text and HTML message parts and which ones are attachments. The post after that will look at how to work out which parts are inline images rendered in the HTML content.
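
Whichever parts you pick, imap_fetchbody returns the body still in its transfer encoding, which is what the encoding field in the dumps above describes (3 for the PNG, meaning base64). A rough sketch of decoding by that field is below; decodePart is a hypothetical helper that uses only core PHP string functions, and the sample strings are made up for the demonstration.

```php
<?php
// Hypothetical helper: undo the transfer encoding reported in $part->encoding.
function decodePart($rawBody, $encoding) {
	switch ($encoding) {
		case 3: // base64
			return base64_decode($rawBody);
		case 4: // quoted-printable
			return quoted_printable_decode($rawBody);
		default: // 7bit, 8bit, binary and other are returned unchanged
			return $rawBody;
	}
}

echo decodePart('SGVsbG8=', 3), "\n";
echo decodePart('caf=C3=A9', 4), "\n";
echo decodePart('plain text', 0), "\n";
```

This only reverses the transfer encoding; converting the text from the charset given in the part's parameters is a separate step that this sketch does not attempt.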


