 Mailer Tutorial - Composing, sending and parsing e-mails

Last update: 2018-07-27

This article illustrates through examples how an e-mail can be parsed, built and sent using the PHPLint Mailer and the other classes available under the it\icosaedro\email namespace. The PHPLint Mailer library requires no special extensions or external libraries, as it is implemented entirely in plain PHP.

Design concepts

Classes overview

An overview of all the classes available under the it\icosaedro\email namespace follows.

Composing and sending e-mails

Sending an e-mail from a PHP program is probably one of the most common tasks, so the Mailer class below is typically the only one you will need most of the time.

Mailer - Composes and sends an e-mail, including a text part, an HTML part with in-line content, and attachments.

Parsing an e-mail

The internal structure of a MIME e-mail can be quite complex, as it may involve several parts of different types and nested structures of parts. The simplest message is a text-only, non-MIME message consisting of a single MIMEPartMemory object containing the header and the text. More complex messages may contain several parts, including alternative text, in-line images and attachment files. The EmlParser recognizes all of these and returns an abstract MIMEAbstractPart object from which your program may start examining the content of the message.

EmlParser - Parses an e-mail from different sources, including a string in memory, an EML file, a stream of bytes, or a mailbox.
EmlFormatException - Exception thrown by EmlParser.
ReaderInterface - Interface to read an e-mail line by line: the readLine() method reads the next line, which becomes the current line, while the getLine() method returns that line. Several specific implementations of this interface allow the EmlParser to read an e-mail from different sources, including a file, a string in memory, or a mailbox.
ReaderFromStream - Implementation of the e-mail reader interface that reads from an InputStream of bytes, allowing the EmlParser to access any type of source. Under it\icosaedro\io there are several implementations of the InputStream interface to read from a string, a file, a resource or other media.
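
The two-step protocol of the reader (advance with readLine(), then inspect the current line with getLine()) can be illustrated with a small stand-alone class. This is only a sketch: the class name SketchStringReader and the exact method signatures are assumptions made for illustration, not the library's actual ReaderInterface declaration.

```php
<?php
/**
 * Hypothetical line reader over a string in memory; it mimics the
 * readLine()/getLine() protocol described above. Not a library class.
 */
class SketchStringReader
{
    /** @var string[] */
    private $lines;
    /** @var int Index of the current line; -1 before the first readLine(). */
    private $index = -1;

    public function __construct(string $text)
    {
        // Accept both CRLF and LF line endings.
        $this->lines = preg_split('/\r\n|\n/', $text);
    }

    /** Moves to the next line, which becomes the current line. */
    public function readLine(): void
    {
        if ($this->index < count($this->lines))
            $this->index++;
    }

    /** Returns the current line, or NULL past the end of the input. */
    public function getLine(): ?string
    {
        return $this->lines[$this->index] ?? null;
    }
}

// Typical loop: advance first, then look at the current line.
$r = new SketchStringReader("Subject: hello\r\n\r\nBody text");
$seen = [];
for ($r->readLine(); $r->getLine() !== null; $r->readLine())
    $seen[] = $r->getLine();
```

Keeping "advance" and "inspect" separate lets a parser look at the current line several times without consuming it, which is convenient when deciding whether a line belongs to the header or to the body.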

Mailbox scanners

There are several types of mailboxes, each one with its own format and access methods. The following classes scan the content of such a mailbox message by message, either to read the raw content of each message line by line, or to parse each message with the EmlParser class.

ReaderFromMboxInterface - Defines the interface that each mailbox scanner must provide. Basically, a mailbox scanner offers a next() method to skip to the next message, a readLine() method to read the next line of the current message, and a getLine() method to get that line; that is all. A mailbox scanner also implements the single-message reader interface ReaderInterface to read the current message, so such an object can be fed to the EmlParser class to parse the current message. The documentation of this class shows several practical code samples.
ReaderFromMbox - Scanner for the "mboxrd" file format used by programs like Thunderbird, Netscape, Mozilla.
ReaderFromPOP3 - Scanner for POP3 mailbox.
ReaderFromTtBox - Scanner for the tt mailbox.
ReaderFromUnixMbox - Scanner for the Unix-like mailbox.

E-mail structure

An e-mail is built from, or parsed into, an abstract part, which in turn is composed of several objects representing the header and the body of the message:

MIMEAbstractPart - Abstract representation of an e-mail, including its header and body parts. Several specific implementations of this class follow, depending on the content type.
MIMEPartMemory - Represents a memory-based content part.
MIMEPartFile - Represents a file-based content part. This class avoids reading the whole file into memory as far as possible and supports streamed operations instead.
MIMEMultiPart - Represents a multi-part e-mail or sub-part. Each sub-part can be any type of MIME part, including multi-part.
Field - Represents a single header field, with encoding and decoding routines for specific field parts.
Header - Represents the header, with specialized encoding and decoding routines for specific fields.

Networking support

ConnectionParameters - Parses and represents a connection string to an SMTP or POP3 server. The Mailer, SMTP and POP3 classes use this class.
SMTP - Connects to an SMTP server to send an e-mail. The Mailer uses this class to send an e-mail through the SMTP protocol.
POP3 - Connects to a POP3 server to download a mailbox. The POP3 mailbox scanner class uses this class to access a POP3 server.

Support classes

The following classes are for internal use only and are documented here only for the sake of completeness; client applications do not need to use them directly:

EOLFilter - Stream filter used by Mailer to convert line ending termination codes from LF to CRLF.
SmtpDataOutputStream - Used by Mailer to send an e-mail to the SMTP server in streamed mode.
ProcOutputStream - Used by Mailer to send an e-mail to the "sendmail" process in streamed mode.
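
To give an idea of the kind of conversion EOLFilter performs, here is a minimal sketch in plain PHP that turns bare LF line endings into CRLF. It works on a whole string rather than on a stream, and the function name lfToCrlf is invented for this example; it is not how the library class is implemented.

```php
<?php
/**
 * Converts bare LF line endings to CRLF, leaving existing CRLF pairs
 * untouched. Illustrative helper, not part of the library.
 */
function lfToCrlf(string $text): string
{
    // A "\n" that is not preceded by "\r" is a bare LF.
    return preg_replace('/(?<!\r)\n/', "\r\n", $text);
}

$wire = lfToCrlf("Subject: test\n\nline 1\r\nline 2\n");
```

The look-behind avoids turning an existing CRLF into CR CR LF, so the function can safely be applied to text with mixed line endings.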

Composing, sending and saving e-mails

The simplest way to compose and send an e-mail is by using the Mailer class alone, which provides all the methods to set the sender, recipients, subject, readable part and attachments, and can also send the result through the SMTP protocol, through the built-in mail() function, through a "sendmail" process, or save the message to a string or to a file. For very specific needs, the structure of the message may also be built out of parts in a totally custom way, but this requires a bit more knowledge of the MIME specifications to get the result right.

Sending a simple text message

require_once __DIR__ . "/stdlib/all.php";
use it\icosaedro\email\Mailer;

$m = new Mailer();
$m->setSubject("This is the subject");
$m->setFrom("my@mydomain.com", "My Name");
$m->addAddress("you@yourdomain.com", "Your Name");
$m->setTextMessage("This is the text body of the message.");
$m->sendByMail();
echo "E-mail sent with message ID: ", $m->getMessageID();

Note that:

Saving the message on file

The Mailer class provides the sendByStream($out, TRUE) method to write the current message on a generic output stream of bytes $out. This could be used to retrieve the message in a string:

use it\icosaedro\email\Mailer;
use it\icosaedro\io\StringOutputStream;
...
$out_string = new StringOutputStream();
$m->sendByStream($out_string, TRUE);
$message_as_string = $out_string->__toString();

or to save it to an EML-formatted file:

use it\icosaedro\email\Mailer;
use it\icosaedro\io\FileOutputStream;
use it\icosaedro\io\File;
...
$out_file = new FileOutputStream(File::fromLocaleEncoded("mail.eml"));
$m->sendByStream($out_file, TRUE);
$out_file->close();

In both cases, the current Message-ID and Date fields are preserved, so what you retrieve is an exact copy of the sent message.

Sending the message through the SMTP server

The sendBySMTP($hosts, $keep_alive) method sends the message through an SMTP server. The trickiest part of this process is building the correct connection string $hosts, which lists one or more hosts to try in turn. In its simplest form, this string contains just the name of the host and possibly the port number, typically 25, which is the default anyway:

// here: compose the message as shown above
$hosts = "mail.myisp.com:25"; // connection string
$rejected = $m->sendBySMTP($hosts, FALSE);
if( count($rejected) > 0 )
	echo "Rejected recipients: ", var_export($rejected, TRUE);

Several hosts could be listed separated by semicolon, and they will be tried out in turn looking for the first responding to our connection request, for example:

$hosts = "mail.mycompany.com; mail.myisp.com";

The connection string also supports parameters to enable the SSL protocol, user authentication and others. These parameters can be added to the host name just like URL parameters. For example, to enable the SSL/TLS protocol and user authentication the connection string could look something like this:

$hosts = "mail.myisp.com:587"
	."?timeout=10"
	."&security=ssl"
	."&user_authentication_method=PLAIN"
	."&user_name=MyUserName"
	."&user_password=MyPassword";
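
When the user name or the password contains characters such as "&" or "=", concatenating the string by hand becomes error-prone. The sketch below builds the same kind of connection string with PHP's built-in http_build_query(). The helper name buildConnectionString is hypothetical, and the sketch assumes the library decodes percent-escapes as it would for ordinary URL parameters; check the ConnectionParameters documentation before relying on that.

```php
<?php
/**
 * Builds a connection string of the form "host:port?name=value&...".
 * Hypothetical helper; assumes URL-style percent-encoding of the values.
 * @param array<string,string|int> $params
 */
function buildConnectionString(string $host, int $port, array $params): string
{
    $s = $host . ":" . $port;
    if (count($params) > 0)
        // RFC 3986 mode encodes spaces as %20 rather than "+".
        $s .= "?" . http_build_query($params, "", "&", PHP_QUERY_RFC3986);
    return $s;
}

$hosts = buildConnectionString("mail.myisp.com", 587, [
    "timeout" => 10,
    "security" => "ssl",
    "user_authentication_method" => "PLAIN",
    "user_name" => "MyUserName",
    "user_password" => "p&ss=word", // special characters get escaped
]);
```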

Other parameters enable the SSL+STARTTLS protocol, set the client certificate, and set the server CA certificate to override the default OpenSSL configuration, for example to use a self-signed certificate. The following table summarizes all the parameters available to establish a connection with the Mailer class, the SMTP class, and the POP3 class:

Connection string parameters
Name - Description
timeout=30 - Timeout period in seconds used at several stages of the connection: establishing the TCP channel, establishing the SSL protocol, waiting for a reply from the server. An IOException is thrown if the timeout expires. Defaults to 30 seconds.
security=ssl - Enables the SSL/TLS protocol (ssl) or the SSL/TLS + SMTP STARTTLS protocol (tls). Note that the tls option does not mean that the TLS protocol will be used, as the specific version of the SSL/TLS protocol is always negotiated between the client and the server based on their respective configurations; instead, the tls option means that the connection is first established in clear text, and then the STARTTLS command is sent to negotiate the SSL/TLS authentication and channel encryption. The tls option is supported only by SMTP.
client_name=mypc.mydomain.com - Client host name for the "hello" SMTP announcement; defaults to "localhost.localdomain". This parameter is supported only by the SMTP protocol. It is normally ignored by the server, so the default fits most needs.
client_certificate_path=C:\mycert.crt - File path of the client certificate (that is, the client public key); it may also contain the client secret key; empty for no client authentication (default).
client_key_path=C:\mykey.crt - File path of the client secret key; empty if the key is already available in the certificate or for no client authentication at all.
client_key_passphrase=MySecretPassPhrase - Client key pass-phrase; empty for a plain text client secret key (default).
ca_certificate_path=C:\ca.crt - File path of the specific CA certificate that signed the server certificate. If empty, or if the certificate does not match the CA found in the server certificate, the default OpenSSL CA store is used -- see the openssl.cafile and openssl.capath directives of the php.ini for more. If the server uses a self-signed certificate, for example for testing, you may set its CA certificate here rather than pollute the OpenSSL configuration.
user_authentication_method=PLAIN - User's authentication method, one of "PLAIN", "LOGIN" (SMTP only), "CRAM-MD5"; empty for no user authentication (default). If the connection is encrypted with SSL, the LOGIN method fits perfectly; otherwise consider using CRAM-MD5 if supported by the server.
user_name=MyName - User's login name.
user_password=MyPass - User's login password.
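
To make the structure of the connection string more concrete, the following sketch splits it into hosts and parameters using only PHP built-ins. It is not the ConnectionParameters class: the function name parseConnectionString and the returned array layout are invented for this example, and only the two defaults explicitly stated in the table are applied. A semicolon inside a parameter value would confuse this simple splitter.

```php
<?php
/**
 * Splits a connection string into one entry per host (illustrative only).
 * @return array<int,array<string,mixed>> Entries with "host", "port", "params".
 */
function parseConnectionString(string $hosts, int $default_port = 25): array
{
    $result = [];
    foreach (explode(";", $hosts) as $entry) {
        $entry = trim($entry);
        if ($entry === "")
            continue;
        // Separate "host[:port]" from the optional "?name=value&..." tail.
        $parts = explode("?", $entry, 2);
        $params = [];
        if (isset($parts[1]))
            parse_str($parts[1], $params); // also decodes percent-escapes
        // Defaults stated in the table; given parameters take precedence.
        $params += ["timeout" => "30", "client_name" => "localhost.localdomain"];
        $host_port = explode(":", $parts[0], 2);
        $result[] = [
            "host" => $host_port[0],
            "port" => isset($host_port[1]) ? (int) $host_port[1] : $default_port,
            "params" => $params,
        ];
    }
    return $result;
}

$list = parseConnectionString(
    "mail.mycompany.com; mail.myisp.com:587?security=ssl&user_name=MyUserName");
```

Each entry of $list could then be tried in turn until one host accepts the connection, which is the behavior described for sendBySMTP().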

Adding attachments to the message

The addAttachmentFromFile() method adds a file as an attachment to the e-mail:

$m->addAttachmentFromFile("C:\\images\\photo.jpeg", "image/jpeg");

The character set for text files, the custom name for the attachment, and the preferred encoding method can also be specified using other optional arguments. Some very common MIME types are listed below:

Content            File name extension   MIME type
JavaScript         .js                   application/javascript
MS Word            .doc, .docx           application/msword
Adobe PDF          .pdf                  application/pdf
MS RTF             .rtf                  application/rtf
MS Excel           .xls                  application/vnd.ms-excel
ODT                .odt                  application/vnd.oasis.opendocument.text
RAR                .rar                  application/x-rar-compressed
TAR                .tar                  application/x-tar
XML                .xml                  application/xml
ZIP                .zip                  application/zip
MP3                .mp3                  audio/mpeg
BMP                .bmp                  image/bmp
GIF                .gif                  image/gif
JPEG               .jpg, .jpeg           image/jpeg
PNG                .png                  image/png
SVG                .svg, .svgz           image/svg+xml
E-mail             .eml                  message/rfc822 (1)
E-mail UTF-8       .u8msg                message/global (2)
HTML               .htm, .html           text/html
Text               .txt, .text           text/plain
PHP                .php                  text/x-php
Binary or unknown                        application/octet-stream
Notes:
1. An attached e-mail of type message/rfc822 must be encoded with the 8bit method. See the examples below about how to correctly attach an e-mail to your message.
2. For e-mails containing UTF-8 encoded addresses and fields, RFC 6532 proposes the new MIME type message/global and the new .u8msg file name extension; I have never seen it in use, but it is worth mentioning for the future. Thunderbird does not recognize this file name extension, but recognizes the message/global type and allows both 8bit and Base64 encodings.

Hint: the it\icosaedro\web\FileDownload::getTypeFromFilename() method could be used to guess the type from the bare file name extension of the file.
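
If you prefer not to depend on that class, a lookup of this kind is easy to sketch in plain PHP. The function guessMimeType below is a hypothetical helper, not the implementation of getTypeFromFilename(), and it covers only a few of the types listed in the table above:

```php
<?php
/**
 * Guesses the MIME type from the file name extension, falling back to
 * application/octet-stream. Hypothetical helper with a partial table.
 */
function guessMimeType(string $filename): string
{
    static $types = [
        "pdf"  => "application/pdf",
        "zip"  => "application/zip",
        "gif"  => "image/gif",
        "jpg"  => "image/jpeg",
        "jpeg" => "image/jpeg",
        "png"  => "image/png",
        "eml"  => "message/rfc822",
        "htm"  => "text/html",
        "html" => "text/html",
        "txt"  => "text/plain",
    ];
    // The extension is matched case-insensitively.
    $ext = strtolower(pathinfo($filename, PATHINFO_EXTENSION));
    return $types[$ext] ?? "application/octet-stream";
}

$type = guessMimeType("C:\\images\\photo.JPEG");
```

Keep in mind that the extension is only a hint: a file named photo.jpeg is not guaranteed to contain a JPEG image.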

Attachments can also be added from data in memory by using the addAttachmentFromString() method, or from a generic custom or parsed MIME part by using the addAttachmentFromPart() method. Note that you must always indicate the specific MIME type of the content.

Adding embedded images

An embedded image (or any other type of embedded content in a message) can be added using the addInlineFromFile() or addInlineFromString() methods. Each embedded content must be assigned a unique content identifier (CID), for example "image@1", "image@2", etc., so that the HTML text may refer to these images using URLs like "cid:image@1", "cid:image@2", etc. For example:

// The file path to the content to embed:
$path = "C:\\images\\latest photo.jpg";
// Type of the embedded content:
$type = "image/jpeg";
// ID we assign to the embedded content; must be unique within this msg:
$cid = "image@1";
// The HTML formatted message; note the URL to the embedded content:
$html = "<html><body>Look at this beautiful photo: <img src='cid:$cid'></body></html>";
$m = new Mailer();
$m->setSubject("Panorama");
$m->setFrom("my@mydomain.com", "My Name");
$m->addAddress("you@yourdomain.com", "Your Name");
$m->setHtmlMessage($html);
$m->addInlineFromFile($path, $type, NULL, $cid);
$m->sendByMail();

RFC 2392 states that the CID has the same syntax as an e-mail address. Mailer enforces this requirement by applying to the CID the same validation routine it applies to any other e-mail address. Note that a CID may therefore contain characters that need to be URL-encoded in the HTML code, just like any other URL. With simple CIDs like those in our examples, containing only letters, digits and the "@" sign, URL-encoding is not necessary.
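
For CIDs that do contain special characters, the URL could be built along these lines. This is a sketch with a hypothetical helper name, cidUrl; it percent-encodes the CID with PHP's rawurlencode() while keeping the "@" sign verbatim, then escapes the result for use inside an HTML attribute. Whether a given mail client decodes the escapes when matching the URL against the Content-ID is worth testing.

```php
<?php
/**
 * Returns the "cid:" URL for a content ID, ready to be placed inside an
 * HTML attribute value. Hypothetical helper, not a library method.
 */
function cidUrl(string $cid): string
{
    // Percent-encode everything outside the unreserved set, then restore
    // the "@" sign, which may appear verbatim in a cid: URL.
    $encoded = str_replace("%40", "@", rawurlencode($cid));
    // Finally escape for use inside an HTML attribute value.
    return htmlspecialchars("cid:" . $encoded, ENT_QUOTES);
}

$simple = cidUrl("image@1");
$html = "<img src='" . cidUrl("photo+1/a@example.com") . "'>";
```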

Attaching another e-mail

An e-mail has MIME type message/rfc822; the character set and name must be NULL, while the encoding method MUST be 8bit:

$m->addAttachmentFromFile("C:\\mail.eml", "message/rfc822",
	NULL, // charset -- not applicable
	NULL, // name -- optional file name
	Header::ENCODING_8BIT);

According to RFC 2046 par. 5.2.1, message/rfc822 is a special content type for which only the 7bit or 8bit encodings are allowed; Base64, which is the Mailer default encoding for attachments, is not allowed. E-mail clients that strictly enforce this restriction (Thunderbird up to 52.8.0; see bug #293475, closed as expired) may fail to decode attached e-mails if the Base64 encoding is used, and show an empty window or a misleading error message instead. So, the only generally safe choice left is the 8bit encoding, as suggested here.

Alternatively, you may use the newer message/global MIME type to explicitly inform the receiving program that the message may contain internationalized UTF-8 encoded e-mail addresses. Thunderbird recognizes this type and allows both the 8bit and Base64 encodings:

$m->addAttachmentFromFile("C:\\mail.eml", "message/global",
	NULL, // charset -- not applicable
	NULL, // name -- optional file name
	Header::ENCODING_BASE64);

Composing and sending custom messages

The Mailer::addAttachmentFromPart($part) method adds any type of content or multi-part content structure to the message; if that part is the sole part of the message, it becomes the whole content of the message.

Parsing e-mail files and mailboxes

Parsing an EML file

Before going into the details of how e-mails are parsed, it is important to describe how e-mails are represented in the program. Basically, an e-mail is a "part" represented by the MIMEAbstractPart class. A part contains a Header and a body.

The Header is a list of Field(s).

The content of the body depends on the actual class that implements this abstract class. The body can be binary data in memory (MIMEPartMemory) or a file (MIMEPartFile); the body can also be a list of parts (MIMEMultiPart), also named "multi-part part".

There are several types of multi-part parts:

The EmlParser class provides methods to parse an e-mail from a file, from a string in memory or from a generic stream of bytes InputStream. The result of the parsing is an object of type MIMEAbstractPart. For example:

use it\icosaedro\email\EmlParser;
use it\icosaedro\email\Field;
use it\icosaedro\email\Header;
use it\icosaedro\email\MIMEAbstractPart;
use it\icosaedro\email\MIMEPartMemory;
use it\icosaedro\email\MIMEMultiPart;

$email = EmlParser::parseFile("c:\\mail.eml");
echo "Subject: ", $email->header->getSubject(), "\n";
$from = $email->header->getFieldValue("From");
if( $from !== NULL ){
	$addresses = Field::parseAddresses($from);
	foreach($addresses as $a)
		echo "From: address=", $a[0], ", name=", $a[1], "\n";
}
$to = $email->header->getFieldValue("To");
if( $to !== NULL ){
	$addresses = Field::parseAddresses($to);
	foreach($addresses as $a)
		echo "To: address=", $a[0], ", name=", $a[1], "\n";
}
...

Any field of the e-mail can be retrieved by calling getFieldValue() on the header of the e-mail; this method returns the bare content of the field, which may need to be parsed by other specific methods.

So, the EmlParser class returns an abstract MIMEAbstractPart containing the structure and the data of the e-mail. The client program can retrieve from the header of that part the fields of the message it is interested in, like From, To, Cc, Subject. To further analyze the body, the client must detect the specific type of this abstract part and continue recursively into any sub-part. If we find a simple, non multi-part part, we have some type of content; otherwise, if we find a multi-part part, we must examine each nested sub-part recursively. The function below, for example, recursively searches for attachments:

/**
 * Displays all file names on a part and any sub-part, recursively.
 * @param MIMEAbstractPart $part
 * @return void
 */
function echoFileNames($part)
{
	if( $part instanceof MIMEMultiPart ){
		$multi = cast(MIMEMultiPart::class, $part);
		foreach($multi->parts as $sub_part)
			echoFileNames($sub_part); // recurse on each sub-part
	} else if( $part->header->isAttachment() ){
		$simple = cast(MIMEPartMemory::class, $part);
		$fn = $simple->header->getFilename();
		echo "File name: $fn\n";
		echo "Type: ", $simple->header->getType(), "\n";
		echo "Charset: ", $simple->header->getCharset(), "\n";
		echo "Content (hex): ", bin2hex($simple->content), "\n";
	}
}

...
echoFileNames($email);

Character set encoding issue. The parser assumes that the header is formatted as specified in RFC 5322, possibly with non-ASCII characters encoded as per RFC 2047. If verbatim non-ASCII characters are found in the header, they are assumed to be UTF-8 encoded as per RFC 6532, and this includes the e-mail addresses too; addresses that are not properly UTF-8 encoded or are syntactically invalid are rejected. All the strings returned from the parsed e-mail header are UTF-8 encoded.

Parsing the content of the e-mail

As we saw in the example above, the e-mail parser class returns an abstract MIMEAbstractPart object $email from which the header can be retrieved as Header object $header = $email->header.

From the header, the raw value of each field can be retrieved with the method $header->getFieldValue("FieldName"); this string may need to be decoded or parsed depending on the specific field. There is no general rule, as fields are defined in several RFC documents and each document defines the specific structure of its fields. Let's take for example this chunk of e-mail header and examine each field one by one:

From: Umberto Salsi <salsi@icosaedro.it>
Subject: =?UTF-8?Q?EmlParse_cl=c3=a0ss_t=c3=a8sting_sample_1?=
To: =?UTF-8?Q?R=c3=a8cipient1?= <recipient1@icosaedro.it>,
    =?UTF-8?Q?R=c3=a8cipient2?= <recipient2@icosaedro.it>
Cc: recipient3@icosaedro.it
Message-ID: <c2759229-76f6-e9ff-e3ed-550d50920e79@icosaedro.it>
Date: Tue, 3 Jul 2018 13:41:24 +0200
User-Agent: Mozilla/5.0 (Windows NT 6.1; rv:52.0) Gecko/20100101 Thunderbird/52.8.0
MIME-Version: 1.0
Content-Type: multipart/mixed;
 boundary="------------D6BA0485CB6618A2C6DF65E0"

Some fields do not require any decoding or parsing at all, as they are (or should be) bare ASCII strings. This is the case for the Message-ID and User-Agent fields.

Some fields require to be parsed from their ASCII string to retrieve a meaningful value. This is the case of the Date and Content-Type fields. The Header class provides methods to parse these specific fields:
getCharset() to retrieve the character set of the content from the Content-Type field, where available;
getDate() to retrieve the date as a DateTimeTZ object;
isAttachment() to detect if this part is an attachment;
getFileName() to retrieve the file name of an attachment;
getType() to retrieve the MIME type of this part;
...and several others.
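
To show what this kind of parsing involves, here is a sketch that handles the Content-Type and Date values of the sample header using only PHP built-ins. It is not the Header class: parseContentType is a hypothetical, simplified helper, and the date is returned as a standard DateTimeImmutable rather than as the library's DateTimeTZ.

```php
<?php
/**
 * Parses a Content-Type value into its type and parameters. Simplified:
 * it does not handle ";" inside quoted strings nor RFC 2231 parameters.
 * @return array<string,mixed> Keys "type" (string) and "params" (array).
 */
function parseContentType(string $value): array
{
    // Unfold continuation lines first.
    $value = preg_replace('/\r?\n[ \t]+/', " ", $value);
    $pieces = explode(";", $value);
    $type = strtolower(trim(array_shift($pieces)));
    $params = [];
    foreach ($pieces as $piece) {
        $pair = explode("=", $piece, 2);
        if (count($pair) === 2)
            // Parameter names are case-insensitive; strip optional quotes.
            $params[strtolower(trim($pair[0]))] = trim(trim($pair[1]), '"');
    }
    return ["type" => $type, "params" => $params];
}

$ct = parseContentType(
    "multipart/mixed;\r\n boundary=\"------------D6BA0485CB6618A2C6DF65E0\"");

// The Date field follows the RFC 2822 date format known to PHP.
$date = DateTimeImmutable::createFromFormat(
    DATE_RFC2822, "Tue, 3 Jul 2018 13:41:24 +0200");
```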

Some fields require specific parsing and decoding, like the Subject field and the To field. The $header->getSubject() method returns the decoded subject as a UTF-8 string. Fields like From, To and Cc may all contain a list of mailboxes, each one with its name and address; the Field::parseAddresses() method can be used to parse and decode their content.
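
The encoded form of the Subject in the sample header follows RFC 2047. The library decodes it for you, but the mechanism is simple enough to sketch with PHP built-ins. The function decodeEncodedWords below is a hypothetical helper written for this tutorial, not the library's decoder; for charsets other than UTF-8 it assumes the mbstring extension is available.

```php
<?php
/**
 * Decodes the RFC 2047 encoded-words found in a header value to UTF-8.
 * Illustrative sketch, not the library's implementation.
 */
function decodeEncodedWords(string $value): string
{
    // White space between two adjacent encoded-words is not significant.
    $value = preg_replace('/(\?=)\s+(=\?)/', '$1$2', $value);
    return preg_replace_callback(
        '/=\?([^?\s]+)\?([QqBb])\?([^?\s]*)\?=/',
        function (array $m): string {
            if (strtoupper($m[2]) === "B")
                $bytes = (string) base64_decode($m[3]);
            else
                // In the "Q" encoding "_" stands for a space.
                $bytes = quoted_printable_decode(str_replace("_", " ", $m[3]));
            if (strcasecmp($m[1], "UTF-8") === 0)
                return $bytes;
            // Other charsets need a conversion; mbstring is assumed here.
            return mb_convert_encoding($bytes, "UTF-8", $m[1]);
        },
        $value
    );
}

$subject = decodeEncodedWords(
    "=?UTF-8?Q?EmlParse_cl=c3=a0ss_t=c3=a8sting_sample_1?=");
$to = decodeEncodedWords("=?UTF-8?Q?R=c3=a8cipient1?= <recipient1@icosaedro.it>");
```

Here $subject becomes "EmlParse clàss tèsting sample 1". Note that an address list should be split into mailboxes before decoding the names, since a decoded name may itself contain commas or angle brackets.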

Evaluating a multi-part part

Once the header has been evaluated, the next step is to consider the body. There are two cases: the main part of the e-mail is either a multi-part MIMEMultiPart part, or a simple content part such as MIMEPartMemory. If it is multi-part, we must cast the e-mail object to its actual class, and then evaluate each sub-part recursively as we saw in the example above:

if( $email instanceof MIMEMultiPart ){
	$multi = cast(MIMEMultiPart::class, $email);
	foreach($multi->parts as $sub_part)
		echoFileNames($sub_part); // recurse on each sub-part
...

There are basically three types of multi-part:

So, to sum up, the readable part of the message is any text/plain part, HTML text part, or multipart/related part of the message; the remaining parts are either attachments or other resources we can normally ignore in a typical application.

Evaluating a simple, non-structured part

If the part is a simple content, we must cast this part to its actual class and then evaluate its content according to its MIME type and character set, possibly taking its file name into account:

} else {
	$simple = cast(MIMEPartMemory::class, $part);
	if( $simple->header->isAttachment() )
		echo "Found attachment.\n";
	echo "Type: ", $simple->header->getType(), "\n";
	$fn = $simple->header->getFilename();
	if( $fn !== NULL )
		echo "File name: $fn\n";
	$charset = $simple->header->getCharset();
	if( $charset !== NULL )
		echo "Charset: $charset\n";
	echo "Content (hex): ", bin2hex($simple->content), "\n";
}

Note that the content of the body is automatically decoded from the transport encoding, resulting in general in a binary string: possibly text in its original character set, but possibly also an image, a PDF file, or anything else. Evaluating generic content is a complex task that goes far beyond the scope of this tutorial.

Scanning a mailbox message by message

Do you need to scan the content of a Thunderbird mailbox, or of any other compatible mailbox in the "mboxrd" format? Here we explain how this can be done.

The EmlParser class provides the parse($in, NULL) method to parse an e-mail from a generic source of type ReaderInterface. The specialized methods to parse the mail from a file or from a string just feed that method with a specific implementation of the ReaderInterface that parses a file or a string. To scan all the e-mails of a mbox file we could first split the mbox into single messages and then parse each message in turn. But there is a simpler alternative.

The ReaderFromMbox class takes an mbox (actually, any source of bytes representing the content of an mbox) and provides the same ReaderInterface required by the EmlParser class, but with the important addition of the next() method. The next() method moves or skips to the next message; the parser can read from there up to the end of the message, where it gets a virtual "end of file" event. By invoking the next() method again we can move or skip to the next message and continue scanning the mailbox. Here is the skeleton of our mailbox scanner:

use it\icosaedro\email\EmlParser;
use it\icosaedro\email\ReaderFromMbox;
use it\icosaedro\io\FileInputStream;
use it\icosaedro\io\File;

// The path to our mailbox:
$mbox_path = "path/to/my/thunderbird/mailbox";

// Create an InputStream out of this file:
$mbox_is = new FileInputStream( File::fromLocaleEncoded($mbox_path) );

// Create the mailbox scanner that reads from that InputStream:
$mbox = new ReaderFromMbox( $mbox_is );

// Loop over each e-mail feeding the e-mail parser:
while( $mbox->next() ){
	// Parse the next e-mail as usual with the EmlParser class:
	$email = EmlParser::parse($mbox, NULL);
	echo "Subject: ", $email->header->getSubject(), "\n";
	echo "Size: ", $email->getContentSize(), "\n";
	// ...and so on.
}
$mbox->close();
$mbox_is->close();
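
For comparison, the alternative mentioned earlier (first splitting the mbox into single messages) can be sketched in plain PHP as follows. It loads the whole mailbox in memory, so it suits only small files. The function splitMboxrd is a hypothetical helper, not a library function, and it assumes the usual "mboxrd" conventions: each message starts at a line beginning with "From ", and body lines that would look like that separator are quoted with one extra leading ">".

```php
<?php
/**
 * Splits the content of an "mboxrd" mailbox into raw messages.
 * Illustrative only: the whole mailbox must fit in memory.
 * @return string[] Raw messages, without the "From " separator lines.
 */
function splitMboxrd(string $mbox): array
{
    $messages = [];
    $current = null; // lines of the message being collected
    foreach (preg_split('/\r\n|\n/', $mbox) as $line) {
        if (strncmp($line, "From ", 5) === 0) {
            // Separator line: close the previous message, start a new one.
            if ($current !== null)
                $messages[] = $current;
            $current = [];
            continue;
        }
        if ($current === null)
            continue; // ignore anything before the first separator
        // Undo the quoting: ">From " becomes "From ", ">>From " ">From ".
        if (preg_match('/^>+From /', $line))
            $line = substr($line, 1);
        $current[] = $line;
    }
    if ($current !== null)
        $messages[] = $current;
    return array_map(function (array $lines): string {
        return implode("\n", $lines);
    }, $messages);
}

$msgs = splitMboxrd(
    "From a@x Tue Jul  3 13:41:24 2018\nSubject: one\n\n>From the start\n\n"
    . "From b@x Tue Jul  3 13:42:00 2018\nSubject: two\n\nbody\n");
```

Each string in $msgs could then be parsed on its own; the streaming approach shown above avoids both the extra pass and the memory cost.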

Faster scanning can be performed by invoking EmlParser::parseHeader() rather than EmlParser::parse(); in many cases the header alone provides all the information we need. We may also invoke the EmlParser::parseBody() method next, once the header has been evaluated.

Umberto Salsi