< draft-ietf-822ext-mime-imt   rfc2046.txt 
Compared with Internet Draft draft-ietf-822ext-mime-imt-04.txt
(Nathaniel Borenstein, Ned Freed; March 1996)

Network Working Group                                          N. Freed
Request for Comments: 2046                                     Innosoft
Obsoletes: 1521, 1522, 1590                               N. Borenstein
Category: Standards Track                                 First Virtual
                                                          November 1996

Multipurpose Internet Mail Extensions
(MIME) Part Two:
Media Types

Status of this Memo

This document specifies an Internet standards track protocol for the
Internet community, and requests discussion and suggestions for
improvements. Please refer to the current edition of the "Internet
Official Protocol Standards" (STD 1) for the standardization state
and status of this protocol. Distribution of this memo is unlimited.

Abstract

STD 11, RFC 822 defines a message representation protocol specifying
considerable detail about US-ASCII message headers, but leaves the
message content, or message body, as flat US-ASCII text. This set of
documents, collectively called the Multipurpose Internet Mail
Extensions, or MIME, redefines the format of messages to allow for

(1) textual message bodies in character sets other than
    US-ASCII,

(2) an extensible set of different formats for non-textual
    message bodies,

(3) multi-part message bodies, and

(4) textual header information in character sets other than
    US-ASCII.

These documents are based on earlier work documented in RFC 934, STD
11, and RFC 1049, but extend and revise them. Because RFC 822 said
so little about message bodies, these documents are largely
orthogonal to (rather than a revision of) RFC 822.

The initial document in this set, RFC 2045, specifies the various
headers used to describe the structure of MIME messages. This second
document defines the general structure of the MIME media typing
system and defines an initial set of media types. The third document,
RFC 2047, describes extensions to RFC 822 to allow non-US-ASCII text
data in Internet mail header fields. The fourth document, RFC 2048,
specifies various IANA registration procedures for MIME-related
facilities. The fifth and final document, RFC 2049, describes MIME
conformance criteria as well as providing some illustrative examples
of MIME message formats, acknowledgements, and the bibliography.

These documents are revisions of RFCs 1521 and 1522, which themselves
were revisions of RFCs 1341 and 1342. An appendix in RFC 2049
describes differences and changes from previous versions.

Table of Contents

1. Introduction
2. Definition of a Top-Level Media Type
3. Overview Of The Initial Top-Level Media Types
4. Discrete Media Type Values
4.1 Text Media Type
4.1.1 Representation of Line Breaks
4.1.2 Charset Parameter
4.1.3 Plain Subtype
4.1.4 Unrecognized Subtypes
4.2 Image Media Type
4.3 Audio Media Type
4.4 Video Media Type
4.5 Application Media Type
4.5.1 Octet-Stream Subtype
4.5.2 PostScript Subtype
4.5.3 Other Application Subtypes
5. Composite Media Type Values
5.1 Multipart Media Type
5.1.1 Common Syntax
5.1.2 Handling Nested Messages and Multiparts
5.1.3 Mixed Subtype
5.1.4 Alternative Subtype
5.1.5 Digest Subtype
5.1.6 Parallel Subtype
5.1.7 Other Multipart Subtypes
5.2 Message Media Type
5.2.1 RFC822 Subtype
5.2.2 Partial Subtype
5.2.2.1 Message Fragmentation and Reassembly
5.2.2.2 Fragmentation and Reassembly Example
5.2.3 External-Body Subtype
5.2.4 Other Message Subtypes
6. Experimental Media Type Values
7. Summary
8. Security Considerations
9. Authors' Addresses
A. Collected Grammar

1. Introduction

The first document in this set, RFC 2045, defines a number of header
fields, including Content-Type. The Content-Type field is used to
specify the nature of the data in the body of a MIME entity, by
giving media type and subtype identifiers, and by providing auxiliary
information that may be required for certain media types. After the
type and subtype names, the remainder of the header field is simply a
set of parameters, specified in an attribute/value notation. The
ordering of parameters is not significant.

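As a rough illustration of the structure just described, the following
Python sketch splits a Content-Type value into its type, subtype, and
parameters using the standard library's email.message.Message class.
The helper name parse_content_type and the parameters "a" and "b" are
made up for this example and are not defined by MIME. Because parameter
order is not significant, the two values parsed at the end are
expected to compare equal.

```python
from email.message import Message


def parse_content_type(value):
    """Split a Content-Type value into (type, subtype, parameters)."""
    msg = Message()
    msg["Content-Type"] = value
    # get_params() returns (name, value) pairs; the first pair carries the
    # "type/subtype" token itself, so it is skipped here.
    pairs = msg.get_params()[1:]
    # Names are lowercased for lookup, and a dict is used because the
    # ordering of parameters carries no meaning.
    params = {name.lower(): val for name, val in pairs}
    return msg.get_content_maintype(), msg.get_content_subtype(), params


# "a" and "b" are hypothetical parameters used only to show that the
# order in which parameters appear does not change the parsed result.
first = parse_content_type("image/xyz; a=1; b=2")
second = parse_content_type("image/xyz; b=2; a=1")
```

A dictionary is one convenient representation for an unordered set of
attribute/value pairs; other representations would serve equally well.
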
In general, the top-level media type is used to declare the general
type of data, while the subtype specifies a specific format for that
type of data. Thus, a media type of "image/xyz" is enough to tell a
user agent that the data is an image, even if the user agent has no
knowledge of the specific image format "xyz". Such information can
be used, for example, to decide whether or not to show a user the raw
data from an unrecognized subtype -- such an action might be
reasonable for unrecognized subtypes of "text", but not for
unrecognized subtypes of "image" or "audio". For this reason,
registered subtypes of "text", "image", "audio", and "video" should
not contain embedded information that is really of a different type.
Such compound formats should be represented using the "multipart" or
"application" types.

Parameters are modifiers of the media subtype, and as such do not
fundamentally affect the nature of the content. The set of
meaningful parameters depends on the media type and subtype. Most
parameters are associated with a single specific subtype. However, a
given top-level media type may define parameters which are applicable
to any subtype of that type. Parameters may be required by their
defining media type or subtype or they may be optional. MIME
implementations must also ignore any parameters whose names they do
not recognize.

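One way to picture the rule about unrecognized parameters is the
Python sketch below. It assumes the parameters have already been parsed
into a dictionary. The table KNOWN_PARAMETERS, the helper name
recognized_parameters, and the parameter "x-unknown" are all
hypothetical and chosen only for illustration; a real implementation
would derive its table from the media type definitions it supports.

```python
# Hypothetical table of the parameters this receiver understands, keyed
# by media type. The single entry is illustrative, not exhaustive.
KNOWN_PARAMETERS = {
    "text/plain": {"charset"},
}


def recognized_parameters(media_type, params):
    """Keep the parameters this receiver understands and drop the rest."""
    known = KNOWN_PARAMETERS.get(media_type.lower(), set())
    return {
        name.lower(): value
        for name, value in params.items()
        if name.lower() in known
    }


# The unknown parameter is silently ignored instead of causing an error.
kept = recognized_parameters(
    "text/plain", {"charset": "iso-8859-1", "x-unknown": "1"}
)
```

Dropping unknown names silently, rather than rejecting the entity, is
what lets new parameters be introduced without breaking older
receivers.
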
MIME's Content-Type header field and media type mechanism have been
carefully designed to be extensible, and it is expected that the set
of media type/subtype pairs and their associated parameters will grow
significantly over time. Several other MIME facilities, such as
transfer encodings and "message/external-body" access types, are
likely to have new values defined over time. In order to ensure that
the set of such values is developed in an orderly, well-specified,
and public manner, MIME sets up a registration process which uses the
Internet Assigned Numbers Authority (IANA) as a central registry for
MIME's various areas of extensibility. The registration process for
these areas is described in a companion document, RFC 2048.

The initial seven standard top-level media types are defined and
described in the remainder of this document.

2. Definition of a Top-Level Media Type

The definition of a top-level media type consists of:

(1) a name and a description of the type, including
    criteria for whether a particular type would qualify
    under that type,

(2) the names and definitions of parameters, if any, which
    are defined for all subtypes of that type (including
    whether such parameters are required or optional),

(3) how a user agent and/or gateway should handle unknown
    subtypes of this type,

(4) general considerations on gatewaying entities of this
    top-level type, if any, and

(5) any restrictions on content-transfer-encodings for
    entities of this top-level type.

3. Overview Of The Initial Top-Level Media Types

The five discrete top-level media types are:

(1) text -- textual information. The subtype "plain" in
    particular indicates plain text containing no
    formatting commands or directives of any sort. Plain
    text is intended to be displayed "as-is". No special
    software is required to get the full meaning of the
    text, aside from support for the indicated character
    set. Other subtypes are to be used for enriched text in
    forms where application software may enhance the
    appearance of the text, but such software must not be
    required in order to get the general idea of the
    content. Possible subtypes of "text" thus include any
    word processor format that can be read without
    resorting to software that understands the format. In
    particular, formats that employ embedded binary
    formatting information are not considered directly
    readable. A very simple and portable subtype,
    "richtext", was defined in RFC 1341, with a further
    revision in RFC 1896 under the name "enriched".

(2) image -- image data. "Image" requires a display device
    (such as a graphical display, a graphics printer, or a
    FAX machine) to view the information. An initial
    subtype is defined for the widely-used image format
    JPEG.

(3) audio -- audio data. "Audio" requires an audio output
    device (such as a speaker or a telephone) to "display"
    the contents. An initial subtype "basic" is defined in
    this document.

(4) video -- video data. "Video" requires the capability
    to display moving images, typically including
    specialized hardware and software. An initial subtype
    "mpeg" is defined in this document.

(5) application -- some other kind of data, typically
    either uninterpreted binary data or information to be
    processed by an application. The subtype "octet-stream"
    is to be used in the case of uninterpreted
    binary data, in which case the simplest recommended
    action is to offer to write the information into a file
    for the user. The "PostScript" subtype is also defined
    for the transport of PostScript material. Other
    expected uses for "application" include spreadsheets,
    data for mail-based scheduling systems, languages
    for "active" (computational) messaging, and word
    processing formats that are not directly readable.
    Note that security considerations may exist for some
    types of application data, most notably
    "application/PostScript" and any form of active
    messaging. These issues are discussed later in this
    document.

The two composite top-level media types are:

(1) multipart -- data consisting of multiple entities of
    independent data types. Four subtypes are initially
    defined, including the basic "mixed" subtype specifying
    a generic mixed set of parts, "alternative" for
    representing the same data in multiple formats,
    "parallel" for parts intended to be viewed
    simultaneously, and "digest" for multipart entities in
    which each part has a default type of "message/rfc822".

(2) message -- an encapsulated message. A body of media
    type "message" is itself all or a portion of some kind
    of message object. Such objects may or may not in turn
    contain other entities. The "rfc822" subtype is used
    when the encapsulated content is itself an RFC 822
    message. The "partial" subtype is defined for partial
    RFC 822 messages, to permit the fragmented transmission
    of bodies that are thought to be too large to be passed
    through transport facilities in one piece. Another
    subtype, "external-body", is defined for specifying
    large bodies by reference to an external data source.

It should be noted that the list of media type values given here may
be augmented in time, via the mechanisms described above, and that
the set of subtypes is expected to grow substantially.

4. Discrete Media Type Values

Five of the seven initial media type values refer to discrete bodies.
The content of these types must be handled by non-MIME mechanisms;
they are opaque to MIME processors.

4.1. Text Media Type

The "text" media type is intended for sending material which is
principally textual in form. A "charset" parameter may be used to
indicate the character set of the body text for "text" subtypes,
notably including the subtype "text/plain", which is a generic
subtype for plain text. Plain text does not provide for or allow
formatting commands, font attribute specifications, processing
instructions, interpretation directives, or content markup. Plain
text is seen simply as a linear sequence of characters, possibly
interrupted by line breaks or page breaks. Plain text may allow the
stacking of several characters in the same position in the text.
Plain text in scripts like Arabic and Hebrew may also include
facilities that allow the arbitrary mixing of text segments with
opposite writing directions.

Beyond plain text, there are many formats for representing what might
be known as "rich text". An interesting characteristic of many such
representations is that they are to some extent readable even without
the software that interprets them. It is useful, then, to
distinguish them, at the highest level, from such unreadable data as
images, audio, or text represented in an unreadable form. In the
absence of appropriate interpretation software, it is reasonable to
show subtypes of "text" to the user, while it is not reasonable to do
so with most nontextual data. Such formatted textual data should be
represented using subtypes of "text".

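The display rule in the paragraph above can be sketched as a small
decision function. This is only an illustration under stated
assumptions: the function name and the two result labels, "show-raw"
and "withhold", are invented here, and a real user agent would have
more options than these two.

```python
def fallback_for_unrecognized_subtype(content_type):
    """Choose a fallback from the top-level type alone.

    The result labels are hypothetical names used for this sketch.
    """
    top_level = content_type.split("/", 1)[0].strip().lower()
    if top_level == "text":
        # Unrecognized "text" subtypes are still largely readable as-is.
        return "show-raw"
    # Raw image, audio, and similar data is not useful to display.
    return "withhold"


decision_for_text = fallback_for_unrecognized_subtype("text/xyz")
decision_for_image = fallback_for_unrecognized_subtype("image/xyz")
```

The point of the sketch is that the decision needs only the top-level
type; the subtype "xyz" is never inspected.
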
4.1.1. Representation of Line Breaks

The canonical form of any MIME "text" subtype MUST always represent a
line break as a CRLF sequence. Similarly, any occurrence of CRLF in
MIME "text" MUST represent a line break. Use of CR and LF outside of
line break sequences is also forbidden.

This rule applies regardless of format or character set or sets
involved.

NOTE: The proper interpretation of line breaks when a body is
displayed depends on the media type. In particular, while it is
appropriate to treat a line break as a transition to a new line when
displaying a "text/plain" body, this treatment is actually incorrect
for other subtypes of "text" like "text/enriched" [RFC-1896].
Similarly, whether or not line breaks should be added during display
operations is also a function of the media type. It should not be
necessary to add any line breaks to display "text/plain" correctly,
whereas proper display of "text/enriched" requires the appropriate
addition of line breaks.

NOTE: Some protocols define a maximum line length. For example, SMTP
[RFC-821] allows a maximum of 998 octets before the next CRLF
sequence. To be transported by such protocols, data which includes
overly long segments without CRLF sequences must be encoded with a
suitable content-transfer-encoding.

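The line break rule and the line length note can be illustrated with
the short Python sketch below. The function names are invented for
this example. The conversion step assumes that the local text uses a
bare LF, a bare CR, or CRLF as its line ending, which is an assumption
about the sending system and not something MIME specifies. The 998
octet figure is the SMTP limit quoted in the note above.

```python
import re

MAX_LINE_OCTETS = 998  # SMTP limit quoted in the note above


def to_canonical_text(text):
    """Rewrite LF, CR, or CRLF line endings as CRLF.

    Assumes every CR or LF in the local form marks a line break.
    """
    return re.sub(r"\r\n|\r|\n", "\r\n", text)


def has_only_crlf_breaks(text):
    """Return True if CR and LF never appear outside a CRLF pair."""
    return re.search(r"\r(?!\n)|(?<!\r)\n", text) is None


def exceeds_line_limit(body):
    """Return True if any CRLF-delimited line of a bytes body is too long."""
    return any(len(line) > MAX_LINE_OCTETS for line in body.split(b"\r\n"))


canonical = to_canonical_text("first\nsecond\r\nthird\rfourth")
```

A body for which exceeds_line_limit() is true would need a
content-transfer-encoding before being sent over such a transport.
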
4.1.2. Charset Parameter

A critical parameter that may be specified in the Content-Type field
for "text/plain" data is the character set. This is specified with a
"charset" parameter, as in:

    Content-type: text/plain; charset=iso-8859-1

Unlike some other parameter values, the values of the charset
parameter are NOT case sensitive. The default character set, which
must be assumed in the absence of a charset parameter, is US-ASCII.

The specification for any future subtypes of "text" must specify
whether or not they will also utilize a "charset" parameter, and may
possibly restrict its values as well. For subtypes of "text" other
than "text/plain", the semantics of the "charset" parameter should be
defined to be identical to those specified here for "text/plain",
i.e., the body consists entirely of characters in the given charset.
In particular, definers of future "text" subtypes should pay close
attention to the implications of multioctet character sets for their
subtype definitions.

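The case and default rules for the charset parameter can be sketched
in Python with the standard library's email.message.Message class. The
helper name effective_charset is made up for this example. Returning
the name in lower case is a normalization choice of this sketch, which
is harmless because charset values are not case sensitive.

```python
from email.message import Message

DEFAULT_CHARSET = "us-ascii"  # assumed when no charset parameter is given


def effective_charset(content_type_value):
    """Return the charset to assume for a Content-Type value."""
    msg = Message()
    msg["Content-Type"] = content_type_value
    # get_content_charset() lowercases the value and returns the given
    # fallback when the header has no charset parameter.
    return msg.get_content_charset(DEFAULT_CHARSET)


explicit = effective_charset("text/plain; charset=ISO-8859-1")
implicit = effective_charset("text/plain")
```

Comparing charset names only after such a normalization avoids
treating "ISO-8859-1" and "iso-8859-1" as different character sets.
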
The charset parameter for subtypes of "text" gives a name of a
character set, as "character set" is defined in RFC 2045. The rules
regarding line breaks detailed in the previous section must also be
observed -- a character set whose definition does not conform to
these rules cannot be used in a MIME "text" subtype.

An initial list of predefined character set names can be found at the
end of this section. Additional character sets may be registered
with IANA.

Media types other than subtypes of "text" might choose to employ the
charset parameter as defined here, but with the CRLF/line break
restriction removed. Therefore, all character sets that conform to
the general definition of "character set" in RFC 2045 can be
registered for MIME use.

Note that if the specified character set includes 8-bit characters
and such characters are used in the body, a Content-Transfer-Encoding
header field and a corresponding encoding on the data are required in
order to transmit the body via some mail transfer protocols, such as
SMTP [RFC-821].

Note that if the specified character set includes 8bit data, a The default character set, US-ASCII, has been the subject of some
Content-Transfer-Encoding header field and a corresponding confusion and ambiguity in the past. Not only were there some
encoding on the data are required in order to transmit the ambiguities in the definition, there have been wide variations in
body via some mail transfer protocols, such as SMTP [RFC-821]. practice. In order to eliminate such ambiguity and variations in the
future, it is strongly recommended that new user agents explicitly
specify a character set as a media type parameter in the Content-Type
header field. "US-ASCII" does not indicate an arbitrary 7-bit
character set, but specifies that all octets in the body must be
interpreted as characters according to the US-ASCII character set.
National and application-oriented versions of ISO 646 [ISO-646] are
usually NOT identical to US-ASCII, and in that case their use in
Internet mail is explicitly discouraged. The omission of the ISO 646
character set from this document is deliberate in this regard. The
character set name of "US-ASCII" explicitly refers to the character
set defined in ANSI X3.4-1986 [US-ASCII]. The new international
reference version (IRV) of the 1991 edition of ISO 646 is identical
to US-ASCII. The character set name "ASCII" is reserved and must not
be used for any purpose.
The default character set, US-ASCII, has been the subject of NOTE: RFC 821 explicitly specifies "ASCII", and references an earlier
some confusion and ambiguity in the past. Not only were there version of the American Standard. Insofar as one of the purposes of
some ambiguities in the definition, there have been wide specifying a media type and character set is to permit the receiver
variations in practice. In order to eliminate such ambiguity to unambiguously determine how the sender intended the coded message
and variations in the future, it is strongly recommended that to be interpreted, assuming anything other than "strict ASCII" as the
new user agents explicitly specify a character set as a media default would risk unintentional and incompatible changes to the
type parameter in the Content-Type header field. "US-ASCII" semantics of messages now being transmitted. This also implies that
does not indicate an arbitrary 7-bit character code, but messages containing characters coded according to other versions of
specifies that the body uses character coding that uses the ISO 646 than US-ASCII and the 1991 IRV, or using code-switching
exact correspondence of octets to characters specified in US- procedures (e.g., those of ISO 2022), as well as 8bit or multiple
ASCII. National use variations of ISO 646 [ISO-646] are NOT octet character encodings MUST use an appropriate character set
US-ASCII and their use in Internet mail is explicitly specification to be consistent with MIME.
discouraged. The omission of the ISO 646 character set from
this document is deliberate in this regard. The character set
name of "US-ASCII" explicitly refers to ANSI X3.4-1986 [US-
ASCII] only. The character set name "ASCII" is reserved and
must not be used for any purpose.
NOTE: RFC 821 explicitly specifies "ASCII", and references an The complete US-ASCII character set is listed in ANSI X3.4-1986.
earlier version of the American Standard. Insofar as one of Note that the control characters including DEL (0-31, 127) have no
the purposes of specifying a media type and character set is defined meaning apart from the combination CRLF (US-ASCII values
to permit the receiver to unambiguously determine how the 13 and 10) indicating a new line. Two of the characters have de
sender intended the coded message to be interpreted, assuming facto meanings in wide use: FF (12) often means "start subsequent
anything other than "strict ASCII" as the default would risk text on the beginning of a new page"; and TAB or HT (9) often (though
unintentional and incompatible changes to the semantics of not always) means "move the cursor to the next available column after
messages now being transmitted. This also implies that the current position where the column number is a multiple of 8
messages containing characters coded according to national (counting the first column as column 0)." Aside from these
variations on ISO 646, or using code-switching procedures conventions, any use of the control characters or DEL in a body must
(e.g., those of ISO 2022), as well as 8bit or multiple octet either occur
character encodings MUST use an appropriate character set
specification to be consistent with this specification.
The complete US-ASCII character set is listed in ANSI X3.4- (1) because a subtype of text other than "plain"
1986. Note that the control characters including DEL (0-31, specifically assigns some additional meaning, or
127) have no defined meaning apart from the combination CRLF
(US-ASCII values 13 and 10) indicating a new line. Two of the
characters have de facto meanings in wide use: FF (12) often
means "start subsequent text on the beginning of a new page";
and TAB or HT (9) often (though not always) means "move the
cursor to the next available column after the current position
where the column number is a multiple of 8 (counting the first
column as column 0)." Aside from these conventions, any use
of the control characters or DEL in a body must occur within
the context of a private agreement between the sender and
recipient. Such private agreements are discouraged and should
be replaced by the other capabilities of this document.
NOTE: Beyond US-ASCII, an enormous proliferation of character (2) within the context of a private agreement between the
sets is possible. It is the opinion of the IETF working group sender and recipient. Such private agreements are
that a large number of character sets is NOT a good thing. We discouraged and should be replaced by the other
would prefer to specify a SINGLE character set that can be capabilities of this document.
used universally for representing all of the world's languages
in Internet mail. Unfortunately, existing practice in several
communities seems to point to the continued use of multiple
character sets in the near future. For this reason, we define
names for a small number of character sets for which a strong
constituent base exists.
The defined charset values are: NOTE: An enormous proliferation of character sets exists beyond
US-ASCII. A large number of partially or totally overlapping character
sets is NOT a good thing. A SINGLE character set that can be used
universally for representing all of the world's languages in Internet
mail would be preferable. Unfortunately, existing practice in
several communities seems to point to the continued use of multiple
character sets in the near future. A small number of standard
character sets are, therefore, defined for Internet use in this
document.
(1) US-ASCII -- as defined in ANSI X3.4-1986 [US-ASCII]. The defined charset values are:
(2) ISO-8859-X -- where "X" is to be replaced, as (1) US-ASCII -- as defined in ANSI X3.4-1986 [US-ASCII].
necessary, for the parts of ISO-8859 [ISO-8859]. Note
that the ISO 646 character sets have deliberately been
omitted in favor of their 8859 replacements, which are
the designated character sets for Internet mail. As of
the publication of this document, the legitimate values
for "X" are the digits 1 through 9.
All of these character sets are used as pure 7bit or 8bit sets (2) ISO-8859-X -- where "X" is to be replaced, as
without any shift or escape functions. The meaning of shift necessary, for the parts of ISO-8859 [ISO-8859]. Note
and escape sequences in these character sets is not defined. that the ISO 646 character sets have deliberately been
omitted in favor of their 8859 replacements, which are
the designated character sets for Internet mail. As of
the publication of this document, the legitimate values
for "X" are the digits 1 through 10.
The character sets specified above are the ones that were Characters in the range 128-159 have no assigned meaning in
relatively uncontroversial during the drafting of MIME. This ISO-8859-X. Characters with values below 128 in ISO-8859-X have the same
document does not endorse the use of any particular character assigned meaning as they do in US-ASCII.
set other than US-ASCII, and recognizes that the future
evolution of world character sets remains unclear. It is
expected that in the future, additional character sets will be
registered for use in MIME.
Note that the character set used, if anything other than Part 6 of ISO 8859 (Latin/Arabic alphabet) and part 8 (Latin/Hebrew
US-ASCII, must always be explicitly specified in the Content-Type alphabet) include both characters for which the normal writing
field. direction is right to left and characters for which it is left to
right, but do not define a canonical ordering method for representing
bi-directional text. The charset values "ISO-8859-6" and "ISO-8859-
8", however, specify that the visual method is used [RFC-1556].
No other character set name may be used in Internet mail All of these character sets are used as pure 7bit or 8bit sets
without the publication of a formal specification and its without any shift or escape functions. The meaning of shift and
registration with IANA, or by private agreement, in which case escape sequences in these character sets is not defined.
the character set name must begin with "X-".
Implementors are discouraged from defining new character sets The character sets specified above are the ones that were relatively
unless absolutely necessary. uncontroversial during the drafting of MIME. This document does not
endorse the use of any particular character set other than US-ASCII,
and recognizes that the future evolution of world character sets
remains unclear.
The "charset" parameter has been defined primarily for the Note that the character set used, if anything other than US-ASCII,
purpose of textual data, and is described in this section for must always be explicitly specified in the Content-Type field.
that reason. However, it is conceivable that non-textual data
might also wish to specify a charset value for some purpose,
in which case the same syntax and values should be used.
In general, composition software should always use the "lowest No character set name other than those defined above may be used in
common denominator" character set possible. For example, if a Internet mail without the publication of a formal specification and
body contains only US-ASCII characters, it SHOULD be marked as its registration with IANA, or by private agreement, in which case
being in the US-ASCII character set, not ISO-8859-1, which, the character set name must begin with "X-".
like all the ISO-8859 family of character sets, is a superset
of US-ASCII. More generally, if a widely-used character set
is a subset of another character set, and a body contains only
characters in the widely-used subset, it should be labelled as
being in that subset. This will increase the chances that the
recipient will be able to view the resulting entity correctly.
6.1.3. Plain Subtype Implementors are discouraged from defining new character sets unless
absolutely necessary.
The simplest and most important subtype of text is "plain". The "charset" parameter has been defined primarily for the purpose of
This indicates plain text that does not contain any formatting textual data, and is described in this section for that reason.
commands or directives. Plain text is intended to be displayed However, it is conceivable that non-textual data might also wish to
"as-is", that is, no formatting operations of any sort other specify a charset value for some purpose, in which case the same
than support for the indicated character set should be syntax and values should be used.
necessary for proper display. The default media type of
"text/plain; charset=us-ascii" for Internet mail describes
existing Internet practice. That is, it is the type of body
defined by RFC 822.
No other text subtype is defined by this document. In general, composition software should always use the "lowest common
denominator" character set possible. For example, if a body contains
only US-ASCII characters, it SHOULD be marked as being in the US-
ASCII character set, not ISO-8859-1, which, like all the ISO-8859
family of character sets, is a superset of US-ASCII. More generally,
if a widely-used character set is a subset of another character set,
and a body contains only characters in the widely-used subset, it
should be labelled as being in that subset. This will increase the
chances that the recipient will be able to view the resulting entity
correctly.
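
The "lowest common denominator" recommendation can be sketched as a search through candidate character sets, ordered from most to least widely supported, that stops at the first one able to represent the whole body. The function name `minimal_charset` and the default candidate list are assumptions for illustration. The sketch relies on Python's built-in codecs, whose ISO-8859-1 codec also accepts the values 128-159 that have no assigned meaning in ISO-8859-X.

```python
def minimal_charset(text, candidates=("us-ascii", "iso-8859-1")):
    """Return the first candidate charset that can encode `text`.

    `candidates` should be ordered from the most widely supported
    subset to the least. Returns None if no candidate fits.
    """
    for name in candidates:
        try:
            text.encode(name)
        except UnicodeEncodeError:
            # At least one character is outside this charset;
            # try the next (larger) one.
            continue
        return name
    return None
```

A body containing only US-ASCII characters is therefore labelled "us-ascii" even though "iso-8859-1" could also represent it.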
6.1.4. Unrecognized Subtypes 4.1.3. Plain Subtype
Unrecognized subtypes of text should be treated as subtype The simplest and most important subtype of "text" is "plain". This
"plain" as long as the MIME implementation knows how to handle indicates plain text that does not contain any formatting commands or
the charset. Unrecognized subtypes which also specify an directives. Plain text is intended to be displayed "as-is", that is,
unrecognized charset should be treated as "application/octet- no interpretation of embedded formatting commands, font attribute
stream". specifications, processing instructions, interpretation directives,
or content markup should be necessary for proper display. The
default media type of "text/plain; charset=us-ascii" for Internet
mail describes existing Internet practice. That is, it is the type
of body defined by RFC 822.
6.2. Image Media Type No other "text" subtype is defined by this document.
A media type of "image" indicates that the body contains an 4.1.4. Unrecognized Subtypes
image. The subtype names the specific image format. These
names are not case sensitive. An initial subtype is "jpeg" for
the JPEG format using JFIF encoding [JPEG].
The list of image subtypes given here is neither exclusive nor Unrecognized subtypes of "text" should be treated as subtype "plain"
exhaustive, and is expected to grow as more types are as long as the MIME implementation knows how to handle the charset.
registered with IANA, as described in RFC MIME-REG. Unrecognized subtypes which also specify an unrecognized charset
should be treated as "application/octet-stream".
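
The fallback rule for unrecognized "text" subtypes can be sketched as follows. The names `resolve_text_type`, `KNOWN_TEXT_SUBTYPES`, and `KNOWN_CHARSETS` are hypothetical. The charset list assumes the values named in this section, using the upper bound of 10 for ISO-8859-X from the right-hand (RFC) column; the draft column gives 9. How a recognized subtype with an unrecognized charset is handled is not stated above, so the sketch simply returns the subtype unchanged in that case.

```python
# Assumed sets for illustration; a real implementation would list
# whatever subtypes and charsets it actually supports.
KNOWN_TEXT_SUBTYPES = {"plain"}
KNOWN_CHARSETS = {"us-ascii"} | {f"iso-8859-{n}" for n in range(1, 11)}


def resolve_text_type(subtype, charset="us-ascii"):
    """Return the media type to use when handling a "text" entity."""
    subtype = subtype.lower()
    charset = charset.lower()
    if subtype in KNOWN_TEXT_SUBTYPES:
        return f"text/{subtype}"
    if charset in KNOWN_CHARSETS:
        # Unrecognized subtype, recognized charset: treat as plain.
        return "text/plain"
    # Unrecognized subtype and unrecognized charset.
    return "application/octet-stream"
```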
Unrecognized subtypes of image should at a minimum be treated 4.2. Image Media Type
as "application/octet-stream". Implementations may optionally
elect to pass subtypes of image that they do not specifically
recognize to a secure and robust general-purpose image viewing
application, if such an application is available.
NOTE: Use of a general-purpose image viewing application A media type of "image" indicates that the body contains an image.
this way inherits the security problems of the most dangerous The subtype names the specific image format. These names are not
type supported by the application. case sensitive. An initial subtype is "jpeg" for the JPEG format
using JFIF encoding [JPEG].
6.3. Audio Media Type The list of "image" subtypes given here is neither exclusive nor
exhaustive, and is expected to grow as more types are registered with
IANA, as described in RFC 2048.
A media type of "audio" indicates that the body contains audio Unrecognized subtypes of "image" should at a minimum be treated as
data. Although there is not yet a consensus on an "ideal" "application/octet-stream". Implementations may optionally elect to
audio format for use with computers, there is a pressing need pass subtypes of "image" that they do not specifically recognize to a
for a format capable of providing interoperable behavior. secure and robust general-purpose image viewing application, if such
an application is available.
The initial subtype of "basic" is specified to meet this NOTE: Use of a general-purpose image viewing application this way
requirement by providing an absolutely minimal lowest common inherits the security problems of the most dangerous type supported
denominator audio format. It is expected that richer formats by the application.
for higher quality and/or lower bandwidth audio will be
defined by a later document.
The content of the "audio/basic" subtype is single channel 4.3. Audio Media Type
audio encoded using 8bit ISDN mu-law [PCM] at a sample rate of
8000 Hz.
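
As an illustration of the "audio/basic" format described in this section, the sketch below expands 8-bit mu-law octets to linear samples and computes the playing time of a body. It assumes the conventional G.711 mu-law expansion (bias of 0x84, inverted octets), which is not spelled out in this document; the function names are hypothetical.

```python
SAMPLE_RATE_HZ = 8000  # "audio/basic" sample rate, single channel


def ulaw_to_linear(octet):
    """Expand one mu-law octet to a signed linear sample."""
    u = ~octet & 0xFF              # mu-law octets are stored inverted
    sign = u & 0x80
    exponent = (u >> 4) & 0x07
    mantissa = u & 0x0F
    magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -magnitude if sign else magnitude


def duration_seconds(body):
    """Playing time of an "audio/basic" body: one octet per sample."""
    return len(body) / SAMPLE_RATE_HZ
```

Because each sample occupies exactly one octet, a decoded body of 8000 octets corresponds to one second of audio.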
Unrecognized subtypes of audio should at a minimum be treated A media type of "audio" indicates that the body contains audio data.
as "application/octet-stream". Implementations may optionally Although there is not yet a consensus on an "ideal" audio format for
elect to pass subtypes of audio that they do not specifically use with computers, there is a pressing need for a format capable of
recognize to a robust general-purpose audio playing providing interoperable behavior.
application, if such an application is available.
6.4. Video Media Type The initial subtype of "basic" is specified to meet this requirement
by providing an absolutely minimal lowest common denominator audio
format. It is expected that richer formats for higher quality and/or
lower bandwidth audio will be defined by a later document.
A media type of "video" indicates that the body contains a The content of the "audio/basic" subtype is single channel audio
time-varying-picture image, possibly with color and encoded using 8bit ISDN mu-law [PCM] at a sample rate of 8000 Hz.
coordinated sound. The term "video" is used extremely
generically, rather than with reference to any particular
technology or format, and is not meant to preclude subtypes
such as animated drawings encoded compactly. The subtype
"mpeg" refers to video coded according to the MPEG standard
[MPEG].
Note that although in general this document strongly Unrecognized subtypes of "audio" should at a minimum be treated as
discourages the mixing of multiple media in a single body, it "application/octet-stream". Implementations may optionally elect to
is recognized that many so-called "video" formats include a pass subtypes of "audio" that they do not specifically recognize to a
representation for synchronized audio, and this is explicitly robust general-purpose audio playing application, if such an
permitted for subtypes of "video". application is available.
Unrecognized subtypes of video should at a minimum be treated 4.4. Video Media Type
as "application/octet-stream". Implementations may optionally
elect to pass subtypes of video that they do not specifically
recognize to a robust general-purpose video display
application, if such an application is available.
6.5. Application Media Type A media type of "video" indicates that the body contains a time-
varying-picture image, possibly with color and coordinated sound.
The term 'video' is used in its most generic sense, rather than with
reference to any particular technology or format, and is not meant to
preclude subtypes such as animated drawings encoded compactly. The
subtype "mpeg" refers to video coded according to the MPEG standard
[MPEG].
The "application" media type is to be used for discrete data Note that although in general this document strongly discourages the
which do not fit in any of the other categories, and mixing of multiple media in a single body, it is recognized that many
particularly for data to be processed by some type of so-called video formats include a representation for synchronized
application program. This is information which must be audio, and this is explicitly permitted for subtypes of "video".
processed by an application before it is viewable or usable by
a user. Expected uses for the application media type include
file transfer, spreadsheets, data for mail-based scheduling
systems, and languages for "active" (computational) material.
(The latter, in particular, can pose security problems which
must be understood by implementors, and are considered in
detail in the discussion of the application/PostScript media
type.)
For example, a meeting scheduler might define a standard Unrecognized subtypes of "video" should at a minimum be treated as
representation for information about proposed meeting dates. "application/octet-stream". Implementations may optionally elect to
An intelligent user agent would use this information to pass subtypes of "video" that they do not specifically recognize to a
conduct a dialog with the user, and might then send additional robust general-purpose video display application, if such an
material based on that dialog. More generally, there have application is available.
been several "active" messaging languages developed in which
programs in a suitably specialized language are transported to
a remote location and automatically run in the recipient's
environment.
Such applications may be defined as subtypes of the 4.5. Application Media Type
"application" media type. This document defines two subtypes:
octet-stream, and PostScript.
The subtype of application will often be the name of the The "application" media type is to be used for discrete data which do
application for which the data are intended. This does not not fit in any of the other categories, and particularly for data to
mean, however, that any application program name may be used be processed by some type of application program. This is
freely as a subtype of application. information which must be processed by an application before it is
viewable or usable by a user. Expected uses for the "application"
media type include file transfer, spreadsheets, data for mail-based
scheduling systems, and languages for "active" (computational)
material. (The latter, in particular, can pose security problems
which must be understood by implementors, and are considered in
detail in the discussion of the "application/PostScript" media type.)
For example, a meeting scheduler might define a standard
representation for information about proposed meeting dates. An
intelligent user agent would use this information to conduct a dialog
with the user, and might then send additional material based on that
dialog. More generally, there have been several "active" messaging
languages developed in which programs in a suitably specialized
language are transported to a remote location and automatically run
in the recipient's environment.
6.5.1. Octet-Stream Subtype Such applications may be defined as subtypes of the "application"
media type. This document defines two subtypes:
The "octet-stream" subtype is used to indicate that a body octet-stream, and PostScript.
contains arbitrary binary data. The set of currently defined
parameters is:
(1) TYPE -- the general type or category of binary data. The subtype of "application" will often be either the name or include
This is intended as information for the human recipient part of the name of the application for which the data are intended.
rather than for any automatic processing. This does not mean, however, that any application program name may be
used freely as a subtype of "application".
(2) PADDING -- the number of bits of padding that were 4.5.1. Octet-Stream Subtype
appended to the bit-stream comprising the actual
contents to produce the enclosed 8bit byte-oriented
data. This is useful for enclosing a bit-stream in a
body when the total number of bits is not a multiple of
8.
Both of these parameters are optional. The "octet-stream" subtype is used to indicate that a body contains
arbitrary binary data. The set of currently defined parameters is:
An additional parameter, "CONVERSIONS", was defined in RFC (1) TYPE -- the general type or category of binary data.
1341 but has since been removed. RFC 1341 also defined the This is intended as information for the human recipient
use of a "NAME" parameter which gave a suggested file name to rather than for any automatic processing.
be used if the data were to be written to a file. This has
been deprecated in anticipation of a separate Content-
Disposition header field, to be defined in a subsequent RFC.
The recommended action for an implementation that receives an (2) PADDING -- the number of bits of padding that were
application/octet-stream entity is to simply offer to put the appended to the bit-stream comprising the actual
data in a file, with any Content-Transfer-Encoding undone, or contents to produce the enclosed 8bit byte-oriented
perhaps to use it as input to a user-specified process. data. This is useful for enclosing a bit-stream in a
body when the total number of bits is not a multiple of
8.
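
The PADDING parameter can be illustrated with a small round trip: a bit-stream whose length is not a multiple of 8 is padded to whole octets, and the receiver uses the PADDING value to discard the extra bits. The function names, the most-significant-bit-first packing order, and the use of zero bits as padding are assumptions of this sketch; the text above does not specify them.

```python
def pack_bits(bits):
    """Pack a sequence of 0/1 values into octets.

    Returns (data, padding), where `padding` is the number of bits
    appended to reach a multiple of 8 (the PADDING value).
    """
    padding = (-len(bits)) % 8
    padded = list(bits) + [0] * padding
    out = bytearray()
    for i in range(0, len(padded), 8):
        octet = 0
        for bit in padded[i:i + 8]:
            octet = (octet << 1) | bit   # most significant bit first
        out.append(octet)
    return bytes(out), padding


def unpack_bits(data, padding):
    """Recover the original bit-stream by dropping `padding` bits."""
    bits = [(octet >> shift) & 1
            for octet in data
            for shift in range(7, -1, -1)]
    return bits[:len(bits) - padding] if padding else bits
```

For instance, an 11-bit stream occupies two octets and would be sent with a PADDING value of 5.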
To reduce the danger of transmitting rogue programs, it is Both of these parameters are optional.
strongly recommended that implementations NOT implement a
path-search mechanism whereby an arbitrary program named in
the Content-Type parameter (e.g., an "interpreter=" parameter)
is found and executed using the message body as input.
6.5.2. PostScript Subtype An additional parameter, "CONVERSIONS", was defined in RFC 1341 but
has since been removed. RFC 1341 also defined the use of a "NAME"
parameter which gave a suggested file name to be used if the data
were to be written to a file. This has been deprecated in
anticipation of a separate Content-Disposition header field, to be
defined in a subsequent RFC.
A media type of "application/postscript" indicates a The recommended action for an implementation that receives an
PostScript program. Currently two variants of the PostScript "application/octet-stream" entity is to simply offer to put the data
language are allowed; the original level 1 variant is in a file, with any Content-Transfer-Encoding undone, or perhaps to
described in [POSTSCRIPT] and the more recent level 2 variant use it as input to a user-specified process.
is described in [POSTSCRIPT2].
PostScript is a registered trademark of Adobe Systems, Inc. To reduce the danger of transmitting rogue programs, it is strongly
Use of the MIME media type "application/postscript" implies recommended that implementations NOT implement a path-search
recognition of that trademark and all the rights it entails. mechanism whereby an arbitrary program named in the Content-Type
parameter (e.g., an "interpreter=" parameter) is found and executed
using the message body as input.
The PostScript language definition provides facilities for 4.5.2. PostScript Subtype
internal labelling of the specific language features a given
program uses. This labelling, called the PostScript document
structuring conventions, or DSC, is very general and provides
substantially more information than just the language level.
The use of document structuring conventions, while not
required, is strongly recommended as an aid to
interoperability. Documents which lack proper structuring
conventions cannot be tested to see whether or not they will
work in a given environment. As such, some systems may assume
the worst and refuse to process unstructured documents.
The execution of general-purpose PostScript interpreters A media type of "application/postscript" indicates a PostScript
entails serious security risks, and implementors are program. Currently two variants of the PostScript language are
discouraged from simply sending PostScript bodies to "off- allowed; the original level 1 variant is described in [POSTSCRIPT]
the-shelf" interpreters. While it is usually safe to send and the more recent level 2 variant is described in [POSTSCRIPT2].
PostScript to a printer, where the potential for harm is
greatly constrained by typical printer environments,
implementors should consider all of the following before they
add interactive display of PostScript bodies to their MIME
readers.
The remainder of this section outlines some, though probably PostScript is a registered trademark of Adobe Systems, Inc. Use of
not all, of the possible problems with the transport of the MIME media type "application/postscript" implies recognition of
PostScript entities. that trademark and all the rights it entails.
(1) Dangerous operations in the PostScript language The PostScript language definition provides facilities for internal
include, but may not be limited to, the PostScript labelling of the specific language features a given program uses.
operators "deletefile", "renamefile", "filenameforall", This labelling, called the PostScript document structuring
and "file". "File" is only dangerous when applied to conventions, or DSC, is very general and provides substantially more
something other than standard input or output. information than just the language level. The use of document
Implementations may also define additional nonstandard structuring conventions, while not required, is strongly recommended
file operators; these may also pose a threat to as an aid to interoperability. Documents which lack proper
security. "Filenameforall", the wildcard file search structuring conventions cannot be tested to see whether or not they
operator, may appear at first glance to be harmless. will work in a given environment. As such, some systems may assume
Note, however, that this operator has the potential to the worst and refuse to process unstructured documents.
reveal information about what files the recipient has
access to, and this information may itself be
sensitive. Message senders should avoid the use of
potentially dangerous file operators, since these
operators are quite likely to be unavailable in secure
PostScript implementations. Message receiving and
displaying software should either completely disable
all potentially dangerous file operators or take
special care not to delegate any special authority to
their operation. These operators should be viewed as
being done by an outside agency when interpreting
PostScript documents. Such disabling and/or checking
should be done completely outside of the reach of the
PostScript language itself; care should be taken to
insure that no method exists for re-enabling full-
function versions of these operators.
(2) The PostScript language provides facilities for exiting The execution of general-purpose PostScript interpreters entails
the normal interpreter, or server, loop. Changes made serious security risks, and implementors are discouraged from simply
in this "outer" environment are customarily retained sending PostScript bodies to "off-the-shelf" interpreters. While it
across documents, and may in some cases be retained is usually safe to send PostScript to a printer, where the potential
semipermanently in nonvolatile memory. The operators for harm is greatly constrained by typical printer environments,
associated with exiting the interpreter loop have the implementors should consider all of the following before they add
potential to interfere with subsequent document interactive display of PostScript bodies to their MIME readers.
processing. As such, their unrestrained use
constitutes a threat of service denial. PostScript
operators that exit the interpreter loop include, but
may not be limited to, the exitserver and startjob
operators. Message sending software should not
generate PostScript that depends on exiting the
interpreter loop to operate, since the ability to exit
will probably be unavailable in secure PostScript
implementations. Message receiving and displaying
software should completely disable the ability to make
retained changes to the PostScript environment by
eliminating or disabling the "startjob" and
"exitserver" operations. If these operations cannot be
eliminated or completely disabled the password
associated with them should at least be set to a hard-
to-guess value.
(3) PostScript provides operators for setting system-wide The remainder of this section outlines some, though probably not all,
and device-specific parameters. These parameter of the possible problems with the transport of PostScript entities.
settings may be retained across jobs and may
potentially pose a threat to the correct operation of
the interpreter. The PostScript operators that set
system and device parameters include, but may not be
limited to, the "setsystemparams" and "setdevparams"
operators. Message sending software should not
generate PostScript that depends on the setting of
system or device parameters to operate correctly. The
ability to set these parameters will probably be
unavailable in secure PostScript implementations.
Message receiving and displaying software should
disable the ability to change system and device
parameters. If these operators cannot be completely
disabled, the password associated with them should at
least be set to a hard-to-guess value.
(4) Some PostScript implementations provide nonstandard (1) Dangerous operations in the PostScript language
facilities for the direct loading and execution of include, but may not be limited to, the PostScript
machine code. Such facilities are quite obviously open operators "deletefile", "renamefile", "filenameforall",
to substantial abuse. Message sending software should and "file". "File" is only dangerous when applied to
not make use of such features. Besides being totally something other than standard input or output.
hardware-specific, they are also likely to be Implementations may also define additional nonstandard
unavailable in secure implementations of PostScript. file operators; these may also pose a threat to
Message receiving and displaying software should not security. "Filenameforall", the wildcard file search
allow such operators to be used if they exist. operator, may appear at first glance to be harmless.
(5) PostScript is an extensible language, and many, if not Note, however, that this operator has the potential to
most, implementations of it provide a number of their reveal information about what files the recipient has
own extensions. This document does not deal with such access to, and this information may itself be
extensions explicitly since they constitute an unknown sensitive. Message senders should avoid the use of
factor. Message sending software should not make use potentially dangerous file operators, since these
of nonstandard extensions; they are likely to be operators are quite likely to be unavailable in secure
missing from some implementations. Message receiving PostScript implementations. Message receiving and
and displaying software should make sure that any displaying software should either completely disable
nonstandard PostScript operators are secure and don't all potentially dangerous file operators or take
present any kind of threat. special care not to delegate any special authority to
their operation. These operators should be viewed as
being done by an outside agency when interpreting
PostScript documents. Such disabling and/or checking
should be done completely outside of the reach of the
PostScript language itself; care should be taken to
ensure that no method exists for re-enabling full-
function versions of these operators.
(6) It is possible to write PostScript that consumes huge (2) The PostScript language provides facilities for exiting
amounts of various system resources. It is also the normal interpreter, or server, loop. Changes made
possible to write PostScript programs that loop in this "outer" environment are customarily retained
indefinitely. Both types of programs have the across documents, and may in some cases be retained
potential to cause damage if sent to unsuspecting semipermanently in nonvolatile memory. The operators
recipients. Message-sending software should avoid the associated with exiting the interpreter loop have the
construction and dissemination of such programs, which potential to interfere with subsequent document
is antisocial. Message receiving and displaying processing. As such, their unrestrained use
software should provide appropriate mechanisms to abort constitutes a threat of service denial. PostScript
processing of a document after a reasonable amount of operators that exit the interpreter loop include, but
time has elapsed. In addition, PostScript interpreters may not be limited to, the exitserver and startjob
should be limited to the consumption of only a operators. Message sending software should not
reasonable amount of any given system resource. generate PostScript that depends on exiting the
interpreter loop to operate, since the ability to exit
will probably be unavailable in secure PostScript
implementations. Message receiving and displaying
software should completely disable the ability to make
retained changes to the PostScript environment by
eliminating or disabling the "startjob" and
"exitserver" operations. If these operations cannot be
eliminated or completely disabled, the password
associated with them should at least be set to a hard-
to-guess value.
(7) It is possible to include raw binary information inside (3) PostScript provides operators for setting system-wide
PostScript in various forms. This is not recommended and device-specific parameters. These parameter
for use in Internet mail, both because it is not settings may be retained across jobs and may
supported by all PostScript interpreters and because it potentially pose a threat to the correct operation of
significantly complicates the use of a MIME Content- the interpreter. The PostScript operators that set
Transfer-Encoding. (Without such binary, PostScript system and device parameters include, but may not be
may typically be viewed as line-oriented data. The limited to, the "setsystemparams" and "setdevparams"
treatment of CRLF sequences becomes extremely operators. Message sending software should not
problematic if binary and line-oriented data are mixed generate PostScript that depends on the setting of
in a single PostScript data stream.) system or device parameters to operate correctly. The
ability to set these parameters will probably be
unavailable in secure PostScript implementations.
Message receiving and displaying software should
disable the ability to change system and device
parameters. If these operators cannot be completely
disabled, the password associated with them should at
least be set to a hard-to-guess value.
(8) Finally, bugs may exist in some PostScript interpreters (4) Some PostScript implementations provide nonstandard
which could possibly be exploited to gain unauthorized facilities for the direct loading and execution of
access to a recipient's system. Apart from noting this machine code. Such facilities are quite obviously open
possibility, there is no specific action to take to to substantial abuse. Message sending software should
prevent this, apart from the timely correction of such not make use of such features. Besides being totally
bugs if any are found. hardware-specific, they are also likely to be
unavailable in secure implementations of PostScript.
Message receiving and displaying software should not
allow such operators to be used if they exist.
6.5.3. Other Application Subtypes (5) PostScript is an extensible language, and many, if not
most, implementations of it provide a number of their
own extensions. This document does not deal with such
extensions explicitly since they constitute an unknown
factor. Message sending software should not make use
of nonstandard extensions; they are likely to be
missing from some implementations. Message receiving
and displaying software should make sure that any
nonstandard PostScript operators are secure and don't
present any kind of threat.
It is expected that many other subtypes of application will be (6) It is possible to write PostScript that consumes huge
defined in the future. MIME implementations must at a minimum amounts of various system resources. It is also
treat any unrecognized subtypes as being equivalent to possible to write PostScript programs that loop
"application/octet-stream". indefinitely. Both types of programs have the
potential to cause damage if sent to unsuspecting
recipients. Message-sending software should avoid the
construction and dissemination of such programs, which
is antisocial. Message receiving and displaying
software should provide appropriate mechanisms to abort
processing after a reasonable amount of time has
elapsed. In addition, PostScript interpreters should be
limited to the consumption of only a reasonable amount
of any given system resource.
7. Composite Media Type Values (7) It is possible to include raw binary information inside
PostScript in various forms. This is not recommended
for use in Internet mail, both because it is not
supported by all PostScript interpreters and because it
significantly complicates the use of a MIME Content-
Transfer-Encoding. (Without such binary, PostScript
may typically be viewed as line-oriented data. The
treatment of CRLF sequences becomes extremely
problematic if binary and line-oriented data are mixed
in a single PostScript data stream.)
The remaining two of the seven initial Content-Type values (8) Finally, bugs may exist in some PostScript interpreters
refer to composite entities. Composite entities are handled which could possibly be exploited to gain unauthorized
using MIME mechanisms -- a MIME processor typically handles access to a recipient's system. Apart from noting this
the body directly. possibility, there is no specific action to take to
prevent this, apart from the timely correction of such
bugs if any are found.
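
The advice in item (6) about aborting processing after a reasonable
amount of time can be sketched as follows. This is only an
illustration, assuming the PostScript interpreter runs as a separate
process; the function name and the "argv" command line are
hypothetical, and a time limit alone does not bound memory or other
resource consumption.

```python
import subprocess

def run_with_time_limit(argv, seconds):
    """Run an external interpreter; return False if it had to be aborted.

    `argv` is a hypothetical command line for a restricted interpreter.
    """
    try:
        subprocess.run(argv,
                       stdin=subprocess.DEVNULL,
                       stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL,
                       timeout=seconds)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before re-raising.
        return False
    return True
```

A receiving agent could treat a False result as a reason to stop
displaying the body rather than retrying it.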
7.1. Multipart Media Type 4.5.3. Other Application Subtypes
In the case of multipart entities, in which one or more It is expected that many other subtypes of "application" will be
different sets of data are combined in a single body, a defined in the future. MIME implementations must at a minimum treat
"multipart" media type field must appear in the entity's any unrecognized subtypes as being equivalent to "application/octet-
header. The body must then contain one or more body parts, stream".
each preceded by a boundary delimiter line, and the last one
followed by a closing boundary delimiter line. After its
boundary delimiter line, each body part then consists of a
header area, a blank line, and a body area. Thus a body part
is similar to an RFC 822 message in syntax, but different in
meaning.
A body part is an entity and hence is NOT to be interpreted as 5. Composite Media Type Values
actually being an RFC 822 message. To begin with, NO header
fields are actually required in body parts. A body part that
starts with a blank line, therefore, is allowed and is a body
part for which all default values are to be assumed. In such
a case, the absence of a Content-Type header usually indicates
that the corresponding body has a content-type of "text/plain;
charset=US-ASCII".
The only header fields that have defined meaning for body The remaining two of the seven initial Content-Type values refer to
parts are those the names of which begin with "Content-". All composite entities. Composite entities are handled using MIME
other header fields may be ignored in body parts. Although mechanisms -- a MIME processor typically handles the body directly.
they should generally be retained if at all possible, they may
be discarded by gateways if necessary. Such other fields are
permitted to appear in body parts but must not be depended on.
"X-" fields may be created for experimental or private
purposes, with the recognition that the information they
contain may be lost at some gateways.
NOTE: The distinction between an RFC 822 message and a body 5.1. Multipart Media Type
part is subtle, but important. A gateway between Internet and
X.400 mail, for example, must be able to tell the difference
between a body part that contains an image and a body part
that contains an encapsulated message, the body of which is a
JPEG image. In order to represent the latter, the body part
must have "Content-Type: message/rfc822", and its body (after
the blank line) must be the encapsulated message, with its own
"Content-Type: image/jpeg" header field. The use of similar
syntax facilitates the conversion of messages to body parts,
and vice versa, but the distinction between the two must be
understood by implementors. (For the special case in which
parts actually are messages, a "digest" subtype is also
defined.)
As stated previously, each body part is preceded by a boundary In the case of multipart entities, in which one or more different
delimiter line that contains the boundary delimiter. The sets of data are combined in a single body, a "multipart" media type
boundary delimiter MUST NOT appear inside any of the field must appear in the entity's header. The body must then contain
encapsulated parts, on a line by itself or as the prefix of one or more body parts, each preceded by a boundary delimiter line,
any line. This implies that it is crucial that the composing and the last one followed by a closing boundary delimiter line.
agent be able to choose and specify a unique boundary After its boundary delimiter line, each body part then consists of a
parameter value that does not contain the boundary parameter header area, a blank line, and a body area. Thus a body part is
value of an enclosing multipart as a prefix. similar to an RFC 822 message in syntax, but different in meaning.
All present and future subtypes of the "multipart" type must A body part is an entity and hence is NOT to be interpreted as
use an identical syntax. Subtypes may differ in their actually being an RFC 822 message. To begin with, NO header fields
semantics, and may impose additional restrictions on syntax, are actually required in body parts. A body part that starts with a
but must conform to the required syntax for the multipart blank line, therefore, is allowed and is a body part for which all
type. This requirement ensures that all conformant user default values are to be assumed. In such a case, the absence of a
agents will at least be able to recognize and separate the Content-Type header usually indicates that the corresponding body has
parts of any multipart entity, even those of an unrecognized a content-type of "text/plain; charset=US-ASCII".
subtype.
As stated in the definition of the Content-Transfer-Encoding The only header fields that have defined meaning for body parts are
field [MIME-IMB], no encoding other than "7bit", "8bit", or those the names of which begin with "Content-". All other header
"binary" is permitted for entities of type "multipart". The fields may be ignored in body parts. Although they should generally
multipart boundary delimiters and header fields are always be retained if at all possible, they may be discarded by gateways if
represented as 7bit US-ASCII in any case (though the header necessary. Such other fields are permitted to appear in body parts
fields may encode non-US-ASCII header text as per RFC MIME- but must not be depended on. "X-" fields may be created for
HEADERS) and data within the body parts can be encoded on a experimental or private purposes, with the recognition that the
part-by-part basis, with Content-Transfer-Encoding fields for information they contain may be lost at some gateways.
each appropriate body part.
7.1.1. Common Syntax NOTE: The distinction between an RFC 822 message and a body part is
subtle, but important. A gateway between Internet and X.400 mail,
for example, must be able to tell the difference between a body part
that contains an image and a body part that contains an encapsulated
message, the body of which is a JPEG image. In order to represent
the latter, the body part must have "Content-Type: message/rfc822",
and its body (after the blank line) must be the encapsulated message,
with its own "Content-Type: image/jpeg" header field. The use of
similar syntax facilitates the conversion of messages to body parts,
and vice versa, but the distinction between the two must be
understood by implementors. (For the special case in which parts
actually are messages, a "digest" subtype is also defined.)
This section defines a common syntax for subtypes of As stated previously, each body part is preceded by a boundary
multipart. All subtypes of multipart must use this syntax. A delimiter line that contains the boundary delimiter. The boundary
simple example of a multipart message also appears in this delimiter MUST NOT appear inside any of the encapsulated parts, on a
section. An example of a more complex multipart message is line by itself or as the prefix of any line. This implies that it is
given in RFC MIME-CONF. crucial that the composing agent be able to choose and specify a
unique boundary parameter value that does not contain the boundary
parameter value of an enclosing multipart as a prefix.
The Content-Type field for multipart entities requires one All present and future subtypes of the "multipart" type must use an
parameter, "boundary". The boundary delimiter line is then identical syntax. Subtypes may differ in their semantics, and may
defined as a line consisting entirely of two hyphen characters impose additional restrictions on syntax, but must conform to the
("-", decimal value 45) followed by the boundary parameter required syntax for the "multipart" type. This requirement ensures
value from the Content-Type header field, optional linear that all conformant user agents will at least be able to recognize
whitespace, and a terminating CRLF. and separate the parts of any multipart entity, even those of an
unrecognized subtype.
NOTE: The hyphens are for rough compatibility with the As stated in the definition of the Content-Transfer-Encoding field
earlier RFC 934 method of message encapsulation, and for ease [RFC 2045], no encoding other than "7bit", "8bit", or "binary" is
of searching for the boundaries in some implementations. permitted for entities of type "multipart". The "multipart" boundary
However, it should be noted that multipart messages are NOT delimiters and header fields are always represented as 7bit US-ASCII
completely compatible with RFC 934 encapsulations; in in any case (though the header fields may encode non-US-ASCII header
particular, they do not obey RFC 934 quoting conventions for text as per RFC 2047) and data within the body parts can be encoded
embedded lines that begin with hyphens. This mechanism was on a part-by-part basis, with Content-Transfer-Encoding fields for
chosen over the RFC 934 mechanism because the latter causes each appropriate body part.
lines to grow with each level of quoting. The combination of
this growth with the fact that SMTP implementations sometimes
wrap long lines made the RFC 934 mechanism unsuitable for use
in the event that deeply-nested multipart structuring is ever
desired.
WARNING TO IMPLEMENTORS: The grammar for parameters on the 5.1.1. Common Syntax
Content-type field is such that it is often necessary to
enclose the boundary parameter values in quotes on the
Content-type line. This is not always necessary, but never
hurts. Implementors should be sure to study the grammar
carefully in order to avoid producing invalid Content-type
fields. Thus, a typical multipart Content-Type header field
might look like this:
Content-Type: multipart/mixed; boundary=gc0p4Jq0M2Yt08j34c0p This section defines a common syntax for subtypes of "multipart".
All subtypes of "multipart" must use this syntax. A simple example
of a multipart message also appears in this section. An example of a
more complex multipart message is given in RFC 2049.
But the following is not valid: The Content-Type field for multipart entities requires one parameter,
"boundary". The boundary delimiter line is then defined as a line
consisting entirely of two hyphen characters ("-", decimal value 45)
followed by the boundary parameter value from the Content-Type header
field, optional linear whitespace, and a terminating CRLF.
Content-Type: multipart/mixed; boundary=gc0pJq0M:08jU534c0p NOTE: The hyphens are for rough compatibility with the earlier RFC
934 method of message encapsulation, and for ease of searching for
the boundaries in some implementations. However, it should be noted
that multipart messages are NOT completely compatible with RFC 934
encapsulations; in particular, they do not obey RFC 934 quoting
conventions for embedded lines that begin with hyphens. This
mechanism was chosen over the RFC 934 mechanism because the latter
causes lines to grow with each level of quoting. The combination of
this growth with the fact that SMTP implementations sometimes wrap
long lines made the RFC 934 mechanism unsuitable for use in the event
that deeply-nested multipart structuring is ever desired.
(because of the colon) and must instead be represented as WARNING TO IMPLEMENTORS: The grammar for parameters on the Content-
type field is such that it is often necessary to enclose the boundary
parameter values in quotes on the Content-type line. This is not
always necessary, but never hurts. Implementors should be sure to
study the grammar carefully in order to avoid producing invalid
Content-type fields. Thus, a typical "multipart" Content-Type header
field might look like this:
Content-Type: multipart/mixed; boundary="gc0pJq0M:08jU534c0p" Content-Type: multipart/mixed; boundary=gc0p4Jq0M2Yt08j34c0p
This Content-Type value indicates that the content consists of But the following is not valid:
one or more parts, each with a structure that is syntactically
identical to an RFC 822 message, except that the header area
is allowed to be completely empty, and that the parts are each
preceded by the line
--gc0pJq0M:08jU534c0p Content-Type: multipart/mixed; boundary=gc0pJq0M:08jU534c0p
The boundary delimiter MUST occur at the beginning of a line, (because of the colon) and must instead be represented as
i.e., following a CRLF, and the initial CRLF is considered to
be attached to the boundary delimiter line rather than part of
the preceding part. The boundary may be followed by zero or
more characters of linear whitespace. It is then terminated by
either another CRLF and the header fields for the next part,
or by two CRLFs, in which case there are no header fields for
the next part. If no Content-Type field is present it is
assumed to be of message/rfc822 in a multipart/digest and
text/plain otherwise.
NOTE: The CRLF preceding the boundary delimiter line is Content-Type: multipart/mixed; boundary="gc0pJq0M:08jU534c0p"
conceptually attached to the boundary so that it is possible
to have a part that does not end with a CRLF (line break).
Body parts that must be considered to end with line breaks,
therefore, must have two CRLFs preceding the boundary
delimiter line, the first of which is part of the preceding
body part, and the second of which is part of the
encapsulation boundary.
Boundary delimiters must not appear within the encapsulated This Content-Type value indicates that the content consists of one or
material, and must be no longer than 70 characters, not more parts, each with a structure that is syntactically identical to
counting the two leading hyphens. an RFC 822 message, except that the header area is allowed to be
completely empty, and that the parts are each preceded by the line
--gc0pJq0M:08jU534c0p
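
The quoting warning above can be illustrated with a short sketch that
quotes a boundary value only when it contains a character outside the
"token" set of the parameter grammar in RFC 2045. The function name
is hypothetical. Quoting unconditionally would also be correct since,
as noted, quoting never hurts.

```python
# tspecials from the RFC 2045 parameter grammar; a value containing
# any of these, or a space, has to be written as a quoted-string.
TSPECIALS = set('()<>@,;:\\"/[]?=')

def multipart_content_type(subtype, boundary):
    """Return a Content-Type field, quoting the boundary if required."""
    needs_quotes = any(
        ch in TSPECIALS or ord(ch) < 33 or ord(ch) > 126
        for ch in boundary
    )
    # Boundary characters never include '"' or '\', so no escaping
    # is needed inside the quoted-string.
    value = '"%s"' % boundary if needs_quotes else boundary
    return "Content-Type: multipart/%s; boundary=%s" % (subtype, value)
```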
The boundary delimiter line following the last body part is a The boundary delimiter MUST occur at the beginning of a line, i.e.,
distinguished delimiter that indicates that no further body following a CRLF, and the initial CRLF is considered to be attached
parts will follow. Such a delimiter line is identical to the to the boundary delimiter line rather than part of the preceding
previous delimiter lines, with the addition of two more part. The boundary may be followed by zero or more characters of
hyphens after the boundary parameter value. linear whitespace. It is then terminated by either another CRLF and
the header fields for the next part, or by two CRLFs, in which case
there are no header fields for the next part. If no Content-Type
field is present it is assumed to be "message/rfc822" in a
"multipart/digest" and "text/plain" otherwise.
--gc0pJq0M:08jU534c0p-- NOTE: The CRLF preceding the boundary delimiter line is conceptually
attached to the boundary so that it is possible to have a part that
does not end with a CRLF (line break). Body parts that must be
considered to end with line breaks, therefore, must have two CRLFs
preceding the boundary delimiter line, the first of which is part of
the preceding body part, and the second of which is part of the
encapsulation boundary.
NOTE TO IMPLEMENTORS: Boundary string comparisons must Boundary delimiters must not appear within the encapsulated material,
compare the boundary value with the beginning of each and must be no longer than 70 characters, not counting the two
candidate line. An exact match of the entire candidate line leading hyphens.
is not required; it is sufficient that the boundary appear in
its entirety following the CRLF.
There appears to be room for additional information prior to The boundary delimiter line following the last body part is a
the first boundary delimiter line and following the final distinguished delimiter that indicates that no further body parts
boundary delimiter line. These areas should generally be left will follow. Such a delimiter line is identical to the previous
blank, and implementations must ignore anything that appears delimiter lines, with the addition of two more hyphens after the
before the first boundary delimiter line or after the last boundary parameter value.
one.
NOTE: These "preamble" and "epilogue" areas are generally not --gc0pJq0M:08jU534c0p--
used because of the lack of proper typing of these parts and
the lack of clear semantics for handling these areas at
gateways, particularly X.400 gateways. However, rather than
leaving the preamble area blank, many MIME implementations
have found this to be a convenient place to insert an
explanatory note for recipients who read the message with
pre-MIME software, since such notes will be ignored by MIME-
compliant software.
NOTE: Because boundary delimiters must not appear in the body NOTE TO IMPLEMENTORS: Boundary string comparisons must compare the
parts being encapsulated, a user agent must exercise care to boundary value with the beginning of each candidate line. An exact
choose a unique boundary parameter value. The boundary match of the entire candidate line is not required; it is sufficient
parameter value in the example above could have been the that the boundary appear in its entirety following the CRLF.
result of an algorithm designed to produce boundary delimiters
with a very low probability of already existing in the data to
be encapsulated without having to prescan the data. Alternate
algorithms might result in more "readable" boundary delimiters
for a recipient with an old user agent, but would require more
attention to the possibility that the boundary delimiter might
appear at the beginning of some line in the encapsulated part.
The simplest boundary delimiter line possible is something
like "---", with a closing boundary delimiter line of "-----".
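
The comparison rule for boundary delimiter lines can be sketched as
below. This is a minimal illustration, assuming the candidate line
has already been separated from its CRLF; the function name and
return values are hypothetical. Only the beginning of the line is
compared, so trailing whitespace added in transit does not prevent a
match.

```python
def classify_line(line, boundary):
    """Classify one line (without its CRLF) against a boundary value.

    Returns "close" for the closing delimiter line, "delimiter" for
    an ordinary boundary delimiter line, and None for any other line.
    """
    dash_boundary = "--" + boundary
    # Prefix comparison only: an exact match of the whole candidate
    # line is not required.
    if not line.startswith(dash_boundary):
        return None
    rest = line[len(dash_boundary):]
    return "close" if rest.startswith("--") else "delimiter"
```

With the one-character boundary "-", the line "---" is classified as
a delimiter and "-----" as the closing delimiter.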
As a very simple example, the following multipart message has There appears to be room for additional information prior to the
two parts, both of them plain text, one of them explicitly first boundary delimiter line and following the final boundary
typed and one of them implicitly typed: delimiter line. These areas should generally be left blank, and
implementations must ignore anything that appears before the first
boundary delimiter line or after the last one.
From: Nathaniel Borenstein <[email protected]> NOTE: These "preamble" and "epilogue" areas are generally not used
To: Ned Freed <[email protected]> because of the lack of proper typing of these parts and the lack of
Date: Sun, 21 Mar 1993 23:56:48 -0800 (PST) clear semantics for handling these areas at gateways, particularly
Subject: Sample message X.400 gateways. However, rather than leaving the preamble area
MIME-Version: 1.0 blank, many MIME implementations have found this to be a convenient
Content-type: multipart/mixed; boundary="simple boundary" place to insert an explanatory note for recipients who read the
message with pre-MIME software, since such notes will be ignored by
MIME-compliant software.
This is the preamble. It is to be ignored, though it NOTE: Because boundary delimiters must not appear in the body parts
is a handy place for composition agents to include an being encapsulated, a user agent must exercise care to choose a
explanatory note to non-MIME conformant readers. unique boundary parameter value. The boundary parameter value in the
example above could have been the result of an algorithm designed to
produce boundary delimiters with a very low probability of already
existing in the data to be encapsulated without having to prescan the
data. Alternate algorithms might result in more "readable" boundary
delimiters for a recipient with an old user agent, but would require
more attention to the possibility that the boundary delimiter might
appear at the beginning of some line in the encapsulated part. The
simplest boundary delimiter line possible is something like "---",
with a closing boundary delimiter line of "-----".
--simple boundary As a very simple example, the following multipart message has two
parts, both of them plain text, one of them explicitly typed and one
of them implicitly typed:
This is implicitly typed plain US-ASCII text. From: Nathaniel Borenstein <[email protected]>
It does NOT end with a linebreak. To: Ned Freed <[email protected]>
--simple boundary Date: Sun, 21 Mar 1993 23:56:48 -0800 (PST)
Content-type: text/plain; charset=us-ascii Subject: Sample message
MIME-Version: 1.0
Content-type: multipart/mixed; boundary="simple boundary"
This is explicitly typed plain US-ASCII text. This is the preamble. It is to be ignored, though it
It DOES end with a linebreak. is a handy place for composition agents to include an
explanatory note to non-MIME conformant readers.
--simple boundary-- --simple boundary
This is the epilogue. It is also to be ignored. This is implicitly typed plain US-ASCII text.
It does NOT end with a linebreak.
--simple boundary
Content-type: text/plain; charset=us-ascii
The use of a media type of multipart in a body part within This is explicitly typed plain US-ASCII text.
another multipart entity is explicitly allowed. In such It DOES end with a linebreak.
cases, for obvious reasons, care must be taken to ensure that
each nested multipart entity uses a different boundary
delimiter. See RFC MIME-CONF for an example of nested
multipart entities.
The use of the multipart media type with only a single body --simple boundary--
part may be useful in certain contexts, and is explicitly
permitted.
NOTE: Experience has shown that a multipart media type with a This is the epilogue. It is also to be ignored.
single body part is useful for sending non-text media types.
It has the advantage of providing the preamble as a place to
include decoding instructions. In addition, a number of SMTP
gateways move or remove the MIME headers, and a clever MIME
decoder can take a good guess at multipart boundaries even in
the absence of the Content-Type header and thereby successfully
decode the message.
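
As a rough sketch of how a receiver might separate the body parts of
the simple example message, the following function splits a multipart
body on its boundary delimiter lines. It assumes the body is already
available as a string with CRLF line breaks and does not handle
nested multipart entities; the function name is hypothetical. The
CRLF preceding each delimiter is treated as part of the delimiter, so
a part keeps a trailing line break only if two CRLFs preceded the
boundary.

```python
def split_multipart(body, boundary):
    """Split a multipart body into body parts (header area included).

    The preamble and the epilogue are discarded.
    """
    delimiter = "\r\n--" + boundary
    # Prepend a CRLF so a delimiter on the very first line is found.
    chunks = ("\r\n" + body).split(delimiter)
    parts = []
    for chunk in chunks[1:]:          # chunks[0] is the preamble
        if chunk.startswith("--"):    # closing delimiter reached
            break
        # Drop the rest of the delimiter line (padding and its CRLF).
        _, _, part = chunk.partition("\r\n")
        parts.append(part)
    return parts
```

Applied to the example, this yields two parts: the first begins with
a blank line (an empty header area) and does not end with a line
break, while the second carries its own Content-type field and ends
with one.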
The only mandatory global parameter for the multipart media The use of a media type of "multipart" in a body part within another
type is the boundary parameter, which consists of 1 to 70 "multipart" entity is explicitly allowed. In such cases, for obvious
characters from a set of characters known to be very robust reasons, care must be taken to ensure that each nested "multipart"
through mail gateways, and NOT ending with white space. (If a entity uses a different boundary delimiter. See RFC 2049 for an
boundary delimiter line appears to end with white space, the example of nested "multipart" entities.
white space must be presumed to have been added by a gateway,
and must be deleted.) It is formally specified by the
following BNF:
boundary := 0*69<bchars> bcharsnospace The use of the "multipart" media type with only a single body part
may be useful in certain contexts, and is explicitly permitted.
bchars := bcharsnospace / " " NOTE: Experience has shown that a "multipart" media type with a
single body part is useful for sending non-text media types. It has
the advantage of providing the preamble as a place to include
decoding instructions. In addition, a number of SMTP gateways move
or remove the MIME headers, and a clever MIME decoder can take a good
guess at multipart boundaries even in the absence of the Content-Type
header and thereby successfully decode the message.
bcharsnospace := DIGIT / ALPHA / "'" / "(" / ")" / The only mandatory global parameter for the "multipart" media type is
"+" / "_" / "," / "-" / "." / the boundary parameter, which consists of 1 to 70 characters from a
"/" / ":" / "=" / "?" set of characters known to be very robust through mail gateways, and
NOT ending with white space. (If a boundary delimiter line appears to
end with white space, the white space must be presumed to have been
added by a gateway, and must be deleted.) It is formally specified
by the following BNF:
Overall, the body of a multipart entity may be specified as boundary := 0*69<bchars> bcharsnospace
follows:
dash-boundary := "--" boundary bchars := bcharsnospace / " "
; boundary taken from the value of
; boundary parameter of the
; Content-Type field.
multipart-body := [preamble CRLF] bcharsnospace := DIGIT / ALPHA / "'" / "(" / ")" /
dash-boundary transport-padding CRLF "+" / "_" / "," / "-" / "." /
body-part *encapsulation "/" / ":" / "=" / "?"
close-delimiter transport-padding
[CRLF epilogue]
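
The boundary grammar above can be checked mechanically. The following is a minimal sketch in Python, not part of the specification: the names BOUNDARY_RE, is_valid_boundary, and strip_gateway_padding are illustrative assumptions. It encodes the "1 to 70 characters, not ending with white space" rule and the instruction to delete trailing white space on a delimiter line.

```python
import re

# Characters permitted by the "bcharsnospace" rule; "bchars" adds the space.
_BCHARS_NOSPACE = r"0-9A-Za-z'()+_,\-./:=?"

# boundary := 0*69<bchars> bcharsnospace
# (hypothetical helper pattern, written directly from the BNF)
BOUNDARY_RE = re.compile(
    "[" + _BCHARS_NOSPACE + " ]{0,69}[" + _BCHARS_NOSPACE + "]"
)


def is_valid_boundary(value: str) -> bool:
    """Return True if value satisfies the boundary BNF (1 to 70 characters)."""
    return BOUNDARY_RE.fullmatch(value) is not None


def strip_gateway_padding(line: str) -> str:
    """Drop trailing white space, which is presumed to be gateway-added."""
    return line.rstrip(" \t")
```

Under these assumptions, "simple boundary" is accepted, while a value ending in a space or longer than 70 characters is rejected.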
transport-padding := *LWSP-char Overall, the body of a "multipart" entity may be specified as
; Composers MUST NOT generate follows:
; non-zero length transport
; padding, but receivers MUST
; be able to handle padding
; added by message transports.
encapsulation := delimiter transport-padding dash-boundary := "--" boundary
CRLF body-part ; boundary taken from the value of
; boundary parameter of the
; Content-Type field.
delimiter := CRLF dash-boundary multipart-body := [preamble CRLF]
dash-boundary transport-padding CRLF
body-part *encapsulation
close-delimiter transport-padding
[CRLF epilogue]
close-delimiter := delimiter "--" transport-padding := *LWSP-char
preamble := discard-text ; Composers MUST NOT generate
; non-zero length transport
; padding, but receivers MUST
; be able to handle padding
; added by message transports.
epilogue := discard-text encapsulation := delimiter transport-padding
CRLF body-part
discard-text := *(*text CRLF) *text delimiter := CRLF dash-boundary
; May be ignored or discarded.
body-part := MIME-part-headers [CRLF *OCTET] close-delimiter := delimiter "--"
; Lines in a body-part must not start
; with the specified dash-boundary and
; the delimiter must not appear anywhere
; in the body part. Note that the
; semantics of a body-part differ from
; the semantics of a message, as
; described in the text.
OCTET := <any 0-255 octet value> preamble := discard-text
IMPORTANT: The free insertion of linear-white-space and RFC epilogue := discard-text
822 comments between the elements shown in this BNF is NOT
allowed since this BNF does not specify a structured header
field.
NOTE: In certain transport enclaves, RFC 822 restrictions discard-text := *(*text CRLF) *text
such as the one that limits bodies to printable US-ASCII ; May be ignored or discarded.
characters may not be in force. (That is, the transport
domains may exist that resemble standard Internet mail
transport as specified in RFC 821 and assumed by RFC 822, but
without certain restrictions.) The relaxation of these
restrictions should be construed as locally extending the
definition of bodies, for example to include octets outside of
the US-ASCII range, as long as these extensions are supported
by the transport and adequately documented in the Content-
Transfer-Encoding header field. However, in no event are
headers (either message headers or body part headers) allowed
to contain anything other than US-ASCII characters.
NOTE: Conspicuously missing from the multipart type is a body-part := MIME-part-headers [CRLF *OCTET]
notion of structured, related body parts. It is recommended ; Lines in a body-part must not start
that those wishing to provide more structured or integrated ; with the specified dash-boundary and
multipart messaging facilities should define subtypes of ; the delimiter must not appear anywhere
multipart that are syntactically identical but define ; in the body part. Note that the
relationships between the various parts. For example, subtypes ; semantics of a body-part differ from
of multipart could be defined that include a distinguished ; the semantics of a message, as
part which in turn is used to specify the relationships ; described in the text.
between the other parts, probably referring to them by their
Content-ID field. Old implementations will not recognize the
new subtype if this approach is used, but will treat it as
multipart/mixed and will thus be able to show the user the
parts that are recognized.
7.1.2. Handling Nested Messages and Multiparts OCTET := <any 0-255 octet value>
The "message/rfc822" subtype defined in a subsequent section IMPORTANT: The free insertion of linear-white-space and RFC 822
of this document has no terminating condition other than comments between the elements shown in this BNF is NOT allowed since
running out of data. Similarly, an improperly truncated this BNF does not specify a structured header field.
multipart entity may not have any terminating boundary marker,
and can turn up operationally due to mail system malfunctions.
It is essential that such entities be handled correctly when NOTE: In certain transport enclaves, RFC 822 restrictions such as
they are themselves imbedded inside of another multipart the one that limits bodies to printable US-ASCII characters may not
structure. MIME implementations are therefore required to be in force. (That is, the transport domains may exist that resemble
recognize outer level boundary markers at ANY level of inner standard Internet mail transport as specified in RFC 821 and assumed
nesting. It is not sufficient to only check for the next by RFC 822, but without certain restrictions.) The relaxation of
expected marker or other terminating condition. these restrictions should be construed as locally extending the
definition of bodies, for example to include octets outside of the
US-ASCII range, as long as these extensions are supported by the
transport and adequately documented in the Content-Transfer-Encoding
header field. However, in no event are headers (either message
headers or body part headers) allowed to contain anything other than
US-ASCII characters.
7.1.3. Mixed Subtype NOTE: Conspicuously missing from the "multipart" type is a notion of
structured, related body parts. It is recommended that those wishing
to provide more structured or integrated multipart messaging
facilities should define subtypes of multipart that are syntactically
identical but define relationships between the various parts. For
example, subtypes of multipart could be defined that include a
distinguished part which in turn is used to specify the relationships
between the other parts, probably referring to them by their
Content-ID field. Old implementations will not recognize the new
subtype if this approach is used, but will treat it as
multipart/mixed and will thus be able to show the user the parts that
are recognized.
The "mixed" subtype of multipart is intended for use when the 5.1.2. Handling Nested Messages and Multiparts
body parts are independent and need to be bundled in a
particular order. Any multipart subtypes that an
implementation does not recognize must be treated as being of
subtype "mixed".
7.1.4. Alternative Subtype The "message/rfc822" subtype defined in a subsequent section of this
document has no terminating condition other than running out of data.
Similarly, an improperly truncated "multipart" entity may not have
any terminating boundary marker, and can turn up operationally due to
mail system malfunctions.
The multipart/alternative type is syntactically identical to It is essential that such entities be handled correctly when they are
multipart/mixed, but the semantics are different. In themselves imbedded inside of another "multipart" structure. MIME
particular, each of the body parts is an "alternative" version implementations are therefore required to recognize outer level
of the same information. boundary markers at ANY level of inner nesting. It is not sufficient
to only check for the next expected marker or other terminating
condition.
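
The requirement to recognize outer level boundary markers at any level of inner nesting might be sketched as follows. This is an illustrative Python fragment, not a complete parser: the function name match_delimiter and the idea of keeping the enclosing boundary values in a list (outermost first) are assumptions made for the example.

```python
from typing import List, Optional, Tuple


def match_delimiter(line: str, boundaries: List[str]) -> Optional[Tuple[int, bool]]:
    """Test a line against every enclosing boundary, not just the innermost.

    boundaries holds the boundary parameter values of all enclosing
    multipart entities, outermost first.  Returns (index, is_close) for
    the matching level, or None if the line is not a delimiter line.
    """
    # Remove the line ending and any transport padding added in transit.
    stripped = line.rstrip(" \t\r\n")
    # Check the innermost level first, then work outward.
    for index in range(len(boundaries) - 1, -1, -1):
        dash_boundary = "--" + boundaries[index]
        if stripped == dash_boundary:
            return index, False
        if stripped == dash_boundary + "--":
            return index, True
    return None
```

A parser built on this sketch would, on a match at an index below the innermost level, treat every deeper entity as terminated (for example a truncated inner multipart or an encapsulated message/rfc822) and resume at the matched level.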
Systems should recognize that the content of the various parts 5.1.3. Mixed Subtype
are interchangeable. Systems should choose the "best" type
based on the local environment and references, in some cases
even through user interaction. As with multipart/mixed, the
order of body parts is significant. In this case, the
alternatives appear in an order of increasing faithfulness to
the original content. In general, the best choice is the LAST
part of a type supported by the recipient system's local
environment.
Multipart/alternative may be used, for example, to send a The "mixed" subtype of "multipart" is intended for use when the body
message in a fancy text format in such a way that it can parts are independent and need to be bundled in a particular order.
easily be displayed anywhere: Any "multipart" subtypes that an implementation does not recognize
must be treated as being of subtype "mixed".
From: Nathaniel Borenstein <[email protected]> 5.1.4. Alternative Subtype
To: Ned Freed <[email protected]>
Date: Mon, 22 Mar 1993 09:41:09 -0800 (PST)
Subject: Formatted text mail
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary=boundary42
--boundary42 The "multipart/alternative" type is syntactically identical to
Content-Type: text/plain; charset=us-ascii "multipart/mixed", but the semantics are different. In particular,
each of the body parts is an "alternative" version of the same
information.
... plain text version of message goes here ... Systems should recognize that the content of the various parts are
interchangeable. Systems should choose the "best" type based on the
local environment and references, in some cases even through user
interaction. As with "multipart/mixed", the order of body parts is
significant. In this case, the alternatives appear in an order of
increasing faithfulness to the original content. In general, the
best choice is the LAST part of a type supported by the recipient
system's local environment.
--boundary42 "Multipart/alternative" may be used, for example, to send a message
Content-Type: text/enriched in a fancy text format in such a way that it can easily be displayed
anywhere:
... RFC 1563 text/enriched version of same message From: Nathaniel Borenstein <[email protected]>
goes here ... To: Ned Freed <[email protected]>
Date: Mon, 22 Mar 1993 09:41:09 -0800 (PST)
Subject: Formatted text mail
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary=boundary42
--boundary42 --boundary42
Content-Type: application/x-whatever Content-Type: text/plain; charset=us-ascii
... fanciest version of same message goes here ... ... plain text version of message goes here ...
--boundary42-- --boundary42
Content-Type: text/enriched
In this example, users whose mail systems understood the ... RFC 1896 text/enriched version of same message
"application/x-whatever" format would see only the fancy goes here ...
version, while other users would see only the enriched or
plain text version, depending on the capabilities of their
system.
In general, user agents that compose multipart/alternative --boundary42
entities must place the body parts in increasing order of Content-Type: application/x-whatever
preference, that is, with the preferred format last. For
fancy text, the sending user agent should put the plainest
format first and the richest format last. Receiving user
agents should pick and display the last format they are
capable of displaying. In the case where one of the
alternatives is itself of type "multipart" and contains
unrecognized sub-parts, the user agent may choose either to
show that alternative, an earlier alternative, or both.
NOTE: From an implementor's perspective, it might seem more ... fanciest version of same message goes here ...
sensible to reverse this ordering, and have the plainest
alternative last. However, placing the plainest alternative
first is the friendliest possible option when
multipart/alternative entities are viewed using a non-MIME-
conformant viewer. While this approach does impose some
burden on conformant MIME viewers, interoperability with older
mail readers was deemed to be more important in this case.
It may be the case that some user agents, if they can --boundary42--
recognize more than one of the formats, will prefer to offer
the user the choice of which format to view. This makes
sense, for example, if a message includes both a nicely-
formatted image version and an easily-edited text version.
What is most critical, however, is that the user not
automatically be shown multiple versions of the same data.
Either the user should be shown the last recognized version or
should be given the choice.
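
The selection rule for multipart/alternative (display the last part whose type the recipient supports) can be sketched as below. This is a hedged illustration in Python: choose_alternative and the can_display callback are hypothetical names, and a real user agent may instead offer the user a choice, as the text permits.

```python
from typing import Callable, Optional, Sequence


def choose_alternative(
    part_types: Sequence[str],
    can_display: Callable[[str], bool],
) -> Optional[int]:
    """Return the index of the last part the local system can display.

    part_types lists the media type of each body part in message order,
    that is, in increasing order of faithfulness to the original content.
    Returns None when no alternative is displayable.
    """
    for index in range(len(part_types) - 1, -1, -1):
        # Media type names are compared case-insensitively.
        if can_display(part_types[index].lower()):
            return index
    return None
```

For the three-part example message in this section, a system that supports only "text/plain" and "text/enriched" would select the second part (index 1), and only that part would be shown.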
THE SEMANTICS OF CONTENT-ID IN MULTIPART/ALTERNATIVE: Each In this example, users whose mail systems understood the
part of a multipart/alternative entity represents the same "application/x-whatever" format would see only the fancy version,
data, but the mappings between the two are not necessarily while other users would see only the enriched or plain text version,
without information loss. For example, information is lost depending on the capabilities of their system.
when translating ODA to PostScript or plain text. It is
recommended that each part should have a different Content-ID
value in the case where the information content of the two
parts is not identical. And when the information content is
identical -- for example, where several parts of type
"message/external-body" specify alternate ways to access the
identical data -- the same Content-ID field value should be
used, to optimize any caching mechanisms that might be present
on the recipient's end. However, the Content-ID values used
by the parts should NOT be the same Content-ID value that
describes the multipart/alternative as a whole, if there is
any such Content-ID field. That is, one Content-ID value will
refer to the multipart/alternative entity, while one or more
other Content-ID values will refer to the parts inside it.
7.1.5. Digest Subtype In general, user agents that compose "multipart/alternative" entities
must place the body parts in increasing order of preference, that is,
with the preferred format last. For fancy text, the sending user
agent should put the plainest format first and the richest format
last. Receiving user agents should pick and display the last format
they are capable of displaying. In the case where one of the
alternatives is itself of type "multipart" and contains unrecognized
sub-parts, the user agent may choose either to show that alternative,
an earlier alternative, or both.
This document defines a "digest" subtype of the multipart NOTE: From an implementor's perspective, it might seem more sensible
Content-Type. This type is syntactically identical to to reverse this ordering, and have the plainest alternative last.
multipart/mixed, but the semantics are different. In However, placing the plainest alternative first is the friendliest
particular, in a digest, the default Content-Type value for a possible option when "multipart/alternative" entities are viewed
body part is changed from "text/plain" to "message/rfc822". using a non-MIME-conformant viewer. While this approach does impose
This is done to allow a more readable digest format that is some burden on conformant MIME viewers, interoperability with older
largely compatible (except for the quoting convention) with mail readers was deemed to be more important in this case.
RFC 934.
Note: Though it is possible to specify a Content-Type value It may be the case that some user agents, if they can recognize more
for a body part in a digest which is other than than one of the formats, will prefer to offer the user the choice of
"message/rfc822", such as a text/plain part containing a which format to view. This makes sense, for example, if a message
description of the material in the digest, actually doing so includes both a nicely-formatted image version and an easily-edited
is undesirable. The "multipart/digest" Content-Type is text version. What is most critical, however, is that the user not
intended to be used to send collections of messages. If a automatically be shown multiple versions of the same data. Either
"text/plain" part is needed, it should be included as a the user should be shown the last recognized version or should be
separate part of a "multipart/mixed" message. given the choice.
A digest in this format might, then, look something like this: THE SEMANTICS OF CONTENT-ID IN MULTIPART/ALTERNATIVE: Each part of a
"multipart/alternative" entity represents the same data, but the
mappings between the two are not necessarily without information
loss. For example, information is lost when translating ODA to
PostScript or plain text. It is recommended that each part should
have a different Content-ID value in the case where the information
content of the two parts is not identical. And when the information
content is identical -- for example, where several parts of type
"message/external-body" specify alternate ways to access the
identical data -- the same Content-ID field value should be used, to
optimize any caching mechanisms that might be present on the
recipient's end. However, the Content-ID values used by the parts
should NOT be the same Content-ID value that describes the
"multipart/alternative" as a whole, if there is any such Content-ID
field. That is, one Content-ID value will refer to the
"multipart/alternative" entity, while one or more other Content-ID
values will refer to the parts inside it.
From: Moderator-Address 5.1.5. Digest Subtype
To: Recipient-List
Date: Mon, 22 Mar 1994 13:34:51 +0000
Subject: Internet Digest, volume 42
MIME-Version: 1.0
Content-Type: multipart/mixed;
boundary="---- main boundary ----"
------ main boundary ---- This document defines a "digest" subtype of the "multipart" Content-
Type. This type is syntactically identical to "multipart/mixed", but
the semantics are different. In particular, in a digest, the default
Content-Type value for a body part is changed from "text/plain" to
"message/rfc822". This is done to allow a more readable digest
format that is largely compatible (except for the quoting convention)
with RFC 934.
...Introductory text or table of contents... Note: Though it is possible to specify a Content-Type value for a
body part in a digest which is other than "message/rfc822", such as a
"text/plain" part containing a description of the material in the
digest, actually doing so is undesirable. The "multipart/digest"
Content-Type is intended to be used to send collections of messages.
If a "text/plain" part is needed, it should be included as a separate
part of a "multipart/mixed" message.
------ main boundary ---- A digest in this format might, then, look something like this:
Content-Type: multipart/digest;
boundary="---- next message ----"
------ next message ---- From: Moderator-Address
To: Recipient-List
Date: Mon, 22 Mar 1994 13:34:51 +0000
Subject: Internet Digest, volume 42
MIME-Version: 1.0
Content-Type: multipart/mixed;
boundary="---- main boundary ----"
From: someone-else ------ main boundary ----
Date: Fri, 26 Mar 1993 11:13:32 +0200
Subject: my opinion
...body goes here ... ...Introductory text or table of contents...
------ next message ---- ------ main boundary ----
Content-Type: multipart/digest;
boundary="---- next message ----"
From: someone-else-again ------ next message ----
Date: Fri, 26 Mar 1993 10:07:13 -0500
Subject: my different opinion
... another body goes here ... From: someone-else
Date: Fri, 26 Mar 1993 11:13:32 +0200
Subject: my opinion
------ next message ------ ...body goes here ...
------ main boundary ------ ------ next message ----
7.1.6. Parallel Subtype From: someone-else-again
Date: Fri, 26 Mar 1993 10:07:13 -0500
Subject: my different opinion
This document defines a "parallel" subtype of the multipart ... another body goes here ...
Content-Type. This type is syntactically identical to
multipart/mixed, but the semantics are different. In
particular, in a parallel entity, the order of body parts is
not significant.
A common presentation of this type is to display all of the ------ next message ------
parts simultaneously on hardware and software that are capable
of doing so. However, composing agents should be aware that
many mail readers will lack this capability and will show the
parts serially in any event.
7.1.7. Other Multipart Subtypes ------ main boundary ------
Other multipart subtypes are expected in the future. MIME 5.1.6. Parallel Subtype
implementations must in general treat unrecognized subtypes of
multipart as being equivalent to "multipart/mixed".
7.2. Message Media Type This document defines a "parallel" subtype of the "multipart"
Content-Type. This type is syntactically identical to
"multipart/mixed", but the semantics are different. In particular,
in a parallel entity, the order of body parts is not significant.
It is frequently desirable, in sending mail, to encapsulate A common presentation of this type is to display all of the parts
another mail message. A special media type, "message", is simultaneously on hardware and software that are capable of doing so.
defined to facilitate this. In particular, the "rfc822" However, composing agents should be aware that many mail readers will
subtype of "message" is used to encapsulate RFC 822 messages. lack this capability and will show the parts serially in any event.
NOTE: It has been suggested that subtypes of message might be 5.1.7. Other Multipart Subtypes
defined for forwarded or rejected messages. However,
forwarded and rejected messages can be handled as multipart
messages in which the first part contains any control or
descriptive information, and a second part, of type
message/rfc822, is the forwarded or rejected message.
Composing rejection and forwarding messages in this manner
will preserve the type information on the original message and
allow it to be correctly presented to the recipient, and hence
is strongly encouraged.
Subtypes of message often impose restrictions on what Other "multipart" subtypes are expected in the future. MIME
encodings are allowed. These restrictions are described in implementations must in general treat unrecognized subtypes of
conjunction with each specific subtype. "multipart" as being equivalent to "multipart/mixed".
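
Two of the multipart rules stated in this section lend themselves to a short sketch: unrecognized subtypes are treated as "mixed", and the default type of a body part inside a digest is "message/rfc822" rather than "text/plain". The Python below is illustrative only; the set KNOWN_MULTIPART_SUBTYPES and both function names are assumptions, and the set lists just the subtypes defined in this document.

```python
# Subtypes defined in this document; an implementation may know more.
KNOWN_MULTIPART_SUBTYPES = {"mixed", "alternative", "digest", "parallel"}


def effective_multipart_subtype(subtype: str) -> str:
    """Map an unrecognized multipart subtype to "mixed"."""
    subtype = subtype.lower()
    return subtype if subtype in KNOWN_MULTIPART_SUBTYPES else "mixed"


def default_part_type(enclosing_subtype: str) -> str:
    """Default Content-Type for a body part that carries no Content-Type."""
    if effective_multipart_subtype(enclosing_subtype) == "digest":
        return "message/rfc822"
    return "text/plain"
```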
Mail gateways, relays, and other mail handling agents are 5.2. Message Media Type
commonly known to alter the top-level header of an RFC 822
message. In particular, they frequently add, remove, or
reorder header fields. These operations are explicitly
forbidden for the encapsulated headers embedded in the bodies
of messages of type "message."
7.2.1. RFC822 Subtype
A media type of "message/rfc822" indicates that the body It is frequently desirable, in sending mail, to encapsulate another
contains an encapsulated message, with the syntax of an RFC mail message. A special media type, "message", is defined to
822 message. However, unlike top-level RFC 822 messages, the facilitate this. In particular, the "rfc822" subtype of "message" is
restriction that each message/rfc822 body must include a used to encapsulate RFC 822 messages.
"From", "Date", and at least one destination header is removed
and replaced with the requirement that at least one of "From",
"Subject", or "Date" must be present.
It should be noted that, despite the use of the numbers "822", NOTE: It has been suggested that subtypes of "message" might be
a message/rfc822 entity isn't restricted to material in strict defined for forwarded or rejected messages. However, forwarded and
conformance to RFC822. Such entities can also include enhanced rejected messages can be handled as multipart messages in which the
information as defined in this document. In other words, a first part contains any control or descriptive information, and a
message/rfc822 message could well be a News article or a MIME second part, of type "message/rfc822", is the forwarded or rejected
message. message. Composing rejection and forwarding messages in this manner
will preserve the type information on the original message and allow
it to be correctly presented to the recipient, and hence is strongly
encouraged.
No encoding other than "7bit", "8bit", or "binary" is Subtypes of "message" often impose restrictions on what encodings are
permitted for the body of a "message/rfc822" entity. The allowed. These restrictions are described in conjunction with each
message header fields are always US-ASCII in any case, and specific subtype.
data within the body can still be encoded, in which case the
Content-Transfer-Encoding header field in the encapsulated
message will reflect this. Non-US-ASCII text in the headers
of an encapsulated message can be specified using the
mechanisms described in RFC MIME-HEADERS.
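
The two constraints on "message/rfc822" entities given above (a restricted set of encodings, and at least one of "From", "Subject", or "Date" in the encapsulated header) could be checked along these lines. This is a sketch under stated assumptions: check_rfc822_entity is a hypothetical helper, header names are assumed to be already parsed, and an absent Content-Transfer-Encoding is taken to mean the default of "7bit".

```python
from typing import Iterable, List, Optional

ALLOWED_RFC822_ENCODINGS = {"7bit", "8bit", "binary"}
REQUIRED_ANY_OF = {"from", "subject", "date"}


def check_rfc822_entity(
    transfer_encoding: Optional[str],
    inner_header_names: Iterable[str],
) -> List[str]:
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    # Assumption: a missing encoding field means the default, "7bit".
    encoding = (transfer_encoding or "7bit").strip().lower()
    if encoding not in ALLOWED_RFC822_ENCODINGS:
        problems.append("encoding not permitted for message/rfc822: " + encoding)
    names = {name.lower() for name in inner_header_names}
    if not names & REQUIRED_ANY_OF:
        problems.append("none of From, Subject, or Date is present")
    return problems
```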
7.2.2. Partial Subtype Mail gateways, relays, and other mail handling agents are commonly
known to alter the top-level header of an RFC 822 message. In
particular, they frequently add, remove, or reorder header fields.
These operations are explicitly forbidden for the encapsulated
headers embedded in the bodies of messages of type "message."
The "partial" subtype is defined to allow large entities to be 5.2.1. RFC822 Subtype
delivered as several separate pieces of mail and automatically
reassembled by a receiving user agent. (The concept is
similar to IP fragmentation and reassembly in the basic
Internet Protocols.) This mechanism can be used when
intermediate transport agents limit the size of individual
messages that can be sent. The media type "message/partial"
thus indicates that the body contains a fragment of a larger
entity.
Because data of type "message" may never be encoded in base64 A media type of "message/rfc822" indicates that the body contains an
or quoted-printable, a problem might arise if message/partial encapsulated message, with the syntax of an RFC 822 message.
entities are constructed in an environment that supports However, unlike top-level RFC 822 messages, the restriction that each
binary or 8bit transport. The problem is that the binary data "message/rfc822" body must include a "From", "Date", and at least one
would be split into multiple message/partial messages, each of destination header is removed and replaced with the requirement that
them requiring binary transport. If such messages were at least one of "From", "Subject", or "Date" must be present.
encountered at a gateway into a 7bit transport environment,
there would be no way to properly encode them for the 7bit
world, aside from waiting for all of the fragments,
reassembling the inner message, and then encoding the
reassembled data in base64 or quoted-printable. Since it is
possible that different fragments might go through different
gateways, even this is not an acceptable solution. For this
reason, it is specified that entities of type message/partial
must always have a content-transfer-encoding of 7bit (the
default). In particular, even in environments that support
binary or 8bit transport, the use of a content-transfer-
encoding of "8bit" or "binary" is explicitly prohibited for
MIME entities of type message/partial. This in turn implies
that the inner message must not use "8bit" or "binary"
encoding.
Because some message transfer agents may choose to It should be noted that, despite the use of the numbers "822", a
automatically fragment large messages, and because such agents "message/rfc822" entity isn't restricted to material in strict
may use very different fragmentation thresholds, it is conformance to RFC822, nor are the semantics of "message/rfc822"
possible that the pieces of a partial message, upon objects restricted to the semantics defined in RFC822. More
reassembly, may prove themselves to comprise a partial specifically, a "message/rfc822" message could well be a News article
message. This is explicitly permitted. or a MIME message.
Three parameters must be specified in the Content-Type field No encoding other than "7bit", "8bit", or "binary" is permitted for
of type message/partial: The first, "id", is a unique the body of a "message/rfc822" entity. The message header fields are
identifier, as close to a world-unique identifier as possible, always US-ASCII in any case, and data within the body can still be
to be used to match the fragments together. (In general, the encoded, in which case the Content-Transfer-Encoding header field in
identifier is essentially a message-id; if placed in double the encapsulated message will reflect this. Non-US-ASCII text in the
quotes, it can be ANY message-id, in accordance with the BNF headers of an encapsulated message can be specified using the
for "parameter" given earlier in this specification.) The mechanisms described in RFC 2047.
second, "number", an integer, is the fragment number, which
indicates where this fragment fits into the sequence of
fragments. The third, "total", another integer, is the total
number of fragments. This third subfield is required on the
final fragment, and is optional (though encouraged) on the
earlier fragments. Note also that these parameters may be
given in any order.
Thus, the second piece of a 3-piece message may have either of 5.2.2. Partial Subtype
the following header fields:
Content-Type: Message/Partial; number=2; total=3; The "partial" subtype is defined to allow large entities to be
id="[email protected]" delivered as several separate pieces of mail and automatically
reassembled by a receiving user agent. (The concept is similar to IP
fragmentation and reassembly in the basic Internet Protocols.) This
mechanism can be used when intermediate transport agents limit the
size of individual messages that can be sent. The media type
"message/partial" thus indicates that the body contains a fragment of
a larger entity.
Content-Type: Message/Partial; Because data of type "message" may never be encoded in base64 or
id="[email protected]"; quoted-printable, a problem might arise if "message/partial" entities
number=2 are constructed in an environment that supports binary or 8bit
transport. The problem is that the binary data would be split into
multiple "message/partial" messages, each of them requiring binary
transport. If such messages were encountered at a gateway into a
7bit transport environment, there would be no way to properly encode
them for the 7bit world, aside from waiting for all of the fragments,
reassembling the inner message, and then encoding the reassembled
data in base64 or quoted-printable. Since it is possible that
different fragments might go through different gateways, even this is
not an acceptable solution. For this reason, it is specified that
entities of type "message/partial" must always have a content-
transfer-encoding of 7bit (the default). In particular, even in
environments that support binary or 8bit transport, the use of a
content-transfer-encoding of "8bit" or "binary" is explicitly
prohibited for MIME entities of type "message/partial". This in turn
implies that the inner message must not use "8bit" or "binary"
encoding.
But the third piece MUST specify the total number of Because some message transfer agents may choose to automatically
fragments: fragment large messages, and because such agents may use very
different fragmentation thresholds, it is possible that the pieces of
a partial message, upon reassembly, may prove themselves to comprise
a partial message. This is explicitly permitted.
Content-Type: Message/Partial; number=3; total=3; Three parameters must be specified in the Content-Type field of type
id="[email protected]" "message/partial": The first, "id", is a unique identifier, as close
to a world-unique identifier as possible, to be used to match the
fragments together. (In general, the identifier is essentially a
message-id; if placed in double quotes, it can be ANY message-id, in
accordance with the BNF for "parameter" given in RFC 2045.) The
second, "number", an integer, is the fragment number, which indicates
where this fragment fits into the sequence of fragments. The third,
"total", another integer, is the total number of fragments. This
third subfield is required on the final fragment, and is optional
(though encouraged) on the earlier fragments. Note also that these
parameters may be given in any order.
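
The parameter rules above can be sketched as a small parser. This is an illustration only, assuming Python's standard "email" package; the function name "parse_partial" and the identifier used in the usage note are hypothetical.

```python
from email.message import Message


def parse_partial(content_type_value):
    """Return (id, number, total) from a message/partial Content-Type value.

    total is None when the parameter is absent, which is permitted on
    every fragment except the final one.
    """
    msg = Message()
    msg["Content-Type"] = content_type_value
    if msg.get_content_type() != "message/partial":
        raise ValueError("not a message/partial entity")
    frag_id = msg.get_param("id")      # quotes are removed by get_param
    number = msg.get_param("number")
    total = msg.get_param("total")
    if frag_id is None or number is None:
        raise ValueError("the id and number parameters are required")
    number = int(number)
    total = int(total) if total is not None else None
    if number < 1:
        raise ValueError("fragment numbering begins with 1")
    if total is not None and number > total:
        raise ValueError("fragment number exceeds total")
    return frag_id, number, total
```

Because parameters are looked up by name, the order in which they appear does not matter: 'Message/Partial; number=2; total=3; id="frag@example.invalid"' and 'Message/Partial; id="frag@example.invalid"; total=3; number=2' yield the same tuple.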
Note that fragment numbering begins with 1, not 0. Thus, the second piece of a 3-piece message may have either of the
following header fields:
When the fragments of an entity broken up in this manner are Content-Type: Message/Partial; number=2; total=3;
put together, the result is always a complete MIME entity, id="[email protected]"
which may have its own Content-Type header field, and thus may
contain any other data type.
7.2.2.1. Message Fragmentation and Reassembly Content-Type: Message/Partial;
id="[email protected]";
number=2
The semantics of a reassembled partial message must be those But the third piece MUST specify the total number of fragments:
of the "inner" message, rather than of a message containing
the inner message. This makes it possible, for example, to
send a large audio message as several partial messages, and
still have it appear to the recipient as a simple audio
message rather than as an encapsulated message containing an
audio message. That is, the encapsulation of the message is
considered to be "transparent".
When generating and reassembling the pieces of a Content-Type: Message/Partial; number=3; total=3;
message/partial message, the headers of the encapsulated id="[email protected]"
message must be merged with the headers of the enclosing
entities. In this process the following rules must be
observed:
(1) All of the header fields from the initial enclosing Note that fragment numbering begins with 1, not 0.
message, except those that start with "Content-" and
the specific header fields "Subject", "Message-ID",
"Encrypted", and "MIME-Version", must be copied, in
order, to the new message.
(2) The header fields in the enclosed message which start When the fragments of an entity broken up in this manner are put
with "Content-", plus the "Subject", "Message-ID", together, the result is always a complete MIME entity, which may have
"Encrypted", and "MIME-Version" fields, must be its own Content-Type header field, and thus may contain any other
appended, in order, to the header fields of the new data type.
message. Any header fields in the enclosed message
which do not start with "Content-" (except for the
"Subject", "Message-ID", "Encrypted", and "MIME-
Version" fields) will be ignored and dropped.
(3) All of the header fields from the second and any 5.2.2.1. Message Fragmentation and Reassembly
subsequent enclosing messages are discarded by the
reassembly process.
7.2.2.2. Fragmentation and Reassembly Example The semantics of a reassembled partial message must be those of the
"inner" message, rather than of a message containing the inner
message. This makes it possible, for example, to send a large audio
message as several partial messages, and still have it appear to the
recipient as a simple audio message rather than as an encapsulated
message containing an audio message. That is, the encapsulation of
the message is considered to be "transparent".
If an audio message is broken into two pieces, the first piece When generating and reassembling the pieces of a "message/partial"
might look something like this: message, the headers of the encapsulated message must be merged with
the headers of the enclosing entities. In this process the following
rules must be observed:
X-Weird-Header-1: Foo (1) Fragmentation agents must split messages at line
From: [email protected] boundaries only. This restriction is imposed because
To: [email protected] splits at points other than the ends of lines in turn
Date: Fri, 26 Mar 1993 12:59:38 -0500 (EST) depend on message transports being able to preserve
Subject: Audio mail (part 1 of 2) the semantics of messages that don't end with a CRLF
Message-ID: <[email protected]> sequence. Many transports are incapable of preserving
MIME-Version: 1.0 such semantics.
Content-type: message/partial; id="[email protected]";
number=1; total=2
X-Weird-Header-1: Bar (2) All of the header fields from the initial enclosing
X-Weird-Header-2: Hello message, except those that start with "Content-" and
Message-ID: <[email protected]> the specific header fields "Subject", "Message-ID",
Subject: Audio mail "Encrypted", and "MIME-Version", must be copied, in
MIME-Version: 1.0 order, to the new message.
Content-type: audio/basic
Content-transfer-encoding: base64
... first half of encoded audio data goes here ... (3) The header fields in the enclosed message which start
with "Content-", plus the "Subject", "Message-ID",
"Encrypted", and "MIME-Version" fields, must be
appended, in order, to the header fields of the new
message. Any header fields in the enclosed message
which do not start with "Content-" (except for the
"Subject", "Message-ID", "Encrypted", and "MIME-
Version" fields) will be ignored and dropped.
and the second half might look something like this: (4) All of the header fields from the second and any
subsequent enclosing messages are discarded by the
reassembly process.
From: [email protected] 5.2.2.2. Fragmentation and Reassembly Example
To: [email protected]
Date: Fri, 26 Mar 1993 12:59:38 -0500 (EST)
Subject: Audio mail (part 2 of 2)
MIME-Version: 1.0
Message-ID: <[email protected]>
Content-type: message/partial;
id="[email protected]"; number=2; total=2
... second half of encoded audio data goes here ... If an audio message is broken into two pieces, the first piece might
look something like this:
Then, when the fragmented message is reassembled, the X-Weird-Header-1: Foo
resulting message to be displayed to the user should look From: [email protected]
something like this: To: [email protected]
Date: Fri, 26 Mar 1993 12:59:38 -0500 (EST)
Subject: Audio mail (part 1 of 2)
Message-ID: <[email protected]>
MIME-Version: 1.0
Content-type: message/partial; id="[email protected]";
number=1; total=2
X-Weird-Header-1: Foo X-Weird-Header-1: Bar
From: [email protected] X-Weird-Header-2: Hello
To: [email protected] Message-ID: <[email protected]>
Date: Fri, 26 Mar 1993 12:59:38 -0500 (EST) Subject: Audio mail
Subject: Audio mail MIME-Version: 1.0
Message-ID: <[email protected]> Content-type: audio/basic
MIME-Version: 1.0 Content-transfer-encoding: base64
Content-type: audio/basic
Content-transfer-encoding: base64
... first half of encoded audio data goes here ... ... first half of encoded audio data goes here ...
... second half of encoded audio data goes here ...
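
The header-merging rules for reassembly can be sketched as follows. This is a minimal illustration, not a definitive implementation: it assumes Python's standard "email" package, fragments supplied as strings with LF line endings, and a hypothetical function name "reassemble". It does not attempt to handle a reassembled result that is itself a partial message.

```python
from email.parser import HeaderParser

# Header fields that the rules take from the enclosed (inner) message,
# in addition to every field whose name starts with "Content-".
INNER_FIELDS = {"subject", "message-id", "encrypted", "mime-version"}


def taken_from_inner(name):
    name = name.lower()
    return name.startswith("content-") or name in INNER_FIELDS


def reassemble(raw_fragments):
    """Merge raw message/partial fragments (strings, any order)."""
    parser = HeaderParser()  # parses header fields only; body stays a string
    pieces, ids, totals = {}, set(), set()
    for raw in raw_fragments:
        msg = parser.parsestr(raw)
        if msg.get_content_type() != "message/partial":
            raise ValueError("not a message/partial fragment")
        pieces[int(msg.get_param("number"))] = msg
        ids.add(msg.get_param("id"))
        if msg.get_param("total") is not None:
            totals.add(int(msg.get_param("total")))
    if len(ids) != 1 or len(totals) != 1:
        raise ValueError("fragments disagree on id or total")
    total = totals.pop()
    if sorted(pieces) != list(range(1, total + 1)):
        raise ValueError("fragment set is incomplete")

    first = pieces[1]
    inner = parser.parsestr(first.get_payload())
    # Outer fields of the first fragment, minus the ones the inner message owns.
    headers = [(k, v) for k, v in first.items() if not taken_from_inner(k)]
    # Inner "Content-*", Subject, Message-ID, Encrypted, and MIME-Version.
    headers += [(k, v) for k, v in inner.items() if taken_from_inner(k)]
    # Headers of the second and later fragments are discarded; bodies are joined.
    body = inner.get_payload() + "".join(
        pieces[n].get_payload() for n in range(2, total + 1))
    return "".join(f"{k}: {v}\n" for k, v in headers) + "\n" + body
```

Note that this sketch appends the inner fields after the copied outer fields, so their relative order may differ slightly from the illustrative reassembled message in the example.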
The inclusion of a "References" field in the headers of the and the second half might look something like this:
second and subsequent pieces of a fragmented message that
references the Message-Id on the previous piece may be of
benefit to mail readers that understand and track references.
However, the generation of such "References" fields is
entirely optional.
Finally, it should be noted that the "Encrypted" header field From: [email protected]
has been made obsolete by Privacy Enhanced Messaging (PEM) To: [email protected]
[RFC1421, RFC1422, RFC1423, and RFC1424], but the rules above Date: Fri, 26 Mar 1993 12:59:38 -0500 (EST)
are nevertheless believed to describe the correct way to treat Subject: Audio mail (part 2 of 2)
it if it is encountered in the context of conversion to and MIME-Version: 1.0
from message/partial fragments. Message-ID: <[email protected]>
Content-type: message/partial;
id="[email protected]"; number=2; total=2
7.2.3. External-Body Subtype ... second half of encoded audio data goes here ...
The external-body subtype indicates that the actual body data Then, when the fragmented message is reassembled, the resulting
are not included, but merely referenced. In this case, the message to be displayed to the user should look something like this:
parameters describe a mechanism for accessing the external
data.
When a MIME entity is of type "message/external-body", it X-Weird-Header-1: Foo
consists of a header, two consecutive CRLFs, and the message From: [email protected]
header for the encapsulated message. If another pair of To: [email protected]
consecutive CRLFs appears, this of course ends the message Date: Fri, 26 Mar 1993 12:59:38 -0500 (EST)
header for the encapsulated message. However, since the Subject: Audio mail
encapsulated message's body is itself external, it does NOT Message-ID: <[email protected]>
appear in the area that follows. For example, consider the MIME-Version: 1.0
following message: Content-type: audio/basic
Content-transfer-encoding: base64
Content-type: message/external-body; ... first half of encoded audio data goes here ...
access-type=local-file; ... second half of encoded audio data goes here ...
name="/u/nsb/Me.jpeg"
Content-type: image/jpeg The inclusion of a "References" field in the headers of the second
Content-ID: <[email protected]> and subsequent pieces of a fragmented message that references the
Content-Transfer-Encoding: binary Message-Id on the previous piece may be of benefit to mail readers
that understand and track references. However, the generation of
such "References" fields is entirely optional.
THIS IS NOT REALLY THE BODY! Finally, it should be noted that the "Encrypted" header field has
been made obsolete by Privacy Enhanced Messaging (PEM) [RFC-1421,
RFC-1422, RFC-1423, RFC-1424], but the rules above are nevertheless
believed to describe the correct way to treat it if it is encountered
in the context of conversion to and from "message/partial" fragments.
The area at the end, which might be called the "phantom body", 5.2.3. External-Body Subtype
is ignored for most external-body messages. However, it may
be used to contain auxiliary information for some such
messages, as indeed it is when the access-type is "mail-
server". The only access-type defined in this document that
uses the phantom body is "mail-server", but other access-types
may be defined in the future in other documents that use this
area.
The encapsulated headers in ALL message/external-body entities The external-body subtype indicates that the actual body data are not
MUST include a Content-ID header field to give a unique included, but merely referenced. In this case, the parameters
identifier by which to reference the data. This identifier describe a mechanism for accessing the external data.
may be used for caching mechanisms, and for recognizing the
receipt of the data when the access-type is "mail-server".
Note that, as specified here, the tokens that describe When a MIME entity is of type "message/external-body", it consists of
external-body data, such as file names and mail server a header, two consecutive CRLFs, and the message header for the
commands, are required to be in the US-ASCII character set. encapsulated message. If another pair of consecutive CRLFs appears,
If this proves problematic in practice, a new mechanism may be this of course ends the message header for the encapsulated message.
required as a future extension to MIME, either as newly However, since the encapsulated message's body is itself external, it
defined access-types for message/external-body or by some does NOT appear in the area that follows. For example, consider the
other mechanism. following message:
As with message/partial, MIME entities of type Content-type: message/external-body;
message/external-body MUST have a content-transfer-encoding of access-type=local-file;
7bit (the default). In particular, even in environments that name="/u/nsb/Me.jpeg"
support binary or 8bit transport, the use of a content-
transfer-encoding of "8bit" or "binary" is explicitly
prohibited for entities of type message/external-body.
7.2.3.1. General External-Body Parameters Content-type: image/jpeg
Content-ID: <[email protected]>
Content-Transfer-Encoding: binary
The parameters that may be used with any message/external-body THIS IS NOT REALLY THE BODY!
are:
(1) ACCESS-TYPE -- A word indicating the supported access The area at the end, which might be called the "phantom body", is
mechanism by which the file or data may be obtained. ignored for most external-body messages. However, it may be used to
This word is not case sensitive. Values include, but contain auxiliary information for some such messages, as indeed it is
are not limited to, "FTP", "ANON-FTP", "TFTP", "LOCAL-FILE", when the access-type is "mail-server". The only access-type defined
and "MAIL-SERVER". Future values, except for in this document that uses the phantom body is "mail-server", but
experimental values beginning with "X-", must be other access-types may be defined in the future in other
registered with IANA, as described in RFC MIME-REG. specifications that use this area.
This parameter is unconditionally mandatory and MUST be
present on EVERY message/external-body.
(2) EXPIRATION -- The date (in the RFC 822 "date-time" The encapsulated headers in ALL "message/external-body" entities MUST
syntax, as extended by RFC 1123 to permit 4 digits in include a Content-ID header field to give a unique identifier by
the year field) after which the existence of the which to reference the data. This identifier may be used for caching
external data is not guaranteed. This parameter may be mechanisms, and for recognizing the receipt of the data when the
used with ANY access-type and is ALWAYS optional. access-type is "mail-server".
(3) SIZE -- The size (in octets) of the data. The intent Note that, as specified here, the tokens that describe external-body
of this parameter is to help the recipient decide data, such as file names and mail server commands, are required to be
whether or not to expend the necessary resources to in the US-ASCII character set.
retrieve the external data. Note that this describes
the size of the data in its canonical form, that is,
before any Content-Transfer-Encoding has been applied
or after the data have been decoded. This parameter
may be used with ANY access-type and is ALWAYS
optional.
(4) PERMISSION -- A case-insensitive field that indicates If this proves problematic in practice, a new mechanism may be
whether or not it is expected that clients might also required as a future extension to MIME, either as newly defined
attempt to overwrite the data. By default, or if access-types for "message/external-body" or by some other mechanism.
permission is "read", the assumption is that they are
not, and that if the data is retrieved once, it is
never needed again. If PERMISSION is "read-write",
this assumption is invalid, and any local copy must be
considered no more than a cache. "Read" and "Read-
write" are the only defined values of permission. This
parameter may be used with ANY access-type and is
ALWAYS optional.
The precise semantics of the access-types defined here are As with "message/partial", MIME entities of type "message/external-
described in the sections that follow. body" MUST have a content-transfer-encoding of 7bit (the default).
In particular, even in environments that support binary or 8bit
transport, the use of a content-transfer-encoding of "8bit" or
"binary" is explicitly prohibited for entities of type
"message/external-body".
7.2.3.2. The 'ftp' and 'tftp' Access-Types 5.2.3.1. General External-Body Parameters
An access-type of FTP or TFTP indicates that the message body The parameters that may be used with any "message/external-body"
is accessible as a file using the FTP [RFC-959] or TFTP [RFC- are:
783] protocols, respectively. For these access-types, the
following additional parameters are mandatory:
(1) NAME -- The name of the file that contains the actual (1) ACCESS-TYPE -- A word indicating the supported access
body data. mechanism by which the file or data may be obtained.
This word is not case sensitive. Values include, but
are not limited to, "FTP", "ANON-FTP", "TFTP", "LOCAL-
FILE", and "MAIL-SERVER". Future values, except for
experimental values beginning with "X-", must be
registered with IANA, as described in RFC 2048.
This parameter is unconditionally mandatory and MUST be
present on EVERY "message/external-body".
(2) SITE -- A machine from which the file may be obtained, (2) EXPIRATION -- The date (in the RFC 822 "date-time"
using the given protocol. This must be a fully syntax, as extended by RFC 1123 to permit 4 digits in
qualified domain name, not a nickname. the year field) after which the existence of the
external data is not guaranteed. This parameter may be
used with ANY access-type and is ALWAYS optional.
(3) Before any data are retrieved, using FTP, the user will (3) SIZE -- The size (in octets) of the data. The intent
generally need to be asked to provide a login id and a of this parameter is to help the recipient decide
password for the machine named by the site parameter. whether or not to expend the necessary resources to
For security reasons, such an id and password are not retrieve the external data. Note that this describes
specified as content-type parameters, but must be the size of the data in its canonical form, that is,
obtained from the user. before any Content-Transfer-Encoding has been applied
or after the data have been decoded. This parameter
may be used with ANY access-type and is ALWAYS
optional.
In addition, the following parameters are optional: (4) PERMISSION -- A case-insensitive field that indicates
whether or not it is expected that clients might also
attempt to overwrite the data. By default, or if
permission is "read", the assumption is that they are
not, and that if the data is retrieved once, it is
never needed again. If PERMISSION is "read-write",
this assumption is invalid, and any local copy must be
considered no more than a cache. "Read" and "Read-
write" are the only defined values of permission. This
parameter may be used with ANY access-type and is
ALWAYS optional.
(1) DIRECTORY -- A directory from which the data named by The precise semantics of the access-types defined here are described
NAME should be retrieved. in the sections that follow.
(2) MODE -- A case-insensitive string indicating the mode 5.2.3.2. The 'ftp' and 'tftp' Access-Types
to be used when retrieving the information. The valid
values for access-type "TFTP" are "NETASCII", "OCTET",
and "MAIL", as specified by the TFTP protocol [RFC-
783]. The valid values for access-type "FTP" are
"ASCII", "EBCDIC", "IMAGE", and "LOCALn" where "n" is a
decimal integer, typically 8. These correspond to the
representation types "A" "E" "I" and "L n" as specified
by the FTP protocol [RFC-959]. Note that "BINARY" and
"TENEX" are not valid values for MODE and that "OCTET"
or "IMAGE" or "LOCAL8" should be used instead. If MODE
is not specified, the default value is "NETASCII" for
TFTP and "ASCII" otherwise.
7.2.3.3. The 'anon-ftp' Access-Type An access-type of FTP or TFTP indicates that the message body is
accessible as a file using the FTP [RFC-959] or TFTP [RFC-783]
protocols, respectively. For these access-types, the following
additional parameters are mandatory:
The "anon-ftp" access-type is identical to the "ftp" access (1) NAME -- The name of the file that contains the actual
type, except that the user need not be asked to provide a name body data.
and password for the specified site. Instead, the ftp
protocol will be used with login "anonymous" and a password
that corresponds to the user's mail address.
7.2.3.4. The 'local-file' Access-Type (2) SITE -- A machine from which the file may be obtained,
using the given protocol. This must be a fully
qualified domain name, not a nickname.
An access-type of "local-file" indicates that the actual body (3) Before any data are retrieved, using FTP, the user will
is accessible as a file on the local machine. Two additional generally need to be asked to provide a login id and a
parameters are defined for this access type: password for the machine named by the site parameter.
For security reasons, such an id and password are not
specified as content-type parameters, but must be
obtained from the user.
(1) NAME -- The name of the file that contains the actual In addition, the following parameters are optional:
body data. This parameter is mandatory for the
"local-file" access-type.
(2) SITE -- A domain specifier for a machine or set of (1) DIRECTORY -- A directory from which the data named by
machines that are known to have access to the data NAME should be retrieved.
file. This optional parameter is used to describe the
locality of reference for the data, that is, the site
or sites at which the file is expected to be visible.
Asterisks may be used for wildcard matching to a part
of a domain name, such as "*.bellcore.com", to indicate
a set of machines on which the data should be directly
visible, while a single asterisk may be used to
indicate a file that is expected to be universally
available, e.g., via a global file system.
7.2.3.5. The 'mail-server' Access-Type (2) MODE -- A case-insensitive string indicating the mode
to be used when retrieving the information. The valid
values for access-type "TFTP" are "NETASCII", "OCTET",
and "MAIL", as specified by the TFTP protocol [RFC-
783]. The valid values for access-type "FTP" are
"ASCII", "EBCDIC", "IMAGE", and "LOCALn" where "n" is a
decimal integer, typically 8. These correspond to the
representation types "A" "E" "I" and "L n" as specified
by the FTP protocol [RFC-959]. Note that "BINARY" and
"TENEX" are not valid values for MODE and that "OCTET"
or "IMAGE" or "LOCAL8" should be used instead. If MODE
is not specified, the default value is "NETASCII" for
TFTP and "ASCII" otherwise.
The "mail-server" access-type indicates that the actual body 5.2.3.3. The 'anon-ftp' Access-Type
is available from a mail server. Two additional parameters
are defined for this access-type:
(1) SERVER -- The addr-spec of the mail server from which The "anon-ftp" access-type is identical to the "ftp" access type,
the actual body data can be obtained. This parameter except that the user need not be asked to provide a name and password
is mandatory for the "mail-server" access-type. for the specified site. Instead, the ftp protocol will be used with
login "anonymous" and a password that corresponds to the user's mail
address.
(2) SUBJECT -- The subject that is to be used in the mail 5.2.3.4. The 'local-file' Access-Type
that is sent to obtain the data. Note that keying mail
servers on Subject lines is NOT recommended, but such
mail servers are known to exist. This is an optional
parameter.
Because mail servers accept a variety of syntaxes, some of An access-type of "local-file" indicates that the actual body is
which are multiline, the full command to be sent to a mail accessible as a file on the local machine. Two additional parameters
server is not included as a parameter in the content-type are defined for this access type:
header field. Instead, it is provided as the "phantom body"
when the media type is message/external-body and the access-
type is mail-server.
Note that MIME does not define a mail server syntax. Rather, (1) NAME -- The name of the file that contains the actual
it allows the inclusion of arbitrary mail server commands in body data. This parameter is mandatory for the
the phantom body. An implementation must include the phantom "local-file" access-type.
body in the body of the message it sends to the mail server
address to retrieve the relevant data.
Unlike other access-types, mail-server access is asynchronous (2) SITE -- A domain specifier for a machine or set of
and will happen at an unpredictable time in the future. For machines that are known to have access to the data
this reason, it is important that there be a mechanism by file. This optional parameter is used to describe the
which the returned data can be matched up with the original locality of reference for the data, that is, the site
message/external-body entity. MIME mail servers must use the or sites at which the file is expected to be visible.
same Content-ID field on the returned message that was used in Asterisks may be used for wildcard matching to a part
the original message/external-body entities, to facilitate of a domain name, such as "*.bellcore.com", to indicate
such matching. a set of machines on which the data should be directly
visible, while a single asterisk may be used to
indicate a file that is expected to be universally
available, e.g., via a global file system.
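
The asterisk wildcards permitted in the SITE parameter of the "local-file" access-type can be sketched with shell-style pattern matching. This is an assumption about the intended matching semantics, which the text does not spell out in detail; the function name "site_matches" is hypothetical, and Python's standard "fnmatch" module is used for illustration.

```python
from fnmatch import fnmatchcase


def site_matches(site_param, local_host):
    """Return True when local_host falls within the SITE specifier.

    Domain names compare case-insensitively, so both sides are lowercased
    before matching. A lone "*" matches every host name.
    """
    return fnmatchcase(local_host.lower(), site_param.lower())
```

Under this sketch "*.bellcore.com" matches "thumper.bellcore.com" but not the bare name "bellcore.com"; whether the bare domain should match is left open by the text.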
7.2.3.6. External-Body Security Issues 5.2.3.5. The 'mail-server' Access-Type
Message/external-body entities give rise to two important The "mail-server" access-type indicates that the actual body is
security issues: available from a mail server. Two additional parameters are defined
for this access-type:
(1) Accessing data via a message/external-body reference (1) SERVER -- The addr-spec of the mail server from which
effectively results in the message recipient performing the actual body data can be obtained. This parameter
an operation that was specified by the message is mandatory for the "mail-server" access-type.
originator. It is therefore possible for the message
originator to trick a recipient into doing something
they would not have done otherwise. For example, an
originator could specify an action that attempts
retrieval of material that the recipient is not
authorized to obtain, causing the recipient to
unwittingly violate some security policy. For this
reason, user agents capable of resolving external
references must always take steps to describe the
action they are to take to the recipient and ask for
explicit permission prior to performing it.
The 'mail-server' access-type is particularly (2) SUBJECT -- The subject that is to be used in the mail
vulnerable, in that it causes the recipient to send a that is sent to obtain the data. Note that keying mail
new message whose contents are specified by the servers on Subject lines is NOT recommended, but such
original message's originator. Given the potential for mail servers are known to exist. This is an optional
abuse, any such request messages that are constructed parameter.
should contain a clear indication that they were
generated automatically (e.g. in a Comments: header
field) in an attempt to resolve a MIME
message/external-body reference.
(2) MIME will sometimes be used in environments that Because mail servers accept a variety of syntaxes, some of which are
provide some guarantee of message integrity and multiline, the full command to be sent to a mail server is not
authenticity. If present, such guarantees may apply included as a parameter in the content-type header field. Instead,
only to the actual direct content of messages -- they it is provided as the "phantom body" when the media type is
may or may not apply to data accessed through MIME's "message/external-body" and the access-type is mail-server.
message/external-body mechanism. In particular, it may
be possible to subvert certain access mechanisms even
when the messaging system itself is secure.
It should be noted that this problem exists either with Note that MIME does not define a mail server syntax. Rather, it
or without the availability of MIME mechanisms. A allows the inclusion of arbitrary mail server commands in the phantom
casual reference to an FTP site containing a document body. An implementation must include the phantom body in the body of
in the text of a secure message brings up similar the message it sends to the mail server address to retrieve the
issues -- the only difference is that MIME provides for relevant data.
automatic retrieval of such material, and users may
place unwarranted trust in such automatic retrieval
mechanisms.
7.2.3.7. Examples and Further Explanations Unlike other access-types, mail-server access is asynchronous and
will happen at an unpredictable time in the future. For this reason,
it is important that there be a mechanism by which the returned data
can be matched up with the original "message/external-body" entity.
MIME mail servers must use the same Content-ID field on the returned
message that was used in the original "message/external-body"
entities, to facilitate such matching.
When the external-body mechanism is used in conjunction with 5.2.3.6. External-Body Security Issues
the multipart/alternative media type it extends the
functionality of multipart/alternative to include the case
where the same entity is provided in the same format but via
different access mechanisms. When this is done the originator
of the message must order the parts first in terms of
preferred formats and then by preferred access mechanisms.
The recipient's viewer should then evaluate the list both in
terms of format and access mechanisms.
With the emerging possibility of very wide-area file systems, "Message/external-body" entities give rise to two important security
it becomes very hard to know in advance the set of machines issues:
where a file will and will not be accessible directly from the
file system. Therefore it may make sense to provide both a
file name, to be tried directly, and the name of one or more
sites from which the file is known to be accessible. An
implementation can try to retrieve remote files using FTP or
any other protocol, using anonymous file retrieval or
prompting the user for the necessary name and password. If an
external body is accessible via multiple mechanisms, the
sender may include multiple entities of type
message/external-body within the body parts of an enclosing
multipart/alternative entity.
However, the external-body mechanism is not intended to be (1) Accessing data via a "message/external-body" reference
limited to file retrieval, as shown by the mail-server effectively results in the message recipient performing
access-type. Beyond this, one can imagine, for example, using an operation that was specified by the message
a video server for external references to video clips. originator. It is therefore possible for the message
originator to trick a recipient into doing something
they would not have done otherwise. For example, an
originator could specify a action that attempts
retrieval of material that the recipient is not
authorized to obtain, causing the recipient to
unwittingly violate some security policy. For this
reason, user agents capable of resolving external
references must always take steps to describe the
action they are to take to the recipient and ask for
explicit permission prior to performing it.
The embedded message header fields which appear in the body of The 'mail-server' access-type is particularly
the message/external-body data must be used to declare the vulnerable, in that it causes the recipient to send a
media type of the external body if it is anything other than new message whose contents are specified by the
plain US-ASCII text, since the external body does not have a original message's originator. Given the potential for
header section to declare its type. Similarly, any Content- abuse, any such request messages that are constructed
transfer-encoding other than "7bit" must also be declared should contain a clear indication that they were
here. Thus a complete message/external-body message, generated automatically (e.g. in a Comments: header
referring to a document in PostScript format, might look like field) in an attempt to resolve a MIME
this: "message/external-body" reference.
From: Whomever (2) MIME will sometimes be used in environments that
To: Someone provide some guarantee of message integrity and
Date: Whenever authenticity. If present, such guarantees may apply
Subject: whatever only to the actual direct content of messages -- they
MIME-Version: 1.0 may or may not apply to data accessed through MIME's
Message-ID: <[email protected]> "message/external-body" mechanism. In particular, it
Content-Type: multipart/alternative; boundary=42 may be possible to subvert certain access mechanisms
Content-ID: <[email protected]> even when the messaging system itself is secure.
--42 It should be noted that this problem exists either with
Content-Type: message/external-body; name="BodyFormats.ps"; or without the availability of MIME mechanisms. A
site="thumper.bellcore.com"; mode="image"; casual reference to an FTP site containing a document
access-type=ANON-FTP; directory="pub"; in the text of a secure message brings up similar
expiration="Fri, 14 Jun 1991 19:13:14 -0400 (EDT)" issues -- the only difference is that MIME provides for
automatic retrieval of such material, and users may
place unwarranted trust in such automatic retrieval
mechanisms.
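The requirement in item (1) above, that a user agent describe the action and obtain explicit permission before resolving an external reference, might be sketched as follows. This is a minimal illustration, not a definitive implementation: the function names and the `confirm` and `fetch` callbacks are hypothetical and are not defined anywhere in this document.

```python
from typing import Callable, Mapping, Optional


def describe_action(access_type: str, params: Mapping[str, str]) -> str:
    """Build a human-readable description of the retrieval about to happen."""
    details = ", ".join(f"{key}={value}" for key, value in sorted(params.items()))
    return f"Retrieve external body via access-type '{access_type}' ({details})"


def resolve_with_consent(
    access_type: str,
    params: Mapping[str, str],
    confirm: Callable[[str], bool],
    fetch: Callable[[str, Mapping[str, str]], bytes],
) -> Optional[bytes]:
    """Resolve an external reference only after the recipient agrees.

    `confirm` (hypothetical) shows the description to the recipient and
    returns True only on explicit permission. `fetch` (hypothetical)
    performs the actual retrieval and is never called without consent.
    """
    description = describe_action(access_type, params)
    if not confirm(description):
        return None  # Recipient declined: perform no action at all.
    return fetch(access_type, params)
```

A caller would supply its own prompt as `confirm`, for example a dialog that displays the description string and waits for the recipient's answer.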
Content-type: application/postscript 5.2.3.7. Examples and Further Explanations
Content-ID: <[email protected]>
--42 When the external-body mechanism is used in conjunction with the
Content-Type: message/external-body; access-type=local-file; "multipart/alternative" media type it extends the functionality of
name="/u/nsb/writing/rfcs/RFC-MIME.ps"; "multipart/alternative" to include the case where the same entity is
site="thumper.bellcore.com"; provided in the same format but via different access mechanisms.
expiration="Fri, 14 Jun 1991 19:13:14 -0400 (EDT)" When this is done the originator of the message must order the parts
first in terms of preferred formats and then by preferred access
mechanisms. The recipient's viewer should then evaluate the list
both in terms of format and access mechanisms.
Content-type: application/postscript With the emerging possibility of very wide-area file systems, it
Content-ID: <[email protected]> becomes very hard to know in advance the set of machines where a file
will and will not be accessible directly from the file system.
Therefore it may make sense to provide both a file name, to be tried
directly, and the name of one or more sites from which the file is
known to be accessible. An implementation can try to retrieve remote
files using FTP or any other protocol, using anonymous file retrieval
or prompting the user for the necessary name and password. If an
external body is accessible via multiple mechanisms, the sender may
include multiple entities of type "message/external-body" within the
body parts of an enclosing "multipart/alternative" entity.
--42 However, the external-body mechanism is not intended to be limited to
Content-Type: message/external-body; file retrieval, as shown by the mail-server access-type. Beyond
access-type=mail-server; this, one can imagine, for example, using a video server for external
server="[email protected]"; references to video clips.
expiration="Fri, 14 Jun 1991 19:13:14 -0400 (EDT)"
Content-type: application/postscript The embedded message header fields which appear in the body of the
Content-ID: <[email protected]> "message/external-body" data must be used to declare the media type
of the external body if it is anything other than plain US-ASCII
text, since the external body does not have a header section to
declare its type. Similarly, any Content-transfer-encoding other
than "7bit" must also be declared here. Thus a complete
"message/external-body" message, referring to an object in PostScript
format, might look like this:
get RFC-MIME.DOC From: Whomever
To: Someone
Date: Whenever
Subject: whatever
MIME-Version: 1.0
Message-ID: <[email protected]>
Content-Type: multipart/alternative; boundary=42
Content-ID: <[email protected]>
--42-- --42
Content-Type: message/external-body; name="BodyFormats.ps";
site="thumper.bellcore.com"; mode="image";
access-type=ANON-FTP; directory="pub";
expiration="Fri, 14 Jun 1991 19:13:14 -0400 (EDT)"
Note that in the above examples, the default Content- Content-type: application/postscript
transfer-encoding of "7bit" is assumed for the external Content-ID: <[email protected]>
postscript data.
Like the message/partial type, the message/external-body media --42
type is intended to be transparent, that is, to convey the Content-Type: message/external-body; access-type=local-file;
data type in the external body rather than to convey a message name="/u/nsb/writing/rfcs/RFC-MIME.ps";
with a body of that type. Thus the headers on the outer and site="thumper.bellcore.com";
inner parts must be merged using the same rules as for expiration="Fri, 14 Jun 1991 19:13:14 -0400 (EDT)"
message/partial. In particular, this means that the Content-
type and Subject fields are overridden, but the From field is
preserved.
Note that since the external bodies are not transported along Content-type: application/postscript
with the external body reference, they need not conform to Content-ID: <[email protected]>
transport limitations that apply to the reference itself. In
particular, Internet mail transports may impose 7bit and line
length limits, but these do not automatically apply to binary
external body references. Thus a Content-Transfer-Encoding is
not generally necessary, though it is permitted.
Note that the body of a message of type "message/external- --42
body" is governed by the basic syntax for an RFC 822 message. Content-Type: message/external-body;
In particular, anything before the first consecutive pair of access-type=mail-server;
CRLFs is header information, while anything after it is body server="[email protected]";
information, which is ignored for most access-types. expiration="Fri, 14 Jun 1991 19:13:14 -0400 (EDT)"
7.2.4. Other Message Subtypes Content-type: application/postscript
Content-ID: <[email protected]>
MIME implementations must in general treat unrecognized get RFC-MIME.DOC
subtypes of message as being equivalent to
"application/octet-stream".
Future subtypes of message intended for use with email should --42--
be restricted to "7bit" encoding. A type other than message
should be used if restriction to "7bit" is not possible.
8. Experimental Media Type Values Note that in the above examples, the default Content-transfer-
encoding of "7bit" is assumed for the external postscript data.
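As a rough illustration of how a recipient might unpack a "message/external-body" entity like the mail-server example above, the sketch below uses Python's standard `email` package. The addresses, the Content-ID value, and the function name `describe_external_body` are hypothetical stand-ins, and lower-casing the access-type is a convenience chosen for this sketch.

```python
from email import message_from_string

# A single external-body entity modelled on the mail-server example.
# The server address and Content-ID are hypothetical placeholders.
RAW = (
    'Content-Type: message/external-body; access-type=mail-server;'
    ' server="listserv@example.com"\n'
    "\n"
    "Content-Type: application/postscript\n"
    "Content-ID: <id42@example.com>\n"
    "\n"
    "get RFC-MIME.DOC\n"
)


def describe_external_body(raw: str) -> dict:
    """Extract the access parameters and the declared type of the external body."""
    outer = message_from_string(raw)
    if outer.get_content_type() != "message/external-body":
        raise ValueError("not a message/external-body entity")
    # The body of the entity is itself parsed as a message: its header
    # fields declare the media type and encoding of the external data.
    inner = outer.get_payload(0)
    return {
        "access_type": str(outer.get_param("access-type", "")).lower(),
        "server": outer.get_param("server"),
        # Falls back to text/plain when no Content-Type is declared.
        "media_type": inner.get_content_type(),
        # "7bit" is assumed when no Content-Transfer-Encoding is declared.
        "encoding": inner.get("Content-Transfer-Encoding", "7bit").lower(),
        # Body text after the embedded header fields.
        "body": inner.get_payload(),
    }


info = describe_external_body(RAW)
```

Here `info["media_type"]` carries the type declared by the embedded header fields, while `info["encoding"]` falls back to "7bit" because the example declares no Content-Transfer-Encoding.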
A media type value beginning with the characters "X-" is a Like the "message/partial" type, the "message/external-body" media
private value, to be used by consenting systems by mutual type is intended to be transparent, that is, to convey the data type
agreement. Any format without a rigorous and public in the external body rather than to convey a message with a body of
definition must be named with an "X-" prefix, and publicly that type. Thus the headers on the outer and inner parts must be
specified values shall never begin with "X-". (Older versions merged using the same rules as for "message/partial". In particular,
of the widely used Andrew system use the "X-BE2" name, so new this means that the Content-type and Subject fields are overridden,
systems should probably choose a different name.) but the From field is preserved.
In general, the use of "X-" top-level types is strongly
discouraged. Implementors should invent subtypes of the
existing types whenever possible. In many cases, a subtype of
application will be more appropriate than a new top-level
type.
9. Summary Note that since the external bodies are not transported along with
the external body reference, they need not conform to transport
limitations that apply to the reference itself. In particular,
Internet mail transports may impose 7bit and line length limits, but
these do not automatically apply to binary external body references.
Thus a Content-Transfer-Encoding is not generally necessary, though
it is permitted.
The five discrete media types provide a standardized Note that the body of a message of type "message/external-body" is
mechanism for tagging entities as audio, image, or several governed by the basic syntax for an RFC 822 message. In particular,
other kinds of data. The composite "multipart" and "message" anything before the first consecutive pair of CRLFs is header
media types allow mixing and hierarchical structuring of information, while anything after it is body information, which is
entities of different types in a single message. A ignored for most access-types.
distinguished parameter syntax allows further specification of
data format details, particularly the specification of
alternate character sets. Additional optional header fields
provide mechanisms for certain extensions deemed desirable by
many implementors. Finally, a number of useful media types are
defined for general use by consenting user agents, notably
message/partial, and message/external-body.
10. Security Considerations 5.2.4. Other Message Subtypes
Security issues are discussed in the context of the MIME implementations must in general treat unrecognized subtypes of
application/postscript type, the message/external-body type, "message" as being equivalent to "application/octet-stream".
and in RFC MIME-REG. Implementors should pay special
attention to the security implications of any media types that
can cause the remote execution of any actions in the
recipient's environment. In such cases, the discussion of the
application/postscript type may serve as a model for
considering other media types with remote execution
capabilities.
11. Authors' Addresses Future subtypes of "message" intended for use with email should be
restricted to "7bit" encoding. A type other than "message" should be
used if restriction to "7bit" is not possible.
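The fallback rule for unrecognized "message" subtypes could be expressed as in the sketch below. The set of recognized subtypes and the function name are assumptions made for illustration; a real implementation would list whichever subtypes it actually supports.

```python
# Subtypes of "message" this hypothetical implementation understands.
KNOWN_MESSAGE_SUBTYPES = frozenset({"rfc822", "partial", "external-body"})


def effective_media_type(media_type: str, known=KNOWN_MESSAGE_SUBTYPES) -> str:
    """Return the media type an implementation should act on.

    Unrecognized subtypes of "message" are treated as equivalent to
    "application/octet-stream"; every other value is returned
    normalized to lower case.
    """
    main, _, sub = media_type.strip().lower().partition("/")
    if main == "message" and sub not in known:
        return "application/octet-stream"
    return f"{main}/{sub}"
```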
For more information, the authors of this document are best 6. Experimental Media Type Values
contacted via Internet mail:
Nathaniel S. Borenstein A media type value beginning with the characters "X-" is a private
First Virtual Holdings value, to be used by consenting systems by mutual agreement. Any
25 Washington Avenue format without a rigorous and public definition must be named with an
Morristown, NJ 07960 "X-" prefix, and publicly specified values shall never begin with
USA "X-". (Older versions of the widely used Andrew system use the
"X-BE2" name, so new systems should probably choose a different name.)
Email: [email protected] In general, the use of "X-" top-level types is strongly discouraged.
Phone: +1 201 540 8967 Implementors should invent subtypes of the existing types whenever
Fax: +1 201 993 3032 possible. In many cases, a subtype of "application" will be more
appropriate than a new top-level type.
Ned Freed 7. Summary
Innosoft International, Inc.
1050 East Garvey Avenue South
West Covina, CA 91790
USA
Email: [email protected] The five discrete media types provide a standardized
Phone: +1 818 919 3600 mechanism for tagging entities as "audio", "image", or several other
Fax: +1 818 919 3614 kinds of data. The composite "multipart" and "message" media types
allow mixing and hierarchical structuring of entities of different
types in a single message. A distinguished parameter syntax allows
further specification of data format details, particularly the
specification of alternate character sets. Additional optional
header fields provide mechanisms for certain extensions deemed
desirable by many implementors. Finally, a number of useful media
types are defined for general use by consenting user agents, notably
"message/partial" and "message/external-body".
MIME is a result of the work of the Internet Engineering Task 8. Security Considerations
Force Working Group on Email Extensions. The chairman of that
group, Greg Vaudreuil, may be reached at:
Gregory M. Vaudreuil Security issues are discussed in the context of the
Octel Network Services "application/postscript" type, the "message/external-body" type, and
17080 Dallas Parkway in RFC 2048. Implementors should pay special attention to the
Dallas, TX 75248-1905 security implications of any media types that can cause the remote
USA execution of any actions in the recipient's environment. In such
cases, the discussion of the "application/postscript" type may serve
as a model for considering other media types with remote execution
capabilities.
Email: [email protected] 9. Authors' Addresses
Appendix A -- Collected Grammar
This appendix contains the complete BNF grammar for all the For more information, the authors of this document are best contacted
syntax specified by this document. via Internet mail:
By itself, however, this grammar is incomplete. It refers by Ned Freed
name to several syntax rules that are defined by RFC 822. Innosoft International, Inc.
Rather than reproduce those definitions here, and risk 1050 East Garvey Avenue South
unintentional differences between the two, this document West Covina, CA 91790
simply refers the reader to RFC 822 for the remaining USA
definitions. Wherever a term is undefined, it refers to the
RFC 822 definition.
boundary := 0*69<bchars> bcharsnospace Phone: +1 818 919 3600
Fax: +1 818 919 3614
EMail: [email protected]
bchars := bcharsnospace / " " Nathaniel S. Borenstein
First Virtual Holdings
25 Washington Avenue
Morristown, NJ 07960
USA
bcharsnospace := DIGIT / ALPHA / "'" / "(" / ")" / Phone: +1 201 540 8967
"+" / "_" / "," / "-" / "." / Fax: +1 201 993 3032
"/" / ":" / "=" / "?" EMail: [email protected]
body-part := <"message" as defined in RFC 822, with all MIME is a result of the work of the Internet Engineering Task Force
header fields optional, not starting with the Working Group on RFC 822 Extensions. The chairman of that group,
specified dash-boundary, and with the Greg Vaudreuil, may be reached at:
delimiter not occurring anywhere in the
body part. Note that the semantics of a
part differ from the semantics of a message,
as described in the text.>
close-delimiter := delimiter "--" Gregory M. Vaudreuil
Octel Network Services
17080 Dallas Parkway
Dallas, TX 75248-1905
USA
dash-boundary := "--" boundary EMail: [email protected]
; boundary taken from the value of
; boundary parameter of the
; Content-Type field.
delimiter := CRLF dash-boundary Appendix A -- Collected Grammar
discard-text := *(*text CRLF) This appendix contains the complete BNF grammar for all the syntax
; May be ignored or discarded. specified by this document.
encapsulation := delimiter transport-padding By itself, however, this grammar is incomplete. It refers by name to
CRLF body-part several syntax rules that are defined by RFC 822. Rather than
reproduce those definitions here, and risk unintentional differences
between the two, this document simply refers the reader to RFC 822
for the remaining definitions. Wherever a term is undefined, it
refers to the RFC 822 definition.
epilogue := discard-text boundary := 0*69<bchars> bcharsnospace
multipart-body := [preamble CRLF] bchars := bcharsnospace / " "
dash-boundary transport-padding CRLF
body-part *encapsulation
close-delimiter transport-padding
[CRLF epilogue]
preamble := discard-text bcharsnospace := DIGIT / ALPHA / "'" / "(" / ")" /
"+" / "_" / "," / "-" / "." /
"/" / ":" / "=" / "?"
transport-padding := *LWSP-char body-part := <"message" as defined in RFC 822, with all
; Composers MUST NOT generate header fields optional, not starting with the
; non-zero length transport specified dash-boundary, and with the
; padding, but receivers MUST delimiter not occurring anywhere in the
; be able to handle padding body part. Note that the semantics of a
; added by message transports. part differ from the semantics of a message,
as described in the text.>
close-delimiter := delimiter "--"
dash-boundary := "--" boundary
; boundary taken from the value of
; boundary parameter of the
; Content-Type field.
delimiter := CRLF dash-boundary
discard-text := *(*text CRLF)
; May be ignored or discarded.
encapsulation := delimiter transport-padding
CRLF body-part
epilogue := discard-text
multipart-body := [preamble CRLF]
dash-boundary transport-padding CRLF
body-part *encapsulation
close-delimiter transport-padding
[CRLF epilogue]
preamble := discard-text
transport-padding := *LWSP-char
; Composers MUST NOT generate
; non-zero length transport
; padding, but receivers MUST
; be able to handle padding
; added by message transports.
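The "boundary", "bchars" and "bcharsnospace" rules above translate fairly directly into a regular expression. The following is a minimal sketch of a validator and of the "dash-boundary" and "close-delimiter" productions; the function names are chosen here for illustration only.

```python
import re

# bcharsnospace: DIGIT / ALPHA / "'" / "(" / ")" / "+" / "_" / "," /
#                "-" / "." / "/" / ":" / "=" / "?"
_BCHARSNOSPACE = r"0-9A-Za-z'()+_,\-./:=?"

# boundary := 0*69<bchars> bcharsnospace
# That is, 1 to 70 characters, where a space may appear anywhere
# except in the final position.
_BOUNDARY_RE = re.compile(
    "[" + _BCHARSNOSPACE + " ]{0,69}[" + _BCHARSNOSPACE + "]"
)


def is_valid_boundary(value: str) -> bool:
    """Check a boundary parameter value against the grammar."""
    return _BOUNDARY_RE.fullmatch(value) is not None


def dash_boundary(boundary: str) -> str:
    """dash-boundary := "--" boundary"""
    if not is_valid_boundary(boundary):
        raise ValueError(f"invalid boundary: {boundary!r}")
    return "--" + boundary


def close_delimiter(boundary: str) -> str:
    """close-delimiter := delimiter "--", with delimiter := CRLF dash-boundary"""
    return "\r\n" + dash_boundary(boundary) + "--"
```

For the boundary "42" used in the examples, `dash_boundary("42")` gives the "--42" line that opens each part and `close_delimiter("42")` ends with the "--42--" line that terminates the multipart body.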
 End of changes. 358 change blocks. 
1748 lines changed or deleted 1641 lines changed or added
