regex for media types

From: Mark Nottingham <mark.nottingham@bea.com>
Date: Tue, 29 Jul 2003 15:25:48 -0700
To: "Xml-Dist-App@W3. Org" <xml-dist-app@w3.org>
Message-Id: <9A570914-C213-11D7-84D2-00039396E15A@bea.com>

I took an action item to double-check what a regex for a media type 
should look like.

RFC 2045 defines the syntax of the Content-Type header as:

[[[
      content := "Content-Type" ":" type "/" subtype
                 *(";" parameter)
                 ; Matching of media type and subtype
                 ; is ALWAYS case-insensitive.

      type := discrete-type / composite-type

      discrete-type := "text" / "image" / "audio" / "video" /
                       "application" / extension-token

      composite-type := "message" / "multipart" / extension-token

      extension-token := ietf-token / x-token

      ietf-token := <An extension token defined by a
                     standards-track RFC and registered
                     with IANA.>

      x-token := <The two characters "X-" or "x-" followed, with
                  no intervening white space, by any token>

      subtype := extension-token / iana-token

      iana-token := <A publicly-defined extension token. Tokens
                     of this form must be registered with IANA
                     as specified in RFC 2048.>

      parameter := attribute "=" value

      attribute := token
                   ; Matching of attributes
                   ; is ALWAYS case-insensitive.

      value := token / quoted-string

      token := 1*<any (US-ASCII) CHAR except SPACE, CTLs,
                  or tspecials>

      tspecials :=  "(" / ")" / "<" / ">" / "@" /
                    "," / ";" / ":" / "\" / <">
                    "/" / "[" / "]" / "?" / "="
                    ; Must be in quoted-string,
                    ; to use within parameter values
]]]
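
As a quick check on the token rule, the set of characters it permits can be computed directly from the definition above. This is a minimal Python sketch; the names TSPECIALS and token_chars are illustrative and do not come from the RFC.

```python
# tspecials exactly as listed in the RFC 2045 grammar quoted above.
TSPECIALS = set('()<>@,;:\\"/[]?=')

def token_chars():
    """Return the US-ASCII characters allowed in a token, as a string.

    CHAR is 0-127; CTLs are 0-31 and 127; SPACE is 32.
    That leaves 33-126, minus the tspecials.
    """
    return "".join(chr(c) for c in range(33, 127) if chr(c) not in TSPECIALS)

print(token_chars())
```

Everything left over besides letters and digits is punctuation, and those are the characters a regex character class for token has to list explicitly.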

The interesting expression is:
    type "/" subtype

Given that the type rule is extensible through IANA registration, it seems 
most sensible NOT to enumerate the current types. Both type and subtype 
then reduce to the token rule, which leaves us with:

    [a-zA-Z0-9!#$%^&\*_\-\+{}\|'.`~]+/[a-zA-Z0-9!#$%^&\*_\-\+{}\|'.`~]+
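
To sanity-check an expression of this shape, here is a small Python sketch. It assumes Python's re syntax rather than any particular schema language, and it places the hyphen last in the character class so that it is read as a literal. The names TOKEN, MEDIA_TYPE and is_media_type are hypothetical. It checks only the type "/" subtype part, not any parameters.

```python
import re

# Token characters per RFC 2045: US-ASCII except SPACE, CTLs, and tspecials.
# The hyphen is last in the class so it is a literal, not a range operator.
TOKEN = r"[a-zA-Z0-9!#$%^&*_+{}|'.`~-]+"

# type "/" subtype; fullmatch() below anchors the pattern at both ends.
MEDIA_TYPE = re.compile(TOKEN + "/" + TOKEN)

def is_media_type(s):
    """Return True if s is syntactically type "/" subtype (no parameters)."""
    return MEDIA_TYPE.fullmatch(s) is not None

for candidate in ["text/xml", "application/soap+xml", "Image/PNG",
                  "text/", "text/xml; charset=utf-8", "a/b/c"]:
    print(candidate, is_media_type(candidate))
```

The pattern accepts either case. Since RFC 2045 says type and subtype matching is always case-insensitive, a caller would normally lowercase the value before comparing it against a known media type.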
