regex for media types

I took an action item to double-check what a regex for a media type 
should look like.

RFC 2045 defines the syntax of the Content-Type header as:

[[[
      content := "Content-Type" ":" type "/" subtype
                 *(";" parameter)
                 ; Matching of media type and subtype
                 ; is ALWAYS case-insensitive.

      type := discrete-type / composite-type

      discrete-type := "text" / "image" / "audio" / "video" /
                       "application" / extension-token

      composite-type := "message" / "multipart" / extension-token

      extension-token := ietf-token / x-token

      ietf-token := <An extension token defined by a
                     standards-track RFC and registered
                     with IANA.>

      x-token := <The two characters "X-" or "x-" followed, with
                  no intervening white space, by any token>

      subtype := extension-token / iana-token

      iana-token := <A publicly-defined extension token. Tokens
                     of this form must be registered with IANA
                     as specified in RFC 2048.>

      parameter := attribute "=" value

      attribute := token
                   ; Matching of attributes
                   ; is ALWAYS case-insensitive.

      value := token / quoted-string

      token := 1*<any (US-ASCII) CHAR except SPACE, CTLs,
                  or tspecials>

      tspecials :=  "(" / ")" / "<" / ">" / "@" /
                    "," / ";" / ":" / "\" / <">
                    "/" / "[" / "]" / "?" / "="
                    ; Must be in quoted-string,
                    ; to use within parameter values
]]]
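
As a cross-check on the token rule quoted above, here is a small Python
sketch that derives the set of characters a token may contain. The names
TSPECIALS, token_chars and token_punct are mine, not from the RFC, and the
sketch assumes "CHAR" means US-ASCII 0-127 with CTLs being 0-31 and 127.

```python
import string

# tspecials exactly as listed in the RFC 2045 grammar quoted above.
TSPECIALS = set('()<>@,;:\\"/[]?=')

# US-ASCII CHAR minus CTLs (0-31, 127) and SPACE (32) leaves 33..126.
printable = {chr(c) for c in range(33, 127)}

# A token may use any of those characters except the tspecials.
token_chars = printable - TSPECIALS

# The punctuation that has to sit in a character class next to
# a-z, A-Z and 0-9.
alnum = set(string.ascii_letters + string.digits)
token_punct = "".join(sorted(token_chars - alnum))

print(token_punct)       # !#$%&'*+-.^_`{|}~
print(len(token_punct))  # 17
```

So a token is built from letters, digits and those 17 punctuation
characters, and nothing else.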

The interesting expression is:
    type "/" subtype

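Given that the type rule is extensible (new values can be registered 
with IANA, and x-tokens are always allowed), it seems most sensible 
NOT to enumerate the current types. Both type and subtype are then 
just tokens, which leaves us with one token on each side of the slash: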

    [a-zA-Z0-9!#$%^&\*_\-\+{}\|'.`~]+/[a-zA-Z0-9!#$%^&\*_\-\+{}\|'.`~]+
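
For anyone who wants to try the expression out, here is a rough Python
sketch. It is only an illustration: the helper name is_media_type is
hypothetical, and the hyphen inside the character class is written as \-
so that Python's re module reads it as a literal character rather than as
a range operator. The sketch matches the bare type "/" subtype only, with
no parameters and no surrounding white space.

```python
import re

# One RFC 2045 token: letters, digits and the permitted punctuation.
# The hyphen is escaped so it is taken literally, not as a range.
TOKEN = r"[a-zA-Z0-9!#$%^&\*_\-\+{}\|'.`~]+"

# type "/" subtype, nothing before or after.
MEDIA_TYPE = re.compile(TOKEN + "/" + TOKEN)


def is_media_type(s: str) -> bool:
    """Return True if s is exactly one token, a slash, and one token."""
    return MEDIA_TYPE.fullmatch(s) is not None


print(is_media_type("text/html"))                 # True
print(is_media_type("application/xhtml+xml"))     # True
print(is_media_type("TEXT/HTML"))                 # True
print(is_media_type("text/html; charset=utf-8"))  # False, has a parameter
print(is_media_type("text/"))                     # False, empty subtype
print(is_media_type("a/b/c"))                     # False, "/" is a tspecial
```

The class lists both cases explicitly, so the pattern accepts any mix of
upper and lower case. Comparing two media types for equality would still
need case folding, since the RFC says matching is case-insensitive.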

Received on Tuesday, 29 July 2003 18:25:54 UTC