MIME and X(HT)ML (Re: MIME types vs. DOCTYPE) from Rick Jelliffe on 1999-02-25 (www-html-editor@w3.org from January to March 1999)

From: Rick Jelliffe <ricko@allette.com.au>
Date: Fri, 26 Feb 1999 04:01:50 +1100
To: "xml mailing list" <xml-dev@ic.ac.uk>
Cc: <www-html-editor@w3.org>
Message-ID: <00c601be60e0$8d772890$3df96d8c@NT.JELLIFFE.COM.AU>
The format of MIME types is given in Freed and Borenstein,
"Multipurpose Internet Mail Extensions (MIME) Part Two: Media Types"
(ftp://ftp.isi.edu/in-notes/rfc2046.txt).

That RFC allows anyone to go
    text/X-???
    application/X-???
where ??? is any name you like. For example, text/X-xml-vml or whatever.
I think it would be more practical to have the XML MIME media type
allow "*/xml-???" for IANA-registered DTDs and "*/xml-X-???" for
private-use DTDs. That shows the defaulting OK. RFC 2046 might
need to be altered accordingly to allow these kinds of subtyping.
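
To make the proposed naming concrete, here is a minimal sketch of how a
receiver might sort media types under that convention. The function name,
the regular expression, and the "registered"/"private"/"other" labels are
illustrative assumptions only; nothing here is defined by RFC 2046 or
RFC 2376.

```python
import re

# Hypothetical pattern for the convention suggested above:
#   */xml-NAME    -> IANA-registered DTD
#   */xml-X-NAME  -> private-use DTD
_XML_SUBTYPE = re.compile(
    r"^xml-(?P<private>x-)?(?P<name>[a-z0-9][a-z0-9.+_-]*)$",
    re.IGNORECASE,
)


def classify_media_type(media_type):
    """Return (top_level_type, kind, dtd_name) for a media type string.

    kind is "registered", "private", or "other" (not an xml-??? subtype).
    """
    top, _, sub = media_type.strip().partition("/")
    if not top or not sub:
        raise ValueError("not a type/subtype pair: %r" % media_type)
    match = _XML_SUBTYPE.match(sub)
    if match is None:
        return top.lower(), "other", None
    kind = "private" if match.group("private") else "registered"
    return top.lower(), kind, match.group("name").lower()
```

For example, classify_media_type("text/xml-X-vml") would be treated as a
private-use DTD named "vml", while "text/plain" falls through as "other".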

There are two interesting documents for background on MIME types:
    RFC 2376, XML Media Types (Whitehead & Murata)
    RFC 1874, SGML Media Types (Levinson)
(See ftp://ftp.isi.edu/in-notes/)

The most important thing about the XML types is that they specify
parseable entity transport only, *not* documents per se. So a future
XHTML MIME type will also have to specify whether parseable entities or
documents are being sent. (Actually, it opens up an interesting
intermediate prospect: perhaps an XHTML application should accept
parseable HTML entities, not just complete WF documents. So
"a <a href="http://xml.ascc.net/" >link</a> is here" is a WF parseable
entity: perhaps it would be nice for XHTML systems to accept those--it
would be a little friendlier than full XML. Yikes.)

During discussion of the SGML MIME types (1995), debaters split into
two irreconcilable camps:
    * the "SDIF" people, who wanted to be able to send documents with
      unreferenced entity declarations removed (i.e. do a transitive
      closure on the document, and send all entities referenced, perhaps
      to some entity-resolution depth); and
    * the RFC people, who wanted to send arbitrary collections of
      documents, even if redundant or incomplete.

Just before the time of the XML MIME type RFC, I revisited the SGML
debate, for ISO purposes, to see how much common ground there was and
whether XML offered a chance for a new or reconciling approach. I ended
up being firmly of the opinion that only an entity-based MIME type, and
not a document-based MIME type, was practical; it gave us everything
that HTML had, and would not naturally create divisions or have some
developers refusing to implement it. In any case, the text/sgml MIME
type continues to exist, and the multipart type is also around,
tantalisingly. The writers of the XML RFC agreed with this point too,
and I think it is working out OK.

I wonder whether it is now time to re-open the can of worms, but with a
different perspective. In particular, I think there is a need to agree
on a way to specify the name of a document type: for example, allowing a
DOCTYPE parameter on the MIME headers for XML and XHTML. This would
point to a resource in some schema or declaration format: XML markup
declarations, DDML, the new Xschema, DCD or anything. The advantage of
having it in the MIME header is that the schema can be fixed, and the
prolog would not override it. Some users need this. Personally, I would
prefer it if the MIME media type could hold all the information needed
on type: that works with modern browsers.
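
As a rough sketch of what a receiver might do with such a parameter: the
"doctype" parameter name, the example URL, and the function below are all
hypothetical, since no such parameter is defined for text/xml. The sketch
only illustrates the precedence described above, with the value in the
MIME header taking priority over whatever the prolog declares.

```python
from email.message import Message


def effective_doctype(content_type_header, prolog_doctype=None):
    """Return (media_type, doctype) for a received entity.

    A "doctype" parameter on the Content-Type header (hypothetical) wins;
    the prolog's declaration is used only when the header has none.
    """
    msg = Message()
    msg["Content-Type"] = content_type_header
    # get_param() returns None when the parameter is absent.
    header_doctype = msg.get_param("doctype")
    return msg.get_content_type(), header_doctype or prolog_doctype
```

Called with 'text/xml; doctype="http://example.org/memo.dtd"' and a
prolog value of "local.dtd", this yields the header's URL; called with
plain "text/xml", it falls back to "local.dtd".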

Of course, as with encoding, this kind of "primary metadata" needs to
have an inband notation (an extra argument on the XML PI?) as well as a
MIME header equivalent. I hope the XSchema group will specify a method
ASAP, in the same way that James Clark's stylesheet declarations
logically precede the specification of a stylesheet language. Can the
XSchema people give this serious consideration?

As a further requirement, perhaps the need for bundling parseable
entities into streams should also be considered: a form of multipart
suitable for open-ended streams. Personally, I think it is a bad idea,
because a protocol is two-way and a document is one-way. But HTTP 1.1
provides lots of nice hooks for things; they should be investigated.

Rick Jelliffe
Received on Thursday, 25 February 1999 11:58:49 UTC