Re: text/xml volunteers needed from lee@sq.com on 1997-02-01 (w3c-sgml-wg@w3.org from February 1997)

From: <lee@sq.com>
Date: Sat, 1 Feb 97 16:35:27 EST
To: w3c-sgml-wg@www10.w3.org
Message-Id: <9702012135.AA18432@sqrex.sq.com>

Peter@ursus.demon.co.uk (Peter Murray-Rust) wrote:

> One suggestion made was that we should use text/sgml for CML, which though
> _technically_ correct is useless _in practice_.  This is a problem which
> is going to reoccur as we develop non-textual applications.  If I send
> someone a molecule/integrated_circuit/etc. and they read it in a current 
> 'SGML browser' it won't
> do them much good at present.  IFF the browser recognised the DTD _and_ 
> had access for routing the whole problem to an appropriate system, fine - 
> but that is asking a lot.

The distinction between text/ and application/ comes down to three points:
(1) if people can, in the worst case, make sense of the document by
    looking at the source, it should be text/*

(2) text/* documents are subject to newline substitution, so that
    CR or LF is translated to CR LF for transmission as MIME requires.
    They are also (strictly speaking) subject to a 72 octet line limit,
    when they are transmitted as text.  This is why NAMELEN is set to
    72 in RFC 1866, for example.

(3) application/* documents are sent as binary images -- i.e. no
    transformations are performed on the stream -- and the expectation is
    that without the correct viewer, you can do nothing.
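To make point (2) concrete, here is a minimal Python sketch of the
newline canonicalisation a text/* body goes through, plus a check for
lines over the limit.  The function names and the default of 72 are
mine, chosen to match the figure above; this is an illustration, not
what any particular mailer or server actually runs.

```python
import re

# Matches CR LF, a bare CR, or a bare LF. CR LF must come first so that
# an existing pair is treated as one line ending, not two.
_LINE_END = re.compile(r"\r\n|\r|\n")


def to_mime_canonical(text):
    """Rewrite every line ending as CR LF (hypothetical helper)."""
    return _LINE_END.sub("\r\n", text)


def overlong_lines(text, limit=72):
    """Return 1-based numbers of lines longer than `limit` octets.

    Length is measured on the UTF-8 encoding; `limit` is an assumption
    taken from the figure quoted in this message.
    """
    return [
        number
        for number, line in enumerate(_LINE_END.split(text), start=1)
        if len(line.encode("utf-8")) > limit
    ]
```

An application/* body would skip both steps and go out byte for byte.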

For CML, the best thing would be to use content negotiation, and send
the document as text/cml if you have a cml viewer, and text/sgml if
you have no CML viewer but do have an SGML viewer, and perhaps
text/html or text/plain otherwise...
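
The fallback chain above could be sketched roughly as follows, in
Python.  It assumes the client announces what it can display in an
HTTP Accept header; the names PREFERENCE, parse_accept and choose_type
are hypothetical, and text/cml is simply the type proposed in this
thread, not a registered one.

```python
# Server-side preference order, most specific representation first.
PREFERENCE = ["text/cml", "text/sgml", "text/html", "text/plain"]


def parse_accept(header):
    """Map each media range in an Accept header to its q-value."""
    accepted = {}
    for item in header.split(","):
        parts = [part.strip() for part in item.split(";")]
        media = parts[0].lower()
        if not media:
            continue
        quality = 1.0  # q defaults to 1 when absent
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0  # treat a malformed q as "not acceptable"
        accepted[media] = quality
    return accepted


def choose_type(accept_header):
    """Return the first preferred type the client accepts, or None."""
    accepted = parse_accept(accept_header)
    for candidate in PREFERENCE:
        major = candidate.split("/")[0]
        # The most specific matching range decides: exact, then major/*, then */*.
        for pattern in (candidate, major + "/*", "*/*"):
            if pattern in accepted:
                if accepted[pattern] > 0:
                    return candidate
                break
    return None
```

For example, choose_type("text/sgml, text/html") picks text/sgml.  A
client that sends only */* would be handed text/cml, so this only works
if browsers list the types they can really display.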

For XML, the MIME type should be text/xml.

Note that most current HTTP servers may mishandle Unicode text files,
performing incorrect CR LF substitution.  However, as more HTML
documents use Unicode, this will quickly get fixed.

With regard to your other message (Peter), note also that very few
(if any) existing HTML documents are valid XML documents, as they
  (1) lack the processing instruction at the start, and more importantly
  (2) are not well-formed, because they use the "wrong" syntax for EMPTY
      elements (<BR> instead of <BR/>) and use OMITTAG.

They also use SHORTTAG, but this is YES in the XML SGML declaration in
order to allow use of NET, and hence is sort of half allowed ;-), although
forbidden by the XML specification prose.
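
The well-formedness point can be checked mechanically.  Here is a
small Python sketch using the standard library's
xml.etree.ElementTree; the helper name and the sample strings are mine.
One caveat: that parser implements XML as it exists today rather than
the draft under discussion, so it demonstrates the EMPTY-element and
OMITTAG problems but says nothing about the processing instruction at
the start.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup):
    """Return True if `markup` parses as well-formed XML (hypothetical helper)."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


# HTML-style EMPTY element: <BR> is never closed, so </P> is mismatched.
html_style = "<P>line one<BR>line two</P>"

# The same content with the XML empty-element syntax.
xml_style = "<P>line one<BR/>line two</P>"

# OMITTAG in action: the </LI> end tags are left out.
omitted_end_tags = "<UL><LI>one<LI>two</UL>"

for sample in (html_style, xml_style, omitted_end_tags):
    verdict = "well-formed" if is_well_formed(sample) else "not well-formed"
    print(verdict + ": " + sample)
```

Only the second sample should come back as well-formed.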

Lee

Received on Saturday, 1 February 1997 16:35:53 UTC