Re: XML 1.0 priorities from Steven J. DeRose on 1997-05-19 (w3c-sgml-wg@w3.org from May 1997)

From: Steven J. DeRose <sjd@eps.inso.com>
Date: Mon, 19 May 1997 11:39:14 -0400
To: w3c-sgml-wg@w3.org
Message-Id: <2.2.32.19970519153914.009e4510@pop>
At 12:37 AM 05/17/97 GMT, Peter Murray-Rust wrote:
>SD1 (short tags).  This is trivial for parser writers, but I am no longer in
>favour of it because (a) I suspect some XML documents will be hand-authored
>(or at least hand-edited) - a rich source of confusion - and (b) error
>recovery for missing endtags in WF documents would be effectively error
>creation.

Very well put. Omitting GIs from end-tags removes redundancy; and therefore
inevitably reduces the possibility or error detection (see anything by
Claude Shannon). Someone's idea that it allows us to avoid even having a
"wrong GI in end-tag" message and that that is an advantage, is only tenable
if when that error actually occurs, what one wants *instead* is a parser
that silently assigns the wrong end-tag without even noticing that it's wrong.

>SD2 (structured attributes).  This seems to require either (a) rewriting
>SGML and the parsers or (b) doing some fairly cunning pre-parser manipulation.
>I cannot see a working prototype by July 1, even if we agreed the syntax
>today, since there would be too many implied semantics to be resolved
>quickly.  It would also break the Esis and ElementTree output and although
>it might be representable by groves, there are not yet any XML applications 
>that use groves?

Note that massive changes are only required if you wish XML to *validate* or
constrain such attributes. You can stuff a stream of SGML or Lisp or
whatever in an attribute *right now* and have structure. I don't see your
point about "too many implied semantics" -- we don't have to say any more
about semantics in attributes than we do in content. 

For ***example***, we could easily state the syntax requirements for putting
structure inside attributes, simply by using XML syntax there (gee, you have
to escape quotes), or by using Lisp/Scheme/DSSSL Scheme syntax there (gee,
you still have to escape quotes). Just cite either definition. This may or
may not be the best thing to do, but the syntax is easy and we don't have to
touch semantics.

>SD3 (Data types).  I don't think there are syntactic problems here, although
>I suspect the namespace of potential types is sufficiently large that we 
>would debate for some time.  (Date?, Date+Time?, DateTime?, etc. and there
>will also have to be discussion about floats - precision, overflow behaviour,
>format specification, etc.)  I wouldn't object to a placeholder and make as 
>much progress as possible.

I thought the proposal was to take a very small, specific set of types: the
atomic datatypes agreed on in RDBs. Not a big deal. And there are plenty of
standards for exactly how such things behave (if we even have to care): IEEE
floating point, for example, which almost everybody uses. Just cite the
existing literature. Differing from it would be hugely more work, make us
gratuitously incompatible, and make us look foolish unless/until we showed
superb and compelling reasons. SGML already gets flak for not having
ubiquitous types like integer, real, character, and so on. If we were to add
those (I'm not certain we should, just "if"), I think few people would
complain that we didn't also add imaginary and transcendental numbers,
polynomials, color spaces, and so on.


Steven J. DeRose, Ph.D., Chief Scientist
Inso Electronic Publishing Solutions
   (formerly EBT)
Received on Monday, 19 May 1997 11:42:16 UTC