Re: ERB discussions and decisions from David G. Durand on 1996-11-15 (w3c-sgml-wg@w3.org from November 1996)

From: David G. Durand <dgd@cs.bu.edu>
Date: Fri, 15 Nov 1996 11:42:04 -0500
To: W3C SGML Working Group <w3c-sgml-wg@w3.org>
Message-Id: <v02130502aeb246374941@[128.148.157.46]>

At 3:55 PM 11/14/96, Michael Sperberg-McQueen wrote:
>Summary:
>
> * Removed (Paoli abstaining) the Cougar entities from XML 1.0.
> * Retained lt, gt, amp, quot, apos as non-redeclarable entities.
> * Retained ban on nondeterministic content models.
> * Prohibited (Bray, Hollander, Magliery dissenting) overlap among
>enumerated types in XML 1.0.
> * Dropped special handling of HTML EMPTY elements, added paragraph
>explaining Prescod method of making HTML valid XML (Bray, Sharpe
>dissenting).
> * Added version declaration.
> * Allowed processing instructions in DTD.

In other words, any SGML design flaw, even one slated for demolition, will
be preserved (2 cases currently in evidence: determinism, attribute
values). The exception is whitespace, where we will choose to complicate
the syntax with attribute names that affect the parsing of data.

   I am still interested in seeing an ERB discussion of MIME headers (an
already existing, generalized, extensible syntax for version information
and other meta-data, which is compatible with SGML entity management
according to Goldfarb, has an existing code base, and would eliminate all
our uses of the hideous PI feature (which seems like it will never die!)).
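
   A minimal sketch of what this might look like, in Python, using the
standard library's existing MIME header parser. The header name
"XML-Version" and the sample document are assumptions made up for
illustration; nothing of the kind has been specified.

```python
from email import message_from_string

# Hypothetical document: MIME-style headers, a blank line, then the markup.
# The header names used here are illustrative only, not a proposal text.
SAMPLE = (
    "XML-Version: 1.0\n"
    "Content-Type: text/xml; charset=utf-8\n"
    "\n"
    "<doc>hello</doc>\n"
)


def split_headers(text):
    """Return (headers, body) using the stock MIME header parser."""
    msg = message_from_string(text)
    # Header names are case-insensitive in MIME, so normalize to lowercase.
    headers = {name.lower(): value for name, value in msg.items()}
    return headers, msg.get_payload()


headers, body = split_headers(SAMPLE)
print(headers["xml-version"])
print(body)
```

The version information is read without any processing instruction in the
document itself, and further meta-data is just another header line.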

   I also ask, whither SDATA? This common processing convention (as
supported by Omnimark, SP, SGMLS, and probably others; given the ESIS,
probably a _lot_ of others) is still out, without any formal ERB
discussion. I'll also note that Private Use does not seem to be very
popular with Unicode advocates.  In recent discussion on the URN list,
Martin Duerst, I18N advocate extraordinaire, reacted to my suggestion to use
Private Use characters (in a different context) with a plea that we not
advertise the presence of this feature of Unicode.

    I have yet to hear anyone offer an argument as to why a character
string describing missing data (an unknown glyph) is inferior to a number
describing missing data, especially when the web infrastructure does not
provide a convenient way to make the private arrangements needed for
private-use to work (funny property of a publishing medium, isn't it?).
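
    To make the comparison concrete, here is a rough Python sketch. The
glyph name, the choice of U+E000, and the side table are all hypothetical;
the only point is that the numeric form needs an out-of-band agreement
before it means anything.

```python
import unicodedata

# Hypothetical private agreement between one sender and one receiver.
PRIVATE_TABLE = {"\ue000": "long-s-with-stroke"}


def label_named(name):
    """A string code carries its own description."""
    return f"[unknown glyph: {name}]"


def label_private(ch, table=None):
    """A Private Use code point means nothing without the side table."""
    if unicodedata.category(ch) != "Co":
        raise ValueError("not a Private Use character")
    meaning = (table or {}).get(ch)
    if meaning is None:
        # No private arrangement available: all we can show is the number.
        return f"[unknown glyph: U+{ord(ch):04X}]"
    return f"[unknown glyph: {meaning}]"


print(label_named("long-s-with-stroke"))
print(label_private("\ue000"))
print(label_private("\ue000", PRIVATE_TABLE))
```

A receiver without the table can still say something useful about the
named form; for the private-use form it can only echo the number.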

   I stopped posting on these topics because it seemed that the arguments
were all out in the open, but if the ERB is going to listen to them and
then _not_ vote, maybe I should just keep repeating the obvious. "Numerical
codes for non-numeric objects are _always_ inferior to string codes unless
there is a critical storage-space tradeoff."

    -- David

PS. I agree with Paul Prescod that this was the right level of detail to
use in reporting on the decisions.

_________________________________________
David Durand              dgd@cs.bu.edu  \  david@dynamicDiagrams.com
Boston University Computer Science        \  Sr. Analyst
http://www.cs.bu.edu/students/grads/dgd/   \  Dynamic Diagrams
--------------------------------------------\  http://dynamicDiagrams.com/
MAPA: mapping for the WWW                    \__________________________

Received on Friday, 15 November 1996 11:36:32 UTC