Re: Comments: through clause 3 from lee@sq.com on 1996-11-13 (w3c-sgml-wg@w3.org from November 1996)

From: <lee@sq.com>
Date: Tue, 12 Nov 96 22:26:15 EST
To: crm@ebt.com, w3c-sgml-wg@w3.org
Message-Id: <9611130326.AA00485@sqrex.sq.com>
Just a few comments on Cris' comments (but I have not been able to
read the new draft yet, sorry)

> When did the ERB decision announced as "1. Reservation of name space"
> change from .XML. to -XML-?  Not that it matters, really.

Some time ago, since the current draft "CSS1" style system can't cope with
dots in element types.  I am not sure why this is relevant to XML, though,
or, if it is, why it can't be fixed in CSS1, which is also a W3C thing.

> Clause 2.7:
> 
> Critique revised from 0.01.  I think that specifying two whitespace
> modes for the processor is a mistake.

I agree; it was done for the sake of SGML compatibility (which turned
out to be lost by it) and because some people thought that they could
change all EMPTY elements in their documents but not escape whitespace.
I.e. for no good reason.  I agree that it is a mistake, but it pales
into insignificance beside other more recent mistakes, and isn't
worth fighting for.  My hope is that enough is learnt from XML that
the SGML revision can be useful to use instead of XML.

> Clause 2.7, example:
>    <!ATTLIST * -XML-SPACE (PRESERVE|COLLAPSE) #IMPLIED>
> 
> Critique revised from 0.01.  Is this line to be included verbatim in
> all DTDs?

I think the idea was that you could add -XML-SPACE="PRESERVE" whenever
you wanted the non-default behaviour, DTDs notwithstanding.  I'm not sure.
Certainly the syntax as given isn't legal SGML.  I'm not sure if it's
legal XML -- I can't read PostScript documents from here right now.

>    [31] Prolog := EncodingDecl? ...
> 
> Production [72] is defined as Encodingdecl, not EncodingDecl.

Note also that mandating that a system detect when a document is not
using the encoding and/or character set specified is not in general
technically possible.
You can't distinguish ISO 8859-1 and ISO 8859-7 without some fairly
complex AI algorithms, for example.  The wording needs to be changed to
say that the behaviour in that case is undefined.
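
A minimal sketch of why detection fails, in Python (the sample bytes are
made up for illustration): the same byte sequence is well-formed text in
both encodings, so only a guess about the intended language can tell them
apart.

```python
# Hypothetical sample: five bytes with no reliable encoding label.
data = bytes([0xE9, 0x63, 0x6F, 0x6C, 0x65])

# Both decodes succeed; neither codec can report that the other was meant.
as_latin1 = data.decode("iso8859_1")  # 0xE9 is LATIN SMALL LETTER E WITH ACUTE
as_greek = data.decode("iso8859_7")   # 0xE9 is GREEK SMALL LETTER IOTA

print(as_latin1, as_greek)
```

Neither decode raises an error, so a processor has nothing mechanical to
check against the declared encoding.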

> Clause 2.8, production [33] (and [70]):
>    [33] doctypedecl := '<!DOCTYPE' S Name ExternalID? S? ('['
>                        internalsubset* ']' S?)? '>'
> ...
>    [70] ExternalID  := 'SYSTEM' Literal
> 
> This mandates the form
> <!DOCTYPE fooSYSTEM"foo.dtd"[...] >

True (there are some other cases of missing S I think, or were in the
previous draft).  The right fix (I think) would be to get rid of S
altogether and explain exactly which delimiters separate tokens.
See the PostScript language specification for a clear example -- or
see C for another one.

It is harder to ensure full SGML compatibility if you do this, but I
really do believe it's worth the effort.  For one thing, it encourages
simple token-stream implementations.. :-)
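
The missing-S problem can be checked mechanically.  What follows is only a
rough sketch: it transcribes productions [33] and [70] as quoted above into
a Python regular expression.  The patterns for Name, Literal and S are
simplified assumptions of mine, not the draft's definitions, and the
internal subset is reduced to "anything but a closing bracket".

```python
import re

# Assumed, simplified stand-ins for productions not quoted in this message.
NAME = r"[A-Za-z][A-Za-z0-9.-]*"
LITERAL = r'"[^"]*"'          # double-quoted only, for brevity
S = r"[ \t\r\n]+"

EXTERNAL_ID = r"SYSTEM" + LITERAL      # [70] as quoted: no S anywhere
DOCTYPE = re.compile(
    r"<!DOCTYPE" + S + NAME            # '<!DOCTYPE' S Name
    + r"(?:" + EXTERNAL_ID + r")?"     # ExternalID?
    + r"(?:" + S + r")?"               # S?
    + r"(?:\[[^\]]*\](?:" + S + r")?)?"  # ('[' internalsubset* ']' S?)?
    + r">"
)

def matches_as_written(text: str) -> bool:
    """True if text fits the productions exactly as they are quoted."""
    return DOCTYPE.fullmatch(text) is not None

print(matches_as_written('<!DOCTYPE fooSYSTEM"foo.dtd"[...] >'))
print(matches_as_written('<!DOCTYPE foo SYSTEM "foo.dtd" [...]>'))
```

Under these assumptions the run-together form is accepted and the
conventionally spaced one is rejected, which is the complaint.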


Note also that this production makes the HTML DOCTYPE illegal in XML,
since HTML requires a PUBLIC identifier, so that there is no document
valid in both XML and HTML.  (see RFC 1866; I believe HTML 3.2 to be
the same in this regard, except with a different PI)

> Clause 2.9, last paragraph:
>    If no RMD is provided, the effect is identical to an RMD with the
>    value ALL.
> 
> I feel that NONE should be the default.  The simplest XML document
> should not require the RMD at all.

I assumed that was an error in the specification!
A third value, -XML-IF-DTD-WAS-GIVEN, might be a better default.

> Clause 3.1, post-production [39] paragraph:
> 
> The special casing of HTML must be eliminated from the specification.
> It will *not* be implemented by most implementors, because they have
> separate tools for handling HTML.  Therefore, most XML implementations
> will be non-compliant, and this specification becomes moot anyway.

I fully agree, but it has been the source of bitter argument recently.
Since no XML document is a valid HTML document, and vice versa, it is
all in any case totally redundant.

> Clause 3.4.1, Validity checks:
> 
> ID and Idref do not mention normalization of case; Name token and Name
> tokens do.  This is inconsistent with both NAMECASE GENERAL YES and
> NAMECASE GENERAL NO.  It should be consistent.
> 
> I am opposed to case folding; I think it will be far easier to add it
> in XML 2.0 if a workable method is found (which I doubt will happen).

I agree.

[...]
> <screwup>
> <stuff id="école"/>
> <stuff id="ecole"/>
> </screwup>
> 
> is valid if parsed in Canada, but invalid if parsed in France.  That
> is a Bad Thing.  (Should we add an XML PI to indicate intended parsing
> locale? d-:)

In SGML it may or may not be valid depending on the SGML declaration.
If you add
    <!Element xref EMPTY>
    <!AttList xref idref IDREF #IMPLIED>
and
    <xref idref="ECOLE">
you have a cross reference which points to a different element in Quebec
and in France! This is obviously a potential source of errors, especially
given the almost complete internationalisation otherwise.
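
To make the ambiguity concrete, here is a small Python sketch.  The two
folding functions are hypothetical stand-ins for two upper-casing
conventions (one keeps accents on capitals, one drops them); neither is
taken from the draft or from any particular SGML system.

```python
import unicodedata

def fold_keep_accents(name: str) -> str:
    """Upper-case, keeping accents on the capital letters."""
    return name.upper()

def fold_drop_accents(name: str) -> str:
    """Upper-case, then strip combining marks (accents dropped on capitals)."""
    decomposed = unicodedata.normalize("NFD", name.upper())
    return "".join(c for c in decomposed if not unicodedata.combining(c))

# The two IDs from the quoted example.
ids = ["école", "ecole"]

def resolve(idref: str, fold) -> list:
    """Return every declared ID that matches idref after case folding."""
    return [i for i in ids if fold(i) == fold(idref)]

print(resolve("ECOLE", fold_keep_accents))
print(resolve("ECOLE", fold_drop_accents))
```

With the first convention the reference resolves to a single element; with
the second, both IDs fold to the same name, so the document has a duplicate
ID and the reference is ambiguous.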

> Clause 3.5 in 0.01/W3C (now missing):
> 
> Carried from 0.01.  The DTD summary is no longer needed for empty
> elements, and is moot for mixed vs. element content distinction, but
> would be a VERY useful way to override the defaulted entities without
> requiring DTD parsing.

You can't override the default entities (believe it or not -- it's a
design bug).  Apart from that nonsense, it would be better to allow
entity declarations in a document type subset, perhaps?

Lee
Received on Tuesday, 12 November 1996 22:26:16 UTC