Re: XML design errors? from C M Sperberg-McQueen on 1998-06-19 (xml-editor@w3.org from April to June 1998)

From: C M Sperberg-McQueen <cmsmcq@uic.edu>
Date: Fri, 19 Jun 1998 12:54:46 -0500
To: connolly@w3.org
CC: Chris.Newman@innosoft.com, xml-editor@w3.org, cmsmcq@uic.edu
Message-Id: <199806191754.MAA241944@tigger.cc.uic.edu>
>Date: Thu, 18 Jun 1998 02:40:44 -0500
>From: Dan Connolly <connolly@w3.org>
>
>Dan Connolly wrote:
>> http://www.imc.org/ietf-calendar/mail-archive/1445.html
>> 
>> Have you written anything about these design errors
>> that you could point me to?
>
>Oops... just found it; archiving a copy at xml-editor@w3.org.
>
>
>========
>Yes there are specific technical deficiencies in the XML spec.  My
>summary list from a quick skim of the spec follows:
>
>* Didn't define BNF notation.

Your copy is incomplete, then.

>* Processing Instructions introduce interoperability problems.  There
>  is also no registry for PITargets.

I have always seen processing instructions as a fairly useful way
to avoid and minimize interoperability problems:  using them the
owner of the data can hide processing instructions intended for one
family of processes from other processes -- or rather any process gets
the hooks necessary to allow it to recognize and ignore instructions
intended for other systems.

So the claim that PIs "introduce" interoperability problems takes me
by surprise.  It seems flat wrong to me:  PIs allow us to deal with
some of the interoperability problems that already exist.

Can anyone give an example of an interoperability problem introduced
by the notion of processing instructions that could not occur without
them?
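
To make the "recognize and ignore" point concrete, here is a minimal
sketch using Python's standard xml.sax module.  The PI targets
"formatter-a" and "formatter-b", the class PIFilter, and the helper
collect() are hypothetical names invented for illustration; nothing
in the spec defines them.  The handler acts only on instructions
addressed to its own target and skips the rest.

```python
import xml.sax


class PIFilter(xml.sax.ContentHandler):
    """Collect PIs addressed to one target; note and skip all others."""

    def __init__(self, my_target):
        super().__init__()
        self.my_target = my_target  # hypothetical target name for this process
        self.handled = []           # data of PIs meant for this process
        self.ignored = []           # targets of PIs meant for other processes

    def processingInstruction(self, target, data):
        if target == self.my_target:
            self.handled.append(data)
        else:
            # The target name is the hook that lets us ignore this PI safely.
            self.ignored.append(target)


# Hypothetical document carrying instructions for two different systems.
DOC = (b"<doc><?formatter-a page-break?><p>text</p>"
       b"<?formatter-b keep-together?></doc>")


def collect(my_target, doc=DOC):
    """Parse doc and return (handled data, ignored targets) for my_target."""
    handler = PIFilter(my_target)
    xml.sax.parseString(doc, handler)
    return handler.handled, handler.ignored


print(collect("formatter-a"))
print(collect("formatter-b"))
```

Each process sees the same document and picks out only what was
written for it.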

>* "<![CDATA[" notation is cumbersome and creates new parser state and
>  alternate representations.

It's much less cumbersome than the alternative, which is to escape
each delimiter in the block individually.  It does create a new
parser state and allow alternate representations of the same character
stream; since providing only a single representation for a given
character stream is not a goal of XML, I am not sure why this counts
as a weakness.

If it were a goal, the use of any existing character set standard 
would defeat it in short order.
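
As a small illustration of the two representations, the following
sketch (Python's standard xml.etree.ElementTree; the element name
"code" and the sample text are made up for the example) parses the
same character stream once as a CDATA section and once with each
delimiter escaped individually, and compares what the application
receives.

```python
import xml.etree.ElementTree as ET

# One marked section around the whole block ...
with_cdata = ET.fromstring(
    "<code><![CDATA[if (a < b && b < c) { ok(); }]]></code>"
)

# ... versus escaping every delimiter individually.
escaped = ET.fromstring(
    "<code>if (a &lt; b &amp;&amp; b &lt; c) { ok(); }</code>"
)

# Both forms deliver the same character data to the application.
same_text = with_cdata.text == escaped.text
print(same_text)
```

The CDATA form needs one pair of delimiters; the escaped form needs
an edit at every "<" and "&" in the block.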

>* Version number text is broken -- likely to leave things stuck at
>  "1.0" just like MIME-Version.

How?  My understanding is that MIME built the version number into
the grammar, so that conforming MIME parsers were required to 
reject version numbers other than 1.0.  If the XML spec makes such
a requirement, I don't see where.  The relevant text
says that it is an error to use the version number 1.0 if the
document does not conform to version 1.0 of the spec; it does not
say, in anything I see, that version 1.0 processors are required to
signal an error if they see any other version number.  I'm not
sure they are even allowed to signal an error solely on the
basis of the version number.

>* Reference to UCS-2 which doesn't really exist.

What does 'really exist' mean?  UCS-2 was defined by ISO 10646
the last time I looked; if you don't have access to 10646,
consult appendix C of Unicode 2.0.

If being defined in an ISO standard does not count as real
existence, then 'real existence' is not an interesting or
useful concept for discussing the XML spec.

>* Too many encoding variations.  &#x;  &#; &; UTF-8, UTF-16.

Personally, I would agree:  I think decimal character references, and
UTF-8, would be better off omitted.  But I'm not sure the spec 
would really be better technically in that case: just smaller.  And
it would definitely be less widely adopted.
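
For readers who want to see the variations side by side, here is a
short sketch with Python's standard xml.etree.ElementTree.  The
element name "c" and the sample strings are invented for the
example.  It shows the hexadecimal reference, the decimal reference,
and the predefined entity all yielding the same character, and the
same document yielding the same text whether it is serialized as
UTF-8 or UTF-16.

```python
import xml.etree.ElementTree as ET

# Three spellings of the ampersand: hex reference, decimal reference,
# and the predefined entity.
variants = ["<c>&#x26;</c>", "<c>&#38;</c>", "<c>&amp;</c>"]
texts = [ET.fromstring(v).text for v in variants]

# One document, two encodings.  Python's "utf-16" codec prepends a
# byte-order mark, which the parser uses to detect the encoding.
template = '<?xml version="1.0" encoding="{}"?><c>caf\u00e9</c>'
from_utf8 = ET.fromstring(template.format("UTF-8").encode("utf-8"))
from_utf16 = ET.fromstring(template.format("UTF-16").encode("utf-16"))

print(texts)
print(from_utf8.text == from_utf16.text)
```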

>* Byte-order mark replicates TIFF problem.

Can someone explain this?

-C. M. Sperberg-McQueen
Received on Friday, 19 June 1998 13:56:06 UTC