Re[2]: Clark's commentary from Chris Lilley on 2002-01-08 (www-tag@w3.org from January 2002)

From: Chris Lilley <chris@w3.org>
Date: Wed, 9 Jan 2002 00:20:28 +0100
To: Norman Walsh <Norman.Walsh@Sun.COM>
CC: www-tag@w3.org
Message-ID: <481454169775.20020109002028@w3.org>

On Monday, January 07, 2002, 11:08:19 PM, Norman wrote:

NW> James and I had several conversations on this topic at XML 2001. I've
NW> been persuaded that the way out of this dilemma is to accept that
NW> attributes should never be used for human readable text (as opposed to
NW> tokens or other simple datatypes).

I agree with that, and it was one of the design influences on SVG:
human-readable text is element content, not attribute values (and
conversely, non-human-readable numerical data goes into
attributes).

NW> This limitation can be justified, I think, by the argument that
NW> attribute values can't contain markup and I18N considerations always
NW> require markup in human readable text (e.g., for BIDI or Rubi (Ruby?)).

I agree in general (although BIDI does not, in fact, require markup
unless there is more than one level of nesting), but yes, Ruby is one
example and xml:lang is another.

NW> If you restrict human readable text to element content, then you can
NW> use empty elements to replace named character entities.

NW> <para>An &eacute; has an accent.</para>

NW> could be written:

NW> <para>An <e:eacute/> has an accent.</para>

That makes string matching on the DOM element nodes more complex,
since instead of being something the parser just deals with, the
character now requires you to understand whatever namespace the
prefix e: is bound to.
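To make the string-matching point concrete, here is a small sketch using Python's standard xml.etree.ElementTree. It assumes a hypothetical namespace URI (http://example.org/hypothetical-entities) for the "e:" prefix and a hypothetical lookup table; nothing like this is defined by any spec. The character reference comes back as one plain string, while the empty-element form splits the text around a child element that only namespace-aware code can turn back into "é".

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI standing in for whatever "e:" is bound to.
ENT_NS = "http://example.org/hypothetical-entities"

# Character-reference form: the parser itself resolves &#xE9; to U+00E9.
ref_para = ET.fromstring("<para>An &#xE9; has an accent.</para>")

# Empty-element form: the accented letter becomes a child element.
elem_para = ET.fromstring(
    f'<para xmlns:e="{ENT_NS}">An <e:eacute/> has an accent.</para>'
)

# The reference form is a single string that plain matching can search.
print("\u00e9" in ref_para.text)                      # True

# The element form is split: leading text, a child element, then its tail.
print(repr(elem_para.text), repr(elem_para[0].tail))  # 'An ' ' has an accent.'
print("\u00e9" in "".join(elem_para.itertext()))      # False

# Recovering the string means knowing what the namespace's elements mean.
CHAR_MAP = {f"{{{ENT_NS}}}eacute": "\u00e9"}  # hypothetical lookup table

def flatten(el: ET.Element) -> str:
    """Concatenate text, replacing known entity elements with their character."""
    parts = [el.text or ""]
    for child in el:
        parts.append(CHAR_MAP.get(child.tag) or flatten(child))
        parts.append(child.tail or "")
    return "".join(parts)

print(flatten(elem_para) == ref_para.text)            # True
```

The extra `flatten` step, and the table it depends on, is exactly the namespace knowledge that an entity or character reference would have left to the parser.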

NW> or even

NW> <para>An <e:char name="latin small letter e with acute"/> has an accent.</para>

NW> Assuming some in-scope namespace declaration for "e:" of course :-)

One could remove both the requirement for an in-scope namespace
declaration and my gripe about knowing special namespaces with

<para>An <xml:char unicode="00E9"/> has an accent.</para>

or

<para>An <xml:char name="LATIN SMALL LETTER E WITH ACUTE"/> has an
accent.</para>

But with the greater prevalence of Unicode-enabled editors and
operating systems, it's not clear that single characters are the
primary use case for entities going forward. That holds even for
plane-1-using applications like MathML, given that Windows XP and
Mac OS X both support non-BMP characters nowadays.

Plus (arguing against my own proposal), the former example has no
benefit over

<para>An &#x00E9; has an accent.</para>

and the latter, besides being verbose beyond belief, is highly
susceptible to mistyping and bloats processors with string tables.
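A rough sketch of what a processor would have to do for the proposed `<xml:char>` element, in Python with the standard library. `<xml:char>` is only the proposal above, not part of any XML spec; the sketch relies on the `xml:` prefix being predeclared (bound to http://www.w3.org/XML/1998/namespace), and on `unicodedata.lookup` as the stand-in for the name table a processor would need. It shows that both forms resolve to the same text as `&#x00E9;`, and that a misspelled name only fails at lookup time.

```python
import unicodedata
import xml.etree.ElementTree as ET

# The xml: prefix is predeclared, so no namespace declaration is needed.
XML_CHAR = "{http://www.w3.org/XML/1998/namespace}char"

def resolve_char(el: ET.Element) -> str:
    """Map a (proposed, hypothetical) <xml:char> element to its character."""
    if "unicode" in el.attrib:
        return chr(int(el.attrib["unicode"], 16))  # hex code point
    return unicodedata.lookup(el.attrib["name"])   # needs the Unicode name table

def text_of(el: ET.Element) -> str:
    """Concatenate text, resolving any <xml:char> children along the way."""
    parts = [el.text or ""]
    for child in el:
        parts.append(resolve_char(child) if child.tag == XML_CHAR else text_of(child))
        parts.append(child.tail or "")
    return "".join(parts)

by_code = ET.fromstring('<para>An <xml:char unicode="00E9"/> has an accent.</para>')
by_name = ET.fromstring(
    '<para>An <xml:char name="LATIN SMALL LETTER E WITH ACUTE"/> has an accent.</para>'
)
by_ref = ET.fromstring("<para>An &#x00E9; has an accent.</para>")

# All three spellings carry the same text once resolved.
print(text_of(by_code) == text_of(by_name) == by_ref.text)  # True

# A single typo in the long name is only caught when the lookup runs.
try:
    unicodedata.lookup("LATIN SMALL LETTER E WITH ACCUTE")
except KeyError as err:
    print("lookup failed:", err)
```

The numeric form buys nothing over the character reference, which the parser already resolves for free; the name form is the one that drags in the full name table.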

Who was it that said 'the current position is unsupportable. Except
when compared to the alternatives' or words to that effect ...

-- 
 Chris                            mailto:chris@w3.org

Received on Tuesday, 8 January 2002 18:20:33 UTC