Re: [XHTML2] Unicode line and paragraph separators

On 6 Apr 2003 at 22:09, William wrote:

> I think the idea of using "&ps;" instead of <p> is bad.
> 
> Aside from the arguments against already given, I wish to point
> out another: no name for &#8233; can be set aside in the XHTML
> spec for use on the web.  (A name would be OK for inhouse use
> of XHTML.)
> 
> As I understand things, the XML 1.0 (2nd Edition) spec,
> http://www.w3.org/TR/2000/REC-xml-20001006 ,
> at section 2.4 provides 5 named character entities: "amp", "lt", "gt",
> "apos", and "quot".
> 
> In order for other character entities in an XML document to be
> referenced by name rather than by code point, the entity name must be
> defined in the document type definition of the corresponding XML
> application.
> 
> Since section 4.4, XML Processor Treatment of Entities and References,
> states that a non-validating processor (such as a browser) is not
> required to retrieve an external entity, the use of a named character
> entity such as "&ps;" is ruled out for XHTML since XHTML browsers are
> not validating processors unless browsers are "required" to have
> "canned" knowledge of it.
> 
> I suppose the specification of XHTML 2 could try to insist that
> browsers must know something like "&ps;", but I hasten to point out
> that there is already some contention among major browser sponsors on
> whether a browser must know any of the root namespace vocabulary of
> XHTML, i.e., whether XHTML among XML document types deserves special
> treatment by browsers.

Since XHTML1 has the Latin-1, Special and Symbol characters as sets of 
defined entities that are part of the normative definition, I see no 
problem in adding &ps; and &ls; to the set of special character 
entities for XHTML2 if a decision were made to add them to XHTML2. Can 
you name a single browser that supports XHTML1 as application/xhtml+xml 
that does not support the three entity sets? Any application that tries 
to support XHTML as anything more than generic XML is going to have to 
understand predefined entities that are part of the normative 
definition, either by being a validating agent or by having an internal 
list of them.
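To make the distinction concrete, here is a minimal sketch using Python's bundled expat module as a stand-in for a non-validating processor that reads no external DTD. The choice of Python and expat is purely an illustration, and the helper names parse_text and is_well_formed are hypothetical. It shows three cases: the five XML-predefined entities and numeric character references always parse; an undeclared name such as &ps; is a fatal error; and the same name is expanded once it is declared somewhere the processor is obliged to read (here, the internal DTD subset). That last case is the same effect a browser gets from an internal list of predefined entities.

```python
import xml.parsers.expat


def parse_text(doc: str) -> str:
    """Parse doc with expat (non-validating) and return all character data."""
    parser = xml.parsers.expat.ParserCreate()
    chunks: list[str] = []
    parser.CharacterDataHandler = chunks.append  # collect text as it arrives
    parser.Parse(doc, True)  # True marks this as the final chunk of input
    return "".join(chunks)


def is_well_formed(doc: str) -> bool:
    """Hypothetical helper: True if expat accepts the document."""
    try:
        parse_text(doc)
        return True
    except xml.parsers.expat.ExpatError:
        return False


# The five predefined entities need no declaration.
predefined = parse_text("<p>&amp;&lt;&gt;&apos;&quot;</p>")

# A numeric character reference always works: yields U+2029.
numeric = parse_text("<p>one&#8233;two</p>")

# An undeclared named entity is a well-formedness error ("undefined entity").
undeclared_ok = is_well_formed("<p>one&ps;two</p>")

# Declared in the internal subset, the same name is expanded to U+2029.
declared = parse_text('<!DOCTYPE p [<!ENTITY ps "&#8233;">]><p>one&ps;two</p>')

print(repr(predefined), repr(numeric), undeclared_ok, repr(declared))
```

Declaring the entity in the internal subset is only one way to supply the name; an agent with built-in knowledge of the XHTML entity sets achieves the same result without any declaration in the document.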

Any special treatment of U+2029 would only occur when a browser renders 
a document, at which time it would need the same degree of internal 
knowledge of XHTML2 to render the document whether or not paragraph 
boundaries are indicated by markup or by formatting characters. If the 
application isn't trying to render the document, then the default XML 
behavior for U+2028 and U+2029 suggested by Unicode Technical 
Report #20 (which, by the way, does not have the force of a full-fledged 
standard for either Unicode or XML), namely treating separator characters 
as whitespace, is I believe an adequate interpretation for most 
non-rendering purposes. One might argue that an agent that doesn't know 
that &ps; should be replaced by U+2029 wouldn't know that &ps; is white 
space, but the same problem applies to the existing entities &nbsp;, 
&ensp;, &emsp;, &thinsp;, and &zwnj;. Unless you wish to argue that all 
entities except for those defined in XML should be removed from XHTML2, 
I cannot agree with the argument you have given. Obviously if such a 
decision were to be made, there would be no reasonable 
alternative except to retain <l> and <p> and forget about separators 
entirely. (Not because &#8233; is too long, but because it does not 
make sense to put an unnamed numeric character reference to such use.)
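A rough sketch of the two interpretations discussed above, assuming a Python agent and relying on Python's own notion of whitespace: str.split() with no argument treats U+2028 and U+2029 as whitespace, although XML's own S production (#x20, #x9, #xD, #xA) does not. A non-rendering agent can simply fold the separators into whitespace, as UTR #20 suggests, while a renderer splits paragraphs on U+2029 and lines on U+2028. The function names as_whitespace and paragraphs are hypothetical.

```python
PS = "\u2029"  # PARAGRAPH SEPARATOR (what &ps; would stand for)
LS = "\u2028"  # LINE SEPARATOR (what &ls; would stand for)


def as_whitespace(text: str) -> str:
    """Non-rendering view: collapse every whitespace run, LS/PS included, to one space."""
    return " ".join(text.split())


def paragraphs(text: str) -> list[list[str]]:
    """Rendering view: split into paragraphs on PS, then each paragraph into lines on LS."""
    return [para.split(LS) for para in text.split(PS)]


sample = f"Roses are red,{LS}violets are blue.{PS}Second paragraph."
print(as_whitespace(sample))
print(paragraphs(sample))
```

Either view needs only the code points; a named entity adds nothing a renderer or a non-rendering agent must act on beyond knowing which code point the name denotes.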

Received on Monday, 7 April 2003 00:24:25 UTC