
Re: New FAQ: entities and NCRs

From: Chris Lilley <chris@w3.org>
Date: Mon, 4 Jul 2005 19:33:48 +0200
Message-ID: <16527938.20050704193348@w3.org>
To: "Richard Ishida" <ishida@w3.org>
Cc: "'Bjoern Hoehrmann'" <derhoermi@gmx.net>, "'GEO'" <public-i18n-geo@w3.org>

On Monday, July 4, 2005, 7:12:05 PM, Richard wrote:

RI> Hi Bjoern,

RI> I did try to allude to this by saying:

RI> "If you use entities (such as &aacute;) to represent characters in HTML, you
RI> should take care any time your content is processed using XML tools, or
RI> converted to XML. These entities have to be declared in the Document Type
RI> Definition or converted to NCRs to work. For this reason, it may sometimes
RI> be safer to use numeric values."

I think my suggested text also dealt with this (and clarified that the
internal DTD subset was safe and interoperable, while relying on the
external DTD subset was not). 'may sometimes be safer' is vague.
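
To make the distinction concrete, here is a minimal sketch using Python's
standard xml.etree.ElementTree parser, which (as far as I know) never
fetches the external DTD subset. The four test documents and the helper name
parsed_text are my own illustrations, not taken from the FAQ.

```python
import xml.etree.ElementTree as ET

# Hypothetical test documents; element names and content are arbitrary.
UNDECLARED = "<p>&aacute;</p>"
EXTERNAL = (
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
    '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">'
    "<html><body>&aacute;</body></html>"
)
INTERNAL = '<!DOCTYPE p [<!ENTITY aacute "&#225;">]><p>&aacute;</p>'
NUMERIC = "<p>&#225;</p>"


def parsed_text(doc):
    """Return all text in the document, or None if the parser rejects it."""
    try:
        root = ET.fromstring(doc)
    except ET.ParseError:
        return None
    return "".join(root.itertext())


for label, doc in [
    ("undeclared", UNDECLARED),
    ("external subset only", EXTERNAL),
    ("internal subset", INTERNAL),
    ("numeric reference", NUMERIC),
]:
    print(label, "->", repr(parsed_text(doc)))
```

With this parser, only the last two documents should yield the character
U+00E1. The first two are rejected because no declaration for aacute has been
read. Other XML tools may behave differently on the external-subset case,
which is exactly the interoperability problem.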

RI> I'm wary of saying that you should never use character entities in HTML.

It's fast becoming bad practice.

RI> But maybe I could rephrase as follows, and move that up into a
RI> previous section:

RI> "Using character entities in XML may be problematic

'problematic' is vague. 'not interoperable' is better.

RI>  if the entities are
RI> defined externally to your document and the tools that process the XML do
RI> not read such external files.

I think using the defined terms from XML is better here, as well.

RI>  For this reason, it may be safer

it is more interoperable

RI> to use numeric values.

or to type the characters directly
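
For content that already uses the HTML names, a rough sketch of the
conversion to numeric character references might look like this in Python.
The function name entities_to_ncrs is hypothetical, and the regular expression
is deliberately naive: it does not skip CDATA sections, comments, or
script content, so treat it as an illustration only.

```python
import re
from html.entities import name2codepoint

# The five entities predefined by XML 1.0 are safe to leave as they are.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}


def entities_to_ncrs(text):
    """Replace HTML-defined entity references with decimal NCRs."""

    def replace(match):
        name = match.group(1)
        # Leave XML's own entities and unknown names untouched.
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)
        return "&#%d;" % name2codepoint[name]

    return re.sub(r"&([A-Za-z][A-Za-z0-9]*);", replace, text)


print(entities_to_ncrs("caf&aacute;&nbsp;&amp;&nbsp;bar"))
```

The output no longer depends on any entity declaration being read, so it
should survive processing by XML tools that ignore the external subset.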

RI> If you use HTML-defined entities (such as &aacute;) to represent
RI> characters in (X)HTML, you should take care any time your content is
RI> processed using XML tools, or converted to XML."

This is where the parenthetical (X) is problematic. Something is either
XML or it is not. If it's XML, it follows the XML rules. MathML was hit
by the same problem, but more so of course.

RI> It seems a shame for us to lose the ability to represent certain
RI> characters using names -

You don't lose it. Indeed, you can use whatever names you find memorable
or convenient, and for whatever characters you happen to want to

RI> I'm thinking specifically of ambiguous or invisible characters such
RI> as &nbsp; or &rlm;. It is much more intuitive and less error prone.
RI> I wonder what we should do about that. Does it make sense to include
RI> files into documents that define these in the internal subset?
RI> Should we extend the list of predefined entities for XML? ...

Is there a short list of such ambiguous or invisible characters? Will
that list grow with revisions to Unicode?

(not that extending XML is likely to be a fast or cheap process, just
wondering aloud how big the list is).

>> From: Bjoern Hoehrmann [mailto:derhoermi@gmx.net] 
>> Sent: 01 July 2005 20:21
>> To: Richard Ishida
>> Cc: GEO
>> Subject: Re: New FAQ: entities and NCRs
>> * Richard Ishida wrote:
>> >http://www.w3.org/International/questions/qa-escapes.html
>> I think the document should note that using "character 
>> entities" is not interoperable and possibly dangerous. The 
>> HTML Working Group has been approached several times to 
>> clarify whether and how implementations are supposed to 
>> support the pre-defined entities if they do not read the 
>> external subset; the HTML Working Group so far refused to 
>> provide such clarification, so there are a number of old 
>> implementations that do not support use of them in XHTML 
>> documents at all (to the extent that some implementations 
>> incorrectly reject documents that use them) and current 
>> implementations that support them for some document types but 
>> not for others.
>> Also note that per XML 1.0 Third Edition,
>>   It is [A violation of the rules of this specification] if an attribute
>>   value contains a reference to an entity for which no declaration has
>>   been read. [Conforming software MAY detect and report an error and MAY
>>   recover from [this error]].
>> So to the extent that it is possible to have some kind of 
>> XHTML document that uses "character entities" in attributes 
>> but the user agent does not support the document type and/or 
>> did not process the entity declaration, it is perfectly 
>> permissible for the user agent to act in unexpected ways for 
>> the document. Robust documents do not use "character 
>> entities" at all unless they are pre-defined in XML 1.0 or 
>> declared in the internal subset.
>> The document should also link to the relevant requirements in 
>> Charmod Fundamentals.
>> --
>> Björn Höhrmann · mailto:bjoern@hoehrmann.de · 
>> http://bjoern.hoehrmann.de Weinh. Str. 22 · Telefon: 
>> +49(0)621/4309674 · http://www.bjoernsworld.de
>> 68309 Mannheim · PGP Pub. KeyID: 0xA4357E78 · 
>> http://www.websitedev.de/ 

 Chris Lilley                    mailto:chris@w3.org
 Chair, W3C SVG Working Group
 W3C Graphics Activity Lead
Received on Monday, 4 July 2005 17:33:59 UTC
