
RE: New FAQ: entities and NCRs

From: Richard Ishida <ishida@w3.org>
Date: Mon, 4 Jul 2005 18:12:05 +0100
To: "'Bjoern Hoehrmann'" <derhoermi@gmx.net>
Cc: "'GEO'" <public-i18n-geo@w3.org>
Message-Id: <20050704171205.3B9914F022@homer.w3.org>

Hi Bjoern,

I did try to allude to this by saying:

"If you use entities (such as &aacute;) to represent characters in HTML, you
should take care any time your content is processed using XML tools, or
converted to XML. These entities have to be declared in the Document Type
Definition or converted to NCRs to work. For this reason, it may sometimes
be safer to use numeric values."

I'm wary of saying that you should never use character entities in HTML. But
maybe I could rephrase as follows, and move that up into a previous section:

"Using character entities in XML may be problematic if the entities are
defined externally to your document and the tools that process the XML do
not read such external files. For this reason, it may be safer to use
numeric values. If you use HTML-defined entities (such as &aacute;) to
represent characters in (X)HTML, you should take care any time your content
is processed using XML tools, or converted to XML."
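
To make the "converted to NCRs" step concrete, here is a minimal Python
sketch. It is only an illustration under stated assumptions: the function
name entities_to_ncrs and the regular expression are invented for this
example, and the name-to-code-point table is the HTML 4 one from Python's
standard html.entities module. The five entities predefined in XML 1.0 are
left alone, and unrecognised names are passed through unchanged.

```python
import re
from html.entities import name2codepoint  # HTML 4 entity names -> code points

# The five entities predefined by XML 1.0; these can safely stay as names.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

# Hypothetical pattern for a named reference such as &aacute; or &nbsp;
_NAMED_REF = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def entities_to_ncrs(text):
    """Replace HTML-defined named entities with hexadecimal NCRs."""

    def replace(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)  # leave predefined or unknown names untouched
        return "&#x%X;" % name2codepoint[name]

    return _NAMED_REF.sub(replace, text)


print(entities_to_ncrs("caf&eacute;&nbsp;&amp;&rlm;"))
# caf&#xE9;&#xA0;&amp;&#x200F;
```

A plain text substitution like this does not know about CDATA sections,
comments or script content, so a real conversion would probably need to be
markup-aware.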

It seems a shame for us to lose the ability to represent certain characters
by name - I'm thinking specifically of ambiguous or invisible characters
such as &nbsp; or &rlm;.  Names are much more intuitive and less
error-prone.  I wonder what we should do about that.  Does it make sense to
include files in documents that define these entities in the internal
subset?  Should we extend the list of predefined entities for XML? ...
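
On the internal subset question, a small Python sketch may help show what is
at stake. It assumes the standard xml.etree.ElementTree module (built on
expat, which does not fetch an external DTD) as a stand-in for "tools that
do not read external files". The helper name parse_text and the two sample
documents are made up for illustration.

```python
import xml.etree.ElementTree as ET

# No declaration for &nbsp; is available to the parser.
UNDECLARED = "<p>a&nbsp;b</p>"

# Same kind of content, with &nbsp; and &rlm; declared in the internal subset.
DECLARED = (
    '<!DOCTYPE p [\n'
    '  <!ENTITY nbsp "&#xA0;">\n'
    '  <!ENTITY rlm "&#x200F;">\n'
    ']>\n'
    '<p>a&nbsp;b&rlm;</p>'
)


def parse_text(xml_source):
    """Return the root element's text, or None if the document is rejected."""
    try:
        return ET.fromstring(xml_source).text
    except ET.ParseError:
        return None


print(repr(parse_text(UNDECLARED)))  # rejected: the entity is undefined
print(repr(parse_text(DECLARED)))    # accepted: names resolve to U+00A0, U+200F
```

With the declarations carried inside the document, the names keep working
even when nothing external is read, at the cost of repeating the
declarations in every document that uses them.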

RI

> From: Bjoern Hoehrmann [mailto:derhoermi@gmx.net] 
> Sent: 01 July 2005 20:21
> To: Richard Ishida
> Cc: GEO
> Subject: Re: New FAQ: entities and NCRs
> 
> * Richard Ishida wrote:
> >http://www.w3.org/International/questions/qa-escapes.html
> 
> I think the document should note that using "character entities" is
> not interoperable and possibly dangerous. The HTML Working Group has
> been approached several times to clarify whether and how
> implementations are supposed to support the pre-defined entities if
> they do not read the external subset; the HTML Working Group has so
> far refused to provide such clarification. As a result, there are a
> number of old implementations that do not support their use in XHTML
> documents at all (to the extent that some implementations incorrectly
> reject documents that use them), and current implementations that
> support them for some document types but not for others.
> 
> Also note that per XML 1.0 Third Edition,
> 
>   It is [A violation of the rules of this specification] if an attribute
>   value contains a reference to an entity for which no declaration has
>   been read. [Conforming software MAY detect and report an error and MAY
>   recover from [this error]].
> 
> So to the extent that it is possible to have some kind of XHTML
> document that uses "character entities" in attributes but the user
> agent does not support the document type and/or did not process the
> entity declaration, it is perfectly permissible for the user agent to
> act in unexpected ways for the document. Robust documents do not use
> "character entities" at all unless they are pre-defined in XML 1.0 or
> declared in the internal subset.
> 
> The document should also link to the relevant requirements in 
> Charmod Fundamentals.
> --
> Björn Höhrmann · mailto:bjoern@hoehrmann.de · http://bjoern.hoehrmann.de
> Weinh. Str. 22 · Telefon: +49(0)621/4309674 · http://www.bjoernsworld.de
> 68309 Mannheim · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/
> 
Received on Monday, 4 July 2005 17:12:10 GMT
