- From: Chris Lilley <chris@w3.org>
- Date: Mon, 4 Jul 2005 19:33:48 +0200
- To: "Richard Ishida" <ishida@w3.org>
- Cc: "'Bjoern Hoehrmann'" <derhoermi@gmx.net>, "'GEO'" <public-i18n-geo@w3.org>
On Monday, July 4, 2005, 7:12:05 PM, Richard wrote:

RI> Hi Bjoern,

RI> I did try to allude to this by saying:

RI> "If you use entities (such as &aacute;) to represent characters in HTML, you
RI> should take care any time your content is processed using XML tools, or
RI> converted to XML. These entities have to be declared in the Document Type
RI> Definition or converted to NCRs to work. For this reason, it may sometimes
RI> be safer to use numeric values."

I think my suggested text also dealt with this (and clarified that the internal DTD subset was safe and interoperable, while relying on the external DTD subset was not). 'may sometimes be safer' is vague.

RI> I'm wary of saying that you should never use character entities in HTML.

It's fast becoming bad practice.

RI> But maybe I could rephrase as follows, and move that up into a
RI> previous section:

RI> "Using character entities in XML may be problematic

'problematic' is vague. 'not interoperable' is better.

RI> if the entities are
RI> defined externally to your document and the tools that process the XML do
RI> not read such external files.

I think using the defined terms from XML is better here, as well.

RI> For this reason, it may be safer

it is more interoperable

RI> to use numeric values.

or to type the characters directly

RI> If you use HTML-defined entities (such as &aacute;) to represent
RI> characters in (X)HTML, you should take care any time your content is
RI> processed using XML tools, or converted to XML."

This is where the parenthetical (X) is problematic. Something is either XML or it is not. If it's XML, it follows the XML rules. MathML was hit by the same problem, but more so of course.

RI> It seems a shame for us to lose the ability to represent certain
RI> characters using names -

You don't lose it. Indeed, you can use whatever names you find memorable or convenient, and for whatever characters you happen to want to represent.

RI> I'm thinking specifically of ambiguous or invisible characters such
RI> as &nbsp; or &rlm;.
RI> It is much more intuitive and less error prone.

RI> I wonder what we should do about that. Does it make sense to include
RI> files into documents that define these in the internal subset?
RI> Should we extend the list of predefined entities for XML? ...

Is there a short list of such ambiguous or invisible characters? Will that list grow with revisions to Unicode? (Not that extending XML is likely to be a fast or cheap process, just wondering aloud how big the list is.)

>> From: Bjoern Hoehrmann [mailto:derhoermi@gmx.net]
>> Sent: 01 July 2005 20:21
>> To: Richard Ishida
>> Cc: GEO
>> Subject: Re: New FAQ: entities and NCRs
>>
>> * Richard Ishida wrote:
>> >http://www.w3.org/International/questions/qa-escapes.html
>>
>> I think the document should note that using "character
>> entities" is not interoperable and possibly dangerous. The
>> HTML Working Group has been approached several times to
>> clarify whether and how implementations are supposed to
>> support the pre-defined entities if they do not read the
>> external subset; the HTML Working Group has so far refused to
>> provide such clarification, so there are a number of old
>> implementations that do not support use of them in XHTML
>> documents at all (to the extent that some implementations
>> incorrectly reject documents that use them) and current
>> implementations that support them for some document types but
>> not for others.
>>
>> Also note that per XML 1.0 Third Edition,
>>
>>   It is [A violation of the rules of this specification] if an
>>   attribute value contains a reference to an entity for which no
>>   declaration has been read. [Conforming software MAY detect and
>>   report an error and MAY recover from [this error]].
>>
>> So to the extent that it is possible to have some kind of
>> XHTML document that uses "character entities" in attributes
>> but the user agent does not support the document type and/or
>> did not process the entity declaration, it is perfectly
>> permissible for the user agent to act in unexpected ways for
>> the document. Robust documents do not use "character
>> entities" at all unless they are pre-defined in XML 1.0 or
>> declared in the internal subset.
>>
>> The document should also link to the relevant requirements in
>> Charmod Fundamentals.
>> --
>> Björn Höhrmann · mailto:bjoern@hoehrmann.de · http://bjoern.hoehrmann.de
>> Weinh. Str. 22 · Telefon: +49(0)621/4309674 · http://www.bjoernsworld.de
>> 68309 Mannheim · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/
>>

--
Chris Lilley mailto:chris@w3.org
Chair, W3C SVG Working Group
W3C Graphics Activity Lead
Received on Monday, 4 July 2005 17:33:59 UTC