- From: Richard Ishida <ishida@w3.org>
- Date: Thu, 28 Feb 2013 13:31:30 +0000
- To: "Martin J. Dürst" <duerst@it.aoyama.ac.jp>
- CC: "Costello, Roger L." <costello@mitre.org>, "www-international@w3.org" <www-international@w3.org>
Roger, you may also want to read
http://www.w3.org/International/questions/qa-escapes

Note in particular the effect of using escapes in the Czech example on
that page, which makes it rather hard to read and maintain the content
if you are working with the source code:

Jako efektivn&#x115;j&#x161;&#xED; se n&#xE1;m jev&#xED;
po&#x159;&#xE1;d&#xE1;n&#xED; tzv. Road Show
prost&#x159;ednictv&#xED;m na&#x161;ich autorizovan&#xFD;ch
dealer&#x16F; v &#x10C;ech&#xE1;ch a na Morav&#x11B;, kter&#xE9;
prob&#x11B;hnou v pr&#x16F;b&#x11B;hu z&#xE1;&#x159;&#xED; a
&#x159;&#xED;jna.

Note also, however, a few instances where character escapes are a good
idea.

RI

On 28/02/2013 03:01, "Martin J. Dürst" wrote:
> Hello Roger,
>
> On 2013/02/28 8:03, Costello, Roger L. wrote:
>> Hi Folks,
>>
>> In the document "Character Model for the World Wide Web 1.0:
>> Normalization" it says this at the bottom of section 3.3.3:
>>
>>      With appropriate entity definitions, instead of A&#x301;,
>>      write &Aacute;
>
> Just while we are at it, this is because &Aacute; will be in NFC when
> the entity reference is resolved, but A&#x301; will not be in NFC.
>
>>      (or better, use 'Á' directly).
>>
>> The statement in parentheses is particularly intriguing. Is it
>> suggesting that Best Practice is to write this:
>>
>>      <Name>Ándre</Name>
>>
>> rather than this:
>>
>>      <Name>&#xC1;ndre</Name>
>>
>> where &#xC1; is the character entity reference for Á.
>>
>> Why is the former preferred over the latter?
>
> In HTML and XML (and many other formats), escapes such as character
> entity references are what their name says, escape hatches. That means
> that you should only use them in "emergency situations". In the example
> at hand, most people, starting with the bearer(s) of that name, will be
> able to read Ándre without problems. But &#xC1;ndre requires table lookup
> in Unicode or some other mental gymnastics.
>
> The preference for using characters directly, rather than escapes, is
> formally put down at http://www.w3.org/TR/charmod/#C047.
> This is in "Character Model for the World Wide Web 1.0:
> Fundamentals", which, in contrast to the Normalization part you
> cited, is a W3C Recommendation. C047 says:
>
> >>>>>>>>
> C047 [I] [C] Escapes SHOULD only be used when the characters to be
> expressed are not directly representable in the format or the character
> encoding of the document, or when the visual representation of the
> character is unclear.
> >>>>>>>>
>
> The [I] says that this applies to implementers, the [C] says that this
> applies to content. The "are not directly representable" would apply if
> e.g. your document is encoded in Shift_JIS (which doesn't have 'Á'). The
> "the visual representation of the character is unclear" applies e.g. for
> &nbsp; because it may be desirable to see, when looking at the source,
> that there's a non-breaking space there rather than a plain space. It may
> also apply if you don't have an editor that can show that character, if
> you e.g. can't input it, or if you are not familiar enough with the
> character/script to make sure you get the right one. But the former two
> are rare these days, and the latter is better avoided, because the
> person inputting/checking may have the same problem when looking at a
> Unicode table.
>
>
> Regards, Martin.
>
>

--
Richard Ishida, W3C
http://rishida.net/
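
For anyone who wants to try the two technical points from the quoted
message (the NFC state of the text once escapes are resolved, and the
"not directly representable" case for Shift_JIS), here is a minimal
Python sketch. It is not from the thread. The helper names is_nfc and
escape_unrepresentable are hypothetical, and Python's html.unescape and
codec error handlers are only assumed to be a reasonable stand-in for
what an HTML/XML processor or authoring tool would do.

```python
import html
import unicodedata


def is_nfc(text: str) -> bool:
    """Return True if text is already in Unicode Normalization Form C."""
    return unicodedata.normalize("NFC", text) == text


# Resolve the two escapes discussed in the thread.
decomposed = html.unescape("A&#x301;")   # 'A' followed by U+0301 COMBINING ACUTE ACCENT
precomposed = html.unescape("&Aacute;")  # U+00C1 LATIN CAPITAL LETTER A WITH ACUTE

print(is_nfc(decomposed))   # False
print(is_nfc(precomposed))  # True
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True


def escape_unrepresentable(text: str, encoding: str) -> str:
    """Escape only the characters that the target encoding cannot represent.

    This mirrors the spirit of C047: characters are kept as-is wherever the
    document encoding allows it, and a numeric character reference is used
    only as a fallback.
    """
    return text.encode(encoding, errors="xmlcharrefreplace").decode(encoding)


print(escape_unrepresentable("Ándre", "utf-8"))      # Ándre
print(escape_unrepresentable("Ándre", "shift_jis"))  # &#193;ndre
```

With a UTF-8 document the name stays readable in the source, while the
Shift_JIS case is the kind of "emergency situation" where an escape is
the only option.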
Received on Thursday, 28 February 2013 13:32:00 UTC