Re: Why is the accented A character preferred over the equivalent character entity reference? from Richard Ishida on 2013-02-28 (www-international@w3.org from January to March 2013)

From: Richard Ishida <ishida@w3.org>
Date: Thu, 28 Feb 2013 13:31:30 +0000
To: "Martin J. Dürst" <duerst@it.aoyama.ac.jp>
CC: "Costello, Roger L." <costello@mitre.org>, "www-international@w3.org" <www-international@w3.org>
Message-ID: <512F5C32.9040107@w3.org>

Roger, you may also want to read
http://www.w3.org/International/questions/qa-escapes

Note in particular the effect of using escapes in the Czech example on 
that page, which makes the content rather hard to read and maintain if 
you are working with the source code:

Jako efektivn&#x115;j&#x161;&#xED; se n&#xE1;m jev&#xED; 
po&#x159;&#xE1;d&#xE1;n&#xED; tzv. Road Show prost&#x159;ednictv&#xED;m 
na&#x161;ich autorizovan&#xFD;ch dealer&#x16F; v &#x10C;ech&#xE1;ch a na 
Morav&#x11B;, kter&#xE9; prob&#x11B;hnou v pr&#x16F;b&#x11B;hu 
z&#xE1;&#x159;&#xED; a &#x159;&#xED;jna.
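
To see what such a fragment actually says, you have to decode it. A 
minimal sketch in Python, assuming the standard library's html.unescape 
is an acceptable stand-in for what a browser or XML parser does with 
numeric character references (the variable names are only illustrative):

```python
import html

# Last line of the escaped Czech example above.
escaped = "z&#xE1;&#x159;&#xED; a &#x159;&#xED;jna."

# Resolve the numeric character references into real characters.
readable = html.unescape(escaped)
print(readable)  # září a října.

# Going the other way: escape everything outside ASCII again.
# Python emits decimal references here rather than hexadecimal ones.
re_escaped = readable.encode("ascii", "xmlcharrefreplace").decode("ascii")
```

The decoded string is what an author or translator would want to see 
in the source; the escaped form carries the same text but hides it.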

Note also, however, a few instances where character escapes are a good idea.

RI



On 28/02/2013 03:01, "Martin J. Dürst" wrote:
> Hello Roger,
>
> On 2013/02/28 8:03, Costello, Roger L. wrote:
>> Hi Folks,
>>
>> In the document "Character Model for the World Wide Web 1.0:
>> Normalization" it says this at the bottom of section 3.3.3:
>>
>>      With appropriate entity definitions, instead of A&acute;,
>>      write &Aacute;
>
> Just while we are at it, this is because &Aacute; will be in NFC when
> the entity reference is resolved, but A&acute; will not be in NFC.
>
>> (or better, use 'Á' directly).
>>
>> The statement in parentheses is particularly intriguing. Is it
>> suggesting that Best Practice is to write this:
>>
>>     <Name>Ándre</Name>
>>
>> rather than this:
>>
>>     <Name>&#xC1;ndre</Name>
>>
>> where &#xC1; is the numeric character reference for Á.
>>
>> Why is the former preferred over the latter?
>
> In HTML and XML (and many other formats), escapes such as character
> entity references are what their name says, escape hatches. That means
> that you should only use them in "emergency situations". In the example
> at hand, most people, starting with the bearer(s) of that name, will be
> able to read Ándre without problems. But &#xC1;ndre requires a table
> lookup in Unicode or some other mental gymnastics.
>
> The preference for using characters directly, rather than escapes, is
> formally stated at http://www.w3.org/TR/charmod/#C047. This is in
> "Character Model for the World Wide Web 1.0: Fundamentals", which, in
> contrast to the Normalization part you cited, is a W3C Recommendation.
> C047 says:
>
>  >>>>>>>>
> C047  [I]  [C]  Escapes SHOULD only be used when the characters to be
> expressed are not directly representable in the format or the character
> encoding of the document, or when the visual representation of the
> character is unclear.
>  >>>>>>>>
>
> The [I] says that this applies to implementers, the [C] says that this
> applies to content. The "are not directly representable" would apply if
> e.g. your document is encoded in Shift_JIS (which doesn't have 'Á'). The
> "the visual representation of the character is unclear" applies e.g. for
> &nbsp; because, when looking at the source, it may be desirable to see
> that there's a non-breaking space there rather than a plain space. It may
> also apply if you don't have an editor that can show that character, if
> you can't input it, or if you are not familiar enough with the
> character/script to make sure you get the right one. But the former two
> are rare these days, and the latter is better avoided, because the
> person inputting/checking may have the same problem when looking at a
> Unicode table.
>
>
> Regards,   Martin.
>
>
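
Two of the points in Martin's reply can be checked mechanically. The 
following is only a sketch in Python, under the assumption that the 
"appropriate entity definitions" map &acute; to the combining acute 
accent U+0301 (in HTML's standard entity set &acute; is the spacing 
accent U+00B4, which behaves differently). The helper names is_nfc and 
representable are made up for this illustration:

```python
import unicodedata


def is_nfc(text: str) -> bool:
    """True if the text is unchanged by NFC normalization."""
    return unicodedata.normalize("NFC", text) == text


def representable(text: str, encoding: str) -> bool:
    """True if every character can be encoded directly in `encoding`."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


precomposed = "\u00C1ndre"   # what &Aacute;ndre or a literal 'Á' resolves to
decomposed = "A\u0301ndre"   # assumption: A followed by combining acute

print(is_nfc(precomposed))                       # True
print(is_nfc(decomposed))                        # False
print(representable(precomposed, "utf-8"))       # True
print(representable(precomposed, "shift_jis"))   # False
```

The last line is the C047 case where an escape is justified: the 
character cannot be written directly in the document's encoding.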


-- 
Richard Ishida, W3C
http://rishida.net/

Received on Thursday, 28 February 2013 13:32:00 UTC