Re: Why is the accented A character preferred over the equivalent character entity reference?

Roger, you may also want to read
http://www.w3.org/International/questions/qa-escapes

Note in particular the effect of using escapes in the Czech example on
that page, which makes the content rather hard to read and maintain if
you are working with the source code:

Jako efektivnější se nám jeví
pořádání tzv. Road Show prostřednictvím 
našich autorizovaných dealerů v Čechách a na 
Moravě, které proběhnou v průběhu 
září a října.
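To see the effect for yourself, the short Python sketch below turns every non-ASCII character in a fragment of the example into a hexadecimal numeric character reference. It is only an illustration and is not taken from the W3C article. The helper name `escape_non_ascii` is hypothetical, and the article's own escaped version may use a different escape form.

```python
import html


def escape_non_ascii(text: str) -> str:
    """Hypothetical helper: replace every non-ASCII character with &#xHHHH;."""
    return "".join(ch if ord(ch) < 128 else f"&#x{ord(ch):X};" for ch in text)


original = "v Čechách a na Moravě"
escaped = escape_non_ascii(original)
print(escaped)  # v &#x10C;ech&#xE1;ch a na Morav&#x11B;

# Nothing is lost: resolving the references gives back the original text.
print(html.unescape(escaped) == original)  # True
```

The escaped text is equivalent to the original, but only the unescaped form can be read and proofread comfortably by someone who knows Czech.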

Note also, however, that the page points out a few cases where character
escapes are a good idea.
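A selective policy in the spirit of C047 (quoted in Martin's message below) can be sketched in Python as follows. The sketch escapes only characters that the target encoding cannot represent, plus the visually ambiguous no-break space. It is a minimal illustration, not a W3C tool. The function name `escape_for` is hypothetical, and the sketch assumes HTML output, where `&nbsp;` is predefined (XML without a DTD would need `&#xA0;` instead). It also treats only U+00A0 as "visually unclear".

```python
def escape_for(text: str, encoding: str) -> str:
    """Hypothetical helper: escape only what the target encoding cannot hold,
    plus U+00A0 NO-BREAK SPACE, which looks like a plain space in source."""
    out = []
    for ch in text:
        if ch == "\u00a0":
            # Assumption: HTML output, where &nbsp; is a predefined entity.
            out.append("&nbsp;")
            continue
        try:
            ch.encode(encoding)
            out.append(ch)  # representable: write the character directly
        except UnicodeEncodeError:
            out.append(f"&#x{ord(ch):X};")  # not representable: fall back to an escape
    return "".join(out)


print(escape_for("Ándre", "utf-8"))       # Ándre
print(escape_for("Ándre", "shift_jis"))   # &#xC1;ndre
print(escape_for("10\u00a0km", "utf-8"))  # 10&nbsp;km
```

In a UTF-8 document almost nothing gets escaped, which matches the recommendation to use characters directly wherever the encoding allows it.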

RI



On 28/02/2013 03:01, "Martin J. Dürst" wrote:
> Hello Roger,
>
> On 2013/02/28 8:03, Costello, Roger L. wrote:
>> Hi Folks,
>>
>> In the document "Character Model for the World Wide Web 1.0:
>> Normalization" it says this at the bottom of section 3.3.3:
>>
>>      With appropriate entity definitions, instead of A´,
>>      write Á
>
> While we are at it: this is because Á will be in NFC when the
> entity reference is resolved, but A´ (an A followed by a combining
> acute accent) will not be in NFC.
>
>> (or better, use 'Á' directly).
>>
>> The statement in parentheses is particularly intriguing. Is it
>> suggesting that Best Practice is to write this:
>>
>>     <Name>Ándre</Name>
>>
>> rather than this:
>>
>>     <Name>&#xC1;ndre</Name>
>>
>> where &#xC1; is the numeric character reference for Á.
>>
>> Why is the former preferred over the latter?
>
> In HTML and XML (and many other formats), escapes such as character
> entity references are what their name says, escape hatches. That means
> that you should only use them in "emergency situations". In the example
> at hand, most people, starting with the bearer(s) of that name, will be
> able to read Ándre without problems. But &#xC1;ndre requires a lookup
> in a Unicode table or some other mental gymnastics.
>
> The preference for using characters directly, rather than escapes, is
> formally put down at http://www.w3.org/TR/charmod/#C047. This is in
> "Character Model for the World Wide Web 1.0: Fundamentals", which, in
> contrast to the Normalization part you cited, is a W3C Recommendation.
> C047 says:
>
>  >>>>>>>>
> C047  [I]  [C]  Escapes SHOULD only be used when the characters to be
> expressed are not directly representable in the format or the character
> encoding of the document, or when the visual representation of the
> character is unclear.
>  >>>>>>>>
>
> The [I] says that this applies to implementers, the [C] says that this
> applies to content. The "are not directly representable" would apply if
> e.g. your document is encoded in Shift_JIS (which doesn't have 'Á'). The
> "the visual representation of the character is unclear" applies e.g. for
> &nbsp;, because when looking at the source it may be desirable to see
> that there's a non-breaking space there rather than a plain space. It may
> also apply if you don't have an editor that can show that character, if
> you e.g. can't input it, or if you are not familiar enough with the
> character/script to make sure you get the right one. But the former two
> are rare these days, and the latter is better avoided, because the
> person inputting/checking may have the same problem when looking at a
> Unicode table.
>
>
> Regards,   Martin.
>
>
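Martin's NFC point can be checked with Python's standard `unicodedata` module (`is_normalized` requires Python 3.8 or later). This is a small sketch. It assumes that the decomposed spelling means A followed by U+0301 COMBINING ACUTE ACCENT. It also shows that resolving the reference &#xC1; with `html.unescape` yields the precomposed character, which is already in NFC.

```python
import html
import unicodedata

# Assumption: the decomposed spelling is A followed by U+0301 COMBINING ACUTE ACCENT.
decomposed = "A\u0301ndre"
precomposed = "\u00c1ndre"  # U+00C1 LATIN CAPITAL LETTER A WITH ACUTE

print(unicodedata.is_normalized("NFC", decomposed))   # False
print(unicodedata.is_normalized("NFC", precomposed))  # True

# NFC composes A + U+0301 into the single code point U+00C1.
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
print(len(decomposed), len(precomposed))  # 6 5

# Resolving the numeric character reference gives the precomposed form directly.
print(html.unescape("&#xC1;ndre") == precomposed)  # True
```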


-- 
Richard Ishida, W3C
http://rishida.net/

Received on Thursday, 28 February 2013 13:32:00 UTC