Hex NCRs in generated XML: nice, but hardly essential.

Hello www-i18n-comments,

In Character Model for the World Wide Web 1.0: Fundamentals

we read:

http://www.w3.org/TR/2005/REC-charmod-20050215/#C043

  C043 [S] The number of different ways to escape a character SHOULD be
  minimized (ideally to one).

  A well-known counter-example is that for historical reasons, both HTML
  and XML have redundant decimal (&#ddddd;) and hexadecimal (&#xhhhh;)
  character escapes.

Yes. Given that XML does, as noted, have both of them, we find that

http://www.w3.org/TR/2005/REC-charmod-20050215/#C048

  C048 [I] [C] Content SHOULD use the hexadecimal form of character
  escapes rather than the decimal form when there are both.

  NOTE: The hexadecimal form is preferred because character encoding
  standards (in particular Unicode) usually list character numbers as
  hexadecimal, making lookup easier.

to be overly strong. It's certainly sound advice for hand authors, and a
content creation tool might well be coded to choose hex rather than
decimal escapes, since it makes no particular difference to the tool
which form it emits. Requiring all content to use hex NCRs, though, goes
too far. Saying that software which emits XML does not conform because
it allows decimal NCRs to be generated is also excessive. The preference
is fair enough for NCRs that are machine generated, but if the author
put them in, then software has no real business changing them.
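
To make the distinction concrete, here is a minimal sketch in Python of
a generator that prefers hex NCRs for the text it escapes itself, while
passing author-written markup through unchanged. The function name
escape_generated and the element names p, a and g are hypothetical,
chosen only for illustration; the parsing uses the standard library's
xml.etree.ElementTree. It also shows that a parser treats the two forms
as the same character.

```python
from xml.etree import ElementTree as ET


def escape_generated(text: str) -> str:
    """Escape machine-generated text, using hex NCRs for non-ASCII.

    Hypothetical helper: one possible policy, not a conformance rule.
    """
    out = []
    for ch in text:
        if ch == "&":
            out.append("&amp;")
        elif ch == "<":
            out.append("&lt;")
        elif ch == ">":
            out.append("&gt;")
        elif ord(ch) > 0x7F:
            # Hex form, matching how Unicode charts list code points.
            out.append(f"&#x{ord(ch):X};")
        else:
            out.append(ch)
    return "".join(out)


# Markup typed by an author, already containing a decimal NCR.
# The generator leaves it exactly as written.
authored = "caf&#233;"

# Text produced by the tool itself (U+00EF in the middle),
# escaped with the hex form.
generated = escape_generated("na\u00efve")

doc = f"<p><a>{authored}</a><g>{generated}</g></p>"
root = ET.fromstring(doc)

# Decimal and hex NCRs for the same code point parse identically.
same = ET.fromstring("<x>&#233;</x>").text == ET.fromstring("<x>&#xE9;</x>").text
```

Both forms round-trip to the same character on parsing, so the choice
only matters to a human reading the source.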


Using hex slightly increases readability (though not as much as using
the actual character does), but so does a two-character indent or other
forms of pretty printing.


-- 
 Chris Lilley                    mailto:chris@w3.org
 Chair, W3C SVG Working Group
 W3C Graphics Activity Lead
 Co-Chair, W3C Hypertext CG

Received on Monday, 27 March 2006 16:17:12 UTC