- From: Jukka K. Korpela <jkorpela@cs.tut.fi>
- Date: Fri, 17 Jun 2005 14:38:53 +0300 (EEST)
- To: Laurie Permaloff <LPERMALOFF@MULTITECH.COM>
- Cc: www-validator@w3.org
On Thu, 16 Jun 2005, Laurie Permaloff wrote:

> The trademark symbol (character number 153) properly displays in my XML
> file when the following XML encoding is declared:
>
> <?xml version="1.0" encoding="ISO-8859-1"?>

Character number 153 is reserved for control functions in ISO-8859-1, so anything you see is just an attempt at error recovery. Even if it happens to do what you meant, the document is still in error.

On the technical side, it is difficult to say what a markup validator should do. Surely character number 153 is UNUSED in the "document character set", but in this case the encoding is ISO-8859-1, so conceptually the data stream first needs to be mapped from it to ISO 10646 (Unicode). The official mapping table at
http://www.unicode.org/Public/MAPPINGS/ISO8859/8859-1.TXT
maps all code positions to the positions with the same number. So it seems that formally 153 needs to be treated as corresponding to the ISO 10646 character with that number, hence declared UNUSED, hence a reportable markup error.

> However, the SGML validation fails on this character.

I guess you mean XML validation, though this makes no difference here.

> It will only
> validate properly if the (tm) character is placed or if the &trade;
> entity is used. But doing this yields a (?) in the XML output.

You can of course use (tm) to avoid the problem, but you can also switch to a different encoding (such as UTF-8) in which the trade mark sign can be written directly. You can also use the character reference &#8482; (or, equivalently, using hexadecimal notation, &#x2122;), which works irrespective of the encoding.

The reason why you cannot use &trade; (which should really be reported as an undefined entity reference - did you miss that?) in XML is that there is no such predefined entity in XML, unlike in HTML (including XHTML).
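To make the points above concrete, here is a minimal sketch in Python, using only its standard library. It is an illustration under stated assumptions, not a description of what the validator does internally: the sample content "Acme" and all variable names are made up, and the parser is Python's xml.etree.ElementTree rather than the validator's own.

```python
import unicodedata
import xml.etree.ElementTree as ET

# Octet 153 (0x99) read as ISO-8859-1 maps to the same-numbered
# ISO 10646 position, U+0099, which is a C1 control character.
octet_153 = bytes([153]).decode("iso-8859-1")
control_category = unicodedata.category(octet_153)  # "Cc" means control

# The trade mark sign itself lives at U+2122 (decimal 8482).
trade_mark = chr(8482)
trade_mark_name = unicodedata.name(trade_mark)

# ISO-8859-1 cannot encode U+2122 at all; UTF-8 can.
try:
    trade_mark.encode("iso-8859-1")
    latin1_can_encode = True
except UnicodeEncodeError:
    latin1_can_encode = False
utf8_octets = trade_mark.encode("utf-8")

# Character references work whatever encoding is declared.
# "Acme" is just made-up sample content.
doc = (b'<?xml version="1.0" encoding="ISO-8859-1"?>'
       b'<p>Acme&#8482; Acme&#x2122;</p>')
parsed_text = ET.fromstring(doc).text

# &trade; has no declaration here, so the parser rejects it.
try:
    ET.fromstring(b"<p>Acme&trade;</p>")
    trade_entity_accepted = True
except ET.ParseError:
    trade_entity_accepted = False

print(hex(ord(octet_153)), control_category)
print(trade_mark_name, utf8_octets)
print(repr(parsed_text), trade_entity_accepted)
```

Both character references come out as the same single character, while the undeclared &trade; reference is refused by the parser.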
The &trade; reference would work if you somehow provided a declaration for it, but that would normally be pointless, since you can use character references for casual needs, and Unicode and UTF-8 enabled authoring software for more serious and more permanent needs of using a rich repertoire of characters.

-- 
Jukka "Yucca" Korpela, http://www.cs.tut.fi/~jkorpela/
Received on Friday, 17 June 2005 11:39:05 UTC