Re: [VE][139] Error Message Feedback

On Thu, 16 Jun 2005, Laurie Permaloff wrote:

> The trademark symbol (character number 153) properly displays in my XML
> file when the following XML encoding is declared:
>
> <?xml version="1.0" encoding="ISO-8859-1"?>

Character number 153 is reserved for control functions in ISO-8859-1,
so anything you see is just an attempt at error recovery. If it happens to
do what you meant, the document is still in error.
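
A small Python sketch may make this concrete. It decodes the single byte 153 twice: once as ISO-8859-1, and once as windows-1252. The second encoding is my assumption about where the trade mark sign probably came from, since it is not mentioned in the original question.

```python
import unicodedata

raw = bytes([153])  # the single byte 0x99

# Decoded as ISO-8859-1, the byte becomes U+0099, a C1 control character.
as_latin1 = raw.decode("iso-8859-1")

# Decoded as windows-1252 (an assumed source of the data), it is U+2122.
as_cp1252 = raw.decode("windows-1252")

print(hex(ord(as_latin1)), unicodedata.category(as_latin1))
print(hex(ord(as_cp1252)), unicodedata.name(as_cp1252))
```

Software that shows a trade mark sign for this byte in a document declared as ISO-8859-1 is presumably treating the data as windows-1252, which is the error recovery referred to above.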

On the technical side, it is difficult to say what a markup validator
should do. Surely character number 153 is UNUSED in the "document
character set", but in this case the encoding is ISO-8859-1, so
conceptually the data stream first needs to be mapped from it to ISO 10646
(Unicode). The official mapping table at
http://www.unicode.org/Public/MAPPINGS/ISO8859/8859-1.TXT
maps every code position to the position with the same number.
So it seems that formally 153 needs to be treated as corresponding
to the ISO 10646 character with that number, hence declared UNUSED,
hence a reportable markup error.
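
The identity mapping can be checked with a short Python sketch. It relies on Python's built-in ISO-8859-1 codec rather than on the mapping file itself, so it illustrates the point without being a test of the official table.

```python
import unicodedata

# Every byte value decodes to the code point with the same number.
identity = all(
    ord(bytes([b]).decode("iso-8859-1")) == b for b in range(256)
)

# Code positions 128..159 end up as C1 control characters (category Cc).
c1_controls = [
    b for b in range(0x80, 0x100)
    if unicodedata.category(bytes([b]).decode("iso-8859-1")) == "Cc"
]

print(identity)
print(min(c1_controls), max(c1_controls), 153 in c1_controls)
```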

> However, the SGML validation fails on this character.

I guess you mean XML validation, though this makes no difference here.

> It will only
> validate properly if the (tm) character is placed or if the &trade;
> entity is used.  But doing this yields a (?) in the XML output.

You can of course use (tm) to avoid the problem. You can also use a
different encoding (such as UTF-8) in which the trade mark sign can be
represented directly. Or you can use the character reference &#8482;
(or, equivalently, in hexadecimal notation, &#x2122;), which works
irrespective of the encoding.
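
As a rough illustration, the following Python sketch parses two tiny documents with the standard library's xml.etree.ElementTree. The element name p and the text "Acme" are made up for the example. The first document is declared as ISO-8859-1 and uses both forms of the character reference; the second is UTF-8 and contains the character itself.

```python
import xml.etree.ElementTree as ET

# ISO-8859-1 document: the sign is written as a character reference,
# so the byte stream itself contains only ASCII characters.
latin1_doc = (
    b'<?xml version="1.0" encoding="ISO-8859-1"?>'
    b"<p>Acme&#8482; Acme&#x2122;</p>"
)

# UTF-8 document: the sign is written directly (bytes E2 84 A2).
utf8_doc = (
    '<?xml version="1.0" encoding="UTF-8"?><p>Acme\u2122</p>'
).encode("utf-8")

latin1_text = ET.fromstring(latin1_doc).text
utf8_text = ET.fromstring(utf8_doc).text

print(latin1_text == "Acme\u2122 Acme\u2122")
print(utf8_text == "Acme\u2122")
```

In both cases the parser hands the application the same character, U+2122.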

The reason why you cannot use &trade; (which should really be reported as
an undefined entity reference - did you miss that?) in XML is that there
is no such predefined entity in XML, unlike in HTML (including XHTML).
It would work if you somehow provided a declaration for it, but that
would normally be pointless, since you can use character references for
casual needs, and Unicode and UTF-8 enabled authoring software for more
serious and more permanent needs of using a rich repertoire of characters.
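
Here is a minimal Python sketch of both situations, again using xml.etree.ElementTree. The helper name parsed_text and the sample documents are hypothetical. The second document declares the entity in an internal DTD subset, which is one way of "somehow" providing a declaration.

```python
import xml.etree.ElementTree as ET

def parsed_text(doc):
    """Return the root element's text, or None if the document is rejected."""
    try:
        return ET.fromstring(doc).text
    except ET.ParseError:
        return None

# No declaration: &trade; is an undefined entity reference in XML.
bare = "<p>Acme&trade;</p>"

# Declared in the internal subset: the reference is now expanded.
declared = (
    '<!DOCTYPE p [<!ENTITY trade "&#8482;">]>'
    "<p>Acme&trade;</p>"
)

print(parsed_text(bare))
print(parsed_text(declared) == "Acme\u2122")
```

The declaration only restates the character reference, which is why using &#8482; directly is normally simpler.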

-- 
Jukka "Yucca" Korpela, http://www.cs.tut.fi/~jkorpela/

Received on Friday, 17 June 2005 11:39:05 UTC