- From: Jon Hanna <jon@spin.ie>
- Date: Mon, 15 Jul 2002 17:23:33 +0100
- To: <w3c-wai-ig@w3.org>
> I too have been looking for a standard set of icons.
> Additionally, though, since I am blind I am really looking for a
> table that would be like:
> description:code - that is, two text columns.

Are
http://www.w3.org/TR/xhtml1/DTD/xhtml-lat1.ent
http://www.w3.org/TR/xhtml1/DTD/xhtml-symbol.ent
and
http://www.w3.org/TR/xhtml1/DTD/xhtml-special.ent
readable to you? Each entity in them is declared on its own line, followed by a comment describing the character.

> I am unfamiliar with the methods to do this so probably confuse things.
> There are "entities" in HTML such as &sup for superscript that
> JFW 4.01 reads very well (JFW says "superscript").

&sup; means "superset of", a mathematical symbol (⊃) that looks a bit like a capital U or a chicken-wire nail on its side. If JFW is reading that as "superscript", it is in error. There are entities &sup1;, &sup2; and &sup3; for "superscript digit 1", "superscript digit 2" and "superscript digit 3" respectively; is that what you are referring to?

> Apparently there is also unicode and SGML.

Okay, there seems to be some confusion here. SGML and XML both define rules of syntax (amongst other things) that are used by other applications. HTML was an SGML application until around 3 years ago; its current form, XHTML, is an XML application.

XML does not define any character set, character encoding, or any other way of defining a relationship between a collection of bits and a character. What it *does* do is, firstly, depend on one of these encodings, because an XML document is written as text and hence must be written in some character set; and secondly, define mechanisms for the author of the document to express characters that are outside the character set being used, illegal at the current position, or simply more convenient to express that way.
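For the description:code table asked for at the top, here is a minimal Python sketch that builds one from the standard library rather than by reading the .ent files. Two assumptions: Python's `html.entities.name2codepoint` (the HTML 4.01 named entities, roughly the same set of characters as the three XHTML files) stands in for the DTD, and the descriptions are official Unicode character names, not the comments in those files. The function name `entity_table` is just illustrative.

```python
import html.entities
import unicodedata


def entity_table():
    """Return (description, code) pairs for the HTML 4 named entities.

    Assumption: html.entities.name2codepoint stands in for the three
    XHTML .ent files; descriptions are Unicode character names.
    """
    rows = []
    for name, codepoint in html.entities.name2codepoint.items():
        # Fall back to a placeholder if a character had no Unicode name.
        description = unicodedata.name(chr(codepoint), "UNNAMED")
        rows.append((description, "&" + name + ";"))
    # Sort by description so related characters sit next to each other.
    rows.sort()
    return rows


if __name__ == "__main__":
    # Two text columns separated by a colon, one entity per line.
    for description, code in entity_table():
        print(f"{description}:{code}")
```

Run as a script, it prints one `description:code` line per entity, for example `SUPERSET OF:&sup;` and `SUPERSCRIPT ONE:&sup1;`, which also shows the difference between the two entities discussed above.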
Now, while XML can be encoded in any character set that contains symbols for < > / and at least one of " and ' (there are problems with the term "character set", but this mail is looking like it's going to be long already so I'll skip that for now), it should always be thought of as being in the Universal Character Set (UCS). In other words, if one XML document is encoded in UTF-16BE, another in UTF-16LE, another in UTF-8 and another in the Windows Western character set (Windows-1252), then to indicate a trademark sign directly the first could use the byte values 33 followed by 34, the second bytes 34 then 33, the third bytes of value 226, 132, then 162, and the last a single byte of value 153. However, all of these different ways of encoding the character are just different conventions for the server to tell the client that it means UCS character 8482 (U+2122), and hence that it means a trademark symbol. Once the bytes are loaded into the browser, character 8482 has been communicated to it and how that happened no longer matters, just as it doesn't matter whether you are reading the number 8482 in this mail on a screen, on a printed page or on a Braille display, or having it read to you by a screen reader; what matters is that I have gotten 8482 from here to there.

Numeric character references (the &#...; kind of character entity) are, as I said, a way of indicating a code point that it is impossible or inconvenient to express in the character set being used to encode the document. Because the author is here stating a code point by number, it MUST always be the UCS value that is used. Therefore you would use &#8482; or &#x2122; to mean the trademark symbol (these are the same number, the former written in decimal, the latter in hexadecimal) no matter what character set was used to encode the document. Hence &#153; is completely wrong.
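The byte values above, and the claim that a numeric reference means the same UCS character whatever the document's encoding, can be checked with a short Python sketch. It assumes Python's codec names (`cp1252` is its name for the Windows Western character set) and uses the standard library's XML parser as a stand-in for a browser; the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

TRADE_MARK = "\u2122"  # UCS character 8482, hexadecimal 2122

# Python codec names for the four encodings discussed above.
ENCODINGS = ["utf-16-be", "utf-16-le", "utf-8", "cp1252"]


def byte_values(char, encoding):
    """Byte values used to write `char` directly in the given encoding."""
    return list(char.encode(encoding))


def decoded_code_points(char):
    """Encode `char` in every encoding, decode it again and collect the
    resulting UCS code points. A single entry means they all agree."""
    return {ord(char.encode(enc).decode(enc)) for enc in ENCODINGS}


def parse_reference(reference, encoding):
    """Parse a tiny XML document declared (and actually encoded) in
    `encoding` whose only content is `reference`; return its text."""
    doc = f'<?xml version="1.0" encoding="{encoding}"?><p>{reference}</p>'
    return ET.fromstring(doc.encode(encoding)).text


if __name__ == "__main__":
    # Different bytes on the wire for the same character.
    for enc in ENCODINGS:
        print(enc, byte_values(TRADE_MARK, enc))
    # ...but one code point once decoded.
    print(decoded_code_points(TRADE_MARK))
    # Decimal and hex references mean U+2122 in any declared encoding.
    for enc in ["utf-8", "iso-8859-1", "us-ascii"]:
        for ref in ["&#8482;", "&#x2122;"]:
            print(enc, ref, parse_reference(ref, enc) == TRADE_MARK)
```

ISO-8859-1 and US-ASCII cannot represent the trademark sign directly at all, which is exactly the case where a numeric reference is needed, and the reference still resolves to character 8482.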
That &#153; works at all in some browsers is because position 153 in the UCS holds no printable character (only an invisible control code), so the reference is so obviously wrong that a browser can realise the author must have made a mistake and try to guess what they meant.

The named entities like &trade; are another feature of XML (and SGML before it). The author of a DTD can define named entities that are to be replaced by something else whenever they are encountered. The DTD for HTML points to the three files I gave URLs for above. In one of these there is the code:

    <!ENTITY trade "&#8482;">

which means that whenever a browser comes across &trade; it should replace it with &#8482;, which as we noted above means the trademark symbol.
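Both behaviours can be seen in a minimal Python sketch. The standard library's strict XML parser stands in for a conforming XML processor, and `html.unescape`, which follows HTML5's error-recovery rules for references such as `&#153;`, stands in for a forgiving browser; these are stand-ins, not any particular browser's code, and the helper names are hypothetical. The entity declaration is the same line quoted from the DTD above.

```python
import html
import xml.etree.ElementTree as ET

# The declaration from the XHTML entity files, used as an internal subset.
TRADE_DECL = '<!ENTITY trade "&#8482;">'


def xml_reading(fragment, internal_subset=""):
    """Parse `fragment` as the content of a <p> element with a strict XML
    parser, optionally after a DOCTYPE whose internal subset declares
    entities. Returns the text, or None if the parser rejects it."""
    doctype = f"<!DOCTYPE p [{internal_subset}]>" if internal_subset else ""
    try:
        return ET.fromstring(f"{doctype}<p>{fragment}</p>").text
    except ET.ParseError:
        return None


def browser_guess(fragment):
    """Resolve references the forgiving HTML5 way: html.unescape maps
    &#128;..&#159; to the Windows-1252 characters the author probably meant."""
    return html.unescape(fragment)


if __name__ == "__main__":
    # XML takes &#153; literally: UCS 153, not a trademark sign.
    print(xml_reading("&#153;") == "\u2122")           # False
    # The forgiving parser guesses the author meant Windows-1252 byte 153.
    print(browser_guess("&#153;") == "\u2122")         # True
    # &trade; is undefined in plain XML and the parser rejects it...
    print(xml_reading("&trade;"))                      # None
    # ...until a DTD declares it, as the XHTML entity files do.
    print(xml_reading("&trade;", TRADE_DECL) == "\u2122")  # True
```

The design point is the one made above: the named entity is only a shorthand that the DTD expands into the numeric reference &#8482;, so once expanded it means exactly the same UCS character.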
Received on Monday, 15 July 2002 12:23:26 UTC