- From: Tony Graham <Tony.Graham@Sun.COM>
- Date: Wed, 31 Jul 2002 17:27:31 +0100
- To: "'Www-Xsl-Fo" <www-xsl-fo@w3.org>
Use a Char. Do not use 'U+xxxx'.

Arved Sandstrom wrote at 29 Jul 2002 19:15:09 -0300:
> A number of properties are typed as having <character> values: "character",
> "grouping-separator", and "hyphenation-character".
>
> <character> is described as being a single Unicode character, in Section
> 5.11.
>
> However, the property description for fo:character embellishes this rather
> terse description, and says that a <character> specifies "the code point of
> the Unicode character to be presented". To me this pretty clearly means a
> specification of form U+xxxx.

Pick your Unicode version. Prior to Unicode 3.1, 'U+xxxx' was a 'Unicode value.' Today, "[i]n running text, an individual Unicode code point can be expressed as U+n, where n is from four to six hexadecimal digits..."

A 'character' property value is hardly running text.

On a different tack, is U+FB01, LATIN SMALL LIGATURE FI, one character or two? Either way, it is one code point.

See Section 3.4, Strings, of the Character Model for the World Wide Web 1.0 [1]. A character is represented by a code point in Unicode, but it ends up as one or more code units in your document.

> With the other 2 properties this distinction is not made; we are left with
> the idea that a Unicode character, as opposed to a codepoint (or code value;

i.e., one or more code units of n bits each.

> the integer in other words), will be used. That is, if someone wished to use
> a 3-octet UTF-8 encoded value that would seemingly be OK.

If the document is encoded in UTF-8.

> "grouping-separator" is defined wrt XSLT, where it is a single instance of
> the XML 'Char' production, that is, a Unicode character, either UTF-8 or
> UTF-16 encoded (at a minimum), or specified as #x9 | #xA | #xD |

Same encoding as the rest of the document.

> [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF].

If you can't represent it in the current encoding, use a numeric character reference.
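
To make the code point versus code unit distinction concrete, here is a rough Python sketch. It is only an illustration, not anything taken from the XSL or XSLT Recommendations, and the helper names describe and as_attribute_value are made up for this example. It reports how many code units one code point occupies in UTF-8 and UTF-16, and falls back to a numeric character reference when the document encoding cannot represent the character.

```python
import unicodedata


def describe(ch: str) -> dict:
    """Summarise a single code point (hypothetical helper for illustration)."""
    if len(ch) != 1:
        raise ValueError("expected exactly one code point")
    cp = ord(ch)
    return {
        # 'U+n' notation is for running text, not for the property value.
        "code_point": f"U+{cp:04X}",
        "name": unicodedata.name(ch, "<unnamed>"),
        # One code point becomes one or more code units once encoded.
        "utf8_octets": len(ch.encode("utf-8")),
        "utf16_units": len(ch.encode("utf-16-be")) // 2,
        # Hexadecimal numeric character reference usable in any XML encoding.
        "char_ref": f"&#x{cp:X};",
    }


def as_attribute_value(ch: str, encoding: str) -> str:
    """Return the character itself if the document encoding can represent it,
    otherwise a numeric character reference (hypothetical helper)."""
    try:
        ch.encode(encoding)
        return ch
    except UnicodeEncodeError:
        return f"&#x{ord(ch):X};"


if __name__ == "__main__":
    print(describe("\uFB01"))
    print(as_attribute_value("\uFB01", "utf-8"))
    print(as_attribute_value("\uFB01", "iso-8859-1"))
```

For U+FB01 this reports three UTF-8 octets and one UTF-16 code unit. In an ISO-8859-1 document the second helper returns the reference &#xFB01; because that encoding has no octet for the ligature.
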

This does raise the interesting question of what happens if I need to use a base character plus combining characters to make my grouping separator? It seems you can only use precomposed characters for grouping separator.

> So our (myself and Eric Bischoff) question is, what have other implementors
> elected to use?

Regards,

Tony Graham
------------------------------------------------------------------------
XML Technology Center - Dublin            mailto:tony.graham@sun.com
Sun Microsystems Ireland Ltd                       Phone: +353 1 8199708
Hamilton House, East Point Business Park, Dublin 3            x(70)19708

[1] http://www.w3.org/TR/charmod/#sec-Strings

Received on Wednesday, 31 July 2002 12:24:29 UTC