- From: Mark Davis <mark.davis@us.ibm.com>
- Date: Wed, 21 Feb 2001 13:56:09 -0800
- To: Karlsson Kent - keka <keka@im.se>
- Cc: "'duerst@w3.org'" <duerst@w3.org>, "'www-i18n-comments@w3.org'" <www-i18n-comments@w3.org>, misha.wolf@reuters.com, "'Asmus Freytag'" <asmusf@ix.netcom.com>, "'Kenneth Whistler'" <kenw@sybase.com>
Brief comments with ***

This example touches on what the term "character" means. But using the term "character" in the 10646/Unicode sense, the fi ligature stores two letters in a single character (which in some encodings fit in a single 'unit of physical storage'). Not two characters in a single character (which in some encodings fit in a single 'unit of physical storage')...

*** To be more accurate, for the case of the fi ligature, a sequence of two abstract characters in a particular presentation form is represented by a single encoded character.

> On the other hand, if we leave out many-to-one, readers
> will ask why.

My reaction was: why is many-to-one left *in*... If what you are talking about here are such things as the "squared" ligatures and other ligatures, then that should be made explicit.

Side remark: The fi ligature is especially unfortunate. Some software automatically replaces fi with the fi ligature, and has no other means (yet) of handling ligatures. It then misses out on fj, giving a poor typographic result for words like fjarde (fourth), fjord, fjolaret (the previous year), fjall (scales or mountain...).

*** We should definitely leave in the discussion of many:one relationships -- can't hide that bit of ugliness.

>
> >* clause 3.2
> >There is no definition of terms in the document. Terms such as "byte" and
> >"wyde" are left for the reader to guess, likewise for "octet", though that
> >is more precise. Note that some well-known standards (such as that for C)
> >do NOT limit a "byte" to be an "octet".
>
> Does anything in the spec not work out because the reader doesn't
> know what a byte is? I don't know, but if that's not the case,
> then we don't have to be more precise, or do we?
After seeing the recent discussion on the "Open Group" e-mail list about the next version of POSIX, where a discussion thread is going on and on about 9-bit bytes, 10-bit bytes (for historic architectures) and the eventual possibility of 16-bit bytes, I find it best to avoid the term byte altogether and just write octet.

*** I agree about not using wyde (certainly not without a definition). Byte, on the other hand, is simply much better understood than octet. One can qualify it on first use by saying that it is always 8 bits.

>
> >"code point"...; "code position" seems to be the 10646 term, though not
> >formally defined.
>
> We checked that, you are right. I think we decided to add
> "code position" in parentheses to give the link to 10646 terminology.

Or just write "code position" throughout... (I think it's a better term, since it does not involve the term "point", which has other connotations.)

*** While "code position" is defined in 10646, it is rarely used there. Code point is used throughout the Unicode documents. "Position" also has its own connotations.
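
For readers who want to see the distinctions in this thread concretely, here is a minimal Python sketch, not part of the original exchange. It counts code points, UTF-8 octets and UTF-16 code units for the fi ligature (U+FB01), and uses NFKC compatibility normalization to recover the two-letter sequence the ligature stands for. The helper names `describe` and `expand_compat` are hypothetical, chosen only for illustration.

```python
import unicodedata

# U+FB01 LATIN SMALL LIGATURE FI: one encoded character that represents
# a sequence of two abstract characters, "f" followed by "i".
FI_LIGATURE = "\ufb01"


def describe(text):
    """Return (code points, UTF-8 octets, UTF-16 code units) for text."""
    return (
        len(text),                          # Python str length counts code points
        len(text.encode("utf-8")),          # number of 8-bit units (octets)
        len(text.encode("utf-16-be")) // 2, # number of 16-bit code units
    )


def expand_compat(text):
    """Apply NFKC, which replaces compatibility characters such as
    ligatures with the character sequences they stand for."""
    return unicodedata.normalize("NFKC", text)


if __name__ == "__main__":
    print(unicodedata.name(FI_LIGATURE))
    print(describe(FI_LIGATURE))
    print(describe(expand_compat(FI_LIGATURE)))
```

The ligature is one code point but three octets in UTF-8, which is why "character", "code point" and "byte/octet" need to be kept apart when describing storage.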
Received on Wednesday, 21 February 2001 16:56:22 UTC