
RE: Unicode character names

From: Mike Ksar <mikeksar@microsoft.com>
Date: Sun, 26 Aug 2001 08:32:46 -0700
Message-ID: <72129E9450B396458A1149FA7AFAD8CA03D8B420@red-msg-05.redmond.corp.microsoft.com>
To: "Mark Davis" <mark@macchiato.com>, <www-international@w3.org>, "Susan Lesch" <lesch@w3.org>
Cc: <fyergeau@alis.com>, <book@unicode.org>
In addition to what Mark Davis said, it is important to note that the
character names are NORMATIVE parts of both Unicode and ISO 10646.  Mark
is referring to the character name as the “formal” name.

 

Mike Ksar

 

-----Original Message-----
From: Mark Davis [mailto:mark@macchiato.com] 
Sent: Saturday, August 25, 2001 10:13 PM
To: www-international@w3.org; Susan Lesch
Cc: fyergeau@alis.com
Subject: Re: Unicode character names

 

The formal character name is designed to serve as an identification
across different character encoding standards, especially within ISO
standards. Its primary function is to be a unique identifier across
these standards, and within the Unicode Standard. It may or may not be
the best description of the function and usage of the character in the
Standard. In many cases it is, and in some it is not. [The NameList
alone is insufficient to determine the complete semantics of the
character; many characters have complex semantics and usage,
and thus have detailed descriptions in a relevant script chapter or
elsewhere in the standard.]
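
As a small illustration of the formal name acting as a unique identifier, the sketch below uses Python's standard `unicodedata` module, which was not part of the original discussion. The helper `resolve` is a hypothetical name introduced here, and the behavior shown reflects whichever Unicode version ships with the Python build in use.

```python
import unicodedata


def resolve(name):
    """Return the character whose formal name is `name`, or None.

    Hypothetical helper: unicodedata.lookup raises KeyError for
    names it does not know, so that case is mapped to None.
    """
    try:
        return unicodedata.lookup(name)
    except KeyError:
        return None


# The formal name identifies exactly one character, and back again.
print(unicodedata.name("."))        # FULL STOP
print(unicodedata.name("\u00b6"))   # PILCROW SIGN
print(resolve("FULL STOP") == ".")  # True

# An informal alias such as PERIOD is not a formal name, so a
# lookup by that string is not expected to resolve.
print(resolve("PERIOD"))            # None
```

The round trip works only for formal names (and the formal aliases the module knows about), which matches the point that other aliases are reasonable for readers but carry no special status.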

 

The Unicode 1.0 name was the formal character name in the first version
of Unicode. As the quote says, it was changed in the merger with ISO
10646, and can now be considered an alias -- but does not have a
privileged status among the aliases. Thus U+002E could be referred to
as a period, as a full stop, as a dot, as a decimal point (in some
countries), and so on. All are reasonable aliases, and FULL STOP is also
the formal name. There may also be aliases that are not yet mentioned in
the standard. For example:

 

eroteme \Er"o*teme\, n. [Gr. ? question.] A mark indicating a question;
a note of interrogation. 

Source
<http://www.dictionary.com/cgi-bin/dict.pl?config=about&term=00-database-info&db=web1913>:
Webster's Revised Unabridged Dictionary, © 1996, 1998 MICRA, Inc.

Note: The convention in the Unicode Standard within body text is for
aliases to be in italic, and the formal name in small caps.

 

I hope this helps,

 

Mark

—————

 

Γνῶθι σαυτόν — Θαλῆς
[http://www.macchiato.com]

----- Original Message ----- 

From: "Susan Lesch" <lesch@w3.org>

To: <www-international@w3.org>

Cc: <fyergeau@alis.com>

Sent: Saturday, August 25, 2001 18:39

Subject: Unicode character names

 

> Hello,
> 
> I am preparing a W3C publications guide, and would like to link to your
> response to the question below. I plan to recommend that editors of W3C
> specifications refer to characters by their correct Unicode names.
> 
> From _The Unicode Standard, Version 3.0_ (sorry that is the latest hard
> copy available to me at this time), page 101:
> 
>    "The Unicode 1.0 character name is an informative property of the
>    characters defined in Version 1.0 of the Unicode Standard. The
>    names of Unicode characters were changed in the process of merging
>    the standard with ISO/IEC 10646. The Version 1.0 character names
>    can be obtained from the CD-ROM accompanying the standard or from
>    the ftp site. See also Appendix D, Changes from Unicode Version
>    2.0. Where the Version 1.0 character name provides additional
>    useful information, it is listed in Chapter 14, Code Charts. For
>    example, U+00B6 PILCROW SIGN has its Version 1.0 name, PARAGRAPH
>    SIGN, listed for clarity."
> 
> To select an example with more variables from the ftp site at
> ftp://ftp.unicode.org/Public/3.1-Update/NamesList-3.1.0.txt
> 
> 002E FULL STOP
> = PERIOD
> = dot, decimal point
> * may be rendered as a raised decimal point in old style numbers
> x (arabic full stop - 06D4)
> x (ideographic full stop - 3002)
> 
> Thanks to the I18n Working Group, I learned that PERIOD is the Unicode
> 1.0 name, and "dot" and "decimal point" are acceptable aliases.
> 
> My question is this. Is PERIOD outdated? Is it correct to refer to this
> character (.) as "full stop, dot, or decimal point, and NOT period"?
> (Or is PERIOD capitalized to show it is the best alias?)
> 
> Thank you,
> -- 
> Susan Lesch - mailto:lesch@w3.org tel:+1.858.483.4819
> World Wide Web Consortium (W3C) - http://www.w3.org/
> 
> 
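
The NamesList entry quoted above can be read mechanically. The following is a minimal sketch, assuming the simplified layout shown in the message: a code point and formal name on the first line, then lines starting with `=` for aliases, `*` for notes, and `x` for cross references. The names `NamesListEntry` and `parse_entry` are hypothetical, and splitting an alias line on commas is an assumption based on the "dot, decimal point" line. The published file uses tabs and has more line types than are handled here.

```python
from dataclasses import dataclass, field


@dataclass
class NamesListEntry:
    code_point: int
    formal_name: str
    aliases: list = field(default_factory=list)
    notes: list = field(default_factory=list)
    cross_refs: list = field(default_factory=list)


def parse_entry(text):
    """Parse one entry in the simplified layout quoted in the message."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    # First line: hexadecimal code point, then the formal name.
    code, name = lines[0].split(None, 1)
    entry = NamesListEntry(int(code, 16), name)
    for ln in lines[1:]:
        marker, _, body = ln.partition(" ")
        body = body.strip()
        if marker == "=":
            # Assumption: commas separate several aliases on one line.
            entry.aliases.extend(a.strip() for a in body.split(","))
        elif marker == "*":
            entry.notes.append(body)
        elif marker == "x":
            entry.cross_refs.append(body)
    return entry


SAMPLE = """\
002E FULL STOP
= PERIOD
= dot, decimal point
* may be rendered as a raised decimal point in old style numbers
x (arabic full stop - 06D4)
x (ideographic full stop - 3002)
"""

entry = parse_entry(SAMPLE)
print(entry.formal_name)  # FULL STOP
print(entry.aliases)      # ['PERIOD', 'dot', 'decimal point']
```

Note that the parser keeps PERIOD alongside the other aliases and gives it no special rank, consistent with the reply that the Unicode 1.0 name has no privileged status.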

Received on Sunday, 26 August 2001 11:33:53 GMT
