[bp-i18n-specdev] Should include advice on specifying what a letter is. from spemberton via GitHub on 2016-07-11 (www-international@w3.org from July to September 2016)

From: spemberton via GitHub <sysbot+gh@w3.org>
Date: Mon, 11 Jul 2016 13:06:32 +0000
To: www-international@w3.org
Message-ID: <issues.opened-164832465-1468242390-sysbot+gh@w3.org>

spemberton has just created a new issue for 
https://github.com/w3c/bp-i18n-specdev:

== Should include advice on specifying what a letter is. ==
Several specifications define "names". For example, XML defines them as
follows (https://www.w3.org/TR/REC-xml/#NT-Nmtoken):

NameStartChar ::= ":" | [A-Z] | "_" | [a-z] | [#xC0-#xD6] | [#xD8-#xF6]
                | [#xF8-#x2FF] | [#x370-#x37D] | [#x37F-#x1FFF]
                | [#x200C-#x200D] | [#x2070-#x218F] | [#x2C00-#x2FEF]
                | [#x3001-#xD7FF] | [#xF900-#xFDCF] | [#xFDF0-#xFFFD]
                | [#x10000-#xEFFFF]

NameChar      ::= NameStartChar | "-" | "." | [0-9] | #xB7
                | [#x0300-#x036F] | [#x203F-#x2040]
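To make the two productions concrete, here is a minimal Python sketch that transcribes them into predicates over code points, together with the Name production from the same specification (Name ::= NameStartChar (NameChar)*). The function names is_name_start_char, is_name_char and is_xml_name are hypothetical, chosen only for illustration; the ranges are copied from the productions above and nothing else in XML is modelled.

```python
# Inclusive (low, high) code point ranges transcribed from NameStartChar.
NAME_START_RANGES = [
    (ord(":"), ord(":")),
    (ord("A"), ord("Z")),
    (ord("_"), ord("_")),
    (ord("a"), ord("z")),
    (0xC0, 0xD6),
    (0xD8, 0xF6),
    (0xF8, 0x2FF),
    (0x370, 0x37D),
    (0x37F, 0x1FFF),
    (0x200C, 0x200D),
    (0x2070, 0x218F),
    (0x2C00, 0x2FEF),
    (0x3001, 0xD7FF),
    (0xF900, 0xFDCF),
    (0xFDF0, 0xFFFD),
    (0x10000, 0xEFFFF),
]

# Additional ranges that NameChar allows on top of NameStartChar.
NAME_EXTRA_RANGES = [
    (ord("-"), ord("-")),
    (ord("."), ord(".")),
    (ord("0"), ord("9")),
    (0xB7, 0xB7),
    (0x0300, 0x036F),
    (0x203F, 0x2040),
]


def _in_ranges(cp: int, ranges: list[tuple[int, int]]) -> bool:
    """Return True if code point cp lies in any of the inclusive ranges."""
    return any(lo <= cp <= hi for lo, hi in ranges)


def is_name_start_char(ch: str) -> bool:
    return _in_ranges(ord(ch), NAME_START_RANGES)


def is_name_char(ch: str) -> bool:
    cp = ord(ch)
    return _in_ranges(cp, NAME_START_RANGES) or _in_ranges(cp, NAME_EXTRA_RANGES)


def is_xml_name(s: str) -> bool:
    """Name ::= NameStartChar (NameChar)*"""
    return bool(s) and is_name_start_char(s[0]) and all(is_name_char(c) for c in s[1:])
```

Running it shows the puzzle described below: "café" is accepted, while U+00D7 (MULTIPLICATION SIGN) is rejected only because it falls in the gap between [#xC0-#xD6] and [#xD8-#xF6], and nothing in the grammar itself says why that gap is there.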

It is really not clear where these lists of characters come from, or
why some characters are acceptable as name characters and others are not.

Unicode has the concept of General Category values
(http://www.unicode.org/reports/tr44/#General_Category_Values), which
classify characters as, for instance, "Uppercase_Letter",
"Lowercase_Letter", and so on.
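As an illustration of how these values are exposed in practice, Python's standard unicodedata module returns the two-letter short alias of a character's General Category (for example "Lu"). The sketch below maps a few of those aliases back to the long names used in UAX #44; the LONG_NAMES table and the general_category helper are hypothetical and deliberately cover only a handful of categories.

```python
import unicodedata

# A few General_Category short aliases and their long names (UAX #44).
# Deliberately incomplete: unmapped aliases are returned unchanged.
LONG_NAMES = {
    "Lu": "Uppercase_Letter",
    "Ll": "Lowercase_Letter",
    "Lt": "Titlecase_Letter",
    "Lm": "Modifier_Letter",
    "Lo": "Other_Letter",
    "Mn": "Nonspacing_Mark",
    "Nd": "Decimal_Number",
    "Pd": "Dash_Punctuation",
    "Sm": "Math_Symbol",
}


def general_category(ch: str) -> str:
    """Return the long General_Category name of ch, or its short alias if unmapped."""
    short = unicodedata.category(ch)
    return LONG_NAMES.get(short, short)


# Latin capital A, Latin small a, multiplication sign, Hebrew alef, combining acute accent.
for ch in ["A", "a", "\u00D7", "\u05D0", "\u0301"]:
    print(f"U+{ord(ch):04X} {general_category(ch)}")
```

This prints Uppercase_Letter, Lowercase_Letter, Math_Symbol, Other_Letter and Nonspacing_Mark respectively, so the category alone already explains why U+00D7 is not a letter.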

It seems to me that it would be good advice for specification writers
to use the Unicode General Category values as the basis for defining
names (amongst other things), rather than apparently arbitrary lists of
code point ranges.
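A minimal sketch of what such a category-based definition might look like, written in Python with the standard unicodedata module. The category sets (letters and letter numbers may start a name; marks, decimal digits and connector punctuation may also continue one) and the extra ASCII characters "_", ":", "-" and "." are assumptions for illustration, loosely modelled on the XML productions quoted earlier; they are not taken from any specification, and the function names are hypothetical.

```python
import unicodedata

# Hypothetical category sets, chosen for illustration only.
START_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}  # letters and letter numbers
CONTINUE_CATEGORIES = START_CATEGORIES | {"Mn", "Mc", "Nd", "Pc"}  # plus marks, digits, connectors

# Hypothetical ASCII punctuation allowed in addition to the categories above.
EXTRA_START = {"_", ":"}
EXTRA_CONTINUE = EXTRA_START | {"-", "."}


def is_name_start(ch: str) -> bool:
    return ch in EXTRA_START or unicodedata.category(ch) in START_CATEGORIES


def is_name_continue(ch: str) -> bool:
    return ch in EXTRA_CONTINUE or unicodedata.category(ch) in CONTINUE_CATEGORIES


def is_name(s: str) -> bool:
    """A name is one start character followed by any number of continue characters."""
    return bool(s) and is_name_start(s[0]) and all(is_name_continue(c) for c in s[1:])
```

With this definition each accept/reject decision can be traced to a named category: is_name("x\u00D7y") is False because U+00D7 is Math_Symbol, and "e\u0301" is accepted because the combining accent is a Nonspacing_Mark. One trade-off worth weighing: unicodedata reflects the Unicode version bundled with the Python runtime, so a category-based definition can accept different characters on different versions, whereas fixed code point ranges do not change.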

See https://github.com/w3c/bp-i18n-specdev/issues/16

Received on Monday, 11 July 2016 13:06:41 UTC