Re: XML-editor broken (RE: XML Erratum? U+061C in XML NameStartChar) from MURATA on 2024-09-24 (public-i18n-core@w3.org from July to September 2024)

From: MURATA <eb2mmrt@gmail.com>
Date: Wed, 25 Sep 2024 07:33:23 +0900
To: Addison Phillips <addisoni18n@gmail.com>
Cc: Philippe Le Hégaret <plh@w3.org>, public-i18n-core@w3.org, Mark Davis Ⓤ <mark@unicode.org>
Message-ID: <CALvn5EDUrBOQBZoLkfNYhoaJ_bVQR66hNfPgJ0XAbmbs+ZqrmA@mail.gmail.com>

Addison,

In my understanding, nobody implements the 5th edition.  All
implementations are based on the 4th edition, which places many more
restrictions on name characters.  Even if W3C creates a 6th edition,
will it be implemented?

Incidentally, I was a member of the original XML WG and was involved in
discussing name characters.  Stupidly, I thought that XML parsers could be
easily updated when more name characters were introduced.

Regards,
Makoto



On Wed, 25 Sep 2024 at 1:57, Addison Phillips <addisoni18n@gmail.com> wrote:

> Hi PLH,
>
>
>
> Yesterday I tried to send the below email to xml-editor@w3.org, which is
> the address listed in the XML Spec and its errata. That mailing list
> appears to have gone out of business in 2017.
>
>
>
> Can you point me to the replacement or the process for requesting fixes
> (one of which would be to fix the part of XML 1.0 5e that says to use the
> editor’s email address…)?
>
>
>
> Thanks,
>
>
>
> Addison
>
>
>
> *From:* Addison Phillips <addisoni18n@gmail.com>
> *Sent:* Monday, September 23, 2024 9:01 AM
> *To:* xml-editor@w3.org
> *Cc:* public-i18n-core@w3.org; 'Mark Davis Ⓤ' <mark@unicode.org>
> *Subject:* XML Erratum? U+061C in XML NameStartChar
>
>
>
> The I18N working group has discussed an issue [1] raised by a Unicode
> working group [2] related to the use of U+061C in XML Name.
>
>
>
> https://www.w3.org/TR/REC-xml-names/#ns-qualnames
>
> https://www.w3.org/TR/xml/#sec-common-syn
>
>
>
> ```
>
> [4]  NameStartChar ::= ":" | [A-Z] | "_" | [a-z] | [#xC0-#xD6]
>                      | [#xD8-#xF6] | [#xF8-#x2FF] | [#x370-#x37D]
>                      | [#x37F-#x1FFF] | [#x200C-#x200D] | [#x2070-#x218F]
>                      | [#x2C00-#x2FEF] | [#x3001-#xD7FF] | [#xF900-#xFDCF]
>                      | [#xFDF0-#xFFFD] | [#x10000-#xEFFFF]
>
> [4a] NameChar      ::= NameStartChar | "-" | "." | [0-9] | #xB7
>                      | [#x0300-#x036F] | [#x203F-#x2040]
>
> [5]  Name          ::= NameStartChar (NameChar)*
>
> [6]  Names         ::= Name (#x20 Name)*
>
> [7]  Nmtoken       ::= (NameChar)+
>
> [8]  Nmtokens      ::= Nmtoken (#x20 Nmtoken)*
>
> ```
>
>
>
> XML 1.0 5e and XML Names 1.0 use the construct `NameStartChar` shown
> above. The characters in names defined using `NameStartChar` were
> deliberately limited to avoid known problem characters at the time of
> adoption. Notable among the characters excluded from names are invisible
> formatting controls.
>
>
>
> The character U+061C `ARABIC LETTER MARK` was added to Unicode in version
> 6.3 (in 2013). This character is similar to U+200F `RIGHT-TO-LEFT MARK`,
> which is not a `NameStartChar`. It is unusual for an invisible,
> non-spacing mark like this to be added to Unicode. Because U+061C falls
> within the `NameStartChar` range [#x37F-#x1FFF], an XML name that consists
> of this single, invisible formatting control is valid, but that seems
> like a bug, not a feature.
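
As a rough illustration of the point above, the following Python sketch checks
individual code points against production [4] as quoted earlier in this
message. The ranges are transcribed from that production; the names
`NAME_START_RANGES` and `is_name_start_char` are hypothetical helpers made up
for this example and are not part of any XML library.

```python
# Sketch only: membership test for the XML 1.0 (Fifth Edition) NameStartChar
# production, using the ranges quoted in production [4] of this message.
# NAME_START_RANGES and is_name_start_char are illustrative names (assumptions).

NAME_START_RANGES = [
    (0x3A, 0x3A),        # ":"
    (0x41, 0x5A),        # [A-Z]
    (0x5F, 0x5F),        # "_"
    (0x61, 0x7A),        # [a-z]
    (0xC0, 0xD6),
    (0xD8, 0xF6),
    (0xF8, 0x2FF),
    (0x370, 0x37D),
    (0x37F, 0x1FFF),     # this range contains U+061C
    (0x200C, 0x200D),    # stops just short of U+200E and U+200F
    (0x2070, 0x218F),
    (0x2C00, 0x2FEF),
    (0x3001, 0xD7FF),
    (0xF900, 0xFDCF),
    (0xFDF0, 0xFFFD),
    (0x10000, 0xEFFFF),
]


def is_name_start_char(ch: str) -> bool:
    """Return True if the single character ch matches NameStartChar."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in NAME_START_RANGES)


if __name__ == "__main__":
    for cp in (0x061C, 0x200F):
        print(f"U+{cp:04X}: {is_name_start_char(chr(cp))}")
```

Run as a script, this prints `U+061C: True` and `U+200F: False`, which matches
the asymmetry described above: the ARABIC LETTER MARK is admitted as the first
(and only) character of a name, while the RIGHT-TO-LEFT MARK is not.
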
>
>
>
> (This issue was encountered in creating the MessageFormat 2.0 standard at
> Unicode, where we are attempting to use NCName and Name to define valid
> identifiers).
>
>
>
> The downside, of course, is that very many implementations would not be
> aware of a change made to `NameStartChar`.
>
>
>
> We think an erratum should be created to exclude U+061C from names, or at
> least to note the problem with using it.
>
>
>
>
>
> [1] https://github.com/w3c/i18n-activity/issues/1903
>
>
>
> Addison Phillips
>
> Chair (W3C Internationalization WG)
>
>
>
> Internationalization is not a feature.
>
> It is an architecture.
>
>
>


-- 
Project Professor, Graduate School of Media and Governance, Keio University
MURATA Makoto

Received on Tuesday, 24 September 2024 22:34:05 UTC