XML Erratum? U+061C in XML NameStartChar

The I18N Working Group has discussed an issue [1], raised by a Unicode working group [2], concerning the use of U+061C in XML names.

Namespaces in XML 1.0, Qualified Names: https://www.w3.org/TR/REC-xml-names/#ns-qualnames

XML 1.0 (Fifth Edition), Common Syntactic Constructs: https://www.w3.org/TR/xml/#sec-common-syn

```
[4]   NameStartChar ::= ":" | [A-Z] | "_" | [a-z] | [#xC0-#xD6] | [#xD8-#xF6] | [#xF8-#x2FF]
                      | [#x370-#x37D] | [#x37F-#x1FFF] | [#x200C-#x200D] | [#x2070-#x218F]
                      | [#x2C00-#x2FEF] | [#x3001-#xD7FF] | [#xF900-#xFDCF] | [#xFDF0-#xFFFD]
                      | [#x10000-#xEFFFF]
[4a]  NameChar      ::= NameStartChar | "-" | "." | [0-9] | #xB7 | [#x0300-#x036F] | [#x203F-#x2040]
[5]   Name          ::= NameStartChar (NameChar)*
[6]   Names         ::= Name (#x20 Name)*
[7]   Nmtoken       ::= (NameChar)+
[8]   Nmtokens      ::= Nmtoken (#x20 Nmtoken)*
```

XML 1.0 (Fifth Edition) and Namespaces in XML 1.0 use the `NameStartChar` construct shown above. The set of characters it permits in names was deliberately limited to exclude characters known to be problematic at the time it was adopted. Notable among the characters excluded from names are invisible formatting controls.
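
To make productions [4], [4a] and [5] concrete, here is a minimal Python sketch that transcribes them into code point range checks. The helper names (`is_name_start_char`, `is_name_char`, `is_name`) are hypothetical and chosen for illustration; this is not taken from any XML library, and it ignores the Namespaces rule that an `NCName` may not contain ":".

```python
# Sketch of XML 1.0 (Fifth Edition) productions [4], [4a] and [5],
# transcribed as inclusive (first, last) code point ranges.
# Hypothetical helpers for illustration only; not a library API.

NAME_START_RANGES = [
    (0x3A, 0x3A),        # ":"
    (0x41, 0x5A),        # [A-Z]
    (0x5F, 0x5F),        # "_"
    (0x61, 0x7A),        # [a-z]
    (0xC0, 0xD6),
    (0xD8, 0xF6),
    (0xF8, 0x2FF),
    (0x370, 0x37D),
    (0x37F, 0x1FFF),
    (0x200C, 0x200D),    # ZWNJ, ZWJ
    (0x2070, 0x218F),
    (0x2C00, 0x2FEF),
    (0x3001, 0xD7FF),
    (0xF900, 0xFDCF),
    (0xFDF0, 0xFFFD),
    (0x10000, 0xEFFFF),
]

# Additional characters allowed only after the first position ([4a]).
NAME_EXTRA_RANGES = [
    (0x2D, 0x2D),        # "-"
    (0x2E, 0x2E),        # "."
    (0x30, 0x39),        # [0-9]
    (0xB7, 0xB7),        # #xB7 MIDDLE DOT
    (0x300, 0x36F),      # [#x0300-#x036F]
    (0x203F, 0x2040),    # [#x203F-#x2040]
]


def _in_ranges(cp: int, ranges) -> bool:
    """True if code point cp lies inside any inclusive (lo, hi) range."""
    return any(lo <= cp <= hi for lo, hi in ranges)


def is_name_start_char(ch: str) -> bool:
    """Production [4] NameStartChar."""
    return _in_ranges(ord(ch), NAME_START_RANGES)


def is_name_char(ch: str) -> bool:
    """Production [4a]: NameStartChar plus the extra characters above."""
    return is_name_start_char(ch) or _in_ranges(ord(ch), NAME_EXTRA_RANGES)


def is_name(s: str) -> bool:
    """Production [5]: Name ::= NameStartChar (NameChar)*"""
    return bool(s) and is_name_start_char(s[0]) and all(is_name_char(c) for c in s[1:])


if __name__ == "__main__":
    for candidate in ["foo", "_x-1", "1abc", "a b"]:
        print(repr(candidate), is_name(candidate))
```

Here `1abc` and `a b` are rejected because a digit may appear in a name but cannot start one, and a space is not a `NameChar`.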

The character U+061C `ARABIC LETTER MARK` was added to Unicode in version 6.3 (2013). It is an invisible formatting character similar to U+200F `RIGHT-TO-LEFT MARK`, which is not a `NameStartChar`. It is unusual for an invisible formatting character like this to be added to Unicode. Because U+061C falls within the range `[#x37F-#x1FFF]` of production [4], however, it is a `NameStartChar`, so an XML name consisting of nothing but this single invisible formatting control is valid. That seems like a bug, not a feature.
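
The contrast between the two marks can be checked directly. The minimal Python sketch below assumes only the standard `unicodedata` module (whose Unicode version depends on the Python build); `containing_range` is a hypothetical helper written for this illustration. It reports that both characters have general category `Cf` (format), yet only U+061C is admitted by production [4], via the broad range `[#x37F-#x1FFF]`.

```python
import unicodedata

# NameStartChar ranges from XML 1.0 (Fifth Edition) production [4],
# as inclusive (first, last) code points.
NAME_START_RANGES = [
    (0x3A, 0x3A), (0x41, 0x5A), (0x5F, 0x5F), (0x61, 0x7A),
    (0xC0, 0xD6), (0xD8, 0xF6), (0xF8, 0x2FF), (0x370, 0x37D),
    (0x37F, 0x1FFF), (0x200C, 0x200D), (0x2070, 0x218F),
    (0x2C00, 0x2FEF), (0x3001, 0xD7FF), (0xF900, 0xFDCF),
    (0xFDF0, 0xFFFD), (0x10000, 0xEFFFF),
]


def containing_range(cp: int):
    """Return the [4] range that admits cp, or None if cp is not a NameStartChar."""
    for lo, hi in NAME_START_RANGES:
        if lo <= cp <= hi:
            return lo, hi
    return None


if __name__ == "__main__":
    for cp in (0x061C, 0x200F):
        ch = chr(cp)
        rng = containing_range(cp)
        verdict = (f"NameStartChar, in [#x{rng[0]:X}-#x{rng[1]:X}]"
                   if rng else "not a NameStartChar")
        print(f"U+{cp:04X} {unicodedata.name(ch, '?')} "
              f"(category {unicodedata.category(ch)}): {verdict}")
```

Since `Name ::= NameStartChar (NameChar)*`, any character that is a `NameStartChar` is on its own a complete one-character `Name`, which is why a name made only of U+061C is valid.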

(We encountered this issue while developing the MessageFormat 2.0 standard at Unicode, where we are attempting to use `NCName` and `Name` to define valid identifiers.)

The downside, of course, is that a great many implementations would not be aware of any change made to `NameStartChar`.

We think an erratum should be issued that either removes U+061C from `NameStartChar` or at least notes the problem with using U+061C in names.

[1] https://github.com/w3c/i18n-activity/issues/1903

Addison Phillips

Chair (W3C Internationalization WG)

Internationalization is not a feature.

It is an architecture.

Received on Monday, 23 September 2024 16:00:45 UTC