- From: Addison Phillips <addisoni18n@gmail.com>
- Date: Mon, 9 Mar 2026 08:59:56 -0700
- To: public-i18n-core@w3.org
All,
I have an action item [1] to check our guidelines for clarity on Unicode
subsets for identifiers. This is a quick summary of what I found and
propose to do about it.
Specdev added section 8.3 relatively recently. This contains guidance
for specs on "application internal identifiers" [2] and for identifiers
that are not "application internal identifiers" (effectively everything
else). Note that charmod-norm ("String Matching") is the source for
section 8.3's guidance [3].
Application internal identifiers are meant to be "never shown to users
and are always used for matching or processing within an application or
protocol" and our guidance is that these should be case-insensitive
printable ASCII. The 2119 keyword here is SHOULD.
For all other identifiers, we recommend (SHOULD) allowing Unicode
characters and matching that is case- and normalization-sensitive.
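To make the two regimes concrete, here is a minimal sketch in Python. The
function names are hypothetical, and treating "printable ASCII" as
U+0021..U+007E (excluding space) is my assumption, not something the
guidelines state.

```python
# Assumption: "printable ASCII" means U+0021..U+007E (space excluded).
PRINTABLE_ASCII = frozenset(chr(cp) for cp in range(0x21, 0x7F))

# Maps only A-Z to a-z; no other code point is touched.
ASCII_LOWER = {cp: cp + 0x20 for cp in range(0x41, 0x5B)}


def is_internal_identifier(value: str) -> bool:
    """True if value is non-empty and uses only printable ASCII."""
    return bool(value) and all(ch in PRINTABLE_ASCII for ch in value)


def match_internal(a: str, b: str) -> bool:
    """ASCII case-insensitive match for application internal identifiers."""
    return a.translate(ASCII_LOWER) == b.translate(ASCII_LOWER)


def match_other(a: str, b: str) -> bool:
    """Case- and normalization-sensitive match: compare code points as-is."""
    return a == b
```

Note that match_internal deliberately avoids str.lower(), which would
also fold non-ASCII characters such as U+212A KELVIN SIGN to "k".
Under match_other, U+00E9 and the sequence "e" + U+0301 do not match,
because no normalization is applied.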
We have some additional guidance that I won't go into here, mostly to do
with non-character code points.
We are missing clear guidance on how spec authors should decide between
these two regimes. We also don't provide guidance on how best to subset
ASCII for identifiers. Note that identifiers in this context include
spec-local syntax or domain-specific languages (DSLs), as well as
user-defined values (such as variable names).
We should probably develop guidelines to cover the decision tree. Very
few DSLs are *never* shown to end users. The other factor is whether the
syntax is machine generated from business objects ("data") that might
include non-ASCII values. We want to avoid ending up where CSS is
(with two regimes simultaneously). We also want to document best
practices in choosing start and part characters and at least mention
bidi controls (which present spoofing problems if not addressed).
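As a rough illustration of start/part character checks and bidi control
rejection, here is a sketch in Python. It approximates start and part
characters with Unicode general categories, which is my simplification
(a real profile would more likely build on the XID_Start and
XID_Continue properties from UAX #31). The function name and the
category sets are hypothetical choices, not proposed guidance.

```python
import unicodedata

# Code points with the Bidi_Control property.
BIDI_CONTROLS = frozenset(
    [0x061C, 0x200E, 0x200F]
    + list(range(0x202A, 0x202F))   # U+202A..U+202E
    + list(range(0x2066, 0x206A))   # U+2066..U+2069
)

# Assumption: letters and letter numbers may start an identifier.
START_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}
# Assumption: parts also allow marks, decimal digits, and connectors.
PART_CATEGORIES = START_CATEGORIES | {"Mn", "Mc", "Nd", "Pc"}


def check_identifier(value: str) -> list[str]:
    """Return a list of problems; an empty list means value is accepted."""
    if not value:
        return ["empty identifier"]
    problems = []
    for index, ch in enumerate(value):
        cp = ord(ch)
        if cp in BIDI_CONTROLS:
            problems.append(f"bidi control U+{cp:04X} at index {index}")
            continue
        allowed = START_CATEGORIES if index == 0 else PART_CATEGORIES
        if unicodedata.category(ch) not in allowed:
            kind = "start" if index == 0 else "part"
            problems.append(
                f"U+{cp:04X} not allowed as {kind} character at index {index}"
            )
    return problems
```

With these sets, "na\u00efve_1" is accepted, "1abc" is rejected for its
start character, and "ab\u202Ecd" is rejected for the embedded
RIGHT-TO-LEFT OVERRIDE. A leading underscore is also rejected here,
which is one of the choices a spec author would need to make
deliberately.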
I think the next step is a pull request outlining the additional
guidelines. That PR would also add internal pointers from other parts
of specdev that discuss identifiers, notably section 6.2, to make
navigation easier. I do not
propose to edit charmod-norm at this time, although we probably could
add a pointer back to specdev.
Look forward to discussing,
Addison
[1] https://github.com/w3c/i18n-actions/issues/207
[2] https://www.w3.org/TR/charmod-norm/#dfn-application-internal-identifier
[3] https://www.w3.org/TR/charmod-norm/#specifying-content-restrictions
--
Internationalization is not a feature.
It is an architecture.
Received on Wednesday, 11 March 2026 00:25:29 UTC