[Bug 6746] case-insensitivity of other than a-z and A-Z, e.g., diacritics

From: <bugzilla@wiggum.w3.org>
Date: Thu, 02 Apr 2009 16:37:52 +0000
To: public-html-bugzilla@w3.org
Message-Id: <E1LpPvM-00006a-Ff@wiggum.w3.org>
http://www.w3.org/Bugs/Public/show_bug.cgi?id=6746





--- Comment #10 from Nick Levinson <Nick_Levinson@yahoo.com>  2009-04-02 16:37:52 ---
The proposal, recapped, is for UAs and tools to recognize case-insensitivity
beyond 7-bit ASCII in order that script content (including ECMAScript),
attribute values, and possibly attribute names can be written in more languages
with less demand that authors be English-proficient.

HTML5 already intends that such non-ASCII content be parsed.

The solution to the security issue and the research burden is to extend
case-insensitivity, but not as far as I had originally conceived. Thus, I'm
narrowing my own proposal.

Include more than ASCII, but not all of Unicode, within the scope of
case-insensitivity required for HTML5 compliance. Include all caseless
characters and all simple case pairs, meaning exactly one lowercase character
and one capital with no ambiguity about case. To ease the research burden,
though, include only the range from the upper boundary of ASCII to some
arbitrary boundary thereafter, chosen so that everything within it is either
caseless or part of a simple case pair. A few exceptions may exist within a
given range of characters; if so, itemize them in the HTML5 standard as
exceptions, to be treated as if caseless.

The easiest range extension seems to be U+0080 through U+00FF (yielding 256
characters when including ASCII). That excludes the Turkish situation (dotted
capital I, U+0130, and dotless i, U+0131, lie outside it) and has only a few
exceptional characters: the micro sign (U+00B5), sharp s (U+00DF), and y with
diaeresis (U+00FF) have no simple case partner inside the range. Later,
additional ranges can be defined as more research is done into where simple
pairs and caseless characters reside. Perhaps a Wiki can be set up to receive
proposals for single characters as caseless and pairs as case-simple.
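
Whether a candidate range contains only caseless characters and simple case
pairs can be checked mechanically. The following is a minimal sketch, not part
of the proposal itself. It assumes Python's built-in str.upper() and
str.lower() as a stand-in for the Unicode case data, and the function name
audit_range is hypothetical. It sorts each code point in a range into
caseless, simple pair, or candidate exception.

```python
def audit_range(first, last):
    """Classify code points first..last (inclusive).

    Returns (caseless, pairs, exceptions):
      caseless   - code points unchanged by upper() and lower()
      pairs      - (upper_cp, lower_cp) tuples lying wholly inside the range
      exceptions - code points whose case partner is missing, multi-character,
                   or outside the range
    """
    caseless, pairs, exceptions = [], [], []
    for cp in range(first, last + 1):
        ch = chr(cp)
        upper, lower = ch.upper(), ch.lower()
        if upper == ch and lower == ch:
            caseless.append(cp)
            continue
        if upper == ch:        # ch behaves as the capital of a pair
            partner, back = lower, lower.upper()
        elif lower == ch:      # ch behaves as the lowercase of a pair
            partner, back = upper, upper.lower()
        else:                  # neither form is ch itself (e.g. titlecase)
            exceptions.append(cp)
            continue
        simple = (
            len(partner) == 1
            and first <= ord(partner) <= last
            and back == ch     # the mapping must round-trip
        )
        if not simple:
            exceptions.append(cp)
        elif cp < ord(partner):
            pairs.append((cp, ord(partner)))  # record each pair once
    return caseless, pairs, exceptions


if __name__ == "__main__":
    caseless, pairs, exceptions = audit_range(0x80, 0xFF)
    print("caseless:", len(caseless), "pairs:", len(pairs))
    print("exceptions:", ["U+%04X" % cp for cp in exceptions])
```

With the Unicode data shipped in recent Python versions, the exception list
for U+0080 through U+00FF comes out as U+00B5, U+00DF, and U+00FF, which under
this proposal would be itemized and treated as if caseless.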

To that end, I would rename the terms to _extended level-1
case-insensitivity_ and _extended level-1 case-sensitivity_. These would be
distinct from _ASCII case-insensitivity_ and _ASCII case-sensitivity_. Level 2
and up would not be defined until warranted by research.
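
To illustrate what an extended level-1 comparison might look like, here is a
rough sketch under stated assumptions. The function names are hypothetical,
the range is fixed at U+0000 through U+00FF, and only the simple pairs A-Z/a-z
and U+00C0-U+00DE/U+00E0-U+00FE (skipping the multiplication and division
signs at U+00D7 and U+00F7) are folded. Everything else, including the
exceptional characters and all code points above U+00FF, is treated as
caseless.

```python
def extended_level1_fold(text):
    """Fold only the simple case pairs in U+0000..U+00FF to lowercase."""
    out = []
    for ch in text:
        cp = ord(ch)
        is_ascii_capital = 0x41 <= cp <= 0x5A
        # U+00D7 (multiplication sign) sits inside the capital block
        # but is caseless, so it is skipped.
        is_latin1_capital = 0xC0 <= cp <= 0xDE and cp != 0xD7
        if is_ascii_capital or is_latin1_capital:
            out.append(chr(cp + 0x20))  # the lowercase partner is 0x20 higher
        else:
            out.append(ch)              # caseless, exception, or out of range
    return "".join(out)


def extended_level1_equal(a, b):
    """Compare two strings under the sketched extended level-1 folding."""
    return extended_level1_fold(a) == extended_level1_fold(b)


if __name__ == "__main__":
    print(extended_level1_equal("CAFÉ", "café"))
    print(extended_level1_equal("I", "\u0131"))
```

Because no locale is consulted and no character maps to more than one code
point, the comparison stays a one-to-one table lookup, which is the property
the narrowed proposal relies on.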

Thank you.

-- 
Nick


Received on Thursday, 2 April 2009 16:38:01 UTC
