[Bug 6746] case-insensitivity of other than a-z and A-Z, e.g., diacritics from bugzilla@wiggum.w3.org on 2009-03-30 (public-html-bugzilla@w3.org from March 2009)

From: <bugzilla@wiggum.w3.org>
Date: Mon, 30 Mar 2009 00:57:06 +0000
To: public-html-bugzilla@w3.org
Message-Id: <E1Lo5oI-0006LL-GC@wiggum.w3.org>

http://www.w3.org/Bugs/Public/show_bug.cgi?id=6746





--- Comment #4 from Nick Levinson <Nick_Levinson@yahoo.com>  2009-03-30 00:57:06 ---
Interesting. I don't know the Turkish situation. Maybe someone else can explore
it and any similar situations around the world. We don't need to add a security
hole; perhaps there's a solution that meets both sets of needs.

On whether ASCII is all that is of interest, the standard,
<http://www.w3.org/html/wg/markup-spec/>, as accessed today (29 March), defines
_case-insensitivity_ separately from _ASCII case-insensitivity_. The term
"ASCII case-insensitive" contains the term "case-insensitive", so defining
both as separate semantic entities makes sense in a concise document only if
their meanings are at least subtly different. Both offer essentially the same
definitions as to the 26 letters. No other character within 7-bit ASCII, to my
knowledge, is subject to case differentiation. So case-insensitivity that is
not ASCII case-insensitivity must encompass, either now or in the future,
non-ASCII case-insensitivity. Non-ASCII case-insensitivity, if it is not to be
redundant, must encompass letters other than the 26. I assume that includes not
only letters with diacritics (for computer purposes, we treat all of them as
not of the 26) but also letters such as the yogh, the thorn, and the edh, which
have case (I don't know whether they take diacritics).
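
To make the distinction concrete, here is a minimal sketch in Python of the two
kinds of matching. The function names are hypothetical and come from neither
the spec nor any user agent. Using Unicode case folding (str.casefold) as the
broader definition is my assumption; the spec may intend something else.

```python
def ascii_lower(s: str) -> str:
    """Lowercase only the 26 letters A-Z; leave every other character as-is."""
    return "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in s)


def ascii_ci_equal(a: str, b: str) -> bool:
    """ASCII case-insensitive match: only A-Z and a-z are paired up."""
    return ascii_lower(a) == ascii_lower(b)


def unicode_ci_equal(a: str, b: str) -> bool:
    """Broader match via Unicode case folding (an assumed definition)."""
    return a.casefold() == b.casefold()
```

Under this sketch, "HREF" and "href" match either way, while the thorn pair
U+00DE/U+00FE and the yogh pair U+021C/U+021D match only under
unicode_ci_equal.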

Attribute names may consist of almost any Unicode character (per id., section
5.6), thus of letters not of the 26. If no attribute is now spelled with a
letter not of the 26, section 5.6 anticipates such attributes being added
later. Attribute values may be spelled with almost any Unicode character (per
id., sections 5.6 (value) and 5.7 (text)), thus of letters not of the 26, and
that's now, not just in the future. Scripts may use almost any Unicode
character (per id., sections 5.5 and 5.7), thus again letters not of the 26. 

Does this mean the Turkish issue is already an issue in HTML5? I don't know
enough to answer that.
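
For anyone exploring it, one commonly cited form of the Turkish problem can be
sketched in Python as below. Turkish has a dotless small i (U+0131) whose
uppercase form is the plain ASCII "I", so a comparison built on general
uppercasing can make a non-ASCII string equal to an ASCII one. The helper names
are hypothetical, and whether this is the exact concern raised earlier in this
bug is my assumption.

```python
def ascii_lower(s: str) -> str:
    """Lowercase only the 26 letters A-Z; leave every other character as-is."""
    return "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in s)


def ascii_ci_equal(a: str, b: str) -> bool:
    """ASCII case-insensitive match: non-ASCII characters must match exactly."""
    return ascii_lower(a) == ascii_lower(b)


def upper_ci_equal(a: str, b: str) -> bool:
    """Naive match via general uppercasing (hypothetical, for comparison)."""
    return a.upper() == b.upper()


DOTLESS_I = "\u0131"  # LATIN SMALL LETTER DOTLESS I

# str.upper() maps U+0131 to ASCII "I", so the naive match conflates these two
# spellings, while the ASCII-only match keeps them distinct.
spoofed = "t" + DOTLESS_I + "tle"
naive_result = upper_ci_equal(spoofed, "title")
ascii_result = ascii_ci_equal(spoofed, "title")
```

Here naive_result is True and ascii_result is False, which suggests why
restricting some contexts to ASCII case-insensitivity can be the safer choice.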

Should HTML5 and compliant user agents and tools treat a letter not of the 26
case-insensitively or case-sensitively when found in an attribute value or name? I
would favor insensitivity for those contexts, for the sake of consistency and
meeting authors' expectations. I would extend case-insensitivity within a
context from ASCII to non-ASCII, although not from contexts where any
insensitivity is now required by HTML to contexts where it is not, such as
phrasing content or what normally appears visibly to a user in a browser
window.

On the other hand, I would favor case-sensitivity within scripts, though not
for the attribute names and values of the script element itself. Script content
is often not HTML and thus must follow the requirements of a scripting language
such as JavaScript, which HTML should not constrain any more than it has to.

Thank you for helping me bring the argument more tightly within HTML5.

-- 
Nick



Received on Monday, 30 March 2009 00:57:14 UTC