
[whatwg] Allowed characters in attribute names (was: Re: Steps for finding one or two numbers in a string)

From: Henri Sivonen <hsivonen@iki.fi>
Date: Wed, 13 Jun 2007 12:15:53 +0300
Message-ID: <7C0C9248-7CFC-4B45-95DA-84E6C4100395@iki.fi>
On Jun 13, 2007, at 10:27, Simon Pieters wrote:

>> I'd rather change the #tokenisation section to generate more parse  
>> errors.

Or the DOM-level conformance for embed could make non-ASCII attribute  
names non-conforming.

> Why?

When you put non-ASCII in element or attribute names (or variable and  
function names), you aren't really making your format (or software)  
international. You are more likely to *nationalize* the document  
format (or software) by creating a barrier for developers from  
outside your locale.

When you start doing a lot of stuff along the lines of smörgåsbord=""
in markup, you create a barrier of inconvenience for everyone else  
but Swedes and Finns. That might be OK for you and me, but it won't  
be OK for us when people start using something that our input methods  
and cognitive background don't cover.

Compare with Chinese in markup in UOF--a nationalized fork of ODF.
(See http://blogs.msdn.com/dmahugh/archive/2007/05/22/uof-translator-project.aspx )

To keep markup internationally tractable, identifiers should use  
ASCII only with English-based mnemonics.
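
As a rough illustration of what an ASCII-only check on attribute names
could look like, here is a minimal sketch. It is not taken from any
spec or implementation: the function name and the representation of an
element's attributes as a plain dict are assumptions made for the
example.

```python
def non_ascii_attribute_names(attrs):
    """Return the attribute names that contain any non-ASCII character.

    `attrs` is a hypothetical mapping of attribute name -> value, standing
    in for the attributes of a single element such as <embed>.
    """
    # str.isascii() (Python 3.7+) is True only when every character
    # is in the range U+0000..U+007F.
    return [name for name in attrs if not name.isascii()]


# Hypothetical <embed> attributes: one conventional name, one non-ASCII name.
embed_attrs = {"src": "movie.swf", "smörgåsbord": ""}
flagged = non_ascii_attribute_names(embed_attrs)
print(flagged)
```

A conformance checker along these lines would report each flagged name
as non-conforming while leaving attribute values, which may legitimately
contain non-ASCII text, alone.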

> What if you want to pass a parameter to a plugin with non-ASCII  
> characters using <embed>?

People who want that should readjust their wishes, in my opinion.

Henri Sivonen
hsivonen at iki.fi
Received on Wednesday, 13 June 2007 02:15:53 UTC
