- From: Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no>
- Date: Wed, 28 Jul 2010 20:34:11 +0300
- To: xml-editor@w3.org
I'm sorry if this has been reported before. However, both XML 1.0 and XML 1.1 contain the following line: [1][2]

]]
[81] EncName ::= [A-Za-z] ([A-Za-z0-9._] | '-')* /* Encoding name contains only Latin characters */
[[

The expression "Latin characters" is confusing. The comment can be read as saying that an encoding name contains Latin letters only, and thus no digits, no punctuation and no non-Latin letters. It can also be misread to mean that even non-ASCII "Latin characters" are allowed.

I assume that the expression is an attempt to ensure that no one interprets [A-Za-z] to mean "any uppercase or lowercase letter, irrespective of script or charset". That is, readers are to understand [A-Za-z] as referring to ASCII Latin letters only. Probably, in the days when non-Unicode encodings dominated text editing, it was common for RegEx implementations to treat [A-Za-z] as a reference to uppercase/lowercase letters irrespective of the script in use. (At the very least, before Mac OS X arrived, the Macintosh text editor I used myself worked like that.)

If I am obliged to suggest a replacement, then I'd say:

/* A letter in an encoding name is always a Latin ASCII letter. */

or - probably better:

/* [A-Za-z] refers to ASCII Latin letters only. */

[1] http://www.w3.org/TR/2008/REC-xml-20081126/#NT-EncName
[2] http://www.w3.org/TR/2006/REC-xml11-20060816/#NT-EncName
--
leif halvard silli
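For illustration only, the ASCII-only reading of production [81] can be sketched in Python. This is not part of either specification: the names ENC_NAME and is_enc_name are hypothetical, and the sketch relies on Python's re module treating [A-Za-z] as a plain code point range when no case-insensitive flag is set.

```python
import re

# Production [81] transcribed as a Python regular expression (an assumption
# of this sketch, not normative text). Without re.IGNORECASE, the ranges
# A-Z and a-z cover only the 52 ASCII Latin letters.
ENC_NAME = re.compile(r"[A-Za-z]([A-Za-z0-9._]|-)*")


def is_enc_name(name: str) -> bool:
    """Return True if the whole string matches the EncName production."""
    return ENC_NAME.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ["UTF-8", "ISO-8859-1", "8859-1", "Épée", "кои8-р"]:
        print(candidate, is_enc_name(candidate))
```

Under this reading, "UTF-8" and "ISO-8859-1" are accepted even though they contain digits and hyphens, while "8859-1" (leading digit), "Épée" (non-ASCII Latin letters) and "кои8-р" (non-Latin letters) are rejected, which is exactly what the current comment fails to convey.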
Received on Thursday, 29 July 2010 12:51:51 UTC