definition of "language" type from Kohsuke KAWAGUCHI on 2001-03-22 (www-xml-schema-comments@w3.org from January to March 2001)

From: Kohsuke KAWAGUCHI <kohsuke.kawaguchi@eng.sun.com>
Date: Thu, 22 Mar 2001 12:27:27 -0800
To: www-xml-schema-comments@w3.org
Message-Id: <20010322120521.2300.KOHSUKE.KAWAGUCHI@eng.sun.com>

Dear XML Schema WG members,

Regarding the lexical/value space of the "language" type, the spec states that

> The lexical space of language is the set of all strings that are valid
> language identifiers as defined in the language identification section
> of [XML 1.0 (Second Edition)]. 

But those production rules were removed in the 2nd edition. So please
refer to the 1st edition instead, or copy the production rules into the spec.


Also, RFC 1766 explicitly states that language identifiers are "to be
treated as case insensitive", whereas the current XML Schema definition
treats "en-US" and "EN-US" as different values. If the Schema WG intends
them to be case sensitive, please state so explicitly, because this is
inconsistent with RFC 1766. If that is not the intention of the Schema WG,
please make "language" a primitive type.
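
To illustrate the discrepancy, here is a minimal Python sketch. The helper
name same_language_tag is hypothetical and not taken from any spec; it only
shows the case-insensitive comparison RFC 1766 asks for, next to the plain
string comparison that distinguishes "en-US" from "EN-US".

```python
def same_language_tag(a: str, b: str) -> bool:
    """Compare two language tags ignoring case (hypothetical helper)."""
    # RFC 1766 tags consist of ASCII letters and hyphens only, so
    # lower() is enough for a case-insensitive comparison.
    return a.lower() == b.lower()


# Plain string comparison: the two spellings are different values.
print("en-US" == "EN-US")                   # False
# Case-insensitive comparison, as RFC 1766 describes.
print(same_language_tag("en-US", "EN-US"))  # True
```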



Furthermore, the pattern facet that I found in the normative definition:

> "([a-zA-Z]{2}|[iI]-[a-zA-Z]+|[xX]-[a-zA-Z]+)(-[a-zA-Z]+)*"

does not correctly model the BNF specified in RFC 1766.

>    Language-Tag = Primary-tag *( "-" Subtag )
>    Primary-tag = 1*8ALPHA
>    Subtag = 1*8ALPHA

As you can see, every "subtag" must be no longer than 8 characters, but this
constraint has not been implemented in the normative definition. (Maybe this
is a problem with the XML 1.0 spec.)

To incorporate this constraint, the pattern facet should be changed to
> "([a-zA-Z]{2}|[iI]-[a-zA-Z]+|[xX]-[a-zA-Z]{1,8})(-[a-zA-Z]{1,8})*"
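
The difference can be checked with a short Python sketch. It assumes that
Python's re syntax is close enough to the schema pattern syntax for these
particular expressions, and it uses a made-up tag, "en-toolongsubtag", whose
subtag has 13 letters. The names NORMATIVE, PROPOSED, RFC1766_BNF and accepts
are only for illustration.

```python
import re

# Pattern from the normative definition, as quoted above.
NORMATIVE = re.compile(
    r"([a-zA-Z]{2}|[iI]-[a-zA-Z]+|[xX]-[a-zA-Z]+)(-[a-zA-Z]+)*")

# Pattern proposed above, with subtags limited to 1..8 letters.
PROPOSED = re.compile(
    r"([a-zA-Z]{2}|[iI]-[a-zA-Z]+|[xX]-[a-zA-Z]{1,8})(-[a-zA-Z]{1,8})*")

# Direct transcription of the RFC 1766 BNF quoted above:
#   Language-Tag = Primary-tag *( "-" Subtag ), each tag being 1*8ALPHA.
RFC1766_BNF = re.compile(r"[a-zA-Z]{1,8}(-[a-zA-Z]{1,8})*")


def accepts(pattern: re.Pattern, tag: str) -> bool:
    # A pattern facet must match the whole lexical value, hence fullmatch.
    return pattern.fullmatch(tag) is not None


tag = "en-toolongsubtag"  # made-up tag; the subtag has 13 letters

print(accepts(NORMATIVE, tag))    # True:  the length limit is not enforced
print(accepts(PROPOSED, tag))     # False
print(accepts(RFC1766_BNF, tag))  # False
```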



Also, the semantics of the "length" facet for the "language" type are not
defined at all. They should be defined in section 4.3.1.


regards,
----------------------
K.Kawaguchi
E-Mail: k-kawa@bigfoot.com

Received on Thursday, 22 March 2001 15:27:16 UTC