
Declaring xml:lang in a RelaxNG schema

From: Chris Lilley <chris@w3.org>
Date: Tue, 24 May 2011 17:51:01 +0200
Message-ID: <1164306076.20110524175101@w3.org>
To: www-international@w3.org
CC: www-font@w3.org
Hello www-international,

I'm updating a RelaxNG schema to use xml:lang rather than its own lang attribute. (In fact, it is the WOFF schema, and the change is at the request of the I18N Core WG.) But I ran into a problem with the datatypes and wanted to be sure how to proceed, hence this email.

The snippet I am using is

      <optional>
        <attribute name="lang" ns="http://www.w3.org/XML/1998/namespace">
          <value type="language"/>
        </attribute>
      </optional>

because RNG uses the type system from XML Schema Part 2: Datatypes
http://www.oasis-open.org/committees/relax-ng/spec-20011203.html#IDA0ZYR

However, XML Schema Datatypes seems to over-constrain the language type, such that it can only be an RFC 3066-compatible string:


[Definition:]   language represents natural language identifiers as defined by by [RFC 3066]. The ·value space· of language is the set of all strings that are valid language identifiers as defined [RFC 3066] . The ·lexical space· of language is the set of all strings that conform to the pattern [a-zA-Z]{1,8}(-[a-zA-Z0-9]{1,8})* . The ·base type· of language is token. 
http://www.w3.org/TR/xmlschema-2/#language
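
As a quick sanity check, the lexical pattern quoted above can be tried against a few tags. This is only a sketch in Python, not part of the schema: the helper name matches_xsd_language and the sample tags are my own illustration, and it tests the lexical space only, not the RFC 3066 value space.

```python
import re

# Lexical-space pattern for the language type, copied from the
# XML Schema Part 2 definition quoted above.
XSD_LANGUAGE = re.compile(r"[a-zA-Z]{1,8}(-[a-zA-Z0-9]{1,8})*")


def matches_xsd_language(tag: str) -> bool:
    """Return True if the whole string matches the lexical pattern."""
    return XSD_LANGUAGE.fullmatch(tag) is not None


# Hypothetical sample tags, chosen for illustration only.
SAMPLES = ["en", "en-US", "zh-Hant-TW", "de-CH-1996", "en_US", "", "abcdefghi"]

if __name__ == "__main__":
    for tag in SAMPLES:
        print(repr(tag), matches_xsd_language(tag))
```

As far as I can tell, the pattern itself is loose enough to accept tags with script or variant subtags, so my concern is with the value space wording that ties the type to RFC 3066, not with the pattern.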

My understanding was that the recommended practice was to use BCP47
http://tools.ietf.org/rfc/bcp/bcp47.txt
which is currently a concatenation of RFC 5646 and RFC 4647.

Should I just drop the datatype (so it takes the default value of 'token')? Is there a better definition of a BCP47 language type that I should reference instead? Should the schema datatype be deprecated, or is it planned to update it?

I had a look at the I18N QA on xml:lang
http://www.w3.org/International/questions/qa-when-xmllang.en.php
but that is more about use in a document instance than use in a schema definition; also, it references RFC 3066, not BCP47.


(In terms of WOFF last call, this relates to WOFF Issue 7 "xsd:NCName is too constraining for lang attributes" and WOFF Issue 9 "I18n-ISSUE-2: Why not using xml:lang? ")


-- 
 Chris Lilley   Technical Director, Interaction Domain                 
 W3C Graphics Activity Lead, Fonts Activity Lead
 Co-Chair, W3C Hypertext CG
 Member, CSS, WebFonts, SVG Working Groups
Received on Tuesday, 24 May 2011 15:52:06 GMT
