RE: Declaring xml:lang in a RelaxNG schema

Hi Chris,

In addition to Felix’s comment, let me add: the RFC 3066 language tag grammar is a *superset* of the BCP47 grammar, that is, the value space it represents is *less* constrained than the current BCP 47 value space. This is by design: all BCP 47 well-formed language tags are also well-formed in RFC 3066 terms. Some RFC 3066 “well-formed” tags are not well-formed or valid in BCP 47 terms, but such tags were never valid language tags.

The lexical space defined by xs:language is the same as the BCP 47 (RFC 5646) production “obs-language-tag”. See Section 2.2.9 (Classes of Conformance) [1]. For reasons of compatibility, it makes sense to allow the larger (RFC 3066) range of language tags to be “well-formed” (in the XML sense), even though the reference is to BCP 47 and language tag validators should use the stricter requirements of BCP 47. Note that language tag matching (BCP 47, RFC 4647) depends only on obs-language-tag.
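
As a rough illustration of the point about matching, here is a minimal Python sketch of RFC 4647 basic filtering (Section 3.3.1). The function name and the sample tags are made up for this example. The sketch compares hyphen-separated strings case-insensitively and never consults the stricter BCP 47 grammar or the subtag registry, which is why the looser syntax is enough for it.

```python
def basic_filter(language_range: str, tag: str) -> bool:
    """Sketch of RFC 4647 basic filtering.

    A range matches a tag if it is the wildcard "*", if it equals the tag,
    or if it is a prefix of the tag and the next character is "-".
    The comparison is case-insensitive.
    """
    if language_range == "*":
        return True
    r = language_range.lower()
    t = tag.lower()
    return t == r or t.startswith(r + "-")


if __name__ == "__main__":
    # Hypothetical sample tags, chosen only to show prefix behaviour.
    tags = ["de", "de-CH", "de-Latn-DE", "den", "en-US"]
    print([t for t in tags if basic_filter("de", t)])
    # prints ['de', 'de-CH', 'de-Latn-DE']
```

Note that "den" is not matched by the range "de", because the prefix must end at a subtag boundary.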

Hope that helps,


Addison Phillips
Globalization Architect (Lab126)
Editor (IETF BCP 47)
Chair (W3C I18N WG)

Internationalization is not a feature.
It is an architecture.


From: [] On Behalf Of Felix Sasaki
Sent: Tuesday, May 24, 2011 10:16 AM
To: Chris Lilley
Subject: Re: Declaring xml:lang in a RelaxNG schema

Hello Chris,

FYI, the XML Schema for xml:lang at

already refers to BCP 47, not RFC 3066. Note also that the definition encompasses the empty value:


   <xs:union memberTypes="xs:language">
     <xs:restriction base="xs:string">
      <xs:enumeration value=""/>

So does XML Schema 1.1



2011/5/24 Chris Lilley <<>>
Hello www-international,

I'm updating a RelaxNG schema to use xml:lang rather than its own lang attribute. (In fact, the WOFF schema, and at the request of the I18N Core WG.) But I ran into a problem in terms of the datatypes and wanted to be sure how to proceed, hence this email.

The snippet I am using is

       <attribute name="lang" ns="">
         <value type="language"/>

because RNG uses the type system from XML Schema Part 2: Datatypes.

However, XML Schema Datatypes seems to over-constrain the language type such that it can only be an RFC 3066-compatible string:

[Definition:]   language represents natural language identifiers as defined by by [RFC 3066]. The ·value space· of language is the set of all strings that are valid language identifiers as defined [RFC 3066] . The ·lexical space· of language is the set of all strings that conform to the pattern [a-zA-Z]{1,8}(-[a-zA-Z0-9]{1,8})* . The ·base type· of language is token.
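
To make the lexical check concrete, here is a minimal Python sketch that applies the pattern quoted above. The function name and the sample strings are illustrative only. It tests the lexical pattern and nothing more; it does not tell you whether a string is a valid language identifier under RFC 3066 or BCP 47.

```python
import re

# Pattern copied from the quoted xs:language definition.
XS_LANGUAGE = re.compile(r"[a-zA-Z]{1,8}(-[a-zA-Z0-9]{1,8})*")


def in_lexical_space(value: str) -> bool:
    """Return True if value matches the xs:language lexical pattern.

    fullmatch is used because XML Schema patterns apply to the whole string.
    """
    return XS_LANGUAGE.fullmatch(value) is not None


if __name__ == "__main__":
    # Hypothetical sample strings, not taken from any specification.
    samples = ["en", "zh-Hant-TW", "sl-rozaj-biske", "en-a",
               "en_US", "toolongsubtag", "en-", ""]
    for candidate in samples:
        print(repr(candidate), in_lexical_space(candidate))
```

The first four samples match the pattern. The last four do not: an underscore is not allowed, a subtag is limited to eight characters, a trailing hyphen needs a subtag after it, and the pattern requires at least one letter.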

My understanding was that the recommended practice was to use BCP47

which is currently a concatenation of RFC 5646 and RFC 4647.

Should I just drop the datatype (so it takes the default value of 'token')? Is there a better definition of a BCP47 language type that I should reference instead? Should the schema datatype be deprecated, or is it planned to update it?

I had a look at the I18N QA on xml:lang

but that is more about use in a document instance than use in a schema definition; also, it references RFC 3066, not BCP 47.

(In terms of WOFF last call, this relates to WOFF Issue 7 "xsd:NCName is too constraining for lang attributes" and WOFF Issue 9 "I18n-ISSUE-2: Why not using xml:lang? ")

 Chris Lilley   Technical Director, Interaction Domain
 W3C Graphics Activity Lead, Fonts Activity Lead
 Co-Chair, W3C Hypertext CG
 Member, CSS, WebFonts, SVG Working Groups

Received on Tuesday, 24 May 2011 17:36:00 UTC