
Re: Declaring xml:lang in a RelaxNG schema

From: Felix Sasaki <felix.sasaki@fh-potsdam.de>
Date: Tue, 24 May 2011 19:16:02 +0200
Message-ID: <BANLkTimwqdYA-bnKw+vMDYTx=xa6CsT-cQ@mail.gmail.com>
To: Chris Lilley <chris@w3.org>
Cc: www-international@w3.org, www-font@w3.org

Hello Chris,

FYI, the XML Schema for xml:lang already refers to BCP 47, not RFC 3066.
Note also that the definition encompasses the empty value:

   <xs:union memberTypes="xs:language">
     <xs:restriction base="xs:string">
      <xs:enumeration value=""/>

So does XML Schema 1.1.
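
A RELAX NG pattern mirroring that union might look roughly like the sketch below. It is untested and rests on a few assumptions: the element name "text" is only a placeholder (not taken from the WOFF schema), the XML Schema datatype library is declared on an ancestor, and the validator's "language" type follows whichever edition of XML Schema datatypes it implements. It uses <data> rather than <value>, since <value> matches one literal string while <data> checks a string against a datatype.

```xml
<!-- Sketch only: "text" is a hypothetical element name used for illustration. -->
<element name="text"
         xmlns="http://relaxng.org/ns/structure/1.0"
         datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  <optional>
    <attribute name="lang" ns="http://www.w3.org/XML/1998/namespace">
      <choice>
        <!-- any string accepted by the library's "language" datatype -->
        <data type="language"/>
        <!-- the empty string, mirroring <xs:enumeration value=""/> -->
        <value></value>
      </choice>
    </attribute>
  </optional>
  <text/>
</element>
```

With this, xml:lang="" would be accepted alongside ordinary language tags, which is what the union in the XML Schema definition allows.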



2011/5/24 Chris Lilley <chris@w3.org>

> Hello www-international,
> I'm updating a RelaxNG schema to use xml:lang rather than its own lang
> attribute. (In fact, the WOFF schema, at the request of the I18N Core
> WG.) But I ran into a problem in terms of the datatypes and wanted to be
> sure how to proceed, hence this email.
> The snippet I am using is
>      <optional>
>        <attribute name="lang" ns="http://www.w3.org/XML/1998/namespace">
>          <value type="language"/>
>        </attribute>
>      </optional>
> because RNG uses the type system from XML Schema Part 2: Datatypes
> http://www.oasis-open.org/committees/relax-ng/spec-20011203.html#IDA0ZYR
> however, XML Schema datatypes seems to over-constrain the language type
> such that it can only be an RFC3066-compatible string:
> [Definition:]   language represents natural language identifiers as defined
> by [RFC 3066]. The ·value space· of language is the set of all strings
> that are valid language identifiers as defined [RFC 3066]. The ·lexical
> space· of language is the set of all strings that conform to the pattern
> [a-zA-Z]{1,8}(-[a-zA-Z0-9]{1,8})* . The ·base type· of language is token.
> http://www.w3.org/TR/xmlschema-2/#language
> My understanding was that the recommended practice was to use BCP47
> http://tools.ietf.org/rfc/bcp/bcp47.txt
> which is currently a concatenation of RFC 5646 and RFC 4647.
> Should I just drop the datatype (so it takes the default value of 'token')?
> Is there a better definition of a BCP47 language type that I should
> reference instead? Should the schema datatype be deprecated, or is it
> planned to update it?
> I had a look at the I18N QA on xml:lang
> http://www.w3.org/International/questions/qa-when-xmllang.en.php
> but that is more about use in a document instance than use in a schema
> definition; and also, it references RFC 3066 not BCP47.
> (In terms of WOFF last call, this relates to WOFF Issue 7 "xsd:NCName is
> too constraining for lang attributes" and WOFF Issue 9 "I18n-ISSUE-2: Why
> not using xml:lang? ")
> --
>  Chris Lilley   Technical Director, Interaction Domain
>  W3C Graphics Activity Lead, Fonts Activity Lead
>  Co-Chair, W3C Hypertext CG
>  Member, CSS, WebFonts, SVG Working Groups
Received on Tuesday, 24 May 2011 17:16:30 UTC
