Re: Declaring xml:lang in a RelaxNG schema

Hello Chris,

FYI, the XML Schema for xml:lang at
http://www.w3.org/2001/xml.xsd
already refers to BCP 47, not RFC 3066. Note also that its definition
encompasses the empty value:

  <xs:simpleType>
   <xs:union memberTypes="xs:language">
    <xs:simpleType>
     <xs:restriction base="xs:string">
      <xs:enumeration value=""/>
     </xs:restriction>
    </xs:simpleType>
   </xs:union>
  </xs:simpleType>

XML Schema 1.1 also refers to BCP 47:
http://www.w3.org/TR/2009/WD-xmlschema11-2-20091203/datatypes.html#language
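
In case it helps, the same union (a language tag or the empty string) could
be sketched in RELAX NG roughly as follows. This is an untested fragment, not
a recommended definition: it assumes the enclosing schema declares the
RELAX NG namespace, and it selects the XML Schema datatype library through
datatypeLibrary. It uses <data> rather than <value> for the language type,
since <value> matches one literal string while <data> matches any value of
the named datatype.

```xml
<optional>
  <attribute name="lang" ns="http://www.w3.org/XML/1998/namespace">
    <choice>
      <!-- any value of the XML Schema "language" datatype -->
      <data type="language"
            datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes"/>
      <!-- or the empty string, mirroring the enumeration in xml.xsd;
           with no type attribute this falls back to the built-in
           "token" type, so whitespace-only values also match -->
      <value></value>
    </choice>
  </attribute>
</optional>
```

Whether the "language" type then follows RFC 3066 or BCP 47 still depends on
which version of the datatype library the validator implements.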

Regards,

Felix

2011/5/24 Chris Lilley <chris@w3.org>

> Hello www-international,
>
> I'm updating a RelaxNG schema to use xml:lang rather than its own lang
> attribute. (In fact, the WOFF schema, and at the request of the I18N Core
> WG). But I ran into a problem in terms of the datatypes and wanted to be
> sure how to proceed, hence this email.
>
> The snippet I am using is
>
>      <optional>
>        <attribute name="lang" ns="http://www.w3.org/XML/1998/namespace">
>          <value type="language"/>
>        </attribute>
>      </optional>
>
> because RNG uses the type system from XML Schema Part 2: Datatypes
> http://www.oasis-open.org/committees/relax-ng/spec-20011203.html#IDA0ZYR
>
> However, XML Schema datatypes seems to over-constrain the language type
> such that it can only be an RFC3066-compatible string:
>
>
> [Definition:]   language represents natural language identifiers as defined
> by by [RFC 3066]. The ·value space· of language is the set of all strings
> that are valid language identifiers as defined [RFC 3066] . The ·lexical
> space· of language is the set of all strings that conform to the pattern
> [a-zA-Z]{1,8}(-[a-zA-Z0-9]{1,8})* . The ·base type· of language is token.
> http://www.w3.org/TR/xmlschema-2/#language
>
> My understanding was that the recommended practice was to use BCP47
> http://tools.ietf.org/rfc/bcp/bcp47.txt
> which is currently a concatenation of RFC 5646 and RFC 4647.
>
> Should I just drop the datatype (so it takes the default value of 'token')?
> Is there a better definition of a BCP47 language type that I should
> reference instead? Should the schema datatype be deprecated, or is it
> planned to update it?
>
> I had a look at the I18N QA on xml:lang
> http://www.w3.org/International/questions/qa-when-xmllang.en.php
> but that is more about use in a document instance than use in a schema
> definition; and also, it references RFC 3066 not BCP47.
>
>
> (In terms of WOFF last call, this relates to WOFF Issue 7 "xsd:NCName is
> too constraining for lang attributes" and WOFF Issue 9 "I18n-ISSUE-2: Why
> not using xml:lang? ")
>
>
> --
>  Chris Lilley   Technical Director, Interaction Domain
>  W3C Graphics Activity Lead, Fonts Activity Lead
>  Co-Chair, W3C Hypertext CG
>  Member, CSS, WebFonts, SVG Working Groups
>
>
>

Received on Tuesday, 24 May 2011 17:16:30 UTC