- From: Biron,Paul V <Paul.V.Biron@kp.org>
- Date: Thu, 20 Jul 2000 14:01:53 -0700
- To: "'Stefan.Keller@lt.admin.ch'" <Stefan.Keller@lt.admin.ch>
- Cc: www-xml-schema-comments@w3.org, xml-dev@xml.org
> -----Original Message-----
> From: Stefan.Keller@lt.admin.ch [SMTP:Stefan.Keller@lt.admin.ch]
> Sent: Wednesday, July 19, 2000 4:47 AM
> To: xmlschema-dev@w3.org; www-xml-schema-comments@w3.org;
> xml-dev@xml.org
> Subject: XML Schema: enumeration 'value' subtyping
>
> I would like to hang in a new thread as a newbie. I understand that
> enumeration represents an enumerated type supported on attributes only.
>

[Note: I removed xmlschema-dev from the address list, since that list is for discussion of issues surrounding implementation of schema processors, not general questions/issues about XML Schema as a language.]

It's beside the point, but while it is true that in DTDs enumerations could only be specified for attributes, in XML Schema enumeration is not limited to attributes. For example, given a schema fragment such as the following (where languageCode is the enumerated type from your example below):

    <element name='building' type='languageCode'/>

the elements in this instance fragment validate as marked:

    <building>DE</building>   <!-- valid -->
    <building>IT</building>   <!-- invalid -->

> My/our requirement is now *not* only the reduction of list of permitted
> values *but* to subtype one (or more) specific enumeration values, like
> EN-us, EN-uk, EN-aus, ... meaning us-english, uk-english,
> australian-english, ... Here is an example:
>
> <simpleType name="languageCode" base="string">
>   <enumeration value="DE"/>
>   <enumeration value="EN"/>
>   <enumeration value="FR"/>
>   <enumeration value="ZW"/>
> </simpleType>
>

There is a built-in datatype called language [1], whose value space is the set of language codes provided for in RFC 1766, "Tags for the Identification of Languages" [2].

> Example solution with a (very probably incorrect) syntax invented by
> myself:
>
> <specializedType
>   name="languageSubCode" source="simpleType" deriveBy="extension">
>   <group ref="EN">
>     <enumeration value="US"/>     <!-- Remark: means EN-US -->
>     <enumeration value="EN-UK"/>  <!-- Remark: means EN-UK -->
>     ...
>   </group>
> </specializedType>
>

The language datatype has a basic structure (it's actually more complex than what follows, as described in detail in RFC 1766) of:

    a 2 letter language code, optionally followed by "-" and
    a 2 letter country code

The 2 letter language code is interpreted according to ISO 639, "Code for the representation of names of languages". The 2 letter country codes are interpreted as ISO 3166 alpha-2 country codes denoting the area in which the language is used.

Thus, in the language datatype you have just what you need. Granted, the schema processor will not "understand" that "en-US" and "en-UK" are related, but your application can easily be made to understand that.

You can derive datatypes from language, using either the enumeration or the pattern facet, such as:

    <simpleType name='english' base='language'>
      <pattern value='en(-[a-zA-Z]{1,8})*'/>
    </simpleType>

(The country code can actually be up to 8 characters long; see RFC 1766 for details. The 3-8 letter codes could be, for instance, a locale, such as "en-calif" or "en-cockney".)

> We have quite some experience and practice with this type in a national
> geodata description and transfer standard, called 'INTERLIS'. The
> advantage is, that we can define object- and/or codelists at
> national/international level (i.e. landcover, vegetation types, building
> or street classnames) and allow then cantons/admin.regions to make a
> (hierarchical) subdivision of this enumeration. This ensures
> compatibility through structural polymorphism down to the data
> integration level.
>

Does this help?

pvb

References

[1] http://www.w3.org/TR/xmlschema-2/#language
[2] http://www.ietf.org/rfc/rfc1766.txt
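P.S. As a rough illustration of the application-side check mentioned above, here is a minimal Python sketch. It reuses the pattern from the 'english' type and treats two tags as related when their first subtag matches. The function names (is_english, primary_language, same_language) are hypothetical, and the comparison is a simplification for illustration, not a full RFC 1766 implementation.

```python
import re

# Pattern copied from the 'english' simpleType in the message.
# XML Schema patterns must match the whole value, hence fullmatch below.
ENGLISH = re.compile(r"en(-[a-zA-Z]{1,8})*")


def is_english(tag: str) -> bool:
    """Return True if the entire tag matches the 'english' pattern."""
    return ENGLISH.fullmatch(tag) is not None


def primary_language(tag: str) -> str:
    """Return the part before the first '-', lower-cased (assumed rule)."""
    return tag.split("-", 1)[0].lower()


def same_language(a: str, b: str) -> bool:
    """Treat two tags as related when their primary subtags are equal."""
    return primary_language(a) == primary_language(b)


if __name__ == "__main__":
    for tag in ("en", "en-US", "en-cockney", "de", "en-"):
        print(tag, is_english(tag))
    print(same_language("en-US", "en-UK"))
```

The schema processor would still validate each value on its own; a grouping rule like same_language lives entirely in the application.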
Received on Thursday, 20 July 2000 17:26:31 UTC