RE: XML Schema: enumeration 'value' subtyping

> -----Original Message-----
> From:	Stefan.Keller@lt.admin.ch [SMTP:Stefan.Keller@lt.admin.ch]
> Sent:	Wednesday, July 19, 2000 4:47 AM
> To:	xmlschema-dev@w3.org; www-xml-schema-comments@w3.org;
> xml-dev@xml.org
> Subject:	XML Schema: enumeration 'value' subtyping
> 
> I would like to hang in a new thread as a newbie. I understand that
> enumeration represents an enumerated type supported on attributes only.
> 
[Note: I removed xmlschema-dev from the address list, since that list is for
discussion of issues surrounding implementation of schema processors, not
general questions/issues about XML Schema as a language].

It's beside the point, but while it is true that in DTDs enumerations could
only be specified for attributes, in XML Schema enumeration is not limited to
attributes.  For example, given a schema fragment such as the following
(using the languageCode type from your message):

	<element name='building' type='languageCode'/>

and an instance fragment:

	<building>DE</building> <!-- valid -->
	<building>IT</building>   <!-- invalid -->
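
To make the valid/invalid distinction above concrete, here is a minimal
sketch in Python of the check an enumeration facet implies for element
content.  It is not a schema processor; the names LANGUAGE_CODES and
is_valid_language_code are hypothetical, and the set simply copies the four
values from the languageCode type in your message.

```python
# Hypothetical stand-in for the enumeration facets of the languageCode type.
LANGUAGE_CODES = frozenset({"DE", "EN", "FR", "ZW"})


def is_valid_language_code(text: str) -> bool:
    """Return True if text is exactly one of the enumerated values."""
    # The comparison is exact and case-sensitive, as for a string-based type.
    return text in LANGUAGE_CODES


if __name__ == "__main__":
    for content in ("DE", "IT"):
        verdict = "valid" if is_valid_language_code(content) else "invalid"
        print(f"<building>{content}</building> is {verdict}")
```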

> My/our requirement is now *not* only the reduction of list of permitted
> values *but* to subtype one (or more) specific enumeration values, like
> EN-us, EN-uk, EN-aus, ... meaning us-english, uk-english,
> australian-english, ... Here is an example:
> 
> <simpleType name="languageCode" base="string"> 
>   <enumeration value="DE"/> 
>   <enumeration value="EN"/> 
>   <enumeration value="FR"/> 
>   <enumeration value="ZW"/> 
> </simpleType> 
> 
There is a built-in datatype called language [1], whose value space is the
set of language codes provided for in RFC 1766, "Tags for the Identification
of Languages" [2]. 

> Example solution with a (very probably incorrect) syntax invented by
> myself:
> 
> <specializedType 
>  name="languageSubCode" source="simpleType" deriveBy="extension"> 
>   <group ref="EN"> 
>     <enumeration value="US"/ >    <!-- Remark: means EN-US -->
>     <enumeration value="EN-UK"/>  <!-- Remark: means EN-UK -->
>     ...
>   </group>
> </specializedType> 
> 
The language datatype has a basic structure (it's actually more complex than
what follows, as described in detail in RFC 1766) of:

	a 2-letter language code,
	optionally followed by "-" and a 2-letter country code

The 2-letter language code is interpreted according to ISO 639, "Code for
the representation of names of languages".  The 2-letter country codes are
interpreted as ISO 3166 alpha-2 country codes denoting the area in which the
language is used.

Thus, in the language datatype you have just what you need.  Granted, the
schema processor will not "understand" that "en-US" and "en-UK" are related,
but your application certainly can easily be made to understand that.
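
As a rough illustration of how an application might recognize that two tags
share a language, here is a minimal sketch in Python.  The function names are
hypothetical, and it assumes only the simple "primary tag, then
hyphen-separated subtags" structure described above, compared
case-insensitively as RFC 1766 specifies.

```python
def split_language_tag(tag: str) -> tuple[str, list[str]]:
    """Split a tag such as 'en-US' into (primary tag, list of subtags).

    Everything is lowercased because RFC 1766 tags are case-insensitive.
    """
    primary, *subtags = tag.lower().split("-")
    return primary, subtags


def same_language(a: str, b: str) -> bool:
    """Return True if both tags have the same primary language tag."""
    return split_language_tag(a)[0] == split_language_tag(b)[0]


if __name__ == "__main__":
    print(same_language("en-US", "en-UK"))
    print(same_language("en-US", "de"))
```

This does no validation of the tags themselves; it only groups values that
the schema processor has already accepted.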

You can derive datatypes from language, using either enumeration or the
pattern facet, such as:

	<simpleType name='english' base='language'>
		<pattern value='en(-[a-zA-Z]{1,8})*'/>
	</simpleType>

(the subtag after the language code is not strictly a country code and can
actually be up to 8 characters long; see RFC 1766 for details.  A 3-8 letter
subtag could be, for instance, a locale, such as "en-calif" or "en-cockney").
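
To see which values the pattern above accepts, here is a small sketch in
Python.  It assumes that Python's re syntax agrees with the schema regular
expression for this particular pattern, and it uses fullmatch because a
pattern facet has to match the whole value, not a substring.  The names
ENGLISH_PATTERN and matches_english are hypothetical.

```python
import re

# Same expression as the pattern facet of the 'english' type.
ENGLISH_PATTERN = re.compile(r"en(-[a-zA-Z]{1,8})*")


def matches_english(value: str) -> bool:
    """Return True if the whole value matches the 'english' pattern."""
    return ENGLISH_PATTERN.fullmatch(value) is not None


if __name__ == "__main__":
    for candidate in ("en", "en-US", "en-cockney", "de", "en-", "english"):
        print(candidate, matches_english(candidate))
```

Note that the pattern as written is case-sensitive for the primary tag, so
"EN-US" would not match even though RFC 1766 treats tags as
case-insensitive.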

> We have quite some experience and practice with this type in a national
> geodata description and transfer standard, called 'INTERLIS'. The
> advantage is, that we can define object- and/or codelists at
> national/international level (i.e. landcover, vegetation types, building
> or street classnames) and allow then cantons/admin.regions to make a
> (hierarchical) subdivision of this enumeration. This ensures
> compatibility through structural polymorphism down to the data
> integration level.
> 
Does this help?

pvb

References
[1] http://www.w3.org/TR/xmlschema-2/#language
[2] http://www.ietf.org/rfc/rfc1766.txt 

Received on Thursday, 20 July 2000 17:26:31 UTC