Re: Express length constraints in a regex or use maxLength and minLength?

From: Pete Cordell <petexmldev@codalogic.com>
Date: Mon, 10 Jan 2011 09:51:05 -0000
Message-ID: <CA6D90311AAE4A63B122CE599332CBE6@Codalogic>
To: "Costello, Roger L." <costello@mitre.org>, <xmlschema-dev@w3.org>

Roger,

The member types in your union need to be re-ordered, because an instance 
value is matched against the members of a union in the order they are 
specified, and the first member that accepts it wins. Since your loose 
constraint (_v2) will match anything that your medium constraint (_v3) will 
match, any value that you hope would be matched by your medium constraint 
will be matched by your loose constraint first. In other words, you need:

        <union memberTypes="fn:English-language-family-name_v1
                            fn:English-language-family-name_v3
                            fn:English-language-family-name_v2"/>
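
As a rough illustration of why the order matters, here are a few made-up 
instance values (illustrative only, not taken from Roger's data), annotated 
with the member type each would be assigned under the re-ordered union. 
This assumes the three simpleType definitions quoted further down, and the 
lines are separate fragments rather than one well-formed document:

```xml
<!-- Only letters, apostrophe and hyphen, length 1..100:
     accepted by _v1 (tight), so matching stops there. -->
<Family-name>O'Neil-Smith</Family-name>

<!-- The digit fails the _v1 pattern; length 9 is within 1..500,
     so the value is accepted by _v3 (medium). -->
<Family-name>Smith 3rd</Family-name>

<!-- Empty content fails minLength="1" in both _v1 and _v3,
     so it falls through to _v2 (loose, plain string). -->
<Family-name></Family-name>
```

With the original order (_v1, _v2, _v3), the second value would already be 
accepted by _v2, and _v3 would never be selected for any value.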

However, personally I'm not a fan of unions, and I think over-specification 
of types like this on a large scale eventually leads to flakier systems, 
because developers can't see the wood for the trees and end up introducing 
other sorts of bugs!

my 2 cents,

Pete Cordell
Codalogic Ltd
Interface XML to C++ the easy way using C++ XML
data binding to convert XSD schemas to C++ classes.
Visit http://codalogic.com/lmx/ or http://www.xml2cpp.com
for more info
----- Original Message ----- 
From: "Costello, Roger L." <costello@mitre.org>
To: <xmlschema-dev@w3.org>
Sent: Sunday, January 09, 2011 11:20 PM
Subject: RE: Express length constraints in a regex or use maxLength and 
minLength?



Michael Sperberg-McQueen wrote:

> It may be worth pointing out that XSD union types are designed
> to make this kind of thing relatively easy:  you can define several
> simple types, for example one for the simplest most regular values,
> another for values which are less likely (and thus more likely to
> require special handling), and at the bottom one which is (as
> Roger Costello has suggested) essentially a renamed version
> of xsd:string (or xsd:string itself).
>
> The application can then (if the XSD validator provides access to
> the appropriate information in the PSVI) dispatch the value for further
> processing to an appropriate routine or workflow suitable for a
> particular class of input.

Neat!

I think that I know how to do this, but let me reconfirm. Suppose that I 
create three versions of the English-language-family-name simpleType:

1. TIGHT CONSTRAINTS

<simpleType name="English-language-family-name_v1">
     <restriction base="string">
            <minLength value="1" />
            <maxLength value="100" />
            <pattern value="[a-zA-Z' \.-]+" />
     </restriction>
</simpleType>


2. LOOSE CONSTRAINTS

<simpleType name="English-language-family-name_v2">
     <restriction base="string" />
</simpleType>


3. MEDIUM CONSTRAINTS

<simpleType name="English-language-family-name_v3">
     <restriction base="string">
            <minLength value="1" />
            <maxLength value="500" />
     </restriction>
</simpleType>


Next, I declare Family-name to be a union of the three versions:

<element name="Family-name">
    <simpleType>
        <union memberTypes="fn:English-language-family-name_v1
                            fn:English-language-family-name_v2
                            fn:English-language-family-name_v3"/>
    </simpleType>
</element>


Here's what an XML instance document looks like:

    <Family-name>__________</Family-name>


Suppose that at data entry I want the data validated against tight 
constraints, i.e., English-language-family-name_v1.

How do I instruct an XML Schema validator to use that version of the 
simpleType?

I think that this is how to do it:

    <Family-name xsi:type="fn:English-language-family-name_v1">__________</Family-name>
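
For xsi:type to be recognized, the xsi prefix has to be bound to the XML 
Schema instance namespace, and the fn prefix has to be bound to the 
namespace in which the simpleTypes are defined. A fuller sketch of the 
instance might look like the following. The namespace URI 
http://example.org/family-name is a placeholder (the real one is not given 
here), the sketch assumes Family-name is a top-level element in that same 
namespace, and the value is made up:

```xml
<!-- xmlns:xsi is the standard XML Schema instance namespace.
     xmlns and xmlns:fn use a hypothetical target namespace URI. -->
<Family-name xmlns="http://example.org/family-name"
             xmlns:fn="http://example.org/family-name"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:type="fn:English-language-family-name_v1">Costello</Family-name>
```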

Is that correct?

/Roger
Received on Monday, 10 January 2011 09:51:43 GMT
