Re: Express length constraints in a regex or use maxLength and minLength?

From: Michael Kay <mike@saxonica.com>
Date: Mon, 03 Jan 2011 20:09:43 +0000
Message-ID: <4D222D07.8030707@saxonica.com>
To: "Costello, Roger L." <costello@mitre.org>
CC: "xmlschema-dev@w3.org" <xmlschema-dev@w3.org>
On 03/01/2011 19:44, Costello, Roger L. wrote:

I can't add to your list of advantages/disadvantages, but I would 
seriously question why you want to impose a limit of 100 characters on a 
string.

Some people seem to do this as an ingrained habit - they haven't got rid 
of the punched-card mentality where strings were always fixed length.

There may be good reasons for doing it - for example, the data is going 
to be processed by an ancient COBOL application with limits that you 
can't afford to change; or you want to protect against certain kinds of 
DoS attack - but most of the time I see this kind of thing, the 
constraints are spurious. For example, people will put a limit of 10 
characters on a phone number because they've never travelled widely 
enough to realize that's not a hard limit at all.

Michael Kay
Saxonica
> Hi Folks,
>
> I am interested in hearing your thoughts on the advantages and disadvantages of the following two approaches to restricting the length of a string value.
>
> Approach #1: In this simpleType the regex does not restrict the length; instead, the minLength and maxLength facets are used to restrict the length:
>
>      <simpleType name="English-language-family-name">
>          <restriction base="string">
>              <minLength value="1" />
>              <maxLength value="100" />
>              <pattern value="[a-zA-Z' \.-]+" />
>          </restriction>
>      </simpleType>
>
>
> Approach #2: Here is the same simpleType except the length restriction is implemented in the regex:
>
>      <simpleType name="English-language-family-name">
>          <restriction base="string">
>              <pattern value="[a-zA-Z' \.-]{1,100}" />
>          </restriction>
>      </simpleType>
>
>
> The disadvantage of the first approach is that maxLength and minLength are non-transferable length restriction mechanisms. They are not something that could be used directly by Schematron or HTML5.
>
> The disadvantage of the second approach is that an application would require sophistication to parse the regex to understand its length constraints.
>
>
> The advantage of the second approach is that the constraints are completely contained within the regex. Thus, the regex could, with little or no modification, be lifted and dropped into an XSLT, Schematron, or HTML5 regex.
>
> The advantage of the first approach is that it is easier for a machine to determine the simpleType's length restrictions.
>
>
> What other advantages and disadvantages does each approach have? Which approach do you recommend? Why?
>
> /Roger
>
>
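
The trade-off in the quoted message (facets are easy for a machine to read, while a regex is self-contained but must be parsed to recover its bounds) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the function names are hypothetical, the xmlns declaration is added so each fragment parses on its own, and the regex parser only understands a single character class followed by {m,n}. Python's re module is not the XSD regex dialect, but this particular pattern happens to be valid in both.

```python
import re
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Approach #1 from the quoted message (xmlns added so the fragment stands alone).
APPROACH_1 = r"""
<simpleType xmlns="http://www.w3.org/2001/XMLSchema"
            name="English-language-family-name">
    <restriction base="string">
        <minLength value="1" />
        <maxLength value="100" />
        <pattern value="[a-zA-Z' \.-]+" />
    </restriction>
</simpleType>
"""

# Approach #2 from the quoted message (xmlns added so the fragment stands alone).
APPROACH_2 = r"""
<simpleType xmlns="http://www.w3.org/2001/XMLSchema"
            name="English-language-family-name">
    <restriction base="string">
        <pattern value="[a-zA-Z' \.-]{1,100}" />
    </restriction>
</simpleType>
"""


def bounds_from_facets(xsd_text):
    """Approach #1: read (min, max) directly from the length facets."""
    restriction = ET.fromstring(xsd_text).find(XS + "restriction")
    lo = restriction.find(XS + "minLength")
    hi = restriction.find(XS + "maxLength")
    return (
        int(lo.get("value")) if lo is not None else 0,
        int(hi.get("value")) if hi is not None else None,
    )


def bounds_from_pattern(xsd_text):
    """Approach #2: recover (min, max) from the regex itself.

    Deliberately naive: it only recognises one character class followed
    by {m,n}. Any other regex shape returns None, which is the
    "sophistication" problem raised in the quoted message.
    """
    pattern = ET.fromstring(xsd_text).find(
        XS + "restriction/" + XS + "pattern"
    ).get("value")
    m = re.fullmatch(r"\[[^\]]*\]\{(\d+),(\d+)\}", pattern)
    return (int(m.group(1)), int(m.group(2))) if m else None


# Reusing the Approach #2 regex outside XSD. An XSD pattern must match the
# whole value, so fullmatch() is used to keep that behaviour.
NAME_RE = re.compile(r"[a-zA-Z' \.-]{1,100}")


def is_valid_name(value):
    """Return True if value satisfies the Approach #2 pattern."""
    return NAME_RE.fullmatch(value) is not None
```

Note that reusing the regex elsewhere needs care about anchoring: an XSD pattern always applies to the entire value, whereas many regex APIs search for a match anywhere in the string unless told otherwise.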
Received on Monday, 3 January 2011 20:10:10 GMT
