RE: limits of regular expressions

I think this discussion is leading us down a slippery slope.  The schema 
recommendation is clear that no language, other than a Turing-complete 
programming language, can provide all the validation one might reasonably 
want for one application or another.  From section "1.1 Purpose" [1]:

"Any application that consumes well-formed XML can use the XML Schema: 
Structures formalism to express syntactic, structural and value 
constraints applicable to its document instances. The XML Schema: 
Structures formalism allows a useful level of constraint checking to be 
described and implemented for a wide spectrum of XML applications. 
However, the language defined by this specification does not attempt to 
provide all the facilities that might be needed by any application. Some 
applications may require constraint capabilities not expressible in this 
language, and so may need to perform their own additional validations."

The proposed requirement in this case seems to be to have enough 
computational capability to derive some sort of check digit in a credit 
card number or similar code.  Well, there will always be things we cannot 
validate.  For example, we can make sure, to some degree, that a credit 
card number looks like a credit card number, but we cannot hope to prove 
that the card isn't stolen.  That's presumably what it really means for a 
credit card number to be valid.
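
To make the division of labor concrete, here is a minimal Python sketch 
of the two layers.  The regex stands in for the kind of shape check a 
schema pattern could express; the check-digit step uses the Luhn 
algorithm, which is the usual check-digit scheme for card numbers, and 
would live in the application.  All names below are hypothetical, and the 
13-to-19-digit length range is an assumption made for illustration only.

```python
import re

# Hypothetical pre-filter: roughly what a schema pattern could enforce.
# The 13-19 digit range is an assumption for illustration.
CARD_SHAPE = re.compile(r"[0-9]{13,19}")


def looks_like_card_number(text: str) -> bool:
    """Shape check only: digits of a plausible length, nothing more."""
    return CARD_SHAPE.fullmatch(text) is not None


def luhn_ok(digits: str) -> bool:
    """Luhn check: double every second digit from the right, sum, test mod 10."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:      # every second digit, counting from the right
            d *= 2
            if d > 9:       # same as adding the two digits of the product
                d -= 9
        total += d
    return total % 10 == 0


def plausible_card_number(text: str) -> bool:
    """Application-level check: shape first, then the check digit."""
    return looks_like_card_number(text) and luhn_ok(text)
```

Neither layer says anything about whether the card is stolen; that 
question stays with the application and the card issuer.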

Consider the requirements of a mathematician.  Would it not be reasonable 
for him or her to request the ability to derive a sub type of integer to 
be known as "PrimeNumber"?  Are we supposed to validate that -- make sure 
the number is prime?

My point is that systems like XML Schema can embody a reasonable level of 
checking, but cannot in general meet the validation needs of particular 
applications.  Schemas can give you a pre-filter, and some very useful 
constraints that aid in mapping to data structures and databases, and that 
greatly simplify the validation remaining to be done by applications. Even 
our mathematician will be glad that we check for a positive integer, which 
significantly facilitates the work that he or she then has to do to prove 
primality.
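
As a rough sketch of that pre-filter idea in Python: the regex below 
plays the role of the schema-level check for a positive integer, and the 
primality test is the part the application still has to supply.  The 
names are hypothetical, the pattern is a simplification (no sign, no 
leading zeros), and trial division is used only because it is short, not 
because it is what a mathematician would choose for large numbers.

```python
import re
from math import isqrt

# Simplified stand-in for a schema-level positive-integer check.
# Assumption: no sign and no leading zeros are accepted.
POSITIVE_INTEGER = re.compile(r"[1-9][0-9]*")


def is_prime(n: int) -> bool:
    """Trial division up to the integer square root of n."""
    if n < 2:
        return False
    if n < 4:               # 2 and 3 are prime
        return True
    if n % 2 == 0:
        return False
    for d in range(3, isqrt(n) + 1, 2):
        if n % d == 0:
            return False
    return True


def validate_prime_number(text: str) -> bool:
    """Pre-filter with the regex, then do the application-level check."""
    if POSITIVE_INTEGER.fullmatch(text) is None:
        return False        # rejected by the pre-filter
    return is_prime(int(text))
```

The pre-filter guarantees that int() is only ever applied to a string of 
digits, so the application code does not have to deal with malformed 
input.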

Bottom line: I think that regular expressions represent a very reasonable 
80/20 point in the design space.  They provide a quite powerful and 
generally useful level of checking, without requiring that we invent a 
portable programming language in which to capture additional logic.  
Thank you very much.

[1] http://www.w3.org/TR/xmlschema-1/#intro-purpose

------------------------------------------------------------------
Noah Mendelsohn                              Voice: 1-617-693-4036
IBM Corporation                                Fax: 1-617-693-8676
One Rogers Street
Cambridge, MA 02142
------------------------------------------------------------------

Received on Friday, 23 August 2002 13:21:55 UTC