- From: Colin Mackenzie <colin@elecmc.com>
- Date: Tue, 27 Aug 2002 09:10:10 +0100
- To: <noah_mendelsohn@us.ibm.com>
- Cc: <r.becker@Nitro-Software.com>, <xmlschema-dev@w3.org>
No problem re discussion.

It is also not obvious to me how to extend the schema for these features, but perhaps it would be worth a short debate. Why would the following be difficult to implement, or a bad idea? I do not develop parsers, so I am sure there may be good technical reasons.

a) Have a new element type to implement new validation, say

<xs:test expr="expression.."/>

The only feature of this element is to cause a validation error if the expression resolves to false.

b) The expression supports XPath syntax, so

<xs:test expr="/document/otherelement"/>

resolves to true if the other element exists (a co-occurrence constraint), and

<xs:test expr="/document/element[1] = /document/otherelement[1]"/>

resolves to true if the content of the two element instances is the same, etc., including the ability to add, subtract, multiply and divide, BUT say no functions, no if/else logic, and no changing the elements allowed at certain points based on values of elements, otherwise the whole thing would get too complicated.

I know this would mean potentially horrendous expressions (when calculating a check digit using 10 other values), but at least something as core as this could be done within the schema itself.

I guess another problem occurs due to the use of XPath, as the final element names may not be known (say within a schema file containing useful types to be used within other schemas), but this is understood by users of xs:key and accepted as a limitation (perhaps xs:test could only occur at the same place xs:key does; perhaps the XPath does not support relative paths). This would make the whole thing less useful of course.

c) To solve the original checksum issue we would have to do more, as the digits used in the checksum were contained within element content and checked with a RegExp rather than being in directly addressable XPath nodes. So, would it be a good idea to extend XPath to allow the selection of a piece of node content using a RegExp?
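As a rough illustration of points a) and b), here is a minimal Python sketch of what a processor might do with such tests. It is not part of XML Schema: the helper names (exists, same_content, run_tests) and the sample document are hypothetical, and each test is written as a Python callable rather than parsed from an expr attribute. It uses the limited path syntax of the standard library's xml.etree.ElementTree, so paths are relative to the document element ("./otherelement" rather than "/document/otherelement"), and the equality test compares whitespace-trimmed text, which is a simplification of XPath's "=".

```python
import xml.etree.ElementTree as ET


def exists(root, path):
    """Co-occurrence test: true if at least one node matches the path."""
    return root.find(path) is not None


def same_content(root, path_a, path_b):
    """Equality test: true if both nodes exist and their trimmed text matches."""
    a, b = root.find(path_a), root.find(path_b)
    if a is None or b is None:
        return False
    return (a.text or "").strip() == (b.text or "").strip()


def run_tests(xml_text, tests):
    """Return the names of all tests that resolve to false (validation errors)."""
    root = ET.fromstring(xml_text)
    return [name for name, test in tests if not test(root)]


# Hypothetical instance document, using the element names from the message.
DOC = "<document><element>42</element><otherelement>42</otherelement></document>"

# Each entry stands in for one hypothetical <xs:test expr="..."/> element.
TESTS = [
    ("otherelement exists",
     lambda r: exists(r, "./otherelement")),
    ("element[1] = otherelement[1]",
     lambda r: same_content(r, "./element[1]", "./otherelement[1]")),
    ("element[1] - otherelement[1] = 0",
     lambda r: int(r.findtext("./element[1]")) - int(r.findtext("./otherelement[1]")) == 0),
]

failures = run_tests(DOC, TESTS)
print(failures)
```

An empty list of failures would correspond to a valid document; each name in the list would correspond to one validation error raised by a failed test.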
XPath already supports string functions for breaking down content, so surely the idea of RegExp support is not too far-fetched? In Perl and other RegExp implementations you can create a RegExp and put any piece of it in parentheses to identify it as something that should be passed back to the user as $1, $2 etc. (you know what I mean). So, using the example above, we could do something like

<xs:test expr="/document/otherelement[1]/[0-9]{2}([0-9]) = /document/otherelement[1]/[0-9]{5}([0-9])"/>

which would test that the value of the third digit is the same as that of the sixth digit.

OK, this is probably the direction that you don't want to go, and there may be numerous gaping holes, but as someone who has had to write several complex schemas I would really appreciate co-occurrence constraints and some arithmetic checking within the body of the schema (rather than as a Schematron post-process).

Colin

-----Original Message-----
From: xmlschema-dev-request@w3.org [mailto:xmlschema-dev-request@w3.org]On Behalf Of noah_mendelsohn@us.ibm.com
Sent: 25 August 2002 04:05
To: Colin Mackenzie
Cc: r.becker@Nitro-Software.com; xmlschema-dev@w3.org
Subject: RE: limits of regular expressions

Sorry, I did not write as carefully as perhaps I should have. Obviously, there is no value in discouraging reasonable discussion. What I meant to say was: the design direction signalled by the discussion suggests the risk of a slippery slope. I still think that's true. Though I'd be glad to hear reasonable suggestions that would prove me wrong, it's not immediately obvious to me how to add features to the schema language that would do reasonably generalized check-digit calculations, and that would be the sort of more widely useful features that would represent a good 80/20 compromise in terms of power, general utility, simplicity and portability.

I certainly never meant to discourage discussion, and if I appeared to I apologize.
------------------------------------------------------------------
Noah Mendelsohn                              Voice: 1-617-693-4036
IBM Corporation                                Fax: 1-617-693-8676
One Rogers Street
Cambridge, MA 02142
------------------------------------------------------------------

"Colin Mackenzie" <colin@elecmc.com>
Sent by: xmlschema-dev-request@w3.org
08/24/02 05:57 AM

To: <noah_mendelsohn@us.ibm.com>
cc: "Rainer Becker" <r.becker@Nitro-Software.com>, <xmlschema-dev@w3.org>
Subject: RE: limits of regular expressions

Discussion does not lead us down a slippery slope, only perhaps incorrect decisions made as a result of discussion. I have found the discussion illuminating already (see your response below).

Of course there will be a limit as to what a schema language can achieve sensibly. Of course users would like the schema language to allow as much "early data checking" as possible (I remember checking the values of HTML form fields using server side CGI before we had decent client side JavaScript).

Surely there is nothing wrong with debating the issue as it only leads to further understanding, and possibly the odd good idea for the next version of schema?

FYI - even a stolen credit card can have a valid number, it is just not legal to use it. The last time I did any credit/debit card processing (a number of years ago) most systems only performed the checksum (on the till or client) during a purchase and only did a "server side" check for stolen cards/credit limits off-line in batches.

Colin

-----Original Message-----
From: noah_mendelsohn@us.ibm.com [mailto:noah_mendelsohn@us.ibm.com]
Sent: 23 August 2002 18:21
To: Colin Mackenzie
Cc: Rainer Becker; xmlschema-dev@w3.org
Subject: RE: limits of regular expressions

I think this discussion is leading us down a slippery slope. The schema recommendation is clear that no language, other than a Turing-complete programming language, can provide all the validation one might reasonably want for one application or another.
From section "1.1 Purpose" [1]:

"Any application that consumes well-formed XML can use the XML Schema: Structures formalism to express syntactic, structural and value constraints applicable to its document instances. The XML Schema: Structures formalism allows a useful level of constraint checking to be described and implemented for a wide spectrum of XML applications. However, the language defined by this specification does not attempt to provide all the facilities that might be needed by any application. Some applications may require constraint capabilities not expressible in this language, and so may need to perform their own additional validations."

The proposed requirement in this case seems to be to have enough computational capability to derive some sort of check digit in a credit card number or similar code. Well, there will always be things we cannot validate. For example, we can make sure that a credit card number looks like a credit card number, to some degree, but we cannot hope to prove that the card isn't stolen. That's presumably what it really means for a credit card number to be valid.

Consider the requirements of a mathematician. Would it not be reasonable for him or her to request the ability to derive a subtype of integer to be known as "PrimeNumber"? Are we supposed to validate that -- make sure the number is prime?

My point is that systems like schema can embody a reasonable level of checking, but cannot in general meet the validation needs of particular applications. Schemas can give you a pre-filter, and some very useful constraints that aid in mapping to data structures and databases, and that greatly simplify the validation remaining to be done by applications. Even our mathematician will be glad that we check for positive integer, which significantly facilitates the work that he or she then has to do to prove primeness.

Bottom line: I think that regex's represent a very reasonable 80/20 point in the design space.
They provide a quite powerful and generally useful level of checking, without requiring that we invent a portable programming language in which to capture additional logic.

Thank you very much.

[1] http://www.w3.org/TR/xmlschema-1/#intro-purpose

------------------------------------------------------------------
Noah Mendelsohn                              Voice: 1-617-693-4036
IBM Corporation                                Fax: 1-617-693-8676
One Rogers Street
Cambridge, MA 02142
------------------------------------------------------------------
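The division of labour described in the message above (a regex as a pre-filter, with the remaining validation done by the application) might be sketched in Python roughly as follows. The thread does not name a check-digit scheme; this sketch assumes the Luhn algorithm, which is the scheme commonly used for payment card numbers. The pattern and the function names are hypothetical.

```python
import re

# Hypothetical pre-filter, the kind of constraint an xs:pattern facet can express:
# the value is 13 to 19 decimal digits and nothing else.
CARD_PATTERN = re.compile(r"[0-9]{13,19}")


def looks_like_card_number(value):
    """Schema-level check: the value has the right shape."""
    return CARD_PATTERN.fullmatch(value) is not None


def luhn_valid(digits):
    """Application-level check: Luhn check digit (assumed scheme, see above)."""
    total = 0
    # Walk from the rightmost digit, doubling every second digit.
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9  # same as adding the two digits of the product
        total += d
    return total % 10 == 0


def validate(value):
    """Run the cheap shape check first, then the arithmetic check."""
    return looks_like_card_number(value) and luhn_valid(value)


print(validate("4111111111111111"))  # widely used test number
print(validate("4111111111111112"))  # same shape, wrong check digit
```

Neither check says anything about whether the card is stolen; as the message notes, that remains outside what either a schema or a checksum can establish.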
Received on Tuesday, 27 August 2002 04:16:37 UTC