- From: Casey Jordan <casey.jordan@jorsek.com>
- Date: Fri, 13 Aug 2010 14:34:45 -0400
- To: "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com>
- Cc: xmlschema-dev@w3.org
- Message-ID: <AANLkTikFLKGwKp65GYTq04sZP-g6OBWq_Y8Lwx1gMte6@mail.gmail.com>
C.M.,

Thanks for the info. It's good to know that there is discussion going on around this. Too bad a conclusion has not been reached.

Cheers,
Casey

On Tue, Aug 10, 2010 at 9:40 PM, C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com> wrote:

> On 10 Aug 2010, at 11:16, Casey Jordan wrote:
>
>> Here is the main problem, the greedy algorithm I developed does not work
>> for patterns like this:
>>
>> <sequence>
>>   <element ref="e1"/>
>>   <choice minOccurs="2">
>>     <element ref="e2" maxOccurs="2"/>
>>     <element ref="e3"/>
>>   </choice>
>>   <element ref="e4"/>
>> </sequence>
>>
>> This can be seen as a regular expression e1( e2{1,2} | e3 ){2,2} e4. My
>> greedy algorithm cannot validate this correctly.
>>
>> Assuming input like <e1/><e2/><e2/><e4/>
>>
>> My algorithm will match all the way up to the <e4/> in the first pass
>> through the choice: ( e2{1,2} | e3 ){2,2} because it does not realize that
>> both <e2/> elements should be treated individually as matches for the
>> choice, and not a single match for the choice. Validation then fails because
>> it is expecting one more of <e2/> or <e3/>. There is not an easy way to
>> build this into my current algorithm.
>
> There is a good, solid discussion and solution of this issue
> in the paper "Towards efficient implementation of XML schema
> content models", by Pekka Kilpeläinen and Rauno Tuhkanen,
> in the proceedings of Document Engineering 2004. See
> http://portal.acm.org/citation.cfm?doid=1030397.1030441
> for more bibliographic information and a link to the full
> text (for subscribers to the ACM digital library).
>
> It may be worth while to note that the problem you have
> encountered results from a blunder on the part of the XSD working
> group, which failed to notice, when introducing numeric ranges
> for occurrence indicators, that the difference between
> weak determinism and strong determinism was now important
> and visible in the language, and thus failed to reach any
> considered decision on whether to require weak determinism
> or strong determinism. As the rule known as the 'Unique Particle
> Attribution' constraint shows, the WG ended up with weak
> determinism, but this was not a conscious choice. I wish the
> WG had had the guts to fix this design error in 1.1, but we
> did not.
>
> Instead, each implementation has found a different way of
> fudging the issue, with the result that schemas which exhibit
> the problem (i.e. content models which are weakly deterministic
> but not strongly deterministic) are unambiguously legal and have a
> well defined interpretation, but relatively few implementations
> actually implement what the spec says, and no two of them are
> likely to implement exactly the same thing. The schema
> author thus gets the best of both worlds: different implementations
> will do different things with the schema, but since the schema
> is legal, it's unlikely that the schema author will be
> alerted to the difficulty.
>
> --
> ****************************************************************
> * C. M. Sperberg-McQueen, Black Mesa Technologies LLC
> * http://www.blackmesatech.com
> * http://cmsmcq.com/mib
> * http://balisage.net
> ****************************************************************

--
Casey Jordan
Jorsek Software LLC.
"CaseyDJordan" on LinkedIn, Twitter & Facebook
Cell (585) 348 7399
Office (585) 239 6060
Jorsek.com

This message is intended only for the use of the Addressee(s) and may contain information that is privileged, confidential, and/or exempt from disclosure under applicable law.
If you are not the intended recipient, please be advised that any disclosure, copying, distribution, or use of the information contained herein is prohibited. If you have received this communication in error, please destroy all copies of the message, whether in electronic or hard copy format, as well as attachments, and immediately contact the sender by replying to this e-mail or by phone. Thank you.
Received on Friday, 13 August 2010 18:35:22 UTC
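The greedy algorithm discussed in this thread is never shown, so the Python sketch below is a hypothetical reconstruction; the names `Elem`, `Seq`, `Choice`, `Repeat`, `greedy`, `ends` and `count_parses` are all illustrative. It follows the regular expression e1( e2{1,2} | e3 ){2,2} e4 from the thread, i.e. exactly two iterations of the choice (the quoted `<choice>` only gives minOccurs="2"; since XSD's default maxOccurs is 1, maxOccurs="2" is presumably intended). The greedy matcher commits to the longest match at each repetition and, as described, fails on <e1/><e2/><e2/><e4/>. The exhaustive search tries every way of splitting the input among iterations and counts derivations, which makes the weak/strong determinism distinction concrete: each <e2/> can only ever match the one e2 particle (weak determinism holds), yet with three <e2/> elements there are two different groupings into the two choice iterations (strong determinism fails).

```python
from dataclasses import dataclass
from typing import Iterator, Optional, Sequence, Tuple, Union


@dataclass(frozen=True)
class Elem:
    name: str


@dataclass(frozen=True)
class Seq:
    items: Tuple["Particle", ...]


@dataclass(frozen=True)
class Choice:
    items: Tuple["Particle", ...]


@dataclass(frozen=True)
class Repeat:
    body: "Particle"
    lo: int  # minOccurs
    hi: int  # maxOccurs; finite only, "unbounded" is not modelled here


Particle = Union[Elem, Seq, Choice, Repeat]


def ends(p: Particle, toks: Sequence[str], i: int) -> Iterator[int]:
    """Backtracking search: yield every position where p can stop after
    starting at i, once per derivation (repeated values mean ambiguity)."""
    if isinstance(p, Elem):
        if i < len(toks) and toks[i] == p.name:
            yield i + 1
    elif isinstance(p, Seq):
        yield from _seq(p.items, toks, i)
    elif isinstance(p, Choice):
        for alt in p.items:
            yield from ends(alt, toks, i)
    else:
        yield from _rep(p, toks, i, 0)


def _seq(items, toks, i):
    if not items:
        yield i
        return
    for j in ends(items[0], toks, i):
        yield from _seq(items[1:], toks, j)


def _rep(p: Repeat, toks, i, count):
    if count >= p.lo:
        yield i  # option 1: stop iterating here
    if count < p.hi:
        for j in ends(p.body, toks, i):  # option 2: one more iteration
            yield from _rep(p, toks, j, count + 1)


def count_parses(model: Particle, toks: Sequence[str]) -> int:
    return sum(1 for end in ends(model, toks, 0) if end == len(toks))


def greedy(p: Particle, toks: Sequence[str], i: int) -> Optional[int]:
    """Hypothetical greedy matcher: one answer per particle, never backtracks."""
    if isinstance(p, Elem):
        return i + 1 if i < len(toks) and toks[i] == p.name else None
    if isinstance(p, Seq):
        for item in p.items:
            i = greedy(item, toks, i)
            if i is None:
                return None
        return i
    if isinstance(p, Choice):
        for alt in p.items:
            j = greedy(alt, toks, i)
            if j is not None:
                return j  # commit to the first alternative that matches
        return None
    count = 0  # Repeat: take as many iterations as possible, give none back
    while count < p.hi:
        j = greedy(p.body, toks, i)
        if j is None:
            break
        i, count = j, count + 1
    return i if count >= p.lo else None


def greedy_accepts(model: Particle, toks: Sequence[str]) -> bool:
    return greedy(model, toks, 0) == len(toks)


# e1 ( e2{1,2} | e3 ){2,2} e4
MODEL = Seq((
    Elem("e1"),
    Repeat(Choice((Repeat(Elem("e2"), 1, 2), Elem("e3"))), 2, 2),
    Elem("e4"),
))

if __name__ == "__main__":
    for doc in (["e1", "e2", "e2", "e4"], ["e1", "e2", "e2", "e2", "e4"]):
        print(doc, "greedy:", greedy_accepts(MODEL, doc),
              "parses:", count_parses(MODEL, doc))
```

Run as a script, this prints `greedy: False parses: 1` for the four-element input (the greedy matcher swallows both e2 in the first iteration) and `greedy: True parses: 2` for the five-element input, where (e2 e2)(e2) and (e2)(e2 e2) are both valid groupings. Plain backtracking like this is exponential in the worst case; it only illustrates the problem and is not the technique from the Kilpeläinen and Tuhkanen paper cited in the thread.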