Re: Looking for help. Implementing a XML validation engine in JavaScript.

C.M.,

Thanks for the info. It's good to know that there is discussion going on
around this. Too bad a conclusion has not been reached.

Cheers,

Casey

On Tue, Aug 10, 2010 at 9:40 PM, C. M. Sperberg-McQueen <
cmsmcq@blackmesatech.com> wrote:

>
> On 10 Aug 2010, at 11:16 , Casey Jordan wrote:
>
>>
>>
>> Here is the main problem, the greedy algorithm I developed does not work
>> for patterns like this:
>>
>> <sequence>
>>    <element ref="e1"/>
>>    <choice minOccurs="2">
>>        <element ref="e2" maxOccurs="2"/>
>>        <element ref="e3"/>
>>    </choice>
>>    <element ref="e4"/>
>> </sequence>
>>
>> This can be seen as a regular expression e1( e2{1,2} | e3 ){2,2} e4. My
>> greedy algorithm cannot validate this correctly.
>>
>> Assuming input like <e1/><e2/><e2/><e4/>
>>
>> My algorithm will match all the way up to the <e4/> in the first pass
>> through the choice: ( e2{1,2} | e3 ){2,2} because it does not realize that
>> both <e2/> elements should be treated individually as matches for the
>> choice, and not a single match for the choice. Validation then fails because
>> it is expecting one more of <e2/> or <e3/>. There is not an easy way to
>> build this into my current algorithm.
>>
>
> There is a good, solid discussion and solution of this issue
> in the paper "Towards efficient implementation of XML schema
> content models", by Pekka Kilpeläinen and Rauno Tuhkanen,
> in the proceedings of Document Engineering 2004. See
> http://portal.acm.org/citation.cfm?doid=1030397.1030441
> for more bibliographic information and a link to the full
> text (for subscribers to the ACM digital library).
>
> It may be worth while to note that the problem you have
> encountered results from a blunder on the part of the XSD working
> group, which failed to notice, when introducing numeric ranges
> for occurrence indicators, that the difference between
> weak determinism and strong determinism was now important
> and visible in the language, and thus failed to reach any
> considered decision on whether to require weak determinism
> or strong determinism.  As the rule known as the 'Unique Particle
> Attribution' constraint shows, the WG ended up with weak
> determinism, but this was not a conscious choice. I wish the
> WG had had the guts to fix this design error in 1.1, but we
> did not.
>
> Instead, each implementation has found a different way of
> fudging the issue, with the result that schemas which exhibit
> the problem (i.e. content models which are weakly deterministic
> but not strongly deterministic) are unambiguously legal and have a
> well defined interpretation, but relatively few implementations
> actually implement what the spec says, and no two of them are
> likely to implement exactly the same thing.  The schema
> author thus gets the best of both worlds:  different implementations
> will do different things with the schema, but since the schema
> is legal, it's unlikely that the schema author will be
> alerted to the difficulty.
>
>
> --
> ****************************************************************
> * C. M. Sperberg-McQueen, Black Mesa Technologies LLC
> * http://www.blackmesatech.com
> * http://cmsmcq.com/mib
> * http://balisage.net
> ****************************************************************
>
>
>
>
>


-- 
--
Casey Jordan
Jorsek Software LLC.
"CaseyDJordan" on LinkedIn, Twitter & Facebook
Cell (585) 348 7399
Office (585) 239 6060
Jorsek.com


This message is intended only for the use of the Addressee(s) and may
contain information that is privileged, confidential, and/or exempt from
disclosure under applicable law.  If you are not the intended recipient,
please be advised that any disclosure  copying, distribution, or use of
the information contained herein is prohibited.  If you have received
this communication in error, please destroy all copies of the message,
whether in electronic or hard copy format, as well as attachments, and
immediately contact the sender by replying to this e-mail or by phone.
Thank you.

Received on Friday, 13 August 2010 18:35:22 UTC